Lukas Chavez | 20 Aug 17:44 2013

Re: MEDIPS, GRanges, IRanges error processing BAM file

Hi Jonathan,

I am glad to hear that you identified unmapped reads in you bam files
causing the error. I have previously modified MEDIPS to simply neglect
unmapped reads in a given bam file. This is implemented in the latest
development version which I consider stable and ready to install, if
necessary.

All the best,
Lukas

On Mon, Aug 19, 2013 at 11:50 PM, Jonathan Ellis <Jonathan.Ellis@...
> wrote:

> Hi Lukas,
>
> Thanks for your reply.
>
> > Do they contain non-mapped reads?
>
> I hadn't realised, but yes they do.  Each of the problematic BAMs
> contains a small number of unmapped reads (between 1 and 3).
>
> > MEDIPS 1.10.0 reports an error, if there are none mapped reads in the
> > bam files,
>
> Running MEDIPS 1.10.0, I do not see any errors or warnings about
> unmapped reads only the error I previously posted.
>
> I tried the code you suggested and ended up with a data frame that did
> not contain any NA values; however, if I try the code without the
> scanFlag parameter I get a data frame that does contain NA values.  I've
> created a set of BAMs with unmapped reads filtered out, and I've run
> MEDIPS again.  This time MEDIPS successfully runs on the data set.
>
> Thanks again, your reply has saved me a great deal of time,
>
> Jonathan
>
>
> On Fri, Aug 16, 2013 at 12:26:06PM -0700, Lukas Chavez wrote:
> > Hi Jonathan,
> >
> > I have not encountered this problem previously and do not have an
> > immediate solution.
> >
> > You state that 16 out of 20 bam files can be processed without a
> > problem by applying
> > mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam", BSgenome =
> > "Hsapiens", sample_name = "0139202").
> > This surprises me a little bit, because you have to state the whole
> > BSgenome name, for example BSgenome ="BSgenome.Hsapiens.UCSC.hg19".
> > [It was pointed out few days ago that I can make the package much more
> > flexible by allowing other data types- Hervé, thank you for the hint,
> > we will definitely consider this]. However, this does not seem to be
> > your problem as you encounter problems for only 4 out of 20 bam files.
> >
> > Therefore, I assume that there is something strange with these
> > specific bam files. Do they contain non-mapped reads? MEDIPS 1.10.0
> > reports an error, if there are none mapped reads in the bam files, but
> > the latest version of MEDIPS available in the development branch can
> > deal with this. However, based on the example read you've sent this
> > might not cause the error. The last regular status report you get is
> > 'Creating GRange Object...'. This happens in the getGRange() function
> > called by MEDIPS.createSet() after the bam files has been imported.
> > The command that probably causes the error is
> >
> > regions_GRange = GRanges(seqnames=regions$chr,
> > ranges=IRanges(start=regions$start, end=regions$stop),
> > strand=regions$strand)
> >
> > but I do not understand why you have missing values. In order to
> > understand this, you might want to try to import your bam file by
> > yourself using
> >
> > library(Rsamtools)
> > scanFlag = scanBamFlag(isUnmappedQuery = F) #this will ignore unmapped
> > reads in the bam file and is implemented in the latest dev version
> > (1.11.16).
> > scanParam=ScanBamParam(flag=scanFlag, what = c("rname", "pos", "strand",
> > "qwidth"))
> > regions = scanBam(file="0139202.fq.sam.noDUP.bam.qf.bam",
> param=scanParam)
> > regions = do.call(rbind,lapply(regions, as.data.frame,
> stringsAsFactors=F))
> > regions = data.frame(chr=as.character(as.vector(regions$rname)),
> > start=as.numeric(as.vector(regions$pos)),
> > stop=as.numeric(as.vector(regions$pos)+as.vector(regions$qwidth)-1),
> > strand=as.character(as.vector(regions$strand)), stringsAsFactors=F)
> >
> > Afterwards, it will be necessary to investigate where your regions
> > object has NA values.
> >
> > Another thing: do you have bam index files in the same directory where
> > you store the bam file? MEDIPS will make use of these, if they are
> > available (using slightly different code as stated above). In case
> > there is a discrepancy between the index and bam files (having the
> > same prefix file name), this might cause a problem as well.
> >
> > Finally, you can convert your bam files into bed files (by yourself)
> > and try uploading simple txt based bed files into MEDIPS.
> >
> > I hope we will identify the root of your error.
> > Lukas
> >
> > On Thu, Aug 15, 2013 at 9:28 PM, Jonathan Ellis
> > <Jonathan.Ellis@...>wrote:
> >
> > > Dear Lukas and list,
> > >
> > > I'm trying to process a set of BAM files using the latest version of
> > > MEDIPS (1.10.0), but have run into problems creating MEDIPSset
> > > objects for some BAM file.  The following is an example, but I'm
> > > getting the same error for 4 out of 20 BAM files.
> > >
> > > > mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam",
> > > +     BSgenome = "Hsapiens", sample_name = "0139202")
> > > Reading bam alignment 0139202.fq.sam.noDUP.bam.qf.bam
> > > Total number of imported short reads: 13813686
> > > Creating GRange Object...
> > > Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE =
> "IRanges")
> > > :
> > >   solving row 11345799: range cannot be determined from the supplied
> > > arguments
> > >   (too many NAs)
> > >
> > > The data are single-end reads that were aligned with bwa version
> > > 0.5.9-r16 (the alignments were done some time ago hence the older
> > > version of bwa), and the corresponding line from the BAM (11,345,799)
> > > is:
> > >
> > > 19441177        16      chr17   38161416        18      49M     *
>   0
> > >       0       TAGAGTCCGGCGTTCAGGGGCAGGAAGCATCCAGCACGGGAGAAAGATG
> > > BBBBBBBBBBBBBBBBBd_IIX[ffgb[[YJb^d^[[ggee^^cc^^^_      X0:i:1  X1:i:3
> > >  XA:Z:chr1,+56337006,49M,3;chr12,-55896993,49M,3;chr4,-22778322,49M,3;
> > > MD:Z:8A7A32     XG:i:0  NM:i:2  XM:i:2  XO:i:0XT:A:U
> > >
> > > I'm unsure whether this is a problem with the MEDIPS package or
> > > something from GRanges/IRanges.  As far as I understand the .Call2
> > > function is part of IRanges, but I assume it's failing due to
> > > something passed by MEDIPS.  Any advice or pointers would be greatly
> > > appreciated.
> > >
> > > > sessionInfo()
> > > R version 3.0.1 (2013-05-16)
> > > Platform: x86_64-unknown-linux-gnu (64-bit)
> > >
> > > locale:
> > >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> > >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> > >  [7] LC_PAPER=C                 LC_NAME=C
> > >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> > >
> > > attached base packages:
> > > [1] parallel  stats     graphics  grDevices utils     datasets  methods
> > > [8] base
> > >
> > > other attached packages:
> > >  [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 MEDIPS_1.10.0
> > >  [3] DNAcopy_1.34.0                     BSgenome_1.28.0
> > >  [5] FDb.InfiniumMethylation.hg19_1.0.1 rtracklayer_1.20.4
> > >  [7] Biostrings_2.28.0                  GenomicFeatures_1.12.3
> > >  [9] AnnotationDbi_1.22.6               Biobase_2.20.1
> > > [11] GenomicRanges_1.12.4               IRanges_1.18.2
> > > [13] BiocGenerics_0.6.0                 BiocInstaller_1.10.3
> > >
> > > loaded via a namespace (and not attached):
> > >  [1] biomaRt_2.16.0   bitops_1.0-5     DBI_0.2-7        gtools_3.0.0
> > >  [5] RCurl_1.95-4.1   Rsamtools_1.12.3 RSQLite_0.11.4   stats4_3.0.1
> > >  [9] tools_3.0.1      XML_3.98-1.1     zlibbioc_1.6.0
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor@...
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
>

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Gmane