Lukas Chavez | 16 Aug 21:26 2013

Re: MEDIPS, GRanges, IRanges error processing BAM file

Hi Jonathan,

I have not encountered this problem previously and do not have an immediate
solution.

You state that 16 out of 20 bam files can be processed without a problem by
applying
mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam", BSgenome =
"Hsapiens", sample_name = "0139202").
This surprises me a little bit, because you have to state the whole
BSgenome name, for example BSgenome ="BSgenome.Hsapiens.UCSC.hg19". [It was
pointed out few days ago that I can make the package much more flexible by
allowing other data types- Hervé, thank you for the hint, we will
definitely consider this]. However, this does not seem to be your problem
as you encounter problems for only 4 out of 20 bam files.

Therefore, I assume that there is something strange with these specific bam
files. Do they contain non-mapped reads? MEDIPS 1.10.0 reports an error, if
there are none mapped reads in the bam files, but the latest version of
MEDIPS available in the development branch can deal with this. However,
based on the example read you've sent this might not cause the error. The
last regular status report you get is 'Creating GRange Object...'. This
happens in the getGRange() function called by MEDIPS.createSet() after the
bam files has been imported. The command that probably causes the error is

regions_GRange = GRanges(seqnames=regions$chr,
ranges=IRanges(start=regions$start, end=regions$stop),
strand=regions$strand)

but I do not understand why you have missing values. In order to understand
this, you might want to try to import your bam file by yourself using

library(Rsamtools)
scanFlag = scanBamFlag(isUnmappedQuery = F) #this will ignore unmapped
reads in the bam file and is implemented in the latest dev version
(1.11.16).
scanParam=ScanBamParam(flag=scanFlag, what = c("rname", "pos", "strand",
"qwidth"))
regions = scanBam(file="0139202.fq.sam.noDUP.bam.qf.bam", param=scanParam)
regions = do.call(rbind,lapply(regions, as.data.frame, stringsAsFactors=F))
regions = data.frame(chr=as.character(as.vector(regions$rname)),
start=as.numeric(as.vector(regions$pos)),
stop=as.numeric(as.vector(regions$pos)+as.vector(regions$qwidth)-1),
strand=as.character(as.vector(regions$strand)), stringsAsFactors=F)

Afterwards, it will be necessary to investigate where your regions object
has NA values.

Another thing: do you have bam index files in the same directory where you
store the bam file? MEDIPS will make use of these, if they are available
(using slightly different code as stated above). In case there is a
discrepancy between the index and bam files (having the same prefix file
name), this might cause a problem as well.

Finally, you can convert your bam files into bed files (by yourself) and
try uploading simple txt based bed files into MEDIPS.

I hope we will identify the root of your error.
Lukas

On Thu, Aug 15, 2013 at 9:28 PM, Jonathan Ellis
<Jonathan.Ellis@...>wrote:

> Dear Lukas and list,
>
> I'm trying to process a set of BAM files using the latest version of
> MEDIPS (1.10.0), but have run into problems creating MEDIPSset objects
> for some BAM file.  The following is an example, but I'm getting the
> same error for 4 out of 20 BAM files.
>
> > mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam",
> +     BSgenome = "Hsapiens", sample_name = "0139202")
> Reading bam alignment 0139202.fq.sam.noDUP.bam.qf.bam
> Total number of imported short reads: 13813686
> Creating GRange Object...
> Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges")
> :
>   solving row 11345799: range cannot be determined from the supplied
> arguments
>   (too many NAs)
>
> The data are single-end reads that were aligned with bwa version
> 0.5.9-r16 (the alignments were done some time ago hence the older
> version of bwa), and the corresponding line from the BAM (11,345,799)
> is:
>
> 19441177        16      chr17   38161416        18      49M     *       0
>       0       TAGAGTCCGGCGTTCAGGGGCAGGAAGCATCCAGCACGGGAGAAAGATG
> BBBBBBBBBBBBBBBBBd_IIX[ffgb[[YJb^d^[[ggee^^cc^^^_      X0:i:1  X1:i:3
>  XA:Z:chr1,+56337006,49M,3;chr12,-55896993,49M,3;chr4,-22778322,49M,3;
> MD:Z:8A7A32     XG:i:0  NM:i:2  XM:i:2  XO:i:0XT:A:U
>
> I'm unsure whether this is a problem with the MEDIPS package or
> something from GRanges/IRanges.  As far as I understand the .Call2
> function is part of IRanges, but I assume it's failing due to something
> passed by MEDIPS.  Any advice or pointers would be greatly appreciated.
>
> > sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 MEDIPS_1.10.0
>  [3] DNAcopy_1.34.0                     BSgenome_1.28.0
>  [5] FDb.InfiniumMethylation.hg19_1.0.1 rtracklayer_1.20.4
>  [7] Biostrings_2.28.0                  GenomicFeatures_1.12.3
>  [9] AnnotationDbi_1.22.6               Biobase_2.20.1
> [11] GenomicRanges_1.12.4               IRanges_1.18.2
> [13] BiocGenerics_0.6.0                 BiocInstaller_1.10.3
>
> loaded via a namespace (and not attached):
>  [1] biomaRt_2.16.0   bitops_1.0-5     DBI_0.2-7        gtools_3.0.0
>  [5] RCurl_1.95-4.1   Rsamtools_1.12.3 RSQLite_0.11.4   stats4_3.0.1
>  [9] tools_3.0.1      XML_3.98-1.1     zlibbioc_1.6.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Gmane