Sam McInturf | 24 Dec 13:24 2013
Picon

Re: Problems Counting Reads with summarizeOverlaps

Valerie,
  Sorry for the lame error, was reading your comments about countOverlaps.

I found my problem in that the features data had strand information, while
my sequencing was not stranded, so by setting the strand information to "*"
in the GRanges object and then reduce(split()) I was able to map succesfully

Thanks for the help!

On Saturday, December 21, 2013, Valerie Obenchain wrote:

> On 12/21/13 05:14, Sam McInturf wrote:
>
>> Valerie and Reema,
>> I am using single end reads.
>>
>> I did as Valerie suggested
>>
>>   so <- summarizeOverlaps(tx, reads, mode="Union",type = "any")
>>
>
> This is not the example I used below. You did not include
> 'inter.feature=FALSE'. 'type' is not an argument to summarizeOverlaps(). In
> other words, do not use 'type', use 'inter.feature'. Please see my previous
> explanation and the man page for summarizeOverlaps().
>
>
> Valerie
>
>
>  colSums(assays(so)$counts
>> #185,940
>>
>> which feels silly, allow for double counts and return fewer counts than
>> prior
>>
>> P.S. thank you for your prompt reply, my email "hid" it from me
>> Best
>>
>> Sam
>>
>>
>>
>> On Mon, Dec 9, 2013 at 6:08 PM, Valerie Obenchain <vobencha@...
>> <mailto:vobencha@...>> wrote:
>>
>>     Reema,
>>
>>     Thank you for pointing out the error in the summarizeOverlaps
>>     vignette. That sentence was a leftover from a previous version and
>>     should have been removed. I have updated the vignette in both
>>     release (GenomicRanges 1.14.4) and devel (GenomicAlignments 0.99.8).
>>
>>     When the 'reads' argument is a GAlignmentPairs object the
>>     stand-alone functions of Union(), IntersectionStrict() and
>>     IntersectionNotEmpty() count the data as paired-end. If 'reads' is a
>>     GAlignments object they are counted as single-end.
>>     summarizeOverlaps() reads the data into a GAlignments or
>>     GAlignmentPairs object and calls these functions internally; they
>>     are exactly the same.
>>
>>     Valerie
>>
>>
>>
>>     On 12/09/2013 02:46 AM, Reema Singh wrote:
>>
>>         Hi all,
>>
>>         Indeed summarizeOverlaps() works on paired end data. But different
>>         counting modes [Union, Intersection and IntersectionNotEmpty]
>>         they do
>>         not handle paired end reads. Here the reference
>>         [http://www.bioconductor.org/__packages/release/bioc/__
>> vignettes/GenomicRanges/inst/__doc/summarizeOverlaps.pdf]-
>>         <http://www.bioconductor.org/packages/release/bioc/
>> vignettes/GenomicRanges/inst/doc/summarizeOverlaps.pdf%5D->
>>         Page 2 [under Counting Modes].
>>
>>         Kind Regards
>>
>>
>>
>>         On Sun, Dec 8, 2013 at 6:31 PM, Valerie Obenchain
>>         <vobencha@... <mailto:vobencha@...>
>>         <mailto:vobencha@...
<mailto:vobencha@...>>> wrote:
>>
>>              summarizeOverlaps() does count paired-end reads. Use option
>>              singleEnd=FALSE. Details under 'singleEnd' in the 'Arguments'
>>              section on the man page.
>>
>>              library(GenomicAlignments) ## devel
>>              library(GenomicRanges) ## release
>>              ?summarizeOverlaps
>>
>>              Valerie
>>
>>
>>              On 12/08/13 09:42, Reema Singh wrote:
>>
>>                  Hi Sam,
>>
>>                  It depends on whether you are using single read or
>>         paired end reads
>>                  files. As far as I known summarizeOverlaps function
>>         only working
>>                  in case
>>                  of single read. Anyone please correct me if I am wrong.
>>
>>                  Kind Regards
>>
>>
>>                  On Sun, Dec 8, 2013 at 8:25 PM, Valerie Obenchain
>>                  <vobencha@... <mailto:vobencha@...>
>>         <mailto:vobencha@... <mailto:vobencha@...>>
>>                  <mailto:vobencha@... <mailto:vobencha@...>
>>         <mailto:vobencha@...
<mailto:vobencha@...>>>> wrote:
>>
>>                       Hi Sam,
>>
>>                       Are you getting any error messages related to
>>         seqlevels
>>                  when you
>>                       count? If you have confirmed that annotation
>> seqlevels
>>                  match the bam
>>                       files next look at the maximum possible overlap in
>>         a single
>>                  bam file.
>>
>>                          bf <- singleBamFile
>>                          reads <- readGAlignments(bf)  ## assuming
>>         single-end reads
>>                          so <- summarizeOverlaps(tx, reads, mode=Union,
>>                  inter.feature=FALSE)
>>
>>                       'inter.feature=FALSE' with 'mode=Union' is
>>         countOverlaps with
>>                       'type=ANY'. Reads that hit multiple features will
>>         be double
>>                  counted
>>                       in this case. The idea to to see if there are any
>>         common
>>                  regions at all.
>>
>>                       If there are still no counts, you could look more
>>         closely at a
>>                       smaller chromosome region in 'reads' where you
>>         would expect
>>                  counts.
>>                       This visual inspection might give you some insight
>>         as to
>>                  why there
>>                       is no overlap.
>>
>>
>>                       Valerie
>>
>>
>>                       On 12/05/13 18:51, Sam McInturf wrote:
>>
>>                           Hello Bioconductors,
>>                                I am working on an Arabidopsis RNA seq
>>         set, and after
>>                           executing my count
>>                           reads script, I get a count table with no
>>         reads counted (ie
>>                           sum(colSums(mat)) = 0).
>>
>>                           I mapped my reads using Tophat2 (without
>>         supplying a
>>                  gtf/gff
>>                           file) and used
>>                           samtools to sort and index my
>>         accepted_hits.bam files.
>>                    I loaded
>>                           these bam
>>                           files into IGV (integrated Genome Browser) and
>> the
>>                  reads appear
>>                           to have
>>                           been mapped (additionally the
>>         align_summary.txt says I
>>                  have good
>>                           mapping).
>>                              Script and session info below.
>>
>>                           Essentially I use Rsamtools to load in my
>>         bamfiles, and
>>                  then
>>                           makeTranscriptsFromGFF to make a txdb object,
>>                  transcriptsBy to get
>>                           transcripts, and then summarizeOverlaps to
>>         count the
>>                  reads.  My
>>                           gff file is
>>                           TAIR10.  I am relatively sure that the genome
>>         I mapped
>>                  to was
>>                           the TAIR10
>>                           release, but I am not 100% on that.
>>
>>                           I have also used biomaRt to do this (commented
>>         out),
>>                  but I had
>>                           the same
>>                           results.
>>
>>                           I am currently upgrading to R 3.0.2 and
>>         re-installing
>>                           bioconductor (
>>  "/data/smcintur/Annotation/______TAIR10_GFF3_genes.gff";
>>                           library(ShortRead)
>>                           library(GenomicFeatures)
>>                           library(Rsamtools)
>>                           #library(biomaRt)
>>
>>
>>         setwd("/data/smcintur/RNASeq/______NMseq/newTophat/BamFiles/")
>>
>>
>>                           bf <- list.files(pattern = ".bam$")
>>                           bi <- list.files(pattern = "bam.bai$")
>>                           bfl <- BamFileList(bf, bi)
>>
>>                           #mart <-
>>         makeTranscriptDbFromBiomart(______biomart =
>>
>>
>>                           "Public_TAIRV10", dataset=
>>                           "tairv10")
>>                           tx <- transcriptsBy(mart)
>>                           gff <- makeTranscriptDbFromGFF(______gffdir,
>>         format = "gff")
>>
>>
>>                           gff
>>
>>                           cds <- cdsBy(gff)
>>                           tx <- transcriptsBy(gff)
>>
>>                           olapTx <- summarizeOverlaps(tx, bfl)
>>                           olapCds <- summarizeOverlaps(cds, bfl)
>>
>>                           txMat <- assays(olapTx)$counts
>>                           cdsMat <- assays(olapCds)$counts
>>
>>                           write.table(txMat, file = "countMatTxGff.txt",
>>         sep = "\t")
>>                           write.table(cdsMat, file =
>>         "countMatCdsGff.txt", sep =
>>                  "\t")
>>
>>                           sessionInfo
>>                           R version 3.0.0 (2013-04-03)
>>                           Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>                           locale:
>>                              [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>                              [3] LC_TIME=en_US.UTF-8
>>           LC_COLLATE=en_US.UTF-8
>>                              [5] LC_MONETARY=en_US.UTF-8
>>           LC_MESSAGES=en_US.UTF-8
>>                              [7] LC_PAPER=C                 LC_NAME=C
>>                              [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>                           [11] LC_MEASUREMENT=en_US.UTF-8
>>         LC_IDENTIFICATION=C
>>
>>                           attached base packages:
>>                           [1] stats     graphics  grDevices utils
>>         datasets
>>                    methods   base
>>
>>                           other attached packages:
>>                           [1] BiocInstaller_1.12.0
>>
>>                           loaded via a namespace (and not attached):
>>                              [1] AnnotationDbi_1.24.0   Biobase_2.22.0
>>                           BiocGenerics_0.8.0
>>                              [4] biomaRt_2.18.0         Biostrings_2.30.1
>>                    bitops_1.0-6
>>                              [7] BSgenome_1.30.0        DBI_0.2-7
>>                             GenomicFeatures_1.14.2
>>                           [10] GenomicRanges_1.14.3   IRanges_1.20.6
>>                  parallel_3.0.0
>>                           [13] RCurl_1.95-4.1         Rsamtools_1.14.2
>>                  RSQLite_0.11.4
>>                           [16] rtracklayer_1.22.0     stats4_3.0.0
>>                  tools_3.0.0
>>                           [19] XML_3.98-1.1           XVector_0.2.0
>>                    zlibbioc_1.8.0
>>
>>
>>                       ______________________________
>> _______________________
>>
>>                       Bioconductor mailing list
>>         Bioconductor@... <mailto:Bioconductor@...>
>>         <mailto:Bioconductor <at> r-__project.org
>>         <mailto:Bioconductor@...>>
>>                  <mailto:Bioconductor <at> r-____project.org
>>         <mailto:Bioconductor <at> r-__project.org>
>>                  <mailto:Bioconductor <at> r-__project.org
>>         <mailto:Bioconductor@...>>>
>>         https://stat.ethz.ch/mailman/______listinfo/bioconductor
>>         <https://stat.ethz.ch/mailman/____listinfo/bioconductor>
>>                  <https://stat.ethz.ch/mailman/____listinfo/bioconductor
>>         <https://stat.ethz.ch/mailman/__listinfo/bioconductor>>
>>
>>
>>
>>           <https://stat.ethz.ch/mailman/____listinfo/bioconductor
>>         <https://stat.ethz.ch/mailman/__listinfo/bioconductor>
>>                  <https://stat.ethz.ch/mailman/__listinfo/bioconductor
>>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>>>
>>                       Search the archives:
>>         http://news.gmane.org/gmane.______science.biology.
>> informatics.______conductor
>>         <http://news.gmane.org/gmane.____science.biology.
>> informatics.____conductor>
>>
>>         <http://news.gmane.org/gmane.____science.biology.
>> informatics.____conductor
>>         <http://news.gmane.org/gmane.__science.biology.informatics._
>> _conductor>>
>>
>>
>>         <http://news.gmane.org/gmane.____science.biology.
>> informatics.____conductor
>>         <http://news.gmane.org/gmane.__science.biology.informatics._
>> _conductor>
>>
>>         <http://news.gmane.org/gmane.__science.biology.informatics._
>> _conductor
>>         <http://news.gmane.org/gmane.science.biology.informatics.
>> conductor>>>
>>
>>
>>
>>
>>
>>                  --
>>                  Reema Singh
>>                  PhD Scholar
>>                  Computational Biology and Bioinformatics
>>                  School of Computational and Integrative Sciences
>>                  Jawaharlal Nehru University
>>                  New Delhi-110067
>>                  INDIA
>>
>>
>>
>>
>>
>>         --
>>         Reema Singh
>>         Postdoctoral Research Assistant
>>         College of Life Sciences
>>         University of Dundee,
>>         Dundee DD1 4HN, Scotland
>>         United Kingdom
>>
>>
>>
>>     --
>>     Valerie Obenchain
>>
>>     Program in Computational Biology
>>     Division of Public Health Sciences
>>     Fred Hutchinson Cancer Research Center
>>     1100 Fairview Ave. N, M1-B155
>>     P.O. Box 19024
>>     Seattle, WA 98109-1024
>>
>>     E-mail: vobencha@... <mailto:vobencha@...>
>>     Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
>>     Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>>
>>
>>
>>
>> --
>> Sam McInturf
>>
>
>

--

-- 
Sam McInturf

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


Gmane