Valerie Obenchain | 6 Oct 00:42 2012

Re: predictCoding() return empty Granges

Hello,

This was a bug. Thanks for reporting it. It is fixed in the release 
version 1.4.1 and devel 1.5.2.  Tomorrow's builds have already started 
so these versions will available through biocLite() Sunday after 9am PST 
(or through svn immediately).

Thanks,
Valerie

On 10/05/2012 01:38 PM, sun wrote:
> Hi Valerie,
>
> After I loaded correct BSgenome:  
> library(BSgenome.Athaliana.TAIR.TAIR9), the predictCoding() returned 
> the coding SNPs results to me.
>
> However, the locateVariants() still had error message.
>
> rd <- rowData(vcf)
>  intersect(seqlevels(vcf), seqlevels(txdb))
> [1] "Chr1"
>
> > allvar <- locateVariants(rd, txdb, AllVariants())
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "locateVariants", 
> for signature "GRanges", "GRanges", "PromoterVariants"
> > allvar <- locateVariants(vcf, txdb, AllVariants())
> Error in function (classes, fdef, mtable)  :
>   unable to find an inherited method for function "locateVariants", 
> for signature "GRanges", "GRanges", "PromoterVariants"
>
> > sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] BSgenome.Athaliana.TAIR.TAIR9_1.3.18 
> BSgenome_1.26.0                      GenomicFeatures_1.10.0
>  [4] AnnotationDbi_1.20.0                 
> Biobase_2.18.0                       VariantAnnotation_1.4.0
>  [7] Rsamtools_1.10.1                     
> Biostrings_2.26.0                    GenomicRanges_1.10.0
> [10] IRanges_1.16.2                       
> BiocGenerics_0.4.0                   vimcom_0.9-2
> [13] setwidth_1.0-0                       colorout_0.9-9
>
> loaded via a namespace (and not attached):
>  [1] DBI_0.2-5          RCurl_1.95-0.1.2   RSQLite_0.11.2     
> XML_3.95-0.1       biomaRt_2.14.0     bitops_1.0-4.1     parallel_2.15.1
>  [8] rtracklayer_1.18.0 stats4_2.15.1      tools_2.15.1       
> zlibbioc_1.4.0
>
> I actually can run my script (predictCoding() and locateVariants() )in 
> Bioconductor 2.10 now, since in my VCF, TranscriptDB, TAIR9 BSgenome, 
> the chr names are all "Chr".
>
> Thanks,
>
> Rebecca
>
> On Fri, Oct 5, 2012 at 1:11 PM, sun <fireflysrb@... 
> <mailto:fireflysrb@...>> wrote:
>
>     Hi Herve,
>
>     Thanks for reminding me. I used
>     library(BSgenome.Athaliana.TAIR.TAIR9) to instead.
>
>     Rebecca
>
>
>     On Fri, Oct 5, 2012 at 12:56 PM, Hervé Pagès <hpages@...
>     <mailto:hpages@...>> wrote:
>
>         Hi Rebecca,
>
>
>         On 10/05/2012 11:31 AM, sun wrote:
>
>             Hi Valerie,
>
>             I did make VCF object and TranscriptDb with same
>             chromosome name in my last
>             test. I am still in trouble when I use locateVariants()
>             and predictCoding()
>             in newest released Bioconductor 2.11. I can use
>             locateVariants()
>             successfully in Bioconductor 2.10. Due to I can not adjust
>             chr name of txdb
>             to match the chr name of BSgenome, so I can not run
>              predictCoding() in
>             Bioconductor 2.10 for my Arabidopsis TAIR10 dataset.
>
>             I attached an small vcf file "var.subset.vcf"  and the
>             Arabi.sqlite to you,
>             which I used in my test. The following is the script I
>             used in Bioconductor
>             2.11.
>
>             library(VariantAnnotation)
>             library(GenomicFeatures)
>
>             vcf <- readVcf("var.subset.vcf","TAIR10")e
>
>             ######################################################
>             #### Arabi.sqlite was generated by makeTranscriptDbFromGFF().
>             #### library(GenomicFeatures)
>             ####  ArabiTranDB <-
>             makeTranscriptDbFromGFF(file="TAIR10.gff3",
>             ####                format="gff3",
>             ####                dataSource="TAIR 10",
>             ####                species="Arabodpsis")
>
>             ####  saveDb(ArabiTranDB,file="Arabi.sqlite")
>             ######################################################
>
>             txdb <-loadDb("Arabi.sqlite")
>             intersect(seqlevels(vcf), seqlevels(txdb))
>             [1] "Chr1"
>
>             allvar <- locateVariants(vcf, txdb, AllVariants())
>             *Error in function (classes, fdef, mtable)  :
>
>                unable to find an inherited method for function
>             "locateVariants", for
>             signature "GRanges", "GRanges", "PromoterVariants"
>
>             *rd <- rowData(vcf)*
>             *allvar <- locateVariants(rd, txdb, AllVariants())*
>
>             Error in function (classes, fdef, mtable)  :
>                unable to find an inherited method for function
>             "locateVariants", for
>             signature "GRanges", "GRanges", "PromoterVariants"
>
>             *### try to find coding SNPs*
>             *## Get TAIR BSgenome. This BSgenome is for TAIR9, but
>             TAIR10 is the same
>
>             assembly as TAIR9, see Hervé's reply.
>             library(BSgenome.Athaliana.TAIR.04232008)
>
>
>         FWIW, you're still using the wrong BSgenome package. Please
>         re-read
>         carefully my 2 previous emails on this.
>
>         Thanks,
>         H.
>
>
>             ## adjust chr name of txdb to match chr name of BSgenome
>             seqlevels(txdb) <- sub("^C", "c", seqlevels(txdb))
>             ## adjust chr name of vcf to match chr name of BSgenome
>             vcf <- renameSeqlevels(vcf, c("Chr1"="chr1"))
>             ## confirm if the chr name mached
>             intersect(seqlevels(vcf), seqlevels(txdb))
>             [1] "chr1"
>             coding1 <- predictCoding(vcf, txdb, Athaliana)
>
>                 coding1
>
>
>             GRanges with 0 ranges and 1 metadata column:
>                 seqnames    ranges strand |      GENEID
>             <Rle> <IRanges> <Rle> | <character>
>                ---
>                seqlengths:
>
>             *In this test, I removed variants in ChrM to get a subset
>             of vcf, that is
>
>             why I didn't get the  *Warning message: In
>             .setActiveSubjectSeq(query,
>             subject) : circular sequence(s) in query 'chrM' ignored*
>             that I got in my
>             last test. However, predictCoding() still returned an
>             empty GRanges object
>             to me.
>
>             *
>
>                 sessionInfo()
>
>             R version 2.15.1 (2012-06-22)
>             Platform: x86_64-unknown-linux-gnu (64-bit)
>
>             locale:
>             [1] C
>
>             attached base packages:
>             [1] stats     graphics  grDevices utils     datasets
>              methods   base
>
>             other attached packages:
>               [1] BSgenome.Athaliana.TAIR.04232008_1.3.18
>             BSgenome_1.26.0
>             GenomicFeatures_1.10.0
>             AnnotationDbi_1.20.0
>               [5] Biobase_2.18.0
>             VariantAnnotation_1.4.0
>             Rsamtools_1.10.1
>             Biostrings_2.26.0
>               [9] GenomicRanges_1.10.0
>             IRanges_1.16.2
>             BiocGenerics_0.4.0
>             vimcom_0.9-2
>             [13] setwidth_1.0-0
>             colorout_0.9-9
>
>             loaded via a namespace (and not attached):
>               [1] DBI_0.2-5          RCurl_1.95-0.1.2   RSQLite_0.11.2
>             XML_3.95-0.1       biomaRt_2.14.0     bitops_1.0-4.1    
>             parallel_2.15.1
>             rtracklayer_1.18.0
>               [9] stats4_2.15.1      tools_2.15.1       zlibbioc_1.4.0
>                *
>
>
>             *Thanks for the suggestion.
>
>
>             Rebecca
>
>
>             On Thu, Oct 4, 2012 at 5:01 PM, Valerie Obenchain
>             <vobencha@... <mailto:vobencha@...>>wrote:
>
>                 **
>
>                 Hi Sun,
>
>
>
>                 On 10/04/2012 02:44 PM, sun wrote:
>
>                 After I successfully adjusted the chromosomes of VCF
>                 and TranscriptDb
>                 objects to match the names of the BSgenome object, I
>                 ran predictCoding(),
>                 it returned nothing except a warning message, see below,
>
>
>                   coding <- predictCoding(vcf, txdb, seqSource=Athaliana)
>
>                   *Warning message:
>                 In .setActiveSubjectSeq(query, subject) :
>                    circular sequence(s) in query 'chrM' ignored*
>
>                   coding
>
>                   GRanges with 0 ranges and 1 metadata column:
>                     seqnames    ranges strand |      GENEID
>                 <Rle> <IRanges> <Rle> | <character>
>                    ---
>                    seqlengths:
>
>                 Here are the values of  VCF, TranscriptDb and BSgenome
>                 Object,
>
>
>                   vcf
>
>                   class: VCF
>                 dim: 551201 2
>                 genome: TAIR10
>                 exptData(1): header
>                 fixed(4): REF ALT QUAL FILTER
>                 info(18): DP DP4 ... PR VDB
>                 geno(6): GT GQ ... SP PL
>                 rownames(551201): Chr1:711 Chr1:956 ... ChrM:366919
>                 ChrM:366920  *Qustion:
>                 After used renameSeqlevels() to change chr name of vcf
>                 (Chr to chr), the
>                 rownames here are still old chr name. not sure if this
>                 will be the problem
>                 for predictCoding(). *
>
>
>                 Yes, having different chromosome names in the VCF file
>                 and the txdb will
>                 be a problem. Any time we want to match / find
>                 overlaps in ranges it is
>                 important that the seqnames (i.e., chromosomes) match.
>                 There is an example
>                 of how to use renameSeqlevels() on a VCF file on the
>                 predictCoding() man
>                 page.
>
>                 library(VariantAnnotation)
>                 ?predictCoding
>
>                 ## Read variants from a VCF file
>                 fl <- system.file("extdata", "chr22.vcf.gz",
>                 package="VariantAnnotation")
>                 vcf <- readVcf(fl, "hg19")
>
>                 ## Rename seqlevels in the VCF object to match those
>                 in the TxDb
>                 vcf<- renameSeqlevels(vcf, c("22"="chr22"))
>                 ## Confirm common seqlevels
>                 intersect(seqlevels(vcf), seqlevels(txdb))
>
>                 Be sure to confirm that you have common seqlevels
>                 between your VCF and
>                 annotation. If you are having trouble please send a
>                 small reproducible
>                 example.
>
>                 Valerie
>
>
>                   rowData values names(1): paramRangeID
>                 colnames(2): sample1_aln.sorted.bam sample2_aln.sorted.bam
>                 colData names(1): Samples
>
>
>                   txdb
>
>                   TranscriptDb object:
>                 | Db type: TranscriptDb
>                 | Supporting package: GenomicFeatures
>                 | Data source: TAIR 10
>                 | Genus and Species: Arabodpsis
>                 | miRBase build ID: NA
>                 | transcript_nrow: 35386
>                 | exon_nrow: 207465
>                 | cds_nrow: 197160
>                 | Db created by: GenomicFeatures package from Bioconductor
>                 | Creation time: 2012-09-28 16:43:28 -0700 (Fri, 28
>                 Sep 2012)
>                 | GenomicFeatures version at creation time: 1.9.37
>                 | RSQLite version at creation time: 0.11.1
>                 | DBSCHEMAVERSION: 1.0
>
>
>                   Athaliana
>
>                   Arabidopsis genome
>                 |
>                 | organism: Arabidopsis thaliana (Arabidopsis)
>                 | provider: TAIR
>                 | provider version: 04232008
>                 | release date: NA
>                 | release name: dumped from ADB: Mar/14/08
>                 |
>                 | sequences (see '?seqnames'):
>                 |   chr1  chr2  chr3  chr4  chr5  chrC  chrM
>                 |
>                 | (use the '$' or '[[' operator to access a given
>                 sequence)
>
>
>                   sessionInfo()
>
>                   R version 2.15.1 (2012-06-22)
>                 Platform: x86_64-unknown-linux-gnu (64-bit)
>
>                 locale:
>                 [1] C
>
>                 attached base packages:
>                 [1] stats     graphics  grDevices utils     datasets
>                  methods   base
>
>                 other attached packages:
>                   [1] BSgenome.Athaliana.TAIR.04232008_1.3.18
>                 BSgenome_1.26.0
>                 VariantAnnotation_1.4.0
>                   [4] Rsamtools_1.10.1
>                 Biostrings_2.26.0
>                 GenomicFeatures_1.10.0
>                   [7] AnnotationDbi_1.20.0
>                 Biobase_2.18.0
>                 GenomicRanges_1.10.0
>                 [10] IRanges_1.16.2
>                 BiocGenerics_0.4.0
>                 vimcom_0.9-2
>                 [13] setwidth_1.0-0
>                 colorout_0.9-9
>
>                 loaded via a namespace (and not attached):
>                   [1] DBI_0.2-5          RCurl_1.95-0.1.2   RSQLite_0.11.2
>                 XML_3.95-0.1       biomaRt_2.14.0     bitops_1.0-4.1
>                   [7] parallel_2.15.1    rtracklayer_1.18.0 stats4_2.15.1
>                 tools_2.15.1       zlibbioc_1.4.0
>
>                 any ideas? thank you.
>
>                 Rebecca
>
>
>                 On Thu, Oct 4, 2012 at 1:18 PM, Hervé Pagès
>                 <hpages@... <mailto:hpages@...>>
>                 <hpages@...
<mailto:hpages@...>> wrote:
>
>
>                   Hi Rebecca,
>
>
>                 On 10/04/2012 12:10 PM, sun wrote:
>
>
>                   Hi All,
>
>                 I am going to use "coding <- predictCoding(vcf, txdb,
>                 seqSource=Athaliana)"
>                 to detect coding SNPs. The problem is that the
>                 chromosome names are not
>                 consistent among VCF, txdb and BSgenome. In vcf, the
>                 chromosome name is
>                 "Chr*", in txdb, the chr name is "Chr", but in
>                 BSgenome, the chr name is
>                 "chr*" .
>
>                 I know I can use renameSeqlevels() to adjust the
>                 seqlevels (chromosome
>                 names) of the VCF object to match that of the txdb
>                 annotation. But how can
>                 I adjust the chr name of BSgenome or TranscriptDB?
>
>
>                   In BioC 2.11 (released yesterday), you can rename
>                 the chromosomes of a
>                 TranscriptDb object, so you could rename the
>                 chromosomes of your
>                 VCF and TranscriptDb objects to match the names of the
>                 BSgenome object.
>
>                 E.g. for the TranscriptDb object:
>
>                    seqlevels(txdb) <- sub("^c", "C", seqlevels(txdb))
>
>                 Note that renaming the chromosomes of a TranscriptDb
>                 object is a new
>                 feature and is not fully implemented yet. For example,
>                 if you use
>                 select() on the object you'll still get the original
>                 names (those
>                 stored in the db), and if you try to specify a
>                 chromosome name thru
>                 the 'vals' arg of the transcripts(), exons() and cds()
>                 extractors,
>                 you still need to use the original names. This will be
>                 addressed soon.
>
>                 Our plan is to also support renaming of the
>                 chromosomes of BSgenome
>                 and SNPlocs objects very soon.
>
>                 Also, an additional level of convenience will be
>                 provided via the
>                 seqnameStyle() getter and setter, so you'll be able to
>                 quickly rename
>                 with something like:
>
>                    seqnameStyle(x) <- "UCSC"
>
>                 or
>
>                    seqnameStyle(vcf) <- seqnameStyle(txdb) <-
>                 seqnameStyle(genome)
>
>                 This will work on almost any 'x' object that contains
>                 chromosome
>                 names (GRanges, GRangesList, GappedAlignments,
>                 TranscriptDb, VCF,
>                 BSgenome, SNPlocs, etc...)
>
>                 Cheers,
>                 H.
>
>
>
>
>
>                   Thanks,
>
>                 Rebecca
>
>                          [[alternative HTML version deleted]]
>
>                 ______________________________**_________________
>                 Bioconductor mailing
>                 listBioconductor@...://stat.ethz.ch/mailman/**listinfo/bioconductor
>                 <http://stat.ethz.ch/mailman/**listinfo/bioconductor><https://stat.ethz.ch/mailman/listinfo/bioconductor>
>                 <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>
>                 Search the archives: http://news.gmane.org/gmane.**
>                 science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>                 <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>
>                   --
>                 Hervé Pagès
>
>                 Program in Computational Biology
>                 Division of Public Health Sciences
>                 Fred Hutchinson Cancer Research Center
>                 1100 Fairview Ave. N, M1-B514
>                 P.O. Box 19024
>                 Seattle, WA 98109-1024
>
>                 E-mail: hpages@... <mailto:hpages@...>
>                 Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>                 Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>                         [[alternative HTML version deleted]]
>
>
>
>                 _______________________________________________
>                 Bioconductor mailing
>                 listBioconductor@...://stat.ethz.ch/mailman/listinfo/bioconductor
>                 <http://stat.ethz.ch/mailman/listinfo/bioconductor>
>
>                 Search the archives:
>                 http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
>                 _______________________________________________
>                 Bioconductor mailing list
>                 Bioconductor@...
>                 <mailto:Bioconductor@...>
>                 https://stat.ethz.ch/mailman/listinfo/bioconductor
>                 Search the archives:
>                 http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>         -- 
>         Hervé Pagès
>
>         Program in Computational Biology
>         Division of Public Health Sciences
>         Fred Hutchinson Cancer Research Center
>         1100 Fairview Ave. N, M1-B514
>         P.O. Box 19024
>         Seattle, WA 98109-1024
>
>         E-mail: hpages@... <mailto:hpages@...>
>         Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>         Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>
>
>

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Gmane