Andrew Yee | 1 Sep 2010 16:38

Re: revisiting genomic coordinates to gene

Thank you very much for your suggestions!

Thanks,
Andrew

On Wed, Sep 1, 2010 at 3:53 AM, Vincent Carey <stvjc@...> wrote:

> There are many possible approaches and possible pitfalls.  Surely the
> following is relevant:
>
> > get("CTNNB1", revmap(org.Hs.egSYMBOL))
> [1] "1499"
>
> > get("1499", org.Hs.egCHRLOC)
>       3
> 41240941
> > get("1499", org.Hs.egCHRLOCEND)
>       3
> 41281939
>
> Your location lies within these limits.  You could do this more
> systematically by defining a collection of Entrez Gene IDs and building
> an IRanges or GRanges instance that stores all the "gene boundary"
> information for these IDs.  You will have to attend to signs (negative
> values denote the minus strand), to multiplicities (an ID can map to
> more than one location), and to genome build versions.
>
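The "more systematic" approach described above might look roughly like the following base-R sketch. It assumes the start and end maps have already been pulled into named lists (for example with mget() on org.Hs.egCHRLOC and org.Hs.egCHRLOCEND). The values for Entrez ID 1499 are copied from the output above; the "demo" entry, its coordinates, and the helper name gene_bounds are invented here purely to illustrate sign and multiplicity handling.

```r
# Start/end maps shaped like the output of mget() on the CHRLOC maps.
# "1499" is taken from the thread; "demo" is a made-up entry with two
# minus-strand locations (negative values), for illustration only.
chrloc    <- list("1499" = c("3" = 41240941),
                  "demo" = c("7" = -2000, "7" = -5000))
chrlocend <- list("1499" = c("3" = 41281939),
                  "demo" = c("7" = -3000, "7" = -6500))

# Hypothetical helper: one row per (ID, location), strand taken from the sign.
gene_bounds <- function(starts, ends) {
  rows <- lapply(names(starts), function(id) {
    chr <- names(starts[[id]])
    s   <- unname(starts[[id]])
    e   <- unname(ends[[id]])
    data.frame(
      entrez = id,
      chr    = paste0("chr", chr),
      start  = pmin(abs(s), abs(e)),
      end    = pmax(abs(s), abs(e)),
      strand = ifelse(s < 0, "-", "+"),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

bounds <- gene_bounds(chrloc, chrlocend)

# Which IDs contain chr3:41266083?
pos <- 41266083
hit <- bounds$chr == "chr3" & bounds$start <= pos & bounds$end >= pos
bounds$entrez[hit]
```

The resulting data frame has the columns one would need to construct a GRanges object; whether the coordinates match the query's genome build still has to be checked separately.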
> The GenomicFeatures makeTranscriptDb* facilities are potentially useful
> when one is interested in transcribed or exonic regions specifically.
> In the following, tx.3 is an extract from the result of
> makeTranscriptDbFromUCSC("hg18"):
>
> > get("1499", org.Hs.egUCSCKG)
> [1] "uc003ckp.2" "uc003ckq.2" "uc003ckr.2" "uc003cks.2" "uc003ckt.1"
> [6] "uc010hia.1" "uc011azf.1" "uc011azg.1"
> > tx.3[ elementMetadata(tx.3)$tx_name %in% .Last.value, ]
> GRanges with 6 ranges and 2 elementMetadata values
>    seqnames               ranges strand |     tx_id     tx_name
>       <Rle>            <IRanges>  <Rle> | <integer> <character>
> [1]     chr3 [41211405, 41255849]      + |     11545  uc010hia.1
> [2]     chr3 [41215946, 41256943]      + |     11546  uc003ckp.2
> [3]     chr3 [41215946, 41256943]      + |     11547  uc003ckq.2
> [4]     chr3 [41215946, 41256943]      + |     11548  uc003ckr.2
> [5]     chr3 [41249904, 41253941]      + |     11550  uc003cks.2
> [6]     chr3 [41252167, 41253962]      + |     11551  uc003ckt.1
>
> seqlengths
>          chr1   chr1_random         chr10 ...   chrX_random          chrY
>     247249719       1663265     135374737 ...       1719168      57772954
>
> and there are undoubtedly ways to use biomaRt to address your concern.
>
> Perhaps the following is also of interest:
>
> > findOverlaps(IRanges(start=41266083,width=1), ranges(tx.3))
> An object of class "RangesMatching"
> Slot "matchMatrix":
>     query subject
> [1,]     1    2080
> [2,]     1    2081
>
> Slot "DIM":
> [1]    1 3528
>
> > tx.3[2080:2081,]
> GRanges with 2 ranges and 2 elementMetadata values
>    seqnames               ranges strand |     tx_id     tx_name
>       <Rle>            <IRanges>  <Rle> | <integer> <character>
> [1]     chr3 [41263094, 41294629]      - |     11552  uc003cku.2
> [2]     chr3 [41263094, 41978664]      - |     11553  uc003ckv.2
>
> seqlengths
>          chr1   chr1_random         chr10 ...   chrX_random          chrY
>     247249719       1663265     135374737 ...       1719168      57772954
>
> So it seems your location is in a region that is said to be transcribed.
> I could not find an Entrez Gene ID associated with the "known gene"
> tx_name values just above.
>
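For readers repeating the overlap query with a current GenomicRanges, here is a small sketch under stated assumptions: the three ranges are copied from the hg18 output above rather than loaded from a transcript database, and the query is a GRanges so that the chromosome is matched as well as the position. Newer releases return a Hits object rather than the "RangesMatching" class shown above, so the printed result will look different.

```r
library(GenomicRanges)

# Three of the hg18 transcripts printed above, rebuilt by hand.
tx <- GRanges(
  seqnames = "chr3",
  ranges   = IRanges(start = c(41215946, 41263094, 41263094),
                     end   = c(41256943, 41294629, 41978664)),
  strand   = c("+", "-", "-"),
  tx_name  = c("uc003ckp.2", "uc003cku.2", "uc003ckv.2")
)

# Single-base query; the strand is left unspecified, so either strand can match.
query <- GRanges("chr3", IRanges(start = 41266083, width = 1))

hits <- findOverlaps(query, tx, ignore.strand = TRUE)
tx$tx_name[sort(subjectHits(hits))]
```

Note that the CHRLOC limits for 1499 (41240941 to 41281939) and the hg18 range of uc003ckp.2 (41215946 to 41256943) are nearly the same length but offset from each other, which suggests the two sources refer to different genome builds; that may be why the position falls inside CTNNB1 in one lookup and inside other transcripts in the other.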
> > sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-06-30 r52417)
> Platform: x86_64-apple-darwin10.3.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  tools     utils     methods
> [8] base
>
> other attached packages:
>  [1] org.Hs.eg.db_2.4.1     RSQLite_0.9-1          DBI_0.2-5
>  [4] AnnotationDbi_1.11.1   Biobase_2.9.0          GenomicFeatures_1.1.11
>  [7] GenomicRanges_1.1.15   IRanges_1.7.32         weaver_1.15.0
> [10] codetools_0.2-2        digest_0.4.2
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.17.5    Biostrings_2.17.26 RCurl_1.4-2        XML_3.1-0
> [5] biomaRt_2.5.1      rtracklayer_1.9.3
>
>
> On Tue, Aug 31, 2010 at 11:43 PM, Andrew Yee <yee@...> wrote:
> > I'm interested in converting genomic coordinates to gene names, with
> > potential use of the org.Hs.eg.db library, e.g. converting
> > chr3:41,266,083 to CTNNB1.
> >
> > I know that this topic has been addressed before, see e.g.:
> >
> > https://stat.ethz.ch/pipermail/bioconductor/2009-January/025906.html
> > (discusses use of overlap in IRanges)
> > https://stat.ethz.ch/pipermail/bioconductor/2009-October/030140.html
> >
> > I was wondering if there have been any new solutions or new packages that
> > address this problem since these threads.
> >
> > Thanks,
> > Andrew
> >
> >        [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor@...
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
>
