Fatemehsadat Seyednasrollah | 13 Oct 09:25 2012

Re: BSgenome or org.Hs.eg.db to find gene length

Hi,

First, sorry that I did not mention my ultimate intent. I am researching the effect of data
filtering on the number of differentially expressed genes. For this purpose I apply different filtering
methods using R packages. I now want to create an annotation file that keeps some features of the genes in
my RNA-seq dataset, to use whenever gene annotation is needed to find differentially expressed genes. For
example, I saw that to use NOISeq to find the DE genes I also need the gene annotation.

With many thanks and best regards,
Fatemeh
________________________________
From: Tim Triche, Jr. [tim.triche@...]
Sent: Friday, October 12, 2012 8:46 PM
To: Marc Carlson
Cc: Fatemehsadat Seyednasrollah; bioconductor@...
Subject: Re: [BioC] BSgenome or org.Hs.eg.db to find gene length

once upon a time there was a SpliceGraph package that aimed to resolve some of these questions

anyone know if Mr. Bindreither is doing OK?  As of this AM it could not be built.  But it might resolve deeper
questions like Fatemehsadat's

my $0.02 (adjusted for rampant inflation)

--t

On Fri, Oct 12, 2012 at 10:12 AM, Marc Carlson
<mcarlson@...> wrote:
Hi Fatemehsadat,

Let's keep this on the list.  We almost always want to keep the thread public so that others can benefit from our
conversations.  And also, I am not really sure how to answer your question (it's not a simple question), and
others may have suggestions.  You can't get their input if you only speak with me.

Really though, your question about how to choose really depends on context that you have not provided us
here.  What is it that you want to know?  I mentioned some strategies in my earlier post.  For some cases the
longest transcript may be what you want, for others you may want the maximum range that a transcript can
cover, for other cases, you may want to "buffer" that region by adding to it.  For yet other cases you may not
care about the range at all and may only want to call unique on the result.  But I can't give even an opinion
without knowing more about what you are trying to do.

  Marc
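
A minimal base-R sketch of the four strategies listed above, for readers who want to try them. The data frame `tx` reuses the two A1BG rows quoted further down this thread; the object names and the 1000 bp buffer are illustrative assumptions, not something prescribed in the thread.

```r
# Toy table shaped like the select() result; the coordinates are the two
# A1BG rows quoted in this thread.
tx <- data.frame(
  SYMBOL  = c("A1BG", "A1BG"),
  TXSTART = c(58858172, 58859832),
  TXEND   = c(58864865, 58874214),
  stringsAsFactors = FALSE
)
tx$length <- tx$TXEND - tx$TXSTART

# Strategy 1: keep the longest transcript per symbol.
ord     <- tx[order(tx$SYMBOL, -tx$length), ]
longest <- ord[!duplicated(ord$SYMBOL), ]

# Strategy 2: maximum range covered by any transcript of the symbol.
span <- merge(
  aggregate(TXSTART ~ SYMBOL, data = tx, FUN = min),
  aggregate(TXEND ~ SYMBOL, data = tx, FUN = max)
)
span$length <- span$TXEND - span$TXSTART

# Strategy 3: "buffer" that range (1000 bp is an arbitrary, assumed value).
buffer   <- 1000
buffered <- span
buffered$TXSTART <- span$TXSTART - buffer
buffered$TXEND   <- span$TXEND + buffer
buffered$length  <- buffered$TXEND - buffered$TXSTART

# Strategy 4: ignore the ranges and keep only the unique symbols.
symbols <- unique(tx$SYMBOL)
```

Each result has one row (or one element) per gene symbol, so any of them could serve as a unique annotation table; which one is appropriate depends on the downstream analysis.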

On 10/12/2012 06:15 AM, Fatemehsadat Seyednasrollah wrote:
Hi,
Thank you so much. The package was great to use, given the diversity of available features. Now I was
wondering whether I can use the result of my query as an annotation file for other R packages as well.
I just wanted to know your opinion on how to decide which isoform I should choose for my
annotation file.
Imagine I need an annotation file with gene symbols as row names; for example, for the first symbol I have:

   SYMBOL  TXSTART    TXEND length
1   A1BG 58858172 58864865   6693
2   A1BG 58859832 58874214  14382

and so many other duplicated gene symbols. How do you decide which isoform to choose in order to have a
unique annotation file of gene symbols?

Thank you again.
________________________________________
From: bioconductor-bounces@... on behalf of Marc Carlson [mcarlson@...]
Sent: Friday, October 12, 2012 1:18 AM
To: Michael Lawrence
Cc: bioconductor@...
Subject: Re: [BioC] BSgenome or org.Hs.eg.db to find gene length

Oh sorry I missed that little detail about using gene symbols.

Here is how you would do it when you need to query by gene symbol:

library(Homo.sapiens)
cols(Homo.sapiens) ## shows cols you could use
keytypes(Homo.sapiens) ## shows keytypes
## discovers all available keys of this kind
k <- keys(Homo.sapiens, keytype="SYMBOL")
result <- select(Homo.sapiens, k,
                 cols=c("TXNAME","TXSTART","TXEND","TXSTRAND"),
                 keytype="SYMBOL")

The plan to support transcriptsBy etc for OrganismDbi is still just a
plan.  But we don't intend for it to remain a "plan" forever.

    Marc
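
A small sketch of what can be done with the data frame that select() returns, assuming it has one row per transcript with the columns requested above. The object below is a hand-built stand-in so the snippet runs without the annotation packages: the coordinates are the A1BG rows quoted elsewhere in this thread, while the TXNAME and TXSTRAND values are placeholders. Note also that later AnnotationDbi releases renamed `cols` to `columns`, so the call above may need that spelling on a current installation.

```r
# Stand-in for the select() result. TXNAME and TXSTRAND are placeholder
# values; the coordinates are the A1BG rows quoted in this thread.
result <- data.frame(
  SYMBOL   = c("A1BG", "A1BG"),
  TXNAME   = c("tx_a", "tx_b"),   # hypothetical transcript names
  TXSTART  = c(58858172, 58859832),
  TXEND    = c(58864865, 58874214),
  TXSTRAND = c("-", "-"),         # placeholder strand
  stringsAsFactors = FALSE
)

# Genomic span of each transcript (this includes introns).
result$length <- result$TXEND - result$TXSTART

# Count transcripts per symbol; symbols with more than one transcript are
# the ones that appear as duplicated rows.
tx_per_symbol <- table(result$SYMBOL)
dup_symbols   <- names(tx_per_symbol)[tx_per_symbol > 1]
```

Because the mapping from symbol to transcript is one-to-many, the result still has to be collapsed before it can be used as a per-gene table.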

On 10/11/2012 01:58 PM, Michael Lawrence wrote:
It's definitely a step in the right direction. A small next step would
be supporting queries based on gene symbols, as the OP had asked
about. Sure, one could do a transcriptsBy() on the TxDb package and
subset, but that means it has to be by="gene", and it's slower. Also,
has there been any progress towards supporting transcriptsBy on the
OrganismDbi package?

Michael

On Thu, Oct 11, 2012 at 1:46 PM, Marc Carlson <mcarlson@...> wrote:

     Yes,

     Sorry about the lack of memos.  ;)  OrganismDbi is a new package
     that allows you to make meta packages from annotation packages
     that implement a select() method.  Homo.sapiens is one we made for
     humans.  It combines the human org package, the hg19 txdb known
     gene package and the GO.db package.  The package does not actually
     "contain" all of that data though.  It just retrieves it as
     requested and returns it to users as if there was a single place
     it was all coming from.

       Marc

     On 10/11/2012 12:33 PM, Steve Lianoglou wrote:

         On Thu, Oct 11, 2012 at 2:54 PM, Tim Triche, Jr. <tim.triche@...> wrote:

             OrganismDbi -- too many of us are used to doing things the
             confusing way --
             using OrganismDbi packages like Homo.sapiens will be
             better long-term

         Cool ... I like being less confused.

         Thanks for the pointer,
         -steve

     _______________________________________________
     Bioconductor mailing list
     Bioconductor@...
     https://stat.ethz.ch/mailman/listinfo/bioconductor
     Search the archives:
     http://news.gmane.org/gmane.science.biology.informatics.conductor

         [[alternative HTML version deleted]]

--
A model is a lie that helps you see the truth.

Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
