Tim Triche, Jr. | 13 Oct 18:47 2012

Re: BSgenome or org.Hs.eg.db to find gene length

right so your question actually has sub-questions to it:

1) what's a gene?
2) does the performance of these packages differ depending on how you
answer (1) and if so how?
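
To make (1) concrete, here is a minimal base R sketch of two possible answers, using the two A1BG transcripts from the table quoted further down this thread. The "length" follows that table (TXEND minus TXSTART), so it is a genomic span that includes introns, not a spliced transcript length. The variable names are only for illustration.

```r
# Two A1BG transcripts, copied from the table quoted later in this thread.
tx <- data.frame(
  SYMBOL  = c("A1BG", "A1BG"),
  TXSTART = c(58858172, 58859832),
  TXEND   = c(58864865, 58874214),
  stringsAsFactors = FALSE
)
tx$length <- tx$TXEND - tx$TXSTART

# Answer A: a gene is its longest transcript.
longest <- aggregate(length ~ SYMBOL, data = tx, FUN = max)

# Answer B: a gene is the full span covered by any of its transcripts.
span_start <- tapply(tx$TXSTART, tx$SYMBOL, min)
span_end   <- tapply(tx$TXEND,   tx$SYMBOL, max)
span <- data.frame(
  SYMBOL = names(span_start),
  length = as.vector(span_end - span_start),
  stringsAsFactors = FALSE
)

print(longest)
print(span)
```

The two definitions already disagree for this one gene (14382 vs. 16042), which is why (2) is worth asking.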

For example, DEXSeq is answering a superficially different question than
edgeR, but you could in principle use edgeR to answer the same questions.
What happens if you do the book-keeping yourself and play with different
dispersion estimators (shrunken or not) at the gene level?  At the exon
level?  Isoform level?  And how do you decide which one is the appropriate
level for your analysis?  And how do you decide what is the best way to
present the results?
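
A minimal sketch of what "doing the book-keeping yourself" could look like in base R: collapsing exon-level counts to gene-level counts, so the same data can be analysed at either level. The counts and the exon-to-gene map below are made up purely for illustration.

```r
# Hypothetical exon-level counts: 5 exons x 4 samples (made-up numbers).
exon_counts <- matrix(
  c(10, 12,  8, 11,
     5,  7,  6,  4,
    20, 18, 25, 22,
     0,  1,  0,  2,
     3,  3,  4,  5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("exon", 1:5), paste0("sample", 1:4))
)

# Hypothetical map from each exon (row) to the gene it belongs to.
gene_of_exon <- c("geneA", "geneA", "geneB", "geneB", "geneB")

# Gene-level book-keeping: sum the exon rows within each gene.
gene_counts <- rowsum(exon_counts, group = gene_of_exon)

print(gene_counts)
```

Either table (exon_counts or gene_counts) could then be passed to a count-based method; which level is appropriate is exactly the open question.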

Every day there are more shrinkage estimators for biological and
technical/shot-noise dispersion estimates (a mean shift is a useless
estimate for testing unless you have a good estimate of the dispersion
within groups -- think "point intensity" if there is any doubt of this) and
every day the reads get longer.  The question you're asking is not a simple
one and a reasoned answer will benefit an awful lot of people :-)
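
A toy illustration of the point about mean shifts and dispersion, in base R. It uses a plain Welch t-test on made-up numbers as a stand-in; it is not a negative-binomial count model or any of the shrinkage estimators mentioned above.

```r
# Two made-up comparisons with the same mean shift (10) between groups,
# but very different within-group spread.
tight_a <- c(9, 10, 11)
tight_b <- c(19, 20, 21)
wide_a  <- c(0, 10, 20)
wide_b  <- c(10, 20, 30)

shift_tight <- mean(tight_b) - mean(tight_a)
shift_wide  <- mean(wide_b)  - mean(wide_a)

# Welch t-test (the default for t.test) as a simple stand-in test.
p_tight <- t.test(tight_a, tight_b)$p.value
p_wide  <- t.test(wide_a,  wide_b)$p.value

cat("shift:", shift_tight, "vs", shift_wide, "\n")
cat("p-value, tight groups:", p_tight, "\n")
cat("p-value, wide groups: ", p_wide, "\n")
```

The shift is identical in both comparisons; only the within-group spread changes, and with it the evidence for a difference.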

good luck,

--t

On Sat, Oct 13, 2012 at 12:25 AM, Fatemehsadat Seyednasrollah <fatsey@...> wrote:

>  Hi,
>
> First, sorry that I did not mention my ultimate intent. I am doing some
> research on the effect of data filtering on the number of
> differentially expressed genes. For this purpose I apply different
> filters using R. Now I want to create an annotation file that
> keeps some features of the genes in my RNA-seq dataset, to use whenever
> gene annotation is needed to find differentially expressed genes.
> For example, I saw that to use NOISeq to find DE genes I need
> to have the gene annotation as well.
>
> With many thanks and best regards,
> Fatemeh
>  ------------------------------
> *From:* Tim Triche, Jr. [tim.triche@...]
> *Sent:* Friday, October 12, 2012 8:46 PM
> *To:* Marc Carlson
> *Cc:* Fatemehsadat Seyednasrollah; bioconductor@...
>
> *Subject:* Re: [BioC] BSgenome or org.Hs.eg.db to find gene length
>
>  once upon a time there was a SpliceGraph package that aimed to resolve
> some of these questions
>
>  anyone know if Mr. Bindreither is doing OK?  As of this AM it could not
> be built.  But it might resolve deeper questions like Fatemehsadat's
>
>  my $0.02 (adjusted for rampant inflation)
>
>  --t
>
>
> On Fri, Oct 12, 2012 at 10:12 AM, Marc Carlson <mcarlson@...> wrote:
>
>> Hi Fatemehsadat,
>>
>> Let's keep this on the list.  We almost always want to keep the thread
>> public so that others can benefit from our conversations.  And also, I am
>> not really sure how to answer your question (it's not a simple question),
>> and others may have suggestions.  You can't get their input if you only
>> speak with me.
>>
>> Really though, your question about how to choose really depends on
>> context that you have not provided us here.  What is it that you want to
>> know?  I mentioned some strategies in my earlier post.  For some cases the
>> longest transcript may be what you want, for others you may want the
>> maximum range that a transcript can cover, for other cases, you may want to
>> "buffer" that region by adding to it.  For yet other cases you may not care
>> about the range at all and may only want to call unique() on the result.  But
>> I can't give even an opinion without knowing more about what you are trying
>> to do.
>>
>>
>>   Marc
>>
>>
>> On 10/12/2012 06:15 AM, Fatemehsadat Seyednasrollah wrote:
>>
>>> Hi,
>>> Thank you so much. The package was great in terms of the
>>> diversity of available features. Now I was wondering whether I can use
>>> the result of my query as an annotation file for other R packages as well.
>>> I just wanted to know your opinion about how to decide which isoform
>>> to choose for my annotation file.
>>> Imagine I need an annotation file with gene symbols as row names; for
>>> example, for the first symbol I have:
>>>
>>>    SYMBOL  TXSTART    TXEND length
>>> 1   A1BG 58858172 58864865   6693
>>> 2   A1BG 58859832 58874214  14382
>>>
>>> and so many other duplicated gene symbols. How do you decide which
>>> isoform to choose to get an annotation file with unique gene symbols?
>>>
>>> Thank you again.
>>> ________________________________________
>>> From: bioconductor-bounces@... [bioconductor-bounces@...]
>>> on behalf of Marc Carlson [mcarlson@...]
>>> Sent: Friday, October 12, 2012 1:18 AM
>>> To: Michael Lawrence
>>> Cc: bioconductor@...
>>> Subject: Re: [BioC] BSgenome or org.Hs.eg.db to find gene length
>>>
>>>
>>> Oh sorry I missed that little detail about using gene symbols.
>>>
>>> Here is how you would do it when you need to query by gene symbol:
>>>
>>> library(Homo.sapiens)
>>> cols(Homo.sapiens)      ## shows cols you could use
>>> keytypes(Homo.sapiens)  ## shows keytypes
>>> ## discovers all available keys of this kind
>>> k <- keys(Homo.sapiens, keytype="SYMBOL")
>>> ## note: later AnnotationDbi releases rename cols to columns
>>> result <- select(Homo.sapiens, k,
>>>                  cols=c("TXNAME","TXSTART","TXEND","TXSTRAND"),
>>>                  keytype="SYMBOL")
>>>
>>>
>>> The plan to support transcriptsBy etc for OrganismDbi is still just a
>>> plan.  But we don't intend for it to remain a "plan" forever.
>>>
>>>
>>>     Marc
>>>
>>>
>>>
>>>
>>>
>>> On 10/11/2012 01:58 PM, Michael Lawrence wrote:
>>>
>>>> It's definitely a step in the right direction. A small next step would
>>>> be supporting queries based on gene symbols, as the OP had asked
>>>> about. Sure, one could do a transcriptsBy() on the TxDb package and
>>>> subset, but that means it has to be by="gene", and it's slower. Also,
>>>> has there been any progress towards supporting transcriptsBy on the
>>>> OrganismDbi package?
>>>>
>>>> Michael
>>>>
>>>> On Thu, Oct 11, 2012 at 1:46 PM, Marc Carlson <mcarlson@...> wrote:
>>>>
>>>>      Yes,
>>>>
>>>>      Sorry about the lack of memos.  ;)  OrganismDbi is a new package
>>>>      that allows you to make meta packages from annotation packages
>>>>      that implement a select() method.  Homo.sapiens is one we made for
>>>>      humans.  It combines the human org package, the hg19 txdb known
>>>>      gene package and the GO.db package.  The package does not actually
>>>>      "contain" all of that data though.  It just retrieves it as
>>>>      requested and returns it to users as if there was a single place
>>>>      it was all coming from.
>>>>
>>>>        Marc
>>>>
>>>>
>>>>
>>>>
>>>>      On 10/11/2012 12:33 PM, Steve Lianoglou wrote:
>>>>
>>>>          On Thu, Oct 11, 2012 at 2:54 PM, Tim Triche, Jr. <tim.triche@...> wrote:
>>>>
>>>>              OrganismDbi -- too many of us are used to doing things the
>>>>              confusing way --
>>>>              using OrganismDbi packages like Homo.sapiens will be
>>>>              better long-term
>>>>
>>>>          Cool ... I like being less confused.
>>>>
>>>>          Thanks for the pointer,
>>>>          -steve
>>>>
>>>>
>>>>      _______________________________________________
>>>>      Bioconductor mailing list
>>>>      Bioconductor@...
>>>>      https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>      Search the archives:
>>>>      http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>>
>>>>
>>
>
>
>
>  --
> *A model is a lie that helps you see the truth.*
> Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
>



