Aaron Mackey | 20 Aug 14:36 2013

Re: RSEM for transcript and gene level read count and edgeR differential expression analysis

On Mon, Aug 19, 2013 at 4:44 PM, Steve Lianoglou wrote:

> You'll find that RSEM output doesn't play well with edgeR and DESeq.
> These methods explicitly require *count* data -- not any old number
> that has been then rounded to an integer.

I've never actually understood, or bought into, this line of reasoning.  RSEM
values are not "any old number", but are just a special form of counts; if
you sum all the values, you get the total mapped read count.  If you'd like
to split hairs, they are weighted counts that are represented succinctly
(perhaps even sufficiently?) by multiplying each count by its weight,
rather than supplying the individual raw counts and their weights as
separate vectors.  I don't think anyone would argue against our numerical
ability to include weighted data in any GLM, so why so much fuss?  Since
edgeR/DESeq/voom/etc do not themselves handle the ambiguity of read mapping
(cf. any Cufflinks paper decrying this fact), it seems like RSEM +
edgeR/DESeq/voom/etc is your only analytical option for ANOVA-type
hypothesis testing that attempts to take read mapping uncertainty into
account (albeit without integrating that uncertainty in the model itself,
as cuffdiff/eXpress do).  This all seems squarely in the realm of "all
models are wrong, but some are useful" -- we're just making choices about
what kind of "wrongness" we're more willing to tolerate (rounding RSEM
weighted counts: possible distributional wrongness; taking raw read counts
with no uncertainty: definite error/variance wrongness).
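
To make the "weighted counts" point concrete, here is a minimal sketch in Python. It is not RSEM's actual code: the reads, transcript names and weights are made-up toy values, and expected_counts is a hypothetical helper. It only shows that if each read's weights across transcripts sum to 1, the per-transcript weighted counts sum to the total number of mapped reads, and what rounding then does to them.

```python
from collections import defaultdict

# Hypothetical toy data (assumption, not from any real run): one dict per
# mapped read, giving the weight assigned to each transcript it could have
# come from. Weights for a single read sum to 1.
read_weights = [
    {"tA": 1.0},                # uniquely mapped read
    {"tA": 0.75, "tB": 0.25},   # ambiguous read, mostly tA
    {"tA": 0.5, "tB": 0.5},     # ambiguous read, split evenly
    {"tB": 1.0},                # uniquely mapped read
]


def expected_counts(reads):
    """Sum per-read weights into a weighted (expected) count per transcript."""
    totals = defaultdict(float)
    for weights in reads:
        for transcript, w in weights.items():
            totals[transcript] += w
    return dict(totals)


counts = expected_counts(read_weights)

# The weighted counts add up to the number of mapped reads.
total = sum(counts.values())

# Rounding to integers, as one would before handing the table to a
# count-based method.
rounded = {t: round(c) for t, c in counts.items()}

print(counts, total, rounded)
```

In this toy case the rounded values still sum to the read total, but rounding does not guarantee that in general; that small discrepancy is part of the "distributional wrongness" being traded off.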



Bioconductor mailing list
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor