Ann Loraine | 8 May 2012 01:04

Re: edgeR outlier question

Greetings,

I am seeing the same kind of thing with a data set in my lab, as well.

In fact, a few of the most significant genes (smallest p-values) in my
data set are in this group: huge counts for one sample and small counts
for all the rest.
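
One way to list such genes for inspection is to check whether a single
sample holds most of a gene's normalized signal. The following is only a
minimal sketch in plain Python, not part of edgeR: the function names, the
0.9 threshold, and the example counts are all hypothetical, and library
sizes are assumed to be about 10 million reads each, as in the quoted
message.

```python
def counts_per_million(counts, lib_sizes):
    """Scale each raw count by its sample's library size (reads per million)."""
    return [c / n * 1e6 for c, n in zip(counts, lib_sizes)]


def dominated_by_one_sample(cpm, share=0.9):
    """True when one sample holds more than `share` of the gene's total CPM.

    The 0.9 default is an arbitrary illustration, not a recommended cutoff.
    """
    total = sum(cpm)
    return total > 0 and max(cpm) / total > share


# Hypothetical gene: 1-2 counts in nine samples and about 1000 in one.
lib_sizes = [10_000_000] * 10
counts = [1, 2, 1, 1, 2, 1, 2, 1, 1, 1000]

cpm = counts_per_million(counts, lib_sizes)
flagged = dominated_by_one_sample(cpm)
```

A flag like this only marks genes for a closer look. It does not change
the test itself, and whether to drop a flagged gene is a judgment call.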

Best,

Ann

On Mon, May 7, 2012 at 3:19 PM, Simon Melov
<smelov@...> wrote:
> I have a reasonable RNA-Seq data set of 10 biological replicates of a
> control group versus 10 biological replicates of an experimental group.
> I've gone through the edgeR workflow and get a nice list of about 1000
> genes differentially expressed due to the experimental manipulation. I
> input the data as total reads per gene (I'd like to get to exons too,
> but first things first). The data were obtained via a paired-end
> strategy, so it's pretty good quality. Each sample (library) has about
> 10 million reads. My question is: as I go through the list of
> significant genes that are differentially expressed between the two
> groups (normalized via the workflow), ranked by BH FDR down to 0.05, I
> see genes judged as differentially expressed that have very low
> expression in most samples, yet are thrown off by 1 or 2 values,
> thereby achieving statistical significance. For example, a gene might
> have between 1 and 2 counts per million reads in one group and be
> basically the same in the other group, but one of the values is perhaps
> at 1000 or so counts, which seems to throw off the entire group, so the
> gene becomes "significant".
>
> Shouldn't edgeR take this sort of biological variation within a group
> into account when assessing significance? It's clear that in the above
> example that sample is an outlier, so the variance is very high and the
> gene shouldn't be ranked as differentially expressed. I filtered the
> data by requiring at least 1 count per sample in at least 8 samples per
> group. Should there be an additional filtering criterion to exclude
> these outliers? Or doesn't edgeR take this sort of situation into
> account (I thought it did)?
>
> Am I doing something wrong here?
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


