14 Jul 05:37 2007

## Re: Limma, SAM and Fold Change Calculation

At 01:53 AM 14/07/2007, Peter White wrote: >Dear Holger and Gordon, > >Thanks a lot for your responses - that is very helpful. It is likely that >both ways of calculating fold change are correct, but I agree with Gordon >that it is most appropriate to report the fold change that was used to >generate the significance statistic. > >I was also thinking I could take the fold changes calculated by both methods >and go with the largest (arrays seem to underestimate the magnitude of the >fold change), but this would have the issue that merging them would not >maintain the same confidence. In my opinion, mixing the two definitions would be truly horrible. It would be best to be consistent, even if you stick to SAM. In general, you cannot take two different statistical techniques and play them off one against the other on an item by item basis, choosing whichever result you like best, and still expect sensible results. > Ultimately going with the fold change used for >the statistic will be the simplest approach. See my email to Holger for an example showing that the SAM fold change can even be in a different direction to the statistical significance. Another potential confusions with the SAM fold change definition are that it doesn't generalise to linear models or to two colour arrays (because you can't compute it from log-ratios). Best wishes Gordon >Thanks again for your responses. > >Best wishes, > >Peter > >-----Original Message----- >From: Holger Schwender [mailto:holger.schw@...] >Sent: Friday, July 13, 2007 10:56 AM >To: smyth@...; pwhite@... >Cc: bioconductor@... >Subject: Re: [BioC] Limma, SAM and Fold Change Calculation > >Hi Peter, hi Gordon, > >the reason for the fold change computation in SAM is simply that Tusher et >al. also use these fold changes. In 2002, SAM was part of my diploma thesis >and I tried to reproduce the results of Tusher et al. and the results that >are provided by the Excel SAM version. Since then many things in sam have >changed, mainly to make sam faster, more user-friendly and more flexible. >But I didn't change the computation of the fold change to actually avoid the >question why the fold changes differ between the siggenes SAM and the Excel >SAM. > >Admittedly, "because Tusher et al. do so" is not a very good reason, but I >have stopped counting how often I was ask why this and that differs between >Excel SAM and siggenes SAM. > >Would it help you if I add another slot to the SAM object that contains the >numerator of the test statistics? Note that you can compute these numerators >by > > > sam.out <at> d * (sam.out <at> s0 + sam.out <at> s) > >if sam.out is the output of sam. > >Best, >Holger > > > > >-------- Original-Nachricht -------- >Datum: Fri, 13 Jul 2007 22:44:12 +1000 (EST) >Von: "Gordon K Smyth" <smyth@...> >An: "Peter White" <pwhite@...> >CC: bioconductor@... >Betreff: [BioC] Limma, SAM and Fold Change Calculation > > > Dear Peter, > > > > Since I'm the limma author, it will come as no surprise to you that I view > > the limma fold change > > (diff of log-intensities) as the more natural, consistent and useful. > > > > I don't know why SAM computes fold change as it does. It seems to me > > somewhat inconsistent, > > because SAM uses the limma log-fold-change as numerator in its own test > > statistic for computing > > statistical significant. So SAM reports a different fold change measure > > to that which it uses > > internally for determining statistical significance. > > > > Best wishes > > Gordon > > > > > Date: Thu, 12 Jul 2007 10:26:54 -0400 > > > From: "Peter White" <pwhite@...> > > > Subject: [BioC] Limma, SAM and Fold Change Calculation > > > To: <bioconductor@...> > > > > > > I wondered if anyone could comment on what is generally considered the > > most > > > appropriate method of calculating fold change values from Affy data. I > > have > > > a data set from a test vs. control experiment (n=3 in each group) > > performed > > > on the MOE430_2 array. I used the Affy package to read in the CEL file > > data > > > and GCRMA to normalize. Finally I used limma to analyze (lmFit and > > eBayes). > > > Now the FC coefficients I get back from limma appear to be > > log2(2^(mean(test > > > replicates in log2))/(2^mean(control replicates in log2)). For example: > > > > > >> exprs(eset)[18731,1:6] > > > TA2.CEL TA5.CEL TA7.CEL TA1.CEL TA8.CEL TA9.CEL > > > 3.294215 4.851795 4.403851 4.782934 7.716893 9.284909 > > > > > >> fit$coefficients[18731,1:2] #returns the mean of the above log2 values > > > Fed_MT Fed_WT > > > 4.183287 7.261578 > > > > > >> fit2$coefficients[18731,1] > > > [1] -3.078291 > > > > > >> -1/2^fit2$coefficients[18731,1] > > > [1] -8.446136 > > > > > > So we have a probe that is reporting a -8.4 fold downregulation. > > > > > > The problem is that SAM does not calculate the fold change values in the > > > same manner. It appears to take the average of the unlogged data and > > then > > > use those values to calculate fold changes. Thus: > > > > > >> 2^exprs(eset)[18731,1:6] > > > TA2.CEL TA5.CEL TA7.CEL TA1.CEL TA8.CEL TA9.CEL > > > 9.809738 28.875927 21.168555 27.530016 210.385700 623.786694 > > > > > >> tmp <- c(mean(2^exprs(eset)[18731,1:3]),mean(2^exprs(eset)[18731,4:6])) > > >> tmp > > > [1] 19.95141 287.23414 > > > > > >> tmp[1]/tmp[2] > > > [1] 0.06946043 > > > > > >> -1/(tmp[1]/tmp[2]) > > > [1] -14.39669 > > > > > > So now we have -14.4 fold downregulation. > > > > > > The example given is of course an extreme, but using the limma method > > there > > > are 282 probes with a fold change >2, while using the sam method there > > are > > > only 159, with 141 probes in common. The probes that were called > > uniquely by > > > limma were almost all downregulated (-2 to -5 fold). > > > > > > MY QUESTION IS WHICH METHOD IS THE CORRECT WAY? > > > > > > Thanks, > > > > > > Peter > > > > > > Peter White, Ph.D. > > > Technical Director > > > Functional Genomics Core > > > Department of Genetics > > > University of Pennsylvania > > > 570 Clinical Research Building > > > 415 Curie Boulevard > > > Philadelphia, PA 19104-6145 > > > ? > > > Tel: +1 (215) 898-0773 > > > Fax: +1 (215) 573-2326 > > > E-mail: pwhite@... > > > ? > > > http://www.med.upenn.edu/pdc/cores_fgc.html > > > http://www.med.upenn.edu/kaestnerlab/ > > > http://www.betacell.org/microarrays > > > http://www.cbil.upenn.edu/EPConDB/ _______________________________________________ Bioconductor mailing list Bioconductor@... https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor