3 Feb 2013 14:08
Re: LogFC query in Limma
Dear Gordon, Thanks, I could complete the analysis. Roopa On Sat, Feb 2, 2013 at 7:04 PM, Gordon K Smyth <smyth@...> wrote: > Dear Roopa, > > Perhaps you have just misread the limma output. This thread has been > mainly concerned with your statement that you found ~20,000 genes to be > differentially expressed. But the limma output shows 984 up-regulated and > 1187 down-regulated genes, in other words about 2k total rather than 20k. > > Best wishes > Gordon > > Date: Fri, 1 Feb 2013 11:05:36 -0500 >> From: Roopa Subbaiaih <rss115@...> >> To: "James W. MacDonald" <jmacdon@...> >> Cc: bioconductor@... >> Subject: Re: [BioC] LogFC query in Limma >> >> Hi James, >> >> Am I still making any silly mistake over here. I am comparing normal (21) >> with lesionous patient samples (34). This is what I get in R console. Is >> there a problem with the script? >> >> Any advice would greatly help for further analysis.Thanks, Roopa >> >> getwd() >>> >> [1] "C:/Documents and Settings/rsubbaiaih/My Documents" >> >>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a") >>> ls() >>> >> character(0) >> >>> #-----------------------------**------------------# >>> library(affy) >>> eset = justRMA() >>> >> Loading required package: AnnotationDbi >> >> eset >>> >> ExpressionSet (storageMode: lockedEnvironment) >> assayData: 54675 features, 55 samples >> element names: exprs, se.exprs >> protocolData >> sampleNames: GSM372286.CEL GSM372287.CEL ... >> GSM372367.CEL (55 total) >> varLabels: ScanDate >> varMetadata: labelDescription >> phenoData >> sampleNames: GSM372286.CEL GSM372287.CEL ... >> GSM372367.CEL (55 total) >> varLabels: sample >> varMetadata: labelDescription >> featureData: none >> experimentData: use 'experimentData(object)' >> Annotation: hgu133plus2 >> >>> f <- factor(c(1,1,1,1,1,1,1,1,1,1,**1,1,1,1,1,1,1,1,1,1,1, >>> >> + >> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,2,2,2), >> labels=c("Healthy", "affected")) >> >>> design <- model.matrix(~ 0 + f) >>> design >>> >> fHealthy faffected >> 1 1 0 >> 2 1 0 >> 3 1 0 >> 4 1 0 >> 5 1 0 >> 6 1 0 >> 7 1 0 >> 8 1 0 >> 9 1 0 >> 10 1 0 >> 11 1 0 >> 12 1 0 >> 13 1 0 >> 14 1 0 >> 15 1 0 >> 16 1 0 >> 17 1 0 >> 18 1 0 >> 19 1 0 >> 20 1 0 >> 21 1 0 >> 22 0 1 >> 23 0 1 >> 24 0 1 >> 25 0 1 >> 26 0 1 >> 27 0 1 >> 28 0 1 >> 29 0 1 >> 30 0 1 >> 31 0 1 >> 32 0 1 >> 33 0 1 >> 34 0 1 >> 35 0 1 >> 36 0 1 >> 37 0 1 >> 38 0 1 >> 39 0 1 >> 40 0 1 >> 41 0 1 >> 42 0 1 >> 43 0 1 >> 44 0 1 >> 45 0 1 >> 46 0 1 >> 47 0 1 >> 48 0 1 >> 49 0 1 >> 50 0 1 >> 51 0 1 >> 52 0 1 >> 53 0 1 >> 54 0 1 >> 55 0 1 >> attr(,"assign") >> [1] 1 1 >> attr(,"contrasts") >> attr(,"contrasts")$f >> [1] "contr.treatment" >> >> colnames(design) <-c("Healthy","affected") >>> library(limma) >>> fit <- lmFit(eset, design) >>> fit >>> >> An object of class "MArrayLM" >> $coefficients >> Healthy affected >> 1007_s_at 10.081309 9.548286 >> 1053_at 6.807501 7.482849 >> 117_at 5.969921 6.147594 >> 121_at 7.403842 7.733666 >> 1255_g_at 2.804475 2.827041 >> 54670 more rows ... >> >> $rank >> [1] 2 >> >> $assign >> [1] 1 1 >> >> $qr >> $qr >> Healthy affected >> 1 -4.5825757 0.000000 >> 2 0.2182179 -5.830952 >> 3 0.2182179 0.000000 >> 4 0.2182179 0.000000 >> 5 0.2182179 0.000000 >> 50 more rows ... >> >> $qraux >> [1] 1.218218 1.000000 >> >> $pivot >> [1] 1 2 >> >> $tol >> [1] 1e-07 >> >> $rank >> [1] 2 >> >> >> $df.residual >> [1] 53 53 53 53 53 >> 54670 more elements ... >> >> $sigma >> 1007_s_at 1053_at 117_at 121_at 1255_g_at >> 0.2866489 0.4801618 0.3499880 0.2332555 0.1053397 >> 54670 more elements ... >> >> $cov.coefficients >> Healthy affected >> Healthy 0.04761905 0.00000000 >> affected 0.00000000 0.02941176 >> >> $stdev.unscaled >> Healthy affected >> 1007_s_at 0.2182179 0.1714986 >> 1053_at 0.2182179 0.1714986 >> 117_at 0.2182179 0.1714986 >> 121_at 0.2182179 0.1714986 >> 1255_g_at 0.2182179 0.1714986 >> 54670 more rows ... >> >> $pivot >> [1] 1 2 >> >> $genes >> [1] "1007_s_at" "1053_at" "117_at" "121_at" >> [5] "1255_g_at" >> 54670 more rows ... >> >> $Amean >> 1007_s_at 1053_at 117_at 121_at 1255_g_at >> 9.751804 7.224988 6.079756 7.607733 2.818425 >> 54670 more elements ... >> >> $method >> [1] "ls" >> >> $design >> Healthy affected >> 1 1 0 >> 2 1 0 >> 3 1 0 >> 4 1 0 >> 5 1 0 >> 50 more rows ... >> >> contrast.matrix <-makeContrasts(affected-**Healthy,levels = design) >>> fit2 <- contrasts.fit(fit, contrast.matrix) >>> fit2 <- eBayes(fit2) >>> results <- decideTests(fit2, adjust="fdr",lfc=1) >>> summary(results) >>> >> affected - Healthy >> -1 1187 >> 0 52504 >> 1 984 >> >>> results >>> >> TestResults matrix >> 1007_s_at 1053_at 117_at 121_at 1255_g_at >> 0 0 0 0 0 >> 54670 more rows ... >> >>> >>> >> >> On Thu, Jan 31, 2013 at 4:48 PM, James W. MacDonald <jmacdon@...> >> wrote: >> >> If you just use the expression values from the original authors, I get >>> just under 9K probesets for this comparison at an FDR of 0.05 and no fold >>> change criterion. It drops to just under 900 with a 2-fold difference >>> added >>> in. >>> >>> So yeah, seems like a lot to me as well. >>> >>> Best, >>> >>> Jim >>> >>> >>> On 1/31/2013 4:02 PM, Steve Lianoglou wrote: >>> >>> ... what Jim said. >>>> >>>> But also, this 20k differentially expressed (likely probe sets, not >>>> genes) is raising a red flag for me, no? Am I alone here? >>>> >>>> That's .. what's the word I'm looking for ... "a lot". >>>> >>>> -steve >>>> >>>> On Thu, Jan 31, 2013 at 3:56 PM, James W. MacDonald<jmacdon@...> >>>> wrote: >>>> >>>> Hi Roopa, >>>>> >>>>> >>>>> On 1/31/2013 3:45 PM, Roopa Subbaiaih wrote: >>>>> >>>>> Hi Steve, >>>>>> >>>>>> This was the script I used- >>>>>> getwd() >>>>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a") >>>>>> ls() >>>>>> #-----------------------------****------------------# >>>>>> library(affy) >>>>>> eset = justRMA() >>>>>> f<- factor(c(1,1,1,1,1,1,1,1,1,1,****1,1,1,1,1,1,1,1,1,1,1, >>>>>> >>>>>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,****2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,** >>>>>> 2,2,2,2), >>>>>> labels=c("Healthy", "unaffected")) >>>>>> design<- model.matrix(~ 0 + f) >>>>>> design >>>>>> colnames(design)<-c("Healthy",****"unaffected") >>>>>> design >>>>>> library(limma) >>>>>> fit<- lmFit(eset, design) >>>>>> library(hgu133plus2.db) >>>>>> fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"****hgu133plus2.db") >>>>>> contrast.matrix<-****makeContrasts(affected-****Healthy,levels = >>>>>> design) >>>>>> fit2<- contrasts.fit(fit, contrast.matrix) >>>>>> fit2<- eBayes(fit2) >>>>>> topTable(fit2,coef=1,p=0.05, adjust="fdr") >>>>>> results<- decideTests(fit2, adjust="fdr", p=0.05) >>>>>> summary(results) >>>>>> write.table(results,file="****myresults.txt") >>>>>> write.fit(). >>>>>> >>>>>> I had identified ~54,000 genes of which ~ 20K were differentially >>>>>> expressed. >>>>>> >>>>>> But when I use these genes for pathway analysis the software asks for >>>>>> fold >>>>>> change values but not p value so it is easier to analyze the data. >>>>>> >>>>>> What I did was - I used the differentially expressed gene table for >>>>>> further >>>>>> analysis. That is I converted logFC values to FC(test/control) >>>>>> assuming >>>>>> that >>>>>> >>>>>> FC= FCmean(test)-FCmean(blank) and LogFC is log2 of FC values. >>>>>> >>>>>> Once I got test/control values I converted them to fold changes using >>>>>> "IF" >>>>>> function in excel sheet to eliminate genes with fold changes between >>>>>> -2 >>>>>> to >>>>>> +2. >>>>>> >>>>>> Once I did this the number of significant genes drastically reduced >>>>>> to ~ >>>>>> 2,000. >>>>>> >>>>>> Is this the right method? >>>>>> >>>>>> >>>>> No. Note that the range of fold changes after 'unlogging' will be >>>>> 0-INF, >>>>> and >>>>> the down-regulated genes will be in the range 0-1 whereas the >>>>> upregulated >>>>> genes will be in the range 1-INF. (e.g. two fold up will be 2, whereas >>>>> 2 >>>>> fold down will be 1/2 or 0.5). >>>>> >>>>> The easiest way to filter is to keep the logFC and filter on -1 and 1. >>>>> Or >>>>> you can use the lfc argument to decideTests(). >>>>> >>>>> Best, >>>>> >>>>> Jim >>>>> >>>>> >>>>> >>>>> Please advice, thanks, Roopa >>>>> >>>>>> >>>>>> On Thu, Jan 31, 2013 at 3:23 PM, Steve Lianoglou< >>>>>> mailinglist.honeypot@...****> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>>> >>>>>>> On Thu, Jan 31, 2013 at 2:54 PM, Roopa Subbaiaih<rss115@...> >>>>>>> wrote: >>>>>>> >>>>>>> Hi All, >>>>>>>> >>>>>>>> Thanks for the reply, I could pull out the the whole information for >>>>>>>> differentially expressed genes. The criteria used was adjust="fdr", >>>>>>>> >>>>>>>> p=0.05. >>>>>>> >>>>>>> I came up with ~ 20,000 genes to be differentially expressed. >>>>>>>> >>>>>>>> Hmm ... surely 20k cannot be correct? >>>>>>> >>>>>>> Since I wanted to analyze these genes for deregulated pathways I had >>>>>>> to >>>>>>> >>>>>>>> come up with fold change values for further analysis. >>>>>>>> >>>>>>>> I assume that for each gene FC= FCmean(test)-FCmean(blank). LogFC is >>>>>>>> log2 >>>>>>>> of FC values. >>>>>>>> >>>>>>>> When I convert the FC values (test/blank) to foldchanges using IF >>>>>>>> >>>>>>>> function >>>>>>> >>>>>>> I get lesser number of genes to be deregulated. The criteria was =>2 >>>>>>>> foldchanges and =<-2 fold changes. >>>>>>>> >>>>>>>> I'm missing previous context to this email, so -- not sure what the >>>>>>> "IF function" is, but if you're using limma, the log2fold changes are >>>>>>> reported for you in the logFC column that is returned from >>>>>>> `topTable(...)` >>>>>>> >>>>>>> -steve >>>>>>> >>>>>>> -- >>>>>>> Steve Lianoglou >>>>>>> Graduate Student: Computational Systems Biology >>>>>>> | Memorial Sloan-Kettering Cancer Center >>>>>>> | Weill Medical College of Cornell University >>>>>>> Contact Info: http://cbio.mskcc.org/~lianos/****contact<http://cbio.mskcc.org/~lianos/**contact> >>>>>>> <http://cbio.mskcc.**org/~lianos/contact<http://cbio.mskcc.org/~lianos/contact> >>>>>>> > >>>>>>> >>>>>>> >>>>>>> -- >>>>>> >>>>> James W. MacDonald, M.S. >>>>> Biostatistician >>>>> University of Washington >>>>> Environmental and Occupational Health Sciences >>>>> 4225 Roosevelt Way NE, # 100 >>>>> Seattle WA 98105-6099 >>>>> >>>>> >>>>> >>>> >>>> -- >>> James W. MacDonald, M.S. >>> Biostatistician >>> University of Washington >>> Environmental and Occupational Health Sciences >>> 4225 Roosevelt Way NE, # 100 >>> Seattle WA 98105-6099 >>> >>> >>> >> >> >> >> >> >> [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor@... https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
RSS Feed