3 Feb 2013 01:04
LogFC query in Limma
Dear Roopa, Perhaps you have just misread the limma output. This thread has been mainly concerned with your statement that you found ~20,000 genes to be differentially expressed. But the limma output shows 984 up-regulated and 1187 down-regulated genes, in other words about 2k total rather than 20k. Best wishes Gordon > Date: Fri, 1 Feb 2013 11:05:36 -0500 > From: Roopa Subbaiaih <rss115@...> > To: "James W. MacDonald" <jmacdon@...> > Cc: bioconductor@... > Subject: Re: [BioC] LogFC query in Limma > > Hi James, > > Am I still making any silly mistake over here. I am comparing normal (21) > with lesionous patient samples (34). This is what I get in R console. Is > there a problem with the script? > > Any advice would greatly help for further analysis.Thanks, Roopa > >> getwd() > [1] "C:/Documents and Settings/rsubbaiaih/My Documents" >> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a") >> ls() > character(0) >> #-----------------------------------------------# >> library(affy) >> eset = justRMA() > Loading required package: AnnotationDbi > >> eset > ExpressionSet (storageMode: lockedEnvironment) > assayData: 54675 features, 55 samples > element names: exprs, se.exprs > protocolData > sampleNames: GSM372286.CEL GSM372287.CEL ... > GSM372367.CEL (55 total) > varLabels: ScanDate > varMetadata: labelDescription > phenoData > sampleNames: GSM372286.CEL GSM372287.CEL ... > GSM372367.CEL (55 total) > varLabels: sample > varMetadata: labelDescription > featureData: none > experimentData: use 'experimentData(object)' > Annotation: hgu133plus2 >> f <- factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, > + > 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2), > labels=c("Healthy", "affected")) >> design <- model.matrix(~ 0 + f) >> design > fHealthy faffected > 1 1 0 > 2 1 0 > 3 1 0 > 4 1 0 > 5 1 0 > 6 1 0 > 7 1 0 > 8 1 0 > 9 1 0 > 10 1 0 > 11 1 0 > 12 1 0 > 13 1 0 > 14 1 0 > 15 1 0 > 16 1 0 > 17 1 0 > 18 1 0 > 19 1 0 > 20 1 0 > 21 1 0 > 22 0 1 > 23 0 1 > 24 0 1 > 25 0 1 > 26 0 1 > 27 0 1 > 28 0 1 > 29 0 1 > 30 0 1 > 31 0 1 > 32 0 1 > 33 0 1 > 34 0 1 > 35 0 1 > 36 0 1 > 37 0 1 > 38 0 1 > 39 0 1 > 40 0 1 > 41 0 1 > 42 0 1 > 43 0 1 > 44 0 1 > 45 0 1 > 46 0 1 > 47 0 1 > 48 0 1 > 49 0 1 > 50 0 1 > 51 0 1 > 52 0 1 > 53 0 1 > 54 0 1 > 55 0 1 > attr(,"assign") > [1] 1 1 > attr(,"contrasts") > attr(,"contrasts")$f > [1] "contr.treatment" > >> colnames(design) <-c("Healthy","affected") >> library(limma) >> fit <- lmFit(eset, design) >> fit > An object of class "MArrayLM" > $coefficients > Healthy affected > 1007_s_at 10.081309 9.548286 > 1053_at 6.807501 7.482849 > 117_at 5.969921 6.147594 > 121_at 7.403842 7.733666 > 1255_g_at 2.804475 2.827041 > 54670 more rows ... > > $rank > [1] 2 > > $assign > [1] 1 1 > > $qr > $qr > Healthy affected > 1 -4.5825757 0.000000 > 2 0.2182179 -5.830952 > 3 0.2182179 0.000000 > 4 0.2182179 0.000000 > 5 0.2182179 0.000000 > 50 more rows ... > > $qraux > [1] 1.218218 1.000000 > > $pivot > [1] 1 2 > > $tol > [1] 1e-07 > > $rank > [1] 2 > > > $df.residual > [1] 53 53 53 53 53 > 54670 more elements ... > > $sigma > 1007_s_at 1053_at 117_at 121_at 1255_g_at > 0.2866489 0.4801618 0.3499880 0.2332555 0.1053397 > 54670 more elements ... > > $cov.coefficients > Healthy affected > Healthy 0.04761905 0.00000000 > affected 0.00000000 0.02941176 > > $stdev.unscaled > Healthy affected > 1007_s_at 0.2182179 0.1714986 > 1053_at 0.2182179 0.1714986 > 117_at 0.2182179 0.1714986 > 121_at 0.2182179 0.1714986 > 1255_g_at 0.2182179 0.1714986 > 54670 more rows ... > > $pivot > [1] 1 2 > > $genes > [1] "1007_s_at" "1053_at" "117_at" "121_at" > [5] "1255_g_at" > 54670 more rows ... > > $Amean > 1007_s_at 1053_at 117_at 121_at 1255_g_at > 9.751804 7.224988 6.079756 7.607733 2.818425 > 54670 more elements ... > > $method > [1] "ls" > > $design > Healthy affected > 1 1 0 > 2 1 0 > 3 1 0 > 4 1 0 > 5 1 0 > 50 more rows ... > >> contrast.matrix <-makeContrasts(affected-Healthy,levels = design) >> fit2 <- contrasts.fit(fit, contrast.matrix) >> fit2 <- eBayes(fit2) >> results <- decideTests(fit2, adjust="fdr",lfc=1) >> summary(results) > affected - Healthy > -1 1187 > 0 52504 > 1 984 >> results > TestResults matrix > 1007_s_at 1053_at 117_at 121_at 1255_g_at > 0 0 0 0 0 > 54670 more rows ... >> > > > On Thu, Jan 31, 2013 at 4:48 PM, James W. MacDonald <jmacdon@...> wrote: > >> If you just use the expression values from the original authors, I get >> just under 9K probesets for this comparison at an FDR of 0.05 and no fold >> change criterion. It drops to just under 900 with a 2-fold difference added >> in. >> >> So yeah, seems like a lot to me as well. >> >> Best, >> >> Jim >> >> >> On 1/31/2013 4:02 PM, Steve Lianoglou wrote: >> >>> ... what Jim said. >>> >>> But also, this 20k differentially expressed (likely probe sets, not >>> genes) is raising a red flag for me, no? Am I alone here? >>> >>> That's .. what's the word I'm looking for ... "a lot". >>> >>> -steve >>> >>> On Thu, Jan 31, 2013 at 3:56 PM, James W. MacDonald<jmacdon@...> >>> wrote: >>> >>>> Hi Roopa, >>>> >>>> >>>> On 1/31/2013 3:45 PM, Roopa Subbaiaih wrote: >>>> >>>>> Hi Steve, >>>>> >>>>> This was the script I used- >>>>> getwd() >>>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a") >>>>> ls() >>>>> #-----------------------------**------------------# >>>>> library(affy) >>>>> eset = justRMA() >>>>> f<- factor(c(1,1,1,1,1,1,1,1,1,1,**1,1,1,1,1,1,1,1,1,1,1, >>>>> >>>>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,** >>>>> 2,2,2,2), >>>>> labels=c("Healthy", "unaffected")) >>>>> design<- model.matrix(~ 0 + f) >>>>> design >>>>> colnames(design)<-c("Healthy",**"unaffected") >>>>> design >>>>> library(limma) >>>>> fit<- lmFit(eset, design) >>>>> library(hgu133plus2.db) >>>>> fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"**hgu133plus2.db") >>>>> contrast.matrix<-**makeContrasts(affected-**Healthy,levels = design) >>>>> fit2<- contrasts.fit(fit, contrast.matrix) >>>>> fit2<- eBayes(fit2) >>>>> topTable(fit2,coef=1,p=0.05, adjust="fdr") >>>>> results<- decideTests(fit2, adjust="fdr", p=0.05) >>>>> summary(results) >>>>> write.table(results,file="**myresults.txt") >>>>> write.fit(). >>>>> >>>>> I had identified ~54,000 genes of which ~ 20K were differentially >>>>> expressed. >>>>> >>>>> But when I use these genes for pathway analysis the software asks for >>>>> fold >>>>> change values but not p value so it is easier to analyze the data. >>>>> >>>>> What I did was - I used the differentially expressed gene table for >>>>> further >>>>> analysis. That is I converted logFC values to FC(test/control) assuming >>>>> that >>>>> >>>>> FC= FCmean(test)-FCmean(blank) and LogFC is log2 of FC values. >>>>> >>>>> Once I got test/control values I converted them to fold changes using >>>>> "IF" >>>>> function in excel sheet to eliminate genes with fold changes between -2 >>>>> to >>>>> +2. >>>>> >>>>> Once I did this the number of significant genes drastically reduced to ~ >>>>> 2,000. >>>>> >>>>> Is this the right method? >>>>> >>>> >>>> No. Note that the range of fold changes after 'unlogging' will be 0-INF, >>>> and >>>> the down-regulated genes will be in the range 0-1 whereas the upregulated >>>> genes will be in the range 1-INF. (e.g. two fold up will be 2, whereas 2 >>>> fold down will be 1/2 or 0.5). >>>> >>>> The easiest way to filter is to keep the logFC and filter on -1 and 1. Or >>>> you can use the lfc argument to decideTests(). >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>> >>>> Please advice, thanks, Roopa >>>>> >>>>> On Thu, Jan 31, 2013 at 3:23 PM, Steve Lianoglou< >>>>> mailinglist.honeypot@...**≥ wrote: >>>>> >>>>> Hi, >>>>>> >>>>>> On Thu, Jan 31, 2013 at 2:54 PM, Roopa Subbaiaih<rss115@...> >>>>>> wrote: >>>>>> >>>>>>> Hi All, >>>>>>> >>>>>>> Thanks for the reply, I could pull out the the whole information for >>>>>>> differentially expressed genes. The criteria used was adjust="fdr", >>>>>>> >>>>>> p=0.05. >>>>>> >>>>>>> I came up with ~ 20,000 genes to be differentially expressed. >>>>>>> >>>>>> Hmm ... surely 20k cannot be correct? >>>>>> >>>>>> Since I wanted to analyze these genes for deregulated pathways I had to >>>>>>> come up with fold change values for further analysis. >>>>>>> >>>>>>> I assume that for each gene FC= FCmean(test)-FCmean(blank). LogFC is >>>>>>> log2 >>>>>>> of FC values. >>>>>>> >>>>>>> When I convert the FC values (test/blank) to foldchanges using IF >>>>>>> >>>>>> function >>>>>> >>>>>>> I get lesser number of genes to be deregulated. The criteria was =>2 >>>>>>> foldchanges and =<-2 fold changes. >>>>>>> >>>>>> I'm missing previous context to this email, so -- not sure what the >>>>>> "IF function" is, but if you're using limma, the log2fold changes are >>>>>> reported for you in the logFC column that is returned from >>>>>> `topTable(...)` >>>>>> >>>>>> -steve >>>>>> >>>>>> -- >>>>>> Steve Lianoglou >>>>>> Graduate Student: Computational Systems Biology >>>>>> | Memorial Sloan-Kettering Cancer Center >>>>>> | Weill Medical College of Cornell University >>>>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http://cbio.mskcc.org/~lianos/contact> >>>>>> >>>>>> >>>>> -- >>>> James W. MacDonald, M.S. >>>> Biostatistician >>>> University of Washington >>>> Environmental and Occupational Health Sciences >>>> 4225 Roosevelt Way NE, # 100 >>>> Seattle WA 98105-6099 >>>> >>>> >>> >>> >> -- >> James W. MacDonald, M.S. >> Biostatistician >> University of Washington >> Environmental and Occupational Health Sciences >> 4225 Roosevelt Way NE, # 100 >> Seattle WA 98105-6099 >> >> > > > -- > --------------------------------------- > Roopa Shree Subbaiaih > Post Doctoral Fellow > Department of Dermatology > School of Medicine > Case Western Reserve University > Cleveland, OH-44106 > Tel:+1 216 368 0211 > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}} _______________________________________________ Bioconductor mailing list Bioconductor@... https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
RSS Feed