Roopa Subbaiaih | 3 Feb 14:08 2013

Re: LogFC query in Limma

Dear Gordon,
Thanks, I could complete the analysis. Roopa

On Sat, Feb 2, 2013 at 7:04 PM, Gordon K Smyth <smyth@...> wrote:

> Dear Roopa,
>
> Perhaps you have just misread the limma output.  This thread has been
> mainly concerned with your statement that you found ~20,000 genes to be
> differentially expressed.  But the limma output shows 984 up-regulated and
> 1187 down-regulated genes, in other words about 2k total rather than 20k.
>
> Best wishes
> Gordon
>
>  Date: Fri, 1 Feb 2013 11:05:36 -0500
>> From: Roopa Subbaiaih <rss115@...>
>> To: "James W. MacDonald" <jmacdon@...>
>> Cc: bioconductor@...
>> Subject: Re: [BioC] LogFC query in Limma
>>
>> Hi James,
>>
>> Am I still making any silly mistake over here. I am comparing normal (21)
>> with lesionous patient samples (34). This is what I get in R console. Is
>> there a problem with the script?
>>
>> Any advice would greatly help for further analysis.Thanks, Roopa
>>
>>  getwd()
>>>
>> [1] "C:/Documents and Settings/rsubbaiaih/My Documents"
>>
>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>>> ls()
>>>
>> character(0)
>>
>>> #-----------------------------**------------------#
>>> library(affy)
>>> eset = justRMA()
>>>
>> Loading required package: AnnotationDbi
>>
>>  eset
>>>
>> ExpressionSet (storageMode: lockedEnvironment)
>> assayData: 54675 features, 55 samples
>>  element names: exprs, se.exprs
>> protocolData
>>  sampleNames: GSM372286.CEL GSM372287.CEL ...
>>    GSM372367.CEL (55 total)
>>  varLabels: ScanDate
>>  varMetadata: labelDescription
>> phenoData
>>  sampleNames: GSM372286.CEL GSM372287.CEL ...
>>    GSM372367.CEL (55 total)
>>  varLabels: sample
>>  varMetadata: labelDescription
>> featureData: none
>> experimentData: use 'experimentData(object)'
>> Annotation: hgu133plus2
>>
>>> f <- factor(c(1,1,1,1,1,1,1,1,1,1,**1,1,1,1,1,1,1,1,1,1,1,
>>>
>> +
>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,2,2,2),
>> labels=c("Healthy", "affected"))
>>
>>> design <- model.matrix(~ 0 + f)
>>> design
>>>
>>   fHealthy faffected
>> 1         1         0
>> 2         1         0
>> 3         1         0
>> 4         1         0
>> 5         1         0
>> 6         1         0
>> 7         1         0
>> 8         1         0
>> 9         1         0
>> 10        1         0
>> 11        1         0
>> 12        1         0
>> 13        1         0
>> 14        1         0
>> 15        1         0
>> 16        1         0
>> 17        1         0
>> 18        1         0
>> 19        1         0
>> 20        1         0
>> 21        1         0
>> 22        0         1
>> 23        0         1
>> 24        0         1
>> 25        0         1
>> 26        0         1
>> 27        0         1
>> 28        0         1
>> 29        0         1
>> 30        0         1
>> 31        0         1
>> 32        0         1
>> 33        0         1
>> 34        0         1
>> 35        0         1
>> 36        0         1
>> 37        0         1
>> 38        0         1
>> 39        0         1
>> 40        0         1
>> 41        0         1
>> 42        0         1
>> 43        0         1
>> 44        0         1
>> 45        0         1
>> 46        0         1
>> 47        0         1
>> 48        0         1
>> 49        0         1
>> 50        0         1
>> 51        0         1
>> 52        0         1
>> 53        0         1
>> 54        0         1
>> 55        0         1
>> attr(,"assign")
>> [1] 1 1
>> attr(,"contrasts")
>> attr(,"contrasts")$f
>> [1] "contr.treatment"
>>
>>  colnames(design) <-c("Healthy","affected")
>>> library(limma)
>>> fit <- lmFit(eset, design)
>>> fit
>>>
>> An object of class "MArrayLM"
>> $coefficients
>>            Healthy affected
>> 1007_s_at 10.081309 9.548286
>> 1053_at    6.807501 7.482849
>> 117_at     5.969921 6.147594
>> 121_at     7.403842 7.733666
>> 1255_g_at  2.804475 2.827041
>> 54670 more rows ...
>>
>> $rank
>> [1] 2
>>
>> $assign
>> [1] 1 1
>>
>> $qr
>> $qr
>>     Healthy  affected
>> 1 -4.5825757  0.000000
>> 2  0.2182179 -5.830952
>> 3  0.2182179  0.000000
>> 4  0.2182179  0.000000
>> 5  0.2182179  0.000000
>> 50 more rows ...
>>
>> $qraux
>> [1] 1.218218 1.000000
>>
>> $pivot
>> [1] 1 2
>>
>> $tol
>> [1] 1e-07
>>
>> $rank
>> [1] 2
>>
>>
>> $df.residual
>> [1] 53 53 53 53 53
>> 54670 more elements ...
>>
>> $sigma
>> 1007_s_at   1053_at    117_at    121_at 1255_g_at
>> 0.2866489 0.4801618 0.3499880 0.2332555 0.1053397
>> 54670 more elements ...
>>
>> $cov.coefficients
>>            Healthy   affected
>> Healthy  0.04761905 0.00000000
>> affected 0.00000000 0.02941176
>>
>> $stdev.unscaled
>>            Healthy  affected
>> 1007_s_at 0.2182179 0.1714986
>> 1053_at   0.2182179 0.1714986
>> 117_at    0.2182179 0.1714986
>> 121_at    0.2182179 0.1714986
>> 1255_g_at 0.2182179 0.1714986
>> 54670 more rows ...
>>
>> $pivot
>> [1] 1 2
>>
>> $genes
>> [1] "1007_s_at" "1053_at"   "117_at"    "121_at"
>> [5] "1255_g_at"
>> 54670 more rows ...
>>
>> $Amean
>> 1007_s_at   1053_at    117_at    121_at 1255_g_at
>> 9.751804  7.224988  6.079756  7.607733  2.818425
>> 54670 more elements ...
>>
>> $method
>> [1] "ls"
>>
>> $design
>>  Healthy affected
>> 1       1        0
>> 2       1        0
>> 3       1        0
>> 4       1        0
>> 5       1        0
>> 50 more rows ...
>>
>>  contrast.matrix <-makeContrasts(affected-**Healthy,levels = design)
>>> fit2 <- contrasts.fit(fit, contrast.matrix)
>>> fit2 <- eBayes(fit2)
>>> results <- decideTests(fit2, adjust="fdr",lfc=1)
>>> summary(results)
>>>
>>   affected - Healthy
>> -1               1187
>> 0               52504
>> 1                 984
>>
>>> results
>>>
>> TestResults matrix
>> 1007_s_at   1053_at    117_at    121_at 1255_g_at
>>        0         0         0         0         0
>> 54670 more rows ...
>>
>>>
>>>
>>
>> On Thu, Jan 31, 2013 at 4:48 PM, James W. MacDonald <jmacdon@...>
>> wrote:
>>
>>  If you just use the expression values from the original authors, I get
>>> just under 9K probesets for this comparison at an FDR of 0.05 and no fold
>>> change criterion. It drops to just under 900 with a 2-fold difference
>>> added
>>> in.
>>>
>>> So yeah, seems like a lot to me as well.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>> On 1/31/2013 4:02 PM, Steve Lianoglou wrote:
>>>
>>>  ... what Jim said.
>>>>
>>>> But also, this 20k differentially expressed (likely probe sets, not
>>>> genes) is raising a red flag for me, no? Am I alone here?
>>>>
>>>> That's .. what's the word I'm looking for ... "a lot".
>>>>
>>>> -steve
>>>>
>>>> On Thu, Jan 31, 2013 at 3:56 PM, James W. MacDonald<jmacdon@...>
>>>>  wrote:
>>>>
>>>>  Hi Roopa,
>>>>>
>>>>>
>>>>> On 1/31/2013 3:45 PM, Roopa Subbaiaih wrote:
>>>>>
>>>>>  Hi Steve,
>>>>>>
>>>>>> This was the script I used-
>>>>>> getwd()
>>>>>> setwd(dir="/CRSP 406-11/DEMOS/GSE14905-a")
>>>>>> ls()
>>>>>> #-----------------------------****------------------#
>>>>>> library(affy)
>>>>>> eset = justRMA()
>>>>>> f<- factor(c(1,1,1,1,1,1,1,1,1,1,****1,1,1,1,1,1,1,1,1,1,1,
>>>>>>
>>>>>> 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,****2,2,2,2,2,2,2,2,2,2,2,2,2,2,**2,**
>>>>>> 2,2,2,2),
>>>>>> labels=c("Healthy", "unaffected"))
>>>>>> design<- model.matrix(~ 0 + f)
>>>>>> design
>>>>>> colnames(design)<-c("Healthy",****"unaffected")
>>>>>> design
>>>>>> library(limma)
>>>>>> fit<- lmFit(eset, design)
>>>>>> library(hgu133plus2.db)
>>>>>> fit$genes$Symbol<- getSYMBOL(fit$genes$ID,"****hgu133plus2.db")
>>>>>> contrast.matrix<-****makeContrasts(affected-****Healthy,levels =
>>>>>> design)
>>>>>> fit2<- contrasts.fit(fit, contrast.matrix)
>>>>>> fit2<- eBayes(fit2)
>>>>>> topTable(fit2,coef=1,p=0.05, adjust="fdr")
>>>>>> results<- decideTests(fit2, adjust="fdr", p=0.05)
>>>>>> summary(results)
>>>>>> write.table(results,file="****myresults.txt")
>>>>>> write.fit().
>>>>>>
>>>>>> I had identified ~54,000 genes of which ~ 20K were differentially
>>>>>> expressed.
>>>>>>
>>>>>> But when I use these genes for pathway analysis the software asks for
>>>>>> fold
>>>>>> change values but not p value so it is easier to analyze the data.
>>>>>>
>>>>>> What I did was - I used the differentially expressed gene table for
>>>>>> further
>>>>>> analysis. That is I converted logFC values to FC(test/control)
>>>>>> assuming
>>>>>> that
>>>>>>
>>>>>> FC= FCmean(test)-FCmean(blank) and LogFC is log2 of FC values.
>>>>>>
>>>>>> Once I got test/control values I converted them to fold changes using
>>>>>> "IF"
>>>>>> function in excel sheet to eliminate genes with fold changes between
>>>>>> -2
>>>>>> to
>>>>>> +2.
>>>>>>
>>>>>> Once I did this the number of significant genes drastically reduced
>>>>>> to ~
>>>>>> 2,000.
>>>>>>
>>>>>> Is this the right method?
>>>>>>
>>>>>>
>>>>> No. Note that the range of fold changes after 'unlogging' will be
>>>>> 0-INF,
>>>>> and
>>>>> the down-regulated genes will be in the range 0-1 whereas the
>>>>> upregulated
>>>>> genes will be in the range 1-INF. (e.g. two fold up will be 2, whereas
>>>>> 2
>>>>> fold down will be 1/2 or 0.5).
>>>>>
>>>>> The easiest way to filter is to keep the logFC and filter on -1 and 1.
>>>>> Or
>>>>> you can use the lfc argument to decideTests().
>>>>>
>>>>> Best,
>>>>>
>>>>> Jim
>>>>>
>>>>>
>>>>>
>>>>> Please advice, thanks, Roopa
>>>>>
>>>>>>
>>>>>> On Thu, Jan 31, 2013 at 3:23 PM, Steve Lianoglou<
>>>>>> mailinglist.honeypot@...****>   wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 31, 2013 at 2:54 PM, Roopa Subbaiaih<rss115@...>
>>>>>>> wrote:
>>>>>>>
>>>>>>>  Hi All,
>>>>>>>>
>>>>>>>> Thanks for the reply, I could pull out the the whole information for
>>>>>>>> differentially expressed genes. The criteria used was adjust="fdr",
>>>>>>>>
>>>>>>>>  p=0.05.
>>>>>>>
>>>>>>>  I came up with ~ 20,000 genes to be differentially expressed.
>>>>>>>>
>>>>>>>>  Hmm ... surely 20k cannot be correct?
>>>>>>>
>>>>>>> Since I wanted to analyze these genes for deregulated pathways I had
>>>>>>> to
>>>>>>>
>>>>>>>> come up with fold change values for further analysis.
>>>>>>>>
>>>>>>>> I assume that for each gene FC= FCmean(test)-FCmean(blank). LogFC is
>>>>>>>> log2
>>>>>>>> of FC values.
>>>>>>>>
>>>>>>>> When I convert the FC values (test/blank) to foldchanges using IF
>>>>>>>>
>>>>>>>>  function
>>>>>>>
>>>>>>>  I get lesser number of genes to be deregulated. The criteria was =>2
>>>>>>>> foldchanges and =<-2 fold changes.
>>>>>>>>
>>>>>>>>  I'm missing previous context to this email, so -- not sure what the
>>>>>>> "IF function" is, but if you're using limma, the log2fold changes are
>>>>>>> reported for you in the logFC column that is returned from
>>>>>>> `topTable(...)`
>>>>>>>
>>>>>>> -steve
>>>>>>>
>>>>>>> --
>>>>>>> Steve Lianoglou
>>>>>>> Graduate Student: Computational Systems Biology
>>>>>>>    | Memorial Sloan-Kettering Cancer Center
>>>>>>>    | Weill Medical College of Cornell University
>>>>>>> Contact Info: http://cbio.mskcc.org/~lianos/****contact<http://cbio.mskcc.org/~lianos/**contact>
>>>>>>> <http://cbio.mskcc.**org/~lianos/contact<http://cbio.mskcc.org/~lianos/contact>
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> University of Washington
>>>>> Environmental and Occupational Health Sciences
>>>>> 4225 Roosevelt Way NE, # 100
>>>>> Seattle WA 98105-6099
>>>>>
>>>>>
>>>>>
>>>>
>>>>  --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


Gmane