Wang Peter | 28 Jul 05:02 2013
Picon

Re: limma for homemade microarray - question on NAs and multiple probes for one gene

sorry, i have a question.
why did you do log(M)
did you only extract matched probe signals?
shan

On Thu, Jul 25, 2013 at 7:21 AM, Gordon K Smyth <smyth@...> wrote:

> Dear Zhengyu,
>
>
> On Mon, 22 Jul 2013, zhengyu jiang wrote:
>
>  Hi Gordon,
>>
>> Sorry to bother you again.  I have two questions.
>>
>> (1) I started the "eset<-as.matrix(log(M)) #### take log intensities"
>> with a matrix and I have a separate annotation file which contains columns
>> of "ID", "Sample Name", "Chr", "Coordinate", ""GeneSymbol". The following
>> code to top list is below.  How can I add the annotation to the final
>> toplist output ?
>>
>
> The first and best way to answer questions is to read the documentation.
> If you type ?topTable, you will see that there is an argument for a
> data.frame containing gene annotation.  So you read the annotation into a
> data.frame, then use
>
>   topTable(fit, ..., genelist=yourdatannotation)
>
>
>  eset <- normalizeBetweenArrays(eset, method="quantile") ## quantile
>> normalization for all arrays
>> design<-model.matrix(~Pair+**Treat)
>> targets<-readTargets("targets.**txt",row.names=1)  ## or row name =1
>> Pair<-factor(targets$Pair)
>> Treat<-factor(targets$**Treatment,levels=c("N","T"))
>> fit_pair<-lmFit(eset,design)
>> plotSA(fit_pair)
>> fit= eBayes(fit_pair, trend=TRUE)
>> R=topTable(fit, coef="TreatT", adjust="BH",number=30
>>
>> (2) I have 6 sample data (3 control and 3 treatment) in one file.
>>
>
> Is this an Illumina BeadChip.  You don't say but, if not, using read.ilmn
> is not appropriate.
>
>
>  It contains columns of "ID", "Sample Name", "Chr", "Coordinate",
>> ""GeneSymbol", "XRaw" (expression raw data). I don't have control probes.
>> If I want to use the Spot quality weights function "wt.fun" to read in
>> these data as described in the limma manual to define all XRaw values below
>> 1000 as 0 weight,
>>
>
> And yet I have already asked you please not to do this.
>
>
>  is that possible to modify the code? I can change all column names if
>> needed. But I don't know what columns are required for read.illmn?
>>
>
> Again, please read the documentation.  Type ?read.ilmn and you will see
> that there is no argument called wt.fun.  You can only use read.ilmn for
> Illumina BeadChips.  This type of array does not have spots, so trying to
> set spot weights would obviously make no sense.
>
> The Biconductor posting guide says
>
> "Read the relevant R documentation ... If you are having trouble with
> function somefunc, try ?somefunc."
>
> The limma User's Guide says
>
> "Mailing list etiquette requires that you read the relevant help page
> carefully before posting a problem to the list."
>
> Best wishes
> Gordon
>
>
>  myfun <- function(x) as.numeric(x$XRaw <1000)
>> read.ilmn("062813_data.txt",**probeid="ID",annotation=c("**
>> Chr","Coordinate","GeneSymbol"**,expr="XRaw"),
>> wt.fun=myfun)
>> Thanks,
>> Zhengyu
>>
>>
>> On Tue, Jul 9, 2013 at 7:34 PM, Gordon K Smyth <smyth@...> wrote:
>>
>>  Dear Zhengyu,
>>>
>>>  Date: Mon, 8 Jul 2013 20:40:14 -0400
>>>
>>>> From: zhengyu jiang <zhyjiang2006@...>
>>>> To: bioconductor@...
>>>> Subject: [BioC] limma for homemade microarray - question on NAs and
>>>>         multiple probes for one gene
>>>>
>>>> Dear Bioconductor experts,
>>>>
>>>> We have data from a homemade one-channel microarray that I tried to
>>>> apply
>>>> limma for differential expression analysis between matched paired Normal
>>>> (N) and Tumor (Tumor) samples - 8 biological replicates (one tech
>>>> replicate
>>>> has been averaged after normalization). All samples are formatted in one
>>>> matrix (M).
>>>>
>>>>
>>> This is a standard problem, well covered in the limma User's Guide.
>>>
>>>  Signals have been quantile normalized between each paired normal and
>>>
>>>> tumor.
>>>>
>>>>
>>> I don't know what you mean by this.  I recommend ordinary quantile
>>> normalization of all the arrays together, without regard to which sample
>>> is
>>> paired with which.
>>>
>>>  Signal values below 5 (log scale) have been replaced by "NA" since they
>>>
>>>> are
>>>> potentially noises. So there are many NAs in M.
>>>>
>>>>
>>> Please don't do this.  limma can deal with low intensities perfectly
>>> weel,
>>> and can figure out whether they are noise or not.  You are simply
>>> removing
>>> valid data.
>>>
>>> If you are concerned about high variability of low intensity probes, then
>>> look at
>>>
>>>   plotSA(fit_pair)
>>>
>>> to examine the mean-variance trend, and use
>>>
>>>   eBayes(fit_pair, trend=TRUE)
>>>
>>> if there is a strong trend.
>>>
>>>  I followed the user manual and made the codes below.
>>>
>>>>
>>>> I think the code is correct?
>>>>
>>>>
>>> Looks ok.
>>>
>>>  My questions are (1) how to deal with NAs - as I did a search but no
>>>
>>>> clear idea
>>>>
>>>>
>>> Don't create them in the first place.  This has been said many times
>>> before!
>>>
>>>  (2) how do people do the statistics at the gene level for one gene
>>> having
>>>
>>>> multiple probes - averaging or taking median?
>>>>
>>>>
>>> Usually we do not find it necessary to aggregate the multiple probes. The
>>> multiple probes might represent different transcripts or variants, and
>>> some
>>> of the probes might not work.  Why do you need to take any special
>>> action?
>>>
>>> However, if you feel that you must, the best method to aggregate the
>>> multiple probes is probably to retain the probe for each gene with
>>> highest
>>> fit_pair$Amean value.  We have used this strategy in a number of
>>> published
>>> papers.  The rationale for this is to choose the probe corresponding to
>>> the
>>> most highly expressed transcript for the gene.  Alternatively, you could
>>> average the probes using the avereps() function in limma.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>  Thanks,
>>>
>>>> Zhengyu
>>>>
>>>>
>>>>  head(M)
>>>>>
>>>>         N1       N2       N3       N4       N5       N6       N7
>>>> N8       T1        T2       T3
>>>> 2  8.622724 7.423568       NA       NA 7.487174       NA 8.516293
>>>> NA
>>>> 7.876259  7.856707       NA
>>>>         T4       T5       T6       T7       T8
>>>> 2        NA 7.720018       NA 7.752550       NA
>>>>
>>>>  eset<-as.matrix(M)
>>>>
>>>>> Pair=factor(targets$Pair)
>>>>>     Treat=factor(targets$****Treatment,levels=c("N","T")) # compared
>>>>> matched
>>>>>
>>>>>  normal to tumors
>>>>
>>>>                design<-model.matrix(~Pair+****Treat)
>>>>> targets
>>>>>
>>>>>    FilenName Pair Treatment
>>>> 1         N1    1         N
>>>> 2         N2    2         N
>>>> 3         N3    3         N
>>>> 4         N4    4         N
>>>> 5         N5    5         N
>>>> 6         N6    6         N
>>>> 7         N7    7         N
>>>> 8         N8    8         N
>>>> 9         T1    1         T
>>>> 10        T2    2         T
>>>> 11        T3    3         T
>>>> 12        T4    4         T
>>>> 13        T5    5         T
>>>> 14        T6    6         T
>>>> 15        T7    7         T
>>>> 16        T8    8         T
>>>> fit_pair<-lmFit(eset,design)
>>>>             fit_pair<-eBayes(fit_pair)
>>>>
>>>> R=topTable(fit_pair, coef="TreatT", adjust="BH",number=30) # display top
>>>> 30
>>>>
>>>
> ______________________________**______________________________**__________
> The information in this email is confidential and inte...{{dropped:26}}

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


Gmane