James W. MacDonald | 3 Oct 20:48 2012

Re: Assigning gene symbols to Affymetrix data and averaging probes

Hi Lesley,

On 10/3/2012 2:29 PM, Hoyles, Lesley wrote:
> Hi Jim
>
> Thanks, the reannotation worked a treat. I've been able to export the normalized data in annotated format.
>
> I am averse to removing probes that have no Entrez ID associated with them, as I want to put the whole set of
> data through limma. I can't use the annotated expr.loess in lmFit, but is there a way I can get the symbol
> information into the output of lmFit (for instance, as fit$symbol)?

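An MArrayLM object (the output of, e.g., lmFit) has a 'genes' component
into which you can put a data.frame containing gene symbols, etc. Its
rows must be in the same order as the rows of the fit; topTable() will
then include those columns in its output.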
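A minimal sketch of the 'genes' approach, assuming limma is installed. The expression matrix, probe IDs, symbols and two-group design below are hypothetical toy stand-ins for exprs(mice.loess), ID and Symbol; with real data, ID would come from featureNames(mice.loess) so the annotation rows line up with the rows of the fit.

```r
## Sketch: attach probe annotation to the 'genes' component of an MArrayLM fit.
## All data here are hypothetical toy values, not real mouse4302 probesets.
library(limma)

set.seed(1)
ids <- sprintf("probe_%02d", 1:20)            # hypothetical probeset IDs
sym <- c(sprintf("Gene%02d", 1:18), NA, NA)   # unannotated probesets stay NA
expr <- matrix(rnorm(20 * 6, mean = 8), nrow = 20,
               dimnames = list(ids, paste0("array", 1:6)))

group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)               # columns: (Intercept), grouptrt

fit <- lmFit(expr, design)
## The data.frame rows must follow the row order of the fit
fit$genes <- data.frame(ID = ids, Symbol = sym, stringsAsFactors = FALSE)
stopifnot(identical(rownames(fit$coefficients), fit$genes$ID))

fit <- eBayes(fit)
tt <- topTable(fit, coef = "grouptrt", number = 5)
print(tt)   # ID and Symbol columns appear alongside logFC, P.Value, etc.
```

Probesets without a symbol stay in the fit with Symbol = NA, so nothing has to be filtered out before lmFit.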

Another option is to use the annaffy package to do the annotation. And 
if you are going to use annaffy and limma, then I should make a 
shameless plug for the affycoretools package, which contains a function 
designed to go from an MArrayLM object to annotated output in a single 
function call (outputting HTML or text files).

Best,

Jim

>
> Best wishes
> Lesley
>
> ________________________________________
> From: James W. MacDonald [jmacdon@...]
> Sent: 03 October 2012 16:30
> To: Hoyles, Lesley
> Cc: bioconductor@...
> Subject: Re: [BioC] Assigning gene symbols to Affymetrix data and averaging probes
>
> Hi Lesley,
>
> On 10/3/2012 10:55 AM, Hoyles, Lesley wrote:
>>   Hi
>>
>>   I have processed my affy data and am able to annotate the object
>>   mice.loess using the following:
>>
>>   library(annotate)  # provides getSYMBOL()
>>   ID <- featureNames(mice.loess)
>>   Symbol <- getSYMBOL(ID, 'mouse4302.db')
>>   fData(mice.loess) <- data.frame(ID = ID, Symbol = Symbol)
>>
>>
>>   However, when I convert my object with expr.loess <- exprs(mice.loess),
>>   I lose the annotation and have been unable to find a way to annotate
>>   expr.loess. Please could anybody suggest how I can annotate
>>   expr.loess?
> expr.loess<- data.frame(ID = ID, Symbol = Symbol, exprs(mice.loess))
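A small base-R illustration of the data.frame() call above, using a hypothetical toy matrix in place of exprs(mice.loess). It shows that probesets with no symbol keep their rows (Symbol is NA), and that check.names = FALSE (an addition, not part of the original suggestion) keeps sample names that start with a digit from being rewritten.

```r
## Toy stand-ins: 'expr' plays the role of exprs(mice.loess);
## probe IDs, symbols and sample names are hypothetical.
expr <- matrix(c(7.1, 7.3,
                 9.0, 9.2,
                 5.5, 5.4), nrow = 3, byrow = TRUE,
               dimnames = list(c("p1_at", "p2_at", "p3_at"),
                               c("1_ctrl.CEL", "2_trt.CEL")))
ID <- rownames(expr)
Symbol <- c("GeneA", NA, "GeneB")   # NA = probeset without a symbol

## check.names = FALSE keeps "1_ctrl.CEL" as is; with the default
## (check.names = TRUE) it would become "X1_ctrl.CEL"
expr.loess <- data.frame(ID = ID, Symbol = Symbol, expr,
                         check.names = FALSE, stringsAsFactors = FALSE)
print(expr.loess)
```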
>
>>
>>   Is there a way of averaging probes for each gene with Affymetrix
>>   data? I've been able to do this with single-channel Agilent data
>>   using the example given in the limma guide.
> There are probably two reasonable ways to do this. First, the easiest.
>
> dat<- ReadAffy(cdfname = "mouse4302mmentrezcdf")
>
> and proceed from there. This will use the MBNI re-mapped CDF package
> based on Entrez Gene IDs, and you will have a single value per gene
> after summarization. There are other ways to map the probes; see
> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
> at the bottom of the page for more info.
>
> Alternatively, if you want to stick with the original probesets, the
> problem arises that some probesets are not well annotated, so what to do
> with those? In addition, gene symbols are not guaranteed to be unique,
> so you can't just assume that they are. Entrez Gene and UniGene IDs are
> supposed to be unique, so you could go with them, doing something like
> (untested)
>
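> library(mouse4302.db)  ## provides mouse4302ENTREZID
> gns <- toTable(mouse4302ENTREZID)  ## two columns: probe_id, gene_id
> ## expr.loess is the data.frame I suggest above (ID, Symbol, expression values)
> alldat <- merge(gns, expr.loess, by = 1)
> ## one data.frame per Entrez Gene ID
> alldatlst <- split(alldat, alldat$gene_id)
> ## keep gene_id and Symbol, and average each expression column
> combined.data <- do.call("rbind", lapply(alldatlst, function(x)
>     data.frame(x[1, 2:3], t(colMeans(x[, -(1:3), drop = FALSE])))))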
>
> Here I am assuming that after the merge() step the first three columns
> are the probeset ID, gene_id, symbol, and the remaining columns are the
> expression values. You will lose all data for which there isn't an
> Entrez Gene ID, but the same is true of the MBNI method I outline above.
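A self-contained base-R sketch of the same per-gene averaging, on hypothetical toy data: gns stands in for toTable(mouse4302ENTREZID) and expr.loess for the annotated data.frame. It swaps the split/lapply step for rowsum(), which sums rows per gene ID; dividing by the number of probesets per gene gives the means.

```r
## Toy stand-ins; all IDs and values are made up for illustration
gns <- data.frame(probe_id = c("p1_at", "p2_at", "p3_at"),
                  gene_id  = c("100", "100", "200"),
                  stringsAsFactors = FALSE)
expr.loess <- data.frame(ID     = c("p1_at", "p2_at", "p3_at", "p4_at"),
                         Symbol = c("GeneA", "GeneA", "GeneB", NA),
                         s1     = c(6, 8, 5, 9),
                         s2     = c(7, 9, 4, 3),
                         stringsAsFactors = FALSE)

## Inner join: p4_at has no Entrez Gene ID, so it is dropped here
alldat <- merge(gns, expr.loess, by.x = "probe_id", by.y = "ID")

expr.cols <- c("s1", "s2")
## Sum the expression rows per gene_id, then divide by probesets per gene
sums   <- rowsum(as.matrix(alldat[, expr.cols]), group = alldat$gene_id)
counts <- as.vector(table(alldat$gene_id)[rownames(sums)])
gene.means <- sums / counts   # each row i is divided by counts[i]

## Take the symbol from the first probeset of each gene
symbols <- alldat$Symbol[match(rownames(gene.means), alldat$gene_id)]
combined.data <- data.frame(gene_id = rownames(gene.means), Symbol = symbols,
                            gene.means, stringsAsFactors = FALSE)
print(combined.data)
```

If you prefer to stay within limma, its avereps() function does a similar per-ID average on a matrix.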
>
> Best,
>
> Jim
>
>
>>
>>   Thanks in advance for your help.
>>
>>   Best wishes
>>   Lesley
>>
>>   _______________________________________________
>>   Bioconductor mailing list
>>   Bioconductor@...
>>   https://stat.ethz.ch/mailman/listinfo/bioconductor
>>   Search the archives:
>>   http://news.gmane.org/gmane.science.biology.informatics.conductor
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099



