3 Oct 2012 20:29
Re: Assigning gene symbols to Affymetrix data and averaging probes
Hi Jim

Thanks, the reannotation worked a treat. I've been able to export the normalized data in annotated format.

I am averse to removing probes that have no Entrez ID associated with them, as I want to put the whole set of data through limma. I can't use the annotated expr.loess in lmFit, but is there a way I can get the symbol information into the output of lmFit (for instance, as fit$symbol)?

Best wishes
Lesley

________________________________________
From: James W. MacDonald [jmacdon@...]
Sent: 03 October 2012 16:30
To: Hoyles, Lesley
Cc: bioconductor@...
Subject: Re: [BioC] Assigning gene symbols to Affymetrix data and averaging probes

Hi Lesley,

On 10/3/2012 10:55 AM, Hoyles, Lesley wrote:
> Hi
>
> I have processed my affy data and am able to annotate the object
> mice.loess using the following.
>
>     ID <- featureNames(mice.loess)
>     Symbol <- getSYMBOL(ID,'mouse4302.db')
>     fData(mice.loess) <- data.frame(ID=ID,Symbol=Symbol)
>
> However, when I convert my object as follows - expr.loess <-
> exprs(mice.loess) - I lose the annotation and have been unable to
> find a way to annotate expr.loess. Please could anybody suggest how I
> can annotate expr.loess?

    expr.loess <- data.frame(ID = ID, Symbol = Symbol, exprs(mice.loess))

> Is there a way of averaging probes for each gene with Affymetrix
> data? I've been able to do this with single-channel Agilent data
> using the example given in the limma guide.

There are probably two reasonable ways to do this. First, the easiest.

    dat <- ReadAffy(cdfname = "mouse4302mmentrezcdf")

and proceed from there. This will use the MBNI re-mapped CDF package based on Entrez Gene IDs, and you will have a single value per gene after summarization. There are other ways to map the probes; see

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp

at the bottom of the page for more info.

Alternatively, if you want to stick with the original probesets, the problem arises that some probesets are not well annotated, so what to do with those? In addition, gene symbols are not guaranteed to be unique, so you can't just assume that they are. Entrez Gene and UniGene IDs are supposed to be unique, so you could go with them, doing something like (untested)

    gns <- toTable(mouse4302ENTREZID)
    alldat <- merge(gns, expr.loess, by = 1) ## where expr.loess is the data.frame I suggest above
    alldatlst <- tapply(1:nrow(alldat), alldat$gene_id, function(x) alldat[x,])
    combined.data <- do.call("rbind", lapply(alldatlst, function(x) c(x[1,1:3], colMeans(x[,-c(1:3)]))))

Here I am assuming that after the merge() step the first three columns are the probeset ID, gene_id, symbol, and the remaining columns are the expression values. You will lose all data for which there isn't an Entrez Gene ID, but the same is true of the MBNI method I outline above.

Best,

Jim

> Thanks in advance for your help.
>
> Best wishes Lesley
> _______________________________________________
> Bioconductor mailing list Bioconductor@...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
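
The merge-and-average idea in Jim's reply can be tried out on toy data without any annotation packages. What follows is only a minimal sketch under stated assumptions: the probeset IDs, gene IDs, symbols and expression values are made up, `gns` is a hand-built stand-in shaped like the output of toTable(mouse4302ENTREZID), and base R's aggregate() is swapped in for the tapply()/do.call() step, so this is not Jim's exact code.

```r
# Toy stand-in (hypothetical values) for the annotated expr.loess data.frame:
# probeset ID, gene symbol, then one column per array.
expr.loess <- data.frame(
  ID     = c("p1_at", "p2_at", "p3_at", "p4_at"),
  Symbol = c("GeneA", "GeneA", "GeneB", NA),
  s1     = c(1, 3, 5, 7),
  s2     = c(2, 4, 6, 8),
  stringsAsFactors = FALSE
)

# Toy probeset-to-Entrez mapping, shaped like toTable(mouse4302ENTREZID).
# p4_at is deliberately missing: it has no Entrez Gene ID.
gns <- data.frame(
  probe_id = c("p1_at", "p2_at", "p3_at"),
  gene_id  = c("101", "101", "202"),
  stringsAsFactors = FALSE
)

# Inner join on probeset ID; probesets without an Entrez Gene ID drop out.
alldat <- merge(gns, expr.loess, by.x = "probe_id", by.y = "ID")

# Average the expression columns within each Entrez Gene ID.
expr.cols <- setdiff(names(alldat), c("probe_id", "gene_id", "Symbol"))
combined.data <- aggregate(alldat[expr.cols],
                           by = list(gene_id = alldat$gene_id),
                           FUN = mean)

# Re-attach one symbol per gene (the first one seen for that gene_id).
first.symbol <- alldat$Symbol[match(combined.data$gene_id, alldat$gene_id)]
combined.data <- data.frame(gene_id = combined.data$gene_id,
                            Symbol  = first.symbol,
                            combined.data[expr.cols],
                            stringsAsFactors = FALSE)

print(combined.data)
```

As in the reply, any probeset without an Entrez Gene ID (p4_at here) is lost at the merge() step. For a plain expression matrix, limma's avereps() does a similar per-ID row averaging and may be the more convenient route if limma is already loaded.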