8 Mar 2013 04:04
Re: Analysis of Affymetrix Mouse Gene 2.0 ST arrays
Hi Kamila et al.,

FYI: I'll ensure that the pd packages for the 2.0 versions of the chip are available for the next BioC release.

benilton

2013/3/7 Naxerova, Kamila <naxerova@...>:
> Haha, I don't mind rubbing against that belly. It's like playing a game... trying to figure out how to get to the 40th level to face the "final enemy".
>
> Your strategy worked, thank you! I am including all code up to RMA normalization (yay! I am there!) below; perhaps it will save somebody a few hours of work.
>
> library(pdInfoBuilder)
>
> baseDir <- "/Users/naxerova/Documents/xxx"
> ## anchored regular expressions, so each pattern matches only the intended file
> (pgf <- list.files(baseDir, pattern = "\\.pgf$", full.names = TRUE))
> (clf <- list.files(baseDir, pattern = "\\.clf$", full.names = TRUE))
> (prob <- list.files(baseDir, pattern = "\\.probeset\\.csv$", full.names = TRUE))
> mps <- list.files(baseDir, pattern = "mps$", full.names = TRUE)
> trans <- list.files(baseDir, pattern = "\\.transcript\\.csv$", full.names = TRUE)
>
> seed <- new("AffyGenePDInfoPkgSeed",
>             pgfFile = pgf, clfFile = clf, coreMps = mps, transFile = trans,
>             probeFile = prob, author = "Kamila Naxerova",
>             email = "naxerova@...",
>             biocViews = "AnnotationData",
>             organism = "Mouse", species = "Mus musculus")
> makePdInfoPackage(seed, destDir = ".")
>
> ## This is what the beginning of your output should look like
> Building annotation package for Affymetrix Gene ST Array
> PGF.........: MoGene-2_0-st.pgf
> CLF.........: MoGene-2_0-st.clf
> Probeset....: MoGene-2_0-st-v1.na33.mm10.probeset.csv
> Transcript..: MoGene-2_0-st-v1.na33.mm10.transcript.csv
> Core MPS....: MoGene-2_0-st.mps
>
> install.packages("/Users/naxerova/pd.mogene.2.0.st/", repos = NULL, type = "source")
> library(oligo)  ## provides read.celfiles() and rma()
>
> > abatch <- read.celfiles(list.celfiles())
> Loading required package: pd.mogene.2.0.st
> Platform design info loaded.
> Reading in : xxx.CEL
> [etc.]
> > eset <- rma(abatch)
> Background correcting
> Normalizing
> Calculating Expression
>
> On Mar 7, 2013, at 11:03 AM, James W. MacDonald <jmacdon@...> wrote:
>
>> Wow. This is really an education on the vast unwashed underbelly of BioC, no?
>>
>> There is a file called MoGene-2_0-st.mps that came in the zip file you downloaded. Add
>>
>> mps <- list.files(baseDir, pattern = "mps$", full.names = TRUE)
>>
>> and then
>>
>> coreMps = mps
>>
>> when you create your AffyGenePDInfoPkgSeed. This file is used to distinguish between the probeset and transcript probe mappings.
>>
>> Best,
>>
>> Jim
>>
>> On 3/7/2013 10:36 AM, Naxerova, Kamila wrote:
>>> Thanks Jim. Of course the annotation package does not contain probe --> probe set information. What was I thinking?!??
>>>
>>> What I had not realized was that I needed to build the pd.mogene.2.0.st package myself first, because it also does not exist on Bioconductor. So I just downloaded all the required files from Affy, but again I am stuck with an error message I don't understand... What is the coreMps file that gives me the error?
>>>
>>> > library(pdInfoBuilder)
>>> > baseDir <- "/Users/naxerova/Documents/xxx"
>>> > (pgf <- list.files(baseDir, pattern = ".pgf",
>>> + full.names = TRUE))
>>> [1] "/Users/naxerova/Documents/xxx/MoGene-2_0-st.pgf"
>>> > (clf <- list.files(baseDir, pattern = ".clf",
>>> + full.names = TRUE))
>>> [1] "/Users/naxerova/Documents/xxx/MoGene-2_0-st.clf"
>>> > (prob <- list.files(baseDir, pattern = ".probeset.csv",
>>> + full.names = TRUE))
>>> [1] "/Users/naxerova/Documents/xxx/MoGene-2_0-st-v1.na33.mm10.probeset.csv"
>>> > seed <- new("AffyGenePDInfoPkgSeed",
>>> + pgfFile = pgf, clfFile = clf,
>>> + probeFile = prob, author = "Kamila Naxerova",
>>> + email = "naxerova@...",
>>> + biocViews = "AnnotationData",
>>> + organism = "Mouse", species = "Mus Musculus")
>>> > makePdInfoPackage(seed, destDir = ".")
>>> ============================================================
>>> Building annotation package for Affymetrix Gene ST Array
>>> PGF.........: MoGene-2_0-st.pgf
>>> CLF.........: MoGene-2_0-st.clf
>>> Probeset....: MoGene-2_0-st-v1.na33.mm10.probeset.csv
>>> Transcript..: TheTranscriptFile
>>> Core MPS....: coreMps
>>> ============================================================
>>> Parsing file: MoGene-2_0-st.pgf... OK
>>> Parsing file: MoGene-2_0-st.clf... OK
>>> Creating initial table for probes... OK
>>> Creating dictionaries... OK
>>> Parsing file: MoGene-2_0-st-v1.na33.mm10.probeset.csv... OK
>>> Parsing file: coreMps... Error in file(file, "rt") : cannot open the connection
>>> In addition: Warning message:
>>> In file(file, "rt") : cannot open file 'coreMps': No such file or directory
>>>
>>> On Mar 7, 2013, at 10:06 AM, "James W. MacDonald" <jmacdon@...> wrote:
>>>
>>>> Hi Kamila,
>>>>
>>>> On 3/7/2013 9:54 AM, Naxerova, Kamila wrote:
>>>>> Dear all,
>>>>>
>>>>> I am afraid I have to ask for help with the Mouse Gene 2.0 ST annotation package one more time. It looked like I created it successfully, but when I try to use it to read in CEL files with the oligo package, I get a cryptic error message. Any suggestions would be much appreciated!
>>>> You don't use the annotation package at this step. There are two packages that are used for the analysis of this chip type. The first is the pd.mogene.2.0.st.v1 package, which is used by oligo to map probes to probesets when doing the normalization/summarization step. This package will be automagically installed if you don't have it, so there is nothing to be done at the first step but
>>>>
>>>> abatch <- read.celfiles(list.celfiles())
>>>> eset <- rma(abatch)
>>>>
>>>> This will give you the summarized and normalized data at the transcript level.
>>>> You will then normally fit some model(s) using the modeling package of your choice, and then might want to output a set of significant genes, at which time you will use the mogene20sttranscriptcluster.db package to map probeset IDs to gene information.
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>> > abatch <- read.celfiles(list.celfiles(), pkgname = "mogene20sttranscriptcluster.db")
>>>>> Platform design info loaded.
>>>>> Reading in : xxx.CEL
>>>>> Reading in : xxx.CEL
>>>>> Reading in : xxx.CEL
>>>>> [... more cel files listed]
>>>>>
>>>>> Error in function (classes, fdef, mtable) :
>>>>> unable to find an inherited method for function ‘kind’ for signature ‘"ChipDb"’
>>>>>
>>>>> Thanks
>>>>> Kamila
>>>>>
>>>>> On Mar 6, 2013, at 6:16 PM, "Naxerova, Kamila" <naxerova@...du> wrote:
>>>>>
>>>>>> Dear Christian and Jim,
>>>>>>
>>>>>> Many thanks to both of you for your explanations.
>>>>>>
>>>>>> Your hard work paid off, and I have finally understood everything and managed to build my annotation package!!!! I wrote a little script similar to what Jim was suggesting, namely picking the first RefSeq-like thing I came across. Jim called it "naive", but I think there is no downside to this approach, right? I have looked at various examples in the Affy file for a long time, and simply picking the first RefSeq ID seems to be kosher.
>>>>>>
>>>>>> data <- read.csv("MoGene-transcript-noheader.csv", header = TRUE, stringsAsFactors = FALSE)
>>>>>> ## keep column 1 (transcript cluster ID) and column 9 (the "///"-separated assignment string)
>>>>>> sdata <- data[, c(1, 9)]
>>>>>>
>>>>>> ## return the accession of the first "///"-separated entry that mentions RefSeq (NA if none)
>>>>>> returnRef <- function(x) {
>>>>>>   entries <- strsplit(x, split = "///", fixed = TRUE)[[1]]
>>>>>>   refst <- entries[grep("RefSeq", entries, fixed = TRUE)[1]]
>>>>>>   refid <- gsub(" ", "", strsplit(refst, split = "//", fixed = TRUE)[[1]][1])
>>>>>>   return(refid)
>>>>>> }
>>>>>>
>>>>>> sdata$refseqids <- sapply(sdata[, 2], returnRef)
>>>>>> fdata <- sdata[, -2]
>>>>>> write.table(fdata, "AnnotBuild.txt", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
>>>>>>
>>>>>> library(AnnotationForge)
>>>>>> library(mouse.db0)
>>>>>> library(org.Mm.eg.db)
>>>>>> makeDBPackage("MOUSECHIP_DB",
>>>>>>               affy = FALSE,
>>>>>>               prefix = "mogene20sttranscriptcluster",
>>>>>>               fileName = "AnnotBuild.txt",
>>>>>>               outputDir = ".",
>>>>>>               version = "2.11.1",
>>>>>>               baseMapType = "refseq",
>>>>>>               manufacturer = "Affymetrix",
>>>>>>               chipName = "Mouse Gene 2.0 ST Array",
>>>>>>               manufacturerUrl = "http://www.affymetrix.com",
>>>>>>               author = "Kamila Naxerova",
>>>>>>               maintainer = "Kamila Naxerova <naxerova@...>")
>>>>>>
>>>>>> > install.packages("mogene20sttranscriptcluster.db", repos = NULL, type = "source")
>>>>>> * installing *source* package ‘mogene20sttranscriptcluster.db’ ...
>>>>>> ** R
>>>>>> ** inst
>>>>>> ** preparing package for lazy loading
>>>>>> ** help
>>>>>> *** installing help indices
>>>>>> ** building package indices
>>>>>> ** testing if installed package can be loaded
>>>>>> *** arch - i386
>>>>>> *** arch - x86_64
>>>>>>
>>>>>> * DONE (mogene20sttranscriptcluster.db)
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor@...
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> University of Washington
>>>> Environmental and Occupational Health Sciences
>>>> 4225 Roosevelt Way NE, # 100
>>>> Seattle WA 98105-6099
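The `rma()` call in the thread prints "Background correcting", "Normalizing" and "Calculating Expression". As a rough, base-R illustration of the last two steps only, the sketch below quantile-normalizes a made-up probe-by-array intensity matrix, takes log2, and summarizes each probeset by median polish, reporting the fitted overall effect plus the array effect. The intensity matrix and the `probeset` vector (a hypothetical stand-in for the probe-to-probeset mapping that the pd package supplies) are invented; background correction is skipped and ties are broken with `ties.method = "first"`. It is a sketch under those assumptions, not a reimplementation of `oligo::rma()`.

```r
# Simplified RMA-style normalization and summarization (illustrative only;
# no background correction, hypothetical data).

# Quantile normalization: give every array (column) the same distribution,
# namely the mean of the sorted columns.
quantile_normalize <- function(m) {
  ranks <- apply(m, 2, rank, ties.method = "first")  # rank of each probe within its array
  sorted_means <- rowMeans(apply(m, 2, sort))         # mean intensity at each rank
  out <- apply(ranks, 2, function(r) sorted_means[r])
  dimnames(out) <- dimnames(m)
  out
}

# Median polish: fit log2 intensity = overall + probe effect + array effect,
# and report overall + array effect as the probeset value on each array.
summarize_probeset <- function(x) {
  fit <- stats::medpolish(x, maxiter = 100L, trace.iter = FALSE)
  fit$overall + fit$col
}

# Hypothetical data: 8 probes on 3 arrays; probes 1-4 form "ps1", probes 5-8 form "ps2".
set.seed(1)
intensities <- matrix(2^runif(24, min = 6, max = 12), nrow = 8, ncol = 3,
                      dimnames = list(paste0("probe", 1:8), paste0("array", 1:3)))
probeset <- rep(c("ps1", "ps2"), each = 4)

qn <- quantile_normalize(intensities)
log_qn <- log2(qn)
expr <- t(sapply(split(seq_len(nrow(log_qn)), probeset),
                 function(idx) summarize_probeset(log_qn[idx, , drop = FALSE])))
print(round(expr, 2))  # probesets x arrays, log2 scale
```

Because each probe gets its own row effect in the additive fit, a probe that is uniformly brighter on the log scale shifts only that row effect and leaves the per-array estimates unchanged, which is the usual argument for median polish at this step.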
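Kamila's `returnRef()` helper (March 6 message) keeps, for each transcript cluster, the accession from the first `///`-separated entry that mentions RefSeq, and it is applied one string at a time through `sapply()`. A vectorized sketch of the same rule, assuming the `accession // source // description` layout her code implies, is below. The function name `first_refseq()` and the example strings are invented for illustration and are not taken from the Affymetrix annotation file; `trimws()` replaces `gsub(" ", "", ...)`, which is sufficient for accessions without internal spaces.

```r
# Return the first RefSeq accession in each "///"-separated assignment string,
# or NA when no entry mentions RefSeq (e.g. the "---" placeholder).
first_refseq <- function(assignments) {
  entries <- strsplit(assignments, split = "///", fixed = TRUE)
  vapply(entries, function(e) {
    hit <- grep("RefSeq", e, fixed = TRUE)[1]  # index of the first RefSeq entry
    if (is.na(hit)) return(NA_character_)
    trimws(strsplit(e[hit], split = "//", fixed = TRUE)[[1]][1])  # accession field
  }, character(1))
}

# Invented example strings in the assumed "accession // source // description" layout.
x <- c(paste("ENSMUST_EXAMPLE // ENSEMBL // made-up transcript",
             "NM_EXAMPLE1 // RefSeq // made-up mRNA",
             "NM_EXAMPLE2 // RefSeq // second match", sep = " /// "),
       "---")
print(first_refseq(x))
```

In her script, `sdata$refseqids <- first_refseq(sdata[, 2])` could take the place of the `sapply()` call.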