Jenny Drnevich | 2 Sep 16:09
Favicon

Re: question regarding removing probes/probe sets before normalization

Hi Jack,

I found I had an even more updated version from Jan 2009. The code is 
below. Try using this, and let me know if you still have the problem. 
Note that I changed the arguments to RemoveProbes(); there is only 
listOutProbes, listOutProbesSets and cleancdf. cleancdf is the name 
array, without "cdf" or "probe" at the end. For example, if you're 
using the mouse 430 2.0 array, you'd need cleancdf = "mouse4302". If 
you're still having problems, please send the entire output of your 
code and sessionInfo().

Jenny

### The first part is just creating two ojects (ResetEnvir and 
RemoveProbes) originally
### written by Ariel Chernomoretz and modified by Jenny Drnevich to 
remove individual
### probes and/or entire probesets. Just highlight everything from here until
### you see STOP and paste it to R all at once

### Updated Jan 2009

ResetEnvir<-function(cleancdf){
  cdfpackagename   <- paste(cleancdf,"cdf",sep="")
  probepackagename <- paste(cleancdf,"probe",sep="")
  ll<-search()
  cdfpackagepos <- grep(cdfpackagename,ll)
  if(length(cdfpackagepos)>0) detach(pos=cdfpackagepos)
  ll<-search()
  probepackagepos <- grep(probepackagename,ll)
  if(length(probepackagepos)>0) detach(pos=probepackagepos)
  require(cdfpackagename,character.only=T)
  require(probepackagename,character.only=T)
  require(affy)
}

RemoveProbes<-function(listOutProbes=NULL,
                        listOutProbeSets=NULL,

cleancdf,destructive=TRUE){

  #default probe dataset values
  cdfpackagename   <- paste(cleancdf,"cdf",sep="")
  probepackagename <- paste(cleancdf,"probe",sep="")
  require(cdfpackagename,character.only = TRUE)
  require(probepackagename,character.only = TRUE)
  probe.env.orig <- get(probepackagename)

  if(!is.null(listOutProbes)){
   # taking probes out from CDF env
   probes<- unlist(lapply(listOutProbes,function(x){
                            a<-strsplit(x,"at")
                            aux1<-paste(a[[1]][1],"at",sep="")
                            aux2<-as.integer(a[[1]][2])
                            c(aux1,aux2)
                           }))
   n1<-as.character(probes[seq(1,(length(probes)/2))*2-1])
   n2<-as.integer(probes[seq(1,(length(probes)/2))*2])
   probes<-data.frame(I(n1),n2)
   probes[,1]<-as.character(probes[,1])
   probes[,2]<-as.integer(probes[,2])
   pset<-unique(probes[,1])
   if (!is.null(listOutProbeSets))
       pset <- setdiff(pset,listOutProbeSets)
   for(i in seq(along=pset)){
    ii  <-grep(pset[i],probes[,1])
    iout<-probes[ii,2]
    a<-get(pset[i],env=get(cdfpackagename))
    a<-a[-iout,]
    if(!is.matrix(a)) a <- matrix(a,ncol=2,byrow=T)
    if(nrow(a)==0)
     stop("Probe set ",pset[i]," has zero probes left. Remove this 
entire probe set
     in the listOutProbeSets argument")
    assign(pset[i],a,env=get(cdfpackagename))
   }
  }

  # taking probesets out from CDF env
  if(!is.null(listOutProbeSets)){
   rm(list=listOutProbeSets,envir=get(cdfpackagename))
  }

  # setting the PROBE env accordingly (idea from gcrma compute.affinities.R)
  tmp <- get("xy2indices",paste("package:",cdfpackagename,sep=""))
  newAB   <- new("AffyBatch",cdfName=cleancdf)
  pmIndex <-  unlist(indexProbes(newAB,"pm"))
  subIndex<- 
match(tmp(probe.env.orig$x,probe.env.orig$y,cdf=cdfpackagename),pmIndex)
  rm(newAB)
  iNA     <- which(is.na(subIndex))

  if(length(iNA)>0){
   ipos<-grep(probepackagename,search())
   assign(probepackagename,probe.env.orig[-iNA,],pos=ipos)
  }
}

At 08:41 AM 9/2/2010, Jack Luo wrote:
>Jenny,
>
>Thanks for your email. The reason you provided makes perfect sense. 
>However, I've looked into this and found that it may not be the root 
>cause of the error. I tried just masking 25 probes and 1 probe set 
>(U133A array) as below:
> >maskedprobes
>  [1] 
> "222152_at1"    "222152_at2"    "222152_at3"    "222152_at4" 
> "222152_at5"    "222152_at6"
>  [7] 
> "222152_at7"    "222152_at8"    "222152_at9"    "222152_at10" 
> "222152_at11"   "204659_s_at1"
>[13] 
>"204659_s_at2"  "204659_s_at3"  "204659_s_at4"  "204659_s_at5" 
>"204659_s_at6"  "204659_s_at7"
>[19] "204659_s_at8"  "204659_s_at9"  "204659_s_at10" "204659_s_at11" 
>"210893_at1"    "210893_at2"
>[25] "210893_at3"
>
> > maskedprobeSets
>[1] "222152_at"
>
>Presumably it should run into error since all the probes within the 
>probe set have been taken out.
>
>However, the code still runs through and the dimension works out correctly.
>A <- ReadAffy()
>B <- justMAS(A,tgt = 500,scale = TRUE)
> > dim(pm(A))
>[1] 247940     54     # the dimension without masking is 247965, 
>which is equal to 247940 + 25
> > dim(B)
>Features  Samples
>    22281       54      # the dimension without masking is 22283, 
> which is equal to 22281 + 2 (the codes knows that all the probes in 
> 204659_s_at have been taken out, so the total # of masked probe sets is 2).
>
>In addition, for one of the dataset which the code does not run into 
>problem, I found that there are ~ 4000 probe sets having this kind 
>of problem: all the probes within the probe sets have been taken 
>out. Yet it still runs through and the dimension works out correctly.
>
>Sorry for bothering you so much on this, I really appreciate it.
>
>Best,
>-Jack
>
>
>
>
>
>
>
>
>
>On Wed, Sep 1, 2010 at 4:57 PM, Jenny Drnevich

><<mailto:drnevich@...>drnevich@...> wrote:
>Hi Jack,
>
>There shouldn't be a limit on the number of probes/probesets that 
>you can remove. However, based on the error message, my guess is 
>that there's some redundancy between your list of probe sets to 
>remove and probes to remove. Probably you have all the probes of a 
>probe set in your probe list, and that probe set in your probe set 
>list as well. The code first removes all the probes, so when it goes 
>to remove that probe set, it won't find anything and could throw an 
>error. I don't know for sure that this is your problem, but I've 
>seen it before.
>
>Jenny
>
>
>At 02:54 PM 9/1/2010, Jack Luo wrote:
>>Hi Jenny,
>>
>>I run the code through 3 different datasets (each consists of 50-60 
>>U133a cel files), the first 2 datasets run pretty smoothly, the 
>>last one has the error msg:
>>Error in ans[[i]][, i.probes] : incorrect number of dimensions
>>
>>Not sure what's causing the problem, but the last one has the 
>>largest number of masked probes/probesets: ~179k masked probes (out 
>>of 247965 in total) and ~15k masked probe sets (out of 22283 in 
>>total). Does the code have a limit on the number of masked 
>>probe/probesets? or there are certain probes/probe sets that can 
>>not be filtered?
>>
>>Thanks a lot,
>>
>>-Jack
>>
>>
>>On Tue, Aug 31, 2010 at 5:57 PM, Jenny Drnevich

>><<mailto:drnevich@...>drnevich@...> wrote:
>>Hi Jack,
>>It's best to also post your question to the BioC list. First of 
>>all, you link was to the September 2006 code, but there was an 
>>update to the code in September 2008 (I think it still should
work):
>>
>><https://stat.ethz.ch/pipermail/bioconductor/2008-September/024296.html>https://stat.ethz.ch/pipermail/bioconductor/2008-September/024296.html 
>>
>>
>>Did you see this more recent post on the BioC list about removing 
>>individual probes? The individual probe names have to be in the 
>>correct format, and your's
aren't:
>><https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.html>https://stat.ethz.ch/pipermail/bioconductor/2010-January/031463.html 
>>
>>
>>HTH,
>>Jenny
>>
>>At 11:46 AM 8/31/2010, you wrote:
>>>Hi Jenny,
>>>I am a BioC user and find the following
link:
>>><https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html>https://stat.ethz.ch/pipermail/bioconductor/2006-September/014242.html 
>>>
>>>which is related to removing probes/probe sets from the cdf. It's 
>>>highly related to my work, which needs to perform normalization 
>>>after filtering out some probe pairs whose PM is not significantly 
>>>higher than MM.
>>>I played around with the code in the above link by following the 
>>>instruction, it works perfectly when some given probe sets need to 
>>>be taken out, but have some problem when some probes (not whole 
>>>probe sets) need to be taken out. I am wondering do you have an 
>>>example for taking some probes as well as probe sets? I have not 
>>>figured out what's the correct input for probes.
>>>I tried the following inputs and neither of them works.
>>>maskedprobeSets = rownames(exprs(justRMA(filenames = 
>>>celfiles)))[1:100] # this format works
>>>maskedprobes = rownames(pm(AffyBatch))[1:500] # doesn't work
>>>maskedprobes = as.character(1:500) # doesn't work
>>>RemoveProbes(listOutProbes=maskedprobes, 
>>>listOutProbeSets=maskedprobeSets, cleancdf)
>>>
>>>Thanks a lot! Your help will be highly appreciated.
>>>-Jack

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


Gmane