15 May 20:47
Re: GSEABase how to map gene symbols to mouse EntrezId or Affy
From: Martin Morgan <mtmorgan@...>
Subject: Re: GSEABase how to map gene symbols to mouse EntrezId or Affy
Newsgroups: gmane.science.biology.informatics.conductor
Date: 2008-05-15 18:47:40 GMT
"Vladimir Morozov" <vmorozov@...> writes:

> Martin,
>
> You are right that the disagreement between human and mouse symbols is the
> problem. But you should still get some mapping if you translate symbols into
> capwords:
>> sum(!is.na(mget(gss[[1]]@geneIds,org.Mm.egSYMBOL2EG,ifnotfound=NA)))
> [1] 0

Always use accessors, geneIds(gss[[1]]), ...

> sum(!is.na(mget(capwords(tolower(gss[[1]]@geneIds)),org.Mm.egSYMBOL2EG,ifnotfound=NA)))
> [1] 46

Be nice to your helpers with complete examples; I guess capwords is

> capwords <- function(x) sub("^([a-z])", "\\U\\1", x, perl=TRUE)

then

> cids <- capwords(tolower(geneIds(gss[[1]])))
> egids <- mget(cids, org.Mm.egSYMBOL2EG, ifnotfound=NA)
> egids <- egids[!is.na(egids)]

> Let's say I will figure out some mapping using ortholog or alias names.
> Will I screw up the GeneSet data structure by
> gss2 <- lapply(gss,function(x){x@geneIds <-
> my.mapping(x@geneIds);x@geneIdType@type <- 'EntrezIdentifier'})

More on this below... mapIdentifiers provides a convenient side door in the form of

> showMethods('mapIdentifiers', class='environment')
Function: mapIdentifiers (package GSEABase)
what="GeneColorSet", to="GeneIdentifierType", from="environment"
what="GeneSet", to="GeneIdentifierType", from="environment"

which is to say that if you have a custom mapping you can represent it as an environment whose keys are the identifiers you're mapping from and whose values are the identifiers you're mapping to, e.g.,

> names(egids) <- toupper(names(egids))
> env <- l2e(egids)
> mapIdentifiers(gss[[1]], EntrezIdentifier(), env)

Probably you want to inject information about the identifiers you are mapping to, e.g., that they are mouse, by using EntrezIdentifier('org.Mm.eg.db') as the second argument.

There doesn't seem to be a method defined for gene set collections (an oversight), but you can

> GeneSetCollection(lapply(gss, mapIdentifiers, EntrezIdentifier(), env))

Back to...
> gss2 <- lapply(gss,function(x){x@geneIds <-
> my.mapping(x@geneIds);x@geneIdType@type <- 'EntrezIdentifier'})

There are a bunch of ways through this, but I would avoid using direct slot access. One possibility would be

> my.mapping <- force
> gss2 <- GeneSetCollection(lapply(gss, function(x) {
>     GeneSet(EntrezIdentifier('org.Mm.eg.db'),
>             geneIds=my.mapping(geneIds(x)),
>             setName=setName(x))
> }))

Martin

> ?
>
> Vladimir Morozov
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan@...]
> Sent: Thursday, May 15, 2008 12:56 PM
> To: Vladimir Morozov
> Cc: bioconductor@...
> Subject: Re: [BioC] GSEABase how to map gene symbols to mouse EntrezId
> or Affy
>
> Hi Vladimir --
>
> "Vladimir Morozov" <vmorozov@...> writes:
>
>> Hi
>>
>> Any suggestions on how to map gene symbols to mouse EntrezId (preferred)
>> or Affy?
>> Mapping to Entrez apparently is not supported by GSEABase:
>>> mapIdentifiers(gss,EntrezIdentifier())
>> Error in .mapIdentifiers_isMappable(from, to) :
>> unable to map from 'Symbol' to 'EntrezId'
>> neither GeneIdentifierType has annotation
>
> mapIdentifiers needs to know where to look for the map. I guess the way
> you created gss means that it doesn't know about the organism you're
> using, and EntrezIdentifier() also doesn't. What you want is
>
>> mapIdentifiers(gss, EntrezIdentifier("org.Mm.eg.db"))
> GeneSetCollection
>   names: chr5q23, chr16q24 (2 total)
>   unique identifiers: (0 total)
>   types in collection:
>     geneIdType: EntrezIdentifier (1 total)
>     collectionType: BroadCollection (1 total)
>
> Here I'm using (and I guess you are too) the gss that comes from
> example(getBroadSets). These are human genes, and have no corresponding
> mouse equivalents (see below)...
>
>> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ...,
>> verbose = verbose)) :
>> error in evaluating the argument 'object' in selecting a method for
>> function 'GeneSetCollection'
>>
>> Mapping to Affys works for human, but not for mouse:
>>> mapIdentifiers(gss, AnnotationIdentifier("hgu95av2.db"))
>> GeneSetCollection
>>   names: chr5q23, chr16q24 (2 total)
>>   unique identifiers: 35089_at, 35090_g_at, ..., 35807_at (79 total)
>>   types in collection:
>>     geneIdType: AnnotationIdentifier (1 total)
>>     collectionType: BroadCollection (1 total)
>>> mapIdentifiers(gss, AnnotationIdentifier("mouse4302.db"))
>> GeneSetCollection
>>   names: chr5q23, chr16q24 (2 total)
>>   unique identifiers: (0 total)
>>   types in collection:
>>     geneIdType: AnnotationIdentifier (1 total)
>>     collectionType: BroadCollection (1 total)
>
> This is because the identifiers are not in mouse
>
>> ids <- unique(unlist(geneIds(gss)))
>> egs <- mget(ids, revmap(mouse4302ENTREZID), ifnotfound=NA)
>> sum(!sapply(egs, is.na))
> [1] 0
>
>> Thanks
>>
>> Vladimir Morozov
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor@...
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
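A minimal base-R sketch of the capwords-plus-environment approach discussed in the thread. It runs without Bioconductor and rests on several assumptions. The gene set holds human-style upper-case symbols. The mouse annotation is keyed by symbols with only the first letter capitalised. `mouse_sym2eg` is a hypothetical stand-in for `org.Mm.egSYMBOL2EG`, and its IDs are placeholders, not real Entrez identifiers. Base R's `list2env()` is used here in place of Biobase's `l2e()`. The key step is the `toupper()` re-keying: the environment given to `mapIdentifiers` maps from the identifiers already in the set, so its keys have to match those identifiers exactly.

```r
# capwords as given in the thread: upper-case the first letter only
capwords <- function(x) sub("^([a-z])", "\\U\\1", x, perl = TRUE)

# Identifiers as they would appear in a human (Broad) gene set (hypothetical)
set_ids <- c("GENEA", "GENEB", "GENEC")

# Hypothetical stand-in for org.Mm.egSYMBOL2EG: mouse symbol -> Entrez ID.
# The IDs are placeholders, not real Entrez identifiers.
mouse_sym2eg <- list2env(list(Genea = "100001", Geneb = "100002"))

# Convert to mouse-style capitalisation and look up; NA when absent
cids  <- capwords(tolower(set_ids))
egids <- mget(cids, envir = mouse_sym2eg, ifnotfound = NA)
egids <- egids[!is.na(egids)]

# Re-key by the original upper-case identifiers so the environment maps
# from what the gene set holds to Entrez IDs
names(egids) <- toupper(names(egids))
env <- list2env(egids)

# Each set identifier now resolves to an ID, or NA if no mapping was found
mapped <- mget(set_ids, envir = env, ifnotfound = NA)
```

In a real session `env` would then be passed on as Martin describes, e.g. `mapIdentifiers(gss[[1]], EntrezIdentifier('org.Mm.eg.db'), env)`.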
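One note on the quoted `gss2 <- lapply(...)` line. Apart from the direct slot access Martin advises against, its anonymous function ends with an assignment. An R function returns the value of its last expression, and an assignment evaluates to the assigned value. Each element of the result would therefore be the string 'EntrezIdentifier' rather than a modified set. The base-R illustration below uses plain lists as hypothetical stand-ins for GeneSet objects, and no GSEABase is involved.

```r
# Plain lists standing in for gene sets (hypothetical, for illustration only)
sets <- list(
  list(setName = "set1", geneIds = c("GENEA", "GENEB")),
  list(setName = "set2", geneIds = "GENEC")
)

# Identity placeholder for a real mapping, as in the thread
my.mapping <- force

# Pitfall: the last expression is an assignment, so its value
# ("EntrezIdentifier") is what each call returns, not the modified x
bad <- lapply(sets, function(x) {
  x$geneIds <- my.mapping(x$geneIds)
  x$type <- "EntrezIdentifier"
})

# Fix: build and return a fresh object (or end the body with `x`)
good <- lapply(sets, function(x) {
  list(setName = x$setName,
       geneIds = my.mapping(x$geneIds),
       type    = "EntrezIdentifier")
})
```

With GSEABase itself, the approach in Martin's reply avoids both problems. That approach constructs a new `GeneSet()` for each element through the accessors.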
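The check `sum(!sapply(egs, is.na))` assumes that each lookup returns a single value. A reverse map such as `revmap(mouse4302ENTREZID)` can return several probe sets for one Entrez ID. In that case `sapply` produces a list or a matrix, and the count either errors or is wrong. The base-R sketch below is a slightly more defensive version. `lookup` is a hypothetical environment standing in for the annotation map.

```r
# Hypothetical lookup table: one key maps to two values, one to a single value
lookup <- list2env(list(ID1 = c("p1", "p2"), ID2 = "p3"))

ids  <- c("ID1", "ID2", "ID3")
hits <- mget(ids, envir = lookup, ifnotfound = NA)

# TRUE when a key found at least one non-NA value; vapply guarantees
# exactly one logical per key, whatever the length of each hit
has_hit  <- vapply(hits, function(v) !all(is.na(v)), logical(1))
n_mapped <- sum(has_hit)
n_mapped  # 2
```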