25 Apr 2012 20:25
Re: Motif search -- access to JASPAR, MotIV package, more TF-PWM relationships?
Regarding S. cerevisiae, there is a great resource: the YEASTRACT database (http://www.yeastract.com/). It compiles the motifs identified so far, regulation documented by wet-lab research, and potential regulation predicted by comparing motifs against the promoter regions of genes. It is reviewed and updated regularly, with new motifs and newly documented regulation being added.

pj

On 25 April 2012 15:12, Steve Lianoglou <mailinglist.honeypot@...> wrote:
> Hi,
>
> To carry on the MEME stuff, a Biostar post just pointed me to an
> updated scoring metric in tomtom, which is available in the latest
> release of the MEME software suite:
>
> http://bioinformatics.oxfordjournals.org/content/27/12/1603.full
>
> Perhaps wrapping parts of the MEME suite into an R library would be
> useful, no?
>
> You might also find the FIRE (and FIRE-pro) suite of tools useful for
> motif discovery:
>
> http://physiology.med.cornell.edu/faculty/elemento/lab/software.shtml
>
> Related to that, S. Tavazoie gave a talk at the recent CSHL/sysbio
> meeting and presented TEISER, which seems pretty cool if you're
> looking for structural motifs:
>
> https://tavazoielab.c2b2.columbia.edu/TEISER/
>
> -steve
>
> On Wed, Apr 25, 2012 at 9:44 AM, Zhu, Lihua (Julie)
> <Julie.Zhu@...> wrote:
> > Paul,
> >
> > Thanks for the positive feedback on FlyFactorSurvey! The motifs in this
> > database are generated using the bacterial one-hybrid method (B1H and
> > B1H-seq). All the public motifs can be downloaded freely. It would be
> > useful to have a Bioc data package containing curated and current motifs
> > from all organisms, where available, that interfaces with MotIV.
> >
> > MEME works very well for finding motifs in B1H-seq data (Christensen et
> > al., Nucleic Acids Research 2011, Vol. 39, No. 12, e83), although only a
> > limited number of motif discovery tools were compared in the paper.
> > Currently, we are investigating whether motif discovery can be improved
> > with B1H-seq data.
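Steve's pointer to tomtom's updated scoring metric concerns how a discovered motif is scored against a known one. Tools such as Tomtom and STAMP compare two motifs column by column, and one column metric STAMP offers is the Pearson correlation coefficient (PCC). The base-R sketch below is a simplified illustration of that idea, not the algorithm of either tool: it assumes both motifs are position probability matrices with rows A, C, G, T, tries every ungapped offset on both strands, and reports the best mean column PCC with no significance estimate. The helpers revcomp_pwm, col_pcc and pcc_similarity, the zero score for constant columns, and the default minimum overlap of four columns are hypothetical choices.

```r
# Hypothetical, simplified sketch of ungapped motif-motif comparison using the
# mean column-wise Pearson correlation (PCC). Not Tomtom's or STAMP's algorithm.

# Reverse complement of a 4 x L matrix with rows ordered A, C, G, T:
# reverse the positions and swap A<->T, C<->G (i.e. reverse the row order).
revcomp_pwm <- function(m) {
  rc <- m[rev(seq_len(nrow(m))), rev(seq_len(ncol(m))), drop = FALSE]
  rownames(rc) <- rownames(m)
  rc
}

# PCC between two columns; a constant (zero-variance) column scores 0
# instead of NA (an assumption of this sketch).
col_pcc <- function(x, y) {
  r <- suppressWarnings(cor(x, y))
  if (is.na(r)) 0 else r
}

# Best mean column PCC over all ungapped offsets, on both strands of b.
pcc_similarity <- function(a, b, min_overlap = min(4L, ncol(a), ncol(b))) {
  la <- ncol(a)
  lb <- ncol(b)
  best <- -Inf
  for (cand in list(b, revcomp_pwm(b))) {
    # Column i of a is aligned with column (i - off) of cand.
    for (off in (-(lb - 1)):(la - 1)) {
      ia <- seq_len(la)
      ib <- ia - off
      keep <- ib >= 1 & ib <= lb
      if (sum(keep) < min_overlap) next
      cors <- mapply(function(i, j) col_pcc(a[, i], cand[, j]),
                     ia[keep], ib[keep])
      best <- max(best, mean(cors))
    }
  }
  best
}

# Example: a toy 5-column motif compared with itself and its reverse complement.
m <- matrix(c(0.90, 0.05, 0.03, 0.02,
              0.10, 0.10, 0.70, 0.10,
              0.05, 0.85, 0.05, 0.05,
              0.02, 0.03, 0.05, 0.90,
              0.70, 0.10, 0.10, 0.10),
            nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
pcc_similarity(m, m)               # 1 (identical motifs)
pcc_similarity(m, revcomp_pwm(m))  # 1 (match found on the other strand)
```

Averaging per-column correlations keeps the score between -1 and 1 whatever the overlap length; real comparison tools go further and turn such scores into p-values, which this sketch omits.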
> >
> > As I understand it, MEME is for de novo motif discovery; TOMTOM and
> > STAMP test whether a motif returned by a motif finder is significantly
> > similar to a known motif; and clover searches for known motifs in a
> > given set of sequences. We are thinking of adding clover to our website.
> >
> > I am looking forward to your collated survey results.
> >
> > Best regards,
> >
> > Julie
> >
> > On 4/24/12 11:02 PM, "Paul Shannon" <pshannon@...> wrote:
> >
> >> Hi Julie,
> >>
> >> FlyFactorSurvey looks great. Would that we had such a resource (curated,
> >> current, and growing) for all organisms!
> >>
> >> A few questions, if I may:
> >>
> >> 1) What role with respect to FlyFactorSurvey do you picture us taking
> >> here at BioC? How can we help?
> >>
> >> 2) Your website (http://pgfe.umassmed.edu/TFDBS) recommends MEME and
> >> TOMTOM for motif comparison. Do you use them yourself? If so, can you
> >> tell us about their strengths and weaknesses? How do they compare to
> >> clover? (http://zlab.bu.edu/clover/)
> >>
> >> In that same spirit -- trying to find out more about this topic -- here
> >> are some more questions:
> >>
> >> 3) The JASPAR database seems to be mostly unchanged since 2009
> >> (http://jaspar.genereg.net/html/DOWNLOAD). Does anyone know their
> >> update policy?
> >>
> >> 4) Is TRANSFAC only for license holders?
> >>
> >> 5) Are there any other organism-specific gems like FlyFactorSurvey to
> >> be discovered out on the web?
> >>
> >> Thanks!
> >>
> >> - Paul
> >>
> >> On Apr 24, 2012, at 3:16 PM, Zhu, Lihua (Julie) wrote:
> >>
> >>> Paul,
> >>>
> >>> Thanks so much for the comprehensive summary of Bioc's existing
> >>> capabilities and other resources for motif discovery and matching!
> >>>
> >>> Here is my response to your great initiative to collect use cases and
> >>> open data resources.
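Julie's description of clover above (searching for known motifs in a given set of sequences) starts from scoring sequence windows against a PWM. The base-R sketch below shows only that scoring step and is not clover's method: clover goes on to test whether motifs are statistically over-represented in the sequence set, which is not reproduced here. It assumes a probability matrix with rows A, C, G, T, converts it to base-2 log-odds against a uniform background with a pseudocount of 0.01, and scans the forward strand only. The function names to_log_odds and scan_sequence, the background, the pseudocount and the threshold are illustrative assumptions.

```r
# Hypothetical sketch: score every forward-strand window of a DNA sequence
# against a PWM. This is only the per-site scoring step, not clover.

# Probability matrix (rows A, C, G, T) -> base-2 log-odds vs. a background.
to_log_odds <- function(prob, bg = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25),
                        pseudo = 0.01) {
  p <- prob + pseudo
  p <- sweep(p, 2, colSums(p), "/")  # renormalise each position (column)
  log2(p / bg[rownames(p)])          # bg is recycled down each column
}

# Return start positions and scores of windows scoring >= threshold.
# Windows containing letters other than A, C, G, T (e.g. N) score NA and
# are skipped.
scan_sequence <- function(seq, lo, threshold = 0) {
  bases <- strsplit(toupper(seq), "")[[1]]
  w <- ncol(lo)
  n <- length(bases) - w + 1
  if (n < 1) return(data.frame(start = integer(0), score = numeric(0)))
  scores <- vapply(seq_len(n), function(s) {
    window <- bases[s:(s + w - 1)]
    # Pick entry [base, position] for each position in the window and sum.
    sum(lo[cbind(match(window, rownames(lo)), seq_len(w))])
  }, numeric(1))
  hits <- which(scores >= threshold)
  data.frame(start = hits, score = scores[hits])
}

# Example with a toy 5-column motif (consensus AGCTA).
m <- matrix(c(0.90, 0.05, 0.03, 0.02,
              0.10, 0.10, 0.70, 0.10,
              0.05, 0.85, 0.05, 0.05,
              0.02, 0.03, 0.05, 0.90,
              0.70, 0.10, 0.10, 0.10),
            nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))
scan_sequence("TTTTAGCTATTTT", to_log_odds(m), threshold = 3)  # one hit, at start 5
```

Scanning the reverse strand as well would mean repeating the scan with the reverse-complemented matrix.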
> >>>
> >>> Here is an open data source for Drosophila that we developed:
> >>> http://pgfe.umassmed.edu/TFDBS/
> >>> http://nar.oxfordjournals.org/content/early/2010/11/19/nar.gkq858.full
> >>>
> >>> As you pointed out, there are several excellent Bioconductor packages
> >>> for the two common motif problems, i.e., de novo motif discovery and
> >>> matching motifs to known motifs. It would be useful to have more motif
> >>> databases available for motif comparison programs such as MotIV. In
> >>> addition, we use clover to search for known motifs in a given set of
> >>> sequences.
> >>>
> >>> Many thanks for sharing your insights!
> >>>
> >>> Best regards,
> >>>
> >>> Julie
> >>>
> >>> On 4/24/12 3:02 PM, "Paul Shannon" <pshannon@...> wrote:
> >>>
> >>>> The recent flurry of interest in sequence motifs here on the Bioc list
> >>>> suggests to us that maybe we at Bioconductor could strengthen our
> >>>> infrastructure for this kind of work. If this work interests you --
> >>>> either as a package creator or as a package user -- please suggest
> >>>> ideas or use cases. What do you need? I will collect and collate the
> >>>> responses. We hope to identify places where Bioc can help out.
> >>>>
> >>>> For background: we already have a number of packages (rGADEM, MotIV,
> >>>> cosmo, BCRANK, motifRG) that address, with different strengths, what I
> >>>> believe to be the two aspects of the motif problem:
> >>>>
> >>>> 1) Detecting enriched motifs in DNA sequence, or in ChIP-seq data
> >>>>    (rGADEM, cosmo, motifRG, BCRANK)
> >>>> 2) Matching these enriched motifs to known motifs, to predict which
> >>>>    binding molecules (e.g., transcription factors) they belong to
> >>>>    (MotIV)
> >>>>
> >>>> In the past, a lot of sequence motif/binding work has addressed the
> >>>> search for transcription factor binding sites and their cognate
> >>>> transcription factors.
> >>>> miRNAs, phosphorylation and methylation all pose related problems. Is
> >>>> there support we can practically offer here as well?
> >>>>
> >>>> In addition to Bioc packages, there are of course many worthwhile
> >>>> websites and external tools: JASPAR, MEME, STAMP (and TRANSFAC, for
> >>>> those with a license). Nooshin mentioned the Arabidopsis-specific
> >>>> 'AthaMap' (http://www.athamap.de). Are there other open-source data
> >>>> repositories like this for other organisms? C. elegans, as Julie
> >>>> requested?
> >>>>
> >>>> Questions, suggestions, use cases and data sources are all welcome.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> - Paul
> >>>>
> >>>> On Apr 24, 2012, at 10:47 AM, Zhu, Lihua (Julie) wrote:
> >>>>
> >>>>> Eloi,
> >>>>>
> >>>>> I would like to use MotIV for a C. elegans dataset. What data source
> >>>>> would you recommend for matchMotif? Many thanks for your help!
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Julie
> >>>>>
> >>>>> On 4/24/12 1:28 PM, "Mercier Eloi" <emercier@...> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I am one of the developers of MotIV. I will be happy to help you if
> >>>>>> you have any questions about the package.
> >>>>>>
> >>>>>> First, I want to mention that in the PLoS ONE paper we used PICS,
> >>>>>> rGADEM and MotIV as a pipeline, but MotIV can also be used on its
> >>>>>> own. Some of the advanced functions won't be available, though.
> >>>>>>
> >>>>>> Since the PWMs in MotIV correspond to human TFs, you may have to use
> >>>>>> your own list of PWMs. What MotIV needs is a simple list of matrices
> >>>>>> (run head(jaspar) to view the format).
> >>>>>> JASPAR's PWMs can easily be downloaded, but it seems to contain only
> >>>>>> ~20 motifs. On the other hand, AthaMap has more motifs, but I did not
> >>>>>> manage to find an easy way to get them.
> >>>>>> Another place to look is the AGRIS website
> >>>>>> (http://arabidopsis.med.ohio-state.edu/downloads.html).
> >>>>>>
> >>>>>> If you're only interested in identifying the motifs and do not want
> >>>>>> to do further analysis in R, I recommend looking at
> >>>>>> http://www.benoslab.pitt.edu/stamp for the identification of your
> >>>>>> motifs.
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Eloi Mercier
> >>>>>>
> >>>>>> On 12-04-24 07:36 AM, nooshin wrote:
> >>>>>>> Thanks a lot for your suggestion. I will certainly have a look and
> >>>>>>> let you know.
> >>>>>>> Bests,
> >>>>>>> Nooshin
> >>>>>>>
> >>>>>>> On 04/24/2012 04:15 PM, Tim Triche, Jr. wrote:
> >>>>>>>> Ah, I see. GSL is a useful library to have installed regardless.
> >>>>>>>> Hope things work out. I found your exchanges with Paul to be useful
> >>>>>>>> reading, but obviously I was not reading closely enough, since Paul
> >>>>>>>> started off his code sample with biocLite('MotIV'). Oops :-o
> >>>>>>>>
> >>>>>>>> Here is a paper from Gottardo's group that I found interesting,
> >>>>>>>> which does go into some detail towards a "bulk" approach:
> >>>>>>>>
> >>>>>>>> http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016432
> >>>>>>>>
> >>>>>>>> Perhaps it will be useful to you as well; I would be curious to
> >>>>>>>> hear if so.
> >>>>>>>>
> >>>>>>>> --t
> >>>>>>>>
> >>>>>>>> On Tue, Apr 24, 2012 at 7:00 AM, nooshin <n_omranian@...> wrote:
> >>>>>>>>
> >>>>>>>> Thanks, it's already solved: it needs GSL, which is a bit
> >>>>>>>> problematic to install, but I've sorted it out.
> >>>>>>>>
> >>>>>>>> But it includes only 5 matrices for Arabidopsis, both on the web
> >>>>>>>> page and in the package! I'm downloading them manually from
> >>>>>>>> AthaMap.
> >>>>>>>>
> >>>>>>>> Thanks again; I'll keep waiting for the 'bulk' approach.
> >>>>>>>>
> >>>>>>>> Bests,
> >>>>>>>> Nooshin
> >>>>>>>>
> >>>>>>>> On 04/24/2012 03:16 PM, Tim Triche, Jr. wrote:
> >>>>>>>>> source("http://bioconductor.org/biocLite.R")
> >>>>>>>>> biocLite("MotIV")
> >>>>>>>>>
> >>>>>>>>> ought to do the trick for you
> >>>>>>>>>
> >>>>>>>>> On Tue, Apr 24, 2012 at 1:01 AM, nooshin <n_omranian@...> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Paul,
> >>>>>>>>>
> >>>>>>>>> Thanks a lot.
> >>>>>>>>> I forgot to include BioC, since I replied only to you (not to
> >>>>>>>>> all).
> >>>>>>>>>
> >>>>>>>>> I can't install the MotIV package to check. I searched Google but
> >>>>>>>>> couldn't find any solution! Do you have any suggestions for
> >>>>>>>>> installing this package?
> >>>>>>>>>
> >>>>>>>>> Bests,
> >>>>>>>>> Nooshin
> >>>>>>>>>
> >>>>>>>>> On 04/23/2012 06:35 PM, Paul Shannon wrote:
> >>>>>>>>>> (redirecting this back to the Bioc list...)
> >>>>>>>>>>
> >>>>>>>>>> Hi Nooshin,
> >>>>>>>>>>
> >>>>>>>>>> The 'bulk' approach is not quite as ready as I predicted. I might
> >>>>>>>>>> have something by the end of the week.
> >>>>>>>>>>
> >>>>>>>>>> As for mapping between PWMs and TFs, I have most often done this
> >>>>>>>>>> with 'tom-tom' from the MEME website.
> >>>>>>>>>>
> >>>>>>>>>> But I just discovered what looks like a good -- maybe better --
> >>>>>>>>>> approach: the Bioconductor MotIV package, which includes a 2010
> >>>>>>>>>> version of JASPAR. Try this:
> >>>>>>>>>>
> >>>>>>>>>> source("http://bioconductor.org/biocLite.R")
> >>>>>>>>>> biocLite("MotIV")
> >>>>>>>>>> library(MotIV)
> >>>>>>>>>> browseVignettes("MotIV")
> >>>>>>>>>>
> >>>>>>>>>> The jaspar data in this package has 130 TF-PWM mappings, which
> >>>>>>>>>> appear to be human. More must be known, and publicly available.
> >>>>>>>>>> The JASPAR website has a 'JASPAR CORE Plantae' data set that
> >>>>>>>>>> - is probably what you are interested in
> >>>>>>>>>> - might be downloadable, and convertible to the form MotIV wants.
> >>>>>>>>>>
> >>>>>>>>>> Perhaps other readers of the list have other suggestions.
> >>>>>>>>>>
> >>>>>>>>>> If you have any questions on this, please include 'BioC' in your
> >>>>>>>>>> reply, so that we can all get better at this!
> >>>>>>>>>>
> >>>>>>>>>> - Paul
> >>>>>>>>>>
> >>>>>>>>>> On Apr 23, 2012, at 6:53 AM, nooshin wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Paul,
> >>>>>>>>>>>
> >>>>>>>>>>> Many thanks for your comprehensive information and code!
> >>>>>>>>>>> I have a question about extracting PWMs. How and where can I
> >>>>>>>>>>> download these matrices for all TFs that have a PWM available?
> >>>>>>>>>>> I need them only for Arabidopsis thaliana.
> >>>>>>>>>>> Is there any R package to which I can give a TF and get back its
> >>>>>>>>>>> PWM? Or any online database I can download them from? Since
> >>>>>>>>>>> Friday I have been struggling to find these matrices for
> >>>>>>>>>>> different A. thaliana TFs. It would be great if you could help
> >>>>>>>>>>> me get them.
> >>>>>>>>>>>
> >>>>>>>>>>>> If you want to do this in bulk, Hervé has some lovely code to
> >>>>>>>>>>>> make that efficient.
> >>>>>>>>>>> Can I have this too? :)
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks a lot in advance.
> >>>>>>>>>>> Best regards,
> >>>>>>>>>>> Nooshin
> >>>>>>>>>
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Bioconductor mailing list
> >>>>>>>>> Bioconductor@...
> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>>>>>>> Search the archives:
> >>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> A model is a lie that helps you see the truth.
> >>>>>>>>> Howard Skipper
> >>>>>>>>> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
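Paul's suggestion that the 'JASPAR CORE Plantae' set might be converted to the form MotIV wants, together with Eloi's note that MotIV needs a simple list of matrices, can be sketched in base R. The helper read_jaspar_counts is hypothetical and assumes the bracketed JASPAR count layout: a '>ID name' header line followed by four rows of counts in A, C, G, T order. It returns a named list of 4-row probability matrices. The exact naming and normalisation conventions of MotIV's bundled jaspar object have not been checked here, so compare the result with head(jaspar) before handing it to MotIV.

```r
# Hypothetical sketch: parse JASPAR-style count matrices into a named list of
# 4 x L probability matrices (rows A, C, G, T). Assumes each motif is a
# ">ID name" header followed by exactly four count rows in A, C, G, T order,
# e.g. "A  [ 0  3 79 ]". Check the layout of your own download first.
read_jaspar_counts <- function(lines, pseudocount = 0) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]               # drop blank lines
  headers <- which(startsWith(lines, ">"))
  motifs <- vector("list", length(headers))
  for (k in seq_along(headers)) {
    rows <- lines[headers[k] + 1:4]           # the four rows after the header
    # Replace base letters and brackets with spaces, then read the numbers.
    nums <- lapply(gsub("[^0-9.]", " ", rows),
                   function(r) as.numeric(strsplit(trimws(r), "\\s+")[[1]]))
    counts <- do.call(rbind, nums) + pseudocount
    dimnames(counts) <- list(c("A", "C", "G", "T"), NULL)
    # Normalise each position (column) so that it sums to 1.
    motifs[[k]] <- sweep(counts, 2, colSums(counts), "/")
  }
  names(motifs) <- sub("^>\\s*", "", lines[headers])
  motifs
}

# Usage (the path is a placeholder for your own JASPAR download):
# plant_pwms <- read_jaspar_counts(readLines("path/to/jaspar_download.txt"))
```

A small pseudocount (for example 0.5) can be passed if downstream scoring should avoid zero probabilities; the default of 0 keeps the raw frequencies.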