19 Aug 2012 22:21
Re: Question about interpretation of CHARM results
(a bit late, but better late than never) Dr. Coombes is right, of course, and I ought to have mentioned this earlier. It is a thorny issue, especially with epidemiological studies. Cancer, less so. For differences that are real, usually they are significant on the beta value scale *and* the logistic scale. Changes that are only significant on the logit scale are often artifacts. The logistic scale is much more sensitive to technical artifacts, even if you damp the extremes by using a huge offset. There are intermediate transformations (probit, arcsin(sqrt(x)), etc.) which are intermediate between proportional and logistic, but they haven't found much traction. There are a couple of things that do help and can be used to filter an unbiased fashion. I'm hoping to get them submitted in the very near future. I do not know if they will be as useful for CHARM data as they seem to be for Illumina arrays, eRRBS, and WGSBS sequence data, but I guess we shall find out. regarding TCGA data: TCGA methylation data is background corrected and dye bias equalized (for the 450k samples, at least, and as batches are updated, 27k as well) but no batch correction is done for the level 3 data. In the case of multi-batch tumors it is a good idea to run ComBat or (if you must) SVA to compensate. It's run through normalizeMethyLumiSet(methylumi.bgcorr(the.data)) for level 2 and 3 data, and the IDAT files are provided as level 1, so anyone who wants to reproduce things from scratch with a different preprocessing strategy is welcome to do so. Methylumi and minfi happen to have the same IDAT parsing code these days, and I would not be surprised if they eventually merged. I wouldn't bother using anything else, especially for large volumes of samples. switching from 0.1% methylated to 99.9% methylated is probably a real effect. Switching from 1% to 3% across the board is probably technical artifacts. You will see this all the time on 450k data that hasn't been (ahem) properly background corrected, and to a significant degree in 27k data as well. It's not limited to TCGA; I've seen plenty of data from other centers that benefited significantly from being re-processed sensibly. In any event, I pushed for, and got, changes to policy so that Illumina methylation data for TCGA is provided as raw IDAT files, and all of the code used for processing is available either from the Bioconductor package repository or from GitHub (in the case of the packaging pipeline itself). It's not perfect but at least it is 100% transparent. one last thing: The Illumina annotations have been updated to a FeatureDb (one for hg19, and another available soon for hg18) which I would like to propose as the standard for these arrays, as they cover both 27k and 450k features (along with some information that very few people seem to know about each). I think it's as fast as what minfi uses and as comprehensive as the .db0 packages, while less confusing than either. So anyone who wants to, please try them. On Fri, Aug 17, 2012 at 1:05 PM, Kevin R. Coombes <kevin.r.coombes@... > wrote: > I understand all the statistical reasons for converting from methylation > "beta values" to something logistic, and am frequently tempted to do this > myself. > > But I think in the context of methylation that this advice should come > with a warning: changes in levels near 0 and 1 may have a lot of leverage > on the final results. For example, we have done analyses on some of the > TCGA data where we find "statistically significant differences in > methylation between normal and tumor" where the mean beta values are 0.03 > and 0.08. I find it hard to believe that this level of change in > methylation has any kind of biological meaning. In fact, I'm not even > convinced that we can accurately measure this amount of change using the > technology that TCGA is using (although I might well believe that such a > change could result from batch effects, whether in the assay or in the data > processing). > > I don't have any magic solution to fix this issue; it is intrinsic in the > shape of the logistic curve. One might want to explore shrinking the beta > values toward 0.5 (i.e., away from 0 and 1), but I can't offer any concrete > advice on how well this might work in practice. > > Best, > Kevin > > > On 8/17/2012 12:36 PM, Tim Triche, Jr. wrote: > > The reason to switch from a proportion (%, beta-value, whichever; anything > measuring M / (M+U) where M and U are surrogates for methylated and > unmethylated cytosines) to a fold-change (logit(proportion.methylated) or > log2(M/U)) is that the latter is far more amenable to linear models, and > roughly parallels the expected behavior in terms of expression changes on a > log2 or log-fold-change scale. > > Furthermore, the range for logit(M/U) is -Infinity to +Infinity, which is > appropriate when you are modeling something as having Gaussian error. > Something with a range of 0 to 1 is neither homoskedastic (which is to > say, such a 0-1 measurement will have a variance that depends on the mean) > nor unbounded (this turns out to be an issue when computing maximum > likelihood estimates, for example, as values close to the boundary will > cause problems). > > In any event, logit(% methylation) is equivalent to log(M/U) which is where > I veered off course this morning. My brain seems to have been a bit slow. > > > On Fri, Aug 17, 2012 at 9:26 AM, zeynep özkeserli <zeynep.ozkeserli <at> gmail.com> wrote: > > > Dear Tim, > > Thank you for your answer. But to my understanding, if I could get this > answer by undoing the logit function (I tought you were doing this), we > should use inverse logit function. Which is exp(x)/(1+exp(x)) > > And in my case it gives: > > > exp(-0.30427)/(1+exp(-0.30427)) > > [1] 0.424514 > > Ok, this seems reasonable. And it makes sense how you put this into words. > But if we could use this one as a methylation measure, why would the > creators make things more complicated and convert the value to a logit > value? So, again, to my understanding, I shall learn how to interpret the > diff thing. > > Thank you again, > > Best :) > > Zeynep > > On Fri, Aug 17, 2012 at 6:29 PM, Tim Triche, Jr. <tim.triche@...> <tim.triche@...>wrote: > > > Perhaps "on average this region has an > > R> 1 - exp(-0.347) > [1] 0.2931947 > > approximately 29.3% relative decrease in cytosine methylation after > treatment?" > > > > On Fri, Aug 17, 2012 at 1:56 AM, zeynep özkeserli <zeynep.ozkeserli <at> gmail.com> wrote: > > > Dear All, Dear Dr. Aryee and Dr. Carvalho, > > I have a question on interpreting the results of dmrFinder function. > > We have performed a CHARM analysis on the data we got from NimbleGen > Promoter Medip Arrays. The data is obtained from each patient before and > after treatment. And after performing CHARM analysis, we got some > differentially methylated regions (DMRs). > > As the samples are before and after treatment results of the same > patient, > the samples are treated as paired samples. > > My question is about interpretation of the results: > > After running this: > > dmr1_2 <- dmrFinder(rawData, p = p, groups = grp,compare = c("to", "ts"), > cutoff=0.995,paired=TRUE,pairs=pairs) > > to: before treatment > ts: after treatment > > - For example I have found a DMR like this (I summerized the result for > my > question): > > chr 8, diff= -0.30427 and maxdiff=0.47935 > > As the diff value is calculated like this: average l (logit(percentage) > methylation if l=NULL) difference within the DMR if paired=TRUE > > Is it true to say that: "The region has 0.30427 times the risk of being > methylated in samples of after treatment compared to samples of before > treatment." > > I know that it does not look meaningful to use the word "risk" when > talking > about something like that but I can not find a better way to say it > truely. Is it possible to express it like a "0.30427 fold difference in > methylation"? And also am I interpreting the "-" sign truely? > > Thank you for your help in advance, > > Best Regards, > > Zeynep > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing listBioconductor@...://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives:http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- > *A model is a lie that helps you see the truth.* > * > * > Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf> > > > > _______________________________________________ > Bioconductor mailing listBioconductor@...://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > -- -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf> [[alternative HTML version deleted]]
_______________________________________________ Bioconductor mailing list Bioconductor@... https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
RSS Feed