8 May 03:22 2013

## Re: edgeR: finding tissue specific genes [was: Design matrix and BCV]

Got it! It works well, and I think I finally have what I wanted..! Thanks a lot. Manoj. Manoj Hariharan Staff Researcher The Salk Institute for Biological Studies La Jolla, CA 92037 Office: 858.453.4100 x2143 ________________________________ From: Gordon K Smyth <smyth@...> Cc: Bioconductor mailing list <bioconductor@...> Sent: Tuesday, May 7, 2013 5:26 PM Subject: Re: edgeR: finding tissue specific genes [was: Design matrix and BCV] Sorry, first command shoud have been contrasts(tiss_groups) <- contr.sum(levels(tiss_groups)) Your linear model can be parametrized in terms of any set of 18 coefficients. This command says that you want the effects to "sum" to zero, in other words the effects should be relative to the grand mean. Best wishes Gordon On Tue, 7 May 2013, Manoj Hariharan wrote: > Hi Gordon, Actually, I had never used the glmQRT() - I've always been using the glmQLFTest(). And, as you had suggested, when I do the contrasts(tiss_groups) <- contr.sum(tiss_groups) I get the following error: > contrasts(tiss_groups) <- contr.sum(tiss_groups) Error in `contrasts<-`(`*tmp*`, value = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, : wrong number of contrast matrix rows I didn't really understand what the difference by using the contrasts(tiss_groups) <- contr.sum(tiss_groups) design <- model.matrix(~tiss_groups) rather than specifying design without the "contrasts(tiss_groups) <- contr.sum(tiss_groups)", as below: design <- model.matrix(~tiss_groups) I would still have the intercept and have the following for fit$design: attr(,"assign") [1] 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$tiss_groups [1] "contr.treatment" Thanks, Manoj. ________________________________ From: Gordon K Smyth <smyth@...> Cc: Bioconductor mailing list <bioconductor@...> Sent: Monday, May 6, 2013 11:39 PM Subject: edgeR: finding tissue specific genes [was: Design matrix and BCV] Dear Manoj, Why not simply find genes than are higher in one group than the average of the other groups? edgeR can do this sort of thing easily. Let's suppose suppose you going to using the quasi-lik approach of glmQFTest() rather than glmQRT(). First define a design matrix for which the intercept is the overall mean: contrasts(tiss_groups) <- contr.sum(tiss_groups) design <- model.matrix(~tiss_groups) Then estimate the trended dispersions: y <- estimateGLMCommonDisp(y, design) y <- estimateGLMTrendedDisp(y, design) Then fit the basic linear model: fit <- glmFit(y, design) Then you can extract all the lists you want. For example ql <- glmQLFTest(fit, coef=2) top1 <- topTags(ql) will give you genes specifically up or specifically down in tissue 1, as compared to the average of all the other groups. ql <- glmQLFTest(fit, coef=3) top2 <- topTags(ql) will give you genes specifically up/down in tissue 2, and so on up to ql <- glmQLFTest(fit, coef=18) top17 <- topTags(de) will give you genes specifically up/down in tissue 17. Finally, to get genes up/down in tissue 18: cont <- rep(-1,18) cont[1] <- 0 ql <- glmQLFTest(fit, contrast=cont) top18 <- topTags(de) What you propose doesn't quite make sense to me. If you want to put genes on the same scale (and you don't need to for the above analysis), would it not be better to use rpkm()? Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Mon, 6 May 2013, Manoj Hariharan wrote: > Thanks Gordon. I was wondering if I could have a quantitative value for the deviance of each group from the average, for each of the DE genes. I understand that the F value (from the F-statistic) is a measure of how far the gene is compared to the expression of all others across the samples. One option, I could think of is to just get the normalized counts for each of the sample, for the set of DE genes: de_lrt <- rownames(top_lrt[top_lrt$FDR<0.05,]) scale <- D$samples$lib.size*D$samples$norm.factors normCounts <- round(t(t(D$counts)/scale)*mean(scale)) write.table(log(normCounts[de_lrt[1:5690],]+1), "All37_NormCounts_DEGenes", sep="\t", quote=FALSE) Essentially, I am trying to get the list of genes that shows a more "tissue-specific" behaviour. Most genes are not expressed strictly in one particular tissue - there would be related tissues where its expression would be almost similar. So I would like to rank them based on their expression values and for that I need to have all comparable values. Then, I could consider those samples where the expression of the gene is [[elided Yahoo spam]] Thanks, Manoj. ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:8}}

_______________________________________________ Bioconductor mailing list Bioconductor@... https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor