priya [guest] | 23 Oct 2012 09:34

clustering in R


I have an RMA-normalized gene expression dataset with 22810 rows and 9 columns (types of promoters).
A subset of the data is as follows:

    ID_REF GSM362180    GSM362181  GSM362188    GSM362189  GSM362192
    244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647
    244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605
    244903 5.412329253 5.352970877 5.06250609  5.305709079 8.365082403
    244904 5.529220594 5.28134657  5.467445095 5.62968933  5.458388909
    244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246
    244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836

I want to cluster the above data and tried hierarchical clustering:

    d <- dist(as.matrix(deg), method = "euclidean")

where deg is a matrix of the differentially expressed genes (4300 in number). I get the following warning:

      Warning message:
     In dist(as.matrix(deg), method = "euclidean") : NAs introduced by coercion

Is it all right to proceed with the clustering in spite of the warning?

    hc <- hclust(d)
    plot(hc, hang = -0.01, cex = 0.7)
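
One possible reading of the warning, sketched here under assumptions rather than confirmed for my data: if deg is actually a data frame whose ID_REF column is text, as.matrix() returns a character matrix, and dist() then produces NAs when it coerces the IDs to numbers. The toy values and the "_at" probe suffix below are hypothetical stand-ins; only dist() and the column names come from the post.

```r
# Toy stand-in for `deg` (hypothetical values): the first column holds
# probe IDs as text, as can happen after reading a table from file.
deg <- data.frame(
  ID_REF    = c("244901_at", "244902_at", "244903_at"),
  GSM362180 = c(5.09, 5.19, 5.41),
  GSM362181 = c(4.63, 4.99, 5.35),
  stringsAsFactors = FALSE
)

# With mixed column types, as.matrix() yields a character matrix,
# which is what would trigger "NAs introduced by coercion" in dist().
is_char <- is.character(as.matrix(deg))

# Keep only the numeric columns and move the IDs into the row names.
expr <- as.matrix(deg[, sapply(deg, is.numeric), drop = FALSE])
rownames(expr) <- deg$ID_REF

# The distance matrix is now computed on expression values only.
d <- dist(expr, method = "euclidean")
```

If this is the cause, the distances computed with the warning present would be based on a matrix containing NAs, so it seems safer to fix the input than to proceed.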

I get a dendrogram that is very dense, and the labels are not legible. I also cannot tell where the 9
promoters fall in the tree for the various genes. How can I label the tree with the promoters, and how
can I plot the roughly 4300 genes as a clearer dendrogram?
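
To make the question concrete, here is a minimal sketch of the kind of result I am after, assuming a numeric matrix with genes in rows and the 9 promoter samples in columns. The random data, the name expr, and k = 4 are placeholders, not my real values. Since dist() compares rows, transposing the matrix clusters the columns, and cutree() summarizes the gene tree as a few groups instead of thousands of leaves.

```r
set.seed(1)

# Hypothetical stand-in for the expression matrix: 50 genes x 9 samples.
expr <- matrix(rnorm(50 * 9, mean = 5), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("GSM", 1:9)))

# dist() works on rows, so transpose to cluster the 9 samples (columns).
hc_samples <- hclust(dist(t(expr)))

# Cluster the genes, then cut the tree into k groups rather than
# trying to read every leaf label.
hc_genes <- hclust(dist(expr))
k <- 4  # arbitrary choice for illustration
gene_cluster <- cutree(hc_genes, k = k)
sizes <- table(gene_cluster)

# Plots are drawn only in an interactive session.
if (interactive()) {
  plot(hc_samples)            # 9 leaves labelled with the column names
  heatmap(expr, labRow = NA)  # gene and sample dendrograms, gene labels hidden
}
```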

--
Sent via the guest posting facility at bioconductor.org.

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

