Paul [guest] | 30 Sep 08:57 2013

Computing large correlations in R

I have two lists, A and B. Each contains 100 data frames, and each data frame has dimensions
15000 x 15000. I would like a single correlation value for each pair of data frames, computed as follows:
take the first data frame from each list, compute cor(A, B) over all of their values, and get one coefficient
for the whole pair. Then do the same for the second data frame in each list, and so on for all 100 pairs.

I tried the following:

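      A # list of 100 data frames
      B # list of 100 data frames

      C <- A[1] # extract the first element of A (still wrapped in a one-element list)
      D <- B[1] # extract the first element of B (still wrapped in a one-element list)

      C <- unlist(C) # flatten C into a single vector
      D <- unlist(D) # flatten D into a single vector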

Then I computed

       Correlation <- cor(C, D) # a single correlation coefficient for the two vectors

But I end up with the error saying:

      R cannot allocate a vector of size 3.9 GB

Is there a better, faster way to do this that could be applied to the entire list? I work on a server
that allows me to run large computations, but I still get this error, and the unlisting takes ages
because of the size of the data frames.
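
For illustration, one possible direction is to avoid unlist() altogether and accumulate the sums needed for the Pearson correlation one column at a time, so that only a single column of each data frame is copied at once. The sketch below is untested at the 15000 x 15000 scale. It assumes that every column is numeric, that there are no missing values, and that both data frames have the same dimensions. The name frame_cor is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: Pearson correlation between two data frames,
# treating each one as a single long vector but never building that vector.
# Assumes all columns are numeric and contain no NA values.
frame_cor <- function(x, y) {
  stopifnot(identical(dim(x), dim(y)))
  n <- as.numeric(nrow(x)) * ncol(x)   # total number of values

  # Pass 1: overall means, accumulated column by column.
  sx <- 0
  sy <- 0
  for (j in seq_len(ncol(x))) {
    sx <- sx + sum(as.numeric(x[[j]]))
    sy <- sy + sum(as.numeric(y[[j]]))
  }
  mx <- sx / n
  my <- sy / n

  # Pass 2: centered sums of squares and cross-products.
  sxx <- 0
  syy <- 0
  sxy <- 0
  for (j in seq_len(ncol(x))) {
    dx <- as.numeric(x[[j]]) - mx
    dy <- as.numeric(y[[j]]) - my
    sxx <- sxx + sum(dx * dx)
    syy <- syy + sum(dy * dy)
    sxy <- sxy + sum(dx * dy)
  }
  sxy / sqrt(sxx * syy)
}

# Toy data standing in for the real lists (3 pairs of 4 x 5 data frames).
set.seed(1)
A_toy <- replicate(3, as.data.frame(matrix(rnorm(20), 4, 5)), simplify = FALSE)
B_toy <- replicate(3, as.data.frame(matrix(rnorm(20), 4, 5)), simplify = FALSE)

# One coefficient per pair of data frames.
cors <- mapply(frame_cor, A_toy, B_toy)
```

On the real data the last line would become mapply(frame_cor, A, B). The two-pass form is used instead of the raw sum-of-products formula because it is less prone to cancellation error when the values are far from zero.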

 -- output of sessionInfo(): 

R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C                 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.0.1
