eteppo

Pearson Correlations Between Two Tidy Dataframes in R

Published: 2023-08-04
Updated: 2023-08-04

A correlation matrix can be read as a weighted graph: each variable is a node and each pairwise correlation is an edge weight. The simplest correlation measure is the Pearson correlation, which equals the slope of a simple linear regression between two continuous, z-scored variables. So a weighted graph representation of a dataset can be computed by calculating Pearson correlations for all pairs of variables. The WGCNA package makes this fast, but getting the results in tidy form makes post-processing much easier.
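The link between the Pearson correlation and regression on z-scored variables can be checked with a few lines of base R. This is only a minimal sketch: the simulated vectors `a` and `b` are made-up illustration data, not part of the original post.

```r
# Simulated, hypothetical data: b depends linearly on a plus noise.
set.seed(1)
a <- rnorm(100)
b <- 0.5 * a + rnorm(100)

# z-score both variables (mean 0, standard deviation 1).
za <- as.numeric(scale(a))
zb <- as.numeric(scale(b))

# Pearson correlation on the raw variables.
r <- cor(a, b, method = "pearson")

# Slope of the simple linear regression on the z-scored variables.
slope <- unname(coef(lm(zb ~ za))["za"])

# The two numbers agree up to floating-point error.
stopifnot(isTRUE(all.equal(r, slope)))
```

Because z-scoring gives both variables unit variance, the regression slope reduces to their covariance, which is the Pearson correlation.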

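library(assertthat)  # assert_that()
library(dplyr)       # %>%, left_join()
library(purrr)       # chuck(), map_lgl()
library(tidyr)       # pivot_longer()

# Pearson correlations and p-values between every column of x and every
# column of y, returned as one row per (from, to) pair.
learn_pearsons <- function(x, y) {

  assert_that(is.data.frame(x), is.data.frame(y))
  assert_that(nrow(x) > 0, ncol(x) > 0)
  assert_that(nrow(y) > 0, ncol(y) > 0)
  # Rows are paired observations, so both data frames need the same number of rows.
  assert_that(nrow(x) == nrow(y))
  # as.matrix() would turn any non-numeric column into a character matrix.
  assert_that(all(map_lgl(x, is.numeric)), all(map_lgl(y, is.numeric)))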

  # WGCNA::corAndPvalue returns a non-tidy object: a list of matrices,
  # including `cor` and `p`. Reshape both into one long tibble.
  tidy_corAndPvalue <- function(object) {
    correlations <- object %>%
      chuck("cor") %>%
      tibble::as_tibble(rownames = "from") %>%
      pivot_longer(cols = -from, names_to = "to", values_to = "correlation")
    p_values <- object %>%
      chuck("p") %>%
      tibble::as_tibble(rownames = "from") %>%
      pivot_longer(cols = -from, names_to = "to", values_to = "p.value")
    parameters <- left_join(correlations, p_values, by = c("from", "to"))
    return(parameters)
  }
  
  # WGCNA::corAndPvalue is very fast.
  parameters <- WGCNA::corAndPvalue(
    x = as.matrix(x),
    y = as.matrix(y),
    alternative = "two.sided",
    use = "all.obs",
    method = "pearson"
  ) %>%
    tidy_corAndPvalue()
  
  return(parameters)
  
}
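
To sanity-check the shape and values of the tidy output without WGCNA, the same table can be built slowly in base R, one pair at a time, with `stats::cor.test`. This is a rough cross-check sketch, not the method used above: the `mtcars` column subsets are an arbitrary illustration, and the row order may differ from what `learn_pearsons` returns.

```r
# Hypothetical example data: two column subsets of the built-in mtcars dataset.
x <- mtcars[, c("mpg", "hp")]
y <- mtcars[, c("wt", "qsec", "disp")]

# One row per (from, to) pair, mirroring the long layout of learn_pearsons().
pairs <- expand.grid(from = names(x), to = names(y), stringsAsFactors = FALSE)

# cor.test() gives the Pearson estimate and a two-sided p-value for one pair.
tests <- Map(
  function(from, to) cor.test(x[[from]], y[[to]], method = "pearson"),
  pairs$from,
  pairs$to
)

# Pull the estimate and p-value out of each test result.
pairs$correlation <- vapply(
  tests, function(test) unname(test$estimate), numeric(1), USE.NAMES = FALSE
)
pairs$p.value <- vapply(
  tests, function(test) test$p.value, numeric(1), USE.NAMES = FALSE
)

print(pairs)
```

With complete data, the `correlation` and `p.value` columns should agree closely with `learn_pearsons(x, y)`, since both are based on the same Student t statistic. The pairwise loop is far slower on wide data, which is the reason to prefer `WGCNA::corAndPvalue` there.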