Pearson Correlations Between Two Tidy Dataframes in R
A correlation matrix can be read as a weighted graph: each variable is a node, and each pairwise correlation is the weight of the edge between two nodes. The simplest correlation measure is the Pearson correlation, which equals the slope of a simple linear regression between two continuous, z-scored variables. A weighted graph representation of a dataset can therefore be computed by calculating Pearson correlations for all pairs of variables. The WGCNA package makes this fast, and returning the results in tidy form makes the post-processing much easier.
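Before the function itself, the regression claim above can be checked with a minimal base-R sketch. The simulated data, the seed, and the variable names (`zx`, `zy`, `slope`, `r`) are illustrative assumptions, not part of the original code; the two quantities should agree up to floating-point error.

```r
# Simulated data; the coefficients and sample size are arbitrary.
set.seed(1)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# z-score both variables. scale() returns a one-column matrix,
# so convert the result back to a plain numeric vector.
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))

# Slope of the simple linear regression of zy on zx.
slope <- unname(coef(lm(zy ~ zx))["zx"])

# Pearson correlation of the raw (unscaled) variables.
r <- cor(x, y, method = "pearson")

# Compare the two quantities with a numerical tolerance.
all.equal(slope, r)
```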
library(assertthat)  # assert_that()
library(dplyr)       # left_join(), %>%
library(purrr)       # chuck()
library(tidyr)       # pivot_longer()

learn_pearsons <- function(x, y) {
  assert_that(is.data.frame(x), is.data.frame(y))
  assert_that(nrow(x) > 0, ncol(x) > 0)
  assert_that(nrow(y) > 0, ncol(y) > 0)

  # WGCNA::corAndPvalue returns a list of matrices (rows are the columns of x,
  # columns are the columns of y), which is not tidy.
  tidy_corAndPvalue <- function(object) {
    # One row per (from, to) pair with its correlation.
    correlations <- object %>%
      chuck("cor") %>%
      tibble::as_tibble(rownames = "from") %>%
      pivot_longer(cols = -from, names_to = "to", values_to = "correlation")

    # One row per (from, to) pair with its p-value.
    p_values <- object %>%
      chuck("p") %>%
      tibble::as_tibble(rownames = "from") %>%
      pivot_longer(cols = -from, names_to = "to", values_to = "p.value")

    parameters <- left_join(correlations, p_values, by = c("from", "to"))
    return(parameters)
  }

  # WGCNA::corAndPvalue is very fast.
  parameters <- WGCNA::corAndPvalue(
    x = as.matrix(x),
    y = as.matrix(y),
    alternative = "two.sided",
    use = "all.obs",
    method = "pearson"
  ) %>%
    tidy_corAndPvalue()

  return(parameters)
}
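To show the shape of the tidy output without requiring WGCNA, here is a base-R sketch that builds the same four columns (`from`, `to`, `correlation`, `p.value`) on toy data. The helper name `tidy_pearson_base` and the simulated data frames are hypothetical. The p-values use the standard two-sided t-test for a Pearson correlation with n - 2 degrees of freedom; this is assumed, not verified here, to match what `WGCNA::corAndPvalue` reports. The sketch also assumes there are no missing values, and its row order may differ from the `pivot_longer` version above.

```r
# Hypothetical base-R helper; not part of WGCNA or of learn_pearsons().
tidy_pearson_base <- function(x, y) {
  stopifnot(is.data.frame(x), is.data.frame(y), nrow(x) == nrow(y))
  n <- nrow(x)

  # ncol(x) by ncol(y) matrix of Pearson correlations.
  r <- stats::cor(as.matrix(x), as.matrix(y), method = "pearson")

  # Two-sided p-value from the t statistic with n - 2 degrees of freedom.
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  p <- 2 * stats::pt(-abs(t_stat), df = n - 2)

  # as.table() + as.data.frame() gives one row per matrix cell,
  # in column-major order, which matches as.vector(p).
  long <- as.data.frame(as.table(r), stringsAsFactors = FALSE)
  names(long) <- c("from", "to", "correlation")
  long$p.value <- as.vector(p)
  long
}

# Toy data: 30 observations, 2 variables in x and 3 in y.
set.seed(42)
x <- data.frame(a = rnorm(30), b = rnorm(30))
y <- data.frame(c = rnorm(30), d = rnorm(30), e = rnorm(30))

result <- tidy_pearson_base(x, y)
head(result)
```

Each row of `result` describes one edge of the weighted graph, so thresholding by `p.value` or by absolute `correlation` becomes an ordinary row filter.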