eteppo

Renaming Dataframe Variables Using A Metadata File in R

Published: 2023-08-04
Updated: 2023-08-04

It is often best to keep all information about variables in a separate metadata file. Such a file might list the old variable names used in the raw data alongside the new names you would prefer to use in your analysis. So how do you rename all variables based on this file? Here’s what I came up with.

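# Packages used below.
library(assertthat) # assert_that()
library(dplyr)      # select(), distinct(), left_join(), mutate(), pull(), if_else()
library(magrittr)   # is_in(), not(), extract(), set_colnames()
library(readr)      # read_delim(), cols()
library(stringr)    # str_c()
library(tidyr)      # drop_na()

# Note: once defined, this function masks dplyr::rename_with().
rename_with <- function(data, variable_meta_file, from, to, separator = ",") {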
  
  assert_that("data.frame" %in% class(data))
  assert_that(ncol(data) > 0)
  assert_that(is.character(variable_meta_file))
  assert_that(length(variable_meta_file) == 1)
  assert_that(file.exists(variable_meta_file))

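  # Read the metadata as text and keep only the old/new name columns.
  # Keep one row per old name (the first one), so the join below cannot
  # return more rows than there are columns in the data.
  metadata <- variable_meta_file %>%
    read_delim(delim = separator, col_types = cols(.default = "c")) %>%
    select(from = {{ from }}, to = {{ to }}) %>%
    drop_na() %>%
    distinct(from, .keep_all = TRUE)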
  
  # TRUE for every data column that has no entry in the metadata.
  is_variable_missing <- data %>%
    colnames() %>%
    is_in(metadata %>% pull(from)) %>%
    not()
  
  # Warn about the columns that will keep their original names.
  if (any(is_variable_missing)) {
    missing_variable_names <- data %>%
      colnames() %>%
      magrittr::extract(is_variable_missing)
    warning_message <- str_c(
      "Data variable names {",
      str_c(missing_variable_names, collapse = ", "),
      "} were not renamed because they do not exist in the given metadata."
    )
    warning(warning_message)
  }
  
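  # Look up the new name for each column, in the data's column order.
  # Columns without a match keep their old name.
  new_column_names <- tibble::tibble(from = colnames(data)) %>%
    left_join(metadata, by = "from") %>%
    mutate(to = if_else(condition = is.na(to), true = from, false = to)) %>%
    pull(to)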
  
  data <- data %>%
    magrittr::set_colnames(new_column_names)
  
  return(data)
  
}
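
To make the expected inputs concrete, here is a minimal, self-contained sketch of the same lookup idea on toy data. The file layout, the column names `old_name` and `new_name`, and the data values are all made up for illustration. It uses base R `match()` instead of the function above so that it runs without any extra packages; it is not a replacement for the function and it skips the checks and the warning.

```r
# Toy raw data with terse variable names (made up for illustration).
raw <- data.frame(v1 = c(1, 2), v2 = c("a", "b"), id = c(10L, 11L))

# Hypothetical metadata file: one row per variable, old and new names.
# Note that "id" is deliberately not listed.
meta_path <- tempfile(fileext = ".csv")
writeLines(c("old_name,new_name", "v1,height_cm", "v2,group"), meta_path)

# Read every column as character, similar to cols(.default = "c") above.
meta <- read.csv(meta_path, colClasses = "character")

# Position of each data column in the metadata; NA when there is no match.
idx <- match(colnames(raw), meta$old_name)

# Take the new name where a match exists, otherwise keep the old name.
renamed <- raw
colnames(renamed) <- ifelse(is.na(idx), colnames(raw), meta$new_name[idx])

print(colnames(renamed))
```

Here `v1` and `v2` are renamed to `height_cm` and `group`, while `id` keeps its name because it is not in the metadata. With the function above and its packages loaded, the equivalent call on this toy file should be `rename_with(raw, meta_path, from = old_name, to = new_name)`, which would also warn about `id`.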