eteppo

Recoding Dataframe Variables Using A Metadata File in R

Published: 2023-08-04
Updated: 2023-08-04

Raw data is often coded, either to obscure it or to make it directly usable in a particular analysis. For example, “Yes” might be stored as 1 and “Class C” as 3.

But it’s often more convenient to work with the true, descriptive values, so you need to recode all of the values to get clean, understandable data. The mapping between coded and descriptive values usually lives in a metadata file.
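To make the idea concrete, here is a minimal sketch of what such data and metadata might look like, and how a single variable is recoded by looking up its codes. The long format (one row per variable and code) and the column names `variable`, `code` and `label` are assumptions for illustration; real metadata files vary.

```r
library(tidyverse)

# Hypothetical coded data: 1/0 for yes/no and 1-3 for class
coded <- tibble(
  smoker = c(1, 0, 1),
  class  = c(3, 1, 2)
)

# Hypothetical long-format metadata: one row per (variable, code) pair
metadata <- tribble(
  ~variable, ~code, ~label,
  "smoker",  "0",   "No",
  "smoker",  "1",   "Yes",
  "class",   "1",   "Class A",
  "class",   "2",   "Class B",
  "class",   "3",   "Class C"
)

# Recode one variable: find the position of each value among that
# variable's codes (compared as character) and take the matching label
smoker_meta <- filter(metadata, variable == "smoker")
smoker_labels <- smoker_meta$label[match(as.character(coded$smoker), smoker_meta$code)]
smoker_labels
```

Doing this by hand for every variable quickly gets tedious, which is what the function below automates.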

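How do you recode all of the values based on a variable metadata file? Here’s a solution I used. For every data column that appears in the metadata, it looks up each coded value and replaces it with the matching descriptive value.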

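```r
library(tidyverse)

# Recode every column of `data` that is described in a value-metadata CSV.
# `variable_name`, `from` and `to` are the metadata columns that hold the
# variable name, the coded value and the descriptive value, respectively.
recode_with <- function(data, value_meta_file, variable_name, from, to) {

  # Read all metadata as character so that codes such as "01" are kept as-is
  metadata <- value_meta_file %>%
    read_csv(col_types = cols(.default = "c")) %>%
    select(variable_name = {{ variable_name }}, from = {{ from }}, to = {{ to }})

  for (column_name in colnames(data)) {

    if (column_name %in% pull(metadata, variable_name)) {

      # Code-to-value lookup table for this variable only
      codes_to_values <- metadata %>%
        filter(variable_name == column_name) %>%
        distinct(from, to)

      # Codes are matched as character because the metadata is character.
      # Values without a matching code become NA; a code with two different
      # descriptive values is an error (`relationship` needs dplyr >= 1.1.0).
      new_column_values <- data %>%
        select(from = all_of(column_name)) %>%
        mutate(from = as.character(from)) %>%
        left_join(codes_to_values, by = "from", relationship = "many-to-one") %>%
        pull(to)

      # Replace the whole column. Unlike data[rows, column] <- value, this
      # also works when the column type changes, e.g. from numeric to character.
      data[[column_name]] <- new_column_values
    }
  }

  data
}
```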
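The same mapping can also be expressed without the explicit loop. The sketch below is an alternative, not the function above: it assumes the metadata has already been read into a data frame with columns named `variable`, `code` and `label` (hypothetical names), turns each variable’s rows into a named lookup vector, and recodes all described columns at once with `across()` and `cur_column()`.

```r
library(tidyverse)

# Hypothetical helper: recode columns using a long-format metadata table
# with columns `variable`, `code` and `label` (assumed names)
recode_with_lookup <- function(data, metadata) {
  # One named character vector per variable: names are codes, values are labels
  lookups <- metadata %>%
    distinct(variable, code, label) %>%
    split(.$variable) %>%
    map(~ set_names(.x$label, .x$code))

  data %>%
    mutate(across(
      any_of(names(lookups)),
      # Index the lookup by the value as character; unmatched codes give NA
      ~ unname(lookups[[cur_column()]][as.character(.x)])
    ))
}

# Usage with hypothetical coded data and metadata
coded <- tibble(smoker = c(1, 0, 1), class = c(3, 1, NA))
metadata <- tribble(
  ~variable, ~code, ~label,
  "smoker",  "0",   "No",
  "smoker",  "1",   "Yes",
  "class",   "1",   "Class A",
  "class",   "2",   "Class B",
  "class",   "3",   "Class C"
)
recoded <- recode_with_lookup(coded, metadata)
recoded
```

As in the loop version, codes missing from the metadata become NA. One difference: if a code appears twice with different labels, indexing the named vector silently uses the first one, so it is worth checking the metadata for such duplicates beforehand.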