Christopher R. Dishop: Reminder Cleaning Commands

A few reminders for longitudinal wrangling:

creating file names
advanced filtering
find people with full data
reshape issues

Creating File Names

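Suppose the data arrive as one file per wave:

“wave1.dta”
“wave2.dta”
“wave3.dta”
etc…

Build the names with paste0 rather than typing each one.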

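library(haven)  # provides read_dta()

# file names
files <- paste0("wave", 1:10, ".dta")

# which can then be used in a function
combine_files <- function(x){
  
  # read one wave and return it (no assignment, so the result is visible)
  read_dta(paste0("../data/another-folder/", x))

}

# ...and iterated over (first file shown here)
combine_files(files[1])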
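To make the "iterated over" step concrete, here is a minimal sketch that reads every file and stacks the results. It is an illustration under assumptions, not the original workflow: it writes three small CSV files to a temporary folder as a stand-in for the .dta files, and the names data_dir, read_wave, all_waves, and combined are hypothetical.

```r
# Stand-in setup: write three tiny CSV files to a temporary folder.
# (The post reads .dta files with read_dta(); CSV keeps this sketch base-R only.)
data_dir <- tempdir()
files <- paste0("wave", 1:3, ".csv")

for (i in seq_along(files)) {
  write.csv(
    data.frame(id = 1:2, wave = i),
    file.path(data_dir, files[i]),
    row.names = FALSE
  )
}

# Hypothetical reader: takes one file name and returns one data frame
read_wave <- function(x) {
  read.csv(file.path(data_dir, x))
}

# Iterate over every file name, then stack the waves into one data frame
all_waves <- lapply(files, read_wave)
combined  <- do.call(rbind, all_waves)
```

With real .dta files, the reader body would call read_dta on the pasted path instead of read.csv.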

Advanced Filtering

Let’s say I’m iterating over multiple data frames. For each data frame, I want to filter to include only people who are currently employed (1 = yes, 0 = no). The question asking whether a respondent is employed is “wave1_emp” in the first data set, “wave2_emp” in the second data set, “wave3_emp” in the third data set, etc.

df %>% 
  filter_at(vars(contains("_emp")), all_vars(. == 1))

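Because the filter matches any column whose name contains “_emp”, the same command works in every wave, whatever prefix the question carries. Note that if a data frame has more than one “_emp” column, all_vars keeps only rows where every one of them equals 1.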
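A small sketch of the same filter applied across several data frames. The toy data frames wave1 and wave2 and the helper keep_employed are hypothetical; only the “_emp” naming pattern comes from the text. It uses if_all, which dplyr 1.0.4 and later offer in place of the superseded filter_at, so treat it as an alternative spelling rather than a required change.

```r
library(dplyr)

# Hypothetical wave data; the employment column is named differently in each
wave1 <- data.frame(id = 1:4, wave1_emp = c(1, 0, 1, 1))
wave2 <- data.frame(id = 1:4, wave2_emp = c(1, 1, 0, 1))

# Keep rows where every column containing "_emp" equals 1
keep_employed <- function(df) {
  df %>%
    filter(if_all(contains("_emp"), ~ .x == 1))
}

# Apply the same filter to each data frame without naming the column
employed <- lapply(list(wave1 = wave1, wave2 = wave2), keep_employed)
```

Here employed$wave1 keeps ids 1, 3, and 4, and employed$wave2 keeps ids 1, 2, and 4.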

Find People With Full Data

Make the df wide. Drop NAs. Pull unique ids. Filter original long df to include only those ids from previous step.

df_wide <- df_wide %>% 
  select_at(vars(contains(c("work", "sat", "cond", "time", "id")))) %>% 
  drop_na()

use_ids <- unique(df_wide$id)

# use long, not wide df here
df_no_missing <- df_long %>% 
  filter(id %in% use_ids)
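The four steps can be traced end to end on toy data. This is a sketch under assumptions: the data frame below is made up, with one variable ("sat") and one missing value, and the wide step uses reshape on a plain data frame.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: 3 people, 2 time points, person 2 is missing "sat" at time 2
df_long <- data.frame(
  id   = c(1, 1, 2, 2, 3, 3),
  time = c(1, 2, 1, 2, 1, 2),
  sat  = c(4, 5, 3, NA, 2, 2)
)

# Step 1: make the df wide (one row per person, one column per time point)
df_wide <- reshape(df_long, idvar = "id", timevar = "time", direction = "wide")

# Step 2: drop anyone with an NA at any time point
df_wide <- df_wide %>%
  drop_na()

# Step 3: pull the ids that survived
use_ids <- unique(df_wide$id)

# Step 4: filter the original long df to those ids
df_no_missing <- df_long %>%
  filter(id %in% use_ids)
```

Person 2 is dropped in step 2, so df_no_missing holds the four rows belonging to ids 1 and 3.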

Reshape Issue

I prefer reshape over pivot_wider / pivot_longer. Unfortunately, reshape does not work well with tibbles, so convert to a plain data frame first.

# no good
df_wide <- reshape(df_tibble, idvar = "id", timevar = "time", direction = "wide")

# that'll work
df <- as.data.frame(df_tibble)
df_wide <- reshape(df, idvar = "id", timevar = "time", direction = "wide")
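For a quick check of the conversion route, here is a minimal sketch on a made-up tibble. The data and the column name "sat" are hypothetical. It only shows the working path, converting to a data frame before calling reshape.

```r
library(tibble)

# Hypothetical long tibble: 2 people, 2 time points
df_tibble <- tibble(
  id   = c(1, 1, 2, 2),
  time = c(1, 2, 1, 2),
  sat  = c(4, 5, 3, 2)
)

# Convert to a plain data frame before reshaping
df <- as.data.frame(df_tibble)
df_wide <- reshape(df, idvar = "id", timevar = "time", direction = "wide")

# Wide columns are named <variable>.<time>
names(df_wide)
```

The result has one row per id, with columns id, sat.1, and sat.2.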

Bo\(^2\)m =)
