Turning Unequal Dates into Days

Longitudinal data on a group or team often have missing days: not everyone reports on every date. In the example below, only Bob reports a stress score on January 3rd, even though Joe and Sam are also part of the sample.

   id       date stress
1 bob 2019-01-01      4
2 joe 2019-01-01      5
3 sam 2019-01-01      6
4 bob 2019-01-02      6
5 joe 2019-01-02      5
6 bob 2019-01-03      4
7 bob 2019-01-04      5
8 joe 2019-01-04      6
9 sam 2019-01-04      7
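
For anyone following along, the table above can be rebuilt with something like the code below. This is only a sketch: the name df matches the code used in the rest of this post, and storing date as a Date column is an assumption about how the original data were read in.

```r
# Rebuild the example data shown above.
# Assumption: `date` is stored as a Date; the original storage type is not shown.
df <- data.frame(
  id = c("bob", "joe", "sam", "bob", "joe", "bob", "bob", "joe", "sam"),
  date = as.Date(c(
    "2019-01-01", "2019-01-01", "2019-01-01",
    "2019-01-02", "2019-01-02",
    "2019-01-03",
    "2019-01-04", "2019-01-04", "2019-01-04"
  )),
  stress = c(4, 5, 6, 6, 5, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

df
```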

We want to create an additional column called “day” that uses integers rather than dates, which makes plotting easier and prettier. To do so, we create a new data frame that pairs each unique date with an integer day, and then merge that new data frame with the original so that every row picks up its “day” value.

Turn the dates into a character vector so that they are easier to work with.

df$date <- as.character(df$date)

Now give each unique date a respective integer “day” value in a new data frame.

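# sort() keeps the day numbers in calendar order even if the rows are not
# already ordered by date (dates written as YYYY-MM-DD sort correctly as text)
uniq_dates <- sort(unique(df$date))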

day_integers <- data.frame(
  date = uniq_dates,
  day = seq_along(uniq_dates)
)

day_integers$date <- as.character(day_integers$date)

Finally, merge the new day_integers data frame with the original so that we have easy numbers for plotting.

library(dplyr)

plot_df <- left_join(df, day_integers, by = "date")

plot_df
   id       date stress day
1 bob 2019-01-01      4   1
2 joe 2019-01-01      5   1
3 sam 2019-01-01      6   1
4 bob 2019-01-02      6   2
5 joe 2019-01-02      5   2
6 bob 2019-01-03      4   3
7 bob 2019-01-04      5   4
8 joe 2019-01-04      6   4
9 sam 2019-01-04      7   4
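
With the integer day column in place, a plot of stress over time might look something like the sketch below. It uses base R graphics only; the axis labels and the one-line-per-person layout are illustrative choices, and the data frame is retyped here so the block runs on its own.

```r
# The merged data from above, retyped so this block is self-contained
plot_df <- data.frame(
  id = c("bob", "joe", "sam", "bob", "joe", "bob", "bob", "joe", "sam"),
  day = c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L),
  stress = c(4, 5, 6, 6, 5, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# One tick per observed day
day_ticks <- sort(unique(plot_df$day))

# Set up an empty plot, then draw the x-axis with integer day ticks
plot(plot_df$day, plot_df$stress, type = "n",
     xaxt = "n", xlab = "Day", ylab = "Stress")
axis(1, at = day_ticks)

# Draw one line (with points) per person
for (person in unique(plot_df$id)) {
  person_rows <- plot_df[plot_df$id == person, ]
  lines(person_rows$day, person_rows$stress, type = "b")
}
```

Days on which a person did not report are simply skipped, so that person's line connects the neighboring observations.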

One additional note. It can be instructive to see the inefficient way to get the same result using a for-loop. Here is unevaluated code for the for-loop equivalent of the steps above, where original_df stands for the data frame before the “day” column is added.

# take unique date
# which rows match
# increase counter by 1
# plug in counter to those rows

time_vec <- numeric(nrow(original_df))
unique_dates <- unique(original_df$date)

counter <- 0

for (i in seq_along(unique_dates)) {
  
  # take unique date
  
  datey <- unique_dates[i]
  
  # which rows match this date?
  
  use_rows <- which(original_df$date == datey)
  
  # increase counter
  
  counter <- counter + 1
  
  # plug in counter in time vec
  
  time_vec[use_rows] <- counter
  
}

original_df$day <- time_vec
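
The loop above looks up, for each row, where its date falls among the unique dates. Base R's match() does that same lookup in a single call, so a compact version might be sketched as below. The data frame is retyped here so the block runs on its own, with dates stored as character strings as in the earlier steps.

```r
# Example data, retyped so this block is self-contained
original_df <- data.frame(
  id = c("bob", "joe", "sam", "bob", "joe", "bob", "bob", "joe", "sam"),
  date = c(
    "2019-01-01", "2019-01-01", "2019-01-01",
    "2019-01-02", "2019-01-02",
    "2019-01-03",
    "2019-01-04", "2019-01-04", "2019-01-04"
  ),
  stress = c(4, 5, 6, 6, 5, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# For each row's date, match() returns its position in the vector of unique dates.
# unique() keeps dates in order of first appearance, just like the loop's counter.
original_df$day <- match(original_df$date, unique(original_df$date))

original_df$day
# 1 1 1 2 2 3 4 4 4
```

Like the loop, this numbers dates in the order they first appear, so it assumes the rows are already sorted by date.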

Bo²m =)