Longitudinal data sets often have hidden NAs when they are in long form. For example, in the data set below Zoe is missing on days 2 and 4, but it isn’t obvious because her missing days simply have no rows, so no explicit NA appears anywhere in the data.
time id q1 q2
1 1 Jac 4 3
2 1 Jess 5 2
3 1 Zoe 3 4
4 2 Jac 6 1
5 2 Jess 7 2
6 3 Jac 5 3
7 3 Jess 4 4
8 3 Zoe 3 2
9 4 Jac 4 3
10 4 Jess 5 4
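One quick way to confirm the gap before reshaping is to count rows per id and time. The sketch below rebuilds the example data by hand from the printout above (the post does not show how cd was created, so the construction is an assumption) and cross-tabulates it. The object name tab is only for illustration.

```r
# Rebuild the example data from the printed table (assumed construction).
cd <- data.frame(
  time = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  id   = c("Jac", "Jess", "Zoe", "Jac", "Jess", "Jac", "Jess", "Zoe", "Jac", "Jess"),
  q1   = c(4, 5, 3, 6, 7, 5, 4, 3, 4, 5),
  q2   = c(3, 2, 4, 1, 2, 3, 4, 2, 3, 4)
)

# Count observations for every id-by-time combination.
tab <- xtabs(~ id + time, data = cd)
tab

# Cells with a count of 0 are the hidden NAs.
which(tab == 0, arr.ind = TRUE)
```

Any zero in the table marks a person and day with no row at all in the long data.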
Usually I recommend cleaning within the tidyverse package, but in this case I prefer reshape. Change the data frame to wide and then back to long to reveal the hidden NA’s.
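wide_cd <- reshape(cd, timevar = 'time', idvar = 'id', direction = 'wide') # long -> wide: one row per id
cd_reveal <- reshape(wide_cd, timevar = 'time', idvar = 'id', direction = 'long') # wide -> long: every id/time pair gets a row
cd_reveal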
id time q1.1 q2.1
Jac.1 Jac 1 4 3
Jess.1 Jess 1 5 2
Zoe.1 Zoe 1 3 4
Jac.2 Jac 2 6 1
Jess.2 Jess 2 7 2
Zoe.2 Zoe 2 NA NA
Jac.3 Jac 3 5 3
Jess.3 Jess 3 4 4
Zoe.3 Zoe 3 3 2
Jac.4 Jac 4 4 3
Jess.4 Jess 4 5 4
Zoe.4 Zoe 4 NA NA
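The round trip leaves the value columns named q1.1 and q2.1 and adds row names such as Zoe.2. A minimal sketch of one way to tidy that up and list only the revealed rows follows; the data construction and the name missing_rows are assumptions for illustration, not part of the original post.

```r
# Rebuild the example data from the printed table (assumed construction).
cd <- data.frame(
  time = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  id   = c("Jac", "Jess", "Zoe", "Jac", "Jess", "Jac", "Jess", "Zoe", "Jac", "Jess"),
  q1   = c(4, 5, 3, 6, 7, 5, 4, 3, 4, 5),
  q2   = c(3, 2, 4, 1, 2, 3, 4, 2, 3, 4)
)

# Long -> wide -> long with base R reshape.
wide_cd   <- reshape(cd, timevar = "time", idvar = "id", direction = "wide")
cd_reveal <- reshape(wide_cd, timevar = "time", idvar = "id", direction = "long")

# Strip the trailing ".1" so the columns are q1 and q2 again.
names(cd_reveal) <- sub("\\.1$", "", names(cd_reveal))

# Replace row names like "Zoe.2" with plain row numbers.
rownames(cd_reveal) <- NULL

# Keep only the rows that contain at least one NA.
missing_rows <- cd_reveal[!complete.cases(cd_reveal), ]
missing_rows
```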
It is possible to do all of this within tidyverse, but it’s tricky because the spread command only applies to one column (the value parameter only takes one entry), so any time your data frame contains multiple columns to spread over (almost always the case), spread does not work well.
cd %>% spread(key = time, value = q1)
id q2 1 2 3 4
1 Jac 1 NA 6 NA NA
2 Jac 3 4 NA 5 4
3 Jess 2 5 7 NA NA
4 Jess 4 NA NA 4 5
5 Zoe 2 NA NA 3 NA
6 Zoe 4 3 NA NA NA
Notice how it only spread q1; q2 was left behind as an identifying column, so each id is split across several rows. The proper way to go from long to wide and then back to long to reveal the NA’s using tidyverse is either of the following:
cd %>%
  select(time, id, q1) %>%
  spread(key = time, value = q1) %>%
  gather(key = time, value = 'q1', '1', '2', '3', '4') # the time columns must be named as quoted strings
id time q1
1 Jac 1 4
2 Jess 1 5
3 Zoe 1 3
4 Jac 2 6
5 Jess 2 7
6 Zoe 2 NA
7 Jac 3 5
8 Jess 3 4
9 Zoe 3 3
10 Jac 4 4
11 Jess 4 5
12 Zoe 4 NA
time_string <- as.character(unique(cd$time))
cd %>%
  select(time, id, q1) %>%
  spread(key = time, value = q1) %>%
  gather(key = time, value = 'q1', all_of(time_string)) # no hand-typed strings needed: the column names come from time_string
id time q1
1 Jac 1 4
2 Jess 1 5
3 Zoe 1 3
4 Jac 2 6
5 Jess 2 7
6 Zoe 2 NA
7 Jac 3 5
8 Jess 3 4
9 Zoe 3 3
10 Jac 4 4
11 Jess 4 5
12 Zoe 4 NA
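Both pipelines above carry only q1 through the round trip. In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer, which accept several value columns at once. The following is a sketch under that version assumption, not the approach used in this post; the data construction is rebuilt by hand from the printout.

```r
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_wider() / pivot_longer()

# Rebuild the example data from the printed table (assumed construction).
cd <- data.frame(
  time = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  id   = c("Jac", "Jess", "Zoe", "Jac", "Jess", "Jac", "Jess", "Zoe", "Jac", "Jess"),
  q1   = c(4, 5, 3, 6, 7, 5, 4, 3, 4, 5),
  q2   = c(3, 2, 4, 1, 2, 3, 4, 2, 3, 4)
)

# Long -> wide: creates q1_1 ... q1_4 and q2_1 ... q2_4, one row per id.
wide_cd <- pivot_wider(cd, names_from = time, values_from = c(q1, q2))

# Wide -> long: split "q1_2" into the value column (q1) and the time (2).
cd_reveal <- pivot_longer(
  wide_cd,
  cols      = -id,
  names_to  = c(".value", "time"),
  names_sep = "_"
)
cd_reveal
```

Note that time comes back as a character column here, so convert it with as.numeric if later steps need a number.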
Again, I prefer reshape because the spread commands in tidyverse are not easy to read.
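If readability is the main concern and you still want to stay in tidyverse, tidyr also has a complete function that fills in every missing combination of the columns you name, without a wide step at all. A minimal sketch, again with the data rebuilt by hand as an assumption:

```r
library(tidyr)

# Rebuild the example data from the printed table (assumed construction).
cd <- data.frame(
  time = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  id   = c("Jac", "Jess", "Zoe", "Jac", "Jess", "Jac", "Jess", "Zoe", "Jac", "Jess"),
  q1   = c(4, 5, 3, 6, 7, 5, 4, 3, 4, 5),
  q2   = c(3, 2, 4, 1, 2, 3, 4, 2, 3, 4)
)

# Add a row for every id-by-time pair; absent pairs get NA in q1 and q2.
cd_reveal <- complete(cd, id, time)
cd_reveal
```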
Bo\(^2\)m =)