I begin with an intercept-only model in a latent change framework and then build to a full dual change model. SEM images in this post come from a lecture by Amy Nuttall. Two notes about the models and code below. First, the initial models will not fit well because they are too simple. The DGP uses both constant and proportion change (hence, “dual-change”), whereas the first few models only estimate an intercept. Second, I use the sem rather than the growth command in lavaan because it forces me to specify the entire model. I do not like using commands that apply automatic constraints for me. If you rely on them, you are much more likely to make a mistake or not know what your model is doing.
The underlying DGP will be the same throughout this exercise. Consistent with Ghisletta and McArdle, 2012, we have:
\[\begin{equation} y_t = \alpha*b_1 + (1 + b_2)*y_{t-1} \end{equation}\]
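where \(b_1\) is the constant change (similar to the “slope” term in a basic growth model; in latent change frameworks it is called the “change factor”), \(\alpha\) is the basis coefficient on the constant change (fixed to 1 throughout this post), and \(b_2\) is the proportion change, or the change from point to point as a proportion of the previous score. The values specified in the DGP are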
\[\begin{equation} y_t = 1*0.3 + (1 + -0.4)*y_{t-1} \end{equation}\]
where \(b_1\) is equal to 0.3 and \(b_2\) is equal to -0.4. Let’s generate data for 500 people across six time points.
library(dplyr)    # %>% and filter()
library(ggplot2)  # plotting

constant <- 0.3     # b_1, constant change
proportion <- -0.4  # b_2, proportion change
people <- 500
time <- 6

# long-format container: one row per person per time point
df <- matrix(NA_real_, nrow = people * time, ncol = 3)
count <- 0
for (i in 1:people) {
  # stable person-specific effect, added at every time point
  y_het <- rnorm(1, 0, 2)
  for (j in 1:time) {
    count <- count + 1
    df[count, 1] <- i
    df[count, 2] <- j
    if (j == 1) {
      # first observation: person effect plus noise
      df[count, 3] <- y_het + rnorm(1, 0, 1)
    } else {
      # dual change recursion on the previous observed score
      df[count, 3] <- 1 * constant + (1 + proportion) * df[count - 1, 3] + y_het + rnorm(1, 0, 1)
    }
  }
}
df <- data.frame(df)
names(df) <- c('id', 'time', 'y')

# highlight five randomly chosen people against everyone else
random_ids <- sample(1:people, 5)
sample_df <- df %>%
  filter(id %in% random_ids)

ggplot(df, aes(x = time, y = y, group = id)) +
  geom_point(color = 'grey85') +
  geom_line(color = 'grey85') +
  geom_point(data = sample_df, aes(x = time, y = y, group = id)) +
  geom_line(data = sample_df, aes(x = time, y = y, group = id))
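To see what the recursion in the DGP equation does on its own, here is a minimal sketch that iterates it without the person-specific term or the noise that the simulation loop adds. The starting value of 0 and the names y_expected and fixed_point are my assumptions for illustration.

```r
# Noise-free version of y_t = 1*b_1 + (1 + b_2)*y_{t-1}
constant <- 0.3     # b_1
proportion <- -0.4  # b_2
time <- 6

y_expected <- numeric(time)
y_expected[1] <- 0  # assumed starting value, for illustration only
for (t in 2:time) {
  y_expected[t] <- 1 * constant + (1 + proportion) * y_expected[t - 1]
}

# level at which the recursion stops changing: y = b_1 / (-b_2)
fixed_point <- constant / (-proportion)

print(round(y_expected, 3))
print(fixed_point)
```

The values rise toward the fixed point at a decreasing rate, which is the kind of curvilinear pattern a negative proportion change produces.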
Change the data to wide format and load lavaan before we start modeling.
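The reshaping code is not shown here. The y.1 through y.6 column names in the output below match what base R's reshape produces, so here is a minimal sketch under that assumption, using a made-up three-wave data frame (df_long is a hypothetical name standing in for df).

```r
# Toy long-format data with the same column names as df (values are made up)
df_long <- data.frame(
  id   = rep(1:2, each = 3),
  time = rep(1:3, times = 2),
  y    = c(0.1, 0.4, 0.6, -1.2, -0.9, -0.5)
)

# One row per person; y becomes y.1, y.2, ... (one column per time point)
df_wide <- reshape(df_long, idvar = 'id', timevar = 'time', direction = 'wide')

print(names(df_wide))
# library(lavaan) would be loaded at this point, before fitting any model
```

Applied to the simulated df, the same call would give one row per person and six outcome columns.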
Similar to the intercept-only model in a “non-latent change” framework (i.e., a simple growth model), the intercept-only model here contains a latent variable over the first observation.
There are six observations of \(y\) and each is predicted by its latent “true score.” The first true score term is regressed on a latent intercept. The other true scores are regressed on additional latent variables that represent latent change. Nothing predicts those latent change scores yet, and their means and variances are fixed to zero, so they do not do anything in this model. The autoregressive paths from true score to true score are constrained to 1. Here is how we estimate it.
int_only_string <- [1553 chars quoted with ''']
int_only_model <- sem(int_only_string, data = df_wide)
summary(int_only_model, fit.measures = TRUE)
lavaan 0.6-9 ended normally after 17 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 8
Number of equality constraints 5
Number of observations 500
Model Test User Model:
Test statistic 2665.363
Degrees of freedom 24
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 6704.702
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.605
Tucker-Lewis Index (TLI) 0.753
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -6485.615
Loglikelihood unrestricted model (H1) -5152.933
Akaike (AIC) 12977.230
Bayesian (BIC) 12989.873
Sample-size adjusted Bayesian (BIC) 12980.351
Root Mean Square Error of Approximation:
RMSEA 0.469
90 Percent confidence interval - lower 0.454
90 Percent confidence interval - upper 0.484
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.611
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
l_y1 =~
y.1 1.000
l_y2 =~
y.2 1.000
l_y3 =~
y.3 1.000
l_y4 =~
y.4 1.000
l_y5 =~
y.5 1.000
l_y6 =~
y.6 1.000
lc_y2 =~
l_y2 1.000
lc_y3 =~
l_y3 1.000
lc_y4 =~
l_y4 1.000
lc_y5 =~
l_y5 1.000
lc_y6 =~
l_y6 1.000
latent_intercept =~
l_y1 1.000
Regressions:
Estimate Std.Err z-value P(>|z|)
l_y2 ~
l_y1 1.000
l_y3 ~
l_y2 1.000
l_y4 ~
l_y3 1.000
l_y5 ~
l_y4 1.000
l_y6 ~
l_y5 1.000
Covariances:
Estimate Std.Err z-value P(>|z|)
lc_y2 ~~
lc_y3 0.000
lc_y4 0.000
lc_y5 0.000
lc_y6 0.000
lc_y3 ~~
lc_y4 0.000
lc_y5 0.000
lc_y6 0.000
lc_y4 ~~
lc_y5 0.000
lc_y6 0.000
lc_y5 ~~
lc_y6 0.000
lc_y2 ~~
latent_intrcpt 0.000
lc_y3 ~~
latent_intrcpt 0.000
lc_y4 ~~
latent_intrcpt 0.000
lc_y5 ~~
latent_intrcpt 0.000
lc_y6 ~~
latent_intrcpt 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
latent_intrcpt 0.083 0.186 0.445 0.656
.l_y1 0.000
.l_y2 0.000
.l_y3 0.000
.l_y4 0.000
.l_y5 0.000
.l_y6 0.000
lc_y2 0.000
lc_y3 0.000
lc_y4 0.000
lc_y5 0.000
lc_y6 0.000
.y.1 0.000
.y.2 0.000
.y.3 0.000
.y.4 0.000
.y.5 0.000
.y.6 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
ltnt_nt 16.981 1.099 15.454 0.000
.l_y1 0.000
.l_y2 0.000
.l_y3 0.000
.l_y4 0.000
.l_y5 0.000
.l_y6 0.000
lc_y2 0.000
lc_y3 0.000
lc_y4 0.000
lc_y5 0.000
lc_y6 0.000
.y.1 (rs_v) 2.348 0.066 35.355 0.000
.y.2 (rs_v) 2.348 0.066 35.355 0.000
.y.3 (rs_v) 2.348 0.066 35.355 0.000
.y.4 (rs_v) 2.348 0.066 35.355 0.000
.y.5 (rs_v) 2.348 0.066 35.355 0.000
.y.6 (rs_v) 2.348 0.066 35.355 0.000
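The model string itself is elided in this copy of the post. As a rough guide, here is a sketch of syntax that would produce the same structure as the parameter table just printed: the same fixed loadings and paths, the rs_v label on the residual variances, and three free parameters (latent intercept mean, latent intercept variance, and the shared residual variance). The helper build_int_only is hypothetical and is my reconstruction from the output, not the string that was actually used, and I have not verified that it reproduces the fit statistics.

```r
# Hypothetical helper (not from the original post): builds lavaan syntax
# for the intercept-only latent change model over n_waves observations.
build_int_only <- function(n_waves = 6) {
  stopifnot(n_waves >= 3)
  t_all <- 1:n_waves
  t_chg <- 2:n_waves
  pairs <- combn(t_chg, 2)  # every pair of change scores
  lines <- c(
    # each observation loads on its own latent true score
    sprintf('l_y%d =~ 1*y.%d', t_all, t_all),
    # each later true score loads on its latent change score
    sprintf('lc_y%d =~ 1*l_y%d', t_chg, t_chg),
    # autoregressive paths between true scores fixed to 1
    sprintf('l_y%d ~ 1*l_y%d', t_chg, t_chg - 1L),
    # latent intercept over the first true score: free mean and variance
    'latent_intercept =~ 1*l_y1',
    'latent_intercept ~ 1',
    'latent_intercept ~~ latent_intercept',
    # true scores and change scores: means and variances fixed to 0
    sprintf('l_y%d ~ 0*1', t_all),
    sprintf('l_y%d ~~ 0*l_y%d', t_all, t_all),
    sprintf('lc_y%d ~ 0*1', t_chg),
    sprintf('lc_y%d ~~ 0*lc_y%d', t_chg, t_chg),
    # change scores unrelated to the intercept and to each other
    sprintf('lc_y%d ~~ 0*latent_intercept', t_chg),
    sprintf('lc_y%d ~~ 0*lc_y%d', pairs[1, ], pairs[2, ]),
    # observed scores: intercepts at 0, one shared residual variance
    sprintf('y.%d ~ 0*1', t_all),
    sprintf('y.%d ~~ rs_v*y.%d', t_all, t_all)
  )
  paste(lines, collapse = '\n')
}

cat(build_int_only(6))
```

Something like sem(build_int_only(6), data = df_wide) would then fit it, assuming lavaan is loaded and df_wide has columns y.1 through y.6.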
Now we include the proportion change along with the latent intercept.
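Written out, and reading the structure off the parameter table below, the proportion change model says the following. This is a sketch in my own notation, where \(\Delta y_t\) stands for the latent change score at time \(t\):

\[\begin{equation} \Delta y_t = b_2*y_{t-1} \end{equation}\]
\[\begin{equation} y_t = y_{t-1} + \Delta y_t = (1 + b_2)*y_{t-1} \end{equation}\]

This is the DGP equation with the constant change term dropped, and a single \(b_2\) is shared across all time points.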
proportion_string <- [1774 chars quoted with ''']
proportion_model <- sem(proportion_string, data = df_wide)
summary(proportion_model, fit.measures = TRUE)
lavaan 0.6-9 ended normally after 25 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 13
Number of equality constraints 9
Number of observations 500
Model Test User Model:
Test statistic 1073.805
Degrees of freedom 23
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 6704.702
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.843
Tucker-Lewis Index (TLI) 0.898
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -5689.835
Loglikelihood unrestricted model (H1) -5152.933
Akaike (AIC) 11387.671
Bayesian (BIC) 11404.529
Sample-size adjusted Bayesian (BIC) 11391.833
Root Mean Square Error of Approximation:
RMSEA 0.302
90 Percent confidence interval - lower 0.287
90 Percent confidence interval - upper 0.318
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.219
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
l_y1 =~
y.1 1.000
l_y2 =~
y.2 1.000
l_y3 =~
y.3 1.000
l_y4 =~
y.4 1.000
l_y5 =~
y.5 1.000
l_y6 =~
y.6 1.000
lc_y2 =~
l_y2 1.000
lc_y3 =~
l_y3 1.000
lc_y4 =~
l_y4 1.000
lc_y5 =~
l_y5 1.000
lc_y6 =~
l_y6 1.000
latent_intercept =~
l_y1 1.000
Regressions:
Estimate Std.Err z-value P(>|z|)
l_y2 ~
l_y1 1.000
l_y3 ~
l_y2 1.000
l_y4 ~
l_y3 1.000
l_y5 ~
l_y4 1.000
l_y6 ~
l_y5 1.000
lc_y2 ~
l_y1 (b2) 0.140 0.003 41.297 0.000
lc_y3 ~
l_y2 (b2) 0.140 0.003 41.297 0.000
lc_y4 ~
l_y3 (b2) 0.140 0.003 41.297 0.000
lc_y5 ~
l_y4 (b2) 0.140 0.003 41.297 0.000
lc_y6 ~
l_y5 (b2) 0.140 0.003 41.297 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
.lc_y2 ~~
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.lc_y3 ~~
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.lc_y4 ~~
.lc_y5 0.000
.lc_y6 0.000
.lc_y5 ~~
.lc_y6 0.000
.lc_y2 ~~
latent_intrcpt 0.000
.lc_y3 ~~
latent_intrcpt 0.000
.lc_y4 ~~
latent_intrcpt 0.000
.lc_y5 ~~
latent_intrcpt 0.000
.lc_y6 ~~
latent_intrcpt 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
latent_intrcpt 0.071 0.131 0.544 0.586
.l_y1 0.000
.l_y2 0.000
.l_y3 0.000
.l_y4 0.000
.l_y5 0.000
.l_y6 0.000
.lc_y2 0.000
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.y.1 0.000
.y.2 0.000
.y.3 0.000
.y.4 0.000
.y.5 0.000
.y.6 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
ltnt_nt 8.505 0.568 14.969 0.000
.l_y1 0.000
.l_y2 0.000
.l_y3 0.000
.l_y4 0.000
.l_y5 0.000
.l_y6 0.000
.lc_y2 0.000
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.y.1 (rs_v) 1.230 0.035 35.355 0.000
.y.2 (rs_v) 1.230 0.035 35.355 0.000
.y.3 (rs_v) 1.230 0.035 35.355 0.000
.y.4 (rs_v) 1.230 0.035 35.355 0.000
.y.5 (rs_v) 1.230 0.035 35.355 0.000
.y.6 (rs_v) 1.230 0.035 35.355 0.000
The constant change model is nearly identical to the basic linear growth curve model; it simply embeds it in the latent change framework. The basis coefficients from the constant change term to the latent change scores are constrained to one, and then we estimate the mean of the constant change.
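To make the link to linear growth explicit, here is the algebra as I read it off the parameter table below (\(\Delta y_t\) for the latent change score is my notation). With the basis coefficient fixed to one, every change score equals the constant change:

\[\begin{equation} \Delta y_t = 1*b_1 \end{equation}\]

Adding up the changes from the first time point gives

\[\begin{equation} y_t = y_1 + (t - 1)*b_1 \end{equation}\]

which is a straight line in time with intercept \(y_1\) and slope \(b_1\), both of which vary across people in this model.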
constant_change_string <- [2016 chars quoted with ''']
constant_change_model <- sem(constant_change_string, data = df_wide)
summary(constant_change_model, fit.measures = TRUE)
lavaan 0.6-9 ended normally after 33 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 11
Number of equality constraints 5
Number of observations 500
Model Test User Model:
Test statistic 768.956
Degrees of freedom 21
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 6704.702
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.888
Tucker-Lewis Index (TLI) 0.920
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -5537.411
Loglikelihood unrestricted model (H1) -5152.933
Akaike (AIC) 11086.822
Bayesian (BIC) 11112.110
Sample-size adjusted Bayesian (BIC) 11093.065
Root Mean Square Error of Approximation:
RMSEA 0.267
90 Percent confidence interval - lower 0.251
90 Percent confidence interval - upper 0.283
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.162
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
l_y1 =~
y.1 1.000
l_y2 =~
y.2 1.000
l_y3 =~
y.3 1.000
l_y4 =~
y.4 1.000
l_y5 =~
y.5 1.000
l_y6 =~
y.6 1.000
lc_y2 =~
l_y2 1.000
lc_y3 =~
l_y3 1.000
lc_y4 =~
l_y4 1.000
lc_y5 =~
l_y5 1.000
lc_y6 =~
l_y6 1.000
latent_intercept =~
l_y1 1.000
latent_slope =~
lc_y2 1.000
lc_y3 1.000
lc_y4 1.000
lc_y5 1.000
lc_y6 1.000
Regressions:
Estimate Std.Err z-value P(>|z|)
l_y2 ~
l_y1 1.000
l_y3 ~
l_y2 1.000
l_y4 ~
l_y3 1.000
l_y5 ~
l_y4 1.000
l_y6 ~
l_y5 1.000
Covariances:
Estimate Std.Err z-value P(>|z|)
latent_intercept ~~
latent_slope 1.449 0.103 14.059 0.000
.lc_y2 ~~
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.lc_y3 ~~
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.lc_y4 ~~
.lc_y5 0.000
.lc_y6 0.000
.lc_y5 ~~
.lc_y6 0.000
.lc_y2 ~~
latent_intrcpt 0.000
.lc_y3 ~~
latent_intrcpt 0.000
.lc_y4 ~~
latent_intrcpt 0.000
.lc_y5 ~~
latent_intrcpt 0.000
.lc_y6 ~~
latent_intrcpt 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
latent_intrcpt -0.080 0.127 -0.631 0.528
latent_slope 0.065 0.030 2.181 0.029
.l_y1 0.000
.l_y2 0.000
.l_y3 0.000
.l_y4 0.000
.l_y5 0.000
.l_y6 0.000
.lc_y2 0.000
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.y.1 0.000
.y.2 0.000
.y.3 0.000
.y.4 0.000
.y.5 0.000
.y.6 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
ltnt_nt 7.517 0.508 14.810 0.000
ltnt_sl 0.392 0.028 13.839 0.000
.l_y1 0.000
.l_y2 0.000
.l_y3 0.000
.l_y4 0.000
.l_y5 0.000
.l_y6 0.000
.lc_y2 0.000
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.y.1 (rs_v) 0.962 0.030 31.623 0.000
.y.2 (rs_v) 0.962 0.030 31.623 0.000
.y.3 (rs_v) 0.962 0.030 31.623 0.000
.y.4 (rs_v) 0.962 0.030 31.623 0.000
.y.5 (rs_v) 0.962 0.030 31.623 0.000
.y.6 (rs_v) 0.962 0.030 31.623 0.000
Now a full dual change model with both constant and proportion change parameters.
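Going by the parameter table printed further down, the dual change model adds two pieces to the intercept-only structure: a constant change factor loading on every change score, and a shared regression of each change score on the previous true score. Here is a sketch of lavaan lines that would express those additions. The helper dual_change_additions is hypothetical, it is my reconstruction rather than the string actually used, and it leaves out the lines that fix other parameters to zero.

```r
# Hypothetical helper (not from the original post): the lavaan lines that a
# dual change model adds on top of the intercept-only structure.
dual_change_additions <- function(n_waves = 6) {
  t_chg <- 2:n_waves
  c(
    # constant change: one factor loading on every change score at 1
    paste('latent_slope =~',
          paste(sprintf('1*lc_y%d', t_chg), collapse = ' + ')),
    # free mean and variance, plus covariance with the latent intercept
    'latent_slope ~ 1',
    'latent_slope ~~ latent_slope',
    'latent_intercept ~~ latent_slope',
    # proportion change: each change score regressed on the previous
    # true score, with a single shared coefficient labelled b
    sprintf('lc_y%d ~ b*l_y%d', t_chg, t_chg - 1L)
  )
}

cat(dual_change_additions(6), sep = '\n')
```

With these lines in place the free parameters would be the two latent means, the two latent variances, their covariance, the proportion coefficient b, and the shared residual variance.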
dual_c_string <- [2128 chars quoted with ''']
dual_change_model <- sem(dual_c_string, data = df_wide)
summary(dual_change_model, fit.measures = TRUE)
lavaan 0.6-9 ended normally after 50 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 16
Number of equality constraints 9
Number of observations 500
Model Test User Model:
Test statistic 356.288
Degrees of freedom 20
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 6704.702
Degrees of freedom 15
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.950
Tucker-Lewis Index (TLI) 0.962
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -5331.077
Loglikelihood unrestricted model (H1) -5152.933
Akaike (AIC) 10676.154
Bayesian (BIC) 10705.657
Sample-size adjusted Bayesian (BIC) 10683.438
Root Mean Square Error of Approximation:
RMSEA 0.183
90 Percent confidence interval - lower 0.167
90 Percent confidence interval - upper 0.200
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.023
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|)
l_y1 =~
y.1 1.000
l_y2 =~
y.2 1.000
l_y3 =~
y.3 1.000
l_y4 =~
y.4 1.000
l_y5 =~
y.5 1.000
l_y6 =~
y.6 1.000
lc_y2 =~
l_y2 1.000
lc_y3 =~
l_y3 1.000
lc_y4 =~
l_y4 1.000
lc_y5 =~
l_y5 1.000
lc_y6 =~
l_y6 1.000
latent_intercept =~
l_y1 1.000
latent_slope =~
lc_y2 1.000
lc_y3 1.000
lc_y4 1.000
lc_y5 1.000
lc_y6 1.000
Regressions:
Estimate Std.Err z-value P(>|z|)
l_y2 ~
l_y1 1.000
l_y3 ~
l_y2 1.000
l_y4 ~
l_y3 1.000
l_y5 ~
l_y4 1.000
l_y6 ~
l_y5 1.000
lc_y2 ~
l_y1 (b) -0.379 0.015 -24.480 0.000
lc_y3 ~
l_y2 (b) -0.379 0.015 -24.480 0.000
lc_y4 ~
l_y3 (b) -0.379 0.015 -24.480 0.000
lc_y5 ~
l_y4 (b) -0.379 0.015 -24.480 0.000
lc_y6 ~
l_y5 (b) -0.379 0.015 -24.480 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
latent_intercept ~~
latent_slope 4.438 0.315 14.080 0.000
.lc_y2 ~~
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.lc_y3 ~~
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.lc_y4 ~~
.lc_y5 0.000
.lc_y6 0.000
.lc_y5 ~~
.lc_y6 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
latent_intrcpt -0.146 0.107 -1.368 0.171
latent_slope 0.093 0.095 0.977 0.328
.l_y1 0.000
.l_y2 0.000
.l_y3 0.000
.l_y4 0.000
.l_y5 0.000
.l_y6 0.000
.lc_y2 0.000
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.y.1 0.000
.y.2 0.000
.y.3 0.000
.y.4 0.000
.y.5 0.000
.y.6 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
ltnt_nt 5.126 0.369 13.889 0.000
ltnt_sl 4.478 0.387 11.564 0.000
.l_y1 0.000
.l_y2 0.000
.l_y3 0.000
.l_y4 0.000
.l_y5 0.000
.l_y6 0.000
.lc_y2 0.000
.lc_y3 0.000
.lc_y4 0.000
.lc_y5 0.000
.lc_y6 0.000
.y.1 (rs_v) 0.810 0.026 31.623 0.000
.y.2 (rs_v) 0.810 0.026 31.623 0.000
.y.3 (rs_v) 0.810 0.026 31.623 0.000
.y.4 (rs_v) 0.810 0.026 31.623 0.000
.y.5 (rs_v) 0.810 0.026 31.623 0.000
.y.6 (rs_v) 0.810 0.026 31.623 0.000
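The estimate of the proportion change (\(b_2\)) is -0.379, close to the generating value of -0.4. The estimated mean of the constant change (called “latent slope” in my string syntax; \(b_1\)) is 0.093 with a standard error of 0.095, so in this sample it lands below the generating value of 0.3 and is estimated imprecisely. Not bad overall.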
These models predict complex change patterns. It is difficult to know the curvilinear pattern a model implies without computing expected scores and plotting them. I did not do that here.
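For anyone who wants to try it, here is a minimal sketch of how the expected means could be computed from the dual change estimates printed above. The three numbers are copied from that summary; the variable names and the recursion on means are my own illustration rather than code from the original analysis.

```r
# Point estimates copied from the dual change summary printed above
intercept_mean <- -0.146  # mean of latent_intercept
slope_mean <- 0.093       # mean of latent_slope (constant change)
b <- -0.379               # proportion change
time <- 6

expected_y <- numeric(time)
expected_y[1] <- intercept_mean
for (t in 2:time) {
  # expected change = constant change + proportion of previous true score
  change <- slope_mean + b * expected_y[t - 1]
  expected_y[t] <- expected_y[t - 1] + change
}

expected_df <- data.frame(time = 1:time, y = expected_y)
print(expected_df)
# plot(expected_df$time, expected_df$y, type = 'b') would draw the curve
```

This only traces the mean trajectory. Person-specific curves would also need the latent intercept and slope variances and their covariance.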
Bo\(^2\)m =)