Consider the notion of incremental validity: the claim that a predictor explains variance in an outcome over and above a set of control variables.
The issue is that measurement error in the control variables can lead to spurious inferences of incremental validity. To be confident that an incremental validity argument is sound, one needs to either ensure perfect measurement reliability or formally account for unreliability in one’s model.
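Incremental validity is commonly assessed with a hierarchical (nested-model) regression: fit the outcome on the controls, add the candidate predictor, and test the increase in R². A minimal R sketch of that test on simulated data follows; the variable names `c_var`, `x`, and `y` and all coefficients are hypothetical, and here the control is measured without error.

```r
# Hypothetical simulated data: x has a genuine unique effect on y beyond c_var.
set.seed(1)
n <- 1000
c_var <- rnorm(n)                        # control variable (no measurement error)
x <- 0.5 * c_var + rnorm(n)              # candidate predictor, correlated with the control
y <- 0.6 * c_var + 0.3 * x + rnorm(n)    # outcome

base_model <- lm(y ~ c_var)              # step 1: controls only
full_model <- lm(y ~ c_var + x)          # step 2: controls + candidate predictor

# F test of the R^2 increment contributed by x
comparison <- anova(base_model, full_model)
delta_r2 <- summary(full_model)$r.squared - summary(base_model)$r.squared
print(comparison)
print(delta_r2)
```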
Suppose heat is a common cause of swimming pool deaths and ice cream sales.
If I regress ice cream sales on swimming pool deaths, I (spuriously) conclude that swimming pool deaths predict ice cream sales.
                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         -0.34        0.89    -0.39      0.70
swimmingdeaths       0.86        0.03    33.54      0.00
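The original data are not shown, so here is a minimal R sketch of one data-generating process consistent with this setup. The sample size, effect sizes, and noise levels are assumptions; only the qualitative pattern (a strong bivariate association created entirely by heat) is meant to match.

```r
# Assumed data-generating process: heat causes both outcomes,
# and there is no direct path from swimmingdeaths to creamsales.
set.seed(2)
n <- 1000
heat <- rnorm(n, mean = 25, sd = 5)
swimmingdeaths <- 0.8 * heat + rnorm(n, sd = 3)
creamsales <- 0.5 * heat + rnorm(n, sd = 3)

# Bivariate regression: the confounded association shows up as a "predictor"
naive <- lm(creamsales ~ swimmingdeaths)
summary(naive)$coefficients
```

Under these assumed values the population slope is cov(deaths, sales) / var(deaths) = (0.8 × 0.5 × 25) / (0.8² × 25 + 3²) = 10 / 25 = 0.4, even though swimming deaths have no effect on sales.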
If instead I control for heat, the common cause, the relationship between swimming pool deaths and ice cream sales goes away (estimate 0.13, p = 0.33).
                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         -0.34        0.77    -0.44      0.66
swimmingdeaths       0.13        0.13     0.97      0.33
heat                 0.44        0.08     5.76      0.00
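Continuing the same assumed simulation (values are illustrative, not the post's data), adding the true, perfectly measured common cause to the model should drive the swimming-deaths slope to roughly zero.

```r
# Same assumed data-generating process as the naive example
set.seed(2)
n <- 1000
heat <- rnorm(n, mean = 25, sd = 5)
swimmingdeaths <- 0.8 * heat + rnorm(n, sd = 3)
creamsales <- 0.5 * heat + rnorm(n, sd = 3)

# Controlling for the true common cause removes the spurious association
controlled <- lm(creamsales ~ swimmingdeaths + heat)
summary(controlled)$coefficients
```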
What if heat is measured subjectively, and therefore with error?
Now, even when I control for heat perceptions, the spurious relationship between swimming pool deaths and ice cream sales returns (estimate 0.45, p < .01).
                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)         -0.92        0.82    -1.12      0.27
swimmingdeaths       0.45        0.09     4.90      0.00
heatperceptions      0.25        0.06     4.48      0.00
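Why does the spurious effect come back? Controlling for a proxy with reliability ρ < 1 removes only part of the confounding. Under the simple linear model assumed in the sketch below (heat → deaths with slope a and residual variance σ²ₓ, heat → sales with slope b, perceptions = heat + independent noise), the population slope on swimming deaths works out to a·b·σ²_heat·(1 − ρ) / (a²·σ²_heat·(1 − ρ) + σ²ₓ). It is zero only when ρ = 1 and equals the naive bivariate slope when ρ = 0. The parameter values and the helper `spurious_slope()` are hypothetical.

```r
# Assumed data-generating process (all values illustrative)
set.seed(3)
n <- 1000
a <- 0.8                     # heat -> swimmingdeaths slope
b <- 0.5                     # heat -> creamsales slope
sd_heat <- 5                 # SD of true heat
sd_x <- 3                    # residual SD of swimmingdeaths
sd_y <- 3                    # residual SD of creamsales
sd_error <- 5                # measurement-error SD of heat perceptions

heat <- rnorm(n, mean = 25, sd = sd_heat)
swimmingdeaths <- a * heat + rnorm(n, sd = sd_x)
creamsales <- b * heat + rnorm(n, sd = sd_y)
heatperceptions <- heat + rnorm(n, sd = sd_error)    # unreliable measure of heat

proxy_model <- lm(creamsales ~ swimmingdeaths + heatperceptions)

# Hypothetical helper: population slope on swimmingdeaths when the control
# has reliability rho = var(heat) / var(heatperceptions)
spurious_slope <- function(a, b, var_heat, var_x, rho) {
  a * b * var_heat * (1 - rho) / (a^2 * var_heat * (1 - rho) + var_x)
}

rho <- sd_heat^2 / (sd_heat^2 + sd_error^2)               # 0.5 with these values
expected <- spurious_slope(a, b, sd_heat^2, sd_x^2, rho)  # 5 / 17, about 0.294

coef(proxy_model)[["swimmingdeaths"]]   # sample estimate, close to `expected`
expected
```

With these values, `spurious_slope(a, b, 25, 9, 1)` gives 0 (perfect measurement) and `spurious_slope(a, b, 25, 9, 0)` gives 0.4, the bivariate slope.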
What is the solution? Use structural equation modeling (SEM) to account for measurement error.
Let’s assume that heat is measured with three subjective indicators, hp1, hp2, and hp3.
In the SEM, these perceptions are caused by a latent heat factor, and ice cream sales are regressed on swimming pool deaths and the latent heat factor.
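library(lavaan)

# Measurement model: three heat-perception items load on a latent heat factor.
# fl1-fl3 are labels; by default lavaan fixes the first loading to 1 to set the scale.
# Structural model: ice cream sales regressed on swimming deaths and latent heat.
modelstring <- '
latentheat =~ fl1*hp1 + fl2*hp2 + fl3*hp3
creamsales ~ b1*swimmingdeaths + b2*latentheat
'
# df is assumed to contain hp1, hp2, hp3, swimmingdeaths, and creamsales
model <- sem(modelstring, data = df)
summary(model)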
lavaan 0.6-9 ended normally after 119 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         9

  Number of observations                           100

Model Test User Model:

  Test statistic                               333.675
  Degrees of freedom                                 5
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                     Estimate  Std.Err  z-value  P(>|z|)
  latentheat =~
    hp1        (fl1)    1.000
    hp2        (fl2)    0.474    0.019   25.072    0.000
    hp3        (fl3)    1.685    0.033   51.029    0.000

Regressions:
                     Estimate  Std.Err  z-value  P(>|z|)
  creamsales ~
    swmmngdth   (b1)    0.280    0.023   12.330    0.000
    latenthet   (b2)    0.581    0.024   24.262    0.000

Variances:
                     Estimate  Std.Err  z-value  P(>|z|)
   .hp1                 7.979    1.917    4.163    0.000
   .hp2                 8.834    1.322    6.681    0.000
   .hp3                 8.023    4.497    1.784    0.074
   .creamsales         14.383    2.142    6.714    0.000
    latentheat        300.071   43.578    6.886    0.000
Why is the relationship between swimming pool deaths and ice cream sales still significant?
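One likely reason can be read off the output above: the model has 9 free parameters (two loadings, two regression slopes, five variances) and no Covariances section, so latentheat is treated as uncorrelated with swimmingdeaths. That contradicts the premise that heat causes swimming pool deaths, so the latent factor cannot absorb the confounding, and the chi-square of 333.675 on 5 degrees of freedom signals severe misfit. A minimal sketch of a respecified model adds the path `swimmingdeaths ~ latentheat`; the simulated data frame `df_sim`, its loadings, and its noise levels are hypothetical, not the post's data.

```r
library(lavaan)

# Hypothetical simulated data: heat causes both outcomes and is observed only
# through three noisy perception items (loadings and noise SDs are assumptions).
set.seed(4)
n <- 1000
heat <- rnorm(n, mean = 0, sd = 5)
df_sim <- data.frame(
  hp1 = 1.0 * heat + rnorm(n, sd = 3),
  hp2 = 0.5 * heat + rnorm(n, sd = 3),
  hp3 = 1.5 * heat + rnorm(n, sd = 3),
  swimmingdeaths = 0.8 * heat + rnorm(n, sd = 3),
  creamsales = 0.5 * heat + rnorm(n, sd = 3)
)

# Respecified model: latent heat now causes swimming deaths, so it is no longer
# forced to be uncorrelated with the predictor of interest.
modelstring_fixed <- '
  latentheat =~ hp1 + hp2 + hp3
  swimmingdeaths ~ latentheat
  creamsales ~ b1*swimmingdeaths + b2*latentheat
'
fit_fixed <- sem(modelstring_fixed, data = df_sim)
summary(fit_fixed)

# Pull out the labelled structural slopes
pe <- parameterEstimates(fit_fixed)
pe[pe$label %in% c("b1", "b2"), c("label", "est", "se", "pvalue")]
```

Because this specification matches the assumed data-generating process, the estimate labelled b1 should be close to zero in large samples, while b2 recovers the heat effect.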
See Jacob Westfall’s original paper for more on this issue.
Bo²m =)