Meta Analysis Comps Notes

I’m taking comps pretty soon so this is my summary document regarding meta-analysis.

Meta-analyses (MAs) give us an average effect estimate across settings, tests, and people after correcting for statistical artifacts. In a bare-bones MA, we correct only for sampling error. In a full MA, we correct for sampling error, unreliability, and range restriction. I’ll demonstrate a full MA here where we assume direct (rather than indirect) range restriction.

Steps

  1. Literature Review

    • Create inclusion criteria for how you are going to select studies
    • Find relevant articles
  2. Code Articles

    • Measures
    • Reliability
    • SD
    • Means
    • Effect sizes
    • Moderators
  3. Calculate Meta-Analytic Estimate

    • Calculate meta-analytic effect size
    • Calculate its variance

In this post, I’m focusing only on step 3, the calculations, even though steps 1 and 2 are arguably the more important pieces.

Calculating MA Estimate and Variance

Within this step, there are many substeps:

Before we begin, here is a peek at the (mock) data set. I reviewed four studies and compiled their observed effect sizes; in this case we’re going to use correlations. Let’s say that our IV is dancing ability and our DV is life satisfaction, both of which are continuous variables. We are interested in the meta-analytic correlation between dancing ability and life satisfaction.

  study restricted_predictor_sd unrestricted_predictor_sd
1     1                      14                        20
2     2                      13                        20
3     3                      16                        20
4     4                      18                        20
  predictor_reliability criterion_reliability sample_size
1                  0.94                  0.75          50
2                  0.73                  0.80         100
3                  0.82                  0.83         125
4                  0.75                  0.94         240
  observed_correlation
1                 0.32
2                 0.10
3                 0.25
4                 0.40
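
The snippets in this post operate on a data frame called df. A minimal sketch for building it from the table above in base R (the column names are taken from the printed output; the snippets themselves also need dplyr loaded for mutate() and %>%):

```r
# Mock data copied from the table above; one row per study.
df <- data.frame(
  study = 1:4,
  restricted_predictor_sd = c(14, 13, 16, 18),
  unrestricted_predictor_sd = c(20, 20, 20, 20),
  predictor_reliability = c(0.94, 0.73, 0.82, 0.75),
  criterion_reliability = c(0.75, 0.80, 0.83, 0.94),
  sample_size = c(50, 100, 125, 240),
  observed_correlation = c(0.32, 0.10, 0.25, 0.40)
)

# The post's snippets use mutate() and %>%, so dplyr is assumed to be loaded:
# library(dplyr)
```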

a) Calculate the MA correlation

For each study gathered from our literature review…

1)

Correct the observed correlation for range restriction, which produces an rr-corrected correlation

\[\begin{equation} r_{RR} = \dfrac{ \left(\dfrac{US_{x}}{RS_{x}}\right)r_{xy} } {\sqrt{1 + r^2_{xy}\left(\dfrac{US^2_{x}}{RS^2_{x}} - 1\right)} } \end{equation}\]

where \(r_{RR}\) is the correlation that is corrected for range restriction, \(US_x\) is the unrestricted SD on dancing ability, \(RS_x\) is the restricted SD on dancing ability, and \(r_{xy}\) is the correlation between dancing ability and life satisfaction. We are going to compute \(r_{RR}\) for every study.

df <- df %>%
  mutate(
    # Direct range restriction correction; the SD ratio is squared in the denominator
    r_RR = ((unrestricted_predictor_sd / restricted_predictor_sd) * observed_correlation) /
      sqrt(1 + (observed_correlation^2) *
             ((unrestricted_predictor_sd / restricted_predictor_sd)^2 - 1))
  )
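
To sanity-check the range restriction formula on its own, here is a small sketch of it as a standalone base R function. The name correct_rr and its argument names are hypothetical placeholders, not something from the data set or from a package.

```r
# Hypothetical helper: direct range restriction correction,
# r_RR = U * r / sqrt(1 + r^2 * (U^2 - 1)), where U = US_x / RS_x.
correct_rr <- function(r_xy, us_x, rs_x) {
  u <- us_x / rs_x
  (u * r_xy) / sqrt(1 + r_xy^2 * (u^2 - 1))
}

# With no restriction (US_x equals RS_x) the correlation is unchanged.
no_restriction <- correct_rr(0.32, us_x = 20, rs_x = 20)

# With restriction (US_x larger than RS_x) a positive correlation is adjusted upward.
with_restriction <- correct_rr(0.32, us_x = 20, rs_x = 14)
```

The second call uses study 1's values from the mock data, so it should line up with the first row of r_RR.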

2)

Use the rr-corrected correlation along with the criterion reliability to correct for unreliability, which produces operational validity

\[\begin{equation} r_{ov} = \dfrac{r_{RR}}{\sqrt{r_{yy}}} \end{equation}\]

where \(r_{ov}\) is the operational validity of dancing ability and life satisfaction, \(r_{RR}\) is the correlation we calculated in step 1 (the range-restriction-corrected correlation) between dancing ability and life satisfaction, and \(r_{yy}\) is the reliability of the criterion, life satisfaction.

df <- df %>%
  mutate(r_ov = 
           r_RR / sqrt(criterion_reliability))
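
A minimal sketch of the same correction as a standalone function, which makes its behavior easy to check. The function name is a hypothetical placeholder and the input values are made up for illustration.

```r
# Hypothetical helper: divide the range-restriction-corrected correlation
# by the square root of the criterion reliability.
correct_criterion_unreliability <- function(r_rr, r_yy) {
  r_rr / sqrt(r_yy)
}

# A perfectly reliable criterion (r_yy = 1) leaves the correlation unchanged;
# a less reliable criterion produces a larger corrected value.
unchanged <- correct_criterion_unreliability(0.30, r_yy = 1)
corrected <- correct_criterion_unreliability(0.30, r_yy = 0.81)
```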

3)

Then, use the operational validities along with sample sizes to correct for sampling error and produce a sample-size-weighted meta-analytic correlation

\[\begin{equation} \rho = \dfrac{\sum{w_sr_{ov_{i}}}}{\sum{w_s}} \end{equation}\]

where \(\rho\) is the meta-analytic estimate, \(r_{ov_{i}}\) is the operational validity between dancing ability and life satisfaction for each study, and \(w_s\) is the sample size for each study.

ovs_by_sample_size <- df$sample_size * df$r_ov
ma_correlation <- sum(ovs_by_sample_size) / sum(df$sample_size)

df <- df %>%
  mutate(ma_correlation = ma_correlation)
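
The sample-size-weighted average can also be computed with base R's weighted.mean(). The sketch below uses made-up values (not the mock data) just to show that the two routes agree.

```r
# Illustrative values only: operational validities and sample sizes
# for three hypothetical studies.
r_ov <- c(0.50, 0.20, 0.35)
n <- c(50, 100, 125)

# Manual version, following the formula for rho.
rho_manual <- sum(n * r_ov) / sum(n)

# Built-in version; w supplies the weights.
rho_builtin <- weighted.mean(r_ov, w = n)
```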

b) Calculate the variance in our MA effect estimate

Compile all of the corrections – steps 4 through 7

4)

Compute the correction factor for unreliability on X, dancing ability (take the square root of the reliability)

df <- df %>%
  mutate(cf_x = sqrt(predictor_reliability))

5)

Compute the correction factor for unreliability on Y, life satisfaction (take the square root of the reliability)

df <- df %>%
  mutate(cf_y = sqrt(criterion_reliability))

6)

Compute the correction factor for range restriction

\[\begin{equation} a_{rr} = \dfrac{1}{ \left(\left(\dfrac{US_x}{RS_x}\right)^2 - 1\right)r_{xy}^2 + 1} \end{equation}\]

where all terms are defined above.

df <- df %>%
  mutate(cf_rr = 1 / 
          (  ((unrestricted_predictor_sd / restricted_predictor_sd)^2 - 1)*(observed_correlation^2) + 1 )
         )

7)

Combine all of those correction factors together into one common correction factor, \(A\).

df <- df %>%
  mutate(A = cf_x*cf_y*cf_rr)
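
As a worked check of steps 4 through 7, here is the arithmetic for study 1 using the mock data (reliabilities 0.94 and 0.75, SDs 20 and 14, observed correlation 0.32), with values rounded to three decimals:

\[\begin{equation} A_1 = \sqrt{0.94} \times \sqrt{0.75} \times \dfrac{1}{\left(\left(\dfrac{20}{14}\right)^2 - 1\right)(0.32)^2 + 1} \approx 0.970 \times 0.866 \times 0.904 \approx 0.759 \end{equation}\]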

8)

Compute the error variance for a given observed correlation and correct it using the combined correction factor.

This part takes three steps.

I: Compute the sample size weighted observed correlation

    - Essentially the same thing as step 3 but using observed correlations rather than operational validities
    

\[\begin{equation} r_{wa} = \dfrac{\sum{w_sr_{xy_{i}}}}{\sum{w_s}} \end{equation}\]

ss_times_correlations <- df$sample_size*df$observed_correlation
wa_correlation <- sum(ss_times_correlations) / sum(df$sample_size)

II: Compute the error variance on the observed correlation for each study

\[\begin{equation} \sigma^2_e = \dfrac{\left(1-r_{wa}^2\right)^2}{N-1} \end{equation}\]

where \(\sigma^2_e\) is the error variance (for each study), \(r_{wa}\) is the weighted average observed correlation between dancing ability and life satisfaction that we computed above, and \(N\) is the sample size.

df <- df %>%
  mutate(sigma2e = 
           ((1 - wa_correlation^2)^2) / (sample_size - 1)
  )

III: Compute the sampling error correction for each study

\[\begin{equation} Var_{ec} = \dfrac{\sigma^2_e}{A^2} \end{equation}\]

where \(Var_{ec}\) is the sampling error correction, \(\sigma^2_e\) is what we just calculated above, the error variance on the observed correlation for each study, and \(A\) is the combined correction factor for each study.

# sigma2e is already a variance, so it is divided by A^2 without being squared
df <- df %>%
  mutate(var_ec = sigma2e / A^2)
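
Parts II and III can be sketched together as one standalone function. The function and argument names are hypothetical, and the inputs are round numbers chosen so the arithmetic is easy to follow by hand.

```r
# Hypothetical helper: error variance of an observed correlation,
# (1 - r_wa^2)^2 / (N - 1), divided by the squared combined correction factor A.
corrected_error_variance <- function(r_wa, n, a) {
  sigma2e <- (1 - r_wa^2)^2 / (n - 1)
  sigma2e / a^2
}

# With A = 1 (no artifact corrections) the error variance is left as is.
v_uncorrected <- corrected_error_variance(r_wa = 0.30, n = 101, a = 1)

# With A < 1 the error variance is inflated, because correcting the
# correlation upward also magnifies its sampling error.
v_corrected <- corrected_error_variance(r_wa = 0.30, n = 101, a = 0.8)
```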

9)

Calculate the average sampling error

\[\begin{equation} Ave_{var_{ec}} = \dfrac{\sum{w_sVar_{ec}}}{\sum{w_s}} \end{equation}\]

where \(Ave_{var_{ec}}\) is the average sampling error, \(w_s\) is the sample size for each study, and \(Var_{ec}\) is the sampling error correction for each individual study.

ss_times_varec <- df$sample_size*df$var_ec
ave_var_ec <- sum(ss_times_varec) / sum(df$sample_size)

10)

Calculate the observed error variance

\[\begin{equation} var_r = \dfrac{\sum{w_s\left(r_{ov_{i}} - \rho\right)^2}}{\sum{w_s}} \end{equation}\]

where all terms are defined above.

# Squared deviation of each study's operational validity from the meta-analytic correlation
ss_times_sq_dev <- df$sample_size*((df$r_ov - ma_correlation)^2)
var_r <- sum(ss_times_sq_dev) / sum(df$sample_size)

11)

The MA variance estimate equals the observed error variance minus the average sampling error

\[\begin{equation} Var_p = var_r - Ave_{var_{ec}} \end{equation}\]

var_p <- var_r - ave_var_ec
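
One thing to watch for: if the average sampling error is larger than the observed variance, var_p comes out negative and sqrt(var_p) returns NaN. A common convention, which is an assumption here and not part of the steps above, is to treat a negative estimate as zero. A minimal sketch with made-up inputs:

```r
# Illustrative inputs only (not the results from the mock data).
var_r <- 0.004
ave_var_ec <- 0.006

# The raw estimate is negative in this example.
var_p_raw <- var_r - ave_var_ec

# Floor the estimate at zero before taking a square root.
var_p <- max(var_p_raw, 0)
sd_p <- sqrt(var_p)
```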

Recap

What a nightmare. Here’s a recap:

Now we can calculate credibility and confidence intervals.

Credibility Interval

Gives us a sense for whether or not moderators are at play.

\[\begin{equation} \textrm{95\% credibility interval} = \rho \pm 1.96 \times \sqrt{Var_p} \end{equation}\]

upper_cred_i <- ma_correlation + (1.96 * sqrt(var_p))
lower_cred_i <- ma_correlation - (1.96 * sqrt(var_p))
upper_cred_i
[1] 0.5532591
lower_cred_i
[1] 0.2024129

Credibility Ratio: if \(\dfrac{Ave_{var_{ec}}}{var_{r}}\) is lower than 0.75, then moderators may be at play.

ave_var_ec / var_r
[1] 0.01211969
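
The credibility interval can be wrapped in a small function so both bounds come back together. This is a sketch with a hypothetical function name and made-up inputs; it keeps the 1.96 multiplier used above.

```r
# Hypothetical helper: rho plus or minus z times the square root of var_p.
credibility_interval <- function(rho, var_p, z = 1.96) {
  half_width <- z * sqrt(var_p)
  c(lower = rho - half_width, upper = rho + half_width)
}

# Illustrative inputs: rho = 0.30 and var_p = 0.01, so the SD is 0.10
# and the half-width is 0.196.
cred_i <- credibility_interval(rho = 0.30, var_p = 0.01)
```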

Confidence Interval

\[\begin{equation} \textrm{95\% confidence interval} = \rho \pm 1.96 \times SE_{r_{ov}} \end{equation}\]

where \(SE_{r_{ov}}\) is the standard error of the operational validities and is calculated as…

\[\begin{equation} SE_{r_{ov}} = \dfrac{SD_{r_{ov}}}{\sqrt{k}} \end{equation}\]

where \(k\) is the number of studies.

se_r_ov <- sd(df$r_ov) / sqrt(length(df$study))

upper_ci <- ma_correlation + (1.96 * se_r_ov)
lower_ci <- ma_correlation - (1.96 * se_r_ov)
upper_ci
[1] 0.5263396
lower_ci
[1] 0.2293324
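
For comparison with the credibility interval, here is the confidence interval as a standalone sketch. The function name and inputs are hypothetical. Note that its width depends on the number of studies k through the standard error, whereas the credibility interval depends on var_p.

```r
# Hypothetical helper: rho plus or minus z times SD(r_ov) / sqrt(k),
# where k is the number of studies.
confidence_interval <- function(r_ov, rho, z = 1.96) {
  se <- sd(r_ov) / sqrt(length(r_ov))
  c(lower = rho - z * se, upper = rho + z * se)
}

# Illustrative operational validities for four made-up studies.
r_ov_example <- c(0.20, 0.30, 0.40, 0.50)
ci <- confidence_interval(r_ov_example, rho = 0.35)
```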

Bo\(^2\)m =)