In many research methods or statistics courses you come across the idea that correlated errors signal a third variable. In other words, you have a missing, relevant variable that induces correlation among your residuals. That’s a tough idea to wrap your head around, but it is easier to consider with respect to a given topic: cheating on exams. This post builds intuition for “correlated errors with respect to missing third variables” in the context of college exams and cheating.
First, let’s get a feel for the exams. I’m going to use a lot of images in this post so it helps to walk through the basics of each plot. Imagine students taking a multiple choice test where they fill in one of five responses, “A,” “B,” “C,” “D,” or “E” for each question. The correct answer for question one is “C”
where the x-axis shows the response options a student can select for each question, and the y-axis shows the question number (there is only one question so far).
The correct answer for question two is also “C”
and that pattern continues for the rest of the questions on this 5-item test.
In other words, imagine a 5 question exam where the correct answer for each question is “C.” With the basic images in play, we can think about how students might respond.
First, consider an exam where students do not cheat. If nobody cheats, then everyone’s errors will be dispersed about the true option for each question, “C.” Some people falsely select “A” whereas others falsely select “E,” and yet others falsely select “B.” Here is a plot that retains the purple crosses that mark the true option, “C,” but also includes student responses from Susie, Peter, and John.
For example, on question one Susie selects “A,” Peter selects “E,” and John selects “D,” meaning that none of the students get the answer correct. Every student, though, marks the correct response (C) for question 4.
There is no pattern in this plot. The green triangles, red circles, and blue-green squares are dispersed about the true score purple-crosses randomly. John gets some questions wrong, Susie gets some questions wrong, and Peter gets some questions wrong, but whether John incorrectly marks “A” or “E” doesn’t tell us anything about whether Peter incorrectly marks “A” or “E.” They are all wrong in a random way.
What about when students cheat?
When students cheat, the errors, or “wrongness” of questions, produce a pattern – meaning that the errors are correlated. Imagine that a cheater, let’s say it’s Peter, hacks the teacher’s computer and gains access to all of the questions before the exam. He then answers all of the questions and sends his responses to the rest of the class (John and Susie). But Peter makes a mistake: he writes down the wrong answers to four of the questions. John and Susie use Peter’s responses on the exam, but since they were copying from Peter’s responses they have the same pattern on “wrongness” on the four questions that Peter missed – they get the same questions wrong in the same way. After the exam, the pattern of scores looks as follows.
The true, correct responses are again labeled with purple cross-hairs: response “C” is the correct answer for each question. On question one, John, Peter, and Susie all incorrectly selected “A.” On question three, everyone incorrectly selected “E,” and on question four everyone incorrectly selected “D.” John, Peter, and Susie are wrong in the same way across the questions, their errors produce a pattern, a consistency, a correlation. The errors in our system correlate, which signals a third variable. In this case, the third variable is cheating.
So, think of exams and cheating when you hear the mantra, “correlated errors signal a third variable.” When an important variable is omitted from an analysis, the model is said to be missing a third variable and the errors may correlate. What this tells you is that something else – something unaccounted for – is influencing the patterns in your system, in the same sense as cheating influencing the pattern of responses on an exam.
Bo\(^2\)m =)