Co-authored with Anja Achtziger of the Department of Political and Social Sciences, Zeppelin Universität Friedrichshafen, Germany.
Let’s consider this puzzle: If five machines can make five widgets in five minutes, how long would it take 100 machines to make 100 widgets? Take a moment to think about it.
Many of us are familiar with cognitive tasks like this one, which often lead to biased responses. The question is what psychology can contribute to explaining how people handle these tasks and why they are so prone to mistakes.
The above problem is part of the famous Cognitive Reflection Test (CRT), a brief task used to assess people’s ability to overcome intuitive errors through analytical thinking. The CRT is one of the most widely used tests in psychological science. Its reach extends well beyond its origins in judgment and decision-making research, into fields such as behavioral economics and neuroscience. The CRT is particularly known for its astonishingly high error rates, and it is sometimes, rather dramatically, referred to as the “World’s shortest IQ test.”
Like many others, you might have concluded that 100 minutes is the right answer to the widget problem. But on closer inspection, it becomes clear that each machine takes five minutes to produce one widget. So 100 machines working in parallel also need just five minutes to make 100 widgets.
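The correct answer rests on a simple rate calculation, which the intuitive "100 minutes" skips. The short Python sketch below spells it out, assuming that every machine works at the same constant rate and that machines work in parallel; the function name `minutes_needed` and its parameters are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def minutes_needed(machines: int, widgets: int,
                   ref_machines: int = 5, ref_widgets: int = 5,
                   ref_minutes: int = 5) -> Fraction:
    """Minutes for `machines` to make `widgets`, assuming each machine works
    at the constant rate implied by the reference scenario (5 machines,
    5 widgets, 5 minutes). Fractions keep the arithmetic exact."""
    # Rate per machine in widgets per machine-minute: 5 / (5 * 5) = 1/5.
    rate_per_machine = Fraction(ref_widgets, ref_machines * ref_minutes)
    # Machines work in parallel, so their rates add up.
    combined_rate = machines * rate_per_machine
    return Fraction(widgets) / combined_rate


print(minutes_needed(100, 100))  # 100 widgets / (100 * 1/5 per minute)
```

The same arithmetic applies unchanged to the pizza-chef version of the problem: scaling machines and widgets by the same factor leaves the time constant.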
Or you might simply have googled the problem to be on the safe side and to save cognitive resources and effort. In that case, you probably found the correct answer even faster than by reflecting on it. Indeed, the CRT is so popular that typing just the first few words of the widget problem into any typical search engine immediately brings up the answer.
This poses an interesting challenge for researchers, who usually keep their test materials out of the public domain. The reason is often to protect the integrity of the measurement instrument and its validity for further research. For instance, for reliable IQ assessment, it is crucial that participants have as little prior knowledge of the specific test materials as possible. If the right answer to a given IQ test question is easily available, IQ scores can be seriously distorted.
This is also likely to happen with the CRT. Some people may remember having seen or previously answered some of the questions, or even the entire test. In online psychological studies, where the test is frequently administered, respondents may prefer searching for the correct answers on the web instead of thinking carefully about the problem at hand. Researchers usually expect participants to take the task seriously and answer the questions without Google support. But how can they be sure?
To gauge the impact of unsanctioned online searches in online CRT assessments, our recent study (Ludwig & Achtziger, 2021) compared two CRT versions. The first version was the original test (Frederick, 2005), featuring the widget and three other questions. For each of these questions, typing the first few words into Google reveals the correct answer within seconds.
In the second version, we rephrased the questions so that a search engine could not directly return the correct answer. For instance, the modified version of the widget problem reads: “Five pizza chefs can make five pizzas in five minutes. How many minutes would it take 100 pizza chefs to make 100 pizzas?”
Good luck googling that one! You will have to filter through pages and pages of pizza recipes until any reference to the CRT comes up, if it shows up at all. Cheating, at least by web searching, is futile on these re-phrased problems.
The study captured online searches by recording how often participants left the browser tab to switch to other tabs or apps, presumably to look up the answer on the internet. We used this count as an indicator of cheating behavior.
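The article does not say how the tab switches were logged. In a browser, such data would typically come from the page's `visibilitychange` event, which fires when `document.visibilityState` becomes `"hidden"`. The Python sketch below assumes a hypothetical event log of `(participant_id, event)` pairs and shows how the per-participant count, and the “Cheaters” versus honest grouping used in the figure, could be derived. The log format and the function names `count_tab_leaves` and `classify` are illustrative assumptions, not the study's actual pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical log entry: (participant_id, event). "hidden" is assumed to mean
# the study tab lost visibility (the participant switched to another tab/app).
Event = Tuple[str, str]


def count_tab_leaves(events: Iterable[Event]) -> Counter:
    """Count how many times each participant left the study tab."""
    return Counter(pid for pid, event in events if event == "hidden")


def classify(participants: Iterable[str], leaves: Counter) -> Dict[str, str]:
    """Label anyone who left the tab at least once as a 'cheater'.
    Counter returns 0 for participants with no recorded leaves."""
    return {pid: "cheater" if leaves[pid] >= 1 else "honest"
            for pid in participants}


log = [("p1", "hidden"), ("p1", "visible"), ("p1", "hidden"), ("p2", "visible")]
leaves = count_tab_leaves(log)
print(classify(["p1", "p2", "p3"], leaves))
```

Note that a tab switch is only a proxy: it cannot tell a web search apart from, say, checking a message, which is why the text treats it as an indicator rather than proof of cheating.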
In addition, we offered some participants a payment conditional on answering correctly, while others received a fixed amount of money, regardless of their answers.
We wondered whether offering money for performing well on a psychological test could corrupt online data collection in the behavioral sciences.
Our findings suggest that the answer is yes. People tended to cheat more when they could earn some extra cash. Each additional switch to another tab increased the odds of scoring higher on the original CRT by 39 percent. This indicates that many of the web searches were successful. In short, it was relatively easy to cheat on the task to increase one’s earnings, and more people did so when CRT performance was incentivized.
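A 39 percent increase in the odds is not the same as a 39 percent increase in the probability of a higher score. The Python sketch below illustrates the difference, assuming (the article does not state the model) that the estimate comes from a logistic-type model in which each extra tab switch multiplies the odds by the same factor of 1.39. The even baseline odds used here are a hypothetical starting point, not a figure from the study, and the function names are illustrative.

```python
# From the text: each extra tab switch raised the odds of a higher original-CRT
# score by 39 percent, i.e. an assumed constant odds ratio of 1.39 per switch.
ODDS_RATIO_PER_SWITCH = 1.39


def odds_after_switches(baseline_odds: float, switches: int) -> float:
    """Odds after `switches` extra tab switches, assuming the effect is
    multiplicative and identical for every switch (logistic-type model)."""
    return baseline_odds * ODDS_RATIO_PER_SWITCH ** switches


def probability(odds: float) -> float:
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1 + odds)


# Hypothetical baseline: even odds (p = 0.5), chosen only for illustration.
for k in range(4):
    o = odds_after_switches(1.0, k)
    print(k, round(o, 2), round(probability(o), 2))
```

Starting from even odds, one extra switch moves the probability from 0.50 to about 0.58, not to 0.89. How large the change in probability is depends on where a participant starts.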
Figure 1: Performance of participants who left the browser tab running the web study at least once (“Cheaters”) versus honest participants
Source: Jonas Ludwig
Around one-quarter of our respondents interrupted their work on the CRT at least once, many of them, presumably, to search for the correct answers on the web.
Those who did so while working on the original CRT were able to substantially improve their performance (see Figure 1).
For the modified CRT version, as expected, we observed no such improvement, because the correct answers could not be looked up on the internet.
People cheated at similar rates but were more successful when working on the original CRT, as you would expect.
Offering money for performing well on the original CRT improved participants’ scores. The most obvious explanation may be that people invested more effort in solving the CRT when a monetary incentive motivated them, and thereby improved their performance. Under laboratory conditions, this seems like a valid inference.
But the broad availability of easily accessible information about the CRT (including the option to simply copy-and-paste the answers directly from Google) changes the rules of the game. Our findings align with the idea that online CRT performance improves under incentives because people more often look for the correct answers on the web. In other words, people are willing to cheat for higher profits.
Why does any of this matter for all of us? Our findings highlight two methodological problems with one of the most widely used psychological tests, problems that may also apply to several other common measurement instruments.
First, participants may be able to google the correct answers to psychological tests. This is a challenge for researchers and can seriously threaten the validity of an instrument. It raises a more general concern about how we run empirical research on the web, and how far we can trust the interpretation of findings from online studies.
Second, people may be inclined to cheat when it pays to do so. Our research suggests that offering money for good performance may not always bring about the behaviors hoped for. Performance-based payment can have unwanted side effects. Money, it seems, offers fertile ground for the emergence of dishonesty.
It’s not surprising that people cheat. For psychology research to be worthwhile, researchers must think carefully about how their incentives shape the results they observe. How much of the huge body of online CRT research has been, or will be, affected by undetected cheating? We can’t know for sure, but we now have at least some evidence that it likely happens.