Users Accept Wrong LLM Answers 80% of the Time: New Study

Research from the University of Pennsylvania examines how people use LLM tools. It identifies a pattern it calls 'cognitive surrender', in which users outsource their critical thinking to AI systems.

Two categories of AI users

The research identifies two broad categories of users. The first treats AI as a powerful but faulty service that requires careful human oversight. The second routinely outsources its critical thinking to what it sees as an all-knowing machine. This latter group engages in 'cognitive surrender': its members show minimal internal engagement and accept the AI's reasoning wholesale, without oversight or verification.

Experimental methodology

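Researchers used Cognitive Reflection Test (CRT) problems, which are designed so that intuitive thinking leads to a wrong answer while deliberate reasoning solves them easily. (The classic CRT item asks: if a bat and a ball cost $1.10 in total and the bat costs $1.00 more than the ball, how much does the ball cost? The intuitive answer is 10 cents; the correct answer is 5 cents.) Participants were given optional access to an LLM chatbot that had been modified to give an inaccurate answer at random about half the time and an accurate answer the rest of the time.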

Key findings

Participants with AI access consulted it on about 50% of CRT problems
When the AI was accurate, users accepted its reasoning about 93% of the time
When the AI gave a faulty answer, users still accepted its reasoning 80% of the time
The AI-using group outperformed the control group when the AI was accurate and underperformed it when the AI was inaccurate
AI users scored 11.7% higher on confidence measures, even though the AI was wrong half the time

Factors affecting verification behavior

Adding incentives (small payments) and immediate feedback on correct answers increased the likelihood that participants overruled the faulty AI by 19 percentage points relative to baseline. Adding time pressure (a 30-second timer) decreased their tendency to correct the faulty AI by 12 percentage points.

The research suggests that AI systems have created a third category of reasoning, 'artificial cognition', alongside intuitive and deliberative thought. In this mode, decisions are driven by external, automated, data-driven reasoning rather than by the user's own thought processes. This differs from traditional 'cognitive offloading', in which tools such as calculators are used strategically and under human oversight.

📖 Read the full source: HN LLM Tools