GPT-4 Still Unable to Reason, New Study Finds

A new study published on the preprint server Preprints.org argues that despite impressive advances, GPT-4 still lacks fundamental reasoning abilities. The study was conducted by Konstantine Arkoudas, Senior Director at Dyania Health.

Arkoudas evaluated GPT-4 on 21 reasoning tasks across diverse areas including arithmetic, logic, graph theory, and compiler correctness. He found that GPT-4 struggled with most of these simple problems, often exhibiting internal inconsistencies and making elementary mistakes.

Some key findings highlighted in the paper:

  • GPT-4 was unable to reliably perform basic arithmetic operations like addition and multiplication when given randomly generated numbers.
  • It repeatedly made the same logical errors even after mistakes were pointed out, suggesting a lack of understanding of core concepts.
  • On a simple graph coloring problem, GPT-4 incorrectly claimed a non-complete graph was complete and went on to propose invalid colorings.
  • GPT-4 failed to derive obvious conclusions from a simple blocks world scenario, instead considering arbitrary possible worlds.
  • It was unable to complete basic proofs involving quantifiers and set theory operations.
  • On the well-known Wason selection task, a logical reasoning puzzle, GPT-4 achieved only 28% accuracy.
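The graph coloring failure is the easiest of these to check mechanically: whether a graph is complete, and whether a proposed coloring is valid, are both simple pairwise tests. Below is a minimal Python sketch of those two checks (an illustration of the properties involved, not code from the study):

```python
from itertools import combinations

def is_complete(vertices, edges):
    """A graph is complete iff every pair of distinct vertices is adjacent."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(vertices, 2))

def is_valid_coloring(edges, coloring):
    """A coloring is valid iff no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

# A 4-cycle: four vertices, four edges. It is NOT complete
# (the two diagonals are missing), yet it is 2-colorable.
vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(is_complete(vertices, edges))                                   # False
print(is_valid_coloring(edges, {1: "red", 2: "blue", 3: "red", 4: "blue"}))  # True
```

Mistaking the 4-cycle above for a complete graph, as GPT-4 reportedly did on a similar instance, leads directly to invalid conclusions about which colorings are possible.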

Arkoudas concludes that despite occasional ingenious responses, GPT-4 is currently “utterly incapable of reasoning.” He argues that using such AI systems in software development or engineering could introduce serious risks.

The results cast doubt on claims that large language models like GPT-4 can reason or think logically. Arkoudas suggests that rigorous proof checking may become important if LLM reasoning improves. He also deems dystopian scenarios involving highly capable AI "far-fetched" given current capabilities.

While acknowledging progress, Arkoudas contends that achieving human-level reasoning remains an extremely difficult goal. The study provides detailed evidence that leading LLMs have yet to reach reasoning competence comparable to humans.
