Does AI have Mensa-level abstract reasoning skills?
Researchers test AI’s ability to tackle visual puzzles and abstract reasoning, revealing its strengths and limitations.
Artificial intelligence has demonstrated astonishing capabilities, from mastering language to generating stunning artworks and defeating chess grandmasters. Yet, a profound question remains: Can AI tackle the complex realm of abstract reasoning?
This type of reasoning, embodied in visual puzzles that often baffle humans, challenges both perception and logical thinking. Researchers at the University of Southern California’s Viterbi School of Engineering are exploring the depths of AI’s cognitive potential, shedding light on its current abilities and limitations.
In a groundbreaking study, researchers Kian Ahrabian and Zhivar Sourati from USC’s Information Sciences Institute (ISI) sought to evaluate the reasoning skills of multi-modal large language models (MLLMs).
Their work, presented at the 2024 Conference on Language Modeling (COLM) in Philadelphia, focused on nonverbal abstract reasoning tasks: challenges that require both visual understanding and logical deduction.
The researchers designed their tests around Raven’s Progressive Matrices, a standard measure of abstract reasoning. Each puzzle presents a grid of images, typically three by three, with one cell left blank; the solver must infer the rule governing the rows and columns and pick the answer option that completes the pattern, as sketched below.
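To make the task concrete, here is a minimal sketch in Python of a Raven-style puzzle encoded as data rather than pixels. The puzzle, rule, and answer choices are invented for illustration and are not items from the study’s benchmark:

```python
# Illustrative only: a tiny Raven-style puzzle encoded as text, not an item
# from the study. Each cell is (shape, count). The hidden rule: within a row,
# the shape stays constant and the object count increases 1 -> 2 -> 3.
puzzle = [
    [("circle", 1), ("circle", 2), ("circle", 3)],
    [("square", 1), ("square", 2), ("square", 3)],
    [("triangle", 1), ("triangle", 2), None],  # bottom-right cell is missing
]
choices = [("square", 3), ("triangle", 1), ("triangle", 3), ("circle", 2)]

def fits(candidate):
    """Check whether a candidate completes row 3 under the same rule."""
    row = puzzle[2][:2] + [candidate]
    same_shape = len({shape for shape, _ in row}) == 1
    counts_increase = [count for _, count in row] == [1, 2, 3]
    return same_shape and counts_increase

answer = next(c for c in choices if fits(c))
print(answer)  # ('triangle', 3)
```

Real Raven’s items are images, so a model must first perceive the grid correctly before any of this rule-finding can begin, a distinction that matters for the findings below.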
The team assessed 24 MLLMs, spanning both open-source and closed-source models, to see how well they could process and analyze these tasks.
Jay Pujara, a USC research associate professor and co-author of the study, emphasized the importance of this work. “Every day, we see surprising headlines about what AI can and can’t do. We still have a limited understanding of these models’ capabilities, and this paper helps illuminate where AI struggles. Until we grasp these limitations, we can’t improve AI—or ensure it’s safe and useful.”
The results of the study revealed a stark contrast between the performance of open-source and closed-source models. Open-source models, which are publicly available for modification and improvement, struggled significantly with the visual reasoning tasks. According to Ahrabian, “They were really bad. They couldn’t get anything out of it.”
By contrast, closed-source models such as GPT-4V performed noticeably better. These models are typically developed by private companies and trained using advanced resources, including extensive datasets and high-powered computing systems. Ahrabian noted, “GPT-4V showed some nontrivial results. It was relatively good at reasoning but still far from perfect.”
The disparity highlights how proprietary advancements in AI development can yield superior performance. However, even the best-performing models encountered challenges, revealing significant gaps in their ability to mimic human reasoning.
To understand why AI struggles with abstract reasoning, the researchers dissected the models’ failures. A key finding was the difficulty these systems faced in accurately interpreting visual information. The models often failed to perceive details such as color changes or intersecting lines, which are crucial for solving the puzzles.
To pinpoint the root cause, the team supplemented the visual puzzles with detailed textual descriptions. This ensured the models had all necessary information in a different format. Surprisingly, many models continued to falter. Sourati explained, “Even when we removed the visual element and just gave them text, they still couldn’t reason effectively.”
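As a rough illustration of that text-only probe, the sketch below renders a hypothetical puzzle as a plain-language prompt; the encoding and wording are assumptions for illustration, not the study’s actual materials:

```python
# A hedged sketch of the text-only probe: the same kind of puzzle rendered
# as plain language instead of pixels, so perception is no longer a factor.
puzzle = {
    (1, 1): "1 circle",   (1, 2): "2 circles",   (1, 3): "3 circles",
    (2, 1): "1 square",   (2, 2): "2 squares",   (2, 3): "3 squares",
    (3, 1): "1 triangle", (3, 2): "2 triangles", (3, 3): None,  # missing
}

lines = ["A 3x3 grid contains the following cells:"]
for (row, col), contents in sorted(puzzle.items()):
    lines.append(f"Row {row}, column {col}: {contents or '? (missing)'}")
lines.append("Which option completes the pattern: "
             "A) 3 triangles  B) 1 circle  C) 2 squares?")

text_prompt = "\n".join(lines)
print(text_prompt)
```

Because a prompt like this leaves nothing to perceive, a model that still fails on it is failing at the reasoning itself, which is exactly the distinction the researchers were after.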
This revelation pointed to a fundamental issue: the problem wasn’t just about visual processing. The models lacked the capacity for robust reasoning itself. This distinction allowed the researchers to better target areas for improvement in future AI development.
One promising approach explored by the team was “chain-of-thought prompting.” This method guides the AI to reason step by step through complex tasks. When problems were broken into smaller, logical steps, the models showed marked improvement. Ahrabian noted, “Using hints to guide the models, we observed up to a 100% improvement in performance.”
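The sketch below shows the general shape of a chain-of-thought prompt for such a puzzle; the hint wording is hypothetical, and the study’s exact prompts are not reproduced here:

```python
# A minimal sketch of chain-of-thought prompting: a step-by-step hint prefix
# is prepended to the question so the model reasons aloud before answering.
base_question = (
    "A 3x3 grid shows 1, 2, then 3 circles in row one; 1, 2, then 3 squares "
    "in row two; and 1, then 2 triangles in row three, with the last cell "
    "blank. Which option completes the grid: A) 3 triangles, B) 1 circle, "
    "C) 2 squares?"
)

cot_prefix = (
    "Let's think step by step. First, state what changes from cell to cell "
    "within each row. Second, state what stays constant within each row. "
    "Finally, apply that rule to the blank cell and name the matching option."
)

cot_prompt = f"{cot_prefix}\n\n{base_question}"
print(cot_prompt)
```

Note that the prefix adds no information about the answer; it only pushes the model to externalize its intermediate steps rather than guess in one jump.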
This strategy highlights how structured guidance can enhance AI’s problem-solving abilities. However, significant work remains to bridge the gap between machine reasoning and human cognition. Current models, while advanced, still fall short of replicating the nuanced and adaptable reasoning that humans excel at.
The findings from this study offer both a reality check and an inspiring glimpse into the future. While today’s AI models excel in specific tasks, their struggles with abstract reasoning underscore the complexity of human cognition. Nevertheless, researchers remain optimistic about the path forward.
By identifying where AI falls short, studies like this pave the way for meaningful advancements. As AI systems evolve, they may one day approach human-level reasoning, blurring the line between artificial and natural intelligence.
Such progress could revolutionize fields ranging from education to problem-solving, unlocking new possibilities for technology and society.
Note: Materials provided above by The Brighter Side of News. Content may be edited for style and length.