François Chollet on the end of scaling, ARC-3 and his path to AGI


AI researcher François Chollet argues that the era of simply scaling up models to achieve intelligence has run its course. Instead, he sees the field moving toward systems that can adapt to new problems and develop solutions independently, much like human programmers.

Chollet contends that the major breakthroughs in deep learning during the 2010s were largely driven by falling computing costs. This led to the rise of large language models and the widespread belief that increasing scale would eventually yield artificial general intelligence (AGI). However, the field’s focus on ever-larger models, he says, blurred the line between memorized skills and true general intelligence - the kind that allows someone to tackle problems they’ve never encountered before.

To demonstrate this, Chollet points to his Abstraction and Reasoning Corpus (ARC) benchmark, introduced in 2019. Even as models like GPT-4.5 increased massively in size, their ARC performance barely improved, reaching only about 10%, while humans consistently score above 95%. For Chollet, this is clear evidence that scaling up pre-training alone does not produce flexible intelligence.

A shift toward test-time adaptation

Chollet highlights 2024 as a pivotal year, with AI research transitioning to "test-time adaptation" (TTA). Unlike traditional models that remain static at inference, TTA methods allow models to modify their own state on the fly to better handle unfamiliar situations. Approaches like program synthesis and chain-of-thought synthesis enable these systems to reprogram themselves for each new task.
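The program-synthesis flavor of test-time adaptation can be made concrete with a toy sketch: given a few input-output demonstrations of a brand-new task, the system searches compositions of known primitives for a program that fits, at inference time rather than during pre-training. The primitives and task below are illustrative inventions, not from Chollet's work.

```python
from itertools import product

# Toy test-time program synthesis: adapt to a new task by searching
# for a pipeline of primitives consistent with a few demonstrations.
PRIMITIVES = {
    "reverse": lambda s: s[::-1],
    "upper": lambda s: s.upper(),
    "strip": lambda s: s.strip(),
}

def synthesize(demos, max_depth=3):
    """Return the first primitive pipeline that fits all demos, or None."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(x, names=names):
                for n in names:
                    x = PRIMITIVES[n](x)
                return x
            if all(run(i) == o for i, o in demos):
                return names
    return None

# Adaptation happens per task, at inference time:
program = synthesize([("abc ", "CBA"), ("hi", "IH")])
```

A real TTA system searches a vastly larger program space and uses a learned model to guide the search, but the shape of the idea is the same: the model's effective behavior is reassembled for each new task.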


The arrival of TTA has produced the first significant progress on the ARC benchmark. Chollet notes that a specialized version of OpenAI's o3 model now matches human-level performance on ARC, signaling a major shift in AI capabilities.

He frames this change using two classic perspectives on intelligence. Marvin Minsky’s view focuses on automating tasks humans already do, prioritizing efficiency and economic impact. In contrast, John McCarthy defines intelligence as the ability to solve novel problems without prior preparation. Chollet favors the latter, emphasizing that intelligence is about handling new situations, not just executing known routines. He distinguishes between skill - handling familiar challenges - and intelligence, which means facing the unfamiliar.

To illustrate, he likens skill to traveling on an existing road network, while intelligence is more like building new roads to uncharted destinations. In his view, genuine intelligence reveals itself in the creation of new paths rather than the repetition of established ones.

ARC benchmarks steer research

Chollet’s ARC benchmarks are designed to push AI research toward the field’s toughest open questions. While ARC-1 revealed the limitations of scaling, ARC-2 tests "compositional generalization" - the ability to combine previously learned concepts in new ways, similar to how people use their knowledge to solve novel problems.

Current models, including GPT-4.5 and Llama 4, continue to score 0% on ARC-2. Even advanced TTA systems like o3 barely reach 1-2%, far behind human performance. Chollet is preparing ARC-3 for release in 2026, aiming to assess a model's agency, or its capacity to set and pursue goals independently in interactive environments.


Two paths to abstraction

Chollet describes two types of abstraction as central to the future of AI. The first involves pattern recognition, where models detect similarities based on measurable features - the domain of deep learning, which excels at quick intuition and perception, but remains limited to statistical correlations.

The second is based on rule-governed reasoning, identifying structures and processes that are fundamentally the same even if they appear different. This approach underpins logical thinking, planning, and systematic problem-solving, as seen in programming and mathematics.

Deep learning models are strong with the first type but often falter with tasks that require rule-based, structured reasoning, such as sorting lists or manipulating symbols in a precise, programmatic way.
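The contrast can be sketched in a few lines of code. The first abstraction type works over graded similarity of continuous features; the second applies an exact, discrete rule that is either right or wrong. Both snippets are illustrative stand-ins, not implementations from Chollet's work.

```python
import math

def similarity(a, b):
    """First abstraction type: graded similarity of feature vectors
    (cosine similarity), the substrate of deep-learning intuition."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def sort_list(xs):
    """Second abstraction type: an exact, rule-governed procedure
    (sorting) where 'approximately right' is simply wrong."""
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return (sort_list([x for x in rest if x < pivot]) + [pivot] +
            sort_list([x for x in rest if x >= pivot]))
```

A network can learn that two inputs are "close" without ever representing the discrete rule; the sorting procedure, by contrast, generalizes perfectly to any list but offers no notion of similarity at all.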

Chollet believes that genuine intelligence requires both: the rapid, intuitive pattern recognition found in deep learning, and the flexible, accurate manipulation of symbols and logical rules, which together enable the creation of novel solutions.

Meta-learning: the next step for AI

Looking ahead, Chollet envisions AI systems that combine both forms of abstraction. He proposes a programmer-like meta-learner capable of developing custom solutions for new problems. This architecture blends deep neural networks for pattern recognition with discrete program search for logic and structure.

Such a system would first use deep learning to extract reusable abstractions from massive datasets, storing them in an ever-expanding global library. When presented with a new challenge, the deep learning component would quickly suggest promising solution candidates, narrowing the field for the symbolic search process. This keeps the combinatorial search space manageable.

The symbolic component then assembles these building blocks into a concrete program tailored to the specific problem, drawing from the library much like a software engineer uses existing tools and code. As the system solves more problems, it can discover new abstractions and add them to the library, continually expanding its capabilities and intuition for assembling solutions.
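The loop described above can be sketched as a toy hybrid: a stand-in "neural" scorer ranks library entries by fit to the task, a symbolic search composes the best-ranked entries into a program, and solved programs are folded back into the library for reuse. All names and primitives here are hypothetical; a real system would use a trained deep model where the heuristic scorer stands.

```python
import itertools

LIBRARY = {  # global library of reusable abstractions
    "inc": lambda x: x + 1,
    "dbl": lambda x: x * 2,
    "neg": lambda x: -x,
}

def run(names, x):
    """Execute a program given as a sequence of library entry names."""
    for n in names:
        x = LIBRARY[n](x)
    return x

def neural_rank(demos):
    """Stand-in for the deep-learning component: rank primitives by how
    far one application moves each input toward its target output."""
    def score(name):
        return -sum(abs(LIBRARY[name](i) - o) for i, o in demos)
    return sorted(LIBRARY, key=score, reverse=True)

def search(demos, max_depth=3):
    """Symbolic component: enumerate compositions, best-ranked first,
    returning the first program consistent with every demonstration."""
    ranked = neural_rank(demos)
    for depth in range(1, max_depth + 1):
        for names in itertools.product(ranked, repeat=depth):
            if all(run(names, i) == o for i, o in demos):
                return names
    return None

def solve_and_learn(demos):
    """Solve a task, then grow the library with the discovered program."""
    names = search(demos)
    if names and len(names) > 1:
        LIBRARY["+".join(names)] = lambda x, ns=tuple(names): run(ns, x)
    return names

program = solve_and_learn([(1, 4), (2, 6)])  # task: x -> 2 * (x + 1)
```

The ranking keeps the combinatorial search tractable by trying promising primitives first, and each solved task leaves behind a new single-step building block, mirroring the library growth the article describes.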

The goal is to build an AI that can handle entirely new challenges with minimal additional training, improving itself through experience. Chollet’s new research lab, NDEA, is working to turn this vision into reality, aiming to create AI systems that are as flexible and inventive as human programmers, and in doing so, accelerate scientific progress.
