Google DeepMind has published a paper titled “Measuring Progress Toward AGI: A Cognitive Taxonomy” that lays out a scientific framework for evaluating how close AI systems are to achieving general intelligence. The work draws on research from psychology, neuroscience, and cognitive science to define what measurable progress toward AGI should actually look like.

A Taxonomy of 10 Cognitive Abilities

The framework identifies 10 cognitive abilities the authors hypothesize are central to general intelligence in AI systems:

  • Perception: extracting and processing sensory information
  • Generation: producing text, speech, and actions
  • Attention: focusing cognitive resources effectively
  • Learning: acquiring new knowledge through experience and instruction
  • Memory: storing and retrieving information over time
  • Reasoning: drawing valid conclusions through logical inference
  • Metacognition: monitoring and understanding one’s own cognitive processes
  • Executive functions: planning, inhibition, and cognitive flexibility
  • Problem solving: finding effective solutions to domain-specific challenges
  • Social cognition: processing and responding appropriately to social information

A Three-Stage Evaluation Protocol

To benchmark AI systems against human capability, DeepMind proposes a three-stage protocol. First, AI systems are evaluated across a broad suite of cognitive tasks, using held-out test sets to prevent data contamination. Second, human baselines are collected on the same tasks from a demographically representative adult sample. Third, each system’s performance on each ability is mapped against the distribution of human performance on that ability.

This approach is intended to give researchers a consistent, empirically grounded way to compare AI progress rather than relying on task-specific benchmarks that may not generalize.
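
To make the third stage concrete, here is a minimal sketch of one way a system's score could be placed within a human score distribution, using a simple percentile rank. The article does not say which statistic DeepMind uses, so the percentile approach, the function names (percentile_rank, cognitive_profile), and all of the numbers below are illustrative assumptions rather than the paper's method or data.

```python
from bisect import bisect_right


def percentile_rank(human_scores, system_score):
    """Return the percentage of human scores at or below system_score."""
    if not human_scores:
        raise ValueError("human_scores must not be empty")
    ordered = sorted(human_scores)
    # bisect_right counts how many sorted values are <= system_score.
    return 100.0 * bisect_right(ordered, system_score) / len(ordered)


def cognitive_profile(human_baselines, system_scores):
    """Map each ability to the system's percentile within the human sample."""
    return {
        ability: percentile_rank(human_baselines[ability], score)
        for ability, score in system_scores.items()
    }


# Hypothetical scores, invented for illustration only (not from the paper).
human_baselines = {
    "memory": [0.52, 0.61, 0.66, 0.70, 0.74, 0.79, 0.83, 0.88],
    "metacognition": [0.40, 0.48, 0.55, 0.59, 0.63, 0.68, 0.72, 0.80],
}
system_scores = {"memory": 0.81, "metacognition": 0.50}

profile = cognitive_profile(human_baselines, system_scores)
for ability, pct in profile.items():
    print(f"{ability}: {pct:.1f}th percentile of the human sample")
```

With these made-up inputs, the hypothetical system lands at the 75th percentile of the human sample on memory but only the 25th on metacognition. That kind of uneven, per-ability profile is what a single aggregate benchmark score would hide.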

Kaggle Hackathon With $200,000 in Prizes

Alongside the paper, DeepMind is partnering with Kaggle on a community hackathon focused on building evaluations for five cognitive abilities where current tooling is most lacking: learning, metacognition, attention, executive functions, and social cognition. Participants can test submissions against frontier models using Kaggle’s Community Benchmarks platform.

The prize structure includes $10,000 awards for the top two submissions in each of the five tracks ($100,000 in total), plus four grand prizes of $25,000 each for the best overall submissions (a further $100,000). Submissions are accepted from March 17 through April 16, with results announced June 1.

For security researchers and AI safety practitioners, the framework offers a more rigorous vocabulary for discussing capability gaps and risks in frontier models, grounding AGI discourse in measurable cognitive science rather than informal intuition.