Infinite Faculty

What are the real problems of continual learning?

Andrew Lampinen — Fri, 29 May 2026 20:38:01 GMT

Lately, there’s been a surge of interest in continual learning. In particular, there’s been an increasing sense that continual learning is one of the areas where the gap between humans and AI is largest.1 In this post, I want to explain what continual learning is, and some of the past perspectives on it — including how I think the field focused on what in retrospect turned out to be the wrong problems. I’ll then describe what I think are the actual problems that remain, why humans are still superior to AI, and why I think continual learning is so important.

What is continual learning?

In short, continual learning is the ability of a system to keep improving throughout its existence — just as humans can learn new skills and knowledge throughout our lives. This is not an inherent feature of most contemporary AI systems; if you have one conversation with a language model, and then have another conversation on that topic, the model will have no recollection of the first one or what you explained during it (unless the AI writes notes for itself, like some memory systems allow). If you work with a language model to write a paper, you will have to put the first paper in context when you want to work on a follow-up. Why don’t these systems support continual learning? To understand this question, let’s start from earlier perspectives.

Catastrophic interference: the original problem

From the early eras of neural network research, it was noted that these networks exhibit catastrophic interference (or catastrophic forgetting): training a network on a new task catastrophically degrades its ability to perform the tasks it was trained on previously. Thus, the networks were generally trained on data that were constant over training, or sampled uniformly from a fixed dataset (IID), rather than on a sequence of tasks. This was seen to pose a challenge for using neural networks as cognitive models, since human development and education are fundamentally sequential.

This challenge provided one motivation for theories of the role of the human hippocampus (episodic memory system) as allowing for rapid learning that complemented the cortical system — if the hippocampus can rapidly store a memory, it can then be learned by the cortical system more gradually as it is interleaved with other experiences. This type of interleaving makes the data distribution closer to IID, and thus reduces the problem of catastrophic interference. The idea of replaying past experiences to smooth the data distribution has been very influential in subsequent machine learning works, for example in reinforcement learning.
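To make the interference and interleaving story concrete, here is a minimal sketch with a two-weight linear model. The two "tasks", their inputs, targets, and the learning rate are all hypothetical values chosen for illustration; this is a toy, not a model of the hippocampal theory or of any particular replay method.

```python
import numpy as np

def loss(w, x, y):
    """Squared error of the linear model w on a single (x, y) pair."""
    return 0.5 * (w @ x - y) ** 2

def sgd_step(w, x, y, lr=0.1):
    """One gradient step on the squared error for one example."""
    return w - lr * (w @ x - y) * x

# Two toy tasks that share the same two weights (made-up values).
# Their inputs are not orthogonal, so updates for one task disturb the other.
task_a = (np.array([1.0, 0.0]), 1.0)
task_b = (np.array([0.6, 0.8]), -1.0)

# Sequential training: all of task A, then all of task B.
w_seq = np.zeros(2)
for task in (task_a, task_b):
    for _ in range(500):
        w_seq = sgd_step(w_seq, *task)

# Interleaved training (a stand-in for replay): alternate between the tasks.
w_mix = np.zeros(2)
for _ in range(1000):
    w_mix = sgd_step(w_mix, *task_a)
    w_mix = sgd_step(w_mix, *task_b)

seq_loss_a = loss(w_seq, *task_a)
mix_loss_a = loss(w_mix, *task_a)
mix_loss_b = loss(w_mix, *task_b)
print(f"task A loss after sequential training:  {seq_loss_a:.4f}")
print(f"task A loss after interleaved training: {mix_loss_a:.2e}")
```

A joint solution exists here (two weights, two constraints), so the forgetting in the sequential run comes from the order of training rather than from a lack of capacity.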

However, machine learning approaches to replay have generally relied on storing veridical experiences; as such, they tended to seem impractical as the tasks being learned became more numerous and complex. This problem led to an explosion of approaches trying to address catastrophic interference through other changes to architectures or learning objectives — for example, various approaches that preserve weights proportional to their importance to prior tasks, or prevent gradients from interfering with prior tasks, so that learning will occur where it interferes the least.
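As a rough sketch of the first family of approaches, a penalty can make it more expensive to move weights that mattered for earlier tasks. The function names, the `strength` parameter, and the importance values below are assumptions for illustration; methods such as elastic weight consolidation estimate importance from the data (for example, with a diagonal Fisher information approximation) rather than setting it by hand.

```python
import numpy as np

def importance_weighted_penalty(theta, theta_old, importance, strength=1.0):
    """Quadratic penalty: moving an important weight away from its old value costs more."""
    return 0.5 * strength * np.sum(importance * (theta - theta_old) ** 2)

def penalized_grad(task_grad, theta, theta_old, importance, strength=1.0):
    """Gradient of (new-task loss + penalty); the penalty pulls important weights back."""
    return task_grad + strength * importance * (theta - theta_old)

# Hypothetical values: weight 0 was very important to the old task, weight 1 not at all.
theta_old = np.array([1.0, -2.0, 0.5])
importance = np.array([10.0, 0.0, 1.0])
theta = theta_old + 0.1  # the same small shift applied to every weight

penalty = importance_weighted_penalty(theta, theta_old, importance)
pull = penalized_grad(np.zeros(3), theta, theta_old, importance)
print(penalty)  # dominated by the important weight
print(pull)     # zero pull on the unimportant weight
```

With this kind of penalty, learning on a new task is steered toward the weights where it interferes least with what was learned before.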

Is interference as catastrophic as it seems?

However, various recent works have argued that “interference” in continual learning is not quite as catastrophic as it seems. Often, the knowledge of earlier tasks is preserved within the model’s representations in some sense, and can be recovered relatively easily. For example, one paper finds that internal representations often preserve relatively high linear decodability of earlier task information, even when performance degrades. Several other papers have suggested similarly that interference is strongest at the readout layers, while earlier layer representations preserve information about other tasks — thus allowing relatively easy recovery of earlier tasks by retraining the output layers. On their own, these findings do not completely resolve the interference problem — but they do suggest it may not be fundamental.

Loss of plasticity: forward interference

More recently, there has been a newer perspective on a different type of interference: loss of plasticity. While typically catastrophic forgetting works backwards (tasks learned later interfere with earlier tasks), loss of plasticity is instead about forward interference: how earlier learning impairs the ability of the network to learn later tasks. Evidently, for a continual learner this type of interference is just as bad as catastrophic forgetting; a continual learning system cannot lose its ability to learn over time.

However, again, the problems are not universal. Commonly-used interventions like layer normalization and weight decay can substantially reduce loss of plasticity. Moreover, some researchers have argued that loss of plasticity is an artifact of artificial, hard boundaries between tasks — if the environment drifts more continuously, plasticity may be preserved.

The modern paradigm: scale and pretraining reduce interference

However, I believe that large pretrained models have more fundamentally changed the paradigm for continual learning and interference. Large language models are trained on huge quantities of data, over a series of stages that often include pretraining (possibly with longer sequences at later stages), midtraining, supervised fine tuning, and RL from various types of rewards. If language models experienced substantial catastrophic interference or loss of plasticity across these dramatically different distributions, these training pipelines simply would not work. So why do they?

One key piece of the answer is scale. Scale is something that I think prior continual learning research often got wrong. Historically, continual learning research focused on small models (sometimes only a few layers) trained on many tasks. However, more recently researchers have found that larger models exhibit substantially less interference. For example, wider models show less catastrophic forgetting on visual continual learning tasks, in part because gradients are sparser and more orthogonal between tasks in wider architectures. Indeed, it seems intuitive that a model with more capacity will show reduced interference between tasks.
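One loose intuition for why more capacity could mean less interference is that random directions in higher-dimensional spaces tend to be closer to orthogonal. The sketch below checks only that geometric fact for random Gaussian vectors; it is an analogy under that assumption, not the analysis from the papers on wider models, and real task gradients are not random vectors.

```python
import numpy as np

def mean_abs_cosine(dim, n_pairs=200, seed=0):
    """Average |cosine similarity| between pairs of random Gaussian vectors."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_pairs, dim))
    v = rng.standard_normal((n_pairs, dim))
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    )
    return np.mean(np.abs(cos))

narrow = mean_abs_cosine(10)      # stand-in for a small parameter space
wide = mean_abs_cosine(10_000)    # stand-in for a much larger one
print(f"dim=10:     mean |cos| = {narrow:.3f}")
print(f"dim=10000:  mean |cos| = {wide:.3f}")
```

If two tasks' updates overlap less, a step that helps one task changes the other task's loss less.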

Moreover, scale has a positive interaction with pretraining for both vision and language models — pretrained models forget less than models trained from scratch, and the benefits of pretraining increase with scale. (In fact, it was noted in a relatively-unknown cognitive science proceedings paper in 1993 that pretraining on related tasks reduces catastrophic interference on subsequent sequential learning.)

Larger models may memorize more data — but they also show less overfitting even when they memorize. Perhaps because they are less overfit to the training distribution, larger models have also been found to perform better on downstream tasks, even if their pretraining loss is similar to smaller models.

In a recent preprint, we’ve argued that processes like these drive many of the benefits of language model scaling: the reason larger language models learn more than smaller models is precisely because of their reduced interference and increased capacity for memorization, allowing the model to more effectively preserve its learning about rarer structures in the data until they are next encountered.

Beyond scale, other details of language model training may help to ameliorate interference. Strategies that smooth the distributional shift between phases of training, such as mid-training to bridge between pre-training and post-training distributions, may help to reduce interference and loss of plasticity. Language models already incorporate architectural features like normalization layers, and often training techniques like weight decay, that can reduce loss of plasticity. Other simple strategies, such as regularizing with KL-divergence to the original model (as is often done in RL with LLMs), or using parameter-efficient fine-tuning methods that constrain updates to a subset of the parameters, likely reduce interference as well.
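The KL-regularization strategy mentioned above can be sketched as follows. The function names, the `beta` coefficient, and the toy logits are assumptions for illustration; real RL pipelines differ in how they estimate and apply the penalty.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

def kl_to_reference(policy_logits, reference_logits):
    """KL(policy || reference) at each position, summed over the vocabulary."""
    logp = log_softmax(policy_logits)
    logq = log_softmax(reference_logits)
    return np.sum(np.exp(logp) * (logp - logq), axis=-1)

def regularized_objective(reward, policy_logits, reference_logits, beta=0.1):
    """Reward minus a penalty for drifting away from the original model."""
    return reward - beta * np.mean(kl_to_reference(policy_logits, reference_logits))

# Toy next-token logits over a three-token vocabulary (made-up numbers).
reference = np.array([[2.0, 0.0, -1.0]])
drifted = np.array([[0.0, 2.0, -1.0]])

no_drift = regularized_objective(1.0, reference, reference)
with_drift = regularized_objective(1.0, drifted, reference)
print(no_drift, with_drift)
```

The penalty is zero when the updated model matches the original and grows as its predictions move away, which is one way of keeping new learning from overwriting prior behavior.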

Thus, pretraining and scale — especially when coupled with relatively standard strategies for preserving prior knowledge — have substantially reduced the catastrophic interference and loss-of-plasticity problems in large language models, even if they have not eliminated them entirely.

In search of positive transfer & cumulative learning

However, avoiding interference is only one piece of continual learning. From the early days of continual learning research, it was observed that human and animal learning does not just avoid interference, but that learned tasks actively support one another — for example, after learning hundreds of mathematical concepts, a mathematician will be faster to learn the next one (forward positive transfer), and may even use it to improve their understanding of earlier concepts (backward positive transfer). There has been a renewed interest more recently in this kind of cumulative learning, where tasks build on each other and help to learn future tasks.2

It’s in this area that I believe current language models still show the weakest continual learning abilities. Despite having been trained on huge numbers of tasks, language models cannot necessarily learn an entirely new domain as efficiently and reliably as one might hope, given the incredible number of related prior tasks from which they could transfer.

For example, while pretrained large language models can often learn a new task effectively in context, there is not a universal recipe for transferring that knowledge beyond that context. However, there are a growing number of interesting explorations on how to help language models learn more effectively for the future, including:

Data augmentation: allowing a model to “learn by thinking” and generate further synthetic inferences from some data, that can then be distilled back into the model (e.g., 1, 2, 3, 4).
Document retrieval: explicitly retrieving documents from the training corpus (or another) that are relevant to the present task (e.g., 1, 2, 3), which can allow more flexible transfer.
KV caches: a variety of papers have suggested clever strategies for adapting the KV cache (which normally stores just the key-value activations from the earlier parts of the present document), such as compressing its knowledge into fewer KV pairs (e.g., 1, 2). While these strategies are often simply intended for more efficient inference, compressed KV caches can sometimes be recomposed to integrate knowledge across multiple corpora — thus showing their potential for continual learning (e.g., 1, 2).
Textual memory: the model can also write succinct notes, such as causal abstractions, about its tasks (e.g., 1); indeed, most chat platforms now incorporate some form of textual memory for personalization (e.g., 1, 2, 3).
Context distillation and self-distillation: a variety of works have considered distilling a context into the model’s parameters by training it to make similar predictions without the context, or distilling from a teacher that has access to privileged information (e.g., 1, 2) — these approaches appear to offer an effective path to integrating skills from context into the weights, and have thus been suggested as a path to continual learning (e.g., 1, 2).
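The last item in the list, context distillation, can be sketched in toy form. Here a fixed "teacher" distribution stands in for the model's predictions with the context in its prompt, and a vector of free logits stands in for the parameters of the same model without the context. The numbers, learning rate, and step count are all hypothetical; a real implementation would update network weights over many prompts.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
    return z / np.sum(z, axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_probs):
    """Cross-entropy from the teacher's distribution to the student's."""
    return -np.sum(teacher_probs * np.log(softmax(student_logits)))

# Teacher: next-token distribution WITH the context available (toy numbers).
teacher_probs = softmax(np.array([3.0, 0.5, -1.0, 0.0]))

# Student: predictions WITHOUT the context; starts out uninformed (uniform).
student_logits = np.zeros(4)
initial_loss = distill_loss(student_logits, teacher_probs)

for _ in range(5000):
    # Gradient of the cross-entropy with respect to the student's logits.
    grad = softmax(student_logits) - teacher_probs
    student_logits = student_logits - 1.0 * grad

final_loss = distill_loss(student_logits, teacher_probs)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

After training, the student reproduces the teacher's predictions without needing the context, which is the sense in which the context has been moved into the weights.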

Nevertheless, while any of these approaches can partially ameliorate the problem of transferring knowledge forward from a particular learning experience, it is not clear whether they fundamentally solve it. A model cannot fit arbitrarily many retrieved documents, KVs, or textual memories in context; nor can it be trained on arbitrarily many synthetic documents or distilled from every context. Is there still something missing for enabling true continual learning systems?

Handing off between different learning systems: some directions for the future of continual learning

I see two possible paths forward for continual learning research, that involve different ways of handing off between different learning systems.

First, it’s possible that the pieces I listed above (letting the system write notes or other inferences, retrieve them or its experiences when needed, and then distill them back into the model for the future) are sufficient to achieve more effective continual learning. Combining these approaches is often held back by practical constraints (such as the increased expense or technical difficulty of updating model parameters frequently, or differently for different users), rather than by their inadequacy. But because of algorithmic progress, the scale of models needed to achieve a given performance level is decreasing, and thus it may become increasingly feasible to deploy systems that combine several of these approaches.

However, it’s also possible that something more fundamental needs to change.

In the discussion of “the broader spectrum of in-context learning” we noted that there is currently an artificial boundary between in-context learning and the longer-term parametric learning of a language model — whereas natural intelligences do not have any such hard boundary.

Specifically, while natural intelligences do certainly have qualitatively different learning systems — from working memory to episodic memory, and neural plasticity — these learning systems are not so cleanly divided across timescales. Synaptic plasticity can operate on timescales ranging from milliseconds (short term synaptic plasticity) to a lifetime; episodic memory likewise can operate from within a minute to much longer time periods. Maybe allowing multiple systems of learning to work together across every timescale, instead of having entirely separate systems at different timescales, is needed to enable the system to more effectively learn cumulatively.

Thus, it is possible that for our AI systems to achieve efficient and cumulative continual learning, we need to remove the artificial boundaries we’ve introduced: between the present context and the rest of their past experiences, and between the tokens to come in the present context and those to come in their future tasks.

It’s worth caveating up front that continual learning is not obviously a necessary feature for achieving any particular goal with artificial intelligence; a mathematician might learn a new domain more efficiently than a current language model, but that doesn’t mean language models cannot achieve new insights in an area — as indeed they have recently appeared to.

A related area of recent work focuses on prospective learning, where a system explicitly attempts to learn in a way that is future-oriented rather than focused on the current task distribution, such as by preplaying possible future tasks, or using episodic memory to store learning experiences in a way that allows more flexible reuse in the future.

The no-magic approach to understanding intelligent systems

Andrew Lampinen — Sat, 07 Mar 2026 20:53:44 GMT

Today I want to write a bit about the philosophy I think underlies much of the work that my collaborators and I (as well as many other researchers that I respect) have done on understanding artificial and natural intelligence: avoiding magical thinking, about either artificial or natural intelligence, and instead trying to understand the principles from which the phenomena of intelligence emerge. I want to talk about where this approach comes from and why I see it as important — especially as the impact of AI grows — and why it drives me to do things like writing this substack.

Indistinguishable from magic

“Any sufficiently advanced technology is indistinguishable from magic” - Arthur C. Clarke

AI can feel magical. I remember watching the AlphaGo matches against Lee Sedol from my tiny grad school apartment; although I’m far from a good enough Go player to appreciate the full depth of the games, I knew enough to be struck by their beauty. I’ve had similar feelings many times in the years since, for example, the first time a language model successfully wrote an entire class and its tests for me.

(Human intelligence can feel magical too; after all, we were playing beautiful Go, building computers, and writing elegant code long before AI was.)

I think that magical thinking often sneaks into the way that people think and talk about AI — among the general public, but also among experts. It’s natural to see a system doing something surprising and decide that it must be inexplicable.

But as Clarke’s quote emphasizes, anything can seem magical when you don’t understand it. If someone from a thousand years ago saw airplanes or video calls, or earthquakes or diseases, they would certainly think of gods or magic. It’s important not to get trapped in magical thinking when there is something we don’t understand.

Overreacting: the temptation to dismiss AI as an illusion, or to decide natural intelligence is magic

A common reaction to magical thinking is to go as far as possible in the other direction. Wherever something seems like magic, it must be an illusion; there must be a man behind the curtain. This reaction often accompanies arguments that AI is simply memorizing all the things it has seen in training, without generalizing.

An impulse that often accompanies dismissing AI is believing that there must be something magical in natural intelligence. For example, there must be something about the nature of biological synapses, or quantum effects of microtubules, or the status of being an organism that is essential to achieving “real” intelligence.

Of course, there is a reason that these arguments seem persuasive. First, the field is advancing quickly, but unevenly, creating what some have called a “jagged” frontier. These uneven capabilities make it easy to point to failures of AI to capture all aspects of human intelligence, and conclude that therefore it never could. Moreover, some of the apparent capability increases really are due to increasing coverage within the training data — but this is not incompatible with generalization.

Likewise, we certainly can’t rule out the possibility that there really is something about natural computation that allows natural intelligence to achieve qualitatively different things than artificial intelligence, but the arguments for this often rely on vague speculation or bad assumptions. As is often pointed out, arguments about the specialness of biological intelligence often have the same dismissive flavor as the arguments, made at the turn of the 20th century, that flying machines would never fly. We should be careful not to confuse possibility with necessity: just because some model behaviors can be explained without generalization, or just because we see the existence of natural intelligence that is more general than current AI, does not mean that is the only possibility.

The no-magic middle way: searching for the principles from which phenomena of intelligence emerge

I think that assuming that AI is magical, that it is an illusion, or that natural intelligence is magical, could be detrimental to the long-term health of science and society. Misleading ourselves about the relationship between AI and our own intelligence, what it can achieve and what we can, can lead to either paralyzing fear or complacency about how AI will affect society.

That is why I think it is important for researchers to pursue understanding the principles underlying the performance of these systems — and communicating about that understanding — and why I have devoted a portion of my work to this area.

I’ve written about some of the directions we’ve taken in prior posts. Here, I want to briefly emphasize how surprisingly complex phenomena in these models, that really exhibit generalization, can nevertheless be understood as relatively straightforward consequences of adapting to their training data and constraints, such as soft inductive biases. In many cases, these phenomena parallel aspects of human cognition, due to the shared world underlying human learning environments and model training data.

For example, semantic knowledge development in AI, as well as seeming failures like content-entangled reasoning or incoherent probability judgments in language models, can be interpreted as rational adaptations to training that parallel similar findings in humans. Many other phenomena that are more model-specific, such as patterns of hierarchical generalization, or in-context learning, can similarly be explained through relatively simple principles. Even cases where models exhibit superficially surprising patterns of generalization (such as learning to like owls from training on sequences of random numbers) can be understood as consequences of relatively simple features of models. And there is a growing trend of work showing how richly structured representations (such as linear and geometric representations in language models, even of higher-order concepts such as “truth”) can be analytically derived from relatively low-level properties of the training data, such as word- or sentence-level cooccurrence structures. We are beginning to understand many principles of model behavior and representations, and how in some cases they parallel human ones.

I hope that achieving this type of understanding can help to inoculate us against resorting to magical thinking.

Coda: the need for science communication in AI

For researchers in this area who agree with my assessment: I think this is an important time for science communication, for helping everyone to understand the systems we are building as we understand them. (For researchers who disagree with me, I think that it’s important for the public to understand those perspectives too, as long as you make it clear that there are multiple perspectives out there.) I intend to write more accessibly about understanding AI here. I appreciate the other researchers who are already making efforts to communicate more broadly, and if you are not already doing so, I hope that you will consider investing some of your time similarly.

Memorization vs. generalization in deep learning: implicit biases, benign overfitting, and more

Andrew Lampinen — Wed, 18 Feb 2026 15:48:13 GMT

Is there a fundamental tradeoff between memorization and generalization, or is their relationship more nuanced? Questions like these come up again and again in machine learning, most recently in discussions about language models — are they just a “blurry JPEG of the web” as Ted Chiang wrote? I’ve felt like there isn’t a good high-level overview of the nuanced relationship between memorization and generalization as we understand it in the theory and practice of machine learning today. Thus, today I want to write a bit about how I think the field’s understanding of these issues has evolved over the past few decades.

Classical perspectives:

In classical learning theory, there is a fundamental tradeoff between memorization capacity and generalization: models that are able to memorize arbitrary data should not be expected to generalize. This notion is captured in the VC dimension bounds on generalization error, which (very loosely) say that models that can memorize only a relatively small amount of the training dataset will have test error close to the train error, while models that can memorize all of the training data can have arbitrarily bad test error.
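For readers who want to see the shape of such a bound, one commonly quoted form is sketched below. The symbols are assumptions introduced here rather than notation from this post: $R(h)$ is the test error of a classifier $h$, $\hat{R}_n(h)$ its error on $n$ training examples, $d$ the VC dimension of the model class, and $\delta$ the allowed failure probability. Exact constants vary between textbooks.

```latex
% With probability at least 1 - \delta over the draw of n IID training examples,
% simultaneously for every classifier h in a class of VC dimension d:
R(h) \;\le\; \hat{R}_n(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) + \ln\frac{4}{\delta}}{n}}
```

When $d$ is small relative to $n$, the square-root term is small and test error must be close to train error; when $d$ is comparable to or larger than $n$ (the model can memorize the training set), the bound says nothing useful.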

This makes intuitive sense if you think of a picture of what it means to overfit a polynomial to some points:

Classical perspectives: Under- and over-fitting, and the just-right happy medium. Overfitting matches all the training data points perfectly, but makes worse predictions outside the training data in the course of doing so.

Being able to memorize all the data points means the function is much too complex, and that it will therefore make much worse predictions about new data points than a function that can only memorize a few data points. This can be seen as an instance of Occam’s razor; it’s better to favor simple explanations. The simplest polynomial that can get reasonably close to all the data will probably give the best predictions; a complicated one that matches everything perfectly will probably do worse. (The VC dimension bounds are about classification, not regression, but the intuition for why memorization capacity might make a model worse is similar.)
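The polynomial picture is easy to reproduce. The sketch below is a minimal version under made-up assumptions: a sine curve as the underlying function, ten noisy training points, and polynomial degrees 1, 3, and 9. Training error necessarily falls as the degree rises; one would typically expect the degree-9 fit, which passes through every training point, to do worse on held-out points than the moderate fit, though the exact numbers depend on the noise draw.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    """Hypothetical underlying function that generates the data."""
    return np.sin(np.pi * x)

# Ten noisy training points and a dense, noise-free test grid.
x_train = np.linspace(-1, 1, 10)
y_train = true_fn(x_train) + 0.2 * rng.standard_normal(x_train.shape)
x_test = np.linspace(-1, 1, 200)
y_test = true_fn(x_test)

def fit_and_score(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

results = {degree: fit_and_score(degree) for degree in (1, 3, 9)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree}: train MSE {train_mse:.2e}, test MSE {test_mse:.2e}")
```

A degree-9 polynomial has ten coefficients, so it can match all ten training points exactly; that is the "memorization capacity" in this toy setting.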

A memorization puzzle:

Over a decade ago, deep learning models for vision started to dominate competitions like ImageNet, achieving high accuracy, and even learning features that generalized well to entirely new tasks. However, in a paper first released 10 years ago, Chiyuan Zhang and colleagues pointed out a puzzle these empirical results pose for the classic story above. Specifically, the authors showed that these successful deep learning models were capable of memorizing entirely random labels for the ImageNet datasets — even when applying regularization. Yet, despite this capacity for memorization, the models generalize well!

Clearly, the simplest version of the picture above can’t be true; there must be something that pushes deep learning models away from purely overfitting even in cases where they could. (N.B.: this doesn’t contradict the generalization bounds given by VC theory; it just shows that they are looser than they could be.)

In search of implicit simplicity biases:

The observations above led researchers to search for some implicit bias towards learning simpler functions1 — rather than complex squiggly ones like the overfit example above. This bias must be implicit in the sense that it was not intentionally created by the model designers, but rather something that happens to be true of the models or optimization processes more generally. Indeed, researchers have since proposed a number of such implicit biases that might explain the generalization we observe in practice.

One plausible source of bias is architectural; that something about neural networks is inherently biased towards simpler functions. There is some evidence for this. A particularly nice example of this comes from a paper by Valle Pérez & collaborators that shows both theoretically and empirically that randomly sampled parameters will be exponentially more likely to instantiate a simpler function than a more complex one. Thus, if we imagine that the model is randomly sampling from among the possible solutions to the problem that it can implement, it will tend to hit simpler ones much more often, which may help it to generalize. Other papers have studied many other architectural features that improve generalization, e.g., why deeper models may be more biased towards simpler solutions.
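A small experiment in the spirit of this argument is to sample random parameters for a tiny network, record which Boolean function each sample computes, and count how often each function appears. Everything below (three binary inputs, eight tanh hidden units, Gaussian parameters, 5000 samples) is a hypothetical setup for illustration and not the one used in the paper. If the claim carries over to this toy, a handful of simple truth tables should account for a disproportionate share of the samples.

```python
import itertools
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
n_inputs = 3
# All 2**3 = 8 binary input patterns, one per row.
inputs = np.array(list(itertools.product([0.0, 1.0], repeat=n_inputs)))

def sample_function(hidden=8):
    """Draw random weights for a tiny tanh network; return its truth table as a string."""
    w1 = rng.standard_normal((n_inputs, hidden))
    b1 = rng.standard_normal(hidden)
    w2 = rng.standard_normal(hidden)
    b2 = rng.standard_normal()
    out = np.tanh(inputs @ w1 + b1) @ w2 + b2
    return "".join("1" if o > 0 else "0" for o in out)

n_samples = 5000
counts = Counter(sample_function() for _ in range(n_samples))

print(f"{len(counts)} distinct functions out of 256 possible")
for table, count in counts.most_common(5):
    print(table, count)
```

A uniform distribution over functions would give each of the 256 truth tables about 20 of the 5000 samples, so counts far above that indicate a bias in which functions random parameters produce.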

However, some of the arguments above — and many more relatively architecture-independent ones — depend on the optimization process, which I will turn to next.

Optimization biases — simpler functions are learned first:

One of the most commonly-studied sources of bias is the optimization process itself, with the idea that what is learned first will tend to be simpler. For example, several papers show that models trained with gradient-based optimization tend to learn linear structures in the data before (or in lieu of) learning nonlinear structures, and these simplicity biases persist in their representations even when the more complex nonlinear features have been learned later. Humans also show similar simplicity biases in terms of how quickly they learn different types of concepts. The ease of learning may favor simplicity, which in turn may encourage generalization.

The neural race reduction: a depiction from Andrew Saxe’s keynote at NeurIPS 2025.

Another optimization-based perspective I find particularly useful is The Neural Race Reduction by Saxe & Sodhani. The authors demonstrate through both theory and empirics the idea that if a model has multiple ways to solve a problem, the one that leads to the fastest learning will tend to dominate. For example, solutions that share more structure across multiple tasks will tend to be learned faster than solutions that are idiosyncratic. Intuitively, these will tend to be the solutions that generalize, rather than those that memorize individual facts. In fact, it is precisely because the solutions generalize that they are learned faster; the fact that they apply to more instances means they get more frequent and more consistent gradient updates than idiosyncratic solution paths do.

This result resonates with an earlier study in which we showed analytically in much simpler settings (deep linear networks learning linear functions) why learning a signal shared across many data points will be faster than memorizing many individual random labels; the memorizing case spreads the same total signal out over many different modes, each of which is learned more slowly. Likewise, signals that are shared across multiple tasks will tend to be learned faster and more reliably than signals that are not shared.
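The "same total signal spread over many modes" point can be seen by comparing singular values. In the sketch below, which uses made-up sizes and assumes one-hot inputs (so the input-output correlation matrix is just the label matrix), a shared label pattern concentrates all of its signal in a single mode, while random labels with the same total signal spread it across many weaker modes. In deep linear network analyses of this kind, weaker modes are learned more slowly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_outputs = 20, 20

# Shared structure: every item has the same label pattern (a rank-one matrix).
shared_labels = np.ones((n_items, n_outputs))
# Memorization: every item gets its own random +/-1 labels.
random_labels = rng.choice([-1.0, 1.0], size=(n_items, n_outputs))

# Singular values, sorted from largest to smallest.
s_shared = np.linalg.svd(shared_labels, compute_uv=False)
s_random = np.linalg.svd(random_labels, compute_uv=False)

# Both matrices carry the same total signal (sum of squared singular values).
print("total signal:", np.sum(s_shared**2), np.sum(s_random**2))
print("strongest shared mode:", s_shared[0])
print("strongest random mode:", s_random[0])
```

Both label matrices have the same sum of squared entries, so any difference in the strongest mode reflects how the signal is distributed, not how much of it there is.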

There are many more recent papers that extend, elaborate, or add nuance to this perspective, but I think the overall intuitions here — that simpler, shared structures will tend to dominate because they contribute to solving more problems — are a good starting place for understanding why optimization processes may bias models towards common, generalizing structures.2

Memorization is not incompatible with generalization3

In practice, however, deep learning models do memorize some of their training data. Does this hurt generalization? In some cases, it may not!

First, there may be cases where models memorize in a way that doesn’t hurt generalization (but also doesn’t help it), which is often referred to as benign overfitting. This has been demonstrated empirically and theoretically in various settings, from linear to nonlinear models. The intuitive picture is that models learn both the common structure in the data and memorize the idiosyncrasies, but when a new data point is not too close to a memorized example, they will tend to fall back to the simple, generalizing structure. Thus, memorization may not be as bad as it is often assumed to be — while overfitting distorts the global structure, benign overfitting causes distortions only locally.

In benign overfitting, the model still overfits the training data, but this only causes distortions in local neighborhoods of training data points; outside these, the global shape is simpler.
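The geometric picture can be built by hand. The predictor below is a deliberately artificial construction under made-up assumptions (a noisy linear trend, well-separated training inputs, and a hypothetical `radius` parameter): it fits a line, then memorizes each training point's residual within a small neighborhood. It illustrates what "local distortions only" means and is not evidence that trained networks behave this way.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.arange(10, dtype=float)                      # well-separated inputs
y_train = 2.0 * x_train + 1.0 + rng.standard_normal(10)   # linear trend plus noise

# The simple, global structure: a least-squares line.
slope, intercept = np.polyfit(x_train, y_train, 1)
# The idiosyncrasies to memorize: what the line gets wrong at each training point.
residuals = y_train - (slope * x_train + intercept)

def predict(x, radius=0.1):
    """Line everywhere, plus a memorized correction within `radius` of a training point."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    pred = slope * x + intercept
    dist = np.abs(x[:, None] - x_train[None, :])
    nearest = np.argmin(dist, axis=1)
    close = dist[np.arange(len(x)), nearest] <= radius
    pred[close] += residuals[nearest[close]]
    return pred

train_pred = predict(x_train)          # matches the noisy labels exactly
away_pred = predict([0.5, 20.0])       # falls back to the line
print(np.max(np.abs(train_pred - y_train)))
print(away_pred)
```

The training error is zero, yet predictions anywhere outside the small neighborhoods are identical to those of the simple linear model.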

More surprisingly, however, in some cases memorization may actually help models to generalize. In particular, several papers have argued that memorization may be needed if there are some obscure structures in the data — either rare exceptions that are unlike the other examples (e.g., a penguin is unlike other birds in numerous ways), or actual errors in the data (e.g., because real datasets often have some percentage of wrong or incomplete labels). If the model tried to learn these rare examples with the same structure as the other examples, it might hurt generalization. In this case, memorizing exceptions or rare examples actually protects the simpler, generalizing structures on the rest of the data.

There are also some reasons to think that early in learning memorization itself might help to more effectively set up the representations for learning shared structures; we’ve observed several instances in recent papers where having common “celebrity” entities that are often encountered, or even explicitly repeated data, can accelerate learning of mechanisms (e.g., sparse attention patterns) needed to support instances beyond these. Thus, memorizing can support later learning and generalization.

Several recent theoretical works have elaborated and deepened these stories — highlighting how (at least for some problems) memorization of some training data may be necessary for effective generalization, including memorizing information irrelevant to the target task, and how trading off learning shared structures and memorizing unpredictable ones can lead to optimal generalization. In a recent preprint, we’ve also argued that explicit episodic recall may be needed to unlock certain types of generalization to tasks sufficiently different from the training — that is, memorizing veridical experiences (in a complementary episodic storage system) may be necessary for reusing information flexibly, because the abstractions induced by task-oriented learning inherently discard information that might be useful for sufficiently different future tasks.

Thus, while we do not understand the complete picture of memorization and generalization across all problems, there are a number of lines of evidence suggesting memorization is not always detrimental to learning generalizable structures, and may in fact support some types of generalization.

There are also traditions of memorization in human education, even when there is an underlying principle from which the memorized instances can be inferred, e.g., memorizing multiplication tables as a part of learning mathematics. It is interesting to speculate about whether these practices help with learning underlying principles per se (or whether they help performance through more prosaic mechanisms, e.g., having cached solutions to subproblems that reduce working memory load).

What does this all mean for language models?

Nowadays, there are many discussions in the academic literature, popular press, and social media about the extent to which language models can generalize. Often, critical perspectives lean on metaphors about (approximate) memorization: language models are “just” stochastic parrots or blurry JPEGs that partially regurgitate their training data in response to new prompts. These perspectives are buttressed by clear observations that language models do memorize some of their training data in a way that increases with scale, and more concerningly, that proximity to training data drives some of their performance. For example, when researchers create items that are similar to a familiar brainteaser, but that change the logical structure to be much simpler, models often incorrectly respond with a memorized answer pattern (example from twitter, paper), a clear instance of overfitting. Findings like this, as well as patterns such as models performing better on more frequent items, have led researchers to argue that these models are not truly reasoning or generalizing.

However, I hope that the examples above show why we might hope that models will learn to generalize rather than, or at least in addition to, simply performing lossy memorization. For example, even if models overfit on instances resembling well-known brainteasers, this overfitting could be “benign”, in the sense that it doesn’t harm generalization much outside the neighborhood of similar-sounding problems (though of course, we would like models to generalize on these instances, too). Indeed, benign-overfitting-like phenomena have been documented in language models in controlled studies, such as strong generalization even after memorizing noisy parts of data, larger models memorizing more without overfitting, and models learning organized representations that capture generalizable structure in the data, rather than simply memorizing associations.

These findings on memorization complement a broader range of studies showing how language models can learn generalizable syntax or algorithms in controlled settings where what is held out is known, and data-influence-based studies suggesting that memorization is not the key driver of performance on reasoning problems; instead, models infer generalizable procedures. We have strong empirical reasons to think that language models learn generalizable abilities from their training — and some theoretical intuitions of why that might happen — even when they memorize nontrivial quantities of training data, and even if they do not always generalize in the ways that we would hope.

Of course, none of this precludes the possibility that there are strategies for encouraging models to generalize better. There’s a lot more to be said on retrieval, RL vs. imitation, modern reasoning-trained models, etc.; stay tuned.

There’s an implicit assumption in the arguments here that simpler functions = better generalization, but of course there are cases where this isn’t true, as a nice paper by Harshay Shah and collaborators shows. Indeed, in more complex settings, it is not even always trivial to define what is simpler; it can have complex interactions with architecture and training objective (see e.g. this paper).

There have also been a number of papers that try to adapt generalization theories to account for implicit biases like those discussed, e.g. by measuring the “capacity” of a model in terms of parameters that measure how much it has actually learned from the data, via measurements like parameter norms or compressibility. These are potentially promising approaches for reconciling theory with the empirics of deep learning — though so far, it’s not obvious to me if they have successfully done so. E.g., as a recent compression-based paper notes in its limitations, their bounds still favor models with fewer parameters, when in practice larger models generalize better. Nevertheless, I think that these approaches are promising directions for further investigation.

Thanks to Gavin Brown for helpful suggestions for this section.

On research careers in academia and industry

Andrew Lampinen — Fri, 23 Jan 2026 15:06:57 GMT

I get asked a lot of questions about the relationship between AI & Cognitive Science, especially from early-career researchers wondering where their work might fit into the rapidly evolving fields. This is the final in a series of posts where I aim to lay out my current thoughts on the relationship between these fields — and the career options in them — in the present research environment. Everything here is my personal opinion; almost any claim I make some researchers would stridently disagree with. So take it with a grain of salt.

In the past three posts, I’ve laid out some relationships I see between research in these fields, and where they can learn from each other. Here, I want to discuss a more practical question that I get asked even more frequently: should I do research in academia or industry? I’ll describe how I chose to go to DeepMind when I finished my PhD, how things have changed since, and lay out what I see as some of the pros and cons of industry and academia at present. (Note: I’ll focus on large scale industry research labs for this post; I don’t have enough familiarity with startups.)

Joining DeepMind and five years of change

Even though I only finished my PhD in 2020, the research world feels entirely different than it did when I was making my career decisions. In 2019, I interned at DeepMind (after having interned at Google Brain a few years before). I was fortunate to work with Adam Santoro & Felix Hill, who both became incredible mentors (and frequent collaborators) for me in the years after. And it felt like a magical time at DeepMind. I got to meet brilliant people ranging from physicists to neuroscientists, explore ideas about the origins and nature of intelligence, and be in a space where it felt like the future of AI was happening. So it felt obvious that I should come back after I finished my PhD.

At the time I joined, DeepMind had a string of successful projects combining RL and search, beating champions at games like Go, and playing competitively in StarCraft. There was a growing strand of research on playing games without prior knowledge, by learning world models. And there was a sense that these methods were on the path to real generality. It felt like a magical time.

But there was also a growing wave of enthusiasm for large language models, initially driven mostly by external work. Actually, from early on in my time at DeepMind, my collaborators and I were interested in how these models achieved things like in-context learning, and how pretrained models could help accelerate RL by conveying useful abstractions. But these were something of a minority interest within DeepMind. There was a strong contingent of researchers focused on building the foundations of intelligence on RL alone — and arguments (not entirely incorrect) that imitation could not scale arbitrarily, and RL would be a necessary piece of achieving real intelligence. Because of this, I think many at DeepMind were caught off guard by just how far imitation learning could take you.

Thus, as successes like the launch of ChatGPT became harder to ignore, the company undertook a massive pivot. DeepMind, which had been somewhat independent, merged with Google’s other AI research team (Google Brain) to become Google DeepMind. Before this change, and more dramatically after it, there was also a strong shift in culture — a stronger focus on Gemini, and more pressure to unify disparate efforts that were working towards similar goals. And a less adventurous, more corporate feeling. And much more bureaucracy around doing research. There were also many departures.

But ultimately, I think that many of these choices were necessary for the company to remain at the frontier. I have been glad to play a part in some larger scale projects on training state-of-the-art video-game agents and language models, and I have been grateful to be able to continue spending much of my time doing basic research. I am especially grateful to my managers and others who have supported me (especially Felix Hill, Murray Shanahan, Shane Legg, Tim Harley, and Michael Terry) for helping maintain space for that.

I have certainly considered moving to a different company (or even academia) at times over the past five years. But ultimately, there are a few factors that have kept me from leaving thus far. The first is the diversity of topics I get to work on, and the diversity of topics that the company works on. I appreciate that in addition to language models DeepMind still works on tackling science problems from biology to weather prediction. And I appreciate that I’ve been able to work on topics ranging from reasoning and language to vision, and methodological arguments about cognitive science. The last, and probably most important, is the people — I have many amazing collaborators and friends that I would have a hard time leaving behind. So, at least for now, I’ve stayed.

So with that personal context, here are some comparison points between academia and industry as I see it.

Differences between academia and (large-scale) industry:

Experience levels & mentorship

Academic teams are very junior-skewed; as an academic you will be the most (or nearly the most) experienced person around, and spend the majority of your research time mentoring people less experienced than you. You will get to build deep, years-long relationships with your PhD students.
By contrast, industry research teams are very senior-skewed. When you join, you will spend much of your time working with people who have years of industry experience (and/or experience as faculty). Thus, you will be among the least experienced on the team, and while you will have unique expertise to contribute, you will also have to learn a lot from those around you. While you may have the opportunity to mentor others (interns, or other employees), it will not be the primary thing that you do at the beginning of your industry career.
Depending on the extent to which you value mentoring others, or learning from them, you may prefer either of these options. (Of course, both jobs do have some of both components, but the relative weighting is very different.)

Diversity of backgrounds

Researchers at leading industry labs often come from very diverse backgrounds — at DeepMind I’ve collaborated with people with PhDs in physics, mathematics, neuroscience, philosophy, linguistics, etc.; as well as people without a PhD who came to the job with different skills such as engineering, and built their research expertise on the job.
By contrast, academic researchers in a particular department often (though not always) come from a narrower set of backgrounds and perspectives.

Diversity of roles and teams

Different teams within industry (even within the same company) can function in radically different ways. Some teams have strong top-down goals that they are trying to achieve, while others do more basic, bottom-up research. So, it’s important to be aware of what type of team you’re joining — be inquisitive about that in your interviews.

Organizational scale

The largest industry research organizations are much larger than any particular academic department — think thousands of researchers, many of whom have PhDs, compared to usually tens of faculty (plus maybe hundreds of postdocs and grad students at most) in a given department. This, together with the diversity of backgrounds mentioned above, means there are far broader options for collaboration.
Correspondingly, the scale of industry can be a bit overwhelming: if I am not careful, I could spend my full time going to talks or meetings that are relevant to topics I’m working on, and never have time to do any work myself.

Scale and speed of research

In part because of the expertise levels, industry research moves faster than academic research. This can either be exhilarating or exhausting, depending on who you are.
Industry also has many larger-scale projects, with tens, hundreds, or even thousands of contributors. Obviously, such projects can be an effective way to achieve moonshot goals. But they require a very different mindset for work, which is less focused on individually owning a research portfolio.

Scope of impact

Large-scale projects can make it easier to have a large impact. In industry, it is far easier to work on something that billions of people will use, or even something that might win a Nobel Prize.

Advocating for one’s work

In both academia and industry, a portion of one’s job is to advocate for one’s work.
In academia, this includes writing grants, creating packages for tenure or promotion, etc.
In industry, this similarly takes the form of writing packages for promotion, and advocating for why your work should get resources. Many large-scale industry projects started as small ones that were given more resources because they demonstrated some preliminary success, much like the pilot experiments one might describe to convince grant reviewers.

Work-life balance

In my experience, most academics work quite a lot; they respond to emails at odd hours on nights and weekends, and even seek out those quieter times for focused work like writing. It’s to the point that when someone wrote an article saying that academia should be treated like a normal job, they were widely criticized on Bluesky for not understanding that academia should be a vocation.
People do work hard in industry too, but in larger companies it can be much more of a ‘normal’ job — go to the office 9-5, and then disconnect on evenings and weekends. Of course, near a deadline people will work harder, but in general my perception is that industry researchers tend to have more of a life outside of work than academics do. In large companies, there can also be flexibility for working abroad from time to time, or even relocating to another city or country while keeping the same job (I’ve done this!).
I personally find myself to be a more effective and creative researcher when I work a reasonable amount, not all the time. I don’t think that’s incompatible with being passionate about one’s job.

Research freedom

When I started in industry, it felt like the research freedom was comparable to academia. That’s still somewhat true, in the sense that nobody will stop me from running some small scale experiments on something I’m interested in. But it has gotten harder to get large scale resources for projects that don’t have an obvious route to major impact — whether that impact might be making language models better, or scientific impact on something like protein folding. (Note, however, that my definition of “small scale” resources that I can access readily might seem large scale to most academics.)
More importantly though, it’s become more difficult to publish research that has even a hint of potential to impact large model training. Unfortunately, publishing is a prisoner’s dilemma — once one industry lab started to defect, it became less feasible for others to make a different choice. Thus, strategically, this shift makes sense for the company. (And of course, we’d all like our own company to have the best models too.) But I think for many of us, it is nevertheless a development that was hard to anticipate when we joined, when the field was mostly focused on publishing and communicating our work.

Stability

The shift in publishing mentioned above is one example of the more general point that industry careers have less promise of stability than tenured academic jobs. If the company needs particular kinds of research to happen, it will be part of your job to make them happen. If the field changes, your job will change.
Of course, it takes a long time to get a tenured position in academia, and the career I’ve had while employed in industry post-PhD has been much more stable than those of my friends who have gone through several postdocs before finally entering the tenure track; as far as I know, none of them are yet up for tenure. And at least until tenure, there will certainly be pressure to do research that your organization and your peers consider valuable.
And recently, even academia seems less than perfectly stable, unfortunately.

Possibility for change

The flip side of industry careers being less stable is that it’s much easier and more acceptable to change companies. Relatively few people stay at a single company for their entire careers.
This includes returning to academia; a number of my colleagues in industry have made that choice.

I hope this post will be useful for those thinking about their own career decisions! I think my philosophy in making most career decisions is that there are not really incorrect choices — everything will have its advantages, and if you focus on taking advantage of those you can make any choice the right one for you.

Be wary of assumptions in impossibility arguments

Andrew Lampinen — Tue, 13 Jan 2026 15:26:03 GMT

Because of all the recent excitement about AI, it is à la mode to try to deflate the hype. I think that this is a healthy impulse — many developments (and even more so, products) are indeed overhyped. However, the desire to push back on the hype can lead, in turn, to over-hyped arguments that AI is impossible, that language models must always hallucinate, that their reasoning traces must always accumulate errors, etc.

Here, I want to illustrate a particularly common pattern of failure I see in these arguments: proofs that are logically correct, but where the assumptions from which the proof begins make the proof fundamentally misaligned with the real-world question it is trying to address.

In particular, it is common to see proofs that, when they attempt to formalize some messy, real-world structure (e.g., what it means to “approximate cognition” or “avoid hallucination”), introduce unjustified assumptions about how hard this problem is. The proofs then rely crucially on these assumptions. But making formal statements about the real world is a tricky business, especially when discussing something as abstract as cognition or hallucination. And unfortunately, it is far easier to validate the logic of a proof than it is to validate its assumptions.

After a brief interlude on Euclid, I’ll show a few examples below.

Euclid and axiomatic assumptions

Let’s start with a well-known example of why one has to be careful with assumptions. Euclid famously stated 5 postulates (axioms/assumptions) for geometry; from these, he was able to derive many other geometrical inferences. The 5th postulate has to do with (non-)parallel lines, and whether they will intersect; essentially, it says that if two infinitely-long lines are not parallel, then they will have an intersection point. The converse is that, if two lines are parallel, then they do not intersect.

While this seems intuitive from a diagram, it turns out that one can construct geometries for which each direction of this postulate is violated. On a positively curved surface like a sphere, parallel lines can intersect; on a negatively curved surface like a hyperbolic plane, lines that are not parallel may nevertheless fail to intersect. (Diagram below, sourced from the Wikipedia article on the 5th postulate.)

Correspondingly, on these non-Euclidean geometries — including, for example, the surface of the Earth — theorems derived using the 5th postulate may not hold. This illustrates how the assumptions we make before we begin to prove something, even if they are intuitive, can cause logically valid proofs to mislead us about what is possible in the real world. A proof is only as good as its assumptions.

In the next few sections, I’ll show some examples of recent proofs about AI or cognition that I believe similarly rely on assumptions that are too strong for the real world.

Are machine learning approaches to cognition intractable?

A recent paper argues that achieving an algorithm that can approximate human behavior in general (‘AI’) through learning is intractable (i.e., that it is NP-hard). When the preprint came out, I (among others) criticized the assumptions on twitter; others have since made more formal arguments along similar lines. Here, I’ll outline the crux of my argument about why this proof is misleading.

The intractability proof follows a fairly-standard proof-by-contradiction-via-reduction-to-a-known-case approach. The way these types of proofs work is to begin from a problem that is known to be intractable — in the present case, a result by Hirahara about learning to approximate arbitrary random functions better than chance. (Hirahara’s proof relies on the difficulty of inverting pseudorandomness.) The proof that achieving AI through learning is intractable then takes the following approach:

1. Define what it would mean to be able to achieve AI by learning.
2. Assume that AI-by-learning is tractable.
3. Prove that under (1) and (2), Hirahara’s setting would also be tractable; thus, there is a contradiction.

I think that the proof as stated in the paper is logically valid. However, I think that the way it defines AI-by-learning — and particularly the statement that this proof implies something about approximating cognition — is deeply misleading.

In particular, the paper assumes that approximating cognition requires a learning system that is extremely strong: one that can learn any arbitrary function. It does so via a kind of bait-and-switch in assumptions between the distribution to be learned that is described in the informal statement of the proof, and the distribution that is assumed in the formal proof. In particular, the informal statement imagines that there is an engineer who is trying to learn to approximate human behavior:

Excerpt from the paper’s informal statement about what the theorem proves.

However, the formal statement assumes that such an engineer would need a learning algorithm that can approximate any arbitrary distribution of behaviors.

Excerpt from the paper’s formal proof.

This latter assumption is crucial to the proof, because the arbitrariness of the distribution to be learned is precisely what allows showing such a learning system could tractably solve the problems that Hirahara showed were intractable.

The problem is that the formal statement is very different from the informal statement that the authors use to market their claim. In particular, if human-like behavior is “simple” in some sense it could easily be tractable to learn to approximate that behavior, without tractably solving the problem considered by Hirahara.

As an intuitive example, imagine that human behavior could be described in all cases as a linear regression. In that case, it would be tractable to approximate human behavior, and yet there would be no contradiction with Hirahara’s proof.

Of course, human behavior is not that simple. But in order for the authors’ proof to convey the claim as informally stated (e.g., in the abstract: “Yet, as we formally prove herein, creating systems with human(-like or -level) cognition is intrinsically computationally intractable”), one has to assume that human behavior is as complex as arbitrary pseudorandom functions, and there is no clear reason that we should assume that.
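To make the linear-regression thought experiment concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from the paper under discussion: the function `toy_behavior`, its slope and intercept, and the noise level are all hypothetical. It only shows that when the target is simple, ordinary least squares recovers it from a few hundred samples in time linear in the amount of data.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (closed form, linear time)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def toy_behavior(stimulus, rng):
    """Hypothetical 'behavior': a noisy linear response to a stimulus."""
    return 2.0 * stimulus - 1.0 + rng.gauss(0.0, 0.1)


rng = random.Random(0)
stimuli = [rng.uniform(-1.0, 1.0) for _ in range(200)]
responses = [toy_behavior(s, rng) for s in stimuli]

slope_hat, intercept_hat = fit_line(stimuli, responses)
print(f"estimated slope={slope_hat:.2f}, intercept={intercept_hat:.2f}")
```

Nothing here bears on how complex human behavior actually is; the point is only that tractability depends on the assumed structure of the target.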

Are LLMs fundamentally bound to hallucinate?

Another recent paper surveys hallucinations and other failures of language models, and proves several statements about fundamental limitations on them. I think that the survey portion of this paper does a reasonable job of articulating some of the issues in this space and their causes. However, I think the “fundamental limitations” proofs, like the proof above, are based on overly strong assumptions about the structure of the world.

As an illustrative example, consider the key assumption of Theorem 4 in the paper:

Excerpt from the paper: Theorem 4

As the theorem notes, this is about sample complexity for learning arbitrary, random facts that are each sampled independently. That is, it is an assumption that there is no structure in the world to be learned — for example, that the facts “Language Models are trained to respond in natural language,” “there exist AI systems that can respond in natural language” and “computers can sometimes answer natural language questions” all have completely independent answers. In a world where facts were all independent and unstructured like this, it would indeed be difficult to avoid hallucinating. Fortunately, the real world has much more structure.

The other proofs in this paper run into similar challenges. For example, for Theorem 2 the authors demonstrate that there exists a function f’ on which all language models hallucinate on (countably) many inputs. However, they do not attempt to argue that this function corresponds to any real world structure, or anything else that we would want models to learn to approximate. That is, these proofs tell us something about how LLMs might fail in the worst of all possible worlds, but on their own, they can’t convince us that we live in such a world.

Do LLMs always get more error-prone the longer they reason?

Similar issues apply to Yann LeCun’s famous argument that language models must accumulate increasing error over a reasoning trace, which relies on the assumption that errors can only accumulate over a reasoning trace:

However, as subsequent work has noted, reasoning can also verify intermediate steps to identify and correct errors, or backtrack to try a more productive strategy. Thus, error rates are not monotonic over the trace; hence, the assumption of the proof does not hold. In fact, RL-trained LLMs often show increasing accuracy as their reasoning traces get longer.

This shows how even seemingly-reasonable assumptions about the algorithms we are using can mislead us.
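A small numerical sketch makes the role of this assumption visible. The argument is often summarized as follows: if each step independently goes wrong with probability e and errors can never be undone, the chance that an n-step trace is still on track is (1 - e)^n, which decays to zero. The code below adds one hypothetical parameter, `recover`, the probability that an off-track trace is corrected at the next step (for example by verification or backtracking). The two-state model and the parameter values are my assumptions for illustration, not measurements of any real model.

```python
def p_on_track(n_steps, err, recover):
    """Probability that a trace is on track after n_steps.

    Two-state Markov chain: an on-track trace derails with probability
    `err` per step; an off-track trace is repaired with probability
    `recover` per step. Both rates are hypothetical.
    """
    p = 1.0  # start on track
    for _ in range(n_steps):
        p = p * (1.0 - err) + (1.0 - p) * recover
    return p


for n in (10, 100, 1000):
    no_fix = p_on_track(n, err=0.01, recover=0.0)  # equals (1 - err) ** n
    with_fix = p_on_track(n, err=0.01, recover=0.2)
    print(n, round(no_fix, 4), round(with_fix, 4))
```

With `recover=0.0` the chain reproduces the (1 - e)^n decay. With any positive recovery rate, the probability instead settles near recover / (err + recover), so the conclusion depends entirely on the no-recovery assumption. This toy model does not capture accuracy increasing with trace length; it only shows that the decay is not inevitable.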

What to look out for

The examples above show how proofs about the difficulty of real world problems often rely on assuming that the problems are very difficult; for example, that there is no structure to be learned. These assumptions may not clearly match what we know about the world. By smuggling difficulty into these assumptions, it’s easy to construct a proof, but it’s unclear how much it tells us about the real-world problem. So, when you see an impossibility proof, make sure to ask: what is it assuming, and can those assumptions be justified on the basis of what we know about the world?

What cognitive science can learn from AI

Andrew Lampinen — Mon, 05 Jan 2026 18:05:00 GMT

I get asked a lot of questions about the relationship between AI & Cognitive Science, especially from early-career researchers wondering where their work might fit into the rapidly evolving fields. This is the third in a series of posts where I aim to lay out my current thoughts on the relationship between these fields — and the career options in them — in the present research environment. Everything here is my personal opinion; almost any claim I make some researchers would stridently disagree with. So take it with a grain of salt.

In the past two posts, I’ve laid out where I think cognitive science can and can’t be useful for AI. Here, I want to look the other direction, and talk about lessons that I think cognitive (neuro)science can learn from AI. This post will largely follow a preprint that Wilka Carvalho and I wrote arguing for a similar perspective, though here I’ll give a more succinct argument with a slightly different focus.

What should cognitive science learn from AI?

At a high level, I think that cognitive science can learn from AI that:

Scale and richness of learning experiences can fundamentally change learning & generalization.
Thus, the solutions to toy problems are not necessarily the best solutions for problems at scale — especially if the toy problems are constructed to demonstrate one’s pet theory (be careful of “setting your own homework”).
And the computational solutions to tasks individually may be quite different from the solution that solves them all relatively well.

Natural intelligence operates in the regime of learning at scale, and solving many tasks. But most cognitive models focus on solving an extremely narrow task — a single, simple experimental paradigm. This mismatch can limit our ability to achieve understanding. As I said in the first post of this series, an adaptive system can be like a mirror — if you’re not careful, the system just reflects the structure in which you test it.

Thus, in order to understand such a system, it’s important to understand how it adapts across a broader range of tasks.

Richness & scale of experiences change learning & generalization

The first point I want to make is that when the scale and richness of learning experiences are enhanced, they tend to enhance what is learned and how the system generalizes. For example, when a system learns in a broader set of situations (scale) and those situations have more of the perceptual or motor complexity of the real world (richness), what it learns is fundamentally different than what a model will learn in a toy setting. At some level, this point is intuitive, and relates to well-established phenomena in cognitive science (and in literature, as with the taste of a madeleine dipped in tea demonstrating the power of multi-sensory memory). But, the last 10 years of progress in AI have been strongly driven by this lesson.

Because there can be many confounds introduced in uncontrolled comparisons of scale and richness of learning, I’ll illustrate my points with a few examples from controlled experiments in AI. (See also a recent commentary where I discuss these issues.)

In a recent study, Misra & Mahowald studied how language models generalize to a particular linguistic structure, in cases when it never appears in training. They found that the models generalize from related constructions (e.g. components of the novel construction) that they have encountered in training. Crucially, they found that this generalization was enhanced by the variability in the semantic fillers observed in these structures during training. This shows how real, structural generalization can be enhanced by richer data.
In a study of causal reasoning & generalization, we similarly showed that the ability of an agent to generalize causal reasoning strategies to novel causal dependencies (never observed in training) depended crucially on the variability of the causal structures that had been observed in training. We also showed that features that enriched the learning with other cues, such as natural-language explanations, could enhance that generalization (as we’ve also found in prior work).
In a study led by Felix Hill, we showed how compositional generalization of language can be fundamentally altered by the richness of the environment in which it is learned. We took the same compositional train-test split of language, and experimented with how generalization was affected by the setting in which models learned the language. We found that agents that learned language grounded in an interactive, 3D environment generalized much more effectively (100% accuracy) than agents that learned simply from images, and likewise agents trained in 3D generalized better than those trained in 2D. This shows how richer features like embodiment might affect structural generalization. (I’m disappointed to say that I’ve seen little evidence from controlled experiments that grounding or embodiment per se improve language models when evaluated in language alone… though perhaps it’s yet to come.)
Raventós & Paul show that, when training a transformer model on an in-context linear regression task, there is a phase transition in generalization as the diversity of training tasks increases. Before this transition, the model essentially behaves with a memorization-like prior (a mixture of tasks seen in training); after the transition, however, it closely approximates the true prior over all tasks, including unseen ones.
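To make the setup in that last example a bit more concrete, here is a minimal sketch of how in-context regression data with controllable task diversity might be generated. It assumes tasks are weight vectors drawn from a standard Gaussian and that training sequences reuse a finite pool of them; the function names, dimensions, noise level, and pool sizes are all illustrative assumptions, not the settings from the paper.

```python
import random

def make_task_pool(num_tasks, dim, rng):
    """Draw a finite pool of regression weight vectors from a standard Gaussian."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tasks)]

def sample_sequence(task_pool, dim, num_examples, noise_std, rng):
    """Build one in-context sequence: (x, y) pairs that all share one task from the pool."""
    w = rng.choice(task_pool)
    xs, ys = [], []
    for _ in range(num_examples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0.0, noise_std)
        xs.append(x)
        ys.append(y)
    return w, xs, ys

rng = random.Random(0)
DIM = 8  # illustrative input dimension

# Low diversity: few distinct tasks, so memorizing the pool is a viable strategy.
low_diversity_pool = make_task_pool(num_tasks=4, dim=DIM, rng=rng)
# High diversity: many distinct tasks, so the pool starts to resemble the full prior.
high_diversity_pool = make_task_pool(num_tasks=4096, dim=DIM, rng=rng)

w, xs, ys = sample_sequence(low_diversity_pool, DIM, num_examples=16, noise_std=0.1, rng=rng)
```

The only thing that changes between the two regimes is the size of the pool. At test time, one would draw fresh weight vectors that are in neither pool to probe generalization to unseen tasks.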

These examples show how having a greater scale of examples to learn from, and richer features in those examples, can change what a system learns, and enhance its generalization. And there are many examples of how features like scale (e.g. of number of exemplars) and richness (e.g. of backgrounds behind objects) can enhance learning and generalization for natural intelligences like humans.

Systems that solve many problems are different from those that solve individual problems

I believe that the observations above are behind a fundamental shift in the way we approach solving problems in machine learning. In some sense, AI is now doing a much better job of solving general classes of problems like answering arbitrary questions in natural language, or identifying the objects in natural scenes, than prior systems did at solving much narrower problems like coreference resolution or classifying individual categories. Indeed, state-of-the-art methods for solving narrow linguistic problems now involve taking a general pretrained model, and adapting it to the specific tasks — just like state-of-the-art methods in computer vision for the last ten years have involved starting from models trained on large datasets, and ever more general tasks. We’ve moved from hand-engineering solutions to particular problems, to largely relying on general learning at scale, with a bit of task-specific learning at the end.

Why is it easier to solve a much more general problem — make a system good at many things, and then specialize it for some particular task — than it is to solve a narrow, specific problem? I’d argue that it’s due to precisely the type of pattern I’ve highlighted above: that systems that learn from a larger scale of richer data learn fundamentally different (and better) features and inferential processes than those that are trained in simpler, smaller-scale settings.

Toy problems, and setting your own homework

In the first post of this series, I mentioned that minimalistic task design is often a design principle of cognitive (neuro)science, but one that also might be holding the field back, by focusing our understanding of systems onto narrow settings. I think that the discussion above offers a different perspective on these issues: if richer and larger-scale settings can fundamentally change what a system learns and how it generalizes, then studying learning & generalization solely in toy settings may mislead us.

A particularly insidious case of this is when a toy problem builds in precisely the structure the researcher is looking to find in the natural system (which Daan Wierstra once described to me as “setting your own homework”). For example, if a researcher believes that the mind does something like “logical inference,” they will naturally design a minimal, toy task that is fundamentally structured as a logic problem, then show that their system does well (perhaps better than a neural network). Of course, where humans also perform similarly, this is suggestive of an interesting capability to mimic such a system in some capacity. However, as highlighted above, the solution to many problems may be different from the solutions to individual ones. And this is particularly problematic when most of the problems that humans solve do not as obviously have the clear structure (e.g., logic) that the toy problem does. Of course your logic system works well on logic problems, but how does the human mind know when to use strict logical inference and when to use other processes? And does it really do strict logical inference, or might it be something not so different from what language models do? Knowing what a system does in a limiting case gives us only a very narrow window into how it works.

For this reason, Wilka & I recommend that researchers separate the creation of benchmarks that define a problem from the proposal of solutions, and keep one fixed when changing the other. We think that toy problems are a useful starting point for investigations, but we also think that it is important that candidate models of cognitive phenomena can be scaled to solve many problems, to show that the solutions can scale and what that scaled implementation might look like.

(I don’t want to be overly critical of others here; I think that this is something I have done a poor job of at many times in my career — I have too often designed test settings alongside the algorithm that I believed would solve them, without demonstrating that the solution can scale — and this is something that I aim to do better in present and future work. I do think there are things to be learned from these works, but I think they are less general than I hoped they would be.)

In the last sections of our paper, Wilka & I propose an approach to building theories in cognitive science that I think addresses these challenges.

Building naturalistic paradigms without giving up experimental control

We argue that cognitive science needs to build towards more complex, naturalistic experimental paradigms that can capture the broader range of variables and situations across which a theory would be expected to hold. While the idea of naturalism is often pitted against the possibility of experimental control, we argue that it is possible to do theory-driven experimental manipulations in naturalistic experiments — maintaining control of the variables of theoretical import, while varying the many others that may interact with them to ensure that the effects we discover are general.

For example, using machine learning tools, programmatic methods, and/or human effort, it is possible to take a widely varying set of natural stimuli, and augment them to capture the effects of interest — even in unnatural ways that go far outside the natural distribution, as when Robert Geirhos and colleagues artificially constructed stimuli with the shape of one object, but the texture of another (though cf. this more recent paper).

Figure from our paper, showing conceptually how Geirhos’s shape-texture mashup stimuli (bottom right) go outside the natural distribution.

Other researchers have done similar experimental manipulations like augmenting real stories with unnatural grammatical structures, or sampling variations of maps for a videogame environment that systematically vary particular features — showing how data with rich natural structure can be enhanced with systematic manipulation of key variables.

But how can we tightly link these naturalistic experiments to our models and theories?

Building theories that combine naturalistic task-performing models with reductive understanding

There is a tension between building large-scale task-performing models, and building understanding as cognitive scientists. Our resolution of that tension is that cognitive scientists need to do both, and link them more closely than they usually are.

We think that simple models on toy problems (or other methods of reductive understanding, such as analytic theory) are one essential piece of cognitive theories, but not the whole of it. Instead, we argue that these simpler components need to be grounded in models that can really perform the same range of naturalistic tasks as the natural intelligences do, in as similar a range of naturalistic settings as possible.

The reductive component of such a theory helps us to achieve the abstract understanding that cognitive scientists seek. But crucially, tightly coupling these theories to models that can really perform the tasks in question allows us to see that the solutions are scalable, and can address the full range of behaviors, rather than simply those that match the theory exactly.

Importantly, we are increasingly getting the tools that allow us to make these links between simple and scaled-up models precise. One particularly promising example is Distributed Alignment Search by Atticus Geiger & Zhengxuan Wu (and others), which allows making precise causal links between simple, structured causal models and the distributed representations learned in large machine learning models. But these interpretability-based methods are not the only route. There are also many other paths to achieving reductive understanding, such as theories of deep learning that elucidate cognitive phenomena (for example, in semantic cognition), that can still tightly link to the task performing models.

Summary

I think that cognitive science can learn from AI about the importance of scale and richness of learning experiences in generalization. That should make us concerned about overly narrow experimental paradigms and models, and push us to build more naturalistic ones. Even in naturalistic settings, we can maintain experimental control. And we should aim to build theories that tightly couple reductive, abstract models to ones that really perform the naturalistic tasks that the natural intelligences do.

For a much deeper discussion of these topics, and the challenges and limitations therein, and the prior literature, check out our preprint (and keep an eye out for a new & improved version, hopefully sometime soon).

How cognitive science can contribute to AI: methods for understanding

Andrew Lampinen — Tue, 23 Dec 2025 17:02:50 GMT

In a prior post, I laid out some of the directions that try to directly build on what we know about the brain in AI, and explained why I think they haven’t generally been the most effective. In this post, I want to lay out a few areas where I do find cognitive science to provide consistently useful perspectives, both in the research I do and in the field more broadly.

How can cognitive scientists help in AI research?

At a high level, the places I’ve found my background in cognitive science to contribute most to my work in AI are:

Analyzing a complex phenomenon by distilling it down to its bare essentials, and understanding levels of analysis.
Knowing how to design a good experiment, and thinking critically about confounds and alternative explanations.
Making sense of complex behavioral datasets, and going beyond single summary statistics like overall accuracy.
Taking high-level inspiration from the behavioral phenomena of natural intelligence to think about learning pressures.
Applying rational analysis — i.e., making sense of behavior in terms of rational adaptation to “environmental” (training) pressures.

(I don’t want to claim that these are the only areas where the cognitive sciences can contribute to AI; I’m merely listing the places where I, personally, have found it to be most useful in my work. I’ll suggest some other relevant directions at the end.)

Note that very few of these are about taking specific *insights* about the architecture or computations of natural intelligence and applying them to improve AI. Instead, these strategies focus on how to thoroughly understand the systems that we have (including their weaknesses), by using *techniques* similar to those we use in cognitive (neuro)science to understand natural intelligence. I think that cognitive scientists usually have unique strengths in making sense of the complex behaviors of a complex system, and that these strengths are often more useful than any particular insights we might have about natural intelligence.

In the remainder of this post, I’ll give a few examples from our work that illustrate some of these themes.

Rational analysis of in-context learning as adaptation to simple data pressures

Much of the excitement around language models (from the research side) started with the surprising observation that large language models somehow acquired the ability to do few-shot learning in-context. The field had recently been building lots of models that could do this, but they had been trained specifically for that task using techniques like meta-learning — that is, heavily engineering the data and training process to focus on the ability. It was surprising to find that models could learn it simply from natural data, without any of the heavy-handed meta-learning techniques in prior work.

My colleague Stephanie Chan (Ph.D. in computational neuroscience!) set out to understand how this could be. In particular, meta-learning approaches for few-shot classification relied on putting multiple examples of the same class into each step of training and randomizing the data labels every time a new task was introduced, to prevent the model from memorizing the answers and instead force it to learn from the examples. Obviously, language models trained on internet data don’t have that randomization or consistent data structure. So how could they learn how to learn something new?

The key insight involved several well-known properties of natural language data: its heavy-tailed, power-law distribution over words (Zipf’s law from linguistics: the idea that word frequency is inversely proportional to frequency rank), and its burstiness (i.e., the fact that if a word or phrase appears once in a document, it is much more likely to appear later in that document than in another random sample over the internet). Essentially, in a long, heavy-tailed distribution there are too many rare-but-occasionally-encountered things to readily memorize, and burstiness gives the necessary structure for the alternate strategy of learning in context. By creating controlled datasets in which these distributional factors were modulated, and training models from scratch, Stephanie showed that these two factors alone sufficed to give the emergent ability to learn new categories in context. (Stephanie also made a bunch of other interesting observations, like the fact that the optimal tradeoff between memorizing common things and learning novel ones occurs at a word frequency distribution with a power law exponent of around 1, which is empirically the distribution in almost all natural languages.)
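To give a feel for the two distributional factors, here is a minimal sketch of a sequence generator where both can be dialed up or down. It is a hypothetical toy, not the dataset construction used in the study: `zipf_weights`, `sample_bursty_sequence`, and the `burst_prob` knob are names and mechanisms I'm assuming for illustration.

```python
import random

def zipf_weights(num_classes, exponent):
    """Unnormalized Zipfian weights: frequency proportional to 1 / rank**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_classes + 1)]

def sample_bursty_sequence(weights, length, burst_prob, rng):
    """Sample class ids for one sequence.

    With probability burst_prob, repeat a class already present in the sequence
    (burstiness); otherwise draw a class from the heavy-tailed marginal.
    """
    classes = list(range(len(weights)))
    seq = []
    for _ in range(length):
        if seq and rng.random() < burst_prob:
            seq.append(rng.choice(seq))
        else:
            seq.append(rng.choices(classes, weights=weights, k=1)[0])
    return seq

rng = random.Random(0)
weights = zipf_weights(num_classes=1000, exponent=1.0)

bursty = sample_bursty_sequence(weights, length=8, burst_prob=0.5, rng=rng)
non_bursty = sample_bursty_sequence(weights, length=8, burst_prob=0.0, rng=rng)
fully_bursty = sample_bursty_sequence(weights, length=5, burst_prob=1.0, rng=rng)
```

Setting `exponent` to 0 gives a uniform distribution over classes, and setting `burst_prob` to 0 removes the within-sequence repetition, so each factor can be ablated independently in this toy.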

This study exemplifies distilling a phenomenon to its most elementary principles — the few data factors that suffice to give rise to it. The inspiration for what those principles should be came directly from what was already known about the structure of natural data from linguistics and other cognitive scientists. This work can also be interpreted as a kind of rational analysis; that is, an interpretation of why the in-context learning behavior is a rational response to the optimization pressures of sufficiently long-tailed data and local bursty structure.

This type of rational analysis has become increasingly popular, e.g. for understanding why chain-of-thought reasoning abilities are a rational response to local structure, and we’ve made a conceptual argument that this same kind of analysis can illuminate a much broader spectrum of in-context behaviors. Similar analyses based on Bayesian models have also been used to argue why transitions from in-context learning to memorization are rational. These analyses can also be applied to make sense of when models might fail, as in our new work that uses a similar level of analysis to argue that language models and RL agents fail to capture latent information in their training data, unless they have access to it via test-time retrieval (or train-time augmentation). In subsequent studies Stephanie and her collaborators have dug deeper into the mechanisms of in-context learning using intervention methods inspired by neuroscience.

These lines of work demonstrate a variety of ways that the methods and analysis techniques of cognitive (neuro)science can shed new light on how and why language models do the things they do.

Explanations as a learning signal

One way I’ve often found cognitive science useful is as a source for thinking about learning signals. One thing that cognitive scientists — and especially developmental researchers — often do is to debate over which cues people rely on to infer things like the referents of words, or an entity’s goals. These cues can be implicit or explicit, and come in many forms.

A particular kind of cue I’ve been interested in over the years is the role of explanations as a learning signal. Explanations are thought to be incredibly important to human learning; they can shape what we learn and how we generalize. This kind of learning signal is often missing from classic AI methods like Reinforcement Learning (RL). However, explanations do appear in the training data of language models.

Thus, we embarked on a series of studies to evaluate how explanations could affect learning in AI. We first studied RL; we designed tasks that were very hard for agents to learn from rewards alone, and then showed that language modeling of explanations could enable them to learn. In settings with ambiguous rewards, providing different kinds of explanations could even change how the agents generalize out-of-distribution! We also showed that non-explanatory alternatives (such as true-but-not-currently-relevant statements about the situation) did not offer these benefits; thus, explanations have a special role to play.

In a subsequent study, we considered the role of providing explanations after the answer during in-context learning. Again, we designed careful experiments, and compared against carefully matched ablations (true-non-explanatory statements again, as well as including the same explanations in a few-shot prompt, but applying them to the wrong problems). As before, we found that explanations could play a unique role.

Finally, we built on these studies and other past works to ask a fundamental question about language models: what can they learn about causal structures and causal reasoning from their passive imitation training? We showed how explanations, along with other data features like the inherent interventions behind the generation of internet text, can help to unlock generalizable causal reasoning even in the case of passive learning.

This series of studies illustrates how taking inspiration from a high-level aspect of human cognition, at a mostly behavioral level, can provide new perspectives on understanding and improving AI. A persistent theme across the papers is the use of good experimental design and careful controls to eliminate confounds and alternative hypotheses, such as the use of simpler sentence-level associations.

In the in-context explanations part of the work we also drew on particular analytic methods that are often used in the cognitive sciences: mixed models. These models allow appropriately allocating uncertainty to derive more precise effect estimates when there are many non-independent measurements (as when a human is tested on multiple problems, or when we evaluated language models on tasks using multiple prompts combined with different explanation conditions). Although these methods aren’t as widely used in AI, they are increasingly relevant to making sense of the complex behavioral datasets produced from language model evaluations, and I hope they will be adopted more often. A nice intro to these methods, as well as many other principles of experiment design and analysis from cognitive sciences, can be found at experimentology.io.
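For readers who haven't met these models, one common form is the random-intercept model sketched below. The notation is my assumption for illustration, and is not the exact specification from the paper.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{ij}$ is the $i$-th measurement within group $j$ (for example, a particular task or prompt), $x_{ij}$ indicates the condition (for example, explanation vs. control), $\beta_1$ is the fixed effect of interest, and $u_j$ is a group-specific offset that absorbs the shared variation among measurements from the same group. Treating those measurements as independent would instead fold that shared variation into the noise term.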

Challenges of bridging levels of analysis in interpretability: representation biases and unfaithful simplifications

In his 1982 textbook on vision, David Marr articulated three levels at which a computational system could be analyzed:

Computational: What is the system’s goal and the high-level strategy by which it achieves it?

Algorithmic: How is this strategy implemented; what representations are used and how are they transformed to execute the computation?

Implementational: How can the representations and algorithms be realized physically?

These three levels of analysis have been substantially influential in computational cognitive science and neuroscience (and even in AI). However, many works since have highlighted challenges to the framework and interactions between these levels.

The levels framework, and its challenges, are useful context for understanding the goals and challenges of mechanistic interpretability, which, like neuroscience, attempts to make sense of a system through its internal activity. In our recent works studying challenges of interpretability, we’ve drawn on these debates, and particularly the challenges in linking from representations and algorithms to the higher computational level.

For example, we’ve highlighted how biases in the representations that models learn — for example, biases towards simpler or more prevalent features over more complex or rarer ones, even when both features play an equivalent computational role — can mislead us about the computations of the system. Common analyses of representations (like PCA, or the loss used for training SAEs) implicitly or explicitly assume that stronger signals in the representations are more important, but we’ve found that this isn’t true. This highlights a complexity in the relationship between the algorithmic and computational levels that poses a challenge for making inferences from one to another.
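A hand-constructed toy can illustrate the worry. In the sketch below, two binary features play symmetric roles in the output, but one is written into the representation at ten times the magnitude of the other. The representation is fixed by hand rather than learned, and the scales and names are assumptions chosen for illustration; this is not an experiment from our papers.

```python
import random

rng = random.Random(0)

# Two binary features, each -1 or +1, sampled independently.
samples = [(rng.choice([-1, 1]), rng.choice([-1, 1])) for _ in range(10_000)]

# Hypothetical representation: feature a has 10x the magnitude of feature b.
SCALE_A, SCALE_B = 10.0, 1.0
reps = [(SCALE_A * a, SCALE_B * b) for a, b in samples]

# The target depends on both features symmetrically (an XOR-like product).
targets = [a * b for a, b in samples]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def sign(x):
    return 1 if x >= 0 else -1

def accuracy(predictions):
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)

# Share of total variance carried by the dimension that encodes feature a.
var_a = variance([r[0] for r in reps])
var_b = variance([r[1] for r in reps])
share_a = var_a / (var_a + var_b)

# A readout that uses both dimensions recovers the target exactly.
full_acc = accuracy([sign(r[0]) * sign(r[1]) for r in reps])
# A readout that keeps only the high-variance dimension is near chance.
dominant_only_acc = accuracy([sign(r[0]) for r in reps])

print(f"variance share of feature a: {share_a:.3f}")
print(f"accuracy with both features: {full_acc:.3f}")
print(f"accuracy with dominant feature only: {dominant_only_acc:.3f}")
```

In this axis-aligned toy, a variance-ranked analysis would put nearly all of its weight on the first dimension, even though discarding the second makes the computation impossible to recover.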

Similarly, we’ve studied how simplifying a system to interpret it can lead to unfaithful interpretations that only reliably describe the system’s behavior on the training distribution. Many approaches to interpretability replace a model with a simplified proxy model; e.g., by treating soft attention as though it were hard, or (as above) assuming that only the dominant features matter. We show how these kinds of approximations can break down in edge cases, so that the simplified proxy model behaves like the original model in distribution, but makes systematically different predictions out of distribution. For example, in cases where the simplified model fails out of distribution, the original model may still perform well, or vice versa. Again, this mismatch stems from a failure to appropriately consider the mapping between levels, and in particular the idea that a computational description of a system might only be faithful to its algorithmic details over certain distributions.
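The soft-versus-hard attention case is easy to see in a few lines. The sketch below compares a single softmax attention readout against its hard (argmax) proxy. The scores and values are made-up numbers, with sharply peaked scores standing in for "in distribution" and nearly flat scores standing in for "out of distribution"; that mapping is my assumption for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(scores, values):
    """Original mechanism: softmax-weighted average of the values."""
    return sum(w * v for w, v in zip(softmax(scores), values))

def hard_attention(scores, values):
    """Simplified proxy: return only the value with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return values[best]

values = [1.0, -1.0, 0.5]
peaked_scores = [12.0, 0.0, 0.0]  # one key clearly dominates
flat_scores = [0.6, 0.5, 0.4]     # same winner, but barely

in_dist_gap = abs(soft_attention(peaked_scores, values) - hard_attention(peaked_scores, values))
ood_gap = abs(soft_attention(flat_scores, values) - hard_attention(flat_scores, values))

print(f"gap with peaked scores: {in_dist_gap:.6f}")
print(f"gap with flat scores:   {ood_gap:.6f}")
```

With peaked scores the proxy is nearly indistinguishable from the original, so an evaluation restricted to such inputs would call the simplification faithful. With flat scores the same key still wins the argmax, but the two outputs diverge substantially.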

These examples illustrate how applying frameworks from neuroscience can help us to think about research methodologies in AI, and their challenges. They also illustrate how reducing a phenomenon (e.g. the relationship between a system’s representations and its computations) down to its simplest instantiations (small models trained on simple data that we understand) can shed new insights into the phenomena at play in more complex settings.

Summary:

I think these examples illustrate the themes I laid out above: how cognitive science can teach AI about methods for probing and making sense of complex systems, through careful experiments, analysis, and reducing phenomena down to their simplest instantiations and models.

Of course, there are many other exciting directions that I haven’t touched on here, or explored as much, but where cognitive science could be especially impactful — particularly in thinking about how AI can be made to help people most effectively. For example:

Work on human-AI interaction that draws on perspectives from cognitive science for understanding social aspects, building interfaces that readily expose useful information without being overwhelming, etc.
Thinking about opportunities and risks for AI’s effects on mental health.
And opportunities and risks for its effects on democracy.
Work on AI for tutoring or education that builds on what we know about human learning.
Etc.

I believe that work on understanding the fundamentals of what AI systems learn, using techniques like those I outlined above, goes hand-in-hand with thinking about AI in more applied areas like these.

Why isn’t modern AI built around principles from cognitive science?

Andrew Lampinen — Tue, 16 Dec 2025 15:35:49 GMT

I get asked a lot of questions about the relationship between AI & Cognitive Science, especially from early-career researchers wondering where their work might fit into the rapidly evolving fields. This is the first in a series of posts where I aim to lay out my current thoughts on the relationship between these fields (and the career options in them) in the present research environment. Everything here is my personal opinion; almost any claim I make some researchers would stridently disagree with. So take it with a grain of salt.

In particular, I want to lay out a few areas where I find cognitive science to provide consistently useful perspectives, both in the research I do and in the field more broadly. I also want to contrast that with some of the directions that I think are often pursued, but that I personally believe may be less promising. This post begins by discussing some history, and my perspective on why current AI models are not mainly designed around principles from cognitive science or neuroscience. In my next few posts, I’ll take a more optimistic perspective, on where I find that my research in AI (and that of others) has gained a great deal from cognitive science.

A very brief history of AI & Cognitive Science

Historically, cognitive science and AI were tightly coupled fields, with insights in one quickly driving progress in the other. More recently, things have seemed more unequal. There has been an explosion of recent progress in AI. This progress has been driven by increased computational power, growing datasets, architectural innovations driven by machine learning problems rather than cognitive inspirations, and practical development of effective machine-learning frameworks with tools like automatic differentiation that make model development easy.

At least at a glance, most of these developments in AI do not seem to be driven by cognitive science. Instead, the recent flow has mostly been the other direction. There has been a torrent of observations that AI vision models predict activity in visual cortex in animals (or humans), and even aspects of its spatial organization; likewise, language models can predict human imaging data remarkably well, and reproduce known behavioral phenomena. Given these developments, cognitive scientists have suggested that AI progress refutes one of the dominant linguistic paradigms, suggests new perspectives on the constraints faced by the brain, and changes how cognitive scientists should approach research and theory-building. I’ll return to these themes of what AI might offer to cognitive science in another post. For now, though, I want to focus on the opposite direction: what cognitive science can offer to AI.

Why doesn’t modern AI build on what we understand about the brain more often?

We know a lot about the brain across many levels of analysis. We can describe the detailed biochemistry of neurons; we can identify how particular brain regions focus on different aspects of vision, language, memory, or cognitive control; we know how various aspects of intelligence develop from birth through adulthood; and we can explain computational principles that underlie aspects of subtle behaviors like pragmatic language inferences. Yet almost none of these insights has contributed anything major to AI. Why?

A snide answer might be that AI researchers just don’t know or care about cognitive science, but ignore it to their detriment. There may be some truth to this, but clearly it isn’t the whole picture — if for no other reason than that there are a *ton* of cognitive (neuro)scientists working in AI. The cofounders of DeepMind met in a computational neuroscience program. During my time in industry, there have easily been more people with PhDs in cognitive science or neuroscience working at DeepMind than the number of faculty at any university cognitive or neuroscience department in the world. There have been whole teams of such researchers. And that’s only in a single organization.

There have been plenty of papers over the years arguing that AI is missing core ingredients of human intelligence, and there are often works published in machine learning conferences that explicitly draw on such inspiration to propose new perspectives. For my own part, I’ve also spent plenty of time trying to engineer systems that incorporate principles of natural intelligence. I think there are plenty of things to be learned from such works.

However, in my works where I’ve drawn on explicit ideas from cognitive science in designing architectures (and I think this often applies to the field more generally), I’ve often done so at a cartoonish level of abstraction that could in many cases be derived from first principles without needing a cognitive motivation (e.g., that a system should be adaptable to novel tasks based on their relationship to known ones, or that it might be useful to recall aspects of the past individually in detail). Even so, my works that build architectures around ideas from natural intelligence haven’t seemed to me to be the most successful or impactful things I’ve done.

And more generally, I’d argue that the current dominant paradigm in AI has not primarily grown out of approaches that design systems on the basis of cognitive science.

The bitter lesson

(This section will probably not be surprising to AI researchers, but I think may be useful context for those coming from cognitive science who are less familiar with the theme.)

My own journey in my research has often reflected aspects of the bitter lesson that Rich Sutton (an RL pioneer) articulated about the spirit of modern machine learning vis-à-vis our own knowledge of thinking (emphasis mine):
“We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.”

Indeed, my most successful works have focused on ways of enriching training to help systems to learn more effectively, or understanding what it is that systems learn. I’ll discuss in a subsequent post how cognitive science can be (and has been) useful in those areas. But for now, I want to focus on one of the implications that I think AI has absorbed from this lesson — and why I think it is important for cognitive (neuro)science researchers who want to contribute to AI to understand.

I think that AI has been driven by the bitter lesson to be a strongly empirical field. In particular, it has learned to focus on learning-based methods that scale to tackle large-scale real world problems.

Of course, AI researchers (such as Sutton) are often interested in theoretical arguments and toy demonstrations in simplified settings. These can be the seed of major changes in the field. However, making a principled argument that some principles of neural anatomy or computation would improve AI will rarely shift the focus of the field without a scaled-up demonstration: not just a toy model on a toy task. As Sutton highlighted, building in knowledge always helps in small-scale experiments — particularly if those experiments are designed precisely to demonstrate the benefits of the principle in question — but in the long run building in knowledge often holds models (and research) back. It’s important to demonstrate that an idea works beyond a narrow, simplified setting.

However, in my experience cognitive (neuro)science has focused most on studying intelligence within small sets of relatively simplified tasks.

Minimalistic task design

Indeed, it is often a deliberate principle of experiment design in cognitive (neuro)science to focus on as minimalistic an instantiation of a particular challenge as you possibly can. Rather than studying planning in how someone organizes an event (slow, messy, difficult to control) we might study it in how someone navigates around a simple grid environment (fast, very few variables, controlled). Of course, this is a very reasonable choice. There are many benefits to studying in minimalistic settings: it makes it easier to ensure that there are no other variables at play, to understand the experiments, to analyze the data, etc. The field has made these choices for a reason.

But in my view there is a major obstacle to achieving full understanding via this approach. It means we often build tasks that test only the capability we’re interested in, in settings where that capability is the only perfect solution. That means we don’t really understand how or when that capability might get invoked more generally, or whether that capability is even the right way to describe the larger picture of what the brain does.

To make that more concrete, imagine that we were testing a perfectly adaptive intelligent system. When we gave the system a set of math problems it would respond just like a calculator would, and we might decide that it has a calculator-like component. When we gave it a navigation problem, perhaps it would take longer to plan longer paths (or in settings with higher branching factor) but still succeed, and we might decide it is using some kind of search. If we tested it on a set of linear regression problems, we might decide that it was using least squares. An adaptive system is like a mirror that reflects the structure we put in front of it; if we engineer a test around some simple structure, we’ll see that structure reflected back at us. But are any of these the right description for what the system is doing? Without pushing into tasks that step outside of these simplified isolated paradigms, it’s unclear.
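As a toy illustration of the mirror analogy (my own sketch, not an experiment from the literature), consider a generic learner with no linear structure built in: a k-nearest-neighbor regressor. The data, the helper names `knn_predict` and `least_squares_predict`, and the choice of k are all hypothetical choices made for illustration. Tested only on linear problems within the training range, the learner looks as if it were doing least squares; one query outside that range shows that it is not.

```python
import numpy as np

# Hypothetical "simplified paradigm": noiseless linear data on [0, 1].
x_train = np.linspace(0.0, 1.0, 201)
y_train = 2.0 * x_train + 1.0


def least_squares_predict(x_query):
    """Fit a line by ordinary least squares and evaluate it at x_query."""
    slope, intercept = np.polyfit(x_train, y_train, deg=1)
    return slope * x_query + intercept


def knn_predict(x_query, k=3):
    """Average the targets of the k nearest training points (no linear prior)."""
    dists = np.abs(x_query[:, None] - x_train[None, :])  # (n_query, n_train)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Probe inside the paradigm: the two systems are nearly indistinguishable.
x_inside = np.linspace(0.1, 0.9, 50)
gap_inside = np.max(
    np.abs(knn_predict(x_inside) - least_squares_predict(x_inside))
)

# Probe outside the paradigm: the descriptions come apart.
x_outside = np.array([5.0])
gap_outside = np.abs(
    knn_predict(x_outside) - least_squares_predict(x_outside)
)[0]

print(f"max gap inside training range: {gap_inside:.4f}")
print(f"gap at x = 5.0:                {gap_outside:.4f}")
```

Within the narrow test, "it uses least squares" is a perfectly good description of either system. Only the out-of-range probe distinguishes them, which is the sense in which a minimalistic task can reflect its own structure back at the experimenter.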

Because of this, we often build up a deep understanding of isolated islands in the full space of natural intelligence, such as the way the brain works in specific regimes like navigating a grid, perceiving Gabor patches, or learning about the (drifting) value of risky gambles. We then try to build up our bigger-picture understanding by extrapolating from these relatively few points in the larger scope of intelligence.

Understanding natural intelligence is hard

I don’t want this to sound too critical of cognitive science and neuroscience. Researchers in those fields are doing amazing work on understanding a very complex and difficult system that changes as you interact with it. That’s a very difficult job, and it only makes sense to try to restrict the scope of the problem. Despite the challenges, there are many scientists who are pushing the boundaries by testing out ever richer and more naturalistic paradigms, studying more complex interactions between different neural systems across settings, and covering broader swaths of task space. There are plenty of researchers arguing that current paradigms are inadequate for achieving full understanding of the brain, e.g., Nicole Rust’s great new book Elusive Cures: Why Neuroscience Hasn’t Solved Brain Disorders—and How We Can Change That. There have even been long-standing efforts to build cognitive architectures that more completely describe how the pieces fit together and allow simulating the full scope of human cognition. But so far, the progress made in these efforts has not yet seemed to me to have the scope needed to really answer the practical challenges of AI.

Fragmentary, abstract understanding meets messy real-world challenges

Thus, I think that the fundamental reason that cognitive science has not contributed more to engineering modern AI systems is that our current understanding of cognition is often too fragmentary or abstract. There’s a lot we understand about the brain and mind in certain contexts, but we don’t understand how the pieces fit together to form the whole of adaptive intelligence at the level of detail that could actually help in practical AI development.

When I think about challenges of AI, I think they tend to occur at the messy places where many constraints intersect. For example, models may struggle to appropriately integrate visual and linguistic information on challenging problems, or may get confused when a domain uses different syntactic structures than usual, or when a problem seems like a classic brainteaser, but isn’t. These challenges seem to me to come from a more holistic failure of the many types of knowledge in the system to work together in just the right way to solve the task.

Of course, that may sound like a fuzzy and ambiguous statement: what couldn’t be interpreted as a failure of the system to work in “just the right way”? But that’s precisely my point. I think that the kinds of issues that AI systems often face are the messy problems involving how to appropriately represent and integrate many different types of information across many different types of tasks with very different structures, with as much generalization and as little interference as possible between tasks. Areas where many interacting factors intersect are necessarily hard to understand in simple terms. And because of that, these are precisely the areas where I think our understanding of how the brain actually works is fairly poor; certainly not good enough to apply to building better AI.

A new hope

This might all sound a bit pessimistic. But in truth, I think it’s a great time for cognitive science to both contribute to, and learn from, AI. I think the problems the fields face are increasingly overlapping—as each tries to make sense of complex adaptive systems—and each field can learn from the methods, practices, and principles of the other. I’ll describe those directions in my next few posts.

Thanks to Wilka Carvalho for helpful feedback and comments on this post.

Hello world

Andrew Lampinen — Tue, 16 Dec 2025 02:12:35 GMT

I’ve been thinking about (and working in) cognitive science and AI for a while. I write papers and am (overly-)active on the-website-formerly-known-as-twitter and bluesky. But I’ve been wanting a place to share some not-quite-fully-baked ideas or opinionated takes, write down answers to questions I get a lot, or share my interpretations of things in a more accessible format than in my academic papers: something a bit longer form than what I would post on x/bsky, but not quite developed or serious enough for a paper. So here we are!

I intend to start with a series of posts addressing some common questions I get about the relationship between cognitive science and AI in the modern era.