Discussion about this post

Cognitive Drift

One thing AI has clarified is that generalization emerges from recursive compression across many environments, not from solving isolated paradigms cleanly. Toy tasks often capture a slice of behavior but miss the dynamics that determine when and how a system updates its representations. That gap seems central to the limits of many cognitive theories.

Jeff Bowers

I think I disagree with the following passage, or I’m misunderstanding it:

“In the first post of this series, I mentioned that minimalistic task design is often a design principle of cognitive (neuro)science, but one that also might be holding the field back, by focusing our understanding of systems onto narrow settings. I think that the discussion above offers a different perspective on these issues: if richer and larger-scale settings can fundamentally change what a system learns and how it generalizes, then studying learning & generalization solely in toy settings may mislead us.”

Most cognitive neuroscience research assesses the cognitive/brain systems of people who have experienced a diverse set of stimuli in a wide variety of contexts (i.e., in the person’s life prior to the experiments). So, it makes sense to study the solution the brain has come up with using experiments that test specific hypotheses. For example, if you test a specific hypothesis about vision by manipulating artificial stimuli, and you find that the model of object classification does not behave like a human following the manipulation, it suggests that the model has found a different solution to classifying objects, no? The specific experimental setups used in CogNeuro are critical for characterising how the mind and the model work – not holding us back.
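To make that logic concrete, here is a minimal sketch of such a model–human comparison, assuming torchvision’s pretrained ResNet-50; the stimulus paths, labels, and human reference accuracies are hypothetical placeholders, and the manipulation (vertical inversion) is just one standard example:

```python
# Minimal sketch: does a pretrained classifier behave like humans under a
# stimulus manipulation (here, vertical inversion)? The image paths, labels,
# and human reference accuracies below are hypothetical placeholders.
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()  # preset preprocessing for these weights

def top1(img: Image.Image) -> int:
    """Model's top-1 ImageNet class index for a single image."""
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    return int(logits.argmax(dim=1))

def accuracy(paths, labels, manipulate=None) -> float:
    """Top-1 accuracy, optionally applying a manipulation to each image."""
    correct = 0
    for path, label in zip(paths, labels):
        img = Image.open(path).convert("RGB")
        if manipulate is not None:
            img = manipulate(img)
        correct += int(top1(img) == label)
    return correct / len(paths)

# Hypothetical test items (path, ImageNet label index).
paths = ["stim_001.jpg", "stim_002.jpg"]
labels = [207, 281]

upright = accuracy(paths, labels)
inverted = accuracy(
    paths, labels,
    manipulate=lambda im: im.transpose(Image.Transpose.FLIP_TOP_BOTTOM),
)

# Placeholder human data: people typically show a marked inversion cost.
human_upright, human_inverted = 0.95, 0.70
print(f"model: {upright:.2f} upright -> {inverted:.2f} inverted")
print(f"human: {human_upright:.2f} upright -> {human_inverted:.2f} inverted")
# A model whose accuracy barely moves under a manipulation that strongly
# affects people has plausibly found a different solution to the task.
```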

Of course, it is always possible that a domain-specific model developed in CogNeuro only succeeds with the artificial stimuli associated with a specific phenomenon, while an ANN does better with naturalistic stimuli (e.g., gets a better Brain-Score), in which case both models are deficient as models of the brain. But I think there is a misunderstanding of what CogNeuro researchers are trying to do.

For instance, you write:

“Instead, we argue that these simpler components need to be grounded in models that can really perform the same range of naturalistic tasks as the natural intelligences do, in as similar a range of naturalistic settings as possible.”

But most researchers in CogNeuro are *not* trying to build a complete model of the visual system, etc. They are trying to characterise specific properties of the mind/brain, and when modelling these properties, they are trying to gain insight into these specific functions. ANNs that do well on Brain-Score need to behave like humans in these experiments if they are taken to be models of humans.

At the bottom of this post is a list of findings about human vision that any model of human vision needs to contend with (the passage is taken from our response to the commentaries on our BBS article). You might want to argue that the brain adopted these specific solutions because it has been trained on naturalistic stimuli in a wide variety of contexts, and thus it is necessary to train your model this way. But the way to know whether you have succeeded in building an ANN that sees like humans is to assess whether the model shows these phenomena.

Finally, the problem with many ANN models of humans is that they are trained on orders of magnitude more data than humans receive. This suggests that researchers need to build much more innate structure into their models. I don’t think pretraining using back-prop is a reasonable proxy for evolution. For example, here is a link to a paper of mine arguing that the massive amount of training LLMs require shows they are missing innate inductive biases that humans enjoy: https://doi.org/10.1037/rev0000595

From the BBS response:

“To give only the most cursory of overviews, the following findings should play a central role in theory and model building. The input to our visual system is degraded due to a large blind spot and an inverted retina with light having to pass through multiple layers of retinal neurons, axons and blood vessels before reaching the photoreceptors. Nevertheless, we are unaware of the degraded signals due to a process of actively filling in missing signals in early visual cortex (e.g., Grossberg, 2003; Ramachandran & Gregory, 1991). We have a fovea that supports high-acuity colour vision for only about 2 degrees of visual angle (about the size of a thumbnail at arm’s length). Nevertheless, we have the subjective sense of a rich visual experience across a much wider visual field because we move our eyes approximately 3 times per second (Rayner, 1978), with the encoding of visual inputs suppressed during each saccade (Matin, 1974), and the visual system somehow integrating inputs across fixations (Irwin, 1991). At the same time, we can identify multiple objects in scenes following a single fixation (Biederman, 1972), with object identification taking approximately 150 ms (Thorpe et al., 1996) – too quick to rely on recurrence. We are also blind to major changes in a scene as revealed by change blindness (Simons & Levin, 1997) and have a visual short-term memory of approximately four items (Cowan, 2001). Our visual system organizes image contours by various Gestalt rules to separate figure from ground (Wagemans et al., 2012) and organize contours to build representations of object parts (Biederman, 1987). Objects are encoded in terms of their surfaces, parts, and relations between parts to build 3D representations relying on monocular and binocular inputs (Biederman, 1987; Marr, 1982; Nakayama & Shimojo, 1992). Colour, form, and motion processing are factorized to the extent that it is possible to be cortically colour blind (Cavanagh et al., 1998), or suffer motion blindness where objects disappear during motion but are visible and recognizable while static (Zeki, 1991), or show severe impairments with object identification while maintaining the ability to reach and manipulate objects (Goodale & Milner, 1992). Participants can even classify objects while denying seeing them (Koculak & Wierzchon). Our visual system manifests a wide range of visual, size, and shape constancies to estimate the distal properties of the world independent of the lighting and object pose, and we suffer from size, colour and motion illusions that reflect the very mechanisms that serve the building of these distal representations from the proximal image projected onto our retinas. These representations of distal stimuli in the world support a range of visual tasks, including object classification, navigation, grasping, and visual reasoning. All this is done with spiking networks composed of neurons with a vast range of morphologies that vary in ways relevant to their function, with architectures constrained by evolution and biophysics.”

NeuroAI is useful to the extent that it provides explanations for these and countless other experimental findings reported in psychology and neuroscience. Otherwise, ANNs should be considered amazing engineering artefacts that work differently than humans.
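To make that criterion concrete, one could treat each finding in the list above as a behavioral test a candidate model must pass. Here is a minimal sketch of such a harness; the two phenomena are drawn from the quoted passage, but both test functions are hypothetical stubs standing in for real psychophysics experiments:

```python
# Minimal sketch of a "phenomena checklist" harness: each named finding from
# the literature is paired with a behavioral test of a candidate model.
# Both tests below are hypothetical stubs; real implementations would run
# the corresponding experiments (e.g., the manipulation probe sketched above).
from typing import Callable, Dict

def shows_filling_in(model) -> bool:
    # Placeholder: probe responses to stimuli occluding the blind spot.
    return False

def shows_change_blindness(model) -> bool:
    # Placeholder: test detection of large scene changes across a blank.
    return False

CHECKLIST: Dict[str, Callable[[object], bool]] = {
    "filling-in of the blind spot": shows_filling_in,
    "change blindness": shows_change_blindness,
}

def evaluate(model) -> None:
    """Report which human visual phenomena the model reproduces."""
    for name, test in CHECKLIST.items():
        verdict = "reproduced" if test(model) else "NOT reproduced"
        print(f"{name}: {verdict}")

evaluate(model=None)  # placeholder model object
```

The point of the pattern is that it makes “sees like humans” an explicit, falsifiable test suite rather than a single aggregate benchmark score.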
