Why (and how) embodied cognition matters to AI: a canned history
Caution: this is an irresponsible, incomplete, and self-serving sort of history, the upshot of which is, essentially, that the work I personally am doing is exactly the work that will make AI possible. This is useful for explaining why I do what I do (and has the advantage of brevity and clarity), but it is perhaps less useful for understanding AI. For a somewhat more responsible version of a similar argument, please see my “Embodied cognition: A field guide”.
Early AI researchers supposed that if we could only describe human cognition entirely in terms of logic (that is, in terms of abstract symbols transformed by abstract rules), then implementing it in a computer would be easy, since, after all, computers are just logic machines. However, it turned out not only that it was rather difficult to describe people this way, but also that we operate according to a collection of more specialized processing mechanisms, and that logic is not the natural formalism for describing or implementing these mechanisms (even supposing that it were possible in principle to do so, which is questionable). There were some early successes in logic-based AI; I am thinking of such systems as the General Problem Solver (GPS), the Automated Mathematician (AM), and EURISKO. But the artificial systems that resulted from this method were too abstract. They could “think” all right, in the sense of solving equations and making logical deductions, but they couldn’t do anything autonomously in the world, because their symbols were essentially meaningless to them, ungrounded in any experience. This is all cognition and no embodiment.
The reaction against this failure was to build systems from the ground up: systems that started and stayed grounded in their environments and could do actual things with real input. The primary forms this reaction took were Braitenberg Vehicles, simple robots controlled by neural networks, and subsumption-architecture-based systems such as Genghis and Attila, all of which could do things like move about and react to simple stimuli in real or simulated environments. Well, these systems were grounded, all right, but (you guessed it) too grounded, unable to do anything beyond simple tasks in particular environments. One of the attractive things about logic is that it is, like language, compositional: from a small set of elements, you can build up many different and highly complex structures. Neural nets are not compositional, and so they are inherently limited. All embodiment and no cognition.
So, what’s the right way to approach AI?
My approach to threading this particular needle, which I call action-grounded cognition and which, I argue, is the true embodied cognition approach, is to adopt the structure of logic (that is, rule-based transforms of representations) but to replace its abstract elements with biologically grounded counterparts. Thus, representations are not abstract but are defined in terms of the situated perceptual-motor abilities of the agent, and the rules are not like modus ponens but are specialized motor-based transforms (operations on affordances).

There is biological evidence that brain areas primarily associated with motor control are also used to support higher-order cognitive functions like working memory and language processing. (See, e.g., “How to study the mind: An introduction to embodied cognition” and “The massive redeployment hypothesis and the functional topography of the brain”.)

Since affordances (the perceived availability of objects for certain kinds of interaction) aren’t just motor programs but interpretations of the environment, this opens the possibility that the motor control system is also, already, a primitive meaning processor. This would offer one explanation of how motor control could be leveraged at all to support and constrain higher-order processes like language understanding.

The pie-in-the-sky claim is that a motor-control system using action-grounded representations and biologically-based transform rules could both drive the behavior of an organism in its environment, and serve as the basis of higher-order cognition. This would solve the symbol-grounding problem, among other things.

I’ve already developed a theory of representation based on this approach (see “Content and action: the guidance theory of representation”, “A brief introduction to the guidance theory of representation”, and “Representation, evolution and embodiment”), and am starting a project to understand motor control, as it supports higher-order cognition, along the same lines, with the eventual aim of describing affordance-transform rules of the sort outlined above.

Related approaches include studies of conceptual blending and conceptual metaphor, on the assumption that the ultimate source domains supporting conceptual blends or mappings are basic features of human experience like locomotion. (Consider the mapping “life is a journey.” The thesis is that this mapping guides us to think about life in the same way that we think about moving around in the environment.) These approaches are compelling as far as they go. However, neither can explain how base domains like locomotion are available for conceptual mapping in the first place (short of assuming that the source domain is already the conceptually interpreted experience of things like locomotion, which rather begs the question), and so neither can offer as deeply and fundamentally grounded an account of higher-order cognition as the approach I am advocating.