Hetero-associative Procedural Memory Specification



The hetero-associative procedural memory, together with auto-associative episodic memory and the affective state module embodying the system’s motives, is the principal mechanism by which the iCub accomplishes cognitive behaviour. As such, before specifying the functionality and behaviour of the procedural memory, we will very briefly review the underlying principles which form the foundation of action, perception, and cognition in the iCub. This review is abstracted from [1].

Background: The Nature of Cognition in the iCub

The iCub project has adopted the enactive approach to cognition, the five central tenets of which are the following.

  • Embodiment,
  • Experience,
  • Emergence,
  • Autonomy, and
  • Sense-making.

Enaction asserts that cognition is a process whereby the issues that are important for the continued existence of a cognitive entity are brought out or enacted: co-determined by the entity as it interacts with the environment in which it is embedded.

The term co-determination is laden with meaning. It implies that the cognitive agent is embodied and embedded in the environment and is specified by it. At the same time, it implies that the process of cognition determines what is real or meaningful for the agent. This effectively means that the system’s actions define the space of perception. This space of perceptual possibilities is predicated not on an objective environment, but on the space of possible actions that the system can engage in whilst still maintaining the consistency of the coupling with the environment and the system's autonomy.

So, co-determination means that the agent constructs its reality (its world) as a result of its operation in that world. In this context, cognitive behaviour is inherently specific to the embodiment of the system and dependent on the system’s history of interactions, i.e., its experiences. Thus, nothing is ‘pre-given’. Instead there is an enactive interpretation: a real-time context-based choosing of relevance. This is often referred to as 'sense-making'. For enactive systems, the purpose of cognition is to uncover unspecified regularity and order that can then be construed as meaningful because they facilitate the continuing operation and development of the cognitive system.

For an enactive system, knowledge is the effective use of sensorimotor contingencies grounded in the structural coupling of the system with its environment. Knowledge is particular to the system’s history of interaction. If that knowledge is shared among a society of cognitive agents, it is not because of any intrinsic abstract universality, but because of the consensual history of experiences shared between cognitive agents with similar phylogeny and compatible ontogeny. The knowledge possessed by an enactive system is built on sensorimotor associations, achieved initially by exploration, and affordances.

The enactive system uses the knowledge gained to form new knowledge which is then subjected to empirical validation to see whether or not it is warranted (we, as enactive beings, imagine many things but not everything we imagine is plausible or corresponds well with reality, i.e. our phenomenological experience of our environment). One of the key issues in cognition, in general, and enaction, in particular, is the importance of internal simulation in accelerating the scaffolding of this early developmentally-acquired sensorimotor knowledge to provide a means to:

  • predict future events;
  • explain observed events (constructing a causal chain leading to that event);
  • imagine new events.

Crucially, there is a need to focus on (re-)grounding predicted, explained, or imagined events in experience so that the system — the robot — can do something new and interact with the environment in a new way.

In summary, what you perceive depends on what you have done and on what you can do: your knowledge of the world depends on your history of interaction with the environment in which you are embedded. Furthermore, the purpose of cognition is to make sense of this environment: to facilitate effective prediction of events, construct explanations of observed events, and imagine or anticipate hypothetical events.

With this understanding of the nature of cognition, we are now in a position to specify the role, functionality, and behaviour of the procedural memory in the iCub cognitive architecture.

The Hetero-associative Procedural Memory

Procedural memory is a network of associations between events. We define an event to be a perception or an action. For the moment, a perception event is a visual landmark which has been learned by the iCub and stored in the episodic memory. An action event is a gaze saccade with an optional reaching movement, a hand-pushing movement, a grasping movement, or a locomotion movement. Since the episodic memory effects one-shot learning, it has no capacity for generalization. This generalization will be effected at some future point by the long-term 'semantic' memory and it may be appropriate then to link the procedural memory to the long-term memory. This could be particularly relevant in instances where the procedural memory is used to learn affordances.

A clique in this network of associations represents some perception-action sequence. This clique might be a perception-action pair, a perception-action-perception triple, or a more extended perception-action sequence. Thus, the procedural memory encapsulates a set of learned temporal behaviours (or sensorimotor skills, if you prefer). The procedural memory can be considered to be a form of extended hetero-associative memory (hetero because the recalled information or vector is not necessarily in the same space as the information used to effect the recall).

Such a network of mutual associations can be effected in several ways. It might be represented as a graph in which each node is a perception or an action. If it is a perception, then it is simply the identification number of a landmark image event stored in the episodic memory. If it is an action, then it is simply the relative saccade coordinates together with a tag designating the action performed (possibly none). The arcs represent the likelihood of making a transition from one node to another. Alternatively, the network of mutual associations might be implemented as a network of hetero-associative memories, each memory representing a specific event or sequence of events.
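As an illustration of the graph option only, the following minimal Python sketch stores perception and action nodes and estimates arc likelihoods. The class and field names (Perception, Action, AssociationGraph) are hypothetical, and estimating the likelihood as the relative frequency of observed transitions is our assumption; the specification does not say how the likelihood is computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Perception:
    """A perception node: only the id of a landmark image in episodic memory."""
    landmark_id: int


@dataclass(frozen=True)
class Action:
    """An action node: relative saccade coordinates plus a movement tag."""
    saccade: Tuple[float, float]   # relative gaze shift; axes and units are assumed
    tag: Optional[str] = None      # e.g. "reach", "push", "grasp", "locomote", or None


Event = Union[Perception, Action]


class AssociationGraph:
    """Directed graph whose arc weights estimate transition likelihoods."""

    def __init__(self) -> None:
        self._counts: Dict[Event, Dict[Event, int]] = defaultdict(lambda: defaultdict(int))

    def observe(self, src: Event, dst: Event) -> None:
        """Record one observed transition from src to dst."""
        self._counts[src][dst] += 1

    def likelihood(self, src: Event, dst: Event) -> float:
        """Relative frequency of src -> dst among all transitions leaving src (assumed estimator)."""
        out = self._counts.get(src)
        if not out:
            return 0.0
        return out.get(dst, 0) / sum(out.values())


# Illustrative usage with made-up landmark ids and saccades.
graph = AssociationGraph()
p1, p2 = Perception(1), Perception(2)
reach = Action((0.1, -0.2), "reach")
look = Action((0.0, 0.3))
graph.observe(p1, reach)
graph.observe(p1, reach)
graph.observe(p1, look)
graph.observe(reach, p2)
```

Because both node types are immutable and hashable, the same landmark or action always maps to the same node, which keeps the graph free of duplicates.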

The procedural memory is accessed by presenting two events. Depending on the learned relationship between these events, the procedural memory will behave in a different way, recalling either a perception event, an action event, or a perception-action sequence of arbitrary length which ‘connects’ the presented events.

In the following, we will relate the procedural memory to different aspects of the intended cognitive operation of the iCub.

Prediction, Reconstruction, and Action: Learning Affordances

Every action entails a prediction about how the perceptual world will change as a consequence of that action. Equivalently, every pair of perceptions is intrinsically linked or associated with an action. So, if we think of a perception-action-perception triplet of associations (Pi, A, Pj), we can effect prediction, explanation, and action as associative recall by presenting (Pi, A, ~), (~, A, Pj), or (Pi, ~, Pj), respectively, to the procedural memory.
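The three recall modes can be sketched as a single lookup over stored triples in which the unknown element is left unspecified. This is a minimal illustration, not the module's implementation: the function name recall, the use of None for the ~ placeholder, and the string labels are all hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]   # (Pi, A, Pj); string labels are placeholders


def recall(memory: List[Triple],
           pi: Optional[str] = None,
           a: Optional[str] = None,
           pj: Optional[str] = None) -> List[Triple]:
    """Return the stored triples that agree with every element that is not None."""
    cue = (pi, a, pj)
    return [t for t in memory
            if all(c is None or c == v for c, v in zip(cue, t))]


# Made-up associations for illustration.
memory = [
    ("cup_left", "saccade_right+push", "cup_right"),
    ("cup_left", "saccade_up", "shelf"),
    ("door", "saccade_right+push", "door_open"),
]

prediction = recall(memory, pi="cup_left", a="saccade_right+push")     # (Pi, A, ~)
explanation = recall(memory, a="saccade_right+push", pj="door_open")   # (~, A, Pj)
action = recall(memory, pi="cup_left", pj="shelf")                     # (Pi, ~, Pj)
```

Each call returns a list because a cue may match several stored associations; how the memory would choose among them is not specified here.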

In principle, this triplet-based representation is very similar to the iCub framework for learning object affordances (see [Deliverable 4.1], pp. 16-20). Here, affordances are represented by a triplet (O, A, E), where O is an object, A is an action performed on that object, and E is the effect of that action. (O, A) → E is the predictive aspect of affordance; (O, E) → A recognizes an action and aids planning; (A, E) → O is object recognition and selection. It will be an interesting exercise to see how this affordance work can be integrated with the cognitive architecture, in general, and the procedural memory, in particular.

Scan-path based Object Representation

There is no explicit concept of objecthood in the iCub cognitive architecture. Arguably, however, parts of a visual scene assume objecthood when they present a persistent and stable pattern of salience. This stable pattern of salience can be encapsulated by a repeatable localized eye gaze scan path pattern and represented by a given (Pa, Ai, Pb … Aj, Pc) clique within the network of associations in the procedural memory. Object recognition then becomes a matter of associative clique retrieval based on all or part of the clique.
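Retrieval from part of a clique might be sketched as follows. We assume, purely for illustration, that each stored scan path is an ordered list of event labels and that a cue matches when it occurs as a contiguous run within that list; the specification does not fix the matching rule. The keys "mug" and "ball" are hypothetical handles for stored cliques, not object categories known to the architecture.

```python
from typing import Dict, List, Sequence


def contains_fragment(clique: Sequence[str], fragment: Sequence[str]) -> bool:
    """True if fragment occurs as a contiguous run inside clique."""
    n, m = len(clique), len(fragment)
    return m > 0 and any(list(clique[i:i + m]) == list(fragment)
                         for i in range(n - m + 1))


def retrieve(cliques: Dict[str, List[str]], fragment: Sequence[str]) -> List[str]:
    """Handles of all stored scan paths that contain the observed fragment."""
    return [name for name, clique in cliques.items()
            if contains_fragment(clique, fragment)]


# Made-up scan paths alternating perceptions (P*) and gaze actions (A*).
cliques = {
    "mug": ["Pa", "A1", "Pb", "A2", "Pc"],
    "ball": ["Pd", "A1", "Pe"],
}

from_tail = retrieve(cliques, ["Pb", "A2", "Pc"])   # part of one clique
ambiguous = retrieve(cliques, ["A1"])               # a saccade shared by both
```

A short cue can match several cliques, so a fuller implementation would need some way to rank or disambiguate the candidates.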


For locomotion, the procedural memory produces a series of scale-invariant landmarks that should be followed to take the robot from an initial position to a final goal position. These can be learned as the iCub moves about the environment, storing landmarks in its episodic memory as it goes. Since the procedural memory assumes the same image landmark representation as the episodic memory, it simply stores the episodic memory identification number. In the case of locomotion, the procedural memory action events connote the visibility of one landmark from another, and thus connote whether or not one can move directly from one landmark to another.

The initial position is input as a scale-invariant landmark image from the attention module: this will typically be the target object to which the robot has navigated. The final position is input also as a scale-invariant landmark representation from the episodic memory: this will typically be the first landmark the robot encountered on its exploratory journey in search of the target. The procedural memory will then produce a sequence of events (landmark images and movements) that the robot should follow to achieve its goal path.

There are two options open in generating this procedural sequence. The first is to retrace the landmarks encountered when searching for the target, in the reverse of the order in which they were encountered. The second is to determine a shortest path between the initial and final positions. The first approach requires no cognitive ability: it is simply a pair-wise association between landmarks. The second approach can be argued to offer a simple cognitive capability by prospectively seeking an optimal set of associations between landmarks, minimizing some overall cost of returning to the goal position.
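The two options can be contrasted in a short sketch. Here landmarks are episodic memory identification numbers and an arc between two landmarks means that one is visible from the other. The arc costs, the function names, and the choice of Dijkstra's algorithm for the second option are our assumptions; the specification only asks that some overall cost be minimized.

```python
import heapq
from typing import Dict, List, Tuple


def retrace(visited: List[int]) -> List[int]:
    """Option 1: follow the visited landmarks in the reverse of the order encountered."""
    return list(reversed(visited))


def shortest_path(edges: Dict[int, Dict[int, float]], start: int, goal: int) -> List[int]:
    """Option 2: Dijkstra's algorithm over landmark-to-landmark visibility arcs."""
    best: Dict[int, float] = {start: 0.0}
    prev: Dict[int, int] = {}
    queue: List[Tuple[float, int]] = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue   # stale queue entry
        for nxt, weight in edges.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(queue, (new_cost, nxt))
    if goal not in best:
        return []      # goal is not reachable from start
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Landmark 1 was seen first and landmark 4 is the target; costs are made up.
visited = [1, 2, 3, 4]
edges = {
    1: {2: 1.0},
    2: {1: 1.0, 3: 1.0, 4: 1.5},   # landmark 4 is also directly visible from 2
    3: {2: 1.0, 4: 1.0},
    4: {3: 1.0, 2: 1.5},
}

way_back = retrace(visited)
short_cut = shortest_path(edges, start=4, goal=1)
```

In this example the retraced route passes through every landmark, whereas the shortest path exploits the direct visibility between landmarks 4 and 2.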

Action Representation

So far, we have assumed that the action events that are stored in the procedural memory are relative gaze saccades with tags to denote movements (reaching, grasp & object contact, locomotion, or no movement). Recall that iCub cognition involves an implicit model of motion control, specifically the so-called motor-motor control model whereby the proprioceptive state of one set of motors implicitly defines and controls the state of another set during action. For example, a reaching or a locomotive action is specified by the gaze of the eyes: you reach where you are looking or you move to where you are looking. Consequently, it isn't necessary to store the detailed kinematics or dynamics of either locomotion or reaching actions in the procedural memory. Instead, it is sufficient to store simply a tag denoting the type of action. Gaze actions capture the spatial relationships between percepts and, together with the movement tags, specify the actions. Actions that perturb the environment, i.e. grasp and object contact motions, are typically object-specific and can be considered to be a form of proprioceptive image of the interaction.

Functional Specification

See [proceduralMemory module].

Future Work

  • Add an image output that shows the association strength between all perception and action events (Done. DV 7/1/01)
  • Add a decay attribute so that streamed input events persist for some finite period.
  • At present, all associations are based purely on the interaction history of the iCub. This needs to be augmented with a process whereby associations can be formed internally by the iCub to facilitate the 'imagination' aspect of cognition. One possible approach is to implement some form of Hebbian learning whereby events that loosely co-occur (i.e. that fire closely in time but are not causally connected) might be associated.
  • Allow some form of recursive definition of an event, so that an event itself could be some network of perception-action associations, and not just either an atomic perception or an action as it is at the moment, i.e., generalize the input to the memory to allow something more flexible than the current (Pi, A, Pj) triplet.
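The decay and Hebbian learning items above could be explored with a sketch along the following lines. It is speculative: the class name HebbianAssociator, the parameters window, rate, and decay, and the additive update rule are all assumptions made for illustration and do not describe the existing module.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class HebbianAssociator:
    """Strengthen links between events that occur close together in time."""

    def __init__(self, window: float = 1.0, rate: float = 0.1, decay: float = 0.99) -> None:
        self.window = window   # seconds for which a streamed event persists (assumed)
        self.rate = rate       # increment applied per co-occurrence (assumed)
        self.decay = decay     # multiplicative fade applied on every new event (assumed)
        self.weights: Dict[Tuple[str, str], float] = {}
        self._recent: Deque[Tuple[float, str]] = deque()

    def present(self, event: str, t: float) -> None:
        """Feed one streamed event with a non-decreasing timestamp t."""
        # Drop events that have outlived the persistence window.
        while self._recent and t - self._recent[0][0] > self.window:
            self._recent.popleft()
        # Let all existing associations fade slightly.
        for key in self.weights:
            self.weights[key] *= self.decay
        # Reinforce the link from each still-active event to the new one.
        for _, earlier in self._recent:
            if earlier != event:
                key = (earlier, event)
                self.weights[key] = self.weights.get(key, 0.0) + self.rate
        self._recent.append((t, event))


# Decay is switched off here so that the resulting weights are easy to read.
hebb = HebbianAssociator(window=1.0, rate=0.1, decay=1.0)
hebb.present("P1", 0.0)
hebb.present("A1", 0.5)   # within the window of P1, so P1 -> A1 is reinforced
hebb.present("P2", 3.0)   # P1 and A1 have expired, so no new link is formed
```

Only temporal proximity is used, so associations formed this way are candidates for the empirical validation described in the background section rather than established causal links.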


[1] D. Vernon, G. Metta, and G. Sandini, "Embodiment in Cognitive Systems: on the Mutual Dependence of Cognition & Robotics", Invited Chapter to appear in "Embodied Cognitive Systems", J. Gray and S. Nefti-Meziani (Eds.), Institution of Engineering and Technology (IET), UK, 2009.
