Auto-associative Memory Specification

From Wiki for iCub and Friends


The auto-associative memory (AAM) module - autoAssociativeMemory - is a simple form of episodic memory, i.e., a memory of autobiographical events. It is a form of one-shot learning and does not generalize over multiple instances of an observed event. That functionality will be provided later by some form of semantic memory.

In its current form, the episodic memory is unimodal (visual). In the future, as we develop the iCub cognitive architecture, it will embrace other modalities such as sound and haptic sensing. It will also include some memory of emotion. This fully-fledged episodic memory will probably comprise a collection of unimodal auto-associative memories connected by a hetero-associative network.

Functional Specifications

See [autoAssociativeMemory module] and [episodicMemory module]

Implementation Considerations

AAM vs Content-addressable Memory

Strictly speaking, the functionality that is specified above is more like a content-addressable memory than an auto-associative memory (AAM). AAMs are typically implemented as recurrent artificial neural networks (e.g. a Hopfield network) and perform pattern completion as part of the recall function. That is, an input vector, i.e. an input image, that is incomplete or corrupted by noise or other distortion will recall the closest stored version of that vector pattern.

Our intention (expectation?) is to use colour histogram intersection as the matching technique rather than true neural network association. Consequently, the memory doesn't actually perform pattern completion; it just recalls the closest match in an existing repository of images. When recall fails (i.e. no image exists with sufficiently high match value), the image is then stored.
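This recall-or-store behaviour can be sketched as follows. This is a minimal illustration in Python with NumPy; the class name, method names, and threshold value are hypothetical and do not reflect the actual autoAssociativeMemory interface:

```python
import numpy as np

class EpisodicMemory:
    """Minimal recall-or-store sketch (not the actual module API).

    Images are represented by pre-computed histograms; matching uses
    colour histogram intersection, normalized by the model count.
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum match value for a recall
        self.store = []             # repository of stored histograms

    def _match(self, hist, model):
        # Swain-Ballard intersection: sum of bin-wise minima / model count
        return np.minimum(hist, model).sum() / model.sum()

    def recall_or_store(self, hist):
        """Recall the best-matching stored histogram, or store hist as a
        new memory if no match reaches the threshold.

        Returns (index, match, recalled); recalled is False when hist
        was stored as a new memory.
        """
        best_idx, best_score = -1, 0.0
        for i, model in enumerate(self.store):
            score = self._match(hist, model)
            if score > best_score:
                best_idx, best_score = i, score
        if best_score >= self.threshold:
            return best_idx, best_score, True
        self.store.append(hist)
        return len(self.store) - 1, best_score, False
```

Querying with a histogram already in the store recalls it with a match of 1.0; a sufficiently dissimilar histogram is appended as a new memory instead.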

This then leads to two possible implementation strategies, one with recurrent neural networks, and one with conventional algorithmic image processing.

If you opt for a neural network implementation, you should consider using colour histograms (see below) rather than raw images. If you do want to implement a raw-image associative neural network, use a sensibly-chosen colour sub-space, such as the Hue-Saturation dimensions of the HSI image representation, rather than simple RGB images.

Image Representation

In many circumstances, it is necessary to have an iconic memory of landmark appearance that is scale, rotation, and translation invariant (SRT-invariant) so that landmarks can be recognized from any distance or viewing angle. Depending on the application, a landmark can be considered to be an object or salient appearance-based feature in the scene.

For our purposes with the iCub cognitive architecture, translation invariance — which would facilitate landmark recognition at any position in the image — is not required if the camera gaze is always directed towards the landmark. This is the case here because gaze is controlled independently by a salience-based visual attention system.

There are three components of rotation invariance, one about each axis. Rotation about the principal axis of the camera (i.e. roll) is important as the iCub head can tilt from side to side. Rotation about the other two axes reflects different viewpoints (or object rotation, if the focus of attention is an object). Typically, for landmarks, invariance to these two remaining rotations is less significant here as the orientation of objects or landmarks won't change significantly during a given task. Of course, full rotation invariance would be best.

Scale invariance, however, is critical because the apparent size of the landmark patterns may vary significantly with distance due to the projective nature of the imaging system.

There are many possibilities for SRT-invariant representations, but we intend to use colour histograms as the invariant landmark representation, with matching effected by colour histogram intersection [1, 2]. Colour histograms are scale invariant, translation invariant, and invariant to rotation about the principal axis of the camera (i.e. the gaze direction). They are also relatively robust to slight rotations about the remaining two axes. The colour histogram representation and matching strategy also have the advantage of being robust to occlusion.
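The intersection match of Swain and Ballard [1, 2] is the sum of the bin-wise minima of the image and model histograms, normalized by the total model count. A minimal sketch in Python with NumPy (function and variable names are our own):

```python
import numpy as np

def histogram_intersection(image_hist, model_hist):
    """Swain-Ballard histogram intersection, normalized by the model count.

    Both arguments are histograms of the same shape (any dimensionality);
    returns a match value in [0, 1].
    """
    return np.minimum(image_hist, model_hist).sum() / model_hist.sum()

# Identical histograms match perfectly; disjoint ones overlap only
# where both have mass.
h1 = np.array([4.0, 2.0, 0.0])
h2 = np.array([0.0, 2.0, 4.0])
print(histogram_intersection(h1, h1))  # → 1.0
print(histogram_intersection(h1, h2))  # → 0.333... (min overlap 2 of 6)
```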

There is one remaining problem: varying lighting conditions. To overcome this, an additional colour constancy normalization process needs to be applied before the histogram stage. Swain and Ballard [1, 2] suggest using a three-dimensional opponent colour representation:

rg = R - G

by = 2 * B - R - G

wb = R + G + B

where R, G, B are the red, green, and blue components of a raw image. The rg, by, and wb dimensions then have to be sampled when computing the histograms; typically 16, 16, and 8 histogram bins are used for the three dimensions respectively.
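The conversion and binning above can be sketched as follows (illustrative Python with NumPy; the function name, and the assumption of 8-bit R, G, B values, are ours):

```python
import numpy as np

def opponent_histogram(rgb, bins=(16, 16, 8)):
    """3-D colour histogram over the Swain-Ballard opponent axes.

    rgb: H x W x 3 array of R, G, B values in [0, 255].
    Returns a bins[0] x bins[1] x bins[2] histogram (16 x 16 x 8
    by default), with one count per pixel.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    rg = r - g            # opponent axis, range [-255, 255]
    by = 2.0 * b - r - g  # opponent axis, range [-510, 510]
    wb = r + g + b        # intensity axis, range [0, 765]
    sample = np.stack([rg.ravel(), by.ravel(), wb.ravel()], axis=1)
    ranges = [(-255.0, 255.0), (-510.0, 510.0), (0.0, 765.0)]
    hist, _ = np.histogramdd(sample, bins=bins, range=ranges)
    return hist
```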

It might also be worth considering alternative image sub-space representations (e.g. hue and saturation in HSI image representations). This would reduce the dimensionality of the histograms and still address invariance to luminance changes.

Log-Polar Images

It is intended that the AAM module be used either with conventional Cartesian images or with Log-Polar mapped images. The advantage of Log-Polar images in this context is that they are effectively centre-weighted, due to the non-linear sampling, and low-pass filtered at the periphery. This should make it possible to effect appearance-based image/object recognition without prior segmentation.


See [episodicMemory application]

Future Work

As currently specified, this auto-associative memory is fairly simple and there are a few natural ways in which it could be extended or augmented.

Multi-modal Memory

One obvious requirement, especially in the context of the cognitive architecture's attention sub-system, is the need to include aural information. One way to do this would be to extend the auto-associative memory to be a multi-modal auto-associative memory, with composite audio-visual storage and recall. This has the disadvantage of necessarily associating sound and vision in every data set, even though no significant sound may be present for a given image (and vice versa). An alternative would be to implement an explicit aural auto-associative memory and link the visual and aural memories with a hetero-associative memory.


Generalization

The second way in which this auto-associative memory might be extended would be to implement some form of generalization. At present, the memory simply does one-shot learning, and similar images (or similar data) are not generalized. Such one-shot learning is sometimes referred to as episodic memory, while memory that consolidates multiple experiences of the same event is often referred to as semantic memory. Together, they form (according to some psychologists) a form of explicit declarative memory. This is in contradistinction to implicit procedural memory, which encapsulates temporal sequencing and skill-based learning.

The question, then, is whether this ability to generalize should be encapsulated in, or subsumed into, the auto-associative memory. Neuroscientific evidence suggests not. For example, McClelland et al. have suggested that the hippocampal formation and the neocortex form a complementary system for learning [4]. The hippocampus facilitates rapid auto-associative and hetero-associative learning, which is used to reinstate and consolidate learned memories in the neocortex in a gradual manner. In this way, the hippocampal memory can be viewed not just as a memory store but as a "teacher of the neocortical processing system" [3]. This suggests that the best way to proceed would be to implement a separate long-term semantic/generalized memory which takes as input the output of the current auto-associative memory (or, better still, the combined visual AAM, aural AAM, and visuo-aural hetero-associative memory).

There are more ideas on the development of the episodic memory, in particular, and the cognitive architecture, in general, here.


[1] M. J. Swain and D. H. Ballard, "Indexing via colour histograms", Proc. Third International Conference on Computer Vision (ICCV), pp. 390–393, 1990.

[2] M. J. Swain and D. H. Ballard, "Color indexing", International Journal of Computer Vision, 7(1):11–32, 1991.

[3] D. Vernon, G. Metta, and G. Sandini, "A Survey of Artificial Cognitive Systems: Implications for the Autonomous Development of Mental Capabilities in Computational Agents", IEEE Transactions on Evolutionary Computation, special issue on Autonomous Mental Development, Vol. 11, No. 2, pp. 151-180, 2007.

[4] J. L. McClelland, B. L. McNaughton, and R. C. O'Reilly, "Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory", Psychological Review, vol. 102, no. 3, pp. 419–457, 1995.

Back to iCub Cognitive Architecture