VVV08 Interaction Histories

From Wiki for iCub and Friends
Revision as of 15:59, 30 July 2008 by Assif (talk | contribs) (Interaction Histories)

Interaction Histories

Participants

Assif Mirza - IIT, Genova and previously University of Hertfordshire, UK

IHA Progress Update

Overview

Interaction Histories for Grounded Robot Ontogeny have been the subject of my PhD research over the last few years. Here at the summer school, my goal is to demonstrate their capabilities using the simple interaction game "peekaboo".

Interaction History Architecture

An interaction history is a collection of a robot's past "experiences" that can be used to generate future actions based on new experiences. At the heart of the architecture is a "metric space of experiences", in which any experience can be compared with any other in terms of their information-theoretic relationship. This comparison is a metric, so experiences can be "placed" in a space with distances relative to each other.
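As an illustration of what an information-theoretic metric can look like, the sketch below computes the information distance d(X,Y) = H(X|Y) + H(Y|X) between two discretized sensor streams. This particular measure, the function names, and the representation of an experience as equal-length symbol sequences are assumptions made for the example; the page does not specify which measure the architecture uses.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_distance(x, y):
    """d(X, Y) = H(X|Y) + H(Y|X) = 2*H(X, Y) - H(X) - H(Y).

    x and y are equal-length sequences of discretized sensor readings
    (an assumed representation, chosen for illustration).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    joint = entropy(list(zip(x, y)))
    return 2 * joint - entropy(x) - entropy(y)


# Two streams that determine each other are at distance 0;
# two independent binary streams are 2 bits apart.
mirrored = information_distance([0, 1, 0, 1], [1, 0, 1, 0])
independent = information_distance([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the distance is zero whenever each stream fully determines the other, two experiences that look different sample by sample can still be "close" in this space.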

The Interaction History Architecture selects those experiences from the history that are "closest" to the current experience. It then chooses one of those experiences, together with the action that was executed just after it in the past, and executes that action. The choice is determined by proximity in the metric space as well as by the subjective value of the experience, which reflects the reward received in the past.
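The selection step can be sketched as follows. The `Experience` record, the neighbourhood size `k`, and the weighting formula are all hypothetical; the page only says that the choice depends on proximity and on subjective value.

```python
import random
from dataclasses import dataclass


@dataclass
class Experience:
    distance: float   # distance from the current experience in the metric space
    value: float      # subjective value derived from past reward (assumed >= 0)
    next_action: int  # number of the action executed just after this experience


def choose_action(history, k=3, rng=random):
    """Pick the next action from the k nearest experiences.

    Candidates are weighted so that closer and more valuable experiences
    are more likely to be chosen (an assumed weighting, for illustration).
    """
    if not history:
        raise ValueError("history is empty")
    nearest = sorted(history, key=lambda e: e.distance)[:k]
    weights = [(e.value + 1e-6) / (1.0 + e.distance) for e in nearest]
    chosen = rng.choices(nearest, weights=weights, k=1)[0]
    return chosen.next_action


history = [
    Experience(distance=0.2, value=0.9, next_action=6),
    Experience(distance=1.5, value=0.1, next_action=3),
    Experience(distance=0.1, value=0.5, next_action=7),
]
closest_only = choose_action(history, k=1)
```

A stochastic choice keeps some exploration among near neighbours; a purely greedy variant would always take the single best-weighted candidate.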

The architecture also includes mechanisms for "forgetting" and for the formation of prototypical experiences. The first simply deletes experiences with low subjective value, and the second merges experiences based on their proximity in the space.
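A minimal sketch of both mechanisms, under stated assumptions: experiences are plain dictionaries with a `value` field, the `min_value` and `radius` thresholds are hypothetical, and merging is done greedily by absorbing each experience into the first prototype within `radius`. The actual merging rule is not described on this page.

```python
def forget(history, min_value):
    """Drop experiences whose subjective value is below min_value."""
    return [e for e in history if e["value"] >= min_value]


def merge_close(history, distance, radius):
    """Greedily merge experiences lying within `radius` of a prototype.

    `distance` is any callable implementing the metric on experiences.
    A prototype keeps the identity of its first member, counts how many
    experiences it has absorbed, and holds their mean value.
    """
    prototypes = []
    for exp in history:
        for proto in prototypes:
            if distance(proto, exp) <= radius:
                n = proto["count"]
                proto["value"] = (proto["value"] * n + exp["value"]) / (n + 1)
                proto["count"] = n + 1
                break
        else:
            # No prototype is close enough: start a new one from a copy.
            prototypes.append({**exp, "count": 1})
    return prototypes


# Toy example: positions on a line stand in for the metric space.
position = {"a": 0.0, "b": 5.0, "c": 0.1}
history = [
    {"id": "a", "value": 0.8},
    {"id": "b", "value": 0.1},
    {"id": "c", "value": 0.6},
]
kept = forget(history, min_value=0.5)
merged = merge_close(
    kept, lambda p, q: abs(position[p["id"]] - position[q["id"]]), radius=0.5
)
```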

Reward

Motivation feedback (reward) is provided through two mechanisms: observation of a face, and audio feedback.

Face

A face is detected in the robot's camera image using OpenCV Haar cascades, and a detection provides direct positive reward. Habituation causes this reward to drop off over time.

The reward for face detection, f (Rf in the combined reward signal), is constrained to the range [0,1] and is a function of the number of consecutive timesteps in which a face is seen. The reward first rises linearly, then holds at 1 for a period, and finally decays towards 0. f is calculated incrementally.
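The rise, hold and decay profile can be sketched as an incremental update. The parameter names and values (`rise_steps`, `hold_steps`, `decay`), the geometric form of the decay, and the reset to 0 when the face is lost are assumptions for illustration; the exact update rule is not given here.

```python
class FaceReward:
    """Habituating face reward, updated once per timestep."""

    def __init__(self, rise_steps=5, hold_steps=10, decay=0.9):
        self.rise_steps = rise_steps  # timesteps taken to ramp from 0 to 1
        self.hold_steps = hold_steps  # timesteps the reward stays at 1
        self.decay = decay            # per-timestep factor once habituated
        self.count = 0                # consecutive timesteps with a face
        self.value = 0.0              # current reward f in [0, 1]

    def update(self, face_seen):
        if not face_seen:
            # Assumed behaviour: losing the face resets the habituation.
            self.count = 0
            self.value = 0.0
            return self.value
        self.count += 1
        if self.count <= self.rise_steps:
            self.value = self.count / self.rise_steps
        elif self.count <= self.rise_steps + self.hold_steps:
            self.value = 1.0
        else:
            self.value *= self.decay
        return self.value


reward = FaceReward(rise_steps=2, hold_steps=1, decay=0.5)
trace = [reward.update(True) for _ in range(5)]
after_loss = reward.update(False)
```

With these toy parameters the trace rises over two timesteps, holds for one, and then halves on each further timestep.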

Sound

Sound is captured from a microphone using PortAudio v1.9 and is used both as an additional sensory signal and as a further source of environmental reward. The "energy" of the sound over a timestep, Esound, provides a sensory input to the robot. It is calculated as the sum of the amplitudes of all sound samples within the timestep, normalized to the range [0,1]. To convert Esound to a reward signal Rs, low-level background noise is attenuated by squaring Esound for all values below a threshold Tsound; above the threshold, the reward is set to 1. Squaring attenuates small values more strongly than large ones, which reduces the effect of background noise and emphasizes the reward when the sound is above the threshold.
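The conversion from samples to Rs can be sketched as below. The normalization constant `max_energy` and the treatment of a value exactly at the threshold (rewarded as 1) are assumptions; the text does not say how the energy is normalized.

```python
def sound_energy(samples, max_energy):
    """Sum of absolute sample amplitudes over one timestep, scaled to [0, 1].

    max_energy is a hypothetical normalization constant.
    """
    if max_energy <= 0:
        raise ValueError("max_energy must be positive")
    return min(1.0, sum(abs(s) for s in samples) / max_energy)


def sound_reward(e_sound, t_sound):
    """Rs: square the energy below the threshold, saturate to 1 otherwise."""
    if not 0.0 <= e_sound <= 1.0:
        raise ValueError("e_sound must be in [0, 1]")
    return 1.0 if e_sound >= t_sound else e_sound ** 2


energy = sound_energy([0.5, -0.5], max_energy=2.0)
quiet = sound_reward(0.2, t_sound=0.5)
loud = sound_reward(0.6, t_sound=0.5)
```

For example, an energy of 0.2 under a threshold of 0.5 yields a reward of about 0.04, while 0.6 saturates to 1.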


Resulting Reward Signal

The final reward signal is a combination of the sound and face reward signals, as follows:

 R = min( 1, α( Rf + Rs ) )

where α, in the range [0,1], attenuates the reward signal. With α=0.5, R is the average of the two reward signals; with α=1, either reward signal alone can produce the maximum reward. For these experiments α=0.75, so neither reward signal can produce the maximum R on its own; each requires support from the other.
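A short worked example of the combination, reading the outer operation as a cap at 1, which is what the description of α implies. The function name is hypothetical.

```python
def combined_reward(r_face, r_sound, alpha=0.75):
    """R = alpha * (Rf + Rs), capped at 1."""
    return min(1.0, alpha * (r_face + r_sound))


face_only = combined_reward(1.0, 0.0)              # 0.75: one signal cannot reach the maximum
both = combined_reward(1.0, 1.0)                   # 1.0: the cap applies
face_only_alpha_1 = combined_reward(1.0, 0.0, alpha=1.0)  # 1.0: either signal suffices
average = combined_reward(0.4, 0.8, alpha=0.5)     # mean of the two signals
```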

Actions

Number  Action  Description
Movement Actions
3       HL      Head Left
4       HR      Head Right
6       HID     Hide Head with Hands
8       RAU     Right Arm Up
9       LAU     Left Arm Up
12      RAW     Wave Right Arm
13      LAW     Wave Left Arm
14      TR      "Think" Right: raise right arm to chin and look right
15      TL      "Think" Left: raise left arm to chin
Facial Expressions
1       Smi     Smile
2       Neu     Neutral
16      Frn     Frown
Resetting Actions
0       Rst     All motors to resting position
7       NA      No Action
5       HF      Head to forward position
10      RAD     Right Arm Down
11      LAD     Left Arm Down

Peekaboo

The development of gestural communicative interaction skills is grounded in the early interaction games that infants play. In the study of the ontogeny of social interaction, gestural communication and turn-taking in artificial agents, it is instructive to look at the kinds of interactions that children are capable of in early development and how they learn to interact appropriately with adults and other children. A well-known interaction game is "peekaboo".

Defining a Peekaboo Sequence

A "peekaboo sequence" is defined to be a sequence of actions beginning with the robot hiding its face (action 6 - HID), followed by any number of "no-action" actions (action 7 - NA), and ending with the robot back in the resting position (action 0 - Rst). Furthermore, for the purposes of evaluating the results of this experiment, the actions should be selected from previous experience rather than executed randomly.
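The definition above can be checked mechanically against a log of executed actions. In this sketch the log format, a list of (action number, selected-from-history flag) pairs, and the function name are assumptions; only the action numbers come from the Actions table.

```python
HID, NA, RST = 6, 7, 0  # action numbers from the Actions table


def find_peekaboo_sequences(steps):
    """Return (start, end) index pairs of peekaboo sequences in `steps`.

    `steps` is a list of (action_number, from_history) tuples, where
    from_history is True if the action was selected from previous
    experience rather than executed randomly.
    """
    found = []
    start = None
    for i, (action, from_history) in enumerate(steps):
        if not from_history:
            start = None              # a random action breaks the sequence
        elif action == HID:
            start = i                 # (re)start at the most recent hide
        elif start is not None and action == RST:
            found.append((start, i))  # hide, NA*, rest: sequence complete
            start = None
        elif start is not None and action != NA:
            start = None              # any other action breaks the sequence
    return found


log = [
    (6, True), (7, True), (7, True), (0, True),  # valid sequence
    (3, True),
    (6, True), (0, False),                       # rest was random: not counted
    (6, True), (0, True),                        # valid, with no NA in between
]
sequences = find_peekaboo_sequences(log)
```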

Code

The code is in the iCub repository at:

 $ICUB_DIR/src/interactionHistories/modules/

and the application start-up/stop scripts are in

 $ICUB_DIR/app/iha_manual

The key scripts are:

To start:

 10_run_supporting.sh
 20_connect_supporting.sh
 30_run_and_connect_iha_processes.sh
 40_run_controller.sh

and to stop:

 70_stop_controller.sh
 80_stop_iha_processes.sh
 90_stop_supporting.sh


Processes

Here is a flow diagram showing how the processes in the Interaction History are interconnected.

[Figure: IHAFlow iCub 800.jpg]