Our robot will learn "affordances", sort of.
It will infer an object's identity from its appearance. Then it will learn how that appearance predicts how the object responds to actions directed towards it.
What it will learn is a mapping from object and action to consequence. We propose to represent the approximately 10-d object, action, state mapping by using and modifying the GMM library already in iCub. GMMs have the advantage of quickly learning high-dimensional non-linear mappings. For motor actions we will modify and use the Action Primitives library.
Vision, state estimation and data association will be done with IQR and some other bits (guys?).
As a final demo, it will play "golf" with the object to get it to a target location - hopefully it will do this at better than chance after learning with the object for a bit.
The demo will proceed in three phases:
The calibration phase may not be necessary if we can read stuff from files, but there are two potential calibration scenarios:
- Obtain the height of the table using existing demo code that makes the robot put its hand on the table and measure the force response on impact. Once read, this will be streamed to eye2world, which will then provide the correct mapping to the vision module (or it might just read this from files).
- Obtain an offset mapping by making the robot grasp an object and then correcting its orientation manually (again using force control). This is probably not relevant, since we are learning a mapping anyway and not doing any grasping.
The exploration phase will learn a mapping by trying out different actions in our continuous domain of actions.
The golf phase will exploit that mapping by making inferences about the best actions to take for a given outcome and then just having a go, mate. Shall we choose an object and attempt to tap it into any other object in the visual scene? This could make a nice demonstration of the multiple object capabilities of Zenon and Andre's data association abilities.
Module port: /iFumble/iconTroll/rpc --> Will expect bottles with one of the following formats:
[cali]
* cali (vocab): VOCAB4('c','a','l','i'), will CALIBRATE
[expl]
* expl (vocab): VOCAB4('e','x','p','l'), will EXPLORE
[golf]
* golf (vocab): VOCAB4('g','o','l','f'), will GOLF
[quit]
* quit (vocab): VOCAB4('q','u','i','t'), will QUIT or EATSELF (that's what 't','r','o','l','l','s' do)
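For anyone unfamiliar with vocabs: a vocab is four characters packed into a 32-bit integer so it can be compared cheaply in the rpc handler. A minimal sketch of that packing (plain Python for illustration, not the actual YARP macro; we assume YARP's usual first-character-in-the-least-significant-byte layout):

```python
def vocab4(a, b, c, d):
    # Pack four characters into one 32-bit integer, first character in
    # the least significant byte (mirroring YARP's VOCAB4 macro).
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)

def vocab_to_str(v):
    # Unpack the four bytes back into a string, for debugging.
    return ''.join(chr((v >> (8 * i)) & 0xFF) for i in range(4))

CALI = vocab4('c', 'a', 'l', 'i')
EXPL = vocab4('e', 'x', 'p', 'l')
GOLF = vocab4('g', 'o', 'l', 'f')
QUIT = vocab4('q', 'u', 'i', 't')
```

The module's rpc handler would compare the first element of an incoming bottle against constants like these to decide whether to CALIBRATE, EXPLORE, GOLF or QUIT.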
Vision interface (iCStuff)
We do object recognition using population encoding (right?), then we send that off for data association and localisation.
We send our image (u, v) coordinates to eye2world, which calculates an (x, y, z) in robot space. This gives the object position.
Therefore, the vision module exports the following interface:
Module Port: /iFumble/iCStuff/ObjectOut
* This port is constantly streaming output of the form: OBJECT Label x y z
* It will only stream for those objects that are detected - nothing else will be streamed.
* The x, y and z are in robot pose space, as returned by eye2world.
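A consumer of this stream just has to split each message into the label and the three coordinates. A minimal sketch of that parsing (plain Python on the text form, not the actual YARP Bottle API; parse_object_line is our own illustrative name):

```python
def parse_object_line(line):
    # Parse one streamed message of the form "OBJECT Label x y z".
    # Returns (label, (x, y, z)), or None if the line is not a
    # well-formed OBJECT message.
    parts = line.split()
    if len(parts) != 5 or parts[0] != "OBJECT":
        return None
    return parts[1], tuple(float(v) for v in parts[2:5])
```

Anything that is not a five-token OBJECT message is ignored, which matches the promise above that nothing else will be streamed.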
Poking interface (iFumbly)
We'll be modifying ActionPrimitivesExample or CHRIS equivalent and exposing the interface to fiddle and fumble around!
Module port: /iFumble/iFumbly/rpc --> Will expect bottles with one of the following formats:
[cali]
* cali (vocab): VOCAB4('c','a','l','i')
[fidl] x y z theta rstart ftheta rstop execTime
* fidl (vocab): VOCAB4('f','i','d','l')
* x (double): target good ol' x [m] (provided by eye2world)
* y (double): target good ol' y [m] (provided by eye2world)
* z (double): target good ol' z [m] (provided by the table height, in fact)
* theta (double): the initial hand angle (orientation given by robot) [radians]
* rstart (double): the initial hand distance to the object [m]
* ftheta (double): the final hand angle (orientation given by robot) [radians]
* rstop (double): the final hand distance relative to the initial object position [m]
* execTime (double): the reference execution time [s]
Module motto: "Fee-fi-fo-fum"
There is some hairiness with coordinate frames to be talked about: we don't know the orientation of the object, so our mapping will be orientation independent but probabilistic - if an object behaves differently according to orientation, we should get a multi-modal distribution. However, during the golf phase there is a natural orientation of our frame of reference with respect to the target. On the other hand, during both phases there is a natural orientation with respect to the robot torso, the use of which may capture in the mapping something about how the iCub is able to move its hands. Therefore, the writer of this paragraph has decided to suggest that the frame of reference for the above action interface should be centred on the object, but oriented according to the fixed robot torso base.
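Under that convention, the fidl parameters (theta, rstart) and (ftheta, rstop) are polar coordinates around the object in the torso-oriented frame. A hypothetical sketch of turning them into Cartesian start and stop hand positions (fiddle_endpoints is our own illustrative name; we assume the angles are measured in the table plane about the object at (x, y, z)):

```python
import math

def fiddle_endpoints(x, y, z, theta, rstart, ftheta, rstop):
    # Convert the polar (angle, distance) action parameters of a fidl
    # command into Cartesian hand positions, centred on the object but
    # oriented according to the fixed robot torso base.  z is the table
    # height, so both endpoints stay in the table plane.
    start = (x + rstart * math.cos(theta), y + rstart * math.sin(theta), z)
    stop = (x + rstop * math.cos(ftheta), y + rstop * math.sin(ftheta), z)
    return start, stop
```

For example, theta = 0 with rstart = 0.2 starts the hand 20 cm from the object along the torso frame's x axis, and ftheta = pi with rstop = 0.1 finishes it 10 cm on the opposite side - i.e. the hand has pushed through the object's initial position.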
Learning interface (iLearnStuff)
This is the Learning interface, not to be confused with the [Trajectory learner and replayer], from a small part of which it is derived.
You can send this interface N-dimensional datapoints, from which it will build a probability density function over the N-dimensional space. It will learn in an ongoing fashion, rebuilding its pdf representation as it needs to - but this will be opaque to you, the user, who will merely send datapoints on one port and ask for inferences on another. The only thing you can't do is ask for inferences before enough data has been sent (at least as many datapoints as there are dimensions).
The inference occurs when you give it partial vectors and it infers a simple probability distribution over the remaining dimensions using Gaussian Mixture Regression - and returns the mean of this distribution.
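For intuition, here is a minimal one-input, one-output version of that conditioning step (illustrative only - the real module uses the GMM library and works in N dimensions). Each Gaussian component is conditioned on the given input x, and the conditional means are blended by the components' responsibilities for x:

```python
import math

def gaussian(x, mu, var):
    # 1-D Gaussian density.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmr_mean(components, x):
    # Gaussian Mixture Regression, 1 input and 1 output dimension.
    # components: list of (weight, mean_x, mean_y, var_x, cov_xy).
    # Responsibility of each component for the observed input x:
    resp = [w * gaussian(x, mx, vx) for (w, mx, my, vx, cxy) in components]
    total = sum(resp)
    # Conditional mean of y given x for each component:
    cond = [my + (cxy / vx) * (x - mx) for (w, mx, my, vx, cxy) in components]
    # Blend the conditional means by normalised responsibility.
    return sum(r * m for r, m in zip(resp, cond)) / total
```

With two well-separated components this returns whichever local linear regressor "owns" the query point, which is exactly how the multi-modal, orientation-dependent behaviour mentioned above would show up.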
For more information, google Gaussian+Mixture+Model, Gaussian+Mixture+Model+EM, Gaussian+Mixture+Regression .
Note that the learning will automatically decrease the number of Gaussians from a first guess of twice the number of dimensions until it gets a result without singular-covariance Gaussian components. This behaviour should be hidden behind the scenes, since the interface says nothing about Gaussians at all.
Bug: Data is being dropped from /iFumble/iLearnStuff/givemedata if it's coming too fast, and otherwise it seems to be replacing data with duplicates.
Module Port: /iFumble/iLearnStuff/givemedata
* On this port you send the learner bottles of the form: DATAPOINT objectlabel ndim dim_1_value dim_2_value ... dim_ndim_value
* Note that this port can drop data if it comes in too fast, because of the way that BufferedPort overwrites previous input as new input comes in. But we don't expect data to come in fast...
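Building the text form of such a bottle is straightforward; a sketch (datapoint_message is our own illustrative helper, not part of the module):

```python
def datapoint_message(label, values):
    # Build the text form of a DATAPOINT bottle:
    # "DATAPOINT <objectlabel> <ndim> <v1> ... <vndim>"
    # where ndim is the number of values that follow.
    parts = ["DATAPOINT", label, str(len(values))]
    parts += [str(float(v)) for v in values]
    return " ".join(parts)
```

So one exploration trial for a "ball" object would be sent as a single such message carrying the action and outcome dimensions together.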
Module Port: /iFumble/iLearnStuff/rpc
* To this port you can request an inference to be done by sending a bottle of the form:
** INFER objectlabel ndim comp_dim_1 comp_dim_2 ... comp_dim_ndim input_val_1 input_val_2 ... input_val_ndim
** So you have to give it the list of dimensions being provided with the input vector, as well as those dimensions' values.
** INFER will then give one of the following responses:
** NOTREADY (if the learner hasn't learnt anything yet)
** UNKNOWNCOMMAND (if someone is being silly), or
** REGRESSED ndim output_val_1 output_val_2 ... output_val_ndim
* You can also ask for data to be saved for each object: SAVEDATA objectlabel filename
** Response is always OK, because no checking is done for success.
* You can also ask for the learned map to be saved for each object: SAVEMODEL objectlabel filename
** Response is always OK, because no checking is done for success.
* Sending the command RELEARN objectlabel forces the module to relearn the mapping - might be useful if the existing mapping is not working.
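A client-side sketch of building an INFER request and decoding the possible replies (illustrative helpers, not part of the module; we read ndim in the request as the number of provided dimensions, as the format above implies):

```python
def infer_request(label, comp_dims, input_vals):
    # Build "INFER <objectlabel> <ndim> <dims...> <vals...>": comp_dims
    # lists the dimensions being provided, input_vals their values.
    assert len(comp_dims) == len(input_vals)
    parts = ["INFER", label, str(len(comp_dims))]
    parts += [str(d) for d in comp_dims]
    parts += [str(float(v)) for v in input_vals]
    return " ".join(parts)

def parse_infer_reply(reply):
    # REGRESSED carries its own ndim (number of output values) followed
    # by the regressed values; NOTREADY / UNKNOWNCOMMAND carry nothing.
    parts = reply.split()
    if parts[0] == "REGRESSED":
        return [float(v) for v in parts[2:]]
    return parts[0]
```

During the golf phase the caller would fix the outcome dimensions to the desired object displacement and ask the learner to regress the action dimensions.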
Code can be found in iCub/contrib/src/iFumble/iLearnStuff/SimpleGMM
- Check for duplicate data coming over /givemedata/ port and throw it away.
- Handle incoming data better - don't drop data if possible
Behaviours not yet implemented (would be via RPC):
- load-dataset name
- load-distribution name