Difference between revisions of "VVV10 iFumble"
Revision as of 21:14, 26 July 2010
Our robot will learn "affordances", sort of.
It will infer an object from its appearance. Then it will learn how that appearance predicts how the object responds to actions directed towards that object.
What it will learn is a mapping from object and action to consequence. We propose to represent this roughly 10-dimensional (object, action, state) mapping by using and modifying the Gaussian mixture model (GMM) library already in iCub. GMMs have the advantage of quickly learning high-dimensional non-linear mappings. For motor actions we will modify the Action Primitives library.
Vision, state estimation and data association will be done with IQR and some other bits (guys?).
As a final demo, it will play "golf" with the object to get it to a target location - hopefully it will do this at above random capability after learning with the object for a bit.
The demo will proceed in three phases: calibration, exploration and golf.
The calibration phase may not be necessary if we can read stuff from files but there are two potential calibration scenarios:
- Obtain the height of the table using existing demo code that makes the robot put its hand on the table and measure the force response on impact. Once read, this will be streamed to eyeToWorld, which will then provide the correct mapping to the vision module (or it might just read this from files).
- Obtain the offset mapping by making the robot grasp an object and then correcting its orientation manually (again using force control). This is probably not relevant since we are learning a mapping anyway and not doing any grasping.
The exploration phase will learn a mapping by trying out different actions in our continuous domain of actions.
The golf phase will exploit that mapping by making inferences about the best actions to take for a given outcome and then just having a go, mate. Shall we choose an object and attempt to tap it into any other objects that exist in the visual scene? This could make a nice demonstration of the multiple-object capabilities of Zenon and Andre's data association abilities.
TBD: Think about smart exploration.
We do object recognition using population encoding (right?), then we send that off to get the data association and the localisation done.
We send our image u,v coordinates to eyeToWorld which calculates an x,y,z in the robot space. This gives the object position.
Therefore, the vision module exports the following interface:
- This port is constantly streaming output of the form "XXXXX" Label x y z
- It will only stream for those objects that are detected - nothing else will be streamed.
- The x y and z are in robot pose space, as returned by eyeToWorld.
Poking interface (iFumble)
We'll be modifying ActionPrimitivesExample or CHRIS equivalent and exposing the interface to fiddle and fumble around!
Module name: iFumbly
Module port: /iFumbly/rpc
Expect bottles of the format :
[fidl] x y z theta rstart ftheta rstop execTime
- fidl (vocab): VOCAB4('f','i','d','l')
- x (double): target good ol' x [m] (provided by eye2world).
- y (double): target good ol' y [m] (provided by eye2world).
- z (double): target good ol' z [m] (provided by the table height in fact).
- theta (double): the hand angle used initially (orientation given by robot) [radians].
- rstart (double): the initial hand distance to object [m].
- ftheta (double): the final hand angle (orientation given by robot) [radians].
- rstop (double): the final hand distance relative to the initial object position [m].
- execTime (double): the reference execution time [s].
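As a rough illustration of the bottle format above, here is a minimal Python sketch that packs the vocab and builds the text form of a command. The helper names `vocab4` and `fidl_command` are hypothetical, the byte order is assumed to match YARP's VOCAB4 macro (first character in the lowest byte), and the string would still have to be sent to /iFumbly/rpc by whatever YARP client you use.

```python
def vocab4(a: str, b: str, c: str, d: str) -> int:
    """Pack four ASCII characters into one integer, first character in the low byte.

    Assumption: this mirrors the byte order of YARP's VOCAB4 macro.
    """
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


def fidl_command(x, y, z, theta, rstart, ftheta, rstop, exec_time) -> str:
    """Return the text form "[fidl] x y z theta rstart ftheta rstop execTime".

    Positions and distances are in metres, angles in radians, exec_time in seconds.
    """
    values = (x, y, z, theta, rstart, ftheta, rstop, exec_time)
    return "[fidl] " + " ".join(repr(float(v)) for v in values)


if __name__ == "__main__":
    print(hex(vocab4("f", "i", "d", "l")))
    print(fidl_command(-0.3, 0.1, -0.05, 0.0, 0.1, 0.0, 0.05, 2.0))
```

The numeric values in the usage lines are made up for illustration only.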
Module motto: "Fee-fi-fo-fum"
There is some hairiness with coordinate frames to be talked about. We don't know the orientation of the object, so our mapping will be orientation-independent but probabilistic: if an object behaves differently according to its orientation, we should get a multi-modal distribution. However, during the golf phase there is a natural orientation of our frame of reference with respect to the target. On the other hand, during both phases there is a natural orientation with respect to the robot torso, the use of which may capture in the mapping something about how the iCub is able to move its hands. Therefore, the writer of this paragraph suggests that the frame of reference for the above action interface should be centred on the object, but oriented according to the fixed robot torso base.
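To make the suggested frame concrete, the sketch below turns the polar action parameters into hand positions in robot coordinates. It rests on assumptions that the page does not confirm: angles are measured in the horizontal plane from the torso x axis, (theta, rstart) and (ftheta, rstop) are polar coordinates around the initial object position, and the hand stays at the object's height. The function name `hand_waypoints` is hypothetical.

```python
import math


def hand_waypoints(obj_xyz, theta, rstart, ftheta, rstop):
    """Return (start, stop) hand positions in robot coordinates.

    The frame is centred on the object and keeps the torso's axis directions,
    so converting to robot coordinates is a pure translation by obj_xyz.
    Assumption: angles are in the horizontal (x, y) plane, in radians.
    """
    ox, oy, oz = obj_xyz
    start = (ox + rstart * math.cos(theta), oy + rstart * math.sin(theta), oz)
    stop = (ox + rstop * math.cos(ftheta), oy + rstop * math.sin(ftheta), oz)
    return start, stop


if __name__ == "__main__":
    start, stop = hand_waypoints((-0.3, 0.1, -0.05), 0.0, 0.1, math.pi, 0.05)
    print(start, stop)
```

Because no rotation by the object's own orientation appears anywhere, the learnt mapping stays orientation-independent in the sense described above.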
Learning interface (iLearnStuff)
This is the Learning interface, not to be confused with the [Trajectory learner and replayer], from a small part of which it is derived.
You can send this interface N-dimensional datapoints, from which it will build a probability density function (pdf) over the N-dimensional space. It will learn in an ongoing fashion, rebuilding its pdf representation as it needs to, but this will be opaque to you, the user, who will merely send datapoints on one port and ask for inferences on another port. The only thing you can't do is ask for inferences before enough data has been sent (at least as many datapoints as there are dimensions).
The inference occurs when you give it partial vectors and it infers a simple probability distribution over the remaining dimensions using Gaussian Mixture Regression - and returns the mean of this distribution.
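For intuition, here is a minimal pure-Python sketch of Gaussian Mixture Regression in the simplest case: a two-dimensional mixture over (x, y), where x is given and the mean of y is inferred. It is not the SimpleGMM code; the function names and the component layout are assumptions made for this example, and the real learner works in more dimensions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_mean(x, components):
    """Conditional mean of y given x under a 2-D Gaussian mixture.

    components: list of (weight, (mean_x, mean_y), (var_x, cov_xy, var_y)).
    Each component contributes its own linear prediction of y, weighted by
    how well that component explains the observed x.
    """
    responsibilities = []
    predictions = []
    for weight, (mx, my), (vxx, vxy, _vyy) in components:
        responsibilities.append(weight * gaussian_pdf(x, mx, vxx))
        predictions.append(my + vxy / vxx * (x - mx))
    total = sum(responsibilities)
    if total == 0.0:
        raise ValueError("x is too far from every component to infer anything")
    return sum(r * p for r, p in zip(responsibilities, predictions)) / total


if __name__ == "__main__":
    mixture = [
        (0.5, (-1.0, -1.0), (1.0, 0.0, 1.0)),
        (0.5, (1.0, 1.0), (1.0, 0.0, 1.0)),
    ]
    print(gmr_mean(0.0, mixture))
```

Returning only the mean hides any multi-modality in the conditional distribution, which matters if an object responds in several distinct ways to the same action.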
Note that the learning will automatically decrease the number of Gaussians from a first guess of twice the number of dimensions, until it gets a result without singular-covariance Gaussian components. This behaviour should be hidden behind the scenes since the interface says nothing about Gaussians at all.
Bug: Data is being dropped from /iFumble/iLearnStuff/givemedata if it's coming too fast, and otherwise it seems to be replacing data with duplicates.
- On this port you send the learner bottles of the form DATAPOINT objectlabel ndim dim_1_value dim_2_value ... dim_ndim_value
- Should not block too long... debugging that...
- To this port you can request an inference to be done by sending a bottle of the form:
- INFER objectlabel ndim comp_dim_1 comp_dim_2 ... comp_dim_ndim input_val_1 input_val_2 ... input_val_ndim
- So you have to give it a list of those dimensions being provided with the input vector, as well as those dimensions' values.
- Infer will then give one of the following responses:
- NOTREADY (if the learner hasn't learnt anything yet)
- UNKNOWNCOMMAND (if someone is being silly), or
- REGRESSED ndim output_val_1 output_val_2 ... output_val_ndim
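The message formats above can be sketched as plain string handling. The helper names below are hypothetical, the page does not say whether dimension indices start at 0 or 1 (they are passed through unchanged here), and labels are assumed to contain no spaces. Sending and receiving over the YARP ports is left out.

```python
def datapoint_message(label, values):
    """Build "DATAPOINT objectlabel ndim dim_1_value ... dim_ndim_value"."""
    values = [float(v) for v in values]
    return " ".join(["DATAPOINT", label, str(len(values))] + [repr(v) for v in values])


def infer_message(label, known):
    """Build "INFER objectlabel ndim comp_dim_1 ... input_val_1 ...".

    known: dict mapping each provided dimension index to its value.
    """
    dims = sorted(known)
    parts = ["INFER", label, str(len(dims))]
    parts += [str(d) for d in dims]
    parts += [repr(float(known[d])) for d in dims]
    return " ".join(parts)


def parse_infer_reply(reply):
    """Return (status, values) for NOTREADY, UNKNOWNCOMMAND or REGRESSED replies."""
    tokens = reply.split()
    if not tokens:
        raise ValueError("empty reply")
    status = tokens[0]
    if status in ("NOTREADY", "UNKNOWNCOMMAND"):
        return status, []
    if status == "REGRESSED" and len(tokens) >= 2:
        ndim = int(tokens[1])
        values = [float(t) for t in tokens[2:]]
        if len(values) != ndim:
            raise ValueError("REGRESSED reply does not carry ndim values")
        return status, values
    raise ValueError("unrecognised reply: " + reply)


if __name__ == "__main__":
    print(datapoint_message("ball", [0.1, 0.2, 0.3]))
    print(infer_message("ball", {0: 0.1, 2: 0.3}))
    print(parse_infer_reply("REGRESSED 2 0.5 -1.0"))
```

A caller would typically retry later on NOTREADY, since the learner needs at least as many datapoints as there are dimensions before it can answer.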
Code can be found in iCub/contrib/src/iFumble/iLearnStuff/SimpleGMM
Check for duplicate data coming over /givemedata/ port and throw it away.
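One possible shape for that check, as a sketch only: remember every (label, values) pair already seen and reject exact repeats. The class name `DuplicateFilter` is hypothetical, exact floating-point equality is an assumption (a tolerance may be more appropriate), and the set of seen points grows without bound.

```python
class DuplicateFilter:
    """Reject datapoints that exactly repeat one already accepted."""

    def __init__(self):
        self._seen = set()

    def accept(self, label, values):
        """Return True the first time a (label, values) pair is seen, else False."""
        key = (label, tuple(float(v) for v in values))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    dedup = DuplicateFilter()
    print(dedup.accept("ball", [0.1, 0.2]))  # first sighting
    print(dedup.accept("ball", [0.1, 0.2]))  # exact repeat
```

Note that this would also discard genuine repeated measurements that happen to be identical, so it only makes sense if duplicates really are a transport artefact.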
Behaviours not yet implemented (but will be, via RPC):
- save-dataset name
- load-dataset name
- save-distribution name
- load-distribution name