This note describes how we are drafting the interface structure.
Please note that we consider the attention system as a process that drives the robot's focus of interest. This process needs to take all available sensors and internal states into account. To support that, we need to define a uniform set of ports across sensors, so that sensors or other input sources can easily be added to or removed from the process.
It would be great if every module added a port reporting how important it is to execute the module's current request. Every module should also have a port giving a probability of how accurate its current results are.
The final version of the API structure will be online by the end of the day (Thursday).
You can inherit, extend and overload as much as you want...
Something like this:
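Since the final API is not online yet, here is only a hedged sketch of what the shared port layout could look like. All names (`AttentionModule`, `priority`, `confidence`, `saliency_map`) are illustrative placeholders, not part of any agreed interface; real ports would of course be middleware ports, not plain Python methods:

```python
# Hypothetical sketch: one base class defining the ports every input
# source exposes, so sources can be added/removed uniformly.

class AttentionModule:
    """Base class every attention input source would inherit from."""

    def __init__(self, name):
        self.name = name

    def priority(self):
        """Port: how important the module's current request is (0..1)."""
        raise NotImplementedError

    def confidence(self):
        """Port: probability that the current results are accurate (0..1)."""
        raise NotImplementedError

    def saliency_map(self):
        """Port: the module's current saliency map."""
        raise NotImplementedError


class FaceModule(AttentionModule):
    """Example subclass: a face detector overloading the shared ports."""

    def __init__(self):
        super().__init__("faces")
        self._faces = []  # detected face positions, filled by the detector

    def priority(self):
        # A visible face makes this module's request urgent.
        return 1.0 if self._faces else 0.1

    def confidence(self):
        return 0.9 if self._faces else 0.5

    def saliency_map(self):
        # One saliency peak per detected face.
        return [(x, y, 1.0) for (x, y) in self._faces]
```

Because every source inherits the same port set, the attention process can iterate over modules without knowing what kind of sensor sits behind each one.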
It might be useful to have every attention-related module register with a main attention supervisor: clients could then interrogate this supervisor to discover available functionality (running modules). For some modules, it might be worth delaying the creation of the saliency map until it is requested on demand. But we still need to get information from these modules, so an event-based system might be useful: it would warn the 'supervisor' or the clients that something is happening, giving them a chance to request a saliency map even if the module had somewhat lower priority.
We might draw inspiration from a "what / where / when" analysis. An event would deliver the "when" information, as fast as possible. The module posting the event is already a hint about "what" is happening. The saliency map answers "where", and also "what" if there are several sources of attention (several faces in an image, some smiling or not, several sound sources in the environment). Maybe an event should be posted when a source of attention appears, as well as when it disappears. It is then the responsibility of the client module (or the supervisor acting on its behalf) to track the saliency maps during that interval (or not).
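To make the registration/event idea concrete, here is a minimal toy sketch, assuming nothing about the real middleware: events carry the "when" (a timestamp) and the "what" (the posting module), while the "where" (the saliency map) is computed only when a client asks for it. `Supervisor` and `SoundStub` are invented names for illustration:

```python
import time


class Supervisor:
    """Central registry: modules register here; clients subscribe to events."""

    def __init__(self):
        self.modules = {}      # name -> module (the 'what' sources)
        self.subscribers = []  # client callbacks waiting for events

    def register(self, name, module):
        self.modules[name] = module

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def post_event(self, source, kind):
        # 'when' = timestamp, 'what' = posting module,
        # kind = source of attention appearing or disappearing.
        event = {"source": source, "kind": kind, "when": time.time()}
        for callback in self.subscribers:
            callback(event)

    def request_saliency(self, source):
        # 'where': the map is only built on demand, by the named module.
        return self.modules[source].saliency_map()


class SoundStub:
    """Stand-in sound module: events would flow continuously,
    the map is produced only when requested."""

    def saliency_map(self):
        return [(0.0, 1.0)]  # e.g. one sound direction and its strength
```

A client that receives a "face appeared" or "sound appeared" event can then decide whether tracking the saliency map over the appear/disappear interval is worth the cost.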
For example, a face detection module implemented with OpenCV might run faster if it only tries to detect "the biggest face". That is enough to trigger a "face appeared" event. From then on, a request for a saliency image might switch the module to "detect all faces" mode, either by default or because a module requests that specific saliency map. When the last face has disappeared, the module can switch back to the faster "biggest face only" mode.
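The mode-switching behaviour could look roughly like the toy sketch below. It deliberately leaves out OpenCV itself; `FaceDetector`, `process` and `saliency_map` are hypothetical names, and detections are just lists of positions:

```python
class FaceDetector:
    """Toy mode-switching detector: cheap 'biggest face only' mode by
    default, full 'all faces' mode only while a saliency map is wanted."""

    def __init__(self):
        self.mode = "biggest"   # fast default mode
        self.present = False    # is at least one face currently visible?

    def process(self, faces_in_frame):
        """Run one frame; return appear/disappear events to post."""
        events = []
        if faces_in_frame and not self.present:
            self.present = True
            events.append("face appeared")
        elif not faces_in_frame and self.present:
            self.present = False
            self.mode = "biggest"       # last face gone: back to cheap mode
            events.append("face disappeared")
        return events

    def saliency_map(self, faces_in_frame):
        """On-demand 'where' answer: switch to full detection."""
        self.mode = "all"               # someone asked: detect every face
        return faces_in_frame           # all faces as saliency peaks
```

This keeps the steady-state cost low while still guaranteeing that an event fires the moment the first face appears, which is exactly when a client might want the full map.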
For sound(s), it is easier and more efficient to generate events as a continuous process, and to generate a saliency map only if someone asks for it.
Hope it helps, Frédéric