Functional components for control and behavioural models

Gaze stabilization experiment

In this work, we focused on the reflexes humans use for gaze stabilization. A model of gaze stabilization, based on the coordination of the vestibulo-collic reflex (VCR) and the vestibulo-ocular reflex (VOR), has been designed and implemented on humanoid robots. The model, inspired by neuroscientific theories of the cerebellum, is provided with learning and adaptation capabilities based on internal models.
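
To illustrate what adaptation of a compensatory reflex can look like, the following is a minimal sketch and not the model used in this work. It assumes a single scalar VOR gain that is adapted with a simple error-driven (LMS-style) rule to cancel a sinusoidal head disturbance. The function name, the learning rate and the 1 Hz disturbance are all hypothetical choices made for the example.

```python
import math


def simulate_vor_adaptation(steps=2000, dt=0.01, freq_hz=1.0, learning_rate=0.05):
    """Adapt a scalar VOR gain so that eye velocity cancels head velocity.

    Illustrative assumption: the real model uses cerebellar internal models;
    here a single gain is learned from the residual image motion.
    """
    gain = 0.0  # 0 = no compensation, 1 = perfect compensation
    slip_history = []
    for k in range(steps):
        # Sinusoidal head velocity standing in for the pitch disturbance
        head_velocity = math.cos(2.0 * math.pi * freq_hz * k * dt)
        # The reflex counter-rotates the eye in proportion to head velocity
        eye_velocity = -gain * head_velocity
        # Retinal slip is the image motion left after compensation
        retinal_slip = head_velocity + eye_velocity
        # Error-driven update: increase the gain while slip correlates with head motion
        gain += learning_rate * retinal_slip * head_velocity
        slip_history.append(abs(retinal_slip))
    return gain, slip_history


if __name__ == "__main__":
    final_gain, slip = simulate_vor_adaptation()
    print("final gain:", final_gain)
    print("initial slip:", slip[0], "final slip:", slip[-1])
```

In this toy setting the gain approaches 1 and the retinal slip shrinks towards zero, which is the qualitative behaviour expected from an adaptive stabilization reflex.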

In a first phase, we designed experiments to assess the model’s response to disturbances, validating it both in the Neurorobotics Platform (NRP) and on a real humanoid robot (SABIAN). In this phase, we mounted the SABIAN head on an oscillating platform (shown below) that rotates about the pitch axis to produce a disturbance.


The oscillating platform. (a) The SABIAN head mounted on the platform, shown with its inertial reference frame. (b) The transmission of motion from the DC motor to the oscillating platform.

In a second phase, we carried out experiments to test the gaze stabilization capability of the model during a locomotion task. We gathered human data on torso displacement during walking and running. The data were used to animate a virtual iCub while the gaze stabilization model was active.

Balancing experiment

Using the same principles as in the gaze stabilization experiment, we carried out a balancing experiment with a simulated iCub. In this experiment, the simulated iCub holds up a red tray with a green ball on top. The goal of the experiment is to control the roll and pitch joints of the robot’s wrist in order to keep the ball in the center of the tray. The control model for the wrist joints is provided with learning and adaptation capabilities based on internal models.
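
The control problem can be made concrete with a small sketch. This is not the learning controller used in the experiment: it swaps in a fixed-gain PD controller, restricts the task to one tray axis, and treats the ball as a frictionless point mass. The gains, the tilt limit and the function name are assumptions chosen only for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def balance_ball(x0=0.10, steps=1000, dt=0.01, kp=0.5, kd=0.5, max_tilt=0.3):
    """Drive a ball back to the tray center along one axis with a PD controller.

    x0 is the initial offset from the center in meters; the second tray axis
    would be handled in the same way by the other wrist joint.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        # PD law on ball position and velocity, clipped to a wrist joint limit (radians)
        tilt = max(-max_tilt, min(max_tilt, kp * x + kd * v))
        # A tilted tray accelerates the ball back towards the center
        accel = -G * math.sin(tilt)
        # Semi-implicit Euler integration of the ball state
        v += accel * dt
        x += v * dt
    return x, v


if __name__ == "__main__":
    final_x, final_v = balance_ball()
    print("final offset (m):", final_x, "final velocity (m/s):", final_v)
```

A learned internal model would replace the hand-tuned gains `kp` and `kd` with a mapping acquired from experience, but the input (ball state) and output (wrist tilt) of the controller stay the same.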

Visual segmentation experiment

A cortical model for visual segmentation (Laminart) has been built with the aim of integrating it into the Neurorobotics Platform. The first goal is to see how the model behaves in a realistic visual environment. A second goal is to connect it to a retina model.
The model consists of a biologically plausible network containing hundreds of thousands of neurons and several million connections embedded in about 50 cortical layers. It is built functionally to link objects that are likely to group together with illusory contours, and to segment distinct perceptual groups into separate segmentation layers.
So far, the Laminart model has been successfully integrated into the NRP, and first experiments are being built to check the behaviour of the model and to discover what must be added for it to coherently segment objects from each other in a realistic environment. In addition, the connection of the Laminart model to the retina model is almost complete.
In the future, the model will be connected on the NRP to other models for saliency detection, learning, predictive coding and decision making, to create a closed-loop experiment. It will also take into account experimental data on texture segmentation and contour integration.

Visual perception experiment

In this work, we evaluated the construction of neural models for visual perception. The validation scenario chosen for the models is an end-to-end controller capable of lane following for a self-driving vehicle. We developed a visual encoder from camera images to spikes inspired by the silicon retina (i.e., the Dynamic Vision Sensor, DVS). The vehicle controller embeds a wheel decoder based on a virtual agonist-antagonist muscle model.
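
A DVS-inspired encoder can be sketched as follows. This is an illustrative approximation and not the encoder developed in this work: it assumes that a pixel emits an ON or OFF event whenever its log intensity changes by more than a threshold between two consecutive frames. The function name, the threshold and the small offset `eps` are hypothetical.

```python
import math


def dvs_events(prev_frame, frame, threshold=0.2, eps=1e-3):
    """Return (x, y, polarity) events from two grayscale frames.

    Frames are lists of rows with intensities in [0, 1]. Polarity is +1 for a
    brightness increase (ON event) and -1 for a decrease (OFF event).
    """
    events = []
    for y, (row_prev, row_cur) in enumerate(zip(prev_frame, frame)):
        for x, (p, c) in enumerate(zip(row_prev, row_cur)):
            # A DVS responds to changes in log intensity, not absolute intensity
            delta = math.log(c + eps) - math.log(p + eps)
            if delta >= threshold:
                events.append((x, y, 1))
            elif delta <= -threshold:
                events.append((x, y, -1))
    return events


if __name__ == "__main__":
    previous = [[0.5, 0.5], [0.5, 0.5]]
    current = [[0.5, 1.0], [0.1, 0.5]]
    # One pixel got brighter and one got darker; unchanged pixels stay silent
    print(dvs_events(previous, current))
```

Each event could then be mapped to a spike of an input neuron associated with that pixel and polarity, so that only changing parts of the image drive the network.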



Grasping experiment

During the first 12 months of SGA1, we investigated methods for representing and executing grasping motions with spiking neural networks that can be simulated in the NEST simulator and, therefore, in the Neurorobotics Platform. For grasping in particular, humans can remember motions and modify them during execution based on the shape of objects and the interaction with them. We developed a spiking neural network with a biologically inspired architecture to perform different grasping motions; it first learns with plasticity from human demonstration in simulation and is then used to control a humanoid robotic hand. The network is made of two types of associative networks trained independently: one represents single fingers and learns joint synergies as motion primitives; the other represents the hand and coordinates multiple finger networks to execute a specific grasp. Both receive the joint states as proprioception using population encoding, and the finger networks also receive tactile feedback that inhibits the output neurons and stops the motion if contact with an object is detected.
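
The two input mechanisms mentioned above, population encoding of joint states and tactile inhibition of the output, can be sketched in a rate-based form. This is a simplified illustration and not the spiking implementation in NEST: Gaussian tuning curves, the joint range, the maximum rate and all function names are assumptions made for the example.

```python
import math


def preferred_angles(n_neurons, lo, hi):
    """Evenly spaced preferred joint angles covering [lo, hi]."""
    step = (hi - lo) / (n_neurons - 1)
    return [lo + i * step for i in range(n_neurons)]


def population_encode(angle, n_neurons=10, lo=0.0, hi=math.pi / 2, max_rate=100.0):
    """Encode a joint angle as firing rates of neurons with Gaussian tuning curves."""
    centers = preferred_angles(n_neurons, lo, hi)
    sigma = (hi - lo) / (n_neurons - 1)  # tuning width equal to the neuron spacing
    return [max_rate * math.exp(-0.5 * ((angle - c) / sigma) ** 2) for c in centers]


def population_decode(rates, lo=0.0, hi=math.pi / 2):
    """Decode the angle as the rate-weighted mean of the preferred angles."""
    centers = preferred_angles(len(rates), lo, hi)
    return sum(r * c for r, c in zip(rates, centers)) / sum(rates)


def gate_with_touch(output_rates, contact_detected):
    """Silence the output population when a contact is detected, stopping the motion."""
    if contact_detected:
        return [0.0 for _ in output_rates]
    return list(output_rates)


if __name__ == "__main__":
    rates = population_encode(0.5)
    print("decoded angle:", population_decode(rates))
    print("output after contact:", gate_with_touch(rates, contact_detected=True))
```

In a spiking network the rates would drive Poisson-like spike generators, and the tactile signal would act through inhibitory synapses rather than by setting rates to zero directly.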



Multimodal sensory representation for invariant object recognition

This functional component integrates multisensory information (namely tactile, visual and auditory) to form an object representation. Although we first target the invariant object recognition problem using visual information only, the component is capable of combining other sensory modalities. The model is based on the computational phases of the Hierarchical Temporal Memory, which is inspired by the operating principles of the mammalian neocortex. The model was adapted and modified to extract a multimodal sensory representation of an object. The representation can be interpreted as a cortical representation of perceived inputs. To test the model, we perform object recognition on the COIL-20 and COIL-100 datasets, which consist of 20 and 100 different objects, respectively (see Figure 1). In detail, each object was rotated on a turntable in 5-degree steps, and an image of the object was captured by the camera at each step (see Figure 2). In addition to the image acquisition steps, a number of post-processing procedures, such as background elimination and size normalization, were performed on the images.


Figure 1 Selected images from different categories.


Figure 2 A duck object under various rotational transformations.

To obtain object representations, standard image processing algorithms were applied to binarize and downsize the images in the datasets. The model was then fed with the processed image data to generate a sparse distributed representation of the perceived images. A sample processed image and the cortical representation of the same visual pattern are illustrated in Figure 3 and Figure 4, respectively. Note that the representation of an object with different sensory inputs can be obtained by applying the same procedure to each modality and concatenating the resulting representations.
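
The preprocessing and concatenation steps can be sketched as follows. The sketch covers only downsizing, binarization and concatenation; the Hierarchical Temporal Memory stage that produces the sparse representation is not reproduced here. Block averaging, the threshold of 128 and the function names are assumptions, since the exact image processing algorithms are not specified.

```python
def downsize(image, factor):
    """Shrink a grayscale image (list of rows) by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    result = []
    for by in range(0, height - height % factor, factor):
        row = []
        for bx in range(0, width - width % factor, factor):
            block = [image[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def binarize(image, threshold=128):
    """Map each pixel to 1 if it reaches the threshold, otherwise to 0."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def flatten(image):
    """Turn a 2D binary image into a flat bit vector."""
    return [bit for row in image for bit in row]


def concatenate_modalities(*representations):
    """Join per-modality bit vectors into one multimodal representation."""
    combined = []
    for representation in representations:
        combined.extend(representation)
    return combined


if __name__ == "__main__":
    picture = [[0, 0, 255, 255],
               [0, 0, 255, 255],
               [255, 255, 0, 0],
               [255, 255, 0, 0]]
    visual = flatten(binarize(downsize(picture, 2)))
    tactile = [1, 0, 0]  # placeholder bits for a second modality
    print(concatenate_modalities(visual, tactile))
```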

Figure 3 A processed visual pattern.

Figure 4 Cortical representation of a visual pattern.

After obtaining representations for all images, we perform recognition by splitting the datasets into two groups: memory representations (the training set) and unseen object patterns (the test set). The representation similarity metric is defined as the number of shared active cortical columns (the same active bits in the same locations) between stored and unseen patterns. The recognition accuracies are shown in the table below. They were obtained by varying the training share of the dataset from 10% to 90% in increments of 10%.
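
The similarity metric and the recognition step described above can be sketched as a nearest-neighbour lookup. The overlap count follows the definition in the text; choosing the stored pattern with the highest overlap, and the names `overlap`, `recognize` and `accuracy`, are assumptions about how the metric is applied. The bit patterns in the usage example are made up for illustration and are not taken from the datasets.

```python
def overlap(pattern_a, pattern_b):
    """Count the positions where both binary patterns have an active bit."""
    return sum(1 for a, b in zip(pattern_a, pattern_b) if a == 1 and b == 1)


def recognize(memory, pattern):
    """Return the label of the stored pattern with the largest overlap.

    memory is a list of (label, pattern) pairs; ties go to the first entry.
    """
    best_label, best_score = None, -1
    for label, stored in memory:
        score = overlap(stored, pattern)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(memory, test_set):
    """Fraction of test patterns whose recognized label matches the true label."""
    correct = sum(1 for label, pattern in test_set
                  if recognize(memory, pattern) == label)
    return correct / len(test_set)


if __name__ == "__main__":
    # Toy memory and test patterns, invented for this example
    memory = [("duck", [1, 1, 0, 0, 1, 0]), ("cup", [0, 0, 1, 1, 0, 1])]
    unseen = [("duck", [1, 1, 0, 0, 0, 0]), ("cup", [0, 0, 1, 0, 0, 1])]
    print("accuracy:", accuracy(memory, unseen))
```

Repeating the evaluation with different training shares only changes how the labelled patterns are divided between `memory` and the test set.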

Table Recognition accuracy versus training percent.

The results indicate that the model performs well with a single modality. Our ongoing work focuses on integrating information from multiple senses (e.g. tactile) into a multimodal representation for use in a grasping task.