Michal Vavrečka
A Multimodal Connectionist Architecture for Unsupervised Grounding of Spatial Language
On 2015-04-13 13:00 at KN:E-112
I will talk about our bio-inspired, unsupervised connectionist architecture for
grounding spatial phrases. This two-layer architecture combines
information from visual and phonological inputs. In the first layer, the
visual pathway employs separate ‘what’ and ‘where’ subsystems that
represent the identity and spatial relations of two objects in 2D space,
respectively. Bitmap images are presented to an artificial retina, and
phonologically encoded five-word sentences describing each image
(e.g., ‘blue ball above red cup’) serve as the phonological input.
The visual scene is thus represented by several self-organizing maps (SOMs),
while the phonological description is processed by a Recursive SOM that
learns to represent the spatial phrases topographically.
Primary representations from the first-layer modules are unambiguously
integrated in a multimodal second-layer module, implemented by either a SOM or the
‘neural gas’ algorithm. The system learns to bind the appropriate lexical and visual
features without any prior knowledge. The simulations reveal that separate
processing and representation of the spatial location and the object shape
significantly improve the performance of the model. 
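
The two-layer idea described above can be illustrated with a toy sketch. It is not the authors' implementation: the map sizes, the learning schedule, the three-dimensional object codes, the two-dimensional offset codes, and every function name are assumptions made for illustration. The Recursive SOM is not implemented; a one-hot code of the relation word stands in for its output, and the second layer is a plain SOM rather than neural gas.

```python
import math
import random


def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns one weight vector per unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            br, bc = divmod(best_unit(weights, x), cols)
            for u, w in enumerate(weights):
                ur, uc = divmod(u, cols)
                d2 = (ur - br) ** 2 + (uc - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


def one_hot(index, size):
    """Vector of zeros with a single 1.0 at the given index."""
    v = [0.0] * size
    v[index] = 1.0
    return v


# Hypothetical toy inputs (assumptions, not the encoding used in the talk).
WHAT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # three object identities
WHERE = [[0.0, 1.0], [0.0, -1.0], [-1.0, 0.0], [1.0, 0.0]]  # above, below, left, right

# First layer: separate 'what' and 'where' maps.
what_som = train_som(WHAT, 2, 2)
where_som = train_som(WHERE, 2, 2)


def multimodal_input(obj, rel):
    """Concatenate first-layer winners with a stand-in code for the phrase."""
    return (
        one_hot(best_unit(what_som, WHAT[obj]), 4)
        + one_hot(best_unit(where_som, WHERE[rel]), 4)
        + one_hot(rel, 4)  # stand-in for the Recursive SOM output (assumption)
    )


# Second layer: one map trained on the concatenated first-layer codes.
scenes = [multimodal_input(o, r) for o in range(3) for r in range(4)]
top_som = train_som(scenes, 4, 4)
winners = [best_unit(top_som, s) for s in scenes]
```

Here `winners` holds, for each of the twelve toy scenes, the index of the second-layer unit that responds to it. Passing winner codes rather than full activation maps between layers is a simplification chosen to keep the sketch short.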