Communicating and interpreting spatial information is a key skill that robots need in order to collaborate successfully with humans (The Peer-to-Peer Interaction project). Human-robot interactions often involve sharing and discussing spatial information. A few scenarios are:
giving instructions to a robot, e.g. “please bring me my keys, I left them in the basket on the cupboard”
explaining new tasks, e.g. “for setting a table properly you must place a clean plate on the table for each diner, in front of a chair and close to the border”
sharing information, e.g. “your room is at the end of the corridor, the last door to the right”
Interactions in physical environments involve bidirectional (human-robot-human) exchanges of spatial information and therefore require the robot to be equipped with tools for interpreting and synthesising such information. For humans, natural-language communication is the most intuitive. It includes references to physical elements in the environment, e.g. objects, rooms, corridors, etc. To use this information, a robot requires mechanisms for mapping symbolic concepts to its own internal representations, which it uses for planning and executing actions. Robot actions involve geometrical elements (occupancy grids, poses of detected objects, geometrical models of objects, etc.) directly extracted from available sensors and domain knowledge. The problem of grounding symbolic concepts in geometric perceptual information is known as the symbol anchoring problem (Coradeschi & Saffiotti, 2003).
In our group we address the symbol anchoring problem by (1) exploring mixed symbolic and geometric representations that allow a robot to integrate information coming from different sources (verbal communication, sensors, background knowledge, etc.), and (2) developing geometrical models of spatial terms such as qualitative spatial relations or route instructions. This blog post focuses on the latter.
The purpose of developing models of spatial terms is to enable grounding them in geometrical representations. Such models help determine whether a particular relation applies between two objects, given their relative configuration (position and orientation) and the surrounding environment. The main challenge is handling the ambiguity intrinsic to qualitative spatial terms. Spatial relations often used in references, such as near, left, across, etc., do not imply concrete geometric measurements, e.g. meters or degrees. On the contrary, they are applicable to a very wide range of spatial configurations. Furthermore, the meaning of these terms is contextual (Kelleher et al., 2006), e.g. “my house is near the stadium” implies a very different distance than “the box is near the desk”. Therefore, a good model for a spatial relation should account for its semantics, contain a fuzzy element and incorporate context.
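To make the last point concrete, here is a minimal sketch of what a fuzzy, context-dependent model of “near” could look like. It is an illustration only and is not taken from any of the cited work: the exponential decay and the `context_scale` parameter (a hypothetical length that stands for the size of the scene or of the objects involved) are assumptions chosen for simplicity.

```python
import math


def near_applicability(distance: float, context_scale: float) -> float:
    """Degree in [0, 1] to which 'near' applies between two objects.

    distance: separation between the two objects, in meters.
    context_scale: hypothetical context parameter, in meters, standing for
        the typical size of the scene or objects (an assumption of this
        sketch, not a quantity defined in the text).
    """
    if distance < 0 or context_scale <= 0:
        raise ValueError("distance must be >= 0 and context_scale > 0")
    # Fuzzy element: a smooth decay instead of a hard threshold.
    # Context: the distance is measured in units of the context scale.
    return math.exp(-distance / context_scale)


# The same relation evaluated in two very different contexts.
box_desk = near_applicability(distance=0.5, context_scale=1.0)
house_stadium = near_applicability(distance=500.0, context_scale=1000.0)
```

With these illustrative scales, half a meter on a tabletop and half a kilometer in a city receive the same degree of applicability, while 500 meters judged at tabletop scale would score close to zero.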
In the research community there are two approaches for acquiring such models. The first is based on learning probability distributions for applicability through machine learning techniques applied to annotated spatial scenes (Golland et al., 2010). The scenes contain several objects in a certain configuration, annotated with the applicability value of a certain relation between object pairs.
The distributions depend on simple geometric features such as the distance between objects or their relative orientation. The second approach consists of manually building mathematical models for assessing the applicability of spatial terms. These models are based on psychological studies of the semantics of relations.
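The first approach can be illustrated with a deliberately simple sketch: estimating the probability that a relation applies from annotated object pairs, using the distance as the only feature. This is not the method of Golland et al.; the binning scheme, the function name `learn_applicability` and the annotations are all made up for illustration.

```python
from collections import defaultdict


def learn_applicability(samples, bin_width=0.5):
    """Estimate P(relation applies | distance bin) from annotated pairs.

    samples: iterable of (distance_in_meters, applies) tuples, where
        applies is the annotator's yes/no judgement.
    Returns a dict mapping bin index -> fraction of positive annotations.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [positives, total]
    for distance, applies in samples:
        bin_index = int(distance // bin_width)
        counts[bin_index][0] += int(applies)
        counts[bin_index][1] += 1
    return {b: pos / total for b, (pos, total) in counts.items()}


# Toy annotations, invented for this example only.
annotations = [
    (0.2, True), (0.3, True), (0.4, False),
    (0.7, True), (0.9, False),
    (1.6, False),
]
model = learn_applicability(annotations)
```

Because annotators can disagree on similar configurations (as in the first bin here), the estimated values naturally fall between 0 and 1.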
When assessing the advantages and disadvantages of each approach we must consider how they deal with (1) the different interpretations that different people can make of the same relation in the same geometrical configuration, and (2) the complexity of the factors that affect the interpretation of spatial relations. The first point is straightforward: different people have different concepts of the meaning of spatial terms, and this diversity must be reflected in the models in order to make good predictions. Regarding the second point, the assessment of spatial relations is influenced by diverse geometric and semantic factors. For example, relations which constrain relative orientation (such as in front of) can be interpreted in different frames of reference or points of view, e.g. the speaker's or the listener's (Retz-Schmidt, 2015). Some types of objects also have their own intrinsic frame of reference (e.g. tables or walls). Psychological studies suggest that the intrinsic frame of reference depends on geometric and functional features of objects, and that humans learn this frame as they learn new objects. Furthermore, a third object can block the applicability of a relation between two objects if it is near or between them (Carlson & Hill, 2008).
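The effect of the frame of reference can be sketched in a few lines. The example below scores “in front of” in 2D for an arbitrary front axis and then evaluates the same scene in an intrinsic frame and in a viewer-relative frame. The cosine scoring, the function names, and the convention that in a relative frame the front of an object is the side facing the viewer are assumptions of this sketch rather than results from the cited studies.

```python
import math


def heading_towards(from_xy, to_xy):
    """Direction (radians) of the vector pointing from from_xy to to_xy."""
    return math.atan2(to_xy[1] - from_xy[1], to_xy[0] - from_xy[0])


def in_front_of(reference_xy, front_heading, target_xy):
    """Degree in [0, 1] to which target is 'in front of' reference.

    front_heading: direction (radians) of the 'front' axis of the chosen
        frame of reference, expressed in world coordinates.
    """
    if tuple(reference_xy) == tuple(target_xy):
        raise ValueError("reference and target must be at different positions")
    bearing = heading_towards(reference_xy, target_xy)
    # 1.0 when the target lies exactly on the front axis, 0.0 at 90 degrees
    # or more away from it.
    return max(0.0, math.cos(bearing - front_heading))


table, plate, speaker = (0.0, 0.0), (1.0, 0.0), (0.0, -3.0)

# Intrinsic frame: the table's own front is assumed to point along +x.
intrinsic_score = in_front_of(table, 0.0, plate)

# Relative frame: the front is the side of the table facing the speaker.
relative_score = in_front_of(table, heading_towards(table, speaker), plate)
```

The same plate is clearly in front of the table in the intrinsic frame and essentially not in front of it from the speaker's point of view, which is why a model needs to know which frame is preferred.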
The machine learning approach addresses the first point. Since the scenes are annotated by different subjects, the learnt distributions implicitly take into account the diverse conceptualizations of spatial terms that people can have. However, the dependence on the aforementioned factors, e.g. the choice of the preferred frame of reference, is hard to capture with these models. Considering all the possible options requires using the relative orientation in each frame as a feature, which in turn requires knowledge about the intrinsic frame of each kind of object. Supplying such hand-crafted knowledge is precisely what machine learning based approaches try to avoid. In the second approach, based on purely theoretical models, we find exactly the opposite situation. It is possible to benefit from the experience gained in the field of psychology regarding the effect of these geometrical and semantic factors on the applicability of spatial relations, and to carefully include this knowledge in the models. But there is an inherent risk of not considering the diversity in the interpretation of spatial terms that different people can have.
In conclusion, the ideal would be a mixed approach: models with a more complex, theory-driven structure whose free parameters are fixed through machine learning on annotated samples. The challenge with such a mixed approach is gathering a good corpus that allows learning models that make accurate predictions for the applicability of relations across all the possibilities. These possibilities include the preferred frame of reference depending on the object type, the dependence on geometric features such as the distance between objects, the role of contextual factors such as the size of the scene or of the objects, etc. In the absence of such a corpus, and given the strong and weak points of each approach, we adopt the second one and derive our models of spatial relations from psychological studies; these models are used in the spatial reasoning component of our robot architecture. This decision was motivated by the need to consider the geometric and semantic factors in order to obtain good predictions when assessing spatial relations in situated interactions.
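As a rough illustration of the mixed approach described above (and not of the models we actually adopted), the sketch below takes a hand-built model of “near” with a single free parameter and fixes that parameter from annotated samples by a simple grid search. The model form, the names `near_model` and `fit_scale`, and the ratings are hypothetical and invented for the example.

```python
import math


def near_model(distance, scale):
    """Hand-built model: applicability decays exponentially with distance."""
    return math.exp(-distance / scale)


def fit_scale(samples, candidates):
    """Pick the candidate scale with the lowest squared error.

    samples: list of (distance_in_meters, rating) pairs, where rating is an
        annotator's applicability judgement in [0, 1].
    candidates: iterable of positive scale values to try.
    """
    def loss(scale):
        return sum((near_model(d, scale) - rating) ** 2 for d, rating in samples)

    return min(candidates, key=loss)


# Toy ratings, invented for this example only.
samples = [(0.0, 1.0), (0.5, 0.6), (1.0, 0.35), (2.0, 0.15)]
candidates = [0.25 * k for k in range(1, 17)]  # 0.25, 0.5, ..., 4.0
best_scale = fit_scale(samples, candidates)
```

The structure of the model comes from theory, while the annotated samples only decide the value of the scale, so diversity among annotators ends up reflected in the fitted parameter.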
E. Retamino, S. Nair, A. Vijayalingam, A. Knoll, “Architecture and representation for handling dialogues in human-robot interactions”, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015.