My research focuses on building autonomous intelligent systems that can make decisions under uncertainty in dynamic, human-populated environments. I view this problem as a question of identifying suitable representations and algorithms for computational reasoning. My research often uses robots as demonstrator platforms since they are embodied agents that have to operate in the physical world.

Social Robots

We can expect that more and more robotic agents will join our environments and spaces. As a consequence, these agents will have to understand human behavior and cope with environments that are designed for, structured by, and inhabited by humans. This poses challenges for modeling, reasoning, and perception.

Simulating Pedestrians

Visualization of people tracking.

Predicting human behavior is essential to successful robot-to-human interaction. One interesting aspect of human behavior is how we behave in larger crowds. In these scenarios, humans are often modeled as particles that are affected by external and internal forces. In my research, I investigated how one of these models behaves when only a small number of people are present and physical obstacles such as walls and doorways are introduced instead.
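The particle view described above can be sketched in a few lines. The following is a minimal illustration of a generic force-based pedestrian model, not the specific model or parameters studied in this research: each pedestrian is pulled towards a goal at a desired speed and pushed away from other pedestrians and from the closest point on each wall segment. All function names and constants (desired_speed, tau, a, b) are assumptions chosen for illustration.

```python
import math


def closest_point_on_segment(p, seg):
    """Closest point to p on the wall segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    sx, sy = x2 - x1, y2 - y1
    denom = sx * sx + sy * sy
    if denom == 0:
        return (x1, y1)
    t = ((p[0] - x1) * sx + (p[1] - y1) * sy) / denom
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return (x1 + t * sx, y1 + t * sy)


def social_force(pos, vel, goal, others, walls,
                 desired_speed=1.3, tau=0.5, a=2.0, b=0.3):
    """Acceleration on one pedestrian (unit mass); all constants are illustrative."""
    # Internal (driving) force: relax towards the desired velocity to the goal.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy) or 1.0
    fx = (desired_speed * gx / dist - vel[0]) / tau
    fy = (desired_speed * gy / dist - vel[1]) / tau
    # External forces: exponential repulsion from people and wall points.
    sources = list(others) + [closest_point_on_segment(pos, w) for w in walls]
    for ox, oy in sources:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy) or 1e-9
        mag = a * math.exp(-d / b)
        fx += mag * dx / d
        fy += mag * dy / d
    return fx, fy


def step(pos, vel, force, dt=0.1):
    """One explicit Euler step of the particle dynamics."""
    vel = (vel[0] + force[0] * dt, vel[1] + force[1] * dt)
    pos = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    return pos, vel
```

With few pedestrians, the wall terms dominate the external forces, which is the regime the paragraph above refers to. A doorway can be represented in this sketch as two wall segments with a gap between them.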

Tracking Pedestrians

Visualization of people tracking.

For a robot that has to move and navigate in a human-populated environment and interact with humans, it is important to perceive and follow the motions of walking and standing humans. As social beings, humans use spaces according to an underlying social protocol, which imposes structure on their motions. In my research, I incorporated a model of a social protocol called Social Forces into a tracking algorithm to better predict future states and improve tracking performance.

Socially Normative Navigation

A robot that operates in a human-populated environment should navigate in a predictable and familiar way. In my research, I therefore used learning from demonstration to imitate human navigation behavior in encounter and crowd situations. The method learns a distribution over behaviors and at the same time provides a cost function for motion planning.

Audio-based Activity Recognition

Human activities create characteristic sounds. However, there are often no strict temporal boundaries between activities, and a populated environment contains many other sources of sound. In my research, I used a non-parametric, non-Markovian approach with a committee of experts for recognizing activities.

This research also resulted in a benchmark dataset.

Scene Understanding

Our environments are not only built for humans, they are also structured by us. This means that we arrange objects in our spaces according to our needs and habits. From the perspective of an artificial agent that has to understand our environment, for example to find an object, this provides additional structure. However, perception in cluttered environments such as a kitchen or an office is difficult. In our research, we therefore assumed that only rough geometric information about objects, such as position and general size, is available. The method then infers the class of each object from the spatial relationships between the objects.

Grasp Design for Robots

For a long time, the major challenge for robotics was navigation and thereby necessarily localization and mapping. To perform a task, however, a robot often needs to interact with physical objects in the environment in close contact. This is required for lifting, transporting, and using objects as tools. Consequently, designing grasps, that is, identifying the parameters of the manipulator (hand) relative to the object with respect to given objectives, is now a fundamental challenge in robotics. Here, both the object's shape and the manipulator's degrees of freedom introduce complex spaces for decision making.

Fingertip Grasps

Often, we only hold objects with our fingertips. This type of grasp allows us to manipulate the object and is often associated with precision in interaction. To design a fingertip grasp, we have to analyze the object’s shape and understand the hand’s abilities, since a grasp can only exist where the geometry of object and hand meet. The challenge in fingertip grasp design is the large and complex search space of grasps on an object and its complex relationship to the hand’s kinematics. Additionally, grasp objectives such as mechanical stability are difficult to decompose into decoupled problems, which makes it difficult to operate on partial solutions as optimization methods often do.

In our research, we turned to abstraction and optimization to design fingertip grasps, iteratively improving solutions on more exact representations of the same problem. For this, we introduced the object’s fingertip space hierarchy, which describes all positions where a fingertip can be placed and their similarity at several levels of abstraction. We also reduced optimal fingertip grasping to path-finding, which allows for efficient and heuristic algorithms for optimal grasping.

Catenane Caging

Another way to move or manipulate an object is caging. A caged object is not rigidly fixed but merely constrained in its motion such that it cannot escape. This has the advantage that the interaction between object and manipulator is flexible and that the object’s small-scale geometry is less relevant to caging than to grasp design. However, caging is difficult because of its general and essentially geometric nature. Reasoning about the spatial relationship of two bodies in geometric terms is computationally hard. In my research, I therefore used a topological representation of object and hand that allows direct analysis of the caging property. The representation is based on holes and makes it possible to model a type of caging as interlinking object and hand like the links of a chain.

Predictive State Representations in Robotics

For an intelligent agent to be autonomous, it needs the capacity to adapt to its environment by learning the environment’s properties. This entails understanding the consequences of its actions in the environment. In realistic settings, the environment’s state is latent and cannot be directly observed, transitions are stochastic, and observations are corrupted by other signals and imperfect sensors. Predictive State Representations (PSRs) are a class of models that were designed to learn representations of such environments. They do not depend on a nominal state space and assign no semantics to the learned state vector. However, learning them in realistic scenarios is hard, and guarantees only hold under ideal conditions for discrete systems.
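For reference, the state update of a discrete linear PSR, the ideal setting mentioned above, can be written compactly. This is the standard textbook form and not the continuous, embedded variant used in this research. The symbols are assumptions introduced here: Q is a set of core tests, p(Q | h) is the vector of their predicted probabilities after history h, and m_{ao} and M_{aoQ} are learned parameters for action a and observation o.

```latex
% State: predictions for the core tests Q = {q_1, ..., q_k} given history h
%   p(Q \mid h) = [\, p(q_1 \mid h), \dots, p(q_k \mid h) \,]^\top
% Probability of observing o after taking action a:
\[
  p(o \mid h, a) = m_{ao}^\top \, p(Q \mid h)
\]
% State update after taking a and observing o; column i of M_{aoQ}
% is the parameter vector m_{a o q_i} of the extended test a o q_i:
\[
  p(Q \mid h a o) = \frac{M_{aoQ}^\top \, p(Q \mid h)}{m_{ao}^\top \, p(Q \mid h)}
\]
```

The state is defined purely through predictions of future observations, which is why no nominal state space or state semantics are required.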

In my research, I explored the use of PSRs in continuous robotic systems to control a robot with Reinforcement Learning. I focused on the feature space embedding, which determines the shape of the PSR’s state space. When data is sparse, this embedding has a major effect on the model’s quality and performance. I used embeddings based on sequence kernels and learned embeddings based on robotic priors.

In-Hand Manipulation and External Dexterity

In-hand manipulation is an important ability for humans. Almost all objects we pick up, such as a pencil, we later re-orient and re-position in order to use them. At the same time, we often use external resources when interacting with objects we hold in our hands. One example is pushing an object against a contact in the environment. This ability, called external dexterity, gives a simple manipulator a range of interactions that would otherwise only be accessible to a hand with many fingers and joints.

In my research, I modeled an in-hand manipulation scenario with external dexterity as a PSR and learned a representation of the system based only on tactile feedback from pressure sensors in the hand’s fingertips. This system is challenging to learn and model because many different states result in similar observations (aliasing) and because the tedious data collection procedure rules out large sets of training examples.