Anticipating human motion is a key skill for intelligent systems that share a space with or interact with humans.

Accurate long-term predictions of human movement trajectories, body poses, actions or activities may significantly improve the ability of robots to plan ahead, anticipate the effects of their actions and foresee hazardous situations. The topic has received increasing attention in recent years across several scientific communities, with a growing spectrum of applications in service robots, self-driving cars, collaborative manipulators, and tracking and surveillance.

This workshop is the fourth in a series of events held at ICRA 2019-2022. It aims to bring together researchers and practitioners from different communities to discuss recent developments in this field, promising approaches and their limitations, benchmarking techniques, and open challenges.

Participation

Online participation in this workshop is free of charge. To join the virtual workshop, use this link: https://zoom.us/j/7746263634?pwd=UFVYelo5RGE2cTNWVmdMekx4TzZjUT09.

For in-person participation, please refer to the ICRA conference and workshop registration page.

Program

The program of this workshop includes 9 talks in several sessions, and a tutorial session on prediction methods and benchmarking.
Recordings of all talks are available on the LHMP YouTube channel.

Time Speaker Topic
8:30 - 8:45 am EDT
14:30 - 14:45 CEST
Organizers Welcome and Introduction
Blue-sky session
8:45 - 9:15 am EDT
14:45 - 15:15 CEST
Alberto Sanfeliu, Universitat Politècnica de Catalunya Predicting human motion for human-robot interaction & collaboration Abstract: tbd
9:15 - 9:45 am EDT
15:15 - 15:45 CEST
Julien Pettré, INRIA Predicting crowds: scales and data Abstract: Human trajectory prediction (HTP), which has mainly found applications in robotics, is now also finding points of convergence with the field of predictive crowd simulation (for traffic management in public environments, for instance). This convergence accelerates in particular the ongoing transition of crowd simulators from knowledge-based models to data-driven ones. Through the exploration of this new field of applications, this presentation raises questions both about the formulation of the prediction problem as we know it and about the nature of the data used for modeling. In this presentation, I will give the perspective of the field of crowd simulation on these issues, and present some promising techniques for the acquisition of new data based on virtual reality.
9:45 - 10:15 am EDT
15:45 - 16:15 CEST
Gerard Pons-Moll, University of Tübingen Virtual Humans — From appearance to behaviour Abstract: Modelling 3D virtual humans is necessary for VR/AR and for digitally transferring people to digital spaces — often referred to as the metaverse. While there has been significant progress on modelling human appearance (how we look), modelling and capturing fine-grained human behaviour in 3D has received much less attention. Accurately capturing human interactions with scenes and objects in 3D is challenging due to occlusions, complex poses, ambiguities in reconstructing objects in 3D, and limited recording volumes imposed by multi-view camera setups. In this talk, I will describe our recent works to capture and model how we behave and interact with the 3D world around us. I will present HPS, a method to capture and localize humans interacting in large 3D scenes from wearable sensors, as well as BEHAVE, a recent dataset and method to capture 3D humans interacting with objects.
10:15 - 11:00 am EDT
16:15 - 17:00 CEST
Coffee break
Body and Mind session
11:00 - 11:30 am EDT
17:00 - 17:30 CEST
Dagmar Sternad, Northeastern University Predicting actions and interactions: the human perspective Abstract: Anticipating motion of other agents and objects in the environment is key for successful behavior, both for robots and humans. The ability to predict is a core computational competence necessary for almost all aspects of human behavior, including social, cognitive, perceptual and action contexts. This talk will focus on the human perspective and present several lines of research examining predictive abilities and challenges in humans.

Our recent research investigated this fundamental ability in the context of sensorimotor interactions with a dynamic object: intercepting and catching a flying ball. To examine the developmental trajectory of predictive ability, we assessed participants between 5 and 92 years of age in a suite of custom-developed virtual games that provided millisecond-scale measures of actions in response to the flying ball. Results revealed age-related improvements in predictive motor behavior, with performance reaching adult levels by 12 years of age. This developmental progression provides a behavioral manifestation consistent with recent findings on cerebellar and cortical maturation.

A second line of research scrutinized how humans continuously interact with dynamically complex objects, such as a cup of coffee. The internal dynamics of such objects create complex nonlinear interaction forces that can be chaotic and essentially unpredictable for humans. We investigated how humans deal with such scenarios in a virtual environment where participants transported a 'cup of coffee', modeled by a cart-and-pendulum system. Results showed that humans learned to simplify the interaction forces, making them more predictable.

A third line of research examined prediction within the motor control system in the context of postural control under perturbations. Catching a ball not only involves finely timed arm and hand movements with respect to the ball, but these rapid arm movements also create perturbing forces that destabilize postural balance. Maintaining upright posture requires anticipatory postural adjustments in trunk muscles that ensure the control of hand movements. We show such subtle adjustments in the context of catching a ball in both healthy and impaired populations.

These different lines of research demonstrate the pervasiveness of accurate prediction at all levels of successful behavior, and also highlight the challenges that humans, not only robots, face.
11:30 am - 12:00 pm EDT
17:30 - 18:00 CEST
Silvia Rossi, University of Naples Human and context awareness: Toward socially enhanced autonomous capabilities Abstract: To effectively exploit autonomous capabilities that are socially enhanced, a robot is required not only to sense its environment but also to understand what happens within it. Human awareness concerns not only the acknowledgment of a person's position and pose within the environment but also the ability to understand the activity that the person is currently performing. For example, a robotic personal assistant may be required to recognize the user's Activities of Daily Living (eating, drinking, cooking, watching TV, using a mobile phone, etc.) or emergencies, such as falls. Moreover, the importance of interpreting and recognizing social and non-verbal signals during the interaction is generally well recognized within the social robotics community, but it also plays a fundamental role in service and collaborative robotics. For example, the interpretation of non-verbal cues, such as gaze, posture, and backchannels, can be used to recognize a person's engagement during an interaction; the same cues could be used to evaluate the person's discomfort or disengagement from the current activity caused by the robot's behavior in the shared environment. In this talk, we will discuss different approaches aimed at achieving socially enhanced autonomous robot behaviors from the interpretation of human behavior.
12:00 - 12:30 pm EDT
18:00 - 18:30 CEST
Kelsey Allen, DeepMind The surprising diversity of human tool use Abstract: People use tools every day – from forks and knives to computers and cellphones. Indeed, much of our interaction with the world is modulated by our tools. As a result, understanding how people use tools is critical to safely interacting with humans. In this talk, I will discuss our research into the vast and varied ways that people learn how to use new tools. I will highlight both humans’ rapid adaptivity, but also their surprising rigidity, in the face of new problems. I will also discuss how differences in lived experience, such as growing up with only one hand, can fundamentally alter the ways in which people approach physical problem-solving generally. For robot-human interaction, it will not be enough to model the motion of one ideal human. Instead, a diversity of humans is needed.
12:30 - 1:45 pm EDT
18:30 - 19:45 CEST
Lunch break
1:45 - 2:15 pm EDT
19:45 - 20:15 CEST
Katherine Driggs-Campbell, University of Illinois Inference and prediction for safe interaction Abstract: Autonomous systems and robots are becoming prevalent in our everyday lives and changing the foundations of our way of life. However, the desirable impacts of autonomy are only achievable if the underlying algorithms can handle the unique challenges humans present. To design safe, trustworthy autonomy, we must transform how intelligent systems interact, influence, and predict human agents. In this talk, we'll discuss how inferring hidden states (e.g., driver traits, pedestrian intent, occluded agents) coupled with robust prediction methods can be used to improve decision-making and control in interactive settings. These methods are used to generate safe interactions between humans and mobile robots (sometimes with guarantees), which are demonstrated on fully equipped test vehicles and mobile robots.
2:15 - 2:45 pm EDT
20:15 - 20:45 CEST
Boris Ivanovic, Nvidia Effectively integrating prediction within the autonomous vehicle stack Abstract: tbd
Tutorial & Benchmarking session
2:45 - 3:00 pm EDT
20:45 - 21:00 CEST
Parth Kothari, EPFL TrajNet++ update Link: https://www.aicrowd.com/challenges/trajnet-a-trajectory-forecasting-challenge
3:00 - 3:15 pm EDT
21:00 - 21:15 CEST
Andrey Rudenko, Bosch The Atlas Benchmark presentation Link: tbd
3:15 - 4:00 pm EDT
21:15 - 22:00 CEST
Coffee break
4:00 - 4:30 pm EDT
22:00 - 22:30 CEST
Sanjiban Choudhury, Arun Venkatraman, Aurora Imitation learning and forecasting: It’s only a game! Abstract: A core challenge in self-driving is reasoning about how actors on the road interact to affect each other’s motions. Recent advances in machine learning have enabled powerful forecasting techniques that jointly reason about such interactions. However, many of these approaches treat forecasting in isolation from downstream decision making. This leads to a fundamental misalignment in objectives where better forecasts do not necessarily translate into better end-to-end behavior.

In this talk, we will explore a clean-sheet approach to forecasting and decision making that builds on a singular objective – imitate expert human driving. We will present a unified, game-theoretic framework for imitation learning. We will view forecasting as an adversary that discriminates between human and robot driving. Finally, we will show how this framework leads to forecasts that enable human-like, predictable, and interpretable behavior on the road.
4:30 - 5:00 pm EDT
22:30 - 23:00 CEST
Organizers Discussion and conclusions