Accurate long-term prediction of human movement trajectories, body poses, actions, or activities may significantly improve the ability of robots to plan ahead, anticipate the effects of their actions, and foresee hazardous situations. The topic has received increasing attention across several scientific communities in recent years, with a growing spectrum of applications in service robots, self-driving cars, collaborative manipulators, and tracking and surveillance. This workshop is the fourth in a series of events held at ICRA 2019-2022. Its aim is to bring together researchers and practitioners from different communities to discuss recent developments in this field, promising approaches, their limitations, benchmarking techniques, and open challenges.
Online participation in this workshop is free of charge. To join the virtual workshop, use this link: https://zoom.us/j/7746263634?pwd=UFVYelo5RGE2cTNWVmdMekx4TzZjUT09. For in-person participation, please refer to the ICRA conference and workshop registration page.
The program of this workshop includes nine talks in several sessions and a tutorial session on prediction methods and benchmarking. Recordings of all talks are available on the LHMP YouTube channel.
|8:30 - 8:45 am EDT 14:30 - 14:45 CEST||Organizers||Welcome and Introduction|
|8:45 - 9:15 am EDT 14:45 - 15:15 CEST||Alberto Sanfeliu, Universitat Politècnica de Catalunya||Predicting human motion for human-robot interaction & collaboration||Abstract: tbd|
|9:15 - 9:45 am EDT 15:15 - 15:45 CEST||Julien Pettré, INRIA||Predicting crowds: scales and data||Abstract: Human trajectory prediction (HTP), which has mainly found applications in robotics, is now also finding points of convergence with the field of predictive crowd simulation (for instance, for traffic management in public environments). This convergence accelerates in particular the ongoing transition of crowd simulators from knowledge-based models to data-driven ones. Through the exploration of this new field of applications, this presentation raises the question of the formulation of the prediction problem as we know it, but also of the nature of the data used for modeling. In this presentation, I will give the perspective of the field of crowd simulation on these issues and present some promising techniques, based on virtual reality, for the acquisition of new data.|
|9:45 - 10:15 am EDT 15:45 - 16:15 CEST||Gerard Pons-Moll, University of Tübingen||Virtual Humans — From appearance to behaviour||Abstract: Modelling 3D virtual humans is necessary for VR/AR and to digitally transfer people to digital spaces — often referred to as the metaverse. While there has been significant progress on modelling human appearance (how we look), modelling and capturing fine-grained human behaviour in 3D has received much less attention. Accurately capturing human interactions with scenes and objects in 3D is challenging due to occlusions, complex poses, ambiguities in reconstructing objects in 3D, and limited recording volumes imposed by multi-view camera setups. In this talk, I will describe our recent works to capture and model how we behave and interact with the 3D world around us. I will present HPS, a method to capture and localize humans interacting in large 3D scenes from wearable sensors, as well as BEHAVE, a recent dataset and method to capture 3D humans interacting with objects.|
|10:15 - 11:00 am EDT 16:15 - 17:00 CEST||Coffee break|
|Body and Mind session|
|11:00 - 11:30 am EDT 17:00 - 17:30 CEST||Dagmar Sternad, Northeastern University||Predicting actions and interactions: the human perspective||Abstract: Anticipating the motion of other agents and objects in the environment is key for successful behavior, both for robots and humans. The ability to predict is a core computational competence necessary for almost all aspects of human behavior, including social, cognitive, perceptual and action contexts. This talk will focus on the human perspective and present several lines of research examining predictive abilities and challenges in humans. Our recent research investigated this fundamental ability in the context of sensorimotor interactions with a dynamic object: intercepting and catching a flying ball. To examine the developmental trajectory of predictive ability, we assessed participants between 5 and 92 years of age in a suite of custom-developed virtual games that provided millisecond-scale measures of actions in response to the flying ball. Results revealed age-related improvements in predictive motor behavior, with performance reaching adult levels by 12 years of age. This developmental progression provides a behavioral manifestation consistent with recent findings on cerebellar and cortical maturation. A second line of research scrutinized how humans continuously interact with dynamically complex objects, such as a cup of coffee. The internal dynamics in such objects creates complex nonlinear interaction forces that can be chaotic and essentially unpredictable for humans. We investigated how humans deal with such scenarios in a virtual environment where participants transported a "cup of coffee", modeled by a cart-and-pendulum system. Results showed that humans learned strategies that simplified the interaction forces and made them more predictable. A third line of research examined prediction within the motor control system in the context of postural control under perturbations.
Catching a ball not only involves finely timed arm and hand movements with respect to the ball; these rapid arm movements also create perturbing forces that destabilize postural balance. Maintaining upright posture requires anticipatory postural adjustments in trunk muscles that ensure the control of hand movements. We show such subtle adjustments in the context of catching a ball in both healthy and impaired populations. These different lines of research demonstrate the pervasiveness of accurate prediction at all levels of successful behavior, and also show the challenges that humans, not only robots, face.|
|11:30 am - 12:00 pm EDT 17:30 - 18:00 CEST||Silvia Rossi, University of Naples||Human and context awareness: Toward socially enhanced autonomous capabilities||Abstract: To effectively exploit autonomous capabilities that are socially enhanced, a robot is required not only to sense its environment but also to understand what happens within it. Human awareness concerns not only the acknowledgment of the person’s position and pose within the environment but also the ability to understand the activity that the person is currently performing. For example, a robotic personal assistant may be required to recognize the user’s Activities of Daily Living (eating, drinking, cooking, watching TV, using a mobile phone, etc.) or to detect emergencies, such as falls. Moreover, the importance of interpreting and recognizing social and non-verbal signals during the interaction is generally well recognized within the social robotics community, but it plays a fundamental role in service and collaborative robotics as well. For example, the interpretation of non-verbal cues, such as gaze, posture, and backchannels, can be used to recognize the person’s engagement during an interaction; the same cues could be used to evaluate the person’s discomfort or disengagement from the current activity caused by the robot’s behavior in the shared environment. In this talk, we will discuss different approaches aiming at achieving socially enhanced autonomous robot behaviors from the interpretation of human behavior.|
|12:00 - 12:30 pm EDT 18:00 - 18:30 CEST||Kelsey Allen, DeepMind||The surprising diversity of human tool use||Abstract: People use tools every day – from forks and knives to computers and cellphones. Indeed, much of our interaction with the world is mediated by our tools. As a result, understanding how people use tools is critical to safely interacting with humans. In this talk, I will discuss our research into the vast and varied ways that people learn how to use new tools. I will highlight both humans’ rapid adaptivity and their surprising rigidity in the face of new problems. I will also discuss how differences in lived experience, such as growing up with only one hand, can fundamentally alter the ways in which people approach physical problem-solving in general. For human-robot interaction, it will not be enough to model the motion of one ideal human. Instead, a diversity of humans is needed.|
|12:30 - 1:45 pm EDT 18:30 - 19:45 CEST||Lunch break|
|1:45 - 2:15 pm EDT 19:45 - 20:15 CEST||Katherine Driggs-Campbell, University of Illinois||Inference and prediction for safe interaction||Abstract: Autonomous systems and robots are becoming prevalent in our everyday lives and changing the foundations of our way of life. However, the desirable impacts of autonomy are only achievable if the underlying algorithms can handle the unique challenges humans present. To design safe, trustworthy autonomy, we must transform how intelligent systems interact, influence, and predict human agents. In this talk, we'll discuss how inferring hidden states (e.g., driver traits, pedestrian intent, occluded agents) coupled with robust prediction methods can be used to improve decision-making and control in interactive settings. These methods are used to generate safe interactions between humans and mobile robots (sometimes with guarantees), which are demonstrated on fully equipped test vehicles and mobile robots.|
|2:15 - 2:45 pm EDT 20:15 - 20:45 CEST||Boris Ivanovic, Nvidia||Effectively integrating prediction within the autonomous vehicle stack||Abstract: tbd|
|Tutorial & Benchmarking session|
|2:45 - 3:00 pm EDT 20:45 - 21:00 CEST||Parth Kothari, EPFL||TrajNet++ update||Link: https://www.aicrowd.com/challenges/trajnet-a-trajectory-forecasting-challenge|
|3:00 - 3:15 pm EDT 21:00 - 21:15 CEST||Andrey Rudenko, Bosch||The Atlas Benchmark presentation||Link: tbd|
|3:15 - 4:00 pm EDT 21:15 - 22:00 CEST||Coffee break|
|4:00 - 4:30 pm EDT 22:00 - 22:30 CEST||Sanjiban Choudhury, Arun Venkatraman, Aurora||Imitation learning and forecasting: It’s only a game!||Abstract: A core challenge in self-driving is reasoning about how actors on the road interact to affect each other’s motions. Recent advances in machine learning have enabled powerful forecasting techniques that jointly reason about such interactions. However, many of these approaches treat forecasting in isolation from downstream decision making. This leads to a fundamental misalignment in objectives where better forecasts do not necessarily translate into better end-to-end behavior. In this talk, we will explore a clean-sheet approach to forecasting and decision making that builds on a singular objective – imitate expert human driving. We will present a unified, game-theoretic framework for imitation learning. We will view forecasting as an adversary that discriminates between human and robot driving. Finally, we will show how this framework leads to forecasts that enable human-like, predictable, and interpretable behavior on the road.|
|4:30 - 5:00 pm EDT 22:30 - 23:00 CEST||Organizers||Discussion and conclusions|