Giter Site home page Giter Site logo

scan's Introduction

SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction

Code for NeurIPS20 Submission: SCAN: A Spatial Context Attentive Network for Joint Multi-Agent Intent Prediction

Predicting pedestrian intent is a complex problem because pedestrians are influenced by the current trajectories of their neighbors as well as their expected future trajectories. Human navigation behavior is governed by implicit spatial rules such as yielding right-of-way, respecting personal space, etc. that typical sequence to sequence learning approaches cannot capture. In this work, we propose SCAN, a Spatial Context Attentive Network that can jointly predict socially-acceptable future trajectories of all pedestrians in a scene.

Our proposed method uses a novel spatial attention mechanism to model the influence of spatially close neighbors on each other and a temporal attention mechanism to model the ability of pedestrians to respond similarly to previously experienced spatial contexts or situations.

Our Model

Our model contains an LSTM-based Encoder-Decoder Framework that takes as input observed trajectories for all pedestrians in the frame and jointly predicts future trajectories for all the pedestrians in the given frame. To account for spatial influences of spatially close pedestrians on each other, our model uses a Spatial Attention Mechanism that infers and incorporates perceived spatial contexts into each pedestrian's LSTM's knowledge. In the decoder, our model additionally uses a temporal attention mechanism to attend to the observed spatial contexts for each pedestrian, to enable the model to learn how to navigate by learning from previously encountered spatial situations.

Example Predictions

Below we show some socially acceptable trajectories predicted by our model in crowded environments. Every pedestrian navigating in the frame is denoted by a different color. The images on the left are the ground truth trajectories and the images on the left denote the trajectories predicted by our model. Our model is able to predict socially acceptable trajectories that also demonstrate social behavior such as walking in pairs, walking in groups, avoiding collision, and so on.

In some cases, our model fails to predict accurate trajectories. The primary reason for that is that our model is agnostic to scene context and hence does not contain information about useful scene features such as turns, building entrances, sidewalks, and so on. Below we show a fail case for our model that is experienced because of lack of scene information:

Training Details

We train and evaluate our model on five publicly available datasets: ETH, HOTEL, UNIV, ZARA1, ZARA2. We follow a leave-one-out process where we train on four of the five models and test on the fifth. The exact training, validation, test datasets we use are in directory data/ . For each pedestrian in a given frame, our model observes the trajectory for 8 time steps (3.2 seconds) and predicts intent over future 8 time steps (3.2 seconds) jointly for all pedestrians in the scene.

To download the data:

sh scripts/download_data.sh

To train our models with our best-performing parameters to reproduce our results, run:

sh scripts/train.sh <model_name> <dataset_name> 16 32 128 16

where: <model_name> can be spatial_temporal_model or spatial_model. spatial_temporal_model is our final model, SCAN , and <spatial_model> is our ablation, vanillaSCAN . <dataset_name> choices are zara1, zara2, univ, eth, hotel.

To evaluate SCAN on the five datasets, run:

sh scripts/test.sh <dataset_name> 

To evaluate vanillaSCAN change spatial_temporal_model to spatial_model in scripts/test.sh.

We evaluate our models using 2 metrics: ADE (Average Displacement Error) and FDE (Final Displacement Error).

ETH HOTEL ZARA1 ZARA2 UNIV AVG
vanillaSCAN 0.75 / 1.44 0.45/ 0.91 0.39 / 0.89 0.34 / 0.75 0.62 / 1.31 0.51 / 1.06
SCAN 0.55/0.78 0.47 /0.90 0.22/0.45 0.31/0.67 0.62 /1.28 0.43/0.82

Qualitative Analysis

Track Fragmentation Analysis

Noisy or missing data is a common problem in object tracking. It is important for an intent prediction model to be able to predict fairly accurate future trajectories despite interrupted observed ground truth trajectories. We evaluated the robustness of SCAN to such track fragmentation by randomly dropping frames from the observation window for each pedestrian.

To evaluate track fragmentation add

--drop_frames=<drop_frames> 

to scripts/test.sh. Where <drop_frames> is the number of frames or timesteps from the observed sequence that are to be dropped per pedestrian in the scene.

Collision Analysis

To demonstrate the capability of our spatial attention mechanism to predict safe, socially-acceptable trajectories, we evaluate the ability of trajectories predicted by our model to avoid collisions . For a given prediction time window of 12 timesteps and a certain distance threshold, we say the situation contains a collision if the distance between any two pedestrians in the frame drops below the distance threshold at any of the 12 timesteps. To evaluate SCAN and vanillaSCAN for socially acceptable trajectories and ability to avoid collisions, run:

sh scripts/collisions.sh <dataset_name> 

scan's People

Contributors

pedestrian-intent avatar

Stargazers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.