CVPR 2020, "MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps"

autonomous-driving cvpr2020 motion-prediction perception pytorch

motionnet's Issues

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

python --data /DataSet/ --batch 8 --nepoch 45 --nworker 4 --use_bg_tc --reg_weight_bg_tc 0.1 --use_fg_tc --reg_weight_fg_tc 2.5 --use_sc --reg_weight_sc 15.0 --log

Namespace(batch=8, data='/DataSet/', log=True, logpath='', nepoch=45, nn_sampling=False, nworker=4, reg_weight_bg_tc=0.1, reg_weight_fg_tc=2.5, reg_weight_sc=15.0, resume='', use_bg_tc=True, use_fg_tc=True, use_sc=True)
device number 1
data root: /DataSet/
Training dataset size: 17065
Epoch 1, learning rate 0.0016
Traceback (most recent call last):
File "", line 610, in
File "", line 183, in main
= train(model, criterion, trainloader, optimizer, device, epoch)
File "", line 226, in train
motion_pred, trans_matrices, pixel_instance_map)
File "", line 356, in compute_and_bp_loss
File "/home/redrafi/.local/lib/python3.7/site-packages/torch/", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/redrafi/.local/lib/python3.7/site-packages/torch/autograd/", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

A question about the MotionNet post-processing

First, thanks for your work!
One part of the post-processing code confuses me:

# We only show the cells having one-hot category vectors
max_prob = np.amax(pixel_cat_map_gt, axis=-1)
filter_mask = max_prob == 1.0
pixel_cat_map = np.argmax(pixel_cat_map_gt, axis=-1) + 1  # category starts from 1 (background), etc
pixel_cat_map = (pixel_cat_map * non_empty_map * filter_mask).astype(

cat_pred = np.argmax(cat_pred, axis=0) + 1
cat_pred = (cat_pred * non_empty_map * filter_mask).astype(

The tensor cat_pred output by MotionNet appears to be the per-pixel category of the LiDAR BEV map.
But what does filter_mask mean in this part?
Dropping it, i.e. cat_pred = (cat_pred * non_empty_map).astype(, also produces a normal-looking result.

filter_mask is computed from the values of pixel_cat_map_gt. If I test MotionNet on my own LiDAR data, I have no ground-truth box annotations, so filter_mask cannot be computed.
I am not sure whether my understanding is correct. I hope for a reply!
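
To make the question concrete, here is a minimal sketch of what the quoted lines seem to do. Judging by the code comment, filter_mask keeps only the cells whose ground-truth category vector is exactly one-hot, so it looks like a display/evaluation filter rather than part of the prediction itself. The function names, the toy arrays, and the output dtype are assumptions made for this illustration. They are not taken from the repository.

```python
import numpy as np


def one_hot_filter_mask(pixel_cat_map_gt):
    """True where the ground-truth category vector of a cell is exactly one-hot."""
    return np.amax(pixel_cat_map_gt, axis=-1) == 1.0


def predicted_category_map(cat_pred, non_empty_map, filter_mask=None):
    """Per-cell category ids (1-based); 0 marks empty or filtered-out cells."""
    cat = np.argmax(cat_pred, axis=0) + 1      # channels-first scores -> ids from 1
    cat = cat * non_empty_map                  # drop cells without LiDAR points
    if filter_mask is not None:                # only possible when GT is available
        cat = cat * filter_mask
    return cat.astype(np.int64)                # dtype chosen for this sketch


# Toy 2x2 grid with 3 categories. Cell (0, 1) has a soft (non one-hot) GT vector.
gt = np.array([[[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],
               [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]])
scores = np.array([[[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
                   [[0.1, 0.1, 0.8], [0.2, 0.7, 0.1]]])
cat_pred = scores.transpose(2, 0, 1)           # shape (categories, H, W)
non_empty_map = np.array([[1, 1], [1, 0]])

with_mask = predicted_category_map(cat_pred, non_empty_map, one_hot_filter_mask(gt))
without_mask = predicted_category_map(cat_pred, non_empty_map)
print(with_mask)
print(without_mask)
```

Under these assumptions, the only difference between the two results is the cell with the ambiguous ground truth. That would be consistent with the observation that omitting filter_mask still gives a normal-looking result on data without annotations.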

BrokenPipeError during training process

Hi Wu,

Thanks for your excellent work! When I train the network with the provided training script, it always fails at some batch or epoch. The error message is:
Traceback (most recent call last):
File "", line 611, in
File "", line 184, in main
= train(model, criterion, trainloader, optimizer, device, epoch)
File "", line 212, in train
for i, data in enumerate(trainloader, 0):
File "/home/aiserver/anaconda3/envs/MotionNet/lib/python3.7/site-packages/torch/utils/data/", line 582, in next
return self._process_next_batch(batch)
File "/home/aiserver/anaconda3/envs/MotionNet/lib/python3.7/site-packages/torch/utils/data/", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
BrokenPipeError: Traceback (most recent call last):
File "/home/aiserver/anaconda3/envs/MotionNet/lib/python3.7/site-packages/torch/utils/data/_utils/", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/aiserver/anaconda3/envs/MotionNet/lib/python3.7/site-packages/torch/utils/data/_utils/", line 99, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/aiserver/FangYang/MotionNet/data/", line 58, in getitem
if idx in self.cache:
File "", line 2, in contains
File "/home/aiserver/anaconda3/envs/MotionNet/lib/python3.7/multiprocessing/", line 818, in _callmethod
conn.send((self._id, methodname, args, kwds))
File "/home/aiserver/anaconda3/envs/MotionNet/lib/python3.7/multiprocessing/", line 206, in send
File "/home/aiserver/anaconda3/envs/MotionNet/lib/python3.7/multiprocessing/", line 404, in _send_bytes
self._send(header + buf)
File "/home/aiserver/anaconda3/envs/MotionNet/lib/python3.7/multiprocessing/", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

I have no idea why this happens. Could you help me figure it out? Thank you!

Bug in [os.listdir does not guarantee the order of the returned list of files on other systems beyond Ubuntu]


I think I found a bug in data/

            gt_file_paths = [
                os.path.join(seq_dir, f)
                for f in os.listdir(seq_dir)
                if os.path.isfile(os.path.join(seq_dir, f))
            ]
            num_gt_files = len(gt_file_paths)

            assert gt_file_paths == sorted(gt_file_paths), gt_file_paths  # <-- line inserted by me
            gt_dict_list = []
            for f in range(
            ):  # process the files, starting from 0.npy to 1.npy, etc

gt_file_paths should contain the two files 0.npy and 1.npy (for multi_seq training). Judging by the comment at the start of the for-loop, this list is expected to be ordered (see the assertion I inserted, marked with a comment).
Now os.listdir does not guarantee any ordering:
Taken from here:

Python method listdir() returns a list containing the names of the entries in the directory given by path. The list is in arbitrary order. [...]

When I run the above code with the assertion on the cluster, everything is fine. On my local machine, however, the assertion fails. So the ordering probably depends on the file system, the OS, and the os.listdir implementation, and one might simply get lucky and always see an ordered list.

However, I started wondering whether this might even act as a kind of augmentation (reversal in time), or whether it matters at all. The network is applied to 0.npy and 1.npy independently, and the two are only combined in the background consistency loss. There the ordering is probably relevant, since the loss uses trans_matrix, which by definition only works in one direction. Could you shed some light on whether this was intended?
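
A possible way to make the order deterministic is to sort the listing explicitly. The sketch below is only an illustration: the helper name list_gt_files is hypothetical, and the numeric sort key assumes that every file stem is an integer (0.npy, 1.npy, ...).

```python
import os
import tempfile


def list_gt_files(seq_dir):
    """Return the paths of the files in seq_dir, ordered by integer file stem."""
    names = [
        f for f in os.listdir(seq_dir)
        if os.path.isfile(os.path.join(seq_dir, f))
    ]
    # Numeric key so that 10.npy sorts after 2.npy (assumes integer stems).
    names.sort(key=lambda f: int(os.path.splitext(f)[0]))
    return [os.path.join(seq_dir, f) for f in names]


# Demo on a throwaway directory with files created in scrambled order.
with tempfile.TemporaryDirectory() as seq_dir:
    for name in ("10.npy", "1.npy", "0.npy", "2.npy"):
        open(os.path.join(seq_dir, name), "wb").close()
    ordered_names = [os.path.basename(p) for p in list_gt_files(seq_dir)]

print(ordered_names)
```

A plain sorted() on the paths would also remove the platform dependence for 0.npy and 1.npy. The numeric key only matters once there are ten or more files per sequence.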

The datasets of

The trainval split on the official nuScenes website is divided into 10 parts. Do I need to download all of them?

Hello! When I run the, it fails in LidarPointCloud (from nuscenes.utils.data_classes import LidarPointCloud):

Traceback (most recent call last):
File "E:/MotionNet/MotionNet/data/", line 401, in
File "E:/MotionNet/MotionNet/data/", line 83, in gen_data
LidarPointCloud.from_file_multisweep_bf_sample_data(nusc, curr_sample_data,
AttributeError: type object 'LidarPointCloud' has no attribute 'from_file_multisweep_bf_sample_data'

It seems that the LidarPointCloud class does not have 'from_file_multisweep_bf_sample_data'.

Is the background loss filtering out dynamic objects?

Hi again,

This is a question about the paper. Even after skimming the code, I am still not quite clear on it.
Your background temporal consistency loss in equation (3) of the paper seems reasonable for static points but not for dynamic ones, because you specifically wrote that the alignment transformation T is rigid and therefore cannot account for object motion.
Are you filtering out cells that are dynamic/non-background for this loss?
Also, why did you need a complete second set of N motion maps for the background loss?

On a side note:
In a different issue #4 (comment) you wrote:

The training usually takes less than one day on a single RTX 2080 Ti GPU.

That GPU has only around 11 GB of memory. However, even on my Tesla V100 (16 GB), the training ran out of memory at the very beginning. Running the complete training on 2 GPUs worked, though. Do you have an idea what the reason could be?

Thanks again for your answer.

Inactive bug in classify_speed_level

In the following snippet I added an assertion with a NOTE comment. Is it correct?
The assertion never throws, that is why I described this as an inactive bug ;)

def classify_speed_level(
    all_disp_field_gt, total_future_sweeps=20, future_frame_skip=0
):
    """Classify each cell into static (possibly background) or moving."""
    # First, compute the static and moving cell masks
    all_disp_field_gt_norm = np.linalg.norm(all_disp_field_gt, ord=2, axis=-1)

    # Every future_frame_skip frames, if the movement of grid cells does not exceed this thresh (unit: meters),
    # then they are considered as static. This thresh is set to be the maximum perturbation for 1 second.
    upper_thresh = 0.2
    upper_bound = (future_frame_skip + 1) / 20 * upper_thresh
    selected_future_sweeps = np.arange(
        0, total_future_sweeps + 1, future_frame_skip + 1
    )
    selected_future_sweeps = selected_future_sweeps[1:]

    assert (
        future_frame_skip == 0
    )  # NOTE: the following selection does only work if no frames were skipped
    future_sweeps_disp_field_gt_norm = all_disp_field_gt_norm[
        -len(selected_future_sweeps) :, ...
    ]
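
The concern can be reproduced on toy data. The sketch below assumes that the norm array holds one entry per future sweep, so that index k-1 corresponds to sweep k. If the array passed in has already been subsampled upstream, the tail slice would be fine and this comparison would not apply. The helper select_future_sweeps and the toy array are made up for this illustration.

```python
import numpy as np


def select_future_sweeps(total_future_sweeps=20, future_frame_skip=0):
    """Sweep numbers kept by the snippet above (1-based, sweep 0 dropped)."""
    selected = np.arange(0, total_future_sweeps + 1, future_frame_skip + 1)
    return selected[1:]


total = 20
# Toy "norm" array of shape (sweeps, H, W): entry k-1 stores the value k.
norms = np.arange(1, total + 1, dtype=float).reshape(total, 1, 1)

results = {}
for skip in (0, 1):
    selected = select_future_sweeps(total, skip)
    tail = norms[-len(selected):, ...]       # selection used in the snippet
    explicit = norms[selected - 1, ...]      # index by the selected sweep numbers
    results[skip] = (tail.ravel().tolist(), explicit.ravel().tolist())
    print(skip, np.array_equal(tail, explicit))
```

With future_frame_skip=0 both selections coincide. With future_frame_skip=1 the tail slice picks sweeps 11 to 20, whereas explicit indexing picks sweeps 2, 4, ..., 20, which is the discrepancy the assertion guards against under the stated assumption.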

Problems training train_single_seq

Impressive work.

When I tried to train the model from scratch without the spatio-temporal consistency losses, I found that the training dataset could not be loaded. It turned out that there is a typo in line 171 of MotionNet/data/: 'os.path.isfile(os.path.join(self.dataset_root, d))]' should be 'os.path.isdir(os.path.join(self.dataset_root, d))]'.
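
The effect of the typo can be shown with a small stand-alone sketch. It assumes that each training sequence is stored as a sub-directory of the dataset root. The helper name list_sequence_dirs and the directory names are hypothetical.

```python
import os
import tempfile


def list_sequence_dirs(dataset_root):
    """Return the sub-directories of dataset_root in sorted order."""
    return [
        os.path.join(dataset_root, d)
        for d in sorted(os.listdir(dataset_root))
        if os.path.isdir(os.path.join(dataset_root, d))
    ]


# Demo: two sequence directories and one stray file in a throwaway root.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "seq_b"))
    os.mkdir(os.path.join(root, "seq_a"))
    open(os.path.join(root, "notes.txt"), "w").close()

    found = [os.path.basename(p) for p in list_sequence_dirs(root)]
    # What an isfile() filter would return for the same root.
    files_only = [d for d in os.listdir(root) if os.path.isfile(os.path.join(root, d))]

print(found, files_only)
```

With isfile() the filter returns only the stray file and none of the sequence directories, which would match the symptom of a dataset that cannot be loaded.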

By the way, it would be very helpful if you could provide a pre-trained single-seq model. In addition, how long does it take to train the model?

Some questions about background_temporal_consistency loss

Hi @pxiangwu ,
Thanks for open-sourcing your code. I am wondering why curr_pred is flipped along dim=2 before F.affine_grid is applied in the function background_temporal_consistency_loss:

# Next, translation
curr_pred = curr_pred.permute(0, 1, 3, 2).contiguous()  # swap x and y axis 
curr_pred = torch.flip(curr_pred, dims=[2]) 

grid = F.affine_grid(grid_trans_matrix_disp, curr_pred.size())

Training Set size not 17065 for NuScenes after preprocessing

First of all, thank you for releasing your code and your great work.
I have a short question regarding MotionNet, as I am trying to reproduce your numbers. When I run the pre-processing script over the nuScenes folder, everything seems to work and the output looks good, with a training dataset size of roughly 19 GB. You reported in your data/ a total preprocessed training dataset size of 26.5 GB on your system. Is this difference realistic? Also, when I start the MGDA training with the ST consistency loss as shown in the, the first warning I get is "The size of training dataset is not 17065", and shortly afterwards I am told that my training dataset size is 6951. So several numbers do not add up for me: you have 17k samples in 26 GB, while I have fewer than half as many samples in 19 GB, and about 10k samples are missing.
Maybe you can help me out here or have an idea of what is different?

Could you provide your command line when running the code? Let me check what might cause this inconsistency.

Everything I did was done in a venv with python3.6.9 and the required pip-dependencies on an Ubuntu 18.04 system.
The command line I ran was directly taken from the where I just replaced my used directories:

python $SRC_DIR/data/ --root $INPUT_DATADIR/nuscenes --split train --savepath $INPUT_DATADIR/nuscenes_preprocessed/train

The starting output looks like the following:

Loading NuScenes tables for version v1.0-trainval...
23 category,
8 attribute,
4 visibility,
64386 instance,
12 sensor,
10200 calibrated_sensor,
2631083 ego_pose,
68 log,
850 scene,
34149 sample,
2631083 sample_data,
1166187 sample_annotation,
4 map,
Done loading in 36.6 seconds.
Reverse indexing ...
Done reverse indexing in 9.8 seconds.
Total number of scenes: 850
Split: train, which contains 500 scenes.
Processing scene 411 ...
  >> Finish sample: 0, sequence 0

When I now start a training with MGDA and ST consistency loss like described in the

python --data $INPUT_DATADIR/nuscenes_preprocessed/train --batch 8 --nepoch 70 --nworker 4 --use_bg_tc --reg_weight_bg_tc 0.1 --use_fg_tc --reg_weight_fg_tc 2.5 --use_sc --reg_weight_sc 15.0 --reg_weight_cls 2.0 --log

I get the following output:

Namespace(batch=8, data='/xxxINPUT_DATADIRxxx(postedited for this issue)/nuscenes_preprocessed/train', log=True, logpath='', nepoch=70, nn_sampling=False, nworker=4, reg_weight_bg_tc=0.1, reg_weight_cls=2.0, reg_weight_fg_tc=2.5, reg_weight_sc=15.0, resume='', use_bg_tc=True, use_fg_tc=True, use_sc=True)
device number 2
data root: /xxxINPUT_DATADIRxxx/nuscenes_preprocessed/train
/xxxSRC_DIRxxx/data/ UserWarning: >> The size of training dataset is not 17065.
  warnings.warn(">> The size of training dataset is not 17065.\n")
Training dataset size: 6951
Epoch 1, learning rate 0.002
[1/0]   Disp 0.106501,  Obj_Cls 0.110858,       Motion_Cls 0.057613,    bg_tc 0.8646359,        sc 0.0885072,   fg_tc 0.0126457

So, as you can see, neither the preprocessing nor the start of the training raises a real problem. However, with 10k samples missing compared to the published setup, reproducing the results is impossible.

Also after some time the training actually fails, but I cannot tell if it is related to this issue (I am not a pickle expert):

[1/482] Disp 0.035911,  Obj_Cls 0.068191,       Motion_Cls 0.014474,    bg_tc 0.0069933,        sc 0.0002731,   fg_tc 0.0000397
Traceback (most recent call last):
  File "", line 1042, in <module>
  File "", line 269, in main
    models, criterion, trainloader, optimizers, device, epoch
  File "", line 321, in train
    for i, data in enumerate(trainloader, 0):
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/torch/utils/data/", line 582, in __next__
    return self._process_next_batch(batch)
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/torch/utils/data/", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
_pickle.UnpicklingError: Traceback (most recent call last):
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/torch/utils/data/_utils/", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/torch/utils/data/_utils/", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/xxxSRC_DIRxxx/data/", line 68, in __getitem__
    gt_data_handle = np.load(gt_file_path, allow_pickle=True)
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/numpy/lib/", line 440, in load
  File "/xxxSRC_DIRxxx/.venv/lib/python3.6/site-packages/numpy/lib/", line 732, in read_array
    array = pickle.load(fp, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '\x00'.
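
One way to narrow this down is to scan the preprocessed folder for files that cannot be loaded. A truncated or partially written file is one possible cause of such an error, although the thread does not confirm it. The sketch below is a generic check, not part of the repository. The helper name find_unreadable_npy is hypothetical, and allow_pickle=True should only be used on files you generated yourself.

```python
import os
import tempfile

import numpy as np


def find_unreadable_npy(root):
    """Return (path, error) pairs for every .npy file under root that fails to load."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # deterministic traversal order
        for name in sorted(filenames):
            if not name.endswith(".npy"):
                continue
            path = os.path.join(dirpath, name)
            try:
                np.load(path, allow_pickle=True)
            except Exception as exc:         # any load failure marks the file as bad
                bad.append((path, repr(exc)))
    return bad


# Demo: one valid pickled object array and one file filled with zero bytes.
with tempfile.TemporaryDirectory() as root:
    seq = os.path.join(root, "seq_0")
    os.mkdir(seq)
    np.save(os.path.join(seq, "0.npy"), np.array({"a": 1}, dtype=object))
    with open(os.path.join(seq, "1.npy"), "wb") as fh:
        fh.write(b"\x00" * 64)
    bad_names = [os.path.basename(p) for p, _ in find_unreadable_npy(root)]

print(bad_names)
```

Running such a scan over the training folder would show whether the failure comes from a few damaged files, which could then be regenerated by re-running the preprocessing for the affected scenes.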

Inference on own LiDAR data


First of all, I would like to thank you for the provided code.
I understand how to preprocess the data and train the model, but I am in the dark about how to run inference on my own LiDAR data. We have a Velodyne Ultra Puck 32 LiDAR, as well as an Ouster OS1 and OS2, from which we can receive a byte stream. Are there parameters I can tweak in the model to accommodate our LiDAR setup (we are not using the full 360°, the sensor is tilted, ...)?

Thanks in advance!
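
Since MotionNet operates on bird's-eye-view maps, a point cloud from any sensor would presumably first have to be rasterized into such a grid. The following is a minimal sketch of a binary occupancy voxelization for a single sweep. The function name voxelize_bev, the extents, and the voxel size are illustrative assumptions. They should be checked against the repository's preprocessing code, and steps such as aligning several past sweeps to the current frame are not covered here.

```python
import numpy as np


def voxelize_bev(points, extents, voxel_size):
    """Binary occupancy grid of shape (nx, ny, nz) from an (N, >=3) point array."""
    points = np.asarray(points, dtype=np.float64)[:, :3]
    extents = np.asarray(extents, dtype=np.float64)      # rows: (min, max) per axis
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    lower, upper = extents[:, 0], extents[:, 1]

    # Number of cells per axis (small epsilon guards against float round-up).
    dims = np.ceil((upper - lower) / voxel_size - 1e-9).astype(np.int64)

    # Keep only points inside the region of interest.
    keep = np.all((points >= lower) & (points < upper), axis=1)
    idx = np.floor((points[keep] - lower) / voxel_size).astype(np.int64)
    idx = np.minimum(idx, dims - 1)                       # clamp boundary rounding

    grid = np.zeros(dims, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


# Assumed region and resolution, in meters, in the sensor/ego frame.
extents = [(-32.0, 32.0), (-32.0, 32.0), (-3.0, 2.0)]
voxel_size = (0.25, 0.25, 0.4)
cloud = np.array([[0.0, 0.0, 0.0],        # inside
                  [100.0, 0.0, 0.0],      # outside, dropped
                  [-32.0, -32.0, -3.0]])  # lower corner, inside
grid = voxelize_bev(cloud, extents, voxel_size)
print(grid.shape, int(grid.sum()))
```

A tilted or partial field of view would mainly change which cells are ever occupied. The points would need to be transformed into a level, vehicle-centered frame before voxelization for the grid to be comparable to the training data.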

Can other datasets be used with this model?

I would like to know whether other datasets can be used with this model.
Is it feasible to convert their annotation files into the nuScenes format?
I think rewriting the dataset loading code may be a solution.

