edwardleelpz / powerbev

POWERBEV, a novel and elegant vision-based end-to-end framework that only consists of 2D convolutional layers to perform perception and forecasting of multiple objects in BEVs.

License: Other

Python 100.00%

powerbev's Introduction

PowerBEV

This is the official PyTorch implementation of the paper:

PowerBEV: A Powerful yet Lightweight Framework for Instance Prediction in Bird's-Eye View
Peizheng Li, Shuxiao Ding, Xieyuanli Chen, Niklas Hanselmann, Marius Cordts, Jürgen Gall

📰 News

PowerBEV has been accepted by the 32nd International Joint Conference on Artificial Intelligence.
PowerBEV has been included in ROAD++: The Second Workshop & Challenge on Event Detection for Situation Awareness in Autonomous Driving @ ICCV 2023.

⚙️ Setup

Create the conda environment by running

conda env create -f environment.yml

📁 Dataset

Download the full NuScenes dataset (v1.0), which includes the Mini dataset (metadata and sensor file blobs) and the Trainval dataset (metadata and file blobs part 1-10).

Extract the tar files to the default nuscenes/ or to YOUR_NUSCENES_DATAROOT. The files should be organized in the following structure:

nuscenes/
├──── trainval/
│     ├──── maps/
│     ├──── samples/
│     ├──── sweeps/
│     └──── v1.0-trainval/
└──── mini/
      ├──── maps/
      ├──── samples/
      ├──── sweeps/
      └──── v1.0-mini/
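
Before starting a long training run, it can help to confirm that the extracted files match the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_nuscenes_dirs` is hypothetical, and the expected sub-directories are copied from the structure shown above.

```python
from pathlib import Path

# Sub-directories expected inside each split, copied from the tree above.
EXPECTED = {
    "trainval": ["maps", "samples", "sweeps", "v1.0-trainval"],
    "mini": ["maps", "samples", "sweeps", "v1.0-mini"],
}


def missing_nuscenes_dirs(dataroot, splits=("trainval", "mini")):
    """Return the expected sub-directories that do not exist under dataroot.

    Hypothetical helper for illustration; paths are returned relative to
    dataroot, using forward slashes.
    """
    root = Path(dataroot)
    return [
        (Path(split) / name).as_posix()
        for split in splits
        for name in EXPECTED[split]
        if not (root / split / name).is_dir()
    ]


# Usage: an empty list means the layout matches the tree above.
print(missing_nuscenes_dirs("nuscenes"))
```

Replace "nuscenes" with YOUR_NUSCENES_DATAROOT if the data was extracted elsewhere.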

🔥 Pre-trained models

The config file can be found in powerbev/configs

Config	Weights	Dataset	Past Context	Future Horizon	BEV Size	IoU	VPQ
`powerbev.yml`	`PowerBEV_long.ckpt`	NuScenes	1.0s	2.0s	100m x 100m (50cm res.)	39.3	33.8
`powerbev.yml`	`PowerBEV_short.ckpt`	NuScenes	1.0s	2.0s	30m x 30m (15cm res.)	62.5	55.5

Note: All metrics above were obtained by training from the pre-trained static weights (static long/static short).
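
The two BEV sizes in the table differ in extent and resolution but, under the bounds quoted in the config further down this page (X_BOUND and Y_BOUND given as [lower, upper, step]), they appear to produce the same number of grid cells. A small worked calculation, assuming the cell count is simply the extent divided by the step (the helper name `bev_grid_size` is hypothetical):

```python
def bev_grid_size(bound):
    """Number of BEV cells along one axis for a [lower, upper, step] bound."""
    lower, upper, step = bound
    # round() guards against floating-point noise such as 30 / 0.15.
    return round((upper - lower) / step)


# Bounds as they appear in the powerbev.yml excerpt quoted on this page.
LONG_BOUND = [-50.0, 50.0, 0.5]     # 100 m extent, 50 cm resolution
SHORT_BOUND = [-15.0, 15.0, 0.15]   # 30 m extent, 15 cm resolution

print("long :", bev_grid_size(LONG_BOUND), "cells per axis")
print("short:", bev_grid_size(SHORT_BOUND), "cells per axis")
```

Under this assumption, the short setting covers a smaller area at finer resolution rather than using a smaller grid.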

🏊 Training

To train the model from scratch on NuScenes, run

python train.py --config powerbev/configs/powerbev.yml

To train the model from the pre-trained static checkpoint on NuScenes, download pre-trained static weights (static long/static short) to YOUR_PRETRAINED_STATIC_WEIGHTS_PATH and run

python train.py --config powerbev/configs/powerbev.yml \
                PRETRAINED.LOAD_WEIGHTS True \
                PRETRAINED.PATH $YOUR_PRETRAINED_STATIC_WEIGHTS_PATH

Note: Both commands train the model on 4 GPUs with a batch size of 2 per GPU.

To override individual config values from the command line, run

python train.py --config powerbev/configs/powerbev.yml \
                DATASET.DATAROOT $YOUR_NUSCENES_DATAROOT \
                LOG_DIR $YOUR_OUTPUT_PATH \
                GPUS [0] \
                BATCHSIZE $YOUR_DESIRED_BATCHSIZE

The above settings can also be changed directly in powerbev.yml. See config.py for more information.
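
The trailing KEY VALUE pairs in the commands above use dotted keys to address nested config entries. The sketch below illustrates that idea with plain dictionaries; it is not the repository's config code. The function `apply_overrides` and the default values in `cfg` are assumptions made for the example.

```python
import ast


def apply_overrides(config, overrides):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict.

    Illustrative only: dotted keys such as "DATASET.DATAROOT" select nested
    entries, and values are parsed as Python literals when possible.
    """
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must be KEY VALUE pairs")
    for key, raw in zip(overrides[0::2], overrides[1::2]):
        try:
            value = ast.literal_eval(raw)   # "[0]" -> [0], "8" -> 8, "True" -> True
        except (ValueError, SyntaxError):
            value = raw                     # keep plain strings such as paths
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Assumed defaults, chosen only to mirror the commands above.
cfg = {"GPUS": [0, 1, 2, 3], "BATCHSIZE": 2, "DATASET": {"DATAROOT": "nuscenes/"}}
apply_overrides(cfg, ["DATASET.DATAROOT", "/data/nuscenes", "GPUS", "[0]", "BATCHSIZE", "8"])
print(cfg)
```

Depending on the shell, a value such as [0] may need quoting so that it reaches the script unchanged.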

🏄 Prediction

Evaluation

Download trained weights (long/short) to YOUR_PRETRAINED_WEIGHTS_PATH and run

python test.py --config powerbev/configs/powerbev.yml \
                PRETRAINED.LOAD_WEIGHTS True \
                PRETRAINED.PATH $YOUR_PRETRAINED_WEIGHTS_PATH

Visualisation

Download trained weights (long/short) to YOUR_PRETRAINED_WEIGHTS_PATH and run

python visualise.py --config powerbev/configs/powerbev.yml \
                PRETRAINED.LOAD_WEIGHTS True \
                PRETRAINED.PATH $YOUR_PRETRAINED_WEIGHTS_PATH \
                BATCHSIZE 1

This will render predictions from the network and save them to a visualization_outputs folder. Note: To visualize the ground truth, add the config VISUALIZATION.VIS_GT True at the end of the command.

📜 License

PowerBEV is released under the MIT license. Please see the LICENSE file for more information.

🔗 Citation

@article{li2023powerbev,
  title     = {PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird's-Eye View},
  author    = {Li, Peizheng and Ding, Shuxiao and Chen, Xieyuanli and Hanselmann, Niklas and Cordts, Marius and Gall, Juergen},
  journal   = {arXiv preprint arXiv:2306.10761},
  year      = {2023}
}
@inproceedings{ijcai2023p120,
  title     = {PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View},
  author    = {Li, Peizheng and Ding, Shuxiao and Chen, Xieyuanli and Hanselmann, Niklas and Cordts, Marius and Gall, Juergen},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on
               Artificial Intelligence, {IJCAI-23}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Edith Elkind},
  pages     = {1080--1088},
  year      = {2023},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2023/120},
  url       = {https://doi.org/10.24963/ijcai.2023/120},
}

powerbev's Issues

predict_instance_segmentation

Hello @EdwardLeeLPZ ,

Thanks for your great work! I have one question about the predict_instance_segmentation function in instance.py. Could you please tell me why you use output['instance_flow'][b, 1:2].detach(), rather than the first predicted instance flow [b, 0:1] to generate the instance in get_instance_segmentation_and_centers?

how to train on own dataset?

Hi, thank you for sharing this great work! I want to train the model on my own dataset. Can you give me some advice? My dataset follows the KITTI format.

N_FUTURE_FRAMES: 0 is not working

Thank you for sharing your code first. I set N_FUTURE_FRAMES to 0 for testing the encoder of the model. I noticed that in this case, the loss calculation does not proceed correctly. Could you provide any comments or suggestions for me?

Question about warp features.

Thank you for your kind answer.

I have another question.

        cum_flow = flow[:, t - 1] @ cum_flow

In the function, when t is 1, flow[:, 0] represents the transformation matrix from timestep 0 to timestep 1, and cum_flow represents the transformation matrix from timestep 1 to timestep 2. I'm wondering if it should be cum_flow @ flow[:, t - 1] instead, assuming the input x has timesteps 0, 1, and 2.

def cumulative_warp_features(x, flow, mode='nearest', spatial_extent=None):
    """ Warps a sequence of feature maps by accumulating incremental 2d flow.

    x[:, -1] remains unchanged
    x[:, -2] is warped using flow[:, -2]
    x[:, -3] is warped using flow[:, -3] @ flow[:, -2]
    ...
    x[:, 0] is warped using flow[:, 0] @ ... @ flow[:, -3] @ flow[:, -2]

    Args:
        x: (b, t, c, h, w) sequence of feature maps
        flow: (b, t, 6) sequence of 6 DoF pose
            from t to t+1 (only uses the xy portion)

    """
    sequence_length = x.shape[1]
    if sequence_length == 1:
        return x

    flow = pose_vec2mat(flow)

    out = [x[:, -1]]
    cum_flow = flow[:, -2]
    for t in reversed(range(sequence_length - 1)):
        out.append(warp_features(x[:, t], mat2pose_vec(cum_flow), mode=mode, spatial_extent=spatial_extent))
        # @ is the equivalent of torch.bmm
        cum_flow = flow[:, t - 1] @ cum_flow

    return torch.stack(out[::-1], 1)
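
One way to inspect the composition order asked about above is to trace the loop symbolically. The sketch below replaces matrices with strings and mirrors only the indexing and the left-multiplication of the quoted function; the helper `trace_cumulative_flow` is hypothetical and does not settle which order is geometrically correct.

```python
def trace_cumulative_flow(sequence_length):
    """Return, per timestep t, the symbolic product used to warp x[:, t]."""
    used = {sequence_length - 1: "identity"}   # x[:, -1] remains unchanged
    cum = [sequence_length - 2]                # mirrors: cum_flow = flow[:, -2]
    for t in reversed(range(sequence_length - 1)):
        used[t] = " @ ".join(f"flow[{i}]" for i in cum)
        # mirrors: cum_flow = flow[:, t - 1] @ cum_flow (left-multiplication);
        # at t == 0 the index wraps to flow[:, -1], but that product is never used.
        cum = [(t - 1) % sequence_length] + cum
    return used


trace = trace_cumulative_flow(3)
for t in sorted(trace):
    print(f"x[:, {t}] is warped using {trace[t]}")
```

For three timesteps the trace agrees with the docstring: x[:, 1] uses flow[1] and x[:, 0] uses flow[0] @ flow[1].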

How many epochs should I train to match the results of the provided .ckpt file?

How can I reproduce the reported results?

Hello!
I have another question. I trained a model from scratch with a batch size of 8 on a single A100 80GB GPU.
I conducted the training twice, but in both instances, the Volumetric Panoptic Quality (VPQ) was lower than the performance reported in the paper. Could you tell me how I can reproduce the results?

VPQ: 30.64 (first), 30.29 (second)

And how can I train the static model?

TAG: 'powerbev'

GPUS: [0]

BATCHSIZE: 8
PRECISION: 16

LIFT:
  # Long
  X_BOUND: [-50.0, 50.0, 0.5]  # Forward
  Y_BOUND: [-50.0, 50.0, 0.5]  # Sides

  # # Short
  # X_BOUND: [-15.0, 15.0, 0.15]  # Forward
  # Y_BOUND: [-15.0, 15.0, 0.15]  # Sides

MODEL:
  BN_MOMENTUM: 0.05

N_WORKERS: 16
VIS_INTERVAL: 100

How to visualize the ground truth?

I am sorry to bother you. I ran visualise.py but only got the predictions. How can I get the ground truth?

Evaluation range

Thank you for your great work!

I'm a beginner in this field. When measuring evaluation ranges (i.e., short, long), shouldn't we measure both in one model and publish it in the paper? Did FIERY, for example, train two models with different resolutions for each range and measure performance?

You have asked for native AMP on CPU, but AMP is only available on GPU

When I followed your instructions, this error occurred:
"You have asked for native AMP on CPU, but AMP is only available on GPU"
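
This message appears to come from the training framework when 16-bit precision is requested while no GPU is in use (the config quoted elsewhere on this page sets PRECISION: 16). The following is a hedged sketch of a pre-flight check, not code from the repository; `choose_precision` is a hypothetical helper and the fallback to 32-bit is an assumption about what one might want.

```python
def choose_precision(requested, cuda_available):
    """Fall back to 32-bit when 16-bit AMP is requested without a usable GPU."""
    if requested == 16 and not cuda_available:
        return 32
    return requested


try:
    import torch
    cuda_available = torch.cuda.is_available()
except ImportError:
    # Without PyTorch installed there is certainly no usable CUDA device.
    cuda_available = False

print("CUDA visible:", cuda_available)
print("precision to use:", choose_precision(16, cuda_available))
```

If CUDA is reported as not visible, checking the driver installation and the GPUS setting is likely a better first step than changing the precision.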

Why are flow_offset[0] and flow_offset[1] swapped?

Excuse me, dear author! Why are flow_offset[0] and flow_offset[1] swapped?
The corresponding code in FIERY does not do this.

RuntimeError: Tensors must be CUDA and dense

Dear authors,
Hi @EdwardLeeLPZ,
When I run train.py with two GPUs, I get the following error:
File "XXX/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1334, in _distributed_broadcast_coalesced
dist._broadcast_coalesced(self.process_group, tensors, buffer_size, authoritative_rank)
The message is "Tensors must be CUDA and dense".
However, when I inspect the tensors argument, it is a list whose elements are all on cuda:0, so I do not know what is wrong. Thanks!

mapping = (iou > 0.5).nonzero(as_tuple=False)

Hi @EdwardLeeLPZ ,

I'm sorry to bother you again. Could you please tell me how the threshold (0.5) in mapping = (iou > 0.5).nonzero(as_tuple=False) in metrics.py was chosen, since it strongly affects the results? Thanks!
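
For context, 0.5 is the usual threshold in panoptic-quality style metrics: when the instances within each map do not overlap, an IoU above 0.5 can hold for at most one prediction per ground-truth instance, so the matching is unique. The sketch below illustrates that matching step with sets of pixel coordinates; it is a simplified stand-in and not the code in metrics.py, and the names `iou` and `match_instances` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two sets of pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_instances(gt, pred, threshold=0.5):
    """Return (gt_id, pred_id) pairs whose IoU exceeds the threshold."""
    return [
        (g, p)
        for g, g_pixels in gt.items()
        for p, p_pixels in pred.items()
        if iou(g_pixels, p_pixels) > threshold
    ]


# One ground-truth instance and two non-overlapping predictions.
gt = {1: {(0, 0), (0, 1), (1, 0), (1, 1)}}
pred = {7: {(0, 0), (0, 1), (1, 0)}, 8: {(1, 1), (2, 2)}}

matches = match_instances(gt, pred)
print(matches)
```

Lowering the threshold below 0.5 would allow one ground-truth instance to match several predictions, which is one reason the reported numbers are sensitive to this value.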