edwardleelpz / powerbev

PowerBEV is a novel and elegant vision-based end-to-end framework that consists only of 2D convolutional layers and performs perception and forecasting of multiple objects in bird's-eye view (BEV).

License: Other

Python 100.00%

powerbev's Introduction

PowerBEV

This is the official PyTorch implementation of the paper:

PowerBEV: A Powerful yet Lightweight Framework for Instance Prediction in Bird's-Eye View
Peizheng Li, Shuxiao Ding, Xieyuanli Chen, Niklas Hanselmann, Marius Cordts, Jürgen Gall

📃 Contents

📰 News

⚙️ Setup

Create the conda environment by running

conda env create -f environment.yml

📁 Dataset

  • Download the full NuScenes dataset (v1.0), which includes the Mini dataset (metadata and sensor file blobs) and the Trainval dataset (metadata and file blobs part 1-10).
  • Extract the tar files to the default nuscenes/ or to YOUR_NUSCENES_DATAROOT. The files should be organized in the following structure:
    nuscenes/
    ├──── trainval/
    │     ├──── maps/
    │     ├──── samples/
    │     ├──── sweeps/
    │     └──── v1.0-trainval/
    └──── mini/
          ├──── maps/
          ├──── samples/
          ├──── sweeps/
          └──── v1.0-mini/
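
A quick way to confirm the extracted files match the layout above is to check for each expected folder. The following is a minimal sketch, not part of the repository: the helper name `missing_nuscenes_dirs` is hypothetical, and the folder names are taken directly from the tree above.

```python
from pathlib import Path

# Sub-folders listed in the tree above; the version folder differs per split.
EXPECTED = {
    "trainval": ["maps", "samples", "sweeps", "v1.0-trainval"],
    "mini": ["maps", "samples", "sweeps", "v1.0-mini"],
}


def missing_nuscenes_dirs(dataroot):
    """Return the expected directories that are absent under `dataroot`."""
    root = Path(dataroot)
    return [
        str(Path(split) / sub)
        for split, subs in EXPECTED.items()
        for sub in subs
        if not (root / split / sub).is_dir()
    ]


if __name__ == "__main__":
    # "nuscenes" is the default location; use YOUR_NUSCENES_DATAROOT if it differs.
    for path in missing_nuscenes_dirs("nuscenes"):
        print("missing:", path)
```

An empty result means every folder from the tree is present; it does not verify that the archives inside were extracted completely.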
    

🔥 Pre-trained models

The config file can be found in powerbev/configs

Config        Weights              Dataset   Past Context  Future Horizon  BEV Size                 IoU   VPQ
powerbev.yml  PowerBEV_long.ckpt   NuScenes  1.0s          2.0s            100m x 100m (50cm res.)  39.3  33.8
powerbev.yml  PowerBEV_short.ckpt  NuScenes  1.0s          2.0s            30m x 30m (15cm res.)    62.5  55.5

Note: All metrics above were obtained by training from the pre-trained static weights (static long/static short).
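
The two BEV sizes in the table differ in extent and resolution but cover the same number of grid cells. The arithmetic can be sketched as follows, assuming each axis is described by a `[lower, upper, resolution]` triple in metres centred on the ego vehicle; the helper name `bev_grid_cells` is hypothetical and not part of the repository.

```python
def bev_grid_cells(lower, upper, resolution):
    """Number of BEV cells along one axis for a [lower, upper, resolution] bound in metres."""
    return round((upper - lower) / resolution)


# Long range: 100 m x 100 m at 50 cm per cell.
long_cells = bev_grid_cells(-50.0, 50.0, 0.5)
# Short range: 30 m x 30 m at 15 cm per cell.
short_cells = bev_grid_cells(-15.0, 15.0, 0.15)

print(long_cells, short_cells)  # 200 200
```

Under these assumptions both settings produce a 200 x 200 grid, so the short setting trades spatial coverage for finer cells rather than changing the grid size.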

🏊 Training

To train the model from scratch on NuScenes, run

python train.py --config powerbev/configs/powerbev.yml

To train the model from the pre-trained static checkpoint on NuScenes, download pre-trained static weights (static long/static short) to YOUR_PRETRAINED_STATIC_WEIGHTS_PATH and run

python train.py --config powerbev/configs/powerbev.yml \
                PRETRAINED.LOAD_WEIGHTS True \
                PRETRAINED.PATH $YOUR_PRETRAINED_STATIC_WEIGHTS_PATH

Note: Both commands train the model on 4 GPUs with a batch size of 2 per GPU.
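
When changing the number of GPUs, it can help to keep the total number of samples per optimisation step in mind. A minimal sketch of that arithmetic, assuming the batch size is counted per GPU as the note states; the helper name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(gpus, per_gpu_batch):
    """Total samples per step when every listed GPU processes `per_gpu_batch` samples."""
    return len(gpus) * per_gpu_batch


# Default setup from the note: 4 GPUs with 2 samples each.
print(effective_batch_size([0, 1, 2, 3], 2))  # 8
# A single GPU needs a per-GPU batch of 8 to reach the same total.
print(effective_batch_size([0], 8))  # 8
```

Matching the total batch size does not by itself guarantee matching results, since other settings may also depend on the GPU count.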

To override config values from the command line, run

python train.py --config powerbev/configs/powerbev.yml \
                DATASET.DATAROOT $YOUR_NUSCENES_DATAROOT \
                LOG_DIR $YOUR_OUTPUT_PATH \
                GPUS [0] \
                BATCHSIZE $YOUR_DESIRED_BATCHSIZE

The above settings can also be changed directly by modifying powerbev.yml. See config.py for more information.
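
The commands above pass overrides as space-separated `KEY VALUE` pairs, with dots selecting nested sections. The following is an illustrative sketch of how such pairs can be merged into a nested config. It is not the repository's implementation: `apply_overrides`, the default values, and the `/data/nuscenes` path are all placeholders chosen for the example.

```python
import ast


def apply_overrides(config, overrides):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict in place."""
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(overrides[0::2], overrides[1::2]):
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node[name]  # raises KeyError on an unknown section
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            value = ast.literal_eval(raw)  # "True" -> True, "[0]" -> [0], "4" -> 4
        except (ValueError, SyntaxError):
            value = raw  # keep paths and other plain strings as they are
        node[leaf] = value
    return config


# Placeholder defaults, for illustration only.
defaults = {
    "GPUS": [0, 1, 2, 3],
    "BATCHSIZE": 2,
    "LOG_DIR": "logs",
    "DATASET": {"DATAROOT": "nuscenes"},
}
cfg = apply_overrides(
    defaults,
    ["DATASET.DATAROOT", "/data/nuscenes", "GPUS", "[0]", "BATCHSIZE", "4"],
)
print(cfg["GPUS"], cfg["BATCHSIZE"], cfg["DATASET"]["DATAROOT"])
```

Rejecting unknown keys, as this sketch does, catches typos such as `BATCH_SIZE` early instead of silently ignoring them.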

🏄 Prediction

Evaluation

Download trained weights (long/short) to YOUR_PRETRAINED_WEIGHTS_PATH and run

python test.py --config powerbev/configs/powerbev.yml \
                PRETRAINED.LOAD_WEIGHTS True \
                PRETRAINED.PATH $YOUR_PRETRAINED_WEIGHTS_PATH

Visualisation

Download trained weights (long/short) to YOUR_PRETRAINED_WEIGHTS_PATH and run

python visualise.py --config powerbev/configs/powerbev.yml \
                PRETRAINED.LOAD_WEIGHTS True \
                PRETRAINED.PATH $YOUR_PRETRAINED_WEIGHTS_PATH \
                BATCHSIZE 1

This will render predictions from the network and save them to a visualization_outputs folder. Note: To visualize the ground truth, append VISUALIZATION.VIS_GT True to the command.

📜 License

PowerBEV is released under the MIT license. Please see the LICENSE file for more information.

🔗 Citation

@article{li2023powerbev,
  title     = {PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird's-Eye View},
  author    = {Li, Peizheng and Ding, Shuxiao and Chen, Xieyuanli and Hanselmann, Niklas and Cordts, Marius and Gall, Juergen},
  journal   = {arXiv preprint arXiv:2306.10761},
  year      = {2023}
}
@inproceedings{ijcai2023p120,
  title     = {PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird's-Eye View},
  author    = {Li, Peizheng and Ding, Shuxiao and Chen, Xieyuanli and Hanselmann, Niklas and Cordts, Marius and Gall, Juergen},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on
               Artificial Intelligence, {IJCAI-23}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Edith Elkind},
  pages     = {1080--1088},
  year      = {2023},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2023/120},
  url       = {https://doi.org/10.24963/ijcai.2023/120},
}

powerbev's Issues

predict_instance_segmentation

Hello @EdwardLeeLPZ ,

Thanks for your great work! I have one question about the predict_instance_segmentation function in instance.py. Could you please tell me why you use output['instance_flow'][b, 1:2].detach(), rather than the first predicted instance flow [b, 0:1] to generate the instance in get_instance_segmentation_and_centers?

how to train on own dataset?

Hi, thank you for sharing this great work! I want to train the model on my own dataset. Can you give me some advice? My dataset follows the KITTI format.

N_FUTURE_FRAMES: 0 is not working

Thank you for sharing your code first. I set N_FUTURE_FRAMES to 0 for testing the encoder of the model. I noticed that in this case, the loss calculation does not proceed correctly. Could you provide any comments or suggestions for me?

Question about warp features.

Thank you for your kind answer.

I have another question.

        cum_flow = flow[:, t - 1] @ cum_flow

In the function, when t is 1, flow[:, 0] represents the transformation matrix from timestep 0 to timestep 1, and cum_flow represents the transformation matrix from timestep 1 to timestep 2. I'm wondering if it should be cum_flow @ flow[:, t - 1] instead, assuming the input x has timesteps 0, 1, and 2.

def cumulative_warp_features(x, flow, mode='nearest', spatial_extent=None):
    """ Warps a sequence of feature maps by accumulating incremental 2d flow.

    x[:, -1] remains unchanged
    x[:, -2] is warped using flow[:, -2]
    x[:, -3] is warped using flow[:, -3] @ flow[:, -2]
    ...
    x[:, 0] is warped using flow[:, 0] @ ... @ flow[:, -3] @ flow[:, -2]

    Args:
        x: (b, t, c, h, w) sequence of feature maps
        flow: (b, t, 6) sequence of 6 DoF pose
            from t to t+1 (only uses the xy portion)

    """
    sequence_length = x.shape[1]
    if sequence_length == 1:
        return x

    flow = pose_vec2mat(flow)

    out = [x[:, -1]]
    cum_flow = flow[:, -2]
    for t in reversed(range(sequence_length - 1)):
        out.append(warp_features(x[:, t], mat2pose_vec(cum_flow), mode=mode, spatial_extent=spatial_extent))
        # @ is the equivalent of torch.bmm
        cum_flow = flow[:, t - 1] @ cum_flow

    return torch.stack(out[::-1], 1)
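
One way to examine the multiplication order is to trace the loop symbolically and compare the result with the docstring. The sketch below is not from the repository: `trace_cumulative_flow` is a hypothetical helper that replaces each pose matrix `flow[:, i]` with the label `Fi` and records which product each frame would be warped with. It shows only the order of the factors and makes no claim about which order is geometrically correct.

```python
def trace_cumulative_flow(sequence_length, left_multiply=True):
    """Return {frame index: symbolic transform} for the loop in cumulative_warp_features."""
    if sequence_length == 1:
        return {0: "identity"}
    flow = [f"F{i}" for i in range(sequence_length)]  # Fi: pose from frame i to i+1
    applied = {sequence_length - 1: "identity"}       # x[:, -1] is left unchanged
    cum_flow = [flow[-2]]
    for t in reversed(range(sequence_length - 1)):
        applied[t] = " @ ".join(cum_flow)
        # At t == 0 the index t - 1 wraps around to flow[-1]; that product is never used.
        if left_multiply:
            cum_flow = [flow[t - 1]] + cum_flow   # flow[:, t - 1] @ cum_flow
        else:
            cum_flow = cum_flow + [flow[t - 1]]   # cum_flow @ flow[:, t - 1]
    return applied


print(trace_cumulative_flow(3))
# {2: 'identity', 1: 'F1', 0: 'F0 @ F1'}
print(trace_cumulative_flow(3, left_multiply=False))
# {2: 'identity', 1: 'F1', 0: 'F1 @ F0'}
```

For three timesteps, the left-multiplied form gives `F0 @ F1` for `x[:, 0]`, which is the order written in the docstring (`flow[:, 0] @ ... @ flow[:, -2]`), while the proposed right-multiplied form gives `F1 @ F0`.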

How can I reproduce the reported results.

Hello!
I have another question. I trained a model from scratch with a batch size of 8 on a single A100 80GB GPU.
I conducted the training twice, but in both instances, the Volumetric Panoptic Quality (VPQ) was lower than the performance reported in the paper. Could you tell me how I can reproduce the results?

VPQ: 30.64(first), 30.29(second)

And how can I train the static model?

TAG: 'powerbev'

GPUS: [0]

BATCHSIZE: 8
PRECISION: 16

LIFT:
  # Long
  X_BOUND: [-50.0, 50.0, 0.5]  # Forward
  Y_BOUND: [-50.0, 50.0, 0.5]  # Sides

  # # Short
  # X_BOUND: [-15.0, 15.0, 0.15]  # Forward
  # Y_BOUND: [-15.0, 15.0, 0.15]  # Sides

MODEL:
  BN_MOMENTUM: 0.05

N_WORKERS: 16
VIS_INTERVAL: 100

Evaluation range

Thank you for your great work!

I'm a beginner in this field. When measuring evaluation ranges (i.e., short, long), shouldn't we measure both in one model and publish it in the paper? Did FIERY, for example, train two models with different resolutions for each range and measure performance?

RuntimeError: Tensors must be CUDA and dense

Dear authors,
Hi @EdwardLeeLPZ,
When I run train.py with two GPUs, I get the following error:
File "XXX/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1334, in _distributed_broadcast_coalesced
dist._broadcast_coalesced(self.process_group, tensors, buffer_size, authoritative_rank)
The error message is "Tensors must be CUDA and dense".
However, when I inspect the tensors argument, it is a list whose elements are all on cuda:0, so I do not know what is wrong. Thanks!
