jiayuzou2020 / hft


[ICRA 2023] Official Pytorch implementation for HFT

Home Page: https://arxiv.org/abs/2204.05068

License: MIT License

Python 99.92% Shell 0.08%
bev-perception perspective-transform semantic-segmentation

hft's Introduction

HFT: Lifting Perspective Representations via Hybrid Feature Transformation

This repository contains the official Pytorch implementation for the paper HFT: Lifting Perspective Representations via Hybrid Feature Transformation (2023 IEEE International Conference on Robotics and Automation, ICRA).

Introduction

Autonomous driving requires accurate and detailed Bird's Eye View (BEV) semantic segmentation for decision making, which is one of the most challenging tasks in high-level scene perception. Feature transformation from the frontal view to BEV is the pivotal technology for BEV semantic segmentation. Existing works can be roughly classified into two categories, i.e., Camera model-Based Feature Transformation (CBFT) and Camera model-Free Feature Transformation (CFFT). In this paper, we empirically analyze the vital differences between CBFT and CFFT. The former transforms features based on the flat-world assumption, which may cause distortion of regions lying above the ground plane. The latter is limited in segmentation performance by the absence of geometric priors and its time-consuming computation. To reap the benefits and avoid the drawbacks of both, we propose a novel framework with a Hybrid Feature Transformation (HFT) module. Specifically, we decouple the feature maps produced by HFT for estimating the layout of outdoor scenes in BEV. Furthermore, we design a mutual learning scheme that augments the hybrid transformation by applying feature mimicking. Notably, extensive experiments demonstrate that, with negligible extra overhead, HFT achieves a relative improvement of 13.3% on the Argoverse dataset and 16.8% on the KITTI 3D Object dataset over the best-performing existing method.

Install

To use our code, please install the following dependencies:

  • torch==1.9.1
  • torchvision==0.10.1
  • mmcv-full==1.3.15
  • CUDA 9.2+

For additional requirements, see requirements.txt. You can refer to the guidelines to set up the environment correctly.
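
As a quick sanity check after installation, the pinned versions can be verified from Python (a minimal sketch, not part of the repository):

import torch
import torchvision
import mmcv

# Versions expected by this repository (see the list above).
print('torch:', torch.__version__)              # expect 1.9.1
print('torchvision:', torchvision.__version__)  # expect 0.10.1
print('mmcv-full:', mmcv.__version__)           # expect 1.3.15
print('CUDA available:', torch.cuda.is_available())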

Data Preparation

We conduct experiments on nuScenes, Argoverse, KITTI Raw, KITTI Odometry, and KITTI 3D Object. Please download the datasets and place them under /data/nuscenes/ and so on. Note that calib.json contains the intrinsic and extrinsic matrices for every image. Please follow here to generate the BEV annotations (ann_bev_dir) for the KITTI datasets, and refer to the script make_labels to get the BEV annotations for nuScenes and Argoverse, respectively. The datasets' structures look like:

Dataset Structure

data
├── nuscenes
|   ├── img_dir
|   ├── ann_bev_dir
|   ├── calib.json
├── argoversev1.0
|   ├── img_dir
|   ├── ann_bev_dir
|   ├── calib.json
├── kitti_processed
|   ├── kitti_raw
|   |   ├── img_dir
|   |   ├── ann_bev_dir
|   |   ├── calib.json
|   ├── kitti_odometry
|   |   ├── img_dir
|   |   ├── ann_bev_dir
|   |   ├── calib.json
|   ├── kitti_object
|   |   ├── img_dir
|   |   ├── ann_bev_dir
|   |   ├── calib.json
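
To catch path problems early, the layout above can be checked with a short script (a sketch, not part of HFT; adjust the roots if your data lives elsewhere):

import os

# Expected roots taken from the directory tree above.
DATASETS = {
    'nuscenes': '/data/nuscenes',
    'argoversev1.0': '/data/argoversev1.0',
    'kitti_raw': '/data/kitti_processed/kitti_raw',
    'kitti_odometry': '/data/kitti_processed/kitti_odometry',
    'kitti_object': '/data/kitti_processed/kitti_object',
}

for name, root in DATASETS.items():
    for entry in ('img_dir', 'ann_bev_dir', 'calib.json'):
        path = os.path.join(root, entry)
        status = 'ok' if os.path.exists(path) else 'MISSING'
        print(f'{status:8s}{name}: {path}')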

Prepare calib.json

"calib.json" contains the camera parameters of each image. Readers can generate the "calib.json" file by the instruction of nuScenes, Argoverse, Kitti Raw, Kitti Odometry, and Kitti 3D Object. We also upload calib.json for each dataset to google drive and Baidu Net Disk.

Training

Take Argoverse as an example. To train a semantic segmentation model under a specific configuration, run:

cd HFT
python -m torch.distributed.launch --nproc_per_node ${NUM_GPU} --master_port ${PORT} tools/train.py ${CONFIG} --work-dir ${WORK_DIR} --launcher pytorch

For instance, to train Argoverse under this config, run:

cd HFT
python -m torch.distributed.launch --nproc_per_node 4 --master_port 14300 tools/train.py ./configs/pyva/pyva_swin_argoverse.py --work-dir ./models_dir/pyva_swin_argoverse --launcher pytorch
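
Before launching a long run, the resolved configuration can be inspected from Python (a sketch using the Config API of mmcv-full 1.x; field names may differ between configs):

from mmcv import Config

cfg = Config.fromfile('./configs/pyva/pyva_swin_argoverse.py')
print(cfg.pretty_text)             # full config, with values inherited from _base_ files
print(cfg.model.get('test_cfg'))   # e.g. output_type and positive_thred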

Evaluation

To evaluate the performance, run the following command:

cd HFT
python -m torch.distributed.launch --nproc_per_node ${NUM_GPU} --master_port ${PORT} tools/test.py ${CONFIG} ${MODEL_PATH} --out ${SAVE_RESULT_PATH} --eval ${METRIC} --launcher pytorch

For example, we evaluate the mIoU on Argoverse under this config by running:

cd HFT
python -m torch.distributed.launch --nproc_per_node 4 --master_port 14300 tools/test.py ./configs/pyva/pyva_swin_argoverse.py ./models_dir/pyva_swin_argoverse/iter_20000.pth  --out ./results/pyva/pyva_20k.pkl --eval mIoU --launcher pytorch
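
The file written by --out can be inspected afterwards from Python (a sketch; the structure of each entry depends on the test_cfg, so only loading is shown):

import mmcv

# Path taken from the evaluation command above.
results = mmcv.load('./results/pyva/pyva_20k.pkl')
print(type(results), len(results))
print(type(results[0]))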

Visualization

To get the visualization results of the model, first change the output_type from 'iou' to 'seg' in the test configuration. Take this config as an example.

model = dict(
    decode_head=dict(
        type='PyramidHeadArgoverse',
        num_classes=8,
        align_corners=True),
    # change the output_type from 'iou' to 'seg'
    test_cfg=dict(mode='whole',output_type='seg',positive_thred=0.5)
)
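
Equivalently, the override can be applied programmatically instead of editing the config file by hand (a sketch with the mmcv-full 1.x Config API; the output file name is just a hypothetical example):

from mmcv import Config

cfg = Config.fromfile('./configs/pyva/pyva_swin_argoverse.py')
cfg.model.test_cfg.output_type = 'seg'                  # was 'iou'
cfg.dump('./configs/pyva/pyva_swin_argoverse_seg.py')   # hypothetical file name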

And then, we can generate the visualization results by running the following command:

python -m torch.distributed.launch --nproc_per_node 4 --master_port 14300 tools/test.py ./configs/pyva/pyva_swin_argoverse.py ./models_dir/pyva_swin_argoverse/iter_20000.pth --format-only --eval-options "imgfile_prefix=./models_dir/pyva_swin_argoverse" --launcher pytorch

Acknowledgement

Our work is partially based on mmseg. Thanks for their contributions to the research community.

Citation

If you find our work useful in your research, please cite our work:

@article{zou2022hft,
  title={HFT: Lifting Perspective Representations via Hybrid Feature Transformation},
  author={Zou, Jiayu and Xiao, Junrui and Zhu, Zheng and Huang, Junjie and Huang, Guan and Du, Dalong and Wang, Xingang},
  journal={arXiv preprint arXiv:2204.05068},
  year={2022}
}

hft's People

Contributors

jiayuzou2020


hft's Issues

About image resize with intrinsics in config.

Thank you for your contribution to the community through your work!
But I have some doubts about the code in config:
HFT/configs/pyva/pyva_swin_kd_simple_fpn_force_nuscenes.py

dict(type='LoadAnnotations', reduce_zero_label=False, with_calib=True, imdecode_backend='pyramid'),
dict(type='Resize', img_scale=(1024, 1024), resize_gt=False, keep_ratio=False),

This configuration resizes the nuScenes images to 1024×1024, but in the load function:
if self.with_calib:
    token = osp.basename(filename).split('.')[0]
    intrinsics = torch.tensor(self.nuscenes[token])
    # scale calibration matrix to account for image downsampling
    intrinsics[0] *= 800 / results['img_shape'][1]
    intrinsics[1] *= 600 / results['img_shape'][0]

Here, the calibration (intrinsics) matrix is multiplied by a scaling factor that assumes the image is resized to a fixed 800×600, not 1024×1024.
This hard-coded value causes some misalignment; did I miss something?
Thank you for answering my question.
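
For context, pinhole intrinsics scale linearly with image resizing: fx and cx scale with the width ratio, fy and cy with the height ratio. A self-contained sketch (not HFT code):

import numpy as np

def scale_intrinsics(K, orig_hw, new_hw):
    """Rescale a 3x3 pinhole intrinsics matrix after resizing the image.

    Row 0 (fx, skew, cx) scales with the width ratio; row 1 (fy, cy) scales
    with the height ratio; the last row is unchanged.
    """
    orig_h, orig_w = orig_hw
    new_h, new_w = new_hw
    K = np.asarray(K, dtype=np.float64).copy()
    K[0] *= new_w / orig_w
    K[1] *= new_h / orig_h
    return K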

Extrinsics

Hi, I noticed that the LSS-based module described in the paper has no extrinsics adjustment step. Does this cause some loss in the final prediction accuracy?

About data preparation

Thanks for your great work. I have some doubts about data preparation when reproducing your results. The data structure puzzles me because it is not the default structure of KITTI, nuScenes, and Argoverse, so I wonder how to convert the raw data into your data structure. Could you please explain in detail in the README how to format the data? E.g., which camera to use and how to generate the calib.json file?

No such file or directory: '/data/argoversev1.0/calib.json'

Hi, many thanks to the author for the project. I cannot find the calib.json file when I run the test code. How can I generate this JSON file?

self.argoverse = json.load(open('/data/argoversev1.0/calib.json','r'))

FileNotFoundError: [Errno 2] No such file or directory: '/data/argoversev1.0/calib.json'
