streamyolo's Introduction

StreamYOLO

Real-time Object Detection for Streaming Perception

Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Jian Sun
Real-time Object Detection for Streaming Perception, CVPR 2022 (Oral)
[Paper]

Benchmark

Model        | size    | velocity | sAP 0.5:0.95 | sAP50 | sAP75 | weights | COCO pretrained weights
StreamYOLO-s | 600×960 | 1x       | 29.8         | 50.3  | 29.8  | github  | github
StreamYOLO-m | 600×960 | 1x       | 33.7         | 54.5  | 34.0  | github  | github
StreamYOLO-l | 600×960 | 1x       | 36.9         | 58.1  | 37.5  | github  | github
StreamYOLO-l | 600×960 | 2x       | 34.6         | 56.3  | 34.7  | github  | github
StreamYOLO-l | 600×960 | still    | 39.4         | 60.0  | 40.2  | github  | github

Quick Start

Dataset preparation

You can download the Argoverse-1.1 full dataset and its annotations from HERE and unzip them.

The folder structure should be organized as follows before our processing.

StreamYOLO
├── exps
├── tools
├── yolox
├── data
│   ├── Argoverse-1.1
│   │   ├── annotations
│   │       ├── tracking
│   │           ├── train
│   │           ├── val
│   │           ├── test
│   ├── Argoverse-HD
│   │   ├── annotations
│   │       ├── test-meta.json
│   │       ├── train.json
│   │       ├── val.json

The hash strings represent different video sequences in Argoverse, and ring_front_center is one of the sensors for each sequence. Argoverse-HD annotations correspond to images from this sensor. Information from other sensors (other ring cameras or LiDAR) is not used, but our framework can also be extended to these modalities or to a multi-modality setting.
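
As a quick sanity check after unpacking, the snippet below verifies that the folders and annotation files from the tree above are in place and peeks into one annotation file. This is a minimal sketch: it assumes the repository root is the working directory and that the Argoverse-HD annotations are COCO-style JSON with an "images" list.

# Minimal sanity check for the dataset layout described above (assumptions noted in the lead-in).
import json
from pathlib import Path

data_root = Path("data")

# Image folders expected by the tree in this README.
for split in ("train", "val", "test"):
    split_dir = data_root / "Argoverse-1.1" / "annotations" / "tracking" / split
    print(split_dir, "exists:", split_dir.is_dir())

# Annotation files expected by the tree in this README.
for name in ("train.json", "val.json", "test-meta.json"):
    ann_path = data_root / "Argoverse-HD" / "annotations" / name
    print(ann_path, "exists:", ann_path.is_file())

# Peek at the validation annotations (assumed COCO-style) to count the listed images.
val_json = data_root / "Argoverse-HD" / "annotations" / "val.json"
if val_json.is_file():
    with open(val_json) as f:
        ann = json.load(f)
    print("val images:", len(ann.get("images", [])))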

Installation
# basic python libraries
conda create --name streamyolo python=3.7

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

pip3 install yolox==0.3
git clone [email protected]:yancie-yjr/StreamYOLO.git

cd StreamYOLO/

# add StreamYOLO to PYTHONPATH and add this line to ~/.bashrc or ~/.zshrc (change the file accordingly)
ADDPATH=$(pwd)
echo export PYTHONPATH=$PYTHONPATH:$ADDPATH >> ~/.bashrc
source ~/.bashrc

# Install `mmcv` for the official sAP evaluation.
# Replace {cu_version} and {torch_version} with the versions you are currently using.
# You will get import or runtime errors if the versions are incorrect.
pip install mmcv-full==1.1.5 -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
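
After installation, a quick way to confirm the environment matches the versions above is to import the packages and print their versions (a minimal sketch; it only checks what the commands above install):

# Minimal environment check for the installation steps above.
import torch
import torchvision

print("torch:", torch.__version__)              # expected 1.7.1+cu110
print("torchvision:", torchvision.__version__)  # expected 0.8.2+cu110
print("CUDA available:", torch.cuda.is_available())

# mmcv-full is only needed for the official sAP evaluation; an import error here
# usually means the {cu_version}/{torch_version} wheel did not match your setup.
try:
    import mmcv
    print("mmcv:", mmcv.__version__)            # expected 1.1.5
except ImportError as err:
    print("mmcv not importable:", err)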

Reproduce our results on Argoverse-HD

Step 1. Prepare the Argoverse dataset

cd <StreamYOLO_HOME>
ln -s /path/to/your/Argoverse-1.1 ./data/Argoverse-1.1
ln -s /path/to/your/Argoverse-HD ./data/Argoverse-HD

Step 2. Reproduce our results on Argoverse:

python tools/train.py -f cfgs/m_s50_onex_dfp_tal_flip.py -d 8 -b 32 -c [/path/to/your/coco_pretrained_path] -o --fp16
  • -d: number of gpu devices.
  • -b: total batch size; the recommended value is num-gpu * 8 (8 images per GPU).
  • --fp16: mixed precision training.
  • -c: model checkpoint path.

Offline Evaluation

We support batch testing for fast evaluation:

python tools/eval.py -f  cfgs/l_s50_onex_dfp_tal_flip.py -c [/path/to/your/model_path] -b 64 -d 8 --conf 0.01 [--fp16] [--fuse]
  • --fuse: fuse conv and bn.
  • -d: number of GPUs used for evaluation. Default: all available GPUs are used.
  • -b: total batch size across all GPUs.
  • -c: model checkpoint path.
  • --conf: confidence threshold. Using 0.001 further improves performance by 0.2–0.3 sAP.

Online Evaluation

We modified the online evaluation from sAP.

Please use a single V100 GPU for this test: GPUs with lower computing power will produce non-real-time results!

cd sAP/streamyolo
bash streamyolo.sh
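
For intuition, streaming (sAP) evaluation pairs each frame's ground truth with the most recent detector output that has already finished by that frame's timestamp. The sketch below is a hypothetical illustration of that matching rule only, not the official sAP code; the function name and the 30 FPS / 50 ms numbers are illustrative assumptions.

# Hypothetical illustration of the streaming matching rule (not the official sAP code).
def match_streaming_outputs(frame_times, output_ready_times):
    """For each frame timestamp, return the index of the latest prediction
    that finished before that timestamp, or None if nothing is ready yet."""
    matches = []
    for t in frame_times:
        ready = [j for j, r in enumerate(output_ready_times) if r <= t]
        matches.append(max(ready) if ready else None)
    return matches

# Example: a 30 FPS stream with ~50 ms detector latency. Each frame's ground truth
# is evaluated against the output of a frame two steps earlier (the non-real-time
# situation Figure 2 of the paper describes); a real-time detector, with latency
# below 33 ms, lags by exactly one frame, which is why StreamYOLO predicts one frame ahead.
frame_times = [i / 30.0 for i in range(5)]
output_ready_times = [t + 0.05 for t in frame_times]
print(match_streaming_outputs(frame_times, output_ready_times))  # [None, None, 0, 1, 2]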

Citation

Please cite the following papers if this repo helps your research:

@inproceedings{streamyolo,
  title={Real-time Object Detection for Streaming Perception},
  author={Yang, Jinrong and Liu, Songtao and Li, Zeming and Li, Xiaoping and Sun, Jian},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5385--5395},
  year={2022}
}
@article{yang2022streamyolo,
  title={StreamYOLO: Real-time Object Detection for Streaming Perception},
  author={Yang, Jinrong and Liu, Songtao and Li, Zeming and Li, Xiaoping and Sun, Jian},
  journal={arXiv preprint arXiv:2207.10433},
  year={2022}
}

License

This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.


streamyolo's Issues

Figure 2 in the paper

Hi, I have read your paper.

I have a question about Figure 2.

On page 3 of the paper, you wrote about Figure 2 that "the output y1 of the frame F1 is matched and evaluated with the ground truth of F3 and the result of F2 is missed".

I understood that expression to mean that y1 is the output of the non-real-time detector for frame F1.

But before frame F3 is received, frame F2 is received first.

So I can't understand that point, and I also want to ask when the output of frame F0 comes out.

A small bug in README about Dataset Prep.

For Developers

Hi!
When reproducing your results on Argoverse-HD, I found that the directory structure you provided in the Quick Start - Dataset preparation section doesn't match the original directory structure of the Argoverse-HD dataset, nor the one your code requires.
The directory structure in the Quick Start - Dataset preparation section:

StreamYOLO
├── exps
├── tools
├── yolox
├── data
│   ├── Argoverse-1.1
│   │   ├── annotations
│   │       ├── tracking
│   │           ├── train
│   │           ├── val
│   │           ├── test
│   ├── Argoverse-HD
│   │   ├── annotations
│   │       ├── test-meta.json
│   │       ├── train.json
│   │       ├── val.json

should be edited as:

StreamYOLO
├── exps
├── tools
├── yolox
├── data
│   ├── Argoverse-1.1
│   │   ├── tracking
│   │       ├── train
│   │       ├── val
│   │       ├── test
│   ├── Argoverse-HD
│   │   ├── annotations
│   │       ├── test-meta.json
│   │       ├── train.json
│   │       ├── val.json

which matches the directory structure of the Argoverse-HD dataset:
[Screenshot: directory listing of the Argoverse-HD dataset]

For Stargazers

BTW, if anyone manually modifies the directory structure to fit the one provided in the README, an AssertionError will occur (some parts of the file paths have been edited):

AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "%HOME%\anaconda3\envs\streamyolo\lib\site-packages\torch\utils\data\_utils\worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "%HOME%\anaconda3\envs\streamyolo\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "%HOME%\anaconda3\envs\streamyolo\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "%HOME%\anaconda3\envs\streamyolo\lib\site-packages\yolox\data\datasets\datasets_wrapper.py", line 110, in wrapper
    ret_val = getitem_fn(self, index)
  File "%WORKSPACE%\StreamYOLO\exps\data\tal_flip_mosaicdetection.py", line 255, in __getitem__
    img, support_img, label, support_label, img_info, id_ = self._dataset.pull_item(idx)
  File "%WORKSPACE%\StreamYOLO\exps\dataset\tal_flip_one_future_argoversedataset.py", line 227, in pull_item
    img = self.load_resized_img(index)
  File "%WORKSPACE%\StreamYOLO\exps\dataset\tal_flip_one_future_argoversedataset.py", line 180, in load_resized_img
    img = self.load_image(index)
  File "%WORKSPACE%\StreamYOLO\exps\dataset\tal_flip_one_future_argoversedataset.py", line 196, in load_image
    assert img is not None
AssertionError

If anyone gets a similar error message, the content under For Developers above may be helpful.

How can I save the detection results?

Hi, thank you for sharing your nice code.

I trained the model on the Argoverse dataset following your README.

I want to run a demo and save the detection results (as images or a video). How can I do that?

Thank you.

ModuleNotFoundError: No module named 'exps'

Hi everyone, I got this issue:
...File "cfgs/m_s50_onex_dfp_tal_flip.py", line 189, in get_trainer
from exps.train_utils.double_trainer import Trainer
ModuleNotFoundError: No module named 'exps'

When I ran the code locally I got this error, but after running "echo export PYTHONPATH=$PYTHONPATH:$ADDPATH >>" it worked. But as you can guess, my local GPU wasn't enough for training, so I set everything up on Colab, and this time the "echo export..." line didn't save me.
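
If it helps anyone hitting the same thing on Colab: `export PYTHONPATH=...` in a shell cell does not change the already-running notebook process, so adding the repository root to `sys.path` from Python usually works. A minimal sketch, assuming the repo was cloned to /content/StreamYOLO:

# Colab workaround sketch: make the repo importable from the running notebook process.
# The clone path below is an assumption; adjust it to wherever StreamYOLO was cloned.
import os
import sys

REPO_ROOT = "/content/StreamYOLO"
sys.path.insert(0, REPO_ROOT)
os.environ["PYTHONPATH"] = REPO_ROOT + os.pathsep + os.environ.get("PYTHONPATH", "")

# The import from the traceback should now resolve.
from exps.train_utils.double_trainer import Trainer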

Multi-camera setup

Hey @yancie-yjr, this project looks great! I have a question about using multiple cameras with one model.

Imagine a situation where you have N cameras on a car and a device that can run only one StreamYOLO model for inference. Can we get away with detecting on those N cameras by keeping N feature buffers and swapping them in for each camera?
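
One way to picture the idea in the question, as a purely hypothetical sketch: keep one buffer of previous-frame features per camera and swap the matching buffer in before each inference. The names feature_buffers, extract_features, and detect_with_buffer below are placeholders, not StreamYOLO APIs.

# Hypothetical per-camera buffering sketch for the question above; the function
# arguments are placeholders, not part of the StreamYOLO codebase.
feature_buffers = {}  # camera_id -> features from that camera's previous frame

def run_camera(camera_id, frame, extract_features, detect_with_buffer):
    prev_feat = feature_buffers.get(camera_id)   # None on a camera's first frame
    cur_feat = extract_features(frame)
    detections = detect_with_buffer(cur_feat, prev_feat)
    feature_buffers[camera_id] = cur_feat        # becomes "previous" next time around
    return detections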

KeyError: 'model'

When I tried to train, I got:
File "/home/pe/projects/czy/StreamYOLO-main/exps/train_utils/double_trainer.py", line 314, in resume_train
ckpt = torch.load(ckpt_file, map_location=self.device)["model"]
│ │ │ │ └ 'cuda:0'
│ │ │ └ <exps.train_utils.double_trainer.Trainer object at 0x7fe2e69a3650>
│ │ └ '/home/pe/projects/czy/StreamYOLO-main/tools/yolox_s.pth'
│ └ <function load at 0x7fe2e8dc8710>
└ <module 'torch' from '/home/pe/anaconda3/envs/streamyolo/lib/python3.7/site-packages/torch/__init__.py'>
KeyError: 'model'
Could you tell me how to solve this?
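
A quick way to see why the "model" lookup fails is to inspect what the checkpoint file actually contains before indexing it. Below is a minimal diagnostic sketch, not a fix from the authors; the path is the one from the traceback, and the fallback assumes some checkpoints store a bare state_dict rather than {"model": ...}.

# Minimal diagnostic sketch: inspect the checkpoint before indexing ["model"].
import torch

ckpt_path = "/home/pe/projects/czy/StreamYOLO-main/tools/yolox_s.pth"  # path from the traceback
ckpt = torch.load(ckpt_path, map_location="cpu")

if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))
    state_dict = ckpt.get("model", ckpt)  # fall back: treat the file as a bare state_dict
    print("number of entries:", len(state_dict))
else:
    print("checkpoint is not a dict; it is a", type(ckpt))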
