super-resolution-on-object-detection's Introduction

YOLOv7 on Cityscapes with bbox cropping

Introduction

In this project, we aimed to enhance the quality of dashcam and monitoring camera videos without costly hardware upgrades. Using object detection and super-resolution techniques, we explored detecting cars and people in low-quality frames and improving their visual detail.

We trained a YOLOv7 model on the Cityscapes dataset (converted to COCO format using cityscapes-to-coco-conversion) to detect objects of interest. We then applied Latent Diffusion Models (LDM) for super-resolution to further enhance the cropped regions.

Environment

  • Python 3.10.11
  • PyTorch 1.13.1
  • Torchvision 0.14.1
  • CUDA 11.7

Setup

  1. Clone the project and its submodules

    $ git clone --recurse-submodules https://github.com/ghnmqdtg/yolov7-on-cityscapes-with-bbox-cropping.git
  2. Go into the project folder

    $ cd yolov7-on-cityscapes-with-bbox-cropping
  3. Run ./scripts/setup_env.sh to set up the environment. The script performs the steps listed below.

    $ sh scripts/setup_env.sh
    • Create a conda environment named yolov7_with_cropping with Python 3.10.11.

    • Install PyTorch with CUDA 11.7.

    • Install the dependencies.

  4. (Optional) Set the VS Code interpreter path to ~/.conda/envs/yolov7_with_cropping/bin/python.

  5. Edit line 5 of ./scripts/setup_dataset.sh to use your Cityscapes username and password.

  6. Run ./scripts/setup_dataset.sh to set up the dataset; this takes some time. The script performs the steps listed below.

    $ sh scripts/setup_dataset.sh
    • Download the dataset.

    • Use cityscapes-to-coco-conversion to generate bbox annotations from the Cityscapes segmentation annotations (Cityscapes has no bbox annotations).

    • Convert annotations from COCO format to YOLO format.

  7. Download the pretrained model and put it in the ./yolov7 folder.

    $ wget https://github.com/ghnmqdtg/yolov7-on-cityscapes-with-bbox-cropping/releases/download/v0.1/yolov7_cityscapes.pt \
        -O ./yolov7/yolov7_cityscapes.pt
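
The annotation work done in step 6 (deriving bounding boxes from segmentation annotations, then converting COCO boxes to YOLO labels) can be illustrated with a minimal sketch. This is not the code used by cityscapes-to-coco-conversion or by setup_dataset.sh; the function names and the example polygon are hypothetical. It assumes the usual conventions: a COCO box is [x_min, y_min, width, height] in pixels, and a YOLO label line is "class x_center y_center width height" with values normalized by the image size.

```python
def polygon_to_coco_bbox(polygon):
    """Return the tight box [x_min, y_min, width, height] around (x, y) points."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


def coco_to_yolo(bbox, img_w, img_h):
    """Convert a pixel COCO box to normalized (x_center, y_center, width, height)."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one line of a YOLO label file for a single object."""
    values = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


if __name__ == "__main__":
    # Hypothetical rectangular outline in a 2048x1024 Cityscapes frame.
    outline = [(100, 200), (300, 200), (300, 300), (100, 300)]
    bbox = polygon_to_coco_bbox(outline)
    print(bbox)
    print(yolo_label_line(0, bbox, 2048, 1024))
```

The class index 0 is only a placeholder; the real mapping from Cityscapes categories to class indices is defined by the project's dataset configuration.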

Test Interface

We provide a web interface to test the model. Follow these steps to start it.

  1. Put your street view video in ./www, and rename it to street_view.mp4.

  2. Start the backend server in one terminal

    $ cd yolov7
    $ python detect-web.py
  3. Start the front end in another terminal

    $ cd www
    $ sh launch.sh
  4. Go to http://localhost:30700/

Train and evaluate the YOLOv7 model

  1. Change to the yolov7 folder first

    $ cd yolov7
  2. Train the model on Cityscapes

    $ python -m torch.distributed.launch \
        --nproc_per_node 1 \
        --master_port 9527 \
        train.py \
        --workers 2 \
        --device 0 \
        --sync-bn \
        --epochs 100 \
        --batch-size 32 \
        --data data/cityscape.yaml \
        --img 640 640 \
        --cfg cfg/training/yolov7.yaml \
        --weights ./yolov7.pt \
        --hyp data/hyp.scratch.p5.yaml

    The output will be saved in runs/train.

    YOLOv7 Model Training Results
    Training & Evaluation Report: mAP@50: 0.61266, mAP@50:95: 0.38005
    Figures: Confusion Matrix, F1 curve, PR curve, P curve, R curve
  3. Evaluation

    $ python test.py \
        --data data/cityscape.yaml \
        --img 640 \
        --batch 32 \
        --conf 0.001 \
        --iou 0.65 \
        --device 0 \
        --weights yolov7_cityscapes.pt \
        --name cityscapes_yolo_cityscapes

    The output will be saved in runs/test.

Run inference

  • On single image

    Only cropped regions whose width or height is greater than 32px are saved, because super resolution produces obvious artifacts on regions that are too small. The output will be saved in runs/detect.

    $ python detect.py \
        --weights yolov7_cityscapes.pt \
        --conf 0.25 \
        --img-size 640 \
        --source customdata/images/test/bonn/bonn_000004_000019_leftImg8bit.png \
        --sr \
        --sr-step 100
    • --sr: Enable 4x super resolution.
    • --sr-step: Controls the strength of the super-resolution effect; larger values give better results.

    Performance of Cropping & Super Resolution
    (Figures: two examples, each showing a crop next to the same crop after 4x super resolution.)

    To test super resolution only, run utils/custom_features.py from the yolov7/ folder. If the width or height of the input is larger than 150px, the image is first resized to 150px while keeping the aspect ratio, and then super resolution is applied.

    $ python utils/custom_features.py \
        --input-img inference/images/cropped_car.jpg \
        --sr-step 100
    
  • On a video

    Not tested yet.
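
The two size rules described for single-image inference (skipping crops that are too small, and shrinking large inputs to 150px before super resolution) can be sketched as follows. This is an illustration rather than the project's implementation: the function names are hypothetical, the "width or height" condition is read literally as a logical or, and the rounding of the resized dimensions is an assumption.

```python
MIN_CROP_SIDE = 32       # crops must exceed this size (px) to be saved
MAX_SR_INPUT_SIDE = 150  # inputs larger than this (px) are shrunk before SR
SR_FACTOR = 4            # --sr enables 4x super resolution


def should_save_crop(width, height, min_side=MIN_CROP_SIDE):
    """Keep a crop if its width or its height is greater than min_side."""
    return width > min_side or height > min_side


def fit_within(width, height, max_side=MAX_SR_INPUT_SIDE):
    """Shrink (width, height) so the longer side is max_side, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave unchanged
    scale = max_side / longest
    # Rounding is an assumption; never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


def sr_output_size(width, height, factor=SR_FACTOR):
    """Size of the image after the optional shrink and the SR upscale."""
    w, h = fit_within(width, height)
    return w * factor, h * factor


if __name__ == "__main__":
    for size in [(20, 20), (40, 10), (100, 80), (300, 150)]:
        print(size, should_save_crop(*size), fit_within(*size), sr_output_size(*size))
```

Under these assumptions, a 300x150 crop is shrunk to 150x75 before super resolution, so the 4x result is 600x300 rather than 1200x600.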

LICENSE

The project is made available under the MIT license. See the LICENSE file for more information.
