
cv-core / mit-driverless-cv-traininginfra


PyTorch pipeline of the MIT Driverless Computer Vision paper (2020)

Home Page: https://mitdriverless.racing/

License: Apache License 2.0

Python 83.75% Jupyter Notebook 16.25%
yolov3 autonomous-vehicles keypoints pytorch mit-driverless resnet object-detection object-localization cnn

mit-driverless-cv-traininginfra's Introduction

Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions

This is the PyTorch-side code for the accurate, low-latency visual perception system introduced by Kieran Strobel, Sibo Zhu, Raphael Chang, and Skanda Koppula in "Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions". If you use the code, please cite the paper:

@misc{strobel2020accurate,
    title={Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions},
    author={Kieran Strobel and Sibo Zhu and Raphael Chang and Skanda Koppula},
    year={2020},
    eprint={2007.13971},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Abstract

Autonomous racing provides the opportunity to test safety-critical perception pipelines at their limit. This paper describes the practical challenges and solutions to applying state-of-the-art computer vision algorithms to build a low-latency, high-accuracy perception system for DUT18 Driverless (DUT18D), a 4WD electric race car with podium finishes at all Formula Driverless competitions for which it raced. The key components of DUT18D include YOLOv3-based object detection, pose estimation, and time synchronization on its dual stereovision/monovision camera setup. We highlight modifications required to adapt perception CNNs to racing domains, improvements to loss functions used for pose estimation, and methodologies for sub-microsecond camera synchronization, among other improvements. We perform an extensive experimental evaluation of the system, demonstrating its accuracy and low latency in real-world racing scenarios.

CVC-YOLOv3

CVC-YOLOv3 is the MIT Driverless Custom implementation of YOLOv3.

One of our main contributions to vanilla YOLOv3 is the custom data loader we implemented:

Each set of training images from a specific sensor/lens/perspective combination is uniformly rescaled such that its landmark size distribution matches that of the camera system on the vehicle. Each training image is then padded if it is too small, or split up into multiple images if it is too large.
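A minimal sketch of that rescale-then-pad-or-split step, assuming OpenCV/NumPy and a square network input; the function name, target size, and tiling scheme here are illustrative assumptions, not the repository's actual data loader:

    import cv2
    import numpy as np

    def rescale_then_pad_or_split(img, scale, target=416):
        """Rescale an image by a per-sensor factor so landmark sizes match
        the vehicle's camera system, then zero-pad it if it is smaller than
        the network input, or split it into tiles if it is larger."""
        img = cv2.resize(img, None, fx=scale, fy=scale)
        h, w = img.shape[:2]

        if h <= target and w <= target:
            # Pad the rescaled image up to the network input size.
            padded = np.zeros((target, target, 3), dtype=img.dtype)
            padded[:h, :w] = img
            return [padded]

        # Otherwise split the image into network-input-sized tiles.
        tiles = []
        for y in range(0, h, target):
            for x in range(0, w, target):
                tile = np.zeros((target, target, 3), dtype=img.dtype)
                crop = img[y:y + target, x:x + target]
                tile[:crop.shape[0], :crop.shape[1]] = crop
                tiles.append(tile)
        return tiles

In a real loader the bounding-box labels would of course have to be rescaled, shifted, and filtered along with the image tiles.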

Our final accuracy metrics for detecting traffic cones on the racing track:

mAP Recall Precision
89.35% 92.77% 86.94%

CVC-YOLOv3 Dataset with Formula Student Standard is open-sourced here

RektNet

RektNet is the MIT Driverless custom key point detection network.

RektNet takes in bounding boxes output by CVC-YOLOv3 and outputs seven key points on each traffic cone, which are used for depth estimation of the traffic cones on the 3D map.

[Figure: our final depth estimation error vs. distance graph (the monocular part).]
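As a rough illustration of how seven key points plus a calibrated camera yield a cone position (and hence its depth), the key points can be fed to a Perspective-n-Point solver together with the known cone geometry. The 3D model coordinates below are placeholder values for a generic cone, not the repository's actual measurements:

    import cv2
    import numpy as np

    # Hypothetical 3D keypoint locations (meters) in the cone's own frame:
    # apex, two pairs of side points, and the two base corners. Real values
    # must be measured from the physical cone used on the track.
    CONE_MODEL_POINTS = np.array([
        [0.00, 0.0, 0.325],                         # apex
        [-0.05, 0.0, 0.217], [0.05, 0.0, 0.217],    # upper pair
        [-0.08, 0.0, 0.108], [0.08, 0.0, 0.108],    # middle pair
        [-0.11, 0.0, 0.000], [0.11, 0.0, 0.000],    # base corners
    ], dtype=np.float64)

    def cone_position(keypoints_px, camera_matrix, dist_coeffs=None):
        """Estimate the cone translation relative to the camera from the
        seven detected keypoints (pixel coordinates) via solvePnP."""
        if dist_coeffs is None:
            dist_coeffs = np.zeros(5)
        ok, rvec, tvec = cv2.solvePnP(
            CONE_MODEL_POINTS,
            np.asarray(keypoints_px, dtype=np.float64),
            camera_matrix,
            dist_coeffs,
            flags=cv2.SOLVEPNP_ITERATIVE,
        )
        return tvec.ravel() if ok else None  # (x, y, z); z is the depth

The "object points" asked about in the Perspective-n-Point issues further down are exactly these measured 3D key point locations on the physical cone.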

RektNet Dataset with Formula Student Driverless Standard is open-sourced here

License

This repository is released under the Apache-2.0 license. See LICENSE for additional details.

mit-driverless-cv-traininginfra's People

Contributors

dependabot[bot], kieranstrobel, shikhar413, sibozhu


mit-driverless-cv-traininginfra's Issues

Unable to download RektNet dataset

Hello,

I would like to download the RektNet dataset, but I am unable to.
When I try to do wget https://storage.googleapis.com/mit-driverless-open-source/RektNet_Dataset.zip, as described in the Jupyter Notebook tutorial, I get the following output:

--2023-11-17 17:17:26--  https://storage.googleapis.com/mit-driverless-open-source/RektNet_Dataset.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.36.59, 142.250.179.155, 142.250.179.219, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.36.59|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-11-17 17:17:26 ERROR 403: Forbidden.

How can I access the dataset?

Kind regards,

Anton

Classes

Hello, first of all great job on this.
I wanted to ask what the YOLO classes are. I noticed that YOLO returns 80 class sigmoid values; however, I am not sure what they represent. Are those only an auxiliary task to help in training, or do they contain useful information such as colour or cone classification?
Thanks in advance.

Implementing Perspective-n-Point in monocular depth estimation

Hey, I've been looking around for how to implement PnP once I get the keypoints of the cones, but I am not able to figure out how to get the object points, given that I have the image points (essentially the keypoint coordinates) and the camera properties of the ZED 2i. Could someone help me with how I should proceed, or with how it was implemented as described in the documentation of this repository?
Any sort of help would be appreciated.
Thanks!

RektNet Dataset

Hello,
Could you tell me what program you used to label your key point dataset?

Perspective-n-point.

Hello.
Could you tell me how, having found the 7 points on the cone and knowing the calibrated camera parameters and the dimensions of the cone, you determine the position of the cone relative to the camera?
If you can, please point me to the implementation in the code, or tell me what helped you solve this problem.
Thanks.

Can't Access Pretrained Weights

I get:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

    UserProjectAccountProblem
    User project billing account not in good standing.
    The billing account for the owning project is disabled in state absent

Labelled cone dataset.

The all.csv file only has bounding boxes for the images but not the class attribute for blue and yellow cones. Can you provide a CSV file with class labels as well? It would save me a lot of time. Thanks in advance.

DataParallel model stuck on loss.backward()

I have tried running the yolo_tutorial.ipynb code on 4 GPUs. For some reason, it gets stuck in the run_epoch method. Further analysis showed that it gets stuck at losses[0].sum().backward(). I am using PyTorch 1.3.1.

Has anyone else tested this code on multiple GPUs and run into the same issue?

I can run the code on a single GPU without an issue.

Label format - height is scaled with img width?

I have downloaded your dataset and the labels in the YOLO format (1 label file per image).

I tried to visualize the bounding boxes by using the labels and rescaling the bboxes by the image width and height to get the absolute pixel values.

Is it possible that you mixed up the scaling when you created the label files?

With the following code, I get the correct label position:

        h, w = img.shape[0], img.shape[1]  # image height and width in pixels

        for line in f.readlines():
            l = line.split()
            x_min = int(float(l[1]) * w)
            y_min = int(float(l[2]) * h)
            box_height = int(float(l[3]) * w)  # height scaled by image *width*
            box_width = int(float(l[4]) * h)   # width scaled by image *height*

At least from my understanding, the box_height should be scaled with the image height to get the absolute pixel values:

        h, w = img.shape[0], img.shape[1]  # image height and width in pixels

        for line in f.readlines():
            l = line.split()
            x_min = int(float(l[1]) * w)
            y_min = int(float(l[2]) * h)
            box_height = int(float(l[3]) * h)  # height scaled by image height
            box_width = int(float(l[4]) * w)   # width scaled by image width

Dataset format

Hi!
Could you please provide the dataset format for the targets?
Thanks!

Dataset

RektNet for drone racing gates

Could this code be used to implement keypoint regression on gates for drone racing, given a proper training set, of course?
