This repo contains the code and steps to perform object detection on the Stanford Drone Dataset with the DarkNet YOLOv4 framework. Below you can see a visualization of detected objects with their confidence scores.
Inference on the deathCircle scene
of the Stanford Drone Dataset (trained on very little data, hence the low accuracy). Let's get some cool detections like this!
Wait! Wait! There is a lot of boring work to do before diving into the deep learning. Since this is a video dataset, the major task is to preprocess and clean the data. There is no point in training on the whole dataset: first, you need to choose some of the videos according to your detection task. But how would you choose them? 80-90% of the dataset is occupied by pedestrians and bikers, so if you want to detect only cars, randomly selecting videos will obviously not work. I've made it easy for you (if you want to skip this part, download the prepared dataset from my Drive). That dataset is balanced over 3 classes: pedestrians, bikers and cars.
Set up a Python virtual environment with `requirements.txt`.
- Download the dataset from here
- Reorganize the directories into this format, from whichever format you got. This will make the rest of the process easier.
- Put `get_stats.py` into the parent directory, i.e. the dataset directory. It will summarize each directory, video by video, as a bar chart, like this:
This will also help you to overcome the class imbalances in your final dataset.
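If you want to compute these class statistics yourself, the counting step can be sketched as below. This is a hypothetical helper, not the repo's actual `get_stats.py`, and it assumes the stock SDD annotation format, where each line ends with the class label in double quotes:

```python
from collections import Counter

def count_classes(annotation_lines):
    """Tally object classes from SDD-style annotation lines.

    Assumes each line ends with a quoted class label, e.g.:
        0 1325 462 1441 536 0 0 0 0 "Pedestrian"
    Counts annotation lines (boxes), not unique tracks.
    """
    counts = Counter()
    for line in annotation_lines:
        parts = line.split('"')
        if len(parts) >= 2:  # label sits between the first pair of quotes
            counts[parts[1]] += 1
    return counts
```

Feed it a file object, e.g. `count_classes(open("annotations.txt"))`, then plot the resulting counts as a bar chart.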
- Select the videos according to your detection task and put into another directory.
- Then rename the videos and labels into this format.
This is going to help you in your remaining tasks.
- `cd data_preparation`, put `get_train.py`, `get_test.py` and `get_valid.py` there, and run them with `python3 get_train.py` (and similarly for the remaining two).
- This will split each video into frames and store every 30th, 89th and 91st frame for train, test & validation respectively (taking 1 frame per second), and also generate the corresponding `.csv` files. Make sure your system has `ffmpeg` installed; otherwise run `sudo apt update` and then `sudo apt install ffmpeg`.
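The 1-frame-per-second sampling can be done with ffmpeg's `fps` filter. The sketch below only builds the command; it is a guess at what the `get_*.py` scripts do internally, and the real scripts may select specific frame indices instead:

```python
def ffmpeg_extract_cmd(video_path, out_pattern, fps=1):
    """Build an ffmpeg command that samples `fps` frames per second.

    out_pattern is a printf-style image path, e.g. "frames/%05d.jpg".
    This is a sketch, not the repo's actual extraction code.
    """
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",  # keep one frame per second by default
        out_pattern,
    ]
```

You would run it with `subprocess.run(ffmpeg_extract_cmd("video.mp4", "frames/%05d.jpg"), check=True)`.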
- It will also convert the labels to YOLO format. YOLO needs labels in the form `<class_id x_center_norm y_center_norm width_norm height_norm>`, where every value is normalized by the image dimensions: `x_center_norm = x_center / image_width`, `y_center_norm = y_center / image_height`, and likewise for the width and height. For example, for `image.jpg`, the matching `image.txt` will contain:
1 0.8778 0.0143 0.04311 0.0287
2 0.8040 0.0236 0.06959 0.0435
2 0.0083 0.1345 0.01664 0.0379
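The conversion above can be sketched as follows. This is a hypothetical helper, not the repo's script, and it assumes the input box is given as pixel-space corner coordinates:

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (corners) into one YOLO label line:
    <class_id x_center_norm y_center_norm width_norm height_norm>,
    with everything normalized by the image dimensions."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.4f} {y_center:.4f} {width:.4f} {height:.4f}"
```

For a 200x100 image, a box from (0, 0) to (100, 50) for class 1 becomes `1 0.2500 0.2500 0.5000 0.5000`.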
Now you are good to go.
- Clone the repository:
`git clone https://github.com/AlexeyAB/darknet.git`
For training on Stanford Drone Data, follow step 1 from AlexeyAB's repo. If you are training on a local GPU, make sure compatible versions of CUDA and cuDNN are installed on your system.
- Create file `obj.names` in the directory `darknet/data/`, with object names, each on a new line:
pedestrian
biker
skater
car
cart
bus
- Create file `obj.data` in the directory `darknet/data/`, containing (where classes = number of objects):
classes = 6
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
- Put the image files (`.jpg`) of your objects in the directory `darknet/data/obj/`
- Create file `train.txt` in the directory `darknet/data/`, with the filenames of your images, each on a new line, for example:
data/obj/img1.jpg
data/obj/img2.jpg
data/obj/img3.jpg
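Building this list by hand gets tedious, so it can be generated. The helper below is hypothetical (not part of the repo), and the `data/obj` prefix assumes your images sit in `darknet/data/obj/` as described above:

```python
def image_list_lines(filenames, prefix="data/obj"):
    """Return the lines for train.txt: one Darknet-relative path per
    .jpg file, sorted for reproducibility. `prefix` must match where
    the images actually live under the darknet/ directory."""
    return [f"{prefix}/{name}" for name in sorted(filenames)
            if name.endswith(".jpg")]
```

Typical usage: list the directory with `os.listdir("darknet/data/obj")`, then write `"\n".join(lines) + "\n"` to `darknet/data/train.txt` (and similarly for `test.txt`).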
- Download pre-trained weights for the convolutional layers and put them into the directory `build\darknet\x64`:
  - for `yolov4.cfg`, `yolov4-custom.cfg` (162 MB): yolov4.conv.137 (Google Drive mirror yolov4.conv.137)
- Start training from the command line:
`./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137`
- For an image:
`./darknet detector test data/obj.data cfg/yolov4-custom.cfg yolov4-custom_best.weights test_image.jpg -thresh 0.3`
Check `darknet/predictions.jpg` for the annotated result.
- For a video:
`./darknet detector demo data/obj.data cfg/yolov4-custom.cfg yolov4-custom_best.weights -dont_show test_vid.mp4 -thresh 0.5 -i 0 -out_filename output.mp4`