shizukachan / darknet-nnpack

Fork of darknet-nnpack

License: Other

Makefile 0.37% Python 0.91% C 90.52% Shell 0.21% Cuda 7.67% C++ 0.33%

darknet-nnpack's Issues

yolov3 support

YOLOv3 is currently supported on the yolov3 branch.

Tested on AMD64; it still needs regression testing on Jetson (aarch64 NEON), Pi 3 (NEON), and Pi Zero (scalar).
I won't push the changes until they are validated.

qmkl error during make: *** No targets specified and no makefile found. Stop.

@ Install qmkl

sudo apt-get install flex
git clone https://github.com/Terminus-IMRC/qpu-assembler2
cd qpu-assembler2
cmake .
make

I got this error

pi@raspberrypi:~/qmkl $ make
make: *** No targets specified and no makefile found.  Stop.

OpenCV on GPU or CPU?

When running NNPack on the Pi GPU, for video, does OpenCV also run on the GPU or is it that NNPack uses the GPU and OCV runs on the CPU?

I imagine the best speed would be NNPack for Neon on CPU and then OpenCV on the GPU, to free up the resource contention during video mode?

Thank you!

Problem with classifier

Hello.

The YOLOv3-tiny detector works just fine, but I have a problem with the classifier.

On the AlexeyAB branch I can run the classifier (cfg/tiny.cfg + tiny.weights) without any problems.
On this branch I cannot: I get either a segmentation fault with the "classifier test" command, or an endless wait for the result after the network loads with the "classify" command.

I am testing it with this command:
./darknet classify cfg/tiny.cfg tiny.weights data/dog,jpg

After the weights load, I am getting
Loading weights from tiny.weights... (Version 1) Done!
And nothing happens after that.

I am working on an i.MX6 board running Ubuntu 16.

Split out OpenCV and Darknet

Is it possible to run Darknet on the CPU while running OpenCV on the GPU/QPU? Separating them out could yield big improvements, especially for video.

Segmentation fault error

Hi. On a Raspberry Pi I can run detection at 1.8 fps with yolov2-tiny-voc, but when I try weights trained on custom objects, I get a segmentation fault.

poor detection on custom dataset

I've trained my model on a different machine and used the weights file on my Raspberry Pi 4.
I'm using OPENCV=1 and OPENMP=1; I could not get NNPACK working properly.
But I'm getting poor detections when I try my model.
When I run the same model on Colab I get around a 90% confidence score, but the same image on my Raspberry Pi gives only a 25% confidence score.
@shizukachan Any ideas what the issue could be?

How to ban the windows of output

The job you have done is great.
But I have a question. Since I couldn't use detector.py directly (undefined symbols: nnp_convolutional_inference), I don't know of any other way to use YOLO without the command line.

Could you tell me what I could do?
I tried to compile darknet.c again, but there were many errors (it is hard for me to pick out the relevant messages).

Implement XNOR?

The latest AlexeyAB darknet supports XNOR. Do you plan to update this NNPACK repo to allow XNOR? Perhaps that would speed it up even more on the Raspberry Pi?

See AlexeyAB/darknet#2382

Raspberry Pi Zero (W) - explicit instructions/ modifications required

Hi.
I'm trying to get this working on a Pi Zero. I see you refer to modifications that need to be made: "...It's also recommended to examine and edit https://github.com/digitalbrain79/NNPACK-darknet/blob/master/src/init.c#L215 to match your CPU architecture if you're on ARM...", but I'm new to this and I'm having trouble. Could you please spell out the required modifications explicitly?
I appreciate any efforts to help me.

Create a .txt file with the predictions and a time stamp

Hello everyone,
I am a newbie to Raspberry Pi and Python projects.

I was messing around with YOLO and I have a question:

Is there a way to use the original -ext_output from YOLO on Raspberry to get the .txt file with the predictions?

Thank you in advance.
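
One possible approach, sketched in Python under stated assumptions: the detections are already available as a list of `(name, confidence, (x, y, w, h))` tuples, which is the shape returned by the `detect()` helper in the Python wrapper quoted elsewhere on this page. The function names `format_prediction` and `append_predictions` and the tab-separated layout are hypothetical and not part of darknet or of `-ext_output`.

```python
import datetime


def format_prediction(timestamp, name, confidence, box):
    """Return one tab-separated line for a single detection."""
    x, y, w, h = box
    return "{}\t{}\t{:.4f}\t{:.1f}\t{:.1f}\t{:.1f}\t{:.1f}".format(
        timestamp, name, confidence, x, y, w, h)


def append_predictions(path, detections, now=None):
    """Append every detection to `path`, all stamped with the same time."""
    if now is None:
        now = datetime.datetime.now()
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    # Mode "a" keeps earlier predictions, so the file grows frame by frame.
    with open(path, "a") as handle:
        for name, confidence, box in detections:
            handle.write(format_prediction(stamp, name, confidence, box) + "\n")
    return stamp
```

Calling `append_predictions("predictions.txt", r)` once per frame, where `r` is the list returned by `detect()`, would produce one line per detected object.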

When I enable the NNPACK option, the network's output is incorrect.

When I enable the NNPACK option, the network's output is incorrect. When I disable NNPACK, the result is right.

YOLOv3-Tiny VOC Weights

Hello,

I am looking for the YOLOv3-Tiny VOC weights. I could not find them anywhere, but I found a reference to the model in this repository. Does anyone have the YOLOv3-Tiny VOC weights?

Not getting same precision as Original darknet

I have trained my network with the AlexeyAB repo, but for some reason I'm not getting the same precision at inference time when running with darknet-nnpack. Is it because of the NNPACK optimization, or something else? Should I train on the pjreddie darknet instead?

Odroid support?

I was wondering if this would work on an ODROID XU4 platform. Has anyone tried yet?
This board uses NEONv2 and VFPv4.

@shizukachan thanks for your efforts!

Not getting same results

A Raspberry Pi 4 takes 2.9 seconds to predict with the yolov3-tiny model.
Any idea why it's taking this long with OPENCV, NNPACK, NNPACK_FAST, ARM_NEON and OPENMP all set to 1?

Segmentation fault

Hi, thank you so much for the great work! It's very helpful.

I am getting a "Segmentation fault" when running ./darknet detector test cfg/voc.data ~/Downloads/yolov3-tiny.cfg ~/Downloads/yolov3-tiny.weights data/person.jpg and ./darknet detector test cfg/voc.data ~/Downloads/yolov2-tiny.cfg ~/Downloads/yolov2-tiny.weights data/person.jpg. Have you encountered it before by any chance? Do you know how I should go about fixing it?

Thank you in advance!

Question: Will this work with custom trained tiny-yolov3 on Raspberry Pi 4b?

I'm using a Raspberry Pi 4b with Raspbian Buster OS and I am training tiny-yolov3 on a custom dataset following these instructions from AlexeyAB's darknet fork. Will this darknet-nnpack version work with my custom trained tiny-yolov3 on the RPi?

can I take advantage of gpu with this implementation?

Hi everyone,
Thanks for this implementation. I'm thinking of developing a darknet variant that combines NNPACK and OpenCL to use all the available resources for 'full acceleration' and achieve higher fps on OpenCL-capable ARM SoCs.
I would appreciate your suggestions and comments.
Thanks!!

Problems in replicating the performance

Hello!
I'm trying to replicate your experiment: Raspberry Pi 3, Darknet19, NNPACK=1,ARM_NEON=1,NNPACK_FAST=1, all other switches are off (GPU, CUDNN, OPENCV, OPENMP, DEBUG, QPU_GEMM)

I should get: 1.3 seconds (first frame), 0.66 seconds (subsequent frames)
But I get: 6 seconds for the first frame and 3 seconds for the subsequent ones, that is, about 4.5 times slower.

Do you think I'm missing some step in configuration? Did I set the wrong switches? Do I need to modify something in particular in NNPACK's init.c?

Thank you!
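
To compare numbers like these on equal terms, it helps to time the first frame separately from the rest. The following is a minimal Python sketch, not the benchmark used by the repository: `run_frame` is a hypothetical stand-in for whatever performs one inference (for example a wrapper around the darknet Python bindings), and `time_frames` and `summarize` are illustrative names.

```python
import time


def time_frames(run_frame, frames):
    """Run `run_frame` on each frame and return per-frame wall times in seconds."""
    timings = []
    for frame in frames:
        start = time.perf_counter()
        run_frame(frame)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (first-frame time, mean time of the remaining frames or None)."""
    first = timings[0]
    rest = timings[1:]
    mean_rest = sum(rest) / len(rest) if rest else None
    return first, mean_rest
```

Reporting the pair returned by `summarize` keeps any one-time start-up cost out of the steady-state figure.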

NNPACK error (50) when trying to run from python

I'm trying to run darknet from Python on a Raspberry Pi 3 using the Raspberry Pi camera and OpenCV. After loading the weights, it returns the following error:

pi@raspberrypi:~/darknet-nnpack $ python dare.py
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
13 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
14 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125
15 detection
mask_scale: Using default '1.000000'
Loading weights from tiny-yolo-voc.weights... (Version 1) Done!
NNPACK error! (50)
[the line above is printed 23 times in total]

But when I run darknet from the terminal everything works fine, including the real-time stream from the webcam.

This is the code for the webcam:

from ctypes import *  # Structure, CDLL, POINTER, c_float, ... are used below
import math
import random
import cv2

def sample(probs):
	s = sum(probs)
	probs = [a/s for a in probs]
	r = random.uniform(0, 1)
	for i in range(len(probs)):
		r = r - probs[i]
		if r <= 0:
			return i
	return len(probs)-1

def c_array(ctype, values):
	return (ctype * len(values))(*values)

class BOX(Structure):
	_fields_ = [("x", c_float),
				("y", c_float),
				("w", c_float),
				("h", c_float)]

class IMAGE(Structure):
	_fields_ = [("w", c_int),
				("h", c_int),
				("c", c_int),
				("data", POINTER(c_float))]

class METADATA(Structure):
	_fields_ = [("classes", c_int),
				("names", POINTER(c_char_p))]

	

#lib = CDLL("/home/pjreddie/documents/darknet/libdarknet.so", RTLD_GLOBAL)
lib = CDLL("./libdarknet.so", RTLD_GLOBAL)
lib.network_width.argtypes = [c_void_p]
lib.network_width.restype = c_int
lib.network_height.argtypes = [c_void_p]
lib.network_height.restype = c_int

predict = lib.network_predict
predict.argtypes = [c_void_p, POINTER(c_float)]
predict.restype = POINTER(c_float)

set_gpu = lib.cuda_set_device
set_gpu.argtypes = [c_int]

make_image = lib.make_image
make_image.argtypes = [c_int, c_int, c_int]
make_image.restype = IMAGE

make_boxes = lib.make_boxes
make_boxes.argtypes = [c_void_p]
make_boxes.restype = POINTER(BOX)

free_ptrs = lib.free_ptrs
free_ptrs.argtypes = [POINTER(c_void_p), c_int]

num_boxes = lib.num_boxes
num_boxes.argtypes = [c_void_p]
num_boxes.restype = c_int

make_probs = lib.make_probs
make_probs.argtypes = [c_void_p]
make_probs.restype = POINTER(POINTER(c_float))

detect = lib.network_predict
detect.argtypes = [c_void_p, IMAGE, c_float, c_float, c_float, POINTER(BOX), POINTER(POINTER(c_float))]

reset_rnn = lib.reset_rnn
reset_rnn.argtypes = [c_void_p]

load_net = lib.load_network
load_net.argtypes = [c_char_p, c_char_p, c_int]
load_net.restype = c_void_p

free_image = lib.free_image
free_image.argtypes = [IMAGE]

letterbox_image = lib.letterbox_image
letterbox_image.argtypes = [IMAGE, c_int, c_int]
letterbox_image.restype = IMAGE

load_meta = lib.get_metadata
lib.get_metadata.argtypes = [c_char_p]
lib.get_metadata.restype = METADATA

load_image = lib.load_image_color
load_image.argtypes = [c_char_p, c_int, c_int]
load_image.restype = IMAGE

rgbgr_image = lib.rgbgr_image
rgbgr_image.argtypes = [IMAGE]

predict_image = lib.network_predict_image
predict_image.argtypes = [c_void_p, IMAGE]
predict_image.restype = POINTER(c_float)

network_detect = lib.network_detect
network_detect.argtypes = [c_void_p, IMAGE, c_float, c_float, c_float, POINTER(BOX), POINTER(POINTER(c_float))]

set_batch_network = lib.set_batch_network
set_batch_network.argtypes = [c_void_p, c_int]

srand = lib.srand
srand.argtypes = [c_int]

nnp_initialize = lib.nnp_initialize

def classify(net, meta, im):
	out = predict_image(net, im)
	res = []
	for i in range(meta.classes):
		res.append((meta.names[i], out[i]))
	res = sorted(res, key=lambda x: -x[1])
	return res

def detect(net, meta, im, thresh=.5, hier_thresh=.5, nms=.45):
	# im is an IMAGE struct (e.g. from array_to_image), not a file path
	#im = load_image(image, 0, 0)
	boxes = make_boxes(net)
	probs = make_probs(net)
	num =   num_boxes(net)
	network_detect(net, im, thresh, hier_thresh, nms, boxes, probs)
	res = []
	for j in range(num):
		for i in range(meta.classes):
			if probs[j][i] > 0:
				res.append((meta.names[i], probs[j][i], (boxes[j].x, boxes[j].y, boxes[j].w, boxes[j].h)))
	res = sorted(res, key=lambda x: -x[1])
	#free_image(im)
	#free_ptrs(cast(probs, POINTER(c_void_p)), num)
	return res

def array_to_image(arr):
    arr = arr.transpose(2,0,1)
    c = arr.shape[0]
    h = arr.shape[1]
    w = arr.shape[2]
    arr = (arr/255.0).flatten()
    data = c_array(c_float, arr)
    im = IMAGE(w,h,c,data)
    return im
	
if __name__ == "__main__":
	net = load_net("cfg/tiny-yolo-voc.cfg", "tiny-yolo-voc.weights", 0)
	meta = load_meta("cfg/voc.data")
	cap = cv2.VideoCapture(0)
	while True:
		ret, frame = cap.read()
		arr = frame
		im = array_to_image(arr)
		rgbgr_image(im)
		r = detect(net, meta, im)
		print(r)
		if cv2.waitKey(1) & 0xFF == ord('q'):
			break

Makefile:114: recipe for target 'obj/gemm.o' failed for S5P6818 (Cortex-A53)

Hardware: NanoPi3 Fire
CPU: Samsung S5P6818 Octa-Core Cortex-A53
OpenCV: 3.4.2
gcc version 4.9.2 (Debian 4.9.2-10+deb8u1)
Linux NanoPi3 3.4.39-s5p6818 #4 SMP PREEMPT Tue Sep 12 11:40:09 HKT 2017 armv7l GNU/Linux

Makefile settings
GPU=0
CUDNN=0
OPENCV=1 (same error if OPENCV=0)
NNPACK=1
NNPACK_FAST=1
ARM_NEON=1
OPENMP=0
DEBUG=0
QPU_GEMM=0

Output:

gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -O0 -g -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/gemm.c -o obj/gemm.o
*** Error in `gcc': double free or corruption (top): 0x000b6198 ***
Aborted
Makefile:114: recipe for target 'obj/gemm.o' failed
make: *** [obj/gemm.o] Error 134

Is there a setting I can use to get this to compile for S5P6818 (Cortex-A53)?

NNpack questions

First of all, great work. This application is faster on a Pi 3. I would like to use it but have a few questions. Please note that I have never used C++, but I'm not afraid to get my hands dirty.

1.) In the documentation you say that subsequent frames take less time than the first frame. Could you please elaborate on how that can be achieved? It seems like the weights are loaded from the file every time.

2.) I would like to make an addon so that I can use it in my Node application. Could you please advise on how to proceed? Not on making the addon itself, but on ways to ensure that the application is not reloaded every time, and general advice to make it run smoothly. I am thinking of just making a simple main.cpp where I can call and use the relevant functions.

3.) There is a lot of output, but most of it is not needed for my task. Which file should I look at if I just want to return a JSON/object/whatever with all the information (name, percentage, x, y, width ...) and remove all the other information?

Thank you and great work.
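
For question 3, one option that avoids touching the C sources is to post-process the detections in a wrapper. The sketch below is written in Python and rests on an assumption: detections arrive as `(name, confidence, (x, y, w, h))` tuples, the shape returned by the `detect()` helper in the Python wrapper quoted elsewhere on this page. `detections_to_json` is a hypothetical name and the key names are illustrative.

```python
import json


def detections_to_json(detections):
    """Serialize (name, confidence, (x, y, w, h)) tuples to a JSON array string."""
    records = []
    for name, confidence, (x, y, w, h) in detections:
        # ctypes returns class names as bytes on Python 3; decode them for JSON.
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        records.append({
            "name": name,
            "confidence": round(float(confidence), 4),
            "x": float(x),
            "y": float(y),
            "width": float(w),
            "height": float(h),
        })
    return json.dumps(records, sort_keys=True)
```

A Node application could then read this string from the wrapper's standard output and parse it with `JSON.parse`.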

Running in multi camera problem

Hi,
I'm running this on a NanoPi M4V2, which supports multiple cameras. I connected a USB webcam and it got index 10; when I try to run darknet in webcam mode I get this error:

Loading weights from yolov3-tiny.weights... (Version 2) Done!
HIGHGUI ERROR: V4L: index 10 is not correct!
GStreamer Plugin: Embedded video playback halted; module v4l2src0 reported: Internal data stream error.
OpenCV Error: Unspecified error (GStreamer: unable to start pipeline
) in icvStartPipeline, file /home/pi/opencv-2.4/opencv-2.4/modules/highgui/src/cap_gstreamer.cpp, line 386
terminate called after throwing an instance of 'cv::Exception'
  what():  /home/pi/opencv-2.4/opencv-2.4/modules/highgui/src/cap_gstreamer.cpp:386: error: (-2) GStreamer: unable to start pipeline
 in function icvStartPipeline

Aborted (core dumped)

How can I solve this?
Thanks
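
As a first diagnostic step, it can help to confirm which video device nodes actually exist before passing an index to darknet. This is a minimal Python sketch, assuming a Linux system where V4L2 cameras appear as `/dev/videoN`; `video_index` and `list_video_indexes` are hypothetical helper names, and the sketch does not by itself explain why OpenCV rejects index 10.

```python
import glob
import re


def video_index(device_path):
    """Return the numeric index of a /dev/videoN path, or None if it does not match."""
    match = re.match(r"^/dev/video(\d+)$", device_path)
    return int(match.group(1)) if match else None


def list_video_indexes(pattern="/dev/video*"):
    """Return the sorted indexes of the video device nodes currently present."""
    indexes = [video_index(path) for path in glob.glob(pattern)]
    return sorted(i for i in indexes if i is not None)
```

Printing `list_video_indexes()` shows whether index 10 is present alongside the board's built-in camera nodes.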

undefined reference to aligned_alloc when DARKNET_NNPACK=ON

Hello, I am trying to run darknet-nnpack on Android (aarch64-v8a). However, I get the following build error:

darknet-nnpack/src/convolutional_layer.c:(.text.forward_convolutional_layer_nnpack+0xfc): undefined reference to `aligned_alloc'
clang38: error: linker command failed with exit code 1 (use -v to see invocation)

I am using aarch64-linux-android-clang and aarch64-linux-android-clang++
I added android-ndk directory that contains cstdlib and added header "include in convolution_layer.c, then it also shows error.

How did you compile the repo with DARKNET_NNPACK=ON?
Thank you.

cd to home folder before every git command?

Hello, I am a beginner.
Should I cd to the home folder/directory before every git clone?

NNpack build issue on raspberry pi 4

I'm following your guide on a Raspberry Pi 4, but NNPACK won't build.

Can you please help me with that issue?

subprocess issue when running the test yolo code

Hello there,

I recently started learning about YOLO and how to implement it, and I hope to contribute to improving it.

I followed the darknet-nnpack procedures and everything was fine until I tried to test the Python script rpi_record.py, when I ran into the following error:

I'm running Python 2.7.16.

pi@raspberrypi:~/darknet-nnpack $ sudo python rpi_record.py
Traceback (most recent call last):
  File "rpi_record.py", line 26, in <module>
    stdin = PIPE, stdout = PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Could you please help me fix this issue? BTW, an image is captured (in the directory only) every time I run the code. Also, I'm using the Raspberry Pi Camera v2.
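
In Python 2.7, an `OSError: [Errno 2]` raised from `_execute_child` generally means that the program passed to `Popen` (or its working directory) could not be found. The sketch below is a pre-flight check in Python; `find_executable` is a hypothetical helper, and because the command used on line 26 of `rpi_record.py` is not shown here, the `./darknet` path in the usage note is only an assumption.

```python
import os


def find_executable(command, search_path=None):
    """Return the path of `command` if it is an executable file, else None."""
    if os.sep in command:
        # A path such as ./darknet is resolved relative to the current directory.
        candidates = [command]
    else:
        if search_path is None:
            search_path = os.environ.get("PATH", "")
        candidates = [os.path.join(directory, command)
                      for directory in search_path.split(os.pathsep) if directory]
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, checking `find_executable("./darknet")` before calling `Popen` would show whether the script is being run from a directory that actually contains the compiled binary.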

Build fails with ARM_NEON=1 on RPi with Aarch64 Bullseye

I am building darknet-nnpack on an RPi with OpenCV 3.4. The build was successful with ARM_NEON=0; however, the inference time for the yolov3-tiny network was the same as with the original darknet, around 8 seconds.

However, the build fails with ARM_NEON=1 with the error below:


gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv`  -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -Ofast -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/gemm.c -o obj/gemm.o
gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv`  -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -Ofast -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/utils.c -o obj/utils.o
gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv`  -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -Ofast -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/cuda.c -o obj/cuda.o
gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv`  -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -Ofast -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/deconvolutional_layer.c -o obj/deconvolutional_layer.o
gcc: error: unrecognized command-line option ‘-mfpu=neon-vfpv4’
make: *** [Makefile:114: obj/gemm.o] Error 1
make: *** Waiting for unfinished jobs....
gcc: error: unrecognized command-line option ‘-mfpu=neon-vfpv4’
make: *** [Makefile:114: obj/utils.o] Error 1
gcc: error: unrecognized command-line option ‘-mfpu=neon-vfpv4’
gcc: error: unrecognized command-line option ‘-mfpu=neon-vfpv4’
make: *** [Makefile:114: obj/deconvolutional_layer.o] Error 1
make: *** [Makefile:114: obj/cuda.o] Error 1

A quick Google search suggests that NEON might need a different set of flags here.

Do I need to update the Makefile or downgrade gcc for the library to be built correctly?
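
GCC's AArch64 back end does not accept `-mfpu`; Advanced SIMD (NEON) is part of the baseline there, which is consistent with the "unrecognized command-line option" error above. The sketch below selects the flag by architecture. It is written in Python for illustration only: `neon_cflags` is a hypothetical helper, and wiring its result into the Makefile is an assumption rather than the repository's actual fix.

```python
import platform


def neon_cflags(machine=None):
    """Return extra compiler flags for NEON on the given machine type."""
    if machine is None:
        machine = platform.machine()
    machine = machine.lower()
    if machine in ("aarch64", "arm64"):
        # AArch64 GCC has no -mfpu option; NEON is enabled by default.
        return []
    if machine.startswith(("armv7", "armv8")):
        # 32-bit ARM with NEON: the flag seen in the Makefile output above.
        return ["-mfpu=neon-vfpv4"]
    # Other architectures (e.g. x86_64, armv6) get no NEON flag in this sketch.
    return []
```

On a 64-bit Raspberry Pi OS (Bullseye) install, `neon_cflags()` would return an empty list, so the equivalent Makefile change would be to drop `-mfpu=neon-vfpv4` for that target.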
