shizukachan / darknet-nnpack
Fork of darknet-nnpack
License: Other
YOLOv3 is currently supported on the yolov3 branch.
Tested on AMD64; needs to be regression tested on Jetson (aarch64-neon), Pi3 (neon), and Pi0 (scalar)
I won't push changes until validated.
@ Install qmkl
sudo apt-get install flex
git clone https://github.com/Terminus-IMRC/qpu-assembler2
cd qpu-assembler2
cmake .
make
I got this error
pi@raspberrypi:~/qmkl $ make
make: *** No targets specified and no makefile found. Stop.
When running NNPack on the Pi GPU, for video, does OpenCV also run on the GPU or is it that NNPack uses the GPU and OCV runs on the CPU?
I imagine the best speed would be NNPack for Neon on CPU and then OpenCV on the GPU, to free up the resource contention during video mode?
Thank you!
Hello.
The YOLOv3-tiny detector works just fine. However, I have a problem with the classifier.
On the AlexeyAB branch I can run the classifier (cfg/tiny.cfg + tiny.weights) without any problems.
On this branch I cannot: I get either a segmentation fault with the command "classifier test", or an endless wait for the result after the network loads with the command "classify".
I am testing it on command:
./darknet classify cfg/tiny.cfg tiny.weights data/dog,jpg
After the weights load, I am getting
Loading weights from tiny.weights... (Version 1) Done!
And nothing happens after that.
I am working on IMX6 board, on ubuntu 16.
Is it possible to run Darknet on CPU, while running OpenCV on the GPU\QPU? Separating them out could yield big improvements, especially for video.
Hi. When running detection on a Raspberry Pi, I get 1.8 fps with yolov2-tiny-voc, but when I try weights trained on custom objects I get a segmentation fault.
I've trained my model on a different machine and used the weights file on my Raspberry Pi 4.
I'm using OPENCV=1 and OPENMP=1; I could not get NNPACK working properly.
But I'm getting poor detections when I try my model.
When I run the same model on Colab I get a confidence score of around 90%, but the same image on my Raspberry Pi gives only 25%.
@shizukachan Any ideas what could be the issue ?
The work you have done is great.
But I have a question. Since I couldn't use detector.py directly (undefined symbols: nnp_convolutional_inference), I don't know of any way to use YOLO other than from the command line.
Could you tell me what I could do?
I tried compiling darknet.c again, but there were many errors (it is hard for me to pick out the relevant messages).
The latest AlexeyAB supports XNOR. Do you plan to update this NNPACK repo to allow XNOR? Perhaps that would speed it up even more on the Raspberry Pi?
Hi.
I'm trying to get this working on a Pi Zero. I see you make reference to modifications which need to be made "...It's also recommended to examine and edit https://github.com/digitalbrain79/NNPACK-darknet/blob/master/src/init.c#L215 to match your CPU architecture if you're on ARM..." but I'm new to this and I'm having trouble. Could you please spell out the required modifications explicitly?
Appreciate any efforts to help me.
Hello everyone,
I am a newbie on Raspberry and python projects.
I was messing around with YOLO and I have a question:
Is there a way to use the original -ext_output from YOLO on Raspberry to get the .txt file with the predictions?
Thank you in advance.
Hello,
I am looking for the YOLOv3-Tiny VOC weights. I could not find them anywhere, but I saw a reference to the model in this repository. Does anyone have the YOLOv3-Tiny VOC weights?
Hi
I have trained my network in the AlexeyAB repo, but for some reason I'm not getting the same precision at inference time when running with darknet-nnpack. Is it because of the NNPACK optimization, or something else? Should I train on pjreddie's darknet instead?
I was wondering if this would work on an ODROID XU4 platform. Has anyone tried it yet?
This board uses NEONv2 and VFPv4.
@shizukachan thanks for your efforts!
A Raspberry Pi 4 takes 2.9 seconds to predict with the yolov3-tiny model.
Any idea why it's taking this long with OPENCV, NNPACK, NNPACK_FAST, ARM_NEON and OPENMP all set to 1?
Hi, thank you so much for the great work! It's very helpful.
I am getting a "Segmentation fault" when running
./darknet detector test cfg/voc.data ~/Downloads/yolov3-tiny.cfg ~/Downloads/yolov3-tiny.weights data/person.jpg
and
./darknet detector test cfg/voc.data ~/Downloads/yolov2-tiny.cfg ~/Downloads/yolov2-tiny.weights data/person.jpg
Have you encountered it before by any chance? Do you know how I should go about fixing it?
Thank you in advance!
I'm using a Raspberry Pi 4b with Raspbian Buster OS and I am training tiny-yolov3 on a custom dataset following these instructions from AlexeyAB's darknet fork. Will this darknet-nnpack version work with my custom trained tiny-yolov3 on the RPi?
Hi everyone,
Thanks for this implementation. I'm thinking of developing a darknet variant along with nnpack and opencl to utilize all the resources for 'full acceleration' to achieve higher fps for opencl supported arm socs.
I appreciate your suggestions and comments.
Thanks!!
Hello!
I'm trying to replicate your experiment: Raspberry Pi 3, Darknet19, NNPACK=1,ARM_NEON=1,NNPACK_FAST=1, all other switches are off (GPU, CUDNN, OPENCV, OPENMP, DEBUG, QPU_GEMM)
I should get: 1.3 (first frame), 0.66 (subsequent frames)
But I get: 6 seconds for the first frame and 3 for the subsequent ones, that is 4.5 times slower.
Do you think I'm missing some step in configuration? Did I set the wrong switches? Do I need to modify something in particular in NNPACK's init.c?
Thank you!
pi@raspberrypi:~/darknet-nnpack $ python dare.py
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
13 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024
14 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125
15 detection
mask_scale: Using default '1.000000'
Loading weights from tiny-yolo-voc.weights... (Version 1) Done!
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
NNPACK error! (50)
from ctypes import *
import math
import random
import cv2

def sample(probs):
    s = sum(probs)
    probs = [a/s for a in probs]
    r = random.uniform(0, 1)
    for i in range(len(probs)):
        r = r - probs[i]
        if r <= 0:
            return i
    return len(probs)-1

def c_array(ctype, values):
    return (ctype * len(values))(*values)

class BOX(Structure):
    _fields_ = [("x", c_float),
                ("y", c_float),
                ("w", c_float),
                ("h", c_float)]

class IMAGE(Structure):
    _fields_ = [("w", c_int),
                ("h", c_int),
                ("c", c_int),
                ("data", POINTER(c_float))]

class METADATA(Structure):
    _fields_ = [("classes", c_int),
                ("names", POINTER(c_char_p))]

#lib = CDLL("/home/pjreddie/documents/darknet/libdarknet.so", RTLD_GLOBAL)
lib = CDLL("./libdarknet.so", RTLD_GLOBAL)
lib.network_width.argtypes = [c_void_p]
lib.network_width.restype = c_int
lib.network_height.argtypes = [c_void_p]
lib.network_height.restype = c_int

predict = lib.network_predict
predict.argtypes = [c_void_p, POINTER(c_float)]
predict.restype = POINTER(c_float)

set_gpu = lib.cuda_set_device
set_gpu.argtypes = [c_int]

make_image = lib.make_image
make_image.argtypes = [c_int, c_int, c_int]
make_image.restype = IMAGE

make_boxes = lib.make_boxes
make_boxes.argtypes = [c_void_p]
make_boxes.restype = POINTER(BOX)

free_ptrs = lib.free_ptrs
free_ptrs.argtypes = [POINTER(c_void_p), c_int]

num_boxes = lib.num_boxes
num_boxes.argtypes = [c_void_p]
num_boxes.restype = c_int

make_probs = lib.make_probs
make_probs.argtypes = [c_void_p]
make_probs.restype = POINTER(POINTER(c_float))

# The original also bound `detect = lib.network_predict` with detection
# argtypes here. That clobbered predict.argtypes and was shadowed by the
# detect() function below, so it is omitted.

reset_rnn = lib.reset_rnn
reset_rnn.argtypes = [c_void_p]

load_net = lib.load_network
load_net.argtypes = [c_char_p, c_char_p, c_int]
load_net.restype = c_void_p

free_image = lib.free_image
free_image.argtypes = [IMAGE]

letterbox_image = lib.letterbox_image
letterbox_image.argtypes = [IMAGE, c_int, c_int]
letterbox_image.restype = IMAGE

load_meta = lib.get_metadata
lib.get_metadata.argtypes = [c_char_p]
lib.get_metadata.restype = METADATA

load_image = lib.load_image_color
load_image.argtypes = [c_char_p, c_int, c_int]
load_image.restype = IMAGE

rgbgr_image = lib.rgbgr_image
rgbgr_image.argtypes = [IMAGE]

predict_image = lib.network_predict_image
predict_image.argtypes = [c_void_p, IMAGE]
predict_image.restype = POINTER(c_float)

network_detect = lib.network_detect
network_detect.argtypes = [c_void_p, IMAGE, c_float, c_float, c_float, POINTER(BOX), POINTER(POINTER(c_float))]

set_batch_network = lib.set_batch_network
set_batch_network.argtypes = [c_void_p, c_int]

srand = lib.srand
srand.argtypes = [c_int]

# Bound here, but never called anywhere in this script.
nnp_initialize = lib.nnp_initialize

def classify(net, meta, im):
    out = predict_image(net, im)
    res = []
    for i in range(meta.classes):
        res.append((meta.names[i], out[i]))
    res = sorted(res, key=lambda x: -x[1])
    return res

def detect(net, meta, im, thresh=.5, hier_thresh=.5, nms=.45):
    # `im` is an IMAGE struct (see array_to_image), not a file path.
    #im = load_image(image, 0, 0)
    boxes = make_boxes(net)
    probs = make_probs(net)
    num = num_boxes(net)
    network_detect(net, im, thresh, hier_thresh, nms, boxes, probs)
    res = []
    for j in range(num):
        for i in range(meta.classes):
            if probs[j][i] > 0:
                res.append((meta.names[i], probs[j][i], (boxes[j].x, boxes[j].y, boxes[j].w, boxes[j].h)))
    res = sorted(res, key=lambda x: -x[1])
    #free_image(im)
    #free_ptrs(cast(probs, POINTER(c_void_p)), num)
    return res

def array_to_image(arr):
    # Convert an HxWxC OpenCV frame to darknet's CxHxW float layout in [0, 1].
    arr = arr.transpose(2,0,1)
    c = arr.shape[0]
    h = arr.shape[1]
    w = arr.shape[2]
    arr = (arr/255.0).flatten()
    data = c_array(c_float, arr)
    im = IMAGE(w,h,c,data)
    return im

if __name__ == "__main__":
    # Byte strings so the c_char_p arguments work on Python 2 and 3.
    net = load_net(b"cfg/tiny-yolo-voc.cfg", b"tiny-yolo-voc.weights", 0)
    meta = load_meta(b"cfg/voc.data")
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        arr = frame
        im = array_to_image(arr)
        rgbgr_image(im)
        r = detect(net, meta, im)
        print(r)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
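One possible reading of the log above: as far as I recall from NNPACK's nnpack.h, status 50 is nnp_status_uninitialized, which would fit a script that binds nnp_initialize but never calls it. The sketch below maps a few status codes to names. The numeric values are quoted from memory and should be checked against the nnpack.h you built with, and describe_nnp_status is a hypothetical helper, not part of darknet or NNPACK.

```python
# Status values as recalled from NNPACK's nnpack.h.
# Assumption: verify them against your local header before relying on them.
NNP_STATUS_NAMES = {
    0: "nnp_status_success",
    50: "nnp_status_uninitialized",
    51: "nnp_status_unsupported_hardware",
    52: "nnp_status_out_of_memory",
}


def describe_nnp_status(code):
    """Return a readable name for an NNPACK status code (hypothetical helper)."""
    return NNP_STATUS_NAMES.get(code, "unknown status %d" % code)


if __name__ == "__main__":
    print(describe_nnp_status(50))
```

If the name matches, calling nnp_initialize() once after loading libdarknet.so and before load_net may be worth trying.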
Hardware: NanoPi3 Fire
CPU: Samsung S5P6818 Octa-Core Cortex-A53
OpenCV: 3.4.2
gcc version 4.9.2 (Debian 4.9.2-10+deb8u1)
Linux NanoPi3 3.4.39-s5p6818 #4 SMP PREEMPT Tue Sep 12 11:40:09 HKT 2017 armv7l GNU/Linux
Makefile settings
GPU=0
CUDNN=0
OPENCV=1 (same error if OPENCV=0)
NNPACK=1
NNPACK_FAST=1
ARM_NEON=1
OPENMP=0
DEBUG=0
QPU_GEMM=0
Output:
gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -O0 -g -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/gemm.c -o obj/gemm.o
*** Error in `gcc': double free or corruption (top): 0x000b6198 ***
Aborted
Makefile:114: recipe for target 'obj/gemm.o' failed
make: *** [obj/gemm.o] Error 134
Is there a setting I can use to get this to compile for S5P6818 (Cortex-A53)?
First of all, great work. This application is faster on a Pi 3. I would like to use it but have a few questions. Please note that I have never used C++, but I am not afraid to get my hands dirty.
1.) In the documentation you say that subsequent frames take less time than the first frame. Could you please elaborate on how that can be achieved? It seems like the weights are loaded from the file every time.
2.) I would like to make an addon so that I can use it in my Node application. Could you please advise on how to proceed? I don't mean how to make the addon itself, but how to avoid loading the application on every call, plus general advice on making it run smoothly. I am thinking of a simple main.cpp from which I can call the relevant functions.
3.) There is a lot of output, but most of it will not be needed for my task. Which file should I look at if I just want to return a JSON object (or similar) with all the information (name, percentage, x, y, width ...) and remove everything else?
Thank you and great work.
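For question 3, a minimal sketch in Python rather than C++, assuming each detection is already available as a tuple (name, probability, (x, y, w, h)), which is the shape darknet's Python wrapper commonly returns. The function name and the JSON field names are hypothetical choices for illustration, not something this repository defines.

```python
import json


def detections_to_json(detections):
    """Convert (name, prob, (x, y, w, h)) tuples to a JSON string."""
    records = []
    for name, prob, (x, y, w, h) in detections:
        # ctypes hands class names back as bytes under Python 3.
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        records.append({
            "name": name,
            "confidence": round(float(prob), 4),
            "x": float(x),
            "y": float(y),
            "width": float(w),
            "height": float(h),
        })
    return json.dumps(records)


if __name__ == "__main__":
    sample = [(b"dog", 0.91, (120.0, 200.0, 80.0, 60.0))]
    print(detections_to_json(sample))
```

Keeping the network loaded in one long-running process and emitting one JSON line per frame is one way to avoid reloading the weights on every call.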
hi,
I'm running this on a NanoPi M4V2, which supports multiple cameras. I connected a USB webcam and it got index 10. When I try to run darknet in webcam mode I get this error:
Loading weights from yolov3-tiny.weights... (Version 2) Done!
HIGHGUI ERROR: V4L: index 10 is not correct!
GStreamer Plugin: Embedded video playback halted; module v4l2src0 reported: Internal data stream error.
OpenCV Error: Unspecified error (GStreamer: unable to start pipeline
) in icvStartPipeline, file /home/pi/opencv-2.4/opencv-2.4/modules/highgui/src/cap_gstreamer.cpp, line 386
terminate called after throwing an instance of 'cv::Exception'
what(): /home/pi/opencv-2.4/opencv-2.4/modules/highgui/src/cap_gstreamer.cpp:386: error: (-2) GStreamer: unable to start pipeline
in function icvStartPipeline
Aborted (core dumped)
how to solve this?
thanks
Hello, I am trying to run darknet-nnpack on android (aarch64-v8a). However, I receive compiler error:
darknet-nnpack/src/convolutional_layer.c:(.text.forward_convolutional_layer_nnpack+0xfc): undefined reference to `aligned_alloc'
clang38: error: linker command failed with exit code 1 (use -v to see invocation)
I am using aarch64-linux-android-clang and aarch64-linux-android-clang++
I added the android-ndk directory that contains cstdlib and added a header include in convolutional_layer.c, but it still shows the error.
How did you compile the repo with DARKNET_NNPACK=ON ?
thank you.
Hello, I am a beginner.
Should I cd to my home directory before running any git clone?
Hello there
I just started recently learning about YOLO and how to implement it and hopefully contribute to improve it.
I followed the darknet-nnpack procedure and everything was okay until I tried to test the Python script rpi_record.py, when I ran into the following error:
I'm running python 2.7.16
pi@raspberrypi:~/darknet-nnpack $ sudo python rpi_record.py
Traceback (most recent call last):
File "rpi_record.py", line 26, in <module>
stdin = PIPE, stdout = PIPE)
File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Could you please help me fix this issue? BTW, an image is captured every time I run the code (in the directory only). Also, I'm using the Raspberry Pi camera v2.
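An OSError with [Errno 2] raised from subprocess's _execute_child generally means the program that Popen tried to start could not be found, rather than a missing data file. The sketch below checks for that up front. It is written for Python 3 (shutil.which is not available in 2.7), find_missing is a hypothetical helper, and the command names in the example are assumptions for illustration, not taken from rpi_record.py.

```python
import os
import shutil


def find_missing(commands):
    """Return the commands that cannot be resolved to an executable."""
    missing = []
    for cmd in commands:
        if os.sep in cmd:
            # A path such as ./darknet must exist and be executable.
            ok = os.path.isfile(cmd) and os.access(cmd, os.X_OK)
        else:
            # A bare name is looked up on PATH.
            ok = shutil.which(cmd) is not None
        if not ok:
            missing.append(cmd)
    return missing


if __name__ == "__main__":
    # Assumed command names; substitute whatever the script actually spawns.
    print(find_missing(["./darknet", "raspistill"]))
```

Running the check from the same working directory as the script matters, since a relative path like ./darknet is resolved against the current directory.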
I am building darknet-nnpack on an RPi with OpenCV 3.4. The build was successful with ARM_NEON=0; however, the inference time for the yolov3-tiny network was the same as with the original darknet, around 8 seconds.
However, the build fails with ARM_NEON=1, with the error below:
gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -Ofast -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/gemm.c -o obj/gemm.o
gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -Ofast -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/utils.c -o obj/utils.o
gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -Ofast -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/cuda.c -o obj/cuda.o
gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DNNPACK -DNNPACK_FAST -DARM_NEON -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -march=native -Ofast -DOPENCV -DNNPACK -DNNPACK_FAST -DARM_NEON -mfpu=neon-vfpv4 -funsafe-math-optimizations -ftree-vectorize -c ./src/deconvolutional_layer.c -o obj/deconvolutional_layer.o
gcc: error: unrecognized command-line option ‘-mfpu=neon-vfpv4’
make: *** [Makefile:114: obj/gemm.o] Error 1
make: *** Waiting for unfinished jobs....
gcc: error: unrecognized command-line option ‘-mfpu=neon-vfpv4’
make: *** [Makefile:114: obj/utils.o] Error 1
gcc: error: unrecognized command-line option ‘-mfpu=neon-vfpv4’
gcc: error: unrecognized command-line option ‘-mfpu=neon-vfpv4’
make: *** [Makefile:114: obj/deconvolutional_layer.o] Error 1
make: *** [Makefile:114: obj/cuda.o] Error 1
A simple Google search suggests that NEON might need a different set of flags.
Do I need to update the Makefile or downgrade gcc for the library to be built correctly?
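The "unrecognized command-line option -mfpu=neon-vfpv4" message is typical of a 64-bit ARM (AArch64) GCC, where Advanced SIMD is part of the base instruction set and -mfpu is not accepted. That is an inference from the log, not something the post states. The sketch below picks flags from the machine string reported by `uname -m`; neon_cflags is a hypothetical helper, and the 32-bit flag is simply the one shown in the Makefile output above.

```python
import platform


def neon_cflags(machine):
    """Suggest NEON-related compiler flags for a `uname -m` string."""
    machine = machine.lower()
    if machine in ("aarch64", "arm64"):
        # AArch64 GCC rejects -mfpu; NEON needs no extra flag there.
        return []
    if machine.startswith("armv7") or machine.startswith("armv8"):
        # 32-bit ARM userland: the flag used in the Makefile output above.
        return ["-mfpu=neon-vfpv4"]
    # Anything else (x86, or ARMv6 such as the Pi Zero): no NEON flags.
    return []


if __name__ == "__main__":
    print(platform.machine(), neon_cflags(platform.machine()))
```

If the machine string turns out to be aarch64, removing -mfpu=neon-vfpv4 from the Makefile is probably a better first step than downgrading gcc.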