doonny / pipecnn
An OpenCL-based FPGA Accelerator for Convolutional Neural Networks
License: Apache License 2.0
Dear Prof. Wang,
We have tried to run your code on the Altera FPGA DE5a-Net-e1, but unfortunately we cannot get the correct result. The result is random every time: sometimes it is fox, sometimes Cardigan or Pomeranian. Could you please help us figure out what went wrong? Thank you so much.
[root@dhcp70 project]# ./run.exe conv.aocx
PipeCNN: An OpenCL-Based FPGA Accelerator for CNNs
61063552 total weights read
Loading picture ./data/picture/cat.jpg .....
1024 total output reference read
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
Device 0: de5a_net_e1 : Arria 10 Reference Platform (aclde5a_net_e10)
Device OpenCL Version: OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 16.1
Device Max Compute Units: 1
Device Max WorkGroup Size: 2147483647
Device Max WorkItem Size: 2147483647
Device Global Memory Size: 8192 MBytes
Device Local Memory Size: 16 KBytes
Device Max Clock Freq: 1000 Mhz
Loading kernel/binary from file conv.aocx
Reprogramming device [0] with handle 1
Executing Layer 1:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 27, 27, 96)
Launching kernel lrn with local size: 1, 1, 24 (global size: 27, 27, 24)
Executing Layer 2:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 256)
Launching kernel lrn with local size: 1, 1, 64 (global size: 13, 13, 64)
Executing Layer 3:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 384)
Executing Layer 4:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 384)
Executing Layer 5:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 6, 6, 256)
Executing Layer 6:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 4096)
Executing Layer 7:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 4096)
Executing Layer 8:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 1024)
Copyed all batched results from fc_2 buffers.
Done !!!
Performance Summary
Total runtime: 0.057614s
Kernel runtime summary:
Layer-1:
MemRd: 8.850 ms
Conv : 8.819 ms
Pool : 8.813 ms
MemWr: 8.794 ms
Lrn : 0.643 ms
Layer-2:
MemRd: 14.013 ms
Conv : 13.992 ms
Pool : 13.987 ms
MemWr: 13.969 ms
Lrn : 0.243 ms
Layer-3:
MemRd: 9.407 ms
Conv : 9.365 ms
Pool : 0.000 ms
MemWr: 9.360 ms
Lrn : 0.000 ms
Layer-4:
MemRd: 7.080 ms
Conv : 7.057 ms
Pool : 0.000 ms
MemWr: 7.044 ms
Lrn : 0.000 ms
Layer-5:
MemRd: 4.782 ms
Conv : 4.751 ms
Pool : 4.748 ms
MemWr: 4.735 ms
Lrn : 0.000 ms
Layer-6:
MemRd: 2.583 ms
Conv : 2.547 ms
Pool : 0.000 ms
MemWr: 2.551 ms
Lrn : 0.000 ms
Layer-7:
MemRd: 1.223 ms
Conv : 1.199 ms
Pool : 0.000 ms
MemWr: 1.193 ms
Lrn : 0.000 ms
Layer-8:
MemRd: 0.331 ms
Conv : 0.286 ms
Pool : 0.000 ms
MemWr: 0.290 ms
Lrn : 0.000 ms
Total kernel runtime 48.018 ms
Batch size = 1, average process time per batch: 48.018 ms
Start verifying results ...
Selected item = 0 from the combined batch results in fc buffers
Hi!
I encountered the following problems when I built the Debug configuration for my project on a ZCU102 (system configuration: A53, OpenCL, Linux; runtime: OpenCL; SDx 2017.4, with SDSoC available and SDAccel not available). The warning parts are marked in bold.
I also found that pipe.cl already existed before using pipe_gen.py, though I still ran the script with the arguments (16 8).
This was my first time using the SDx kit; thank you for your help!
---- error part ----
21:19:17 **** Incremental Build of configuration Debug for project pcnn ****
make -j40 incremental
/opt/Xilinx/SDX/SDK/2017.4/gnu/aarch64/lin/aarch64-linux/bin/aarch64-linux-gnu-g++ -DSDX_PLATFORM=zcu102 -D__USE_XOPEN2K8 -I/opt/Xilinx/SDX/SDx/2017.4/runtime/include/1_2/ -I/opt/Xilinx/SDX/Vivado/2017.4/include/ -O2 -g -Wall -c -fmessage-length=0 -std=c++14 -DSDX_PLATFORM=zcu102 -D__USE_XOPEN2K8 -I/opt/Xilinx/SDX/SDx/2017.4/runtime/include/1_2/ -I/opt/Xilinx/SDX/Vivado/2017.4/include/ -O2 -g -Wall -c -fmessage-length=0 -o "src/common/ocl_util.o" "../src/common/ocl_util.cpp"
/opt/Xilinx/SDX/SDK/2017.4/gnu/aarch64/lin/aarch64-linux/bin/aarch64-linux-gnu-g++ -DSDX_PLATFORM=zcu102 -D__USE_XOPEN2K8 -I/opt/Xilinx/SDX/SDx/2017.4/runtime/include/1_2/ -I/opt/Xilinx/SDX/Vivado/2017.4/include/ -O2 -g -Wall -c -fmessage-length=0 -std=c++14 -DSDX_PLATFORM=zcu102 -D__USE_XOPEN2K8 -I/opt/Xilinx/SDX/SDx/2017.4/runtime/include/1_2/ -I/opt/Xilinx/SDX/Vivado/2017.4/include/ -O2 -g -Wall -c -fmessage-length=0 -o "src/common/timer.o" "../src/common/timer.cpp"
/opt/Xilinx/SDX/SDK/2017.4/gnu/aarch64/lin/aarch64-linux/bin/aarch64-linux-gnu-g++ -DSDX_PLATFORM=zcu102 -D__USE_XOPEN2K8 -I/opt/Xilinx/SDX/SDx/2017.4/runtime/include/1_2/ -I/opt/Xilinx/SDX/Vivado/2017.4/include/ -O2 -g -Wall -c -fmessage-length=0 -std=c++14 -DSDX_PLATFORM=zcu102 -D__USE_XOPEN2K8 -I/opt/Xilinx/SDX/SDx/2017.4/runtime/include/1_2/ -I/opt/Xilinx/SDX/Vivado/2017.4/include/ -O2 -g -Wall -c -fmessage-length=0 -o "src/project/host/main.o" "../src/project/host/main.cpp"
#include "ocl_util.h"
^
compilation terminated.
make: *** Waiting for unfinished jobs....**
../src/common/ocl_util.cpp: In function ‘_cl_program* ocl_util::createProgramFromFile(cl_context, const char*, _cl_device_id* const*, unsigned int)’:
../src/common/ocl_util.cpp:410:22: warning: ignoring attributes on template argument ‘cl_int {aka int}’ [-Wignored-attributes]
scoped_array<cl_int> binary_status(num_devices);
^
21:19:18 Build Finished (took 940ms)
Regards!
Hi Dong,
I have one more question.
Can you explain why you used "accum_piped" in this code?
Why is PIPE_DEPTH = 6?
```c
for(unsigned char ll=0; ll<LANE_NUM; ll++){
    lane_accum[ll] = (MASK_ACCUM & accum_piped[ll][PIPE_DEPTH-1]) + (MASK_MULT & mac(mac_data.lane[ll], mac_weight.lane[ll]));
    // Shift the pipelined registers backwards
    #pragma unroll
    for(unsigned int p=PIPE_DEPTH-1; p>0; p--){
        accum_piped[ll][p] = MASK_ACCUM & accum_piped[ll][p-1];
    }
    // update the first copy
    accum_piped[ll][0] = MASK_ACCUM & lane_accum[ll];
}
```
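Not speaking for the author, but a common reason for this pattern in Intel OpenCL kernels is that a single accumulator creates a loop-carried dependency: the add must finish before the next iteration can start, which raises the loop's initiation interval. Keeping PIPE_DEPTH interleaved partial sums means each copy is only updated once every PIPE_DEPTH iterations, hiding the adder latency. A minimal C sketch of the same idea (the name `piped_sum` is illustrative, not from the repo):

```c
#include <assert.h>

#define PIPE_DEPTH 6

/* Sketch (not the author's code): summing with PIPE_DEPTH interleaved
 * partial accumulators. Each partial sum is touched only every
 * PIPE_DEPTH iterations, so a pipelined loop can keep an initiation
 * interval of 1 even if the adder takes several cycles. */
int piped_sum(const int *data, int n) {
    int acc[PIPE_DEPTH] = {0};
    for (int i = 0; i < n; i++)
        acc[i % PIPE_DEPTH] += data[i];   /* rotate among the copies */
    int total = 0;
    for (int p = 0; p < PIPE_DEPTH; p++)  /* final reduction */
        total += acc[p];
    return total;
}
```

In the kernel, the backward shift of `accum_piped` plays the role of the rotation above, so PIPE_DEPTH = 6 would presumably match the accumulation latency on the target device.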
Thank you
Best regards
Hi, I compiled the project for the DE10-Standard, but the fitter reported that there are not enough LABs. What is the solution?
aoc: First stage compilation completed successfully.
Compiling for FPGA. This process may take a long time, please be patient.
Error (170012): Fitter requires 4243 LABs to implement the design, but the device contains only 4191 LABs
Error: Cannot fit kernel(s) on device
Makefile:135: recipe for target 'conv.aocx' failed
make: *** [conv.aocx] Error 1
Thanks
I am facing the following error when executing PipeCNN on AWS F1.
Loading kernel/binary from file cnnf1_pythonpipe2.awsxclbin
ERROR: ERROR: Memory bank specified for kernel instance "memRead_1" of kernel "memRead" for argument index 21 does not match the physical connectivity from the binary.
Bank specified on host side is "M01_AXI" while bank from the binary is "M00_AXI".
ERROR: clSetKernelArg() for kernel "memRead", argument index 21.
ERROR: CL_INVALID_MEM_OBJECT
Location: ../src/host/main.cpp:730
Failed to set argument 21 kernel memRd
Was I supposed to set any parameter?
full output here
Hi,
I am using the Intel FPGA SDK for OpenCL on an Arria 10 to run PipeCNN (AlexNet). I am getting the error below when I compile the kernel with this command:
$ aoc device/conv_pipe.cl -o bin_fpga/conv_pipe.aocx --board bdw_fpga_v1.0 -v -g
aoc: Environment checks are completed successfully.
You are now compiling the full flow!!
aoc: Selected target board bdw_fpga_v1.0
aoc: Running OpenCL parser....
In file included from :11140:
:2:30: warning: ISO C99 requires whitespace after the macro name
#define ACL_BOARD_bdw_fpga_v1.0 1
^
:3:31: warning: ISO C99 requires whitespace after the macro name
#define AOCL_BOARD_bdw_fpga_v1.0 1
^
2 warnings generated.
aoc: OpenCL parser completed successfully.
aoc: Compiling....
Compiler Error: Unrecognized function call: mult_add_fix8bx4
Error: Optimizer FAILED.
Refer to conv_pipe/conv_pipe.log for details.
When I run the same code on the emulator, it runs fine and gives the expected output.
Why is it not able to recognise the function mult_add_fix8bx4? Should it be compiled separately?
Thanks,
Akash
Hello. As the title said, does PipeCNN support running on Windows 10? Thanks!
Dear doonny,
I'm trying to test the PipeCNN framework on some of Xilinx's embedded FPGA boards in order to take power measurements. I would like to compile the framework for the Digilent ZedBoard, but the synthesized design is too large for this FPGA platform; the XOCC compiler returns this error:
297 RAMB18 and RAMB36/FIFO required but only 280
Could you give me some hints for reducing the BRAM utilization?
Thank You
Hi, I compiled with FLOW=sw_emu and the build succeeded. But when I executed a command like this:
#./run.exe conv.aocx
-bash: ./run.exe: cannot execute binary file: Exec format error
Anyone got an idea?
Thanks very much.
When I build, I get the following errors during the hardware accelerator integration stage.
INFO: [XOCC 60-251] Hardware accelerator integration...
===>The following messages were generated while processing /PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.sim/sim_1/behav :
ERROR: [XOCC 10-426] cannot find port pool_ch15_TREADY on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:736]
ERROR: [XOCC 10-426] cannot find port pool_ch15_TVALID on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:735]
ERROR: [XOCC 10-426] cannot find port pool_ch15_TDATA on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:734]
ERROR: [XOCC 10-426] cannot find port pool_ch14_TREADY on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:733]
ERROR: [XOCC 10-426] cannot find port pool_ch14_TVALID on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:732]
ERROR: [XOCC 10-426] cannot find port pool_ch14_TDATA on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:731]
ERROR: [XOCC 10-426] cannot find port pool_ch13_TREADY on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:730]
ERROR: [XOCC 10-426] cannot find port pool_ch13_TVALID on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:729]
ERROR: [XOCC 10-426] cannot find port pool_ch13_TDATA on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:728]
ERROR: [XOCC 10-426] cannot find port pool_ch12_TREADY on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:727]
ERROR: [XOCC 10-426] cannot find port pool_ch12_TVALID on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:726]
ERROR: [XOCC 10-426] cannot find port pool_ch12_TDATA on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:725]
ERROR: [XOCC 10-426] cannot find port pool_ch11_TREADY on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:724]
ERROR: [XOCC 10-426] cannot find port pool_ch11_TVALID on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:723]
ERROR: [XOCC 10-426] cannot find port pool_ch11_TDATA on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:722]
ERROR: [XOCC 10-426] cannot find port pool_ch10_TREADY on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:721]
ERROR: [XOCC 10-426] cannot find port pool_ch10_TVALID on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:720]
ERROR: [XOCC 10-426] cannot find port pool_ch10_TDATA on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:719]
ERROR: [XOCC 10-426] cannot find port pool_ch9_TREADY on this module [/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/ipiprj/ipiprj.srcs/sources_1/bd/dr/ipshared/8aa7/hdl/verilog/maxPool.v:718]
ERROR: [XOCC 43-3322] Static elaboration of top level Verilog design unit(s) in library work failed
ERROR: [XOCC 60-399] vivado failed, please see log file for detail: '/PipeCNN/Emulation-HW/_xocc_link_pipecnn/impl/build/hw_em/pipecnn/sv/pipecnn_ipi/vivado.log'
ERROR: [XOCC 60-626] Kernel link failed to complete
ERROR: [XOCC 60-703] Failed to finish linking
make: *** [pipecnn.xclbin] Error 1
21:27:33 Build Finished (took 10m:11s.280ms)
When compiling the de1soc host, the ARM libraries cannot be found. You may need to modify Makefile lines 73-74.
Environment: Windows 64-bit, Quartus 16.1, compiled with the SoC EDS 16.1 Command Shell.
Hi, according to the instructions, the best result for the AlexNet model on the DE1-SoC is 149 ms.
However, the best result I got is only around 450 ms with the following hw configuration.
I did try to increase LANE_NUM to 8, but I got the following error even though the device's resources are not fully used up:
"kernel cannot fit into device"
Could you kindly share the appropriate hw configuration for VEC_SIZE, LANE_NUM, and CONV_GP_SIZE_X?
Thank you
For SDAccel 2017.4, the platform needs to be renamed to xilinx_kcu1500_dynamic_5_0.
There are a lot of warnings like "device/conv_pipe_xilinx.cl:680:708: warning: double precision constant requires cl_khr_fp64, casting to single precision".
Finally, an error message:
ERROR: [XOCC 60-896] For unified platforms, please use -c or -l
ERROR: [XOCC 60-598] Kernel build setup failed to complete
ERROR: [XOCC 60-702] Failed to finish compilation and linking
Makefile:142: recipe for target 'conv.xclbin' failed
make: *** [conv.xclbin] Error 1
I put image.dat, weights.dat, and fc8.dat in the data folder, then built and ran the project in CPU emulation mode. The build finishes successfully. However, when I run the executable, it exits very quickly with no errors. The console output is very short:
***************************************************
PipeCNN: An OpenCL-Based FPGA Accelerator for CNNs
***************************************************
61063552 total weights read
154587 bytes image read
I'm using SDAccel 2017.2 in GUI mode. Why is the output log so short? I don't see any output files created after running the project.
Dear Prof. Wang,
I installed Xilinx's SDAccel, but make reports "aocl: command not found". Can the application be built with SDAccel?
The following error occurred in the SDAccel version (SDK v2017.4) after running the makefile under the project folder.
I think the solution is to change 'xilinx:kcu1500:4ddr-xpr:4.0' to 'xilinx_vcu1525_dynamic_5_0',
but I am not sure whether PipeCNN's environment and behavior would still be valid.
* Error log ----------------------------------------------
ERROR: [XOCC 60-705] No device was found that matches 'xilinx:kcu1500:4ddr-xpr:4.0'. The supported devices are:
xilinx_vcu1525_dynamic_5_0
xilinx_kcu1500_dynamic_5_0
ERROR: [XOCC 60-587] Failed to add a device: specified platform xilinx:kcu1500:4ddr-xpr:4.0 is not found
Makefile:151: recipe for target 'conv.xclbin' failed
make: *** [conv.xclbin] Error 1
------------------------------------------------------------
Regarding the statement that PipeCNN has been tested on the following boards:
may I know where I can get the performance and cost information for these boards, as only the performance of the DE5-Net is listed in the paper?
Thank You
Hello, I have completed all the preceding steps, and the pre-trained model is in the data folder. At the last step, running ./run.exe conv.aocx, nothing happens: no error and no output. Why? Thanks.
Since the width and height of a pooling window can differ, it seems pool_size cannot cover this kind of case.
The same applies to conv_stride if the stride differs between width and height.
When #define XILINX is set, the build generates this error:
../src/host/main.cpp:466:71: error: ‘write_event’ was not declared in this scope
0 /* flags, 0 means from host*/,0, NULL,&write_event[i]);
^~~~~~~~~~~
../src/host/main.cpp: In function ‘int prepare()’:
../src/host/main.cpp:1414:5: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
else
^~~~
../src/host/main.cpp:1418:2: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘else’
for(unsigned n = 0; n<layer_config[0][data_n]/VEC_SIZE; n++){
^~~
make: *** [src/host/main.o] Error 1
I uncommented the code for VGG-16 and commented out the code for AlexNet in layer_config.h and main.c.
Here is what I changed in main.c:
// AlexNet
// Original problem size
// File size is in num of DTYPE numbers
//#define IMAGE_FILE_SIZE (227*227*3)
////#define WEIGHTS_FILE_SIZE 60965224 //fc8-1000
//#define WEIGHTS_FILE_SIZE 61063552 //fc8-1024
//#define LAYER_NUM 8
//#define CONV_NUM 5
//const char *weight_file_path = "./data/data_alex/weights.dat";
//const char *input_file_path = "./data/data_alex/image.dat";
//const char *ref_file_path = "./data/data_alex/fc8.dat";
//const char *dump_file_path = "./result_dump.txt";
// VGG16
// Original problem size
// File size is in num of DTYPE numbers
#define IMAGE_FILE_SIZE (224*224*3)
#define WEIGHTS_FILE_SIZE 138455872 //fc8-1024
#define LAYER_NUM 16
#define CONV_NUM 13
const char *weight_file_path = "./data/data_vgg/weights.dat";
const char *input_file_path = "./data/data_vgg/image.dat";
const char *ref_file_path = "./data/data_vgg/fc8.dat";
const char *dump_file_path = "./result_dump.txt";
Here is what I changed in layer_config.h:
// Test with batch=1
// Alexnet Configuration
/*
unsigned layer_config[][NUM_CONFIG_ITEM] = {{0,
227, 227, 3, 11, 11, 3, 96, 96,
0,
55, 55, 96, 4, 0, 0, 1,
1, 27, 27, 96, 3, 2,
1,
1},//Layer-1
{0,
27, 27, 96, 5, 5, 48, 256, 256,
0,
27, 27, 256, 1, 2, 1, 1,
1, 13, 13, 256, 3, 2,
1,
1},//Layer-2
{0,
13, 13, 256, 3, 3, 256, 384, 384,
0,
13, 13, 384, 1, 1, 0, 1,
0, 13, 13, 384, 0, 0,
0,
1},//Layer-3
{0,
13, 13, 384, 3, 3, 192, 384, 384,
1,
13, 13, 384, 1, 1, 1, 1,
0, 13, 13, 384, 0, 0,
0,
0},//Layer-4
{0,
13, 13, 384, 3, 3, 192, 256, 256,
0,
13, 13, 256, 1, 1, 1, 1,
1, 6, 6, 256, 3, 2,
0,
2},//Layer-5 Note: for last conv layer, outputs are write to fc buffer
{1,
6, 6, 256, 6, 6, 256, 4096, 4096, // Note: The input size (dim1/dim2) is the combined data size (batched)
2,
1, 1, 4096, 6, 0, 0, 1,
0, 1, 1, 4096, 0, 0,
0,
3},//Layer-6 fc
{1,
1, 1, 4096, 1, 1, 4096, 4096, 4096,
3,
1, 1, 4096, 1, 0, 0, 1,
0, 1, 1, 4096, 0, 0,
0,
2},//Layer-7 fc
{1,
1, 1, 4096, 1, 1, 4096, 1024, 1024,
2,
1, 1, 1024, 1, 0, 0, 0,
0, 1, 1, 1024, 0, 0,
0,
3}//Layer-8 fc
};
char precision_config[][3] ={{8, 0, -4},//Layer-1
{ 8, 0, -2},//Layer-2
{ 8, 0, -1},//Layer-3
{ 8, -1, -1},//Layer-4
{ 8, -1, -1},//Layer-5
{11, -1, 0},//Layer-6
{10, 0, 2},//Layer-7
{10, 2, 2}//Layer-8
};
unsigned input_config[5] = {227, 227, 3, 1}; //original image size(dim1, dim2, dim3), batch size
//unsigned output_config[3] = {27, 27, 96};//Layer-1
//unsigned output_config[3] = {55, 55, 96};//Layer-1
//unsigned output_config[3] = {13, 13, 256};//Layer-2
//unsigned output_config[3] = {6, 6, 256};//Layer-5
//unsigned output_config[3] = {1, 1, 4096};//Layer-6
unsigned output_config[3] = {1, 1, 1024};//Layer-8 Note: only one result is extracted and verified
*/
// Test with batch=1
// VGG-16 Configuration
unsigned layer_config[][NUM_CONFIG_ITEM] = {{0,
224, 224, 3, 3, 3, 3, 64, 64,
0,
224, 224, 64, 1, 1, 0, 1,
0, 224, 224, 64, 0, 0,
0,
1},//Layer-1 (conv1_1)
{0,
224, 224, 64, 3, 3, 64, 64, 64,
1,
224, 224, 64, 1, 1, 0, 1,
1, 112, 112, 64, 2, 2,
0,
0},//Layer-2 (conv1_2)
{0,
112, 112, 64, 3, 3, 64, 128, 128,
0,
112, 112, 128, 1, 1, 0, 1,
0, 112, 112, 128, 0, 0,
0,
1},//Layer-3 (conv2_1)
{0,
112, 112, 128, 3, 3, 128, 128, 128,
1,
112, 112, 128, 1, 1, 0, 1,
1, 56, 56, 128, 2, 2,
0,
0},//Layer-4 (conv2_2)
{0,
56, 56, 128, 3, 3, 128, 256, 256,
0,
56, 56, 256, 1, 1, 0, 1,
0, 56, 56, 256, 0, 0,
0,
1},//Layer-5 (conv3_1)
{0,
56, 56, 256, 3, 3, 256, 256, 256,
1,
56, 56, 256, 1, 1, 0, 1,
0, 56, 56, 256, 0, 0,
0,
0},//Layer-6 (conv3_2)
{0,
56, 56, 256, 3, 3, 256, 256, 256,
0,
56, 56, 256, 1, 1, 0, 1,
1, 28, 28, 256, 2, 2,
0,
1},//Layer-7 (conv3_3)
{0,
28, 28, 256, 3, 3, 256, 512, 512,
1,
28, 28, 512, 1, 1, 0, 1,
0, 28, 28, 512, 0, 0,
0,
0},//Layer-8 (conv4_1)
{0,
28, 28, 512, 3, 3, 512, 512, 512,
0,
28, 28, 512, 1, 1, 0, 1,
0, 28, 28, 512, 0, 0,
0,
1},//Layer-9 (conv4_2)
{0,
28, 28, 512, 3, 3, 512, 512, 512,
1,
28, 28, 512, 1, 1, 0, 1,
1, 14, 14, 512, 2, 2,
0,
0},//Layer-10 (conv4_3)
{0,
14, 14, 512, 3, 3, 512, 512, 512,
0,
14, 14, 512, 1, 1, 0, 1,
0, 14, 14, 512, 0, 0,
0,
1},//Layer-11 (conv5_1)
{0,
14, 14, 512, 3, 3, 512, 512, 512,
1,
14, 14, 512, 1, 1, 0, 1,
0, 14, 14, 512, 0, 0,
0,
0},//Layer-12 (conv5_2)
{0,
14, 14, 512, 3, 3, 512, 512, 512,
0,
14, 14, 512, 1, 1, 0, 1,
1, 7, 7, 512, 2, 2,
0,
2},//Layer-13 (conv5_3) Note: for last conv layer, outputs are write to fc buffer
{1,
7, 7, 512, 7, 7, 512, 4096, 4096,
2,
1, 1, 4096, 7, 0, 0, 1,
0, 1, 1, 4096, 0, 0,
0,
3},//Layer-14 (fc6)
{1,
1, 1, 4096, 1, 1, 4096, 4096, 4096,
3,
1, 1, 4096, 1, 0, 0, 1,
0, 1, 1, 4096, 0, 0,
0,
2},//Layer-15 (fc7)
{1,
1, 1, 4096, 1, 1, 4096, 1024, 1024,
2,
1, 1, 1024, 1, 0, 0, 0,
0, 1, 1, 1024, 0, 0,
0,
3}//Layer-16 (fc8)
};
char precision_config[][3] ={{7, 0, -2},//Layer-1
{ 8, -2, -5},//Layer-2
{ 8, -5, -5},//Layer-3
{ 8, -5, -6},//Layer-4
{ 7, -6, -7},//Layer-5
{ 8, -7, -7},//Layer-6
{ 8, -7, -7},//Layer-7
{ 8, -7, -6},//Layer-8
{ 8, -6, -5},//Layer-9
{ 8, -5, -5},//Layer-10
{ 9, -5, -4},//Layer-11
{ 9, -4, -3},//Layer-12
{ 8, -3, -2},//Layer-13
{ 8, -2, 0},//Layer-14
{ 7, 0, 2},//Layer-15
{ 7, 2, 2}//Layer-16
};
unsigned input_config[4] = {224, 224, 3, 1};
//unsigned output_config[3] = {224, 224, 64};//Layer-1
//unsigned output_config[3] = {56, 56, 128};//Layer-4(pool2)
//unsigned output_config[3] = {28, 28, 256};//Layer-7(pool3)
//unsigned output_config[3] = {28, 28, 512};//Layer-8(relu4_1)
//unsigned output_config[3] = {28, 28, 512};//Layer-9(relu4_2)
//unsigned output_config[3] = {14, 14, 512};//Layer-10(pool4)
//unsigned output_config[3] = {7, 7, 512};//Layer-13(pool5)
//unsigned output_config[3] = {1, 1, 4096};//Layer-14
unsigned output_config[3] = {1, 1, 1024};//Layer-16
I compiled the project successfully in CPU emulation mode in the SDAccel GUI. However, when I run the project, this error occurs:
***************************************************
PipeCNN: An OpenCL-Based FPGA Accelerator for CNNs
***************************************************
Error: required win_buffer size is 3456, configured size is 2304
Allocate memory for data and weights failed !!!
How can I solve this problem? What else should I change in the code?
I am also trying to understand the host and FPGA code, but I am not able to connect it with the sources I have.
Could you please provide some pointers for me and other readers on this topic? Is there any reference on which you based your code?
Hello, after building, running ./run.exe conv.aocx reports an error saying the weights file cannot be found. But the model files you provided are already in the data directory, and I also tried changing the path in main.cpp to an absolute path; the file still cannot be found.
I hope to get your reply, thanks.
Hello! I am trying to compile the code with the Intel FPGA SDK for OpenCL 17.0 and an Arria 10 board on Windows 10 using mingw-w64. I got an error when the makefile ran a command like:
g++ ./host/main.o ../common/ocl_util.o ../common/timer.o -o run.exe -LC:/intelFPGA_pro/17.0/hld/board/a10_ref/windows64/lib -LC:/intelFPGA_pro/17.0/hld/host/windows64/lib -laltera_a10_ref_mmd -lalteracl -lacl_emulator_kernel_rt -lpkg_editor -llibelf -lacl_hostxml
and the linker reported errors like:
C:/intelFPGA_pro/17.0/hld/host/windows64/lib/alteracl.lib(d:/SJ/nightly/17.0/290/w64/acds/hld/obj/windows64/acl/acl_program.obj).text[l_build_from_source_in_dir]+0xa2): undefined reference to `__imp__wassert'
C:/intelFPGA_pro/17.0/hld/host/windows64/lib/alteracl.lib(d:/SJ/nightly/17.0/290/w64/acds/hld/obj/windows64/acl/acl_program.obj).text[l_load_binary_pkg]+0xb36): undefined reference to `__security_check_cookie'
C:/intelFPGA_pro/17.0/hld/host/windows64/lib/alteracl.lib(d:/SJ/nightly/17.0/290/w64/acds/hld/obj/windows64/acl/acl_program.obj).xdata[$unwind$l_compute_hash]+0x10): undefined reference to `__GSHandlerCheck'
(The output is too long and was truncated; it just repeats these three kinds of errors.)
Anyone has an idea? Thanks in advance!!
Dear doonny,
I'm very interested in testing your PipeCNN on my Zynq UltraScale+ ZCU102.
I compiled the source code with Xilinx SDSoC v2017.1 and the zcu102_es1_ocl platform; then, before launching PipeCNN, I ran these commands:
cd /mnt
cp libxilinxopencl.so /usr/lib
export XILINX_OPENCL=/mnt
(libxilinxopencl.so is the OpenCL library for aarch64). Then, to launch the CNN:
./PipeCNN.elf conv.aocx
and the final output is:
61063552 total weights read
154587 bytes image read
1024 total output reference read
ERROR: No device found
ERROR: CL_DEVICE_NOT_FOUND
Could you give me some help?
Thanks in advance
This source is supposed to be compiled with the Intel FPGA SDK for OpenCL, but I am getting the following errors for an Arria 10. I am using the Intel(R) FPGA SDK for OpenCL(TM) 64-Bit Offline Compiler, Version 17.0.2 Build 297.
Errors:
error: function 'read_channel_altera' is not supported by the Intel(R) FPGA SDK for OpenCL(TM), and no user definition is provided
error: function 'write_channel_altera' is not supported by the Intel(R) FPGA SDK for OpenCL(TM), and no user definition is provided
Hello! I compiled the project in emulator mode and get the following error when I run "run.exe conv.aocx" with the AlexNet data:
run.exe conv.aocx
***************************************************
PipeCNN: An OpenCL-Based FPGA Accelerator for CNNs
***************************************************
61063552 total weights read
154587 bytes image read
1024 total output reference read
Platform: Intel(R) FPGA SDK for OpenCL(TM)
Using 1 device(s)
Device 0: EmulatorDevice : Emulated Device
Device OpenCL Version: OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 17.0
Device Max Compute Units: 1
Device Max WorkGroup Size: 2147483647
Device Max WorkItem Size: 2147483647
Device Global Memory Size: 2048 MBytes
Device Local Memory Size: 16 KBytes
Device Max Clock Freq: 1000 Mhz
Loading kernel/binary from file conv.aocx
ERROR: CL_COMPILER_NOT_AVAILABLE
Location: ../common/ocl_util.cpp:429
Failed to build program with source
Environment: Windows 10, MinGW-w64, Arria 10 board, Intel OpenCL SDK for FPGA 17.0, MSVC 12.0
Anyone got an idea? Thanks in advance!!
Dear Prof. Wang,
For the quantized parameters, you used (n, m) pairs to denote the precision. For example, in the first layer of VGG-16 you used (8,7), (8,0), (8,-2) to denote frac_w, frac_input, and frac_output, while for the last FC layer you used (8,2), (8,2), (4,7). Are there any rules or constraints on how to choose these numbers, or can any numbers be used? If I change the fraction numbers and convert a new model, will it still work? Thank you so much.
Hello! I compiled the project and got the following error:
lcf@lcf-9020:~/work/PipeCNN-master/project$ make
g++ ./host/main.o ../common/ocl_util.o ../common/timer.o -o run.exe -L/home/lcf/intelFPGA/16.1/hld/board/de10_standard/arm32/lib -L/home/lcf/intelFPGA/16.1/hld/host/arm32/lib -L/home/lcf/intelFPGA/16.1/hld/host/linux64/lib -Wl,--no-as-needed -lalteracl -lalterammdpcie -lstdc++ -lelf
/usr/bin/ld: skipping incompatible /home/lcf/intelFPGA/16.1/hld/board/de10_standard/arm32/lib/libalteracl.so when searching for -lalteracl
/usr/bin/ld: skipping incompatible /home/lcf/intelFPGA/16.1/hld/host/arm32/lib/libalteracl.so when searching for -lalteracl
/usr/bin/ld: skipping incompatible /home/lcf/intelFPGA/16.1/hld/board/de10_standard/arm32/lib/libalterammdpcie.so when searching for -lalterammdpcie
/usr/bin/ld: skipping incompatible /home/lcf/intelFPGA/16.1/hld/host/arm32/lib/libalterammdpcie.so when searching for -lalterammdpcie
/usr/bin/ld: cannot find -lalterammdpcie
/usr/bin/ld: skipping incompatible /home/lcf/intelFPGA/16.1/hld/board/de10_standard/arm32/lib/libelf.so when searching for -lelf
/usr/bin/ld: skipping incompatible /home/lcf/intelFPGA/16.1/hld/host/arm32/lib/libelf.so when searching for -lelf
collect2: error: ld returned 1 exit status
Makefile:129: recipe for target 'run.exe' failed
make: *** [run.exe] Error 1
Anyone got an idea? Thanks in advance!!
Dear Prof. Wang,
I use De1-SoC and I changed,
VEC_SIZE = 8
LANE_NUM = 8
CONV_GP_SIZE_X = 7, as in the user instructions,
and
PLATFORM = arm32
and FLOW = hw
Then the Estimated Resource Usage Summary shows:
Logic utilization = 111%
and I got the following error, even though other resources of the device are not fully used up:
kernel cannot fit into device
My question is: are there any other ways to reduce logic utilization, apart from reducing LANE_NUM?
Thank You.
```c
for(unsigned int k=0; k<input_num; k++){
    if(pool_size==3)
        row_pool_reg[ll] = pool_max(line_buf_1[ll][line_buf_ptr], line_buf_0[ll][line_buf_ptr]);
    else // pool_size==2
        row_pool_reg[ll] = line_buf_0[ll][line_buf_ptr];

    pool_reg[ll][0] = pool_max(row_pool_reg[ll], conv_ch_out.lane[ll]);

    // Max pooling among columns
    // with previous row-pooling results stored in shift-registers
    if(pool_size==3)
        col_pool_reg[ll] = pool_max(pool_reg[ll][1], pool_reg[ll][2]);
    else // pool_size==2
        col_pool_reg[ll] = pool_reg[ll][1];

    pool_final.lane[ll] = pool_max(col_pool_reg[ll], pool_reg[ll][0]);

    // Update line buffer
    line_buf_1[ll][line_buf_ptr] = line_buf_0[ll][line_buf_ptr];
    line_buf_0[ll][line_buf_ptr] = conv_ch_out.lane[ll];
}
```
Hi Prof. @doonny, can you explain how you make this work? Is this not square max pooling?
Can I use pool_size == 2? If pool_size == 2, line_buf_1 seems redundant.
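For other readers, the line-buffer part of this pattern can be sketched in plain C (this is my reading of the technique, not the author's kernel; `vmax3_row` and the fixed width `W` are illustrative). Each incoming row is reduced against the two buffered rows above it, giving the vertical max of a 3-row window while storing only two rows; with pool_size == 2 only `line_buf_0` would participate, which is why `line_buf_1` looks redundant in that case:

```c
#include <assert.h>

#define W 4  /* illustrative row width */

static int max2(int a, int b) { return a > b ? a : b; }

/* Streaming vertical max over the last 3 rows using two line buffers.
 * For each column: take the max of the new pixel and the two buffered
 * rows, then shift the buffers down by one row (oldest row discarded). */
void vmax3_row(const int *new_row, int *line_buf_0, int *line_buf_1, int *out) {
    for (int x = 0; x < W; x++) {
        out[x] = max2(new_row[x], max2(line_buf_0[x], line_buf_1[x]));
        line_buf_1[x] = line_buf_0[x];  /* row n-1 becomes row n-2 */
        line_buf_0[x] = new_row[x];     /* new row becomes row n-1 */
    }
}
```

The kernel's `pool_reg` shift registers do the analogous reduction horizontally, so the 2D max decomposes into a row pass and a column pass.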
Hi, how do I print
"The inference result is n02123045 tabby, tabby cat (the prob is 56.00)."
at the end, as shown in your documentation?
Thanks in advance..
Regards,
Ganda.
Hello Prof. Wang,
I'm trying to run the PipeCNN on Alpha Data 7v3 FPGA. [xilinx_adm-pcie-7v3_1ddr_3_0]
I'm using SDx 2017.2, and software emulation runs properly, giving correct results.
When the hardware is built, timing is not met on some paths and the tool reduces the clock to 170.3 MHz (from the original 200 MHz).
But when I run the generated binary conv.xclbin on the FPGA, it hangs at run time:
Executing Layer 1:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 27, 27, 96)
Launching kernel lrn with local size: 1, 1, 24 (global size: 27, 27, 24)
Could you please help me in figuring out the issue?
Thanks in advance
Regards
Hi,
I am trying to get this project running on a Cyclone V SEA5. The configuration you've used is V=8, L=8, GP_X=7, where GP_X is CONV_GP_SIZE_X. But I cannot find CONV_GP_SIZE_X anywhere in the code. Could you tell me where I can set this variable?
Thank you in advance!
Hi sir,
According to the description, Intel's OpenCL SDK v16.1 is used in this project.
We would like to test-run the program on the Terasic De1-SoC board,
but we found that only the BSP for Altera SDK OpenCL 16.0 is provided on the official webpage.
Is that the one you used in this project?
And does it work fine with Intel's OpenCL SDK v16.1?
Thank you
Hi, thank you for your project.
What kind of difference should there be between DTYPE and MACTYPE? In your example they are char (8 bit) and int (32 bit).
If I want to use short (16 bit) as DTYPE, which MACTYPE should I choose?
Long int (64 bit)? Or do I need to round and truncate immediately after every multiplication?
Integration error, problem implementing OCL region.
Is this something I am doing wrong?
full console output here.
Any particular reason for the choice of implementing LRN on the CPU rather than the FPGA (and gaining the FPGA's acceleration benefits)?
Hi Professor @doonny, I read your paper and I'm still confused about how MemRd works. Could you give some pointers for understanding this kernel (memrd)? Thank you in advance.
The current maxpool has a problem when the kernel is run with the following parameters:
- size_x and size_y are odd
- stride = 1
In that case, maxpool unloads less data than needed.
Q1:
How is data handled when it is not a multiple of VEC_SIZE?
Take AlexNet for example:
conv1 has 11x11x3 weights, and the depth of 3 cannot be divided by VEC_SIZE = 4.
So in the MAC operation, a1xb1+a2xb2+a3xb3+a4xb4, the last group would not have a3 and a4 -- are these automatically assigned 0?
Q2:
It also looks like data_vec reads bottom linearly in chunks of VEC_SIZE?
The PipeCNN paper describes the weights as being divided into groups of VEC_SIZE along the Z direction.
E.g., with 3x3x4 weights I should have VEC_SIZE (4) groups of weights along Z, each with 3x3x1 = 9 values:
0, 1, 2, 3, ... , 8
9, 10, 11, 12, ... 17
18, ...
27, ...
but in the code you group data into data_vec linearly:
{0, +1, +2, +3}, {+4, +5, +6, +7}, ..., +35
How does this divide the weights along the Z direction?
for(unsigned short win_itm_z=0; win_itm_z<weight_dim3/VEC_SIZE; win_itm_z++){
    for(unsigned char win_itm_y=0; win_itm_y<win_size_y; win_itm_y++){
        for(unsigned char win_itm_x=0; win_itm_x<win_size_x; win_itm_x++){
            feature_idx_dim1 = win_itm_x;
            feature_idx_dim2 = win_itm_y;
            feature_idx_dim3 = win_itm_z;
            if(xy is at correct location){ // i.e. (x, y) lies inside the un-padded feature map
                data_vec = bottom[data_offset*data_dim1xdim2 + feature_idx_dim3*data_dim1xdim2 + (feature_idx_dim2-padding)*data_dim1 + (feature_idx_dim1-padding)];
            }
            else{
                #pragma unroll
                for(unsigned char vv=0; vv<VEC_SIZE; vv++){
                    data_vec.data[vv] = CZERO;
                }
            }
            // start from using buffer[0]
            win_buffer[0][win_itm_z*win_size_y*win_size_x + win_itm_y*win_size_x + win_itm_x] = data_vec;
        }
    }
}
Hi Prof @doonny, I noticed the current version uses a fixed-point MAC. How does the resource utilization compare to a floating-point MAC?
What is the Pipe depth ?
#define PIPE_DEPTH 6
The default network in the code is AlexNet, and I have successfully built and run it on a Xilinx KCU1500 board. However, when I modified some code and attempted to change the network to VGG-16, the project built successfully but gave wrong results. I am a beginner in CNN and OpenCL, and I think I really need some guidance on how to change the code from the default AlexNet to VGG-16; I can't see any documentation on this in the readme or the user instructions.
This is the AlexNet result which seems correct:
***************************************************
PipeCNN: An OpenCL-Based FPGA Accelerator for CNNs
***************************************************
61063552 total weights read
154587 bytes image read
1024 total output reference read
Platform: Xilinx
Using 1 device(s)
Device 0: xilinx:kcu1500:4ddr-xpr:4.0
Device OpenCL Version: OpenCL 1.0
Device Max Compute Units: 1
Device Max WorkGroup Size: 4096
Device Max WorkItem Size: 4096
Device Global Memory Size: 16384 MBytes
Device Local Memory Size: 16 KBytes
Device Max Clock Freq: 500 Mhz
Loading kernel/binary from file conv.xclbin
WARNING: unaligned host pointer detected, this leads to extra memcpy
(the warning above is repeated 16 more times)
Executing Layer 1:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 27, 27, 96)
Launching kernel lrn with local size: 1, 1, 24 (global size: 27, 27, 24)
Executing Layer 2:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 256)
Launching kernel lrn with local size: 1, 1, 64 (global size: 13, 13, 64)
Executing Layer 3:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 384)
Executing Layer 4:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 13, 13, 384)
Executing Layer 5:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 6, 6, 256)
Executing Layer 6:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 4096)
Executing Layer 7:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 4096)
Executing Layer 8:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 1024)
Copyed all batched results from fc_2 buffers.
Done !!!
-------------------
Performance Summary
Total runtime: 1.050996s
Kernel runtime summary:
Layer-1:
MemRd: 59.144 ms
Conv : 58.641 ms
Pool : 58.187 ms
MemWr: 56.728 ms
Lrn : 381.921 ms
Layer-2:
MemRd: 81.765 ms
Conv : 81.385 ms
Pool : 80.876 ms
MemWr: 80.793 ms
Lrn : 336.314 ms
Layer-3:
MemRd: 18446744071709.031 ms
Conv : 51.617 ms
Pool : 0.000 ms
MemWr: 51.168 ms
Lrn : 0.000 ms
Layer-4:
MemRd: 18446744071656.164 ms
Conv : 18446744071656.809 ms
Pool : 0.000 ms
MemWr: 39.138 ms
Lrn : 0.000 ms
Layer-5:
MemRd: 26.615 ms
Conv : 26.061 ms
Pool : 26.632 ms
MemWr: 25.660 ms
Lrn : 0.000 ms
Layer-6:
MemRd: 27.584 ms
Conv : 27.147 ms
Pool : 0.000 ms
MemWr: 26.590 ms
Lrn : 0.000 ms
Layer-7:
MemRd: 18446744071562.098 ms
Conv : 18446744071561.719 ms
Pool : 0.000 ms
MemWr: 11.994 ms
Lrn : 0.000 ms
Layer-8:
MemRd: 3.911 ms
Conv : 3.548 ms
Pool : 0.000 ms
MemWr: 4.082 ms
Lrn : 0.000 ms
Total kernel runtime 36893488147419.102 ms
Batch size = 1, average process time per batch: 36893488147419.102 ms
Start verifying results ...
Selected item = 0 from the combined batch results in fc buffers
Check Pass !!!
The inference result is n02123045 tabby, tabby ca (the prob is 56.00)
And this is the vgg16 result which is obviously wrong:
***************************************************
PipeCNN: An OpenCL-Based FPGA Accelerator for CNNs
***************************************************
138455872 total weights read
150528 bytes image read
1024 total output reference read
Platform: Xilinx
Using 1 device(s)
Device 0: xilinx:kcu1500:4ddr-xpr:4.0
Device OpenCL Version: OpenCL 1.0
Device Max Compute Units: 1
Device Max WorkGroup Size: 4096
Device Max WorkItem Size: 4096
Device Global Memory Size: 16384 MBytes
Device Local Memory Size: 16 KBytes
Device Max Clock Freq: 500 Mhz
Loading kernel/binary from file conv.xclbin
WARNING: unaligned host pointer detected, this leads to extra memcpy
(the warning above is repeated 32 more times)
Executing Layer 1:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 224, 224, 64)
Executing Layer 2:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 112, 112, 64)
Executing Layer 3:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 112, 112, 128)
Executing Layer 4:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 56, 56, 128)
Executing Layer 5:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 56, 56, 256)
Executing Layer 6:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 56, 56, 256)
Executing Layer 7:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 28, 28, 256)
Executing Layer 8:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 28, 28, 512)
Executing Layer 9:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 28, 28, 512)
Executing Layer 10:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 14, 14, 512)
Executing Layer 11:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 14, 14, 512)
Executing Layer 12:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 14, 14, 512)
Executing Layer 13:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching single work-item kernel Pooling
Launching kernel MemWr with local size: 1, 1, 16 (global size: 7, 7, 512)
Executing Layer 14:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 4096)
Executing Layer 15:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 4096)
Executing Layer 16:
Launching single work-item kernel winbuffer
Launching single work-item kernel Conv
Launching kernel MemWr with local size: 1, 1, 16 (global size: 1, 1, 1024)
Copyed all batched results from fc_2 buffers.
Done !!!
-------------------
Performance Summary
Total runtime: 6.911966s
Kernel runtime summary:
Layer-1:
MemRd: 131.136 ms
Conv : 130.630 ms
Pool : 0.000 ms
MemWr: 128.416 ms
Lrn : 0.000 ms
Layer-2:
MemRd: 18446744067806.332 ms
Conv : 18446744067805.941 ms
Pool : 18446744067805.469 ms
MemWr: 840.861 ms
Lrn : 0.000 ms
Layer-3:
MemRd: 435.174 ms
Conv : 434.800 ms
Pool : 0.000 ms
MemWr: 435.343 ms
Lrn : 0.000 ms
Layer-4:
MemRd: 18446744066528.754 ms
Conv : 821.526 ms
Pool : 821.065 ms
MemWr: 820.978 ms
Lrn : 0.000 ms
Layer-5:
MemRd: 409.369 ms
Conv : 409.873 ms
Pool : 0.000 ms
MemWr: 409.000 ms
Lrn : 0.000 ms
Layer-6:
MemRd: 18446744065296.562 ms
Conv : 807.734 ms
Pool : 0.000 ms
MemWr: 807.327 ms
Lrn : 0.000 ms
Layer-7:
MemRd: 802.170 ms
Conv : 802.702 ms
Pool : 802.189 ms
MemWr: 801.713 ms
Lrn : 0.000 ms
Layer-8:
MemRd: 18446744063685.164 ms
Conv : 18446744063684.770 ms
Pool : 0.000 ms
MemWr: 388.292 ms
Lrn : 0.000 ms
Layer-9:
MemRd: 775.116 ms
Conv : 774.742 ms
Pool : 0.000 ms
MemWr: 775.259 ms
Lrn : 0.000 ms
Layer-10:
MemRd: 775.115 ms
Conv : 774.698 ms
Pool : 774.239 ms
MemWr: 774.151 ms
Lrn : 0.000 ms
Layer-11:
MemRd: 183.018 ms
Conv : 182.621 ms
Pool : 0.000 ms
MemWr: 182.134 ms
Lrn : 0.000 ms
Layer-12:
MemRd: 183.014 ms
Conv : 182.632 ms
Pool : 0.000 ms
MemWr: 182.164 ms
Lrn : 0.000 ms
Layer-13:
MemRd: 182.661 ms
Conv : 182.243 ms
Pool : 181.786 ms
MemWr: 181.401 ms
Lrn : 0.000 ms
Layer-14:
MemRd: 18446744061195.703 ms
Conv : 80.474 ms
Pool : 0.000 ms
MemWr: 86.381 ms
Lrn : 0.000 ms
Layer-15:
MemRd: 13.703 ms
Conv : 14.216 ms
Pool : 0.000 ms
MemWr: 13.308 ms
Lrn : 0.000 ms
Layer-16:
MemRd: 18446744061093.504 ms
Conv : 18446744061093.090 ms
Pool : 0.000 ms
MemWr: 3.383 ms
Lrn : 0.000 ms
Total kernel runtime 55340232221128.656 ms
Batch size = 1, average process time per batch: 55340232221128.656 ms
Start verifying results ...
Selected item = 0 from the combined batch results in fc buffers
Item=0 is wrong (result=-3.000000, golden_ref=-6.000000)
Item=1 is wrong (result=0.000000, golden_ref=3.000000)
Item=2 is wrong (result=-4.000000, golden_ref=-8.000000)
Item=3 is wrong (result=-4.000000, golden_ref=-9.000000)
Item=4 is wrong (result=-1.000000, golden_ref=-5.000000)
Item=5 is wrong (result=-4.000000, golden_ref=-1.000000)
Item=6 is wrong (result=-3.000000, golden_ref=-12.000000)
Item=7 is wrong (result=2.000000, golden_ref=7.000000)
Item=8 is wrong (result=-1.000000, golden_ref=10.000000)
Totally 946 Wrong Results
conv_pipe.cl depends heavily on Altera-specific extensions (write_channel_altera, read_channel_altera), so it cannot be compiled for use with Xilinx FPGAs.
I'm sorry, I was in too much of a hurry with this issue.
This problem occurs when padding (=1) is added on the right and bottom sides
in order to get a result with sizes size_x = 13, size_y = 13.
More information:
input: size_x = 13, size_y = 13, pool_size = 2, pool_stride = 1, depth = 512
result: size_x = 13, size_y = 13, depth = 512
Hi, I compiled the project with Intel OpenCL SDK for FPGA 17.1 and got the following error:
In file included from /home/wangjf/PipeCNN/project/__all_sources.cl:2:
PipeCNN/project/device/conv_pipe.cl:75:24: error: Channel support is not enabled
channel channel_vec data_ch __attribute__((depth(0)));
Anyone got an idea? Thanks in advance!!
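For what it's worth: in the Intel FPGA SDK 17.x toolchain, channel support has to be enabled explicitly. The usual fix (my understanding; worth verifying against the 17.1 programming guide rather than taking as the project's official answer) is to enable the Intel channels extension before any channel declarations:

```c
// at the top of conv_pipe.cl, before any channel declarations
#pragma OPENCL EXTENSION cl_intel_channels : enable

channel channel_vec data_ch __attribute__((depth(0)));
```

Depending on the SDK version, the read_channel_altera/write_channel_altera calls may also need to become read_channel_intel/write_channel_intel.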