spurslipu / yolov3v4-modelcompression-multidatasettraining-multibackbone
YOLO ModelCompression MultidatasetTraining
License: GNU General Public License v3.0
@SpursLipu
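Does this work for other detection models, such as SSD or YOLOv4? Or are there many points that need attention and have to be modified by hand?
Is there any difference between model compression for classification and for detection? The papers published so far all validate on classification models. Can the methods not be applied to both?
When I use anchor boxes clustered from my own data, why does mAP decrease? With the anchor sizes from the source code my best mAP is 0.95, but with anchor sizes clustered from my own data the best mAP is only about 0.7.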
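Anchors are commonly clustered with k-means on box widths and heights, using 1 - IoU as the distance (the approach popularized by YOLOv2). The NumPy sketch below follows that approach. It is not necessarily the clustering script this repo uses, and kmeans_anchors and wh_iou are hypothetical names. The widths and heights should be in pixels at the training img_size, because anchors clustered at one scale do not transfer to another.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes, treating boxes as co-centred."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_a = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_b + area_a - inter)

def kmeans_anchors(wh, k=9, iters=300):
    """Cluster (N, 2) box widths/heights into k anchors with distance = 1 - IoU."""
    wh = np.asarray(wh, dtype=float)
    order = np.argsort(wh.prod(axis=1))
    # Deterministic init: k boxes spread evenly over the area-sorted list.
    anchors = wh[order[np.linspace(0, len(wh) - 1, k).round().astype(int)]]
    for _ in range(iters):
        assign = wh_iou(wh, anchors).argmax(axis=1)  # highest IoU = smallest distance
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j) else anchors[j]
                        for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors.prod(axis=1))]  # small -> large, as in YOLO cfgs
```

In the standard yolov3.cfg, the nine sorted anchors are split across the heads by mask: 6,7,8 for the stride-32 head, 3,4,5 for stride 16, and 0,1,2 for stride 8.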
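First, thank you for this implementation!
I'd like to ask: in your experiment log you mentioned finding an error in the model structure. Has it been fixed in the latest version of the repo?
Hello, how do I prepare my own dataset and train on it? My end-of-term project is object detection in X-ray images, and I also want to learn about model compression. Thanks a lot.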
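YOLOv3-style repos derived from darknet/ultralytics usually expect one label text file per image. Each object gets one line: "class x_center y_center width height", with all four coordinates normalized by the image width and height. A .data file then lists classes, train, valid and names. This sketch assumes this repo follows the same convention. to_yolo_line is a hypothetical helper that converts a pixel corner box into such a line.

```python
def to_yolo_line(cls_id, box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) into a normalized YOLO label line."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w  # box centre, as a fraction of image width
    yc = (y1 + y2) / 2 / img_h  # box centre, as a fraction of image height
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

print(to_yolo_line(0, (100, 50, 300, 250), 400, 500))
```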
Traceback (most recent call last):
File "F:/Code/YOLOv3-ModelCompression-MultidatasetTraining-Multibackbone-master/train.py", line 516, in <module>
train(hyp) # train normally
File "F:/Code/YOLOv3-ModelCompression-MultidatasetTraining-Multibackbone-master/train.py", line 151, in train
load_darknet_weights(model, weights, pt=opt.pt)
File "F:\Code\YOLOv3-ModelCompression-MultidatasetTraining-Multibackbone-master\models.py", line 566, in load_darknet_weights
assert ptr == len(weights)
AssertionError
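Hello, could you post links to the reference papers for the pruning methods in the README? I'm a beginner working with model compression for the first time and don't know which papers the pruning code corresponds to. Also, when running sparsity training for pruning, I looked at the code and it doesn't seem to generate a new cfg file. Is the original yolov3.cfg still used? Finally, what value do you usually use for the percent parameter? I'd like to use it as a reference. Thanks a lot.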
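Channel-pruning repos of this kind commonly follow Network Slimming (Liu et al., 2017). In that method, percent is the fraction of all BN channels to remove. The |gamma| scaling factors from every prunable layer are pooled and sorted, and the value at that quantile becomes the global threshold. The normal_prune log further down ("Threshold should be less than 0.9868", "Channels with Gamma value less than 0.0965 are pruned!") suggests the repo also bounds the threshold by the smallest per-layer maximum so that no layer is emptied. The sketch below reproduces that logic under those assumptions. It is not the repo's prune_utils code, and capping the threshold, rather than raising an error, is an illustrative choice.

```python
import numpy as np

def gamma_threshold(gammas_per_layer, percent):
    """Global |gamma| threshold that prunes roughly `percent` of all BN channels.

    gammas_per_layer: list of 1-D arrays, one per prunable BN layer.
    Returns (threshold, highest): highest is the smallest per-layer max |gamma|;
    the threshold is capped there so every layer keeps at least one channel.
    """
    all_g = np.sort(np.concatenate([np.abs(g) for g in gammas_per_layer]))
    idx = min(int(len(all_g) * percent), len(all_g) - 1)
    highest = min(np.abs(g).max() for g in gammas_per_layer)
    return min(all_g[idx], highest), highest

def keep_masks(gammas_per_layer, threshold):
    """Boolean mask per layer: True = channel survives pruning."""
    return [np.abs(g) >= threshold for g in gammas_per_layer]
```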
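In the command for training yolov3-mobilenet, python3 train.py --data data/coco2017.data --batch-size 32 --accumulate 1 -pt --weights weights/yolov3-mobilenet.weights --cfg cfg/yolov3tiny-mobilenet/yolov3tiny-mobilenet-small-coco.cfg --img_size 608, why is the cfg a yolov3tiny-mobilenet one?
Hello, I previously trained my project with your senior labmate's code. But when I load the best.pt saved after training to run prediction on a video, the frames are resized to 256x416: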
video 1/1 (1452/67673) /home/yhy/YOLOv3_pruning/ch03.mp4: 256x416 Done.
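Images are also resized to 256x416 when detecting on still images, and the detection results are very poor, even though the mAP during training looked good, around 70%.
Your senior labmate's project no longer seems to be maintained. Did you ever see images being resized to 256x416 in his project?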
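A 256x416 network input is what minimum-rectangle letterboxing produces for a 16:9 frame at img_size 416. The resize on its own may therefore be expected behaviour rather than the cause of the poor detections. The sketch assumes ultralytics-style letterboxing: scale the longer side to img_size while keeping the aspect ratio, then pad the shorter side only up to the next multiple of a stride, taken here as 32. letterbox_shape is a hypothetical helper, and 1920x1080 is only an example resolution.

```python
import math

def letterbox_shape(img_h, img_w, img_size=416, stride=32):
    """Return the (height, width) a minimum-rectangle letterbox would feed the network.

    The longer side is scaled to img_size; the shorter side is scaled by the same
    ratio and then padded only up to the next multiple of `stride`.
    """
    r = min(img_size / img_h, img_size / img_w)
    new_h, new_w = round(img_h * r), round(img_w * r)
    return math.ceil(new_h / stride) * stride, math.ceil(new_w / stride) * stride

print(letterbox_shape(1080, 1920))  # (256, 416)
```

Here 1080 × 416/1920 = 234, and padding 234 up to the next multiple of 32 gives 256.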
I couldn't find the cause, so I switched to your project, but training fails with this error:
Traceback (most recent call last):
File "train.py", line 497, in <module>
train(hyp) # train normally
File "train.py", line 387, in train
dataloader=testloader)
File "/home/yhy/pruning/test.py", line 74, in test
_ = model(torch.zeros((1, 3, img_size, img_size), device=device)) if device.type != 'cpu' else None # run once
File "/home/yhy/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yhy/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 449, in forward
outputs = self.parallel_apply(self._module_copies[:len(inputs)], inputs, kwargs)
File "/home/yhy/.local/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 474, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/yhy/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/yhy/.local/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
AttributeError: Caught AttributeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/yhy/.local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/yhy/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yhy/pruning/models.py", line 306, in forward
return self.forward_once(x)
File "/home/yhy/pruning/models.py", line 358, in forward_once
yolo_out.append(module(x, out))
File "/home/yhy/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/yhy/pruning/models.py", line 279, in forward
io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid # xy
File "/home/yhy/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
type(self).__name__, name))
AttributeError: 'YOLOLayer' object has no attribute 'grid'
This error does not occur in your senior labmate's project or in ultralytics/yolov3.
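In ultralytics-derived YOLOLayer code, self.grid holds each cell's (x, y) offset, which the failing line adds to sigmoid(tx, ty). It is typically created lazily by a grid-building method rather than in __init__. An AttributeError like this one therefore means forward reached that line before the grid was ever set on this module instance. The traceback shows the call running inside DistributedDataParallel replicas, which may be related, but that is an assumption. Separately, the later RuntimeError ("size of tensor a (32) must match the size of tensor b (20) at non-singleton dimension 3") is consistent with a grid built for a 20-wide feature map being applied to a 32-wide one. The NumPy sketch below only shows what the grid is, and it rebuilds the grid whenever it is missing or the wrong size. make_grid and decode_xy are hypothetical names, not this repo's API.

```python
import numpy as np

def make_grid(nx, ny):
    """Cell offsets of shape (1, 1, ny, nx, 2), holding (x, y) = (column, row)."""
    yv, xv = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    return np.stack((xv, yv), axis=-1).reshape(1, 1, ny, nx, 2).astype(np.float32)

def decode_xy(p, stride, grid=None):
    """p: raw head output (bs, na, ny, nx, no). Returns box centres in input pixels."""
    ny, nx = p.shape[2:4]
    if grid is None or grid.shape[2:4] != (ny, nx):  # rebuild if missing or stale
        grid = make_grid(nx, ny)
    xy = 1.0 / (1.0 + np.exp(-p[..., :2])) + grid  # sigmoid(tx, ty) + cell offset
    return xy * stride
```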
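Could you provide trained weight files for the yolov3-mobilenetv3 series?
I trained the network on my own dataset. I changed classes in yolo_mobilenet.cfg and the filters before each yolo layer, and got a dimension-mismatch error at the cat (concatenation).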
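In darknet-style cfgs, the conv directly before each [yolo] layer must have filters = (classes + 5) × (number of anchors in that layer's mask). That is 3 anchors in the standard yolov3 cfgs, so 18 filters for one class. If those values are right and a cat still fails, another possible cause is an input size that is not a multiple of the total stride (32). Assuming 3x3 stride-2 convs with padding 1, each downsampling rounds up, so a ×2 upsample no longer lines up with its skip connection. Both checks are written as small hypothetical helpers:

```python
def yolo_head_filters(num_classes, anchors_per_head=3):
    """Channels of the conv feeding a [yolo] layer:
    per anchor, 4 box offsets + 1 objectness + num_classes class scores."""
    return (num_classes + 5) * anchors_per_head

def downsampled_sizes(img_size, n_stride2=5):
    """Feature-map sizes after each 3x3, stride-2, padding-1 conv: ceil(n / 2)."""
    sizes, n = [], img_size
    for _ in range(n_stride2):
        n = (n + 1) // 2
        sizes.append(n)
    return sizes

print(yolo_head_filters(1))    # 18
print(downsampled_sizes(608))  # [304, 152, 76, 38, 19]: each x2 upsample matches
print(downsampled_sizes(600))  # [300, 150, 75, 38, 19]: 38 x2 = 76 != 75
```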
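1. In MobileNetV3, the original authors do not use ReLU6 and linear in the bneck. ReLU6 and linear were used in V2; V3 uses ReLU and h-swish.
2. In yolov3tiny-mobilenet-small-coco.cfg, every depthwise and pointwise convolution is followed by an activation. How was this arrived at? I didn't see this in the paper.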
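For reference, the MobileNetV3 paper defines h-swish(x) = x · ReLU6(x + 3) / 6 and h-sigmoid(x) = ReLU6(x + 3) / 6, which are piecewise-linear approximations of swish and sigmoid (h-sigmoid gates the squeeze-and-excite blocks). The NumPy sketch below implements both. It assumes nothing about how this repo's cfg encodes them.

```python
import numpy as np

def relu6(x):
    return np.clip(x, 0.0, 6.0)

def h_sigmoid(x):
    """Piecewise-linear sigmoid approximation used in MobileNetV3."""
    return relu6(x + 3.0) / 6.0

def h_swish(x):
    """x * h_sigmoid(x): 0 for x <= -3, identity for x >= 3, quadratic in between."""
    return x * h_sigmoid(x)
```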
The error is as follows: chenjunsong@chenjunsong-GJ5CN64:~/U-YOLOv3$ python3 normal_prune.py --data data/obj.data --cfg cfg/yolov2/yolo-obj.cfg --weights weights/best.pt --percent 0.1
Namespace(cfg='cfg/yolov2/yolo-obj.cfg', data='data/obj.data', img_size=608, percent=0.1, weights='weights/best.pt')
Model Summary: 120 layers, 4.07562e+07 parameters, 4.07562e+07 gradients
Caching labels (500 found, 0 missing, 0 empty, 0 duplicate, for 500 images): 100%|█████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:00<00:00, 10877.12it/s]
Class Images Targets P R mAP@0.5 F1: 100%|█████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:28<00:00, 1.12it/s]
all 500 2.09e+03 0.57 0.00759 0.135 0.015
Threshold should be less than 0.9868.
The corresponding prune ratio is 0.987.
Channels with Gamma value less than 0.0965 are pruned!
Number of channels has been reduced from 9888 to 8900
Prune ratio: 0.100
layer index: 0 total channel: 32 remaining channel: 30
layer index: 2 total channel: 64 remaining channel: 59
layer index: 5 total channel: 64 remaining channel: 57
layer index: 10 total channel: 128 remaining channel: 115
layer index: 15 total channel: 256 remaining channel: 224
layer index: 18 total channel: 256 remaining channel: 232
layer index: 23 total channel: 512 remaining channel: 470
layer index: 26 total channel: 512 remaining channel: 471
layer index: 29 total channel: 512 remaining channel: 461
layer index: 30 total channel: 1024 remaining channel: 910
layer index: 31 total channel: 512 remaining channel: 453
layer index: 32 total channel: 1024 remaining channel: 923
layer index: 33 total channel: 512 remaining channel: 461
layer index: 34 total channel: 1024 remaining channel: 933
layer index: 41 total channel: 256 remaining channel: 234
layer index: 42 total channel: 512 remaining channel: 457
layer index: 43 total channel: 256 remaining channel: 229
layer index: 44 total channel: 512 remaining channel: 462
layer index: 45 total channel: 256 remaining channel: 229
layer index: 46 total channel: 512 remaining channel: 459
layer index: 53 total channel: 128 remaining channel: 105
layer index: 54 total channel: 256 remaining channel: 238
layer index: 55 total channel: 128 remaining channel: 118
layer index: 56 total channel: 256 remaining channel: 232
layer index: 57 total channel: 128 remaining channel: 113
layer index: 58 total channel: 256 remaining channel: 225
Prune channels: 988 Prune ratio: 0.063
Caching labels (500 found, 0 missing, 0 empty, 0 duplicate, for 500 images): 100%|█████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:00<00:00, 11243.52it/s]
Class Images Targets P R mAP@0.5 F1: 100%|█████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:27<00:00, 1.15it/s]
all 500 2.09e+03 0.584 0.0067 0.134 0.0133
after prune_model_keep_size map is 0.13384984434263122
Model Summary: 120 layers, 3.55866e+07 parameters, 3.55866e+07 gradients
Traceback (most recent call last):
File "normal_prune.py", line 183, in <module>
init_weights_from_loose_model(compact_model, pruned_model, CBL_idx, Other_idx, CBLidx2mask)
File "/home/chenjunsong/U-YOLOv3/utils/prune_utils.py", line 194, in init_weights_from_loose_model
input_mask = get_input_mask(loose_model.module_defs, idx, CBLidx2mask)
File "/home/chenjunsong/U-YOLOv3/utils/prune_utils.py", line 154, in get_input_mask
return CBLidx2mask[idx - 2]
KeyError: 7
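When training YOLOv4 you used yolov4.weights. Why not train from the pretrained weights yolov4.conv.137?
I ran python3 train.py --data data/obj.data --batch-size 4 --cfg cfg/yolov3-mobilenet/yolov3-mobilenet.cfg --weights weights/best.weights --quantized 3 but got no quantization effect. Is there something wrong with my command?
Hello, for knowledge distillation I used best.pt from my trained yolov3 as the teacher model and last.pt from my trained yolov3-tiny as the student model. The error occurs when the following statement is evaluated: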
loss_st = criterion_st(nn.functional.log_softmax(output_s / T, dim=1), nn.functional.softmax(output_t / T, dim=1)) * (T * T) / batch_size
Error message: RuntimeError: The size of tensor a (45486) must match the size of tensor b (10830) at non-singleton dimension 0
I suspect this is because tiny has only two YOLO layers, while yolov3 has three.
The distillation command is: python train.py --data cfg/xray.data --batch_size 2 --KDstr 1 --weights weights/last.pt --cfg cfg/yolov3tiny/yolov3-tiny.cfg --img_size 608 --epochs 80 --quantized 1 --qlayers 72 --t_cfg cfg/yolov3/yolov3.cfg --t_weights yolov3-result/yolov3/best.pt
How can I solve this? I look forward to your reply. Thank you very much!
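The two sizes in the error match the guess about the number of YOLO layers exactly. The arithmetic assumes 3 anchors per head, strides 32/16/8 for yolov3 versus 32/16 for yolov3-tiny, --img_size 608 and --batch_size 2. Under those assumptions the flattened prediction counts are 45486 and 10830. The teacher and student outputs therefore cannot be compared element-wise as-is. One option is to match them per head, or to restrict the loss to heads present in both models. num_predictions is a hypothetical helper that only does the arithmetic.

```python
def num_predictions(img_size, strides, anchors_per_head=3, batch_size=1):
    """Total anchor predictions across all YOLO heads for a square input."""
    return batch_size * sum(anchors_per_head * (img_size // s) ** 2 for s in strides)

teacher = num_predictions(608, [32, 16, 8], batch_size=2)  # yolov3: 3 heads
student = num_predictions(608, [32, 16], batch_size=2)     # yolov3-tiny: 2 heads
print(teacher, student)  # 45486 10830
```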
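Thanks to the author for open-sourcing this. Regarding faster inference: can the quantization and distillation in this project speed up inference?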
The specific code:
    elif mdef['type'] == 'yolo':
        yolo_index += 1
        stride = [32, 16, 8, 4]  # P5, P4, P3 strides
        if 'panet' in cfg or 'yolov4' in cfg:  # stride order reversed
            stride = list(reversed(stride))
        layers = mdef['from'] if 'from' in mdef else []
        modules = YOLOLayer(anchors=mdef['anchors'][mdef['mask']],  # anchor list
                            nc=mdef['classes'],  # number of classes
                            img_size=img_size,  # (416, 416)
                            yolo_index=yolo_index,  # 0, 1, 2...
                            layers=layers,  # output layers
                            stride=stride[yolo_index])
Is it enough to just add 4 to the list, giving stride = [32, 16, 8, 4], or does anything else need to change?
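Adding 4 lets a fourth head index the list, but it changes what the reversal branch does. list(reversed([32, 16, 8, 4])) is [4, 8, 16, 32], so a 3-head panet/yolov4 cfg would be assigned strides 4, 8, 16 instead of 8, 16, 32. Each head's stride scales its anchors and decoded boxes, so a wrong stride distorts boxes and may also surface as a shape mismatch. The IndexError: list index out of range raised at stride=stride[yolo_index] further down is consistent with a cfg that has more heads than the list has entries. The sketch below picks strides from the number of heads. It assumes yolov3-style cfgs list heads from coarse to fine (P5 first) and panet/yolov4 cfgs from fine to coarse. head_strides is a hypothetical helper, not repo code.

```python
def head_strides(num_heads, fine_to_coarse=False, all_strides=(32, 16, 8, 4)):
    """Strides for each YOLO head in cfg order.

    Takes the coarsest `num_heads` strides (P5, P4, P3, P2, ...) and reverses them
    only for cfgs whose heads run from fine to coarse (e.g. PANet / YOLOv4).
    """
    if num_heads > len(all_strides):
        raise ValueError(f"no stride defined for {num_heads} heads")
    strides = list(all_strides[:num_heads])
    return strides[::-1] if fine_to_coarse else strides
```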
imgsz_min, imgsz_max, imgsz_test = opt.img_size # img sizes (min, max, test)
gs = 64
assert math.fmod(imgsz_min, gs) == 0, '--img-size %g must be a %g-multiple' % (imgsz_min, gs)
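Hello, when I ran your latest version this afternoon, my img_size was 608, so imgsz_min was also 608, and this assert failed. I don't quite understand why the assert is there. I understood the grid size to be the size of the feature maps output by the three YOLO layers, so why is the grid size set directly to 64 here?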
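608 is a multiple of 32 but not of 64 (608 = 9 × 64 + 32), which is why the assert fires. The code shown does not say why gs is 64 rather than the largest head stride (32 for yolov3). Either way, a size can be snapped to a valid multiple before training. round_to_multiple is a hypothetical helper:

```python
import math

def round_to_multiple(size, gs=64, up=True):
    """Snap an image size to a multiple of the grid size gs (rounding up by default)."""
    f = math.ceil if up else math.floor
    return int(f(size / gs) * gs)

print(608 % 64, round_to_multiple(608), round_to_multiple(608, up=False))  # 32 640 576
```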
0/499 6.3G 2.79 0.299 0.201 3.29 4 1.02e+03: 100%|█| 110/110 [02:37<00:00, 1.43s/it]
Traceback (most recent call last):
File "train.py", line 497, in <module>
train(hyp) # train normally
File "train.py", line 387, in train
dataloader=testloader)
File "D:\2020\prune0513\YOLOv3-ModelCompression-MultidatasetTraining-Multibackbone\test.py", line 74, in test
_ = model(torch.zeros((1, 3, img_size, img_size), device=device)) if device.type != 'cpu' else None # run once
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "D:\2020\prune0513\YOLOv3-ModelCompression-MultidatasetTraining-Multibackbone\models.py", line 306, in forward
return self.forward_once(x)
File "D:\2020\prune0513\YOLOv3-ModelCompression-MultidatasetTraining-Multibackbone\models.py", line 358, in forward_once
yolo_out.append(module(x, out))
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "D:\2020\prune0513\YOLOv3-ModelCompression-MultidatasetTraining-Multibackbone\models.py", line 279, in forward
io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid # xy
RuntimeError: The size of tensor a (32) must match the size of tensor b (20) at non-singleton dimension 3
I also tried adding one more YOLO detection layer after yolov3 and still got the same error.
The error is as follows: chenjunsong@chenjunsong-GJ5CN64:~/U-YOLOv3$ python3 train.py --data data/obj.data --batch-size 6 --cfg cfg/yolov3-mobilenet/yolov3-mobilenet1.cfg
Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex
Namespace(KDstr=-1, accumulate=2, adam=False, batch_size=6, bucket='', cache_images=False, cfg='./cfg/yolov3-mobilenet/yolov3-mobilenet1.cfg', data='data/obj.data', device='', epochs=601, evolve=False, img_size=[320, 640], multi_scale=False, name='', nosave=False, notest=False, prune=-1, pt=False, qlayers=-1, quantized=-1, rect=False, resume=False, s=0.001, single_cls=False, sr=False, t_cfg='', t_weights='', weights='')
Using CUDA device0 _CudaDeviceProperties(name='GeForce GTX 1060', total_memory=6072MB)
Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/
Traceback (most recent call last):
File "train.py", line 512, in <module>
train(hyp) # train normally
File "train.py", line 94, in train
model = Darknet(cfg, quantized=opt.quantized, qlayers=opt.qlayers).to(device)
File "/home/chenjunsong/U-YOLOv3/models.py", line 364, in __init__
qlayers=self.qlayers)
File "/home/chenjunsong/U-YOLOv3/models.py", line 233, in create_modules
stride=stride[yolo_index])
IndexError: list index out of range
The model can be pruned, the pruned model is smaller, and inference really is faster, but accuracy after pruning is 0. Even with percent 0.1 the accuracy is still 0.
For distillation I used the following command: --data cfg/xray.data --batch_size 2 --KDstr 1 --weights weights/yolov3-mobilenet.weights --cfg cfg/yolov3-mobilenet/yolov3-mobilenet-coco.cfg --img_size 608 --epochs 80 --quantized 1 --qlayers 72 --t_cfg cfg/yolov3/yolov3.cfg --t_weights weights/best.pt
Here best.pt is the best weight file from my yolov3 training. Why does an AssertionError occur? It is raised at the statement assert ptr==len(weights) on line 544 of models.py.
Yet running yolov3 produced no error. The command for yolov3 was: --data data/xray.data --batch_size 2 --accumulate 1 -pt --weights weights/yolov3-608.weights --cfg cfg/yolov3/yolov3.cfg --img_size 608 --epochs 60
I look forward to your reply, thanks!
When I run normal prune, an IndexError: index 0 is out of range occurs at the statement next_conv = pruned_model.module_list[next_idx][0] in the function prune_model_keep_size in prune_utils.py.
command:
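Normal pruning with a prune ratio of 0.8: when loading the pruned model for fine-tuning, an error reports that GPU memory is exhausted, as shown in the figure below.
@SpursLipu Could you please help check whether the training error is caused by the network structure after pruning? Thanks.
Is this problem caused by the environment versions?
Hello, I don't quite understand the DoReFa quantization part of your code, and I found the DoReFa paper rather confusing. Could you roughly explain the quantization idea behind the code below?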
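The code the question refers to is not included here, so the following only restates the formulas from the DoReFa-Net paper (Zhou et al., 2016), not this repo's implementation. For k bits, quantize_k(r) = round((2^k - 1) · r) / (2^k - 1) maps r in [0, 1] onto 2^k evenly spaced levels. Weights (k ≥ 2) are first squashed with tanh, then rescaled into [0, 1] by the largest |tanh(w)|, quantized, and mapped back to [-1, 1]. Activations are clipped to [0, 1] and quantized directly. During training, the rounding is bypassed in the backward pass with a straight-through estimator, which this forward-only NumPy sketch does not model.

```python
import numpy as np

def quantize_k(r, k):
    """Uniformly quantize r in [0, 1] to 2**k levels (forward pass only)."""
    n = 2 ** k - 1
    return np.round(r * n) / n

def dorefa_weights(w, k):
    """DoReFa-Net weight quantization for k >= 2 bits; output lies in [-1, 1]."""
    t = np.tanh(w)
    r = t / (2 * np.max(np.abs(t))) + 0.5  # rescale into [0, 1]
    return 2 * quantize_k(r, k) - 1

def dorefa_activations(a, k):
    """Clip activations to [0, 1], then quantize to k bits."""
    return quantize_k(np.clip(a, 0.0, 1.0), k)
```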
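After quantization training with this method, I want to convert the model to int8 and deploy it on an FPGA, but I'm not sure how to convert the saved fp32 model to int8.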
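How to get from the saved fp32 checkpoint to int8 depends on the format the FPGA toolchain expects, which is not specified here. As a generic illustration only, the sketch assumes symmetric per-tensor quantization, a common scheme but not necessarily what this repo's quantized training targets. It picks a scale so that the largest |w| maps to 127, rounds, and stores int8 values plus one fp32 scale per tensor. This also shows why the saved fp32 file does not shrink on its own: the roughly 4x saving appears only once weights are actually stored as int8.

```python
import numpy as np

def to_int8_symmetric(w):
    """Per-tensor symmetric quantization: w ~= q * scale with q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate fp32 weights."""
    return q.astype(np.float32) * scale
```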
AttributeError: 'YOLOLayer' object has no attribute 'grid'
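Hello, the yolov3-tiny and yolov3-mobilenet weights can't be downloaded from Baidu Cloud. Could you share them again?
Hello, I ran sparsity training and pruning on yolov3 (best.pt) with percent 0.5 and obtained prune_0.5_yolov3.weights. When I then ran distillation with best.pt and prune_0.5_yolov3.weights, the mAP rose in a strange way, described below:
From epoch 0 to epoch 4 the mAP went 0.332 --> 0.528 --> 0.645 --> 0.706 --> 0.801, a very fast rise, but from epoch 5 all the way to epoch 100 the mAP stayed within (0.802, 0.83), as if distillation had finished within the first few epochs. Could you explain this? I don't quite understand it. Thank you very much!
Roughly how much improvement does it give on each dataset? And what is the compression ratio of the student network?
If I make small changes to the yolov3-mobilenet architecture and want to use your yolov3-mobilenet pretrained model, do I need to change the command line, or do I need to modify train.py?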
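Hello, I used your weights trained on the DIOR dataset, but detection of vehicles in top-down views is poor and does not match the results shown in your README.md. I tested with a screenshot, which I have put at the link below:
Link: https://pan.baidu.com/s/1AYaojpyz8btcHpQ7MngXTg Extraction code: n5hf
I only have one object class. Can I use the 80-class COCO yolov3-mobilenet.weights pretrained model you provide in the project? Using this pretrained model gives an error. Do I need to change something?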
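With 1 class instead of 80, only the layers feeding the [yolo] heads change shape: 255 versus 18 output channels at 3 anchors per head. The same applies when the architecture has been edited, as in the earlier question about modifying yolov3-mobilenet. A darknet .weights file is a flat array read in cfg order, so any shape mismatch misaligns the rest of the read. A PyTorch-style state dict keyed by parameter name lets you copy only what matches. A common pattern is to keep entries whose name and shape both match and leave the rest at their initialization. With real checkpoints, the same filter is typically applied before calling load_state_dict(..., strict=False). Whether this repo exposes such an option is not shown here. The sketch uses plain dicts of NumPy arrays so it is self-contained, and the key names are hypothetical.

```python
import numpy as np

def filter_matching(pretrained, model_state):
    """Keep pretrained entries whose key exists in model_state with the same shape.

    Both arguments map parameter names to arrays (anything with a .shape).
    Returns (kept, skipped) so you can see which layers were left untouched.
    """
    kept, skipped = {}, []
    for name, value in pretrained.items():
        if name in model_state and model_state[name].shape == value.shape:
            kept[name] = value
        else:
            skipped.append(name)
    return kept, skipped

# Toy example: an 80-class head (255 channels) vs a 1-class head (18 channels).
pretrained = {"backbone.conv.weight": np.zeros((32, 3, 3, 3)),
              "head.conv.weight": np.zeros((255, 1024, 1, 1))}
model_state = {"backbone.conv.weight": np.ones((32, 3, 3, 3)),
               "head.conv.weight": np.ones((18, 1024, 1, 1))}
kept, skipped = filter_matching(pretrained, model_state)
print(sorted(kept), skipped)  # ['backbone.conv.weight'] ['head.conv.weight']
```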
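Hello, about my earlier question of why the model size doesn't change after quantization: you replied that it only becomes smaller in actual deployment. How should I do the actual deployment? The deadline for my end-of-term course project is approaching. I look forward to your reply, thanks a lot!
Why does last.pt take up so much less space than the other saved models? For knowledge distillation, the teacher model is best.pt and the student model is last.pt. In normal training over 60 epochs, the mAP of last.pt and best.pt differs very little, so why are the teacher and student models set up this way for distillation?
After pruning, mAP dropped from 0.84 to 0.035. Why such a large drop? The prune ratio is 0.2 and the --percent argument is 0.2. I look forward to your reply, thanks a lot.
Does this repo support YOLOv4?
Hello, pruning has already reduced my model to 10% of its original size. Will quantization still make a big difference?
Hello, could you share the YOLO weights trained on BDD100K? The Baidu Cloud link has expired.
During training the test accuracy reached 91, but when I tested the final saved best.pt, its accuracy was only 88%.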
python3 train.py --data data/coco2017.data --batch-size 32 --accumulate 1 -pt --weights weights/yolov3-608.weights --cfg cfg/yolov3/yolov3.cfg --img_size 608
When I train yolov3 with the command above, I get an AssertionError from the statement assert ptr == len(weights) at line 544 of models.py. While debugging, I found that weights are only assigned when mdef['type'] == 'convolutional' (line 456 of models.py). How can I solve this? Thanks!
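assert ptr == len(weights) checks that reading the .weights file consumed every stored float. The darknet format, as read by ultralytics-style load_darknet_weights, starts with a short header (version numbers and an images-seen counter). Each convolutional layer then stores its values in cfg order. A layer with BN stores bias, scale, running mean and running variance (out_channels each), followed by the kernel. A layer without BN stores a conv bias followed by the kernel. The route, shortcut, upsample, maxpool and yolo layers used by these cfgs store nothing, which is why only 'convolutional' entries advance ptr. The assertion therefore fails whenever the cfg's conv shapes differ from those the file was saved with, for example a different classes/filters value. The per-layer arithmetic is shown below as a hypothetical helper.

```python
def conv_float_count(in_ch, out_ch, k, batch_norm=True):
    """Number of float32 values a darknet conv layer occupies in a .weights file."""
    kernel = out_ch * in_ch * k * k
    # BN conv: bias, scale, running mean, running var (out_ch each); else just a bias.
    return kernel + (4 * out_ch if batch_norm else out_ch)

# The 1x1 conv feeding the first yolov3 head (1024 input channels, no BN):
print(conv_float_count(1024, 255, 1, batch_norm=False))  # 80 classes -> 261375
print(conv_float_count(1024, 18, 1, batch_norm=False))   # 1 class   -> 18450
```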