yatenglg / retinanet-pytorch

RetinaNet object detection in PyTorch: simple, clear, easy to use, fully commented in Chinese, with single-machine multi-GPU training and video detection.

Python 100.00%
pytorch retinanet object-detection

retinanet-pytorch's Introduction

A quick guide to GitHub:

1. Fork the project to copy it to your own account.

2. Star it to keep track of updates.

3. Watch it to receive email notifications.


Retinanet-Pytorch

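A PyTorch implementation of the RetinaNet object detection algorithm.

This project is not an exact reproduction of the paper: many parameters and implementation details differ from the original. Questions are welcome in the issues.

For various reasons, training has been tested but never run to completion, so no pretrained model is uploaded.

The code itself has been verified, but you will need to adapt it for your own use. It is not recommended for beginners.


The project's architecture is similar to that of SSD-Pytorch.

Much of the SSD-Pytorch code is reused, including the trainer and the tester.


Single-machine multi-GPU support is implemented with torch.nn.DataParallel, which wraps every single-machine setup in the same way. Training and evaluation support a single GPU, multiple GPUs on one machine, and explicitly selected GPUs, but not multi-machine setups or the CPU. Detection is not tied to a device and runs on either CPU or GPU.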


Requirements

  1. pytorch
  2. opencv-python
  3. torchvision >= 0.3.0
  4. Vizer
  5. visdom

(All of these can be installed with pip.)

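Project structure

Folder   File                          Description
Data                                   Data handling
         Dataloader                    Data loader classes 'Our_Dataloader' and 'Our_Dataloader_test'
         Dataset_VOC                   Dataset class for VOC-format data
         Transfroms                    Data transforms
         Transfroms_utils              Helper methods for the transforms
Model                                  Model code
         base_models/Resnet            Supports resnet18, 34, 50, 101, 152
         structs/Anchors               Default anchor box generator for RetinaNet
         structs/MultiBoxLoss          Loss function
         structs/Focal_Loss            Focal loss function
         structs/Fpn                   Feature pyramid network
         structs/PostProcess           Post-processing
         structs/Predictor             Classification and regression subnets
         evaler                        Evaluator: validates (tests) the model on a dataset and computes AP and mAP
         retainnet                     RetinaNet model class
         trainer                       Trainer: trains the model on a dataset
Utils                                  Utilities
         boxs_op                       Box operations: encoding and decoding, IoU computation, box format conversion, and so on
Weights                                Model weights
         pretrained                    Pretrained weights. The model was never trained to completion, so no trained model is uploaded, but the training process has been verified
         trained                       Default location for models saved during training
----     Configs.py                    Configuration file with every parameter for the model definition, data, training, and testing. Back it up before editing
----     Demo_train.py                 Training example; models saved during training go to Weights/Our/
----     Demo_eval.py                  Evaluation example; computes the model's AP and mAP
----     Demo_detect_one_image.py      Example of detecting a single image
----     Demo_detect_video.py          Video detection example: pass in a video and run detection on it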
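
The box operations listed for Utils/boxs_op (IoU computation and box format conversion) can be illustrated with a minimal pure-Python sketch. The function names below are hypothetical and are not taken from the project; the project's own versions presumably operate on tensors.

```python
def center_to_corner(box):
    """Convert a (cx, cy, w, h) box to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def corner_to_center(box):
    """Convert an (x1, y1, x2, y2) box to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```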

Demo

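The project includes code for training, evaluation, and detection. All demos have been tested and can be run as-is.

Training (train)

As in the companion SSD project, "SSD object detection (Single Shot MultiBox Detector) for single-machine multi-GPU setups", training is visualized with visdom. Install and start visdom before running.

Likewise, training supports only a single GPU or multiple GPUs on one machine; CPU training is not supported.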

# -*- coding: utf-8 -*-
# @Author  : LG

from Model import RetainNet, Trainer
from Data import vocdataset
from Configs import _C as cfg
from Data import transfrom, targettransform


# Training dataset in VOC format; samples are listed in ImageSets/Main/train.txt
train_dataset = vocdataset(cfg, is_train=True, transform=transfrom(cfg, is_train=True),
                           target_transform=targettransform(cfg))

# Test dataset in VOC format; samples are listed in ImageSets/Main/eval.txt
test_dataset = vocdataset(cfg=cfg, is_train=False,
                          transform=transfrom(cfg=cfg, is_train=False),
                          target_transform=targettransform(cfg))

if __name__ == '__main__':
    # Start visdom before training:
    #   pip install visdom          (install)
    #   python -m visdom.server     (start the server)

    # The first call downloads the pretrained ResNet weights.

    # Instantiate the model; its parameters are set in Configs.py.
    net = RetainNet(cfg)
    # Move the model to the GPU; cfg.DEVICE.MAINDEVICE is the main GPU used by the model.
    net.to(cfg.DEVICE.MAINDEVICE)
    # Create the trainer. It is configured through cfg; passing arguments
    # directly is possible but not recommended.
    trainer = Trainer(cfg)
    # Train the model on the dataset.
    trainer(net, train_dataset)

Evaluation (eval)

Evaluation supports a single GPU or multiple GPUs on one machine; the CPU is not supported.

# -*- coding: utf-8 -*-
# @Author  : LG

from Model import RetainNet, Evaler
from Data import vocdataset
from Configs import _C as cfg
from Data import transfrom, targettransform


# Training dataset in VOC format; samples are listed in ImageSets/Main/train.txt
train_dataset = vocdataset(cfg, is_train=True, transform=transfrom(cfg, is_train=True),
                           target_transform=targettransform(cfg))

# Test dataset in VOC format; samples are listed in ImageSets/Main/eval.txt
test_dataset = vocdataset(cfg=cfg, is_train=False,
                          transform=transfrom(cfg=cfg, is_train=False),
                          target_transform=targettransform(cfg))

if __name__ == '__main__':
    # Evaluation runs on one or more GPUs only, not on the CPU.
    net = RetainNet(cfg)
    # Move the model to the GPU; cfg.DEVICE.MAINDEVICE is the main GPU used by the model.
    net.to(cfg.DEVICE.MAINDEVICE)
    # Load the weights from a weight file.
    net.load_pretrained_weight('XXX.pkl')
    # Create the evaluator. It is configured through cfg; passing arguments
    # directly is possible but not recommended.
    evaler = Evaler(cfg, eval_devices=None)
    # Evaluate the model on the dataset. The result is named mean_ap
    # to avoid shadowing the built-in map().
    ap, mean_ap = evaler(model=net,
                         test_dataset=test_dataset)
    print(ap)
    print(mean_ap)

Detection (Detect)

A single detection run supports a single GPU or the CPU.

Single-image detection

# -*- coding: utf-8 -*-
# @Author  : LG
from Model import RetainNet
from Configs import _C as cfg
from PIL import Image
from matplotlib import pyplot as plt

# Instantiate the model
net = RetainNet(cfg)
# Use 'cuda' for the GPU or 'cpu' for the CPU
net.to('cuda')
# Load the weights from a weight file
net.load_pretrained_weight('XXX.pkl')
# Open the image
image = Image.open("XXX.jpg")
# Run detection. Returns the image with the detection boxes drawn on it,
# the regressed boxes, the labels, and the scores.
drawn_image, boxes, labels, scores = net.Detect_single_img(image=image, score_threshold=0.5)

plt.imsave('XXX_det.jpg', drawn_image)
plt.imshow(drawn_image)
plt.show()

Video detection

# -*- coding: utf-8 -*-
# @Author  : LG
from Model import RetainNet
from Configs import _C as cfg

# Instantiate the model
net = RetainNet(cfg)
# Use 'cuda' for the GPU or 'cpu' for the CPU
net.to('cuda')
# Load the weights from a weight file
net.load_pretrained_weight('XXX.pkl')

video_path = 'XXX.mp4'

# Run detection.
# If save_video_path is None, the video is not saved; to save it, set save_video_path='XXX.mp4'.
# show=True displays the detection results in real time.
net.Detect_video(video_path=video_path, score_threshold=0.02, save_video_path=None, show=True)

Supported by JetBrains.

https://www.jetbrains.com/?from=SSD-Pytorch



retinanet-pytorch's Issues

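Iteration count

I set the number of training iterations to 20000, but training does not stop automatically once that count is reached.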
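
One way a training loop can enforce an iteration limit is to count iterations across epochs and return as soon as the limit is reached. The sketch below is not the project's Trainer; `train_for`, `step_fn`, and `max_iter` are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable


def train_for(data_loader: Iterable, step_fn: Callable, max_iter: int) -> int:
    """Call step_fn on batches until max_iter steps have been taken."""
    iteration = 0
    while iteration < max_iter:
        seen_batch = False
        for batch in data_loader:      # one pass over the data is one epoch
            seen_batch = True
            step_fn(batch)
            iteration += 1
            if iteration >= max_iter:  # stop mid-epoch once the limit is hit
                return iteration
        if not seen_batch:             # empty loader: avoid looping forever
            break
    return iteration


steps = []
done = train_for(data_loader=[0, 1, 2], step_fn=steps.append, max_iter=7)
print(done, steps)
```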

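How can the width and height of the input IMAGE_SIZE be set separately?

Thanks for providing this code! By default the input image's width and height are both resized to the same IMAGE_SIZE (600 px). For a dataset with an extreme aspect ratio such as KITTI, whose images are roughly 1200*375, resizing to a square loses so many pixels on targets such as pedestrians and cyclists that they can no longer be recognized. I would therefore like to set the width and height of IMAGE_SIZE separately. In that case, how should the anchors, the feature-map sizes, and the related internal parameters be changed?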
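
For a rough sense of what a non-square input implies, the sketch below computes per-level feature-map shapes and the total anchor count. It assumes five pyramid levels with strides 8, 16, 32, 64, and 128 and 9 anchors per location, as in the RetinaNet paper; the project's own configuration may differ, and `pyramid_shapes` and `total_anchors` are hypothetical helpers. Under these assumptions a 600x600 input gives 67995 anchors, which matches the tensor shape [8, 67995] quoted in another issue on this page.

```python
import math

STRIDES = (8, 16, 32, 64, 128)   # assumed strides of the five pyramid levels
ANCHORS_PER_LOCATION = 9         # assumed: 3 scales x 3 aspect ratios


def pyramid_shapes(height, width, strides=STRIDES):
    """Return the (rows, cols) of each feature map for a height x width input."""
    return [(math.ceil(height / s), math.ceil(width / s)) for s in strides]


def total_anchors(height, width):
    """Count the anchors over all levels for the given input size."""
    locations = sum(rows * cols for rows, cols in pyramid_shapes(height, width))
    return ANCHORS_PER_LOCATION * locations


print(pyramid_shapes(375, 1200))
print(total_anchors(600, 600))
```

With a rectangular input, an anchor generator has to lay out each level's grid with separate row and column counts instead of a single side length.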

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.

--- load weight finish ---
Setting up a new session...
Max_iter = 120000, Batch_size = 20
Model will train on cuda:[0]
--- Focal_loss alpha = 0.25 ,将对背景类进行衰减,请在目标检测任务中使用 ---
--- Multiboxloss : α=0.25 γ=2 num_classes=21
Set optimizer : SGD (
Parameter Group 0
dampening: 0
initial_lr: 0.001
lr: 0.001
momentum: 0.9
nesterov: False
weight_decay: 0.0005
)
Set scheduler : <torch.optim.lr_scheduler.MultiStepLR object at 0x00000248040508B0>
Set lossfunc : multiboxloss(
(loc_loss_fn): SmoothL1Loss()
(cls_loss_fn): focal_loss()
)
Start Train......


Traceback (most recent call last):
  File "D:\software\PyCharm\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\pydevd.py", line 1491, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "D:\software\PyCharm\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/code/ai/Retinanet/Retinanet-Pytorch-master/Demo_train.py", line 36, in <module>
    trainer(net, train_dataset)
  File "D:\code\ai\Retinanet\Retinanet-Pytorch-master\Model\trainer.py", line 112, in __call__
    for iteration, (images, boxes, labels, image_names) in enumerate(data_loader):
  File "D:\software\supermap\idesktopX\support\MiniConda\conda\envs\retinanet\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
    data = self._next_data()
  File "D:\software\supermap\idesktopX\support\MiniConda\conda\envs\retinanet\lib\site-packages\torch\utils\data\dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "D:\software\supermap\idesktopX\support\MiniConda\conda\envs\retinanet\lib\site-packages\torch\utils\data\dataloader.py", line 1111, in _process_data
    data.reraise()
  File "D:\software\supermap\idesktopX\support\MiniConda\conda\envs\retinanet\lib\site-packages\torch\_utils.py", line 428, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "D:\software\supermap\idesktopX\support\MiniConda\conda\envs\retinanet\lib\site-packages\torch\utils\data\_utils\worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "D:\software\supermap\idesktopX\support\MiniConda\conda\envs\retinanet\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\software\supermap\idesktopX\support\MiniConda\conda\envs\retinanet\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\code\ai\Retinanet\Retinanet-Pytorch-master\Data\Dataset_VOC.py", line 48, in __getitem__
    image, boxes, labels = self.transform(image, boxes, labels)
  File "D:\code\ai\Retinanet\Retinanet-Pytorch-master\Data\Transfroms.py", line 40, in __call__
    img, boxes, labels = t(img, boxes, labels)
  File "D:\code\ai\Retinanet\Retinanet-Pytorch-master\Data\Transfroms_utils.py", line 263, in __call__
    mode = random.choice(self.sample_options)
  File "mtrand.pyx", line 920, in numpy.random.mtrand.RandomState.choice
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.
What causes this error?
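
The traceback ends in NumPy's `RandomState.choice`, which first converts its argument to an array; recent NumPy versions refuse to build an array from a ragged sequence. A common workaround is to pick an index rather than an element. The sketch below uses only the standard library. Its `sample_options` tuple follows the SSD-style crop options, which is an assumption about what Transfroms_utils.py contains, and `pick_option` is a hypothetical helper.

```python
import random

# Assumed SSD-style options: None means "keep the whole image"; each tuple is
# a (min_iou, max_iou) constraint for a random crop. Six entries, ragged shape.
sample_options = (
    None,
    (0.1, None),
    (0.3, None),
    (0.7, None),
    (0.9, None),
    (None, None),
)


def pick_option(options, rng=random):
    """Pick one option by index so the ragged tuple is never made into an array."""
    return options[rng.randrange(len(options))]


mode = pick_option(sample_options)
print(mode)
```

With NumPy's random module the equivalent would be to index the tuple with `np.random.randint(len(options))`.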

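Dataset download

Hello! Could you provide a download link for the dataset? It would also help if you could describe the training steps in detail. Thank you!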

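Input size and anchor box sizes

Thanks for providing this model. Because I modified the FPN structure, training with an input size of 600 on a GPU with 8 GB of memory always fails with CUDA out of memory. I want to reduce the input size to 300, in which case the five feature maps should become 38, 19, 10, 5, and 3. How should the corresponding anchor box sizes be set?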
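
The feature-map sizes quoted above follow from dividing the input size by each level's stride and rounding up. The sketch below assumes strides of 8 to 128 for the five levels. It also lists the base anchor sizes used in the RetinaNet paper (32 to 512, that is, 4 times the stride), which depend on the level rather than on the input size. Both helpers are hypothetical and the project's Configs.py may use other values.

```python
import math

STRIDES = (8, 16, 32, 64, 128)  # assumed strides of pyramid levels P3..P7


def feature_map_sizes(image_size, strides=STRIDES):
    """Side length of each square feature map for a square input."""
    return [math.ceil(image_size / s) for s in strides]


def paper_anchor_sizes(strides=STRIDES):
    """Base anchor side per level as in the RetinaNet paper (4 x stride)."""
    return [4 * s for s in strides]


print(feature_map_sizes(300))   # [38, 19, 10, 5, 3]
print(feature_map_sizes(600))   # [75, 38, 19, 10, 5]
print(paper_anchor_sizes())     # [32, 64, 128, 256, 512]
```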

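Negative samples in assign_priors

As the title says: in the original paper, anchors with 0.5 > IoU > 0.4 appear to be labeled -1 so that they are ignored in the final loss computation, and only anchors with IoU < 0.4 count as negative samples. In your code, every anchor with IoU < 0.5 is treated as a negative sample. What is the basis for this, or is there another reference you followed?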
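
The assignment rule described in the paper can be sketched as follows: an anchor is positive when its best IoU is at least 0.5, background when it is below 0.4, and ignored in between. The function name and the label conventions (0 for background, -1 for ignored) are assumptions for illustration, not the project's assign_priors.

```python
def assign_labels(max_ious, matched_labels, pos_thresh=0.5, neg_thresh=0.4):
    """Label each anchor from its best IoU with any ground-truth box.

    Returns the matched class for positives, 0 for background, and -1 for
    anchors in the ignore band [neg_thresh, pos_thresh).
    """
    labels = []
    for best_iou, cls in zip(max_ious, matched_labels):
        if best_iou >= pos_thresh:
            labels.append(cls)   # positive: keep the matched class
        elif best_iou < neg_thresh:
            labels.append(0)     # negative: background
        else:
            labels.append(-1)    # ignored: excluded from the loss
    return labels


print(assign_labels([0.72, 0.45, 0.10, 0.50], [3, 3, 7, 5]))
```

Setting `neg_thresh=0.5` removes the ignore band and reproduces the behavior described in this issue, where every anchor below 0.5 is a negative.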

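About the loss computation

As the title says: when computing the loc and cls losses, you seem to sum the loss over both positive and negative samples, yet the returned value is divided only by the number of positive samples. Could you explain why, or point me to a relevant link?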
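
For reference, the RetinaNet paper normalizes the classification loss the same way: the focal loss is summed over all anchors and divided by the number of anchors assigned to a ground-truth box, on the grounds that most anchors are easy negatives whose focal loss is negligible. A minimal sketch in the paper's binary (sigmoid) form is shown below, with alpha=0.25 and gamma=2 as in the training log on this page. The function names are hypothetical and the project's own loss may be structured differently.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one anchor/class pair.

    p is the predicted probability of the class; target is 1 or 0.
    """
    p_t = p if target == 1 else 1.0 - p
    p_t = min(max(p_t, 1e-7), 1.0)           # avoid log(0)
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def normalized_cls_loss(probs, targets):
    """Sum the focal loss over all anchors, then divide by the positive count."""
    total = sum(focal_loss(p, t) for p, t in zip(probs, targets))
    num_pos = max(1, sum(targets))           # guard against zero positives
    return total / num_pos


probs = [0.9, 0.1, 0.05, 0.02]     # one positive, three easy negatives
targets = [1, 0, 0, 0]
print(normalized_cls_loss(probs, targets))
```

Dividing by the total number of anchors instead would shrink the loss by several orders of magnitude, since positives are a tiny fraction of all anchors.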

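Do the final 3x3 convolutions in the FPN all use conv1?

Hi, I am studying your code. In fpn.py, do the final 3*3 convolutions all use self.top_down_conv1? Wouldn't that share the weights? And why are conv2 and conv3 defined above in that case? I would appreciate an explanation.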

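BUG! When an image's .xml contains no object!

When the training set contains an image without any object, Transfroms_utils.py in the Data folder raises IndexError: too many indices for array at boxes[:, 0] /= width.
The cause is that the image has no ground truth, that is, no bbox can be found in its xml file.
Such images are common in newer datasets. I hope this can be fixed, thank you!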
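
One way to make the coordinate scaling tolerate images without objects is to force the box array into shape (N, 4) before indexing, so that an empty annotation becomes a (0, 4) array. The sketch below is an assumption about how such a transform could look; `to_percent_coords` is a hypothetical name, and corner-format boxes (x1, y1, x2, y2) are assumed.

```python
import numpy as np


def to_percent_coords(boxes, width, height):
    """Scale corner boxes (x1, y1, x2, y2) to the [0, 1] range.

    Reshaping to (-1, 4) turns an empty list into a (0, 4) array, so the
    column indexing below stays valid when the image has no objects.
    """
    boxes = np.array(boxes, dtype=np.float32).reshape(-1, 4)
    boxes[:, 0] /= width
    boxes[:, 2] /= width
    boxes[:, 1] /= height
    boxes[:, 3] /= height
    return boxes


print(to_percent_coords([], 600, 400).shape)
print(to_percent_coords([[60, 40, 300, 200]], 600, 400))
```

Later stages such as anchor matching and the loss would also need to handle the zero-box case, for example by treating every anchor of that image as background.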

RuntimeError: Found dtype Double but expected Float

--- load weight finish ---
Setting up a new session...
Max_iter = 120000, Batch_size = 20
Model will train on cuda:[0]
--- Focal_loss alpha = 0.25 ,将对背景类进行衰减,请在目标检测任务中使用 ---
--- Multiboxloss : α=0.25 γ=2 num_classes=21
Set optimizer : SGD (
Parameter Group 0
dampening: 0
initial_lr: 0.001
lr: 0.001
momentum: 0.9
nesterov: False
weight_decay: 0.0005
)
Set scheduler : <torch.optim.lr_scheduler.MultiStepLR object at 0x7f7d7f196e20>
Set lossfunc : multiboxloss(
(loc_loss_fn): SmoothL1Loss()
(cls_loss_fn): focal_loss()
)
Start Train......


/home/pdj/PycharmProjects/lyy/Retinanet-Pytorch/Data/Transfroms_utils.py:263: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)
/home/pdj/PycharmProjects/lyy/Retinanet-Pytorch/Data/Transfroms_utils.py:263: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
mode = random.choice(self.sample_options)

Traceback (most recent call last):
  File "/home/pdj/PycharmProjects/lyy/Retinanet-Pytorch/Demo_train.py", line 36, in <module>
    trainer(net, train_dataset)
  File "/home/pdj/PycharmProjects/lyy/Retinanet-Pytorch/Model/trainer.py", line 122, in __call__
    loss.backward()
  File "/home/pdj/anaconda3/envs/lyy/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/pdj/anaconda3/envs/lyy/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Found dtype Double but expected Float

Process finished with exit code 1
Why does this happen? Transfroms.py clearly contains ConvertFromInts(),
and Transfroms_utils.py clearly has return image.astype(np.float32), boxes, labels.
So why is RuntimeError: Found dtype Double but expected Float raised?
Could the VisibleDeprecationWarning above be the cause?
Library versions:
python 3.8.5 h7579374_1
pytorch 1.7.0 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
torchvision 0.8.1 py38_cu101 pytorch
numpy 1.19.4 pypi_0 pypi
opencv-python 4.4.0.46 pypi_0 pypi
yacs 0.1.8 pypi_0 pypi
visdom 0.1.8.9 pypi_0 pypi
vizer 0.1.5 pypi_0 pypi
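
One common source of this error is a target array left as float64: NumPy creates float64 arrays from Python floats by default, and `torch.from_numpy` keeps that dtype, so the resulting tensor is Double. Whether that is the cause here is not confirmed by the traceback. The sketch below only shows the dtype check and the cast, and `as_float32` is a hypothetical helper.

```python
import numpy as np


def as_float32(array_like):
    """Convert targets to float32 so tensors built from them are Float, not Double."""
    return np.asarray(array_like, dtype=np.float32)


# Python floats become float64 unless a dtype is given.
boxes = np.array([[0.1, 0.2, 0.5, 0.6]])
print(boxes.dtype, as_float32(boxes).dtype)
```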

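Too many false positives

Hello, and thank you very much for your code. I tried adding your focal loss to a RetinaFace model without changing any parameters and got a very large number of false detections. Do you know what might cause this? Is it because I did not use hard negative mining, or because my parameters are not tuned correctly? (I tried changing the threshold, but it had very little effect.)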

Error

IndexError: The shape of the mask [8, 1] at index 1 does not match the shape of the indexed tensor [8, 67995] at index 1
I changed batch_size to 8. Where should this error be fixed?

Detection speed

The network's detection speed is surprisingly slow. Do you have any ideas for improving it?

RunTimeError

Traceback (most recent call last):
  File "F:/Retinanet-Pytorch-master/Demo_train.py", line 36, in <module>
    trainer(net, train_dataset)
  File "F:\Retinanet-Pytorch-master\Model\trainer.py", line 115, in __call__
    reg_loss, cls_loss = self.loss_func(cls_logits, bbox_preds, labels, boxes)
  File "D:\anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "F:\Retinanet-Pytorch-master\Model\struct\MultiBoxLoss.py", line 57, in forward
    predicted_locations = predicted_locations[pos_mask, :].view(-1, 4)
RuntimeError: copy_if failed to synchronize: device-side assert triggered
I reduced batch_size to 1 in the config file and changed the learning rate to 1e-4, but the error persists. What could be the cause?
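
A device-side assert is often triggered by an index that is out of range on the GPU, for example a class label outside [0, num_classes). This is one possible cause, not a diagnosis of the traceback above. The sketch below shows a simple label check that can be run on a dataset before training; `check_labels` is a hypothetical helper, and num_classes=21 is taken from the training log quoted elsewhere on this page.

```python
def check_labels(labels, num_classes):
    """Return the labels that fall outside the valid range [0, num_classes)."""
    return [label for label in labels if not 0 <= label < num_classes]


bad = check_labels([0, 3, 20, 21, -1], num_classes=21)
print(bad)
```

Running the failing step on the CPU, or with the environment variable CUDA_LAUNCH_BLOCKING=1 set, usually produces a more precise error message.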
