
datvuthanh / hybridnets

574 stars | 16 watchers | 118 forks | 55.68 MB

HybridNets: End-to-End Perception Network

License: MIT License

Python 84.91% Jupyter Notebook 6.42% CMake 0.37% C++ 8.30%
detection bifpn segmentation multitask-learning hybridnets autonomous-driving end2end-network

hybridnets's Introduction

loss.backward()

hybridnets's People

Contributors

datvuthanh, xoiga123


hybridnets's Issues

Guide for Custom Dataset Training

Hi,

Is there a guide or tutorial for training on a custom dataset, especially for fine-tuning the pretrained weights on a single downstream task (object detection labels only)?

Kind regards,
Talal

Training log?

Hello,
I want to retrain your model. Could you provide a training log file?
Thanks very much.

A problem in training?

Hello, after finishing one epoch of training, my training process is killed during validation, with no other error message. What could be the cause?

Evaluate the model ?

hi, I have two problems and hope you can help me out, thank you!

  1. I have trained a new model and want to evaluate it. There are two functions in val.py, val() and val_from_cmd(); what is the difference between them?
  2. I have modified the number of categories to 10, and when I run val.py there is an error: ValueError: operands could not be broadcast together with shapes (12,) (4,)

Lightweight backbone

Would you support some lightweight backbones, such as RepVGG, which are GPU friendly?

Checkpoint File?

Hi, I'm trying to reproduce your results. Could you please provide the best checkpoint file? Thanks in advance!

Train flow

Hello,
Regarding the "Training stages" section: are the three steps independent? My understanding is that the second step continues training from the model produced by the first step, the third step continues from the model of the second step, and the weights for the second and third steps are specified with -w.
Is that right?

Thank you

MultiLabel Classification

@datvuthanh @xoiga123 Thanks for sharing the code base, it is really helpful, but I have a few queries:

  1. Can we also get multi-label classification from the existing code? I see two things: the annotations contain an attribute like attribute:{"trafficator": green}, and loss.py has a MULTILABEL mode macro.
  2. I am looking for a single bounding box with a multi-label classification output; what modifications need to be made to the existing code?

Thanks in advance

Out of memory error during call val.py?

During the training phase, when val.py is called to evaluate model performance, the run hangs for about half an hour and then prints "killed". Using the command:

dmesg | egrep -i -B100 "killed process"

it reports that the Python process was killed because an out-of-memory error occurred.

How to solve this problem?

Lane Line mean Intersection Over Union

I've noticed that the abstract states 31.6 as the mean Intersection over Union. I wonder whether 31.6 is actually the lane-line IoU alone (without background) rather than a mean over classes.
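To make the distinction concrete, a toy illustration (invented numbers, not from the paper): the mean IoU averages per-class IoUs, so an easy background class can pull the mean far above the lane-only IoU.

iou_lane = 0.316                        # IoU of the lane class alone
iou_background = 0.95                   # background is usually easy, so high IoU
miou = (iou_lane + iou_background) / 2  # 0.633: averaging inflates the metric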

Did anyone successfully export onnx?

The code is as follows, but nothing gets exported:

import os
import torch
from pathlib import Path

# Imports assumed from the HybridNets repo layout (utils/utils.py and
# backbone.py both appear in tracebacks elsewhere on this page):
from backbone import HybridNetsBackbone
from utils.utils import Params

weight_path = 'weights/hybridnets.pth'
device = 'cuda' if torch.cuda.is_available() else 'cpu'
params = Params(os.path.join(Path(__file__).resolve().parent, "projects/bdd100k.yml"))
model = HybridNetsBackbone(num_classes=len(params.obj_list), compound_coef=3,
                           ratios=eval(params.anchors_ratios), scales=eval(params.anchors_scales),
                           seg_classes=len(params.seg_list), backbone_name=None)
model.load_state_dict(torch.load(weight_path, map_location=device))
model.eval()
inputs = torch.randn(1, 3, 384, 640)
print("begin to convert onnx")
torch.onnx.export(model, inputs, 'HybridNetsBackbone.onnx',
                  verbose=False, opset_version=12, input_names=['images'])
print("done")

shell log:

HybridNets/utils/utils.py:673: TracerWarning: torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  anchor_boxes = torch.from_numpy(anchor_boxes.astype(dtype)).to(image.device)
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
(this warning repeats 9 times)

...
ONNX export failed: Couldn't export Python operator SwishImplementation
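(Context, not from this thread: SwishImplementation is the custom autograd.Function behind EfficientNet-PyTorch's memory-efficient swish, and torch.onnx cannot export custom Functions. In EfficientNet-PyTorch itself the usual workaround is to switch to the plain swish before export; whether and where HybridNets' backbone exposes the same hook is an assumption to verify.)

# Sketch assuming an EfficientNet-PyTorch backbone; set_swish() is that library's
# hook for swapping the custom autograd swish for a plain x * sigmoid(x):
from efficientnet_pytorch import EfficientNet

net = EfficientNet.from_name('efficientnet-b3')
net.set_swish(memory_efficient=False)  # now the activation is traceable for ONNX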

How to replace Conv2dStaticSamePadding and MaxPool2dStaticSamePadding in order to be able to transfer out dlc on the Qualcomm platform

As shown above, when using the Qualcomm toolchain to export a dlc file, there is an error:

Encountered Error: ERROR_ASYMMETRIC_PADS_VALUES: Asymmetric pads values is not supported

I tried EfficientNet-PyTorch and exported a dlc successfully.
I found the problem arises from Conv2dStaticSamePadding and MaxPool2dStaticSamePadding, which use F.pad for asymmetric padding.
So, how can I replace them?
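One possible direction (a sketch under the assumption of a fixed export input size, not a confirmed fix): compute the TF-style "same" padding for that fixed size and use a plain, symmetrically padded layer whenever the total padding splits evenly; where it doesn't (e.g. a stride-2 conv with an odd kernel on an even input needs an odd total pad), the input size itself has to be adjusted first.

import math
import torch.nn as nn

def same_pad(n, k, s):
    # TF-style 'same' total padding for one spatial dim of size n,
    # kernel k, stride s; it must be even to split symmetrically.
    total = max((math.ceil(n / s) - 1) * s + k - n, 0)
    assert total % 2 == 0, "adjust the input size so the padding is symmetric"
    return total // 2

# Example: a stride-1 3x3 conv on a fixed 384x640 input pads 1 on every side,
# so a plain nn.Conv2d can replace Conv2dStaticSamePadding exactly:
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1,
                 padding=(same_pad(384, 3, 1), same_pad(640, 3, 1)))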

Docker support?

Thanks for sharing great work!

I get a segmentation fault when running the image/video demo with the pre-trained weights. Differing dependency versions (e.g. CUDA) may be causing the error. Would it be possible to provide a Dockerfile to build a container?

Evaluation results not accurate?

Hi,
I've been trying to recreate the results from the HybridNets paper. I've run the eval code on the BDD100K dataset, but the results I'm getting are nowhere close to the ones presented in the paper. I'm sure I'm missing something; could you please tell me what could be going wrong? I suspect the root locations I'm passing are incorrect:

data root - raw images from the 100k dataset
label root - I saved a separate JSON file for each image out of the "bdd100k_labels_images_val.json" downloaded from the BDD100K dataset, i.e. 10,000 separate JSON files (one per image) in a folder named "val".
Road labels in seg_list - I used the drivable masks
Lane labels in seg_list - I used the lane masks

The output I got after evaluating 100 images:
[screenshot of the evaluation output]

The IoU values are inconsistent with the ones mentioned in the paper; they are nowhere near them. I was also wondering why the precision is so low: is there a specific reason for so many false positives? I hope you can help me out, thanks.

inference latency

The paper reports an inference latency of 37 ms on a V100 with FP16.
Was this measured with TensorRT or with plain Python inference?

And how about the speed including preprocessing and NMS postprocessing?

thanks very much!

The loss doesn't converge when training segmentation head only.

I changed the backbone to EfficientNet-B0 and reduced the number of BiFPN layers from 6 to 1, in order to cut down inference runtime. After training 200 epochs with the segmentation head frozen, I tried to train the model with the backbone and detection head frozen. But the training loss of the segmentation head does not seem to converge, while the validation loss and the mIoU are both decreasing at the same time, which doesn't make sense.
Apart from that, I also found that when the segmentation head is frozen, the segmentation loss is not set to 0 in the code, which I suppose can still affect the weight updates in the backbone (see the "[Discussion] Gradient flow" issue below).

Inferring pictures and videos

During model validation, when inferring on pictures and videos, I get: RuntimeError: unexpected EOF, expected 596029 more bytes. The file might be corrupted

To control the network configuration

Hello,
How are you?
I want to train a model for object detection ONLY, without segmentation.
Could you provide a way to control this in the .yaml file?
Thanks.

Could not use Pytorch quantization for model

import copy
import torch
from torch.quantization import quantize_fx

model_to_quantize = copy.deepcopy(model)
qconfig_dict = {"": torch.quantization.get_default_qconfig('qnnpack')}
model_to_quantize.eval()

# prepare
model_prepared = quantize_fx.prepare_fx(model_to_quantize, qconfig_dict)
# calibrate (not shown)
# quantize
model_quantized = quantize_fx.convert_fx(model_prepared)

When using this PyTorch quantization example with your model, I get the following error:

~/Documents/DL_course_project/HybridNets/backbone.py in forward(self, inputs)
    100
    101         # p1, p2, p3, p4, p5 = self.backbone_net(inputs)
--> 102         p2, p3, p4, p5 = self.encoder(inputs)[-4:]  # self.backbone_net(inputs)
    103
    104         features = (p3, p4, p5)

NameError: module is not installed as a submodule

How can I avoid this error?
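(An aside, not a confirmed fix: quantize_fx relies on torch.fx symbolic tracing, and the NameError above looks like the tracer failing on a call it cannot resolve as a submodule. Eager-mode dynamic quantization avoids tracing entirely, at the cost of only covering certain module types; a minimal sketch, assuming model is the loaded HybridNets model:)

import torch

# Dynamic quantization quantizes only the listed module types (convolutions are
# not supported by this mode, so coverage on a conv-heavy model is limited):
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)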

How to generate drivable area and lane masks?

I know you shared a drive link with the BDD100K drivable-area and lane masks for the dataloader, but I want to replicate the process to understand what to do for my custom dataset. I looked in the BDD repo and there is a "to_mask.py" with scripts for this; it passes its own tests, but I cannot get results similar to the ones you shared. Can you please explain how to generate those masks? Thanks in advance.

Reproducing training results

First of all, thanks for sharing your study. I am trying to reproduce your results by training the network with the strategy you described. So that I can compare my progress, could you please share your loss plots for each phase (freeze seg; freeze backbone and det; train end-to-end)?

Drivable and Lane Type training

@datvuthanh Thanks for sharing the codebase. I have the following queries:

  1. Since BDD100K has different lane types, e.g. double yellow, single white, dashed, can we use the current source code to train for different lane types? If so, what modifications need to be made in the codebase?
  2. Can we similarly train the current source code with the drivable area and alternate drivable area labels? If so, what changes need to be made?

Please share your thoughts
Thanks in advance

Lane color and Lane type Segmentation

In the traffic-light and traffic-sign detection issue, you mentioned that we just have to change the obj_list in project.yml by adding the classes needed. Does that apply to seg_list as well?

If I want to detect lane color and lane type, can I change seg_list as follows?

seg_list: ['road', 'double white', 'double yellow', 'single white', 'single yellow', 'solid', 'dashed']

Actually, I need classes like double solid yellow, double solid white, single solid yellow, single solid white, single dashed yellow, single dashed white, double dashed yellow, and double dashed white, but BDD100K already labels double white, double yellow, single white, and single yellow under Lane Categories, and solid and dashed under Lane Styles.

Will the change in seg_list shown above work? If not, how should it be done?

[Discussion] Gradient flow

Back when we were toying with mosaic, we removed the segmentation head completely from the model and dataloader. Now that we are trying to add mosaic augmentation officially, we have to decide not to use it for segmentation training.

hybridnets/dataset.py

if self.use_mosaic:
    # honestly, mosaic is not for road and lane segmentation anyway
    # you cant expect road and lane to be split up in 4 separate corners in an image, do you?
    # only use mosaic with freeze_seg :)
    img, labels, seg_label, lane_label, (h0, w0), (h, w), path = self.load_mosaic(idx)

Only the images and object annotations are mosaicked, while the segmentation annotations are kept intact. That produces an incorrect segmentation loss, but we thought it didn't matter because we froze the segmentation head anyway, assuming requires_grad=False removes the segmentation head from the backprop graph. That assumption is wrong: the backbone is still affected by the segmentation loss.
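A minimal sketch of that point (toy modules, not the actual HybridNets graph): freezing a head's parameters does not stop gradients from flowing through it into the shared backbone.

import torch
import torch.nn as nn

backbone = nn.Linear(4, 4)
seg_head = nn.Linear(4, 1)
for p in seg_head.parameters():
    p.requires_grad = False             # "freeze" the segmentation head

seg_loss = seg_head(backbone(torch.randn(2, 4))).mean()
seg_loss.backward()

print(seg_head.weight.grad)             # None: the frozen head gets no gradient
print(backbone.weight.grad is None)     # False: the backbone still gets one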

Check this colab for interactive stuff.

So we've been planning to simply set the losses to 0 when you pass --freeze_det / --freeze_seg, like this:

cls_loss, reg_loss, seg_loss, regression, classification, anchors, segmentation = model(imgs, annot, seg_annot, obj_list=params.obj_list)
cls_loss = cls_loss.mean() if not opt.freeze_det else 0
reg_loss = reg_loss.mean() if not opt.freeze_det else 0
seg_loss = seg_loss.mean() if not opt.freeze_seg else 0

Is this approach too naive? Are there any recommendations regarding this matter? Or should we also mosaic the segmentation labels?

How to select number of gpus?

Hi,
I want to train on multiple GPUs with train_DDP.py, but I don't know which parameter determines the number of GPUs.
Looking forward to your reply! Thank you!

Problem in Training stage

Hello,
I tried to follow your suggested training procedure. Accordingly, I first froze the segmentation head and trained for some epochs.

python train.py -p bdd100k -c 3 -n 4 -b 8 --freeze_seg True --lr 1e-5 --optim adamw --num_epochs 75 --val_interval 1 --log_path D:\HybridNets\rgb-clean --saved_path D:\HybridNets\rgb-clean --save_interval 500 --verbose True --num_gpus 1 --plots True

After that, I froze the backbone and the detection head.

python train.py -p bdd100k -c 3 -n 4 -b 8 --freeze_backbone True --freeze_det True --lr 1e-5 --optim adamw --num_epochs 12 --val_interval 1 --log_path D:\HybridNets\rgb_clean --saved_path D:\HybridNets\rgb_clean --save_interval 500 --verbose True --num_gpus 1 --plots True -w D:\HybridNets\rgb-clean\bdd100k\hybridnets-d3_74_129225_best.pth

But then I get an error (attached as a screenshot).

Can you please suggest how to solve this issue?

Thank you in advance

How to prepare the bdd100k dataset?

Hi,
Following the README.txt, I prepared the BDD100K dataset, but BddDataset fails to load it. I think something is wrong with my preparation process: in particular, I have no idea where to put the colormaps, masks, polygons, and rles folders for drivable and lane, and I don't know where to put the JSON file of detection labels.

Please give the detailed folder structure.

AssertionError BUG

If I only want to segment one class, i.e. seg_list contains only 'road', and I run train.py, then in loss.py line 538, inside soft_tversky_score, the assertion assert output.size() == target.size() fails with an AssertionError.

Debugging, I found output.size() = torch.Size([2, 1, 245760]) and target.size() = torch.Size([2, 1, 491520]), i.e. the target is exactly twice the size of the output.

How can this be fixed?

Issue with FPS mistake in the article

Hello. First of all, thank you for this work.

I noticed a mistake in the code computing inf_time and FPS.

I therefore think the inference time in the article may have been calculated incorrectly: the article says YOLOP has an inference time of 52 ms per frame (batch size 1), which would mean about 20 FPS, although the YOLOP paper reports 41 FPS.

And in hybridnets_test.py, I measured HybridNets' inference time at about 0.06 s (model(x) only), which means 17-20 FPS on a Tesla V100, but about 0.021 s (model(x) only) for YOLOP, which means about 48 FPS on a Tesla V100.

Sadly, it may not be faster than YOLOP, and it may not reach real time.

Tensor size mismatch

@datvuthanh @xoiga123
Hi. I am receiving the error below when I pass a .jpg input image of size (2160, 4096, 3) for testing. Can you please help me resolve this issue? Thank you!

Command I ran - python hybridnets_test.py -w weights/hybridnets.pth --source demo/image --output demo_result --imshow False --imwrite True

Traceback (most recent call last):
  File "hybridnets_test.py", line 121, in <module>
    features, regression, classification, anchors, seg = model(x)
  File "env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "folder/HybridNets/backbone.py", line 104, in forward
    features = self.bifpn(features)
  File "env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "env/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "folder/HybridNets/hybridnets/model.py", line 179, in forward
    outs = self._forward_fast_attention(inputs)
  File "folder/HybridNets/hybridnets/model.py", line 211, in _forward_fast_attention
    p5_up = self.conv5_up(self.swish(weight[0] * p5_in + weight[1] * self.p5_upsample(p6_up)))
RuntimeError: The size of tensor a (11) must match the size of tensor b (12) at non-singleton dimension 2

amp & channels_last

channels_last:

While PyTorch operators expect all tensors to be in channels-first (NCHW) dimension format, PyTorch supports 3 output memory formats, including:

  • Contiguous: tensor memory is in the same order as the tensor's dimensions.
  • ChannelsLast: irrespective of the dimension order, the 2d (image) tensor is laid out as an HWC or NHWC (N: batch, H: height, W: width, C: channels) tensor in memory. The dimensions could be permuted in any order.
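A minimal channels_last sketch (generic PyTorch, not HybridNets-specific): both the module weights and the input need converting, after which cuDNN can pick NHWC kernels.

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda()
conv = conv.to(memory_format=torch.channels_last)    # convert the weights
x = torch.randn(8, 3, 384, 640, device="cuda")
x = x.to(memory_format=torch.channels_last)          # convert the input
y = conv(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True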

amp:
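A minimal AMP training-step sketch (dummy model and data, not the HybridNets loop): autocast runs the forward pass in mixed precision, while GradScaler guards fp16 gradients against underflow.

import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=3, padding=1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

imgs = torch.randn(2, 3, 384, 640, device="cuda")
optimizer.zero_grad()
with torch.cuda.amp.autocast():      # forward + loss in mixed precision
    loss = model(imgs).mean()
scaler.scale(loss).backward()        # scale the loss so fp16 grads don't underflow
scaler.step(optimizer)               # unscales the grads, then applies the update
scaler.update()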

extra:

prefetch:

Going to train from scratch to see what's good, with a working log this time.
UPDATE 12/07/2022: It seems the bottleneck is in data loading, which takes an unholy amount of time even though I cached everything in RAM. Currently profiling CPU & GPU and trying out this dataloader, which allegedly does real prefetching.
UPDATE: It all makes sense now: PyTorch's DataLoader can only prefetch batches within the currently running epoch. For the next epoch, there is apparently no prefetch whatsoever.

Multi-class vs multi-label segmentation

#15 #38
We were using a multi-label dataloader, loss, and metrics for a multi-class problem. Basically they work fine and the results are correct (maybe the focal-loss segment is a little bit off; who knows, we will check further), but to someone reading the code, the semantic meaning is wrong.

TODO: Generalize to multi-class as default, with a switch to multi-label.
TODO in another issue: Multi-label for object detection.

Issue with FPS calculation code.

Hello. First of all, great work.

While running hybridnets_test_videos.py I found an issue with the FPS calculation. In that script, the FPS is calculated as:

(t2-t1)/frame_count

But that expression gives the inference time per frame, not the FPS. It should most probably be:

1/((t2-t1)/frame_count)

that is, the reciprocal, which equals frame_count/(t2-t1).
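For illustration, a self-contained sketch of the two quantities (a dummy loop standing in for per-frame inference):

import time

frame_count = 100
t1 = time.perf_counter()
for _ in range(frame_count):
    pass                                    # stand-in for model(x) on one frame
t2 = time.perf_counter()

sec_per_frame = (t2 - t1) / frame_count     # what the script currently reports
fps = frame_count / (t2 - t1)               # the actual FPS, i.e. 1 / sec_per_frame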

Please let me know if any updates happen on this front.
