Comments (6)
Please check your PyTorch version. Your PyTorch version may be 2.0, so it passes local-rank to the launching script by default, instead of our local_rank. You can use PyTorch 1.12 or 1.13.
Update based on feedback from @SDivakarBhat: PyTorch 1.12 is required for our local_rank argument. When using PyTorch 1.13, you may need to change local_rank to local-rank in the training script argument.
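As an alternative to pinning the PyTorch version, the training script could accept both spellings of the flag. The following is only a minimal sketch, assuming the script parses its arguments with argparse; build_parser is a hypothetical helper name and is not taken from the repository.

```python
import argparse


def build_parser():
    # Hypothetical helper: build a parser that tolerates both flag spellings.
    parser = argparse.ArgumentParser(description="local rank argument sketch")
    # Both option strings map to the same destination, so the value is read
    # as args.local_rank whichever spelling the launcher passes.
    parser.add_argument(
        "--local_rank", "--local-rank",
        dest="local_rank", type=int, default=0,
    )
    return parser


# Simulate the two ways a launcher might pass the rank.
new_style = build_parser().parse_args(["--local-rank=1"])
old_style = build_parser().parse_args(["--local_rank", "2"])
print(new_style.local_rank, old_style.local_rank)
```

With this in place the same script should start under either launcher convention, without editing the argument name per PyTorch version.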
from unimatch.
First of all, thank you for providing access to such great work!
I am facing the same issue even though my PyTorch version is 1.13. I have also tried running it with version 1.12.
When I change the loss for labeled set (criterion_l) to CELoss the code starts running for a couple of epochs but then fails again with the same issue.
When criterion_l is not calculated, the code runs and finishes smoothly. (This was done only for debugging purposes, as it obviously makes the results useless.)
It would be great if this could be resolved.
from unimatch.
@SDivakarBhat Hi, thank you for your feedback. Could you clarify what you mean by "When I change the loss for labeled set (criterion_l) to CELoss the code starts running for a couple of epochs but then fails again with the same issue."? We already use CELoss as the labeled loss by default on Pascal. Do you mean changing the OHEM loss to CELoss on Cityscapes?
Could you also provide more details about "the same issue"? Do you mean the local-rank error? That would be very strange, because this problem should only appear at the very start, when launching the script.
from unimatch.
Hi, thank you for your quick response.
Yes, I am using Cityscapes. I think my error is not due to the local-rank problem; it seems to appear randomly between epochs. Below is part of the error. It would be a great help if you could shed some light on the possible reason.
  File "unimatch.py", line 267, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "unimatch.py", line 190, in main
    loss_u_s1 = loss_u_s1.sum() / (ignore_mask_cutmixed1 != 255).sum().item()
RuntimeError: CUDA error: an illegal memory access was encountered
from unimatch.
In most cases, this happens because your ground-truth masks contain values that fall outside your model's output dimensions (the number of classes). I think your labeled masks are incorrect, since the script finishes when the labeled loss is removed. You can use torch.unique or np.unique to check the GT masks. By the way, are you using our provided Cityscapes masks or the official masks?
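The mask check suggested above could look roughly like the sketch below. It is only an illustration: find_invalid_labels is a hypothetical helper, the ignore value 255 is taken from the traceback in this thread, and the class count of 19 is an assumption for a Cityscapes-style setup that should be replaced with your own model's value.

```python
import numpy as np


def find_invalid_labels(mask, num_classes, ignore_index=255):
    """Return the label values in `mask` that the loss cannot index."""
    values = np.unique(mask)
    # A value is valid if it is a class id in [0, num_classes) or the ignore value.
    valid = ((values >= 0) & (values < num_classes)) | (values == ignore_index)
    return values[~valid].tolist()


# Toy mask: 33 is outside the assumed 19 classes and is not the ignore value.
mask = np.array([[0, 1, 18], [255, 7, 33]], dtype=np.uint8)
print(find_invalid_labels(mask, num_classes=19))  # [33]
```

Running this over every labeled mask and confirming that the result is always empty is a quick way to rule out label encoding as the cause before looking elsewhere.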
from unimatch.
Thank you for the response. I am using the official masks. I think I have resolved the issue.
from unimatch.