Comments (4)
I'm observing similar behaviour as well. I can only train with a batch size of 1, and the GPU memory isn't fully utilized either. I'm training on a GTX 1080 with 8 GB of VRAM.
from light-weight-refinenet.
I can't help with this one, but I would suggest making sure that no other GPU processes are running alongside training.
I think a 1080 should be enough for a batch size of 1. For reference, I am using a 1080 Ti with a batch size of 6.
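One way to check for other GPU processes is to ask `nvidia-smi` for its list of compute applications. The snippet below is only a rough sketch: the helper names `parse_compute_apps` and `list_gpu_processes` are made up for illustration, and it assumes `nvidia-smi` is on the PATH. On some setups (for example Windows drivers in WDDM mode) the per-process memory column may come back as `[N/A]`, which is returned here as `None`.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse CSV rows of 'pid, process_name, used_memory' into dicts."""
    apps = []
    for line in csv_text.splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue  # skip blank or unexpected rows
        pid = fields[0].strip()
        used = fields[-1].strip()
        # Re-join the middle fields in case the process name contains commas.
        name = ",".join(fields[1:-1]).strip()
        if not pid.isdigit():
            continue
        # Memory may be reported as "[N/A]" instead of a number.
        used_mib = int(used) if used.isdigit() else None
        apps.append({"pid": int(pid), "name": name, "used_mib": used_mib})
    return apps


def list_gpu_processes():
    """Return compute processes seen by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_compute_apps(result.stdout)
```

If `list_gpu_processes()` returns anything besides your own training process, that memory is not available to the training run.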
from light-weight-refinenet.
I realize you can't help, but I am also getting this error. I am using an NVIDIA Quadro P4000 with 8 GB of VRAM.
The Task Manager shows very low GPU memory usage until the program prints:
Train epoch: 0 [0/132] Avg. Loss: 3.711 Avg. Time: 2.425
Then GPU memory usage jumps to over 90% in under a second and the program throws the error:
  File "C:\Users\rfairhur\Documents\Jupyter Notebooks\light-weight-refinenet-master\src\train.py", line 425, in <module>
    main()
  File "C:\Users\rfairhur\Documents\Jupyter Notebooks\light-weight-refinenet-master\src\train.py", line 409, in main
    args.freeze_bn[task_idx])
  File "C:\Users\rfairhur\Documents\Jupyter Notebooks\light-weight-refinenet-master\src\train.py", line 280, in train_segmenter
    loss.backward()
  File "C:\Users\rfairhur\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\torch\tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\rfairhur\AppData\Local\Programs\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 230.00 MiB (GPU 0; 8.00 GiB total capacity; 5.81 GiB already allocated; 159.27 MiB free; 333.44 MiB cached)
I believe my batch size is set to 1. Anyway, I will search Google for this error to see if there is anything I can try.
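The numbers in that message already explain the failure: the backward pass asked for 230.00 MiB while only 159.27 MiB was free. As a small sketch, the figures can be pulled out of the message and compared in code. The function name `parse_cuda_oom` is hypothetical, and the regular expression assumes the exact wording shown in the error above, which differs between PyTorch versions.

```python
import re

# Conversion factors to MiB for the units that appear in the message.
_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_NUMBER = r"(\d+(?:\.\d+)?)"
_UNIT = r"(KiB|MiB|GiB)"


def parse_cuda_oom(message):
    """Extract memory figures (in MiB) from a CUDA out-of-memory message."""
    requested = re.search(r"Tried to allocate " + _NUMBER + " " + _UNIT, message)
    if requested is None:
        raise ValueError("message does not match the expected OOM wording")
    info = {"requested": float(requested.group(1)) * _TO_MIB[requested.group(2)]}
    labelled = re.findall(
        _NUMBER + " " + _UNIT + r" (total capacity|already allocated|free|cached)",
        message,
    )
    for value, unit, label in labelled:
        info[label] = float(value) * _TO_MIB[unit]
    return info


msg = (
    "CUDA out of memory. Tried to allocate 230.00 MiB (GPU 0; 8.00 GiB total "
    "capacity; 5.81 GiB already allocated; 159.27 MiB free; 333.44 MiB cached)"
)
info = parse_cuda_oom(msg)
# The request is larger than the free memory, so the allocation cannot succeed.
print(info["requested"] > info["free"])
```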
from light-weight-refinenet.
Apparently I was wrong about my batch size setting. It must have been set to 6 or higher, because when I made sure it was set to 5 or less the training ran successfully, but it failed if I set the batch size to 6.
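Rather than finding the limit by hand, the largest batch size that fits can be searched for automatically. This is only a sketch under assumptions: `largest_fitting_batch_size` and `try_batch` are hypothetical names, not part of light-weight-refinenet, and the search assumes that if one batch size runs out of memory, every larger one does too. In a real run, `try_batch` would perform one forward and backward pass at the given batch size. Here it is simulated so the example runs without a GPU.

```python
def largest_fitting_batch_size(try_batch, max_batch_size):
    """Binary-search the largest batch size in [1, max_batch_size] that fits.

    try_batch(n) should run one training step with batch size n and raise
    RuntimeError containing "out of memory" if it does not fit.
    Returns 0 if even a batch size of 1 fails.
    """
    lo, hi = 0, max_batch_size  # lo is known to fit (0 trivially)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        try:
            try_batch(mid)
            lo = mid  # mid fits, look for something larger
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # unrelated error, do not hide it
            hi = mid - 1  # mid does not fit, look lower
    return lo


def fake_step(batch_size, limit=5):
    """Simulated training step: anything above `limit` runs out of memory."""
    if batch_size > limit:
        raise RuntimeError("CUDA out of memory. (simulated)")


best = largest_fitting_batch_size(fake_step, 16)
print(best)  # 5
```

With the simulated limit of 5 this matches what I saw: 5 works and 6 fails. With real PyTorch code it may also help to call `torch.cuda.empty_cache()` between trials so cached blocks from a failed attempt do not skew the next one.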
from light-weight-refinenet.