Comments (5)
Judging by your description, this sounds like a kernel-mode memory leak: user-space memory is reclaimed when a process exits, since keeping it allocated would be unsafe and would open a vector for DoS attacks.
It could be, for instance, a faulty driver allocating excess memory and not freeing it. Try running
sudo slabtop
It should report kernel-mode memory usage, which normally should not exceed a few hundred MB (my machine uses about 200 MB). If it takes a GB or more, something is going wrong in kernel space (perhaps the nvidia drivers leak kernel memory?).
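To follow a single cache over time rather than watching the interactive display, the same counters can be read from /proc/slabinfo. Below is a minimal sketch in Python, assuming the slabinfo version 2.x column layout (name, active_objs, num_objs, objsize, ...). The names `SlabStat`, `parse_slabinfo`, and `read_slabinfo` are illustrative and do not come from any library, and reading the file normally requires root.

```python
from typing import Dict, NamedTuple


class SlabStat(NamedTuple):
    active_objs: int  # objects currently in use
    num_objs: int     # objects allocated in total
    objsize: int      # bytes per object


def parse_slabinfo(text: str) -> Dict[str, SlabStat]:
    """Parse the text of /proc/slabinfo into {cache_name: SlabStat}."""
    stats = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the '# name ...' header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, active, total, size = fields[:4]
        stats[name] = SlabStat(int(active), int(total), int(size))
    return stats


def read_slabinfo(path: str = "/proc/slabinfo") -> Dict[str, SlabStat]:
    """Read and parse the slab statistics (usually needs root)."""
    with open(path) as f:
        return parse_slabinfo(f.read())
```

Run as root, `read_slabinfo()["kmalloc-32"].active_objs` would give the current number of live objects in that cache, assuming a cache with that name exists on the machine.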
from torch-rnn.
Interesting. The total space used under slabtop isn't much (~150 MB), but running sample.lua repeatedly caused hundreds of thousands of kmalloc-32 objects to be allocated and never freed (right now, my dev box has 1.47M kmalloc-32 objects).
Further testing: I can trigger this memory leak using either char-rnn or torch-rnn. Using Keras (Theano) with similar networks, I don't see the same issue.
I also note that the memory leak is consistent: about 110 objects (sometimes 111, sometimes 109) of 128 bytes each, regardless of network size.
Weird. I'm not sure how to proceed from here.
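One way to check whether the growth really is the same every time is to snapshot a cache's object count around each run and compare the differences. Here is a rough sketch in Python. `deltas`, `measure_growth`, and `count_fn` are hypothetical names. `count_fn` is assumed to return the current active-object count for the cache of interest (for example, a value read from /proc/slabinfo as root), and `cmd` is whatever invocation reproduces the leak.

```python
import subprocess
from typing import Callable, List, Sequence


def deltas(counts: Sequence[int]) -> List[int]:
    """Differences between consecutive snapshots."""
    return [b - a for a, b in zip(counts, counts[1:])]


def measure_growth(
    cmd: Sequence[str],
    count_fn: Callable[[], int],
    runs: int = 5,
) -> List[int]:
    """Run cmd `runs` times and return the object-count growth per run."""
    counts = [count_fn()]
    for _ in range(runs):
        # check=False: the command may exit non-zero, e.g. when halted early.
        subprocess.run(cmd, check=False)
        counts.append(count_fn())
    return deltas(counts)
```

Other kernel activity allocates from the same caches, so individual deltas will be noisy. A steady offset across many runs is the signal to look for.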
This is most likely a cutorch error. In sample.lua, if I halt execution with error() after "require 'cutorch'", the memory leak occurs. If I halt before "require 'cutorch'", there is no leak.
I'll close this issue. It isn't torch-rnn :)
I put this over in cutorch, FYI: torch/cutorch#379