Comments (19)
I haven't observed this problem. If it does occur, it is a serious one.
I am away on a project right now and only have my laptop to check with. Can you provide more information?
from caffe-windows.
The version I am using is from 08/19/2015, commit id d43aefc, with your latest 3rdparty-cudnnv3 library and the newest lmdb.lib overwritten.
First, when running convert_imageset, an error like 'initialized twice' occurs. This is resolved by using the BVLC/caffe version, i.e.:
gflags::ParseCommandLineFlags(&argc, &argv, true);
// ::google::InitGoogleLogging(argv[0]);
Then, using convert_imageset, I created two LMDB databases for training and testing (around 2 million images). After 36 hours, memory usage exceeds 100 GB.
It seems the problem is caused by LMDB. LMDB was only recently modified to work on Windows and has not been tested thoroughly. I will contact the author of LMDB. Until the problem is solved, I suggest using LevelDB instead.
OK, thanks.
Hi, I am having a problem using convert_imageset as well, although my data is small compared to @taoari's. I get the error below:
Log file created at: 2015/08/28 09:38:19
Running on machine: NGLL-PC
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F0828 09:38:19.677536 6132 utilities.cc:317] Check failed: !IsGoogleLoggingInitialized() You called InitGoogleLogging() twice!
Has @taoari solved the problem? Thanks.
@linng85
I have fixed it. Just comment out the first line in main().
Are you sure this is a bug in LMDB? For me, LevelDB encounters the same problem.
@taoari
I am not sure. But I have trained lots of models and never observed a memory leak. The only thing I modified recently is LMDB, so I guessed it was an LMDB error. After you reported this issue, I trained a new model again and the memory stayed stable after several hours, so now I do not know what problem you have run into.
Tracking down a memory leak is difficult work and I do not have enough time to do it, so I suggest you try other repositories, such as https://github.com/willyd/caffe-builder.
@happynear
Finally I have confirmed that the memory leak problem is caused by the LMDB library. (Sorry for the previous claim about LevelDB; I judged heuristically from Task Manager. In fact LevelDB does not have this problem; its drawback is that a LevelDB database can only be accessed by one instance of Caffe at a time.)
The memory leak cannot be observed in Task Manager, which only shows the private memory used. It can only be observed in the Resource Monitor "Shareable (KB)" column.
The growth happens in db_lmdb.hpp, in the mdb_cursor_get() call inside LMDBCursor::Seek(). If one forces op = MDB_FIRST, there is no memory leak, so this appears to be a bug in the LMDB library when mdb_cursor_get() is passed op = MDB_NEXT. I hope it will be resolved soon.
Here is a Python script to show the memory leak problem:
import os
print('PID: %d' % os.getpid())

import lmdb

lmdb_name = 'ilsvrc12_train_lmdb'
env = lmdb.open(lmdb_name, readonly=True)
print('Entries: %d' % env.stat()['entries'])
with env.begin() as txn:
    cursor = txn.cursor()
    for i, (k, v) in enumerate(cursor):
        pass  # just iterate; watch the process in Resource Monitor
The Shareable memory in Resource Monitor will soon reach several GB.
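To put a number on that growth without watching Resource Monitor by hand, a small helper can poll the process's own working set while the loop above runs. This is my own sketch, not part of the script above: on Windows it reads WorkingSetSize via GetProcessMemoryInfo (the counter Task Manager reports), and on Linux it falls back to VmRSS from /proc.

```python
import os
import sys

def working_set_bytes():
    """Best-effort current resident/working-set size of this process,
    in bytes. Hypothetical helper for watching the growth described
    above; it is not taken from any of the repositories in this thread."""
    if sys.platform == 'win32':
        import ctypes
        from ctypes import wintypes

        class PROCESS_MEMORY_COUNTERS(ctypes.Structure):
            _fields_ = [
                ('cb', wintypes.DWORD),
                ('PageFaultCount', wintypes.DWORD),
                ('PeakWorkingSetSize', ctypes.c_size_t),
                ('WorkingSetSize', ctypes.c_size_t),
                ('QuotaPeakPagedPoolUsage', ctypes.c_size_t),
                ('QuotaPagedPoolUsage', ctypes.c_size_t),
                ('QuotaPeakNonPagedPoolUsage', ctypes.c_size_t),
                ('QuotaNonPagedPoolUsage', ctypes.c_size_t),
                ('PagefileUsage', ctypes.c_size_t),
                ('PeakPagefileUsage', ctypes.c_size_t),
            ]

        pmc = PROCESS_MEMORY_COUNTERS()
        pmc.cb = ctypes.sizeof(pmc)
        # Pseudo-handle to the current process; no OpenProcess needed.
        handle = ctypes.windll.kernel32.GetCurrentProcess()
        ctypes.windll.psapi.GetProcessMemoryInfo(
            handle, ctypes.byref(pmc), pmc.cb)
        return pmc.WorkingSetSize
    # Linux fallback: /proc reports VmRSS in kB.
    with open('/proc/%d/status' % os.getpid()) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1]) * 1024
    return 0

print('Working set: %.1f MB' % (working_set_bytes() / 1024.0 / 1024.0))
```

Printing this every few thousand cursor iterations makes the steady climb visible from inside the script itself.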
Thanks a lot for debugging.
Maybe it is caused by NTFS sparse files? See the discussion in this issue:
BVLC/caffe#2816
I have modified the code in https://github.com/happynear/lmdb/tree/cmake-ntfs-sparse ; you can compile lmdb.lib yourself and see whether the memory leak problem still exists.
@dw, @LitingLin, @woozzu,
Could you help us solve this problem?
Hi there,
This is not a memory leak, it is the expected behaviour of LMDB.
Quoting TechNet:
Working Set is the term that defines the amount of memory currently in use for a process. Private Working Set is the amount of memory that is dedicated to that process and will not be given up for other programs to use; Shareable Working Set can be surrendered if physical RAM begins to run scarce. Peak Working Set is the highest value recorded for the current instance of this process.
In other words, Windows will steal this memory back from LMDB as is necessary to handle other allocations. So for example, if you run a second application that allocates 64GB of heap, the LMDB-using process(es) shareable figure will shrink accordingly.
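The distinction can be seen with any file-backed mapping, not just LMDB. The following is my own illustration, not LMDB code: scanning a mapped file touches every page, and those pages count toward the shareable working set because the file itself is the backing store, so the OS can drop them whenever RAM gets tight. Nothing is leaked.

```python
import mmap
import os
import tempfile

# Create a small file to stand in for an LMDB data file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'\x00' * (4 * 1024 * 1024))  # 4 MB of zero bytes
    path = f.name

# Map it read-only (the same access pattern LMDB uses) and touch
# every page, as a cursor scan would. Each touched page lands in the
# shareable working set: cached while RAM is free, reclaimable at
# any time because the file is the backing store.
with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    total = 0
    for off in range(0, len(mm), 4096):  # one read per 4 KB page
        total += mm[off]
    mm.close()

os.remove(path)
```

After this runs, the process's private bytes are unchanged; only the shareable figure grew, which is exactly the behavior reported above for the LMDB scan.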
@dw
Thanks for your professional explanation. We can use LMDB without worry now.
But this causes memory usage to climb to more than 200 GB when I am training CaffeNet on the ImageNet dataset, which makes the server unresponsive. Is this expected behavior, or do I have a misunderstanding?
Best,
@happynear @taoari You can refer to BVLC/caffe#1377. @dw is right, this is not a memory leak. The memory keeps increasing because the OS maps every page of memory-mapped file data into physical RAM, and the map size is larger than physical RAM, especially for ImageNet. But in Caffe, reads from the db are sequential, so we can force the already-used pages to be released from physical RAM.
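The release trick relies on a documented quirk of VirtualUnlock: calling it on a region that was never locked fails with ERROR_NOT_LOCKED, but as a side effect it removes the pages from the working set. A rough Python sketch of the same idea (the function name is my own; on non-Windows systems it is a no-op, and the actual fix discussed below lives in Caffe's C++ LMDBCursor::Seek):

```python
import ctypes
import sys

def release_pages(addr, size):
    """Hint the OS that a just-read memory-mapped record's physical
    pages can be reclaimed. On Windows, VirtualUnlock on an unlocked
    region 'fails' but still trims the pages from the working set,
    which is exactly the effect we want here. Elsewhere, a no-op."""
    if sys.platform == 'win32':
        ctypes.windll.kernel32.VirtualUnlock(
            ctypes.c_void_p(addr), ctypes.c_size_t(size))
```

Calling this on each record's address and size after processing it keeps a sequential scan's working set flat instead of growing toward the full database size.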
@woozzu
I noticed that you solved this problem by adding some code after the seek:
if (op != MDB_FIRST)
  VirtualUnlock(mdb_value_.mv_data, mdb_value_.mv_size);
Where exactly should I add this code? I haven't found it in your repository https://github.com/woozzu/py-lmdb .
@happynear Actually, I did not modify the LMDB code. It should be added to the Caffe code. Please refer to woozzu/caffe@4c9bbc2
@woozzu Nice! This solves the problem.
I have updated my caffe as @woozzu suggested.
Thanks, everyone.