It seems that this version of Caffe has a severe memory leak; the memory can be u

memory leak about caffe-windows (CLOSED)

happynear commented on July 21, 2024

memory leak

from caffe-windows.

Comments (19)

happynear commented on July 21, 2024

I haven't observed this problem. If it is real, it is a severe problem.
I am away on a project right now and can only check it on my laptop. Can you provide more information?

taoari commented on July 21, 2024

The version I am using is from 08/19/2015 (commit d43aefc), with your latest 3rdparty-cudnnv3 library and the newest lmdb.lib copied over the old one.

First, running convert_imageset fails with an error like 'initialized twice'. This is resolved by using the BVLC/caffe version, i.e.:

 gflags::ParseCommandLineFlags(&argc, &argv, true);
 // ::google::InitGoogleLogging(argv[0]);

Then, using convert_imageset, I created two lmdb databases for training and testing (around 2 million images). After 36 hours, memory usage exceeds 100 GB.

happynear commented on July 21, 2024

It seems that the problem is caused by lmdb, which was recently modified to work on Windows but has not been tested thoroughly. I will contact the author of lmdb. Until the problem is solved, I suggest using leveldb instead.

taoari commented on July 21, 2024

OK, thanks.

linng85 commented on July 21, 2024

Hi, I am having a problem using convert_imageset as well, although my dataset is small compared to @taoari's. I get the error below:
Log file created at: 2015/08/28 09:38:19
Running on machine: NGLL-PC
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F0828 09:38:19.677536 6132 utilities.cc:317] Check failed: !IsGoogleLoggingInitialized() You called InitGoogleLogging() twice!

Has @taoari solved the problem? Thanks.

happynear commented on July 21, 2024

@linng85
I have fixed it. Just comment out the first line in main().

taoari commented on July 21, 2024

Are you sure this is a bug in lmdb? For me, leveldb shows the same problem.

happynear commented on July 21, 2024

@taoari
I am not sure, but I have trained lots of models and never found a memory leak. The only thing I modified recently is lmdb, so I guess it is an lmdb bug. After you reported this issue, I trained a new model and the memory was still stable after several hours. So I do not know what problem you have come across.

Tracking down a memory leak is difficult work, and I do not have enough time to do it, so I suggest you try other repositories, such as https://github.com/willyd/caffe-builder.

taoari commented on July 21, 2024

@happynear
I have finally confirmed that the memory leak is caused by the LMDB library. (Sorry for my earlier claim about LEVELDB; I judged it roughly from Task Manager. In fact, LEVELDB does not have the problem; its drawback is that it can only be accessed by one Caffe instance at a time.)

The memory leak cannot be observed in Task Manager, because it only shows private memory usage; it shows up only in the "Shareable (KB)" column of Resource Monitor.

The memory growth comes from the mdb_cursor_get() call in LMDBCursor::Seek() in db_lmdb.hpp. If op is forced to MDB_FIRST, there is no leak. So this is a bug in the LMDB library when mdb_cursor_get() is called with op = MDB_NEXT. I hope this will be resolved soon.

Here is a Python script to show the memory leak problem:

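import os

import lmdb  # py-lmdb binding

# Look this PID up in Resource Monitor while the script runs.
print('PID: %d' % os.getpid())

lmdb_name = 'ilsvrc12_train_lmdb'
env = lmdb.open(lmdb_name, readonly=True)
print('Entries: %d' % env.stat()['entries'])

# Walk every record once; the memory-mapped pages touched here show up
# in the "Shareable (KB)" column for this process.
with env.begin() as txn:
    cursor = txn.cursor()
    for key, value in cursor:
        pass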

The Shareable memory shown in Resource Monitor soon reaches several GB.

happynear commented on July 21, 2024

Thanks a lot for debugging.
Maybe it is caused by the sparse NTFS file? See the discussion in this issue:
BVLC/caffe#2816

I have modified the code in https://github.com/happynear/lmdb/tree/cmake-ntfs-sparse. You can compile lmdb.lib yourself and check whether the memory leak still exists.

happynear commented on July 21, 2024

@dw, @LitingLin, @woozzu,
could you help us solve this problem?

dw commented on July 21, 2024

Hi there,

This is not a memory leak, it is the expected behaviour of LMDB.

Quoting TechNet:

Working Set is the term that defines the amount of memory currently in use for a process. Private Working Set is the amount of memory that is dedicated to that process and will not be given up for other programs to use; Shareable Working Set can be surrendered if physical RAM begins to run scarce. Peak Working Set is the highest value recorded for the current instance of this process.

In other words, Windows will reclaim this memory from LMDB as necessary to satisfy other allocations. For example, if you run a second application that allocates 64 GB of heap, the Shareable figure of the LMDB-using process(es) will shrink accordingly.

happynear commented on July 21, 2024

@dw
Thanks for the expert explanation. We can use lmdb freely now.

taoari commented on July 21, 2024

@dw @happynear

But when I train CaffeNet on the ImageNet dataset, this drives memory usage above 200 GB, which makes the server unresponsive. Is this expected behavior, or have I misunderstood something?

Best,

woozzu commented on July 21, 2024

@happynear @taoari You can refer to BVLC/caffe#1377. @dw is right: this is not a memory leak. Memory grows because the OS brings every page of the memory-mapped file that is read into physical RAM, and for ImageNet the map size is larger than physical RAM. However, Caffe reads from the database sequentially, so we can force the already-used pages to be released from physical RAM.
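
To illustrate the idea of releasing already-read pages during a sequential scan, here is a minimal Python sketch using the standard mmap module. It is not LMDB or Caffe code: it reads a file through a read-only memory map in page-aligned chunks and, where the platform supports it (Linux/Unix, Python 3.8+), calls madvise(MADV_DONTNEED) on each chunk after use. That is a POSIX counterpart to the Windows working-set release discussed in this thread, swapped in so the example stays portable. The function name sum_file_sequentially, the chunk size, and the byte-summing "work" are hypothetical stand-ins.

```python
import mmap
import os


def sum_file_sequentially(path, chunk_pages=16):
    """Read a file front to back through a read-only memory map.

    After each chunk is consumed, ask the kernel (where supported) to drop
    that chunk's pages from this process. Returns the sum of all bytes,
    a hypothetical stand-in for processing each record.
    """
    # madvise() needs a page-aligned start, so chunks are whole pages.
    chunk = mmap.PAGESIZE * chunk_pages
    total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, size, chunk):
                length = min(chunk, size - start)
                total += sum(mm[start:start + length])
                # On Linux, MADV_DONTNEED drops these pages from this
                # process's resident set; touching them again faults them
                # back in from the file. Skipped where unavailable (Windows).
                if hasattr(mmap, "MADV_DONTNEED"):
                    mm.madvise(mmap.MADV_DONTNEED, start, length)
    return total
```

For example, sum_file_sequentially("train.bin") (a hypothetical file) walks the file once; the process's resident memory should stay roughly flat instead of growing with the file size, because pages are released as soon as the scan moves past them. This only works because the access pattern is sequential; a random-access reader would keep faulting released pages back in.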

happynear commented on July 21, 2024

@woozzu
I noticed that you solved this problem by adding some code after the seek:

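// On Windows, calling VirtualUnlock on pages that are not locked
// removes them from the process working set.
if (op != MDB_FIRST)
    VirtualUnlock(mdb_value_.mv_data, mdb_value_.mv_size);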

So where exactly should I add this code? I haven't found it in your repository https://github.com/woozzu/py-lmdb.

woozzu commented on July 21, 2024

@happynear Actually, I did not modify the LMDB code; it should be added to the Caffe code. Please refer to woozzu/caffe@4c9bbc2.

taoari commented on July 21, 2024

@woozzu Nice! This solves the problem.

happynear commented on July 21, 2024

I have updated my caffe as @woozzu suggested.
Thanks, everyone.
