
memory leak (caffe-windows issue, closed)

happynear commented on July 21, 2024
memory leak

from caffe-windows.

Comments (19)

happynear commented on July 21, 2024

I haven't observed this problem. If it is real, it is a severe one.
I am away on a project right now and can only check it on my laptop. Can you provide more information?

taoari commented on July 21, 2024

I am using the version from 08/19/2015 (commit id: d43aefc) with your latest 3rdparty-cudnnv3 library, with lmdb.lib overwritten by the newest build.

First, running convert_imageset fails with an error like 'initialized twice'. I resolved this by using the BVLC/caffe version of the code, i.e.:

 gflags::ParseCommandLineFlags(&argc, &argv, true);
 // ::google::InitGoogleLogging(argv[0]);

Then, using convert_imageset, I created two lmdb databases for training and testing (around 2 million images). After 36 hours, memory usage exceeds 100GB.

happynear commented on July 21, 2024

It seems that the problem is caused by lmdb. lmdb was recently modified to work on Windows, but it has not been tested rigorously. I will contact the author of lmdb. Until the problem is solved, I suggest using leveldb instead.

taoari commented on July 21, 2024

OK, thanks.

linng85 commented on July 21, 2024

Hi, I am having a problem using convert_imageset as well, although my dataset is small compared to @taoari's. I am getting the error below:
Log file created at: 2015/08/28 09:38:19
Running on machine: NGLL-PC
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F0828 09:38:19.677536 6132 utilities.cc:317] Check failed: !IsGoogleLoggingInitialized() You called InitGoogleLogging() twice!

Did @taoari solve the problem? Thanks.

happynear commented on July 21, 2024

@linng85
I have fixed it. Just comment out the first line in main().

taoari commented on July 21, 2024

Are you sure this is a bug in lmdb? For me, leveldb has the same problem.

happynear commented on July 21, 2024

@taoari
I am not sure. But I have trained lots of models and never found a memory leak. The only thing I modified recently is lmdb, so I guess it is an lmdb error. After you reported this issue, I trained a new model again and the memory was stable after several hours. I do not know what problem you have come across.

Tracking down a memory leak is difficult work and I do not have enough time to do it. So I suggest you try other repositories, such as https://github.com/willyd/caffe-builder.

taoari commented on July 21, 2024

@happynear
Finally I have confirmed that the memory leak problem is caused by the LMDB library. (Sorry for my earlier claim about LEVELDB, which I judged heuristically from the Task Manager. In fact, LEVELDB does not have the problem; its drawback is that a LEVELDB database can only be accessed by one instance of caffe at a time.)

The memory leak cannot be observed in the Task Manager, as it only shows the Private memory used. It can only be observed in the "Shareable (KB)" column of the Resource Monitor.

The memory growth comes from the mdb_cursor_get() call in LMDBCursor::Seek() in db_lmdb.hpp. If one forces op = MDB_FIRST, there is no memory leak. So this is a bug in the LMDB library when mdb_cursor_get() is called with op=MDB_NEXT. I hope it will be resolved soon.

Here is a Python script to show the memory leak problem:

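import os
import lmdb  # py-lmdb binding

# Print the PID so the process can be found in Resource Monitor.
print('PID: %d' % os.getpid())

lmdb_name = 'ilsvrc12_train_lmdb'
env = lmdb.open(lmdb_name, readonly=True)
print('Entries: %d' % env.stat()['entries'])

# Walk every record once, reading each key and value from the memory map.
with env.begin() as txn:
    cursor = txn.cursor()
    for i, (k, v) in enumerate(cursor):
        pass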

The Shareable memory in Resource Monitor will soon reach several GB.

happynear commented on July 21, 2024

Thanks a lot for debugging.
Maybe it is caused by the sparse NTFS file? Look at the discussion in this issue:
BVLC/caffe#2816

I have modified the code in https://github.com/happynear/lmdb/tree/cmake-ntfs-sparse . You may compile lmdb.lib yourself and see if the memory leak problem still exists.

happynear commented on July 21, 2024

@dw, @LitingLin, @woozzu,
Could you help us solve this problem?

dw commented on July 21, 2024

Hi there,

This is not a memory leak, it is the expected behaviour of LMDB.

Quoting TechNet:

Working Set is the term that defines the amount of memory currently in use for a process. Private Working Set is the amount of memory that is dedicated to that process and will not be given up for other programs to use; Shareable Working Set can be surrendered if physical RAM begins to run scarce. Peak Working Set is the highest value recorded for the current instance of this process.

In other words, Windows will steal this memory back from LMDB as is necessary to handle other allocations. So for example, if you run a second application that allocates 64GB of heap, the LMDB-using process(es) shareable figure will shrink accordingly.

happynear commented on July 21, 2024

@dw
Thanks for your professional explanation. We can use lmdb freely now.

taoari commented on July 21, 2024

@dw @happynear

But this leads to memory usage of more than 200GB when I am training CaffeNet on the ImageNet dataset, which makes the server unresponsive. Is this expected behavior? Or have I misunderstood something?

Best,

woozzu commented on July 21, 2024

@happynear @taoari You can refer to BVLC/caffe#1377. @dw is right, this is not a memory leak. Memory usage grows because the OS brings every page of the memory-mapped file that gets read into physical RAM, and the map size is larger than physical RAM, especially for ImageNet. But in caffe, reads from the db are sequential, so we can force the already-used memory to be released from physical RAM.
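
As a rough illustration of that idea (not the actual caffe change), here is a minimal Python sketch using only the standard library. It reads a file sequentially through a read-only memory map and, after each chunk has been consumed, hints to the OS that those pages are no longer needed. The function name scan_and_release and the CHUNK size are hypothetical, and mmap.madvise with MADV_DONTNEED is used here as an assumed POSIX counterpart to the Windows call discussed in this thread; where it is unavailable (for example on Windows, or on Python older than 3.8) the sketch simply reads without releasing.

```python
import mmap
import os
import tempfile

# Hypothetical batch size; kept a multiple of the page size so that
# every chunk starts on a page boundary.
CHUNK = 16 * mmap.PAGESIZE


def scan_and_release(path, chunk=CHUNK):
    """Sequentially read a non-empty file through a read-only mapping.

    After each chunk is consumed, advise the OS that its pages are no
    longer needed. Returns (bytes_read, chunks_released).
    """
    size = os.path.getsize(path)
    bytes_read = 0
    released = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Assumption: madvise/MADV_DONTNEED stand in for the Windows
        # call from the thread; they are missing on some platforms.
        can_release = hasattr(m, "madvise") and hasattr(mmap, "MADV_DONTNEED")
        for start in range(0, size, chunk):
            length = min(chunk, size - start)
            data = m[start:start + length]  # touching pages faults them in
            bytes_read += len(data)
            if can_release:
                # The mapping is read-only and file-backed, so dropped
                # pages can be re-read from disk if touched again.
                m.madvise(mmap.MADV_DONTNEED, start, length)
                released += 1
    return bytes_read, released


if __name__ == "__main__":
    # Small throwaway file standing in for a large database file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(CHUNK * 4 + 123))
    try:
        print(scan_and_release(tmp.name))
    finally:
        os.remove(tmp.name)
```

This only works because the access pattern is sequential: a page that has been consumed will not be needed again soon, so giving it back costs nothing until the next pass over the data.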

happynear commented on July 21, 2024

@woozzu
I noticed that you solved this problem by adding some code after the seek:

if (op != MDB_FIRST)
    VirtualUnlock(mdb_value_.mv_data, mdb_value_.mv_size);

Where exactly should I add this code? I haven't found it in your repository https://github.com/woozzu/py-lmdb .

woozzu commented on July 21, 2024

@happynear Actually, I did not modify the LMDB code. The change should be added to the caffe code. Please refer to woozzu/caffe@4c9bbc2

taoari commented on July 21, 2024

@woozzu Nice! This solves the problem.

happynear commented on July 21, 2024

I have updated my caffe as @woozzu suggested.
Thanks, everyone.
