
Reduce memory footprint of TemplateDictionary sets about ctemplate HOT 13 CLOSED

olafvdspek commented on July 22, 2024

Reduce memory footprint of TemplateDictionary sets

from ctemplate.

Comments (13)

GoogleCodeExporter commented on July 22, 2024

Source code to demonstrate memory usage, edited output of varying sizes for
TemplateDictionary and HDF.

Original comment by [email protected] on 25 Jul 2008 at 5:40

Attachments:

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

You posted the ctemplate_dict binary instead of the source file.  Can you post
ctemplate_dict.cc too?

I've made a change to allocate the dicts lazily.  I'm hoping that will help for a
test case like this.  I'll report the data once I can reproduce your test setup.

Original comment by [email protected] on 29 Jul 2008 at 4:57

Changed state: Started

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

Ah, excuse me.  You should find it attached.

I'll 'svn up' and see how much the memory footprint has been reduced.

Original comment by [email protected] on 29 Jul 2008 at 5:23

Attachments:

ctemplate_dict.cc

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

I don't see any commits for lazy instantiation in the project's repository.
Perhaps they still need to be committed to the (public) project repository, in
case you'd forgotten?

The test case would be a tricky one, as I'm using 'pmap', which may or may not be
available.  If you have any suggestions, I'm open to them.  At the very least it
could be an optional test.

Original comment by [email protected] on 29 Jul 2008 at 6:16

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

OK, I looked at the new code, and found that 10k nodes take up about 2.4M of
memory.  So a bit better, but still quite some work to be done.

To get this data, I tried a program similar to yours, that does the following:
---
  HeapProfilerStart("/var/tmp/dictsize");
  TemplateDictionary* dict = new TemplateDictionary("LIST");
  for (int i = 0; i < 10000; i++) {
    TemplateDictionary* subdict = dict->AddSectionDictionary("ELEMENT");
    subdict->SetValue("KEY", "VAL");
  }
  dict->Dump();
  HeapProfilerDump("Dumping");
---

This uses the heap-profiler functionality from google-perftools, so the numbers
aren't directly comparable to your tests, which use /proc/self/maps.  But they
should be similar.

How close is this benchmark to the actual usage pattern you see?  Sizing the
hashtables that store variable values is a tricky business, and it's definitely
not currently optimized for one template-variable per dictionary, as in the
benchmark.  So it's not a huge surprise to see somewhat outsize numbers, though I
think we can do better than what we have now.

The heap-profiler has shown where the memory use is going, so I can try some more
tricks to get the size down.  I'm not sure how small we'll be able to get it
eventually, though; the code was written to optimize speed over space use.  But I
think we can definitely still do better than 2.4M.

Original comment by [email protected] on 29 Jul 2008 at 7:03

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

Oops, sorry, I realized my data was completely wrong: I had set up the test
incorrectly.  I have to go now, but I hope I'll be able to get real numbers up
tomorrow.

Original comment by [email protected] on 29 Jul 2008 at 7:10

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

You had my hopes up for about 5 seconds, as 2.4MB for 10k nodes beats out HDF.  :)

Original comment by [email protected] on 29 Jul 2008 at 4:28

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

Hmm, with my new test I'm getting 3.3M, which is more than I had expected.  I had
thought the numbers would go down with the fixes to my test, but instead they've
gone up.  About 2M of the 3.3M is in vector::reserve, so it looks like the vector
class is reserving more memory than is useful (no surprise since each vector holds
only one element, in this test).

I have some ideas to bring down the memory use a little bit, but the big win is to
not reserve so much memory in the vectors.  That helps this benchmark a lot, but I
don't know how much it helps in real life.  Maybe we can make it tunable.  I'll
play around with this a bit more.

Original comment by [email protected] on 30 Jul 2008 at 1:55

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

Ah, I figured out the problem: the standard GNU STL hash_map implementation has a
crazy-large minimum bucket size: 53.  ctemplate asks for a hash_map with only 3
buckets in it, but the hash_map implementation rounds up to the smallest value it
allows, which is 53.  So you've got a 53-element vector even when you have a
hash_map with only one item in it.
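
The rounding described above can be sketched roughly as follows.  This is an illustration only, not the actual libstdc++ code: the table holds just the first five primes quoted in this thread (the real list is longer), and the names kPrimes and RoundUpBuckets are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Assumption: only the first five primes quoted in this thread; the real
// table in ext/hashtable.h continues well beyond these.
static const unsigned long kPrimes[] = {53ul, 97ul, 193ul, 389ul, 769ul};

// Hypothetical helper: returns the smallest table entry that is >= the
// requested bucket count, or the largest entry if the request is bigger
// than everything in the table.
inline unsigned long RoundUpBuckets(unsigned long requested) {
  const unsigned long* first = std::begin(kPrimes);
  const unsigned long* last = std::end(kPrimes);
  assert(first != last);  // the table must not be empty
  const unsigned long* pos = std::lower_bound(first, last, requested);
  return pos == last ? *(last - 1) : *pos;
}
```

With a table like this, a request for 3 buckets comes back as 53, which is the behaviour ctemplate runs into.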

At Google, we've munged this header to allow for a smaller minimum bucket size (I
forget what, but something like 7).  So in my earlier tests, where I was
accidentally using the Google version of hash_map, I was getting quite small
sizes.  Now that I'm back to using the standard header, the size is going up again.

This is a problem with the GCC STL, as far as I'm concerned.  I don't know if
that's the STL you're using; if not, you won't see the kind of 3M sizes that I
reported.  If you are, you may want to consider making a similar change.  To do
so, look for something like /usr/include/c++/4.0/ext/hashtable.h, which contains
these lines:

  static const unsigned long __stl_prime_list[_S_num_primes] =
    {
      53ul,         97ul,         193ul,       389ul,       769ul,

Add a new element before 53ul, something like 5ul or 7ul.  You shouldn't need to
make any other changes; once you recompile ctemplate with this, you should just
see a significant reduction in memory use, at least for your benchmark.  In fact,
you should see such a reduction even without my change, which further reduces
memory use.

Original comment by [email protected] on 30 Jul 2008 at 2:23

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

One nit: in addition to adding 5ul (or 7ul), you'll also need to increment
_S_num_primes by 1.  That should be in the line right above the __stl_prime_list
line.
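
As a rough sketch of what the patched table could look like: the names kNumPrimes and kPatchedPrimes below are stand-ins for _S_num_primes and __stl_prime_list, and only the five primes quoted above are reproduced, so the count here is 6 rather than whatever the real header ends up with.  Because the array has an explicit size, adding the entry without bumping the count is a compile error (too many initializers), so the compiler should catch a forgotten increment.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for _S_num_primes: 5 truncated original entries + 1 new entry.
enum { kNumPrimes = 6 };

// Stand-in for __stl_prime_list with a small prime added before 53ul.
static const unsigned long kPatchedPrimes[kNumPrimes] = {
    7ul,  // new smallest entry (5ul would work the same way)
    53ul, 97ul, 193ul, 389ul, 769ul,
};

// Hypothetical helper mirroring the round-up lookup on the patched table.
inline unsigned long PatchedMinBuckets(unsigned long requested) {
  assert(kNumPrimes > 0);  // the table must not be empty
  const unsigned long* first = kPatchedPrimes;
  const unsigned long* last = kPatchedPrimes + kNumPrimes;
  const unsigned long* pos = std::lower_bound(first, last, requested);
  return pos == last ? *(last - 1) : *pos;
}
```

With the extra entry in place, a request for 3 buckets rounds up to 7 instead of 53.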

Original comment by [email protected] on 30 Jul 2008 at 2:27

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

Thanks, I'll give that a go.  Were you able to reproduce the numbers that I was
seeing using your perftools implementation?

I'm a little surprised (pleasantly), as you had mentioned that you might be able
to reduce the memory footprint by 50%, but 3.3M is a factor of ten.  It would make
sense if the majority of my memory footprint was the ~50 extra hashtable buckets.

Thanks again for the research, I'll see how much that frees up (without ctemplate
code changes).

Original comment by [email protected] on 30 Jul 2008 at 4:53

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

This should be much smaller in ctemplate 0.91.  (Also, note the hashtable changes
I mentioned above.)  I'm going to close this bug, but feel free to reopen if you
feel there's more that can be done.

Original comment by [email protected] on 21 Aug 2008 at 12:56

Changed state: Fixed

from ctemplate.

GoogleCodeExporter commented on July 22, 2024

Excellent, I'll give it a go.

Original comment by [email protected] on 21 Aug 2008 at 4:12

from ctemplate.
