Giter Site home page Giter Site logo

lightcollections's Introduction

What is LightCollections used for?

The .Net Framework provides powerful generic collections (List, Dictionary<TKey, TValue>) but their memory management can become really bad when dealing with high volumes of data.

These implementations are internally using arrays whose size is multiplied by two (or the first primary value after multiplying the existing size by two for dictionaries) when the size turns out to be insufficient.

This leads to two issues:

  • Many arrays are created-and-replaced. The Garbage Collector has many instances to collect as this creates large memory traffic,
  • Worst thing is that when the collection starts being large (> 85k, or even before if stored elements are structures in many cases) the instances are stored in the Large Object Heap. This causes the memory to become more fragmented.

This package is an attempt to avoid creating arrays that will be droped when the size of the collection become insufficient and make sure that nothing will be stored in the Large Object Heap.

The package contains mainly two types:

  • JaggedDictionary: used to store data on multiple dimensions,
  • LightDictionary: used to store large data without impacting the LoH.

Zoom on LightDictionary

How?

To avoid droping arrays when they become too small, the idea is to add additional arrays to store the additional values. The size of these arrays is computed in such a way that they won't be stored in the Large Object Heap.

Here is how the standard dictionary behaves: starts by creating small internal bucket & value arrays. If more values are required, new bigger arrays are created and are replacing older ones. Here is a typical growth: 7, 17, 37, 89, 197, 431, 919, 1931, 4049, 8419, 17519, 36353, 75431, 156437, 324449, 672827, 1395263, 2893249, 5999471, 11998949 The light dictionary only drops arrays when they keep being small. Once they reach the LoH limit then fixed size arrays are appended. The max size depends on the array types.

Actual performance results

I did some benchmarks to compare the performance between LightDictionary<string, int> and Dictionary. The test consists of setting 10,000,000 values and getting all the values.

Type Total memory traffic LoH memory usage Setting values Getting values Total duration
LightDictionary 257,392 Mbytes 0 :-) 4,566 ms 2,617 ms 7,184 ms
Dictionary 472,063 Mbytes 471,481 Mbytes - largest array needs 50 Mbytes contiguous memory 4,129 ms 2,298 ms 6,428 ms

This shows that LightDictionary doesn't consume any memory in the Large Object Heap anymore. Furthermore the global memory traffice is twice as small. The drawback is that the LightDictionary is around 11% slower than the standard Dictionary.

Plan for improving performance

I did many tunings to have the best performance. According to the profiler, the remaining reason why the LightDictionary is slow is because of the additional indexed array access: entries[n] is now replaced with entries[x][y].

Any help on that would be much appreciated as I can't see any way of improving this...

Build

This project is built on appveyor: Build status

And available on nuget: Nuget

This library depends on ExpressionReflection/StaticReflection library: https://github.com/mtranter/StaticReflection/

License

This library is licensed under MIT License. See LICENSE file

lightcollections's People

Contributors

frblondin avatar dammejed avatar pfeuv avatar tomap avatar dependabot[bot] avatar

Stargazers

Vitaliy Shibaev avatar Mogens Heller Grabe avatar Li Wei avatar  avatar Ryan avatar Thomas Démoulins avatar

Watchers

James Cloos avatar Thomas Démoulins avatar  avatar  avatar

lightcollections's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.