jtmueller / collections.pooled Goto Github PK
View Code? Open in Web Editor NEWFast, low-allocation ports of List, Dictionary, HashSet, Stack, and Queue using ArrayPool and Span.
License: MIT License
Fast, low-allocation ports of List, Dictionary, HashSet, Stack, and Queue using ArrayPool and Span.
License: MIT License
PooledList: the indexer-get operation is 9-10 times slower than that of List, despite being the exact same code, as far as I can tell! I haven't found any way of closing that gap other than removing the bounds-check, which isn't a good idea. I suspect some sort of List-specific optimization in the framework that PooledList can't benefit from. However, indexing into PooledList.Span is 84% faster than indexing into List, so there's that.
I will look into this.
Is your feature request related to a problem? Please describe.
We already support ValueTuple, but there are cases where support for Tuple would be convenient, particularly in F# to make an equivalent to the dict
function.
Describe the solution you'd like
Constructors, AddRange, and extension methods for PooledDictionary should support IEnumerable<Tuple<TKey, TValue>>
. No need to support spans of same.
Is your feature request related to a problem? Please describe.
PooledQueue is the only major non-Sorted collection left to support.
Describe the solution you'd like
Implement PooledQueue, as with the others.
Is your feature request related to a problem? Please describe.
Currently the pooled collections are all hard-coded to use ArrayPool<T>.Shared
, but ArrayPool<T>
is an abstract class intended to allow for custom implementations. For example, on a resource-constrained device someone might need an ArrayPool of a fixed size, or that pre-allocates arrays or something. At the moment there is no way to get PooledCollections to make use of such a custom ArrayPool.
Describe the solution you'd like
We need a way to optionally specify a custom ArrayPool, and default to ArrayPool<T>.Shared
if none is provided. However, passing in an instance of a custom ArrayPool wouldn't work for PooledDictionary or PooledSet, because those types rely on arrays of a private type inaccessible from outside.
Perhaps there's a way to use a generic type parameter to specify the type of a custom ArrayPool, and then create an instance of this type inside the collection code?
Has it been considered to provide an option of returning the rented buffers/arrays from the collection finalyzer. May be useful when disposing of a pooled collection instance is too complicated or impossible to enforce. For instance, a collection is being returned from some library as its base interface (IList as an example).
To not waste anybody's time I am not providing possible ways to implementing such feature until know for sure it is not a duplicate to some other, known request or issue.
Thank you.
Is your feature request related to a problem? Please describe.
If foo is PooledList<int>
...
new PooledStack<int>(foo.Span)
...does not work without an explicit cast.
Describe the solution you'd like
It would be nice to not have to cast. There may be other pooled collections to which this applies.
Is your feature request related to a problem? Please describe.
.NET 6 exposes a low-level method to get a ref to the entry in a Dictionary<,>
: dotnet/runtime#49388
It would be nice to have an equivalent to this in the pooled collections.
Describe the solution you'd like
Something like PooledCollectionsMarshal.GetValueRefOrNullRef
?
.NET Standard 2.1 is in Preview (as part of .NET Core 3.0) as of this writing. The current version...
The downside is this:
My current plan is this, when the final release of .NET Core 3 comes out:
By incrementing the major version, .NET Core 2.x users will not auto-upgrade to the new release, and can keep using Collections.Pooled 1.x. Users of .NET Core 3 will get the Standard 2.1 version, and all other users will get Standard 2.0 (until another framework adds support for Standard 2.1 of course).
This code is currently fully implemented in the core-3
branch if you want to experiment with it.
@dzmitry-lahoda and @Wixred I'm tagging you because you've shown an interest. Do you have any objection to this plan, or any alternate suggestions?
I looked into dozen of dict benchmarks. I do not see when pooled is faster. In some cases it is even slower for some reason. Could you please document?
Is your feature request related to a problem? Please describe.
Next method is kind of slow in Unity runtime (not sure about Core):
public Span<T> Span => _items.AsSpan(0, _size);
Describe the solution you'd like
Mark this method as aggressive optimization and inlining.
Describe alternatives you've considered
Provide unsafe access to writable array as ReadOnlyProperty.
Additional context
We get data from Span by ref.
I know Core 3 optimized Span, but there are issues to measure that because of BDN-JIT instability. Same time Unity is involved into .NET Ecosystem and hopefully will optimize Span in some time (because people will use .NET Standard with Spans and Unity stated it is 2.0 and will be 2.1)
Describe the bug
When nesting pooled collections inside each other, the debugger crashes when taking a heap snapshot.
To Reproduce
Create a nested collection, for example PooledDictionary<int, PooledSet<int>>
and attempt to take a heap snapshot in the Visual Studio debugger. This occurs in both the prerelease and 1.0 Nuget packages.
Expected behavior
The debugger should not crash.
Is it also possible that nesting pooled collections is just not recommended usage? I'm not really sure how it works internally.
There is not enough external observation for developers to start using dictionary as replacement.
Describe the bug
Lookups fail in a PooledDictionary<(int, int), string>
but succeed in a Dictionary<(int, int), string>
.
To Reproduce
class Program
{
private static void PooledTest()
{
var dict = new PooledDictionary<(int, int), string>
{
{ (1, 0), "1,0" },
{ (2, 0), "2,0" },
{ (0, 1), "0,1" },
{ (0, 2), "0,2" }
};
bool found1 = dict.TryGetValue((1, 0), out var _);
bool found2 = dict.TryGetValue((2, 0), out var _);
bool found3 = dict.TryGetValue((0, 1), out var _);
bool found4 = dict.TryGetValue((0, 2), out var _);
Console.WriteLine("PooledDictionary: {0}, {1}, {2}, {3}", found1, found2, found3, found4);
}
private static void DictTest()
{
var dict = new Dictionary<(int, int), string>
{
{ (1, 0), "1,0" },
{ (2, 0), "2,0" },
{ (0, 1), "0,1" },
{ (0, 2), "0,2" }
};
bool found1 = dict.TryGetValue((1, 0), out var _);
bool found2 = dict.TryGetValue((2, 0), out var _);
bool found3 = dict.TryGetValue((0, 1), out var _);
bool found4 = dict.TryGetValue((0, 2), out var _);
Console.WriteLine("Dictionary: {0}, {1}, {2}, {3}", found1, found2, found3, found4);
}
static void Main(string[] args)
{
DictTest();
PooledTest();
}
}
Expected behavior
Dictionary: True, True, True, True
PooledDictionary: False, False, False, True
We should get the same results from both dictionary types, without having to specify custom comparers.
I am new to Span and Memory, and am trying to migrate my library to them. I discovered PooledList when looking into how to AddRange a Span to a List. My initial migration increased the warm of my program by 30% (parsing a large data file and loading it into different lists and dictionaries in memory). Thanks!
However I did find one point that I needed to work around:
It currently isn't possible to get a Memory from PooledList.
Being able to access a Memory from the PooledList will allow me to pass around heap storable references of sub-ranges which makes the PooledList even more flexible. I'd be able to use references to it with Linq IEnumerable and also be able to use it in local functions. Ie. it would free the references for the restrictions that are required by Span.
I understand that PooledList is built upon ArrayPool which means that returning a Memory will cause a memory leak when expanding the PooledList.
Is it possible to implement this functionality - or would it come as an unacceptable tradeoff to the performance/complexity of the library?
Thanks again
Is your feature request related to a problem? Please describe.
Pooled collections have different behavior between .NET Core 2.1+ and .NET Standard 2.0. Because .NET Standard has no mechanism to easily determine if a value type contains a reference type, the .NET Standard version always clears values before returning arrays to the array pool.
The .NET Core 2.1+ version has an optimization that only clears values before returning arrays when those values are reference types or contain reference types. This helps performance, but some users might be concerned about leaking sensitive data to other code running in the same process, and might want even value-type data to be cleared before returning arrays to the pool.
Describe the solution you'd like
Pooled collections should allow the user to specify an "always clear" behavior - at least in the .NET Core 2.1+ version.
Is there is any plan to support .NET6?
Describe the bug
This is a reminder to test a likely issue that's not been verified yet: it's likely that when pooled collections are serialized and then deserialized, disposing them causes an exception because the deserialized arrays do not come from the ArrayPool, which throws an error if you return to it an array it didn't create.
Expected behavior
Either ArrayPool returns should be in a try/catch block, or deserialization should copy the deserialized array into an ArrayPool array instead of using it directly.
Would it be possible to have a PooledLookup implementation that implements ILookup?
This would drastically help us in scenarios we have a large (>1m objects), intermediate lookup we create that gets GCād right away.
Would be amazing if you could add support for ObservableCollection as it's the only collection type that fully supports binding. If you add support I would recommend or suggest you also add a AddRange method as the original collection hasn't but for performance reasons it's a must-have. There are many modified versions that added such as a method but none which is optimized with ArrayPool etc.
It would be great if there was PooledArray implementation along w/ the it's supporting LINQ methods (ToPooledArray etc)
INotifyCollectionChanged is useful ,I wanted it~
In the following code stored lists are not returned to the pool:
var lookup = new PooledDictionary<int, PooledList<string>>(clearMode: ClearMode.Always);
// add values to collection...
lookup.Dispose();
Is your feature request related to a problem? Please describe.
Zero overhead structs clone.
Describe the solution you'd like
PooledList<PlayerState> players = null;
if (Players != null)
{
players = new PooledList<PlayerState>(Players.Count, preallocate : true); // i guess preallocated just sets _size = capacity;
for (var i = 0; i < players.Count; i++)
{
PlayerState.Clone(in Players.Span[i], ref players.Span[i]);
}
}
Describe alternatives you've considered
PooledList<PlayerState> players = null;
if (Players != null)
{
players = new PooledList<PlayerState>(Players.Count);
for (var i = 0; i < players.Count; i++)
{
// Clone will fail until
players[i] = default; // overhead to create 200+ byte struct on stack and pass it into indexer (not sure if optimized away by compilers?)
PlayerState.Clone(in Players.Span[i], ref players.Span[i]);
}
}
Additional context
Usual List
could not do that because it would leak abstraction (that it is actually array
list).
PooledList
already leaks Span
as property and ArrayPool
as ctor arg(also ctor arg is no of so problem). So adding that method will not increase leakage, but will allow more performance uses for structs.
so we clear read on what does copy-allocate and what is not.
Is your feature request related to a problem? Please describe.
In Unity runtime if key is struct, than key == null
produced garbage of boxing. Probably same happens within .NET Core.
Describe the solution you'd like
Could check !typeof(TKey).IsValueType
before comparing with null, or may be is null
could do that already. How fast is that check?
Describe alternatives you've considered
Copy and paste whole code.
Hi,
I've update my project to pooled collections, but it seems time costs has not been imporoved. I've used a lot of linq technologies in my project. So I wonder does linq benefits from Pooled?
Thank you.
As I see current semantics is to return array into pool, set capacity-size to 0, and allow use later.
/// <summary>
/// Returns the internal buffers to the ArrayPool.
/// </summary>
public void Dispose()
{
ReturnArray();
_size = 0;
_version++;
}
This contradicts with corefx
classes Dispose
when you cannot use object after Dispose
.
Real semantic seems to be Dispose
= Reallocate(capacity: 0)
. ReturnPool => Reallocate(capacity: 0)
. I suggest to rename Dispose
to something else.
Or document:
/// <summary>
/// Returns the internal buffers to the ArrayPool and sets capacity to zero. Instance still can be used.
/// </summary>
Describe the bug
PooledDictionary.Remove with Count 0 = Exception
To Reproduce
var dict = new PooledDictionary<string, string>();
dict.Remove("Something");
Expected behavior
return without doing anything.
Is your feature request related to a problem? Please describe.
Although not need for my primary use cases, there are some situations where I'd find it beneficial for a Stack implementation to be indexable and insertable. Unfortunately, most implementations do not support this, therefore I have to use a List and it's Add and RemoveAt and forgo the optimizations done for Pop, Peak, and Push.
Describe the solution you'd like
Add an Indexer and implement Insert(int index, T item) for PooledStack
in my test HashSet is much faster and 0 gc
using System.Collections.Generic;
using Collections.Pooled;
using Gma.DataStructures;
using UnityEngine;
public class Test169 : bs
{
HashSet<int> HashSet = new HashSet<int>(){1,2,3,4,5,6,7,8,9};
PooledSet<int> PooledSet = new PooledSet<int>(){1,2,3,4,5,6,7,8,9};
public void Update()
{
using (bs.Profile("pooled"))
for (int i = 0; i < 1000; i++)
{
PooledSet.Remove(2);
}
using(bs.Profile("list"))
for (int i = 0; i < 1000; i++)
{
HashSet.Remove(2);
}
// Debug.Log(dd);
}
}
/// <summary>
/// Returns the internal buffers to the ArrayPool.
/// </summary>
public void Dispose()
{
ReturnArray();
_size = 0;
_version++;
}
A declarative, efficient, and flexible JavaScript library for building user interfaces.
š Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ššš
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ā¤ļø Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.