Giter Site home page Giter Site logo

Comments (13)

jonhoo avatar jonhoo commented on May 20, 2024 9

I spent some more time thinking about this in the context of #70, and I wonder if the real solution here is to abstract away the core concurrency primitive from evmap, and then just have two implementations that share that primitive. The question then becomes what that looks like.

I think what you'd want is a LeftRight type that is generic over:

  • T, the type we're sharing (i.e., HashMap<K, Values<V>> for evmap as it stands today)
  • O, the oplog type (i.e., Operation in today's evmap)

LeftRight would require that T: Absorb<O>, where:

trait Absorb<O> {
  fn apply_first(&mut self, op: &mut O);
  fn apply_second(&mut self, op: O);
}

This, I think, should let today's Evmap<K, V> just be a wrapper around

LeftRight<HashMap<K, Values<V>>, Operation<K, V>>

with an appropriate impl Absorb block. And then, we can get a single-value implementation without too much issue by wrapping

LeftRight<HashMap<K, V>, SingleOperation<K, V>>

LeftRight itself would provide a lower-level version of ReadHandle and WriteHandle, which is used by the higher level variants that look like the ones we have today. ReadHandle would be little more than a thing that gives access to &T, whereas WriteHandle would mainly have:

impl WriteHandle<O> {
  fn refresh(&mut self);
  fn append_op(&mut self, op: O);
}

Now, I think implementing all of this would be super neat, and I think it would enable a number of neat use cases like super fast vectors, BTreeMaps, and whatnots. It would also probably be interesting to implement. Maybe I'll do it in a live stream one of these days! But if someone beats me to it, that'd be cool too!

from left-right.

comath avatar comath commented on May 20, 2024 2

Here's the lib I used the fork in: https://github.com/comath/grandma

The fork is currently a vestigial library embedded in the code. I haven't had a chance to try the changes you suggested. If they work I'll add them and not fork, but if not I can just fork.

Grandma is currently 3rd near complete rewrite of this library and doesn't yet have a some of the old features that the previous generations had. The first generation used traditional references to build the tree, with RwLocks where needed, and the second used a arena with a hashmap with per bucket locking. The node-cache updates in the old generations absolutely killed the performance. Just querying the tree with this evmap fork is much faster, I haven't tried cache updates yet.

from left-right.

jonhoo avatar jonhoo commented on May 20, 2024 1

I think some of the changes that landed recently in #48 may help with this. You may find the discussion there interesting, as I think your use-case is somewhat similar to that of @MayaWolf :)

from left-right.

jonhoo avatar jonhoo commented on May 20, 2024

Glad to hear it!

Out of curiosity, how much of a difference did the swap from SmallVec<T> to T make? It should be marginal if you only ever have one T per key, since the SmallVec<T; 1> should be able to hold one T without allocating. You definitely cannot safely do in-place updates with either though, since the values are aliased between the maps (unless you are cloning them instead?).

As far as merging goes, I'd prefer finding a way to avoid replicating much of the unsafety. One way to do so might be to provide a thin wrapper type around evmap that has an API that assumes and enforces just one entry per key. And behind the scenes that would always benefit from the one-slot space held by the SmallVec.

from left-right.

comath avatar comath commented on May 20, 2024

I clone them. I couldn't figure out a way without each update cloning the value, modifying the clone, deleting the old entry and finally inserting the update. The savings is mostly in the update code that is simpler and lets me clone only once at creation.

I was thinking of two solutions to multiple largely redundant unsafe blocks. One is to have a back end with all the unsafe that both maps use. This is probably too hard to get right, but if we did it would be nice.
The other is to have the map have 2 values, a metadata value that's cloned and a normal one that's a SmallVec value. The base map would have:

pub struct WriteHandle<K, D, V, M = (), S = RandomState>
where
    K: Eq + Hash + Clone,
    S: BuildHasher + Clone,
    V: Eq + Hash + ShallowCopy,
    D: Clone + Default,
    M: 'static + Clone,
{ ... }

The mono map would then just be a WriteHandle<K, D, ()> and the multi-value map would be WriteHandle<K, (), V>. This metadata-enhanced map would probably be the most helpful over all to me personally, I am implementing a hierarchical clustering system that could use both values extensively.

I hadn't thought of having a the monomap be a single-value enforced multimap. I don't think it'd work with a single value in the SmallVec, though. A pair of values in the SmallVec could work but we'd have to track which one we should read from.

from left-right.

jonhoo avatar jonhoo commented on May 20, 2024

A back end that holds the data and that both maps update would not be safe, since you'd then still have read-write conflicts on it. You could of course always use a lock, but that's equivalent to having your data type be Arc<Mutex<T>>.

Having both a cloned and a non-cloned value seems like a very strange design indeed. Are there use-cases where someone would need both? I'm hesitant to have to juggle two different types of values in an already complex data-structure — it makes it much more likely that there'll be bugs.

Ah, yes, I was proposing just using the SmallVec for a case where you never needed &mut to a value, but wanted a simpler API (i.e., and API that doesn't expect multi-values for each key).

Ultimately, I don't think extending evmap to supporting &mut to the values is going to be reasonable. It is far too easy to shoot yourself in the foot with it. Many of the APIs would have to become unsafe, since, for example, if a user runs an accidentally non-deterministic update function, the two maps would become out of sync, and all sorts of things would break! If you want mutable access, my suggestion would be using Arc<RwLock<T>> as the value type (maybe using parking_lot). That way, you do per-value locking, and you have the ability to modify the value if needed.

from left-right.

comath avatar comath commented on May 20, 2024

My KNN data structure is a tree where the nodes have several fields that hold cached results and other information, and a list of children. Since the list of children is small, between 1 and 20 ints, I just clone it. The clustering is similar, but with a much bigger child list and a much more complicated metadata to enable some topological data analysis.

I think this use case might be specialized for trees, so if you don't want the added complexity I could fork it and call it evtreemap or something.

from left-right.

jonhoo avatar jonhoo commented on May 20, 2024

I wonder -- have you benchmarked just replacing the child list instead of updating it in-place? My guess is that it won't be that much slower.

from left-right.

comath avatar comath commented on May 20, 2024

I have not, I ran into issue updating the metadata early and then made the monomap. The child list isn't updated often, so I don't think it'll change the overall performance much.

from left-right.

jonhoo avatar jonhoo commented on May 20, 2024

Might be worth trying that so that you don't have to maintain your own fork :) We could in theory also provide a handy method for "replace by calling a closure", which sounds like it might work well in your case!

from left-right.

comath avatar comath commented on May 20, 2024

That could greatly simplify the code for most. I can make a PR for that.

I forked my mono-map a while ago. There's been a lot of changes since then :D.

from left-right.

jonhoo avatar jonhoo commented on May 20, 2024

Hehe, I believe it. Many of them should make the map much nicer to work with! From memory, some of them were also correctness fixes.

from left-right.

mattdm avatar mattdm commented on May 20, 2024

FWIW I would also be interested in this, and also a unique-key (like a normal hashmap instead of a bag) version.

from left-right.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.