left-right's Issues

RFC: Hash and Eq requirement for values

Maybe a duplicate of #45 but I figured I'd ask anyway.

I'm trying to use evmap as the base store for an MQTT broker. For simple subscription matches I'm keying off the topic, and my initial attempt uses a struct containing a token+futures::channel::mpsc::Sender as the value item in order to route messages. The evmap crate seems to be a good fit for this application: the most common case is using the collection to look up subscribers for a topic, and then pushing publish messages into their queues. I want this to be lock-free and fast. Occasionally I have to modify the subscriptions list when subscribers come and go, and I don't want to have to block all publishers to do so. And having the value be a collection is actually useful, because each topic can have multiple routing destinations. Storing a list of these items is something I'd have to do anyway. But using hashbag internally creates some challenges:

  1. futures::channel::mpsc::Sender doesn't impl Eq or Hash, and in fact most senders don't even implement PartialEq let alone Hash (futures_channel, crossbeam, flume). I used the derivative crate to try out an implementation where senders to the same receiver were equivalent, but...
  2. A lot of sender implementations require &mut to send. For reasons that are obvious in retrospect (extensive use of interior aliasing) evmap isn't set up to allow interior mutability of values. I can work with unbounded channels in a prototype because I don't need &mut, but this isn't a great approach in a production application -- I'm trying to bound memory usage and balance performance for clients that can publish lots of messages in a short time, and using an unbounded channel could cause resource starvation or long tails elsewhere.

I can actually work around the second issue with flume, because the sender doesn't need to be mutable. But flume doesn't have a way to determine whether two senders point at the same receiver, and I'm not sure how to sanely implement Hash for it either.
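
For illustration, the token scheme mentioned above looks roughly like this (a hedged sketch -- Subscriber and the token are my own invention, nothing evmap provides): Hash and Eq key off the token alone, so the sender type needs no such impls.

use std::hash::{Hash, Hasher};

// Hypothetical subscriber: equality and hashing use only the token,
// which is assigned when the subscription is created.
struct Subscriber {
    token: u64,
    tx: flume::Sender<Vec<u8>>,
}

impl PartialEq for Subscriber {
    fn eq(&self, other: &Self) -> bool {
        self.token == other.token
    }
}
impl Eq for Subscriber {}

impl Hash for Subscriber {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.token.hash(state);
    }
}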

So I guess I'm trying to figure out whether it makes any sense at all to attempt a feature patch/PR that removes the Eq and Hash requirement by allowing a different container, perhaps a plain linear sequence like Vec. Are there other reasons for using hashbag that I'm not seeing?

concurrent-map-bench does not compile

Building concurrent-map-bench in benchmark results in:

$ cargo build --release
...
error[E0599]: no method named `get_and` found for reference `&evmap::ReadHandle<u64, u64>` in the current scope
   --> src/main.rs:228:15
    |
228 |             r.get_and(&key, |v| v[0]).unwrap_or(0)
    |               ^^^^^^^ method not found in `&evmap::ReadHandle<u64, u64>`

I think this is caused by the API change in a55390c.
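
If so, the benchmark presumably just needs porting to the newer API. A sketch of what the fix might look like, assuming evmap's get_one (which returns a guard instead of taking a closure):

// Sketch of the ported lookup; get_one is assumed from the newer API.
fn lookup(r: &evmap::ReadHandle<u64, u64>, key: u64) -> u64 {
    // before: r.get_and(&key, |v| v[0]).unwrap_or(0)
    r.get_one(&key).map(|guard| *guard).unwrap_or(0)
}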


When I build the crate with cargo, I see the evmap dependency updated in Cargo.lock (9.0 vs. the committed 7.0). I think this is expected for a local path dependency, but please let me know if I'm doing something unexpected (I'm new to Rust).

If you can confirm that this is an issue, I'd be happy to create a PR with a fix.

More than two internal maps?

This isn't really an issue, more of a question. Have you considered using more than 2 internal maps in a "ring"? The benefit would be less waiting on readers, but the downsides are more memory use (but memory is cheap!) and more complicated logic to apply changes to each map in the ring.

Avoid contention on reference count for readers

The current implementation has an Arc behind the atomic pointer to the currently active map. This is used by the writer to determine when there are no more readers present for the previous map when doing a refresh(). Unfortunately, this also leads to readers contending on the cache-line of that Arc, even when there are no writes.

evmap should move away from the Arc-based scheme to a solution that does not incur this cost (nor the extra pointer indirection that Arc introduces). We can do that by shifting more work onto the writer. The gist of the idea is that every reader will have a local epoch number, which they increment when they read from the map. After swapping, the writer will wait until it has seen the epoch number change for every reader, which ensures that every reader has moved on to the new map.

Specifically, WriteHandle and ReadHandle will both hold an Arc<Mutex<Vec<Arc<AtomicUsize>>>>. Whenever a ReadHandle is cloned, a new AtomicUsize will be allocated, and will be added to the Vec by taking the Mutex. This may seem as though it adds a lot of overhead, but notice that a) the outer Arc is only ever updated when a new Reader is created; b) the Mutex is only taken by readers when one is cloned — in the common case, only the writer is accessing it; and c) the Arc wrapping the AtomicUsize will always carry a reference count of 2, and will only be modified when a reader arrives or leaves (i.e., during creation and destruction). Note also that only the writer needs to use strong consistency when accessing the AtomicUsize; the readers can update it with Ordering::Relaxed.

It is worth noting that the scheme as described thus far encounters problems when a reader does a single read, and then never reads again. The writer will never observe its epoch number change, and thus will wait forever after doing a swap. To combat this, the per-reader AtomicUsize also includes a bit to indicate that the reader has finished its read. The writer can now ignore any reader epochs that are marked as "finished", since it knows they cannot be accessing the map through the old pointer.
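
To make the bookkeeping concrete, here is a minimal sketch of both sides (my own encoding, not necessarily what will be implemented: the low bit of the epoch doubles as the "finished" flag, so an odd value means a read is in progress).

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Shared registry of per-reader epoch counters.
type Epochs = Arc<Mutex<Vec<Arc<AtomicUsize>>>>;

// Reader side: bump to odd on entry (reading), to even on exit (finished).
fn enter(epoch: &AtomicUsize) {
    epoch.fetch_add(1, Ordering::Relaxed);
}
fn exit(epoch: &AtomicUsize) {
    epoch.fetch_add(1, Ordering::Relaxed);
}

// Writer side, after swapping the active-map pointer: each reader must
// either be finished (even epoch) or have moved past the value recorded
// at swap time, at which point it can only be seeing the new map.
fn wait_for_readers(epochs: &Epochs, seen_at_swap: &[usize]) {
    for (e, &seen) in epochs.lock().unwrap().iter().zip(seen_at_swap) {
        loop {
            let now = e.load(Ordering::Acquire);
            if now % 2 == 0 || now != seen {
                break;
            }
            std::thread::yield_now();
        }
    }
}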

This should significantly improve the read performance in situations where there are many fast readers, especially when working with many cores or on NUMA architectures. Implementation of this will likely not happen for a little while, as more pressing tasks are awaiting elsewhere, but at least we now have a plan.

Interest in a single value map?

Thanks for this great lib! This cleared up a significant bottleneck for me.

I have an eventually consistent, editable KNN data structure that I'm building on a simplification/modification of this library. I changed the value of the hashmaps from a SmallVec<T> to just a T and changed the operations to just insert, update, and remove. My objects are relatively small, but complex. Doing small updates in place is more critical than the memory savings. You can find the modification here: https://github.com/comath/rust-evmap/tree/master/src/monomap

Is there any interest in a PR to bring this modification into this lib? There's a similar issue here.

Possible UB from Box<T> aliasing

While watching yesterday's live stream, Box shallow copying was mentioned, and a couple of viewers and I asked about aliasing.

I asked about it in the UCG Zulip stream.

Quoting RalfJ:

RalfJ: it's more that Box is an exclusive ptr and even reads through aliases are disallowed when a ptr is exclusive

RalfJ: (this grants extra optimization power because it means nobody else can even observe the current value stored behind the ptr)

Example which triggers UB according to Miri (as provided by bjorn3 in the Zulip topic).

Even just making a shared reference, instead of actually reading the Copy value, also triggers UB according to Miri.

Changing it to make the shallow copy the same way as the Box<T> shallow copy impl seems to change nothing: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6e7b08eb28d6032fbdb41028225f4dde
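
Since the playground contents aren't reproduced above, here is a minimal reconstruction of the kind of program Miri flags (my own example, not bjorn3's exact one):

fn main() {
    let b: Box<i32> = Box::new(42);
    // Bitwise copy of the Box itself, as a ShallowCopy impl would make.
    let alias: Box<i32> = unsafe { std::ptr::read(&b) };
    // Reading through both copies: Miri reports UB here, because each Box
    // claims exclusive access to the allocation.
    let x = *alias;
    let y = *b;
    println!("{} {}", x, y);
    std::mem::forget(alias); // avoid the double-free; the UB is above
}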

Switch oplog to VecDeque

Given that the oplog is always added to at the back and drained from the front, it seems like Vec could be switched to a VecDeque. This should improve performance of writes. On my machine it improved the write ops/s for the benchmarks by 20%.
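
Illustratively (Op is just a stand-in for the real oplog entry type):

use std::collections::VecDeque;

enum Op { Insert(u64), Remove(u64) }

fn main() {
    // A Vec must shift every remaining element when drained from the
    // front; a VecDeque's ring buffer makes push_back and a front-drain
    // both O(1) amortized.
    let mut oplog: VecDeque<Op> = VecDeque::new();
    oplog.push_back(Op::Insert(1)); // writes append at the back
    oplog.push_back(Op::Remove(1));
    for _op in oplog.drain(..) {
        // ... apply the operation to the other map ...
    }
}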

Reader not seeing writes even after refresh?

I was implementing my own LRU cache on top of evmap and was having problems, so I re-implemented the test with just evmap. I think I may have found a bug: this test usually fails with len() being 0; occasionally it passes.

    use std::sync::{Arc, Barrier};

    #[test]
    fn ev_multi_threaded() {
        let (read_handle, mut write_handle) = evmap::new();
        let reader_a = read_handle.clone();
        let reader_b = read_handle.clone();
        let reader_c = read_handle.clone();
        let barrier = Arc::new(Barrier::new(4));
        let ba = barrier.clone();
        let bb = barrier.clone();
        let bc = barrier.clone();
        // The write-side thread (the one that does inserts)
        std::thread::spawn(move || {
            for i in 1..=10 {
                write_handle.insert(i, i.to_string());
                write_handle.refresh();
            }
            dbg!("A");
            ba.wait();
        });
        // Two "read" sides
        std::thread::spawn(move || {
            while let None = reader_a.get_and(&6, |v| v[0].clone()) {}
            while let None = reader_a.get_and(&8, |v| v[0].clone()) {}
            dbg!("B");
            bb.wait();
        });
        std::thread::spawn(move || {
            while let None = reader_b.get_and(&7, |v| v[0].clone()) {}
            while let None = reader_b.get_and(&9, |v| v[0].clone()) {}
            dbg!("C");
            bc.wait();
        });
        barrier.wait();
        dbg!("Done");
        assert_eq!(reader_c.len(), 10);
    }

Am I misusing something here? I might be doing dumb things with Barrier. From what I can tell this should pass consistently. You can even put a long sleep in before the call to len() and it'll still fail.

Thanks for your work on this library! Hopefully this helps and isn't a fluke.

#[derive(ShallowCopy)] has prohibitive "T: ShallowCopy" bound

Currently, if you wish to use evmap_derive::ShallowCopy for something like:

pub struct ShallowArc<T>(Arc<T>); // or more complicated structures

The generated trait implementation has a T: ShallowCopy bound, which is unnecessary -- the ShallowCopy impl for Arc<T> doesn't have such a bound, and if the code were just generated without the bound it should compile without issues. Here's the open-coded version I used (though I will admit this may be entirely wrong -- though the fact that Arc<T> is safe leads me to think that this must also be safe):

impl<T> ShallowCopy for ShallowArc<T> {
    // SAFETY: This is safe because the evmap crate's ShallowCopy implementation
    // is safe for Arc.
    unsafe fn shallow_copy(&self) -> ManuallyDrop<Self> {
        ManuallyDrop::new(Self(ManuallyDrop::into_inner(self.0.shallow_copy())))
    }
}

The same applies for Box, Rc, and probably more examples.
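
One possible fix is for the derive to bound the field types instead of the type parameters (a sketch of hypothetical derive output, continuing the ShallowArc example above):

impl<T> ShallowCopy for ShallowArc<T>
where
    Arc<T>: ShallowCopy, // bound on the field type, not on T itself
{
    unsafe fn shallow_copy(&self) -> ManuallyDrop<Self> {
        ManuallyDrop::new(Self(ManuallyDrop::into_inner(self.0.shallow_copy())))
    }
}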

For background, the reason why you'd want to have struct ShallowArc<T>(Arc<T>) is because Arc<T> implements deep equality and hashing, but for my use-case I strictly care about inner-pointer equality (Arc::ptr_eq). Also my T contains RwLock and other structures which cannot be compared or hashed. So I need to wrap Arc<T> -- and ShallowArc is being used only in a single-value evmap anyway.

rollback

When looking at how the refresh mechanism works on stream, it occurred to me that a (single-level) rollback mechanism would be possible to implement, by:

  • pointing readers to the stale map,
  • dropping the current read map,
  • cloning the stale map, as the new write map,
  • abandoning the oplog,

I think the first/second drop seems tricky though, because you would need to perform the second drop for the items in the oplog (i.e. you want to actually drop the values which are being rolled back but don't exist in the stale map, while not performing the second drop on items that are in the stale map).

There was a bit of musing on uses for non-collection evmaps and what they are useful for.
I think the addition of rollback could make them more obviously useful.

This is not really something I need, but it seemed not impossible to implement.

Recreating WriteHandle after it's dropped

I'm considering evmap for a database; however, I have a requirement that it doesn't (yet) fulfil. I'm using the actor model, and a single actor is responsible for writing to the map, with many other actors reading from it -- a good fit so far. However, if the writing actor crashes it is automatically restarted, and then I need a way to get another WriteHandle to the same evmap. Currently this isn't possible.

Would you consider adding a WriteHandleFactory type (similar to ReadHandleFactory), with a method handle(&mut self) -> Option<WriteHandle> that ensures at most a single WriteHandle exists at a given time?
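
In the meantime, one workaround (a hedged sketch, not a proposed API) is to own the WriteHandle outside the actor in an Arc<Mutex<..>>, so a restarted actor can re-acquire it:

use std::sync::{Arc, Mutex};

fn main() {
    let (_r, w) = evmap::new::<String, String>();
    // Owned outside the actor, so it survives the actor crashing.
    let shared_w = Arc::new(Mutex::new(w));

    let for_actor = Arc::clone(&shared_w);
    let actor = std::thread::spawn(move || {
        let mut w = for_actor.lock().unwrap();
        w.insert("key".into(), "value".into());
        w.refresh();
        // If this thread panics, the Mutex is poisoned, but the handle
        // itself survives and a restarted actor can recover it.
    });
    let _ = actor.join();
}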

Test with loom

@carllerche recently published loom which does correctness checking for concurrent algorithms. It'd be neat to try to get that working for evmap's tests!

Thread Safety Error on ReadHandler

I have created a TCP server which adds a random number to an evmap and sends back all the values for a given test key.

I am facing a "std::cell::Cell<()> cannot be shared between threads safely" error if I spawn a thread for each HTTP request.

evmap is a concurrent data structure, so I'd expect to be able to do CRUD operations on it concurrently from multiple threads.

Below is the exact code.

Please point out anything I am doing wrong, or any change that would let me read/update the evmap from multiple threads.

use std::sync::Mutex;
use rand::Rng;
use std::net::{TcpListener};
use std::io::{Write};
use std::thread;

fn main() {
    let listener = TcpListener::bind("127.0.0.1:8080").unwrap();
    println!("Listening for connections on port {}", 8080);

    let (read_handler, write_handler) = evmap::new();
    let write_lock = Mutex::new(write_handler);

    for stream in listener.incoming() {
        match stream {
            Ok(mut stream) => {
                thread::spawn(|| {
                    let mut rand_generator = rand::thread_rng();
                    let mut lock = write_lock.lock().unwrap();
                    let random_number = rand_generator.gen_range(0, 100);
                    lock.insert("1".to_string(), random_number);
                    lock.refresh();

                    let mut response = String::new();
                    if let Some(values) = read_handler.clone().get("1") {
                        for value in values.iter() {
                            if response.len() > 0 {
                                response.push_str("~");
                            }
                            response.push_str(format!("{}", *value).as_str());
                        }
                    }

                    let response = format!("HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n<html><body>{}</body></html>\r\n", response);
                    stream.write(response.as_bytes()).unwrap();
                });
            }
            Err(e) => {
                println!("Unable to connect: {}", e);
            }
        }
    }
}
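
One way to make this compile (a hedged sketch, assuming evmap's ReadHandleFactory): wrap the writer in Arc<Mutex<..>> so it can be moved into threads, and give every thread its own ReadHandle via a factory, since ReadHandle itself is deliberately !Sync:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let (read_handler, write_handler) = evmap::new::<String, i32>();
    let factory = read_handler.factory(); // the factory is Send + Sync
    let write_lock = Arc::new(Mutex::new(write_handler));

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let factory = factory.clone();
            let write_lock = Arc::clone(&write_lock);
            thread::spawn(move || {
                let reader = factory.handle(); // a per-thread ReadHandle
                {
                    let mut w = write_lock.lock().unwrap();
                    w.insert("1".to_string(), i);
                    w.refresh();
                }
                if let Some(values) = reader.get(&"1".to_string()) {
                    println!("{} values", values.len());
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}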

Create evmap in one thread and send read handle to another...

Hi! I'm digging into evmap more and more, and now I'm stuck on this issue. Here is the design:

+---------------------------------------+--------------------------------+
|  Arc<HashMap<String, ReaderHandler>>  |  Future Receiver Thread (FRT)  |
+---------------------------------------+--------------------------------+
|  Future Sender Thread (FST)           |
+---------------------------------------+

I want to create the evmap inside FST and then transfer the ReadHandle to FRT, where it gets assigned to a new HashMap record. The process fails due to the reader's !Sync marker and the Arc<> wrapper, which requires Sync. But I do not want to share the reader across threads; I just want to transfer it (Send) and store it in one top-level variable.

Here is the issue :)
P.S.: I will appreciate any tips...
P.P.S.: Due to my constraints I can't set up the evmap statically at the top.
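
A hedged sketch of one way around this, assuming ReadHandle::factory (the ReadHandleFactory it returns is Send + Sync): ship a factory across the channel and materialize handles on the receiving side.

use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // "FST": creates the evmap and ships a factory to the other thread.
    thread::spawn(move || {
        let (r, mut w) = evmap::new::<String, String>();
        w.insert("topic".into(), "payload".into());
        w.refresh();
        tx.send(r.factory()).unwrap();
        // ... keep writing with `w` ...
    });

    // "FRT": stores factories (Send + Sync) in the shared map and creates
    // a local ReadHandle wherever one is actually used.
    let mut routes: HashMap<String, evmap::ReadHandleFactory<String, String>> =
        HashMap::new();
    routes.insert("conn-1".into(), rx.recv().unwrap());
    let handle = routes["conn-1"].handle();
    assert!(handle.get(&"topic".to_string()).is_some());
}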

ShallowCopy for bytes::Bytes

Seems like bytes::Bytes cannot be used as a value type, which is unfortunate, as it should be more or less among the most favored types: it is made for fast data transfer, a contiguous memory footprint, and such.

Cloning a Bytes is shallow by design, so the following compiler error might be avoidable:

error[E0277]: the trait bound bytes::Bytes: evmap::ShallowCopy is not satisfied
--> src/foo.rs:143:5
|
143 | w:WriteHandle<[u8; 32], Bytes, (), BuildNoHashHasher>,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait evmap::ShallowCopy is not implemented for bytes::Bytes
|
= note: required by evmap::WriteHandle

error: aborting due to previous error
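
Until an impl lands upstream, a newtype can carry it (a hedged sketch; it leans on the documented ShallowCopy contract that at most one of the two copies is ever dropped):

use bytes::Bytes;
use evmap::ShallowCopy;
use std::mem::ManuallyDrop;

#[derive(PartialEq, Eq, Hash)]
struct ShallowBytes(Bytes);

impl ShallowCopy for ShallowBytes {
    unsafe fn shallow_copy(&self) -> ManuallyDrop<Self> {
        // A bitwise copy of the handle; both copies alias the same
        // buffer, which is fine because evmap never drops both.
        ManuallyDrop::new(std::ptr::read(self))
    }
}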

Replace Clone bound with Freeze

There is currently an RFC being considered for the Freeze trait. I believe that targeting this trait could simplify evmap's implementation (likely removing ShallowCopy) and expand the types it accepts (it's an auto trait for all structs not directly containing UnsafeCell). The current belief is that there are only niche use cases for this trait, not worth adding to the standard library, but it would be great to hear whether a well-known crate like evmap could benefit from its addition.

Global Evmap

Hi!

I'd like to use a map in actix-web.
What's the best current method for using an evmap as a global?

Thanks!
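
One common pattern (a hedged sketch, assuming the once_cell crate): put a Mutex<WriteHandle> plus a ReadHandleFactory in a static, and create a ReadHandle per thread or per request.

use once_cell::sync::Lazy;
use std::sync::Mutex;

static MAP: Lazy<(
    evmap::ReadHandleFactory<String, String>,
    Mutex<evmap::WriteHandle<String, String>>,
)> = Lazy::new(|| {
    let (r, w) = evmap::new();
    (r.factory(), Mutex::new(w))
});

fn main() {
    {
        let mut w = MAP.1.lock().unwrap();
        w.insert("k".into(), "v".into());
        w.refresh();
    }
    let reader = MAP.0.handle(); // one handle per thread/request
    assert!(reader.get(&"k".to_string()).is_some());
}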

Make evmap generic over underlying map

It would be really great if evmap could be generic over the underlying map implementation. This may not be possible, but it would let us support use-cases like this one by supporting IndexMap as one of the map types. Given that we only support a subset of the operations on maps anyway, maybe it's doable?

Support multiple hashtable backends

Branching off of https://github.com/Amanieu/hashbrown/issues/43

I'm not sure what the best way to do this is, so this issue is inviting ideas.

A simple start would be a trait such as:

pub trait Backend<K, V>: Extend<(K, V)>
where 
    K: Hash + Eq,
    V: Eq + ShallowCopy,
{
    fn apply_first(&mut self, ops: ...);
    fn apply_second(&mut self, ops: ...);

    unsafe fn clear_nodrop(&mut self);
}

impl<K, V, S> Backend<K, V> for hashbrown::HashMap<K, V, S>
where
    K: Hash + Eq,
    V: Eq + ShallowCopy,
    S: BuildHasher,
{ /*...*/ }

// also for std::collections::HashMap
// also for rahashmap::HashMap
// etc.

Perhaps the S hash builder parameter could be replaced with the above, so that:

pub struct WriteHandle<K, V, M = (), S = RandomState>
where
    K: Eq + Hash + Clone,
    S: BuildHasher + Clone,
    V: Eq + ShallowCopy,
    M: 'static + Clone,
{ /*...*/ }

becomes:

pub struct WriteHandle<K, V, M = (), B = hashbrown::HashMap<K, V, RandomState>>
where
    K: Eq + Hash + Clone,
    V: Eq + ShallowCopy,
    M: 'static + Clone,
    B: Backend<K, V>
{ /*...*/ }

This would allow us to leverage the particular features and algorithms for different libraries.

If necessary, two traits could be used to hide the real implementation:

mod internal {
    pub trait RealTrait {
        fn real_operation_we_dont_want_seen();
    }
}

pub trait PublicTrait: internal::RealTrait {}

impl<T> PublicTrait for T where T: internal::RealTrait {}

pub struct Structure<B>
where
    B: PublicTrait
{ /*...*/ }

Method for clearing map

There is currently no way of clearing the entire map; you would have to iterate over all the keys and remove each one (#18). Adding a .clear() operation should be pretty straightforward. It also has the nice property that it clears the operation log when you add it!

Relaxation of epoch invariant

During the last stream the epoch invariant came up: specifically, that epochs are currently strictly increasing, and the writer waits for a strictly greater value.
Someone (not sure who) mentioned that it could be relaxed to wrapping arithmetic with a not-equal check.

This is an argument for why the above relaxation is correct:

  • std::u{N}::MAX == 2 ^ N - 1 is odd,
  • std::u{N}::MAX + 1 == 0 is even,

Therefore when epoch wrapping occurs a reader is exiting a leftright, and the meaning ascribed to even and odd shouldn't flip.

I'm not sure of the cycle counts for atomic instructions on variously sized types, but perhaps this would allow a faster atomic type to be chosen (whatever is fast, while still keeping the probability low that the writer reads the same value twice as a reader wraps around).

Though it would probably take a really short enter/read/exit cycle to actually matter, it's a potential micro-optimization that removes a theoretical panic, so I believe it's worthwhile.
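
A tiny illustration of the relaxed check (plain std, u8 for brevity):

fn main() {
    let seen: u8 = u8::MAX;         // reader mid-read at epoch 255 (odd)
    let now = seen.wrapping_add(1); // reader exits; the epoch wraps to 0 (even)
    assert!(now != seen);           // "changed" is still detectable...
    assert!(!(now > seen));         // ...where a greater-than check fails
    assert!(now % 2 == 0);          // and the even/odd meaning survives the wrap
}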

Possible to shallowcopy a HashMap

Thank you for this work. We have recently been using evmap as the caching module in our DB system, but we've found a case that needs a HashMap as the value. However, according to the trait bound, the HashMap needs to be shallow-copyable, which is not yet implemented in the codebase. Is it possible to make it so? Thank you.
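
In the meantime, one workaround (a hedged sketch for versions whose value bound is Eq + ShallowCopy; newer versions that store values in a hashbag also require Hash, which HashMap does not implement): evmap does implement ShallowCopy for Box<T>, so the HashMap can be boxed.

use std::collections::HashMap;

fn main() {
    let (r, mut w) = evmap::new::<String, Box<HashMap<String, u64>>>();
    let mut inner = HashMap::new();
    inner.insert("field".to_string(), 1);
    w.insert("key".into(), Box::new(inner));
    w.refresh();
    assert!(r.get(&"key".to_string()).is_some());
}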

Seems key is inserted before refresh sometimes

I'm using evmap in a high-concurrency situation. I use contains_key to check whether the evmap already contains the data and read it; if not, I insert something and call refresh. But sometimes it seems that a contains_key check can happen between adding the key and inserting the value during a refresh. This makes contains_key return true while get_and fails with None.

(Yeah the issue title is not quite accurate)

data races due to incorrect `Send + Sync` bounds

evmap lets you trivially ignore Send and Sync requirements on types, which allows for horrors such as:

use std::cell::Cell;
use std::hash::Hash;
use std::sync::{Arc, Barrier};

pub struct Foo(Cell<i32>);

impl Hash for Foo {
    fn hash<H: std::hash::Hasher>(&self, hasher: &mut H) {}
}

impl Eq for Foo {}
impl PartialEq for Foo {
    fn eq(&self, _: &Self) -> bool {
        true
    }
}

fn main() {
    let (x, mut y) = evmap::new();
    let barrier = Arc::new(Barrier::new(2));

    let foo = std::rc::Rc::new(Foo(Cell::new(0)));
    y.insert((), foo.clone());
    y.refresh();

    let j = std::thread::spawn({
        let barrier = barrier.clone();
        move || {
            let read = x.get(&()).unwrap();
            let value = read.get_one().unwrap();
            let x = std::rc::Rc::clone(value);

            barrier.wait();
            for _ in 0..1000 {
                x.0.set(x.0.get() + 1);
            }
        }
    });

    barrier.wait();
    for _ in 0..1000 {
        foo.0.set(foo.0.get() + 1);
    }

    j.join().unwrap();

    println!("{}", foo.0.get());
}

This compiles and runs, and is very much UB with data races. On my machine it has given the "correct" result of 2000, but also 1257, 1095, and 1068, which is further evidence of UB.

This bug is in evmap version 10+ and the current master branch (commit hash: d4c12b8758c7e35145c40856f3d3474befe6ecda).

A potential alternative to epochs, if it works

When I was thinking about how epochs work and some ways to “improve” the implementation, I came up with a new idea. I have no idea if it will work as well as epochs or if it is correct. I was hoping for some feedback on the idea, any error cases I overlooked, or other improvements. After the holiday week, when I have free time, I hope to try implementing it (if there are no huge flaws).

When I tried explaining the epoch system to someone, they replied with “why not use an Arc”. There are a few reasons why Arcs do not work, but it got me thinking: is there a lock-free way where each reader does not need to keep personal state, and where readers can tell the writer when they are done with the "map"?

Seeing as I have not built it and I'm asking you to double-check me, I'm not sure. But I hope the idea is close or will spark more ideas. I will try to consolidate the idea, but seeing as this is the first time I'm writing it all out in one place, it may get a bit messy.

This is the basic idea of what the struct (I have not named it yet) would look like:

struct Buckets {
    active_ptr: AtomicPtr<Map>,  // the active "map" (Map is a placeholder)
    active_bucket: AtomicU8,     // which bucket is "active"
    buckets: [AtomicUsize; 2],   // reader counts: "active" and "next"
}

Basically the idea revolves around making a “clean” bucket and then waiting for the “dirty” bucket(s) to empty. We can guarantee that readers in the clean bucket hold only the active pointer, but we are unsure about the dirty bucket(s): readers there could hold the old pointer (old bucket, old pointer) or the new pointer (old bucket, new pointer).

How it works:

Reader

  1. Read/save the active_bucket as local saved_active_bucket
  2. Show readers intention to read "Buckets"[saved_active_bucket]++ (check to see if we are at max)
    2 (Error case). If active_bucket changed, we could be in either the wrong or the correct bucket (if multiple changes occurred). Either way the pointer is correct, and if we are in the wrong bucket the writer is waiting on us to finish.
  3. Read the pointer
  4. Do what you need to …
  5. Show you are done with the associated bucket "Buckets"[saved_active_bucket]-- (check to see if we are at 0. If so “call” the writer)

Writer

  1. Swap active_ptr.
  2. Make sure the “next” bucket has no readers (make sure no-one snuck in before we swapped, or is left over from a previous swap).
    2 (Error case). Wait for a "call" from readers emptying maps. We can proceed when the “next” bucket hits 0.
  3. Change active_bucket to “next”.
  4. Check whether any non-active bucket is at 0; if so, mark it. (We don’t need to check later if they are at 0 now. This extends to multiple buckets.)
  5. If the old bucket(s) were not at 0 when we checked, wait for a 0 “call” on the non-active bucket(s). (Once a bucket hits 0 we can swap, even if it counts up afterwards.)
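
Putting the reader and writer steps together, a minimal sketch (my own rendering; the “call” mechanism is replaced by spinning for brevity, and the memory orderings are illustrative):

use std::sync::atomic::{AtomicPtr, AtomicU8, AtomicUsize, Ordering};

struct LeftRightBuckets<T> {
    active_ptr: AtomicPtr<T>,
    active_bucket: AtomicU8,
    buckets: [AtomicUsize; 2],
}

impl<T> LeftRightBuckets<T> {
    fn read<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        // Steps 1-2: announce intent in whichever bucket was active.
        let b = self.active_bucket.load(Ordering::Acquire) as usize;
        self.buckets[b].fetch_add(1, Ordering::AcqRel);
        // Steps 3-4: even if the bucket changed under us, the pointer is
        // valid; the writer drains our bucket before reusing the old map.
        let r = f(unsafe { &*self.active_ptr.load(Ordering::Acquire) });
        // Step 5: leave the same bucket we entered.
        self.buckets[b].fetch_sub(1, Ordering::AcqRel);
        r
    }

    fn publish(&self, new: *mut T) {
        // Step 1: swap in the new map.
        let old = self.active_ptr.swap(new, Ordering::AcqRel);
        // Step 2: make sure the "next" bucket is clean.
        let next = 1 - self.active_bucket.load(Ordering::Acquire) as usize;
        while self.buckets[next].load(Ordering::Acquire) != 0 {
            std::thread::yield_now();
        }
        // Step 3: make the clean bucket the active one.
        self.active_bucket.store(next as u8, Ordering::Release);
        // Steps 4-5: wait for the old bucket to drain before reusing `old`.
        while self.buckets[1 - next].load(Ordering::Acquire) != 0 {
            std::thread::yield_now();
        }
        let _ = old; // the writer may now reuse or drop the old map
    }
}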

Error cases

  • I cannot think of any more than what I have already listed.

Clarifying

  • When I say “call” there needs to be a thread-safe way for each bucket (reader) to tell a writer it is finished. This will most likely need to be a condvar or a single MPSC channel.
  • When I say “Buckets” it can be a tuple, array, or anything that we can index. I am sticking to 2 buckets for now, but I believe it can be expanded to more.

More ideas

  • I think there is a way to add another writer by adding more buckets, but let's work on the one-writer case for now.
  • For a reader, if we know we want to do another read there are “potential” optimizations we can make here by comparing our saved_active_bucket (or the pointer) to the active_bucket (or pointer). If it is the same, no writer has attempted to swap … that we know of. I have no idea if this will help but it's an idea.
  • The same idea as above can be used to check if our reading is “outdated”(A new map has been published during our read). It will never be perfect but its one thing we can do to get closer.
  • To possibly reduce contention we can make more buckets or spread out each atomic so it is not on the same cache line … maybe?
  • Can we make the pointer and active_bucket one large atomic? Would that benefit us at all? I don't think so because we would need to check that we did not overwrite the pointer when we increment.

Down sides

  • Readers have to do more work: at least one more read, plus whatever work is needed to “call” the writer.
  • More contention on the shared counters?
  • Writer may need to wait longer as it needs to clear 2+ buckets
  • We will have to wait for the next bucket to empty. This could take longer than just checking every reader. (Some of this could still happen with epochs if we move to a u8, but they would need to perform a lot of reads.) I feel this could be mitigated if we add more buckets.

Up sides?

  • Readers can have zero size: they can just walk up and read, with no need to register as a reader.
  • The writer does not need to know about readers directly.
  • The writer can “sleep” and wait to be told about changes.
  • It is fixed size, so there is no need for new allocations (disregarding the need to expand the “map”).

Things I don’t know

  • I have no idea if the trade off is worth anything for this project. I need to run benchmarks after it has been proven and implemented.
  • I am not great with atomics so I may have overlooked something here.

Final
There may be a few issues with it, as I have already made minor changes to make it work since I originally came up with it. If this already exists under a different name, let me know. Ask questions and try to pick it apart. Also, if it was not already “invented”, suggest names.

Efficiently Keeping value sets under a certain size?

This definitely relates to #29

So, all that understood, I'm wondering if there's a particular trick for this use case that might stay efficient (namely, I want my value sets/vecs to always be N or fewer elements, a sort of LRU cache). Intuitively it seems this could be implemented in terms of retain and keeping a tiny bit of extra metadata on each item.

Alternatively, I had started implementing this with a get_and, a copy, and then an update with the new values, only to realize that update doesn't swap out the entire value set, but rather replaces it with a one-value set. Adding a replace method that allows you to provide an entire set/vec as the second parameter would be desirable. I think the update method is a bit unintuitive in that respect.

Really like the library! Thanks for your work!

Sorting underlying values vector?

Guys, is it possible to sort the values vector without replacing the whole array?
I mean, can I just call vector.sort() or something like that to persist the new value order?

Float as value

Hi,
I'm trying to use evmap with float values, like this:

let (read_handler, mut write_handler ) = evmap::new();
write_handler.insert("my key", 0.12 as f32);

However, I've got two errors:

  • the trait std::cmp::Eq is not implemented for f32
  • the trait evmap::shallow_copy::ShallowCopy is not implemented for f32

I'm guessing the main reason is that f32 does not implement Eq (and that's why evmap did not implement ShallowCopy for it), but I still think this is a valid use case.

I am also not sure how to implement ShallowCopy for my own structure. Putting it in a Box solves the problem, but I don't see why I would need a Box. I think it would be useful to put an example in the documentation for something like this:

#[derive(PartialEq, Eq, Hash)]
struct MyStruct(i32);
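
For what it's worth, a hedged sketch of both pieces (the ShallowCopy signature matches the one used elsewhere on this page; get_one is an assumption about the current API). For the float case, one option is to store the bit pattern, which does implement Eq and Hash:

use evmap::ShallowCopy;
use std::mem::ManuallyDrop;

#[derive(PartialEq, Eq, Hash, Clone, Copy)]
struct F32Bits(u32);

impl ShallowCopy for F32Bits {
    unsafe fn shallow_copy(&self) -> ManuallyDrop<Self> {
        ManuallyDrop::new(*self) // a Copy type: a plain copy suffices
    }
}

fn main() {
    let (r, mut w) = evmap::new::<String, F32Bits>();
    w.insert("my key".into(), F32Bits(0.12f32.to_bits()));
    w.refresh();
    let v = r.get_one(&"my key".to_string()).map(|g| f32::from_bits(g.0));
    assert_eq!(v, Some(0.12));
}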

Thank you for your help.

possible to include examples of using evmap across threads?

I'm working through which data structures to use in a project and came across this. It looks promising; however, I'm fairly confused about using it across threads. Eventually I came across the benchmark code and saw that for multiple writers you wrap the writer handle in Arc<Mutex<_>>. If you need an Arc<Mutex<_>> for writing, is there still a performance gain? Is that because the reader handle can still be used? These things are not obvious to someone who is trying to understand a concurrent hashmap for the first time! The examples in the docs are good for how to insert/modify/access but don't address how to use it in its primary use case. (Not said in criticism, just to provide a helpful perspective from someone who is looking at it for the first time. Thanks for making the library available!)

Blanket implementation of ShallowCopy for Copy types

Hi!

First off, thank you for this great library! It's perfect for the application I'm developing.
One thing that's been making it a little awkward in use is the requirement for ShallowCopy. As far as I can see, it's really intended to be a bit of an extended copy - anything that can be copied should be, with special cases for certain types that could normally only implement Clone - correct?
If so, I think the library could benefit (both in use and in internal code) from a blanket implementation for Copy, like so:

impl<T> ShallowCopy for T where T: Copy {
    unsafe fn shallow_copy(&self) -> ManuallyDrop<Self> {
        ManuallyDrop::new(*self)
    }
}

What do you think?

Support for no_std?

Motivation

I'm currently working on a hobby project where this might be very useful; however, the project is written for a no_std environment, meaning that I currently cannot use this crate.

Situation

From quickly looking through the code, I found that most of the uses of std can easily be replaced by either core or alloc, except for the use of std::thread::yield_now in this line for the WriteHandle.

Ideas

Boxed yield closure

We could store a boxed closure for yield in the WriteHandle, which would by default just call std::thread::yield_now but could also be changed to support other mechanisms. This might also enable an async refresh where we don't block the thread but rather yield the current future.

However, this would also introduce the extra cost of another allocation, plus calling the closure through the Box for everyone. Although the writer is usually not that speed-critical in this crate, I would rather avoid unnecessary overhead where possible.

Generic-Type-Parameter yield closure

This idea is basically the same as the boxed-yield-closure one, although instead of boxing a closure we would store the closure or function pointer in a field and have its type specified using a generic type parameter, avoiding the cost of the Box and its overhead. However, this would increase the overall complexity of the WriteHandle type, which might not be desirable. On the other hand, it also gives the most flexibility to consumers of the library.

Function Pointers

This idea is again like the previous two, but we now store a simple function pointer instead of a closure. This would mitigate both the problem of extra heap allocations and the added complexity of the WriteHandle type. However, it might limit consumers of the library, as the function has to be stateless; this might not matter much, since the current function is also stateless.
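
To compare the shapes concretely, here is a sketch of the generic-type-parameter variant (all names are assumptions, not the crate's real API); the function-pointer idea is just the special case F = fn():

// The handle owns a yield strategy chosen at construction, defaulting
// to std::thread::yield_now.
struct WriteHandle<F: Fn() = fn()> {
    yield_fn: F,
}

impl WriteHandle {
    fn new() -> Self {
        WriteHandle { yield_fn: std::thread::yield_now as fn() }
    }
}

impl<F: Fn()> WriteHandle<F> {
    fn with_yield(yield_fn: F) -> Self {
        WriteHandle { yield_fn }
    }

    fn wait_for_readers(&self) {
        // ... epoch checks elided ...
        (self.yield_fn)(); // a no_std build could spin or yield a future here
    }
}

fn main() {
    WriteHandle::new().wait_for_readers();
    WriteHandle::with_yield(|| core::hint::spin_loop()).wait_for_readers();
}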

These were just my initial thoughts on this, and I would love to hear others' thoughts as well. If we come to a conclusion everyone is happy with, I would also happily make these changes and implement it.

Memory leak when destroy is not called explicitly

In the case where WriteHandle::destroy is not called, I think we end up leaking pretty much the entire map. Here's why:

If the writer is dropped first, it first swaps twice to make the two maps identical, and then forgets all the records and drops the write side of the map. When the last reader goes away, it drops its ReadHandle, which drops the last Arc<AtomicPtr<Inner>>, but that does not in turn drop the Inner!

You could imagine this being solved by the AtomicPtr<Inner> being replaced with a newtype wrapper that drops the Inner when it is itself dropped, but then:

If the last reader is dropped first, it in turn drops the Inner, which de-allocates the map (and all the contained values). When the writer is later dropped, it now has lots of dangling pointers (due to ShallowCopy), a null pointer that it doesn't expect from the reader side, and it doesn't know that it should be forgetting all of its values.

I'm not sure what the best way to fix this is. Possibly we just want dropping the writer to always fully drop the map (basically make WriteHandle::drop == WriteHandle::destroy), since that's the only immediate way I can see of making this safe, but it does mean that you can't now just "keep a map around" without also keeping the writer. Though maybe that's fine. Thoughts @novacrazy? We should probably fix this ASAP.

u256 (sha256)

Hi;
Me again :)

I'd like to store u256 or [u8; 32] (sha256 hashes) as key/value.

How can I insert it here:

pub struct AppState {
    read: evmap::ReadHandle<u64, u64>,
    write: Arc<Mutex<evmap::WriteHandle<u64, u64>>>,
}

When I try, it says:
the trait evmap::shallow_copy::ShallowCopy is not implemented for [u8; 32]
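
A hedged workaround sketch: evmap already implements ShallowCopy for Box<T>, so boxing the 32-byte values satisfies the bound, and [u8; 32] works as a key as-is (it is Hash + Eq + Clone):

use std::sync::{Arc, Mutex};

pub struct AppState {
    read: evmap::ReadHandle<[u8; 32], Box<[u8; 32]>>,
    write: Arc<Mutex<evmap::WriteHandle<[u8; 32], Box<[u8; 32]>>>>,
}

fn main() {
    let (read, write) = evmap::new();
    let state = AppState { read, write: Arc::new(Mutex::new(write)) };
    let hash = [0u8; 32];
    {
        let mut w = state.write.lock().unwrap();
        w.insert(hash, Box::new(hash));
        w.refresh();
    }
    assert!(state.read.get(&hash).is_some());
}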

Thanks :)

Is there a way to get the backing data structure out?

Once we create a writer and reader is there a way to get the underlying data out?

It would be the opposite of left_right::new_from_empty::<T, _>.

Something like left_right::consume_writer::<T, O>(writer: WriteHandle<T, O>) -> T.

It would only be safe to do this once all the readers have moved on from the current write side. And the readers could still read the read side, so we don't need to consume them.

Would something like this be possible?

Use questions?

  1. How do I clear the underlying map without dropping it or iterating over all keys and clearing each one, if I need to start from scratch?
  2. I see that there are no functions to remove a key itself, so after any remove/clear the key remains in the map. Is that normal behavior?
  3. If I use the write handle to read and write, after every write operation I have to call .refresh() to see the new value from .get_and(). Is that correct?

Thx in advance!

Memory bloat when using with async runtime

Hi, this is my very first issue on GitHub (yeah, really), so I'm sorry in advance if I do something wrong here.

Consider this simple example:

use std::net::SocketAddr;
use hyper::{Request, Response, Body, Error, Server};
use hyper::service::{service_fn, make_service_fn};

async fn handle(
    _req: Request<Body>,
    _routes: evmap::ReadHandle<&str, &str>
) -> Result<Response<Body>, Error> {
    Ok(Response::new(Body::from("Hello world")))
}

#[tokio::main]
async fn main() {
    let (map_reader, mut map_writer) = evmap::new();
    map_writer.insert("test", "value");
    map_writer.refresh();

    let svc = make_service_fn(move |_| {
        let data = map_reader.clone();
        async {
            Ok::<_, Error>(service_fn(move |req| {
                handle(req, data.clone())
            }))
        }
    });

    let addr: SocketAddr = "127.0.0.1:3000".parse().unwrap();
    let server = Server::bind(&addr).serve(svc);

    if let Err(err) = server.await {
        println!("server error: {}", err);
    }
}

This code does compile.
Here's the list of dependencies for this example, fairly straightforward:

[dependencies]
hyper = "0.13"
tokio = { version = "0.2", features = ["full"] }
evmap = "10.0"

However, the application consumes an ever-growing amount of memory, and it becomes especially obvious under load testing. It doesn't matter whether _routes is actually used in the handler. After the handler has executed multiple times, if we print the ReadHandle, we see something like this:

ReadHandle {
    epochs: Mutex {
        data: [
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
        ],
    },
    epoch: 0,
    my_epoch: 0,
}

The data inside the Mutex grows indefinitely, causing a memory bloat.

Thank you!

How to update values without copying

Thanks for this great crate!

Is there a way to work with values inside an evmap mutably? I am storing larger records inside an evmap and would like the ability to update them without making copies. Something like get_and, but with a FnMut? The desire is for something akin to the Entry API that allows fetching and updating a value with a single operation.

As I understand it, this is a bit more complex since this is a multivalued map, and every operation must be encoded as, well, an Operation. But perhaps it could be done by index, for instance

let result: Option<()> = rh.get_and_update(&key, idx, |elem| *elem += 1)

where, similarly to get_and, None is returned if the key or index didn't lead to an element.

If I'm understanding Operation correctly, perhaps this could be encoded as
Update(K, Index, NewV), or simply as a Remove followed by an Add.

If this sounds at all like a reasonable or possible thing to add, I would be happy to try and help with implementation. Thanks again!
