Giter Site home page Giter Site logo

Comments (10)

keithw avatar keithw commented on June 10, 2024

I'm definitely not an expert in wait/notify primitives. Would we be first non-Web Wasm engine to support this proposal? We might be blazing totally new trails in terms of implementing this outside the comforts of a JavaScript VM that already has mature implementations of wait/notify. If you want useful feedback on the design, it might be necessary to read a broader overview of your thinking first. :-) BTW: do you have a particular multi-threaded application in mind that it would be cool to support with wasm2c? (Is emscripten/LLVM already producing these instructions?)

Off the top of my head it seems like we'd have a few options:

  1. In the wasm2c-generated code, mimic how a C++20 standard library implements std::atomic_wait and std::atomic_notify_all. This seems to be very platform-specific, e.g. with futex on Linux, ulock on MacOS, something else on Windows, etc. (C doesn't seem to have a standardized atomic wait/notify even in C23?) So we might have to start out with partial platform support.
  2. Punt the wait/notify instructions to a new runtime API surface, and make the runtime figure it out. Maybe that's with a platform-specific API (futex/ulock/whatever), or maybe the runtime is written in C++ and uses std::atomic, or maybe it's with POSIX or pthreads, in which case I guess the runtime has to maintain some kind of association between memory locations and (??? POSIX semaphores? pthreads condition variables???) so that a wait on a memory location can become a sem_wait or pthread_cond_wait? (I'm guessing this is where a "map" data structure gets involved?)
  3. In the wasm2c-generated code, implement wait/notify in terms of pthreads, and maintain an internal association table between memory locations and a semaphore/condvar/??? (which would require the generated code to start using some sort of map/hash table)?

Is this sort of the landscape as you see it?

from wabt.

sbc100 avatar sbc100 commented on June 10, 2024

Maybe for these advances features we could maybe generate C++ and use std::atomic_wait and std::atomic_notify_all (as well as std::map if needed). We might need to rename the tools to wasm2mostlyc :-/

from wabt.

shravanrn avatar shravanrn commented on June 10, 2024

Would we be first non-Web Wasm engine to support this proposal? We might be blazing totally new trails in terms of implementing this outside the comforts of a JavaScript VM that already has mature implementations of wait/notify.

I think this maybe supported in wamr and wasmtime as well
https://bytecodealliance.org/articles/wasi-threads

Wamr for example uses lists and hashmaps to implement this
https://github.com/bytecodealliance/wasm-micro-runtime/blob/92e073b8ce80f1d3cc1506b4e9cd458970315182/core/iwasm/common/wasm_shared_memory.c#L238

If you want useful feedback on the design, it might be necessary to read a broader overview of your thinking first. :-)

Ah actually, I was deliberately avoiding this to keep things simple. Broadly speaking, await notify has to employ a mix of slower spinlock fallback strategies and platform specific fast APIs when available as used in the C++ standard library

I think this is actually a bit of a distraction though. The TLDR here is that there is no way to implement these strategies in Wasm compliant way without hashmaps and lists.

The full details in case you are curious, is that wasm specifies that there are no "spurious wakeups" which is bad design choice imho, as this means wasm implementations are on the hook to maintain datastructures that prevent spurious notifies (hence the hashmaps and lists, i.e. the association map from your earlier comment). In practice, spurious notifies are best handled a layer above, as there are often fast domain specific approaches to check if a wakeup is spurious. For example, the typical use of wait/notify is to indicate when a memory value has changed, so a spurious wakeup can be checked by re-reading the value at the memory location and comparing with the old problem (ABA problems notwithstanding)

BTW: do you have a particular multi-threaded application in mind that it would be cool to support with wasm2c? (Is emscripten/LLVM already producing these instructions?)

i kinda just want pthread_create and pthread_join to work tbh :) there are libraries we want to sandbox which do use threadpools internally. In our research, we have previously seen this in video decoding libraries like libvpx

and uses std::atomic,

I think we are on the same page, but I'll document the caveat here just in case --- We definitely won't get to a place where we can fully rely on std::atomic for atomic operations like atomic add, sub etc.
This is because C/C++ atomics are incompatible with having any single address treated as both an atomic variable and non-atomic variable. So using atomic ops on a random index in the linear memory wouldn't work (it will trigger UB). So were are stuck with the current compiler intrinsics approach.

Maybe for these advances features we could maybe generate C++ and use std::atomic_wait and std::atomic_notify_all (as well as std::map if needed). We might need to rename the tools to wasm2mostlyc :-/

We could potentially use C++ std::atomic_wait/std::atomic_notify for just the await/notify part of the spec. A pure C implementation currently requires the following

  • A map data structure
  • A list data structure
  • A timer API
  • A sleep API
  • A mutex API

while the C++ implementation will include all of the above with C++ 11 and provides a builtin atomic wait/notify with C++ 20
https://en.cppreference.com/w/cpp/atomic/atomic/wait

Personally I don't have a strong preference between the C or C++11 version and I'm happy to go with either. I am not a fan of relying on C++20 yet, as many platforms don't have this yet, and Firefox doesn't use this yet. If we go with the C implementation, we would have to find external implementation of at least the map data structures (the others are reasonable enough to implement by hand if needed). What are your preferences @sbc100 @keithw ?

from wabt.

keithw avatar keithw commented on June 10, 2024

Hmm, can you link to the Wasmtime / V8 / SpiderMonkey implementations? I'm curious what they do. (Wasmtime doesn't check the box at https://webassembly.org/roadmap/ so I assumed they don't support this one.)

I don't quite understand the connection between preventing spurious wakeups and maintaining a dynamic data structure (like a map) -- doesn't std::atomic also prevent spurious wakeups without any ancillary data structure? (Point taken that std::atomic can't be used for add, sub, etc.)

What is your thinking about whether this belongs in the generated code vs. creating a new runtime API surface and punting these dynamic data structures to the runtime? Right now the generated code never calls malloc or free (and doesn't maintain any dynamic global data structures at all). Tentatively it seems like it would be nice to maintain this -- you ok with that?

from wabt.

shravanrn avatar shravanrn commented on June 10, 2024

The wasmtime implementation is here bytecodealliance/wasmtime#5255 (comment)

They also rely on a map, but with conditional variables. Not sure if these cvs are backed by futex syscalls behind the scenes though.

https://github.com/bytecodealliance/wasmtime/blob/1cabece50b88e4016e885e226d670916eb27b286/crates/runtime/src/parking_spot.rs#L78

I haven't looked at the browser engine implementations though as these are likely to be heavily tied to JavaScript engines/APIs which we can't easily reuse.

I don't quite understand the connection between preventing spurious wakeups and maintaining a dynamic data structure (like a map) -- doesn't std::atomic also prevent spurious wakeups without any ancillary data structure?

So here is my train of thought. When using syscalls like futex, there is chance of spurious wakeups. If wasm allowed spurious wakeups, we could just do something like

void wait(uintptr_t address) {
  futex_wait(address);
}

void notify(uintptr_t address){ 
   futex_notify(address);
}

But wasm doesn't allow for spurious wakeups. So we need some way to check if the wakeup is spurious. For this check, we can maintain an auxiliary data structure which has state that we can check. Here is a simple example

struct state {
  std::atomic<bool> real_wakeup;
};

std::map<uintptr_t, state> wait_state;

void wait(uintptr_t address) {
  wait_state[address].real_wakeup = false;
  do {
    futex_wait(address)
   } while (wait_state[address].real_wakeup == false); 
}

void notify(uintptr_t address){
   wait_state[address].real_wakeup = true;
   futex_notify(address);
}

std::atomic also prevent spurious wakeups without any ancillary data structure

Depends on the implementation I think. See discussion around tables here

https://github.com/ogiroux/atomic_wait

In general though, my conclusion is that the table-free implementations are not possible if we want to meet all wasm criteria such as:

  • wait/notify on 32 and 64 bit values --- this is hard because futexes only work in 32 bit values. Auxiliary data structure solves this as described in https://github.com/ogiroux/atomic_wait
  • prevent spurious wakeups --- auxiliary data structure solves this as outlined above

What is your thinking about whether this belongs in the generated code vs. creating a new runtime API surface

I don't have a preference where this code lives tbh as long as I can "git clone wabt", and use wasm2c to compile a wasm file which uses pthreads and have everything work as expected. So I'm fine adding some of this to the wasm runtime code in this repo versus generated code if y'all prefer that.

Takeaway: whether implemented in the generated code or runtime code, the algorithms above need a map data structure (and a list as well to support multiple waiters) to store state to prevent spurious wakeups. We should either pick a C data structure library and include that as a dependency of wasm2c (sort of like how the simde library is pulled in as a dependency to implement wasm simd), or go with the c++11 impl.

from wabt.

keithw avatar keithw commented on June 10, 2024

I think if you're amenable to having this be a runtime responsibility, and having our own runtime implementation require C++20 and basically just shell out to std::atomic_wait / atomic_notify_x, that might be the best path to happiness for everybody -- it seems like a lot less code to worry about maintaining/testing/reviewing than if we have to implement this stuff from scratch. :-/

I did find an article on the (seemingly bleeding-edge) implementation of atomic_wait/notify in libstdc++. They agree with you that an ancillary table is required to efficiently wait on a 64-bit value on Linux. It looks like they're using a strategy without dynamic data structures (map or list) -- they seem to statically declare a table of 16 waiters (futex or pthreads condition variable), and they use a dead-simple hash function (basically just modulo) to map addresses to the 16 waiters, and that seems to be basically it as far as data structures. It looks like each wait-ing thread keeps its true "old" value on the stack and double-checks for inequality before returning from atomic_wait. I guess that's how they handle hash collisions -- there's some extra wakeups but it doesn't end up being visible to the calling code. All this complexity/sophistication does reinforce my (tentative) desire to avoid reinventing the wheel if we can avoid it. :-)

from wabt.

keithw avatar keithw commented on June 10, 2024

So using atomic ops on a random index in the linear memory wouldn't work (it will trigger UB).

BTW, isn't it also UB in C any time that two threads load/store from an overlapping memory region unless both operations are atomic? Are we okay with this, or... what does the Wasm proposal say is supposed to happen if two agents load/store from the same shared memory if at least one of them isn't using atomic operations?

from wabt.

conrad-watt avatar conrad-watt commented on June 10, 2024

The Wasm proposal says you'll get very weak, defined behaviour (roughly, all possible interleavings of all bytes of all racing writes may be observed by each racing read). Naively translating Wasm non-atomics back to C non-atomics would theoretically leave open the possibility for non-Wasm-conforming behaviours to be observed when the C is compiled natively. A more conservative translation would be to convert Wasm non-atomics to C11 relaxed atomics if the memory is shared.

from wabt.

anuraaga avatar anuraaga commented on June 10, 2024

Hi all, I just randomly found this issue which seemed to be looking for implementations of wait/notify outside browser. Just in case it helps, here is an implementation in Go

https://github.com/wasilibs/wazero/blob/main/internal/wasm/memory.go#L298

Indeed, the core data structures are a map of lists. Synchronization is with channels, which we're lucky to have in Go.

We're currently holding off on merging this officially as the threads proposal settles (hence "finish thread proposal" in this issue title catching my eye, I understand this is about wasm2c specifically :) ), but linking just in case it can help. Note that while I don't know if this approach is optimal, it is more or less the same implementation as the semi-official Go semaphore implementation (golang.org/x/sync/semaphore) so I suspect it's at least reasonable.

from wabt.

shravanrn avatar shravanrn commented on June 10, 2024

Thanks conrad-watt and anuraaga for the info :)

I think based on the above, I agree we can tentatively start with the C++ 20 runtime implementation as a stopgap --- this will allow us to see what the implementation looks like and play with real code. However, if/when this code is actively used in Firefox, this use case would likely need either a C++11 or C implementation of the same (and thus need to interact with the OS specific futex APIs) . But we can cross the bridge when we get to it.

from wabt.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.