I think the biggest difference is the intention behind each tool.
- wasm2c is implementing the WebAssembly spec, including the "sandboxing" (e.g., bounds-checking of memory and table accesses, type-safety of indirect calls, clean trap on stack exhaustion, etc.), including deterministic traps for out-of-bounds accesses even if the result of the read/get doesn't affect the result of the function (and so could get optimized out), etc. There's a performance cost to some of this. wasm2c passes 100% of the current Wasm testsuite even when the output goes through an optimizing compiler.[1]
- as I understand it, w2c2 is trying to take a "hopefully well-behaved" WebAssembly module and sort of "un-translate" it back to the C it might have come from and then run it, using the protections of the operating system to isolate the (possibly misbehaving) program from other processes. My understanding is that w2c2 doesn't try to follow the WebAssembly spec when it comes to trapping OOB accesses, making OOB accesses deterministic, handling stack exhaustion, runtime checks for indirect call types, etc.; the expectation is that the program is already sandboxed by the OS and that it's okay for misbehaving programs to be nondeterministic.
So I think if you're looking for a tool that creates "sandboxed executables," I don't think w2c2 is trying to do that unless you consider the OS's process isolation to be enough sandboxing. (But to be fair, we should probably ask the w2c2 folks -- I don't want to speak for them!)
Beyond that, I think the remaining differences are more minor. I think w2c2 used to be a lot faster than wasm2c on transpiling large modules, but wasm2c made a big speed improvement in #2171 and I would be surprised if there were still big differences. (Although we'd love to know if there are!) It looks like both wasm2c and w2c2 have "parallel output" modes that can output, e.g., 256 .c files that can all be compiled in parallel; this makes it practical to transpile things like clang or other gigantic modules and then compile the result to machine code.
It looks like wasm2c supports a broader list of Wasm features (multi-value, multi-memory, memory64, reference types, exception-handling, SIMD, extended-const, tail calls coming in #2272); on the other hand, w2c2 has built-in support for much of WASI preview1 and uses libdwarf to include debugging information in the C output (traced back to the debugging information in the Wasm input), which is a great feature.
Footnotes
[1] As of the last update. https://github.com/WebAssembly/wabt/pull/2287 catches us up to the current testsuite.
It's my understanding that the w2c_wasi__snapshot__preview1_fd_write error is because I need to provide an implementation of the WASI API.
Yes, exactly. Here's an in-process implementation that you might find helpful to draw from: #2002
What does the __snapshot__preview1 aspect mean in the function call name?
`wasi_snapshot_preview1` is the full name of the API that emscripten is producing code against (https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md#-wasi_snapshot_preview1). It's the "preview1" version of WASI. (I understand the plan is to have a "preview2" version in the coming months.)
And what about w2c_env_0x5F_syscall_unlinkat? Why is this not translated into the path_unlink_file WASI call?
It looks like emscripten's current implementation of the `unlink` function is written in terms of emscripten's own filesystem API and not in terms of the WASI API, so it's showing up as a symbol in LLVM's global C namespace ("env"). I think you'd (probably) get different results compiling with wasi-sdk. Either way, though, the host will need to provide an implementation of this function.
What is struct w2c_env and struct w2c_wasi__snapshot__preview1? How are these initialised? Are they documented anywhere?
These are opaque instance pointers whose contents are up to the implementer of the WASI API (and in this case, also the "env" API that provides `unlinkat`). They represent the imported modules; the host has to provide these pointers when constructing an instance of your module, and then the module provides the same pointer back when calling a method from the corresponding API. It's basically like the `this` pointer when calling a method in C++.
Here's an example where the host defines the `host` API and the `w2c_host` structure, and then gives a pointer to the module so it can call functions from the `host` API: https://github.com/WebAssembly/wabt/blob/main/wasm2c/examples/rot13/main.c
Thanks a lot for the reply @keithw. Your answers clarify most aspects of my questions.
It looks like emscripten's current implementation of the `unlink` function is written in terms of emscripten's own filesystem API and not in terms of the WASI API, so it's showing up as a symbol in LLVM's global C namespace ("env"). I think you'd (probably) get different results compiling with wasi-sdk. Either way, though, the host will need to provide an implementation of this function.
Indeed, compiling with `wasi-sdk` fixed this (now using `path_unlink_file`).
Am I understanding the following process correctly:
- A web assembly compiler takes a C file and compiles it into a web assembly (object?) module.
- If this C code uses `libc`, there will be various external symbols (e.g., `open`, `close`, `unlinkat`).
- The `wasi-sdk` links against `wasi-libc`, which provides an implementation of these functions in terms of the WASI API, whereas the Emscripten `libc` provides an implementation of these functions as a mixture of WASI API and Emscripten API calls.
- The runtime (e.g., Node, Wasmtime, web browser, wasm2c) must provide an implementation of any of these API functions, if they are used.
- #2002 is a work-in-progress implementation of the WASI-API for the wasm2c runtime?
Out of interest, does web assembly differentiate (nominally or otherwise) between 'object' code and 'executable' modules?
It's my understanding that the w2c_wasi__snapshot__preview1_fd_write error is because I need to provide an implementation of the WASI API.
Yes, exactly. Here's an in-process implementation that you might find helpful to draw from: #2002
Thanks for the pointer. Does this imply that previous/current sandboxing, which has utilised wasm2c (e.g., on Firefox), relies on the sandboxed code not needing to interact with the system (i.e., being computation only, rather than making system calls)?
Am I understanding the following process correctly:
1. A web assembly compiler takes a C file and compiles it into a web assembly (object?) module.
✅ (I think most compilers are using the LLVM wasm backend to generate the Wasm...)
2. If this C code uses `libc`, there will be various external symbols (e.g., `open`, `close`, `unlinkat`).
✅
3. The `wasi-sdk` links against `wasi-libc`, which provides an implementation of these functions in terms of the WASI API, whereas the Emscripten `libc` provides an implementation of these functions as a mixture of WASI API and Emscripten API calls.
✅
4. The runtime (e.g., Node, Wasmtime, web browser, wasm2c) must provide an implementation of any of these API functions, if they are used.
Here I would distinguish between the WebAssembly "runtime" (which for wasm2c is extremely minimal: https://github.com/WebAssembly/wabt/blob/main/wasm2c/README.md#symbols-that-must-be-defined-by-the-embedder) vs. the "host," who provides the imported host functions.
In wasm2c, the Wasm module is transformed into generated C code that expects to link with
- an implementation of the runtime API (we provide one implementation here: https://github.com/WebAssembly/wabt/blob/main/wasm2c/wasm-rt-impl.c)
- implementations of the imported functions, which may be provided either by
  a. another Wasm module (also run through wasm2c), or
  b. the host
5. [wasm2c: uvwasi support #2002](https://github.com/WebAssembly/wabt/pull/2002) is a work-in-progress implementation of the WASI-API for the wasm2c runtime?
Yeah, I'd say it's a WIP implementation of the WASI API as host functions that will work with Wasm modules run through wasm2c. You could also imagine implementing the WASI API as another Wasm module and running that through wasm2c (but ultimately there needs to be some host API that gets called if you want to interact with the real world).
Out of interest, does web assembly differentiate (nominally or otherwise) between 'object' code and 'executable' modules?
Yes, "objects" carry custom sections for linking/relocations that are used by the LLVM linker. (The linker doesn't have to parse the code section -- it just applies relocations.)
Thanks for the pointer. Does this imply that previous/current sandboxing, which has utilised wasm2c (e.g., on Firefox), relies on the sandboxed code not needing to interact with the system (i.e., being computation only, rather than making system calls)?
I'm not 100% sure -- #2002 is from the same team that's been responsible for the wasm2c work in Firefox, so my understanding is that they are starting to need support for some WASI calls. But I think you're basically right; a lot of libraries do not need to make (many) system calls, so you can get pretty far with sandboxing without needing much support on that end. (Even many libc functions don't ultimately end up making a syscall, including malloc and free.) They have a lot of information here: https://rlbox.dev/
Closing as answered, but happy to keep helping if we can.
Thanks again for your reply, @keithw, and apologies for my slow responses; it takes me some time to do additional reading/process your answers, with the hope that I don't then ask silly questions!
I am currently working on performance benchmarking various compartmentalisation/sandboxing mechanisms, hence trying to understand the various stages in the pipeline from source code to sandboxed executable. At each stage, there are quite a few options, which causes a small combinatoric explosion. It may be that all configurations perform relatively similarly, but I suppose various runtimes (and possibly even different host implementations of APIs, such as WASI) will have different performance characteristics.
This leads to my next question: what is the difference between `wasm2c` and `w2c2`? Are they simply competing tools, or do they serve different purposes/have different aims? Are there any significantly differing design decisions? I am interested, as turbolent/w2c2#1 seems to suggest differences in performance (runtime performance, binary size etc.). I appreciate that this is another tool, so you may not know!