
Comments (6)

keithw commented on June 2, 2024

I think the biggest difference is the intention behind each tool.

  • wasm2c implements the WebAssembly spec, including the "sandboxing" (e.g., bounds-checking of memory and table accesses, type-safety of indirect calls, a clean trap on stack exhaustion). That includes deterministic traps for out-of-bounds accesses even if the result of the read/get doesn't affect the result of the function (and so could otherwise be optimized out). There's a performance cost to some of this. wasm2c passes 100% of the current Wasm testsuite even when the output goes through an optimizing compiler [1].
  • as I understand it, w2c2 is trying to take a "hopefully well-behaved" WebAssembly module and sort of "un-translate" it back to the C it might have come from and then run it, using the protections of the operating system to isolate the (possibly misbehaving) program from other processes. My understanding is that w2c2 doesn't try to follow the WebAssembly spec when it comes to trapping OOB accesses, making OOB accesses deterministic, handling stack exhaustion, runtime checks for indirect call types, etc.; the expectation is that the program is already sandboxed by the OS and that it's okay for misbehaving programs to be nondeterministic.
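To make the contrast concrete, here is a minimal sketch of the two approaches to a 32-bit load from linear memory. It is not taken from either tool's generated output: `linear_memory`, `trap_out_of_bounds`, `checked_load_u32` and `unchecked_load_u32` are hypothetical names, and the sketch assumes a little-endian host so that `memcpy` matches Wasm's byte order.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical linear memory; not wasm2c's real memory type. */
typedef struct {
  uint8_t *data;
  uint64_t size; /* current size in bytes */
} linear_memory;

/* Hypothetical trap hook. A real runtime would return control to the
   embedder; aborting keeps the sketch short. */
static void trap_out_of_bounds(void) { abort(); }

/* Returns 1 if [addr, addr + len) lies inside the memory.
   Written so that the arithmetic cannot overflow. */
int in_bounds(const linear_memory *mem, uint64_t addr, uint64_t len) {
  return len <= mem->size && addr <= mem->size - len;
}

/* Spec-style load: check first, trap deterministically on failure. */
uint32_t checked_load_u32(const linear_memory *mem, uint32_t addr) {
  uint32_t value;
  if (!in_bounds(mem, addr, sizeof value)) trap_out_of_bounds();
  memcpy(&value, mem->data + addr, sizeof value);
  return value;
}

/* "Trust the module" load: no check. An out-of-bounds address is
   undefined behavior in C, so the outcome is left to the compiler
   and the OS. */
uint32_t unchecked_load_u32(const linear_memory *mem, uint32_t addr) {
  uint32_t value;
  memcpy(&value, mem->data + addr, sizeof value);
  return value;
}
```

The extra compare-and-branch in `checked_load_u32` is the kind of per-access cost mentioned above; a runtime can sometimes avoid it with guard pages, but that is outside this sketch.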

So if you're looking for a tool that creates "sandboxed executables," I don't think w2c2 is trying to do that, unless you consider the OS's process isolation to be enough sandboxing. (But to be fair, we should probably ask the w2c2 folks -- I don't want to speak for them!)

Beyond that, I think the remaining differences are more minor. I think w2c2 used to be a lot faster than wasm2c at transpiling large modules, but wasm2c made a big speed improvement in #2171 and I would be surprised if there were still big differences. (Although we'd love to know if there are!) It looks like both wasm2c and w2c2 have "parallel output" modes that can output, e.g., 256 .c files that can all be compiled in parallel; this makes it practical to transpile things like clang or other gigantic modules and then compile the result to machine code.

It looks like wasm2c supports a broader list of Wasm features (multi-value, multi-memory, memory64, reference types, exception-handling, SIMD, extended-const, tail calls coming in #2272); on the other hand, w2c2 has built-in support for much of WASI preview1 and uses libdwarf to include debugging information in the C output (traced back to the debugging information in the Wasm input), which is a great feature.

Footnotes

  1. As of the last update. https://github.com/WebAssembly/wabt/pull/2287 catches us up to the current testsuite.

from wabt.

keithw commented on June 2, 2024

> It's my understanding that the w2c_wasi__snapshot__preview1_fd_write error is because I need to provide an implementation of the WASI API.

Yes, exactly. Here's an in-process implementation that you might find helpful to draw from: #2002
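For a rough idea of what such a host function involves, here is a minimal sketch of `fd_write`. It is not the code from #2002. It assumes the generated header declares the import with four `u32` arguments and a `u32` result, and that the host chooses to keep the linear memory base and size in its own struct; the struct layout, the helper names and the `WASI_E*` constants are illustrative. Each iovec in linear memory is two little-endian 32-bit values: a buffer address and a length.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t u32;

enum { WASI_ESUCCESS = 0, WASI_EBADF = 8, WASI_EFAULT = 21, WASI_EIO = 29 };

/* Assumption: this layout is the host's choice, not something wasm2c defines. */
struct w2c_wasi__snapshot__preview1 {
  uint8_t *memory;  /* base of the module's linear memory */
  u32 memory_size;  /* size in bytes */
  FILE *out;        /* destination for fd 1 */
  FILE *err;        /* destination for fd 2 */
};

/* Bounds-checked little-endian read; returns 0 if out of range. */
static int read_u32(const struct w2c_wasi__snapshot__preview1 *w, u32 addr, u32 *v) {
  if (w->memory_size < 4 || addr > w->memory_size - 4) return 0;
  const uint8_t *p = w->memory + addr;
  *v = (u32)p[0] | (u32)p[1] << 8 | (u32)p[2] << 16 | (u32)p[3] << 24;
  return 1;
}

/* Bounds-checked little-endian write; returns 0 if out of range. */
static int write_u32(struct w2c_wasi__snapshot__preview1 *w, u32 addr, u32 v) {
  if (w->memory_size < 4 || addr > w->memory_size - 4) return 0;
  for (int i = 0; i < 4; i++) w->memory[addr + i] = (uint8_t)(v >> (8 * i));
  return 1;
}

u32 w2c_wasi__snapshot__preview1_fd_write(struct w2c_wasi__snapshot__preview1 *w,
                                          u32 fd, u32 iovs, u32 iovs_len,
                                          u32 nwritten_ptr) {
  if (fd != 1 && fd != 2) return WASI_EBADF; /* only stdout/stderr here */
  FILE *stream = (fd == 1) ? w->out : w->err;
  u32 total = 0;
  for (u32 i = 0; i < iovs_len; i++) {
    /* Compute the entry address in 64 bits so it cannot wrap around. */
    uint64_t entry = (uint64_t)iovs + (uint64_t)i * 8;
    u32 buf, len;
    if (entry > UINT32_MAX) return WASI_EFAULT;
    if (!read_u32(w, (u32)entry, &buf) || !read_u32(w, (u32)entry + 4, &len))
      return WASI_EFAULT;
    /* The guest-supplied buffer must lie inside linear memory. */
    if (len > w->memory_size || buf > w->memory_size - len) return WASI_EFAULT;
    if (fwrite(w->memory + buf, 1, len, stream) != len) return WASI_EIO;
    total += len;
  }
  if (!write_u32(w, nwritten_ptr, total)) return WASI_EFAULT;
  return WASI_ESUCCESS;
}
```

Every address the module passes in is treated as untrusted and checked against the memory size before the host touches it; otherwise the host function itself would become a hole in the sandbox.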

> What does the __snapshot__preview1 aspect mean in the function call name?

wasi_snapshot_preview1 is the full name of the API that emscripten is producing code against (https://github.com/WebAssembly/WASI/blob/main/legacy/preview1/docs.md#-wasi_snapshot_preview1). It's the "preview1" version of WASI. (I understand the plan is to have a "preview2" version in the coming months.)

> And what about w2c_env_0x5F_syscall_unlinkat? Why is this not translated into the path_unlink_file WASI call?

It looks like emscripten's current implementation of the unlink function is written in terms of emscripten's own filesystem API and not in terms of the WASI API, so it's showing up as a symbol in LLVM's global C namespace ("env"). I think you'd (probably) get different results compiling with wasi-sdk. Either way, though, the host will need to provide an implementation of this function.

> What is struct w2c_env and struct w2c_wasi__snapshot__preview1? How are these initialised? Are they documented anywhere?

These are opaque instance pointers whose contents are up to the implementer of the WASI API (and in this case, also the "env" API that provides unlinkat). They represent the imported modules; the host has to provide these pointers when constructing an instance of your module, and then the module provides the same pointer back when calling a method from the corresponding API. It's basically like the this pointer when calling a method in C++.

Here's an example where the host defines the host API and the w2c_host structure, and then gives a pointer to the module so it can call functions from the host API: https://github.com/WebAssembly/wabt/blob/main/wasm2c/examples/rot13/main.c
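The round trip of the instance pointer can be sketched without any generated code. Everything below is a stand-in: `example_module`, `example_instantiate` and `example_count_bytes` play the role of what wasm2c would generate, and `w2c_host_next_byte` is a hypothetical import, not a function from the rot13 example.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-defined state. Generated code would only forward-declare this
   struct and never look inside it. */
struct w2c_host {
  const char *input;
  size_t pos;
};

/* Stand-in for a generated module instance: it just remembers the
   pointer it was given at instantiation. */
typedef struct {
  struct w2c_host *host_instance;
} example_module;

/* Host import (hypothetical). The first argument is the same pointer
   the host supplied when it created the instance, much like `this`. */
uint32_t w2c_host_next_byte(struct w2c_host *host) {
  char c = host->input[host->pos];
  if (c == '\0') return 0; /* end of input */
  host->pos++;
  return (uint32_t)(unsigned char)c;
}

/* Stand-in for the generated instantiate function. */
void example_instantiate(example_module *m, struct w2c_host *host) {
  m->host_instance = host;
}

/* Stand-in for an exported function that calls the import in a loop,
   handing the stored pointer back each time. */
uint32_t example_count_bytes(example_module *m) {
  uint32_t n = 0;
  while (w2c_host_next_byte(m->host_instance) != 0) n++;
  return n;
}
```

The host would fill in a `struct w2c_host`, pass its address to the instantiate function, and then see that same address arrive as the first argument of every import call. Two instances with two different host structs therefore stay independent.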


kiancross commented on June 2, 2024

Thanks a lot for the reply @keithw. Your answers clarify most aspects of my questions.

> It looks like emscripten's current implementation of the unlink function is written in terms of emscripten's own filesystem API and not in terms of the WASI API, so it's showing up as a symbol in LLVM's global C namespace ("env"). I think you'd (probably) get different results compiling with wasi-sdk. Either way, though, the host will need to provide an implementation of this function.

Indeed, compiling with wasi-sdk fixed this (now using path_unlink_file).

Am I understanding the following process correctly:

  1. A WebAssembly compiler takes a C file and compiles it into a WebAssembly (object?) module.
  2. If this C code uses libc, there will be various external symbols (e.g., open, close, unlinkat).
  3. The wasi-sdk links against wasi-libc, which provides an implementation of these functions in terms of the WASI API, whereas the Emscripten libc provides an implementation of these functions as a mixture of WASI API and Emscripten API calls.
  4. The runtime (e.g., Node, Wasmtime, web browser, wasm2c) must provide an implementation of any of these API functions, if they are used.
  5. #2002 is a work-in-progress implementation of the WASI-API for the wasm2c runtime?

Out of interest, does WebAssembly differentiate (nominally or otherwise) between 'object' code and 'executable' modules?

> > It's my understanding that the w2c_wasi__snapshot__preview1_fd_write error is because I need to provide an implementation of the WASI API.
>
> Yes, exactly. Here's an in-process implementation that you might find helpful to draw from: #2002

Thanks for the pointer. Does this imply that previous/current sandboxing, which has utilised wasm2c (e.g., on Firefox), relies on the sandboxed code not needing to interact with the system (i.e., being computation only, rather than making system calls)?


keithw commented on June 2, 2024

> Am I understanding the following process correctly:
>
> 1. A WebAssembly compiler takes a C file and compiles it into a WebAssembly (object?) module.

✅ (I think most compilers are using the LLVM wasm backend to generate the Wasm...)

> 2. If this C code uses `libc`, there will be various external symbols (e.g., `open`, `close`, `unlinkat`).
>
> 3. The `wasi-sdk` links against `wasi-libc`, which provides an implementation of these functions in terms of the WASI API, whereas the Emscripten `libc` provides an implementation of these functions as a mixture of WASI API and Emscripten API calls.
>
> 4. The runtime (e.g., Node, Wasmtime, web browser, wasm2c) must provide an implementation of any of these API functions, if they are used.

Here I would distinguish between the WebAssembly "runtime" (which for wasm2c is extremely minimal: https://github.com/WebAssembly/wabt/blob/main/wasm2c/README.md#symbols-that-must-be-defined-by-the-embedder) vs. the "host," who provides the imported host functions.

In wasm2c, the Wasm module is transformed into generated C code that expects to link with

  1. an implementation of the runtime API (we provide one implementation here: https://github.com/WebAssembly/wabt/blob/main/wasm2c/wasm-rt-impl.c)
  2. implementations of the imported functions, which may be provided either by
    a. another Wasm module (also run through wasm2c), or
    b. the host

> 5. #2002 (wasm2c: uvwasi support, https://github.com/WebAssembly/wabt/pull/2002) is a work-in-progress implementation of the WASI-API for the wasm2c runtime?

Yeah, I'd say it's a WIP implementation of the WASI API as host functions that will work with Wasm modules run through wasm2c. You could also imagine implementing the WASI API as another Wasm module and running that through wasm2c (but ultimately there needs to be some host API that gets called if you want to interact with the real world).

> Out of interest, does WebAssembly differentiate (nominally or otherwise) between 'object' code and 'executable' modules?

Yes, "objects" carry custom sections for linking/relocations that are used by the LLVM linker. (The linker doesn't have to parse the code section -- it just applies relocations.)
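As an illustration, a tool could tell the two apart just by walking the section headers, without decoding any code. The sketch below assumes the section names used by the WebAssembly tool-conventions linking document as I understand it (a custom section named "linking", plus relocation sections whose names start with "reloc."); `looks_like_wasm_object` and `read_leb_u32` are hypothetical helpers, not part of wabt or LLVM.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an unsigned LEB128 value of at most 5 bytes starting at *pos.
   Returns 0 on truncated or overlong input. */
static int read_leb_u32(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out) {
  uint32_t result = 0;
  for (unsigned shift = 0; shift < 35; shift += 7) {
    if (*pos >= len) return 0;
    uint8_t byte = buf[(*pos)++];
    result |= (uint32_t)(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return 1;
    }
  }
  return 0;
}

/* Returns 1 if a "linking" or "reloc.*" custom section is present,
   0 if none is found, -1 if the bytes are not a well-formed sequence
   of sections after a version-1 Wasm header. */
int looks_like_wasm_object(const uint8_t *buf, size_t len) {
  static const uint8_t header[8] = {0x00, 'a', 's', 'm', 0x01, 0x00, 0x00, 0x00};
  if (len < 8 || memcmp(buf, header, 8) != 0) return -1;
  size_t pos = 8;
  while (pos < len) {
    uint8_t id = buf[pos++];
    uint32_t size;
    if (!read_leb_u32(buf, len, &pos, &size) || size > len - pos) return -1;
    size_t end = pos + size;
    if (id == 0) { /* custom section: name length, name bytes, payload */
      size_t p = pos;
      uint32_t name_len;
      if (!read_leb_u32(buf, end, &p, &name_len) || name_len > end - p) return -1;
      const uint8_t *name = buf + p;
      if ((name_len == 7 && memcmp(name, "linking", 7) == 0) ||
          (name_len >= 6 && memcmp(name, "reloc.", 6) == 0))
        return 1;
    }
    pos = end; /* skip the payload, including code sections */
  }
  return 0;
}
```

Note that the loop never parses function bodies: it only reads each section's id and size and jumps ahead.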

> Thanks for the pointer. Does this imply that previous/current sandboxing, which has utilised wasm2c (e.g., on Firefox), relies on the sandboxed code not needing to interact with the system (i.e., being computation only, rather than making system calls)?

I'm not 100% sure -- #2002 is from the same team that's been responsible for the wasm2c work in Firefox, so my understanding is that they are starting to need support for some WASI calls. But I think you're basically right; a lot of libraries do not need to make (many) system calls, so you can get pretty far with sandboxing without needing much support on that end. (Even many libc functions don't ultimately end up making a syscall, including malloc and free.) They have a lot of information here: https://rlbox.dev/


keithw commented on June 2, 2024

Closing as answered, but happy to keep helping if we can.


kiancross commented on June 2, 2024

Thanks again for your reply, @keithw, and apologies for my slow responses; it takes me some time to do additional reading/process your answers, with the hope that I don't then ask silly questions!

I am currently working on performance benchmarking of various compartmentalisation/sandboxing mechanisms, hence trying to understand the various stages in the pipeline from source code to sandboxed executable. At each stage, there are quite a few options, which causes a small combinatorial explosion. It may be that all configurations perform relatively similarly, but I suppose various runtimes (and possibly even different host implementations of APIs, such as WASI) will have different performance characteristics.

This leads to my next question: what is the difference between wasm2c and w2c2? Are they simply competing tools, or do they serve different purposes/have different aims? Are there any significantly differing design decisions? I am interested, as turbolent/w2c2#1 seems to suggest differences in performance (runtime performance, binary size etc.). I appreciate that this is another tool, so you may not know!

