Comments (30)

turbolent avatar turbolent commented on May 18, 2024 3

Thank you for this analysis Volodymyr!

These are great points for the README. I will better document which features are supported and which are not, as well as the trade-offs and goals of the project itself.

Performance

I had no idea clang was itself available as a WebAssembly binary, amazing! This is a great stress-test.

My main goal is porting new software to older machines, which have only tens of MHz of CPU speed and little memory, ideally on the host itself (no cross-compiling). This is why I looked into streaming compilation (no IR), but also took a few shortcuts.

For now I have focused on getting the test suite to pass and some small programs to compile and run. So far I have not focused on any performance optimizations, other than employing a streaming-compilation approach. It is great to see that even large programs like clang can be translated and performance is already good.

I think there are multiple opportunities to improve the performance and resource footprint further: improving memory allocation, speeding up compilation by parallelizing it, and especially reducing the output size by removing unnecessary generated code:

  • Always remove parameter names from function prototypes, they are not needed
  • Behind a "compact" flag:
    • Remove spaces for indentation
    • Remove braces for blocks
    • Remove newlines between the statements of a single op (e.g. a jump op is an if-statement plus a goto statement)
    • Remove unnecessary labels. Streaming compilation makes this harder
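
To make the "compact" idea above concrete, here is a minimal sketch of emitting the same conditional jump in a pretty-printed and a compact form. The helper emit_br_if, the condition name si0 and the label format L<n> are hypothetical and chosen for illustration only; they are not taken from w2c2's actual output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the C code for a conditional jump
   (an if-statement plus a goto statement) into buf.
   compact == 0: indentation, braces and newlines are kept.
   compact != 0: all optional whitespace and the braces are dropped.
   Returns the number of characters snprintf would have written. */
static int emit_br_if(char *buf, size_t cap, const char *cond,
                      unsigned label, int compact) {
    assert(buf != NULL && cond != NULL);
    if (compact) {
        return snprintf(buf, cap, "if(%s)goto L%u;", cond, label);
    }
    return snprintf(buf, cap,
                    "    if (%s) {\n        goto L%u;\n    }\n",
                    cond, label);
}
```

Both forms mean the same thing to the C compiler; only the number of bytes written differs, which adds up over millions of ops.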

Great point about compiling large C files: this is one of the reasons I started this project. SwiftWasm generates very large WebAssembly binaries, which take a long time to compile to C and then a very long time to compile to a native binary.

I am currently working on parallel compilation, where the compiler is writing function implementations into several smaller files that can be compiled independently, and also performing this code generation concurrently. This will hopefully make compiling the resulting C code more manageable and also speed up the translation process from WebAssembly to C.
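
As a rough illustration of splitting function implementations across several output files, the sketch below assigns each file a contiguous range of function indices. This is only one possible partitioning scheme, assumed for the example; the thread does not say how w2c2 actually distributes functions, and FuncRange and func_range_for_file are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* A contiguous range of function indices assigned to one output file. */
typedef struct {
    size_t first;
    size_t count;
} FuncRange;

/* Returns the range of functions that output file `file` (0-based) of
   `file_count` files would receive. The first (func_count % file_count)
   files get one extra function, so file sizes differ by at most one. */
static FuncRange func_range_for_file(size_t func_count, size_t file_count,
                                     size_t file) {
    FuncRange r;
    size_t base, extra;
    assert(file_count > 0 && file < file_count);
    base = func_count / file_count;
    extra = func_count % file_count;
    r.first = file * base + (file < extra ? file : extra);
    r.count = base + (file < extra ? 1 : 0);
    return r;
}
```

Because the ranges do not overlap, each file could be generated by its own worker and later compiled independently.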

Features

For now I have focused on supporting the core specification 1.0 as outlined in https://www.w3.org/TR/wasm-core-1/.

I have not looked into newly approved features, like the ones you listed, mostly because I was not sure if they are needed for compiling applications and libraries written in C, C++, Rust, Swift, etc. It seems that most compilers target the "MVP" specification.

As for the concrete proposals you listed:

  • Sign-extension operations and non-trapping float-to-int conversions: I think this should be "just" implementing the new opcodes
  • Multi-value: I had a quick look, it seems like it is going to complicate the compiler quite a bit
  • Bulk memory operations, exception handling, reference types, multiple memories: These features all seem to add new opcodes, new types, and extend the language quite a lot. They are going to be quite a bit of work to implement
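
For a sense of what "just implementing the new opcodes" might involve, here is a minimal sketch of two sign-extension operations and one non-trapping conversion written as plain C helpers. The function names are made up for this example and are not w2c2's; the semantics follow the WebAssembly proposals (NaN converts to 0, out-of-range values saturate).

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* i32.extend8_s: sign-extend the low 8 bits. Uses unsigned arithmetic
   only, so it does not rely on implementation-defined conversions. */
static uint32_t i32_extend8_s(uint32_t x) {
    return ((x & 0xFFu) ^ 0x80u) - 0x80u;
}

/* i32.extend16_s: sign-extend the low 16 bits. */
static uint32_t i32_extend16_s(uint32_t x) {
    return ((x & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}

/* i32.trunc_sat_f32_s: truncate toward zero without trapping. */
static int32_t i32_trunc_sat_f32_s(float f) {
    if (f != f) return 0;                        /* NaN */
    if (f <= -2147483648.0f) return INT32_MIN;   /* at or below -2^31 */
    if (f >= 2147483648.0f) return INT32_MAX;    /* at or above 2^31 */
    return (int32_t)f;
}

/* Spot checks of the semantics described above. */
static void check_new_opcodes(void) {
    assert(i32_extend8_s(0x000000FFu) == 0xFFFFFFFFu);
    assert(i32_extend8_s(0x0000127Fu) == 0x0000007Fu);
    assert(i32_extend16_s(0x00018000u) == 0xFFFF8000u);
    assert(i32_trunc_sat_f32_s(NAN) == 0);
    assert(i32_trunc_sat_f32_s(3e9f) == INT32_MAX);
    assert(i32_trunc_sat_f32_s(-3e9f) == INT32_MIN);
    assert(i32_trunc_sat_f32_s(-1.9f) == -1);
}
```

The sketch uses stdint.h for brevity; a translator that targets pre-C99 compilers would need its own fixed-width typedefs.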

Do you know if there are certain compilers targeting WebAssembly that leverage or even require these features?

Coremark results

Performance should be very similar to wasm2c, as the generated code is pretty much the same.
I have not looked into opportunities to generate more efficient code.

wasm2native integration

This looks like a great project! I was looking for a wasi implementation for wasm2c, awesome work!

Currently the API that w2c2 provides is slightly different than wasm2c, and it produces differently mangled names (mostly they are shorter, e.g. don't include type information). I think this difference could be worked around or even removed.

from w2c2.

kripken avatar kripken commented on May 18, 2024 3

Some thoughts (great project btw!):

Sign-extension [..] non-trapping float-to-int [..] Multi-value [..] Bulk memory operations, exception handling, reference types, multiple memories [..] Do you know if there are certain compilers targeting WebAssembly that leverage or even require these features?

Several of those features are implemented in LLVM and are optional in the Emscripten and Rust toolchains for example. But they are not required, so projects using them could just be recompiled for wasm 1.0.

Exceptions are maybe the hardest of those. You can compile C++ exceptions and longjmp with wasm 1.0 today, but then you have to use the Emscripten EH model, which requires supporting some extra imports (Emscripten's wasm2c layer emits them) and is probably slower than wasm EH.

Overall, my guess is that compiling wasm 1.0 to C is "good enough" for most things in the C/C++/Rust/Zig/etc. world today. Exceptions maybe raise the question of compiling to C++ instead, and wasm GC maybe suggests compiling to a GC language, but maybe those features would be out of scope of this project anyhow?

Coremark results: Performance should be very similar to wasm2c, as the generated code is pretty much the same. I have not looked into opportunities to generate more efficient code.

I would guess nothing is needed in w2c2 or wasm2c for performance since the C compiler does the hard work anyhow. The big question is whether there are things that wasm->C translation can optimize that a C compiler can't, but it's hard for me to think of anything...

vshymanskyy avatar vshymanskyy commented on May 18, 2024 3

@turbolent , wasm3 w2c2.wasm w2c2.wasm > w2c2.c works with #7 ! 🎉 🎉 🎉

turbolent avatar turbolent commented on May 18, 2024 2

@vshymanskyy I've added support for parallel compilation in #2. Going to work on reducing the output size now

vshymanskyy avatar vshymanskyy commented on May 18, 2024 2

Ok, got it working. clang.wasm was compiled and linked (using Clang 12 + LLD) in parallel mode (-j 12) in just 3m34s.
Will send a PR with my changes.

vshymanskyy avatar vshymanskyy commented on May 18, 2024 2

@turbolent just pushed changes to wasm2native. You should be able to:

git clone https://github.com/vshymanskyy/wasm2native.git
cd wasm2native
export CC="clang-12"
export LDFLAGS="-fuse-ld=lld"
./build.sh path/to/clang.wasm

To run the resulting clang.elf, you can replace the compilation command in wasm3-self-compiling.
But it should work as a standalone compiler as well.

turbolent avatar turbolent commented on May 18, 2024 2

Update: The WASI implementation now has support for big-endian machines and fd_readdir.

QEMU's user mode emulation is really useful for development, but also has some issues, e.g. when using it on a 64-bit host running 32-bit executables and performing filesystem operations.

vshymanskyy avatar vshymanskyy commented on May 18, 2024 1

Great news. #2 along with #3 bring clang compilation time from 20 minutes to ~3 minutes (gcc, 12 threads, -O3).
I was not able to link it yet; I will take a look at this later.

Overall, this is a huge improvement 🎉 🎉 🎉

cjihrig avatar cjihrig commented on May 18, 2024 1

I also wonder what the minimum requirements for uvwasi and especially libuv are (C standard, endianness, etc.)

Both target C89 and run on little and big endian machines.

vshymanskyy avatar vshymanskyy commented on May 18, 2024 1

I was able to add w2c2 as an alternative translator for wasm2native.
Re-mapping of symbols is easy:

    #define Z_fd_prestat_getZ_iii               fdX5FprestatX5Fget
    #define Z_fd_prestat_dir_nameZ_iiii         fdX5FprestatX5FdirX5Fname
    #define Z_environ_sizes_getZ_iii            environX5FsizesX5Fget
    #define Z_environ_getZ_iii                  environX5Fget
    ....

Along with customized IMPORT_IMPL* definitions.
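
The w2c2 names in the list above look like an escaping scheme in which each underscore becomes X5F (0x5F is the ASCII code of '_'). Below is a minimal sketch of such a scheme, inferred only from the names shown in this thread. Escaping every non-alphanumeric byte, and also the letter X itself so the mapping stays reversible, is an assumption of the sketch and may not match w2c2's real rules; mangle_name is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Escapes `name` into `out` (capacity `cap`, including the terminator).
   ASCII letters and digits are copied, except 'X' (assumption: escaped
   so that the result can be decoded unambiguously). Every other byte
   becomes 'X' followed by two uppercase hex digits.
   Returns the length of the result, or -1 if `out` is too small. */
static int mangle_name(const char *name, char *out, size_t cap) {
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;
    assert(name != NULL && out != NULL);
    if (cap == 0) return -1;
    for (; *name != '\0'; name++) {
        unsigned char c = (unsigned char)*name;
        if (isalnum(c) && c != 'X') {
            if (n + 1 >= cap) return -1;
            out[n++] = (char)c;
        } else {
            if (n + 3 >= cap) return -1;
            out[n++] = 'X';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

With this rule, fd_prestat_get maps to fdX5FprestatX5Fget, which matches the right-hand side of the first #define above.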

@turbolent When generating multiple files with the -j flag, I'm getting lots of "multiple definition" errors for e_X5Fstart and e_memory (defined in decls.h, which is then included in each C file). Fixed it by making them extern and defining them in my main.c.
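
The fix described here follows the usual C pattern of keeping only a declaration in the shared header and placing the single definition in one translation unit. The sketch below shows the three pieces in one listing for brevity; SketchMemory and memory_size are hypothetical stand-ins, not the real types or functions from decls.h.

```c
#include <assert.h>
#include <stddef.h>

/* --- decls.h: included by every generated .c file --- */
typedef struct {
    unsigned char *data;
    size_t size;
} SketchMemory;                  /* hypothetical stand-in type */
extern SketchMemory *e_memory;   /* declaration only, allocates no storage */

/* --- any generated .c file: uses the symbol through the declaration --- */
static size_t memory_size(void) {
    assert(e_memory != NULL);
    return e_memory->size;
}

/* --- main.c: exactly one translation unit provides the definition --- */
static SketchMemory main_memory = { NULL, 0 };
SketchMemory *e_memory = &main_memory;
```

Since only main.c defines e_memory, the linker sees a single definition no matter how many generated files include the header.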

With this I was able to build multiple rather complex wasi apps.
But for clang.wasm, I'm getting a huge number of these:

/usr/bin/ld: <artificial>:(.text+0x29a8): undefined reference to `e_X5FZNSt3X5FX5F26locale8X5FX5FglobalEv'
/usr/bin/ld: <artificial>:(.text+0x29b9): undefined reference to `e_X5FZNSt3X5FX5F26localeC2Ev'
/usr/bin/ld: <artificial>:(.text+0x29ca): undefined reference to `e_X5FZNSt3X5FX5F26localeC2ERKS0X5F'
/usr/bin/ld: <artificial>:(.text+0x29db): undefined reference to `e_X5FZNSt3X5FX5F26localeD2Ev'
/usr/bin/ld: <artificial>:(.text+0x29ec): undefined reference to `e_X5FZNSt3X5FX5F26localeC2EPKc'
/usr/bin/ld: <artificial>:(.text+0x29fd): undefined reference to `e_X5FZNSt3X5FX5F26localeC2ERKNSX5F12ba
...

turbolent avatar turbolent commented on May 18, 2024 1

@cjihrig that's great! Thank you for the information 👍

https://github.com/libuv/libuv/blob/v1.x/SUPPORTED_PLATFORMS.md looks good too, I'll try on some of my older machines (e.g. Mac OS X < 10.7, especially on PowerPC; IRIX on MIPS; OpenStep/NeXTSTEP on x86/PowerPC/HPPA)

vshymanskyy avatar vshymanskyy commented on May 18, 2024 1

@turbolent do all/any of these platforms support CMake? Do you compile on the target, or do you cross-compile?

wasm2native should conceptually support Big-Endian systems, like Wasm3 does. It needs some debugging, as testing with QEMU showed there are issues (but looks like we're almost there).

cjihrig avatar cjihrig commented on May 18, 2024 1

@vshymanskyy Node.js builds libuv with gyp. I'm not sure if you can use it for inspiration, but here is the gyp file.

vshymanskyy avatar vshymanskyy commented on May 18, 2024 1

Overall, it may be a good idea to move the WASI implementation into a separate project. It's rather complicated, especially if targeting multiple OS environments.
In this case it could be reused by wasm3, for example.

vshymanskyy avatar vshymanskyy commented on May 18, 2024 1

With some effort, it should be possible to w2c2 w2c2.wasm > w2c2.c 😱
Or even... wasm3 w2c2.wasm w2c2.wasm > w2c2.c
But parallel translation won't be possible with wasm2c.wasm atm (no pthreads in WASI yet: WebAssembly/wasi-libc#209).

turbolent avatar turbolent commented on May 18, 2024 1

@vshymanskyy Wow, awesome! This is actually going to be really useful on platforms with bad C compilers (e.g. the SGI MIPSpro compiler on IRIX) 👍

vshymanskyy avatar vshymanskyy commented on May 18, 2024

Features comparison.

w2c2 is not yet handling some of the important wasm proposals:

  • Sign-extension operations
  • Non-trapping Float-to-int Conversions
  • Multi-value

These are implemented in wasm2c and covered by opam-1.1.1 spec tests.

Other important proposals include:

  • Bulk memory operations
  • Exception handling
  • Reference types
  • Multiple memories

These (AFAIK at the time of writing) are not available in wasm2c. Spec tests for these features are available in main branch.

P.S. I think I can add w2c2 as an alternative translator for wasm2native.

vshymanskyy avatar vshymanskyy commented on May 18, 2024

Coremark 1.0 results

Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz, single-thread

GCC 10.3.0, optimization level: -O3

  • Native: 33305.578684 (direct execution, without wasm stage)
  • w2c2: 27469.603167
  • wasm2c: 27458.936027

Clang 12.0.0, optimization level: -O3

  • Native: 28793.550245
  • w2c2: 26085.192936
  • wasm2c: 26125.000733
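
To put these scores in relation to each other, a small helper can express a translated build as a percentage of the native score. This is only arithmetic on the numbers listed above; percent_of_native is a name made up for this note.

```c
#include <assert.h>

/* Score of a translated build as a percentage of the native score. */
static double percent_of_native(double score, double native) {
    assert(native > 0.0);
    return 100.0 * score / native;
}
```

Applied to the figures above, both translators land at roughly 82% of native with GCC and roughly 91% of native with Clang, and w2c2 and wasm2c differ from each other by well under one percentage point.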

turbolent avatar turbolent commented on May 18, 2024

@kripken Thank you for answering my questions, that makes a lot of sense!

Exceptions maybe raise the question of compiling to C++ instead, and wasm GC maybe suggests compiling to a GC language, but maybe those features would be out of scope of this project anyhow?

I have no immediate plans for generating code in other languages. Even though the C code generation (in c.c) currently writes C directly, it would be possible to abstract it away for different target languages, e.g. by calling code-generating functions.
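
One way such an abstraction could look is a table of code-generating functions per target language, which the translator calls instead of printing C directly. Everything in the sketch below (SketchBackend, the operand names si0 and si1, the single i32.add hook) is hypothetical and only illustrates the idea; it does not reflect the structure of c.c.

```c
#include <assert.h>
#include <stdio.h>

/* A backend is a set of code-generating functions for one target language. */
typedef struct {
    const char *name;
    int (*emit_i32_add)(char *buf, size_t cap, const char *dst,
                        const char *lhs, const char *rhs);
} SketchBackend;

/* C backend: i32.add becomes a plain assignment. */
static int c_emit_i32_add(char *buf, size_t cap, const char *dst,
                          const char *lhs, const char *rhs) {
    return snprintf(buf, cap, "%s = %s + %s;", dst, lhs, rhs);
}

static const SketchBackend c_backend = { "C", c_emit_i32_add };

/* The translator calls through the backend and never formats target
   code itself. Returns the length of the emitted text. */
static int translate_i32_add(const SketchBackend *backend,
                             char *buf, size_t cap) {
    assert(backend != NULL && backend->emit_i32_add != NULL);
    return backend->emit_i32_add(buf, cap, "si0", "si0", "si1");
}
```

Supporting another target language would then mean adding another SketchBackend value, while the streaming translation loop stays unchanged.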

[...] the C compiler does the hard work anyhow. The big question is whether there are things that wasm->C translation can optimize that a C compiler can't, but it's hard for me to think of anything..

I mostly focused on streaming code generation and assumed that tools like binaryen's wasm-opt can take care of optimizations and w2c2 does not need to reinvent the wheel here. It is likely that such pre-C optimizations are actually useful, as older C compilers likely do not have as advanced optimizations as modern compilers, though I have not tested this yet.

turbolent avatar turbolent commented on May 18, 2024

With #3 merged, the output size is now reasonable

turbolent avatar turbolent commented on May 18, 2024

@vshymanskyy That's great to hear! Linking clang.wasm will require implementing quite a few more WASI functions (e.g. file I/O, FS operations); so far I have only implemented enough to get coremark.wasm to run.

I saw you started work on a WASI implementation for wasm2c in wasm2native, great work! It would be nice to leverage this existing work and not have to re-implement the WASI spec specifically for w2c2 again. First step would be to make the API of the generated code of w2c2 match that of wasm2c. I started looking into the differences and noticed that wasm2c includes the parameter types and return types in the mangled names of the imports and exports – is this really necessary?

I also wonder what the minimum requirements for uvwasi and especially libuv are (C standard, endianness, etc.)

turbolent avatar turbolent commented on May 18, 2024

WASI implementation improvements are work in progress in #4

turbolent avatar turbolent commented on May 18, 2024

@vshymanskyy Wow, nice! It didn't even occur to me to use the pre-processor to add compatibility, great idea 👍

Also, thank you for the bug report and also the fix!

Great to hear clang builds in reasonable time 🎉 Maybe it can be sped up with -O0? (see https://maskray.me/blog/2021-12-19-why-isnt-ld.lld-faster)

vshymanskyy avatar vshymanskyy commented on May 18, 2024

I'd like to get rid of CMake dependency for wasm2native, but libuv only supports autotools or CMake officially.

turbolent avatar turbolent commented on May 18, 2024

@vshymanskyy For Mac OS X 10.4/10.5 tigerbrew provides CMake 3.6, I'm not sure if that's enough.

As for big-endian support: wasm2c and w2c2 both use "negative memory" (see the end of https://skmp.dev/blog/negative-addressing-bswap/), so an access to memory mem with size size at offset off goes to mem + size - off. Values are in native endianness, but that also means that data (e.g. string values) is stored in reversed order and needs a reverse before a syscall (e.g. write). I think the first step is to add a big-endian definition for https://github.com/vshymanskyy/wasm2native/blob/afa64bee90b3483e8747f3533906a7332c588a6b/src/wasi-main.c#L16 and then add reverse operations where data is read. I wonder which of the two options I can think of is more efficient:

  • Two reverses in the linear memory that is pointed to
  • One copy of the memory that is pointed to into a temporary allocation and a single reverse of it
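
The addressing scheme and the second option can be sketched in portable C as follows. The sketch assumes that wasm byte address a maps to host index size - 1 - a, so a 4-byte value at wasm address a occupies the host bytes starting at size - a - 4 in big-endian order. That is my reading of the linked article, not code taken from wasm2c or w2c2, and all helper names are hypothetical. The load is done byte by byte so the example behaves the same on any host; a big-endian machine could use a native load at that address instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one byte at wasm address `addr` in reversed ("negative") memory. */
static void store_u8(uint8_t *mem, size_t size, size_t addr, uint8_t v) {
    assert(addr < size);
    mem[size - 1 - addr] = v;
}

/* Load the little-endian wasm i32 at wasm address `addr`. In reversed
   memory its bytes sit at mem + size - addr - 4 in big-endian order. */
static uint32_t load_u32(const uint8_t *mem, size_t size, size_t addr) {
    const uint8_t *p;
    assert(addr + 4 <= size);
    p = mem + size - addr - 4;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Option 2 from the list above: copy `len` bytes starting at wasm
   address `addr` into a temporary buffer in forward order, e.g. before
   handing a string to write(). Linear memory itself is not modified. */
static void copy_out_forward(const uint8_t *mem, size_t size, size_t addr,
                             uint8_t *out, size_t len) {
    size_t i;
    assert(out != NULL && addr + len <= size);
    for (i = 0; i < len; i++) {
        out[i] = mem[size - 1 - addr - i];
    }
}
```

The copy touches each byte once and leaves linear memory untouched, whereas reversing in place needs a second pass to restore the original order after the syscall.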

turbolent avatar turbolent commented on May 18, 2024

@vshymanskyy I now have enough of a WASI implementation that it should be able to run clang.wasm: for example, the input file is checked for existence, but then clang just exits. A simple Rust program exercising all the functionality works as expected, so I'm wondering what I'm missing.

Do you have instructions on how you compiled/created clang.wasm? I would like to build it with assertions enabled, so I can debug it further.

vshymanskyy avatar vshymanskyy commented on May 18, 2024

@turbolent thanks for explanations on Big-Endian, will check it soon.

Just checked my clang compilation. For this test I replaced clang.wasm in my wasm3 self-compilation experiment here: https://github.com/wasm3/wasm3-self-compiling/blob/ee61ccecdf30bee73f3c640764896da8f6ca439d/Makefile#L33
It appears to work well.

I didn't push my changes to wasm2native yet. I'll let you know when it's ready.

turbolent avatar turbolent commented on May 18, 2024

@vshymanskyy Great to see you've added support for w2c2!

I've got clang.wasm working in #4 and documented the status for all WASI syscalls in the README. There are still a few missing and I'll add them as needed, contributions are very welcome! Agreed on moving the WASI implementation to a separate repository once it is in a more complete state.

When compiling clang.wasm I noticed that inits.c is still huge (16MB), so I'll look into how to split it up further, as currently I can only compile it with -O0.

vshymanskyy avatar vshymanskyy commented on May 18, 2024

Awesome! 🙌

turbolent avatar turbolent commented on May 18, 2024

As mentioned above, I ran into linking errors for clang on Mac OS X 10.4 on PowerPC. I assumed this was due to large static arrays that are generated for the data segments. I found some platform-specific ways of embedding the data segments directly in the binary without having to generate large array literals in the source in #8. This is generally useful, however, I am still getting linker issues for the example above, probably due to the binary being too large and jump instructions overflowing?
