cranestation / lightbeam Goto Github PK

View Code? Open in Web Editor NEW

248.0 248.0 15.0 777 KB

Lightbeam has moved and now lives in the Wasmtime repository!

Home Page: https://github.com/CraneStation/wasmtime

License: Apache License 2.0

Rust 99.97% WebAssembly 0.03%

codegen compiler jit rust wasm

lightbeam's People

Contributors

Stargazers

Watchers

Forkers

pepyakin eira-fransham isgasho tiborvass jakelang afinch7 s-you happy-ferret yurydelendik sknutti mrowqa jlb6740 forkkit iq-scm

lightbeam's Issues

Lightbeam vs Cranelift and Faerie

Having not used Cranelift but trying to undertsand around how all the projects in Cranestation fit and are to be used together, I have noticed that Cranelift has a component faerie that I think is intended to take Cranelift IR and create machine code. How does lightbeam compare? Is it intended to be a standalone tool or a component in a VM or something else?

not yet implemented: We can't handle cycles in the register allocator

I tried to compile rustc_binary.wasm using lightbeam backed wasmtime, but it panicked with:

thread 'main' panicked at 'not yet implemented: We can't handle cycles in the register allocator: [(Reg(Rq(6)), Reg(Rq(2))), (Reg(Rq(2)), Reg(Rq(6)))]', lightbeam/src/backend.rs:4638:17
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: std::panicking::begin_panic_fmt
   7: lightbeam::backend::Context<M>::pass_outgoing_args
   8: lightbeam::backend::Context<M>::call_direct_imported
   9: lightbeam::function_body::translate
  10: <wasmtime_environ::lightbeam::Lightbeam as wasmtime_environ::compilation::Compiler>::compile_module
  11: wasmtime_jit::compiler::Compiler::compile
  12: wasmtime_jit::instantiate::RawCompiledModule::new
  13: wasmtime_jit::instantiate::instantiate
  14: wasmtime_jit::context::Context::instantiate_module
  15: wasmtime::handle_module
  16: wasmtime::main
  17: std::rt::lang_start::{{closure}}
  18: std::panicking::try::do_call
  19: __rust_maybe_catch_panic
  20: std::rt::lang_start_internal
  21: main

New stack value type: condition

This means we can avoid temporary variables in the vast majority-case that the test is directly before a jump (emitting cmp ...; jnae ... instead of using setnae, for example), but it introduces significant complexity because we can't emit the cmp immediately, we have to delay emitting the cmp until the condition is either used or something else is pushed, because in the case that the condition has to be stored to a register we have to zero that register before the cmp. This complexity will probably be confined to push, but it's still problematic.

EDIT: I just realised that we can use mov ..., 0 to zero the register which doesn't affect flags

ctz and clz for 0 gives unexpected result on older cpus

Tests fail on older cpus(intel sandy bridge and older). ctz and clz give 0 instead of bit width.
Cpus tested: e5-2670 v1(fail), i7-3820(fail), i7-4790K(pass)

I would assume this has to do with the way that tzcnt and lzcnt are implemented on these cpus.

Milestone: Fibonacci function

Recursive Fibonacci is a nice next milestone because it needs some control flow operators and calls, which should really help establish the shape of the backend.

The steps are:

Implement if+else+end.
- We'll need a stack for tracking nested control flow constructs. To start with there's only one kind of control-flow stack entry, for if/else/end, but eventually we'll have more. It can hold DynamicLables to keep track of the labels for branching.
- The result value of an if can be carried by the stack just like normal operator results. (For now; later with on-the-fly register allocation it can be more sophisticated.)
Implement calls
- This is a good time to generalize the handling of function signatures. I think we can just collect all the FuncTypes from the type section into a Vec and pass that around for now.
- Optionally, to keep things simple for this step, we could limit support to just 6 integer arguments, to avoid having to deal with stack arguments just yet.

I've left out a lot of the low-level details here; feel free to ask for more detail!

Milestone: Compile a complete function

Let's start by working to compile this function into machine code. The steps are:

Handle local variable declarations -- for now, we'll assume the locals will be allocated in a contiguous memory region, so the task here is: using the information from the local declarations at the top of function_body::translate, to write a function that maps from local indices to offsets in that region, and to compute the needed size of that region.
Keep track of the stack pointer -- Move the backend's Registers struct into a backend Context, add a field recording the current depth of the stack pointer, and make push_i32 and pop_i32 subtract from and add to this field so that we always know where the stack pointer is (relative to where it started).
Implement get_local/set_local -- In function_body.rs, use the mapping we created above to get the offset within the locals area, then pass that to backend.rs to do the load or store. In the backend, this will look like mov offset(%rsp), Rq(op) where offset is the offset within the locals area plus the current stack depth.
Implement function entry -- For now, we can use this simple sequence:

	push   %rbp               # save incoming frame pointer
	mov    %rbp, %rsp         # copy stack pointer to frame pointer
        sub    %rsp, framesize    # allocate the stack frame (the stack grows down)

where framesize is the size of the locals area.

Store the incoming function arguments into their slots in the locals area. If you're on Linux/Mac/etc., the first 4 arguments are in rdi, rsi, rdx, rcx. If you're on Windows, they're in rcx, rdx, r8, r9. Eventually we'll want to make the calling convention configurable, but it's ok to hardcode stuff to get started with.
Implement returns - at the function exit, pop the last remaining i32, copy it into RAX, add the size of the locals area back to the stack pointer, then do a ret.
- Then, at the end of function_body::translate, instead of just disassembling the output, return the compiled code, make examples/test.rs transmute the address to a function pointer and... call it!
- Write a test case to test that "add" works :-).

Once that milestone is achieved, the work can branch out a little bit. One set of tasks is implementing more integer arithmetic operations. Another is to add floating-point register support and after that, floating-point arithmetic operations. And independently of those, the next big milestone will be to compile a fibonacci function. Once we achieve this first milestone, I'll write up a new design issue for that :-).

For anyone interested in getting involved, welcome! and please post in the issue here so that we can coordinate work.

Sandboxing loads and stores

This is the first JIT that I've worked on, so I don't know how one goes about same-process memory isolation without generating check-address-and-trap instructions. Obviously check-and-trap is viable, but I feel like it must be possible to jack into the OS's (and therefore, hardware's) memory protection mechanisms to get the same protections with better performance. I assume it works by calling into the operating system to set accessible memory regions before jumping into wasm code and then resetting the accessible regions afterwards or when calling into host functions, but how do you stop the wasm code from doing an i32.store onto the program counter (by using some method to guess the location of it) without also preventing it from writing to the stack?

Failed to build due to mismatched types

Hi ... I was attempt to walk through lightbeam's example when I failed to compile:

I compiled with: cargo build --example test

This is a fresh checkout. Are there some requirements I am unaware of for building or is this a real issue?

Side Quest: Integer arithmetic

With #1 done, it's now straightforward to start work on the rest of the integer arithmetic opcodes as we can compile, execute, and test them. This can happen in the background, as #3 and the next few milestones won't depend on it. I'm filing it now to track it, and in case anyone else is interested in something to get started with.

I suggest doing the i32 and i64 versions of each instruction at the same time, because it's easy to do so on x86-64 :-). As always, feel free to ask questions!

The simple cases first. i32.add is already done (aside, should we rename add_i32 to i32_add?), so that's is an example to work from. Use Rd for 32-bit operations and Rq for 64-bit operations.

Next, comparisons. x86 is a little weird here because set<cc> can only write to 8-bit registers. So I suggest using xor REG, REG to zero out the result register first, and then using Rb(REG) with set<cc> to write the result on top of it.

Shift and rotates, they need their count operand in %cl, so we'll need a way to allocate that register specifically.

Div/rem. These also need specific registers, and they can also trap. I suggest starting with simple conditional branches testing the trap conditions and using ud2 to do the traps for now.

div_s
div_u
rem_s
rem_u

These are easy if you have sufficiently new CPUs :-). At this step, we'll need to figure out how we want to handle subtarget features.

clz
ctz
popcnt

cranestation / lightbeam Goto Github PK

lightbeam's People

Contributors

Stargazers

Watchers

Forkers

lightbeam's Issues

Lightbeam vs Cranelift and Faerie

not yet implemented: We can't handle cycles in the register allocator

New stack value type: condition

ctz and clz for 0 gives unexpected result on older cpus

Milestone: Fibonacci function

Milestone: Compile a complete function

Sandboxing loads and stores

Failed to build due to mismatched types

Side Quest: Integer arithmetic

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent