cranestation / lightbeam

Lightbeam has moved and now lives in the Wasmtime repository!
Home Page: https://github.com/CraneStation/wasmtime
License: Apache License 2.0
Having not used Cranelift, but trying to understand how all the projects in CraneStation fit together and are meant to be used, I've noticed that Cranelift has a component, faerie, that I think is intended to take Cranelift IR and create machine code. How does Lightbeam compare? Is it intended to be a standalone tool, a component in a VM, or something else?
I tried to compile rustc_binary.wasm using lightbeam backed wasmtime, but it panicked with:
thread 'main' panicked at 'not yet implemented: We can't handle cycles in the register allocator: [(Reg(Rq(6)), Reg(Rq(2))), (Reg(Rq(2)), Reg(Rq(6)))]', lightbeam/src/backend.rs:4638:17
stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
1: std::sys_common::backtrace::_print
2: std::panicking::default_hook::{{closure}}
3: std::panicking::default_hook
4: std::panicking::rust_panic_with_hook
5: std::panicking::continue_panic_fmt
6: std::panicking::begin_panic_fmt
7: lightbeam::backend::Context<M>::pass_outgoing_args
8: lightbeam::backend::Context<M>::call_direct_imported
9: lightbeam::function_body::translate
10: <wasmtime_environ::lightbeam::Lightbeam as wasmtime_environ::compilation::Compiler>::compile_module
11: wasmtime_jit::compiler::Compiler::compile
12: wasmtime_jit::instantiate::RawCompiledModule::new
13: wasmtime_jit::instantiate::instantiate
14: wasmtime_jit::context::Context::instantiate_module
15: wasmtime::handle_module
16: wasmtime::main
17: std::rt::lang_start::{{closure}}
18: std::panicking::try::do_call
19: __rust_maybe_catch_panic
20: std::rt::lang_start_internal
21: main
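The panic happens because shuffling the outgoing call arguments can require a *cyclic* permutation of registers: here `Rq(6)` and `Rq(2)` each need the other's value, so no plain sequence of `mov`s works. The standard fix is parallel-move resolution: emit every move whose destination isn't needed as a source, then break each remaining cycle with a swap (or a scratch register). A minimal sketch of that idea, not lightbeam's actual code (names and string output are invented for illustration):

```rust
// Hypothetical sketch of parallel-move resolution with cycle breaking.
// A "move" is (dest, src); registers are just small integers here.
fn resolve_moves(mut moves: Vec<(u8, u8)>) -> Vec<String> {
    let mut out = Vec::new();
    loop {
        // Emit any move whose destination is not the source of another
        // pending move; it can't clobber anything still needed.
        if let Some(i) =
            (0..moves.len()).find(|&i| !moves.iter().any(|&(_, s)| s == moves[i].0))
        {
            let (d, s) = moves.remove(i);
            out.push(format!("mov r{}, r{}", d, s));
            continue;
        }
        // No such move exists: every remaining move is part of a cycle.
        // Break one edge with an exchange; the two-element cycle from the
        // panic message, [(Rq(6), Rq(2)), (Rq(2), Rq(6))], needs exactly one.
        match moves.pop() {
            Some((d, s)) => {
                out.push(format!("xchg r{}, r{}", d, s));
                // The exchange moved both values, so redirect remaining
                // sources accordingly and drop moves that became no-ops.
                for m in moves.iter_mut() {
                    if m.1 == d { m.1 = s } else if m.1 == s { m.1 = d }
                }
                moves.retain(|&(dd, ss)| dd != ss);
            }
            None => break,
        }
    }
    out
}

fn main() {
    // The 2-cycle from the panic collapses to a single exchange.
    assert_eq!(resolve_moves(vec![(6, 2), (2, 6)]), vec!["xchg r2, r6"]);
    // An acyclic chain is ordered so nothing is clobbered early.
    assert_eq!(
        resolve_moves(vec![(1, 2), (2, 3)]),
        vec!["mov r1, r2", "mov r2, r3"]
    );
}
```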
This means we can avoid temporary variables in the vast majority of cases, where the test is directly before a jump (emitting `cmp ...; jnae ...` instead of using `setnae`, for example), but it introduces significant complexity: we can't emit the `cmp` immediately, we have to delay emitting it until the condition is either used or something else is pushed, because in the case that the condition has to be stored to a register we have to zero that register before the `cmp`. This complexity will probably be confined to `push`, but it's still problematic.

EDIT: I just realised that we can use `mov ..., 0` to zero the register, which doesn't affect flags.
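The delayed-`cmp` idea can be sketched as a tiny state machine: a comparison records a pending condition, a following branch fuses it into `cmp` + `jcc`, and anything else forces it into a register. This is an invented illustration (the struct, method names, and string output are not lightbeam's API):

```rust
// Hypothetical sketch of delaying `cmp` emission until the condition
// is consumed. Output is assembly-as-strings purely for illustration.
#[derive(Clone, Copy)]
struct PendingCmp {
    lhs: &'static str,
    rhs: &'static str,
    cc: &'static str,
}

struct Emitter {
    out: Vec<String>,
    pending: Option<PendingCmp>,
}

impl Emitter {
    fn new() -> Self {
        Emitter { out: Vec::new(), pending: None }
    }

    // A comparison emits nothing yet; we just remember it.
    fn relop(&mut self, lhs: &'static str, rhs: &'static str, cc: &'static str) {
        self.pending = Some(PendingCmp { lhs, rhs, cc });
    }

    // A branch consumes the condition directly: cmp + jcc, no temporary.
    fn br_if(&mut self, label: &str) {
        if let Some(c) = self.pending.take() {
            self.out.push(format!("cmp {}, {}", c.lhs, c.rhs));
            self.out.push(format!("j{} {}", c.cc, label));
        }
    }

    // Anything else forces the condition into a register. Zeroing with
    // `mov reg, 0` leaves the flags intact (unlike `xor reg, reg`), so
    // the zeroing and the cmp can be ordered freely.
    fn materialize(&mut self, reg32: &str, reg8: &str) {
        if let Some(c) = self.pending.take() {
            self.out.push(format!("mov {}, 0", reg32));
            self.out.push(format!("cmp {}, {}", c.lhs, c.rhs));
            self.out.push(format!("set{} {}", c.cc, reg8));
        }
    }
}

fn main() {
    let mut e = Emitter::new();
    e.relop("eax", "ebx", "nae");
    e.br_if(".L0"); // condition feeds a jump: fused, no setcc needed
    assert_eq!(e.out, vec!["cmp eax, ebx", "jnae .L0"]);

    let mut e = Emitter::new();
    e.relop("eax", "ebx", "nae");
    e.materialize("ecx", "cl"); // condition stored: zero, cmp, setcc
    assert_eq!(e.out, vec!["mov ecx, 0", "cmp eax, ebx", "setnae cl"]);
}
```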
Tests fail on older CPUs (Intel Sandy Bridge and older): `ctz` and `clz` give 0 instead of the bit width. CPUs tested: e5-2670 v1 (fail), i7-3820 (fail), i7-4790K (pass).

I would assume this has to do with the way that `tzcnt` and `lzcnt` are implemented on these CPUs.
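That assumption matches how the encodings decode: `tzcnt`/`lzcnt` are `rep bsf`/`rep bsr` byte sequences, and CPUs without BMI1/LZCNT silently execute them as plain `bsf`/`bsr`, whose destination is undefined for a zero input, so the zero case is exactly where results diverge. The semantics the backend needs to produce (which Rust's own intrinsics implement) can be checked directly:

```rust
// Wasm requires ctz(0) and clz(0) to return the operand's bit width;
// Rust's trailing_zeros/leading_zeros have exactly that semantics.
fn main() {
    assert_eq!(0u32.trailing_zeros(), 32); // i32.ctz 0 == 32
    assert_eq!(0u32.leading_zeros(), 32);  // i32.clz 0 == 32
    assert_eq!(0u64.trailing_zeros(), 64); // i64.ctz 0 == 64
    assert_eq!(0u64.leading_zeros(), 64);  // i64.clz 0 == 64
    // For nonzero inputs bsf/bsr and tzcnt/lzcnt agree, which is why
    // only the zero-input case fails on Sandy Bridge and older.
    assert_eq!(8u32.trailing_zeros(), 3);
    assert_eq!(8u32.leading_zeros(), 28);
}
```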
Recursive Fibonacci is a nice next milestone because it needs some control flow operators and calls, which should really help establish the shape of the backend. The steps are:

- Implement `if` + `else` + `end`.
- Use `DynamicLabel`s to keep track of the labels for branching.
- The result of an `if` can be carried by the stack just like normal operator results. (For now; later, with on-the-fly register allocation, it can be more sophisticated.)
- Collect the `FuncType`s from the type section into a `Vec` and pass that around for now.

I've left out a lot of the low-level details here; feel free to ask for more detail!
Let's start by working to compile this function into machine code. The steps are:

- In `function_body::translate`, write a function that maps from local indices to offsets in that region, and compute the needed size of that region.
- Move the `Registers` struct into a backend `Context`, add a field recording the current depth of the stack pointer, and make `push_i32` and `pop_i32` subtract from and add to this field so that we always know where the stack pointer is (relative to where it started).
- Load locals with `mov offset(%rsp), Rq(op)`, where `offset` is the offset within the locals area plus the current stack depth.
- Emit a prologue:

  ```
  push %rbp           # save incoming frame pointer
  mov %rbp, %rsp      # copy stack pointer to frame pointer
  sub %rsp, framesize # allocate the stack frame (the stack grows down)
  ```

  where `framesize` is the size of the locals area.
- Store the incoming function arguments into their slots in the locals area. If you're on Linux/Mac/etc., the first 4 arguments are in `rdi`, `rsi`, `rdx`, `rcx`. If you're on Windows, they're in `rcx`, `rdx`, `r8`, `r9`. Eventually we'll want to make the calling convention configurable, but it's ok to hardcode stuff to get started with.
- Implement returns: at the function exit, pop the last remaining `i32`, copy it into `rax`, add the size of the locals area back to the stack pointer, then do a `ret`.
- In `function_body::translate`, instead of just disassembling the output, return the compiled code, make examples/test.rs transmute the address to a function pointer and... call it!

Once that milestone is achieved, the work can branch out a little bit. One set of tasks is implementing more integer arithmetic operations. Another is to add floating-point register support and, after that, floating-point arithmetic operations. And independently of those, the next big milestone will be to compile a fibonacci function. Once we achieve this first milestone, I'll write up a new design issue for that :-).
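The stack-depth bookkeeping in the steps above can be sketched as follows; the field and method names are invented for illustration and are not lightbeam's actual API (8-byte slots assumed):

```rust
// Hypothetical sketch of the backend Context's stack-depth bookkeeping.
struct Context {
    locals_size: u32, // size of the locals area, in bytes
    depth: u32,       // bytes currently pushed on the machine stack
}

impl Context {
    fn new(num_locals: u32) -> Self {
        Context { locals_size: num_locals * 8, depth: 0 }
    }

    // The prologue's `sub %rsp, framesize` uses this.
    fn frame_size(&self) -> u32 {
        self.locals_size
    }

    // push/pop keep `depth` in sync so we always know where %rsp is
    // relative to where it started.
    fn push_i32(&mut self) { self.depth += 8; }
    fn pop_i32(&mut self)  { self.depth -= 8; }

    // Offset from the *current* %rsp to local `idx`: skip whatever has
    // been pushed since the prologue, then index into the locals area.
    fn local_offset(&self, idx: u32) -> u32 {
        self.depth + idx * 8
    }
}

fn main() {
    let mut ctx = Context::new(2);
    assert_eq!(ctx.frame_size(), 16);
    assert_eq!(ctx.local_offset(0), 0);
    ctx.push_i32(); // one value now sits between %rsp and the locals
    assert_eq!(ctx.local_offset(0), 8);
    assert_eq!(ctx.local_offset(1), 16);
    ctx.pop_i32();
    assert_eq!(ctx.local_offset(1), 8);
}
```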
For anyone interested in getting involved, welcome! Please post in the issue here so that we can coordinate work.
This is the first JIT that I've worked on, so I don't know how one goes about same-process memory isolation without generating check-address-and-trap instructions. Obviously check-and-trap is viable, but I feel like it must be possible to hook into the OS's (and therefore the hardware's) memory-protection mechanisms to get the same protections with better performance. I assume it works by calling into the operating system to set accessible memory regions before jumping into wasm code and then resetting the accessible regions afterwards or when calling into host functions, but how do you stop the wasm code from doing an `i32.store` onto the program counter (by using some method to guess its location) without also preventing it from writing to the stack?
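For context on the check-and-trap baseline the question mentions: wasm addresses are 32-bit offsets relative to the linear memory's base, so the guest can never name a host address (stack, program counter, ...) in the first place; the engine only has to keep accesses inside the linear memory itself. A minimal sketch of the explicit-check variant (simplified; not lightbeam's or wasmtime's actual implementation):

```rust
// Explicit check-and-trap bounds checking, the baseline alternative to
// virtual-memory guard pages (simplified: static offsets not folded in).
struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    // i32.store: `addr` is a base-relative u32, never a host pointer.
    fn store_i32(&mut self, addr: u32, value: i32) -> Result<(), &'static str> {
        let end = addr as usize + 4;
        if end > self.bytes.len() {
            // In JITted code this branch would reach a ud2 / trap stub.
            return Err("out-of-bounds memory access");
        }
        self.bytes[addr as usize..end].copy_from_slice(&value.to_le_bytes());
        Ok(())
    }
}

fn main() {
    let mut mem = LinearMemory { bytes: vec![0; 64 * 1024] }; // one wasm page
    assert!(mem.store_i32(0, 42).is_ok());
    assert!(mem.store_i32(64 * 1024 - 3, 1).is_err()); // straddles the end
}
```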
With #1 done, it's now straightforward to start work on the rest of the integer arithmetic opcodes as we can compile, execute, and test them. This can happen in the background, as #3 and the next few milestones won't depend on it. I'm filing it now to track it, and in case anyone else is interested in something to get started with.
I suggest doing the `i32` and `i64` versions of each instruction at the same time, because it's easy to do so on x86-64 :-). As always, feel free to ask questions!
The simple cases first. `i32.add` is already done (aside: should we rename `add_i32` to `i32_add`?), so that's an example to work from. Use `Rd` for 32-bit operations and `Rq` for 64-bit operations.
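One thing the `Rd`/`Rq` distinction buys for free is the width semantics: 32-bit x86 operations wrap at 32 bits, matching wasm's separate `i32`/`i64` types. As a semantic reference (plain Rust, not lightbeam code):

```rust
// Wasm integer arithmetic wraps at the type's width, which is exactly
// what 32-bit (Rd) vs 64-bit (Rq) x86 operations give us.
fn main() {
    // i32.add wraps modulo 2^32 ...
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);
    // ... while the same values as i64 don't wrap.
    assert_eq!((i32::MAX as i64) + 1, 2_147_483_648);
    // i32.mul likewise keeps only the low 32 bits of the full product.
    assert_eq!(
        0x1234_5678u32.wrapping_mul(0x9abc_def0) as u64,
        (0x1234_5678u64 * 0x9abc_def0) & 0xffff_ffff
    );
}
```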
Multiplication (`imul`).

Next, comparisons. x86 is a little weird here because `set<cc>` can only write to 8-bit registers. So I suggest using `xor REG, REG` to zero out the result register first, and then using `Rb(REG)` with `set<cc>` to write the result on top of it.
Shifts and rotates need their count operand in `%cl`, so we'll need a way to allocate that register specifically.

Div/rem. These also need specific registers, and they can also trap. I suggest starting with simple conditional branches testing the trap conditions and using `ud2` to do the traps for now.
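The trap conditions to branch on before the division are small in number: division by zero, plus the single overflowing signed case `i32::MIN / -1` (whose true result, 2^31, doesn't fit, and which faults in hardware on x86). A hedged sketch of the required wasm semantics, not lightbeam's emitted code:

```rust
// Wasm i32.div_s trap conditions, written as explicit checks — the
// conditional-branch-to-ud2 strategy mirrors these two tests.
fn div_s(lhs: i32, rhs: i32) -> Result<i32, &'static str> {
    if rhs == 0 {
        return Err("integer divide by zero"); // JIT: test + jz to a ud2 stub
    }
    if lhs == i32::MIN && rhs == -1 {
        return Err("integer overflow"); // JIT: compare pair + jcc to a ud2 stub
    }
    Ok(lhs / rhs)
}

fn main() {
    assert_eq!(div_s(7, 2), Ok(3));
    assert_eq!(div_s(-7, 2), Ok(-3)); // div_s truncates toward zero
    assert!(div_s(1, 0).is_err());
    assert!(div_s(i32::MIN, -1).is_err());
    // Note: wasm's i32.rem_s defines MIN % -1 as 0 (no trap), even though
    // a bare x86 idiv would fault there — rem needs its own special case.
    assert_eq!(i32::MIN.checked_rem(-1), None); // Rust models the idiv fault
}
```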
These are easy if you have sufficiently new CPUs :-). At this step, we'll need to figure out how we want to handle subtarget features.