neumark commented on August 25, 2024

wasm UDF communication

Passing native primitive datatypes (i32, i64, f32, f64) from the host (wasmtime for seafowl) and receiving a native primitive result is straightforward. More complex data types such as strings and structs are considerably harder.
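As a minimal sketch of the easy case, a guest function that only takes and returns primitives needs no glue code at all. The name `add_i64` is hypothetical and not taken from seafowl.

```rust
// Hypothetical guest-side UDF: with primitive arguments and a primitive
// result, the Rust signature maps one-to-one onto WASM value types.
// When building for a wasm32 target, the function also needs an export
// attribute (such as no_mangle) so the host can look it up by name.
pub extern "C" fn add_i64(a: i64, b: i64) -> i64 {
    // wrapping_add avoids a panic (a trap in WASM) on overflow.
    a.wrapping_add(b)
}
```

The host can call such an export directly with two i64 values and read back an i64; no memory is shared or copied.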

WIT

In the long run, WebAssembly Interface Types (WIT) promise an elegant solution to the problem of passing complex data between WebAssembly functions written in various high-level languages and the host. WIT includes an IDL, also called "wit", which can be used for code generation.
For example, below is the WIT description of a function which converts an input string to uppercase and returns the result:

upper: func(s: string) -> string

import code

WIT-generated calling code, in our case run by the seafowl process.

#[allow(clippy::all)]
mod input {
  pub fn upper(s: & str,) -> String{
    unsafe {
      let vec0 = s;
      let ptr0 = vec0.as_ptr() as i32;
      let len0 = vec0.len() as i32;
      
      #[repr(align(4))]
      struct __InputRetArea([u8; 8]);
      let mut __input_ret_area: __InputRetArea = __InputRetArea([0; 8]);
      let ptr1 = __input_ret_area.0.as_mut_ptr() as i32;
      #[link(wasm_import_module = "input")]
      extern "C" {
        #[cfg_attr(target_arch = "wasm32", link_name = "upper: func(s: string) -> string")]
        #[cfg_attr(not(target_arch = "wasm32"), link_name = "input_upper: func(s: string) -> string")]
        fn wit_import(_: i32, _: i32, _: i32, );
      }
      wit_import(ptr0, len0, ptr1);
      let len2 = *((ptr1 + 4) as *const i32) as usize;
      String::from_utf8(Vec::from_raw_parts(*((ptr1 + 0) as *const i32) as *mut _, len2, len2)).unwrap()
    }
  }
}

export code

WIT-generated wrapper around guest code (in our case the UDF).

#[allow(clippy::all)]
mod input {
  #[export_name = "upper: func(s: string) -> string"]
  unsafe extern "C" fn __wit_bindgen_input_upper(arg0: i32, arg1: i32, ) -> i32{
    let len0 = arg1 as usize;
    let result1 = <super::Input as Input>::upper(String::from_utf8(Vec::from_raw_parts(arg0 as *mut _, len0, len0)).unwrap());
    let ptr2 = __INPUT_RET_AREA.0.as_mut_ptr() as i32;
    let vec3 = (result1.into_bytes()).into_boxed_slice();
    let ptr3 = vec3.as_ptr() as i32;
    let len3 = vec3.len() as i32;
    core::mem::forget(vec3);
    *((ptr2 + 4) as *mut i32) = len3;
    *((ptr2 + 0) as *mut i32) = ptr3;
    ptr2
  }
  #[export_name = "cabi_post_upper"]
  unsafe extern "C" fn __wit_bindgen_input_upper_post_return(arg0: i32, ) {
    wit_bindgen_guest_rust::rt::dealloc(*((arg0 + 0) as *const i32), (*((arg0 + 4) as *const i32)) as usize, 1);
  }
  
  #[repr(align(4))]
  struct __InputRetArea([u8; 8]);
  static mut __INPUT_RET_AREA: __InputRetArea = __InputRetArea([0; 8]);
  pub trait Input {
    fn upper(s: String,) -> String;
  }
}

There is a very early, pre-alpha WIT implementation for Rust that supports both Rust hosts and WASM guests. Its developers urge anyone interested in using it in production to hold their horses and look for alternatives until the WIT standard is finalized, which I'd guess is somewhere between 12 and 18 months away.

Alternatives until WIT can be used

Passing raw strings

The least ambitious, but by no means easiest, approach is to extend the integer and float types currently supported in seafowl UDFs with strings. Not only would this add support for the CHAR, TEXT and VARCHAR types in UDFs, but more complex data structures could also be passed as strings serialized with JSON, MessagePack, CBOR, etc.

I wrote an example proof-of-concept upper() function based on this excellent blog post. Both the code invoking the WASM function and the upper() function itself are fairly complex.

The complexity stems from the following:

  • WASM functions cannot access the host's memory. Any input or output passed via pointers must point into the module's memory. This places several burdens on the host: copying input from the host's memory to the guest's, malloc()-ing guest memory, copying the results back to host memory, and free()-ing the input and output buffers. The result buffer must be allocated by the guest (since the size of the response isn't necessarily known in advance), but must be freed by the host (since the host must read the result before it is deallocated).
  • Pascal-style vs. C-style strings. C strings are raw pointers to bytes terminated with \0. Pascal-style strings are prefixed with their length in bytes, which is generally considered the better design these days. Naively returning a (length, pointer) pair would require returning multiple values, which isn't possible, but receiving and returning a pointer to the i32-encoded string length followed by the string itself is possible (the WIT-generated code above takes a similar approach: it returns a pointer to an 8-byte area holding the string's pointer and length).
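The two points above can be sketched without a WASM runtime by simulating the guest's linear memory with a byte vector. This is only an illustration under stated assumptions: `GuestMemory`, its bump allocator, `guest_upper` and `host_call_upper` are all hypothetical names, the length prefix is assumed to be 4 little-endian bytes, and a real host would go through the module's exported memory and allocator instead.

```rust
/// Stand-in for a WASM module's linear memory with a trivial bump allocator.
/// Hypothetical: a real host would use the runtime's memory API.
struct GuestMemory {
    bytes: Vec<u8>,
}

impl GuestMemory {
    fn new() -> Self {
        GuestMemory { bytes: Vec::new() }
    }

    /// "malloc" inside guest memory: returns the offset of `len` fresh bytes.
    fn alloc(&mut self, len: usize) -> u32 {
        let ptr = self.bytes.len() as u32;
        self.bytes.resize(self.bytes.len() + len, 0);
        ptr
    }

    fn write(&mut self, ptr: u32, data: &[u8]) {
        let start = ptr as usize;
        self.bytes[start..start + data.len()].copy_from_slice(data);
    }

    fn read(&self, ptr: u32, len: usize) -> &[u8] {
        let start = ptr as usize;
        &self.bytes[start..start + len]
    }
}

/// Guest side: uppercase the input and return a pointer to a Pascal-style
/// result (4-byte little-endian length followed by the string bytes).
fn guest_upper(mem: &mut GuestMemory, ptr: u32, len: u32) -> u32 {
    let upper = String::from_utf8_lossy(mem.read(ptr, len as usize)).to_uppercase();
    // Only the guest knows how large the result is, so it allocates the buffer.
    let out = mem.alloc(4 + upper.len());
    mem.write(out, &(upper.len() as u32).to_le_bytes());
    mem.write(out + 4, upper.as_bytes());
    out
}

/// Host side: copy the input in, call the guest, copy the result back out.
fn host_call_upper(mem: &mut GuestMemory, s: &str) -> String {
    let in_ptr = mem.alloc(s.len());
    mem.write(in_ptr, s.as_bytes());
    let out_ptr = guest_upper(mem, in_ptr, s.len() as u32);

    // Read the length prefix first, then exactly that many bytes.
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(mem.read(out_ptr, 4));
    let out_len = u32::from_le_bytes(len_bytes) as usize;
    let result = String::from_utf8_lossy(mem.read(out_ptr + 4, out_len)).into_owned();

    // A real host would now free both buffers through the guest's allocator;
    // the bump allocator in this sketch never frees anything.
    result
}
```

Calling `host_call_upper(&mut GuestMemory::new(), "hello")` walks through every step listed above: allocate, copy in, call, read the length prefix, copy out.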

If strings aren't necessarily UTF-8 strings, but rather MessagePack-encoded streams of values, then all of the function arguments could be encoded in a single string, resulting in a simplified UDF WASM function signature:

fn(len: u32, ptr: u32) -> u32

where the result is a pointer to a Pascal-style string, much like in the WIT-generated code.

WaPC

The waPC project attempts to simplify WASM host-guest RPC. It provides a Rust host and a number of supported guest languages. WaPC has its own GraphQL-inspired IDL (WIDL). Based on GitHub activity, it seems to be an active project, but it lacks significant backing (until recently it was written mostly by three people at a startup called Vino). Links to step-by-step tutorials are all broken. WaPC uses MessagePack to serialize data by default.

WASM-bindgen

As a name that kept coming up during my research, wasm-bindgen deserves a mention. It's a mature solution for WASM RPC, but unfortunately limited to JavaScript host -> Rust WASM module guest calls. There was experimental support for WIT, but it's no longer supported. In a future where WIT support returns, wasm-bindgen could be an ergonomic route to UDFs with complex inputs / outputs. Currently the guide on using it with Rust hosts does not work as advertised.

WASI-based communication

The WebAssembly System Interface is an extension to WASM providing an interface to module functions for interacting with the host filesystem, command line arguments, environment variables, etc.
Like most things WASM-related, WASI itself is still in its infancy and subject to change (the compiled wasm links to wasi_snapshot_preview1). Still, unlike WIT, WASI is already used in production, and using it doesn't require a PhD in compiler design. Based on this blog post, I implemented a version of upper() which gets its input from environment variables and prints the result to stdout. The env vars and standard output aren't the actual env vars and stdout of the host process; they're what seafowl passes as such to wasmtime. In other words, it's a convenient way to pass state to the WASM module function without having to deal with all the malloc and free choreography of the first solution. How much overhead this incurs compared to the first solution, I don't know yet.
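A guest written against the environment-variable approach described above might look roughly like the sketch below. It is not the implementation referred to in the text: the variable name `UDF_INPUT` and the function names are assumptions made for illustration. Only the Rust standard library is used, which is what makes this route attractive.

```rust
use std::env;

// Hypothetical name of the variable the host sets before each call.
const INPUT_VAR: &str = "UDF_INPUT";

/// Finds the input among the given (name, value) pairs and uppercases it.
/// Taking the pairs as a parameter keeps the logic testable without
/// touching the real process environment.
fn upper_from_vars<I>(vars: I) -> Option<String>
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter()
        .find(|(name, _)| name.as_str() == INPUT_VAR)
        .map(|(_, value)| value.to_uppercase())
}

/// Body of the guest's entry point: under WASI, `env::vars()` yields only
/// what the host chose to expose, and stdout is whatever the host wired up.
pub fn run_env_guest() {
    match upper_from_vars(env::vars()) {
        Some(result) => println!("{}", result),
        None => eprintln!("{} is not set", INPUT_VAR),
    }
}
```

The guest's `main` would simply call `run_env_guest()`; the host then reads the captured stdout as the UDF result.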

Recommendation

Everyone, myself included, looks upon WIT as the "ultimate" solution to WASM RPC. Unfortunately, when WIT will stabilize is anyone's guess. The good news is that we don't have to commit to a single UDF interface for all time.

Seafowl already expects a language field in its UDF function creation statement, which could be used to distinguish between calling conventions.

If the overhead of using WASI is acceptable, reading serialized input from stdin and writing serialized output to stdout seems like a more ergonomic approach than requiring UDF authors to hand-write code similar to what WIT generates. We could even allow error messages to be sent to stderr.
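A rough sketch of what a UDF author would write under this stdin/stdout convention, assuming for simplicity that the "serialized" payload is plain UTF-8 text rather than MessagePack. The function names and the `udf error:` prefix are hypothetical.

```rust
use std::io::{self, Read, Write};

/// Reads the whole input, uppercases it and writes the result.
/// Generic over the streams so the same code runs against real stdio
/// and against in-memory buffers.
fn upper_stream<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut text = String::new();
    // read_to_string returns an InvalidData error if the input is not UTF-8.
    input.read_to_string(&mut text)?;
    output.write_all(text.to_uppercase().as_bytes())?;
    output.flush()
}

/// Body of the guest's entry point: results go to stdout, failures to stderr.
/// Returns a process exit code for the caller to pass on.
pub fn run_stdio_guest() -> i32 {
    match upper_stream(io::stdin().lock(), io::stdout().lock()) {
        Ok(()) => 0,
        Err(e) => {
            eprintln!("udf error: {}", e);
            1
        }
    }
}
```

Compared with the pointer-based interface, the UDF author never sees an allocation or a raw pointer; the cost is whatever overhead the WASI plumbing adds, which is still unmeasured.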

For "normal" UDFs, the input consists of a tuple of supported arrow types, so the serialized input could look something like this:

| i32: total bytes | MessagePack-encoded vector of arrow types | MessagePack stream of serialized values |
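The framing itself can be sketched as follows. Several details are assumptions the layout above leaves open: the length header is taken to be little-endian, it is taken to count only the bytes that follow it, and the two MessagePack sections are treated as opaque, already-encoded byte slices. `frame` and `payload` are hypothetical helper names.

```rust
/// Builds one frame: a 4-byte length header followed by the encoded type
/// vector and the encoded values. Both sections are opaque bytes here;
/// producing them would be the job of a MessagePack library.
fn frame(types: &[u8], values: &[u8]) -> Vec<u8> {
    // Assumption: the header counts only the bytes that follow it.
    let total = (types.len() + values.len()) as i32;
    let mut buf = Vec::with_capacity(4 + types.len() + values.len());
    buf.extend_from_slice(&total.to_le_bytes());
    buf.extend_from_slice(types);
    buf.extend_from_slice(values);
    buf
}

/// Returns the bytes after the header if `buf` holds a complete frame,
/// or None if the header is missing, negative, or the frame is truncated.
fn payload(buf: &[u8]) -> Option<&[u8]> {
    let mut header = [0u8; 4];
    header.copy_from_slice(buf.get(..4)?);
    let total = i32::from_le_bytes(header);
    if total < 0 {
        return None;
    }
    buf.get(4..4 + total as usize)
}
```

Because MessagePack values are self-delimiting, a reader can decode the type vector from the start of the payload and treat the remainder as the value stream without a second length field.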

