Comments (14)

dominic-mulligan-arm commented on August 15, 2024

@hugovincent OK, great. Let's press ahead then!

from veracruz.

dominic-mulligan-arm commented on August 15, 2024

What should be in the C SDK? It seems a large chunk of a typical libc (e.g. the strings and memory-related functions) and libm can be included. What do we think the best strategy for doing this is? I have been looking through the OpenBSD libc and libm library sources this morning, and it looks very clearly written, in plain C. Should we take the relevant aspects of those libraries as the basis of our SDK (actually, this is also what the Intel SGX SDK does for its libc)?

hugovincent commented on August 15, 2024

In my earlier internal-branch work I mostly used code from musl and klibc. Another, perhaps better, approach might be to try to reuse parts of WASI or CloudABI. One other factor we should think about is test suite(s) for whatever we do with libc/libm: we don't want to support all of libc. We really only want to support a fairly small subset (no files, no signals, no threads, etc.), and that (plus the Veracruz execution model) is going to make testing slightly tricky. (Most C libraries are surprisingly impure even for notionally pure functions, e.g. in libm, because error handling wants to print messages and/or exit.) I think I'd therefore favour starting small and only pulling in known-needed and known-pure functionality.
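To make "known-pure" concrete: a function like OpenBSD's strlcpy(3) touches only its arguments (no allocation, no errno, no I/O), so it is an easy first candidate. The sketch below follows the documented strlcpy contract. The vc_ prefix is a hypothetical SDK naming choice, not an existing Veracruz API.

```c
#include <stddef.h>

/* Hypothetical SDK name: vc_strlcpy follows the OpenBSD strlcpy(3)
 * contract. It is pure: no allocation, no errno, no I/O.
 * Copies at most size - 1 bytes from src into dst, NUL-terminates
 * dst whenever size > 0, and returns strlen(src) so the caller can
 * detect truncation (return value >= size). */
size_t vc_strlcpy(char *dst, const char *src, size_t size)
{
    const char *s = src;
    size_t n = size;

    /* Copy as many bytes as fit, leaving room for the terminator. */
    if (n != 0) {
        while (--n != 0) {
            if ((*dst++ = *s++) == '\0')
                break;
        }
    }

    /* Ran out of room: terminate dst, then skip the rest of src. */
    if (n == 0) {
        if (size != 0)
            *dst = '\0';
        while (*s++)
            ;
    }

    /* s now points one past src's terminator. */
    return (size_t)(s - src - 1);
}
```

A return value >= size tells the caller that the copy was truncated, so no error channel beyond the return value is needed.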

Previously I took a text-centric approach. Taking this further could give a useful programming environment within the Veracruz I/O framework: treat the input data as file-like objects and expose them via stdio functions, and likewise for the result data (e.g. the first input accessed via stdin, and the output via stdout).
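Here is a minimal sketch of that text-centric model. It assumes the first input is presented as stdin and the result is collected from stdout. count_summary is a hypothetical example program, not part of any SDK. It takes FILE * parameters so that it can also be exercised against ordinary files.

```c
#include <stdio.h>

/* Hypothetical example program for the text-centric model: read the
 * input as a byte stream (stdin in a real program) and write the
 * result as text (stdout in a real program). */
int count_summary(FILE *in, FILE *out)
{
    unsigned long bytes = 0, lines = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        bytes++;
        if (c == '\n')
            lines++;
    }
    if (ferror(in))
        return 1; /* read failure: report via status, not via text */

    /* Result format: "<lines> <bytes>\n". */
    fprintf(out, "%lu %lu\n", lines, bytes);
    return ferror(out) ? 1 : 0;
}
```

In a Veracruz program, main would just return count_summary(stdin, stdout). The same source would then build unchanged for a non-enclaved runner.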

Advantages:

  • Familiar programming environment.
  • Many existing C programs can be run with little to no modification – not necessarily useful for "real" Veracruz workloads, but handy for testing, benchmarking, experimentation etc.
  • Permits easy cross-development of Veracruz programs using a lightweight non-enclaved runner (chichuahuasm and the browser-runner in my internal branch) and an SDK variant, making Veracruz programs quicker to develop and debug.

Disadvantages:

  • Diverges from the Rust SDK.
  • Inadvertently encourages text I/O (printf into the output data buffer, etc.), which is untyped and may make future Veracruz directions (e.g. multi-program, channels, etc.) harder.

Thoughts?

dominic-mulligan-arm commented on August 15, 2024

I think we can change the Rust SDK, and also the Veracruz ABI, quite freely if we decide that there's a better way of doing things, so I don't think any potential divergence from them is a big issue. (Though, to clarify, we already have a lightweight non-enclaved runner in sdk/freestanding-chihuahua, which is just a CLI wrapper around chihuahua.)

One way of doing this would be for the Veracruz ABI to expose a filesystem-like series of functions that take a filename and return an input. At the moment, inputs are identified and referenced by indices. I think associating each input with a particular name, specified in the global policy file, may make things much nicer, especially for use-cases where the inputs are of different "kinds" (e.g. a dataset to search and a term to search for), in which case the filename suggests the purpose of the input. We can then build familiar C-like stdio file I/O functions on top, and similarly in the Rust SDK, with suitable error codes returned if the "file" is not found.
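The following sketches what a stdio layer over named inputs might look like. It assumes the runtime makes each policy-named input openable with fopen. vc_load_input and the vc_status codes are hypothetical names. Input names such as "dataset" or "search-term" are only examples.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes; the names are illustrative only. */
enum vc_status { VC_OK = 0, VC_OPEN_FAILED, VC_IO_ERROR, VC_NO_MEMORY };

/* Hypothetical helper: load the input registered under `name` (e.g. a
 * policy-assigned name such as "dataset") into a heap buffer. On
 * success, *out and *out_len describe the data; caller frees *out. */
enum vc_status vc_load_input(const char *name, unsigned char **out,
                             size_t *out_len)
{
    FILE *f = fopen(name, "rb");
    if (f == NULL)
        return VC_OPEN_FAILED; /* e.g. no input with that name */

    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(f);
        return VC_NO_MEMORY;
    }

    size_t n;
    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) { /* buffer full: grow geometrically */
            unsigned char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                fclose(f);
                return VC_NO_MEMORY;
            }
            buf = bigger;
            cap *= 2;
        }
    }

    int failed = ferror(f);
    fclose(f);
    if (failed) {
        free(buf);
        return VC_IO_ERROR;
    }
    *out = buf;
    *out_len = len;
    return VC_OK;
}
```

The C standard does not require fopen to set errno. For that reason, the sketch maps any open failure to a single VC_OPEN_FAILED code rather than claiming that the failure means "not found".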

For the rest of the SDK, I think the suggestion of starting small, or proceeding lazily, adding functionality that we know we need driven by examples/known use-cases/requests, is a good idea.

hugovincent commented on August 15, 2024

Yep, I know about sdk/freestanding-chihuahua. I meant one that can access local files directly through WASI (which in turn means using wasmtime, or similar, directly rather than chihuahua). Likewise, I found having a freestanding browser runner really useful, since browsers have great source-level debugging and visualisation capabilities.

What are your thoughts on filesystem-like (generally) versus C library / POSIX-style (or CloudABI/WASI style) file IO more specifically? Bespoke API functions that "take a filename and return an input" are IMHO likely to be limiting in the future – consider large files, streaming data, perhaps even sockets etc. Instead, having more of the POSIX style API (seek, read etc) leaves much more future flexibility without adding much near-term complexity for the current whole-buffer-in-memory case. I'm obviously not proposing full POSIX/etc compatibility, and some things (e.g. mmap/etc) are off the table.
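The following illustrates why a POSIX-style read/seek interface adds little complexity for the current whole-buffer-in-memory case. On the runtime side, each open descriptor can be just a pointer, a length and a cursor. struct vc_file, vc_read and vc_seek are hypothetical names. They follow the general semantics of read(2) and lseek(2), not any existing Veracruz or WASI code.

```c
#include <stdio.h>  /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h> /* memcpy */

/* Hypothetical runtime-side view of one data buffer exposed as a
 * file-like object: the whole input is already in memory, and only a
 * cursor is kept per open descriptor. */
struct vc_file {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* read(2)-like: copy up to `count` bytes from the cursor. Returns the
 * number of bytes copied, or 0 at (or past) end of file. */
size_t vc_read(struct vc_file *f, void *dst, size_t count)
{
    if (f->pos >= f->len)
        return 0;
    size_t avail = f->len - f->pos;
    size_t n = count < avail ? count : avail;
    memcpy(dst, f->data + f->pos, n);
    f->pos += n;
    return n;
}

/* lseek(2)-like: returns the new offset, or -1 for an unknown whence
 * or a negative resulting offset. Seeking past the end is allowed;
 * later reads simply return 0, as with a regular file. */
long long vc_seek(struct vc_file *f, long long offset, int whence)
{
    long long base;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (long long)f->pos; break;
    case SEEK_END: base = (long long)f->len; break;
    default: return -1;
    }
    if (offset < -base) /* new position would be negative */
        return -1;
    f->pos = (size_t)(base + offset);
    return (long long)f->pos;
}
```

A streaming or large-file backend could later replace the in-memory fields without changing the caller-facing functions. This is the future flexibility argued for above.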

If you're on board with starting small, I could start by bringing in what I had previously. Do you want to merge your directory restructuring patches etc first?

dominic-mulligan-arm commented on August 15, 2024

> What are your thoughts on filesystem-like (generally) versus C library / POSIX-style (or CloudABI/WASI style) file IO more specifically? Bespoke API functions that "take a filename and return an input" are IMHO likely to be limiting in the future – consider large files, streaming data, perhaps even sockets etc. Instead, having more of the POSIX style API (seek, read etc) leaves much more future flexibility without adding much near-term complexity for the current whole-buffer-in-memory case. I'm obviously not proposing full POSIX/etc compatibility, and some things (e.g. mmap/etc) are off the table.

Ah OK, sorry I misunderstood the previous suggestion. Yes, this sounds good.

> If you're on board with starting small, I could start by bringing in what I had previously. Do you want to merge your directory restructuring patches etc first?

Yes, I think we need to split this into smaller chunks, and starting from what you had previously sounds good.

hugovincent commented on August 15, 2024

Great. Why don't you create a PR for your directory and build system changes, we'll review that, and then I'll build on that.

hugovincent commented on August 15, 2024

Proposal for first stage support of WASI API for exposing data buffers in Veracruz as file-like objects:
https://gist.github.com/hugovincent/f1368808284cdb5d06afcc967d70ce53
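The gist itself is not reproduced here. As an assumption for illustration only, suppose each data buffer were visible as a path under a WASI preopened directory. A C program built against wasi-libc could then use plain POSIX calls, since wasi-libc implements open and read on top of WASI's path_open and fd_read. count_byte and the file paths are hypothetical. Reading in fixed-size chunks means the program never needs the whole input in memory.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical program-side helper: count occurrences of `needle` in
 * the input at `path` using POSIX open/read in fixed-size chunks.
 * Returns the count, or -1 if the input cannot be opened or read. */
long long count_byte(const char *path, unsigned char needle)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char chunk[512];
    long long count = 0;
    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted: retry the read */
            close(fd);
            return -1;
        }
        if (n == 0)
            break; /* end of input */
        for (ssize_t i = 0; i < n; i++)
            if (chunk[i] == needle)
                count++;
    }
    close(fd);
    return count;
}
```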

dominic-mulligan-arm commented on August 15, 2024

This sounds like a good plan. How will this integrate with the streaming support that @ShaleXIONG is adding? Will streams be exposed as files?

ShaleXIONG commented on August 15, 2024

This should work well for streaming support. I have already added a small buffer component in mexico-city, which holds any necessary data as well as not-yet-processed data. I can imagine the file-like objects linking to this component.
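mexico-city is written in Rust, and its buffer component is not shown in this thread. The following is therefore only a C analogue of the idea: bytes that have arrived but do not yet form a complete record are held until a later chunk completes them. The struct, the fixed capacity and the newline-delimited record format are all assumptions.

```c
#include <stddef.h>
#include <string.h>

#define VC_STREAM_CAP 256 /* hypothetical fixed capacity */

/* Hypothetical holding buffer: bytes that have arrived on a stream but
 * do not yet form a complete record stay here until the next chunk. */
struct vc_stream_buf {
    char data[VC_STREAM_CAP];
    size_t len;
};

/* Append an incoming chunk. Returns 0 on success, -1 on overflow. */
int vc_stream_push(struct vc_stream_buf *b, const char *chunk, size_t n)
{
    if (n > VC_STREAM_CAP - b->len)
        return -1;
    memcpy(b->data + b->len, chunk, n);
    b->len += n;
    return 0;
}

/* Pop one complete '\n'-terminated record into `out` (newline removed,
 * NUL-terminated). Returns the record length, -1 if only a partial
 * record is buffered, or -2 if `out` is too small. */
long vc_stream_next_line(struct vc_stream_buf *b, char *out, size_t cap)
{
    char *nl = memchr(b->data, '\n', b->len);
    if (nl == NULL)
        return -1; /* partial record: keep it for the next chunk */
    size_t rec = (size_t)(nl - b->data);
    if (rec + 1 > cap)
        return -2;
    memcpy(out, b->data, rec);
    out[rec] = '\0';
    /* Drop the record and its newline; keep the unprocessed tail. */
    b->len -= rec + 1;
    memmove(b->data, nl + 1, b->len);
    return (long)rec;
}
```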

dominic-mulligan-arm commented on August 15, 2024

OK, good. Further questions about both the streaming and this proposed change (as I think the two are intertwined):

  • At the moment, a Veracruz program can produce a single result as its answer. Do we want to relax that? If the runtime is maintaining something that looks like a trivial filesystem, anyway, then this seems easy to fit in.
  • Do we support mixed streaming/batch computations? That is, is a Veracruz program fully stream-oriented, or can some inputs be batch inputs, too? With this proposed change, the two can work together seamlessly by just treating everything as a file, but aspects like the state machine will likely need careful attention.
  • The proposal sees us adopting WASI as a whole, but we realistically will only be using a tiny subset of the full WASI spec for Veracruz, with the vast majority stubbed out. While this is great for reusing existing toolchain support, it limits us to being only able to expose functionality that WASI exposes in our ABI, no? In particular, we have discussed in the past extending the Veracruz ABI with support for cryptography (cc: @dreemkiller), and also the ability to programmatically reduce the scope of the global policy in a "safe" way, amongst other things (for e.g. secret sharing). Is adopting WASI not going to be constraining, for us?

In short, there are two aspects to the proposal:

  1. (General): change the existing Veracruz ABI to expose a more file-oriented programming model,
  2. (Specific): adopt WASI as that change.

I'm convinced of (1), but less convinced of (2)...

ShaleXIONG commented on August 15, 2024

For stream/batch, we at least need to store some notion of previous results, because some applications, e.g. block hashing and training an ML model, are stateful.
I tend to think of streaming as an extension of a batch-processing engine. One well-known streaming framework, Apache Spark, uses a batch-processing engine plus a streaming pre-processing front end to deal with streaming data. There is also Apache Kafka, which uses an event-driven streaming model, but I feel it may not fit Veracruz. Anyway, in general, this change should be good for both streaming and batch processing.
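An incremental hash illustrates the point about stateful applications. The running state is the "previous result" that must survive between stream chunks, and a batch run is just the one-chunk special case. 32-bit FNV-1a is used here purely as a simple stand-in; it is not a cryptographic hash and was not proposed in this thread. hash_state, hash_update and hash_batch are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a parameters (a simple stand-in, not a secure hash). */
#define FNV32_OFFSET_BASIS 2166136261u
#define FNV32_PRIME 16777619u

/* State carried between stream chunks: the "previous result". */
typedef struct { uint32_t h; } hash_state;

void hash_init(hash_state *s) { s->h = FNV32_OFFSET_BASIS; }

/* Fold one chunk into the running state. */
void hash_update(hash_state *s, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        s->h ^= p[i];
        s->h *= FNV32_PRIME; /* unsigned arithmetic wraps mod 2^32 */
    }
}

/* Batch processing: one init plus one update over all the data. */
uint32_t hash_batch(const unsigned char *p, size_t n)
{
    hash_state s;
    hash_init(&s);
    hash_update(&s, p, n);
    return s.h;
}
```

Any chunking of the same bytes produces the same final value as hashing them in one batch. A streaming runtime has to preserve exactly this property by keeping the state between invocations.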

hugovincent commented on August 15, 2024

@dominic-mulligan-arm I don't think it will necessarily always be the case that "we realistically will only be using a tiny subset of the full WASI spec". I've sketched some future use-cases in the gist (under 'Some possible future directions') to illustrate that a bit. My current view (really more of a hunch) is that we'll want to extend the functionality over time, and WASI gives us a time-honoured (via POSIX/UNIX) and well-understood direction.

Regarding crypto, there is of course wasi-crypto, but it looks like it's still at an early stage, and it is also a rather large API. More generally, though, WASI (afaict) does not preclude us from having additional hcall APIs. If anything in the proposal precludes adding functionality, it is a bug and we should fix it ;-) WASI is moving in the direction of increased modularity (see the WIP future version of WASI called ephemeral): it splits args, clock, fd, etc. into separate, standalone modules that can each be provided or not. This also enables capability-based access to that functionality; e.g. if you have wasi_ephemeral_environ, the sandbox has given you the (coarse-grained) capability to use the environment functionality.

@ShaleXIONG that new section also sketches how we could do streaming and combined batch+stream scenarios, let me know what you think.

dominic-mulligan-arm commented on August 15, 2024

With the recent WASI changes, this is now resolved.
