Comments (14)
@hugovincent OK, great. Let's press ahead then!
from veracruz.
What should be in the C SDK? It seems a large chunk of a typical `libc` (e.g. the string and memory-related functions) and `libm` can be included. What do we think the best strategy for doing this is? I have been looking through the OpenBSD `libc` and `libm` library sources this morning, and they look very clearly written, in plain C. Should we take the relevant parts of those libraries as the basis of our SDK (this is also what the Intel SGX SDK does for its `libc`)?
from veracruz.
In my earlier internal-branch work I mostly used code from musl and klibc. Another, perhaps better, approach might be to try to re-use code from WASI or CloudABI. One other factor we should think about is the test suite(s) for whatever we do with `libc`/`libm`: we don't want to support all of libc, or rather, we really only want to support a fairly small subset (no files, no signals, no threads, etc.), and that (plus the Veracruz execution model) is going to make testing slightly tricky. (Most C libraries are surprisingly impure even for notionally pure functions, e.g. in `libm`, due to error handling wanting to output messages and/or `exit`.) I think I'd therefore favour starting small and only pulling in known-needed and known-pure functionality.
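As a rough illustration of what "known-pure" could mean here, the sketch below shows freestanding string and memory helpers that touch no global state, set no `errno`, and perform no I/O. The `vc_` prefix and the function names are hypothetical, chosen only for this example; they are not taken from musl, klibc, or any existing Veracruz code.

```c
#include <stddef.h>

/* Hypothetical pure helpers: results depend only on the arguments. */

/* Length of a NUL-terminated string, excluding the terminator. */
size_t vc_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}

/* Copy n bytes from src to dst; the regions must not overlap. */
void *vc_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n-- > 0) {
        *d++ = *s++;
    }
    return dst;
}

/* Compare n bytes; returns <0, 0, or >0 like the standard memcmp. */
int vc_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *x = a;
    const unsigned char *y = b;
    for (size_t i = 0; i < n; i++) {
        if (x[i] != y[i]) {
            return (int)x[i] - (int)y[i];
        }
    }
    return 0;
}
```

Functions of this shape are easy to test outside the enclave, since they need no runtime support at all.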
Previously I took a text-centric approach. Taking this further could give a useful programming environment within the Veracruz I/O framework: treat the input data as file-like objects and expose them via `stdio` functions, and likewise for the result data (e.g. first input accessed via `stdin`, and output via `stdout`).
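A minimal sketch of what a program written against such an environment might look like, assuming only standard `stdio` calls. The function name `count_lines` is hypothetical; under the scheme described above it would be called as `count_lines(stdin, stdout)`, with the first input behind `stdin` and the result buffer behind `stdout`.

```c
#include <stdio.h>

/* Count the newline characters in `in` and write the total, followed by
 * a newline, to `out`. Taking FILE* parameters (rather than hard-coding
 * stdin/stdout) lets the same code run natively for testing. */
long count_lines(FILE *in, FILE *out)
{
    long lines = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            lines++;
        }
    }
    fprintf(out, "%ld\n", lines);
    return lines;
}
```

The same source compiles unchanged for a native test run, which is the cross-development benefit discussed in this comment.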
Advantages:
- Familiar programming environment.
- Many existing C programs can be run with little to no modification – not necessarily useful for "real" Veracruz workloads, but handy for testing, benchmarking, experimentation etc.
- Permits easy cross-development of Veracruz programs with a lightweight non-enclaved runner (`chichuahuasm` and the browser-runner in my internal branch) and SDK variant for more rapidly developing and debugging Veracruz programs.
Disadvantages:
- Diverges from the Rust SDK.
- Inadvertently encourages text I/O (`printf` into the output data buffer, etc.), which is untyped and may make future Veracruz directions (e.g. multi-program, channels, etc.) harder.
Thoughts?
from veracruz.
I think we can change the Rust SDK, and also the Veracruz ABI, quite freely if we decide that there's a better way of doing things, so I don't think any potential divergence from that is a big issue. (Though, to clarify, we already have a lightweight non-enclaved runner in `sdk/freestanding-chihuahua`, which is just a CLI wrapper around `chihuahua`.)
One way of doing this would be for the Veracruz ABI to expose a filesystem-like series of functions that take a filename and return an input. At the moment, inputs are identified and referenced by indices. I think associating them with a particular name, specified in the global policy file, may make things much nicer, especially for use-cases where the inputs are of different "kinds", e.g. a dataset to search and a term to search for, in which case the filename is suggestive of the purpose of the input. Then we can build familiar C-like `stdio` file I/O functions on top, and similar in the Rust SDK too, with suitable error codes returned if the "file" is not found.
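To make the name-based lookup concrete, here is a small sketch assuming the runtime holds a table of named inputs built from the policy file. Every identifier in it (`vc_input`, `vc_open_input`, `VC_ERR_NOT_FOUND`) is hypothetical and does not correspond to the actual Veracruz ABI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one named input provisioned by the runtime. */
typedef struct {
    const char *name;           /* name assigned in the global policy */
    const unsigned char *data;  /* input bytes */
    size_t len;                 /* number of bytes in data */
} vc_input;

enum { VC_OK = 0, VC_ERR_NOT_FOUND = -1 };

/* Look up an input by name. On success stores a pointer to the entry in
 * *out and returns VC_OK; otherwise returns VC_ERR_NOT_FOUND and leaves
 * *out untouched. */
int vc_open_input(const vc_input *table, size_t count,
                  const char *name, const vc_input **out)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *out = &table[i];
            return VC_OK;
        }
    }
    return VC_ERR_NOT_FOUND;
}
```

A program searching a dataset for a term could then ask for `"dataset"` and `"query"` by name rather than relying on input indices 0 and 1.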
For the rest of the SDK, I think the suggestion of starting small, or proceeding lazily, adding functionality that we know we need driven by examples/known use-cases/requests, is a good idea.
from veracruz.
Yep, I know about `sdk/freestanding-chihuahua`. I meant one that can access local files directly through WASI (which in turn means not using `chihuahua` and instead using `wasmtime` (or whatever) directly). Likewise I found having a freestanding browser runner really useful since browsers have great source-level debugging and visualisation capabilities.
What are your thoughts on filesystem-like (generally) versus C library / POSIX-style (or CloudABI/WASI style) file IO more specifically? Bespoke API functions that "take a filename and return an input" are IMHO likely to be limiting in the future – consider large files, streaming data, perhaps even sockets etc. Instead, having more of the POSIX style API (seek, read etc) leaves much more future flexibility without adding much near-term complexity for the current whole-buffer-in-memory case. I'm obviously not proposing full POSIX/etc compatibility, and some things (e.g. mmap/etc) are off the table.
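The sketch below illustrates the point about near-term complexity: a read/seek interface over the current whole-buffer-in-memory case needs little more than a cursor. All names (`vc_file`, `vc_read`, `vc_seek`, `vc_whence`) are hypothetical and are not WASI or POSIX signatures. Unlike POSIX `lseek`, this version rejects seeking past the end of the buffer, which is a simplification made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical file-like view over an in-memory input buffer. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;  /* current read offset, always <= len */
} vc_file;

typedef enum { VC_SEEK_SET, VC_SEEK_CUR, VC_SEEK_END } vc_whence;

/* Read up to n bytes into buf; returns the number of bytes read,
 * which is 0 at end of buffer. */
long vc_read(vc_file *f, void *buf, size_t n)
{
    size_t avail = f->len - f->pos;
    if (n > avail) {
        n = avail;
    }
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* Move the read offset; returns the new offset, or -1 if the target
 * would fall outside [0, len]. */
long vc_seek(vc_file *f, long offset, vc_whence whence)
{
    long base;
    switch (whence) {
    case VC_SEEK_SET: base = 0; break;
    case VC_SEEK_CUR: base = (long)f->pos; break;
    case VC_SEEK_END: base = (long)f->len; break;
    default: return -1;
    }
    long target = base + offset;
    if (target < 0 || (size_t)target > f->len) {
        return -1;
    }
    f->pos = (size_t)target;
    return target;
}
```

Because callers only see `vc_read` and `vc_seek`, the backing store could later become a stream or a paged large file without changing program code.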
If you're on board with starting small, I could start by bringing in what I had previously. Do you want to merge your directory restructuring patches etc first?
from veracruz.
> What are your thoughts on filesystem-like (generally) versus C library / POSIX-style (or CloudABI/WASI style) file IO more specifically? Bespoke API functions that "take a filename and return an input" are IMHO likely to be limiting in the future – consider large files, streaming data, perhaps even sockets etc. Instead, having more of the POSIX style API (seek, read etc) leaves much more future flexibility without adding much near-term complexity for the current whole-buffer-in-memory case. I'm obviously not proposing full POSIX/etc compatibility, and some things (e.g. mmap/etc) are off the table.
Ah OK, sorry I misunderstood the previous suggestion. Yes, this sounds good.
> If you're on board with starting small, I could start by bringing in what I had previously. Do you want to merge your directory restructuring patches etc first?
Yes, I think we need to split this into smaller chunks, and starting from what you had previously sounds good.
from veracruz.
Great. Why don't you create a PR for your directory and build system changes, we'll review that, and then I'll build on that.
from veracruz.
Proposal for first stage support of WASI API for exposing data buffers in Veracruz as file-like objects:
https://gist.github.com/hugovincent/f1368808284cdb5d06afcc967d70ce53
from veracruz.
This sounds like a good plan. How will this integrate with the streaming support that @ShaleXIONG is adding? Will streams be exposed as files?
from veracruz.
It should be good for streaming support. I have already added a small buffer component in `mexico-city`. It will hold any necessary data and unprocessed data. I can imagine the file-like objects linking to this component.
from veracruz.
OK, good. Further questions about both the streaming and this proposed change (as I think the two are intertwined):
- At the moment, a Veracruz program can produce a single result as its answer. Do we want to relax that? If the runtime is maintaining something that looks like a trivial filesystem, anyway, then this seems easy to fit in.
- Do we support mixed streaming/batch computations? That is, is a Veracruz program fully stream-oriented, or can some inputs be batch inputs, too? With this proposed change, the two can work together seamlessly by just treating everything as a file, but aspects like the state machine will likely need careful attention.
- The proposal sees us adopting WASI as a whole, but we realistically will only be using a tiny subset of the full WASI spec for Veracruz, with the vast majority stubbed out. While this is great for reusing existing toolchain support, it limits us to being only able to expose functionality that WASI exposes in our ABI, no? In particular, we have discussed in the past extending the Veracruz ABI with support for cryptography (cc: @dreemkiller), and also the ability to programmatically reduce the scope of the global policy in a "safe" way, amongst other things (for e.g. secret sharing). Is adopting WASI not going to be constraining, for us?
In short, there are two aspects to the proposal:
1. (General): change the existing Veracruz ABI to expose a more file-oriented programming model,
2. (Specific): adopt WASI as that change.
I'm convinced of (1), but less convinced of (2)...
from veracruz.
For stream/batch, it at least needs to store some notion of previous results, because some applications, e.g. block-hash and training a model in ML, are stateful.
I tend to think of streaming as an extension of a batch processing engine. One well-known stream framework, Apache Spark, uses a batch processing engine plus a streaming pre-processing frontend to deal with streaming data. There is also Apache Kafka, which is an event-driven streaming model, yet I feel it may not fit Veracruz. Anyway, in general, this change should be good for streaming/batch processing.
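As a rough sketch of the "streaming as repeated batch steps with carried state" idea, the example below keeps a running hash across chunks, so each chunk is processed as a small batch and the result of earlier chunks persists in a state struct. The names (`vc_stream_state`, `vc_process_chunk`) are hypothetical, and the 32-bit FNV-1a hash merely stands in for whatever block hash a real application would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state carried between successive chunks of a stream. */
typedef struct {
    uint32_t hash;      /* running 32-bit FNV-1a hash of all bytes so far */
    size_t bytes_seen;  /* total number of bytes processed */
} vc_stream_state;

/* Start a new stream: FNV-1a 32-bit offset basis, nothing seen yet. */
void vc_state_init(vc_stream_state *s)
{
    s->hash = 2166136261u;
    s->bytes_seen = 0;
}

/* Process one chunk as a batch step, folding it into the carried state. */
void vc_process_chunk(vc_stream_state *s,
                      const unsigned char *chunk, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        s->hash ^= chunk[i];
        s->hash *= 16777619u;  /* FNV-1a 32-bit prime */
    }
    s->bytes_seen += len;
}
```

Feeding the data in one chunk or in several yields the same final state, which is the property that lets a batch engine serve as the core of a streaming one.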
from veracruz.
@dominic-mulligan-arm I don't think it is necessarily always going to be the case that, "we realistically will only be using a tiny subset of the full WASI spec". I've sketched some future use-cases in the gist (under 'Some possible future directions') to illustrate that a bit. My current view (well, really more of a hunch) is that we'll want to extend the functionality over time – and WASI gives us a time-honoured (via POSIX/UNIX) and well understood direction.
Regarding crypto, there is of course wasi-crypto, but it looks like it's still at an early stage, and it is also a rather large API. But more generally, WASI (afaict) does not preclude us from having additional hcall APIs. If anything in the proposal precludes adding functionality, it is a bug and we should fix it ;-) WASI is moving in a direction of increased modularity (see the WIP future version of WASI called ephemeral); it splits `args`, `clock`, `fd`, etc. into separate, standalone modules that can each be provided or not. This also enables capability-based access to that functionality: e.g. if you have `wasi_ephemeral_environ`, that means the sandbox has given you the (coarse-grained) capability to use the environment functionality.
@ShaleXIONG that new section also sketches how we could do streaming and combined batch+stream scenarios, let me know what you think.
from veracruz.
With the recent WASI changes, this is now resolved.
from veracruz.