urbit / ares
License: MIT License
The new runtime for Urbit
cargo run ./test_data/shax.jam
Segmentation fault (core dumped)
This is a regression caused by #60. I am still working out the parameters under which it happens, but I at least have an example: a jamfile of [[shax 0x1] 9 2 10 [6 0 3] 0 2], produced on a ship that has all of the math %sham jets implemented in hoon.hoon. So far, the layer 1 and 2 Hoon standard library calls that I've tried have worked fine, but I've only tried a handful. +shax is in layer 3.
Ares writes the jammed input to NockStack fine, but falls over when it gets to cue. I tried increasing the amount of memory given to Ares up to the point where the machine could not allocate it, to see whether it was just using an absurd amount of memory, but this didn't result in a success.
This jamfile runs fine on Ares pre split-stack, so it is not caused by the sham jets.
https://github.com/urbit/ares/blob/jon/cue-fix/rust/ares/test_data/shax.jam
If you clone the as/trace branch from #59 and attempt to change the call to the Ackermann function in the toddler pill from [2 1] to [4 1], Ares will crash with an error in the HAMT while trying to run preserve().
I believe it's an overflow error: the HAMT will attempt to call preserve on an IndirectAtom whose usize is returned as 16140918534541148574 words.
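One cheap check while debugging is to look at the bit pattern of the reported size. The sketch below only does arithmetic on the number quoted above; the helper names and the arena bound are hypothetical. If the top bits of the word are set, one possibility (an assumption, not something confirmed here) is that a tagged word is being read as a length, rather than a real length having overflowed.

```rust
/// The size reported in the crash, copied from the issue text.
pub const REPORTED_SIZE: u64 = 16140918534541148574;

/// Returns the top three bits of a 64-bit word.
pub fn top_three_bits(word: u64) -> u64 {
    word >> 61
}

/// Hypothetical sanity check: a word count larger than the whole arena
/// cannot be a genuine atom size.
pub fn is_plausible_word_count(words: u64, arena_words: u64) -> bool {
    words <= arena_words
}
```

A check like `is_plausible_word_count` placed just before the copy in preserve would at least turn the crash into a clear error while the root cause is tracked down.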
I tried debugging the issue, but I'm too unfamiliar with the HAMT to make any headway, and it's not integral to my work on stack traces and interrupts.
CI should attempt to boot a fakezod as an integration test.
Open questions:
status
branch merges?
The current implementation of SIGINT handling in Ares (for signals sent from the king) sets a sentinel value which is polled on every call to Nock 2, on every call to Nock 9, and on every push to the mean stack. If the sentinel value is set, the event bails with a non-deterministic error. If the sentinel value is already set when the user attempts to set it again, the serf process exits.
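A minimal sketch of the sentinel approach, assuming an AtomicBool as the sentinel. All names here (INTERRUPTED, on_sigint, poll_interrupt, SignalAction, Bail) are hypothetical and not taken from the Ares code, and clearing the sentinel when the event bails is also an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical sentinel. In a real serf it would be set from the signal path.
static INTERRUPTED: AtomicBool = AtomicBool::new(false);

/// What the signal path should do after trying to set the sentinel.
#[derive(Debug, PartialEq)]
pub enum SignalAction {
    /// The sentinel was clear and is now set.
    Armed,
    /// The sentinel was already set, so the serf process should exit.
    ExitProcess,
}

/// Error produced when a poll finds the sentinel set.
#[derive(Debug, PartialEq)]
pub enum Bail {
    NonDeterministic,
}

/// Called when a SIGINT arrives. `swap` returns the previous value.
pub fn on_sigint() -> SignalAction {
    if INTERRUPTED.swap(true, Ordering::SeqCst) {
        SignalAction::ExitProcess
    } else {
        SignalAction::Armed
    }
}

/// Called at each polling point (Nock 2, Nock 9, mean-stack push).
/// Assumption: the sentinel is cleared once the event bails.
pub fn poll_interrupt() -> Result<(), Bail> {
    if INTERRUPTED.swap(false, Ordering::SeqCst) {
        Err(Bail::NonDeterministic)
    } else {
        Ok(())
    }
}
```

The interpreter would call `poll_interrupt()?` at each polling point, so the cost per call is a single atomic operation.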
An alternative solution is to mprotect() the entire NockStack on SIGINT. The next access will hit SIGSEGV, from where we can un-protect the NockStack memory and bail with a non-deterministic error. A second SIGINT while the region is already mprotect()ed will kill the serf process.
A tracing profiler would be very useful to identify cores which still require jets, as well as other issues preventing us from getting to a full boot.
The format should match that of vere (example: https://gist.github.com/eamsden/e4134b2b140207f42dc475e533c2f712), which is readable by Google Chrome's profiler at chrome://tracing/ and records both Arvo event durations and the durations of fast-hinted computations.
The json crate could be used to write the individual trace events to the tracing output. The current head of the eamsden/demo-debug branch provides a matches() function for the cold state, which could be invoked on the core computed for a Nock 9 to see if there is a recorded fast label for that core. To properly handle tail calls, save a stack of profiler spans containing fast label and start time in each stack frame, pushing whenever a fast-labeled core is called in tail position, and initializing with the fast path of a fast-labeled core called in non-tail position. When a frame is popped, get the time, compute the duration of each span, and dump full spans to the file.
For events: the serf can write a begin event as each event starts and an end event when each ends.
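As a rough illustration of the begin/end events, here is a sketch that formats a single event in the Chrome Trace Event style ("ph" of "B" or "E", "ts" in microseconds). It builds the JSON by hand rather than with the json crate so that it stays self-contained. The function names, the category strings, and the fixed pid/tid values are assumptions and may not match what vere emits.

```rust
/// Phase of a duration event: "B" opens a span, "E" closes it.
pub enum Phase {
    Begin,
    End,
}

/// Minimal JSON string escaping for names and categories.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats one trace event as a JSON object.
/// Assumption: pid and tid are fixed at 1 for a single-threaded serf.
pub fn trace_event(name: &str, cat: &str, phase: Phase, ts_micros: u64) -> String {
    let ph = match phase {
        Phase::Begin => "B",
        Phase::End => "E",
    };
    format!(
        r#"{{"cat":"{}","name":"{}","ph":"{}","pid":1,"tid":1,"ts":{}}}"#,
        escape(cat),
        escape(name),
        ph,
        ts_micros
    )
}
```

The serf would emit one Begin event when an Arvo event or fast-labeled call starts and one End event when it finishes, separated by commas inside a JSON array.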
Once #135 is tested and merged we will have working codegen in Ares! However, the current Rust-side representation is mostly just the noun representation of the IR, but with $maps replaced by HAMTs. We can do much better. With lifecycles already in place, we can create a bytecode representation which will be extremely cache-friendly, and replace linked-list iteration by array iteration. Further, it can include pointers to jets and to bytecode for called arms directly, removing HAMT lookups except at indirect callsites.
TODO: rough spec in this issue
When assert_no_alloc() fires, it doesn't tell you what caused it. It would be good to have a debug flag that turns it off to find out why, instead of commenting it out like I currently do. (Maybe this is already an option somehow with the crate.)
Right now when the king sends us a %live %exit command we eprintln("exit") ...and do not exit. We should simply exit at this point.
I'm seeing non-determinism in jam on the ctrlc branch, which contains my SIGINT work. The code is hooked up and working correctly, but roughly 1 in 4 interrupts cause the King to fail with an error during cue (_cs_cue_xeno via u3s_cue_xeno_with via _lord_on_plea).
This can be tested on the above-linked branch by compiling toddler.hoon into a pill, starting it with the Ares code from that same branch, waiting until initial boot completes, and then pressing any key to start a long-running event (~2 seconds) and interrupting it with ctrl-C.
We should verify the correct behaviour of the scry logic in mink with unit tests.
(peg saf 2) should be (peg 2 saf), because we are pulling it from the head of the subject, not the head of the subject from it.
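To make the argument order concrete, here is a small sketch of axis composition over u64 axes: peg(a, b) is the axis of "b within a". This is an illustration only, not the Ares implementation, and it does not guard against overflow for very deep axes.

```rust
/// Composes two tree axes: the result addresses axis `b` inside the
/// subtree found at axis `a`. Axes start at 1 (the whole noun).
pub fn peg(a: u64, b: u64) -> u64 {
    assert!(a != 0 && b != 0, "axes start at 1");
    // Number of head/tail steps that `b` takes below its own root.
    let depth = 63 - b.leading_zeros();
    // Shift `a` up by that many steps, then append `b`'s path bits
    // (everything in `b` below its leading 1 bit).
    (a << depth) | (b ^ (1u64 << depth))
}
```

With a sample axis of 6, peg(2, 6) is 10 (axis 6 inside the head of the subject) while peg(6, 2) is 12 (the head of whatever sits at axis 6), which shows why the two orders address different parts of the subject.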
Vere will shortly be able to cache data across events using a persistent cache. This is not currently implemented in Ares.
We should have GitHub Actions cache our Rust artifacts in CI. This would speed up CI tremendously.
https://mastyr-bottec.coeli.network/scratch/view/f5822?rmsg=saved
rustc 1.74.1 (a28077b28 2023-12-04)
cargo 1.74.1 (ecb9851af 2023-10-18)
MacOS (darwin) Aarch64
(edit by @eamsden)
We need a way to put nouns in the executable's static memory, preferably with
Cells
Direct Atoms
D(x) works fine, no further work needed
Indirect atoms
The difficulty here is that byte manipulation in constant context in Rust is quite difficult. You can easily e.g. get the length of a byteslice in a const function, but creating a new byte array based on that size is a very high bar. So we need some way to lay out the metadata, length, and data into static memory in a constant function.
Unifying equality
We must not unify from static memory into the PMA or the NockStack. Thus we need to be able to detect if something is in static memory.
Copying
We should not copy from static memory into the PMA or the NockStack. Thus, we need to be able to detect if something is in static memory.
Right now we have a very ugly and verbose encoding of the paths in static hot state, which has to be translated to path nouns when we initialize an interpreter context. We should be able to just write the paths as nouns.
Jets for +mure and +mute (and through them +mule and +mole) must push a "pass-through" scry gate onto the scry stack. This currently must be constructed for each invocation by allocating on the NockStack:
ares/rust/ares/src/jets/nock.rs
Lines 158 to 176 in 23cef0b
See this advice from @mcevoypeter
Several integration issues between Vere and Ares would be solved by migrating both to the urth/mars protocol instead of the current king/serf protocol. In particular, Vere forgoes the serf protocol for many subcommands and simply directly loads the snapshot (as well as the event log.) Making all such subcommands the responsibility of mars would allow a cleaner separation between ares-mars and vere-mars than is possible between ares-serf and vere-serf.
(cut [3 [0 0] 0]) causes Ares to crash:
@philipcmonk wrote this I believe?
thread '<unnamed>' panicked at 'assertion failed: size > 0', external/crate_index__ares-0.1.0/src/noun.rs:314:9
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic
3: ares::noun::IndirectAtom::new_raw_mut
4: ares::noun::IndirectAtom::new_raw_mut_zeroed
5: ares::noun::IndirectAtom::new_raw_mut_bitslice
6: ares::jets::math::jet_cut
7: ares::interpreter::match_pre_hint
8: ares::interpreter::interpret::{{closure}}
...
Currently, attempting to boot with #130 will fail almost immediately. We need unit tests to determine where there are bugs / jet mismatches in the parser jets.
Right now we have the missing_safety_doc lint disabled for clippy in CI. We should re-enable it and document the safety assumptions of our unsafe functions.
The %nara and %hela hints might be useful during debugging and testing for checking the state of the mean stack. However, %nara is currently unimplemented and %hela is implemented incorrectly: %hela prints the mean stack to the bottom of the nearest frame, but this is not guaranteed to be the entire mean stack for the event, nor even for the current level of virtualization.
In addition, the %lose hint would also be useful, as we currently do no pruning of the mean stack, meaning that massive traces are possible.
At some point in writing #60, my implementation of jam for cells became wrong (I could have sworn it was correct at some point...)
For context, +jinx-gate takes in a gate and sample and sends it to Ares, which runs the interpreter and returns a jammed noun to Vere, which Vere then attempts to +cue. I've put some printfs from Ares in curly braces.
> (some 1)
[0 1]
> (jam (some 1))
201
> (jinx-gate:f some 1)
{ares}: result [0 1]
{ares}: jammed result 9
dojo: hoon expression failed
So basically Ares thinks the jam of [0 1] is 9 instead of 201.
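For reference, the expected value can be reproduced with a small standalone jam. This is a sketch under stated assumptions: atoms are limited to u64, the result must fit in a u128, and backreferences are omitted (they do not arise for [0 1]). The Noun type and helper names are hypothetical and unrelated to the Ares noun representation.

```rust
/// A toy noun: either an atom or a cell of two nouns.
pub enum Noun {
    Atom(u64),
    Cell(Box<Noun>, Box<Noun>),
}

pub fn atom(a: u64) -> Noun {
    Noun::Atom(a)
}

pub fn cell(head: Noun, tail: Noun) -> Noun {
    Noun::Cell(Box::new(head), Box::new(tail))
}

/// Number of significant bits in `a` (0 for 0).
fn bit_len(a: u64) -> u32 {
    64 - a.leading_zeros()
}

/// Appends the low `count` bits of `value`, least significant first.
fn push_low_bits(out: &mut Vec<bool>, value: u64, count: u32) {
    for i in 0..count {
        out.push((value >> i) & 1 == 1);
    }
}

/// Length-prefixed atom encoding used by jam.
fn mat(out: &mut Vec<bool>, a: u64) {
    if a == 0 {
        out.push(true);
        return;
    }
    let b = bit_len(a); // bits in the atom
    let c = bit_len(b as u64); // bits in the length
    push_low_bits(out, 0, c); // c zero bits
    out.push(true); // terminator
    push_low_bits(out, b as u64, c - 1); // length without its top bit
    push_low_bits(out, a, b); // the atom itself
}

fn jam_bits(out: &mut Vec<bool>, noun: &Noun) {
    match noun {
        Noun::Atom(a) => {
            out.push(false); // atom tag: 0
            mat(out, *a);
        }
        Noun::Cell(head, tail) => {
            out.push(true); // cell tag: 1 then 0
            out.push(false);
            jam_bits(out, head);
            jam_bits(out, tail);
        }
    }
}

/// Jams a noun into an integer, least significant bit first.
pub fn jam(noun: &Noun) -> u128 {
    let mut bits = Vec::new();
    jam_bits(&mut bits, noun);
    assert!(bits.len() <= 128, "sketch only handles small nouns");
    bits.iter()
        .enumerate()
        .fold(0u128, |acc, (i, &bit)| acc | ((bit as u128) << i))
}
```

Calling `jam(&cell(atom(0), atom(1)))` gives 201, matching the dojo. As an observation only, 9 is exactly the low four bits of 201, i.e. the cell tag followed by the jammed head; whether that is what actually goes wrong in Ares is not established here.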
In aid of #155, we should see if we can replace Urcrypt with the work of the Rust Crypto team.
Dual-platform Ares is nice but not necessary for the initial sprint. IMO it makes sense to get it working on Linux, doing our best not to make MacOS harder for ourselves, and then do a separate block of work after initial release to add a MacOS target. If this is the case then MacOS CI is just getting in the way for now.
Requesting comment from active devs, plus (not mutually exclusively)
@philipcmonk @joemfb @belisarius222 @tacryt-socryp
The PMA needs a garbage collector. We need the following functionality:
Title says it all.
Vere serf has several assertions that verify that the events coming in are monotonically increasing and ordered correctly. Ares should implement similar safety checks.
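A minimal sketch of such a check, assuming that each incoming event number must be exactly one more than the last completed event. The type and method names are hypothetical, and the exact invariants that Vere asserts may differ.

```rust
/// Tracks the number of the last event that was applied.
pub struct EventCursor {
    last_done: u64,
}

/// Error returned when an event arrives out of order.
#[derive(Debug, PartialEq)]
pub enum OrderError {
    OutOfOrder { expected: u64, got: u64 },
}

impl EventCursor {
    pub fn new(last_done: u64) -> Self {
        EventCursor { last_done }
    }

    /// Accepts `event_num` only if it directly follows the last event.
    pub fn accept(&mut self, event_num: u64) -> Result<(), OrderError> {
        let expected = self.last_done + 1;
        if event_num != expected {
            return Err(OrderError::OutOfOrder { expected, got: event_num });
        }
        self.last_done = event_num;
        Ok(())
    }
}
```

Returning an error rather than panicking leaves the decision about whether to crash the serf to the caller.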
It would make sense to be able to verify jet behavior against a trusted implementation (i.e. vere). This could improve test cases and assurance that said test cases are correct. Ideally there would be a tool to supply test inputs and generate unit tests with expected outputs returned by vere.
Currently in #143 we hard-reset the stack after each event. This wipes out our hot and warm states, which are not saved to the snapshot as they contain function pointers. Thus, we have to re-initialize both after every event. Saving the state of the stack and restoring to the saved state suffices to preserve the hot state, which should not change over the course of our execution. However, it is not sufficient to save the warm state, which is reinitialized whenever the cold state changes and thus requires collection of stale entries and HAMT stems.
Instead we should implement a top-frame copying GC as follows:
The effect of this is that each event will run with the opposite-orientation frame as top each time. This should be fine, though we should audit for assumptions that "the top is always west." If the top frame has no locals and wasn't using the lightweight stack when we pushed, then this wastes no space. However if for some odd reason we did use locals or the lightweight stack in the top frame, the wasted space is now just part of the allocation area for the new "top" frame, and will be reclaimed when we switch back next time.
@joemfb @frodwith @ashelkovnykov please review this idea.
One open question is the best GC / lifecycle management strategy for the PMA. This breaks down into a couple of sub-questions:
Even when an atom has a value that can fit in less than 8 bytes, our as_bytes() functions still return a [u8; 8], which isn't usable for some crypto operations (like ++veri:ed:crypto, for example).
It looks like we also zero-pad when we construct atoms (see IndirectAtom::new_raw_mut_bytes(), etc.). Is this necessary for an allocation reason, like if we assume all allocations are at least one 64-bit word, or something?
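One possible shape for a helper that hands crypto code exactly the width it expects: copy the significant little-endian bytes of an atom into a fixed-size array, and fail if the atom is too wide. This is a sketch only; the function name is hypothetical, and it takes a plain byte slice rather than an Ares atom.

```rust
/// Copies the significant bytes of a little-endian atom into an `N`-byte
/// array, zero-padding the high end. Returns `None` if the atom needs more
/// than `N` bytes. No heap allocation is involved.
pub fn atom_bytes_fixed<const N: usize>(le_bytes: &[u8]) -> Option<[u8; N]> {
    // Index one past the last non-zero byte, i.e. the significant length.
    let used = le_bytes.iter().rposition(|&b| b != 0).map_or(0, |i| i + 1);
    if used > N {
        return None;
    }
    let mut out = [0u8; N];
    out[..used].copy_from_slice(&le_bytes[..used]);
    Some(out)
}
```

For example, a jet that needs a 32-byte key could call `atom_bytes_fixed::<32>(bytes)` and bail if it returns `None`.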
We enforce everything else, might as well enforce this as well.
Due to running Ares with assert_no_alloc enabled, we are unable to use the default Rust string interpolation because it allocates on the Rust heap. This is currently forcing us to comment out various blocks of code which would be most easily implemented with string interpolation and occasionally causing failures during regular, valid operation due to eprintln calls.
We need to overwrite the default string interpolation with a version that uses NockStack.
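As a sketch of the general technique, core::fmt::Write can be implemented for a caller-provided buffer, so that write! formats into it without building a String. The example below uses a fixed-size array as a stand-in; a NockStack-backed version would presumably allocate its bytes from the stack frame instead. FixedBuf is a hypothetical name, not an existing Ares type.

```rust
use core::fmt::{self, Write};

/// A fixed-capacity text buffer that `write!` can target.
pub struct FixedBuf<const N: usize> {
    bytes: [u8; N],
    len: usize,
}

impl<const N: usize> FixedBuf<N> {
    pub const fn new() -> Self {
        FixedBuf { bytes: [0; N], len: 0 }
    }

    /// Returns the text written so far.
    pub fn as_str(&self) -> &str {
        // Only whole `&str` values are copied in, so this is valid UTF-8.
        core::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

impl<const N: usize> Write for FixedBuf<N> {
    /// Appends `s`, or fails without writing it if it does not fit.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let end = self.len.checked_add(s.len()).ok_or(fmt::Error)?;
        if end > N {
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(s.as_bytes());
        self.len = end;
        Ok(())
    }
}
```

Usage looks like `write!(buf, "axis {} of {}", 6, "subject")`, after which `buf.as_str()` can be handed to whatever output path is safe under assert_no_alloc.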
very frustrating to correct clippy lints when clippy doesn't report them locally
The current cell encoding is rather heavyweight in its memory use. Each cell consists of a tagged 8-byte pointer to a 24-byte struct, consisting of:
All structure in nouns is encoded with cells, so such a heavyweight encoding is likely to create memory pressure. There are 2 major inefficiencies:
It may be possible to produce a "packed" representation of a cell which need not be a tagged pointer, but rather packs the mug and offsets from itself to its children into a 64 bit word. Such a representation could not be directly loaded into a register or local variable, as the offset information would then be meaningless. It could however be dynamically converted, since we use accessor methods for cells already. Probably this form would never be created by user code calling some variant of Cell::new(), but would instead be created by the copier after checking the constraints. Since the copier is also responsible for laying out a copied noun in memory, combining this representation with a breadth-first traversal for copying nouns could result in nearly 3x savings in cell memory allocations in the best case.
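To make the idea concrete, here is one hypothetical bit layout: a 31-bit mug in the low bits followed by two 16-bit child offsets, counted in 8-byte words from the cell's own address. The field widths, the names, and the choice of forward-only offsets are all assumptions made for illustration; the issue does not specify a layout.

```rust
const MUG_BITS: u32 = 31;
const OFF_BITS: u32 = 16;
const MUG_MASK: u64 = (1 << MUG_BITS) - 1;
const OFF_MASK: u64 = (1 << OFF_BITS) - 1;

/// A cell packed into one 64-bit word (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PackedCell(u64);

impl PackedCell {
    /// Packs a mug and two word offsets, or returns `None` when a field
    /// does not fit, in which case the copier would fall back to the
    /// ordinary cell encoding.
    pub fn pack(mug: u32, head_off: u64, tail_off: u64) -> Option<Self> {
        if (mug as u64) > MUG_MASK || head_off > OFF_MASK || tail_off > OFF_MASK {
            return None;
        }
        Some(PackedCell(
            (mug as u64) | (head_off << MUG_BITS) | (tail_off << (MUG_BITS + OFF_BITS)),
        ))
    }

    pub fn mug(self) -> u32 {
        (self.0 & MUG_MASK) as u32
    }

    /// Address of the head, given the address where this word is stored.
    pub fn head_at(self, self_addr: usize) -> usize {
        self_addr + 8 * ((self.0 >> MUG_BITS) & OFF_MASK) as usize
    }

    /// Address of the tail, given the address where this word is stored.
    pub fn tail_at(self, self_addr: usize) -> usize {
        self_addr + 8 * ((self.0 >> (MUG_BITS + OFF_BITS)) & OFF_MASK) as usize
    }
}
```

The accessors take the word's own address as an argument, which reflects the point above that a packed cell is meaningless once it has been copied into a register or local variable.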
Instead of /* ... */ style comments for functions and structs, we should be using ///s, as instructed here.
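For illustration, the difference looks like this. The two functions are made-up examples, not code from the repository.

```rust
/* Block comment: rustdoc ignores this, so the function below ends up
   with no generated documentation. */
pub fn head_axis_undocumented(axis: u64) -> u64 {
    axis * 2
}

/// Returns the axis of the head of the noun at `axis`.
///
/// Because this is a doc comment, rustdoc attaches it to the function
/// and it appears in the generated documentation.
pub fn head_axis(axis: u64) -> u64 {
    axis * 2
}
```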
Right now we link C dependencies (most notably Urcrypt and its transitive dependencies) by building them with Nix. This defaults to dynamic linking. An executable built this way cannot be distributed independently as it depends on shared objects from the Nix store. We should build a statically linked binary, but this is complicated by conflicting opinions in our dependencies:
The ed25519 jets currently reference but do not utilize a more complete suite of tests from Section 7.1 of RFC 8032.
Presently the treewalking interpreter pushes a new 2stackz frame for every subformula. We should instead do something like what jam does and use a lightweight stack to traverse the formula tree, only pushing new frames for (non-tail) 2 and 9 evaluations.
CC environment variable
Ares needs a way to kill events due to timeout.
The current holdup between us and a dojo prompt is parser jets. These exist for vere in
https://github.com/urbit/vere/blob/45d28f9cf65c6088f3745e79172fd07d793d2169/pkg/noun/jets/e/parse.c
I've attached a trace which shows that most of the time spent in the %zest move sent by %dill to %clay is spent in the parser jets, outermost being +pfix. (Look at the end of the trace. Unfortunately at this stage we still have to trace the boot process entirely, and of course the boot process runs far more slowly when tracing because every call must be checked for tracing.)
https://drive.google.com/file/d/1eceb0VFXpORh20EsHG-ST2idqvgFjzsE/view?usp=drive_link
This trace can be loaded in Perfetto.
The NockStack should have a guard page for safety and so that we can detect OOM errors during computation. Currently, there's no defense against the two stacks in 2Stackz clobbering each other.
Previously:
PMA requirements discussion here.
@philipcmonk what's the intent of vendoring and integrating the PMA in its current state? Is it just to support more experimentation with pills? Or are we trying to move as fast as possible to a prototype with the PMA in place? If the latter, there's a lot of work to be done on the PMA including
It would be great to be able to have some sort of test suite for memory/thread safety in the long term. There are many tools to approach this with, such as miri, valgrind, tsan/asan/ubsan.
I made a PR for miri #45, but it is not mergeable in its current state, since it conflicts with the way memory is currently managed (libc). However, miri performs the most exhaustive checks leading to highest assurance of correctness, so it would be wise to consider using it to test memory safety.
Miri interprets Rust code, thus libc is unsupported. This means you can't use the following:
You can work around mmap with the following:
    use std::alloc::Layout;

    // Under Miri, go through Rust's global allocator; otherwise call into libc.
    unsafe fn allocate(layout: Layout) -> *mut u8 {
        #[cfg(miri)]
        return std::alloc::alloc(layout);
        #[cfg(not(miri))]
        return libc::malloc(/* ... */);
    }
You can't work around thread killing, thus you can't verify actual runtime timeout behavior.
If other libc features are used, likewise, not much you can do.
Miri would allow you to verify behavior at unit test level, for integration testing you would want to use some other tool, like valgrind.
In addition, for miri support you would want to correct the codebase to not have any dangling pointers being created. In the aforementioned PR I'm changing the way NockStack behaves to cast integers to pointers right before dereferencing them, instead of casting to double pointers and dereferencing them. This is a tiny change that helps miri keep track of allocations better and not reject the code.
I just wanted to bring this up as a possibility to look for in the future, because it's much better to catch safety issues before they spiral out of control.
@eamsden suggested a way to improve the speed of the +rev jet by splitting boz into cases: #193 (comment)
Lines 34 to 66 in 7176f39
The runtime is going to have small @tas atoms everywhere, especially for hint and jet matching. A Rust procedural macro which could rewrite a string literal "add" to an unsigned integer literal 0x646461 would make such code much clearer. Constants or a macro_rules macro could be used in value position but not in patterns, which will be a common place for atom values to appear in the runtime.
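As a point of comparison with the procedural macro, here is a sketch using a const fn. The call tas("add") cannot itself be written in a pattern, but a named const item initialised from it can appear as a match pattern in current Rust, which may cover some of the cases. The names tas, ADD, DEC, and jet_name are hypothetical.

```rust
/// Packs up to 8 bytes of text into an integer, first character in the
/// least significant byte, as @tas atoms are laid out.
pub const fn tas(s: &str) -> u64 {
    let bytes = s.as_bytes();
    assert!(bytes.len() <= 8);
    let mut out = 0u64;
    let mut i = 0;
    while i < bytes.len() {
        out |= (bytes[i] as u64) << (8 * i);
        i += 1;
    }
    out
}

// Evaluated at compile time; these are plain integer constants.
pub const ADD: u64 = tas("add");
pub const DEC: u64 = tas("dec");

/// Example of matching on atom values through named constants.
pub fn jet_name(label: u64) -> Option<&'static str> {
    match label {
        ADD => Some("add"),
        DEC => Some("dec"),
        _ => None,
    }
}
```

The cost relative to a procedural macro is that every atom needs its own named constant, so the literal text does not appear at the match site.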