urbit / ares

The new runtime for Urbit

License: MIT License

hoon 16.48% Rust 75.56% Nix 0.19% Shell 0.03% C 7.75%

code-generation persistence runtime urbit

ares's Introduction

Urbit

Urbit is a personal server stack built from scratch. It has an identity layer (Azimuth), virtual machine (Vere), and operating system (Arvo).

A running Urbit "ship" is designed to operate with other ships peer-to-peer. Urbit is a general-purpose, peer-to-peer computer and network.

This repository contains the Arvo Kernel

For the Runtime, see Vere. For more on the identity layer, see Azimuth. To manage your Urbit identity, use Bridge.

Install

To install and run Urbit, please follow the instructions at urbit.org/getting-started. You'll be on the live network in a few minutes.

Contributing

Contributions of any form are more than welcome! Please take a look at our contributing guidelines for details on our git practices, coding styles, and how we manage issues.

You might also be interested in joining the urbit-dev mailing list.

Release

For details about our release process, see the maintainers guidelines


ares's Issues

mem::copy() segfault

cargo run ./test_data/shax.jam
Segmentation fault (core dumped)

This is a regression caused by #60. I am still working out the parameters under which it happens, but I at least have an example: a jamfile of [[shax 0x1] 9 2 10 [6 0 3] 0 2], produced on a ship that has all of the math %sham jets implemented in hoon.hoon. So far, the layer 1 and 2 Hoon standard library calls that I've tried have worked fine, but I've only tried a handful. +shax is in layer 3.

Ares writes the jammed input to the NockStack fine, but falls over when it gets to cue. I tried increasing the amount of memory given to Ares up to the point where the machine could no longer allocate it, to see whether it was just using an absurd amount of memory, but this didn't result in a success.

This jamfile runs fine on Ares pre split-stack, so it is not caused by the sham jets.

https://github.com/urbit/ares/blob/jon/cue-fix/rust/ares/test_data/shax.jam

[ares] Macro to take small @tas to direct atom

The runtime is going to have small @tas atoms everywhere, especially for hint and jet matching. A Rust procedural macro which could rewrite a string literal "add" to an unsigned integer literal 0x646461 would make such code much clearer. Constants or a macro_rules macro could be used in value position but not in patterns, which will be a common place for atom values to appear in the runtime.
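As a rough illustration of the encoding involved (not the procedural macro itself), a const fn can compute the little-endian integer for a short cord at compile time. The helper name tas, the constants ADD and DEC, and the 8-byte / 63-bit limit are assumptions made for this sketch. In current Rust a named const of integer type is accepted as a match pattern, so the sketch shows that too; an inline call such as tas("add") is not accepted in pattern position, which is the case a proc macro would cover.

```rust
/// Encode a short ASCII cord as a little-endian integer in const context.
/// `tas` is a hypothetical helper name, not part of Ares.
const fn tas(s: &str) -> u64 {
    let bytes = s.as_bytes();
    // Assumption: a direct atom holds at most 63 bits, so 8 ASCII bytes is the limit.
    assert!(bytes.len() <= 8, "cord too long for a direct atom");
    let mut acc: u64 = 0;
    let mut i = 0;
    while i < bytes.len() {
        // Byte 0 is the least significant byte, so "add" becomes 0x646461.
        acc |= (bytes[i] as u64) << (8 * i as u32);
        i += 1;
    }
    assert!(acc >> 63 == 0, "value does not fit in 63 bits");
    acc
}

// Hypothetical named constants; these can be used as match patterns.
const ADD: u64 = tas("add");
const DEC: u64 = tas("dec");

/// Toy dispatcher showing the constants in pattern position.
fn jet_name(label: u64) -> &'static str {
    match label {
        ADD => "add",
        DEC => "dec",
        _ => "unknown",
    }
}
```

One constant per label is still more verbose than writing the literal inline, which is the ergonomic argument for the macro.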

inconsistent lints between CI and local dev

very frustrating to correct clippy lints when clippy doesn't report them locally

"Packed Cell" encoding

The current cell encoding is rather heavyweight in its memory use. Each cell consists of a tagged 8-byte pointer to a 24-byte struct, consisting of:

8 bytes of metadata (for alignment, only 4 bytes are needed for mugs)
8 bytes to store the head of the cell
8 bytes to store the tail of the cell

All structure in nouns is encoded with cells, so such a heavyweight encoding is likely to create memory pressure. There are 2 major inefficiencies:

4 bytes are wasted for alignment purposes to store the metadata.
8 bytes each are used to store a pointer to something which, given the use of a bump allocator and copying collector, is likely very nearly adjacent.

It may be possible to produce a "packed" representation of a cell which need not be a tagged pointer, but rather packs the mug and offsets from itself to its children into a 64 bit word. Such a representation could not be directly loaded into a register or local variable, as the offset information would then be meaningless. It could however be dynamically converted, since we use accessor methods for cells already. Probably this form would never be created by user code calling some variant of Cell::new(), but would instead be created by the copier after checking the constraints. Since the copier is also responsible for laying out a copied noun in memory, combining this representation with a breadth-first traversal for copying nouns could result in nearly 3x savings in cell memory allocations in the best case.
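A minimal sketch of what such a packed word could look like, to make the idea concrete. The bit layout (32 bits of mug, two 16-bit offsets counted in 8-byte words) and every name below are assumptions for illustration only; the issue does not fix a layout.

```rust
/// Hypothetical packed cell word:
/// bits 0..32 = mug, bits 32..48 = head offset, bits 48..64 = tail offset.
/// Offsets are counted in 8-byte words from the address of the packed word itself.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedCell(u64);

impl PackedCell {
    /// Pack a cell, or return None when an offset does not fit in 16 bits.
    /// In that case a copier would fall back to the pointer-based encoding.
    fn try_new(mug: u32, head_off: usize, tail_off: usize) -> Option<Self> {
        if head_off > u16::MAX as usize || tail_off > u16::MAX as usize {
            return None;
        }
        let word = (mug as u64) | ((head_off as u64) << 32) | ((tail_off as u64) << 48);
        Some(PackedCell(word))
    }

    fn mug(self) -> u32 {
        self.0 as u32
    }

    fn head_offset(self) -> usize {
        ((self.0 >> 32) & 0xFFFF) as usize
    }

    fn tail_offset(self) -> usize {
        (self.0 >> 48) as usize
    }

    /// Resolve child addresses. This needs the address the word is stored at,
    /// which is why the packed form is meaningless once copied into a register.
    fn resolve(self, self_addr: usize) -> (usize, usize) {
        (
            self_addr + 8 * self.head_offset(),
            self_addr + 8 * self.tail_offset(),
        )
    }
}
```

The constraint check in try_new corresponds to the "checking the constraints" step the copier would perform before choosing this representation.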

`jam` issue in `SIGINT` branch

I'm seeing non-determinism in jam on the ctrlc branch which contains my SIGINT work. The code is hooked up and working correctly, but roughly 1/4 interrupts cause the King to fail with an error during cue (_cs_cue_xeno via u3s_cue_xeno_with via _lord_on_plea).

This can be tested on the above-linked branch by compiling toddler.hoon into a pill, starting it with the Ares code from that same branch, waiting until initial boot completes, and then pressing any key to start a long-running event (~2 seconds) and interrupting it with ctrl-C.

Static build

Right now we link C dependencies (most notably Urcrypt and its transitive dependencies) by building them with Nix. This defaults to dynamic linking. An executable built this way cannot be distributed independently, as it depends on shared objects from the Nix store. We should build a statically linked binary, but this is complicated by conflicting opinions in our dependencies:

Rust really does not want to link statically against glibc (though it can allegedly be convinced), but will happily link statically against musl. However musl is a Tier-2 support target for Rust!
urcrypt's dependencies are very difficult to properly statically link at all (looking at you libaes_siv), and pitch a bigger fit when the libc in question is musl
Nix really only offers a static package set linked against musl.

ed25519 jet unit tests

The ed25519 jets currently reference but do not utilize a more complete suite of tests from Section 7.1 of RFC 8032.

Assert event numbers are in order

Vere serf has several assertions that verify that the events coming in are monotonically increasing and ordered correctly. Ares should implement similar safety checks.
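A minimal sketch of such a check, assuming the rule is "each event number must be exactly one greater than the last one accepted". That rule, and the names EventCounter and EventOrderError, are assumptions for illustration; the exact assertions in Vere are not reproduced here.

```rust
/// Error returned when an incoming event number is not the expected successor.
#[derive(Debug, PartialEq, Eq)]
struct EventOrderError {
    expected: u64,
    got: u64,
}

/// Tracks the number of the last event that was accepted.
struct EventCounter {
    last: u64,
}

impl EventCounter {
    /// Accept `event_num` only if it directly follows the last accepted event.
    fn accept(&mut self, event_num: u64) -> Result<(), EventOrderError> {
        let expected = self.last + 1;
        if event_num != expected {
            return Err(EventOrderError { expected, got: event_num });
        }
        self.last = event_num;
        Ok(())
    }
}
```

Returning an error rather than asserting leaves the caller free to decide whether an out-of-order event should crash the process or only be reported.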

PMA discussion issue

Previously:

PMA requirements discussion here.

@philipcmonk what's the intent of vendoring and integrating the PMA in its current state? Is it just to support more experimentation with pills? Or are we trying to move as fast as possible to a prototype with the PMA in place? If the latter, there's a lot of work to be done on the PMA including

B+ tree page index
explicit page-dirtying interface (this was already decided on)
garbage collection (discussion at #36 )

PMA: Garbage collection

The PMA needs a garbage collector. We need the following functionality:

Deallocating unused blocks
Melding small [what is our predicate for small?] allocations into larger allocations
Tracing multiple roots
Trait for tracing and melding
[later extendable to] tracing multiple roots after checking that owning threads are still active (for concurrent scries)

improve tool chain consistency across local/CI builds

Need to specify versions for dependencies
The CI isn't currently using nix -- should it?
With the cc crate added in #53, we're building with gcc on linux and clang on macos. We should force clang across platforms. The cc crate uses whatever compiler is specified by the CC environment variable

Constant nouns in static memory

We need a way to put nouns in the executable's static memory, preferably with

a way to identify static memory as the most senior memory for purposes of copying and unifying equality.
bonus: parse nouns from a string

Cells

Create the struct in the constant function
point it at the head and tail
take a static reference to the struct
cast the static reference to a const pointer
tag the pointer as a cell

Direct Atoms
D(x) works fine, no further work needed

Indirect atoms
The difficulty here is that byte manipulation in constant context in rust is quite difficult. You can easily e.g. get the length of a byteslice in a const function, but creating a new byte array based on that size is a very high bar. So we need some way to lay out the metadata, length, and data into static memory in a constant function.

Unifying equality
We must not unify from static memory into the PMA or the NockStack. Thus we need to be able to detect if something is in static memory.

Copying
We should not copy from static memory into the PMA or the NockStack. Thus, we need to be able to detect if something is in static memory.

Use cases

Hot state

Right now we have a very ugly and verbose encoding of the paths in static hot state, which has to be translated to path nouns when we initialize an interpreter context. We should be able to just write the paths as nouns.

Pass-through scry gate

Jets for +mure and +mute (and through them +mule and +mole) must push a "pass-through" scry gate onto the scry stack. This currently must be constructed for each invocation by allocating on the NockStack:

ares/rust/ares/src/jets/nock.rs

Lines 158 to 176 in 23cef0b

    
    pub fn pass_thru_scry(stack: &mut NockStack) -> Noun {
        //  > .*  0  !=  =>  ~  |=(a=^ ``.*(a [%12 [%0 2] %0 3]))
        // [[[1 0] [1 0] 2 [0 6] 1 12 [0 2] 0 3] [0 0] 0]
        let sig = T(stack, &[D(1), D(0)]);
        let sam = T(stack, &[D(0), D(6)]);
        let hed = T(stack, &[D(0), D(2)]);
        let tel = T(stack, &[D(0), D(3)]);
        let zap = T(stack, &[D(0), D(0)]);

        let cry = T(stack, &[D(12), hed, tel]);
        let fol = T(stack, &[D(1), cry]);
        let res = T(stack, &[D(2), sam, fol]);
        let uno = T(stack, &[sig, res]);
        let dos = T(stack, &[sig, uno]);

        let gat = T(stack, &[zap, D(0)]);

        T(stack, &[dos, gat])
    }

PMA garbage collection

One open question is the best GC / lifecycle management strategy for the PMA. This breaks down into a couple of sub-questions:

What are the roots we want to trace for the PMA and where do they live?
- (my preferred answer: they are the bytecode cache and Arvo-as-a-noun and they live in slots in the top stack frame of the NockStack in the PMA)
What GC strategy are we using for the PMA?
- (my preferred answer: mark-and-sweep. We are assuming that most objects here are long-lived and so mark-and-sweep with a free-list allocator is by far the most proven solution)

CI: cache rust

We should have Github Actions cache our rust artifacts in CI. This would speed up CI time tremendously

MacOS CI?

#53 (comment)

Dual-platform Ares is nice but not necessary for the initial sprint. IMO it makes sense to get it working on Linux, doing our best not to make MacOS harder for ourselves, and then do a separate block of work after initial release to add a MacOS target. If this is the case then MacOS CI is just getting in the way for now.

Requesting comment from active devs, plus (not mutually exclusively)
@philipcmonk @joemfb @belisarius222 @tacryt-socryp

Vere urth/mars restage (discussion issue)

Several integration issues between Vere and Ares would be solved by migrating both to the urth/mars protocol instead of the current king/serf protocol. In particular, Vere forgoes the serf protocol for many subcommands and simply directly loads the snapshot (as well as the event log.) Making all such subcommands the responsibility of mars would allow a cleaner separation between ares-mars and vere-mars than is possible between ares-serf and vere-serf.

CI: add test to boot fakezod with -x and check return code

CI should attempt to boot a fakezod as an integration test.

Open questions:

Where do we get a patched king for CI to run?
Should this run on
- status branch merges?
- All open PRs?
- ready-for-review PRs?
- reviewed PRs?

Serf: actually exit

Right now when the king sends us a %live %exit command we eprintln("exit")...and do not exit. We should simply exit at this point.

2stackz: use split stack frames

ares/rust/ares/src/mem.rs

Lines 34 to 66 in 7176f39

    
    /* XX
     * ~master-morzod suggests splitting stack frames:
     * when FP-relative slots are on the west stack, allocate on the east stack, and vice versa.
     * This would mean that traversal stacks must be allocated adjacent to the current frame's locals,
     * rather than being allocated adjacent to the previous frame, which may in fact make more sense.
     *
     * This would enable completely reliable tail calls. Currently we cannot do tail-call optimization
     * reliably for indirect calls, since the called arm might need arbitrarily many stack slots. With
     * the proposed "split" layout we can simply extend the number of slots if a tail-called arm
     * requires more.
     *
     * Unless I'm mistaken this requires a three-pointer stack: the frame pointer is the basis for
     * relative-offset locals as per usual, the stack pointer denotes the extent of the current frame,
     * and the allocation pointer denotes the current frontier of allocations on the opposite stack.
     *
     * Allocation means bumping the allocation pointer as per usual. Traversal stacks are implemented
     * by saving the stack pointer to a local and then manipulating the stack pointer.
     *
     * Pushing is implemented by setting the frame pointer to the allocation pointer, the stack pointer
     * to the necessary offset from the frame pointer to accommodate the locals, and the allocation
     * pointer to the stack pointer. The previous values of all three are saved in the new frame's
     * first three locals.
     *
     * Popping must first save the parent-frame stored values to temporaries. Then a copy is run which
     * will, in general, run over the stored locals, and update the temporary for the allocation
     * pointer. Finally, the temporaries are restored as the current frame/stack/allocation pointer.
     *
     * An alternative to this model is to copy anyway when making a tail call, using the tail call's
     * parameters as roots. (Notably this requires the copier to take a list of mutable references to
     * roots.) This ensures that no allocations are in the way if we need to extend the list of locals,
     * at the expense of removing an obvious point of programmer control over the timing of memory
     * management.
     */

Event timeout

Ares needs a way to kill events due to timeout.

Serf -> Mars

If #184 proceeds we will need to convert Ares from a serf to a mars.

@joemfb does the subcommand change (i.e. urbit mars ... instead of urbit serf ...)? If so this would enable a progressive implementation of Mars without interfering with Serf

Jet testing workflow

It would make sense to be able to verify jet behavior against a trusted implementation (i.e. vere). This could improve test cases and give assurance that those test cases are correct. Ideally there would be a tool to supply test inputs and generate unit tests with the expected outputs returned by vere.

Fix `%hela`, `%nara`, `%lose` hints

The %nara and %hela hints might be useful during debugging and testing for checking the state of the mean stack. However, %nara is currently unimplemented and %hela is implemented incorrectly: %hela prints the mean stack to the bottom of the nearest frame, but this is not guaranteed to be the entire mean stack for the event, nor even for the current level of virtualization.

In addition, the %lose hint would also be useful, as we currently do no pruning of the mean stack, meaning that massive traces are possible.

jam is wrong for cells

At some point in writing #60, my implementation of jam for cells became wrong (I could have sworn it was correct at some point...)

For context, +jinx-gate takes in a gate and sample and sends it to Ares, which runs the interpreter and returns a jammed noun to Vere, which Vere then attempts to +cue. I've put some printfs from Ares in curly braces.

(some 1)
[0 1]

> (jam (some 1))
201

>  (jinx-gate:f some 1)
{ares}: result  [0 1]
{ares}: jammed result  9
dojo: hoon expression failed

So basically Ares thinks the jam of [0 1] is 9 instead of 201.
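The expected value can be checked by hand with a toy jam. The sketch below is a simplified stand-in, not the Ares implementation: atoms are limited to u64, the output is limited to 128 bits, and backreferences for repeated subtrees are omitted (none occur in [0 1]). The Noun type and function names are made up for the example. It may be worth noting that 201 is 0b11001001 and 9 is 0b1001, so 9 is exactly the low four bits of the correct answer (the cell tag followed by the jam of 0); whether that means the output is being cut off after the head is only a guess.

```rust
/// Toy noun: an atom small enough for u64, or a cell of two nouns.
enum Noun {
    Atom(u64),
    Cell(Box<Noun>, Box<Noun>),
}

/// Append the low `n` bits of `value`, least-significant bit first.
fn push_bits(out: &mut Vec<bool>, value: u64, n: u32) {
    for i in 0..n {
        out.push((value >> i) & 1 == 1);
    }
}

/// Length-prefixed atom encoding ("mat").
fn mat(out: &mut Vec<bool>, a: u64) {
    if a == 0 {
        out.push(true);
        return;
    }
    let b = 64 - a.leading_zeros(); // bit length of a
    let c = 32 - b.leading_zeros(); // bit length of b
    push_bits(out, 0, c); // c zero bits
    out.push(true); // a single 1 bit ends the unary length
    push_bits(out, b as u64, c - 1); // low c-1 bits of b (its top bit is implied)
    push_bits(out, a, b); // the atom itself
}

fn jam_bits(out: &mut Vec<bool>, noun: &Noun) {
    match noun {
        Noun::Atom(a) => {
            out.push(false); // tag 0: atom
            mat(out, *a);
        }
        Noun::Cell(head, tail) => {
            out.push(true); // tag bits 1 then 0: cell
            out.push(false);
            jam_bits(out, head);
            jam_bits(out, tail);
        }
    }
}

/// Jam a noun and return the bitstream as an integer (bit 0 is emitted first).
fn jam(noun: &Noun) -> u128 {
    let mut bits = Vec::new();
    jam_bits(&mut bits, noun);
    assert!(bits.len() <= 128, "toy jam only supports short outputs");
    bits.iter()
        .enumerate()
        .fold(0u128, |acc, (i, &bit)| acc | ((bit as u128) << i))
}
```

For [0 1] the emitted bits are 1,0 (cell), 0,1 (jam of 0), 0,0,1,1 (jam of 1), which read least-significant first give 1 + 8 + 64 + 128 = 201.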

`+mook` crashes when attempting to format `%mean` hints during boot

Title says it all.

+cut jet tries to allocate size 0 indirect atom

(cut [3 [0 0] 0]) causes Ares to crash:

@philipcmonk wrote this I believe?

thread '<unnamed>' panicked at 'assertion failed: size > 0', external/crate_index__ares-0.1.0/src/noun.rs:314:9
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic
3: ares::noun::IndirectAtom::new_raw_mut
4: ares::noun::IndirectAtom::new_raw_mut_zeroed
5: ares::noun::IndirectAtom::new_raw_mut_bitslice
6: ares::jets::math::jet_cut
7: ares::interpreter::match_pre_hint
8: ares::interpreter::interpret::{{closure}}
...

treewalking interpreter: use a lightweight stack for intra-formula traversal

Presently the treewalking interpreter pushes a new 2stackz frame for every subformula. We should instead do something like what jam does and use a lightweight stack to traverse the formula tree, only pushing new frames for (non-tail) 2 and 9 evaluations.

Ares needs string interpolation on stack

Due to running Ares with assert_no_alloc enabled, we are unable to use the default Rust string interpolation because it allocates on the Rust heap. This is currently forcing us to comment out various blocks of code which would be most easily implemented with string interpolation and occasionally causing failures during regular, valid operation due to eprintln calls.

We need to overwrite the default string interpolation with a version that uses NockStack.
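One possible shape for this, sketched under assumptions: core::fmt::Write can be implemented for a caller-provided byte buffer, so write! formats into memory we already own and never touches the Rust heap for primitive arguments. Here a plain fixed-size array stands in for memory that would come from the NockStack, and the StackWriter name is hypothetical.

```rust
use core::fmt::{self, Write};

/// Formats text into a caller-provided buffer without heap allocation.
/// In Ares the buffer would presumably be carved out of the NockStack.
struct StackWriter<'a> {
    buf: &'a mut [u8],
    len: usize,
}

impl<'a> StackWriter<'a> {
    fn new(buf: &'a mut [u8]) -> Self {
        StackWriter { buf, len: 0 }
    }

    /// The text written so far. Only whole &str chunks are ever copied in,
    /// so the written prefix is always valid UTF-8.
    fn as_str(&self) -> &str {
        core::str::from_utf8(&self.buf[..self.len]).unwrap_or("")
    }
}

impl<'a> Write for StackWriter<'a> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        let end = self.len.checked_add(bytes.len()).ok_or(fmt::Error)?;
        if end > self.buf.len() {
            // Out of space: report an error instead of growing the buffer.
            return Err(fmt::Error);
        }
        self.buf[self.len..end].copy_from_slice(bytes);
        self.len = end;
        Ok(())
    }
}
```

Usage would look like write!(writer, "event {} took {}ms", num, ms), followed by handing writer.as_str() to whatever does the actual output. A write that does not fit returns an error and leaves the previously written text intact.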

Testing for memory safety

It would be great to be able to have some sort of test suite for memory/thread safety in the long term. There are many tools to approach this with, such as miri, valgrind, tsan/asan/ubsan.

I made a PR for miri #45, but it is not mergeable in its current state, since it conflicts with the way memory is currently managed (libc). However, miri performs the most exhaustive checks leading to highest assurance of correctness, so it would be wise to consider using it to test memory safety.

Miri interprets rust code, thus libc is unsupported. This means you can't use the following:

mmap
pthread_kill
Other libc features

You can work around mmap with the following:

use std::alloc::Layout;

unsafe fn allocate(layout: Layout) -> *mut u8 {
    // Under Miri, use the Rust global allocator, which Miri can track.
    #[cfg(miri)]
    return std::alloc::alloc(layout);
    // Otherwise, allocate through libc as usual.
    #[cfg(not(miri))]
    return libc::malloc(/* ... */) as *mut u8;
}

You can't work around thread killing, so you can't verify actual runtime timeout behavior.

If other libc features are used, there is likewise not much you can do.

Miri would allow you to verify behavior at the unit test level. For integration testing you would want to use some other tool, like valgrind.

In addition, for miri support you would want to correct the codebase to not have any dangling pointers being created. In the aforementioned PR I'm changing the way NockStack behaves to cast integers to pointers right before dereferencing them, instead of casting to double pointers and dereferencing them. This is a tiny change that helps miri keep track of allocations better and not reject the code.

I just wanted to bring this up as a possibility to look for in the future, because it's much better to catch safety issues before they spiral out of control.

Use rust crypto for crypto jets

In aid of #155, we should see if we can replace Urcrypt with the work of the Rust Crypto team.

Ares needs persistent caching

Vere will shortly be able to cache data across events using a persistent cache. This is not currently implemented in Ares.

`NockStack` guard page

The NockStack should have a guard page for safety and so that we can detect OOM errors during computation. Currently, there's no defense against the two stacks in 2Stackz clobbering each other.

CI: Add MacOS and Windows build flows

See this advice from @mcevoypeter

Parser jets

The current holdup between us and a dojo prompt is parser jets. These exist for vere in

https://github.com/urbit/vere/blob/45d28f9cf65c6088f3745e79172fd07d793d2169/pkg/noun/jets/e/parse.c

I've attached a trace which shows that most of the time spent in the %zest move sent by %dill to %clay is spent in the parser jets, outermost being +pfix. (Look at the end of the trace. Unfortunately at this stage we still have to trace the boot process entirely, and of course the boot process runs far more slowly when tracing because every call must be checked for tracing.)

https://drive.google.com/file/d/1eceb0VFXpORh20EsHG-ST2idqvgFjzsE/view?usp=drive_link

This trace can be loaded in Perfetto.

no_alloc debug flag

When assert_no_alloc() fires, it doesn't tell you what caused it. It would be good to have a debug flag that turns it off to find out why, instead of commenting it out like I currently do. (maybe this is already an option somehow with the crate)

codegen: emit bytecode

Once #135 is tested and merged we will have working codegen in Ares! However, the current rust-side representation is mostly just the noun representation of the IR, but with $maps replaced by HAMTs. We can do much better. With lifecycles already in place, we can create a bytecode representation which will be extremely cache-friendly, and replace linked-list iteration by array iteration. Further, it can include pointers to jets and to bytecode for called arms directly, removing HAMT lookups except at indirect callsites.

TODO: rough spec in this issue

atom -> byteslice functions pad with zeros

Even when an atom has a value that can fit in less than 8 bytes, our as_bytes() functions still return a [u8; 8], which isn't usable for some crypto operations (like ++veri:ed:crypto, for example).

It looks like we also zero-pad when we construct atoms (see IndirectAtom::new_raw_mut_bytes(), etc.). Is this necessary for an allocation reason, like if we assume all allocations are at least one 64-bit word, or something?
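For the single-word case, the number of significant bytes can be recovered from the value itself, so a caller can slice off the padding. This is only a sketch of that idea with a made-up function name; it does not address indirect atoms or callers that need a fixed width such as 32 bytes.

```rust
/// Little-endian bytes of `value` plus the count of significant bytes,
/// i.e. the length once high-order zero bytes are dropped.
fn significant_le_bytes(value: u64) -> ([u8; 8], usize) {
    let bytes = value.to_le_bytes();
    // Each full group of 8 leading zero bits is one padding byte at the high end.
    let len = 8 - (value.leading_zeros() / 8) as usize;
    (bytes, len)
}
```

A caller would then use &bytes[..len] wherever the unpadded form is required.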

2stackz: top-frame copying GC

Currently in #143 we hard-reset the stack after each event. This wipes out our hot and warm states, which are not saved to the snapshot as they contain function pointers. Thus, we have to re-initialize both after every event. Saving the state of the stack and restoring to the saved state suffices to preserve the hot state, which should not change over the course of our execution. However, it is not sufficient to save the warm state, which is reinitialized whenever the cold state changes and thus requires collection of stale entries and HAMT stems.

Instead we should implement a top-frame copying GC as follows:

ensure (assert?) that the stack is popped to the top frame, preserving hot and warm states.
Set a new allocation pointer at the opposite end of the stack arena from the current allocation pointer, causing copies to overwrite the locals area of the current top frame.
Copy hot and warm state to the allocation arena tracked by the new pointer, using a lightweight stack in memory adjacent to the old pointer.
Set up new stack and frame pointers at the now-reclaimed end of the stack arena.
Ensure pointers to hot and warm are updated in interpreter context.

The effect of this is that each event will run with the opposite-orientation frame as top each time. This should be fine, though we should audit for assumptions that "the top is always west." If the top frame has no locals and wasn't using the lightweight stack when we pushed, then this wastes no space. However if for some odd reason we did use locals or the lightweight stack in the top frame, the wasted space is now just part of the allocation area for the new "top" frame, and will be reclaimed when we switch back next time.

@joemfb @frodwith @ashelkovnykov please review this idea.

Use Rust-style doc comments

Instead of /* ... */ style comments for functions and structs, we should be using ///s, as instructed here.

HAMT `preserve()` issue for `ackermann(4, 1)`

If you clone the as/trace branch from #59 and attempt to change the call to the Ackermann function in the toddler pill from [2 1] to [4 1], Ares will crash with an error in the HAMT while trying to run preserve().

I believe it's an overflow error: the HAMT will attempt to call preserve on an IndirectAtom whose usize is returned as 16140918534541148574 words.

I tried debugging the issue, but I'm too unfamiliar with the HAMT to make any headway, and it's not integral to my work on stack traces and interrupts.

inverted peg in +plot

https://github.com/urbit/new-mars/blob/f1cfdeb8a720c39595901a22297b1761b9c5874a/docs/spec/ska/lib/degen.hoon#L204

(peg saf 2) should be (peg 2 saf) because we are pulling it from the head of the subject, not pulling the head of the subject from it.
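For reference, peg composes axes and is not commutative: (peg a b) is the axis of "b within the subtree at a". A small Rust model of the arithmetic, limited to axes that fit in u64 and without overflow checks (both simplifications are assumptions of this sketch):

```rust
/// Axis composition: the axis reached by following `a` from the root
/// and then `b` from there. Axes must be nonzero.
fn peg(a: u64, b: u64) -> u64 {
    assert!(a != 0 && b != 0, "axis 0 is invalid");
    // k = number of path bits in b (its bit length minus the leading 1).
    let k = 63 - b.leading_zeros();
    // Shift a left by k and append the path bits of b.
    (a << k) | (b & ((1u64 << k) - 1))
}
```

With this, peg(2, 3) is 5 (the tail of the head) while peg(3, 2) is 6 (the head of the tail), which is the kind of mix-up described above.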

Tracing profiler

A tracing profiler would be very useful to identify cores which still require jets, as well as other issues preventing us from getting to a full boot.

The format should match that of vere (example: https://gist.github.com/eamsden/e4134b2b140207f42dc475e533c2f712), which is readable by Google Chrome's profiler at chrome://tracing/ and records both Arvo event durations and the durations of fast-hinted computations.

The json crate could be used to write the individual trace events to the tracing output. The current head of the eamsden/demo-debug branch provides a matches() function for the cold state, which could be invoked on the core computed for a Nock 9 to see if there is a recorded fast label for that core. To properly handle tail calls, save a stack of profiler spans containing fast label and start time in each stack frame, pushing whenever a fast-labeled core is called in tail position, and initializing with the fast path of a fast-labeled core called in non-tail position. When a frame is popped, get the time, compute the duration of each span, and dump full spans to the file.

For events: the serf can write a begin event as each event starts and an end event when each ends.

Re-enable clippy unsafe doc lint and add Safety docs for unsafe functions

Right now we have the missing_safety_doc lint disabled for clippy in CI. We should re-enable it and document the safety assumptions of our unsafe functions.

OOM / bail:meme check in NockStack allocation routines

jet_rev fastpaths

@eamsden suggested a way to improve the speed of the +rev jet by splitting boz into cases: #193 (comment)

Parser jets require unit tests

Currently, attempting to boot with #130 will fail almost immediately. We need unit tests to determine where there are bugs / jet mismatches in the parser jets.

PMA: cleanup tons of warnings emitted from `cargo build`

https://mastyr-bottec.coeli.network/scratch/view/f5822?rmsg=saved

rustc 1.74.1 (a28077b28 2023-12-04)
cargo 1.74.1 (ecb9851af 2023-10-18)

MacOS (darwin) Aarch64
(edit by @eamsden)

Scry logic unit tests

We should verify the correct behaviour of the scry logic in mink with unit tests.

CI should enforce no compiler warnings

We enforce everything else, might as well enforce this as well.

`mprotect()` `SIGINT`

The current implementation of handling SIGINT signals in Ares from king sets a sentinel value which is polled on every call to Nock 2, Nock 9, and push to mean stack. If the sentinel value is set, the event bails with a non-deterministic error. If the sentinel value is already set when the user attempts to set it again, the serf process exits.

An alternative solution is to mprotect() the entire NockStack on SIGINT. The next access will hit SIGSEGV, from where we can un-protect the NockStack memory and bail with a non-deterministic error. A second call to SIGINT while the region is already mprotect()ed will kill the serf process.
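The sentinel-polling approach described first can be sketched as follows. This is a simplified model rather than the Ares code: the names are hypothetical, the handler is shown as a plain function instead of being registered for SIGINT, and clearing the flag when the poll observes it is an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Sentinel set by the signal handler and polled by the interpreter.
static INTERRUPTED: AtomicBool = AtomicBool::new(false);

/// What the handler decides to do for a given SIGINT.
#[derive(Debug, PartialEq, Eq)]
enum SigintAction {
    /// First interrupt: flag the running event so it bails at the next poll.
    FlagEvent,
    /// Flag was already set: the process should exit.
    ExitProcess,
}

/// Body of a hypothetical SIGINT handler. An atomic swap is async-signal-safe.
fn on_sigint() -> SigintAction {
    if INTERRUPTED.swap(true, Ordering::SeqCst) {
        SigintAction::ExitProcess
    } else {
        SigintAction::FlagEvent
    }
}

/// Marker for a non-deterministic bail caused by an interrupt.
#[derive(Debug, PartialEq, Eq)]
struct NonDeterministicBail;

/// Called at each polling point (Nock 2, Nock 9, push to the mean stack).
/// Assumption: observing the flag also clears it for the next event.
fn poll_interrupt() -> Result<(), NonDeterministicBail> {
    if INTERRUPTED.swap(false, Ordering::SeqCst) {
        Err(NonDeterministicBail)
    } else {
        Ok(())
    }
}
```

The cost of this design is one atomic load per polling point, which is what the mprotect() alternative would avoid.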
