olofson / audiality2

A realtime scripted modular audio engine for video games and musical applications.

Home Page: http://audiality.org/

License: zlib License

Languages: C 95.06%, Shell 0.57%, CMake 4.36%
Topics: realtime-audio, sound-engine, c, jack, sdl, sdl2, modular, music, video-games, synthesizer

audiality2's Issues

Make wtosc support longer waveforms

The wtosc unit has a 32 bit phase accumulator, so it's a trade-off between pitch accuracy and maximum waveform length.

Currently, the accumulator has 8 fractional bits, which means we can handle waveforms up to 16 Msamples. That's almost 6 minutes at 48 kHz. If we need better pitch accuracy (see #10), this is cut in half for each extra fractional bit...

We don't really want any restrictions like this! We should probably add some cute fragmentation logic, or otherwise increase the effective number of bits of the phase accumulator.

Maybe it would be acceptable to simply use a 64 bit phase accumulator? This would be emulated by the compiler on 32 bit CPUs. (At least non-ancient versions of GNU gcc will do this.)

For the lowest end platforms, we could simply use a 32 bit accumulator, and accept the resulting waveform length restriction there. One is not likely going to use tens of MBs of 48kHz 16 bit samples on those anyway.
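
One way to read the "64 bit accumulator" option, as a minimal sketch (not the actual wtosc code; all names here are illustrative): a 32:32 fixed point phase accumulator, where the upper 32 bits index the waveform and the lower 32 bits hold the fractional sample position, so waveform length and pitch accuracy no longer compete for bits. As noted above, the compiler would emulate the 64 bit arithmetic on 32 bit CPUs.

#include <stdint.h>

/* Illustrative sketch only - not the actual wtosc code. */
typedef struct {
    uint64_t phase;     /* 32:32 fixed point sample position */
    uint64_t delta;     /* 32:32 fixed point increment per output frame */
} PhaseAcc;

/* Set the playback step in waveform samples per output frame. */
static void pa_set_step(PhaseAcc *pa, double samples_per_frame)
{
    pa->delta = (uint64_t)(samples_per_frame * 4294967296.0);
}

/* Advance one output frame; returns the integer sample index and writes the
 * fractional position (for interpolation) to *fraction.
 */
static uint32_t pa_step(PhaseAcc *pa, uint32_t wave_length, double *fraction)
{
    uint32_t index = (uint32_t)(pa->phase >> 32) % wave_length;
    *fraction = (double)(uint32_t)pa->phase / 4294967296.0;
    pa->phase += pa->delta;
    return index;
}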

Get render-to-wave working

Most of the mechanisms needed for this should be there, in the form of a2_SubState(), the "stream" driver and a2_Read(). The last one is not yet implemented, but once that's in place, it should be possible to render programs into waves using only the official API.

Next, we need to figure out how to do this from scripts, without any explicit host application support whatsoever. Modules should be able to use programs to pre-render waveforms that can then be used just like the built-in waveforms in other programs. There are basically two ways of doing this:

  • Imperatively, by having the compiler set up a substate and actually render the requested waveform as the very "render wave" statement is compiled, or...
  • Declaratively, by having these "render wave" statements just declare waves (store program ID, arguments, any other parameters) as the statements are compiled, and then perform all rendering later - after the whole script module has been compiled, or perhaps even on demand, so that only waves that can actually be used ("dead program elimination?") are rendered.

This should probably be implemented in three stages:

  1. Implement it in a test application, finishing the API and adding any missing engine features as needed.
  2. Clean up and move the resulting code into a2_RenderWave(). (a2_RenderString() can probably be implemented as a trivial convenience call on top of that.)
  3. Implement support for doing this from within scripts.

wtosc pitch accuracy issues

Octaves are not always pure when they should be. 16:16 fixed point just isn't accurate enough for linear pitch...? This hits API messages as well - not just VM code with f20 immediate values!

Voice allocation/instantiation performance

The engine currently uses a pool of partially pre-initialized voices. This has the advantage of avoiding some initialization code when grabbing voices from the pool, but it also has the drawback of increasing memory and cache footprint.

Maybe it's actually faster to keep all realtime memory blocks in a single pool (as far as possible; the core engine objects have different sizes) despite the extra initialization work?

Benchmarking is needed! Results will most likely vary between platforms; in particular "PC" vs mobile devices.

a2_Kill() doesn't seem to work properly

While working on 2D positional sound effects for Kobo II (a model with per-body sound sources implemented as voices; a design now abandoned), I observed problems with leaking voices (expected) - but when I tried to kill them for testing purposes, that didn't seem to work as intended either:

  • Voices are leaking! They seem to burn CPU time as well.
  • Is KILL forgetting to detach voices?
  • Are voices actually killed at all...?

This needs to be verified and if something's truly wrong, fixed!

Remove shared state checks

Why are we checking the 'ss' field all over the place? Is it really possible to run into a state that has a NULL there without using a stale pointer?

Probably traces of ChipSound with a global static engine state... We should just remove all those checks.

Loop construct with access to counter

The scripting language needs a nice loop construct that gives the loop body access to the loop counter. Maybe start, end and step while we're at it?

Implementation: The LOOP instruction already uses a VM register as its iterator (hidden; allocated by the compiler), so we could basically just slap a variable name on that. Except it counts down to zero, of course.

MIDI file loader/player

One could implement this as a MIDI file parser that generates A2S code that's compiled into a program, but it's probably both easier and more efficient to implement it as a voice module or similar contraption that behaves like an ordinary voice controlling a number of subvoices.

However, routing is probably going to have to be a bit more sophisticated than directly controlling subvoices. Looking at the current hand scripted Audiality 2 songs, the instruments are the leaf nodes of complex trees - not just voices playing on different channels as in a traditional MIDI synth.

Unit tail handling

We need mechanisms for handling reverb tails and the like! Detached voices terminate when all their subvoices have finished, ignoring tails of the local units, which is obviously not very nice at all.

  • It would be easy enough to add a VM instruction that looks at the output - but that doesn't work when the last unit mixes directly into an external output buffer.
  • Use an envelope tracker unit? Set it up to measure the peak value over some suitable period of time, and poll the result...
  • The most efficient solution is probably to have units report their tail status via some efficient out-of-band interface.
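
As an illustration of the envelope tracker option above, a minimal sketch (not an existing Audiality 2 unit; the struct, window length and threshold handling are assumptions): track the peak absolute sample value over fixed-size windows, and report the tail as finished once the peak has stayed below a threshold for enough consecutive windows.

#include <math.h>

/* Illustrative tail detector, not an actual Audiality 2 unit. */
typedef struct {
    float  peak;            /* peak of the window being accumulated */
    float  threshold;       /* silence threshold (linear amplitude) */
    int    window;          /* window length in sample frames */
    int    pos;             /* position within the current window */
    int    quiet_windows;   /* consecutive windows below threshold */
    int    windows_needed;  /* how many quiet windows count as "done" */
} TailTracker;

static void tail_feed(TailTracker *t, const float *buf, int frames)
{
    for (int i = 0; i < frames; ++i) {
        float a = fabsf(buf[i]);
        if (a > t->peak)
            t->peak = a;
        if (++t->pos >= t->window) {
            t->quiet_windows = (t->peak < t->threshold) ?
                    t->quiet_windows + 1 : 0;
            t->peak = 0.0f;
            t->pos = 0;
        }
    }
}

static int tail_finished(const TailTracker *t)
{
    return t->quiet_windows >= t->windows_needed;
}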

'import' directive/instrument library

The A2S compiler needs an 'import' directive! Preferably, we should be able to set up an instrument/sound/song library where individual items can be loaded and compiled on demand - so maybe we should take it further, and not just compile entire .a2s files as they're imported, but rather import specific programs...?

Verify wtosc phase register functionality

Does granular synthesis (subsample accurate phase control) work properly now? Not entirely sure it ever did work correctly. Either way, I suspect that a2_OscPhase() was broken when issue #10 was fixed.

Hermite interpolating waveshaper unit(s)

Waveshaper! For anything from subtle compression via ring-modulation-like effects through clipping/distortion effects. Control points should be driven by ramped control registers for smooth, click-free modulation and morphing of the effects.

Do we do it all live or crossfade between control point sets? Can we crossfade coefficient sets...? These "optimizations" don't seem to make much sense while ramping control points, as we're only ever using two control points (or 4 coefficients) per sample (oversampled) anyway.

How many control points? We should probably provide a few versions with different numbers of control points. (Or extend the compiler and unit API so we can provide unit instantiation parameters.)
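
For the interpolation core itself, a minimal sketch (not an existing Audiality 2 unit; the control point count and names are arbitrary): Catmull-Rom (Hermite) interpolation through a small set of control points used as a transfer curve, evaluated per input sample.

/* Illustrative waveshaper core; assumes input samples in [-1, 1]. */
#define WS_POINTS 8   /* number of control points (an arbitrary choice) */

static float hermite(float y0, float y1, float y2, float y3, float t)
{
    /* Catmull-Rom spline segment between y1 and y2, 0 <= t <= 1 */
    float c1 = 0.5f * (y2 - y0);
    float c2 = y0 - 2.5f * y1 + 2.0f * y2 - 0.5f * y3;
    float c3 = 0.5f * (y3 - y0) + 1.5f * (y1 - y2);
    return ((c3 * t + c2) * t + c1) * t + y1;
}

static float waveshape(const float points[WS_POINTS], float x)
{
    /* Map x from [-1, 1] to a fractional control point index. */
    float pos = (x * 0.5f + 0.5f) * (WS_POINTS - 1);
    int i = (int)pos;
    if (i < 0)
        i = 0;
    if (i > WS_POINTS - 2)
        i = WS_POINTS - 2;
    float t = pos - i;
    /* Clamp the outer neighbours at the ends of the curve. */
    float y0 = points[i > 0 ? i - 1 : 0];
    float y3 = points[i < WS_POINTS - 2 ? i + 2 : WS_POINTS - 1];
    return hermite(y0, points[i], points[i + 1], y3, t);
}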

Interface with MIDI

The normal API is not suitable for realtime control in musical applications. You can certainly use timestamping for accurate timing, but that calls for substantial extra latency (buffering, essentially) to be reliable. This API is meant for soft realtime control of sound effects and (possibly) playback of pre-recorded events.

The proper approach is to handle it all in the engine realtime context, integrating it more directly with the internal event subsystem. (See A2_event.) This is probably best implemented in two parts:

  1. A new driver API that makes MIDI and similar protocols/APIs available to the engine. We probably want these drivers to more or less directly generate/receive A2 events.
  2. An interface between these drivers and the scripting engine. What this boils down to is probably just telling which voices to send incoming input events (e.g. MIDI) to, and some way of sending output events to an external port instead of a subvoice.

This design allows A2S modules to support MIDI and similar without explicit host application support. Basically, the typical glue code is just scripted in A2S instead of native code in the host.

Meanwhile, the driver API still allows host applications to inject their own drivers if desired, for example, to drive Audiality 2 directly from a realtime MIDI sequencer engine, or to add support for protocols and APIs not directly supported by Audiality 2.

Construct to spawn subvoice with ID from variable

The VM instruction (SPAWNR) is there, but unless I'm missing something, there's no way to use it. So, we can only spawn subvoices with hardcoded voice IDs, meaning we can't use a loop to start a number of subvoices to be controlled later.

We also need something like REGISTER '<' ... to go with this, to be able to send messages to these subvoices in a similar fashion.

MML parser

MML is a widely known and relatively effective way of entering music as text. (Unlike coding it in A2S, which wasn't really designed for that at all...)

The most basic implementation would simply translate monophonic MML into the same type of VM code we're currently using for controlling subvoices. However, that means an MML statement can control only a single subvoice, and completely occupies the host voice while playing.

We could work around that by actually spawning a subvoice for each MML statement, playing the notes on a subvoice of its own, leaving the host voice free to do other things - like playing other MML statements. This has the problem of detaching MML timing from the host voice, forcing the author to rely on explicit delay statements if the program is to do anything more than just start a bunch of MML statements and then end.

Another option would be to make the MML statement itself polyphonic and/or support actual polyphonic MML. (Syntax?) That would result in VM code being generated for all channels in one go, interleaving the code as needed, so we can inline it directly in the host voice and get synchronized timing as well as polyphony.

Fast pitch coefficient approximation for filter12

There's powf(), sin() and division going on in filter12 when changing the cutoff frequency. That's not very fast - and if we're going to run this on non-FPU hardware, it's really bad news!

We could probably come up with a polynomial approximation for the whole expression, that's faster (at least without a proper FPU) and still accurate enough for musical pitch tuned filters.
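
As a sketch of the general idea (not the actual filter12 code, and the accuracy target is an assumption): replace powf() in the linear pitch to frequency mapping with a cubic fit of 2^x over one octave plus an integer exponent shift. Whether this is accurate enough for musically tuned filters would need listening tests, and the sin() and the division would need similar treatment.

#include <math.h>

/* Illustrative sketch: approximate 2^x with a Hermite cubic fit of 2^f on
 * [0, 1] (exact at the octave boundaries, within roughly 0.1% in between)
 * plus an integer exponent adjustment via ldexpf().
 */
static float fast_exp2(float x)
{
    int ipart = (int)floorf(x);
    float f = x - (float)ipart;     /* fractional octave, 0 <= f < 1 */
    float y = 1.0f + f * (0.693147f + f * (0.227411f + f * 0.079441f));
    return ldexpf(y, ipart);        /* scale by 2^ipart */
}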

TK_FWDECL is unused and untested

TK_FWDECL appears to be implemented, but is never actually used. Do we need it, or should we just remove it? (Unused, untested code is evil...!)

Validity of integer signal processing

The situation

Let's face it: High quality integer/fixed point signal processing is one major pain in certain regions to implement! Any non-trivial operation becomes a nightmare of juggling bits, weighing headroom against resolution through every single step.

Not only that, but it's already been quite a few years since floating point started turning out to be faster than fixed point more often than not on desktop/workstation CPUs, and with the latest advancements, floating point SIMD instructions and all, that's probably even more true now.

However, what about handheld devices and these new low end consoles? It seems that only the latest generation of ARM CPUs have floating point hardware, and how fast are they, actually? What about Atom and other low-power x86 compatibles? When will handheld devices arrive at the point where floating point is as fast as, or faster than fixed point?

Another aspect is that Audiality 2 may not be the best option for low powered devices anyway, regardless of how the processing is done. After all, it's a modular synthesizer with sub-sample accurate scripting! A simpler, more traditional sound engine would most likely be faster, and probably easier to use as well.

Then again, if you already have an application written towards Audiality 2, porting it could be a (relatively) simple matter of converting the scripts to use more pre-rendering and/or mp3/ogg streaming, and just leaving the application code as is.

What we need to decide

Is there any point in bothering with integer processing at all? Do any viable target platforms have poor enough floating point performance that it makes sense to keep some integer support?

A possible solution

If there is any point in doing integer processing at all, what we could do is restrict the integer support to only a small core set of voice units. This way, we have what we need to run appropriately downscaled Audiality 2 projects, without implementing integer versions of the more complex (and most likely too CPU intensive) units.

Of course, if we decide to drop integer processing altogether, then that's all there is to it. Simply don't use more and heavier units than your target platform can handle - as always!

Voice VM timing issue

There is a problem somewhere, causing the VM to frequently wake up (approximately?) one sample frame too early after a DELAY instruction. When forcing the phase of an oscillator, this looks like the output stream occasionally drops one sample frame.

It might be some kind of edge case issue with the fractional sample timing, but since it doesn't happen at any exact fractional value (like 255, for example), it seems more likely that there's a problem with the buffer splitting logic; a '>=' instead of a '>' or something like that. Or are we perhaps rounding time to sample frames in the wrong direction somewhere?

DC blocker filter unit

Add a DC-blocker highpass filter with a very low cutoff frequency. Should be a reasonably steep IIR/SVF filter, to avoid latency in the pass band.

Cutoff should probably be configurable (default 10-15 Hz or so), but no control ramping/smoothing or anything; you're supposed to use "real" filters for that.
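
A minimal starting point (not the unit spec; the cutoff-to-coefficient mapping is an approximation) is the textbook first-order DC blocker, y[n] = x[n] - x[n-1] + R*y[n-1]. A single stage is only 6 dB/octave, so a "reasonably steep" version would cascade a few of these or use a higher order highpass.

/* Textbook one-pole DC blocker sketch; assumes cutoff_hz well below fs. */
typedef struct {
    float R;        /* pole radius; set from cutoff and sample rate */
    float x1, y1;   /* previous input and output samples */
} DCBlock;

static void dcblock_init(DCBlock *d, float cutoff_hz, float fs)
{
    /* Approximate mapping from cutoff to pole radius. */
    d->R = 1.0f - 6.2831853f * cutoff_hz / fs;
    d->x1 = d->y1 = 0.0f;
}

static void dcblock_process(DCBlock *d, float *buf, int frames)
{
    for (int i = 0; i < frames; ++i) {
        float y = buf[i] - d->x1 + d->R * d->y1;
        d->x1 = buf[i];
        d->y1 = y;
        buf[i] = y;
    }
}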

Streaming oscillator unit

We need a streaming oscillator unit, allowing us to a2_Write() (see #8 and #34) to stream audio into a realtime graph, modulating and processing it similarly to how an uploaded wave is played back by wtosc.

The oscillator should have a control register that reads back the current buffer status, so scripts can deal with stream drop-outs, sample rate mismatches and the like.

One specific use case we might want to add explicit support for is streaming from other A2 engine contexts running in lower priority threads. This can reduce latency and CPU load issues by allowing additional CPU cores to offload work from the main engine thread. Like an off-line rendering context for rendering waves, a low priority engine state like this can share all "static" data (programs, waves etc) with the realtime state, so it's essentially just a partition of the graph.

Command line player

A proper command line player would be handy. 'a2play' or something. Needs to have the usual options for driver selection and configuration. We also need options for starting specific programs with arguments.

Realtime error messages

We don't want to do string processing and stuff in the realtime context, and certainly not print messages to stderr or similar! However, just setting some state global 32 bit error code when something goes wrong isn't very helpful.

For example, wtosc will just silently fail and switch off if you give it a wave it cannot play. (Control register write callbacks have no error handling, as most of them can't fail anyway, and we want minimal main path overhead.)

There should be some kind of log where the engine core as well as voice units can send more specific error messages. A message should contain at least a source object handle and an error code. We could also add secondary object handles, voice unit descriptors etc, arbitrary values, pointers to static strings etc.

Messages and message classes should preferably be simple and well defined, so application code can actually use them, rather than just throwing incomprehensible error messages at the user.
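
A minimal sketch of what such a log could look like (none of these names exist in Audiality 2; the field layout and sizes are assumptions): fixed-size records pushed into a preallocated single-writer/single-reader ring, so the realtime context never allocates or formats strings, and the API side drains the ring and does the pretty printing.

#include <stdint.h>

/* Illustrative sketch; a production version would use proper atomics with
 * acquire/release ordering rather than 'volatile'.
 */
typedef struct {
    uint32_t source_handle;   /* object that raised the error */
    uint32_t error_code;      /* engine error code */
    uint32_t aux_handle;      /* optional secondary object, or 0 */
    int32_t  value;           /* optional arbitrary value */
} ErrorRecord;

#define LOG_SIZE 256          /* must be a power of two */

typedef struct {
    ErrorRecord records[LOG_SIZE];
    volatile uint32_t write_pos;  /* advanced only by the realtime context */
    volatile uint32_t read_pos;   /* advanced only by the API context */
} ErrorLog;

/* Realtime context: push a record, dropping it if the log is full. */
static int log_push(ErrorLog *log, const ErrorRecord *r)
{
    if (log->write_pos - log->read_pos >= LOG_SIZE)
        return 0;
    log->records[log->write_pos & (LOG_SIZE - 1)] = *r;
    ++log->write_pos;
    return 1;
}

/* API context: pop a record; returns 1 if one was read. */
static int log_pop(ErrorLog *log, ErrorRecord *r)
{
    if (log->read_pos == log->write_pos)
        return 0;
    *r = log->records[log->read_pos & (LOG_SIZE - 1)];
    ++log->read_pos;
    return 1;
}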

Clean up the object property interface

Currently, object properties are addressed using a single enumeration of integer names, covering the properties of all object types. We're sticking with the single enumeration/integer name idea, but we need to extend the interface to support applications and language bindings better. We need calls to check what properties an object actually has, and a call to translate an integer property name into a string.

Should we desire to add support for run-time defined properties (for example, A2S programs defining custom properties), we can easily add that on top of this interface by adding a state global property registry.

There is a more serious problem, though: Synchronization! As it is, we'll have to hold the engine lock to touch any interesting properties, and that's something you should NEVER do during normal operation. Everything needs to be lock free to ensure glitch free operation on all platforms.

There are a few approaches:

  1. Lock free techniques.
    This gets very complicated with objects that may be destroyed by the realtime context at any time.
  2. Caching.
    Quickly becomes extremely expensive! We can't know ahead of time which properties of what objects an application is going to read, or how often, so we'd essentially have to send them all to the API side after each realtime engine cycle...
  3. Realtime-only API.
    Keep the API, but allow calls only from the context of the realtime engine. Thus, any application code dealing directly with properties will have to be moved into one or more callbacks, moving the synchronization problem to the application side.
  4. Request/response events.
    Use the existing API/engine message queues to pass batch requests for property reads and writes. The actual operations would be performed in the realtime context, and values and errors would be sent back to the API as messages.

Alternative 1 would be really hard to get right, and 2 is simply not viable for performance reasons. (We can have thousands of objects, most of which will have several readable properties!)

Alternative 3 might be of interest if there's a case for realtime application code that needs to access properties. Can't really think of any valid reasons to do that at this point, though. It should be simple enough to add such an interface if needed, as we'll probably need to implement something much like that internally anyway.

So, it looks like 4 - request/response events. It's probably the only proper way of doing it, at least on mainstream operating systems (an RTOS with tools for dealing with priority inversion is another matter), and the engine is already operated like that during normal operation, so it makes perfect sense to build on that.
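
A minimal sketch of what a batched request could look like under alternative 4 (nothing here is actual Audiality 2 API; struct layout and batch size are assumptions): the API side posts a batch of property reads, the realtime context fills in values or error codes, and the same batch goes back on the response queue, so no lock is ever held across contexts.

#include <stdint.h>

/* Hypothetical message layout for request/response property reads. */
typedef struct {
    uint32_t object_handle;   /* which object */
    uint32_t property;        /* integer property name */
    int32_t  value;           /* filled in by the realtime context */
    int32_t  error;           /* 0, or an error code if the read failed */
} PropRequest;

typedef struct {
    uint32_t request_id;      /* so the API side can match the response */
    uint32_t count;           /* number of valid items (<= 16) */
    PropRequest items[16];    /* small fixed batch to keep messages bounded */
} PropBatch;

/* Realtime side: resolve a batch in place. 'read_property' stands in for
 * whatever internal lookup the engine already does under the lock today.
 */
static void resolve_batch(PropBatch *b,
        int (*read_property)(uint32_t handle, uint32_t prop, int32_t *out))
{
    for (uint32_t i = 0; i < b->count; ++i)
        b->items[i].error = read_property(b->items[i].object_handle,
                b->items[i].property, &b->items[i].value) ? 0 : -1;
}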

Realtime safe fbdelay unit

The 'fbdelay' unit currently uses calloc() to allocate and clear the delay buffers, which obviously is not realtime safe on normal operating systems! This needs to be fixed one way or another - probably by using the features that'll be provided by #16.

While we're at it, it would be nice to have ramped controls, so we can "morph" the delay effect without artifacts. This is probably best implemented using a chain of smaller buffers, so we can add and remove buffers as needed, rather than reallocating and copying.

I'm not a big fan of the idea of setting a maximum delay on initialization for buffer pre-allocation... In this engine, we need to perform the full unit instantiation in the realtime context anyway, making this hack seem even sillier.

Subtle pitch/period inaccuracies

After fixing #43, we still have subtle artifacts when running a nasty, unforgiving test such as this:

// Oscillator phase/timing test. This is supposed to sound exactly like
// a plain sine wave oscillator, with no audible artifacts whatsoever.
PhaseTest(P V=1)
{
        struct { wtosc, panmix }
        !per p2d P
        w sine, a V
        for {
                p P, phase 0, d per
        }
.rel    a 0, d per
        1() { force rel }
}

It turns out we simply don't have enough fractional bits in the VM registers to express audio frequency periods in milliseconds, as we do here.

For example, pitch 26n is 1174.652832 Hz, or 0.851315361235174 ms per period, which translates to 55792 (with correct rounding) in the register - and that's just not accurate enough to avoid audible artifacts in extreme cases like this.

We can add correct rounding in some places to improve matters slightly, but it's still not really good enough for this kind of (ab)use. We could probably "solve" this by switching to some other unit than milliseconds, or some odd fixed point format like 12:20, but considering #28, we'll just fix the rounding errors and leave it at that for now.
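
To make the quantization step concrete, a short standalone computation (assuming the 16:16 milliseconds representation described above) that reproduces the register value from the issue and the frequency implied by it:

#include <math.h>
#include <stdio.h>

/* Reproduces the numbers from the issue: a 16:16 fixed point register holding
 * a period in milliseconds quantizes 1174.652832 Hz to roughly 1174.649 Hz.
 */
int main(void)
{
    double f = 1174.652832;                 /* pitch 26n, per the issue */
    double period_ms = 1000.0 / f;          /* 0.851315... ms */
    int reg = (int)floor(period_ms * 65536.0 + 0.5);  /* 55792 */
    double actual_ms = reg / 65536.0;
    printf("register %d -> period %.9f ms -> %.6f Hz (error %.4f Hz)\n",
            reg, actual_ms, 1000.0 / actual_ms, f - 1000.0 / actual_ms);
    return 0;
}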

Buffered alternative to a2_Tap()

In native (C, C++ etc) applications, you can implement buffered streaming around a custom callback, but obviously, you can't do that in a scripting language that cannot be called from the engine realtime context! Rather than forcing language bindings to implement this on their own, we should provide a suitable API.

Suggestion:

  • Add a call a2_BufferedTap(), which returns a handle that can be used for reading buffered output from an xinsert plugin in "a2_Tap() mode."
  • Use an SFIFO for a2_BufferedTap()? That's the "tightest" interface possible timing wise, and we need specific, user configurable buffer size restrictions anyway...
  • We could/should probably use a2_Read() (see #8) for reading the stream.
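
A minimal sketch of the SFIFO idea (not actual Audiality 2 API; the engine's own SFIFO may look different, and a production version would use atomics rather than 'volatile'): the realtime xinsert callback writes sample frames into a preallocated single-writer/single-reader ring, and the application drains it from the API side. Overflow simply drops frames and is counted, so the realtime context never blocks.

#include <stdint.h>

typedef struct {
    float    *buf;         /* preallocated; capacity is a power of two */
    uint32_t  size;        /* capacity in frames */
    volatile uint32_t wpos;
    volatile uint32_t rpos;
    uint32_t  overflows;   /* frames dropped due to a full buffer */
} TapFIFO;

/* Realtime side: write as many frames as fit, count the rest as dropped. */
static void tap_write(TapFIFO *f, const float *frames, uint32_t count)
{
    uint32_t space = f->size - (f->wpos - f->rpos);
    if (count > space) {
        f->overflows += count - space;
        count = space;
    }
    for (uint32_t i = 0; i < count; ++i)
        f->buf[(f->wpos + i) & (f->size - 1)] = frames[i];
    f->wpos += count;
}

/* API side: read up to 'count' frames; returns the number actually read. */
static uint32_t tap_read(TapFIFO *f, float *frames, uint32_t count)
{
    uint32_t avail = f->wpos - f->rpos;
    if (count > avail)
        count = avail;
    for (uint32_t i = 0; i < count; ++i)
        frames[i] = f->buf[(f->rpos + i) & (f->size - 1)];
    f->rpos += count;
    return count;
}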

Failure in a2_InitWaves() results in double free crash

Granted, if a2_InitWaves() fails in any version so far, you're probably in quite serious trouble anyway. (OOM would be the only valid reason.) However, there aren't supposed to be crashes after handled errors!

More importantly, this accidentally discovered crash could be an indication of some bug in the init/cleanup logic that might impact application code.

'notebase' directive

Base-10 isn't all that practical for music based on twelve-tone scales. Some say 12 tone scales are weird as well. So, how about a directive that lets you set the number of notes per octave and/or the numeric base of the note numbers?

Problems:

  • 'a', 'b' etc can't start a numeric literal! However, we can put a '0' before those - which kind of goes nicely with the other figures needed outside octave 0 anyway!

One bit/PWM noise/pulse generator unit

Essentially a DC generator that flips the output sign bit at random intervals, generating full amplitude noise in a specified frequency range.

It would probably be nice and handy to have the usual 'p' (linear pitch) and 'a' (amplitude) registers (like wtosc), along with 'low' and 'high' registers which specify the longest and shortest periods allowed in terms of linear pitch offsets from 'p'. Setting 'low' and 'high' to 0 would result in a square wave at pitch 'p'.

An amplitude randomness control would probably be useful too, allowing the oscillator to use random output levels. Setting this to 1 should result in only -a and a levels, setting it to 0 would result in completely random levels, and 0.5 would result in levels from the two ranges [-a, -.5a] and [.5a, a].

Any other interesting controls that make sense to build into the unit?

It would be nice to implement subsample accurate timing for accurate handling of high frequencies, but doing that well is actually rather complicated and expensive. Is it worth the effort? Maybe simple interpolation around the switch points is sufficient? We're not trying to generate crystal clear square waves here, after all...
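
A minimal sketch of the core loop as I read the proposal (not actual Audiality 2 code; the register semantics and pitch-to-period mapping are assumptions): output a constant level that flips sign after a randomly chosen number of frames, bounded by the shortest and longest allowed half periods. With min_period == max_period this degenerates to a plain square wave, matching the 'low' == 'high' == 0 case.

#include <stdlib.h>

/* Illustrative sketch; a real realtime unit would use its own tiny PRNG
 * rather than rand(), and would derive the min/max half periods from the
 * 'p', 'low' and 'high' registers.
 */
typedef struct {
    float sign;         /* current output sign, +1 or -1 */
    float a;            /* amplitude */
    int   min_period;   /* shortest allowed half period, in frames */
    int   max_period;   /* longest allowed half period, in frames */
    int   countdown;    /* frames left until the next flip */
} BitNoise;

static void bitnoise_process(BitNoise *n, float *out, int frames)
{
    for (int i = 0; i < frames; ++i) {
        if (--n->countdown <= 0) {
            n->sign = -n->sign;
            n->countdown = n->min_period +
                    rand() % (n->max_period - n->min_period + 1);
        }
        out[i] = n->sign * n->a;
    }
}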

Realtime safe DEBUG VM instruction

The DEBUG VM instructions should send messages through the API response FIFO, instead of making nasty system calls! It's "only" for debugging, but if you do that on a system configured for realtime audio, you get kicked sooner or later...

Realtime memory manager

Some units need a bit of memory for internal state, and although we could brutalize most things to deal in blocks of 256 bytes (A2_MAXFRAG samples), it would be real nice to have a slightly more malloc()-like system that's still realtime safe.

  • Power-of-two size bins?
  • Max size 65536 or so...?
  • Certain API calls check memory manager status and send new blocks as needed.
  • Largest bin found empty in RT context --> malloc()! (If this happens, you haven't configured the engine properly - but this gives you a fair chance of getting away with nothing but a warning in the log.)
  • The VM is already wired to the system driver memory manager calls... So we just implement it there, and leave the current (fast) pools alone?
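
A minimal sketch of the binned freelist idea from the list above (not the actual Audiality 2 memory manager; names, sizes and the refill protocol are assumptions, and the API-side refill coordination is omitted):

#include <stdlib.h>

/* Illustrative sketch; assumes size <= (1 << MAX_SHIFT). */
#define MIN_SHIFT 5              /* smallest block: 32 bytes */
#define MAX_SHIFT 16             /* largest block: 65536 bytes */
#define NUM_BINS  (MAX_SHIFT - MIN_SHIFT + 1)

typedef struct FreeBlock {
    struct FreeBlock *next;
} FreeBlock;

typedef struct {
    FreeBlock *bins[NUM_BINS];   /* freelists, one per power-of-two size */
    int        rt_mallocs;       /* emergency allocations; warn if nonzero */
} RTMemory;

static int size_to_bin(size_t size)
{
    int bin = 0;
    while (((size_t)1 << (bin + MIN_SHIFT)) < size)
        ++bin;
    return bin;
}

/* Realtime context: grab a block, falling back to malloc() as a last resort
 * (the "you haven't configured the engine properly" case above).
 */
static void *rt_alloc(RTMemory *m, size_t size)
{
    int bin = size_to_bin(size);
    FreeBlock *b = m->bins[bin];
    if (!b) {
        ++m->rt_mallocs;         /* should be reported via the engine log */
        return malloc((size_t)1 << (bin + MIN_SHIFT));
    }
    m->bins[bin] = b->next;
    return b;
}

/* Realtime context: return a block to its bin. */
static void rt_free(RTMemory *m, void *block, size_t size)
{
    FreeBlock *b = (FreeBlock *)block;
    int bin = size_to_bin(size);
    b->next = m->bins[bin];
    m->bins[bin] = b;
}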

Recursive calls can crash the VM

Apparently, recursive calls can crash the VM! Not quite sure why, or what happens, as all I get is JACK weirdness. The expected behavior would be that the engine runs out of memory or freezes after a while, but something else seems to happen.

Remove the 'run' statement

Though it may seem like a neat idea at first when thinking about A2S as a music language (which it was not originally meant to be), it's fundamentally broken in the context of Audiality 2 for several reasons.

  1. Tails! If a program hangs around to handle envelopes, reverb tails and the like, 'run' will wait for that too, rendering it useless for musical timing. So, we'd have to add some way for voices to tell their parents when they were actually supposed to terminate, even if they don't...
  2. Control flow reversion! To implement 'run' with sample accurate timing, we'd have to change the engine internals to allow voices to abort processing before finishing their fragments, somehow returning that information back up to the voice executing the 'run', so it can wake up its VM.
  3. Control flow reversion - again! The problem with 2 is that as it is, it could only work for inlined subvoices (literally running inside an 'inline' unit), since normal subvoices are processed for the full fragment after the parent voice is all done processing - and at that point, it's obviously too late to tell the parent voice about any early terminations.

So, although it would (unless I'm missing something) be theoretically possible to implement 'run' properly, it's complicated and potentially impacts performance. And all this for a feature that's typically used when one just can't be arsed to keep track of the duration of clips/patterns when handcoding a song...? Nope.

6 dB/octave filter unit

6 dB/octave state variable filters are very low cost, and still quite useful for tweaking the timbre a bit - like a very cheap EQ.

It should have per-band gain controls instead of the typical filter type selector, similar to how filter12 works, to make it usable as an actual EQ. (A traditional filter needs additional units and manual wiring to do that.)
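
One way to read the per-band-gain idea, as a minimal sketch (not an existing Audiality 2 unit; the coefficient mapping is a rough approximation): a first-order lowpass splits the signal into low and high bands, which are then summed with independent gains.

typedef struct {
    float f;       /* lowpass coefficient, 0 < f <= 1 */
    float lp;      /* lowpass state */
    float lgain;   /* low band gain */
    float hgain;   /* high band gain */
} Filter6;

static void filter6_set_cutoff(Filter6 *flt, float cutoff_hz, float fs)
{
    /* Simple approximate mapping; good enough well below Nyquist. */
    float f = 6.2831853f * cutoff_hz / fs;
    flt->f = f > 1.0f ? 1.0f : f;
}

static void filter6_process(Filter6 *flt, float *buf, int frames)
{
    for (int i = 0; i < frames; ++i) {
        flt->lp += flt->f * (buf[i] - flt->lp);      /* low band */
        float hp = buf[i] - flt->lp;                 /* high band */
        buf[i] = flt->lgain * flt->lp + flt->hgain * hp;
    }
}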

Raw audio read/write/seek interface

We need a generic interface for reading from and writing to waves, streams and the like, supporting both off-line and realtime objects.

An a2_Read() call is planned but not implemented. (See #8.)

There already is an a2_Write(). It's called a2_WaveWrite(), works only for waves, and also incorporates an annoying 'offset' argument... Turn this into a generic a2_Write() call! It'll have to be made aware of additional object types eventually.

Remove the 'offset' argument from a2_(Wave)Write() and add a function a2_Seek() instead. Waves and other objects will have to maintain an internal stream pointer or similar where applicable.

We should probably also add calls for reading back the stream pointer, and for getting the object size or stream buffer status.

Unit API cleanup and tweaks

There are a few quick hacks in the voice unit API that might need cleaning up or tweaking. These things should be sorted out ASAP, before too much unit code relies on them.

  • Register write callbacks are getting integer 'frames' (ramp duration) arguments, whereas timing is generally subsample accurate. The wtosc unit already implements subsample accurate timing when writing the phase register.
  • Register write callbacks get an A2_vmstate pointer. Most callbacks don't need it, and those that do would probably be better served by private pointers directly to the relevant fields. We can set those up in the Initialize() callback when needed instead.
  • The A2_vmstate timer field isn't very nice to deal with... It counts "backwards," as it actually is a delay timer - not a current time counter. Also, now that it actually contains the right value, there should be no need for units to '& 0xff' to get the fractional part only.

wtosc should check the length of waves

As it is, wtosc has an unsigned 24:8 phase accumulator, so it's limited to 16 Msamples. (Unless I'm missing something. The mipmapping logic should be checked for further restrictions!) However, there's no check when selecting waves, so there will be trouble if someone hands it a wave that's too long.

If we don't remove the length restriction, we should at least have wtosc refuse to deal with waves it can't handle.

Implement A2_NORMALIZE, A2_XFADE and A2_REVMIX

Implement these wave upload post processing features as outlined in wave.h:

  • A2_NORMALIZE - Normalize ("maximize") amplitude in the conversion.
  • A2_XFADE - Crossfade mix a copy offset by half the loop length.
  • A2_REVMIX - Mix wave with a reversed version of itself.

Forget about A2_WRAP, since we can't do that with the current API. I was probably thinking of some variant where you defined the length explicitly - but now it's defined automatically by the amount of data uploaded.
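
Minimal plain-buffer sketches of the three operations, following the one-line descriptions above (not the actual upload code, which would operate on the engine's own wave formats; the 50/50 mixing for A2_XFADE and A2_REVMIX is the simplest reading, and the real fade law may differ):

#include <math.h>

/* A2_NORMALIZE: scale so the peak hits full amplitude (1.0). */
static void wave_normalize(float *w, int len)
{
    float peak = 0.0f;
    for (int i = 0; i < len; ++i)
        if (fabsf(w[i]) > peak)
            peak = fabsf(w[i]);
    if (peak > 0.0f)
        for (int i = 0; i < len; ++i)
            w[i] /= peak;
}

/* A2_XFADE: mix 50/50 with a copy offset by half the loop length.
 * (Assumes even length; pairs mix to the same value, so this is in-place safe.)
 */
static void wave_xfade(float *w, int len)
{
    int half = len / 2;
    for (int i = 0; i < half; ++i)
        w[i] = w[i + half] = 0.5f * (w[i] + w[i + half]);
}

/* A2_REVMIX: mix the wave 50/50 with a reversed copy of itself. */
static void wave_revmix(float *w, int len)
{
    for (int i = 0; i < len / 2; ++i) {
        float mixed = 0.5f * (w[i] + w[len - 1 - i]);
        w[i] = w[len - 1 - i] = mixed;
    }
}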

Effect routing

The strict tree structure of the DSP graph is nice, simple and handy in many ways, but it becomes a PITA when dealing with "send" effects in the typical MIDI/studio mixer sense.

As it is, you can only send audio up the tree (towards the root/master output, that is), which means the only way we can actually send audio to effect units (reverbs, choruses etc) is to pass it along with our output using additional channels. That actually seems rather neat in theory, but it requires support in the A2S language, and a bit of logic to avoid wasting cycles on send buffers that are just to be passed along up the tree.

Another option would be to introduce a way of sending audio to arbitrary voices in the graph. However, that violates the tree graph design, and from the processing order/dependency point of view, this has the same problems as arbitrary processing graphs. This adds complexity and overhead, and we also need to deal with the loop issue in some well defined way - probably by prohibiting loops altogether. Also, how would we even address "remote" nodes from within scripts? (Remember: The voices are not static objects! We can't declare "insert points" in programs, as there can be any number of instances (including 0) of any program running in various places of the graph.)

Oversampling filter12

Currently, the filter12 unit clamps the cutoff frequency to Nyquist/4 to keep the filter from going unstable. A more proper, better sounding solution would be to oversample the filter, so we can sweep it all the way to Nyquist/2. (Beyond that point, we can basically just bypass the filter, as it should theoretically have no effect on the signal.)

However, although it's a simple filter, it's still 7 shifts, 6 multiplications and 6 adds/subs per sample. And even a simple mobile device project would likely use a fair number of these filters. Would it be worth the effort to have the filter oversample only as needed? Can we switch oversampling depth on the fly without artifacts?

Simple but somewhat ugly hack: Implement the oversampling version as a new unit, so programs can select it explicitly.

Subvoice spawn timing is way off

When a voice spawns a subvoice, the subvoice is supposed to start at the exact time at which the spawn instruction was executed. That is, the new voice should be started in a sleeping state, set to wake up at the corresponding sample frame of the spawn instruction, with the same fractional time at which the spawn instruction was executed.

This, however, is not exactly what's happening, to say the least. It sounds like timings are off by many sample frames. They're possibly quantized to the first sample frame of the fragment, though I don't think it sounds quite like that... One should have a look at the output waveforms to get a better idea of what's happening here.

Possible subvoice/event timing related heisenbug

When testing after fixing the subvoice and event timing issues, there were some glitches at one occasion. Occasional missed events and/or spawns, and various other weirdness. Kobo II Trance seemed to provoke a lot of these at first, but it seems rather random and/or dependent on prior activities...

However, now I can't seem to repeat any of this. It's all rock solid! >:-/

(Related to the fix for #44.)

Separate VM instruction set for unit control registers

As of now, the VM treats all registers the same way. To avoid unit callbacks for every single control register modification, and also to make duration info (provided by the next timing instruction executed) available to the callbacks, there is a piece of logic known as A2_regtracker, which keeps track of registers that have been written to, and does the actual callbacks all in one go when a timing instruction is issued.

NOTE: The 'set' instruction bypasses this and directly performs the control register callback, whereas any normal control register writes are actually performed by the next timing instruction!

In order to speed up register operations as well as enabling various higher level math optimizations (compound operation instructions etc), the A2_regtracker logic needs to be moved into the compiler, and the control register logic should be removed from the affected VM instructions. For this to work, we need to add timing instructions that can apply control register changes as specified by the compiler.

There is a hairy problem with this, though: Conditional code! It's perfectly possible to write code that only conditionally writes control registers before the next timing instruction. To deal with this, we'll have to either keep A2_regtracker for control registers, or have the compiler duplicate the code after the branches so it can issue different timing instructions. The latter sounds like a big can of worms to open...

Implement manual voice unit wiring

Autowiring is just a simple hack to avoid coding explicit wiring into every single program, even when it's just a matter of a simple chain or a straight bank of oscillators. Manual wiring is where a synth truly becomes modular in the (virtual) analog sense. So, this is a priority. This is where it gets interesting!

Autowiring is currently dealing only in entire "buses" rather than individual channels. (They're not actually buses in the audio mixer sense. We're just abusing the A2_bus type, since it's there, with calls and a realtime pool to go with it.)

To wire individual voices, we're going to have to manage per-voice arrays. The actual buffers can still be recycled, though! It probably gets more complicated to get it right, though...

Exceptions/performance hacks:

  • A single input or output can always be pointed directly at the intended bus channel.
  • Multiple channels need their own buses, except in the special case that they're wired to a contiguous range of channels in the right order. Worth the effort to test for?

Control register readback support

Most voice units have one or more control registers that may change their value automatically. Usually, this would be the core control ramping feature (which is obviously predictable, as it's dictated by engine code rather than unit code), but there are special cases, such as the wtosc phase register, which keeps rolling whenever a wave or noise is playing.

As it is, the actual values of these registers cannot be read back by the VM. What you get is the last value written. This isn't much of an issue with control ramping, as ramping is started from whatever value the register actually is at anyway, but for example, reading the aforementioned wtosc phase register is absolutely pointless, and reading a ramping register from inside a message handler isn't much use either.

(There was also a comment on the FORCE instruction; "Have units read back their states into their control registers!" Not quite sure what that was about... Who needs that, unless they're actually reading those registers?)

The implementation would probably come down to another (optional) callback for each control register. Like the write callbacks, it should have access to the current fractional time, so control "signals" can be read with subsample accurate timing.

On the VM side, along the lines of #46, we should perhaps add special "first read in fragment" instructions that perform these callbacks, caching the value in the internal VM register for any subsequent reads before the next timing instruction. Since control register allocation is fixed at compile time, the compiler should probably just use the normal instructions for registers that don't provide readback callbacks.

Narrowband noise generator unit

Similar to #36, but instead of just dropping fixed or somewhat random values, this one should "morph" from point to point using Hermite interpolation or similar. We want the interpolation to be at least cubic, so that each output segment starts and ends with a second derivative of 0. That is, setting the bandwidth to 0 should result in something rather similar to a sine wave.

(Note: If you actually want it sine wave based, you're probably better off just modulating a wtosc, the way I've been doing explosions and rumble in Kobo II before resonant filters.)

Unstable filter12

The filter12 unit explodes if the cutoff is set too high! The cutoff is clamped to keep the filter stable, but this is no longer working for some reason.

(Oversampling comment moved to feature #27.)
