
BBQueue

Documentation

BBQueue, short for "BipBuffer Queue", is a single-producer, single-consumer, lockless, no_std, thread-safe queue based on BipBuffers. For more info on the design of the lock-free algorithm used by bbqueue, see this blog post.

For a 90-minute guided tour of BBQueue, you can also view this guide on YouTube.

BBQueue is designed (primarily) to be a First-In, First-Out queue for use with DMA on embedded systems.

While Circular/Ring Buffers allow you to send data between two threads (or from an interrupt to main code), you must push the data one piece at a time. With BBQueue, you instead are granted a block of contiguous memory, which can be filled (or emptied) by a DMA engine.
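The contiguous-grant behavior falls out of the bip buffer's index arithmetic. A minimal illustrative model in plain Rust (an assumption-level sketch, not bbqueue's actual internals):

```rust
/// Illustrative model of a bip buffer's grant logic (not bbqueue's real
/// code): given reader/writer indices, find a contiguous writable region
/// of `wanted` bytes, wrapping to the start when the tail is too small.
/// Returns the half-open range that could be handed to e.g. a DMA engine.
fn writable_region(read: usize, write: usize, cap: usize, wanted: usize) -> Option<(usize, usize)> {
    if write >= read {
        if cap - write >= wanted {
            Some((write, write + wanted)) // fits in the tail
        } else if read > wanted {
            Some((0, wanted)) // wrap: skip the tail
        } else {
            None // no contiguous region large enough
        }
    } else {
        // Writer already wrapped: free space lies between write and read.
        if read - write > wanted {
            Some((write, write + wanted))
        } else {
            None
        }
    }
}
```

When the tail is skipped, real implementations also record the skip point (bbqueue's `last` index) so the reader knows where to wrap.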

Local usage

use bbqueue::BBBuffer;

// Create a buffer with six elements
let bb: BBBuffer<6> = BBBuffer::new();
let (mut prod, mut cons) = bb.try_split().unwrap();

// Request space for one byte
let mut wgr = prod.grant_exact(1).unwrap();

// Set the data
wgr[0] = 123;

assert_eq!(wgr.len(), 1);

// Make the data ready for consuming
wgr.commit(1);

// Read all available bytes
let rgr = cons.read().unwrap();

assert_eq!(rgr[0], 123);

// Release the space for later writes
rgr.release(1);

Static usage

use bbqueue::BBBuffer;

// Create a buffer with six elements
static BB: BBBuffer<6> = BBBuffer::new();

fn main() {
    // Split the bbqueue into producer and consumer halves.
    // These halves can be sent to different threads or to
    // an interrupt handler for thread safe SPSC usage
    let (mut prod, mut cons) = BB.try_split().unwrap();

    // Request space for one byte
    let mut wgr = prod.grant_exact(1).unwrap();

    // Set the data
    wgr[0] = 123;

    assert_eq!(wgr.len(), 1);

    // Make the data ready for consuming
    wgr.commit(1);

    // Read all available bytes
    let rgr = cons.read().unwrap();

    assert_eq!(rgr[0], 123);

    // Release the space for later writes
    rgr.release(1);

    // The buffer cannot be split twice
    assert!(BB.try_split().is_err());
}

The bbqueue crate is located in core/, and tests are located in bbqtest/.

Features

By default, BBQueue uses atomic operations, which are available on most platforms. However, on some (mostly embedded) platforms atomic support is limited, and with the default features you will get a compiler error about missing atomic methods.
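One of the doctests quoted later on this page gates on a `thumbv6` Cargo feature (`#[cfg(not(feature = "thumbv6"))]`), which suggests the escape hatch looks roughly like this; the exact feature set for your bbqueue version is an assumption to verify against the crate docs:

```toml
[dependencies]
# Version deliberately left as a wildcard; pin whichever release you use,
# and check its documentation for the supported feature list.
bbqueue = { version = "*", features = ["thumbv6"] }
```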

License

Licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

bbqueue's People

Contributors

disasm, ithinuel, jamesmunns, jonas-schievink, justacec, mriise, mvirkkunen, sympatron, utaal, yatekii, yazdan


bbqueue's Issues

Aligned grants

Hello here,

I was trying to use my SPARK implementation of bbqueue in a USB stack but I realized that it would not work with the internal DMA of the USB Device controller (samd51) because the addresses have to be 4 bytes aligned.

I think it should be possible to add alignment constraints to the framed implementation by adding padding to the header length. I'd like to have your opinion on this.
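A padding-based approach could reuse the standard align-up computation; a minimal sketch, where the 4-byte DMA constraint from the samd51 is the assumed alignment:

```rust
/// Round `addr` up to the next multiple of `align` (a power of two).
/// The difference `align_up(addr, 4) - addr` is the padding a framed
/// header could absorb to 4-byte-align the payload for DMA.
fn align_up(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}
```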

Improve docs to make it clear grants persist until released

Did I miss a way to test if the BBQueue is empty without actually pulling bytes?

If not, was there an architectural reason not to implement is_empty() on a Consumer?

Rationale for is_empty():

Sometimes I may want to check to see if there is something in the queue and then schedule the actual read for later/somewhere else. For example, I may want to set a flag in interrupt context or schedule a DMA sometime later.

Unfortunately, if I have to call read() I have to be ready in case the read comes back with a grant instead of InsufficientSize. Getting that grant to where it needs to be may not be very simple, especially if the grant is only the first half of a read cut apart by the inversion of the read/write pointers.

Thanks.
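Internally, such a check could be cheap. A sketch of what an is_empty() might reduce to, assuming bbqueue-style read/write indices (the names and layout here are assumptions, not the real fields):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Sketch of a hypothetical Consumer::is_empty(): the queue is empty when
/// the consumer's view of `write` has caught up with `read`. The Acquire
/// load pairs with the producer's Release store on commit, so no grant or
/// extra bookkeeping is needed just to peek.
fn queue_is_empty(read: &AtomicUsize, write: &AtomicUsize) -> bool {
    read.load(Ordering::Relaxed) == write.load(Ordering::Acquire)
}
```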

bbq! macro should be unsafe

The documentation says:

this macro is not intrinsically thread safe!

So the macro should be unsafe to call. Also, since this macro is unchecked you may as well directly return &'static mut BBQueue (without the Option).

I would also recommend adding a safe, checked variant that enforces singleton-ness using atomics -- see second snippet in rust-embedded/cortex-m#128 (comment) .

Store the current active grant for the Producer and the Consumer

Currently, the result of either a Producer grant or a Consumer read is a Grant instance that must be maintained and tracked. I have found this difficult when using the RTFM embedded framework. Since this queue is designed to be SPSC, there can only ever be one write Grant and one read Grant at a time.

If this is the case, I recommend that the Producer and the Consumer internally track the grant.

This would simplify things for the end user in both complexity and the housekeeping of tracking extra symbols.

Thoughts?

Convenient Heap use of BBBuffer

It would be good to have a heap allocated version of BBBuffer, that would automatically be dropped when the Producer and Consumer (and all grants) have been dropped.

This should be possible by replacing the NonNull references in the producer or consumer with an Arc reference, and likely providing a constructor for BBBuffer with something like ::new_arc() with the buffer located in something like an Arc<Box<BBBuffer>>.

Add commit_and_grant convenience methods

Often you want to commit a grant (read or write), then immediately get the next grant, if possible. We should add an interface that lets you do that in one operation.

This would be particularly useful for "self reloading" DMA operations, where the "end of DMA interrupt" could immediately retrigger the next DMA transaction to start.

MPSC Support Possible?

I might be missing something very obvious here, but it seems to me that BBQueue is very well set up to act as an MPSC queue. When taking a grant there is already a "lock" of sorts with write_in_progress. It seems like you could create a function that allows for the creation of multiple producers from the same underlying buffer. Something similar to #78 would work. Of course this would make writes effectively take a lock, but there are cases where that might be an acceptable trade-off.

Are there any safety issues with the approach I outlined above, or other trade-offs I might have missed?

Ability to read_exact so that producer can fill up the rest of the buffer

Hi @jamesmunns,

Awesome project! I am using BBQueue to shuffle audio data from a producer loop that is temporally unstable (some audio frames take longer to process than others) into a buffer and then reading that data at very critically timed intervals (inside an I2S interrupt) to pass it to a digital to analog converter. I've tried to distill the essence of this setup in the code below.

    // runs forever
    fn producer_task(producer: &mut Producer<'_, 1000>) {
        loop {
            match producer.grant_exact(100) {
                Ok(mut grant) => {
                    grant.to_commit(100);
                    // get audio bytes and copy them into write grant
                }
                Err(_) => {
                    // Input queue full, this is normal
                    // the i2s interrupt firing should free up space
                    cortex_m::asm::wfi();
                }
            }
        }
    }

    // fired by interrupt every 10 ms
    fn consumer_task(previous_grant: GrantR<'_, 1000>, consumer: &mut Consumer<'_, 1000>) {
        previous_grant.release(100);

        match consumer.read() {
            Ok(grant) => {
                long_running_dma_copy(grant);
            }
            Err(_) => {
                // no audio to play - play silence
            }
        }
    }

The long-running consumer read operation holds the read grant for all the data available to read, when it only really needs a small chunk here (100 bytes). This seems to put undue back-pressure on the producer, resulting in the whole thing operating more like a turnstile than a buffer.

My current solution to this problem is to use a double buffer and copy the data out of the BBQueue read grant and release the grant immediately. The double buffer is needed because there is only really enough time to switch pointers in this I2S interrupt handler or the audio will glitch.

My question is this, would it be possible to hold a read grant for only a small window of the data (say 100 bytes out of 1000) so that the producer can continue to write data to the queue elsewhere. Like a read_exact() function on the consumer. This may not be possible due to how read and write pointers work in bip buffers. However, if you think it is possible and worthwhile adding then I would like to attempt to implement it.

PS, I took a look at the framed stuff but that pesky frame length header messes up my alignment!

Hammer out `Ordering`

Right now, everything is SeqCst, which is likely to be a performance killer. These should be relaxed to the proper level where possible.
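For the SPSC paths, the likely end state is acquire/release pairs rather than SeqCst. A minimal sketch of the pattern (not the actual bbqueue code):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// The producer publishes committed bytes with a Release store; the
/// consumer pairs it with an Acquire load. Everything written to the
/// buffer before `commit` is then guaranteed visible to a consumer that
/// observes the new index; no SeqCst ordering is required for SPSC.
struct WriteIndex(AtomicUsize);

impl WriteIndex {
    fn commit(&self, new_write: usize) {
        self.0.store(new_write, Ordering::Release);
    }
    fn observe(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }
}
```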

Allow taking just Producer or Consumer

Allow for the time-disjoint splitting of producer and consumer, allowing one context to take only the item it needs. Something like:

static BUF: BBBuffer<U8> = BBBuffer( ConstBBBuffer::new() );

fn one() {
    BUF.try_take_producer().unwrap().grant_exact(1).unwrap().commit(1);
}

fn two() {
   BUF.try_take_consumer().unwrap().read().unwrap().release(1);
}

fn main() {
    one();
    two();
}

Consumer overwriting oldest data

Hi,
It would be nice if there were one more method on Producer that tries to "read" the oldest data (to make room for the newest) if the Consumer is not currently consuming it, and otherwise returns an Err as well.
For example, an MCU receives telemetry data via DMA from a motor controller over UART (only the latest data is interesting), but the MCU is busy doing something else, so the bbqueue fills up and (if I am not mistaken) the latest data would be lost.

Improve Documentation RE supported types

Is BBQueue limited to buffering bytes? If so (and I think it is), it's an important limitation to mention in the top-level README. I somehow made it through the video, the blog post, the readme, and halfway through integrating bbqueue into my code before I discovered the bound:

impl<'a, N> BBBuffer<N>
where
    N: ArrayLength<u8>,

I then figured out where ArrayLength came from, studied the generic_array crate a bit, and realized bbqueue probably isn't what I was looking for. I knew bbqueue was implemented for byte-buffer DMA use cases, but I'm so used to containers being generic over the contained type that I mistakenly assumed it was.

Thanks! Andrew

Consider making certain checks zero cost

Some sacrifices have been made for convenience and working with current stable.

  • We currently do runtime checks for verifying we have the right grant. See this comment
  • We currently use an option for bbq!() macros, because MaybeUninit isn't a thing

Example for `SplitGrantR`'s `bufs` is incorrect

split_read() was recently added, so the example for bufs() was probably copy-pasted from another method and is therefore incorrect.

bbqueue/core/src/bbbuffer.rs

Lines 1078 to 1109 in 479eb3d

/// Obtain access to both inner buffers for reading
///
/// ```
/// # // bbqueue test shim!
/// # fn bbqtest() {
/// use bbqueue::{BBBuffer, consts::*};
///
/// // Create and split a new buffer of 6 elements
/// let buffer: BBBuffer<U6> = BBBuffer::new();
/// let (mut prod, mut cons) = buffer.try_split().unwrap();
///
/// // Successfully obtain and commit a grant of four bytes
/// let mut grant = prod.grant_max_remaining(4).unwrap();
/// grant.buf().copy_from_slice(&[1, 2, 3, 4]);
/// grant.commit(4);
///
/// // Obtain a read grant, and copy to a buffer
/// let mut grant = cons.read().unwrap();
/// let mut buf = [0u8; 4];
/// buf.copy_from_slice(grant.buf());
/// assert_eq!(&buf, &[1, 2, 3, 4]);
/// # // bbqueue test shim!
/// # }
/// #
/// # fn main() {
/// # #[cfg(not(feature = "thumbv6"))]
/// # bbqtest();
/// # }
/// ```
pub fn bufs(&self) -> (&[u8], &[u8]) {
(self.buf1, self.buf2)
}

Dropping grant objects makes the queue unusable

It's very easy to accidentally drop a grant on the floor instead of consuming it with commit or release. If this happens, the grant's reservation will stay in place and effectively lock up the queue, since no further grant of that type can be made. It would be nice to prevent this. I can see several options to do that:

  • Add an impl Drop for Grant* which unconditionally panics (commit and release can call mem::forget to "defuse" the grant) - this effectively makes the usage error show up much earlier, but doesn't prevent it
  • Make grants GrantX<'a> and give them a mutable reference to the consumer/producer or queue they came from; Have them call release(self.len(), self) or commit(self.len(), self) on drop by default
  • Write an RFC to add real linear types to Rust, implement it, stabilize it and use it here (this is clearly the simplest option)
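The second option can be modeled with a toy Drop impl; a sketch under the assumption that the grant can reach back to its queue's release counter (these are not bbqueue's real types):

```rust
use core::cell::Cell;

/// Toy model of a read grant that releases its full length automatically
/// when dropped, so an accidentally dropped grant no longer wedges the
/// queue. `released` stands in for the queue's read index.
struct ToyGrantR<'a> {
    len: usize,
    released: &'a Cell<usize>,
}

impl Drop for ToyGrantR<'_> {
    fn drop(&mut self) {
        // Default behavior on drop: release everything that was granted.
        self.released.set(self.released.get() + self.len);
    }
}
```

An explicit `release(n)` method on such a type would shrink `len` to zero (or call `mem::forget`) so the Drop impl doesn't double-release.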

Why does bbqueue target only a single platform?

Hi, I was reading the code and noticed this comment:

//! For bbqueue, the sender doing the encoding and the receiver doing the decoding
//! will always reside on the same platform, meaning we CAN make these non-portable
//! assumptions for the sake of performance/simplicity.

Is there still a desire to favor performance (or simplicity) over cross-platform operation? My very naive guess would be that the benefits of being single-platform are minimal. Not being cross-platform was unexpected for me and, had I not seen this comment, I would have been unpleasantly surprised by these issues. The pointer size issue does seem to be one concern with the current code. Another might be endianness, which is not discussed much. Both could be adjusted in future releases (e.g. by using the vint64 crate, also mentioned in the code).

I am asking because I was thinking about using bbqueue to communicate between an embedded device and a PC, but this would violate the design assumptions described above.

If this design decision is still relevant, it may be worth pointing this out in the readme, because otherwise bbqueue seems like a nice way to communicate between platforms and I could imagine I would not be the only person to have that idea.

Another idea would be to abstract out the particular framing code and let the user choose between platform-specific performance and cross-platform interop.

Make BBBuffer generic over the index variables

It would be nice to use atomic types smaller than usize for tracking variables. This could be a significant size savings for buffers like BBBuffer<U128> (18 bytes vs 6 bytes overhead on a 32-bit target).

This should be possible by following the tricks used by heapless, and could be made backwards compatible to alias current constructors to use usize as the generic parameter.
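The arithmetic in the issue can be checked with a sketch of the bookkeeping header; the exact field set is an assumption (bbqueue's real struct may differ), but four indices plus two flags reproduce the quoted figures:

```rust
use core::sync::atomic::{AtomicBool, AtomicU8, AtomicUsize};

/// Assumed layout: four index atomics plus two flags. On a 32-bit target
/// that is 4 * 4 + 2 = 18 bytes with usize indices, versus 4 * 1 + 2 = 6
/// bytes with u8 indices, which suffice for a BBBuffer<128>.
struct HeaderUsize {
    write: AtomicUsize,
    read: AtomicUsize,
    last: AtomicUsize,
    reserve: AtomicUsize,
    read_in_progress: AtomicBool,
    write_in_progress: AtomicBool,
}

struct HeaderU8 {
    write: AtomicU8,
    read: AtomicU8,
    last: AtomicU8,
    reserve: AtomicU8,
    read_in_progress: AtomicBool,
    write_in_progress: AtomicBool,
}
```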

SPMC

Any thoughts on adding single producer multiple consumer support? I would like the ability to read from different sections of the received chunk from multiple threads, so that each thread gets access to a unique section.

What's up with TravisCI?

I have travisCI set up, but at some point it stopped working for PRs.

I need to investigate why it isn't working, and why it didn't catch the nightly feature used in #46.

Safe API lets you alias mutable memory

use core::mem;

// version = "=0.2.1"
use bbqueue::BBQueue;

fn main() {
    // `&'static mut` references are safe to generate using `#[entry]` or `singleton!`
    static mut A: [u8; 32] = [0; 32];
    static mut B: [u8; 32] = [0; 32];

    unsafe { safe_but_unsound(&mut A, &mut B) }
}

fn safe_but_unsound(a: &'static mut [u8], mut b: &'static mut [u8]) {
    let mut a = BBQueue::new(a);

    let mut x = a.grant(16).unwrap();

    // this lets us effectively construct a fake `GrantW`
    mem::swap(&mut x.buf, &mut b);

    // we commit the fake grant; `b` still lets us access the granted buffer
    a.commit(16, x);

    // this aliases memory
    let mut alias = a.grant(16).unwrap();

    // this prints the same pointer and length
    println!("{:?} - {}", b.as_mut_ptr(), b.len());
    println!("{:?} - {}", alias.buf.as_mut_ptr(), alias.buf.len());
}

I think this can be patched by making the GrantW.buf field private.

split_read() does not return new grant if release is not explicitly called

// queue has U50 elements
fn idle() {

    if let Ok(grant) = ctx.resources.uart_in_c.split_read() {
        // underlying circular buffer might have wrapped but we like to get continous buffer
        // so lets get both slices (second one is empty if no wrapping has occured)
        let (first_half, second_half) = grant.bufs();
        defmt::info!("f:{:usize} s:{:usize}", first_half.len(), second_half.len());

        // lets say we only consume data if there is '\r' somewhere
        if let Some(cr_pos) = first_half.iter().position(|n| *n == b'\r') {
            let len = grant.combined_len();
            grant.release(len);
        }
    }
}

#[task(binds = USART3_USART4_LPUART1,resources = [uart, uart_in_p, uart_timeout])]
fn usart_3_and_4_isr(mut ctx: usart_3_and_4_isr::Context) {

    if isr.rxne().bit_is_set() {
        let byte = uart4_ptr.rdr.read().bits() as u8;
        if let Ok(mut grant) = ctx.resources.uart_in_p.grant_exact(1) {
            defmt::warn!("r: {:u8}", byte);
            grant[0] = byte;
            grant.commit(1);
        }
        *ctx.resources.uart_timeout = Some(Instant::now() + 2.millis());
    }
}

Notice that the consumer releases data only if it finds \r, and does nothing if not. It is assumed that the grant goes out of scope and everything is OK. However, subsequent calls to split_read() always fail and never give out a new grant.

It can be fixed if grant.release(0) is explicitly called:

if let Some(cr_pos) = first_half.iter().position(|n| *n == b'\r') {
    let len = grant.combined_len();
    grant.release(len);
} else {
    grant.release(0);
}

I remember that a normal read did not have such a requirement, so it must be a bug in the fairly new split_read API.

panic upon committing a zero-sized grant

Sometimes a zero-sized grant points beyond the buffer (grant_ptr == buffer_ptr + buffer_len); committing such a grant panics on assert!(self.is_our_grant(&grant.buf));.

Tracking issue for new grant methods

In the 0.4.x release train, I would like to add some new ways to allow for more granular control of grants. The following are currently planned:

Regular grants

  • grant_remaining()
    • User will receive a grant 0 < sz <= total_buffer_sz (or receive an error)
    • This will only cause a wrap to the beginning of the ring if exactly
      zero bytes are available at the end of the ring.
    • Maximum possible waste due to skipping: 0 bytes
  • grant_largest()
    • User will receive a grant 0 < sz <= total_buffer_sz (or receive an error)
    • This function will find the largest contiguous region available
      (at the end or beginning of the ring).
    • If the region at the beginning was chosen, some bytes at the end of the ring
      will be skipped
    • Maximum possible waste due to skipping: (total_buffer_sz / 2) - 1 bytes
  • grant_largest_max(N)
    • User will receive a grant 0 < sz <= N (or receive an error)
    • This function will attempt to find a contiguous region up to sz bytes large.
      If no such region exists, the largest region available (at the end or
      beginning of the ring) will be granted to the user.
    • If the region at the beginning was chosen, some bytes at the end of the ring
      will be skipped
    • Maximum possible waste due to skipping: (total_buffer_sz / 2) - 1 bytes

Split Grants

The following might introduce the concept of "split grants", which provide two
separate contiguous buffers in order to eliminate waste due to splitting, but require
the user to make writes to each buffer.

  • split_grant_remaining(N)
    • User will receive a grant containing two segments with a total size of
      0 < (sz_A + sz_B) <= total_buffer_sz (or receive an error)
  • split_grant_max_remaining(N)
    • User will receive a grant containing two segments with a total size of
      0 < (sz_A + sz_B) <= N (or receive an error)
    • If the grant requested fits without wraparound, then the sizes of the grants
      will be: sz_A == N, sz_B == 0.
  • split_grant_exact(N)
    • User will receive a grant containing two segments with a total size of
      (sz_A + sz_B) == N (or receive an error)
    • If the grant requested fits without wraparound, then the sizes of the grants
      will be: sz_A == N, sz_B == 0.
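The split_grant_exact(N) sizing described above can be sketched as a pure function over the indices (assumed semantics, not an implementation):

```rust
/// Sketch of split_grant_exact(N) segment sizing: the first segment fills
/// the tail of the ring, the second wraps to the start; sz_A == N and
/// sz_B == 0 when the request fits without wraparound.
fn split_grant_exact(read: usize, write: usize, cap: usize, n: usize) -> Option<(usize, usize)> {
    if write >= read {
        let tail = cap - write;
        if n <= tail {
            Some((n, 0)) // fits without wraparound
        } else if n - tail < read {
            Some((tail, n - tail)) // wrapped part stops short of the reader
        } else {
            None
        }
    } else if n < read - write {
        Some((n, 0)) // writer already wrapped: one contiguous region
    } else {
        None
    }
}
```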

Better examples/docs

We should have more complete examples for common use cases, including (at least):

  • How to use with RTFM
  • How to use in bare metal

CC #54 / @justacec

Performance, benchmarking, speed...

This project seems to be maturing. The whole point of a BipBuffer's existence is performance, so I wonder why there are no benchmarking results here.

Did anybody do any measurements against some highly performant ring buffers like SPSC http://daugaard.org/blog/writing-a-fast-and-versatile-spsc-ring-buffer--performance-results/ (which perform close to the speed of memcpy)?

If so, could you please point me to the measurement results?

If not, would you consider adding some basic, very approximate benchmarks?

Support thumbv6 targets

Cortex-M0 based chips (eg. the nRF51 series) are thumbv6 only, which is missing some operations on atomics. Trying to build bbqueue for thumbv6m-none-eabi currently results in these errors:

error[E0599]: no method named `fetch_sub` found for type `core::sync::atomic::AtomicUsize` in the current scope
   --> /home/jonas/.cargo/registry/src/github.com-1ecc6299db9ec823/bbqueue-0.3.2/src/lib.rs:337:22
    |
337 |         self.reserve.fetch_sub(len - used, Relaxed);
    |                      ^^^^^^^^^

error[E0599]: no method named `fetch_add` found for type `core::sync::atomic::AtomicUsize` in the current scope
   --> /home/jonas/.cargo/registry/src/github.com-1ecc6299db9ec823/bbqueue-0.3.2/src/lib.rs:421:27
    |
421 |         let _ = self.read.fetch_add(used, Release);
    |                           ^^^^^^^^^

error[E0599]: no method named `swap` found for type `core::sync::atomic::AtomicBool` in the current scope
   --> /home/jonas/.cargo/registry/src/github.com-1ecc6299db9ec823/bbqueue-0.3.2/src/lib.rs:442:37
    |
442 |         assert!(!self.already_split.swap(true, Relaxed));
    |                                     ^^^^

The log crate also had this issue: rust-lang/log#285
It was worked around with my PR in rust-lang/log#325

The same will probably not really work for bbqueue, unfortunately, since that needs to be at least interrupt-safe.

Add a read() method that also returns the bytes that wrapped around

At the moment it is not really possible to use this for zero copy processing of incoming data, because you don't get to see the bytes that wrapped around until you released everything at the end of the buffer. But releasing it would obviously enable the Producer to overwrite the contents, so that is not an option.

To work around this, there needs to be an alternative to read() that returns both parts if present. Maybe returning a tuple with the second item as an Option would be possible.

I know it's not trivial, because you can't just return two GrantR, because then you could release the second before the first. So another option might be a special kind of GrantR, that handles both parts releases them correctly. This could possibly use two GrantR internally...

I am just thinking out loud here, but I think it would be possible.
Together with GrantR.buf_mut() you could do almost any kind of data processing in place!

`grant_max` does not reserve the largest contiguous buffer available

This might just be a documentation issue, but I was assuming that grant_max should behave identically to grant wrt. the successful case. An example:

use bbqueue::*;

fn main() {
    let bbq = bbq!(1000).unwrap();

    let size = 999;
    let grant = bbq.grant(size).unwrap();
    bbq.commit(size, grant);
    let grant = bbq.read().unwrap();
    bbq.release(size, grant);

    let grant = bbq.grant_max(500).unwrap();
    println!("{}", grant.len());
    bbq.commit(0, grant);
}

This consumes and then frees all but 1 Byte of the queue. Then it tries to obtain a 500-byte grant using grant_max. I would expect this to succeed and return the requested 500-byte grant, since there's a large 999-byte area that's still free. Instead it returns a grant of length 1.

Using grant instead will successfully grant 500 bytes.
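The reported behavior falls out of which region each call considers. A model of the assumed semantics (not the real implementation), using the issue's state where read == write == 999 in a 1000-byte buffer:

```rust
/// `grant_max`-style: return however many bytes are contiguous at `write`,
/// capped at `max`, without ever wrapping. Returns the granted size.
fn grant_max_model(read: usize, write: usize, cap: usize, max: usize) -> usize {
    if write >= read {
        (cap - write).min(max)
    } else {
        (read - 1 - write).min(max)
    }
}

/// `grant`-style: insist on `wanted` contiguous bytes, wrapping to the
/// start of the ring (skipping the tail) when the tail is too small.
/// Returns the start offset of the granted region.
fn grant_model(read: usize, write: usize, cap: usize, wanted: usize) -> Option<usize> {
    if write >= read {
        if cap - write >= wanted {
            Some(write) // fits in the tail
        } else if read > wanted {
            Some(0) // wrap to the start
        } else {
            None
        }
    } else if read - write > wanted {
        Some(write)
    } else {
        None
    }
}
```

With these semantics, grant_max takes the 1 tail byte while grant wraps and finds the 999-byte region at the start, matching the observed difference.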

BBQueue object is not safe to move after splitting

If the BBQueue moves after you have split, the Producer and Consumer will point to bad data. Producer and Consumer should probably take references to BBQueue to prevent this, or BBQueue should be Pin or something.

This is not an issue if you use the bbq!() macro, or Box methods.

Usage with circular DMA mode

After setting up circular DMA on STM32F4 for receiving lots of data from USART, I thought, why not use a bbqueue and make the code a little bit safer.

I've come up with the following idea - create a buffer of let's say 32 bytes, obtain a write grant to the first half (16 bytes) and start the DMA from wgr.as_ptr() but with a transfer size of 32 bytes. When half transfer IRQ occurs, commit the first half, obtain a write grant for the second half and notify consumer after that. On transfer complete IRQ commit the second half, obtain the grant for the first half again, notify the consumer.

In theory this should work, if consumer will be fast enough to eat half of the buffer and release it. If it is not fast enough, write grant will fail, DMA could be stopped right away (I think this can even be done fast enough, so that DMA will not overwrite non processed data). Then in USART Idle interrupt DMA will be restarted, if a write grant to the beginning of the buffer will succeed.

So I went ahead and put together this structure (for storing it in RTIC resources for the DMA task):

pub struct DmaRxContext<N: generic_array::ArrayLength<u8>>
{
    producer: Producer<'static, N>,
    first_half_wgr: Option<GrantW<'static, N>>,
    second_half_wgr: Option<GrantW<'static, N>>,
    // ... pointers to dma, usart, etc
}

I've omitted non-relevant parts of the code for clarity.

On DMA interrupt, this code gets executed.

impl<N: generic_array::ArrayLength<u8>> DmaRxContext<N> {
    pub fn handle_dma_rx_irq(&mut self) {
        let (half_complete, complete) = self.dma_status();
        if half_complete {
            if self.first_half_wgr.is_some() {
                let wgr = self.first_half_wgr.take().unwrap();
                wgr.commit(N::USIZE / 2);
                self.first_half_wgr = None;
            }
            match self.producer.grant_exact(N::USIZE / 2) {
                Ok(wgr) => {
                    self.second_half_wgr = Some(wgr);
                    rprintln!(=>5, "DMA:F->S\n");
                },
                Err(_) => {
                    rprintln!(=>5, "DMA:ErrorA\n");
                }
            }
        } else if complete {
            if self.second_half_wgr.is_some() {
                let wgr = self.second_half_wgr.take().unwrap();
                wgr.commit(N::USIZE / 2);
                self.second_half_wgr = None;
            }
            match self.producer.grant_exact(N::USIZE / 2) {
                Ok(wgr) => {
                    self.first_half_wgr = Some(wgr);
                    rprintln!(=>5, "DMA:S->F\n");
                },
                Err(_) => {
                    rprintln!(=>5, "DMA:ErrorB\n");
                }
            }
        }
    }
}

Now the tricky part. I'm sending bytes one by one to better see what's going on. The first 16 bytes arrive, the first-half write grant gets committed, and the write grant for the second half is successfully obtained and stored. The consumer eats 16 bytes and releases the read grant. Another 16 bytes arrive, the second-half write grant is committed, BUT grant_exact() gives an error this time, even though the first 16 bytes have clearly been released by the consumer.

After wasting a lot of time trying to understand what's going on, I noticed that grant_max_remaining(16) can actually succeed, but gives out 15 bytes instead. Why 15? I'm clearly missing something here. Separately from all this, bbqueue works just fine, so most likely this is my mistake somewhere, or some pretty intricate bug.

Another possible solution is to use DMA in regular mode, but then one has to obtain new write grants and restart the DMA pretty fast, or some data might get lost: approx 8 us at 1 Mbps, assuming DMA priority is the highest and not many streams are in use. This seems like a strange solution when the hardware is perfectly fine on its own.
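A plausible explanation for the 15 (an assumption about bbqueue's invariants, not a confirmed diagnosis): a ring buffer that encodes "empty" as read == write can never let write catch up to read again, so one byte stays permanently unusable. The same reasoning in code:

```rust
/// Largest grant available when `write` sits behind `read`: the writer
/// must stop one byte short of `read`, or the queue would look empty
/// again. With a 32-byte buffer, read == 16 and write wrapped to 0,
/// that leaves 15 grantable bytes, not 16.
fn max_grant_behind_reader(read: usize, write: usize) -> usize {
    read - write - 1
}
```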

bbqueue use on shared memory

I'm looking into using this on shared memory for IPC. Do you have any input on this, whether this might work or not?

I couldn't really figure out whether the actual buffer is part of the struct (ConstBBBuffer) or whether it uses a pointer. If it is part of the struct, copying/transmuting the buffer onto shared memory should be alright, shouldn't it? Sorry for the possibly vague message; I'm somewhat uneducated on this topic.

Also, if this can easily be done, providing (unsafe) functions to initialize and attach to a bip buffer might be useful. I believe using a bip buffer on shared memory is quite a common use case.

Reduce size of Grants for Framed operation

Right now, the hdr_len field of FrameGrantW and FrameGrantR is of type usize, though the only possible values are 0..=9, as these are the only possible lengths of the header.

It would be good to change the type from usize to u8, and handle this appropriately in all handling code. This should be as easy as changing the types, and fixing any compilation errors. We should add debug asserts as well to ensure there have not been any logic errors.
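The header is a varint-encoded length (a vint64-style scheme is assumed here, since that crate is referenced elsewhere in the code), so its size is a pure function of the frame length and always fits in 1..=9, comfortably within a u8:

```rust
/// Length in bytes of a vint64-style varint header for a frame of `len`
/// bytes: 7 payload bits per header byte, capped at 9 bytes for a full u64.
fn hdr_len(len: u64) -> u8 {
    let bits = 64 - len.leading_zeros();
    match bits {
        0..=7 => 1,
        8..=14 => 2,
        15..=21 => 3,
        22..=28 => 4,
        29..=35 => 5,
        36..=42 => 6,
        43..=49 => 7,
        50..=56 => 8,
        _ => 9,
    }
}
```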

Should Reader move `last`?

This comment from the initial implementation states that the Reader should be in charge of moving last back to max when necessary. See this comment.

However, in my current implementation, I move the last pointer when the Writer commits a grant past the last position. I'm not sure when this changed. This still seems sound, as the following scenarios are all good:

  1. read < write < last
    • OK: Reader will be blocked by writer
  2. write < read < last
    • OK: Reader will be blocked by last, then will jump back to 0, where it will be back in case 1

I should probably make sure that the ordering of these transactions are still good though, and either update the code, or update the comments

CC @utaal, I need to see what his spsc buffer does in this case
