Giter Site home page Giter Site logo

Comments (13)

newAM avatar newAM commented on July 27, 2024 2

You're assuming people would run SPI at high speeds. Something that doesn't necessarily seem to be true as I've learned by this issue. 😅

I do it for ease of debug with my low end equipment. 😀

from stm32f0xx-hal.

newAM avatar newAM commented on July 27, 2024

stm32f1xx-hal implementation: https://github.com/stm32-rs/stm32f1xx-hal/blob/master/src/spi.rs#L492
which calls into spi_write here: https://github.com/stm32-rs/stm32f1xx-hal/blob/master/src/spi.rs#L268

The difference does appear to be that the stm32f1xx implementation polls for bsy.


Back to this crate, what is the purpose of this line here?
https://github.com/stm32-rs/stm32f0xx-hal/blob/master/src/spi.rs#L485

As far as I know reads to sr do not have any side effects, and the ok() will just drop any errors. I would like to submit a fix, but I feel I am missing something here.

from stm32f0xx-hal.

therealprof avatar therealprof commented on July 27, 2024

The difference does appear to be that the stm32f1xx implementation polls for bsy.

Comparing two different generations of a peripheral (or software implementations) is typically not too useful. I guess the problem here is that you're assuming that the transfer is completed by the time the send call returns but it actually is buffered. Maybe we should add a busy check before changing the mode to bidi.

Reading sr does have side effects:

This flag is set by hardware and reset when SPIx_SR is read by software.

from stm32f0xx-hal.

newAM avatar newAM commented on July 27, 2024

I guess the problem here is that you're assuming that the transfer is completed by the time the send call returns but it actually is buffered.

Is this a bad assumption to make? I asked in the Rust Embedded element chat and the consensus seemed to agree that the implementation of the traits should block for completion.

from stm32f0xx-hal.

therealprof avatar therealprof commented on July 27, 2024

Well, depends on what the goal is. If you want best possible write performance without complicated implementations (i.e. DMA) it makes total sense to use the built-in buffer. I'm not quite sure what happened in your first example since the trace doesn't quite seem to line up with the code, there could be other issues with a software NSS implementation, too.

from stm32f0xx-hal.

newAM avatar newAM commented on July 27, 2024

Well, depends on what the goal is. If you want best possible write performance without complicated implementations (i.e. DMA) it makes total sense to use the built-in buffer.

I took a look through the awesome embedded list, I could not find a device driver where they made the assumption that the embedded-hal traits would not poll for completion.

For example, the ENC28J60 crate will not work with the stm32f0xx-hal because it does not poll for completion, but it will work with the stm32f1xx-hal.

    fn read_buffer_memory(&mut self, addr: Option<u16>, buf: &mut [u8]) -> Result<(), E> {
        if let Some(addr) = addr {
            self.write_control_register(bank0::Register::ERDPTL, addr.low())?;
            self.write_control_register(bank0::Register::ERDPTH, addr.high())?;
        }

        self.ncs.set_low();
        self.spi.write(&[Instruction::RBM.opcode()])?;
        self.spi.transfer(buf)?;
        self.ncs.set_high();

        Ok(())
    }

I'm not quite sure what happened in your first example since the trace doesn't quite seem to line up with the code

I am not sure I understand what you mean here. The trace not lining up is the issue, specifically the fact that CS goes high before the transfer is complete is unexpected. The rest of the trace looks correct to me.

there could be other issues with a software NSS implementation, too.

Any ideas? I am happy to investigate further!
I do not have a lot of boards on hand, but I can confirm that my first example does work with a STM32 F103C8T6 blue pill board using the stm32f1xx-hal.

from stm32f0xx-hal.

therealprof avatar therealprof commented on July 27, 2024

I took a look through the awesome embedded list, I could not find a device driver where they made the assumption that the embedded-hal traits would not poll for completion.

?

How would you see such a thing? If you have a write-only driver there's no problem at all (and those are quite common, e.g. for displays).

For example, the ENC28J60 crate will not work with the stm32f0xx-hal because it does not poll for completion, but it will work with the stm32f1xx-hal.

Is that an assumption or did you verify that? I appreciate that the pattern you're using might be a problem in general but it would be nice to have some hard evidence.

And again, I think the problem might actually be the turn-around from write to transfer because we're actually changing settings.

I am not sure I understand what you mean here. The trace not lining up is the issue, specifically the fact that CS goes high before the transfer is complete is unexpected. The rest of the trace looks correct to me.

I understand that but I am puzzled about how little data seems to be transferred before the loss of sync... what's the SPI clock?

I do not have a lot of boards on hand, but I can confirm that my first example does work with a STM32 F103C8T6 blue pill board using the stm32f1xx-hal.

Yeah, but again that that's a completely different generation of SPI peripheral which means you can't just assume that using the F1 mechanisms would just work for F0, too.

It would be great if you could insert a busy check loop in set_send_only and set_bidi_mode and see whether that fixes your problem.

from stm32f0xx-hal.

newAM avatar newAM commented on July 27, 2024

How would you see such a thing?

I made the assumption (perhaps a bad one, sorry!) that any crate using the blocking embedded-hal SPI traits with a transfer or write wrapped by software driven chip select set_high and set_low calls will not be compatible.

If you have a write-only driver there's no problem at all (and those are quite common, e.g. for displays).

I tried it out, and it appears that even for write only this is still a problem.

    cs.set_low().unwrap();
    spi.write(&[0x12, 0x34, 0x56, 0x78]);
    cs.set_high().unwrap();

That code (or similar) also produces a clipped chip-select. Full code here: https://github.com/newAM/stm32f0xx-hal-issue-130/blob/example1/src/main.rs#L70

image

Is that an assumption or did you verify that? I appreciate that the pattern you're using might be a problem in general but it would be nice to have some hard evidence.

It was a problem I had a while ago, but I did not dive into it at the time.

Here is a reproduction: https://github.com/newAM/stm32f0xx-hal-issue-130
It is a bit harder to see what is going on because there is a lot more activity on the bus; but it is still pretty clear that the chip select is not acting as expected.

I understand that but I am puzzled about how little data seems to be transferred before the loss of sync... what's the SPI clock?

10kHz, though this can be reproduced at higher frequencies.

Yeah, but again that that's a completely different generation of SPI peripheral which means you can't just assume that using the F1 mechanisms would just work for F0, too.

The point I am trying to make is that there is a gap in behavior between the F0 and F1 that should not exist. Polling the bsy bit which exists on both versions (and seems to have the same behavior across generations) such that both HALs behave similarly should be possible.

It would be great if you could insert a busy check loop in set_send_only and set_bidi_mode and see whether that fixes your problem.

Happy to help debug this! 👍

I tried it out, and the problem still exists :(

This is the same as above, just a bare write, since that is a better minimal reproduction.

stm32f0xx-hal: https://github.com/newAM/stm32f0xx-hal/tree/issue-130
source code: https://github.com/newAM/stm32f0xx-hal-issue-130/tree/example2

image

Edit: I should also mention that all of the above was done with a Nucleo F070RB (original post was with a STM32F070CBTx).

from stm32f0xx-hal.

therealprof avatar therealprof commented on July 27, 2024

I tried it out, and it appears that even for write only this is still a problem.

Well, yes. The problem is the data is still in the hardware buffer when the call returns which isn't a problem if one uses hardware CS or transfer or runs the bus as very high clocks.

10kHz, though this can be reproduced at higher frequencies.

10kHz explains a lot.

Polling the bsy bit which exists on both versions (and seems to have the same behavior across generations) such that both HALs behave similarly should be possible.

Okay, let's poll the bsy bit at the end of write then. It's going to be a performance hit for lots of consecutive writes at high speeds but possibly we can introduce a separate implementation for such common use cases to deal with that.

from stm32f0xx-hal.

newAM avatar newAM commented on July 27, 2024

It's going to be a performance hit for lots of consecutive writes at high speeds but possibly we can introduce a separate implementation for such common use cases to deal with that.

Consecutive writes with software chip select (at any frequency) will also produce a clipped CS.

For example, this code, with SPI at 500kHz.

    w5500_cs.set_low().unwrap();
    spi.write(&[0x00, 0x39, 0x00]).unwrap();
    spi.write(&[0x00]).unwrap();
    w5500_cs.set_high().unwrap();

Will produce a waveform like this:
image

Measurements

I did some measurements at a 500kHz bus frequency to see what the performance hit is at a more reasonable frequency.

This HAL does not expose the hardware NSS management (as far as I can tell), so this should be a decent summary of the existing possible use cases with software NSS management.

The performance hit to enable back to back writes/transfers seems acceptable to me, should I open a pull-request with the stm32f0xx-hal modification described below?

Transfer + Transfer

Using only transfers in two calls, no modifications to stm32f0xx-hal:

    let mut header: [u8; 3] = [0x00, 0x39, 0x00];
    let mut buf: [u8; 1] = [0];
    w5500_cs.set_low().unwrap();
    spi.transfer(&mut header).unwrap();
    spi.transfer(&mut buf).unwrap();
    w5500_cs.set_high().unwrap();

image

Write + Poll BSY + Transfer

Modifying stm32f0xx-hal to introduce polling, and retaining the original write followed by a transfer.

Modified write function:

    fn write(&mut self, words: &[u8]) -> Result<(), Self::Error> {
        let mut bufcap: u8 = 0;

        // We only want to send, so we don't need to worry about the receive buffer overflowing
        self.set_send_only();

        // Make sure we don't continue with an error condition
        nb::block!(self.check_send())?;

        // We have a 32 bit buffer to work with, so let's fill it before checking the status
        for word in words {
            // Loop as long as our send buffer is full
            while bufcap == 0 {
                bufcap = self.send_buffer_size();
            }

            self.send_u8(*word);
            bufcap -= 1;
        }

        loop {
            let sr = self.spi.sr.read();
            if !sr.bsy().bit_is_set() {
                break;
            }
        }

        // Do one last status register check before continuing
        self.check_send().ok();
        Ok(())
    }

Code:

   let mut buf: [u8; 1] = [0];
   w5500_cs.set_low().unwrap();
   spi.write(&[0x00, 0x39, 0x00]).unwrap();
   spi.transfer(&mut buf).unwrap();
   w5500_cs.set_high().unwrap();

image

Write + Poll BSY + Write

Same modifications to stm32f0xx-hal as above, but with a write after the write (just to measure timing).

   w5500_cs.set_low().unwrap();
   spi.write(&[0x00, 0x39, 0x00]).unwrap();
   spi.write(&[0x00]).unwrap();
   w5500_cs.set_high().unwrap();

image

from stm32f0xx-hal.

therealprof avatar therealprof commented on July 27, 2024

500kHz is not reasonably fast, SPI was meant for multi-MHz operation. 😅

Please go ahead to PR a change. Probably the easiest and most sensible and correct change would be to replace

stm32f0xx-hal/src/spi.rs

Lines 485 to 486 in fa3f2c2

self.check_send().ok();
Ok(())

by

nb::block!(self.check_send())

from stm32f0xx-hal.

WasabiFan avatar WasabiFan commented on July 27, 2024

Just throwing this out there...

As far as I can tell, the current implementation has a race condition if you invoke write directly followed by a transfer: the former sets BIDIMODE/BIDIOE for bi-directional operation in transmit-only mode, while the latter switches back to full-duplex unidirectional mode. If you call transfer immediately after write, the FIFO is likely not fully cleared by the time it enters transfer, and it re-enables receiving before everything has shifted out following the write. In other words, it enables receipt before the write has finished, causing more data to be loaded into the RX buffer than expected.

This should be resolved by the discussed fixes in #131. I've validated that it avoids the overruns I encountered in my case.

In theory, fixing this requires the new "busy" check that was proposed; that being said, the FIFO only takes a few extra cycles to clear, so even just doing the reads to check if there is space in the TX buffer is enough. Not that I'm fond of that 🙂

from stm32f0xx-hal.

therealprof avatar therealprof commented on July 27, 2024

As far as I can tell, the current implementation has a race condition if you invoke write directly followed by a transfer: the former sets BIDIMODE/BIDIOE for bi-directional operation in transmit-only mode, while the latter switches back to full-duplex unidirectional mode.

Yes, that it is correct and mentioned before. This needs to be fixed.

In theory, fixing this requires the new "busy" check that was proposed; that being said, the FIFO only takes a few extra cycles to clear, so even just doing the reads to check if there is space in the TX buffer is enough. Not that I'm fond of that 🙂

You're assuming people would run SPI at high speeds. Something that doesn't necessarily seem to be true as I've learned by this issue. 😅

from stm32f0xx-hal.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.