Comments (13)
You're assuming people would run SPI at high speeds. Something that doesn't necessarily seem to be true as I've learned by this issue. 😅
I do it for ease of debug with my low end equipment. 😀
from stm32f0xx-hal.
stm32f1xx-hal implementation: https://github.com/stm32-rs/stm32f1xx-hal/blob/master/src/spi.rs#L492
which calls into spi_write
here: https://github.com/stm32-rs/stm32f1xx-hal/blob/master/src/spi.rs#L268
The difference does appear to be that the stm32f1xx
implementation polls for bsy
.
Back to this crate, what is the purpose of this line here?
https://github.com/stm32-rs/stm32f0xx-hal/blob/master/src/spi.rs#L485
As far as I know reads to sr
do not have any side effects, and the ok()
will just drop any errors. I would like to submit a fix, but I feel I am missing something here.
from stm32f0xx-hal.
The difference does appear to be that the stm32f1xx implementation polls for bsy.
Comparing two different generations of a peripheral (or software implementations) is typically not too useful. I guess the problem here is that you're assuming that the transfer is completed by the time the send call returns but it actually is buffered. Maybe we should add a busy check before changing the mode to bidi.
Reading sr
does have side effects:
This flag is set by hardware and reset when SPIx_SR is read by software.
from stm32f0xx-hal.
I guess the problem here is that you're assuming that the transfer is completed by the time the send call returns but it actually is buffered.
Is this a bad assumption to make? I asked in the Rust Embedded element chat and the consensus seemed to agree that the implementation of the traits should block for completion.
from stm32f0xx-hal.
Well, depends on what the goal is. If you want best possible write performance without complicated implementations (i.e. DMA) it makes total sense to use the built-in buffer. I'm not quite sure what happened in your first example since the trace doesn't quite seem to line up with the code, there could be other issues with a software NSS implementation, too.
from stm32f0xx-hal.
Well, depends on what the goal is. If you want best possible write performance without complicated implementations (i.e. DMA) it makes total sense to use the built-in buffer.
I took a look through the awesome embedded list, I could not find a device driver where they made the assumption that the embedded-hal
traits would not poll for completion.
For example, the ENC28J60 crate will not work with the stm32f0xx-hal
because it does not poll for completion, but it will work with the stm32f1xx-hal
.
fn read_buffer_memory(&mut self, addr: Option<u16>, buf: &mut [u8]) -> Result<(), E> {
if let Some(addr) = addr {
self.write_control_register(bank0::Register::ERDPTL, addr.low())?;
self.write_control_register(bank0::Register::ERDPTH, addr.high())?;
}
self.ncs.set_low();
self.spi.write(&[Instruction::RBM.opcode()])?;
self.spi.transfer(buf)?;
self.ncs.set_high();
Ok(())
}
I'm not quite sure what happened in your first example since the trace doesn't quite seem to line up with the code
I am not sure I understand what you mean here. The trace not lining up is the issue, specifically the fact that CS goes high before the transfer is complete is unexpected. The rest of the trace looks correct to me.
there could be other issues with a software NSS implementation, too.
Any ideas? I am happy to investigate further!
I do not have a lot of boards on hand, but I can confirm that my first example does work with a STM32 F103C8T6 blue pill board using the stm32f1xx-hal
.
from stm32f0xx-hal.
I took a look through the awesome embedded list, I could not find a device driver where they made the assumption that the
embedded-hal
traits would not poll for completion.
?
How would you see such a thing? If you have a write-only driver there's no problem at all (and those are quite common, e.g. for displays).
For example, the ENC28J60 crate will not work with the
stm32f0xx-hal
because it does not poll for completion, but it will work with thestm32f1xx-hal
.
Is that an assumption or did you verify that? I appreciate that the pattern you're using might be a problem in general but it would be nice to have some hard evidence.
And again, I think the problem might actually be the turn-around from write
to transfer
because we're actually changing settings.
I am not sure I understand what you mean here. The trace not lining up is the issue, specifically the fact that CS goes high before the transfer is complete is unexpected. The rest of the trace looks correct to me.
I understand that but I am puzzled about how little data seems to be transferred before the loss of sync... what's the SPI clock?
I do not have a lot of boards on hand, but I can confirm that my first example does work with a STM32 F103C8T6 blue pill board using the
stm32f1xx-hal
.
Yeah, but again that that's a completely different generation of SPI peripheral which means you can't just assume that using the F1 mechanisms would just work for F0, too.
It would be great if you could insert a busy check loop in set_send_only
and set_bidi_mode
and see whether that fixes your problem.
from stm32f0xx-hal.
How would you see such a thing?
I made the assumption (perhaps a bad one, sorry!) that any crate using the blocking embedded-hal
SPI traits with a transfer
or write
wrapped by software driven chip select set_high
and set_low
calls will not be compatible.
If you have a write-only driver there's no problem at all (and those are quite common, e.g. for displays).
I tried it out, and it appears that even for write only this is still a problem.
cs.set_low().unwrap();
spi.write(&[0x12, 0x34, 0x56, 0x78]);
cs.set_high().unwrap();
That code (or similar) also produces a clipped chip-select. Full code here: https://github.com/newAM/stm32f0xx-hal-issue-130/blob/example1/src/main.rs#L70
Is that an assumption or did you verify that? I appreciate that the pattern you're using might be a problem in general but it would be nice to have some hard evidence.
It was a problem I had a while ago, but I did not dive into it at the time.
Here is a reproduction: https://github.com/newAM/stm32f0xx-hal-issue-130
It is a bit harder to see what is going on because there is a lot more activity on the bus; but it is still pretty clear that the chip select is not acting as expected.
I understand that but I am puzzled about how little data seems to be transferred before the loss of sync... what's the SPI clock?
10kHz, though this can be reproduced at higher frequencies.
Yeah, but again that that's a completely different generation of SPI peripheral which means you can't just assume that using the F1 mechanisms would just work for F0, too.
The point I am trying to make is that there is a gap in behavior between the F0 and F1 that should not exist. Polling the bsy
bit which exists on both versions (and seems to have the same behavior across generations) such that both HALs behave similarly should be possible.
It would be great if you could insert a busy check loop in
set_send_only
andset_bidi_mode
and see whether that fixes your problem.
Happy to help debug this! 👍
I tried it out, and the problem still exists :(
This is the same as above, just a bare write, since that is a better minimal reproduction.
stm32f0xx-hal: https://github.com/newAM/stm32f0xx-hal/tree/issue-130
source code: https://github.com/newAM/stm32f0xx-hal-issue-130/tree/example2
Edit: I should also mention that all of the above was done with a Nucleo F070RB (original post was with a STM32F070CBTx
).
from stm32f0xx-hal.
I tried it out, and it appears that even for write only this is still a problem.
Well, yes. The problem is the data is still in the hardware buffer when the call returns which isn't a problem if one uses hardware CS or transfer
or runs the bus as very high clocks.
10kHz, though this can be reproduced at higher frequencies.
10kHz explains a lot.
Polling the bsy bit which exists on both versions (and seems to have the same behavior across generations) such that both HALs behave similarly should be possible.
Okay, let's poll the bsy
bit at the end of write
then. It's going to be a performance hit for lots of consecutive writes at high speeds but possibly we can introduce a separate implementation for such common use cases to deal with that.
from stm32f0xx-hal.
It's going to be a performance hit for lots of consecutive writes at high speeds but possibly we can introduce a separate implementation for such common use cases to deal with that.
Consecutive writes with software chip select (at any frequency) will also produce a clipped CS.
For example, this code, with SPI at 500kHz.
w5500_cs.set_low().unwrap();
spi.write(&[0x00, 0x39, 0x00]).unwrap();
spi.write(&[0x00]).unwrap();
w5500_cs.set_high().unwrap();
Will produce a waveform like this:
Measurements
I did some measurements at a 500kHz bus frequency to see what the performance hit is at a more reasonable frequency.
This HAL does not expose the hardware NSS management (as far as I can tell), so this should be a decent summary of the existing possible use cases with software NSS management.
The performance hit to enable back to back writes/transfers seems acceptable to me, should I open a pull-request with the stm32f0xx-hal
modification described below?
Transfer + Transfer
Using only transfers in two calls, no modifications to stm32f0xx-hal
:
let mut header: [u8; 3] = [0x00, 0x39, 0x00];
let mut buf: [u8; 1] = [0];
w5500_cs.set_low().unwrap();
spi.transfer(&mut header).unwrap();
spi.transfer(&mut buf).unwrap();
w5500_cs.set_high().unwrap();
Write + Poll BSY + Transfer
Modifying stm32f0xx-hal
to introduce polling, and retaining the original write
followed by a transfer
.
Modified write function:
fn write(&mut self, words: &[u8]) -> Result<(), Self::Error> {
let mut bufcap: u8 = 0;
// We only want to send, so we don't need to worry about the receive buffer overflowing
self.set_send_only();
// Make sure we don't continue with an error condition
nb::block!(self.check_send())?;
// We have a 32 bit buffer to work with, so let's fill it before checking the status
for word in words {
// Loop as long as our send buffer is full
while bufcap == 0 {
bufcap = self.send_buffer_size();
}
self.send_u8(*word);
bufcap -= 1;
}
loop {
let sr = self.spi.sr.read();
if !sr.bsy().bit_is_set() {
break;
}
}
// Do one last status register check before continuing
self.check_send().ok();
Ok(())
}
Code:
let mut buf: [u8; 1] = [0];
w5500_cs.set_low().unwrap();
spi.write(&[0x00, 0x39, 0x00]).unwrap();
spi.transfer(&mut buf).unwrap();
w5500_cs.set_high().unwrap();
Write + Poll BSY + Write
Same modifications to stm32f0xx-hal
as above, but with a write after the write (just to measure timing).
w5500_cs.set_low().unwrap();
spi.write(&[0x00, 0x39, 0x00]).unwrap();
spi.write(&[0x00]).unwrap();
w5500_cs.set_high().unwrap();
from stm32f0xx-hal.
500kHz is not reasonably fast, SPI was meant for multi-MHz operation. 😅
Please go ahead to PR a change. Probably the easiest and most sensible and correct change would be to replace
Lines 485 to 486 in fa3f2c2
by
nb::block!(self.check_send())
from stm32f0xx-hal.
Just throwing this out there...
As far as I can tell, the current implementation has a race condition if you invoke write
directly followed by a transfer
: the former sets BIDIMODE
/BIDIOE
for bi-directional operation in transmit-only mode, while the latter switches back to full-duplex unidirectional mode. If you call transfer
immediately after write
, the FIFO is likely not fully cleared by the time it enters transfer
, and it re-enables receiving before everything has shifted out following the write. In other words, it enables receipt before the write has finished, causing more data to be loaded into the RX buffer than expected.
This should be resolved by the discussed fixes in #131. I've validated that it avoids the overruns I encountered in my case.
In theory, fixing this requires the new "busy" check that was proposed; that being said, the FIFO only takes a few extra cycles to clear, so even just doing the reads to check if there is space in the TX buffer is enough. Not that I'm fond of that 🙂
from stm32f0xx-hal.
As far as I can tell, the current implementation has a race condition if you invoke
write
directly followed by atransfer
: the former setsBIDIMODE
/BIDIOE
for bi-directional operation in transmit-only mode, while the latter switches back to full-duplex unidirectional mode.
Yes, that it is correct and mentioned before. This needs to be fixed.
In theory, fixing this requires the new "busy" check that was proposed; that being said, the FIFO only takes a few extra cycles to clear, so even just doing the reads to check if there is space in the TX buffer is enough. Not that I'm fond of that 🙂
You're assuming people would run SPI at high speeds. Something that doesn't necessarily seem to be true as I've learned by this issue. 😅
from stm32f0xx-hal.
Related Issues (20)
- Why does configuring a GPIO require a critical section in this hal? HOT 2
- Have PA9/10/11/12 USB pins automatically remap? HOT 5
- cannot find value `RCC_PLLSRC_PREDIV1_SUPPORT` HOT 5
- First write on I2C bus never NACK's HOT 15
- bxcan/defmt version conflict HOT 3
- SPI on 0.17.1 is broken HOT 3
- More precise units for timers. HOT 1
- USART: Support half-duplex mode HOT 1
- rcc error HOT 1
- Flash Driver HOT 2
- How to use SysTick timer to count ticks from a rotary encoder? HOT 3
- Improve API to allow having an array of Analog pins HOT 1
- Cannot acknowledge timer interrupt flag
- Confused by features on f03x HOT 1
- Disabling GPIO clock HOT 1
- PWM on Special Channels? HOT 3
- Quadrature Encoder Interface
- Using timer1 channel 4 for ADC triggering
- STM32F072C8T6 -- wrong binary size? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from stm32f0xx-hal.