majerle / stm32-usart-uart-dma-rx-tx

STM32 examples for USART using DMA for efficient RX and TX transmission

License: MIT License

stm32-usart-uart-dma-rx-tx's Introduction

STM32 UART DMA RX and TX

This application note explains, with examples, 2 distinct topics:

  • Data reception with UART and DMA when application does not know in advance how many bytes it will receive
  • Data transmission with UART and DMA to avoid stalling the CPU, so that it can be used for other purposes

Table of Contents

GitHub supports ToC by default. It is available in the top-left corner of this document.

Abbreviations

  • DMA: Direct Memory Access controller in STM32
  • UART: Universal Asynchronous Receiver Transmitter
  • USART: Universal Synchronous Asynchronous Receiver Transmitter
  • TX: Transmit
  • RX: Receive
  • HT: Half-Transfer Complete DMA event/flag
  • TC: Transfer Complete DMA event/flag
  • RTO: Receiver Timeout UART event/flag
  • IRQ: Interrupt

General about UART

STM32 has peripherals such as USART, UART or LPUART. The difference between them is not relevant for this purpose since the concept can be applied to all of them. In a few words, USART supports synchronous operation on top of asynchronous (UART) operation, and LPUART supports low-power operation in STOP mode. When synchronous mode or low-power mode is not used, USART, UART and LPUART can be considered identical. For the complete set of details, check the product's reference manual and datasheet.

For the sake of this application note, we will only use term UART.

UART in STM32 allows configuration using different transmit (TX) and receive (RX) modes:

  • Polling mode (no DMA, no IRQ)
    • P: Application polls status bits to check whether a character has been transmitted/received and must read it fast enough not to miss any byte
    • P: Easy to implement, only a few lines of code
    • C: Can easily miss received data in complex application if CPU cannot read registers quickly enough
    • C: Works only for low baudrates, 9600 or lower
  • Interrupt mode (no DMA)
    • P: UART triggers interrupt and CPU jumps to service routine to handle each received byte separately
    • P: Commonly used approach in embedded applications
    • P: Works well with common baudrates, 115200, up to ~921600 bauds
    • C: Interrupt service routine is executed for every received character
    • C: May decrease system performance if interrupts are triggered for every character for high-speed baudrates
  • DMA mode
    • DMA is used to transfer data from USART RX data register to user memory on hardware level. No application interaction is needed at this point except processing received data by application once necessary
    • P: Transfer from USART peripheral to memory is done on hardware level without CPU interaction
    • P: Can work very easily with operating systems
    • P: Optimized for highest baudrates > 1Mbps and low-power applications
    • P: In case of big bursts of data, increasing data buffer size can improve functionality
    • C: Number of bytes to transfer must be known in advance by DMA hardware
    • C: If communication fails, DMA may not notify application about all bytes transferred

This article focuses only on DMA mode for RX operation and explains how to handle unknown data length.

Every STM32 has at least one (1) UART IP and at least one (1) DMA controller available in its DNA. This is all we need for successful data transmission. Application uses default features to implement very efficient transmit system using DMA.

While implementation is pretty straightforward for the TX operation (set pointer to data, define its length and go), this may not be the case for receive. When implementing DMA receive, the application must tell the DMA how many bytes to receive before the transfer is considered done. However, the UART protocol does not offer such information (it could work with a higher-level protocol, but that is another story that we do not touch here; we assume we have to implement a very reliable low-level communication protocol).

Idle Line or Receiver Timeout events

STM32s have capability in UART to detect when RX line has not been active for period of time. This is achieved using 2 methods:

  • IDLE LINE event: Triggered when RX line has been in idle state (normally high state) for 1 frame time, after last received byte. Frame time is based on baudrate. Higher baudrate means lower frame time for single byte.
  • RTO (Receiver Timeout) event: Triggered when line has been in idle state for programmable time. It is fully configured by firmware.

Both events can trigger an interrupt which is an essential feature to allow effective receive operation

Not all STM32 have IDLE LINE or RTO features available. When not available, examples concerning these features may not be used.

An example: transmitting 1 byte at 115200 bauds takes approximately 100us (rounded up for easier estimation); 3 bytes take ~300us in total. The IDLE line event triggers an interrupt once the line has been idle for 1 frame time (in this case ~100us) after the third byte has been received.
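
The estimate above can be checked with a little arithmetic. The sketch below assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per frame); the function names are made up for illustration and are not part of the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire for one UART frame, in microseconds (rounded down).
 * bits_per_frame is an assumption of this sketch: 10 for 8N1 framing. */
uint32_t
uart_frame_time_us(uint32_t baudrate, uint32_t bits_per_frame) {
    assert(baudrate > 0);
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000u) / baudrate);
}

/* Time from the first start bit until the IDLE line event can fire:
 * all frames of the burst plus one idle frame time (rounded down). */
uint32_t
uart_idle_event_delay_us(uint32_t baudrate, uint32_t bits_per_frame, uint32_t nbytes) {
    assert(baudrate > 0);
    return (uint32_t)(((uint64_t)bits_per_frame * (nbytes + 1u) * 1000000u) / baudrate);
}
```

With these assumptions, one frame at 115200 bauds takes about 86us, which the text rounds up to ~100us, and the IDLE event after a 3-byte burst is expected roughly 347us after the first start bit.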

IDLE LINE DEMO

This is a real experiment demo using STM32F4 and IDLE LINE event. After IDLE event is triggered, data are echoed back (loopback mode):

  • Application receives 3 bytes, takes approx ~300us at 115200 bauds
  • RX goes to high state (yellow rectangle) and UART RX detects it has been idle for at least 1 frame time (approx 100us)
    • Width of yellow rectangle represents 1 frame time
  • IDLE line interrupt is triggered at green arrow
  • Application echoes data back from interrupt context

General about DMA

DMA in STM32 can be configured in normal or circular mode. For each mode, DMA requires number of elements to transfer before its events (half-transfer complete, transfer complete) are triggered.

  • Normal mode: DMA starts with data transfer, once it transfers all elements, it stops and sets enable bit to 0.
    • Application is using this mode when transmitting data
  • Circular mode: DMA starts with transfer, once it transfers all elements (as written in corresponding length register), it starts from beginning of memory and transfers more
    • Application is using this mode when receiving data

While transfer is active, 2 (among others) interrupts may be triggered:

  • Half-Transfer complete HT: Triggers when DMA transfers half count of elements
  • Transfer-Complete TC: Triggers when DMA transfers all elements

When DMA operates in circular mode, these interrupts are triggered periodically

Number of elements to transfer by DMA hardware must be written to relevant DMA register before start of transfer
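
To make the difference between normal and circular mode concrete, here is a small host-side model of the DMA element counter. It is only a sketch: `dma_sim_t` and the `dma_sim_*` functions are hypothetical names, the structure is not a real STM32 register layout, and the buffer length is assumed to be even so that the half-transfer point is unambiguous.

```c
#include <assert.h>
#include <stddef.h>

#define DMA_SIM_EVT_NONE 0x00u
#define DMA_SIM_EVT_HT   0x01u /* Half-transfer complete */
#define DMA_SIM_EVT_TC   0x02u /* Transfer complete */

/* Host-side model of one DMA channel (illustration only) */
typedef struct {
    size_t length;    /* Number of elements programmed before start */
    size_t remaining; /* Elements left, counts down like the hardware counter */
    int circular;     /* 1 = circular mode, 0 = normal mode */
    int enabled;      /* Cleared at the end of a normal-mode transfer */
} dma_sim_t;

void
dma_sim_start(dma_sim_t* d, size_t length, int circular) {
    assert(d != NULL && length >= 2 && (length % 2) == 0);
    d->length = length;
    d->remaining = length;
    d->circular = circular;
    d->enabled = 1;
}

/* Transfer one element and return the event it raises, if any */
unsigned
dma_sim_transfer_one(dma_sim_t* d) {
    assert(d != NULL);
    if (!d->enabled) {
        return DMA_SIM_EVT_NONE;
    }
    --d->remaining;
    if (d->length - d->remaining == d->length / 2) {
        return DMA_SIM_EVT_HT;
    }
    if (d->remaining == 0) {
        if (d->circular) {
            d->remaining = d->length; /* Reload and continue from start of memory */
        } else {
            d->enabled = 0;           /* Normal mode: stop after the last element */
        }
        return DMA_SIM_EVT_TC;
    }
    return DMA_SIM_EVT_NONE;
}
```

In circular mode the counter reloads after TC, so HT and TC repeat for as long as data keep arriving; in normal mode the channel disables itself after the first TC.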

Combine UART + DMA for data reception

Now it is time to understand which features to use to receive data with UART and DMA to offload the CPU. For the sake of this example, we use a memory buffer array of 20 bytes. DMA will transfer data received from UART to this buffer.

The steps to begin are listed below. The initial assumption is that UART and the basic DMA setup have been initialized before reaching this step:

  • Application writes 20 to relevant DMA register for data length
  • Application writes memory & peripheral addresses to relevant DMA registers
  • Application sets DMA direction to peripheral-to-memory mode
  • Application puts DMA to circular mode. This ensures DMA does not stop transferring data after it reaches the end of memory. Instead, it rolls over and continues transferring any further data from UART to memory
  • Application enables DMA & UART in reception mode. Receive can now start: DMA waits for UART to receive the first character and transfers it to the array. This is done for every received byte
  • Application is notified by DMA HT event (or interrupt) after first 10 bytes have been transferred from UART to memory
  • Application is notified by DMA TC event (or interrupt) after 20 bytes are transferred from UART to memory
  • Application is notified by UART IDLE line (or RTO) event when idle line or timeout is detected on RX line
  • Application needs to react to all of these events for the most efficient receive

This configuration is important as we do not know length in advance. Application needs to assume it may be endless number of bytes received, therefore DMA must be operational endlessly.

We have used a 20-byte array for demonstration purposes. In a real application this size may need to be increased. It all depends on UART baudrate (higher speed means more data may be received in a fixed time window) and how fast the application can process the received data (either using interrupt notification, RTOS, or polling mode).
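
As a rough aid for choosing the buffer size, the sketch below estimates how many bytes can arrive within a given processing latency. Both function names are hypothetical, 8N1 framing (10 bits per frame) is assumed by the caller, and the "two halves" rule is only a rule of thumb of this sketch: with HT and TC enabled, each half of the buffer should hold what can arrive while the application is still busy with the other half.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound of bytes that can arrive in window_us microseconds
 * on a continuously busy line (rounded up). */
uint32_t
uart_max_bytes_in_window(uint32_t baudrate, uint32_t bits_per_frame, uint32_t window_us) {
    uint64_t num;
    uint64_t den;

    assert(bits_per_frame > 0);
    num = (uint64_t)baudrate * window_us;          /* Bits received, scaled by 1e6 */
    den = (uint64_t)bits_per_frame * 1000000u;     /* Bits per byte, scaled by 1e6 */
    return (uint32_t)((num + den - 1u) / den);
}

/* Rule of thumb (assumption of this sketch): two halves, each able to
 * absorb the worst-case processing latency of the application. */
uint32_t
uart_rx_dma_buffer_min_size(uint32_t baudrate, uint32_t bits_per_frame, uint32_t latency_us) {
    return 2u * uart_max_bytes_in_window(baudrate, bits_per_frame, latency_us);
}
```

For example, at 115200 bauds and a worst-case latency of 1ms this suggests at least 24 bytes; at 921600 bauds the same latency already asks for 186 bytes.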

Combine UART + DMA for data transmission

Everything gets simpler when the application transmits data: the length of data is known in advance and the memory to transmit is ready. For the sake of this example, we use memory for a HelloWorld message. In C language it would be:

const char
hello_world_arr[] = "HelloWorld";
  • Application writes number of bytes to transmit to relevant DMA register, that would be strlen(hello_world_arr) or 10
  • Application writes memory & peripheral addresses to relevant DMA registers
  • Application sets DMA direction to memory-to-peripheral mode
  • Application sets DMA to normal mode. This effectively disables DMA once all the bytes are successfully transferred
  • Application enables DMA & UART in transmitter mode. Transmit starts immediately when UART requests first byte via DMA to be shifted to UART TX register
  • Application is notified by TC event (or interrupt) after all bytes have been transmitted from memory to UART via DMA
  • DMA is stopped and application may prepare next transfer immediately

Please note that TC event is triggered before the last UART byte has been fully transmitted over UART. That's because TC event is part of DMA and not part of UART. It is triggered when DMA has transferred all the bytes from point A to point B. That is, point A for DMA is memory, point B is UART data register. From there, it is up to UART to clock out the byte to the GPIO pin.

DMA HT/TC and UART IDLE combination details

This section describes 4 possible cases and one additional case which explains why the application needs HT/TC events.

DMA events

Abbreviations used for the image:

  • R: Read pointer, used by application to read data from memory. Later also used as old_ptr
  • W: Write pointer, used by DMA to write next byte to. Increased every time DMA writes new byte. Later also used as new_ptr
  • HT: Half-Transfer Complete event triggered by DMA
  • TC: Transfer-Complete event - triggered by DMA
  • I: IDLE line event - triggered by USART

DMA configuration:

  • Circular mode
  • 20 bytes data length
    • Consequently HT event gets triggered after 10 bytes have been transferred
    • Consequently TC event gets triggered after 20 bytes have been transferred

Possible cases during real-life execution:

  • Case A: DMA transfers 10 bytes. Application gets notification with HT event and may process received data
  • Case B: DMA transfers next 10 bytes. Application gets notification thanks to TC event. Processing now starts from last known position until the end of memory
    • DMA is in circular mode, thus it will continue right from beginning of the buffer, on top of the picture
  • Case C: DMA transfers 10 bytes, but not aligned with HT nor TC events
    • Application gets notified with HT event when first 6 bytes are transferred. Processing may start from last known read location
    • Application receives IDLE line event after next 4 bytes are successfully transferred to memory
  • Case D: DMA transfers 10 bytes in overflow mode, but not aligned with HT nor TC events
    • Application gets notification by TC event when first 4 bytes are transferred. Processing may start from last known read location
    • Application gets notification by IDLE event after next 6 bytes are transferred. Processing may start from beginning of buffer
  • Case E: Example what may happen when application relies only on IDLE event
    • If application receives 30 bytes in burst, 10 bytes get overwritten by DMA as application did not process it quickly enough
    • Application gets IDLE line event once there is steady RX line for 1 byte timeframe
    • Red part of data represents first 10 received bytes from burst which were overwritten by last 10 bytes in burst
    • Option to avoid such scenario is to poll for DMA changes quicker than burst of 20 bytes take; or by using TC and HT events
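
Case E can be reproduced with a tiny host-side simulation. This is only a sketch under simplifying assumptions: `rx_sim_lost_bytes` is a hypothetical helper, the buffer length is even, and the application is assumed to drain all unread data instantly whenever it reacts to an HT or TC event.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many bytes of one burst get overwritten before they are read.
 *
 * use_ht_tc = 0: application reads only once, after the burst (IDLE only)
 * use_ht_tc = 1: application additionally drains the buffer at every HT/TC
 */
size_t
rx_sim_lost_bytes(size_t buf_len, size_t burst_len, int use_ht_tc) {
    size_t unread = 0;  /* Bytes in buffer the application has not seen yet */
    size_t lost = 0;    /* Unread bytes overwritten by DMA */
    size_t w = 0;       /* DMA write position */
    size_t i;

    assert(buf_len >= 2 && (buf_len % 2) == 0);
    for (i = 0; i < burst_len; ++i) {
        if (unread == buf_len) {
            ++lost;     /* Buffer full of unread data: oldest byte is overwritten */
        } else {
            ++unread;
        }
        w = (w + 1u) % buf_len;
        if (use_ht_tc && (w == buf_len / 2u || w == 0u)) {
            unread = 0; /* HT (half) or TC (wrap): application processes everything */
        }
    }
    return lost;
}
```

With a 20-byte buffer and a 30-byte burst, relying on IDLE alone loses 10 bytes, while reacting to HT and TC as well loses none, as long as the handlers keep up.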

Example code to read data from memory and process it, for cases A-D

/**
 * \brief           Check for new data received with DMA
 *
 * User must select context to call this function from:
 * - Only interrupts (DMA HT, DMA TC, UART IDLE) with same preemption priority level
 * - Only thread context (outside interrupts)
 *
 * If called from both contexts, exclusive access protection must be implemented.
 * This mode is not advised as it usually means architecture design problems
 *
 * When IDLE interrupt is not present, application must rely only on thread context,
 * by manually calling function as quickly as possible, to make sure
 * data are read from raw buffer and processed.
 *
 * Not doing reads fast enough may cause DMA to overwrite unread received bytes,
 * hence application will lose useful data.
 *
 * Solutions to this are:
 * - Improve architecture design to achieve faster reads
 * - Increase raw buffer size and allow DMA to write more data before this function is called
 */
void
usart_rx_check(void) {
    /*
     * Set old position variable as static.
     *
     * With default C startup code, static variables are initialized to `0`.
     * It is used to keep latest read start position,
     * which makes this function neither reentrant nor thread-safe
     */
    static size_t old_pos;
    size_t pos;

    /* Calculate current position in buffer and check for new data available */
    pos = ARRAY_LEN(usart_rx_dma_buffer) - LL_DMA_GetDataLength(DMA1, LL_DMA_CHANNEL_5);
    if (pos != old_pos) {                       /* Check change in received data */
        if (pos > old_pos) {                    /* Current position is over previous one */
            /*
             * Processing is done in "linear" mode.
             *
             * Application processing is fast with single data block,
             * length is simply calculated by subtracting pointers
             *
             * [   0   ]
             * [   1   ] <- old_pos |------------------------------------|
             * [   2   ]            |                                    |
             * [   3   ]            | Single block (len = pos - old_pos) |
             * [   4   ]            |                                    |
             * [   5   ]            |------------------------------------|
             * [   6   ] <- pos
             * [   7   ]
             * [ N - 1 ]
             */
            usart_process_data(&usart_rx_dma_buffer[old_pos], pos - old_pos);
        } else {
            /*
             * Processing is done in "overflow" mode.
             *
             * Application must process data twice,
             * since there are 2 linear memory blocks to handle
             *
             * [   0   ]            |---------------------------------|
             * [   1   ]            | Second block (len = pos)        |
             * [   2   ]            |---------------------------------|
             * [   3   ] <- pos
             * [   4   ] <- old_pos |---------------------------------|
             * [   5   ]            |                                 |
             * [   6   ]            | First block (len = N - old_pos) |
             * [   7   ]            |                                 |
             * [ N - 1 ]            |---------------------------------|
             */
            usart_process_data(&usart_rx_dma_buffer[old_pos], ARRAY_LEN(usart_rx_dma_buffer) - old_pos);
            if (pos > 0) {
                usart_process_data(&usart_rx_dma_buffer[0], pos);
            }
        }
        old_pos = pos;                          /* Save current position as old for next transfers */
    }
}
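
The same position logic can be exercised on a PC without any hardware. The sketch below mirrors the function above but takes the DMA write position as a parameter instead of reading it from a DMA register; `rx_buf`, `rx_out`, `process_data` and `rx_check` are stand-in names for this illustration only, and positions are limited to 0..N-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_BUF_LEN 20u

uint8_t rx_buf[RX_BUF_LEN];     /* Stand-in for usart_rx_dma_buffer */
uint8_t rx_out[64];             /* Collects everything handed to the application */
size_t rx_out_len;

/* Stand-in for usart_process_data(): append block to rx_out */
void
process_data(const uint8_t* data, size_t len) {
    assert(rx_out_len + len <= sizeof(rx_out));
    memcpy(&rx_out[rx_out_len], data, len);
    rx_out_len += len;
}

/* Same logic as usart_rx_check(), with the DMA write position passed in */
void
rx_check(size_t pos) {
    static size_t old_pos;

    assert(pos < RX_BUF_LEN);
    if (pos != old_pos) {
        if (pos > old_pos) {
            /* "Linear" mode: one contiguous block */
            process_data(&rx_buf[old_pos], pos - old_pos);
        } else {
            /* "Overflow" mode: tail of buffer first, then its beginning */
            process_data(&rx_buf[old_pos], RX_BUF_LEN - old_pos);
            if (pos > 0) {
                process_data(&rx_buf[0], pos);
            }
        }
        old_pos = pos;
    }
}
```

Calling `rx_check(6)` and then `rx_check(3)` on a buffer filled with 0..19 first delivers bytes 0..5, then bytes 6..19 followed by 0..2, which is exactly the two-block "overflow" path.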

Interrupt priorities are important

Thanks to Cortex-M NVIC's (Nested Vectored Interrupt Controller) flexibility, user can configure priority level for each of the NVIC interrupt lines; it has full control over execution profile for each of the interrupt lines separately.

There are 2 priority types in Cortex-M:

  • Preemption priority: Interrupt with higher logical priority level can preempt already running lower priority interrupt
  • Subpriority: Interrupt with higher subpriority (but same preemption priority) will execute first when 2 (or more) interrupt lines become active at the same time; such interrupt will also never stop currently executed interrupt (if any) by the CPU.

STM32s have different interrupt lines (interrupt service routines later too) for DMA and UART, one for each peripheral and its priority could be software configurable.

Function that gets called to process received data must keep position of last read value, hence processing function is not thread-safe or reentrant and requires special attention.

Application must ensure that DMA and UART interrupts use the same preemption priority level. This is the only configuration that guarantees the processing function never gets preempted by itself (DMA interrupt preempting UART, or the opposite); otherwise the last-known read position may get corrupted and the application will operate with wrong data.
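
The rule can be expressed as a small model of the NVIC decision. This is a sketch, not CMSIS code: `irq_prio_t` and both functions are hypothetical, and it relies on the Cortex-M convention that a numerically lower priority value means a more urgent interrupt.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t preempt_prio;   /* Lower number = more urgent on Cortex-M */
    uint8_t sub_prio;       /* Only used to order interrupts pending at the same time */
} irq_prio_t;

/* Can a newly pending interrupt interrupt the handler that is running now?
 * Only the preemption priority counts; subpriority never causes preemption. */
int
irq_can_preempt(irq_prio_t pending, irq_prio_t running) {
    return pending.preempt_prio < running.preempt_prio;
}

/* Of two interrupts pending at the same time, is `a` served before `b`?
 * Full ties are resolved by IRQ number in hardware, which is not modelled here. */
int
irq_served_first(irq_prio_t a, irq_prio_t b) {
    if (a.preempt_prio != b.preempt_prio) {
        return a.preempt_prio < b.preempt_prio;
    }
    return a.sub_prio <= b.sub_prio;
}
```

With DMA and UART both at the same preemption priority, `irq_can_preempt` is false in both directions, so the receive check always runs to completion before the other handler starts.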

Examples

Examples can be used as reference code to implement your own DMA TX and RX functionality.

There are 2 sets of examples:

  • Examples for RX only
    • Available in projects folder with usart_rx_ prefix
    • DMA is used to receive data, polling is used to echo data back
  • Examples for RX & TX
    • DMA is used to receive data and to transmit data back
    • It uses ring buffer to copy data from DMA buffer to application before it is sent back

Common for all examples:

  • Developed in STM32CubeIDE for easier evaluation on STM32 boards
  • Fully developed using LL drivers for various STM32 families
  • UART common configuration: 115200 bauds, 1 stop bit, no-parity
  • DMA RX common configuration: Circular mode, TC and HT events enabled
  • DMA TX common configuration: Normal mode, TC event enabled
  • All RX examples implement loop-back functionality. Every character received by UART and transferred by DMA is sent back to same UART

| STM32 family | Board name       | USART   | STM32 TX | STM32 RX | RX DMA settings            | TX DMA settings           |
|--------------|------------------|---------|----------|----------|----------------------------|---------------------------|
| STM32F1xx    | BluePill-F103C8  | USART1  | PA9      | PA10     | DMA1, Channel 5            |                           |
| STM32F4xx    | NUCLEO-F413ZH    | USART3  | PD8      | PD9      | DMA1, Stream 1, Channel 4  | DMA1, Stream 3, Channel 4 |
| STM32G0xx    | NUCLEO-G071RB    | USART2  | PA2      | PA3      | DMA1, Channel 1            |                           |
| STM32G4xx    | NUCLEO-G474RE    | LPUART1 | PA2      | PA3      | DMA1, Channel 1            |                           |
| STM32L4xx    | NUCLEO-L432KC    | USART2  | PA2      | PA15     | DMA1, Channel 6, Request 2 |                           |
| STM32H7xx    | NUCLEO-H743ZI2*  | USART3  | PD8      | PD9      | DMA1, Stream 0             | DMA1, Stream 1            |
| STM32U5xx    | NUCLEO-U575ZI-Q* | USART1  | PA9      | PA10     | GPDMA1, Channel 0          | GPDMA1, Channel 1         |

  • It is possible to run H743 (single-core) examples on dual-core STM32H7 Nucleo boards, NUCLEO-H745 or NUCLEO-H755. Special care needs to be taken as dual-core H7 Nucleo boards use DCDC for MCU power hence application must check clock configuration in main file and uncomment code to enable SMPS.

Examples demonstrate different use cases for RX only or RX&TX combined.

Demos that are part of this repository are all based on Low-Level (LL) drivers to maximize user understanding of how to convert theory into practice. Some STM32Cube firmware packages include the same example using HAL drivers too. Some of them are (with link to example; list is not exhaustive) listed below. All examples are identified as UART_ReceptionToIdle_CircularDMA - you can search for it in your local Cube firmware repository.

Examples for UART + DMA RX

Polling for changes

  • DMA hardware takes care to transfer received data to memory
  • Application must constantly poll for new changes in DMA registers and read received data quick enough to make sure DMA will not overwrite data in buffer
  • Processing of received data is in thread mode (not in interrupt)
  • P: Easy to implement
  • P: No interrupts, no consideration of priority and race conditions
  • P: Fits for devices without USART IDLE line detection
  • C: Application takes care of data periodically
  • C: Not possible to put application to low-power mode (sleep mode)

Polling for changes with operating system

  • Same as polling for changes but with dedicated thread in operating system to process data
  • P: Easy to implement to RTOS systems, uses single thread without additional RTOS features (no mutexes, semaphores, memory queues)
  • P: No interrupts, no consideration of priority and race conditions
  • P: Data processing always on-time with maximum delay given by thread delay, thus with known maximum latency between received character and processed time
    • Unless system has higher priority threads
  • P: Fits for devices without UART IDLE line detection
  • C: Application takes care of data periodically
  • C: Uses memory resources dedicated for separate thread for data processing
  • C: Not possible to put application to low-power mode (sleep mode)

UART IDLE line detection + DMA HT&TC interrupts

  • Application gets notification by IDLE line detection or DMA TC/HT events
  • Application has to process data only when it receives any of the 3 interrupts
  • P: Application does not need to poll for new changes
  • P: Application receives interrupts on events
  • P: Application may enter low-power modes to increase battery life (if operated on battery)
  • C: Data are read (processed) in the interrupt. We strive to execute interrupt routine as fast as possible
  • C: Long interrupt execution may break other compatibility in the application

Processing of incoming data is from 2 interrupt vectors, hence it is important that they do not preempt each-other. Set both to the same preemption priority!

USART Idle line detection + DMA HT&TC interrupts with RTOS

  • Application gets notification by IDLE line detection or DMA TC/HT events
  • Application uses separate thread to process the data only when notified in one of interrupts
  • P: Processing is not in the interrupt but in separate thread
  • P: Interrupt only informs processing thread to process (or to wakeup)
  • P: Operating system may put processing thread to blocked state while waiting for event
  • C: Memory usage for separate thread + message queue (or semaphore)

This is the preferred way to receive and process UART characters.

Examples for UART DMA for TX (and optionally included RX)

  • Application is using DMA in normal mode to transfer data
  • Application is always using ringbuffer between high-level write and low-level transmit operation
  • DMA TC interrupt is triggered when transfer has finished. Application can then send more data
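
The flow above can be sketched on a host with a minimal ring buffer. All names here (`tx_rb`, `tx_write`, `tx_start_dma`, `tx_dma_tc_handler`) are hypothetical and the "DMA" is just a pointer and a length; on real hardware the check in `tx_start_dma` would additionally have to run with interrupts masked, since it is called from both thread and interrupt context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_RB_SIZE 16u

/* Minimal ring buffer; one slot stays empty to tell full from empty */
uint8_t tx_rb[TX_RB_SIZE];
size_t tx_rb_r;                 /* Read index, advanced when DMA finishes */
size_t tx_rb_w;                 /* Write index, advanced by the application */
size_t tx_dma_len;              /* Length of transfer in progress, 0 = DMA idle */
const uint8_t* tx_dma_ptr;      /* Block the "DMA" was last asked to send */

/* High-level write: copy as much as fits, return number of bytes accepted */
size_t
tx_write(const void* data, size_t len) {
    const uint8_t* d = data;
    size_t n = 0;

    assert(data != NULL || len == 0);
    while (n < len && ((tx_rb_w + 1u) % TX_RB_SIZE) != tx_rb_r) {
        tx_rb[tx_rb_w] = d[n++];
        tx_rb_w = (tx_rb_w + 1u) % TX_RB_SIZE;
    }
    return n;
}

/* Start DMA for the largest contiguous block, unless one is already running.
 * Returns 1 if a transfer was started, 0 otherwise. */
int
tx_start_dma(void) {
    if (tx_dma_len > 0 || tx_rb_r == tx_rb_w) {
        return 0;
    }
    tx_dma_len = (tx_rb_w > tx_rb_r) ? (tx_rb_w - tx_rb_r) : (TX_RB_SIZE - tx_rb_r);
    tx_dma_ptr = &tx_rb[tx_rb_r];
    /* Real code would program DMA memory address + length here and enable it */
    return 1;
}

/* Called from DMA TC interrupt: release sent bytes and chain the next block */
void
tx_dma_tc_handler(void) {
    tx_rb_r = (tx_rb_r + tx_dma_len) % TX_RB_SIZE;
    tx_dma_len = 0;
    tx_start_dma();
}
```

Because DMA in normal mode needs one contiguous memory block, data that wrap around the end of the ring buffer go out as two transfers, the second one being started from the TC handler.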

Demo application for debug messages

This is a demo application available in projects folder. Its purpose is to show how an application can output debug messages without drastically affecting CPU performance. It uses DMA to transfer data (CPU does not wait for UART flags) and can achieve very high or very low data rates.

  • All debug messages from application are written to intermediate ringbuffer
  • Application will try to start & configure DMA after every successive write to ringbuffer
  • If transfer is on-going, next start is configured from DMA TC interrupt

As a result of this demo application for STM32F413-Nucleo board, observations are as follows:

  • Demo code sends 1581 bytes every second at 115200 bauds, which takes approx 142ms.
  • With DMA disabled, CPU load was 14%, in-line with time to transmit the data
  • With DMA enabled, CPU load was 0%
  • DMA can be enabled/disabled with USE_DMA_TX macro configuration in main.c
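
The measured load can be cross-checked against the ideal wire time. The sketch below assumes 8N1 framing (10 bits per byte) and a CPU that does nothing but wait while transmitting without DMA; both function names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Ideal wire time in milliseconds for nbytes (rounded down) */
uint32_t
uart_tx_time_ms(uint32_t nbytes, uint32_t baudrate, uint32_t bits_per_frame) {
    assert(baudrate > 0);
    return (uint32_t)(((uint64_t)nbytes * bits_per_frame * 1000u) / baudrate);
}

/* Share of period_ms that a blocking transmit keeps the CPU busy,
 * in percent (rounded to nearest). */
uint32_t
uart_blocking_tx_load_percent(uint32_t nbytes, uint32_t baudrate,
                              uint32_t bits_per_frame, uint32_t period_ms) {
    uint64_t num;
    uint64_t den;

    assert(baudrate > 0 && period_ms > 0);
    num = (uint64_t)nbytes * bits_per_frame * 1000u * 100u;
    den = (uint64_t)baudrate * period_ms;
    return (uint32_t)((num + den / 2u) / den);
}
```

For 1581 bytes per second at 115200 bauds the ideal wire time comes out at about 137ms, close to the ~142ms quoted above, and the resulting load of about 14% matches the measurement with DMA disabled.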

How to use this repository

  1. run git clone --recurse-submodules https://github.com/MaJerle/stm32-usart-dma-rx-tx to clone repository including submodules
  2. run examples from projects directory using STM32CubeIDE

stm32-usart-uart-dma-rx-tx's People

Contributors

dependabot[bot], jakeru, majerle, matstm, rajah-ketos, tomt0329

stm32-usart-uart-dma-rx-tx's Issues

Keeping the old_po(sition) as static to function creates problems for resetting

During the rx ring buffer write from dma source exchange, you are tracking old_pos as a static inside the rx_check().

If you want to reset the connection because of error, reinit, sleep/wake, whatever. This isn't great and there is no good way to do it. The obvious solution for me was to make this static outside of the function. Now at least I can reset it if I do something like change baud where I have to reset the DMA anyway.

I am aware this is not meant to be production ready code, but I figured it was worth discussing because the second you do want to use code like this, the issue pops up:

void
usart_rx_check(void) {
    static size_t old_pos;
    size_t pos;

    /* Calculate current position in buffer and check for new data available */
    pos = ARRAY_LEN(usart_rx_dma_buffer) - LL_DMA_GetDataLength(DMA1, LL_DMA_STREAM_0);
    if (pos != old_pos) {                       /* Check change in received data */
        if (pos > old_pos) {                    /* Current position is over previous one */
            /*
             * Processing is done in "linear" mode.

Also, I am aware that static variables are zero-initialized per the C standard, but when it's something like this that could cause difficult bugs if not set to zero at start, I typically set them equal to zero anyhow, a set-your-mind-at-ease thing. At least a comment in code or documentation would be helpful for those that didn't know https://stackoverflow.com/questions/3373108/why-are-static-variables-auto-initialized-to-zero
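
A minimal sketch of the change suggested in this issue, written so it can run on a host: the read position lives at file scope, is explicitly set to zero, and gets a reset function. `rx_old_pos`, `rx_reset` and `rx_new_bytes` are hypothetical names, and the DMA write position is passed in as a parameter instead of being read from a DMA register.

```c
#include <assert.h>
#include <stddef.h>

#define RX_BUF_LEN 20u

/* File scope instead of function-static, so that it can be reset */
size_t rx_old_pos = 0;

/* Call whenever DMA is re-initialized (baudrate change, error recovery,
 * wake-up), i.e. whenever the DMA write position goes back to 0 */
void
rx_reset(void) {
    rx_old_pos = 0;
}

/* Return number of new bytes since last call for DMA write position `pos` */
size_t
rx_new_bytes(size_t pos) {
    size_t n;

    assert(pos < RX_BUF_LEN);
    n = (pos >= rx_old_pos) ? (pos - rx_old_pos) : (RX_BUF_LEN - rx_old_pos + pos);
    rx_old_pos = pos;
    return n;
}
```

If DMA is restarted after 7 bytes and 2 new bytes arrive, calling `rx_reset()` first makes `rx_new_bytes(2)` report 2; without the reset the same call would report 15 bytes, most of them stale.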

re-entrance protection?

Thank you very much for your clear and very well documented code examples. I'm using it for F0.

In your code, the interrupt handlers call usart_rx_check which in turn calls usart_process_data. There is a chance that the IDLE interrupt follows very quickly after TC or HT, regardless of the buffer length.

This can, in theory, cause re-entrancy issues in usart_rx_check() like if the second interrupt falls between reading pos and updating last_pos.

You clear the interrupt flags before the usart_rx_check() call. Wouldn't it be more protective against re-entrancy to clear them only after usart_rx_check() returns?

Do you see a need for further measures like pausing the DMA etc.?

Some improvements under high baud rate of serial port

Hello, @MaJerle .

My current serial port application runs at 5Mbps, using RTOS thread processing. Following your tutorial, I turned on the DMA HT and FT interrupts and the serial port interrupt line, but my maximum data per packet is 64 words. In this case, should I turn off the DMA HT interrupt to reduce system overhead?

ESP8266 - Examples?

Hi,

I was wondering if you have worked with the STM32F769 Discovery board and the dedicated connector it has for an ESP8266?

I want to leverage the async IO pattern that you talk about here but not really sure how much of your source code is portable to the STM32F769, I'm using Visual GDB and Visual Studio for this.

I've actually designed and developed a pretty powerful API in C# that runs on a PC and controls an ESP8266, this is fully async in nature and includes a very flexible and quite powerful RingBuffer class.

I'd now like to rewrite this in C++ and get the board working with the ESP8266.

I can't even find out which USART is tied to the ESP8266 connector, pretty much no documentation.

Thanks

usart_process_data() gets called with zero length

Hello, thanks for sharing this framework.
I've ported the F4 project to IAR here: https://github.com/bluehash/STM32_USART_DMA_RX

One thing I've noticed is that usart_process_data() gets called with zero length. It is as if the IDLE line interrupt takes over when the DMA HT/FT finishes. I'm not sure if there is a way to ignore this. If I have a lot of data coming in, that is a good number of interrupts that use up CPU cycle time with zero length.

I put in a simple check here(bluehash@e2a180a) so that I can breakpoint the code.

I've also added a test under idle_line_irq_rtos_F4/test using TeraTerm.
My environment:
STM32F413ZH Nucleo Board
IAR 8.30.2

/bin/sh: arm-none-eabi-objcopy.exe: command not found

arm-none-eabi-objcopy.exe -O ihex "usart_dma_rx_idle_line_irq_rtos_L4.elf" "usart_dma_rx_idle_line_irq_rtos_L4.hex"
/bin/sh: arm-none-eabi-objcopy.exe: command not found
make[1]: *** [makefile:80: post-build] Error 127
make: *** [makefile:48: all] Error 2
"make -j7 all" terminated with exit code 2. Build might be incomplete.

Return variable "started" has almost no useful purpose

At https://github.com/MaJerle/stm32-usart-uart-dma-rx-tx/blob/main/projects/usart_rx_idle_line_irq_ringbuff_tx_H7/Src/main.c

/**
 * \brief           Check if DMA is active and if not try to send data
 *
 * This function can be called either by application to start data transfer
 * or from DMA TX interrupt after previous transfer just finished
 *
 * \return          `1` if transfer just started, `0` if on-going or no data to transmit
 */
uint8_t
usart_start_tx_dma_transfer(void) {
    uint32_t primask;
    uint8_t started = 0;

The function usart_start_tx_dma_transfer(void) returns 1 for started, or 0 for "transmission already in progress".

I'm really not sure I can see reason the result of started / not started would matter to the user.

Either you're already transmitting, in which case the DMA will get to your new ring buffer data soon. Or, congrats, you're sending your data now. Same effect either way your data will be sent.

I suppose the user could implement an abort and then priority send, but not really because you probably aren't certain if there is data in the ring buffer ahead of yours.

Am I missing something on this? Did this just exist for testing?

Using CubeMX and HAL for UART DMA on H7

I'm trying to do the same functionality on H750 but with using CubeMX (From within CubeIDE) and using HAL function (Mainly HAL_UARTEx_ReceiveToIdle_DMA) but it is not working for me.
Is there any reason for you to use the LL over the HAL?

a suggestion with README

Good job. But I still have a couple of small suggestions:
In the part about important facts about U(S)ART, "IDLE line interrupt will notify application when it will detect for 1 character inactivity on RX line, meaning after 10 us": I think it should be 100 us.
In the part about the final configuration, "At 115200 baud, 100 bytes means 1 ms time": it should be 10 ms. Right?
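
As a rough check of the two figures questioned above, the arithmetic can be sketched in C. This assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per character) and a baud rate of 115200 for both statements; the function name is hypothetical and not part of the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop bit per character */
#define UART_BITS_PER_CHAR_8N1 10u

/* Time needed to transfer `nbytes` characters at `baud`, in microseconds,
 * rounded to the nearest microsecond. Hypothetical helper for illustration. */
uint32_t
uart_transfer_time_us(uint32_t baud, uint32_t nbytes) {
    uint64_t bits;

    assert(baud > 0);
    bits = (uint64_t)nbytes * UART_BITS_PER_CHAR_8N1;
    return (uint32_t)((bits * 1000000u + baud / 2u) / baud);
}
```

Under these assumptions, `uart_transfer_time_us(115200, 1)` gives 87 and `uart_transfer_time_us(115200, 100)` gives 8681, so one character takes roughly 87 us and 100 bytes roughly 8.7 ms. That is much closer to the suggested 100 us and 10 ms than to 10 us and 1 ms.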

help with RX

Hello,

First off thank you for these examples. I am trying to use the usart_rx_idle_line_irq_ringbuff_G0 example. But would like some guidance on checking what is coming in from the terminal. I want to send commands from the terminal and act on them but only act on them once. I have looked in the usart_rx_dma_buffer for my commands and see them, but is this the correct way of doing this? I want to check some buffer to see if the command is there, act on it, and then clear the buffer. Any help would be appreciated.

Thanks,
Mike
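
One possible way to act on each command exactly once is to avoid scanning usart_rx_dma_buffer directly and instead feed every newly received chunk into a small line accumulator. The sketch below assumes commands are text terminated by a newline. All names in it (cmd_parser_t, cmd_parser_feed, example_cmd_handler) are hypothetical and not part of the example project. It would be called from wherever the example hands over new RX data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_MAX_LEN 32u

/* Hypothetical application hook, called once per complete line */
typedef void (*cmd_handler_fn)(const char* cmd);

typedef struct {
    char buf[CMD_MAX_LEN];  /* Characters of the command being assembled */
    size_t len;             /* Number of characters stored so far */
    uint8_t overflow;       /* Set when the current line is too long */
} cmd_parser_t;

/* Feed newly received bytes; the handler runs when a newline completes a command */
void
cmd_parser_feed(cmd_parser_t* p, const void* data, size_t len, cmd_handler_fn handler) {
    const uint8_t* d = data;

    assert(p != NULL && handler != NULL);
    for (; len > 0; --len, ++d) {
        if (*d == '\r') {
            continue;                   /* Ignore CR so LF and CRLF endings both work */
        }
        if (*d == '\n') {
            if (!p->overflow && p->len > 0) {
                p->buf[p->len] = '\0';
                handler(p->buf);        /* Act on the command once */
            }
            p->len = 0;                 /* Start a fresh command */
            p->overflow = 0;
        } else if (p->len < CMD_MAX_LEN - 1u) {
            p->buf[p->len++] = (char)*d;
        } else {
            p->overflow = 1;            /* Over-long lines are dropped as a whole */
        }
    }
}

/* Example handler: remembers the last command and counts invocations */
char example_last_cmd[CMD_MAX_LEN];
unsigned example_cmd_count;

void
example_cmd_handler(const char* cmd) {
    strncpy(example_last_cmd, cmd, CMD_MAX_LEN - 1u);
    example_last_cmd[CMD_MAX_LEN - 1u] = '\0';
    ++example_cmd_count;
}
```

Because the accumulator keeps its own state, a command split across two DMA events (for example "LED " followed by "ON\r\n") still triggers the handler only once, and nothing needs to be cleared in the DMA buffer itself.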

problem in using multiple DMAs for RX

I love this repo. It has nice documentation as well. Thank you.

I found a problem when I try to work with 2 UARTs by following the instructions here.
The code uses RX DMA. It works fine when I test each UART port one by one.
But the problem happens when I use 2 ports at the same time:

  • I am sending data to each port every 3 seconds and the STM32 echoes it
  • The two ports don't need to be synchronised. I can see the problem even when they are not.

The problem I found is that "LL_DMA_GetDataLength()" quite often returns a wrong value in the case above.
I use different streams:

void DMA1_Stream1_IRQHandler(void)
{
  /* USER CODE BEGIN DMA1_Stream1_IRQn 0 */
    // USART3 RX
    if (LL_DMA_IsActiveFlag_HT1(DMA1))
    {
      LL_DMA_ClearFlag_HT1(DMA1);
      /* Call function Reception complete Callback */
      USART3_Rx_Callback();
    }
    else if (LL_DMA_IsActiveFlag_TC1(DMA1))
    {
      LL_DMA_ClearFlag_TC1(DMA1);
      /* Call function Reception complete Callback */
      USART3_Rx_Callback();
    }
    else if(LL_DMA_IsActiveFlag_TE1(DMA1))
    {
      /* Call Error function */
      USART3_Error_Callback();
    }
...
void DMA1_Stream6_IRQHandler(void)
{
  /* USER CODE BEGIN DMA1_Stream6_IRQn 0 */
	// UART8 RX
    if (LL_DMA_IsActiveFlag_HT6(DMA1))
    {
      LL_DMA_ClearFlag_HT6(DMA1);
      /* Call function Reception complete Callback */
      UART8_Rx_Callback();
    }
    else if (LL_DMA_IsActiveFlag_TC6(DMA1))
    {
      LL_DMA_ClearFlag_TC6(DMA1);
      /* Call function Reception complete Callback */
      UART8_Rx_Callback();
    }
    else if(LL_DMA_IsActiveFlag_TE6(DMA1))
    {
      /* Call Error function */
      UART8_Error_Callback();
    }
...
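
The post does not show USART3_Rx_Callback or UART8_Rx_Callback, so the cause cannot be determined from the handlers alone. One thing that may be worth checking when a single-port example is duplicated is whether each port keeps its own read position and its own DMA buffer. The sketch below is only an assumption about a possible cause, not a diagnosis. All names are hypothetical, and `remaining` stands for the value reported by the DMA for that port's stream (for instance the result of LL_DMA_GetDataLength).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sink for received bytes */
typedef void (*uart_rx_data_fn)(const uint8_t* data, size_t len);

typedef struct {
    const uint8_t* dma_buf;     /* Circular DMA RX buffer of this port */
    size_t dma_buf_len;         /* Length of that buffer in bytes */
    size_t old_pos;             /* Read position, private to this port */
    uart_rx_data_fn process;    /* Called with each linear block of new data */
} uart_rx_ctx_t;

/* Process data received since the last call for one port */
void
uart_rx_check(uart_rx_ctx_t* ctx, size_t remaining) {
    size_t pos;

    assert(remaining <= ctx->dma_buf_len);
    pos = ctx->dma_buf_len - remaining;     /* Current DMA write position */
    if (pos == ctx->dma_buf_len) {
        pos = 0;                            /* Treat end of buffer as wrap to start */
    }
    if (pos == ctx->old_pos) {
        return;                             /* Nothing new */
    }
    if (pos > ctx->old_pos) {
        ctx->process(&ctx->dma_buf[ctx->old_pos], pos - ctx->old_pos);
    } else {
        /* Wrapped: first the tail of the buffer, then the beginning */
        ctx->process(&ctx->dma_buf[ctx->old_pos], ctx->dma_buf_len - ctx->old_pos);
        if (pos > 0) {
            ctx->process(&ctx->dma_buf[0], pos);
        }
    }
    ctx->old_pos = pos;
}

/* Example sink: appends everything it receives to a capture buffer */
uint8_t example_sink_buf[64];
size_t example_sink_len;

void
example_sink(const uint8_t* data, size_t len) {
    size_t room = sizeof(example_sink_buf) - example_sink_len;

    if (len > room) {
        len = room;
    }
    memcpy(&example_sink_buf[example_sink_len], data, len);
    example_sink_len += len;
}
```

With one `uart_rx_ctx_t` per port, interleaved events from the two streams cannot disturb each other's position tracking. If the two callbacks in the real project already keep separate state, the cause lies elsewhere.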

Error cloning the repository

Not sure if it is important, but I got the following error when cloning the repo (log at the end of this post).
Also, you mention:

Examples for RX & TX
Available in projects folder with usart_tx_ prefix
DMA is used to receive data and to transmit data back
It uses ring buffer to copy data from DMA buffer to application buffer first

However, there are no projects with the usart_tx_ prefix in the repo.

C:\Temp>git clone --recurse-submodules https://github.com/MaJerle/stm32-usart-dma-rx-tx
Cloning into 'stm32-usart-dma-rx-tx'...
remote: Enumerating objects: 539, done.
remote: Counting objects: 100% (539/539), done.
remote: Compressing objects: 100% (363/363), done.
remote: Total 1713 (delta 256), reused 405 (delta 160), pack-reused 1174
Receiving objects: 100% (1713/1713), 2.93 MiB | 5.48 MiB/s, done.
Resolving deltas: 100% (919/919), done.
Submodule 'middlewares/ringbuff' (https://github.com/MaJerle/ringbuff) registered for path 'middlewares/ringbuff'
Cloning into 'C:/Temp/stm32-usart-dma-rx-tx/middlewares/ringbuff'...
remote: Enumerating objects: 362, done.
remote: Counting objects: 100% (362/362), done.
remote: Compressing objects: 100% (192/192), done.
remote: Total 883 (delta 164), reused 276 (delta 91), pack-reused 521
Receiving objects: 100% (883/883), 233.31 KiB | 1.08 MiB/s, done.
Resolving deltas: 100% (393/393), done.
Submodule path 'middlewares/ringbuff': checked out '247155c50a36cd09cec473368acd76527039c2ca'
Submodule 'third_party/Embedded_Libs' (https://github.com/MaJerle/Embedded_Libs) registered for path 'middlewares/ringbuff/third_party/Embedded_Libs'
fatal: No url found for submodule path 'middlewares/ringbuff/third_party/embedded-libs' in .gitmodules
Failed to recurse into submodule path 'middlewares/ringbuff'

C:\Temp>

USART3 Not Found on F411 Nucleo Board

Hi MaJerle,

I want to study your code, but I only have F411 Nucleo board.
I found that your code for F4 uses USART3 with PD8 and PD9 as the Tx/Rx pins. They are not available on my board. Still, I want to try your code. Any suggestions?

F411 USART Low Level Driver Not Work Well

Hi MaJerle,

Sorry to trouble you.
I have made my own PCB board with F411 MCU; and I made my USART code by following your example.
My issue is: the USART2 does not work well if I use LL driver.
This is the init code generated by CubeIDE.
`
static void MX_USART2_UART_Init(void)
{

/* USER CODE BEGIN USART2_Init 0 */

/* USER CODE END USART2_Init 0 */

LL_USART_InitTypeDef USART_InitStruct = {0};

LL_GPIO_InitTypeDef GPIO_InitStruct = {0};

/* Peripheral clock enable */
LL_APB1_GRP1_EnableClock(LL_APB1_GRP1_PERIPH_USART2);

LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_GPIOA);
/**USART2 GPIO Configuration
PA2 ------> USART2_TX
PA3 ------> USART2_RX
*/
GPIO_InitStruct.Pin = LL_GPIO_PIN_2|LL_GPIO_PIN_3;
GPIO_InitStruct.Mode = LL_GPIO_MODE_ALTERNATE;
GPIO_InitStruct.Speed = LL_GPIO_SPEED_FREQ_VERY_HIGH;
GPIO_InitStruct.OutputType = LL_GPIO_OUTPUT_PUSHPULL;
GPIO_InitStruct.Pull = LL_GPIO_PULL_NO;
GPIO_InitStruct.Alternate = LL_GPIO_AF_7;
LL_GPIO_Init(GPIOA, &GPIO_InitStruct);

/* USER CODE BEGIN USART2_Init 1 */

/* USER CODE END USART2_Init 1 */
USART_InitStruct.BaudRate = 115200;
//USART_InitStruct.BaudRate = 1000000;
USART_InitStruct.DataWidth = LL_USART_DATAWIDTH_8B;
USART_InitStruct.StopBits = LL_USART_STOPBITS_1;
USART_InitStruct.Parity = LL_USART_PARITY_NONE;
USART_InitStruct.TransferDirection = LL_USART_DIRECTION_TX_RX;
USART_InitStruct.HardwareFlowControl = LL_USART_HWCONTROL_NONE;
USART_InitStruct.OverSampling = LL_USART_OVERSAMPLING_16;
LL_USART_Init(USART2, &USART_InitStruct);
LL_USART_ConfigAsyncMode(USART2);
LL_USART_Enable(USART2);
/* USER CODE BEGIN USART2_Init 2 */

/* USER CODE END USART2_Init 2 */
}
`

And this is the function for USART2 to transmit:
`
void Start_USART2_Tx(void* data, uint16_t len) {
const uint8_t* d = data;

for (; len > 0; --len, ++d) {
		LL_USART_TransmitData8(USART2, *d);
		while (!LL_USART_IsActiveFlag_TXE(USART2)) {}
}
while (!LL_USART_IsActiveFlag_TC(USART2)) {}

}
`

The above code only works when the baud rate is <= 115200; when the baud rate is higher, it hangs at the following line:
while (!LL_USART_IsActiveFlag_TXE(USART2)) {}

However, if I use the HAL library, my PCB works even when the baud rate reaches 921600.
`
static void MX_USART2_UART_Init(void)
{

/* USER CODE BEGIN USART2_Init 0 */

/* USER CODE END USART2_Init 0 */

/* USER CODE BEGIN USART2_Init 1 */

/* USER CODE END USART2_Init 1 */
huart2.Instance = USART2;
huart2.Init.BaudRate = 921600;
huart2.Init.WordLength = UART_WORDLENGTH_8B;
huart2.Init.StopBits = UART_STOPBITS_1;
huart2.Init.Parity = UART_PARITY_NONE;
huart2.Init.Mode = UART_MODE_TX_RX;
huart2.Init.HwFlowCtl = UART_HWCONTROL_NONE;
huart2.Init.OverSampling = UART_OVERSAMPLING_16;
if (HAL_UART_Init(&huart2) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN USART2_Init 2 */

/* USER CODE END USART2_Init 2 */
}
`

The following code works fine without any hanging.
`
while (1)
{
HAL_UART_Transmit(&huart2, (uint8_t *)"LED ON\r\n", 8, 0xFFFF);
HAL_GPIO_WritePin(GPIOB, GPIO_PIN_10, GPIO_PIN_RESET);
HAL_Delay(1000);
HAL_UART_Transmit(&huart2, (uint8_t *)"LED OFF\r\n", 9, 0xFFFF);
HAL_GPIO_WritePin(GPIOB, GPIO_PIN_10, GPIO_PIN_SET);
HAL_Delay(1000);
/* USER CODE END WHILE */

/* USER CODE BEGIN 3 */

}
`

I have checked HAL_UART_Transmit; it basically also checks the USART TXE flag, apart from some lines of code that handle the timeout situation.
Do you have any idea what the reason could be?

Opening usart_rx_polling_F4

Sorry for asking this newbie question.
I started using STM32CubeIDE / STM32CubeMX a while ago, and I would like to run your code since it is the only one with an explanation of running UART with interrupts on an RTOS.
But I have no clue where to start: when I open the .project file, the IDE asks me to provide an .ioc file.
I noticed that you're using Visual Studio instead of STM32CubeIDE, and I tried to follow https://github.com/MaJerle/stm32-cube-cmake-vscode but got stuck there as well. I've installed Visual Studio Code and another instance of STM32CubeIDE in a separate folder (C:\ST\STM32CubeIDE_1.10.1_VS), but whatever I click, nothing opens as expected.
Can you please give me some pointers?

Good Idea to add a int16_t uart_get_byte(void)

Hi,

Many thanks for the great samples. Do you have a good idea for how to add a uart_get_byte? Do you think it is a good idea to add a ring buffer, fill it in the usart_rx_check function, and then provide a general usart_get_byte() on top of it?

thx
mathias
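
The idea described above can be sketched as a small single-producer, single-consumer ring buffer. This is a minimal illustration with hypothetical names (uart_rx_rb_write, UART_RX_RB_SIZE); it is not the API of the ring buffer library that the repository pulls in as a submodule. It assumes one writer (the RX processing path) and one reader (the application), and returns -1 from uart_get_byte when no data is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_RX_RB_SIZE 64u     /* Usable capacity is one byte less than this */

typedef struct {
    uint8_t data[UART_RX_RB_SIZE];
    volatile size_t head;       /* Next write index, changed by the producer only */
    volatile size_t tail;       /* Next read index, changed by the consumer only */
} uart_rx_rb_t;

static uart_rx_rb_t uart_rx_rb;

/* Producer side: store received bytes, return how many were actually stored */
size_t
uart_rx_rb_write(const void* data, size_t len) {
    const uint8_t* d = data;
    size_t written = 0;

    assert(data != NULL || len == 0);
    while (written < len) {
        size_t next = (uart_rx_rb.head + 1u) % UART_RX_RB_SIZE;

        if (next == uart_rx_rb.tail) {
            break;              /* Buffer full, remaining bytes are dropped */
        }
        uart_rx_rb.data[uart_rx_rb.head] = d[written++];
        uart_rx_rb.head = next;
    }
    return written;
}

/* Consumer side: return next byte (0..255) or -1 when the buffer is empty */
int16_t
uart_get_byte(void) {
    uint8_t b;

    if (uart_rx_rb.tail == uart_rx_rb.head) {
        return -1;
    }
    b = uart_rx_rb.data[uart_rx_rb.tail];
    uart_rx_rb.tail = (uart_rx_rb.tail + 1u) % UART_RX_RB_SIZE;
    return (int16_t)b;
}
```

The int16_t return type from the issue title fits this pattern because it leaves room for a negative "no data" value next to the 0 to 255 byte range. Whether plain volatile indices are sufficient depends on the target and on which contexts call each side, so that part should be treated as an assumption.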
