Giter Site home page Giter Site logo

Comments (12)

P33M avatar P33M commented on May 27, 2024

Ah, commands are concatenated together in packets, but only a single packet is being submitted at a time. So we're toggling between cmsis_dap_swd_write_from_queue() and cmsis_dap_swd_read_process() at every packet boundary for some reason.

from debugprobe.

tom-van avatar tom-van commented on May 27, 2024

It's not a right place to report an OpenOCD bug.
But yes, you're right - multiple pending requests handling works on HID interface only.
We should use async libusb_submit_transfer() instead of synchronous libusb_bulk_transfer()

from debugprobe.

tom-van avatar tom-van commented on May 27, 2024

See the new series of 8 patches up to
7367: jtag/drivers/cmsis_dap: debug tweaks | https://review.openocd.org/c/openocd/+/7367

The main one is
7365: jtag/drivers/cmsis_dap_bulk: use asynchronous libusb transfer | https://review.openocd.org/c/openocd/+/7365

and fully implements the asynchronous handling of USB bulk transfer in CMSIS-DAP driver.

The good news is that my experimental CMSIS-DAPv2 adapter (USB FS, single Cortex-M4 core) runs twice faster
than before (never seen such figures on USB FS, better than CMSIS-DAPv1 on USB HS!):

adapter speed: 20000 kHz
halt
poll off

> load_image /run/user/1000/ram256k.bin 0x20000000
262144 bytes written at address 0x20000000
downloaded 262144 bytes in 0.490739s (521.662 KiB/s)

> dump_image /dev/null 0x20000000 0x40000
dumped 262144 bytes in 0.813695s (314.614 KiB/s)

The bad news is that picoprobe using a comparable hw is more than 2 times slower

> load_image /run/user/1000/ram256k.bin 0x20000000
262144 bytes written at address 0x20000000
downloaded 262144 bytes in 1.308464s (195.649 KiB/s)

and even worse that picoprobe reading from the target memory loses a packet time to time (a not guarded critical section?)

> dump_image /dev/null 0x20000000 0x40000
CMSIS-DAP transfer count mismatch: expected 15, got 16
SWD DPIDR 0x0bc12477, DLPIDR 0x00000001
Failed to read memory at 0x20003434

BTW the target was RP2040

from debugprobe.

P33M avatar P33M commented on May 27, 2024

Probably a rich seam of bugs here

  • How is tinyusb delivering endpoint data - one packet at a time, or only when a short packet is received (implying end of transfer)?
  • Does the DAP command parsing account for the fact that commands may span packets, and if it doesn't then does it respond with the number of bytes processed from the command stream?
  • Does my choice of timer interrupt (40us) not work so well when you add host latencies and endpoint parking if the host receives a NAK?

from debugprobe.

P33M avatar P33M commented on May 27, 2024

Obvious bug number 1: only one command is processed per timer interrupt, hence the large divergence in throughput as the adapter khz goes up.
Obvious bug number 2: the uint32 returned by DAP_ExecuteCommand has bit 16 set, so I'm passing in 64k+response_length to tud_vendor_write() - which for some unexplainable reason seems to still work.
Probably a bug: neither the picoprobe nor cmsis firmware tracks how far along in a received buffer the command processing has gotten. So in the case where you have a command spanning a packet (I can't find any documentation to the contrary that says this can't happen) you'd point the command decoder at the first byte in the second packet which could be anything.

from debugprobe.

tom-van avatar tom-van commented on May 27, 2024

Probably a bug: neither the picoprobe nor cmsis firmware tracks how far along in a received buffer the command processing has gotten. So in the case where you have a command spanning a packet (I can't find any documentation to the contrary that says this can't happen) you'd point the command decoder at the first byte in the second packet which could be anything.

AFAIK CMSIS-DAP is solely packet oriented. The version 1 was transmitted over HID reports so no fragmentation/reassembly could happen. I'm not aware of any change of this principle in CMSIS-DAP v2 or v2.1
OpenOCD sends command packets shorter by at least one byte (on USB bulk) than 'Packet Size' queried from the adapter by DAP_Info Command. OpenOCD ensures the response is no longer than 'Packet Size'-1 by computing the expected response size.
So far so good.

I had a short look to the picoprobe source. The tusb vendor class support employs character based FIFO so it's not suitable for a packet oriented protocol. Seems me that picoprobe should define an application tusb driver for vendor class instead and avoid concatenating packet data in FIFOs.

from debugprobe.

tom-van avatar tom-van commented on May 27, 2024

So in the case where you have a command spanning a packet (I can't find any documentation to the contrary that says this can't happen) ...

https://arm-software.github.io/CMSIS_5/DAP/html/group__DAP__Commands__gr.html

reads

Command and Response data have a package size limitation that is defined with DAP_PACKET_SIZE. This configuration setting can be obtained with the command DAP_Info and is used to optimize the performance for Full-Speed or High-Speed USB. The debugger must ensure that each data package fits within the limitations of the configured DAP_PACKET_SIZE.

from debugprobe.

P33M avatar P33M commented on May 27, 2024

I implemented command batching but I think one problem is that there's a bug in CMSIS, at least the version I'm using. If I dump the reported request lengths that DAP_ProcessCommand used, I get:

133330 Got packet 64
DAP_ProcessCommand DAP_Info
133335 Packet decode remaining 62 req 2 resp 2
DAP_ProcessCommand DAP_Info
133341 Packet decode remaining 60 req 2 resp 2
DAP_ProcessCommand
133347 Packet decode remaining 59 req 1 resp 1
DAP_ProcessCommand
133353 Packet decode remaining 58 req 1 resp 1
DAP_ProcessCommand DAP_Transfer
133360 Packet decode remaining 55 req 3 resp 3
DAP_ProcessCommand DAP_Info
133367 Packet decode remaining 53 req 2 resp 2
DAP_ProcessCommand DAP_Disconnect
133374 Packet decode remaining 52 req 1 resp 2
DAP_ProcessCommand
133380 Packet decode remaining 51 req 1 resp 1
DAP_ProcessCommand
133386 Packet decode remaining 50 req 1 resp 1
DAP_ProcessCommand DAP_Transfer
133393 Packet decode remaining 47 req 3 resp 3
DAP_ProcessCommand DAP_Info
133400 Packet decode remaining 45 req 2 resp 2
DAP_ProcessCommand DAP_Disconnect
133407 Packet decode remaining 44 req 1 resp 2
DAP_ProcessCommand
133413 Packet decode remaining 43 req 1 resp 1
DAP_ProcessCommand
133419 Packet decode remaining 42 req 1 resp 1
DAP_ProcessCommand DAP_Transfer
133426 Packet decode remaining 39 req 3 resp 3
DAP_ProcessCommand DAP_Info
133433 Packet decode remaining 37 req 2 resp 2
DAP_ProcessCommand DAP_Transfer
133440 Packet decode remaining 4294967290 req 43 resp 3

Timestamps are in milliseconds, as I get openocd to do sleep(1) whenever it gets an unexpected result, so I can correlate probe vs openocd logs. I had 37 bytes left in the packet but DAP_Transfer claimed it consumed 43 bytes (?!)

from debugprobe.

tom-van avatar tom-van commented on May 27, 2024

There is no point in batching of DAP commands from the performance point of view.
After the adapter initialization (and before quit) no other commands than DAP_Transfer are issued.
And DAP_Transfer is always sized to fill the packet if possible.
A batching might allow you to use character based FIFOs, but the adapter will never run as fast a packet based solution.

from debugprobe.

P33M avatar P33M commented on May 27, 2024

The bug was due to not completely draining tinyUSB's fifo - reading only 64B when there were two packets in RX meant you sometimes parsed half of a DAP_Transfer block and it all went wrong. Sizing the read buffer to be DAP_PACKET_SIZE * DAP_PACKET_COUNT fixes that, but this improved performance somehow breaks the Pi 4's integrated USB2.0 hub.

My suspicion is that the hub tries to start a transfer very close to EOF2, causing babble. Device recovery and subsequent transfers are ignored by picoprobe, so there's an error condition that isn't caught somewhere.

from debugprobe.

P33M avatar P33M commented on May 27, 2024

See #47 - suitable for testing on something that isn't a Pi 4 (3B+ works fine here).

from debugprobe.

P33M avatar P33M commented on May 27, 2024

Fixed in 58fa7a1 which now respects DAP semantics for supported number of outstanding packets.

from debugprobe.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.