
Zynq support (corundum, 29 comments, closed)

corundum commented on July 24, 2024
Zynq support

from corundum.

Comments (29)

alexforencich commented on July 24, 2024

To be honest, the main issue I have at the moment is the software side. I don't know much about platform drivers and dealing with the device tree. The main things that I have done so far are to put together an AXI DMA interface module (which so far has not been committed to the repo or tested in HW, so the timing performance is unknown) and segmenting out all of the PCIe-specific stuff in the driver so that platform device support can be added. The other component that I need to sort out is tinkering in the simulation framework so that the driver model can be set up to use the AXI interface models. If that's something you would be interested in working on, then I would be happy to collaborate on that.

ollie-etl commented on July 24, 2024

I'm still getting to grips with the codebase, but I'll dig through the driver code to see how involved I think this'd be. I'll report back.

alexforencich commented on July 24, 2024

As near as I can tell, the main thing to do in the driver code is to add platform driver probe and remove methods to mqnic_main.c. Everything else should be independent of platform device vs. PCIe device, I think. Last time I grepped through everything, I made sure there were no references to pci_ methods or to the PCI device structure outside of mqnic_main.c.
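For reference, a platform-driver skeleton along those lines could look roughly like the following. This is a sketch only: the function names, the compatible string, and the resource handling are assumptions for illustration, not the actual mqnic_main.c code.

```c
/* Illustrative sketch of platform-driver probe/remove entry points;
 * all names here (mqnic_platform_probe, "corundum,mqnic", ...) are
 * assumptions, not the actual mqnic_main.c symbols. */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/of.h>

static int mqnic_platform_probe(struct platform_device *pdev)
{
	/* Control registers come from the device tree "reg" property
	 * instead of a PCIe BAR. */
	void __iomem *regs = devm_platform_ioremap_resource(pdev, 0);

	if (IS_ERR(regs))
		return PTR_ERR(regs);

	/* Interrupts come from the DT "interrupts" property instead
	 * of MSI, e.g.: int irq = platform_get_irq(pdev, 0); */

	/* ...bus-independent common init shared with the PCIe path... */
	return 0;
}

static int mqnic_platform_remove(struct platform_device *pdev)
{
	/* ...bus-independent common teardown... */
	return 0;
}

static const struct of_device_id mqnic_of_match[] = {
	{ .compatible = "corundum,mqnic" },  /* assumed compatible string */
	{ /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, mqnic_of_match);

static struct platform_driver mqnic_platform_driver = {
	.probe	= mqnic_platform_probe,
	.remove	= mqnic_platform_remove,
	.driver	= {
		.name = "mqnic",
		.of_match_table = mqnic_of_match,
	},
};
module_platform_driver(mqnic_platform_driver);
MODULE_LICENSE("GPL");
```

The probe/remove pair is matched against the device tree via the compatible string, which is also what a device tree fragment for the design would have to declare.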

Actually, I suppose the other aspect is figuring out all of the zynq tool automation and what not. I do my best to avoid the IPI flow, but zynq requires that at least on some level. IIRC my plan was to use TCL to generate the Zynq block in the IPI flow, export that to verilog, and then do the rest in verilog like the other designs. But perhaps there is a way to package up a zynq version of corundum that you can just pop in a block diagram for use on zynq devices, for people that avoid writing HDL. Like all of the other designs in the repo, the whole shebang has to be automated from makefiles and minimal TCL scripts - I want to be able to type make and get a bit file I can put on the device, regardless of what version of Vivado is installed (assuming it's something relatively recent). And the petalinux build would need to be automated in a similar way.

ollie-etl commented on July 24, 2024

I tend to wrap up verilog and pass it to vivado as a block design. I use something like:

    create_project <blah>
    read_verilog <srcs>
    ...
    update_compile_order -fileset sources_1
    ipx::package_project \
        -import_files \
        -library user \
        -taxonomy /UserIP \
        -root_dir ./
    ipx::current_core component.xml
    update_compile_order -fileset sources_1
    ipx::create_xgui_files [ipx::current_core]
    ipx::update_checksums [ipx::current_core]
    ipx::check_integrity [ipx::current_core]
    ipx::save_core [ipx::current_core]
    close_project -delete

All of our builds are automated with nix, including tcl fragment generation, but you should be able to achieve the same thing in make.

alexforencich commented on July 24, 2024

It's entirely possible that a different automation technique would make sense for zynq devices. I'm certainly open to ideas on that front.

What zynq device(s) are you targeting? Are you running on custom boards or Xilinx dev boards? Do you have a ZCU102 or ZCU106?

ollie-etl commented on July 24, 2024

You'd need to automate the device tree fragment generation in make, but that should be OK. I think the actual build using the block design / PetaLinux flow is probably outside the scope of what you're trying to achieve.

Providing the packaged IP and device tree fragment would be a good outcome. It's extremely likely that end users will want other IP in addition.

I'm targeting both the ZCU111 and custom hardware based on the ZCU111.

ollie-etl commented on July 24, 2024

I'm guessing it's possible to associate device tree fragments with the ipx packaging tcl commands, but I've never worked out how to do that. Xilinx IP seems to do so, though - the device tree generation is parameterized based on your IP configuration.

alexforencich commented on July 24, 2024

Well, I like to provide a complete design that works out of the box, which a user can grab, load on their board, and hit the ground running with. So for zynq, presumably that would include building both the bit file and PetaLinux. It's one thing to get it working for you, but it's also necessary to put together at least one reference design.

I have access to a ZCU106 and ZCU102, I'm not sure if Xilinx will send me a ZCU111, but I suppose I can ask. They have provided quite a bit of hardware to support the development of corundum; I may be able to ask about getting you a ZCU106 or ZCU102 so we have a platform in common.

ollie-etl commented on July 24, 2024

A design on a ZCU106 or ZCU102 is sufficiently close. The MPSoC, MAC, and transceivers are the same, I believe.

ollie-etl commented on July 24, 2024

https://github.com/Xilinx/device-tree-xlnx is the repo where all the device tree generation scripts for Xilinx IP live, by the way, if generating them in Vivado is of interest.

joft-mle commented on July 24, 2024

As already very briefly mentioned in a past dev meeting, this message is to inform the group about progress regarding support for interfacing MQNIC with the ZynqMP Processing System (PS) via AXI4. Most importantly this requires a new DMA interface module, representing an AXI4 Master instead of interfacing to a PCIe IP Core to perform the host memory accesses. @alexforencich already has committed a first version of such a new DMA interface module to his github.com/alexforencich/corundum repository.

Based on that repository and specifically this new DMA interface module we (@andreasbraun90 and @joft-mle) at @missinglinkelectronics have been working on

  1. a new FPGA design flavor for the ZCU106 board which interfaces the MQNIC to the ZynqMP PS,
  2. corresponding Linux platform driver support and
  3. a PetaLinux project for running the Linux MQNIC driver and support utilities on the ZynqMP PS.

The current work-in-progress state can be found in our missinglinkelectronics/corundum, branch zcu106-zynqmp-axi.

The currently open issues are:

  1. show-stopper: The DMA interface module or the MQNIC itself (to be determined) "freezes" as soon as a certain amount of combined TX plus RX traffic is exceeded. Freezing means that the MQNIC no longer issues any AXI4 Master accesses: TX packets are no longer taken out of the TX queues and RX packets no longer make it into the RX queues (according to the mqnic-dump tool). This situation can usually be triggered easily by "pinging fast enough" (ping -i 0.00001) or, for example, by "flood-pinging" from both peers. OTOH, pure, sustained, >1Gb/s unidirectional traffic (iperf -u) has not been seen to trigger it.
  2. nice-to-have: A device tree generator plugin would be nice to have, to reduce or completely remove the currently rather hard-coded device tree entry for the MQNIC.
  3. nice-to-have: Resolve the GT reference clock being turned "off, then on again" via the existing device tree entry for the SI570 chip in a better way than simply switching the corresponding device tree entry off completely.
  4. nice-to-have: Include a properly working irqbalance in PetaLinux which actually "balances" the MQNIC IRQs among the CPU cores (the default one, which is currently not included, is not doing that!).

joft-mle commented on July 24, 2024

Hi @alexforencich,

thanks for the notification mail regarding the new AXI simulation code having been committed :-).

Also, your comment about the ZynqMP only supporting an AXI burst length of 16 directed our attention to what Vivado reports for the maximum burst length when querying the associated BD top-level AXI interface port: 256. Interestingly, querying the max. burst length of the AXI interface pin of the ZynqMP "IP Core" block itself also results in 256:
    set s_axi_mm [get_bd_intf_pins -of_objects [get_bd_cells zynq_ultra_ps] \
        -filter {NAME =~ "*S_AXI_HPC0_FPD*"}]
    # returns: /zynq_ultra_ps/S_AXI_HPC0_FPD
    get_property CONFIG.MAX_BURST_LENGTH $s_axi_mm
    # returns: 256
So Vivado does not seem to properly propagate ZynqMP's internal limitation, neither to its interface pin nor up to a top-level port.

That said, we tried manually setting the maximum burst length to the correct value of 16 (commit in our missinglinkelectronics/corundum repo). Result: unfortunately, no change in behavior. Too much bi-directional traffic causes the MQNIC to freeze.

Today, we also did a quick merge (not yet pushed to Github) of your updated alexforencich/corundum repo into our zcu106-zynqmp-axi branch and checked whether any of your other changes might have fixed the issue. However, it came as no surprise that the behavior remains the same, since you already mentioned via mail that only one bug regarding 4096-byte transfers came to your attention during testbench development (I think this one?).

I guess, as a next step, we have to see whether we can cause the same "freeze" issue in the now existing testbench environment.

alexforencich commented on July 24, 2024

Hmm, that's interesting that the block diagram indicates that the max supported burst length is 256; this seems to conflict with chapter 35 of the TRM (UG1085).

So, I suppose now we have to sort out what the cause of that hang-up is - is something getting hung up on the PS side due to something the PL side is doing, is something getting hung up on the PL side due to something the PS side is doing, or is the bug fully on the PL side? Possibly what we should do is drop ILAs on both the AXI-lite interface and the AXI interface, and think about ways of generating a trigger when the system is hung.

joft-mle commented on July 24, 2024

Hi all, @alexforencich,

just a minor update: in the meantime we have rebased our zcu106-zynqmp-axi branch onto the current corundum master branch. We are currently still busy with other topics, but we will definitely resume our porting/debugging soon.

sessl3r commented on July 24, 2024

Hi all, @alexforencich,

we restarted looking into this issue and found it. To summarize: dma_if_axi_rd does not support the out-of-order, interleaved read responses that the PS uses, as seen in hardware.

Some more details:
The test we use to trigger this issue very quickly is flood-pinging in both directions. In most cases this takes only a few seconds until the issue occurs.

In Chipscope we then see the following (I reduced it to what is needed to understand the issue; note that the middle part of the second image is on another timebase!):

20220129_mpsoc_freeze_outoforder_read_response_overview
20220129_mpsoc_freeze_outoforder_read_response

In the second picture (! different zoom levels, copied together from the first one !) two reads are issued: the first with ID=0 and a LEN of 3 (4 beats); the second with ID=1 and a LEN of 0 (1 beat).
The read responses from the MPSoC PS are interleaved (middle part of the image), but all data is correct and it is valid AXI-MM handling.
However, only one of the two op_table entries gets cleared, and from this point on the op_table is stuck, as the expected and received IDs no longer match.

The fill-up you can see in the overview afterwards is exactly 16 requests - it seems some counters are trying to realign here.

alexforencich commented on July 24, 2024

Hmm, it appears that I have misread the AXI spec; I was under the impression that bursts could not be split up and interleaved like that. So, no wonder it doesn't work. Back to the 'ole drawing board... It also looks like I should either properly implement or remove USE_AXI_ID. And I will need to take a look at my AXI library components as well, as a few of those (particularly the crossbar interconnect) will also have issues with this.

alexforencich commented on July 24, 2024

I think what I will do is implement USE_AXI_ID first and set the default to 0 on the read side, then when I fix this interleaving problem I'll flip the default back to 1. I think that will be a reasonable interim fix for this.

sessl3r commented on July 24, 2024

For me it was also news that the MPSoC really uses this interleaving feature. But yes, the AXI AMBA spec is very clear about that point; in [1], section A5.3.1, it states:

Data from read transactions with different ARID values can arrive in any order. Read
data of transactions with different ARID values can be interleaved.

But yes for now I also think just setting the number of outstanding reads to 0 is the best option.

[1] http://www.gstitt.ece.ufl.edu/courses/fall15/eel4720_5721/labs/refs/AXI4_specification.pdf

alexforencich commented on July 24, 2024

Sure, I read that before, but I thought it applied to bursts and not to individual transfer cycles. As they say, a figure is worth a thousand words, and I don't think they bothered to include a figure for that, at least not in that section. Anyway, the solution for now is not to set the number of outstanding reads to 0, but to set all of the ARID values to 0 so that no reordering or interleaving takes place. Setting ARID to 0 is the easy part; the slightly more complex part is updating the book-keeping logic to update the correct table entries.

alexforencich commented on July 24, 2024

Alright, I have the DMA interface modules updated in the repo here: https://github.com/alexforencich/verilog-pcie . Please check and see if that fixes the hang; if so, I will go ahead and pull those changes into the main Corundum repo.

sessl3r commented on July 24, 2024

I tested your changes, but unfortunately they do not change things for me so far. I expect I did something wrong, though, because m_axi_arid is counting upwards. Yes, I changed the instantiation of dma_if_axi_rd to a hardcoded USE_AXI_ID(0). Need to recheck this tomorrow.
Besides that, I did some more experiments, and it seems a possible solution to get rid of those interleaved transfers is to just connect the DMA channel to HP0 instead of HPC0 on the MPSoC. At least several minutes of TCP/UDP iperf showed good results without crashing. Will run a longer test tonight.

alexforencich commented on July 24, 2024

You'll probably need to adjust USE_AXI_ID in mqnic_core_axi as well.

sessl3r commented on July 24, 2024

OK, no idea what went wrong yesterday, but now it's working as expected with your version as well. It's only issuing arid==0, and iperf TCP/UDP tests are working.

alexforencich commented on July 24, 2024

Excellent! I'll go ahead and get that merged in, then.

Basseuph commented on July 24, 2024

We've just pushed the changes to a new branch on our side, see https://github.com/missinglinkelectronics/corundum/tree/mle/zcu106-zynqmp-axi, rebased onto corundum/master. We also did some successful tests on the ZCU106.

In addition to the previous version, now

  1. the local MAC used for the PS interface is read from the EEPROM and then modified to derive the MQNIC MACs, and
  2. removing the driver from the kernel via rmmod does work now.

joft-mle commented on July 24, 2024

We've just updated our mle/zcu106-zynqmp-axi branch - rebased onto the latest corundum master branch - as discussed in yesterday's Corundum Developer Meeting. As reported, we think this branch is ready to be discussed in a pull request.

alexforencich commented on July 24, 2024

The ZCU106 design along with the driver modifications have now been pulled into this repo. See https://github.com/corundum/corundum/tree/master/fpga/mqnic/ZCU106/fpga_zynqmp for the ZCU106 example design.

Thank you very much to the folks at Missing Link Electronics for making this happen!

joft-mle commented on July 24, 2024

I think this issue can be closed, now that #98 has been merged?

alexforencich commented on July 24, 2024

Yep, at this point I think we can close this.

