Comments (13)
I have been using 2019.1. I will probably move to 2019.2 when the department gets a new Vivado license. There should be no significant difference between Vivado versions aside from having to upgrade/downgrade/regenerate XCI files for IP. If you're on Windows, the makefile presumably won't work, so you may have to create a project and add all the files manually - that shouldn't really be a major problem. I think the simulation should work under WSL.
I have not used Xilinx QDMA so I don't have any performance data to compare. DPDK drivers are planned for corundum likely sometime this year; the current driver is for the standard linux networking stack.
from corundum.
Thanks a lot for your reply. I still have some questions about corundum.
First, what is the full name of TDMA? Does the 'T' mean time-aware or tightly-integrated?
Second, what is the difference between interfaces and ports as referred to in the README? In the vcu1525 project, I noticed that it implements two interfaces, with one port each, connected to the two CMACs.
Last, does the vcu1525 project do any packet mapping between the CMACs and the DMA? When I tried to do the same on the U50 with Xilinx's CMAC and QDMA, I had to adjust packet sizes between the CMAC and the QDMA: the maximum Ethernet packet size is 1514 bytes, while the QDMA performs best with 4 KB packets, so some intermediate processing is always needed between the CMACs and the QDMA.
Thanks again.
TDMA = time division multiple access. The TDMA design variants include a scheduler control module that can enable/disable queues based on PTP time. These modules need a bit of work, so the supported queue count is currently a bit limited.
The interface contains the queues, and each interface is registered separately with the OS ("eth0", etc.). The port contains the datapath components for a single ethernet MAC. Each port has its own transmit scheduler. This split means multiple ports can be attached to one interface, sharing the same queues. Since each port has its own transmit scheduler, flows can be migrated from port to port or even striped across ports under hardware control, transparent to the operating system. The VCU1525 really only supports 3 possible configurations in 100G mode as it only has two QSFP28 interfaces: 1 interface, 1 port per interface (only one QSFP port active); 2 interfaces, 1 port per interface; 1 interface, 2 ports per interface.
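The three valid 100G configurations follow directly from the two MAC channels; a minimal sketch of the enumeration (PORTS_PER_IF is an assumed parameter name here, by analogy with the IF_COUNT parameter that appears elsewhere in this thread):

```python
# Enumerate interface/port configurations for a board with two physical
# MAC channels (e.g. the two QSFP28 cages on the VCU1525 in 100G mode).
TOTAL_MACS = 2

configs = [
    (if_count, ports_per_if)
    for if_count in range(1, TOTAL_MACS + 1)
    for ports_per_if in range(1, TOTAL_MACS + 1)
    if if_count * ports_per_if <= TOTAL_MACS
]

for if_count, ports_per_if in configs:
    print(f"IF_COUNT={if_count}, PORTS_PER_IF={ports_per_if} "
          f"-> {if_count * ports_per_if} MAC(s) in use")
# -> (1, 1), (1, 2), and (2, 1): the three configurations described above
```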
Not sure what you mean by packet mapping or packet size adjusting.
By packet mapping I mean, for example: a DMA operation transfers 4 KB of data, but the maximum Ethernet frame length is 1514 bytes, so the data burst from the DMA needs to be split into 3 or 4 Ethernet frames.
In the other direction, for a continuous stream of Ethernet frames coming from the CMAC, if each frame is very short (for example only 64 bytes), will the DMA transfer these short frames one by one, each as a full DMA operation, or will it coalesce them into one large buffer (for example 4 KB) and move them in a single DMA operation? Because of the per-operation overhead of DMA, my understanding is that DMA bandwidth benefits from larger transfer sizes.
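The transmit-side split being described can be sketched as follows (an illustration only, not corundum code; a 1500-byte payload MTU is assumed, i.e. a 1514-byte frame minus the 14-byte Ethernet header):

```python
MTU = 1500  # assumed max Ethernet payload: 1514-byte frame minus 14-byte header

def split_burst(burst_bytes, mtu=MTU):
    """Split one DMA burst into per-frame payload sizes (illustrative only)."""
    sizes = [mtu] * (burst_bytes // mtu)   # full-MTU frames
    if burst_bytes % mtu:
        sizes.append(burst_bytes % mtu)    # remainder frame, if any
    return sizes

print(split_burst(4096))  # a 4 KB burst becomes 3 frames: [1500, 1500, 1096]
```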
Currently, corundum doesn't do anything like that. Mainly because that means it won't be compatible with the Linux networking stack, where each packet has to go in its own SKB. So combining multiple packets back-to-back means having the host CPU move things around, which is no longer zero copy. Now, if you're talking about segmentation offloading...I might implement that at some point, but it's not a particularly high priority. I do plan on implementing scatter/gather at some point so that corundum can take advantage of software GSO, but it will be some time before I get around to doing that.
Also, the "ideal" size of DMA operations is almost never 4KB. On all the systems I have worked with, the largest PCIe max payload setting was 256 bytes. So any write larger than 256 bytes is going to get split up into multiple 256 byte TLPs anyway. Also, the Ultrascale PCIe hard IP core only supports payloads of up to 1024 bytes. In terms of reads, most systems seem to set the max read request size to 512 bytes. Reads larger than 512 bytes will be segmented across multiple read requests. The flip side is control traffic - descriptors and completions - which can cause problems with small packet sizes. There are certainly design changes that can be made to increase throughput, but these types of optimizations usually come with downsides as well. So far, I have tried to start with the simplest possible implementation, see how that works, and then iterate from there. Considering it runs at up to 79 Gbps TX and 93 Gbps RX right now, that method seems to have been relatively successful.
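The TLP segmentation described above can be estimated with a quick sketch (illustrative only; it assumes aligned transfers and ignores boundary crossings, completions, and flow control):

```python
import math

def count_tlps(length, max_payload=256):
    """Number of memory-write TLPs for a DMA write of `length` bytes,
    assuming aligned transfers and the given max payload size."""
    return math.ceil(length / max_payload)

for size in (64, 256, 1500, 4096):
    print(f"{size} bytes -> {count_tlps(size)} TLP(s) at MPS=256")
# a 4 KB write splits into 16 TLPs at a 256-byte max payload
```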
At any rate, there is definitely room for optimizing the DMA performance on corundum. It's still very much in development, and will probably see a number of breaking changes over the next year or so.
But for PCIe Gen3 x16, the theoretical bandwidth is 128 Gbps, and in practice the achievable bandwidth may not even reach 100 Gbps. The vcu1525 has two Ethernet ports, both 100G, so the PCIe bandwidth may not even cover one of them. If both Ethernet ports run at full speed, how can PCIe keep up with the combined traffic?
It doesn't. You can't saturate both 100G ports at the same time via PCIe. In fact, it's impossible to saturate even one port with packets smaller than about 1500 bytes. However, this doesn't make it useless, having a second port can be useful for network topology reasons, failover, etc. Or you can do packet processing/routing/etc. on the FPGA before sending data to the host, in which case having less bandwidth is acceptable. Running at 10G or 25G provides even more flexibility by providing more ports.
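A back-of-the-envelope calculation of the usable Gen3 x16 bandwidth (the 24-byte per-TLP overhead is an assumption covering framing, sequence number, header, and LCRC; real efficiency also depends on flow control, completion traffic, and descriptor/completion overhead):

```python
LANES = 16
GTPS = 8.0                 # Gen3 raw rate per lane, GT/s
ENCODING = 128 / 130       # Gen3 128b/130b line-encoding efficiency
MPS = 256                  # typical max payload size, bytes
TLP_OVERHEAD = 24          # assumed per-TLP framing/header/LCRC bytes

raw = LANES * GTPS * ENCODING             # ~126 Gbps after line encoding
payload_eff = MPS / (MPS + TLP_OVERHEAD)  # per-TLP payload efficiency
print(f"raw link rate : {raw:.1f} Gbps")          # -> 126.0 Gbps
print(f"data bandwidth: {raw * payload_eff:.1f} Gbps")  # -> ~115 Gbps ceiling
```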
Corundum was originally intended to support optical networking research, and for the projects we're working on, we need multiple uplinks per host and the ability to control packet transmission on a per-destination basis with microsecond precision. Hence we need a lot of queues that are shared across multiple ports, and the ability to enable/disable queues on microsecond timescales. We also need to use links that can be reconfigured at microsecond timescales - something that 40G and 100G links cannot accomplish due to FEC block lock and interlane deskew. So for our projects we're mainly interested in running the NICs at 10G or 25G. However, 100G is a very nice selling point for the overall project, and it provides a straightforward means of characterizing and debugging the core NIC datapath. And there are likely lots of applications out there that could take advantage of a fully open source, FPGA-based NIC that can source and sink a significant portion of 100 Gbps.
Thanks a lot; one more question.
When I rebuilt the project for the Alveo U50 board with only one interface and one port implemented (since there is only one QSFP28 port on the board), everything was OK except the constraint file "axis_async_fifo.tcl"; Vivado emits the warning and error messages listed below:
Warning : [Vivado 12-1008] No clocks found for command 'get_clocks -of_objects [get_pins {core_inst/iface[1].mac[0].mac_rx_fifo_inst/wr_ptr_reg_reg[0]/C}]'. ["axis_async_fifo.tcl":23]
Errors : [Common 17-55] 'get_property' expects at least one object. ["axis_async_fifo.tcl":23]
What I changed was to comment out the "cmac_usplus_1" module instance and set the parameter IF_COUNT to 1, so there should be no path "iface[1]" anymore. What could be causing this error about the path "iface[1]"?
Thanks.
I have fixed this issue now and have successfully migrated to the U50 board.
I'll run some tests later.
Thanks very much for your reply.
What did the issue end up being? As you said, you should never have an iface[1] if IF_COUNT is set to 1.
It had something to do with the "generate" statements in the interface module; by commenting out the qsfp1 part, I overcame this issue.
Also, a question about the cmac_pad module: since the CMAC core is responsible for inserting the FCS, I think cmac_pad should zero-pad frames to 60 bytes, but the module seems to pad to 64 bytes. Is this correct?
That's a very good point. I will test that change and see if the CMAC module is OK with 60 byte frames (it should be, as that is the spec).
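The arithmetic behind the 60-vs-64-byte question, as a sketch (not the actual cmac_pad logic): the 64-byte Ethernet minimum applies to the frame on the wire, including the 4-byte FCS, so a padder feeding a MAC that appends the FCS itself only needs to reach 60 bytes.

```python
MIN_FRAME = 64  # minimum Ethernet frame size on the wire, including FCS
FCS_LEN = 4

def pad_target(frame_bytes, mac_inserts_fcs=True):
    """Frame size the padder should produce before handing off to the MAC.
    If the MAC appends the 4-byte FCS itself, padding to 60 bytes suffices."""
    target = MIN_FRAME - FCS_LEN if mac_inserts_fcs else MIN_FRAME
    return max(frame_bytes, target)

print(pad_target(42))         # -> 60 (CMAC's FCS brings it to 64 on the wire)
print(pad_target(42, False))  # -> 64 (padder must cover the FCS bytes too)
```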