Comments (13)
I have been using 2019.1. I will probably move to 2019.2 when the department gets a new Vivado license. There should be no significant difference between Vivado versions aside from having to upgrade/downgrade/regenerate XCI files for IP. If you're on Windows, the makefile presumably won't work, so you may have to create a project and add all the files manually - that shouldn't really be a major problem. I think the simulation should work under WSL.
I have not used Xilinx QDMA so I don't have any performance data to compare. DPDK drivers are planned for corundum likely sometime this year; the current driver is for the standard linux networking stack.
from corundum.
Thanks a lot for your reply. I still have some questions about corundum.
First, what is the full name of TDMA? Does the 'T' mean time-aware or tightly-integrated?
Second, what is the difference between interfaces and ports as referred to in the README? In the vcu1525 project, I noticed that it implements two interfaces, with one port each, connected to the two CMACs.
Last, does the vcu1525 project do any packet mapping between the CMACs and the DMA? When I tried to do the same on the U50 with Xilinx's CMAC and QDMA, I had to adjust packet sizes between the CMAC and the QDMA: the maximum Ethernet packet size is 1514 bytes, while the QDMA performs best with 4 KB packets, so some intermediate processing is always needed between the CMACs and the QDMA.
Thanks again.
TDMA = time division multiple access. The TDMA design variants include a scheduler control module that can enable/disable queues based on PTP time. These modules need a bit of work, so the supported queue count is currently a bit limited.
The interface contains the queues, and each interface is registered separately with the OS ("eth0", etc.). The port contains the datapath components for a single ethernet MAC. Each port has its own transmit scheduler. This split means multiple ports can be attached to one interface, sharing the same queues. Since each port has its own transmit scheduler, flows can be migrated from port to port or even striped across ports under hardware control, transparent to the operating system. The VCU1525 really only supports 3 possible configurations in 100G mode as it only has two QSFP28 interfaces: 1 interface, 1 port per interface (only one QSFP port active); 2 interfaces, 1 port per interface; 1 interface, 2 ports per interface.
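The three valid 100G configurations follow directly from the two MAC channels; a minimal sketch of the enumeration (PORTS_PER_IF is an assumed parameter name here, by analogy with the IF_COUNT parameter that appears elsewhere in this thread):

```python
# Enumerate interface/port configurations for a board with two physical
# MAC channels (e.g. the two QSFP28 cages on the VCU1525 in 100G mode).
TOTAL_MACS = 2

configs = [
    (if_count, ports_per_if)
    for if_count in range(1, TOTAL_MACS + 1)
    for ports_per_if in range(1, TOTAL_MACS + 1)
    if if_count * ports_per_if <= TOTAL_MACS
]

for if_count, ports_per_if in configs:
    print(f"IF_COUNT={if_count}, PORTS_PER_IF={ports_per_if} "
          f"-> {if_count * ports_per_if} MAC(s) in use")
# -> (1, 1), (1, 2), and (2, 1): the three configurations described above
```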
Not sure what you mean by packet mapping or packet size adjusting.
By packet mapping I mean, for example: a DMA operation transfers 4 KB of data, but the maximum Ethernet frame length is 1514 bytes, so the data burst from the DMA needs to be split into 3 or 4 Ethernet frames.
In the other direction, for a continuous stream of Ethernet frames coming from the CMAC, if each frame is very short (for example only 64 bytes), will the DMA transfer these short frames one by one, each as a full DMA operation, or will it coalesce them into one large buffer (for example 4 KB) and move them in a single DMA operation? Because of the per-operation overhead of DMA, my understanding is that DMA bandwidth benefits from larger transfer sizes.
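The transmit-side split being described can be sketched as follows (an illustration only, not corundum code; a 1500-byte payload MTU is assumed, i.e. a 1514-byte frame minus the 14-byte Ethernet header):

```python
MTU = 1500  # assumed max Ethernet payload: 1514-byte frame minus 14-byte header

def split_burst(burst_bytes, mtu=MTU):
    """Split one DMA burst into per-frame payload sizes (illustrative only)."""
    sizes = [mtu] * (burst_bytes // mtu)   # full-MTU frames
    if burst_bytes % mtu:
        sizes.append(burst_bytes % mtu)    # remainder frame, if any
    return sizes

print(split_burst(4096))  # a 4 KB burst becomes 3 frames: [1500, 1500, 1096]
```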
Currently, corundum doesn't do anything like that. Mainly because that means it won't be compatible with the Linux networking stack, where each packet has to go in its own SKB. So combining multiple packets back-to-back means having the host CPU move things around, which is no longer zero copy. Now, if you're talking about segmentation offloading...I might implement that at some point, but it's not a particularly high priority. I do plan on implementing scatter/gather at some point so that corundum can take advantage of software GSO, but it will be some time before I get around to doing that.
Also, the "ideal" size of DMA operations is almost never 4KB. On all the systems I have worked with, the largest PCIe max payload setting was 256 bytes. So any write larger than 256 bytes is going to get split up into multiple 256 byte TLPs anyway. Also, the Ultrascale PCIe hard IP core only supports payloads of up to 1024 bytes. In terms of reads, most systems seem to set the max read request size to 512 bytes. Reads larger than 512 bytes will be segmented across multiple read requests. The flip side is control traffic - descriptors and completions - which can cause problems with small packet sizes. There are certainly design changes that can be made to increase throughput, but these types of optimizations usually come with downsides as well. So far, I have tried to start with the simplest possible implementation, see how that works, and then iterate from there. Considering it runs at up to 79 Gbps TX and 93 Gbps RX right now, that method seems to have been relatively successful.
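The TLP segmentation described above can be estimated with a quick sketch (illustrative only; it assumes aligned transfers and ignores boundary crossings, completions, and flow control):

```python
import math

def count_tlps(length, max_payload=256):
    """Number of memory-write TLPs for a DMA write of `length` bytes,
    assuming aligned transfers and the given max payload size."""
    return math.ceil(length / max_payload)

for size in (64, 256, 1500, 4096):
    print(f"{size} bytes -> {count_tlps(size)} TLP(s) at MPS=256")
# a 4 KB write splits into 16 TLPs at a 256-byte max payload
```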
At any rate, there is definitely room for optimizing the DMA performance on corundum. It's still very much in development, and will probably see a number of breaking changes over the next year or so.
But for PCIe Gen3 x16, the theoretical bandwidth is 128 Gbps, and in practice the achievable bandwidth may not even reach 100 Gbps. The vcu1525 has two Ethernet ports, both 100G, so the PCIe bandwidth may not even cover one of them. If both Ethernet ports run at full speed, how can PCIe keep up with the combined traffic?
It doesn't. You can't saturate both 100G ports at the same time via PCIe. In fact, it's impossible to saturate even one port with packets smaller than about 1500 bytes. However, this doesn't make it useless, having a second port can be useful for network topology reasons, failover, etc. Or you can do packet processing/routing/etc. on the FPGA before sending data to the host, in which case having less bandwidth is acceptable. Running at 10G or 25G provides even more flexibility by providing more ports.
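A back-of-the-envelope calculation of the usable Gen3 x16 bandwidth (the 24-byte per-TLP overhead is an assumption covering framing, sequence number, header, and LCRC; real efficiency also depends on flow control, completion traffic, and descriptor/completion overhead):

```python
LANES = 16
GTPS = 8.0                 # Gen3 raw rate per lane, GT/s
ENCODING = 128 / 130       # Gen3 128b/130b line-encoding efficiency
MPS = 256                  # typical max payload size, bytes
TLP_OVERHEAD = 24          # assumed per-TLP framing/header/LCRC bytes

raw = LANES * GTPS * ENCODING             # ~126 Gbps after line encoding
payload_eff = MPS / (MPS + TLP_OVERHEAD)  # per-TLP payload efficiency
print(f"raw link rate : {raw:.1f} Gbps")          # -> 126.0 Gbps
print(f"data bandwidth: {raw * payload_eff:.1f} Gbps")  # -> ~115 Gbps ceiling
```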
Corundum was originally intended to support optical networking research, and for the projects we're working on, we need multiple uplinks per host and the ability to control packet transmission on a per-destination basis with microsecond precision. Hence we need a lot of queues that are shared across multiple ports, and the ability to enable/disable queues on microsecond timescales. We also need to use links that can be reconfigured at microsecond timescales - something that 40G and 100G links cannot accomplish due to FEC block lock and interlane deskew. So for our projects we're mainly interested in running the NICs at 10G or 25G. However, 100G is a very nice selling point for the overall project, and it provides a straightforward means of characterizing and debugging the core NIC datapath. And there are likely lots of applications out there that could take advantage of a fully open source, FPGA-based NIC that can source and sink a significant portion of 100 Gbps.
Thanks a lot; one more question.
When I rebuilt the project for the Alveo U50 board with only one interface and one port implemented (since there is only one QSFP28 port on the board), everything was OK except the constraint file "axis_async_fifo.tcl"; Vivado emits the warning and error messages listed below:
Warning : [Vivado 12-1008] No clocks found for command 'get_clocks -of_objects [get_pins {core_inst/iface[1].mac[0].mac_rx_fifo_inst/wr_ptr_reg_reg[0]/C}]'. ["axis_async_fifo.tcl":23]
Errors : [Common 17-55] 'get_property' expects at least one object. ["axis_async_fifo.tcl":23]
What I changed was to comment out the "cmac_usplus_1" module instance and set the parameter IF_COUNT to 1, so there should be no path "iface[1]" anymore. What could be causing this error about the path "iface[1]"?
Thanks.
I have fixed this issue now and have successfully migrated to the U50 board.
I'll run some tests later.
Thanks very much for your reply.
What did the issue end up being? As you said, you should never have an iface[1] if IF_COUNT is set to 1.
It had something to do with the "generate" statements in the interface module; by commenting out the qsfp1 part, I overcame this issue.
Also, a question about the cmac_pad module: since the CMAC core is responsible for inserting the FCS, I think cmac_pad should zero-pad frames to 60 bytes, but the module seems to pad to 64 bytes. Is this correct?
That's a very good point. I will test that change and see if the CMAC module is OK with 60 byte frames (it should be, as that is the spec).
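The arithmetic behind the 60-vs-64-byte question, as a sketch (not the actual cmac_pad logic): the 64-byte Ethernet minimum applies to the frame on the wire, including the 4-byte FCS, so a padder feeding a MAC that appends the FCS itself only needs to reach 60 bytes.

```python
MIN_FRAME = 64  # minimum Ethernet frame size on the wire, including FCS
FCS_LEN = 4

def pad_target(frame_bytes, mac_inserts_fcs=True):
    """Frame size the padder should produce before handing off to the MAC.
    If the MAC appends the 4-byte FCS itself, padding to 60 bytes suffices."""
    target = MIN_FRAME - FCS_LEN if mac_inserts_fcs else MIN_FRAME
    return max(frame_bytes, target)

print(pad_target(42))         # -> 60 (CMAC's FCS brings it to 64 on the wire)
print(pad_target(42, False))  # -> 64 (padder must cover the FCS bytes too)
```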