corundum / corundum

Open source FPGA-based NIC and platform for in-network compute

Home Page: https://corundum.io/

License: Other

Languages: Verilog 45.91%, Tcl 19.71%, Python 22.20%, Makefile 8.31%, C 3.71%, Shell 0.15%, sed 0.01%, BitBake 0.01%
Topics: fpga, nic, in-network-compute, linux, networking


Corundum Readme


GitHub repository: https://github.com/corundum/corundum

Documentation: https://docs.corundum.io/

GitHub wiki: https://github.com/corundum/corundum/wiki

Google group: https://groups.google.com/d/forum/corundum-nic

Zulip: https://corundum.zulipchat.com/

Introduction

Corundum is an open-source, high-performance FPGA-based NIC and platform for in-network compute. Features include a high-performance datapath, 10G/25G/100G Ethernet, PCI Express Gen 3, a custom, high-performance, tightly-integrated PCIe DMA engine, many (1000+) transmit, receive, completion, and event queues, scatter/gather DMA, MSI interrupts, multiple interfaces, multiple ports per interface, per-port transmit scheduling including high-precision TDMA, flow hashing, RSS, checksum offloading, and native IEEE 1588 PTP timestamping. A Linux driver is included that integrates with the Linux networking stack. Development and debugging are facilitated by an extensive simulation framework that covers the entire system, from a simulation model of the driver and PCI Express interface on one side to the Ethernet interfaces on the other.

Corundum has several unique architectural features. First, transmit, receive, completion, and event queue states are stored efficiently in block RAM or ultra RAM, enabling support for thousands of individually-controllable queues. These queues are associated with interfaces, and each interface can have multiple ports, each with its own independent scheduler. This enables extremely fine-grained control over packet transmission. Coupled with PTP time synchronization, this enables high precision TDMA.

Corundum also provides an application section for implementing custom logic. The application section has a dedicated PCIe BAR for control and a number of interfaces that provide access to the core datapath and DMA infrastructure.

Corundum currently supports devices from both Xilinx and Intel, on boards from several different manufacturers. Designs are included for the following FPGA boards:

  • Alpha Data ADM-PCIE-9V3 (Xilinx Virtex UltraScale+ XCVU3P)
  • Dini Group DNPCIe_40G_KU_LL_2QSFP (Xilinx Kintex UltraScale XCKU040)
  • Cisco Nexus K35-S (Xilinx Kintex UltraScale XCKU035)
  • Cisco Nexus K3P-S (Xilinx Kintex UltraScale+ XCKU3P)
  • Cisco Nexus K3P-Q (Xilinx Kintex UltraScale+ XCKU3P)
  • Silicom fb2CG@KU15P (Xilinx Kintex UltraScale+ XCKU15P)
  • NetFPGA SUME (Xilinx Virtex 7 XC7V690T)
  • BittWare 250-SoC (Xilinx Zynq UltraScale+ XCZU19EG)
  • BittWare XUSP3S (Xilinx Virtex UltraScale XCVU095)
  • BittWare XUP-P3R (Xilinx Virtex UltraScale+ XCVU9P)
  • BittWare IA-420F (Intel Agilex F 014)
  • Intel Stratix 10 MX dev kit (Intel Stratix 10 MX 2100)
  • Intel Stratix 10 DX dev kit (Intel Stratix 10 DX 2800)
  • Intel Agilex F dev kit (Intel Agilex F 014)
  • Terasic DE10-Agilex (Intel Agilex F 014)
  • Xilinx Alveo U50 (Xilinx Virtex UltraScale+ XCU50)
  • Xilinx Alveo U55N/Varium C1100 (Xilinx Virtex UltraScale+ XCU55N)
  • Xilinx Alveo U200 (Xilinx Virtex UltraScale+ XCU200)
  • Xilinx Alveo U250 (Xilinx Virtex UltraScale+ XCU250)
  • Xilinx Alveo U280 (Xilinx Virtex UltraScale+ XCU280)
  • Xilinx Kria KR260 (Xilinx Zynq UltraScale+ XCK26)
  • Xilinx VCU108 (Xilinx Virtex UltraScale XCVU095)
  • Xilinx VCU118 (Xilinx Virtex UltraScale+ XCVU9P)
  • Xilinx VCU1525 (Xilinx Virtex UltraScale+ XCVU9P)
  • Xilinx ZCU102 (Xilinx Zynq UltraScale+ XCZU9EG)
  • Xilinx ZCU106 (Xilinx Zynq UltraScale+ XCZU7EV)

For operation at 10G and 25G, Corundum uses the open-source 10G/25G MAC and PHY modules from the verilog-ethernet repository, so no extra licenses are required. However, it is possible to use other MAC and/or PHY modules.

Operation at 100G on Xilinx UltraScale+ devices currently requires using the Xilinx CMAC core with RS-FEC enabled, which is covered by the free CMAC license.

Documentation

For detailed documentation, see https://docs.corundum.io/

Block Diagram

Corundum block diagram

Block diagram of the Corundum NIC. PCIe HIP: PCIe hard IP core; AXIL M: AXI lite master; DMA IF: DMA interface; AXI M: AXI master; PHC: PTP hardware clock; TXQ: transmit queue manager; TXCQ: transmit completion queue manager; RXQ: receive queue manager; RXCQ: receive completion queue manager; EQ: event queue manager; MAC + PHY: Ethernet media access controller (MAC) and physical interface layer (PHY).

Modules

cmac_pad module

Frame pad module for 512 bit 100G CMAC TX interface. Zero pads transmit frames to minimum 64 bytes.
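
To illustrate the behavior, here is a behavioral Python sketch (not the RTL) of padding a short transmit frame to the 64-byte minimum:

def pad_frame(frame: bytes, min_len: int = 64) -> bytes:
    # Zero-pad short transmit frames so the CMAC never sees a frame below the minimum size.
    if len(frame) < min_len:
        frame = frame + b"\x00" * (min_len - len(frame))
    return frame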

cpl_op_mux module

Completion operation multiplexer module. Merges completion write operations from different sources to enable sharing a single cpl_write module instance.

cpl_queue_manager module

Completion queue manager module. Stores device to host queue state in block RAM or ultra RAM.

cpl_write module

Completion write module. Responsible for enqueuing completion and event records into the completion queue managers and writing records into host memory via DMA.

desc_fetch module

Descriptor fetch module. Responsible for dequeuing descriptors from the queue managers and reading descriptors from host memory via DMA.

desc_op_mux module

Descriptor operation multiplexer module. Merges descriptor fetch operations from different sources to enable sharing a single desc_fetch module instance.

event_mux module

Event mux module. Enables multiple event sources to feed the same event queue.

mqnic_core module

Core module. Contains the interfaces, asynchronous FIFOs, PTP subsystem, statistics collection subsystem, and application block.

mqnic_core_pcie module

Core module for a PCIe host interface. Wraps mqnic_core along with generic PCIe interface components, including DMA engine and AXI lite masters.

mqnic_core_pcie_us module

Core module for a PCIe host interface on Xilinx 7-series, UltraScale, and UltraScale+. Wraps mqnic_core_pcie along with FPGA-specific interface logic.

mqnic_interface module

Interface module. Contains the event queues, interface queues, and ports.

mqnic_port module

Port module. Contains the transmit and receive datapath components, including transmit and receive engines and checksum and hash offloading.

mqnic_ptp module

PTP subsystem. Contains one mqnic_ptp_clock instance and a parametrizable number of mqnic_ptp_perout instances.

mqnic_ptp_clock module

PTP clock module. Contains an instance of ptp_clock with a register interface.

mqnic_ptp_perout module

PTP period output module. Contains an instance of ptp_perout with a register interface.

mqnic_tx_scheduler_block_rr module

Transmit scheduler block with round-robin transmit scheduler and register interface.

mqnic_tx_scheduler_block_rr_tdma module

Transmit scheduler block with round-robin transmit scheduler, TDMA scheduler, TDMA scheduler controller, and register interface.

queue_manager module

Queue manager module. Stores host to device queue state in block RAM or ultra RAM.

rx_checksum module

Receive checksum computation module. Computes 16 bit checksum of Ethernet frame payload to aid in IP checksum offloading.
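
For reference, the 16-bit checksum in question is the standard Internet ones'-complement sum. A behavioral Python sketch (not the RTL) of the arithmetic over a payload:

def checksum16(data: bytes) -> int:
    # Ones'-complement sum of 16-bit big-endian words, with end-around carry folding.
    if len(data) % 2:
        data += b"\x00"  # pad odd-length payloads with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

Software can combine such a partial sum with the pseudo-header sum to validate IP/TCP/UDP checksums without re-reading the payload.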

rx_engine module

Receive engine. Manages receive datapath operations including descriptor dequeue and fetch via DMA, packet reception, data writeback via DMA, and completion enqueue and writeback via DMA. Handles PTP timestamps for inclusion in completion records.

rx_hash module

Receive hash computation module. Extracts IP addresses and ports from packet headers and computes 32 bit Toeplitz flow hash.
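
The Toeplitz hash itself is well defined; the sketch below computes it in Python over the conventional RSS input (source/destination IP followed by source/destination port). The field order and key shown here are illustrative assumptions, not the exact RTL behavior.

def toeplitz_hash(data: bytes, key: bytes) -> int:
    # Compute the 32-bit Toeplitz hash of `data` using `key`, processing input bits MSB-first.
    # `key` must be at least len(data) * 8 + 32 bits long (a typical RSS key is 40 bytes).
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                # XOR in the 32-bit window of the key aligned with this input bit
                result ^= (key_int >> (key_bits - 32 - (i * 8 + bit))) & 0xFFFFFFFF
    return result

# Example (hypothetical field order): hash_input = src_ip + dst_ip + src_port + dst_port
# hash_value = toeplitz_hash(hash_input, rss_key)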

stats_collect module

Statistics collector module. Parametrizable number of increment inputs, single AXI stream output for accumulated counts.

stats_counter module

Statistics counter module. Receives increments over AXI stream and accumulates them in block RAM, which is accessible via AXI lite.

stats_dma_if_pcie module

Collects DMA-related statistics for dma_if_pcie module, including operation latency.

stats_dma_latency module

DMA latency measurement module.

stats_pcie_if module

Collects TLP-level statistics for the generic PCIe interface.

stats_pcie_tlp module

Extracts TLP-level statistics for the generic PCIe interface (single channel).

tdma_ber_ch module

TDMA bit error ratio (BER) test channel module. Controls PRBS logic in Ethernet PHY and accumulates bit errors. Can be configured to bin error counts by TDMA timeslot.

tdma_ber module

TDMA bit error ratio (BER) test module. Wrapper for a tdma_scheduler and multiple instances of tdma_ber_ch.

tdma_scheduler module

TDMA scheduler module. Generates TDMA timeslot index and timing signals from PTP time.
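
Conceptually, the timeslot index is modular arithmetic on the PTP time. A minimal Python sketch (parameter names are assumptions; the RTL computes this incrementally rather than with division):

def tdma_slot_index(ptp_time_ns: int, start_ns: int, period_ns: int, slot_period_ns: int) -> int:
    # Which timeslot of the repeating schedule does the current PTP time fall into?
    offset = (ptp_time_ns - start_ns) % period_ns
    return offset // slot_period_ns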

tx_checksum module

Transmit checksum computation and insertion module. Computes 16 bit checksum of frame data with specified start offset, then inserts computed checksum at the specified position.
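
A behavioral sketch of the compute-and-insert step, reusing the checksum16 helper sketched under rx_checksum and treating the start and insert offsets as byte positions (a simplification of the RTL):

def insert_checksum(frame: bytearray, csum_start: int, csum_offset: int) -> None:
    # Sum the frame data from `csum_start` onward, complement it, and write the
    # 16-bit result back into the frame at `csum_offset` (big-endian).
    csum = checksum16(bytes(frame[csum_start:])) ^ 0xFFFF
    frame[csum_offset] = csum >> 8
    frame[csum_offset + 1] = csum & 0xFF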

tx_engine module

Transmit engine. Manages transmit datapath operations including descriptor dequeue and fetch via DMA, packet data fetch via DMA, packet transmission, and completion enqueue and writeback via DMA. Handles PTP timestamps for inclusion in completion records.

tx_scheduler_ctrl_tdma module

TDMA transmit scheduler control module. Controls queues in a transmit scheduler based on PTP time, via a tdma_scheduler instance.

tx_scheduler_rr module

Round-robin transmit scheduler. Determines which queues to send packets from.
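
The round-robin selection can be pictured with a small software model; this is only an analogy for the doorbell/active-queue behavior, not the pipelined RTL:

from collections import deque

class RoundRobinModel:
    # Toy model: doorbells mark queues active; the scheduler rotates through active queues.
    # The real scheduler also deactivates a queue once it has nothing left to send.
    def __init__(self):
        self.active = deque()

    def doorbell(self, queue_index: int) -> None:
        if queue_index not in self.active:
            self.active.append(queue_index)

    def next_queue(self):
        if not self.active:
            return None
        queue_index = self.active.popleft()
        self.active.append(queue_index)  # rotate to the back for fairness
        return queue_index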

Source Files

cmac_pad.v                         : Pad frames to 64 bytes for CMAC TX
cpl_op_mux.v                       : Completion operation mux
cpl_queue_manager.v                : Completion queue manager
cpl_write.v                        : Completion write module
desc_fetch.v                       : Descriptor fetch module
desc_op_mux.v                      : Descriptor operation mux
event_mux.v                        : Event mux
event_queue.v                      : Event queue
mqnic_core.v                       : Core logic
mqnic_core_pcie.v                  : Core logic for PCIe
mqnic_core_pcie_us.v               : Core logic for PCIe (UltraScale)
mqnic_interface.v                  : Interface
mqnic_port.v                       : Port
mqnic_ptp.v                        : PTP subsystem
mqnic_ptp_clock.v                  : PTP clock wrapper
mqnic_ptp_perout.v                 : PTP period output wrapper
mqnic_tx_scheduler_block_rr.v      : Scheduler block (round-robin)
mqnic_tx_scheduler_block_rr_tdma.v : Scheduler block (round-robin TDMA)
queue_manager.v                    : Queue manager
rx_checksum.v                      : Receive checksum offload
rx_engine.v                        : Receive engine
rx_hash.v                          : Receive hashing module
stats_collect.v                    : Statistics collector
stats_counter.v                    : Statistics counter
stats_dma_if_pcie.v                : DMA interface statistics
stats_dma_latency.v                : DMA latency measurement
stats_pcie_if.v                    : PCIe interface statistics
stats_pcie_tlp.v                   : PCIe TLP statistics
tdma_ber_ch.v                      : TDMA BER channel
tdma_ber.v                         : TDMA BER
tdma_scheduler.v                   : TDMA scheduler
tx_checksum.v                      : Transmit checksum offload
tx_engine.v                        : Transmit engine
tx_scheduler_ctrl_tdma.v           : TDMA transmit scheduler controller
tx_scheduler_rr.v                  : Round robin transmit scheduler

Testing

Running the included testbenches requires cocotb, cocotbext-axi, cocotbext-eth, cocotbext-pcie, scapy, and Icarus Verilog. The testbenches can be run with pytest directly (requires cocotb-test), pytest via tox, or via cocotb makefiles.
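
As a starting point, a cocotb testbench generally follows the minimal skeleton below. The clk/rst signal names are placeholders; the actual testbenches in the repository drive the DUT through cocotbext-axi, cocotbext-eth, and cocotbext-pcie models.

import cocotb
from cocotb.clock import Clock
from cocotb.triggers import Timer

@cocotb.test()
async def smoke_test(dut):
    # Start a 250 MHz clock, apply reset, then let the design run for a while.
    cocotb.start_soon(Clock(dut.clk, 4, units="ns").start())
    dut.rst.value = 1
    await Timer(100, units="ns")
    dut.rst.value = 0
    await Timer(1000, units="ns")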

Publications

  • A. Forencich, A. C. Snoeren, G. Porter, G. Papen, Corundum: An Open-Source 100-Gbps NIC, in FCCM'20. (FCCM Paper, FCCM Presentation)

  • J. A. Forencich, System-Level Considerations for Optical Switching in Data Center Networks. (Thesis)

Citation

If you use Corundum in your project, please cite one of the following papers and/or link to the project on GitHub:

@inproceedings{forencich2020fccm,
    author = {Alex Forencich and Alex C. Snoeren and George Porter and George Papen},
    title = {Corundum: An Open-Source {100-Gbps} {NIC}},
    booktitle = {28th IEEE International Symposium on Field-Programmable Custom Computing Machines},
    year = {2020},
}

@phdthesis{forencich2020thesis,
    author = {John Alexander Forencich},
    title = {System-Level Considerations for Optical Switching in Data Center Networks},
    school = {UC San Diego},
    year = {2020},
    url = {https://escholarship.org/uc/item/3mc9070t},
}

Dependencies

Corundum internally uses the following libraries (vendored under fpga/lib):

  • verilog-axi
  • verilog-axis
  • verilog-ethernet
  • verilog-pcie

corundum's People

Contributors

alexforencich, andreasbraun90, basseuph, joft-mle, lastweek, lomotos10, minseongg, penberg, sessl3r, wnew


corundum's Issues

Support transceiver resets

Support resetting the MGT sites without resetting the complete design. This is necessary because MGT DFE auto-adaptation can result in an unusable receiver if no input signal is present for long enough, which is only recoverable by resetting the transceiver channel. Will require work on the TX path to handle missing PTP timestamps due to a transceiver reset. May require some work on the async FIFOs.

[General question] RDMA support

Hi,

I am just wondering whether RDMA support is on the TODO list for Corundum. Which parts would need to be modified to support RDMA?

PCIe CORE Parameter AXIS_DATAWIDTH = 64 breaks the design.

The following PCIe core configuration broke the design:

AXIS_DATAWIDTH = 64 bit
CORE_FREQ = 250 MHz

The compiler complained:
../lib/pcie/rtl/dma_client_axis_sink.v:239: error: Concatenation repeat may not be zero in this context.
../lib/pcie/rtl/dma_client_axis_sink.v:240: error: Concatenation repeat may not be zero in this context.
../lib/pcie/rtl/dma_client_axis_sink.v:319: error: Concatenation repeat may not be zero in this context.

The errors are in iface[0].interface_inst.cpl_write_inst.dma_client_axis_sink_inst.

Please see the following code:

dma_client_axis_sink #(
.SEG_COUNT(SEG_COUNT),
.SEG_DATA_WIDTH(SEG_DATA_WIDTH),
.SEG_ADDR_WIDTH(SEG_ADDR_WIDTH),
.SEG_BE_WIDTH(SEG_BE_WIDTH),
.RAM_ADDR_WIDTH(RAM_ADDR_WIDTH),
.AXIS_DATA_WIDTH(CPL_SIZE*8), ----------------> NO good for PCIe AXIS_DATAWIDTH=64
.AXIS_KEEP_ENABLE(CPL_SIZE > 1),
.AXIS_KEEP_WIDTH(CPL_SIZE),
.AXIS_LAST_ENABLE(1),
.AXIS_ID_ENABLE(0),
.AXIS_DEST_ENABLE(0),
.AXIS_USER_ENABLE(1),
.AXIS_USER_WIDTH(1),
.LEN_WIDTH(8),
.TAG_WIDTH(CL_DESC_TABLE_SIZE)
)
dma_client_axis_sink_inst (
.clk(clk),
.rst(rst),
.....

Test Issue about Constant User Function

Hi Alex,

When I ran the test using python3 test_fpga_core.py, I hit the following issue:

Running test...
../rtl/fpga_core.v:1075: sorry: constant user functions are not currently supported: w_32().
../lib/axi/rtl/axil_interconnect.v:134: sorry: constant user functions are not currently supported: calcBaseAddrs().
../rtl/fpga_core.v:1128: sorry: constant user functions are not currently supported: w_32().
../lib/axi/rtl/axil_interconnect.v:134: sorry: constant user functions are not currently supported: calcBaseAddrs().
../rtl/fpga_core.v:1128: error: Unable to evaluate parameter M_ADDR_WIDTH value: {}
../rtl/fpga_core.v:1075: error: Unable to evaluate parameter M_ADDR_WIDTH value: 2{}
../rtl/common/interface.v:879: sorry: constant user functions are not currently supported: w_32().
../lib/axi/rtl/axil_interconnect.v:134: sorry: constant user functions are not currently supported: calcBaseAddrs().
../rtl/common/interface.v:879: sorry: constant user functions are not currently supported: w_32().
../lib/axi/rtl/axil_interconnect.v:134: sorry: constant user functions are not currently supported: calcBaseAddrs().
../rtl/common/interface.v:879: error: Unable to evaluate parameter M_ADDR_WIDTH value: {{}, , , , , , , }
../rtl/common/interface.v:879: error: Unable to evaluate parameter M_ADDR_WIDTH value: {{}, , , , , , , }
../rtl/common/port.v:746: sorry: constant user functions are not currently supported: w_32().
../lib/axi/rtl/axil_interconnect.v:134: sorry: constant user functions are not currently supported: calcBaseAddrs().
../rtl/common/port.v:746: sorry: constant user functions are not currently supported: w_32().
../lib/axi/rtl/axil_interconnect.v:134: sorry: constant user functions are not currently supported: calcBaseAddrs().
../rtl/common/port.v:746: error: Unable to evaluate parameter M_ADDR_WIDTH value: 2{}
...

Do you have any hint that might solve this problem?
Please give me some advice.
Thank you.

How to check whether HW is working correctly?

Hi,

I am trying to run Corundum on an Alveo U250. It seems that I can see two interfaces on my host after installing the mqnic kernel module, but I do not know whether the HW works correctly.

Here is my setup: I directly connect the two ports via a cable. Say these two interfaces are eth1 and eth2.

I tcpreplay some packets on eth1/eth2, but cannot see any traffic on the other side (i.e., eth2/eth1, respectively).

Since ifconfig only shows counters at the SW (OS) level, I am wondering whether I can read some counters inside the HW pipeline to check whether it works correctly.

Thanks in advance.

Cannot connect to switch

I successfully set up the image and driver. When I connect the two ports on my U250 card directly, the OS can send and receive packets. But when we connect the U250 to our switch, the switch cannot receive the packets. I tried turning off auto-negotiation and it did not work either. Do you have any hint that might solve this problem?

mqnic module compile error

make -C /lib/modules/4.18.0-193.28.1.el8_2.x86_64/build M=/home/kha/fpga/corundum/modules/mqnic modules
make[1]: Entering directory '/usr/src/kernels/4.18.0-193.28.1.el8_2.x86_64'
  CC [M]  /home/kha/fpga/corundum/modules/mqnic/mqnic_tx.o
In file included from ./include/linux/kernel.h:10,
                 from /home/kha/fpga/corundum/modules/mqnic/mqnic.h:37,
                 from /home/kha/fpga/corundum/modules/mqnic/mqnic_tx.c:35:
/home/kha/fpga/corundum/modules/mqnic/mqnic_tx.c: In function ‘mqnic_start_xmit’:
/home/kha/fpga/corundum/modules/mqnic/mqnic_tx.c:526:22: error: ‘struct sk_buff’ has no member named ‘xmit_more’
     if (unlikely(!skb->xmit_more || stop_queue))
                      ^~
./include/linux/compiler.h:77:42: note: in definition of macro ‘unlikely’
 # define unlikely(x) __builtin_expect(!!(x), 0)
                                          ^
make[2]: *** [scripts/Makefile.build:313: /home/kha/fpga/corundum/modules/mqnic/mqnic_tx.o] Error 1
make[1]: *** [Makefile:1545: _module_/home/kha/fpga/corundum/modules/mqnic] Error 2
make[1]: Leaving directory '/usr/src/kernels/4.18.0-193.28.1.el8_2.x86_64'
make: *** [Makefile:18: all] Error 2

U200 cannot receive packets

I followed the steps in the readme: generated the bitstream and mqnic.ko, programmed the 100G design on the U200, inserted the module after rebooting the host, and configured the MAC address by programming the EEPROM. The messages from dmesg look fine. The U200 host is running Ubuntu 18.04.

I connected the U200 to a CX-5 NIC with a 100G DAC cable and pinged the CX-5 from the U200. In Wireshark on the U200 host I can see the ARP packet being sent out, but nothing is received. Wireshark on the CX-5 host shows the ARP packet arriving from the U200 and a response being sent, but the ping test fails.

Do you have any hint that might solve this problem?
Please give me some advice.
Thank you.

Make a loopback for performance test

Hi Alex,

I'm trying to do a performance test on a module that I added in corundum's data path (the AXIS channel). What I did was:

  1. I initialized my module (AXIS tx and rx) in port.v;
  2. I connected the AXIS TX/RX signals of port.v directly to my module and commented out the code related to those signals;
  3. I used a network tester to generate traffic from one QSFP28 port and monitor the RX traffic from the same port.

My problem is that the tester always shows "link error" when I connect my board with the tester.

Am I doing something wrong? What do you think is the easiest way of doing such a performance test?

Thanks.

Add support for adm-pcie-8v3

https://www.alpha-data.com/dcp/products.php?product=adm-pcie-8v3

I have a couple of these FPGA cards and I would like to have support for them to test out 100 Gbps networks. Also, will MAC splitting work with this firmware? If I were to take the QSFP28 port and split it to 4 SFP28 connectors, would I be able to have 4 separate 25 Gbps links?

Please let me know if there is anything I could assist with in adding support for these devices.

Thanks,

simulation based on python

The code structure of the simulation verification is very rigorous. Is there any design document or instruction document for reference?

what is the version of vivado for the project of vcu1525?

Hi,
I am trying to rebuild the VCU1525 project on Windows, and I have some questions about it.
First, which version of Vivado should be used? For now, I am trying to build it with Vivado 2019.1.3 on Windows.
Second, are there any performance comparisons against Xilinx's QDMA, and is the DMA driver compatible with DPDK, like the driver modules for Xilinx's QDMA?
Thanks a lot.

question about iperf performance

Hi,

I tested iperf performance with 8 processes and a 1.5 KB MTU. However, the throughput is only 24 Gbps, much lower than the 60 Gbps in the paper. Is any optimization needed to achieve higher performance? By the way, does Corundum support any hardware offloading, such as LSO, checksum, etc.?

Thanks.

Support MSI-X

MSI-X supports more interrupt channels than MSI. Additionally, some motherboards limit MSI to 1 channel, but do not limit MSI-X.

PTP mapping of timestamp and eth message

Hi Alex, thanks for your nice repo! :)

I have a question regarding PTP and its interaction with the software driver and linux-ptp.
I understand that the PTP modules create a timestamp for every Ethernet packet, which is saved in either the RX or TX FIFO. But how are the timestamps matched up to packets afterwards in software? I never saw an ID for the timestamps. For Sync/Follow_Up messages (master side) the timestamp is needed in the sent data, and on the slave side it seems to be needed for all packets.

Another question: is the mqnic kernel module the only software (besides linux-ptp) that runs to fulfill the PTP tasks on Linux? And do you see a realistic chance of running the PTP stack without Linux as the operating system, or would that take a lot of hours (100 or more)? I just want to know what I would be getting into, since I am sure you have more knowledge and can judge it well.

Thanks

Is there anything I need to modify when using U250/fpga_100g?

Hello
I am trying to run Corundum on the U250 now.
I'm following the AU250/fpga_100g execution method after watching the YouTube video.
(FPGA Dev Live Stream: More Corundum Porting : https://www.youtube.com/watch?v=oD6qlJXQMkg&t=2316s)

[Timing 38-282] The design failed to meet the timing requirements.

These critical error messages keep appearing.
I wonder if there's anything else I need to touch that's not in the video. (e.g. modifying the clocks in the Transceivers Wizard, ...)

I would appreciate it if you could reply. Thanks :)

eth_mac_10g_fifo.v minor bug

Minor typo when synchronising Ethernet MAC rx errors I believe:
corundum/fpga/lib/eth/rtl/eth_mac_10g_fifo.v
line 230
rx_sync_reg_1 <= rx_sync_reg_1 ^ {rx_error_bad_frame_int, rx_error_bad_frame_int};
The second rx_error_bad_frame_int should be rx_error_bad_fcs_int, I think.

support SR-IOV?

Hi,

Does Corundum support SR-IOV? I tried to turn on SR-IOV for Corundum, but it failed. How can I use SR-IOV with Corundum? I found this link #7 (comment), but I did not find how to enable SR-IOV.

Thanks

Support 10G/25G switchable interfaces

Convert all 25G designs to 10G/25G switchable designs. Requires support for transceiver resets and reworking of the transceiver clocking and reset infrastructure, as well as driver support.

Zynq support

I note you list Zynq support on the roadmap. Do you have any work in progress on this?

This is something which would be interesting to us, and which we may be able to contribute towards bringing to fruition. It would help to get an understanding of exactly which component(s) it would entail rewriting. Certainly dma_if_pcie_us, but I'm wondering if there is any wider impact which is less obvious.

Interfacing CMAC to PCIe x8

Hi Alex,

I'm porting Corundum to a customized VU9P FPGA board with x8 PCIe and QSFP28 interfaces.
The board works fine with the 10G/25G designs, but with the 100G design there is an issue sending/receiving packets.

The purpose is not to fully utilize the 100G bandwidth (limited by PCIe x8) but to save logic resources with CMAC hard IP.

Issue Description

Almost all packets are dropped (checked with ifconfig), and the few captured by tcpdump show incomplete data received.
I guess something went wrong after reducing the PCIe AXIS data width to 256; half of the data is missing in the captured packet.

Ping packet sent from the other side with pktsize = 256:

        0x0000:  4500 011c 86f2 0000 4001 719b c0a8 0001  E.......@.q.....
        0x0010:  c0a8 0002 0000 4529 11cd 0003 ddbe 1660  ......E).......`
        0x0020:  0000 0000 23e8 0900 0000 0000 1011 1213  ....#...........
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637 3839 3a3b 3c3d 3e3f 4041 4243  456789:;<=>?@ABC
        0x0060:  4445 4647 4849 4a4b 4c4d 4e4f 5051 5253  DEFGHIJKLMNOPQRS
        0x0070:  5455 5657 5859 5a5b 5c5d 5e5f 6061 6263  TUVWXYZ[\]^_`abc
        0x0080:  6465 6667 6869 6a6b 6c6d 6e6f 7071 7273  defghijklmnopqrs
        0x0090:  7475 7677 7879 7a7b 7c7d 7e7f 8081 8283  tuvwxyz{|}~.....
        0x00a0:  8485 8687 8889 8a8b 8c8d 8e8f 9091 9293  ................
        0x00b0:  9495 9697 9899 9a9b 9c9d 9e9f a0a1 a2a3  ................
        0x00c0:  a4a5 a6a7 a8a9 aaab acad aeaf b0b1 b2b3  ................
        0x00d0:  b4b5 b6b7 b8b9 babb bcbd bebf c0c1 c2c3  ................
        0x00e0:  c4c5 c6c7 c8c9 cacb cccd cecf d0d1 d2d3  ................
        0x00f0:  d4d5 d6d7 d8d9 dadb dcdd dedf e0e1 e2e3  ................
        0x0100:  e4e5 e6e7 e8e9 eaeb eced eeef f0f1 f2f3  ................
        0x0110:  f4f5 f6f7 f8f9 fafb fcfd feff            ............

Received ping packet:

        0x0000:  4500 011c 86f2 0000 4001 719b c0a8 0001  E.......@.q.....
        0x0010:  c0a8 0002 0000 4529 11cd 0003 ddbe 1660  ......E).......`
        0x0020:  0000 0000 23e8 0900 0000 0000 1011 1213  ....#...........
        0x0030:  1415 0000 0000 0000 0000 0000 0000 0000  ................
        0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0070:  0000 5657 5859 5a5b 5c5d 5e5f 6061 6263  ..VWXYZ[\]^_`abc
        0x0080:  6465 6667 6869 6a6b 6c6d 6e6f 7071 7273  defghijklmnopqrs
        0x0090:  7475 7677 7879 7a7b 7c7d 7e7f 8081 8283  tuvwxyz{|}~.....
        0x00a0:  8485 8687 8889 8a8b 8c8d 8e8f 9091 9293  ................
        0x00b0:  9495 0000 0000 0000 0000 0000 0000 0000  ................
        0x00c0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x00d0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x00e0:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x00f0:  0000 d6d7 d8d9 dadb dcdd dedf e0e1 e2e3  ................
        0x0100:  e4e5 e6e7 e8e9 eaeb eced eeef f0f1 f2f3  ................
        0x0110:  f4f5 f6f7 f8f9 fafb fcfd feff            ............

Design Modifications

The design baseline is the VCU118, with the PCIe configuration based on the ExaNIC 25G.
Besides the changes below, is there anything else that should be taken care of?
Or maybe I should ask: is it possible to interface PCIe x8 with the CMAC without hacking into the interface modules or DMA engine?

PCIe IP changes:

@@ -3,10 +3,11 @@ 
 set_property -dict [list \
     CONFIG.PL_LINK_CAP_MAX_LINK_SPEED {8.0_GT/s} \
-    CONFIG.PL_LINK_CAP_MAX_LINK_WIDTH {X16} \
-    CONFIG.AXISTEN_IF_EXT_512_RQ_STRADDLE {false} \
+    CONFIG.PL_LINK_CAP_MAX_LINK_WIDTH {X8} \
+    CONFIG.AXISTEN_IF_RC_STRADDLE {false} \
     CONFIG.axisten_if_enable_client_tag {true} \
-    CONFIG.axisten_if_width {512_bit} \
+    CONFIG.axisten_if_width {256_bit} \
+    CONFIG.extended_tag_field {true} \
     CONFIG.axisten_freq {250} \
     CONFIG.PF0_CLASS_CODE {020000} \
     CONFIG.PF0_DEVICE_ID {1001} \
@@ -20,7 +21,13 @@
     CONFIG.pf0_bar0_64bit {true} \
     CONFIG.pf0_bar0_prefetchable {true} \
     CONFIG.pf0_bar0_scale {Megabytes} \
-    CONFIG.pf0_bar0_size {16} \
+    CONFIG.pf0_bar0_size {64} \
     CONFIG.vendor_id {1234} \

fpga.v changes:

     /*
      * PCI express
      */
-    input  wire [15:0]  pcie_rx_p,
-    input  wire [15:0]  pcie_rx_n,
-    output wire [15:0]  pcie_tx_p,
-    output wire [15:0]  pcie_tx_n,
+    input  wire [7:0]   pcie_rx_p,
+    input  wire [7:0]   pcie_rx_n,
+    output wire [7:0]   pcie_tx_p,
+    output wire [7:0]   pcie_tx_n,
+    input  wire         pcie_refclk_0_p,
+    input  wire         pcie_refclk_0_n,
     // input  wire         pcie_refclk_1_p,
     // input  wire         pcie_refclk_1_n,
-    input  wire         pcie_refclk_2_p,
-    input  wire         pcie_refclk_2_n,
     input  wire         pcie_reset_n,

...

-parameter AXIS_PCIE_DATA_WIDTH = 512;
+// PCIe bus config refered to ExaNIC_X25
+parameter AXIS_PCIE_DATA_WIDTH = 256;
 parameter AXIS_PCIE_KEEP_WIDTH = (AXIS_PCIE_DATA_WIDTH/32);
-parameter AXIS_PCIE_RC_USER_WIDTH = 161;
-parameter AXIS_PCIE_RQ_USER_WIDTH = 137;
-parameter AXIS_PCIE_CQ_USER_WIDTH = 183;
-parameter AXIS_PCIE_CC_USER_WIDTH = 81;
+parameter AXIS_PCIE_RC_USER_WIDTH = 75;
+parameter AXIS_PCIE_RQ_USER_WIDTH = 62;
+parameter AXIS_PCIE_CQ_USER_WIDTH = 88;
+parameter AXIS_PCIE_CC_USER_WIDTH = 33;
 parameter RQ_SEQ_NUM_WIDTH = 6;
-parameter BAR0_APERTURE = 24;
+parameter BAR0_APERTURE = 26;
 
 parameter AXIS_ETH_DATA_WIDTH = 512;
 parameter AXIS_ETH_KEEP_WIDTH = AXIS_ETH_DATA_WIDTH/8;

Thanks & BR.

Improve metadata support

Use tuser signals to transfer metadata on the TX and RX paths. Permit arbitrary metadata from the descriptor on the TX path, and store metadata in the completion record on the RX path. May need to update the FIFO modules to make this efficient in terms of block RAM usage. PTP timestamps can also be transferred as metadata fields.

build VCU118 with vivado 2018.3

Hi Alex,

I followed the instructions in the VCU118 100G makefile to build Corundum with Vivado 2018.3 (Windows platform). But while I was looking into the options of the 100G MAC core, I found it is not possible to change the user interface from the local bus to AXI stream. Does that mean Corundum on the VCU118 cannot be built with Vivado 2018.3? Is there any hack that Corundum provides to replace the local bus with AXIS as the interface?

Thanks,

Migration to customized FPGA board

Hi Alex, we have a customized FPGA board with the same FPGA chip as the ZCU106. Our board contains one SFP+ cage and an x8 PCIe interface. The optical module and EEPROM have dedicated I2C interfaces. Those are the major differences compared to the ZCU106. To run Corundum on our board, we modified two related Verilog files: fpga.v and fpga_core.v. It turns out that we can see the network interface eth0 when we install the mqnic kernel module. However, when we connect two boards directly via cable, we fail to ping from one board to the other. We also noticed that there is no light coming out of the optical interface.
Could you please give me some advice on how to debug this and migrate the project? Thanks.

Implement port status and statistics counters

Implement port link status and packet counters and report this information via driver APIs. Currently, the driver only keeps track of statistics internally and corundum does not provide any hardware statistics counters.

Can you share your Vivado version?

Hi all,
Thanks for sharing the interesting work and it helps me a lot in building an FPGA based NIC.

When I compiled the project for the VCU1525 on Vivado 2019.10, I got an error connecting the QSFP modules like this:

ERROR: [Synth 8-448] named port connection 'gt_rxp_in' does not exist for instance 'qsfp0_cmac_inst' of module 'cmac_usplus_0'

I fixed it on my own and replaced gt_rxp_in with gt0_rxp_in.

It seems that different versions of Vivado generate different port names for the QSFP modules.

Can you share your Vivado version and add it to the readme? That would help others.

If possible, can I open a pull request to update the cmac_usplus module configurations?

Thanks!
Yang

Is there any way to deliver large size packet to FPGA without DPDK?

Thank you for developing and continuing to update this great project.
I want to deliver large packets to the FPGA without the host segmenting them to 1460 bytes.
As far as I know, this is possible with DPDK, but I understand you are still working on making Corundum support DPDK.
So I wonder if there is another way to deliver large packets to the FPGA NIC.
Thank you again for your project.

A few questions about tx_scheduler_rr

Hi, Alex.
I have a few questions when I read the tx_scheduler_rr module.

The first question is about finish_status in code line 546.
write_data_pipeline_next[0][0] = finish_status_reg || op_table_doorbell[finish_ptr_reg];
It seems that finish_status_reg will be high when the transmit is completed, since the tx_engine module sends back the request status and the data length it has sent is not zero (s_axis_tx_req_status_len != 0). That makes sense.
However, code lines 689 to 703 confuse me again. The logic of the module is that when the tx_scheduler_rr module receives a doorbell signal from the tx_queue_manager module, it handles the doorbell and sends a transmit request to the tx_engine module. However, with this logic, it seems that write_data_pipeline_reg[PIPELINE][0] won't go low in the TX complete section. If this signal won't go low, how can the transmit request section stop sending requests?

The second question is about the rr_fifo module inside the tx_scheduler_rr module.
It's confusing that while the module is handling the AXIL write section or the TX complete section, a queue can still be fed in as the axis_scheduler_fifo_in_queue input of the rr_fifo module.

Support DPDK

Supporting DPDK would permit the use of userspace network stacks, enabling development of features across the entire stack. Implementation of a DPDK PMD should be investigated.

Synth Error

While synthesizing the code, I get the following error:
Synth Error: Error: AXI lite address width too narrow (instance) [tx_scheduler_rr.v ": 150]
While line 149 is:
if (AXIL_ADDR_WIDTH < QUEUE_INDEX_WIDTH+5), where AXIL_ADDR_WIDTH is 16 and QUEUE_INDEX_WIDTH is 6!
also:
part-select [15: 2] out of range of prefix 'axil_ctrl_awaddr' [mqnic_port.v ": 622]
And:
conditional expression could not be resolved to a constant [mqnic_interface.v ": 2025]

Support variable-length descriptors

Rewrite descriptor handling subsystem to support variable-length descriptors. Variable-length descriptor support opens the door for much more expressive descriptors, including both metadata support as well as inline headers and packets. Logic required to implement variable-length descriptor handling can also provide descriptor read batching, which further improves PCIe link utilization.

Optimize iperf Performance of External Loopback in Dual-port Design

Hi Alex,

In your YouTube video I saw you isolated one CX5 card with a network namespace and tested traffic to/from Corundum on the same server. Since I don't have a 100G NIC at hand, I'm wondering whether it's feasible to test the performance of a dual-port Corundum design in external loopback by connecting the two ports with a DAC.


iperf is functional with network namespace isolation on one port, but the performance is not good. With one server/client process the throughput is only ~1.3 Gbps, and with 10 server/client processes in parallel the total throughput can only reach ~10 Gbps.

I also tried testing traffic between two FPGA cards with the Corundum 100G design on the same server and got similar results.


The server has two 14C/28T CPUs and runs Ubuntu 18.04 LTS (5.4.0-56 Kernel), which should not be the bottleneck, or should it?

So the question is: what is your test environment for getting throughput higher than 90 Gbps? Two servers, one with the FPGA and the other with the CX5?
Is any specific software optimization needed, e.g. manually setting CPU affinity for queues and interrupts?

Thanks & BR,
Xiaohai Li

U200 packet receiving performance test

Hello, doctor.
I want to ask you some questions about the U200 board performance test.

Test environment:
We use an Ixia device to do the packet test for the U200 board. Ixia sets pseudo-random IP addresses and port numbers to ensure that packets are evenly distributed to each queue, and the packet size is 9600 bytes.
The main parameters of the U200 board are as follows: Rx_MTU=16384, rx_fifo_depth=131072, number_Rx_queue=4096, rx_pkt_table_size=16.

Test results:
Using a PF_RING count instance for the packet receiving test, it only supports 23 Gbps reception without packet loss.
With a counter set in engine.v after the queue response is completed and the data is written to RAM, reading the register shows it can reach 37 Gbps and 40 Gbps respectively.
In Xilinx's CMAC IP core, with RX data returned to TX for direct forwarding, the loopback rate can reach 96 Gbps.

So what other configuration do I need to achieve the 95.5 Gbps rate mentioned in your paper?

(Screenshots attached showing the results for the 35 Gbps, 40 Gbps, and 50 Gbps flows.)

fix typo in test_fpga_core.py

In test_fpga_core.py:

original:
dev.user_clock_frequency = 250e6

should be:
dev.user_clk_frequency = 250e6

Note: it is OK if core_freq is fixed at 250 MHz; when switching to another frequency (e.g. Gen2 @ 125 MHz), the simulation fails.

VCU118 REF_CLK_FREQ is changed?

Hi

I've compiled fpga_100g for the VCU118 and connected it to a Mellanox ConnectX-4.
It works with the old XCI IP generation flow, but with the latest TCL IP generation flow they can't link up.

I think it is because the CMAC REF_CLK_FREQ changed from 156.25 to 161.1328125. As far as I can test, the change below to cmac_usplus_0/1.tcl fixes this issue.

-   CONFIG.GT_REF_CLK_FREQ {161.1328125} \
+   CONFIG.GT_REF_CLK_FREQ {156.25} \

Thanks

ExaBlaze10G requires .fw to be downloaded onto FPGA

Hi,

I used Corundum to generate the bitstream (fpga.bit) for the ExaBlaze 10G. But when I was going to download it to the board, I found that the Exa10G instructions say:

The exanic-fwupdate utility can be used to update the firmware image stored on the ExaNIC. This >utility can use the compressed .fw.gz file format that firmware images are released as, or >uncompressed .fw files.

So it looks like the ExaNIC needs something like xxx.fw instead of a typical xxx.bit to program the FPGA. How should I proceed with this?

Thanks,

specify xci generation output directory?

Currently, running the xci generation tcl scripts from the ip/ directory of a card puts the generated xci file in

<prjname>.srcs/sources_1/ip/<ipname>/<ipname>.xci

I would like more control of where the xci file is generated (to better support vivado nonproject flow).

I can do that by checking to see if a tcl variable is defined, and if so, specifying a directory with -dir argument on the create_ip. If the variable isn't defined, it uses the default location.

This is what I've got in mind TripRichert@210016e?branch=210016e27baa7ff324bd1c2776c6e2d0f744f6bd&diff=unified

I recognize that this doesn't help your workflow in the slightest, and it pollutes your namespace, so I fully understand if you don't want this kind of change to your repo. If you're open to the change, I'll submit a pull request. If not, I'll work around it.

Thanks for building this great project!

PTP slave?

I was interested in the PTP slave functionality that I thought Corundum provides.

The ptp_clock module's period/offset/drift adjustments seem to come from the PCIe rather than an Ethernet interface.

Can you confirm that Corundum doesn't parse and process PTP messages over UDP to update the timestamp?

Do we need a different NIC that is connected to a PTP master, which then forwards the PTP adjustment messages via PCIe?

Thanks

cheaper fpgas

I was wondering if much lower spec (and much cheaper) FPGAs would work.

The Lattice ECP5 has 5 Gb/s SerDes, so I guess that's not a good option.
How about Artix? The datasheet is a little confusing to me as someone who has never used Xilinx.

"211 Gb/s Serial Bandwidth" but only "6.6 Gb/s Transceiver Speed"

I guess 6 Gb/s is still OK, assuming PCIe is separate.

Not working on Alveo U50DD(Alveo U50-ES3)

Hi, I'm trying to use Corundum on an Alveo U50DD, but it doesn't seem to work.

The U50DD has 2 SFP-DD cages instead of QSFP. As far as I know, this is the only difference from the AU50. So I programmed my U50DD with the original Corundum AU50 10G design. After a warm reset or hot reset and loading the mqnic module (the MAC address is hardcoded), it showed up.

I connected the first port of the U50DD to another 10G NIC with a DAC, assigned static IPs, and ran ping. Both sending and receiving fail.

Additionally, I tried the examples in verilog-ethernet and verilog-pcie, and both of them work on the same setup.

Is there anything I missed or got wrong?

Support SR-IOV

SR-IOV is useful both for virtualization applications and for mixing use of the normal Linux kernel driver with other drivers, including a DPDK driver. Supporting SR-IOV will require several changes to Corundum, including tracking the PCIe function associated with each operation, the ability to assign queues to PCIe functions, and the ability to restrict access to various control registers based on the PCIe function.

timing issue on AU250

Hi Alex,

I'm implementing some of my own logic in port.v on an AU250 board. I managed to reduce the logic levels to below 10. However, the timing constraints still cannot be met (~0.4 ns WNS, -300 ns TNS). Furthermore, I found that most critical paths are within dma_if_pcie_us_wr_inst.

Since I'm not very familiar with the DMA part of Corundum, do you have any suggestions for solving the timing issue? Or should I keep optimizing my code so that it competes less with the other parts of Corundum?

Looking forward to hearing your thoughts on it!
