fpga-network-stack

Scalable Network Stack supporting TCP/IP, RoCEv2, UDP/IP at 10-100 Gbit/s

License: BSD 3-Clause "New" or "Revised" License

Prerequisites

  • Xilinx Vivado 2022.2
  • cmake 3.0 or higher

Supported boards (out of the box)

  • Xilinx VC709
  • Xilinx VCU118
  • Alpha Data ADM-PCIE-7V3

Git submodules

This repository uses git submodules, so do one of the following:

# When cloning:
git clone --recurse-submodules git@github.com:fpgasystems/fpga-network-stack.git
# Later, if you forgot or when submodules have been updated:
git submodule update --init --recursive

Compiling HLS modules

  1. Create a build directory:

mkdir build
cd build

  2. Configure the build:

cmake .. -DFNS_PLATFORM=xilinx_u55c_gen3x16_xdma_3_202210_1 -DFNS_DATA_WIDTH=64

All cmake options:

| Name                             | Values                              | Description                                              |
|----------------------------------|-------------------------------------|----------------------------------------------------------|
| FNS_PLATFORM                     | xilinx_u55c_gen3x16_xdma_3_202210_1 | Target platform to build for                             |
| FNS_DATA_WIDTH                   | <8,16,32,64>                        | Data width of the network stack in bytes                 |
| FNS_ROCE_STACK_MAX_QPS           | 500                                 | Maximum number of queue pairs the RoCE stack can support |
| FNS_TCP_STACK_MSS                | #value                              | Maximum segment size of the TCP/IP stack                 |
| FNS_TCP_STACK_FAST_RETRANSMIT_EN | <0,1>                               | Enables TCP fast retransmit                              |
| FNS_TCP_STACK_NODELAY_EN         | <0,1>                               | Toggles Nagle's algorithm on/off                         |
| FNS_TCP_STACK_MAX_SESSIONS       | #value                              | Maximum number of sessions the TCP/IP stack can support  |
| FNS_TCP_STACK_RX_DDR_BYPASS_EN   | <0,1>                               | Enables DDR bypass on the RX path                        |
| FNS_TCP_STACK_WINDOW_SCALING_EN  | <0,1>                               | Enables the TCP window scaling option                    |
  3. Build the HLS IP cores and install them into the IP repository:
make ip

For an example project including the TCP/IP stack or the RoCEv2 stack with DMA to host memory, check out our Distributed Accelerator OS, DavOS.

Working with individual HLS modules

  1. Set up the build directory, e.g. for the TCP module:
cd hls/toe
mkdir build
cd build
cmake .. -DFNS_PLATFORM=xilinx_u55c_gen3x16_xdma_3_202210_1 -DFNS_DATA_WIDTH=64
  2. Run:
make csim # C-Simulation (csim_design)
make synth # Synthesis (csynth_design)
make cosim # Co-Simulation (cosim_design)
make ip # Export IP (export_design)

Interfaces

All interfaces use the AXI4-Stream protocol. For AXI4-Streams carrying network/data packets, we use the following definition in HLS:

template <int D>
struct net_axis {
	ap_uint<D>    data; // data word of this beat
	ap_uint<D/8>  keep; // byte enable: one bit per byte of data
	ap_uint<1>    last; // asserted on the final beat of a packet
};
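
As a minimal sketch (not part of the repository), a two-beat packet on a 64-bit stream (WIDTH = 64) could be produced as follows; the function name and payload values are illustrative:

#include "ap_int.h"
#include "hls_stream.h"

// Sketch only: emit a two-beat packet on a 64-bit net_axis stream.
void emit_packet(hls::stream<net_axis<64> >& out) {
	net_axis<64> beat;
	beat.data = 0x0123456789ABCDEF; // first 8 payload bytes
	beat.keep = 0xFF;               // all 8 bytes of this beat are valid
	beat.last = 0;                  // more beats follow
	out.write(beat);

	beat.data = 0xCAFE;
	beat.keep = 0x03;               // only the 2 lowest bytes are valid
	beat.last = 1;                  // final beat of the packet
	out.write(beat);
}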

TCP/IP

Open Connection

To open a connection, the destination IP address and TCP port have to be provided through the s_axis_open_conn_req interface. The TCP stack answers this request through the m_axis_open_conn_rsp interface, which provides the sessionID and a boolean indicating whether the connection was opened successfully.

Interface definition in HLS:

struct ipTuple {
	ap_uint<32>	ip_address;
	ap_uint<16>	ip_port;
};
struct openStatus {
	ap_uint<16>	sessionID;
	bool		success;
};

void toe(...
	hls::stream<ipTuple>& openConnReq,
	hls::stream<openStatus>& openConnRsp,
	...);
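
A minimal client-side sketch of this handshake, using the stream names from the signature above (the address, port, and surrounding control flow are placeholders):

// Sketch only: request a connection and block on the stack's answer.
ipTuple tuple;
tuple.ip_address = 0x0A01D2C8; // example address; byte order follows the stack's convention
tuple.ip_port    = 5001;       // example destination port
openConnReq.write(tuple);

openStatus status = openConnRsp.read(); // blocks until the stack responds
if (status.success) {
	// status.sessionID identifies this connection in all subsequent requests
}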

Close Connection

To close a connection, the sessionID has to be provided to the s_axis_close_conn_req interface. The TCP/IP stack does not provide a notification upon completion of this request; however, it is guaranteed that the connection is closed eventually.

Interface definition in HLS:

hls::stream<ap_uint<16> >& closeConnReq,
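
Closing is fire-and-forget; a one-line sketch, assuming sessionID holds the ID returned by a successful open:

closeConnReq.write(sessionID); // no completion notification follows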

Open a TCP port to listen on

To open a port to listen on (e.g. as a server), the port number has to be provided to s_axis_listen_port_req. The port number has to be in the range of listen ports, 0-32767; the upper half of the port range is reserved for actively opened connections. The TCP stack responds through the m_axis_listen_port_rsp interface, indicating whether the port was set to the listen state successfully.

Interface definition in HLS:

hls::stream<ap_uint<16> >& listenPortReq,
hls::stream<bool>& listenPortRsp,
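
A small sketch of the listen handshake (port 7 is just an example value):

listenPortReq.write(7);                // ask to listen on port 7
bool listening = listenPortRsp.read(); // true if the port is now listening
if (!listening) {
	// the port could not be put into the listen state
}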

Receiving notifications from the TCP stack

The application using the TCP stack can receive notifications through the m_axis_notification interface. The notifications indicate either that new data is available or that a connection was closed.

Interface definition in HLS:

struct appNotification {
	ap_uint<16>			sessionID;
	ap_uint<16>			length;
	ap_uint<32>			ipAddress;
	ap_uint<16>			dstPort;
	bool				closed;
};

hls::stream<appNotification>& notification,
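
A sketch of polling the notification stream without blocking (read_nb is the standard non-blocking read on hls::stream):

appNotification notif;
if (notification.read_nb(notif)) {
	if (notif.closed) {
		// the connection behind notif.sessionID was closed
	} else {
		// notif.length new bytes are available on notif.sessionID
	}
}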

Receiving data

If data is available on a TCP/IP session, i.e. a notification was received, the data can be requested through the s_axis_rx_data_req interface. The sessionID and the data are then received through the m_axis_rx_data_rsp_metadata and m_axis_rx_data_rsp interfaces.

Interface definition in HLS:

struct appReadRequest {
	ap_uint<16> sessionID;
	ap_uint<16> length;
};

hls::stream<appReadRequest>& rxDataReq,
hls::stream<ap_uint<16> >& rxDataRspMeta,
hls::stream<net_axis<WIDTH> >& rxDataRsp,
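
Putting the pieces together, a sketch of the receive flow for a 64-bit stack (WIDTH = 64); the per-beat processing is a placeholder:

// Sketch only: request the bytes announced by a data notification
// and drain the resulting packet stream.
appNotification notif = notification.read();
if (!notif.closed && notif.length != 0) {
	appReadRequest req;
	req.sessionID = notif.sessionID;
	req.length    = notif.length;
	rxDataReq.write(req);

	ap_uint<16> session = rxDataRspMeta.read(); // echoes the sessionID
	net_axis<64> beat;
	do {
		beat = rxDataRsp.read();
		// consume beat.data, using beat.keep to mask the valid bytes
	} while (!beat.last);
}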

Waveform of receiving a (data) notification, requesting data, and receiving the data:

[waveform figure: tcp-rx-handshake]

Transmitting data

When an application wants to transmit data on a TCP connection, it first has to check whether enough buffer space is available. This check/request is done through the s_axis_tx_data_req_metadata interface. If the response from the TCP stack through the m_axis_tx_data_rsp interface is positive, the application can send the data through the s_axis_tx_data_req interface. If the response is negative, the application can retry by sending another request on the s_axis_tx_data_req_metadata interface.

Interface definition in HLS:

struct appTxMeta {
	ap_uint<16> sessionID;
	ap_uint<16> length;
};
struct appTxRsp {
	ap_uint<16> sessionID;
	ap_uint<16> length;
	ap_uint<30> remaining_space;
	ap_uint<2>  error;
};

hls::stream<appTxMeta>& txDataReqMeta,
hls::stream<appTxRsp>& txDataRsp,
hls::stream<net_axis<WIDTH> >& txDataReq,
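
A sketch of the transmit handshake for a single 8-byte message (payload and retry policy are placeholders):

appTxMeta meta;
meta.sessionID = sessionID;
meta.length    = 8;                 // bytes we intend to send
txDataReqMeta.write(meta);

appTxRsp rsp = txDataRsp.read();
if (rsp.error == 0) {               // buffer space is available
	net_axis<64> beat;
	beat.data = 0x0102030405060708; // example payload
	beat.keep = 0xFF;
	beat.last = 1;
	txDataReq.write(beat);
} else {
	// insufficient space: re-issue the appTxMeta request later;
	// rsp.remaining_space indicates how much room is currently left
}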

Waveform of requesting a data transmit and transmitting the data:

[waveform figure: tcp-tx-handshake]

RoCE (RDMA over Converged Ethernet)

The new RDMA version (02/2024) is adapted from the one used in Coyote (https://github.com/fpgasystems/Coyote) and is fully compliant with the RoCEv2 standard, and thus able to communicate with standard NICs (e.g. Mellanox cards). It has been proven to run at 100 Gbit/s, allowing for low latency and high throughput comparable to the results achievable with the mentioned ASIC-based NICs.

The complete design is captured in the following block diagram:

[block diagram figure]

The packet processing pipeline is written in Vitis HLS and contained in "roce_v2_ip", consisting of separate modules for the IPv4, UDP, and InfiniBand headers. In the top-level module "roce_stack.sv", this pipeline is combined with HDL-coded ICRC calculation and RDMA flow control.

To actually use the RDMA stack, it needs to be integrated into a full FPGA networking stack and combined with a shell that enables DMA exchange with the host for both commands and memory access. One example is Coyote, with a networking stack as depicted in the following block diagram:

[block diagram figure]

The RDMA stack presented in this repository is the blue roce_stack. The surrounding modules need to be provided by users to integrate the RDMA capability into their projects. To integrate the RDMA stack into a shell design, one must be aware of the essential interfaces. These are the following:

Network Data Path

The two ports s_axis_rx and m_axis_tx are 512-bit AXI4-Stream interfaces used to transfer network traffic between the shell and the RDMA stack. With the Ethernet header already processed in earlier parts of the networking environment, the RDMA core expects a leading IP header, followed by a UDP header, an InfiniBand header, the payload, and a final ICRC checksum.

Meta Interfaces for Connection Setup

RDMA operates on so-called queue pairs at remote communication nodes. The initial connection between queues has to be established out-of-band (e.g. via TCP/IP) by the hosts. The exchanged meta-information then needs to be communicated to the RDMA stack via the two meta-interfaces s_axis_qp_interface and s_axis_qp_conn_interface. The interface definition in HLS looks like this:

typedef enum {RESET, INIT, READY_RECV, READY_SEND, SQ_ERROR, ERROR} qpState;

struct qpContext {
	qpState		newState;
	ap_uint<24> qp_num;
	ap_uint<24> remote_psn;
	ap_uint<24> local_psn;
	ap_uint<16> r_key;
	ap_uint<48> virtual_address;
};
struct ifConnReq {
	ap_uint<16> qpn;
	ap_uint<24> remote_qpn;
	ap_uint<128> remote_ip_address;
	ap_uint<16> remote_udp_port;
};

hls::stream<qpContext>&	s_axis_qp_interface,
hls::stream<ifConnReq>&	s_axis_qp_conn_interface,
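
A sketch of programming one queue pair after the out-of-band exchange; all field values are placeholders, except that 4791 is the standard RoCEv2 UDP port:

qpContext ctx;
ctx.newState        = READY_RECV;
ctx.qp_num          = 0x11;     // local QP number
ctx.remote_psn      = 0xABC123; // PSN announced by the remote side
ctx.local_psn       = 0xDEF456; // our initial PSN
ctx.r_key           = 0x1234;   // remote key of the exposed buffer
ctx.virtual_address = 0x0;      // base address of the local buffer
s_axis_qp_interface.write(ctx);

ifConnReq conn;
conn.qpn               = 0x11;  // local QP
conn.remote_qpn        = 0x22;  // QP number on the remote node
conn.remote_ip_address = 0;     // remote IP address (128-bit field)
conn.remote_udp_port   = 4791;  // standard RoCEv2 UDP port
s_axis_qp_conn_interface.write(conn);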

Issue RDMA commands

The actual RDMA operations are handled between the shell and the RDMA core through the s_rdma_sq interface, used to issue RDMA operations, and the m_rdma_ack interface, which signals automatically generated ACKs from the stack to the shell.

Definition of s_rdma_sq:

  • 20 Bit rsrvd
  • 64 Bit message_size
  • 64 Bit local vaddr
  • 64 Bit remote vaddr
  • 4 Bit offs
  • 24 Bit ssn
  • 4 Bit cmplt
  • 4 Bit last
  • 4 Bit mode
  • 4 Bit host
  • 12 Bit qpn
  • 8 Bit opcode (i.e. RDMA_WRITE, RDMA_READ, RDMA_SEND etc.)

Definition of m_rdma_ack:

  • 24 Bit ssn
  • 4 Bit vfid - Coyote-specific
  • 8 Bit pid - Coyote-specific
  • 4 Bit cmplt
  • 4 Bit rd
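
For readability, the two field lists can be mirrored as HLS structs. The following is a hypothetical view with widths taken from the lists above; the authoritative bit packing is defined in roce_stack.sv:

// Hypothetical struct views of s_rdma_sq and m_rdma_ack.
struct rdmaSqEntry {
	ap_uint<20> rsrvd;
	ap_uint<64> message_size;
	ap_uint<64> local_vaddr;
	ap_uint<64> remote_vaddr;
	ap_uint<4>  offs;
	ap_uint<24> ssn;
	ap_uint<4>  cmplt;
	ap_uint<4>  last;
	ap_uint<4>  mode;
	ap_uint<4>  host;
	ap_uint<12> qpn;
	ap_uint<8>  opcode; // e.g. RDMA_WRITE, RDMA_READ, RDMA_SEND
};

struct rdmaAck {
	ap_uint<24> ssn;
	ap_uint<4>  vfid; // Coyote-specific
	ap_uint<8>  pid;  // Coyote-specific
	ap_uint<4>  cmplt;
	ap_uint<4>  rd;
};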

Memory Interface

The RDMA stack as published here was originally developed for use with the Coyote shell and is designed to work with the QDMA IP core. Therefore, the memory-control interfaces m_rdma_rd_req and m_rdma_wr_req are designed to hold all information required for communication with that core. The two data interfaces for transporting memory content, m_axis_rdma_wr and s_axis_rdma_rd, are 512-bit AXI4-Stream interfaces.

Definition of m_rdma_rd_req / m_rdma_wr_req:

  • 4 Bit vfid
  • 48 Bit vaddr
  • 4 Bit sync
  • 4 Bit stream
  • 8 Bit pid
  • 28 Bit len
  • 4 Bit host
  • 12 Bit dest
  • 4 Bit ctl
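
As above, a hypothetical struct view of these request interfaces, with widths taken from the list (the authoritative definition lives in the RTL):

struct rdmaMemReq {
	ap_uint<4>  vfid;
	ap_uint<48> vaddr;  // virtual address of the access
	ap_uint<4>  sync;
	ap_uint<4>  stream;
	ap_uint<8>  pid;
	ap_uint<28> len;    // length of the access
	ap_uint<4>  host;
	ap_uint<12> dest;
	ap_uint<4>  ctl;
};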

Example of RDMA WRITE-Flow

The following flow chart shows an example RDMA WRITE exchange between a remote node with an ASIC-based NIC and a local node with an FPGA NIC implementing the RDMA stack. It depicts the FPGA-internal communication between the RDMA stack and the shell, as well as the network data exchange between the two nodes:

[flow chart figure]

Publications

  • D. Sidler, G. Alonso, M. Blott, K. Karras et al., Scalable 10Gbps TCP/IP Stack Architecture for Reconfigurable Hardware, in FCCM’15, Paper, Slides

  • D. Sidler, Z. Istvan, G. Alonso, Low-Latency TCP/IP Stack for Data Center Applications, in FPL'16, Paper

  • D. Sidler, Z. Wang, M. Chiosa, A. Kulkarni, G. Alonso, StRoM: smart remote memory, in EuroSys'20, Paper

Citations

If you use the TCP/IP or RDMA stacks in your project, please cite one of the following papers and/or link to the GitHub project:

@inproceedings{DBLP:conf/fccm/SidlerABKVC15,
  author       = {David Sidler and
                  Gustavo Alonso and
                  Michaela Blott and
                  Kimon Karras and
                  Kees A. Vissers and
                  Raymond Carley},
  title        = {Scalable 10Gbps {TCP/IP} Stack Architecture for Reconfigurable Hardware},
  booktitle    = {23rd {IEEE} Annual International Symposium on Field-Programmable Custom
                  Computing Machines, {FCCM} 2015, Vancouver, BC, Canada, May 2-6, 2015},
  pages        = {36--43},
  publisher    = {{IEEE} Computer Society},
  year         = {2015},
  doi          = {10.1109/FCCM.2015.12}
}

@inproceedings{DBLP:conf/fpl/SidlerIA16,
  author       = {David Sidler and
                  Zsolt Istv{\'{a}}n and
                  Gustavo Alonso},
  title        = {Low-latency {TCP/IP} stack for data center applications},
  booktitle    = {26th International Conference on Field Programmable Logic and Applications,
                  {FPL} 2016, Lausanne, Switzerland, August 29 - September 2, 2016},
  pages        = {1--4},
  publisher    = {{IEEE}},
  year         = {2016},
  doi          = {10.1109/FPL.2016.7577319}
}

@inproceedings{DBLP:conf/eurosys/SidlerWCKA20,
  author       = {David Sidler and
                  Zeke Wang and
                  Monica Chiosa and
                  Amit Kulkarni and
                  Gustavo Alonso},
  title        = {StRoM: smart remote memory},
  booktitle    = {EuroSys '20: Fifteenth EuroSys Conference 2020, Heraklion, Greece,
                  April 27-30, 2020},
  pages        = {29:1--29:16},
  publisher    = {{ACM}},
  year         = {2020},
  doi          = {10.1145/3342195.3387519}
}

@PHDTHESIS{sidler2019innetworkdataprocessing,
	author = {Sidler, David},
	publisher = {ETH Zurich},
	year = {2019-09},
	copyright = {In Copyright - Non-Commercial Use Permitted},
	title = {In-Network Data Processing using FPGAs},
}

@INPROCEEDINGS{sidler2020strom,
	author = {Sidler, David and Wang, Zeke and Chiosa, Monica and Kulkarni, Amit and Alonso, Gustavo},
	booktitle = {Proceedings of the Fifteenth European Conference on Computer Systems},
	title = {StRoM: Smart Remote Memory},
	doi = {10.1145/3342195.3387519},
}

Contributors

chipet, d-kor, dsidler, fabiomaschi, gustavsvj, lisal2023, wangzeke


Issues

CMake Error: create_project.tcl.in does not exist

I was trying to run make and it complained about create_project.tcl:
CMake Error: File /home/fpga-network-stack/projects/create_project.tcl.in does not exist.
CMake Error at CMakeLists.txt:97
Line 97 of CMakeLists.txt is looking in projects; I think it should be scripts.

Thanks.

Can you provide some test cases?

The TOE's tests (toe_tb) and implementation are mismatched and do not seem to be implemented completely, especially in the case of mode 1 and mode 2.
When testing the RX side with the tests in the testVector folder, part of the checksum is incorrect.
In addition, the test cases for the TX side and for mode 2 are not given. Can you upload this part of the test cases?
The test data and golden outputs used by Toe_script_probably_obsolete.py have also not been uploaded.
Can you provide these test cases?
Thank you.

HLS synthesis error on module (toe)

Hi, Sidler~

I see that the module "hls/toe" cannot be synthesized by HLS, and the build process errors out with messages like this:

ERROR: [HLS 200-70] Compilation errors found: In file included from ../../../hls/toe/ack_delay/ack_delay.cpp:1:
In file included from ../../../hls/toe/ack_delay/ack_delay.cpp:30:
In file included from ../../../hls/toe/ack_delay/ack_delay.hpp:29:
In file included from ../../../hls/toe/ack_delay/../toe_internals.hpp:4:
In file included from ../../../hls/toe/ack_delay/../../axi_utils.hpp:32:
In file included from D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\iostream:39:
In file included from D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\ostream:39:
In file included from D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\ios:39:
In file included from D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\exception:151:
D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\exception_ptr.h:132:13: error: unknown type name 'type_info'
      const type_info*
            ^
In file included from ../../../hls/toe/ack_delay/ack_delay.cpp:1:
In file included from ../../../hls/toe/ack_delay/ack_delay.cpp:30:
In file included from ../../../hls/toe/ack_delay/ack_delay.hpp:29:
In file included from ../../../hls/toe/ack_delay/../toe_internals.hpp:4:
In file included from ../../../hls/toe/ack_delay/../../axi_utils.hpp:32:
In file included from D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\iostream:39:
In file included from D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\ostream:39:
In file included from D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\ios:39:
In file included from D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\exception:151:
D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\nested_exception.h:62:5: error: the parameter for this explicitly-defaulted copy constructor is const, but a member or base requires it to be non-const
    nested_exception(const nested_exception&) = default;
    ^
D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\nested_exception.h:64:23: error: the parameter for this explicitly-defaulted copy assignment operator is const, but a member or base requires it to be non-const
    nested_exception& operator=(const nested_exception&) = default;
                      ^
D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\nested_exception.h:77:28: error: exception specification in declaration does not match previous declaration
  inline nested_exception::~nested_exception() = default;
                           ^
D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\nested_exception.h:66:20: note: previous declaration is here
    inline virtual ~nested_exception();
                   ^
D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\nested_exception.h:122:61: error: redefinition of default argument
    __throw_with_nested(_Ex&& __ex, const nested_exception* = 0)
                                                            ^ ~
D:/Xilinx/Vivado/2019.2/win64/tools/clang/bin\..\lib\clang\3.1/../../../include/c++/4.5.2\nested_exception.h:110:56: note: previous definition is here
    __throw_with_nested(_Ex&&, const nested_exception* = 0)
                                                       ^ ~
5 errors generated.

I tried to open the cmake-generated project with the HLS GUI, and this error is confirmed.

Windows 7.1 + Vivado 2019.2 + MinGW 16.1 + CMake 3.15

Please help double-check this.

Best Regards :- )
[email protected]

Can it work with the VC707?

Hi guys, your project is great.
I'm trying to get started with a VC707 board and was wondering whether your code is extendable to Xilinx's VC707 board?

Thanks

How to access RoCE QP states

Hi @dsidler ,

I'm using the rocev2 IP and I'd like to test its bandwidth or latency. However, I don't see how to access the QP states so as to determine when RDMA requests are finished.

Is it possible to access those QP states from outside the rocev2 IP?
How did you determine when RDMA requests were finished when you did the benchmarks in the StRoM paper?

Best,
Terry

Report Timing Summary showing negative slack for TCP TOE

I have added timing constraints as follows:
dclock = 8 ns
refclock = 6.4 ns

Then I ran synthesis and Report Timing Summary, and observed the following:
Setup:

Worst Negative Slack: -5.6ns

Hold:

Worst Hold Slack=-0.349ns

Pulse Width:

Worst Pulse Width Slack= -0.02ns

My understanding is that all the above values should be positive to meet the timing constraints before running on hardware.
I do not have VCU118 hardware now, but I will get it in the future. I want to make sure that the design works on real hardware without any surprises.

Please comment/suggest.

Thanks and regards,
Ishtiyaque
[email protected]

Request for an up-to-date example design

Hi,

I would like to use the TCP/IP stack on a VCU1525 and need an example design for a test. However, the top-level module (tcp_ip_top) has not been updated to match the implementation of the network stack module.

Is it possible to have an updated and simple example design with basic applications, like an echo server?

Best

TCP window size exceeded, then stall

Hi,

I adapted your code to run on the KC705 at 10 Gbps.
I tested ping, an echo server, and PC-to-FPGA custom data transfers, and it works fine so far.

However, I have an issue when the FPGA sends data to the PC.

I wrote a server (FPGA) that waits for a connection from a PC application and then sends test data (think iperf3 with option -R, i.e. the server sends). I make sure I use the txMeta interface to reserve space before sending a packet, but otherwise I send as fast as I can.

This works for a couple of packets, then the FPGA stalls and stops sending packets. In Wireshark I can see a lot of TCP WINDOW UPDATE messages, and it seems the PC cannot keep up with the FPGA.

On ChipScope, I see it stall in the SEND_PACKET state, more specifically on a txData.write() blocking statement.

My question: does the stack support changing window sizes?

FPGA network stack code synthesizes but does not compile

Hi,
We were able to synthesize the FPGA network stack code, but when we tried to run the C simulation, we found compile issues with toe_tb.cpp, toe.cpp, and toe.hpp. We are concerned about whether this code has been tested recently and has a complete implementation. Has anyone successfully tested a generated bitstream on the VC709?

thanks and regards,
Ishtiyaque

Build error - ERROR: [HLS 207-3776] use of undeclared identifier 'FNS_ROCE_STACK_MAX_QPS'

Trying to build from source
Ubuntu 22.04
Vivado 2022.2

Steps to reproduce

  1. mkdir build && cd build
  2. cmake .. -DFNS_PLATFORM=xilinx_u50_gen3x16_xdma_5_202210_1 -DFNS_DATA_WIDTH=64 -DFNS_ROCE_STACK_MAX_QPS=500
  3. make ip

Getting error:

ERROR: [HLS 207-3776] use of undeclared identifier 'FNS_ROCE_STACK_MAX_QPS' (/home/test/source/learn/xilinx/tcp-ip/fpga-network-stack/hls/ethernet_frame_padding/../fns_config.hpp:5:26)
INFO: [HLS 200-111] Finished Command csynth_design CPU user time: 1.73 seconds. CPU system time: 0.16 seconds. Elapsed time: 0.95 seconds; current allocated memory: 0.000 MB.
 
    while executing
"source /home/test/source/learn/xilinx/tcp-ip/fpga-network-stack/build/hls/ethernet_frame_padding/ethernet_frame_padding_synthesis.tcl"
    ("uplevel" body line 1)
    invoked from within
"uplevel \#0 [list source $arg] "

I get the same error if I do not specify -DFNS_ROCE_STACK_MAX_QPS=500.
How can I fix this error?

RoCEv2 ICRC issue

Your field mask for the computation of the ICRC is wrong.
I believe the masked ICRC fields are those from the RoCEv1 specification [1]; there we can find the masked fields for the GRH (traffic class, flow label, and hop limit).
The RoCEv2 ICRC specification can be found in [2].

The computation itself cannot run at 100 Gbit/s: the unrolled loop cannot be pipelined efficiently, and implementing the RoCEv2 IP with the ICRC enabled causes a WNS of ~60 ns (10 times the clock period).

[1] InfiniBand Architecture Specification Volume 1 Release 1.4 Pg. 1913 (RoCEv1), Pg. 218 (Infiniband ICRC)
[2] InfiniBand Architecture Specification Volume 1 Release 1.4 Pg. 1935-1936

The board replied with an incorrect packet when establishing the TCP connection

Hello,
I tried to use Limago's subsystem (https://github.com/hpcn-uam/Limago) and replaced the TOE in it with the current version from this repository. But when I try to connect to the board from Linux, the connection is not established successfully.
I am using the board (192.168.1.5) as the server and a Linux machine (192.168.1.1) as the client. When I try to establish a connection, the board replies with SYN instead of SYN-ACK. Does anyone have any idea what the problem might be?

[screenshots: 2024-05-23]

Possible bug in the txEngMemAccessBreakdown() module

Bug situation: in my test, the application established multiple connections through the TCP interface. When a retransmission and an address wrap-around occur together, the retransmitted data is corrupted.
Expected behavior: when retransmission and address wrapping occur at the same time, TCP should issue two read commands to DDR. The start address of the second read command should be {SESSION_ID, {WINDOW_BITS{1'b0}}}.
Actual behavior: according to line 1489 of the txEngMemAccessBreakdown() module in hls/toe/tx_engine/tx_engine.cpp, the start address of the second read command is incorrectly set to 0:
outputMemAccess.write(mmCmd(0, cmd.bbt - lengthFirstAccess));
Possible modification:

ap_uint<32> pkgAddr;
pkgAddr(31, 30) = cmd.saddr(31, 30);                   // preserve the upper address bits
pkgAddr(29, WINDOW_BITS) = cmd.saddr(29, WINDOW_BITS); // preserve the session ID bits
pkgAddr(WINDOW_BITS - 1, 0) = 0;                       // wrap to the start of the session's window
outputMemAccess.write(mmCmd(pkgAddr, cmd.bbt - lengthFirstAccess));

Reason for AXI4-Stream register slices

Hi,

I am currently reading the code of your network stack and want to port it to a Zynq 7100 board. Even though I have had huge trouble building a complete project for one of the example boards, I think I understand how most of this works, and I'm trying to build a Vivado block diagram to get some simple ARP and ICMP going for now, to iterate on with UDP and later TCP.

However, I don't know why there are AXI4-Stream register slices between a number of the cores. I don't see a clock-domain crossing, and as far as I can tell, the slices don't contain any FIFOs, right? Could you please elaborate on why the slices are necessary?

Thanks,

  • Andy

TCP Out Of Order Segment Processing

Hi David,

How are out-of-order segments processed in the current version of the TOE? It seems like out-of-order segments are just dropped. Could you help confirm this?

Thanks.

Zhe

Trying out TCP/IP stack

Hello, I want to try out this project for my research project.

I want to run some TCP/IP stack benchmarks on my VCU118 board, but there seem to be multiple repos under https://github.com/fpgasystems/ that contain TCP/IP stacks (Coyote, Vitis_with_100Gbps_TCP-IP, this repo, etc.), and I am quite confused about which ones are usable and which are not.

Can you recommend the right repository to start with if I want to run perf_tcp or reproduce the numbers from the Vitis 100 Gbps TCP benchmark?

Not able to ping VCU118 board.

Hi,
I have followed the process for generating a bitstream for the VCU118. I got an evaluation license for ethernet_10g_ip from Xilinx. As per my network configuration, I changed the MAC address, IP address, and default gateway in tcp_ip_top.v/network_module.v.
After all the above changes, I am not able to ping the board. I inserted an ILA to see whether there is any activity on the parallel RX/TX interface of the MAC. If I start a UDP iperf client, I can see packets arriving on the parallel TX interface of the MAC, but they do not leave the board. I ran tcpdump on the iperf server but saw no activity. Even a basic ping does not show up on the parallel RX interface. Can anyone suggest a way to debug further?

regards,
Ishtiyaque Shaikh

vcu118: ethernet_10g_ip is reporting faults

We loaded the bitstream for the TOE project on a VCU118. The ethernet_10g_ip is transmitting packets, which we can capture with tcpdump on the network. However, when we try to ping, we do not see any activity on the RX side of the ethernet_10g_ip. I captured stat_rx_local_fault and stat_rx_internal_local_fault of the ethernet_10g_ip using an ILA; these signals are continuously high.

I would appreciate a pointer to debug this issue further.

thanks and regards,
Ishtiyaque Shaikh

Linux TCP stack not reacting to SYN packet sent from the FPGA board

Hi all,

Has anyone tried to establish a connection between the board and the Linux kernel TCP stack?

I'm using the board as the TCP client and Linux as the TCP server. I found that Linux does not react to the first SYN packet from the board. The board and the server are connected directly, with no switch in between.

I suppose I could debug the TCP headers and so on, but I just want to ask whether anyone has had the same experience. Thanks.

Here is the SYN packet from tcpdump. The board is 192.168.1.8. The server is 192.168.1.71.

11:37:33.578934 IP (tos 0x0, ttl 64, id 0, offset 0, flags [none], proto TCP (6), length 48)
    192.168.1.8.32768 > 192.168.1.71.60000: Flags [S], cksum 0x32ff (correct), seq 1445134768, win 65535, options [mss 4096,wscale 2,eol], length 0
        0x0000:  0b0a 0908 0706 0605 0403 0201 0800 4500  ..............E.
        0x0010:  0030 0000 0000 4006 f728 c0a8 0108 c0a8  .0....@..(......
        0x0020:  0147 8000 ea60 5623 01b0 0000 0000 7002  .G...`V#......p.
        0x0030:  ffff 32ff 0000 0204 1000 0303 0200       ..2...........

Compilation fails at the "make installip" stage

The "make installip" step failed, saying there is no valid part for the project.
CentOS 7 + Vivado 2019.2

Here's my procedure:
1. mkdir build (successful)
2. cd build (successful)
3. cmake .. -DDEVICE_NAME=vcu118 (successful)
4. make installip (failed)

And it errors out with this message:
source /program/Xilinx/Vivado/2019.2/scripts/vivado_hls/hls.tcl -notrace
INFO: [HLS 200-10] Running '/program/Xilinx/Vivado/2019.2/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'xlab' on host 'localhost' (Linux_x86_64 version 3.10.0-1062.4.3.el7.x86_64) on Tue Nov 26 03:56:57 CST 2019
INFO: [HLS 200-10] On os "CentOS Linux release 7.7.1908 (Core)"
INFO: [HLS 200-10] In directory '/home/xlab/Documents/toe/build/hls/ethernet_frame_padding'
Sourcing Tcl script 'make.tcl'
INFO: [HLS 200-10] Opening project '/home/xlab/Documents/toe/build/hls/ethernet_frame_padding/ethernet_frame_padding_prj'.
INFO: [HLS 200-10] Opening solution '/home/xlab/Documents/toe/build/hls/ethernet_frame_padding/ethernet_frame_padding_prj/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 3.2ns.
INFO: [HLS 200-10] Setting target device to 'xcvu9p-flga2104-2L-e'
INFO: [HLS 200-10] Adding design file '/home/xlab/Documents/toe/hls/ethernet_frame_padding/ethernet_frame_padding.cpp' to the project
INFO: [IMPL 213-8] Exporting RTL as a Vivado IP.

****** Vivado v2019.2 (64-bit)
**** SW Build 2700185 on Thu Oct 24 18:45:48 MDT 2019
**** IP Build 2699827 on Thu Oct 24 21:16:38 MDT 2019
** Copyright 1986-2019 Xilinx, Inc. All Rights Reserved.

source run_ippack.tcl -notrace
ERROR: [Project 1-848] Could not get a valid part for the project. Make sure you have the required part installed, use the get_parts command to see the list of valid parts.
INFO: [Common 17-206] Exiting Vivado at Tue Nov 26 03:57:09 2019...
ERROR: [IMPL 213-28] Failed to generate IP.

How to generate bitstream (wiki is outdated)

I've been trying to generate a bitstream targeting the VCU118 for benchmarking purposes.

To generate the bitstream, the wiki says to run Vivado using create_vcu118_proj.tcl, which is now located in scripts instead of projects.

Even after manually modifying the script to match the current repository (redirecting rtl to hdl, upgrading IP versions, etc.), the build errors out at the following step:

# update_compile_order -fileset sources_1
# create_ip -name ip_handler -vendor ethz.systems -library hls -version 1.2 -module_name ip_handler_ip -dir $ip_dir/vu9p
ERROR: [Coretcl 2-1134] No IP matching VLNV 'ethz.systems:hls:ip_handler:1.2' was found. Please check your repository configuration.

It would be most helpful if there were an up-to-date method detailing how to generate the bitstream.

Upgrade to 100G TCP/IP

If possible, how about upgrading the project to support a 100G Ethernet MAC?
I can ask my friend to make two PCBs with the XCZU19EG (which has a built-in 100G Ethernet MAC hard IP), and let's make it fun.
[email protected]
:- )

Starting guide out of date

Hi,
The scripts folder has been deleted, and the getting-started guide suggests running the make_tcp_ip script, but this file no longer exists since the June commit. I would like to create the VC709 project, for example, and there are problems in those scripts due to the changes in the repository and script names.

Is it possible to update the getting-started guide, please?

How to install the HLS IP core into the IP repository

wrong # args: should be "file mkdir name ?name ...?"
while executing
"file mkdir "
invoked from within
"if {$command == "synthesis"} {
csynth_design
} elseif {$command == "csim"} {
csim_design -clean -argv {0 /home/sky/work/fpga_net_stack/fpga-netw..."
(file "make.tcl" line 39)
invoked from within
"source make.tcl"
("uplevel" body line 1)
invoked from within
"uplevel #0 [list source $arg] "

INFO: [Common 17-206] Exiting vivado_hls at Fri Mar 29 11:28:31 2024...
make[3]: *** [CMakeFiles/installip.toe.dir/build.make:76: CMakeFiles/installip.toe] Error 1
make[2]: *** [CMakeFiles/Makefile2:110: CMakeFiles/installip.toe.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:90: CMakeFiles/installip.dir/rule] Error 2
make: *** [Makefile:118: installip] Error 2

test ping failed

Hi Dsidler,
I used the provided code to create a project for a test on a VCU118 board. After downloading the bitstream, the Ethernet link came up, but the ping test (ping 10.1.212.209) failed. So I used ChipScope to debug and found that the network_module receives the ARP request message and transfers it to the network_stack module; it seems that the arp_server_subnet_ip module drops the ARP message and does not reply with an ARP response.
Do you have any suggestions? Thank you!

synthesis error : "tcp_ip_top.v" instantiates a mismathed version of "network_stack.v"?

Not sure if some directories or files are updated to different versions causing this error.
But we found in the *project.tcl , it sets the src_dir to "rtl", but there is no "rtl", only "hdl" is available.
After fix the src_dir and some minor error (like no NUM_TCP_CHANNELS defined in network_stack.v)
we started to do synthesis, but if failed with errors like below, we checked the network_stack.v and found the
tcp_ip_top.v instantiates a "network_stack" with different in/out ports.

[Synth 8-2916] unconnected interface port 's_axil' ["/mnt/projects/fpga-network-stack-master/hdl/ultraplus/vcu118/tcp_ip_top.v":369]
[Synth 8-6156] failed synthesizing module 'network_stack' ["/mnt/projects/fpga-network-stack-master/hdl/common/network_stack.sv":36]
