mtcp-stack / mtcp
mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems
License: Other
The MAX_CPUS macro limits the maximum number of CPUs to 16; however, my server has 32 cores.
Can I remove it?
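For context, the change I have in mind is simply bumping the macro and rebuilding (the header location is my assumption):

```c
/* mtcp/src/include/mtcp.h (exact path is an assumption) -- MAX_CPUS sizes
 * several per-core static arrays, so the whole stack has to be recompiled
 * after changing it. */
#define MAX_CPUS 32
```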
Hi,
I noticed that when using mTCP as a TCP client, Wireshark shows the final ACK of the three-way handshake always advertising a receive window of 114 bytes, and each ACK of the server's data advertising a 64-byte receive window. This leads to the TCP server sending small packets (e.g., 64 bytes of data) to the mTCP client. I looked through the mTCP code and found no hard-coded 64-byte receive window; could you point me to where I can change mTCP to advertise a larger receive window?
I increased the rcvbuf value in the mTCP configuration file, which had some effect on the receive window size, but it is still not big enough: setting rcvbuf to 32768 only increased the advertised window to 256 bytes. I would like to increase the receive window to at least one full TCP segment (e.g., 1460 bytes) for testing.
Thanks
I checked timer.c and noticed a few things:
ret = HandleRTO(mtcp, cur_ts, walk);
TAILQ_REMOVE(rto_list, walk, sndvar->timer_link);
mtcp->rto_list_cnt--;
walk->on_rto_idx = -1;
if (cur_stream->on_rto_idx < 0 ) {
This condition will be false because the stream is already on an RTO list, so the line
mtcp->rto_list_cnt++;
will never be reached.
Given all the lines of code above, I think the stream that still needs to be checked for timeout and retransmission ends up removed from the timer list.
Moreover, because rto_list_cnt is decremented both here and when the data is ACKed, we might reach a state where rto_list_cnt == 0, causing mtcp->rto_store->rto_now_idx to be reset to 0,
which would skip all streams at higher offsets when traversing the list.
Hi,
I was browsing the source code, and I don't see any SYN Flood protection code. Maybe I missed it.
What are some security measures implemented in mtcp?
Thanks,
Lawrence
Hi guys, I'm facing problems receiving out-of-order packets when rcv_wnd is small.
Here is the case:
Assume the next expected sequence number is rcv_nxt = X and the current window is rcv_wnd = 3500.
Four packets arrive with sequence numbers X, X + 500, X + 1500, and X + 2500, and respective payload lengths 500, 1000, 1000, and 1000.
Let's begin: the first to arrive is packet (X + 500), len = 1000, so rcv_wnd = 2500.
Next is packet (X + 1500), len = 1000 => rcv_wnd = 1500.
Now packet (X + 2500), len = 1000 => the problem happens!
In tcp_in.c, function ValidateSequence, there is a condition:
if (!TCP_SEQ_BETWEEN(seq + payloadlen, cur_stream->rcv_nxt,
cur_stream->rcv_nxt + cur_stream->rcvvar->rcv_wnd - 1))
According to it, (X + 2500 + 1000) is outside the range (X, X + 1500 - 1) => FALSE, and an ACK is resent.
The problem happens because rcv_wnd no longer represents the buffer range the payload has to fit into, which makes the SEQ validation wrong.
I'm thinking of adding a variable to struct tcp_stream to deal with this scenario, but I still hope you can fix it soon.
Best Regards,
Quy
[root@FreeBSD-9 /github/mtcp/mtcp/src]# make
"Makefile", line 33: Missing dependency operator
"Makefile", line 35: Need an operator
"Makefile", line 37: Need an operator
"Makefile", line 39: Missing dependency operator
"Makefile", line 41: Need an operator
"Makefile", line 43: Need an operator
"Makefile", line 53: Missing dependency operator
"Makefile", line 56: Need an operator
"Makefile", line 58: Need an operator
"Makefile", line 60: Missing dependency operator
"Makefile", line 61: Need an operator
"Makefile", line 63: Need an operator
"Makefile", line 87: Missing dependency operator
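These "Missing dependency operator" / "Need an operator" messages are what BSD make prints when it hits GNU-make conditionals (ifeq/ifdef/else/endif), so the Makefile most likely needs GNU make. A sketch of the workaround, assuming gmake is installed (e.g. via `pkg install gmake`):

```shell
# BSD make (the default `make` on FreeBSD) cannot parse GNU-make
# conditionals; build with GNU make instead, usually installed as gmake.
MAKE_CMD=$(command -v gmake || command -v make)
echo "building with: $MAKE_CMD"
# "$MAKE_CMD"    # run the actual build with the resolved make
```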
While debugging, I discovered a bug in mTCP where the receiving side of a stream may "stall" if the rcvbuf is configured to be larger than 64 KB.
In tcp_out.c, there's a line that updates the receive window:
tcph->window = htons(MIN((uint16_t)window32, TCP_MAX_WINDOW));
It casts window32 to a uint16_t before comparing it to the maximum value, so if window32 is e.g. 128 KB, it is truncated to a uint16_t value of 0. The cast needs to be moved outside the MIN comparison:
tcph->window = htons((uint16_t)MIN(window32, TCP_MAX_WINDOW));
Cheers,
Eric
I am using an Intel 82571EB Gigabit Ethernet NIC on a Dell PowerEdge R220. Below is the full output when running the example app epwget; it appears the mTCP version of DPDK fails to configure the device. I have no issue running the upstream DPDK version alone with this NIC. Am I doing something wrong with the mTCP DPDK configuration?
here is epwget.conf:
io = dpdk
num_mem_ch = 4
port = dpdk0
rcvbuf = 8192
sndbuf = 8192
max_concurrency = 10000
max_num_buffers = 10000
tcp_timeout = 30
tcp_timewait = 0
stat_print = dpdk0
Application configuration:
URL: /
Loading mtcp configuration from : epwget.conf
Loading interface setting
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 2 lcore(s)
EAL: Auto-detected process type: PRIMARY
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up memory...
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90d0600000 (size = 0x200000)
EAL: Ask a virtual area of 0x2e00000 bytes
EAL: Virtual area found at 0x7f90cd600000 (size = 0x2e00000)
EAL: Ask a virtual area of 0x5200000 bytes
EAL: Virtual area found at 0x7f90c8200000 (size = 0x5200000)
EAL: Ask a virtual area of 0x5800000 bytes
EAL: Virtual area found at 0x7f90c2800000 (size = 0x5800000)
EAL: Ask a virtual area of 0x3e00000 bytes
EAL: Virtual area found at 0x7f90be800000 (size = 0x3e00000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90be400000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90be000000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90bdc00000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90bd800000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90bd400000 (size = 0x200000)
EAL: Ask a virtual area of 0x4000000 bytes
EAL: Virtual area found at 0x7f90b9200000 (size = 0x4000000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90b8e00000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90b8a00000 (size = 0x200000)
EAL: Ask a virtual area of 0x600000 bytes
EAL: Virtual area found at 0x7f90b8200000 (size = 0x600000)
EAL: Ask a virtual area of 0x2a00000 bytes
EAL: Virtual area found at 0x7f90b5600000 (size = 0x2a00000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7f90b5000000 (size = 0x400000)
EAL: Ask a virtual area of 0x5800000 bytes
EAL: Virtual area found at 0x7f90af600000 (size = 0x5800000)
EAL: Ask a virtual area of 0x1800000 bytes
EAL: Virtual area found at 0x7f90adc00000 (size = 0x1800000)
EAL: Ask a virtual area of 0x600000 bytes
EAL: Virtual area found at 0x7f90ad400000 (size = 0x600000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90ad000000 (size = 0x200000)
EAL: Ask a virtual area of 0x400000 bytes
EAL: Virtual area found at 0x7f90aca00000 (size = 0x400000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90ac600000 (size = 0x200000)
EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x7f90ac200000 (size = 0x200000)
EAL: Ask a virtual area of 0x2c00000 bytes
EAL: Virtual area found at 0x7f90a9400000 (size = 0x2c00000)
EAL: Requesting 291 pages of size 2MB from socket 0
EAL: TSC frequency is ~2693781 KHz
EAL: Master lcore 0 is ready (tid=d19a6900;cpuset=[0])
EAL: lcore 1 is ready (tid=a8bfe700;cpuset=[1])
EAL: PCI device 0000:01:00.0 on NUMA socket -1
EAL: probe driver: 8086:105e rte_em_pmd
EAL: PCI memory mapped at 0x7f90d0800000
EAL: PCI memory mapped at 0x7f90d0820000
PMD: eth_em_dev_init(): port_id 0 vendorID=0x8086 deviceID=0x105e
EAL: PCI device 0000:01:00.1 on NUMA socket -1
EAL: probe driver: 8086:105e rte_em_pmd
EAL: Not managed by a supported kernel driver, skipped
Total number of attached devices: 1
Interface name: dpdk0
Configurations:
Number of CPU cores available: 2
Number of CPU cores to use: 2
Maximum number of concurrency per core: 10000
Maximum number of preallocated buffers per core: 10000
Receive buffer size: 8192
Send buffer size: 8192
TCP timeout seconds: 30
TCP timewait seconds: 0
Interfaces:
name: dpdk0, ifindex: 0, hwaddr: 00:26:55:E2:9D:C0, ipaddr: 10.9.3.9, netmask: 255.255.255.0
Loading routing configurations from : config/route.conf
Routes:
Destination: 10.9.3.0/24, Mask: 255.255.255.0, Masked: 10.9.3.0, Route: ifdx-0
Loading ARP table from : config/arp.conf
ARP Table:
Initializing port 0... EAL: Error - exiting with code: 1
Cause: Cannot configure device: err=-22, port=0
Hi,
I am getting "No route to epserver" in the epwget client. I am using the route.conf and arp.conf below. Please let me know if I have done anything wrong.
ROUTES 1
10.10.10.222/32 port0
ARP_ENTRY 1
10.10.10.222/32 00:0c:29:74:12:9a
[mtcp_create_context:1352] CPU 0 is in charge of printing stats.
[GetOutputInterface: 28] [WARNING] No route to 10.10.10.222
CPU 1: initialization finished.
[GetOutputInterface: 28] [WARNING] No route to 10.10.10.222
Thread 1 handles 5000 flows. connecting to 10.10.10.222:80
[GetOutputInterface: 28] [WARNING] No route to 10.10.10.222
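One thing I would double-check (an assumption on my part): the route entry references port0, while mTCP interfaces are usually named dpdk0 when using the DPDK I/O module. If the interface is in fact dpdk0, route.conf would need:

```
ROUTES 1
10.10.10.222/32 dpdk0
```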
Thanks,
Mohan
Hi,
It seems autogen.sh is required for building mtcp.
Is that intentional?
Best regards,
Wiriyang
Hi Team,
I understand that mTCP mainly supports AF_INET sockets and accelerates networking between systems.
But why can't I create AF_LOCAL sockets with the mTCP stack? Is there a specific reason AF_LOCAL socket creation is blocked by mTCP?
Also, do the mtcp_epoll_wait() and mtcp_epoll_ctl() APIs have any restrictions for other socket domains like AF_LOCAL?
The reason for my question is,
Any ideas or suggestions on this would be great!
I am attempting to port the OpenMPI TCP btl module to use mTCP instead. I have hit and solved various other issues with this port, but this problem seems to be a deficiency in mTCP's implementation of the sockets API.
Having attempted to connect to another machine, OMPI uses getsockopt to query the socket for its connection status asynchronously:
if(mtcp_getsockopt(btl_endpoint->mctx, btl_endpoint->endpoint_sd, SOL_SOCKET, SO_ERROR, (char *)&so_error, &so_length) < 0) {
BTL_ERROR(("mtcp_getsockopt() to %s failed: %s (%d)",
opal_net_get_hostname((struct sockaddr*) &endpoint_addr),
strerror(opal_socket_errno), opal_socket_errno));
mca_btl_tcp_endpoint_close(btl_endpoint);
return;
}
if(so_error == EINPROGRESS || so_error == EWOULDBLOCK) {
return;
}
if(so_error != 0) {
BTL_ERROR(("mtcp_connect() to %s failed: %s (%d)",
opal_net_get_hostname((struct sockaddr*) &endpoint_addr),
strerror(so_error), so_error));
mca_btl_tcp_endpoint_close(btl_endpoint);
return;
}
It fails, however, with so_error = 38, with strerror() giving the explanation 'Function not implemented' (ENOSYS).
Believing this might be a simple fix, I attempted to fix it myself (I had verified with Wireshark that, under certain conditions, the connection was being established):
inline int
GetSocketError(socket_map_t socket, void *optval, socklen_t *optlen)
{
tcp_stream *cur_stream;
if (!socket->stream) {
errno = EBADF;
return -1;
}
cur_stream = socket->stream;
if (cur_stream->state == TCP_ST_CLOSED) {
if (cur_stream->close_reason == TCP_TIMEDOUT ||
cur_stream->close_reason == TCP_CONN_FAIL ||
cur_stream->close_reason == TCP_CONN_LOST) {
*(int *)optval = ETIMEDOUT;
*optlen = sizeof(int);
return 0;
}
}
if (cur_stream->state == TCP_ST_CLOSE_WAIT ||
cur_stream->state == TCP_ST_CLOSED) {
if (cur_stream->close_reason == TCP_RESET) {
*(int *)optval = ECONNRESET;
*optlen = sizeof(int);
return 0;
}
}
// if(cur_stream->state == TCP_ST_SYN_SENT) {
// *(int *)optval = EINPROGRESS;
// *optlen = sizeof(int);
//
// return 0;
// }
// if(cur_stream->state == TCP_ST_ESTABLISHED)
// return 0;
//
errno = ENOSYS;
return -1;
}
As the code did not work, I commented it out.
I am not sure whether this is a trivial fix or not, but any help would be appreciated.
Would you like to add more error handling for return values from functions like the following?
Dear all,
Is there a way to inspect a detailed log? Since the output of the example only shows the number of errors, I do not know what went wrong.
Thanks,
Tao
Hi
The ported ApacheBench does not support SSL load testing. I am wondering what client-side SSL load tool the mTCP project used to evaluate the SSLShader performance mentioned in the paper. I would like to know whether an existing one is available, or to get some ideas on how to port one to mTCP.
Thanks
Often a TCP connection closes (for a variety of reasons) but only one side of the connection realizes it. Over time, this could leave mTCP with very many unclosed TCP connections. How does mTCP deal with these 'zombie' connections? Do they all have some kind of inactivity timeout? Or are they somehow reaped when the maximum number of connections is reached?
Hi,
I am running Ubuntu 14.04 as a virtual guest on VMware ESXi; the guest uses the vmxnet3 adapter. I applied the following diff to mtcp/src/dpdk_module.c to make mTCP compile and run, but when I run tcpdump on the web-server side, I see no packets arriving at the server. I don't have access to the VMware ESXi hypervisor, so I am not sure whether the packets even leave the hypervisor.
```diff
diff --git a/mtcp/src/dpdk_module.c b/mtcp/src/dpdk_module.c
index 33d349e..666dfd3 100644
--- a/mtcp/src/dpdk_module.c
+++ b/mtcp/src/dpdk_module.c
@@ -57,8 +57,8 @@
 /*
  * Configurable number of RX/TX ring descriptors
  */
-#define RTE_TEST_RX_DESC_DEFAULT 128
-#define RTE_TEST_TX_DESC_DEFAULT 128
+#define RTE_TEST_RX_DESC_DEFAULT 128
+#define RTE_TEST_TX_DESC_DEFAULT 512
 static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
 static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;
@@ -124,7 +124,7 @@ static const struct rte_eth_txconf tx_conf = {
  * As the example won't handle mult-segments and offload cases,
  * set the flag by default.
  */
-	.txq_flags = 0x0,
+	.txq_flags = ETH_TXQ_FLAGS_NOOFFLOADS|ETH_TXQ_FLAGS_NOMULTSEGS,
 struct mbuf_table {
```
I noticed the dpdk0 interface has an empty MAC address, as shown below:
4: dpdk0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff <==========EMPTY MAC ADDRESS
inet 10.1.72.28/24 brd 10.1.72.255 scope global dpdk0
valid_lft forever preferred_lft forever
inet6 fe80::200:ff:fe00:0/64 scope link
valid_lft forever preferred_lft forever
I reviewed dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h and noticed that only the IXGBE and IGB adapters are supported for mTCP to retrieve and attach MAC addresses for dpdk0 on the Linux side.
Do you think the empty MAC address on dpdk0 is the reason I see no packets on the server side?
If so, can the vmxnet3 adapter be added to igb_uio.h, like the IGB adapter, to resolve the issue?
Below is the output I think is relevant:
EAL: probe driver: 15ad:7b0 rte_vmxnet3_pmd
EAL: PCI memory mapped at 0x7ff74ea00000
EAL: PCI memory mapped at 0x7ff74ea01000
EAL: PCI memory mapped at 0x7ff74ea02000
PMD: eth_vmxnet3_dev_init(): >>
PMD: eth_vmxnet3_dev_init(): Hardware version : 1
PMD: eth_vmxnet3_dev_init(): UPT hardware version : 1
PMD: eth_vmxnet3_dev_init(): MAC Address : 00:50:56:86:10:76
Total number of attached devices: 1
Interface name: dpdk0
Configurations:
Number of CPU cores available: 4
Number of CPU cores to use: 4
Maximum number of concurrency per core: 10000
Maximum number of preallocated buffers per core: 10000
Receive buffer size: 8192
Send buffer size: 8192
TCP timeout seconds: 30
TCP timewait seconds: 0
Interfaces:
name: dpdk0, ifindex: 0, hwaddr: 00:00:00:00:00:00, ipaddr: 10.1.72.28, netmask: 255.255.255.0
Loading routing configurations from : /etc/mtcp/config/route.conf
Routes:
Destination: 10.1.72.0/24, Mask: 255.255.255.0, Masked: 10.1.72.0, Route: ifdx-0
Loading ARP table from : /etc/mtcp/config/arp.conf
ARP Table:
Initializing port 0... PMD: vmxnet3_dev_configure(): >>
PMD: vmxnet3_dev_rx_queue_setup(): >>
PMD: vmxnet3_dev_rx_queue_setup(): >>
PMD: vmxnet3_dev_rx_queue_setup(): >>
PMD: vmxnet3_dev_rx_queue_setup(): >>
PMD: vmxnet3_dev_tx_queue_setup(): >>
PMD: vmxnet3_dev_tx_queue_setup(): >>
PMD: vmxnet3_dev_tx_queue_setup(): >>
PMD: vmxnet3_dev_tx_queue_setup(): >>
PMD: vmxnet3_dev_start(): >>
PMD: vmxnet3_rss_configure(): >>
PMD: vmxnet3_setup_driver_shared(): Writing MAC Address : 00:50:56:86:10:76
PMD: vmxnet3_disable_intr(): >>
PMD: vmxnet3_dev_rxtx_init(): >>
rte_eth_dev_config_restore: port 0: MAC address array not supported <=====here
done:
Checking link statusdone
Port 0 Link Up - speed 10000 Mbps - full-duplex
Configuration updated by mtcp_setconf().
CPU 0: initialization finished.
[mtcp_create_context:1173] CPU 0 is now the master thread.
[CPU 0] dpdk0 flows: 0, RX: 10(pps) (err: 0), 0.00(Gbps), TX: 0(pps), 0.00(Gbps)
[ ALL ] dpdk0 flows: 0, RX: 10(pps) (err: 0), 0.00(Gbps), TX: 0(pps), 0.00(Gbps)
Thread 0 handles 1 flows. connecting to 10.1.72.17:80
[CPU 0] dpdk0 flows: 1, RX: 25(pps) (err: 0), 0.00(Gbps), TX: 2(pps), 0.00(Gbps)
[ ALL ] dpdk0 flows: 1, RX: 25(pps) (err: 0), 0.00(Gbps), TX: 2(pps), 0.00(Gbps)
Has this been tested for production usage, with high-volume traffic?
If not, is there anything I can contribute to make it production ready?
Can mTCP be used to achieve C10M [1]? Can it be used to open 10 million (or more) keepalive HTTP connections that send very little traffic, for example, for a chat server or notification server? If so, how do I calculate the RAM overhead of each connection and/or keep it to a minimum?
Hi,
I looked at mtcp/src/dpdk_module.c; port_conf, rx_conf, and tx_conf are tuned for physical Intel NICs, but the functions dpdk_send_pkts and dpdk_recv_pkts call the generic DPDK APIs rte_eth_tx_burst and rte_eth_rx_burst, respectively.
I assume rte_eth_tx_burst eventually calls virtio_xmit_pkts, and rte_eth_rx_burst calls virtio_recv_pkts, when librte_pmd_virtio is loaded in a KVM guest. I could very well be missing something, but I am guessing mTCP should work in a KVM guest given some port_conf/rx_conf/tx_conf tuning for virtio in dpdk_module.c.
Could you give me some guidance on what code needs to be added or changed to support running mTCP in a KVM guest with the virtio PMD?
here is the DPDK virtio PMD usage link http://dpdk.org/doc/guides/nics/virtio.html
Vincent
Hi mtcp developer,
I'm going to implement a server with a listener thread responsible for accepting new incoming clients. All the child sockets would be handed off to other threads so that the listener thread can keep accepting.
(I know another way to implement this is epoll, which I've tried, but I want to experiment with this design for a reason.)
However, I found that the BLOCKING_SUPPORT flag is FALSE in mtcp.h.
Does this flag control the blocking behavior of accept, read, and write? That is, would a thread calling them block when there is no corresponding event in the queues (without calling mtcp_setsock_nonblock)?
Is it OK if I change it from FALSE to TRUE?
Sincerely,
Alex
Hi,
I am doing a research project on Nginx + mTCP + DPDK, using the latest mtcp/dpdk code.
In order to run Nginx in multi-process mode, I fixed some bugs in DPDK and modified some code in mTCP and Nginx.
At present fork() is supported, so Nginx+mTCP+DPDK works normally. But I've encountered a difficult problem: the performance of Nginx does not improve in the multi-core case.
So I wrote a test program using mTCP/DPDK that runs on the server side. It uses fork() to create child processes; each child process runs on a separate core and receives and sends data on a separate RSS tx/rx queue. It just counts the number of connection requests completed per second. I wrote another test program for the client side; it simply opens TCP connections and then closes them.
I found an interesting phenomenon. For example, with 2 child processes, each could accept about 132,000 connections per second; with 4 child processes, about 66,000 each; with 8 child processes, about 33,000 each.
So it seems that no matter how many child processes are created, the total number of connections per second is constant. It feels like there is an invisible bottleneck limiting multicore scaling.
I did all the tests on a server where:
the CPU is an Intel Xeon E5 (2 physical CPUs, 24 cores),
with 200+ GB RAM (4 memory channels),
the network interface card is an Intel dual-port 82599 10 GbE NIC,
and the OS is RHEL 7.1.
By the way, I didn't use the FDIR feature of RSS. Does that have any effect on multi-core performance?
Can someone give me some advice?
Hello mtcp developers,
I would like to run mTCP on multiple NICs, using different cores for each NIC. Is this possible?
For example:
NIC 82:00.0 using cores 0,1,2,3,4
NIC 82:00.1 using cores 5,6,7,8
NIC 85:00.1 using cores 9,10,11,12
Thanks
I saw that DPDK has a load-balance example, but it does not seem to be everything I need.
Is there a way to do TCP load balancing with multiple processes?
Thanks.
The source does not compile on kernel 4.3.0-5-generic. After fixing the compilation problem, the example does not link with gcc version 5.3.1 20160114.
Hi,
Thanks for putting together this great module. Userspace TCP is a hot topic in the market, and I have been looking for an open-source one for some time. I have a few questions -
Thanks..Santos
Is there any way to get mTCP working with Amazon EC2 virtual NICs?
In timer.c, function HandleRTO, there is a condition
/* update rto timestamp */
if (cur_stream->state >= TCP_ST_ESTABLISHED) {
So the RTO timestamp is updated only for streams that are ESTABLISHED or in a later state.
Otherwise, a stream in e.g. SYN_SENT is not updated and continues using the old RTO (1000 ms), which does not follow the RFC.
Hi, mtcp team,
I can now initiate over 1.6 million TCP connections with epwget connecting to epserver. Each connection sends a 200-byte request from epwget every 60 s, and epserver sends back a response packet (e.g., 200 bytes). In Wireshark, I observe that the in-host processing delay, i.e., the delay between a request arriving at the epserver host and the response leaving it, can exceed 500 ms. The same is observed on the epwget host, so the end-to-end response time for a request can reach seconds.
Is there any tuning I can apply to further reduce this in-host processing delay for this particular scenario?
Thanks
Ke
The function "gettimeofday" is not on the list of async-signal-safe functions.
I suspect a different program design will be needed for your function "HandleSignal".
I wanted to ask the following questions:
Background: we have a FastCGI web server that runs on Apache/nginx. I have seen an Apache example in the apps folder. Is it a modified version that supports mTCP?
If I use this Apache, do I just need to deploy my FastCGI module (and that's it)?
We met an unsolvable problem.
At first we had to run mTCP on a Red Hat 6.2 machine whose kernel version was 2.6.32-220.el6.x86_64, so we compiled a new kernel, version 3.2.78. That is when we hit the problem.
When we tried to use the former mTCP version with dpdk-2.1.0, we got these strange errors while compiling dpdk-2.1.0:
CC [M] /home/gj/mtcp/dpdk-2.1.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.o
In file included from /home/gj/mtcp/dpdk-2.1.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:34:
/home/gj/mtcp/dpdk-2.1.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:378: error: unknown field ‘ndo_fdb_add’ specified in initializer
In file included from /home/gj/mtcp/dpdk-2.1.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:39:
/home/gj/mtcp/dpdk-2.1.0/lib/librte_eal/linuxapp/igb_uio/compat.h:53: error: redefinition of ‘pci_intx_mask_supported’
/home/gj/mtcp/dpdk-2.1.0/lib/librte_eal/linuxapp/igb_uio/compat.h:53: note: previous definition of ‘pci_intx_mask_supported’ was here
/home/gj/mtcp/dpdk-2.1.0/lib/librte_eal/linuxapp/igb_uio/compat.h:76: error: redefinition of ‘pci_check_and_mask_intx’
/home/gj/mtcp/dpdk-2.1.0/lib/librte_eal/linuxapp/igb_uio/compat.h:76: note: previous definition of ‘pci_check_and_mask_intx’ was here
make[10]: *** [/home/gj/mtcp/dpdk-2.1.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.o] Error 1
make[9]: *** [module/home/gj/mtcp/dpdk-2.1.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio] Error 2
make[8]: *** [sub-make] Error 2
make[7]: *** [igb_uio.ko] Error 2
make[6]: *** [igb_uio] Error 2
make[5]: *** [linuxapp] Error 2
make[4]: *** [librte_eal] Error 2
make[3]: *** [lib] Error 2
make[2]: *** [all] Error 2
make[1]: *** [x86_64-native-linuxapp-gcc_install] Error 2
make: *** [install] Error 2
When we tried the new version with dpdk-2.2.0, those errors were gone but new ones appeared:
CC [M] /home/mtcp/dpdk-2.2.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.o
In file included from /home/mtcp/dpdk-2.2.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:35:
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:218: error: expected declaration specifiers or ‘...’ before ‘netdev_features_t’
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h: In function ‘netdev_set_features’:
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:221: error: ‘features’ undeclared (first use in this function)
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:221: error: (Each undeclared identifier is reported only once
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:221: error: for each function it appears in.)
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h: At top level:
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:229: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘attribute’ before ‘netdev_fix_features’
cc1: warnings being treated as errors
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:283: error: initialization from incompatible pointer type
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:284: error: ‘netdev_fix_features’ undeclared here (not in a function)
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/igb_uio.h:285: error: unknown field ‘ndo_fdb_add’ specified in initializer
In file included from /home/mtcp/dpdk-2.2.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.c:42:
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/compat.h:66: error: redefinition of ‘pci_intx_mask_supported’
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/compat.h:66: note: previous definition of ‘pci_intx_mask_supported’ was here
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/compat.h:89: error: redefinition of ‘pci_check_and_mask_intx’
/home/mtcp/dpdk-2.2.0/lib/librte_eal/linuxapp/igb_uio/compat.h:89: note: previous definition of ‘pci_check_and_mask_intx’ was here
make[10]: *** [/home/mtcp/dpdk-2.2.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio/igb_uio.o] Error 1
make[9]: *** [module/home/mtcp/dpdk-2.2.0/x86_64-native-linuxapp-gcc/build/lib/librte_eal/linuxapp/igb_uio] Error 2
make[8]: *** [sub-make] Error 2
make[7]: *** [igb_uio.ko] Error 2
make[6]: *** [igb_uio] Error 2
make[5]: *** [linuxapp] Error 2
make[4]: *** [librte_eal] Error 2
make[3]: *** [lib] Error 2
make[2]: *** [all] Error 2
make[1]: *** [pre_install] Error 2
make: *** [install] Error 2
In /home/gj/mtcp/dpdk-2.1.0/lib/librte_eal/linuxapp/igb_uio/compat.h, I added
and that solved the redefinition problem, but the other errors remain.
However, when we tried the original DPDK version, dpdk-2.1.0-rc4, it compiled without any error. What's wrong? I suspect the errors come from the DPDK bundled with mTCP. Can anyone help? Thank you!
Hi,
I noticed that socket fds created inside an mTCP context are bound to that specific mTCP context and cannot be used by another mTCP context. Is my understanding right?
I am facing the following problem
Ideally there are 2 threads: one thread accepts connections using mtcp_epoll_wait, and the second thread handles the events on the accepted connections, based on the events triggered in mtcp_epoll_wait.
(I believe this is a common design for handling connections asynchronously.)
Since both threads use mtcp_epoll_wait() simultaneously, I can't use the same mTCP context for both threads (the mTCP manager has only one epoll event pointer). So what would you suggest?
Would creating a new mTCP context manager help in this scenario? If so, the accepted fds are created in context manager [0]. Can I use the socket fds created by context 0 in the new context 1?
How is this handled in applications ported to mTCP?
Hi,
I am testing mtcp with dpdk on ubuntu 14.04 with 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
I believe the ApacheBench configure/Makefile only supports the ps packet I/O library and has not been updated to build with the DPDK packet I/O library, as seen from the error below:
/bin/bash /usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr/libtool --silent --mode=link gcc -g -O2 -pthread -DHAVE_CONFIG_H -DLINUX -D_REENTRANT -D_GNU_SOURCE -I./include -I/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr/include/arch/unix -I./include/arch/unix -I/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr/include/arch/unix -I/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr/include -I/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr/../../../../mtcp/include -I/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr/../../../../io_engine -version-info 4:6:4 -L/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr/../../../../mtcp/lib -L/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr/../../../../io_engine/lib -o libapr-1.la -rpath /usr/local/apache2/lib passwd/apr_getpass.lo strings/apr_cpystrn.lo strings/apr_fnmatch.lo strings/apr_snprintf.lo strings/apr_strings.lo strings/apr_strnatcmp.lo strings/apr_strtok.lo tables/apr_hash.lo tables/apr_tables.lo atomic/unix/builtins.lo atomic/unix/ia32.lo atomic/unix/mutex.lo atomic/unix/ppc.lo atomic/unix/s390.lo atomic/unix/solaris.lo dso/unix/dso.lo file_io/unix/buffer.lo file_io/unix/copy.lo file_io/unix/dir.lo file_io/unix/fileacc.lo file_io/unix/filedup.lo file_io/unix/filepath.lo file_io/unix/filepath_util.lo file_io/unix/filestat.lo file_io/unix/flock.lo file_io/unix/fullrw.lo file_io/unix/mktemp.lo file_io/unix/open.lo file_io/unix/pipe.lo file_io/unix/readwrite.lo file_io/unix/seek.lo file_io/unix/tempdir.lo locks/unix/global_mutex.lo locks/unix/proc_mutex.lo locks/unix/thread_cond.lo locks/unix/thread_mutex.lo locks/unix/thread_rwlock.lo memory/unix/apr_pools.lo misc/unix/charset.lo misc/unix/env.lo misc/unix/errorcodes.lo misc/unix/getopt.lo misc/unix/otherchild.lo misc/unix/rand.lo misc/unix/start.lo misc/unix/version.lo mmap/unix/common.lo mmap/unix/mmap.lo network_io/unix/inet_ntop.lo network_io/unix/inet_pton.lo network_io/unix/multicast.lo 
network_io/unix/sendrecv.lo network_io/unix/sockaddr.lo network_io/unix/socket_util.lo network_io/unix/sockets.lo network_io/unix/sockopt.lo poll/unix/epoll.lo poll/unix/kqueue.lo poll/unix/poll.lo poll/unix/pollcb.lo poll/unix/pollset.lo poll/unix/port.lo poll/unix/select.lo random/unix/apr_random.lo random/unix/sha2.lo random/unix/sha2_glue.lo shmem/unix/shm.lo support/unix/waitio.lo threadproc/unix/proc.lo threadproc/unix/procsup.lo threadproc/unix/signals.lo threadproc/unix/thread.lo threadproc/unix/threadpriv.lo time/unix/time.lo time/unix/timestr.lo user/unix/groupinfo.lo user/unix/userinfo.lo -lrt -lcrypt -lpthread -ldl -lps -lmtcp -lnuma
/usr/bin/ld: cannot find -lps
collect2: error: ld returned 1 exit status
make[3]: *** [libapr-1.la] Error 1
make[3]: Leaving directory `/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib/apr'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/mtcp/apps/apache_benchmark_deprecated/srclib'
make: *** [all-recursive] Error 1
I would like to add DPDK build support for apache bench; which file in apache bench should I modify to add it?
There are these files under apps/apache_benchmark_deprecated:
configure.in
Makefile.in
configure
I thought I should change configure.in; if so, should I run buildconf to re-create the configure script?
I looked at mtcp/configure and mtcp/configure.ac for reference, but I am still not sure exactly what DPDK configuration I should add for apache bench; any pointer would be helpful.
Hi,
I need epwget to create TCP connections to multiple destination IP addresses, to simulate requests to a wildcard network listener like 0.0.0.0:80.
Reading the code, the "host" is parsed from the command-line argument argv[1] and passed on to "daddr" in main().
Then in RunWgetMain, inside the while (!done[core]) loop, "daddr" is passed on to CreateConnection.
Can I increment the destination IP address 'daddr' in the loop below, before CreateConnection(ctx), to achieve that?
    while (ctx->pending < concurrency && ctx->started < ctx->target) {
        if (CreateConnection(ctx) < 0) {
            done[core] = TRUE;
            break;
        }
    }
Is there anything I might be missing? I noticed that mtcp_init_rss already takes "daddr" as input to CreateAddressPoolPerCore, before the while (!done[core]) loop.
I also read that DPDK has "primary" and "secondary" processes, so a user can start two identical processes, say two epwget processes, but that does not seem to be supported in mTCP.
I would appreciate any input, thanks!
../../mtcp/lib/libmtcp.a(cpu.o): In function `mtcp_core_affinitize':
cpu.c:(.text+0xa1): undefined reference to `numa_max_node'
cpu.c:(.text+0xe6): undefined reference to `numa_bitmask_alloc'
cpu.c:(.text+0x13c): undefined reference to `numa_bitmask_setbit'
cpu.c:(.text+0x146): undefined reference to `numa_set_membind'
cpu.c:(.text+0x150): undefined reference to `numa_bitmask_free'
I would like to point out that identifiers like "__MTCP_API_H_" and "__TCP_STREAM_H_" do not conform to the naming rules of the C language standard: identifiers beginning with two underscores, or one underscore followed by an uppercase letter, are reserved for the implementation.
Would you like to adjust your selection of unique names?
Hi, mTCP team,
Can we set other TCP options like TCP_NODELAY in mTCP?
Best,
Tao
Hi, mtcp team,
sorry to bother you guys
I tried to connect to epserver with over 1M concurrent TCP connections using epwget. First, I changed rcv_buf and snd_buf to 1024 (with the default 8192 the application is killed), and then found warnings like "[WARINING] Available # addresses (8063) is smaller than the max concurrency (375000)." followed by "mtcp_connect: Resource temporarily unavailable".
Does this mean the problem is solved if we enlarge the address pool? The README says "epwget can use a range of IP addresses for larger concurrent connections that cannot be in an IP. you can set it in epwget.c:33".
Thanks
Hi mTCP team,
I am trying to integrate KNI (Kernel NIC interface) support into mTCP code.
Wanted to know whether the KNI interface is already supported in a newer version?
Why was KNI not considered before, even for UDP packets? Is it a good idea to add KNI support?
As I am trying to add KNI support, kindly let me know where code changes are needed inside the epserver application.
Thanks,
Arun
Would you like to add the configuration script "AX_PTHREAD" to your build specification?
Hi,
I am trying to compile mTCP with the most recent upstream DPDK git release, v2.2.0-rc4. I patched lib/librte_eal/linuxapp/igb_uio/igb_uio.c with the diff from mTCP's DPDK and also copied igb_uio/igb_uio.h. v2.2.0-rc4 DPDK compiles fine with the mTCP DPDK changes, and I can see the dpdk0 interface created after binding the interface, but compiling the mTCP code against v2.2.0-rc4 fails; a typical error is below:
In file included from /home/admin/mtcp/dpdk/include/rte_ether.h:50:0,
from /home/admin/mtcp/dpdk/include/rte_ethdev.h:185,
from io_module.c:17:
/home/admin/mtcp/dpdk/include/rte_memcpy.h: In function ‘rte_memcpy’:
/home/admin/mtcp/dpdk/include/rte_memcpy.h:870:2: warning: implicit declaration of function ‘_mm_alignr_epi8’ [-Wimplicit-function-declaration]
MOVEUNALIGNED_LEFT47(dst, src, n, srcofs);
^
/home/admin/mtcp/dpdk/include/rte_memcpy.h:870:2: error: incompatible type for argument 2 of ‘_mm_storeu_si128’
In file included from /usr/lib/gcc/x86_64-linux-gnu/4.8/include/xmmintrin.h:1246:0,
from /usr/lib/gcc/x86_64-linux-gnu/4.8/include/x86intrin.h:34,
from /home/admin/mtcp/dpdk/include/rte_vect.h:67,
from /home/admin/mtcp/dpdk/include/rte_memcpy.h:46,
from /home/admin/mtcp/dpdk/include/rte_ether.h:50,
from /home/admin/mtcp/dpdk/include/rte_ethdev.h:185,
from io_module.c:17:
I imagine there could be some other diffs I am missing from mTCP's DPDK.
Could you release the complete diff patches between mTCP's DPDK and upstream DPDK? I would like to experiment with some new DPDK features on mTCP.
Thanks!
Vincent
Hi Team,
During installation of the mTCP-modified DPDK, after attaching the device to the igb_uio driver, a new kernel logical interface dpdk0 is created.
Why is this dpdk0 interface created? And where is it created?
Can't we receive all the incoming packets from the NIC port via the dpdk_recv_pkts() call in mTCP?
I am trying to add KNI support into an mTCP-DPDK application. But mTCP creates the dpdk0 interface, and one more KNI interface, vEth0_0, is created.
I want to avoid creating two interfaces. After receiving packets in the rte_eth_rx_burst() call, I want to check the type of each packet (say TCP or UDP) and then pass it accordingly to the user process (the mTCP library) or send it to the KNI interface (say, UDP packets, which are not supported by mTCP).
Any inputs/suggestions are welcome...
Thanks,
Arun
Hi, mtcp team,
sorry for the interruption
After installing your mTCP-modified DPDK and using the igb_uio driver on our device, no new logical interface dpdk0 is created, so I cannot set an IP/mask for it in the next step using your set_iface_single_process.sh. Am I missing some step? My machine is running CentOS 6.5 with Linux kernel 3.10.25, and the NIC using the DPDK driver is an Intel 82580 Gigabit NIC.
Thanks
Hi mTCP team,
When you have time, could you please help patch mTCP's igb_uio/igb_uio.c to support vmxnet3, so the mTCP dpdk0 interface gets the correct MAC address instead of 00:00:00:00:00:00? This would allow mTCP to run in a VMware ESXi VM environment. I am able to get it working by hard-coding the MAC address in mTCP; let me know if my problem is not clear.
I have attempted to patch igb_uio/igb_uio.c to support vmxnet3, but I am not sure which vmxnet3 header file to include (like the igb_uio.h you did), nor what to change in configure/Makefile...; if you can offer some ideas, I can help write the patch.
Thanks!
Is there any plan to support UDP? If not, can you point me to a starting point for adding it?
Why not support netmap [1]? This would allow mTCP to work with a larger number of commodity so-called 'dedicated' rental servers, which use a single NIC. Why? Because netmap allows certain NIC packets (e.g. ssh) to be filtered towards the regular kernel network stack, while allowing user land to snaffle the rest.
sudo ./setup_iface_single_process.sh 3
dpdk0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:10.0.0.3 Bcast:10.0.0.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
dpdk1 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet addr:10.0.1.3 Bcast:10.0.1.255 Mask:255.255.255.0
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
So why is the hwaddr set to 0? And after a few seconds, ifconfig shows:
dpdk0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
dpdk1 Link encap:Ethernet HWaddr 00:00:00:00:00:00
inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
The IP has disappeared.
Hi,
I have to say mTCP is so far the best user-space TCP stack I have used on top of DPDK; it can generate millions of stateful TCP connections on cheap hardware. I wonder what is holding mTCP back from spreading the word in the DPDK community. Thank you for the great work!
Now here is an issue I noticed; it is only my guess, so I am seeking ideas here:
Here is what I am doing in two tests:
test 1:
a: set the server syncookie threshold to 16 to trigger the server to start syncookie mode earlier than normal
b: start tcpdump on server to capture packet from epwget
c: use epwget to generate tcp connection
epwget triggers the server into syncookie mode, and then I see tons of packet retransmissions from both mTCP and the server; the whole tcpdump capture is around ~91MB
test 2:
a: set the server syncookie threshold to 1638400000, which basically disables server syncookies
b: start tcpdump on server to capture packet from epwget
c: use epwget to generate tcp connection
the server is not in syncookie mode, and there are no TCP packet retransmissions between mTCP and the server; the whole tcpdump capture is only around ~1.28MB.
So my question is: in test 1, why are there so many packet retransmissions between mTCP and the server when the server is in syncookie mode? Does mTCP have an interoperability issue with servers in syncookie mode? Is this related to mTCP's batched TCP packet processing?
Let me know if you need to see the sample tcpdump capture and I appreciate any input from you.
Regards,
Vincent