
High performance container overlay networks on Linux, enabling RDMA (on both InfiniBand and RoCE) and accelerating TCP to bare-metal performance. Freeflow requires zero modification to application code or binaries.

License: MIT License


Freeflow's Introduction

Freeflow

Freeflow is a high performance container overlay network that enables RDMA communication and accelerates TCP sockets to the same performance as the host network.

Freeflow works on top of popular overlay network solutions including Flannel, Weave, etc. The containers have their individual virtual network interfaces and IP addresses, and do not need direct access to the hardware NIC interface. A lightweight Freeflow library inside containers intercepts RDMA and TCP socket APIs, and a Freeflow router outside containers helps accelerate those APIs.
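
Concretely, the library is injected purely through environment variables, so applications need no changes. Below is a minimal sketch (image names and the other required flags are omitted here; the complete commands appear in the Quick Start below and in the tcp branch README):

    # RDMA mode: the Freeflow verbs libraries are installed under /usr/lib inside the
    # application container and picked up via LD_LIBRARY_PATH (see Step 3 below)
    sudo docker run -e "LD_LIBRARY_PATH=/usr/lib" -it -d ubuntu /bin/bash

    # TCP mode: the Freeflow socket library is preloaded into the application
    # (path as used in the tcp-mode iperf example later on this page)
    sudo docker run -v /freeflow:/freeflow -e "LD_PRELOAD=/freeflow/libfsocket.so" -it -d ubuntu /bin/bash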

Freeflow is developed on top of the Linux RDMA project (https://github.com/linux-rdma/rdma-core) and released under the MIT license.

Three working modes

Freeflow works in three modes: fully-isolated RDMA (master branch), semi-isolated RDMA, and TCP (tcp branch).

Fully-isolated RDMA provides the best isolation between different containers and works best in multi-tenant environments, e.g., clouds. It offers typical RDMA performance (40 Gbps throughput and 1-2 microsecond latency), at the cost of some CPU overhead.

The TCP mode accelerates TCP socket performance to that of the host network. On a typical Linux server with a 40 Gbps NIC, it can achieve 25 Gbps throughput and less than 20 microsecond latency for a single TCP connection.

We will release semi-isolated RDMA in the future. It has the same CPU efficiency as host RDMA, but does not provide full isolation on the data path. It works best for single-tenant clusters, e.g., an internal cluster.

Performance

Below are the performance results of Spark and TensorFlow running in fully-isolated RDMA mode on servers connected by a 40 Gbps RDMA network.

Quick Start: run a demo of Freeflow

Below are the steps to run Freeflow in fully-isolated RDMA mode. For TCP mode, refer to the README in the tcp branch.

Step 1: Start Freeflow router (one instance per server)

sudo docker run --name router1 --net host -e "FFR_NAME=router1" \
  -e "LD_LIBRARY_PATH=/usr/lib/:/usr/local/lib/:/usr/lib64/" \
  -v /sys/class/:/sys/class/ -v /freeflow:/freeflow -v /dev/:/dev/ \
  --privileged -it -d ubuntu:14.04 /bin/bash

Then log into the router container with

sudo docker exec -it router1 bash

Download and install the RDMA libraries and drivers in the router container. Freeflow is currently developed and tested with "MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64.tgz", which you can download from http://www.mellanox.com/page/products_dyn?product_family=26.
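
A minimal sketch of installing the OFED user-space libraries inside the router container, assuming the tarball has been copied in (the installer flags are an assumption and may differ for your OFED version):

    tar xzf MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64.tgz
    cd MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64
    # inside a container only the user-space libraries are needed;
    # the kernel modules come from the host
    ./mlnxofedinstall --user-space-only --force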

Then build the code using the script build-router.sh. In ffrouter/, start the router by running "./router router1". A sketch of the full sequence is shown below.
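
Putting Step 1 together, a sketch of the commands inside the router container, assuming the Freeflow source tree is available under the /freeflow mount created by the docker run above:

    cd /freeflow          # adjust to wherever the Freeflow source actually lives
    ./build-router.sh     # builds the router
    cd ffrouter
    ./router router1      # "router1" matches the FFR_NAME the application containers will use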

Step 2: Repeat Step 1 to start the router on the other hosts. You can capture a Docker image of router1 to avoid repeating the installation and build steps.

Step 3: Start an application container on the same host as router1

sudo docker run --name node1 --net weave -e "FFR_NAME=router1" -e "FFR_ID=10" \
  -e "LD_LIBRARY_PATH=/usr/lib" --ipc=container:router1 \
  -v /sys/class/:/sys/class/ -v /freeflow:/freeflow -v /dev/:/dev/ \
  --privileged --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm \
  -it -d ubuntu /bin/bash

You may use any container overlay solution. In this example, we use Weave (https://github.com/weaveworks/weave).

Environment variable "FFR_NAME=router1" points to the container to the router (router1) on the same host; "FFR_ID=10" is the ID of the contaienr in FreeFlow. Each container on the same host should have a unique FFR_ID. We are removing FFR_ID in next version.

Download and install the same version of the RDMA libraries and drivers as in Step 1. Then build the code in libraries/ and libmempool/ and install it to /usr/lib/ (the default).
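
A sketch of the client-side build inside the application container, again assuming the Freeflow source is available in the container; build-client.sh wraps the make and make install steps (see also the autogen/configure note in the issues below if no Makefile is present):

    cd /freeflow          # adjust to wherever the Freeflow source actually lives
    ./build-client.sh     # builds libraries/ and libmempool/ and installs them to /usr/lib/
    ls /usr/lib/libibverbs.so* /usr/lib/libmempool*   # sanity check; exact file names may vary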

Step 4: Repeat Step 3 to start customer containers on more hosts. You can capture a Docker image of node1 to avoid repeating the installation and build steps.

Attention: the released implementation hard-codes the host IPs and the virtual-IP-to-host-IP mapping in https://github.com/Microsoft/Freeflow/blob/master/ffrouter/ffrouter.cpp#L215 and https://github.com/Microsoft/Freeflow/blob/master/ffrouter/ffrouter.h#L76. For quick tests, you can edit them according to your environment. Ideally, the router should read them from the container overlay controller, ZooKeeper, or etcd.

Validation: in the customer containers, install the RDMA perftest tools with "sudo apt-get install perftest". Try "ib_send_bw" or "ib_send_lat", as sketched below.
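
For example, a minimal validation run between two customer containers looks like this (the overlay IP is a placeholder; use node1's actual IP):

    # in node1 (server side)
    sudo apt-get install perftest
    ib_send_bw

    # in node2 on another host (client side)
    sudo apt-get install perftest
    ib_send_bw 10.47.128.0    # replace with node1's overlay IP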

Applications

For RDMA, Freeflow has been tested with RDMA-based Spark (http://hibd.cse.ohio-state.edu/), HERD (https://github.com/efficient/HERD), TensorFlow with RDMA enabled (https://github.com/tensorflow/tensorflow), and rsocket (https://linux.die.net/man/7/rsocket). Most RDMA applications should run with no (or very little) modification, and outperform traditional TCP socket-based implementations.

For TCP, Freeflow has also been tested with many applications/frameworks, including DLWorkspace (https://github.com/Microsoft/DLWorkspace), Horovod (https://github.com/uber/horovod), Memcached, Nginx, PostSQL, and Kafka.

Contacts

This implementation is a research prototype that shows feasibility. It is NOT production-quality code. The technical details will be published in academic papers. If you have any questions, please raise issues on GitHub or contact the authors below.

Hongqiang Harry Liu ([email protected])

Yibo Zhu ([email protected])

Daehyeok Kim ([email protected])

Tianlong Yu ([email protected])


Freeflow's Issues

Why did our ib_read_bw and ib_write_bw tests succeed without FFO installed? And why can't drivers be found after we install libibverbs?

In Section 4.3, where one-sided operations are discussed, there are two problems with supporting one-sided operations; the first is that the local FFR does not know the corresponding s-mem on the other side. To solve this, FreeFlow builds a central key-value store in the FFO so that all FFRs can learn the mapping between mem's pointer in the application's virtual memory space and the corresponding s-mem's pointer in the FFR's virtual memory space. However, our ib_read_bw and ib_write_bw tests all succeeded without FFO installed, even though we do not know how to install FFO.
Note that all of our ib_send/read/write_bw tests use rdma_cm mode, because if we install libibverbs we encounter the warning 'no userspace device-specific driver found'.
[screenshot]
So we only install libmlx4 and librdmacm, and all tests use the standard libibverbs from rdma-core. If we test in non-rdma_cm mode, traffic does not go through the router.
Have you met this problem before? We tried to solve it, and found that the function try_driver in init.c fails to find drivers when executing:
[screenshot]
We then suspected driver initialization and traced it to the function mlx4_driver_init defined in mlx4.c in libmlx4. We also noticed that many lines were cut from mlx4.c, which confused us. The problem we finally located is in the following code: it never reaches 'goto found', so it returns NULL early.
[screenshot]
But why? Why doesn't rdma_cm mode hit this problem? And why, with libibverbs installed, are both modes affected?
Looking forward to your answer!

ssh communication cannot work normally with rsocket

I have tried to use the FreeFlow open-source project to test application performance. I noticed that you have tested rsocket with FreeFlow, but I hit a problem when testing a big-data app: ssh communication does not work normally with rsocket. The prompt message is as follows:
[screenshot]
Have you ever had a similar problem? I am wondering if you can give me some advice to solve this problem, or even just a few names of people you think we should talk to. Thank you very much!

Floating point exception (core dumped) while running ib_send_bw

Problem

I'm reproducing Freeflow on my own machines. When I run ib_send_bw in two containers located on the same physical machine, a floating point exception (core dumped) occurs. The same error appears with two containers on two different machines, even though those two machines can run ib_send_bw correctly between themselves.
I am strictly following the quick start on GitHub. Here is my environment and how I run Freeflow.

Environment

  • OS: Ubuntu 14.04.6 with Linux Kernel 4.4.0-142-generic
  • RDMA NIC:
05:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
		Product Name: CX516A - ConnectX-5 QSFP28
05:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
		Product Name: CX516A - ConnectX-5 QSFP28
  • OFED version: MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64
  • Docker version: Docker version 1.13.0, build 49bf474
  • weave: 2.5.2
  • gcc: gcc-4.8.5 gcc-5.5.0 gcc-6.5.0 gcc-7.4.0.

When searching for this problem on Google, I saw that the cause of this error can be a wrong gcc version, so I tried gcc-4.8.5, gcc-5.5.0, gcc-6.5.0, and gcc-7.4.0. However, the error persists.

host IPs and virtual IP to host IP mapping

I have two machines, 192.168.2.203 and 192.168.2.206, connected by a Weave overlay. I have modified the host IPs and the virtual-IP-to-host-IP mapping in my code. In ffrouter.h#L76:

const char HOST_LIST[HOST_NUM][16] = {
    "192.168.2.13",
    "192.168.2.15"
};

In ffrouter.cpp#L215,

    this->vip_map["10.47.128.0"] = "192.168.2.203";
    this->vip_map["10.47.0.5"] = "192.168.2.206";

Implementation

On host 203, enter the Freeflow router container and execute ./router router1, then run a container named node1, whose IP is 10.47.128.0, and run ib_send_bw.
On host 206, enter the Freeflow router container and execute ./router router1, then run a container named node2, whose IP is 10.47.0.8, and run ib_send_bw 10.47.128.0.
Then the error happens.
In the container that runs ib_send_bw, the log is:
[screenshot]
In the container that runs ib_send_bw 10.47.128.0, the log is:
[screenshot]
How should I solve this? And can you tell me your gcc version?

Error in `ib_send_bw': malloc(): memory corruption: 0x0000000001c20a10

When I run ib_send_bw in a container connected to an ffrouter, I get the following failure:

root@10614386764b:/# ib_send_bw 
### FreeFlow ###
context->qp_table_mask=2047
mlx4: Warning: BlueFlame available, but failed to mmap() BlueFlame page.
*** Error in `ib_send_bw': malloc(): memory corruption: 0x0000000001c20a10 ***
Aborted (core dumped)
root@10614386764b:/# uname -a
Linux 10614386764b 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Any idea about this failure?

Linux Containers(LXC) - does it work?

Hi,
Are there steps to deploy it with linux containers?
I tried following the steps described with Dockers, but when I start the router.
I gave HOST_IP=hostname of the Ubuntu host where I deployed a linux container for router1.

I get this error:
/usr/share/Freeflow/ffrouter# ./router router1
could not get device

Thanks for your help.
Kiran

CPU overhead increases significantly when running an rsocket-based app

In your paper "FreeFlow-Software-based Virtual RDMA Networking for Containerized Clouds", you compared native TCP with FreeFlow + rsocket, and verified FreeFlow always outperforms Weave both for throughput and latency. In our test, we have obtained the similar results that support your results, but the CPU overheads were higher than we imagine.
The CPU utilization ratio only decreases 20% to 30% than Weave. We initially consider that using rsocket will bring higher CPU overheads,
image
and the loss of CPU increases 50% when compared with ib_send_bw. So we want to know if you got similar problems, or our test results were wrong.

Problem about compile and symbol lookup error

Problem

Sorry to bother you. I'm reproducing Freeflow on my own machine. At first, I encountered some compilation problems and fixed them with the gangliao/k8s-freeflow methods (changing some compile options). After finishing the compilation, I ran ib_send_bw and an error occurred, as below:
### FreeFlow ### libibverbs: Warning: couldn't load driver 'mlx5': /lib64/libmlx5-rdmav2.so: symbol ibv_exp_cmd_create_srq, version IBVERBS_1.1 not defined in file libibverbs.so.1 with link time reference ib_send_bw: symbol lookup error: /usr/lib/libibverbs.so.1: undefined symbol: mempool_create

Environment

OS: centos7.4.1708 with Linux Kernel 3.10.0-693.el7.x86_64
RDMA NIC: Mellanox Technologies MT27500 Family [ConnectX-3]
OFED version: MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.4-x86_64
Docker version: Docker version 1.13.1
weave: 2.5.2
gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)

I hope I can get your help, or anyone's help. Thank you very much!

Segmentation fault when running ib_send_bw

When i run "ib_send_bw" ,I have the following situations

### FreeFlow ###

* Waiting for client to connect... *

Couldn't get ibv_exp_post_send pointer

Program received signal SIGSEGV, Segmentation fault.
__ibv_create_cq (context=0x6316b0, cqe=, cq_context=0x0, channel=0x0, comp_vector=0) at src/verbs.c:538
538 cq_map[cq->handle] = cq;


I would really appreciate it if someone could help me

k8s-freeflow

Following my previous issue #1, I found it too hard to contribute k8s/etcd code to this official repo [there are still some compile/link errors in your repo]. So I created a new repo to hack on it and verify some ideas. Since someone may be interested in it, I post some progress here.

k8s-freeflow

TO-DO List

  • IP Hunter: write changed nodes and map(vip, pip) into ETCD periodically CODE

  • IP Hunter's Docker Image CODE

  • Not yet tested: ffrouter updates HOST_LIST and vip_map periodically via the RESTful API of ETCD's watch mode. CODE

  • ffrouter docker image

  • client docker image

  • GoogleTest PASS: ETCD V3 API's watch mode for ffrouter CODE

    The Watch API provides an event-based interface for asynchronously monitoring changes to keys. An etcd3 watch waits for changes to keys by continuously watching from a given revision, either current or historical, and streams key updates back to the client. https://coreos.com/etcd/docs/latest/learning/api.html

  • GoogleTest PASS: ETCD V3 API's range mode for ffrouter

  • GoogleTest PASS: ETCD V2 API's put mode for ffrouter

  • new ffrouter depends on curl, base64 and jsoncpp. CODE

  • the process of compilation passed. CODE

  • testing new ffrouter with IP hunter and fixing bugs.

  • IP Hunter -> Kubernetes POD: ip_hunter_pod.yaml

  • ffrouter -> Kubernetes Daemonset: ffrouter_daemonset.yaml

  • Benchmark: baseline Kubeflow

undefined reference to `mempool_del'

When I make libibverbs-1.2.1mlnx1, I ran into the following problem. I hope I can get your help, thanks!

CCLD examples/ibv_devices
./src/.libs/libibverbs.so: undefined reference to `mempool_del'
./src/.libs/libibverbs.so: undefined reference to `mempool_insert'
./src/.libs/libibverbs.so: undefined reference to `mempool_create'
./src/.libs/libibverbs.so: undefined reference to `mempool_get'

My OS is Ubuntu 18.04 server, and the Docker container is also Ubuntu 18.04, downloaded from http://hub-mirror.c.163.com.
The driver is MLNX_OFED_LINUX-4.7-1.0.0.1-ubuntu18.04-x86_64.

Your build-client.sh only has make and make install, but there is no Makefile under the folder, so I first need to run ./autogen.sh and ./configure --prefix=/usr/ --libdir=/usr/lib/ --sysconfdir=/etc/ (see below). I encountered many other problems, but I have solved them. This problem I can't solve, so I have to ask for your help. Thanks for your help!
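
For reference, the workaround described above, run from the library's source directory (directory layout assumed), is:

    ./autogen.sh
    ./configure --prefix=/usr/ --libdir=/usr/lib/ --sysconfdir=/etc/
    make && make install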

Can I run an FFL container in non-privileged mode?

Hi:
As far as I understand, since Freeflow is an RDMA virtualization solution, the container should not need privileged access to the RDMA device on the host. After I install OFED in the original Ubuntu container and commit it as an image named ubuntu_ffl, I wonder whether the highlighted options in the following command are unnecessary when I use ubuntu_ffl to run an FFL container:

sudo docker run --name node1 --net weave -e "FFR_NAME=router1" -e "FFR_ID=10" -e "LD_LIBRARY_PATH=/usr/lib" -e --ipc=container:router1 -v /sys/class/:/sys/class/ -v /freeflow:/freeflow -v /dev/:/dev/ --privileged --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm -it -d ubuntu_ffl /bin/bash

So, I tried it with the following command:

docker run --name node1 --net weave -e "FFR_NAME=router1" -e "FFR_ID=12" -e "LD_LIBRARY_PATH=/usr/lib"  --ipc=container:router1  -v /public/hxx/freeflow:/freeflow - -it -d ubuntu_ffl /bin/bash

However, when I run ib_send_bw in this FFL container, some errors occur, like:
[screenshot]

Is it wrong to run an FFL container in non-privileged mode? Hope for your clarification.

pull access denied for exce

I have started the Freeflow router and tried logging in, but I get the error below. Please provide suggestions.

[screenshot: pull access denied]

Why doesn't the bandwidth change?

Freeflow

TCP Physical Bandwidth

ethtool eth0 | grep Speed
# 	Speed: 1000Mb/s

demo test

Baseline with Flannel

10.141.162.80:

sudo docker run -it --entrypoint /bin/bash --name iperf networkstatic/iperf3
ip addr show  # 172.30.81.4
iperf3 -s

10.141.170.36:

sudo docker run -it --entrypoint /bin/bash --name iperf networkstatic/iperf3
iperf3 -c 172.30.81.4
Connecting to host 172.30.81.4, port 5201
[  4] local 172.30.64.3 port 34259 connected to 172.30.81.4 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   111 MBytes   932 Mbits/sec    0   1.58 MBytes
[  4]   1.00-2.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   2.00-3.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   3.00-4.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   4.00-5.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   5.00-6.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   6.00-7.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   7.00-8.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   8.00-9.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   9.00-10.00  sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.06 GBytes   911 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.06 GBytes   909 Mbits/sec                  receiver

FreeFlow

10.141.186.119:

sudo docker run -d -it --privileged --net=host -v /freeflow:/freeflow -e "HOST_IP_PREFIX=10.141.184.0/21" --name freeflow freeflow/freeflow:tcp

sudo docker run -it --entrypoint /bin/bash -v /freeflow:/freeflow -e "VNET_PREFIX=172.30.92.0/24" -e "LD_PRELOAD=/freeflow/libfsocket.so" --name iperf networkstatic/iperf3

ip addr show  # 172.30.92.18
iperf3 -s

10.141.186.118:

sudo docker run -d -it --privileged --net=host -v /freeflow:/freeflow -e "HOST_IP_PREFIX=10.141.184.0/21" --name freeflow freeflow/freeflow:tcp

sudo docker run -it --entrypoint /bin/bash -v /freeflow:/freeflow -e "VNET_PREFIX=172.30.108.0/24" -e "LD_PRELOAD=/freeflow/libfsocket.so" --name iperf networkstatic/iperf3

iperf3 -c 172.30.92.18
[  4] local 172.30.108.12 port 38826 connected to 172.30.92.18 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   111 MBytes   933 Mbits/sec    0   1.50 MBytes
[  4]   1.00-2.00   sec   109 MBytes   912 Mbits/sec    0   1.50 MBytes
[  4]   2.00-3.00   sec   108 MBytes   902 Mbits/sec    0   1.50 MBytes
[  4]   3.00-4.00   sec   109 MBytes   912 Mbits/sec    0   1.50 MBytes
[  4]   4.00-5.00   sec   108 MBytes   902 Mbits/sec    0   1.50 MBytes
[  4]   5.00-6.00   sec   109 MBytes   912 Mbits/sec    0   1.50 MBytes
[  4]   6.00-7.00   sec   108 MBytes   902 Mbits/sec    0   1.57 MBytes
[  4]   7.00-8.00   sec   109 MBytes   912 Mbits/sec    0   1.57 MBytes
[  4]   8.00-9.00   sec   108 MBytes   902 Mbits/sec    0   1.57 MBytes
[  4]   9.00-10.00  sec   109 MBytes   912 Mbits/sec    0   1.57 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.06 GBytes   910 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.06 GBytes   908 Mbits/sec                  receiver

migration of container

When I migrate a container from one host to another, the IP of the container is lost. How can I migrate a Freeflow customer container?
