
loxilb-io / loxilb


eBPF based cloud-native load-balancer. Powering Kubernetes|Edge|5G|IoT|XaaS Apps.

Home Page: https://www.loxilb.io

License: Apache License 2.0

Go 100.00%
cloud-native clustering ebpf edge golang hybrid-cloud k8s kubernetes kubernetes-networking loadbalancing nat nat64 nat66 network-security networking public-cloud sctp service-loadbalancer

loxilb's People

Contributors

backguynn, codesnip12, cybwan, ianchen0119, inhogog2, k8sguru, krizerg, luisgerhorst, nik-netlox, packetcrunch, trekkiecoder, ultrainstinct14

loxilb's Issues

Support for SCTP connection tracking

Most telco/3GPP systems and frameworks use the SCTP protocol. We need to implement stateful connection tracking for SCTP and support proper load-balancing of SCTP traffic.

Basic Sanity-CI fails randomly

The basic sanity-CI workflow fails randomly with the following error when running the Go unit test framework:

unknown flag `t'
exit status 1
FAIL	github.com/loxilb-io/loxilb	0.936s
make: *** [Makefile:19: test] Error 1
Error: Process completed with exit code 2.

Need to look into it. Additional logs

cicd, scale and long-run testing effort

Scale and performance are the defining factors for load-balancers. Although we have some performance numbers from wrk, we still need scale numbers, e.g. how many sessions and how many conntrack (CT) entries can be supported. We need to find a suitable open-source tool for this testing, e.g. TRex.

Passive stateful conntrack mode support in loxilb

loxilb provides its own alternate conntrack implementation. Some users have requested a conntrack-only mode in which loxilb does nothing but connection tracking. It might be an interesting feature for quick debugging in the cloud-native networking arena without affecting anything else.

loxilb crash after creating load-balancer rule

loxilb build info

loxilb version: 0.7.0 2022_08_31-main

Logs

panic: runtime error: invalid memory address or nil pointer dereference

goroutine 21 [running]:
github.com/loxilb-io/loxilb/loxinet.(*DpEbpfH).DpStat(0xc0000bcf20?, 0xc003f06100)
	/root/loxilb-io/loxilb/loxinet/dpebpf_linux.go:758 +0x429
github.com/loxilb-io/loxilb/loxinet.(*DpH).DpWorkOnStat(...)
	/root/loxilb-io/loxilb/loxinet/dpbroker.go:335
github.com/loxilb-io/loxilb/loxinet.DpWorkSingle(0xc0000bcf88?, {0xb7af80?, 0xc003f06100?})
	/root/loxilb-io/loxilb/loxinet/dpbroker.go:372 +0x1d3
github.com/loxilb-io/loxilb/loxinet.DpWorker(0x0?, 0xc00010ed20, 0xc000130c60)
	/root/loxilb-io/loxilb/loxinet/dpbroker.go:387 +0xe5
created by github.com/loxilb-io/loxilb/loxinet.DpBrokerInit
	/root/loxilb-io/loxilb/loxinet/dpbroker.go:406 +0x16e

Steps to reproduce

  1. Create any LB rule as follows
loxicmd create lb 20.20.20.1 --tcp=2020:5001 --endpoints=32.32.32.1:1

Host DNAT functionality

In certain cases, when the end-point of a load-balancer rule is the originating host itself, traffic is lost. Handling this case is especially required in a K8s CNI LB implementation, and less so in an external LB deployment.

Evaluate Go report card for loxilb

Go Report Card always shows the following error:

There was an error processing your request: Could not analyze the repository: could not download repo: could not get latest module version from https://proxy.golang.org/loxilb/@latest: bad request: invalid escaped module path "loxilb": malformed module path "loxilb": missing dot in first path element

The same is reported properly for loxilib.
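The proxy error above complains that the first path element of the module path "loxilb" has no dot. The following is a minimal sketch of that one condition, not the validation code used by the Go toolchain; the function name `firstElemHasDot` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// firstElemHasDot reports whether the first "/"-separated element of a
// module path contains a dot. This only illustrates the condition named
// in the error message; real module path validation checks much more.
func firstElemHasDot(modPath string) bool {
	first, _, _ := strings.Cut(modPath, "/")
	return strings.Contains(first, ".")
}

func main() {
	for _, p := range []string{"loxilb", "github.com/loxilb-io/loxilb"} {
		fmt.Printf("%-30s first element has dot: %v\n", p, firstElemHasDot(p))
	}
}
```

If the module were declared with a full repository path rather than the bare name, this particular check would pass; whether that is the right fix for loxilb is not confirmed by the issue text.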

Random LB session initiation gets dropped

After initial creation of an LB rule, the first traffic session that hits this rule gets dropped. It is further observed that some sessions randomly fail to connect.

Steps to reproduce -

  1. Create LB rule
loxicmd -p 11112 create lb 20.20.20.1 --tcp=2020:5001 --endpoints=31.31.31.1:1,32.32.32.1:1,17.17.17.1:1
  2. Send traffic to hit the LB rule

Configurable timeout per LB rule

We need to support a configurable timeout per LB rule, mainly for TCP connections. Normally the LB should send a TCP reset for a connection in established state once the timeout is reached.

Github CI/CD integration

We initially need a basic CI/CD pipeline based on the Go unit test framework across loxilb. Later we can build on this pipeline.

Need garbage collection of eBPF fc-map

After a run of 1k LB sessions, fc-map entries are seen to remain in loxilb:

bpftool map dump pinned /opt/loxilb/dp/bpf/fc_v4_map  | grep -i key | wc -l
1024

fc-map eBPF entries can get reused inside the eBPF logic depending on usage, but that depends on incoming traffic. Hence, we need to do garbage collection of fc-map entries.

sctp processing problem in 5.4 linux kernel

I created an SCTP load-balancer rule as follows in the loxilb docker, based on the loxilb documentation -

root@5affc126b9e2:/# loxicmd  get lb -o wide
| EXTERNAL IP | PORT | PROTOCOL | SELECT | ENDPOINT IP | TARGET PORT | WEIGHT |
|-------------|------|----------|--------|-------------|-------------|--------|
| 20.20.20.1  | 2020 | sctp     |      0 | 32.32.32.1  |        5001 |      1 |
|             |      |          |        | 33.33.33.1  |        5001 |      1 |
|             |      |          |        | 34.34.34.1  |        5001 |      1 |

But when LB session packets are sent towards the VIP (20.20.20.1), nothing is shown in the conntrack table. However, a TCP rule is processed properly.

When the kernel was upgraded to 5.13, the SCTP problem went away on its own. Can somebody clarify this behavior?

L7 parsing support

We need to be able to support L7 proxying, or splicing as it is popularly known.

goBGP integration stability

goBGP integration is at a nascent stage. We need to test and stabilize it for both imported and exported routes.

Hang issue when failed to add new neighbor info

When I get this error log, loxilb hangs:

INFO: 2022/08/03 07:25:39 [NLP] NH 192.168.57.101 mac [8 0 39 36 110 98] dev eth0 added
INFO: 2022/08/03 07:25:39 [NLP] NH 192.168.57.101 mac [8 0 39 36 110 98] dev eth0 added
ERR:  2022/08/03 07:25:44 Neigh MAC add failed-Same FDB
ERR:  2022/08/03 07:25:44 [NLP] NH 192.168.57.104 mac [8 0 39 157 200 222] dev eth0 add failed NH mac error

This is a lock issue in loxinet/apiclient.go.

HA clustering support

Usually load-balancers need to be deployed in a cluster. So, as a first step we need two things -

  1. Integration with keepalived or something similar (need to discuss)
  2. HA state management. Most importantly eBPF conntrack data/maps need to be in proper sync during HA transition

Overall, we need to make sure there is no traffic loss in loxilb during HA transitions.

ULCL classifier support

GTP is the de-facto standard tunneling protocol used in 3GPP. We need to be able to parse it (including extension headers), support encap-decap, and load-balance on outer or inner header fields in eBPF in the kernel.
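To make the parsing requirement concrete, here is a user-space sketch of walking a GTPv1-U header and its extension header chain. It illustrates the header layout only and is not loxilb's eBPF code; the type and function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShort = errors.New("gtpu: packet too short")

// gtpuHeader holds the fields this sketch extracts (names are hypothetical).
type gtpuHeader struct {
	Version    uint8
	MsgType    uint8
	Length     uint16
	TEID       uint32
	ExtTypes   []uint8 // extension header types, in order of appearance
	PayloadOff int     // offset of the inner payload within the packet
}

// parseGTPU parses the mandatory 8-byte header, the optional 4-byte field
// block (present when any of E, S or PN is set), and any extension headers.
func parseGTPU(b []byte) (gtpuHeader, error) {
	var h gtpuHeader
	if len(b) < 8 {
		return h, errShort
	}
	flags := b[0]
	h.Version = flags >> 5
	if h.Version != 1 {
		return h, errors.New("gtpu: unsupported version")
	}
	h.MsgType = b[1]
	h.Length = binary.BigEndian.Uint16(b[2:4])
	h.TEID = binary.BigEndian.Uint32(b[4:8])
	off := 8
	if flags&0x07 != 0 { // E, S or PN set: sequence, N-PDU, next-ext-type follow
		if len(b) < 12 {
			return h, errShort
		}
		off = 12
		if flags&0x04 != 0 { // E flag: next-extension-type field is meaningful
			next := b[11]
			for next != 0 {
				if len(b) <= off {
					return h, errShort
				}
				n := int(b[off]) * 4 // extension length is in 4-octet units
				if n == 0 || len(b) < off+n {
					return h, errShort
				}
				h.ExtTypes = append(h.ExtTypes, next)
				next = b[off+n-1] // last octet names the following extension
				off += n
			}
		}
	}
	h.PayloadOff = off
	return h, nil
}

func main() {
	// G-PDU with E flag set and one extension header of type 0x85.
	pkt := []byte{0x34, 0xff, 0x00, 0x09, 0, 0, 0, 1, 0, 0, 0, 0x85, 0x01, 0x00, 0x09, 0x00, 0x45}
	h, err := parseGTPU(pkt)
	fmt.Printf("%+v err=%v\n", h, err)
}
```

Load-balancing on inner header fields would start reading at `PayloadOff`. An in-kernel eBPF version would need bounded loops and explicit bounds checks that satisfy the verifier.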

vMirroring support

We need to support mirroring, or SPAN as it is better known, for debugging as well as for logging as and when required.

Performance issues and random drops in SCTP sessions

How to reproduce -

  • Run sctp server
sctp_test -H 32.32.32.1 -P 5001 -l
  • Run sctp client (repeated)
sctp_test -H 100.100.100.1  -h 32.32.32.1 -p 5001 -s -c 1 -M 100
  • Check loxilb ct status
root@8b74b5ddc4d2:~/loxilb-io# loxicmd -p 11112 get ct
| DESTINATIONIP |   SOURCEIP    | DESTINATIONPORT | SOURCEPORT | PROTOCOL | STATE | ACT | PACKETS | BYTES  |
|---------------|---------------|-----------------|------------|----------|-------|-----|---------|--------|
| 32.32.32.1    | 100.100.100.1 |            5001 |      38066 | sctp     | est   |     |      47 | 207472 |
| 100.100.100.1 | 32.32.32.1    |           44888 |       5001 | sctp     | est   |     |      59 |   3204 |
| 32.32.32.1    | 100.100.100.1 |            5001 |      44888 | sctp     | est   |     |      67 | 269500 |
| 100.100.100.1 | 32.32.32.1    |           38066 |       5001 | sctp     | est   |     |      32 |   1580 |

Some sessions do not transition to the SCTP shutdown-complete state as expected. Ideally, all SCTP sessions should end up as follows -

root@8b74b5ddc4d2:~/loxilb-io# loxicmd -p 11112 get ct
| DESTINATIONIP |   SOURCEIP    | DESTINATIONPORT | SOURCEPORT | PROTOCOL |     STATE     | ACT | PACKETS | BYTES  |
|---------------|---------------|-----------------|------------|----------|---------------|-----|---------|--------|
| 32.32.32.1    | 100.100.100.1 |            5001 |      46201 | sctp     | shut-complete |     |      26 | 111824 |
| 32.32.32.1    | 100.100.100.1 |            5001 |      52981 | sctp     | shut-complete |     |      28 | 120648 |
| 100.100.100.1 | 32.32.32.1    |           57020 |       5001 | sctp     | shut-complete |     |      39 |   1888 |
| 100.100.100.1 | 32.32.32.1    |           44950 |       5001 | sctp     | shut-complete |     |      31 |   1488 |
| 100.100.100.1 | 32.32.32.1    |           60093 |       5001 | sctp     | shut-complete |     |      31 |   1488 |
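For reference, the teardown that should move a tracked session from "est" to "shut-complete" is the SCTP SHUTDOWN, SHUTDOWN ACK, SHUTDOWN COMPLETE chunk sequence. The sketch below is a simplified state machine for that sequence only. The intermediate state names are hypothetical, and it ignores packet direction, ABORT chunks and retransmissions, so it is not loxilb's conntrack logic.

```go
package main

import "fmt"

// SCTP chunk type values as defined by the SCTP specification.
const (
	chunkShutdown         = 7
	chunkShutdownAck      = 8
	chunkShutdownComplete = 14
)

type ctState int

const (
	stEst          ctState = iota // "est" in the ct table
	stShutSent                    // hypothetical intermediate state
	stShutAck                     // hypothetical intermediate state
	stShutComplete                // "shut-complete" in the ct table
)

var stateNames = map[ctState]string{
	stEst:          "est",
	stShutSent:     "shut-sent",
	stShutAck:      "shut-ack",
	stShutComplete: "shut-complete",
}

// next returns the state after seeing a chunk of the given type.
// Chunks that do not advance the teardown leave the state unchanged.
func next(s ctState, chunk uint8) ctState {
	switch {
	case s == stEst && chunk == chunkShutdown:
		return stShutSent
	case s == stShutSent && chunk == chunkShutdownAck:
		return stShutAck
	case s == stShutAck && chunk == chunkShutdownComplete:
		return stShutComplete
	}
	return s
}

func main() {
	s := stEst
	for _, c := range []uint8{chunkShutdown, chunkShutdownAck, chunkShutdownComplete} {
		s = next(s, c)
		fmt.Println(stateNames[s])
	}
}
```

A session stuck in "est" would correspond to one of these chunks never being observed or never being matched to the tracked flow; the issue does not say which.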

loxilb ebpf is not working in 5.4 linux kernel

When we run loxilb on a slightly older kernel, we get the following logs -

288=mm0000mm fp-296=00000000
1420: (b7) r1 = 1
; lock_xadd(&act->ctd.pb.packets, 1);
1421: (db) lock *(u64 *)(r7 +104) += r1
 R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1_w=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
 R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1_w=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
1422: (05) goto pc+54
1477: safe

from 1458 to 1460: R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
; int z = 0;
1460: (b7) r1 = 0
1461: (63) *(u32 *)(r10 -16) = r1
; if (F->l4m.ct_sts != 0) {
1462: (71) r1 = *(u8 *)(r10 -114)
; if (F->l4m.ct_sts != 0) {
1463: (55) if r1 != 0x0 goto pc+13
1464: (bf) r2 = r10
; 
1465: (07) r2 += -16
1466: (bf) r3 = r10
1467: (07) r3 += -296
; bpf_map_update_elem(&xfis, &z, F, BPF_ANY);
1468: (18) r1 = 0xffff8dca7928ba00
1470: (b7) r4 = 0
1471: (85) call bpf_map_update_elem#2
; bpf_tail_call(ctx, &pgm_tbl, idx);
1472: (bf) r1 = r6
1473: (18) r2 = 0xffff8dca7b94d200
1475: (b7) r3 = 1
1476: (85) call bpf_tail_call#12
tail_calls are not allowed in programs with bpf-to-bpf calls

goBGP handling

We need to be able to manage the goBGP process from inside loxilb (forking, restarting, etc.).

Travis-CI failing

Travis-CI is failing with the following logs:

/usr/bin/ld: cannot find -lbsd
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:27: ip] Error 1
