loxilb-io / loxilb
eBPF based cloud-native load-balancer. Powering Kubernetes|Edge|5G|IoT|XaaS Apps.
Home Page: https://www.loxilb.io
License: Apache License 2.0
Most telco/3GPP systems and frameworks use the SCTP protocol. We need to implement stateful conntracking for SCTP and support proper load-balancing of SCTP traffic.
loxilb has to follow the standard formatting rules described at https://go.dev/doc/effective_go#formatting
We should also explore the gofmt tool for this purpose.
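A formatting check of this kind could also be automated. The sketch below uses the standard library's go/format package, which applies the same canonical style as gofmt, to report whether a source buffer is already formatted. The helper name isGofmted and the sample sources are illustrative assumptions, not part of loxilb.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// isGofmted reports whether src is already in canonical gofmt form.
// It returns an error if src is not syntactically valid Go.
// (Hypothetical helper, named here for illustration only.)
func isGofmted(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err
	}
	return bytes.Equal(src, formatted), nil
}

func main() {
	good := []byte("package demo\n\nfunc Add(a, b int) int {\n\treturn a + b\n}\n")
	bad := []byte("package demo\n\nfunc Add(a,b int) int {\n    return a+b\n}\n")

	for _, tc := range []struct {
		name string
		src  []byte
	}{{"good", good}, {"bad", bad}} {
		ok, err := isGofmted(tc.src)
		fmt.Printf("%s: formatted=%v err=%v\n", tc.name, ok, err)
	}
}
```

On the command line, `gofmt -l .` lists the files whose formatting differs from gofmt's output, which is usually enough for a CI gate.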
Many features have been developed, and since performance is very important to us, we need to make sure it remains optimal.
The basic sanity-CI workflow is seen to fail randomly with the following error when running the Go unit test framework:
unknown flag `t'
exit status 1
FAIL github.com/loxilb-io/loxilb 0.936s
make: *** [Makefile:19: test] Error 1
Error: Process completed with exit code 2.
Need to look into it. Additional logs
loxilb provides its own alternate conntrack implementation. Some users have requested a conntrack-only mode where loxilb does nothing but connection tracking. It might be an interesting feature for quick debugging in the cloud-native networking arena without affecting anything else.
loxilb version: 0.7.0 2022_08_31-main
panic: runtime error: invalid memory address or nil pointer dereference
goroutine 21 [running]:
github.com/loxilb-io/loxilb/loxinet.(*DpEbpfH).DpStat(0xc0000bcf20?, 0xc003f06100)
/root/loxilb-io/loxilb/loxinet/dpebpf_linux.go:758 +0x429
github.com/loxilb-io/loxilb/loxinet.(*DpH).DpWorkOnStat(...)
/root/loxilb-io/loxilb/loxinet/dpbroker.go:335
github.com/loxilb-io/loxilb/loxinet.DpWorkSingle(0xc0000bcf88?, {0xb7af80?, 0xc003f06100?})
/root/loxilb-io/loxilb/loxinet/dpbroker.go:372 +0x1d3
github.com/loxilb-io/loxilb/loxinet.DpWorker(0x0?, 0xc00010ed20, 0xc000130c60)
/root/loxilb-io/loxilb/loxinet/dpbroker.go:387 +0xe5
created by github.com/loxilb-io/loxilb/loxinet.DpBrokerInit
/root/loxilb-io/loxilb/loxinet/dpbroker.go:406 +0x16e
loxicmd create lb 20.20.20.1 --tcp=2020:5001 --endpoints=32.32.32.1:1
We had gh-24 for load-balancer end-point health checks, but SCTP sessions require special handling because Go's net package does not seem to support SCTP yet.
When I run this command, I get this error message:
docker cp /opt/loxilb/llb_ebpf_main.o 65ac58b2e101:/opt/loxilb/llb_ebpf_main.o
lstat /opt/loxilb: no such file or directory
It seems that docker does not have access to the /opt/ directory.
In certain cases, when the end-point of a load-balancer rule is the originating host itself, traffic is lost. Handling this is especially required in the K8s CNI LB implementation, but less so in the external LB case.
CI/CD needs to be enhanced to include more test-cases
Linux already supports rich egress QoS using TC. eBPF can redirect to an HTB qdisc based on queue_mapping. There needs to be a seamless way to achieve this between loxilb and TC/QoS.
We need to support a policer per port or per rule and have an API/cmd interface to configure it.
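As a rough illustration of what a per-port or per-rule policer does, here is a minimal single-rate token-bucket sketch in Go. The type Policer, its fields and its constructor are hypothetical names chosen for this example. The actual loxilb datapath would enforce policing in eBPF, so this only models the accounting logic.

```go
package main

import (
	"fmt"
	"time"
)

// Policer is a hypothetical single-rate token-bucket policer.
// One instance would be attached to a port or to an LB rule.
type Policer struct {
	rate   float64   // refill rate in bytes per second
	burst  float64   // bucket depth in bytes
	tokens float64   // bytes currently available
	last   time.Time // time of the last refill
}

// NewPolicer returns a policer whose bucket starts full.
func NewPolicer(rateBps, burstBytes float64, now time.Time) *Policer {
	return &Policer{rate: rateBps, burst: burstBytes, tokens: burstBytes, last: now}
}

// Allow refills the bucket for the elapsed time and reports whether
// a packet of pktLen bytes conforms to the configured rate.
func (p *Policer) Allow(pktLen int, now time.Time) bool {
	if elapsed := now.Sub(p.last).Seconds(); elapsed > 0 {
		p.tokens += elapsed * p.rate
		if p.tokens > p.burst {
			p.tokens = p.burst
		}
		p.last = now
	}
	if float64(pktLen) > p.tokens {
		return false // non-conforming: drop (or mark) the packet
	}
	p.tokens -= float64(pktLen)
	return true
}

func main() {
	start := time.Unix(0, 0)
	p := NewPolicer(1000, 1500, start) // 1000 B/s with a 1500 B burst

	fmt.Println(p.Allow(1500, start))                          // bucket starts full
	fmt.Println(p.Allow(100, start))                           // bucket is now empty
	fmt.Println(p.Allow(500, start.Add(500*time.Millisecond))) // 500 B refilled
}
```

Passing the clock in as an argument keeps the policer deterministic and easy to unit test. A two-rate, three-color variant would add a second bucket.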
Go report card always shows the following:
There was an error processing your request: Could not analyze the repository: could not download repo: could not get latest module version from https://proxy.golang.org/loxilb/@latest: bad request: invalid escaped module path "loxilb": malformed module path "loxilb": missing dot in first path element
The same is reported properly for loxilib.
Create a LB rule
# loxicmd create lb 20.20.20.1 --tcp=2020:5001 --endpoints=31.31.31.1:1,32.32.32.1:1,17.17.17.1:1
Send traffic to hit this LB rule
Send normal traffic
(Randomly gets dropped or corrupted)
We need to prepare a docker image for the first-ever release of loxilb.
After initial creation of an LB rule, the first traffic session that hits this rule gets dropped. It is further observed that some sessions randomly fail to connect.
Steps to reproduce -
loxicmd -p 11112 create lb 20.20.20.1 --tcp=2020:5001 --endpoints=31.31.31.1:1,32.32.32.1:1,17.17.17.1:1
This has been reported by @backguynn many times. Need to debug this
We need to support a configurable timeout, primarily for TCP connections. Normally the LB should send a TCP reset for a connection in the established state if the timeout is reached.
We need to initially have a basic CI/CD pipeline based on the Go unit test framework throughout loxilb. Later we can build on this pipeline.
After a run of 1k LB sessions, fc-map entries are seen to remain in loxilb:
bpftool map dump pinned /opt/loxilb/dp/bpf/fc_v4_map | grep -i key | wc -l
1024
fc-map eBPF entries can get reused inside eBPF logic depending on usage but that depends on incoming traffic. Hence, we need to do garbage collection of fc-map entries.
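The garbage-collection idea can be sketched as a periodic sweep that removes entries idle for longer than a threshold. The Go example below works on an in-memory map. The types flowKey and flowEntry and the function sweepStale are hypothetical stand-ins, and a real implementation would iterate the pinned eBPF map instead.

```go
package main

import (
	"fmt"
	"time"
)

// flowKey and flowEntry are hypothetical stand-ins for an fc-map key and value.
type flowKey struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
	Proto            uint8
}

type flowEntry struct {
	LastSeen time.Time // last time the datapath hit this entry
}

// sweepStale deletes entries that have been idle for longer than maxIdle
// and returns how many were removed.
func sweepStale(table map[flowKey]flowEntry, now time.Time, maxIdle time.Duration) int {
	removed := 0
	for k, e := range table {
		if now.Sub(e.LastSeen) > maxIdle {
			delete(table, k) // deleting while ranging over a map is safe in Go
			removed++
		}
	}
	return removed
}

func main() {
	now := time.Now()
	table := map[flowKey]flowEntry{
		{"100.100.100.1", "20.20.20.1", 40000, 2020, 6}: {LastSeen: now.Add(-10 * time.Minute)},
		{"100.100.100.1", "20.20.20.1", 40001, 2020, 6}: {LastSeen: now.Add(-5 * time.Second)},
	}

	n := sweepStale(table, now, time.Minute)
	fmt.Printf("removed %d stale entries, %d remain\n", n, len(table))
}
```

Such a sweep would typically run from a ticker goroutine, with maxIdle chosen per protocol.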
Need to make sure loxilb is compliant with and implements the latest CCM interfaces
I created an SCTP load-balancer rule as follows in the loxilb docker, based on the loxilb documentation -
root@5affc126b9e2:/# loxicmd get lb -o wide
| EXTERNAL IP | PORT | PROTOCOL | SELECT | ENDPOINT IP | TARGET PORT | WEIGHT |
|-------------|------|----------|--------|-------------|-------------|--------|
| 20.20.20.1 | 2020 | sctp | 0 | 32.32.32.1 | 5001 | 1 |
| | | | | 33.33.33.1 | 5001 | 1 |
| | | | | 34.34.34.1 | 5001 | 1 |
But when LB session packets are sent towards the VIP (20.20.20.1), nothing is shown in the conntrack table. However, a TCP rule is processed properly.
When the kernel was upgraded to 5.13, the SCTP problem went away on its own. Can somebody clarify this behavior?
Need to double confirm they are handled properly
Better to move it under debug flag
It will help keep the loxilb command-line invocation simple
We need to be able to support L7 proxy, or splicing as it is popularly known
goBGP integration is in a nascent stage. We need to test and stabilize it for both imported and exported routes
When I get this error log, loxilb hangs:
INFO: 2022/08/03 07:25:39 [NLP] NH 192.168.57.101 mac [8 0 39 36 110 98] dev eth0 added
INFO: 2022/08/03 07:25:39 [NLP] NH 192.168.57.101 mac [8 0 39 36 110 98] dev eth0 added
ERR: 2022/08/03 07:25:44 Neigh MAC add failed-Same FDB
ERR: 2022/08/03 07:25:44 [NLP] NH 192.168.57.104 mac [8 0 39 157 200 222] dev eth0 add failed NH mac error
This is a lock issue in loxinet/apiclient.go
Usually load-balancers need to be deployed in cluster. So, as a first step we need two things -
Overall, we need to make sure there is no traffic loss in loxilb during HA transitions
GTP is the de facto standard tunneling protocol used in 3GPP. We need to be able to parse it (including extension headers), support encap/decap, and load-balance on outer or inner header fields in the eBPF kernel datapath.
We need to support mirroring, or SPAN as it is better known, for debugging as well as for logging as and when required
Currently, loxilb -v just shows version information. We need to include additional information such as the build date.
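To make the parsing requirement concrete, here is a minimal userspace sketch in Go of walking a GTPv1-U header, including the optional fields and the extension-header chain, following the header layout in 3GPP TS 29.281. The names gtpHeader and parseGTPv1 are assumptions for this example. The real datapath would do the equivalent in eBPF C with verifier-friendly bounds checks.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// gtpHeader holds the GTPv1-U fields relevant to tunnel-aware load-balancing.
// (Hypothetical type for illustration.)
type gtpHeader struct {
	MsgType  uint8
	TEID     uint32
	ExtTypes []uint8 // extension header types, in order of appearance
	HdrLen   int     // total header length; the inner packet starts here
}

var errShort = errors.New("gtp: truncated packet")

// parseGTPv1 parses the mandatory 8-byte header, the optional 4-byte
// block (sequence, N-PDU, next-extension type) and any extension headers.
func parseGTPv1(b []byte) (gtpHeader, error) {
	var h gtpHeader
	if len(b) < 8 {
		return h, errShort
	}
	// Flags octet: version (3 bits), PT, reserved, E, S, PN.
	if b[0]>>5 != 1 || b[0]&0x10 == 0 {
		return h, errors.New("gtp: not a GTPv1 packet")
	}
	h.MsgType = b[1]
	h.TEID = binary.BigEndian.Uint32(b[4:8])
	h.HdrLen = 8
	if b[0]&0x07 == 0 { // none of E, S, PN set: no optional block
		return h, nil
	}
	if len(b) < 12 {
		return h, errShort
	}
	h.HdrLen = 12
	if b[0]&0x04 == 0 { // E flag clear: no extension headers
		return h, nil
	}
	for next := b[11]; next != 0; {
		if len(b) < h.HdrLen+1 {
			return h, errShort
		}
		extLen := int(b[h.HdrLen]) * 4 // length is in 4-octet units
		if extLen == 0 {
			return h, errors.New("gtp: zero-length extension header")
		}
		if len(b) < h.HdrLen+extLen {
			return h, errShort
		}
		h.ExtTypes = append(h.ExtTypes, next)
		next = b[h.HdrLen+extLen-1] // last octet names the next extension
		h.HdrLen += extLen
	}
	return h, nil
}

func main() {
	// G-PDU (0xff), TEID 1, one PDU Session Container extension (0x85),
	// followed by a single placeholder payload byte.
	pkt := []byte{0x34, 0xff, 0x00, 0x09, 0, 0, 0, 1, 0, 0, 0, 0x85, 0x01, 0x10, 0x09, 0x00, 0x45}
	h, err := parseGTPv1(pkt)
	fmt.Printf("%+v err=%v\n", h, err)
}
```

With HdrLen known, a load-balancer can hash either on the outer fields plus TEID or on the inner IP header starting at that offset.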
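One common way to carry build metadata in a Go binary is to declare package-level string variables and override them at link time with the linker's -X flag. The sketch below assumes hypothetical variable names (version, buildDate, gitCommit) and an illustrative output format. It does not reflect how loxilb actually implements this.

```go
package main

import "fmt"

// Defaults used for plain `go build`. They can be overridden at link time.
// The variable names are assumptions made for this sketch.
var (
	version   = "dev"
	buildDate = "unknown"
	gitCommit = "unknown"
)

// versionString renders the text a -v style flag could print.
func versionString() string {
	return fmt.Sprintf("loxilb version: %s (built %s, commit %s)", version, buildDate, gitCommit)
}

func main() {
	fmt.Println(versionString())
}
```

A build could then inject the values with, for example, `go build -ldflags "-X main.version=0.7.0 -X main.buildDate=$(date -u +%Y-%m-%d)"`. The -X flag only works on string variables.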
It might be better to have these as part of gh-actions CI.
We need to be able to support end-point probing by ourselves in standalone mode
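For TCP end-points, a basic liveness probe can be as simple as a timed connect attempt. The Go sketch below uses net.DialTimeout from the standard library. The helper names probeTCP and listenLocal are hypothetical, and the local listener only stands in for a real end-point. SCTP end-points would need a different mechanism, since the standard net package has no SCTP support.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probeTCP reports whether a TCP connection to addr can be established
// within timeout. (Hypothetical helper for illustration.)
func probeTCP(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// listenLocal starts a loopback listener that stands in for a live
// end-point and returns its address and a function to stop it.
func listenLocal() (string, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	return ln.Addr().String(), func() { ln.Close() }, nil
}

func main() {
	addr, stop, err := listenLocal()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	fmt.Println(addr, "up:", probeTCP(addr, time.Second))
	stop()
	fmt.Println(addr, "up:", probeTCP(addr, time.Second))
}
```

A prober would typically run this per end-point on a ticker and mark an end-point inactive only after several consecutive failures, to avoid flapping.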
I have noticed that some code comments are either non-intuitive or not present at all. It would be great if this could be addressed!
We need to support basic IPv6 load-balancing
Originally posted by TrekkieCoder August 18, 2022
Hi,
Do you guys plan to support NPTv6 ?? It could be a good feature to have for the future !!
Currently, CT entries do not have a packet and byte count. We need to have these for visualization and accounting
Fragmented packets also need to be properly conntracked and handled in eBPF
How to reproduce -
sctp_test -H 32.32.32.1 -P 5001 -l
sctp_test -H 100.100.100.1 -h 32.32.32.1 -p 5001 -s -c 1 -M 100
root@8b74b5ddc4d2:~/loxilb-io# loxicmd -p 11112 get ct
| DESTINATIONIP | SOURCEIP | DESTINATIONPORT | SOURCEPORT | PROTOCOL | STATE | ACT | PACKETS | BYTES |
|---------------|---------------|-----------------|------------|----------|-------|-----|---------|--------|
| 32.32.32.1 | 100.100.100.1 | 5001 | 38066 | sctp | est | | 47 | 207472 |
| 100.100.100.1 | 32.32.32.1 | 44888 | 5001 | sctp | est | | 59 | 3204 |
| 32.32.32.1 | 100.100.100.1 | 5001 | 44888 | sctp | est | | 67 | 269500 |
| 100.100.100.1 | 32.32.32.1 | 38066 | 5001 | sctp | est | | 32 | 1580 |
Some sessions do not transition to the SCTP shutdown-complete state, which is the expected behavior. For example, ideally, the following is expected for all SCTP sessions -
root@8b74b5ddc4d2:~/loxilb-io# loxicmd -p 11112 get ct
| DESTINATIONIP | SOURCEIP | DESTINATIONPORT | SOURCEPORT | PROTOCOL | STATE | ACT | PACKETS | BYTES |
|---------------|---------------|-----------------|------------|----------|---------------|-----|---------|--------|
| 32.32.32.1 | 100.100.100.1 | 5001 | 46201 | sctp | shut-complete | | 26 | 111824 |
| 32.32.32.1 | 100.100.100.1 | 5001 | 52981 | sctp | shut-complete | | 28 | 120648 |
| 100.100.100.1 | 32.32.32.1 | 57020 | 5001 | sctp | shut-complete | | 39 | 1888 |
| 100.100.100.1 | 32.32.32.1 | 44950 | 5001 | sctp | shut-complete | | 31 | 1488 |
| 100.100.100.1 | 32.32.32.1 | 60093 | 5001 | sctp | shut-complete | | 31 | 1488 |
When we run loxilb on a slightly older kernel, we get the following logs -
288=mm0000mm fp-296=00000000
1420: (b7) r1 = 1
; lock_xadd(&act->ctd.pb.packets, 1);
1421: (db) lock *(u64 *)(r7 +104) += r1
R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1_w=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1_w=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
1422: (05) goto pc+54
1477: safe
from 1458 to 1460: R0=map_value(id=0,off=0,ks=4,vs=16,imm=0) R1=invP1 R2=invP(id=0,smin_value=-4,smax_value=11,umin_value=2) R3=invP2 R6=ctx(id=0,off=0,imm=0) R7=map_value(id=111,off=0,ks=16,vs=144,imm=0) R8=fp-280 R9=fp-274 R10=fp0 fp-16=mmmmmmmm fp-24=mmmm???? fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=00000000 fp-56=000000mm fp-64=m0000000 fp-72=0000mmmm fp-80=mm0m0000 fp-88=mm0mmm00 fp-96=00000000 fp-104=00000000 fp-112=00000000 fp-120=00000000 fp-128=00000000 fp-136=00000000 fp-144=00000000 fp-152=00000000 fp-160=00000000 fp-168=00000000 fp-176=00000000 fp-184=00000000 fp-192=00000000 fp-200=00000000 fp-208=00000000 fp-216=00000000 fp-224=00000000 fp-232=00000000 fp-240=00000000 fp-248=00000000 fp-256=00000000 fp-264=00000000 fp-272=m000mmmm fp-280=mmmmmmmm fp-288=mm0000mm fp-296=00000000
; int z = 0;
1460: (b7) r1 = 0
1461: (63) *(u32 *)(r10 -16) = r1
; if (F->l4m.ct_sts != 0) {
1462: (71) r1 = *(u8 *)(r10 -114)
; if (F->l4m.ct_sts != 0) {
1463: (55) if r1 != 0x0 goto pc+13
1464: (bf) r2 = r10
;
1465: (07) r2 += -16
1466: (bf) r3 = r10
1467: (07) r3 += -296
; bpf_map_update_elem(&xfis, &z, F, BPF_ANY);
1468: (18) r1 = 0xffff8dca7928ba00
1470: (b7) r4 = 0
1471: (85) call bpf_map_update_elem#2
; bpf_tail_call(ctx, &pgm_tbl, idx);
1472: (bf) r1 = r6
1473: (18) r2 = 0xffff8dca7b94d200
1475: (b7) r3 = 1
1476: (85) call bpf_tail_call#12
tail_calls are not allowed in programs with bpf-to-bpf calls
We need to be able to manage the goBGP process from inside loxilb (forking, restarting, etc.)
Travis-CI is failing with the following logs:
/usr/bin/ld: cannot find -lbsd
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:27: ip] Error 1