Comments (8)
I tried the instructions that @ljkiraly attached above but added scaling of NSEs too. From my test, it seems that NSMgr consumes more memory when NSEs are scaled than when only NSCs are scaled.
from cmd-nsmgr.
Description updated with version information. It might also be important that I tested on a kind cluster with 4 nodes.
Hi Nikita,
Thanks for sharing your results.
Does "no leak" mean that the memory consumption of the nsmgr container goes back to the original value when you scale NSCs and/or NSEs to zero?
In my tests, nsmgr consumes 14-15M of memory before I deploy the endpoint and client. The nsmgr container on the node where the NSE runs starts consuming 21-23M; on the other node, where the NSC runs, nsmgr consumes around 20M.
When I scale the NSE and NSC down to 0, the first nsmgr still shows 20M and the second one 17M, and this does not really change over time.
So I cannot reproduce a situation where the memory consumption goes back down to, or at least near, the original level.
Hi @denis-tingaikin,
I created an nsmgr image based on the following commits:
eefee38ab907156eafc3d7f2a69552c4779af393 - tmp disable connectionmonitor authroization
7202075a97e5bf1874afec4268fb38d9f063f199 - fix linter
ce37208c0b9ea9bee8bf9b0cfe68752445288f86 - fix mem leak in authorize
ccf42a564dce826dd7e0b5647393c70037643447 - fix memory leaks
- The commits come from PRs 1616 and 1617.
- It contains a gRPC uplift to google.golang.org/grpc v1.63.2, based on github.com/szvincze/grpcfd v1.0.0.
- I also added code that produces a memory profile every hour.
It was tested in a customer-like environment with more than 80 endpoints and traffic running.
The result was better than before (with NSM v1.13).
There was still a memory increase, especially in one of the nsmgr containers:
nsmgr-lr96t-n5
==============
After install: 13.3 MB
UTC 11:38: 93.9 MB
UTC 23:38: 97.2 MB
UTC 03:38 (after traffic test): 109 MB
UTC 05:38 (after uninstalling the application using NSM): 84.5 MB
Find the collected memprofiles attached.
As you can see in the profile from May 7, 2:38am (CEST), the profiling tool shows that nsmgr had 38428.29kB (~40MB) in use. Strangely, kubelet's metrics server showed a higher RSS at that time (around 90MB-100MB).
File: nsmgr
Type: inuse_space
Time: May 7, 2024 at 2:38am (CEST)
Showing nodes accounting for 38428.29kB, 100% of 38428.29kB total
flat flat% sum% cum cum%
7922.50kB 20.62% 20.62% 7922.50kB 20.62% bufio.NewReaderSize (inline)
6866.17kB 17.87% 38.48% 6866.17kB 17.87% google.golang.org/grpc/internal/transport.newBufWriter (inline)
3076.16kB 8.00% 46.49% 3076.16kB 8.00% fmt.Sprintf
3073.31kB 8.00% 54.49% 3073.31kB 8.00% runtime.malg
1538.03kB 4.00% 58.49% 1538.03kB 4.00% bytes.growSlice
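One possible contributor to the gap between the profile and the RSS: an `inuse_space` heap profile only accounts for sampled live heap objects, while RSS covers everything resident in the process, including heap spans the Go runtime holds but has not yet returned to the OS, goroutine stacks, and runtime metadata. The sketch below reads the relevant `runtime.MemStats` fields so they can be compared with the RSS reported by the metrics server. The `heapBreakdown` type and its helpers are hypothetical names, and whether this explains the numbers above is an assumption that would need checking on the affected pod.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapBreakdown (hypothetical type) collects the runtime.MemStats fields
// that help relate a heap profile to the process RSS. All values are bytes.
type heapBreakdown struct {
	HeapAlloc    uint64 // live heap objects
	HeapInuse    uint64 // spans that contain at least one object
	HeapIdle     uint64 // spans with no objects
	HeapReleased uint64 // idle memory already returned to the OS
	StackInuse   uint64 // memory used by goroutine stacks
	Sys          uint64 // total memory obtained from the OS by the runtime
}

// readHeapBreakdown takes a snapshot of the current runtime statistics.
func readHeapBreakdown() heapBreakdown {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapBreakdown{
		HeapAlloc:    m.HeapAlloc,
		HeapInuse:    m.HeapInuse,
		HeapIdle:     m.HeapIdle,
		HeapReleased: m.HeapReleased,
		StackInuse:   m.StackInuse,
		Sys:          m.Sys,
	}
}

// retainedIdle estimates idle heap memory the runtime still holds and
// could return to the OS (HeapIdle minus HeapReleased).
func (b heapBreakdown) retainedIdle() uint64 {
	return b.HeapIdle - b.HeapReleased
}

func main() {
	b := readHeapBreakdown()
	kb := func(v uint64) float64 { return float64(v) / 1024 }
	fmt.Printf("HeapAlloc:     %10.2f kB\n", kb(b.HeapAlloc))
	fmt.Printf("HeapInuse:     %10.2f kB\n", kb(b.HeapInuse))
	fmt.Printf("retained idle: %10.2f kB\n", kb(b.retainedIdle()))
	fmt.Printf("StackInuse:    %10.2f kB\n", kb(b.StackInuse))
	fmt.Printf("Sys:           %10.2f kB\n", kb(b.Sys))
}
```

If the retained idle heap is large while `HeapAlloc` stays small, the memory is held by the runtime rather than by live objects, which a heap profile would not show.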
Hope that helps.
Another important detail: I tested with an nsmgr pod without the exclude-prefixes container and saw the same behavior; the memory increase is still present. I edited the issue title.
Hello! I think I managed to reproduce the leak. I tried several setups:
NSEs with CIDR 172.16.0.0/16
- kind cluster with 4 nodes
  - Scaling only NSCs (no leak)
  - Scaling NSEs and NSCs (no leak)
  - Scaling NSEs and NSCs with different number of k8s-registries (no leak)
- kind cluster with 1 node
  - Scaling only NSCs (no leak)
  - Scaling NSEs and NSCs (no leak)
  - Scaling NSEs and NSCs with different number of k8s-registries (no leak)
NSEs with CIDR 172.16.0.0/30
- kind cluster with 1 node
  - Scaling only NSCs (leak)
  - Scaling NSEs and NSCs (leak)
It looks like we have a leak when there are not enough NSEs for all NSCs. After scaling NSEs and NSCs 10 times, nsmgr consumes 116M of memory. After several hours it still consumes the same amount of memory, even though the NSCs and NSEs have been scaled to zero.
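To make the difference between the two CIDRs concrete, the sketch below computes the total number of IPv4 addresses in each prefix. The helper name `addressCount` is hypothetical, and how many of these addresses an NSE can actually hand out to clients depends on its IPAM, which is not described in this thread.

```go
package main

import (
	"fmt"
	"net/netip"
)

// addressCount (hypothetical helper) returns the total number of IPv4
// addresses covered by a CIDR prefix, i.e. 2^(32 - prefix length).
func addressCount(cidr string) (uint64, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return 0, err
	}
	if !p.Addr().Is4() {
		return 0, fmt.Errorf("only IPv4 prefixes are handled in this sketch: %s", cidr)
	}
	return uint64(1) << (32 - p.Bits()), nil
}

func main() {
	for _, cidr := range []string{"172.16.0.0/16", "172.16.0.0/30"} {
		n, err := addressCount(cidr)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d addresses\n", cidr, n)
	}
}
```

A /16 covers 65536 addresses while a /30 covers only 4, so with the /30 setup a single NSE runs out of addresses after very few clients, which matches the "not enough NSEs for all NSCs" condition.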
Profiles
goroutines.pdf
memory.pdf
block.pdf
mutex.pdf
threadcreate.pdf
The profiles don't show any leaks. The memory profile accounts for only about 4.5M of memory used by nsmgr. The number of goroutines is also reasonable: nsmgr usually has about 50 goroutines running when there are no clients and endpoints.
Maybe there is a problem with the logs. I'm checking that now.
Hi,
I created a heap profile during a long-running test. After 10 hours this is the memory situation in nsmgr:
I used the tinden/cmd-nsmgr:v1.13.0-fix.5 and tinden/cmd-forwarder-vpp:v1.13.0-fix.5 images.
Right now one nsmgr uses 108M, the other 78M. The increase is much slower than before. It seems the runtime uses less than 30M in both cases.
It seems that the way metrics-server counts memory also plays a part when we talk about the memory increase.
I haven't monitored the registry-k8s pod, but it was OOMKilled a few hours ago.
Here I add three heap profiles I created during my tests. The first one is from an idle state, then one after scaling of NSCs and NSEs started and another one from a later phase of scaling.