Comments (12)
@ajhunyady can you reproduce it with RUST_LOG=debug?
specifically, I'd like to know if it happened because of a version discrepancy.
For example, if you had fluvio running, and then restarted your machine and cargo built master before resuming, you might see this:
2024-05-07T21:00:15.505156Z WARN ... fluvio_cluster::start::common: Current Version 0.11.8-dev is not same as expected: 0.11.7-dev
edit - more context:
this may happen because the platform version is loaded from the local-config file, as it's a value that is deduced from the CLI. The start/resume commands verify that the version is as expected. When it's not, the SPU verification fails despite a successful connection, and loops infinitely.
I think it's worth addressing, i.e. making the local case platform-version agnostic (with a proper warning instead of an infinite loop), but first I want to understand whether that is what happened to you.
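A minimal Rust sketch of the warn-and-proceed behaviour described above. All names (`SemVer`, `ProbeResult`, `wait_for_sc`) are hypothetical and are not Fluvio's actual API. The assumption is that only an unreachable SC is worth retrying, while a version mismatch after a successful connection should become a warning rather than drive another attempt.

```rust
use std::fmt;

/// Hypothetical version type; a pre-release suffix such as "-dev" is ignored.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SemVer {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl SemVer {
    /// Parses "0.11.8" or "0.11.8-dev"; returns None on malformed input.
    pub fn parse(s: &str) -> Option<SemVer> {
        let core = s.split('-').next()?;
        let mut parts = core.split('.').map(|p| p.parse::<u64>().ok());
        let major = parts.next()??;
        let minor = parts.next()??;
        let patch = parts.next()??;
        if parts.next().is_some() {
            return None; // more than three components
        }
        Some(SemVer { major, minor, patch })
    }
}

impl fmt::Display for SemVer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)
    }
}

/// Outcome of one (hypothetical) attempt to reach the SC.
pub enum ProbeResult {
    Unreachable,
    Connected { current: SemVer },
}

/// Retries only while the SC is unreachable. A version mismatch is returned
/// as a warning string and does not trigger another attempt.
pub fn wait_for_sc<F>(expected: SemVer, max_attempts: u32, mut probe: F) -> Result<Option<String>, String>
where
    F: FnMut() -> ProbeResult,
{
    for _ in 0..max_attempts {
        match probe() {
            // A real loop would sleep or back off before the next attempt.
            ProbeResult::Unreachable => continue,
            ProbeResult::Connected { current } if current == expected => return Ok(None),
            ProbeResult::Connected { current } => {
                return Ok(Some(format!(
                    "Current Version {current} is not same as expected: {expected}; proceeding anyway"
                )))
            }
        }
    }
    Err(format!("could not connect to SC after {max_attempts} attempts"))
}
```

The caller prints the returned warning and continues the start/resume flow, so a stale version in the local config can no longer cause an endless retry loop.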
from fluvio.
@avikam can you please take a look at this one?
Here is 1 before bats: #3999
seems like there is an alignment so I'll keep pursuing that
restart logic should check version number and warn user of incompatibility
@avikam the debug log is attached.
debug.log
indeed it's the same issue.
your log shows:
2024-05-07T13:55:39.967622Z WARN install_only:launch_sc{host_name="127.0.0.1:9003" port=9003 pb=Indicatiff(ProgressBar)}:try_connect_to_sc{config=FluvioConfig { endpoint: "127.0.0.1:9003", use_spu_local_address: false, tls: Disabled, metadata: {}, client_id: None } platform_version=Version { major: 0, minor: 11, patch: 6 } pb=Indicatiff(ProgressBar)}: fluvio_cluster::start::common: Current Version 0.11.7 is not same as expected: 0.11.6
I agree with @sehz: the solution should be to show a warning to the user and proceed with the start/resume process.
I actually tried both variants.
% fluvio cluster resume
✅ Local Fluvio is not installed
🎉 All checks passed!
✅ Local Cluster initialized
👤 Profile set
🖥️ Trying to connect to SC: 127.0.0.1:9003 19 seconds elapsed /
^C
% fluvio cluster start
📝 Running pre-flight checks
✅ Local Fluvio is not installed
❌ Check Clean Fluvio Local Installation failed Local Fluvio cluster wasn't deleted. Use 'resume' to resume created cluster or 'delete' before starting
💔 Some pre-flight check failed!
Preflight check failed
I was forced to delete the cluster to recover.
Ok, I think I understand what's going on. We are currently updating smartmodules, so I'm constantly upgrading & downgrading to test. Since the smart engine changed, I must restart the cluster every time.
Now I realize that we may actually need a fluvio cluster upgrade command that does whatever is needed to ensure the new version is compatible and permitted. In this case nothing metadata-related changed, so the upgrade should ensure that the "previously saved cluster" accepts the new version. In other cases a proper upgrade may be needed, but the upgrade translation should be handled by the developer who changed the metadata.
The current fluvio cluster upgrade only handles K8.
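A sketch of what the local half of such an upgrade command could do. `LocalConfig` is a hypothetical struct holding a few of the persisted settings (SPU count, TLS, log directory), and `metadata_compatible` is a caller-supplied predicate standing in for the translation that the developer who changed the metadata would provide. Neither is an existing Fluvio type or function.

```rust
/// Hypothetical subset of the settings persisted for a local cluster.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LocalConfig {
    pub platform_version: String,
    pub spu_replicas: u16,
    pub tls_enabled: bool,
    pub log_dir: String,
}

/// Sketch of a local upgrade: if `metadata_compatible(old, new)` says the
/// stored metadata is usable by `new_version`, only the version is rewritten
/// and every other setting is kept. Otherwise a migration is required.
pub fn upgrade_local_config<F>(
    saved: &LocalConfig,
    new_version: &str,
    metadata_compatible: F,
) -> Result<LocalConfig, String>
where
    F: Fn(&str, &str) -> bool,
{
    if saved.platform_version == new_version {
        return Ok(saved.clone()); // nothing to do
    }
    if !metadata_compatible(&saved.platform_version, new_version) {
        return Err(format!(
            "metadata migration required from {} to {}",
            saved.platform_version, new_version
        ));
    }
    // Keep SPU count, TLS policy, log location, etc.; bump only the version.
    Ok(LocalConfig {
        platform_version: new_version.to_string(),
        ..saved.clone()
    })
}
```

Rewriting the version in place, rather than regenerating the config, is what lets cluster-specific settings survive repeated upgrades and downgrades.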
@sehz @avikam
Did I get this right?
Yup. restart should check that nothing changed. If something did, it should ask the user to run the upgrade command.
I've been thinking of addressing it in two PRs:
1. Stopgap: add a pre-check to the resume command to verify that the versions match. This fixes the bad user experience of looping forever.
2. Refactor upgrade so that, for the local installation type, instead of shutting down and starting (which is currently broken, because we must use "resume"), we update only the platform version and resume the cluster.
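The stopgap in item 1 amounts to a check that runs before resume tries to connect. A hedged sketch with hypothetical names (`CheckResult`, `check_resume_version`), comparing version strings exactly as a simplifying assumption:

```rust
/// Result of a hypothetical pre-flight check.
#[derive(Debug, PartialEq, Eq)]
pub enum CheckResult {
    Passed,
    Failed { reason: String, hint: String },
}

/// Stopgap check for resume: compare the version saved in the local config
/// with the version of the running CLI before any connection attempt, so a
/// mismatch fails fast instead of looping.
pub fn check_resume_version(saved_version: &str, cli_version: &str) -> CheckResult {
    if saved_version == cli_version {
        CheckResult::Passed
    } else {
        CheckResult::Failed {
            reason: format!(
                "cluster was created with version {saved_version}, but this CLI is {cli_version}"
            ),
            hint: "run the upgrade command before resuming".to_string(),
        }
    }
}
```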
An alternative for (2) is to remove the entire local-config file before starting the cluster. However, we'd lose configuration such as the number of SPUs, TLS policy, logs, etc.
@ajhunyady , @digikata does that sound reasonable?
The idea of having local-config is sound (although we have something similar in the K8 config, so they should be consistent or converge).
The first should be added anyway as a required pre-flight check.
Then figure out the upgrade process, as it needs to be common across multiple cluster types.
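One way to keep the upgrade process common across cluster types is a small trait with a shared driver. This is only a sketch: `UpgradableCluster`, `LocalCluster`, `K8Cluster` and `upgrade` are hypothetical names, and the K8 implementation is a placeholder for the existing K8-specific upgrade path.

```rust
/// Hypothetical abstraction over cluster types; not an existing Fluvio trait.
pub trait UpgradableCluster {
    /// Version recorded when the cluster was installed or last upgraded.
    fn installed_version(&self) -> &str;
    /// Type-specific upgrade step.
    fn apply_upgrade(&mut self, to: &str) -> Result<(), String>;
}

/// Local cluster: the upgrade only rewrites the stored platform version.
pub struct LocalCluster {
    pub platform_version: String,
}

impl UpgradableCluster for LocalCluster {
    fn installed_version(&self) -> &str {
        &self.platform_version
    }
    fn apply_upgrade(&mut self, to: &str) -> Result<(), String> {
        self.platform_version = to.to_string();
        Ok(())
    }
}

/// K8 cluster: placeholder for the K8-specific upgrade path.
pub struct K8Cluster {
    pub chart_version: String,
}

impl UpgradableCluster for K8Cluster {
    fn installed_version(&self) -> &str {
        &self.chart_version
    }
    fn apply_upgrade(&mut self, to: &str) -> Result<(), String> {
        // A real implementation would upgrade the deployed release here.
        self.chart_version = to.to_string();
        Ok(())
    }
}

/// Shared driver: skip if already at the target, otherwise delegate to the
/// cluster type and verify the result once for every type.
pub fn upgrade(cluster: &mut dyn UpgradableCluster, to: &str) -> Result<(), String> {
    if cluster.installed_version() == to {
        return Ok(());
    }
    cluster.apply_upgrade(to)?;
    if cluster.installed_version() != to {
        return Err(format!("upgrade to {to} did not take effect"));
    }
    Ok(())
}
```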
1. Definitely sounds good as a stopgap.
2. Sounds like good mechanics for an upgrade of the local cluster, and we can also figure out whether there is commonality between a local start upgrade and K8.