
Comments (12)

avikam commented on June 20, 2024

@ajhunyady can you reproduce it with RUST_LOG=debug?
Specifically, I'd like to know whether it happened because of a version discrepancy.

For example, if you had fluvio running, then restarted your machine and built master with cargo before resuming, you might see this:

2024-05-07T21:00:15.505156Z  WARN ... fluvio_cluster::start::common: Current Version 0.11.8-dev is not same as expected: 0.11.7-dev

Edit, more context:
This may happen because the platform version is loaded from the local-config file, as it's a value that is deduced from the CLI. The start/resume commands verify that the version is as expected. When it is not, the SPU verification fails despite a successful connection, and the command loops indefinitely.
I think it's worth addressing, that is, allowing the local case to be platform-version agnostic (with a proper warning instead of an infinite loop), but first I want to understand whether that is what happened to you.
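
To make the failure mode concrete, here is a minimal sketch of a connection loop that retries only while the cluster is unreachable, and stops with a warning once it connects and sees a different version. It is an illustration, not code from the Fluvio repository: `Probe`, `Outcome`, and `wait_for_cluster` are hypothetical names, and versions are compared as plain strings for simplicity.

```rust
/// Result of one (hypothetical) connection probe against the SC.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Probe {
    /// Could not connect yet; worth retrying.
    Unreachable,
    /// Connected; the cluster reported this platform version.
    Connected { current: String },
}

/// What the caller should do once waiting is over.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    /// Connected and the version matches.
    Ready,
    /// Connected, but the version differs; carry a warning and proceed.
    ReadyWithWarning(String),
    /// Never connected within the attempt budget.
    TimedOut { attempts: u32 },
}

/// Retry only on connection failures. A version mismatch ends the loop
/// immediately instead of being treated as "not ready yet".
pub fn wait_for_cluster<F>(expected: &str, max_attempts: u32, mut probe: F) -> Outcome
where
    F: FnMut() -> Probe,
{
    for _ in 0..max_attempts {
        match probe() {
            Probe::Unreachable => continue,
            Probe::Connected { current } => {
                if current == expected {
                    return Outcome::Ready;
                }
                return Outcome::ReadyWithWarning(format!(
                    "Current Version {} is not same as expected: {}",
                    current, expected
                ));
            }
        }
    }
    Outcome::TimedOut { attempts: max_attempts }
}
```

The attempt budget is also an assumption; the point is only that a successful connection with a mismatched version is a terminal state, not a reason to keep polling.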

from fluvio.

ajhunyady commented on June 20, 2024

@avikam can you please take a look at this one?


avikam commented on June 20, 2024

here is 1 before bats:
#3999

Seems like there is alignment, so I'll keep pursuing that.


sehz commented on June 20, 2024

restart logic should check version number and warn user of incompatibility


ajhunyady commented on June 20, 2024

@avikam the debug log is attached.
debug.log


avikam commented on June 20, 2024

indeed it's the same issue.
your log shows:

2024-05-07T13:55:39.967622Z  WARN install_only:launch_sc{host_name="127.0.0.1:9003" port=9003 pb=Indicatiff(ProgressBar)}:try_connect_to_sc{config=FluvioConfig { endpoint: "127.0.0.1:9003", use_spu_local_address: false, tls: Disabled, metadata: {}, client_id: None } platform_version=Version { major: 0, minor: 11, patch: 6 } pb=Indicatiff(ProgressBar)}: fluvio_cluster::start::common: Current Version 0.11.7 is not same as expected: 0.11.6

I agree with @sehz: the solution should be to show a warning to the user and proceed with the start/resume process.


ajhunyady commented on June 20, 2024

I actually tried both variants.

% fluvio cluster resume
 ✅ Local Fluvio is not installed
🎉 All checks passed!
✅ Local Cluster initialized
👤 Profile set
🖥️  Trying to connect to SC: 127.0.0.1:9003 19 seconds elapsed /                                                                                            
^C

% fluvio cluster start
📝 Running pre-flight checks
    ✅ Local Fluvio is not installed
    ❌ Check Clean Fluvio Local Installation failed Local Fluvio cluster wasn't deleted. Use 'resume' to resume created cluster or 'delete' before starting
    💔 Some pre-flight check failed!
Preflight check failed

I was forced to delete to recover.


ajhunyady commented on June 20, 2024

Ok, I think I understand what's going on. We are currently updating smartmodules, so I'm constantly upgrading & downgrading to test. Since the smart engine changed, I must restart the cluster every time.

Now I realize that we may actually need a fluvio cluster upgrade command that should do whatever is needed to ensure the new version is compatible & permitted. In this case, nothing metadata-related changed, so the upgrade should ensure that the "previously saved cluster" accepts the new version. In other cases, a proper upgrade may be needed, but the upgrade translation should be handled by the developer who changed the metadata.

The current fluvio cluster upgrade only handles K8.

@sehz @avikam
Did I get this right?


sehz commented on June 20, 2024

Yup. Restart should check to ensure nothing changed. If something did, it should ask the user to run the upgrade command.


avikam commented on June 20, 2024

I've been thinking of addressing it in two PRs:

  1. stopgap: add a pre-check to the resume command to verify the versions match. This will remediate the bad user experience of looping forever.
  2. refactor upgrade, so that in the local installation type, instead of shut-down and start (which is currently broken, because we must use "resume"), we will update the platform-version only and resume the cluster.

an alternative for (2) is to remove the entire local-config file before starting the cluster. However, we'd lose configuration like number of SPUs, TLS policy, logs, etc.
@ajhunyady , @digikata does that sound reasonable?
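
As a rough illustration of the two steps (not the actual Fluvio code), the sketch below shows a resume pre-check that fails fast on a version mismatch, and a local upgrade that rewrites only the saved platform version while keeping the rest of the configuration. `LocalConfig`, its fields, `VersionMismatch`, `resume_precheck`, and `upgrade_local` are all hypothetical names chosen for the example.

```rust
/// Hypothetical, simplified view of the saved local-config file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LocalConfig {
    pub platform_version: String,
    pub spu_count: u16,
    pub tls_enabled: bool,
}

/// Error returned by the pre-check when the saved and CLI versions differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VersionMismatch {
    pub saved: String,
    pub cli: String,
}

/// Step 1 (stopgap): refuse to resume when the versions differ,
/// so the user gets an error instead of an endless wait.
pub fn resume_precheck(config: &LocalConfig, cli_version: &str) -> Result<(), VersionMismatch> {
    if config.platform_version == cli_version {
        Ok(())
    } else {
        Err(VersionMismatch {
            saved: config.platform_version.clone(),
            cli: cli_version.to_string(),
        })
    }
}

/// Step 2 (local upgrade): update only the platform version and keep
/// every other setting, so a later resume passes the pre-check.
pub fn upgrade_local(config: LocalConfig, cli_version: &str) -> LocalConfig {
    LocalConfig {
        platform_version: cli_version.to_string(),
        ..config
    }
}
```

Compared with deleting the whole local-config file, this keeps settings such as the SPU count and TLS policy intact across the upgrade.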


sehz commented on June 20, 2024

Idea of having local-config is sound (although we have something similar in K8 config, so they should be consistent or converge).

The first item should be done anyway, as a required pre-flight check.

Then figure out the upgrade process, as this needs to be common across multiple cluster types.


digikata commented on June 20, 2024

  1. definitely sounds good as a stopgap.

  2. sounds like good mechanics for an upgrade of the local cluster, and we can also figure out if there is a commonality between a local start upgrade and k8.

