helm-charts's People

Contributors

2opremio, brahman81, erika-sdf, jacekn, marcelosalloum, marwen-abid, mollykarcher, mwtzzz, reecexlm, satyamz, sreuland, stellar-terraform, stfung77, tsachiherman

helm-charts's Issues

charts/core: helm install, core pod has db init startup error

What version are you using?

main

What did you do?

helm repo add stellar https://helm.stellar.org/charts && helm repo update stellar
helm install testcore stellar/core   --namespace sandbox   --set global.image.core.tag=19.13.1-1459.bf4363684.focal --set global.network=testnet --devel

What did you expect to see?

running core deployment on cluster

What did you see instead?


kubectl describe pod/testcore-0 -n sandbox
...
Init Containers:
  core-new-db:
    Container ID:  containerd://bc0038c94b191f2ad5302b5ee43ebc664fd74c93c02f0de0db7b38b1b91cd113
    Image:         docker.io/stellar/stellar-core:19.13.1-1459.bf4363684.focal
    Image ID:      docker.io/stellar/stellar-core@sha256:a8e293cb0cbdbbc548025ad1fd13dd4fef3160d55c7cdf6e75a9bbd9df62e4e4
    Port:          11626/TCP
    Host Port:     0/TCP
    Args:
      new-db
      --conf
      /config/stellar-core.cfg
      --console
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 31 Aug 2023 16:07:31 -0700
      Finished:     Thu, 31 Aug 2023 16:07:31 -0700
    Ready:          False
    Restart Count:  4
    Limits:
      cpu:     250m
      memory:  512Mi
    Requests:
      cpu:        100m
      memory:     256Mi
    Environment:  <none>
    Mounts:
      /config from core-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n5zzb (ro)

...

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Normal   Scheduled         2m23s                 default-scheduler  Successfully assigned sandbox/testcore-0 to ip-172-22-24-82.ec2.internal
  Normal   Pulled            2m20s                 kubelet            Successfully pulled image "docker.io/stellar/stellar-core:19.13.1-1459.bf4363684.focal" in 1.973165104s
  Normal   Pulled            2m19s                 kubelet            Successfully pulled image "docker.io/stellar/stellar-core:19.13.1-1459.bf4363684.focal" in 102.370461ms
  Normal   Pulling           2m5s (x3 over 2m22s)  kubelet            Pulling image "docker.io/stellar/stellar-core:19.13.1-1459.bf4363684.focal"
  Normal   Created           2m5s (x3 over 2m20s)  kubelet            Created container core-new-db
  Normal   Started           2m5s (x3 over 2m20s)  kubelet            Started container core-new-db
  Normal   Pulled            2m5s                  kubelet            Successfully pulled image "docker.io/stellar/stellar-core:19.13.1-1459.bf4363684.focal" in 97.389568ms
  Warning  DNSConfigForming  110s (x9 over 2m22s)  kubelet            Search Line limits were exceeded, some search paths have been omitted, the applied search line is: sandbox.svc.cluster.local svc.cluster.local cluster.local dev.kube001.internal.stellar-ops.com internal.stellar-ops.com dev.services.stellar-ops.com
  Warning  BackOff           110s (x4 over 2m18s)  kubelet            Back-off restarting failed container
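
The describe output above shows that core-new-db exits with code 1, but not why. A minimal sketch for pulling the failed init container's own output could look like the following. The helper name init_container_logs and the KUBECTL override variable are illustrative assumptions and are not part of the chart.

```shell
# Print the logs from the previous (terminated) run of one container in a pod.
# KUBECTL is a hypothetical override for dry runs; it defaults to kubectl.
init_container_logs() {
  pod="$1"
  container="$2"
  namespace="$3"
  # --previous selects the last terminated instance, which is the one that
  # failed while the container sits in CrashLoopBackOff.
  "${KUBECTL:-kubectl}" logs "$pod" \
    --container "$container" \
    --namespace "$namespace" \
    --previous
}
```

For the pod above this would be called as `init_container_logs testcore-0 core-new-db sandbox`.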

charts/horizon: helm install has failed pod startup

What version are you using?

main

What did you do?

helm repo add stellar https://helm.stellar.org/charts && helm repo update stellar
helm install testhorizon stellar/horizon   --namespace sandbox   --set global.image.horizon.tag=2.26.1 --set global.network=testnet --devel

What did you expect to see?

horizon service running on cluster

What did you see instead?

statefulset startup failed due to pod/spec error:

kubectl get events -n sandbox
LAST SEEN   TYPE      REASON         OBJECT                          MESSAGE
69s         Warning   FailedCreate   statefulset/my-horizon-ingest   create Pod my-horizon-ingest-0 in StatefulSet my-horizon-ingest failed error: Pod "my-horizon-ingest-0" is invalid: spec.containers[0].envFrom[0].secretRef.name: Required value

looking at the statefulset that helm created on the cluster, the secretRef issue is visible: the rendered spec has a secretRef with empty braces and no name:

....
 spec:
      containers:
      - args:
        - --apply-migrations
        envFrom:
        - secretRef: {}
        - configMapRef:
            name: my-horizon-ingest-env
        image: docker.io/stellar/stellar-horizon:2.26.1
        imagePullPolicy: Always
        name: horizon
....
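
One way to catch this before installing is to render the chart locally and scan the output for empty references. The following is only a sketch: the function name find_empty_env_refs is hypothetical, and the pattern only matches the inline `{}` form shown above.

```shell
# Read rendered manifests on stdin and print, with line numbers, every
# secretRef/configMapRef that was rendered as an empty mapping.
# The exit status is 0 when at least one empty reference is found.
find_empty_env_refs() {
  grep -nE '^[[:space:]]*-?[[:space:]]*(secretRef|configMapRef):[[:space:]]*\{\}[[:space:]]*$'
}
```

It can be fed from helm template with the same flags as the install above, e.g. `helm template testhorizon stellar/horizon --namespace sandbox --set global.image.horizon.tag=2.26.1 --set global.network=testnet --devel | find_empty_env_refs`.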

Stabilize RPC rolling update behavior on k8s

What problem does your feature solve?

RPC's use of a StatefulSet on k8s may leave potential for problems during rolling upgrades:

  • rpc update rollouts on the cluster could potentially corrupt the single pvc, because a statefulset can only do rolling upgrades, which allows two pods to access the pvc simultaneously during the rollout as k8s spins up the pending pod before terminating the current pod.
  • rpc downtime during update rollouts has been consistently observed. it doesn't appear that the statefulset rolling upgrade strategy preserves old pods long enough for new pods to become ready, i.e. it cuts over to new pods before they pass readiness. observation notes here

What would you like to see?

  • The rpc rolling upgrade will not corrupt the underlying pvc and will not exhibit any http service downtime during the rollout.

  • Evaluate whether converting to a Deployment with the Recreate update strategy, and defining a stand-alone PVC that the deployment references for its volume mount, will fix the http service downtime aspect; that rollout should retain the old pod until the new pod passes readinessProbe.

What alternatives are there?

enable zero-downtime deployments for RPC

What problem does your feature solve?

In its current form, RPC takes ~30 minutes to deploy new versions to pubnet (thread1, thread2) due to iops limits when initializing its in-memory data storage from disk.

What would you like to see?

A new RPC version rolls out, and there's no disruption in service. There is also no loss of transaction/events history upon rollout (that is, the db/history does not reset to nothing).

What alternatives are there?

  • Blue/green deployment model. We'd maintain 2 instances of RPC, with one always being kept in "standby" mode and not used for client requests
  • Horizontally scale RPC to 2 replicas, each using their own independent PVC, and load balance between them. On deployments, we would deploy to one at a time, making sure we always had 1 ready/available. This is the strategy that we think is better/optimal
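
For the second alternative, the "always had 1 ready/available" condition could be checked before each replica is touched. This is a rough sketch that assumes RPC runs as a StatefulSet; the function name has_min_ready and the KUBECTL override variable are hypothetical.

```shell
# Succeed only if the StatefulSet reports at least MIN ready replicas.
# KUBECTL is a hypothetical override for testing; it defaults to kubectl.
has_min_ready() {
  name="$1"
  namespace="$2"
  min="${3:-1}"
  ready=$("${KUBECTL:-kubectl}" get statefulset "$name" \
    --namespace "$namespace" \
    --output jsonpath='{.status.readyReplicas}')
  # The field can come back empty when no replica is ready, so treat
  # an empty value as 0.
  [ "${ready:-0}" -ge "$min" ]
}
```

A deploy script might gate each step on something like `has_min_ready my-rpc sandbox 1`, where my-rpc is a placeholder release name.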

charts/core: values.yml for testnet and pubnet

What problem does your feature solve?

if you want to install the helm chart for core on testnet or pubnet, there is no reference values.yml to use that aggregates the many network-specific settings needed, like validators, passphrase, archive urls, etc. setting that up manually would take a long time and be error prone.

What would you like to see?

testnet_values.yml and pubnet_values.yml are created in the core chart path, usable with helm install --values.
Or global.network=[testnet|pubnet] works the same way and removes the need for testnet_values.yml and pubnet_values.yml.

updated docs can refer to correct config - stellar/stellar-docs#216

What alternatives are there?

users take the futurenet_values.yml and manually figure out the changes from there for testnet or pubnet.
