kas-installer's Issues

Investigate way(s) to deploy subsystems independently

The fleetshard installation currently relies on (at least) the creation of several secrets in the fleet manager script. See if there is a way to untangle this so that an option to skip fleet manager is possible, similar to the way mas-sso and fleetshard installations can be skipped. The benefit is reduced wait time when repeatedly installing/uninstalling fleetshard during testing.
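A minimal sketch of the skip-flag pattern described above, mirroring how the existing skip options could be extended to fleet manager (the variable and function names here are hypothetical, not taken from kas-installer itself):

```shell
#!/usr/bin/env bash
# Sketch only: gate each subsystem install behind a SKIP_* variable
# (variable and function names are hypothetical, not from kas-installer).
install_component() {
  local name="$1"
  local skip_var="SKIP_$2"
  if [ "${!skip_var:-N}" = "Y" ]; then
    echo "Skipping ${name}"
    return 0
  fi
  echo "Installing ${name}"
  # ...actual install steps for the component would run here...
}

# Example: skip the fleet manager, install fleetshard as usual.
SKIP_KAS_FLEET_MANAGER=Y install_component "kas-fleet-manager" "KAS_FLEET_MANAGER"
install_component "kas-fleetshard" "KAS_FLEETSHARD"
```

The secrets that fleetshard needs would still have to be created by some other path when the fleet manager step is skipped.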

kas-fleet-manager pod failed on installation: CrashLoopBackOff

The kas-fleet-manager pod fails with CrashLoopBackOff when running ./kas-installer.sh. Here are the logs from the service container:

F0117 08:46:45.271815       1 main.go:47] error running command: unknown flag: --vault-kind
Error: unknown flag: --vault-kind
Usage:
  kas-fleet-manager serve [flags]

enable us-east-2 region

Error creating Kafka instance 'instance': region eu-central-1 is not supported for aws, supported regions are: [us-east-1]

Hello, can we get at least one more US region enabled, e.g. us-east-2?

Thanks
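Supported regions come from the fleet manager's provider configuration. A hedged sketch of what enabling us-east-2 might look like — the field names follow the kas-fleet-manager provider configuration, but the exact file path and schema should be verified against the kas-fleet-manager templates for the version in use:

```yaml
# Sketch of a supported-providers entry (schema may differ by KFM version).
supported_providers:
  - name: aws
    default: true
    regions:
      - name: us-east-1
        default: true
      - name: us-east-2
```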

Kas-installer is not working for Openshift 4.9.X

Using kas-installer on OSD 4.9.X does not work because of the Observability Operator version. The installer should use the latest version (currently 3.0.7) instead of the pinned one (3.0.2, I believe). The installer gets stuck on this step:

(screenshot from 2021-11-17 omitted)

I'm not sure about the backward compatibility, but we need kas-installer to work with both OpenShift 4.8.X and 4.9.X (and future versions): we should always be testing against the latest version while still being able to test older versions to reproduce issues.

Jenkins run: https://ci.int.devshift.net/view/managed-services/job/managed-kafka-perf-tests/65/console

Cannot create an instance using kas-installer: cluster capacity exhausted

After this change has been applied, we cannot create an instance on the cluster, regardless of the region the cluster is in (we have tried us-east-1, eu-west-1, and us-east-2). We always receive the following error:

./managed_kafka.sh --create kafka-instance
Error creating Kafka instance 'kafka-instance': cluster capacity exhausted

Cannot install Observatorium CRDs on K8s 1.22+

Due to the removal of the apiextensions.k8s.io/v1beta1 API as of Kubernetes v1.22, the version of the Observatorium CRDs currently hosted cannot be applied in clusters based on this version or later (e.g. OpenShift v4.11), effectively causing the installation of the Observatorium Operator to fail with:

error: unable to recognize "STDIN": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
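One way to detect this up front is to check whether the cluster still serves the removed API before applying the CRD manifests; a sketch:

```shell
# Check whether the cluster still serves the CRD API removed in k8s 1.22.
supports_v1beta1_crds() {
  kubectl api-versions 2>/dev/null | grep -qx 'apiextensions.k8s.io/v1beta1'
}

if supports_v1beta1_crds; then
  echo "apiextensions.k8s.io/v1beta1 available"
else
  echo "CRD manifests must use apiextensions.k8s.io/v1"
fi
```

The lasting fix is to host Observatorium CRD manifests converted to `apiextensions.k8s.io/v1`.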

Integrate new KFM Quota list changes

There have been recent changes around the quota list management as per bf2fc6cc711aee1a0c2a/kas-fleet-manager#1368

This issue is to investigate any changes that might be required, and any new release notes that must accompany the KFM version bump. The changes are backward compatible, so likely nothing will need to be done from the user's perspective.

/cc @ziccardi to provide further insights on the changes

Kafka instance creation is always failing

Since yesterday, when executing

./managed_kafka.sh --create instance

the instance always ends up in the failed status and stays there indefinitely.
(screenshot from 2022-02-16 omitted)

After running ./managed_kafka.sh --list, the output is:

{
  "kind": "KafkaRequestList",
  "page": 1,
  "size": 1,
  "total": 1,
  "items": [
    {
      "id": "c86dbc2in864m3gvmfl0",
      "kind": "Kafka",
      "href": "/api/kafkas_mgmt/v1/kafkas/c86dbc2in864m3gvmfl0",
      "status": "failed",
      "cloud_provider": "aws",
      "multi_az": true,
      "region": "us-east-1",
      "owner": "fvila_kafka_sre",
      "name": "instance",
      "created_at": "2022-02-16T10:45:04.398942Z",
      "updated_at": "2022-02-16T10:50:31.890442Z",
      "failed_reason": "failed to get desired Strimzi version c86dbc2in864m3gvmfl0",
      "instance_type": "eval",
      "reauthentication_enabled": true,
      "kafka_storage_size": "1000Gi"
    }
  ]
}
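When an instance lands in failed like this, the failure reason can be pulled out of the `--list` output with a small jq filter (jq assumed available):

```shell
# Helper: read the --list JSON on stdin and print "name: failed_reason"
# for every instance whose status is failed (requires jq).
failed_reasons() {
  jq -r '.items[] | select(.status == "failed") | "\(.name): \(.failed_reason)"'
}

# Usage:
#   ./managed_kafka.sh --list | failed_reasons
```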

PodSecurity warnings from fleetmanager when deploying to OpenShift 4.11.18

When deploying to ROSA (OpenShift 4.11.18), I'm seeing a PodSecurity warning.

serviceaccount/kas-fleet-manager configured
Warning: would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (containers "migration", "service", "envoy-sidecar" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "migration", "service", "envoy-sidecar" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "migration", "service", "envoy-sidecar" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "migration", "service", "envoy-sidecar" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/kas-fleet-manager created
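The warning can be silenced by making each container comply with the restricted profile; a sketch of the per-container securityContext, taken directly from the requirements listed in the warning itself:

```yaml
# Per-container securityContext satisfying PodSecurity "restricted:v1.24".
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```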

kas-fleetshard should be installed using a bundle

kas-fleetshard is currently installed using YAML files copied over from the kas-fleetshard repository; this should be converted to using a bundle. The only bundle currently available is in the managed-tenants repo. We should also consider a public-facing bundle as a separate effort.

Request: replicate keycloak and keycloak-postgresql pods in all AZ

Currently, the kas-installer configuration deploys only one keycloak pod and one keycloak-postgresql pod for all AZs in the k8s cluster.

I'd like support for deploying keycloak and keycloak-postgresql pods in every AZ, because we need at least one keycloak pod and one keycloak-postgresql pod always available to produce and consume messages when targeting the external bootstrap URL. More info in this thread: https://chat.google.com/room/AAAAHwoNLuU/H_Ulxb4OHi4

This request is needed to reproduce the outage AZ Fault scenario: https://issues.redhat.com/browse/MGDSTRM-7130
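A hedged sketch of what spreading the pods across zones could look like, using standard Kubernetes topology spread constraints (the replica count and labels here are hypothetical):

```yaml
# Sketch: run one keycloak replica per availability zone.
spec:
  replicas: 3
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: keycloak   # label assumed
```

The same constraint would apply to keycloak-postgresql, though replicating the database needs more than a spread constraint.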

Allow the configuration of the quota management list

Allow overriding the https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/70ccf0061089c6938b8ca4e8cb1c17bd42f6c426/templates/service-template.yml#L278-L286 parameter to configure quota for an org/user

Without this feature, a kas-installer user is limited to a single eval instance if their org is not part of the default orgs. A kas-installer user should be able to create standard instances (they own the data plane cluster, so they should be able to do anything with it).
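A hedged sketch of the quota-management list entry that would need to become overridable — the field names are based on the kas-fleet-manager quota configuration, but the exact schema should be verified against the linked template:

```yaml
# Sketch of a quota list entry (example org id, schema may differ).
registered_users_per_organisation:
  - id: "12345678"
    any_user: true
    max_allowed_instances: 5
    registered_users: []
```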

deployment keycloak-postgresql gets stuck

I'm using kas-installer.sh and it's getting stuck with the below error:

deployment keycloak-postgresql still not created. Waiting 10s...

I'm using OSD version 4.12.0. The keycloak custom resource shows the below message:

message: no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
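The policy/v1beta1 PodDisruptionBudget API was removed in Kubernetes 1.25, which OpenShift 4.12 is based on; the resource the Keycloak operator tries to create would need the policy/v1 form, sketched here with assumed names:

```yaml
apiVersion: policy/v1        # policy/v1beta1 was removed in k8s 1.25
kind: PodDisruptionBudget
metadata:
  name: keycloak-postgresql  # name assumed
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: keycloak-postgresql   # label assumed
```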

KAS Fleet Manager Deployment failed

When running kas-installer.sh, the installation gets stuck at the step "Waiting until KAS Fleet Manager Deployment is available...", with the kas-fleet-manager pod in a CrashLoopBackOff state. These are the error logs from the pod:

I1210 14:17:23.280718       1 environment.go:108] Initializing development environment
E1210 14:17:23.281175       1 environment.go:119] unable to read configuration files: yaml: unmarshal errors:
  line 1: cannot unmarshal !!seq into config.InstanceTypeMap
F1210 14:17:23.281191       1 cmd.go:21] Unable to initialize environment: unable to read configuration files: yaml: unmarshal errors:
  line 1: cannot unmarshal !!seq into config.InstanceTypeMap

Tested with the latest commit as of December 8th.

kas-installer execution log from Jenkins: https://ci.int.devshift.net/job/managed-kafka-fault-tests-nightly/78/console

Provide option to deploy kas-fleetshard from git reference

kas-fleetshard is currently deployed from a set of static files in this repository. A possible enhancement would be to support deploying fleetshard from a git branch/reference. This would require running a build of fleetshard during the installation process to generate the necessary YAML artifacts and images.

cc: @k-wall , @racheljpg

Provisioning Kafka instance fails

On the current tip of main (da3132b), I am experiencing an issue when I try to provision a Kafka instance using ./managed_kafka.sh --create gryan. It gets stuck in a provisioning/failed state, because the Kafka brokers can't successfully start. Here's an example log: https://gist.github.com/grdryn/cc3605b8a7f92b5145e061defcf161fb#file-kafka-0-log-L726..L727

In the mas-sso Keycloak, I see a lot of cases of the following exception, which may be related?

18:24:44,033 ERROR [org.jboss.as.controller.management-operation] (management I/O-2) WFLYCTL0013: Operation ("read-attribute") failed - address: ([
    ("subsystem" => "infinispan"),
    ("cache-container" => "keycloak"),
    ("cache" => "userRevisions")
]): org.jboss.msc.service.ServiceNotFoundException: Service service org.wildfly.clustering.infinispan.cache.keycloak.userRevisions not found
	at [email protected]//org.jboss.msc.service.ServiceContainerImpl.getRequiredService(ServiceContainerImpl.java:663)
	at [email protected]//org.jboss.as.controller.OperationContextImpl$OperationContextServiceRegistry.getRequiredService(OperationContextImpl.java:2293)
	at [email protected]//org.wildfly.clustering.service.ServiceSupplier$1.run(ServiceSupplier.java:54)
	at [email protected]//org.wildfly.clustering.service.ServiceSupplier$1.run(ServiceSupplier.java:51)
	at [email protected]//org.wildfly.clustering.service.PrivilegedActionSupplier.get(PrivilegedActionSupplier.java:37)
	at [email protected]//org.wildfly.clustering.service.ServiceSupplier.get(ServiceSupplier.java:67)
	at [email protected]//org.jboss.as.clustering.infinispan.subsystem.CacheMetricExecutor.execute(CacheMetricExecutor.java:53)
	at [email protected]//org.jboss.as.clustering.infinispan.subsystem.CacheMetricExecutor.execute(CacheMetricExecutor.java:38)
	at [email protected]//org.jboss.as.clustering.controller.MetricHandler.executeRuntimeStep(MetricHandler.java:75)
	at [email protected]//org.jboss.as.controller.AbstractRuntimeOnlyHandler$1.execute(AbstractRuntimeOnlyHandler.java:59)
	at [email protected]//org.jboss.as.controller.AbstractOperationContext.executeStep(AbstractOperationContext.java:1006)
	at [email protected]//org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:743)
	at [email protected]//org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:467)
	at [email protected]//org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1423)
	at [email protected]//org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelControllerImpl.java:446)
	at [email protected]//org.jboss.as.controller.ModelControllerImpl.lambda$executeForResponse$0(ModelControllerImpl.java:257)
	at [email protected]//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:289)
	at [email protected]//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:255)
	at [email protected]//org.jboss.as.controller.ModelControllerImpl.executeForResponse(ModelControllerImpl.java:257)
	at [email protected]//org.jboss.as.controller.ModelControllerImpl.executeOperation(ModelControllerImpl.java:251)
	at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient.executeInModelControllerCl(ModelControllerClientFactoryImpl.java:275)
	at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient.access$400(ModelControllerClientFactoryImpl.java:126)
	at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient$1.run(ModelControllerClientFactoryImpl.java:168)
	at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient$1.run(ModelControllerClientFactoryImpl.java:163)
	at [email protected]//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:289)
	at [email protected]//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:255)
	at [email protected]//org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:198)
	at [email protected]//org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:175)
	at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient.executeOperation(ModelControllerClientFactoryImpl.java:163)
	at [email protected]//org.jboss.as.controller.LocalModelControllerClient.execute(LocalModelControllerClient.java:54)
	at [email protected]//org.jboss.as.controller.LocalModelControllerClient.execute(LocalModelControllerClient.java:39)
	at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricCollector.readAttributeValue(MetricCollector.java:331)
	at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricCollector.access$400(MetricCollector.java:74)
	at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricCollector$3.getValue(MetricCollector.java:205)
	at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricCollector$3.getValue(MetricCollector.java:202)
	at io.smallrye.metrics//io.smallrye.metrics.exporters.OpenMetricsExporter.createSimpleValueLine(OpenMetricsExporter.java:492)
	at io.smallrye.metrics//io.smallrye.metrics.exporters.OpenMetricsExporter.exposeEntries(OpenMetricsExporter.java:192)
	at io.smallrye.metrics//io.smallrye.metrics.exporters.OpenMetricsExporter.getEntriesForScope(OpenMetricsExporter.java:158)
	at io.smallrye.metrics//io.smallrye.metrics.exporters.OpenMetricsExporter.exportAllScopes(OpenMetricsExporter.java:109)
	at io.smallrye.metrics//io.smallrye.metrics.MetricsRequestHandler.handleRequest(MetricsRequestHandler.java:116)
	at io.smallrye.metrics//io.smallrye.metrics.MetricsRequestHandler.handleRequest(MetricsRequestHandler.java:73)
	at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricsContextService$1.handleRequest(MetricsContextService.java:81)
	at [email protected]//org.jboss.as.domain.http.server.security.RealmReadinessHandler.handleRequest(RealmReadinessHandler.java:51)
	at [email protected]//org.jboss.as.domain.http.server.security.ServerErrorReadinessHandler.handleRequest(ServerErrorReadinessHandler.java:35)
	at [email protected]//io.undertow.server.handlers.PathHandler.handleRequest(PathHandler.java:91)
	at [email protected]//io.undertow.server.handlers.ChannelUpgradeHandler.handleRequest(ChannelUpgradeHandler.java:211)
	at [email protected]//io.undertow.server.handlers.cache.CacheHandler.handleRequest(CacheHandler.java:92)
	at [email protected]//io.undertow.server.handlers.error.SimpleErrorPageHandler.handleRequest(SimpleErrorPageHandler.java:78)
	at [email protected]//io.undertow.server.handlers.CanonicalPathHandler.handleRequest(CanonicalPathHandler.java:49)
	at [email protected]//org.jboss.as.domain.http.server.ManagementHttpRequestHandler.handleRequest(ManagementHttpRequestHandler.java:57)
	at [email protected]//org.jboss.as.domain.http.server.cors.CorsHttpHandler.handleRequest(CorsHttpHandler.java:75)
	at [email protected]//org.jboss.as.domain.http.server.ManagementHttpServer$UpgradeFixHandler.handleRequest(ManagementHttpServer.java:717)
	at [email protected]//io.undertow.server.Connectors.executeRootHandler(Connectors.java:390)
	at [email protected]//io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:255)
	at [email protected]//io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
	at [email protected]//io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:59)
	at [email protected]//org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
	at [email protected]//org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
	at [email protected]//org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89)
	at [email protected]//org.xnio.nio.WorkerThread.run(WorkerThread.java:591)

metrics not available through fleet-manager endpoint

Attempts to read metrics currently fail within the fleet-manager. This occurs because there is no observability component in the environment that kas-installer installs.

Internally, fleet-manager fails like this:

E0715 10:17:40.368875       1 api.go:262] error from metric Post "/api/metrics/v1/test/api/v1/query": can't request metrics without auth
E0715 10:17:40.368900       1 metrics.go:55] error getting metrics: KAFKAS-MGMT-9: failed to retrieve metrics

and returns 500 back to the caller.

MANAGEDKAFKA_ADMINSERVER_EDGE_TLS_ENABLED broken

The intent of MANAGEDKAFKA_ADMINSERVER_EDGE_TLS_ENABLED was to have the admin server run behind an HTTPS route with edge termination. This feature seems to have become broken.

I've not looked why.

You can work around it by setting the same env var on the fleetshard subscription.
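That workaround can be expressed as an OLM Subscription override, which injects environment variables into the operator deployment; a sketch with an assumed subscription name and namespace-less metadata:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kas-fleetshard-operator   # subscription name assumed
spec:
  config:
    env:
      - name: MANAGEDKAFKA_ADMINSERVER_EDGE_TLS_ENABLED
        value: "true"
```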

Changes to support provisioning a Kafka on a dedicated cluster a.k.a enterprise data plane cluster

Allow configuring KFM in dynamic scaling mode

KFM can be configured in dynamic scaling mode, i.e. the scaling mode is auto, as opposed to the current manual mode, which requires an already provisioned data plane cluster.
When KFM is running in auto mode:

  • the OCM credentials are needed
  • KFM will auto provision needed data plane clusters and terraform them
  • KFM will delete all empty clusters, provided at least one data plane cluster remains in the region

See https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/main/docs/architecture/data-plane-osd-cluster-dynamic-scaling.md for the different logic.

The scale up/down and various other configurations can be controlled via configuration knobs explained in https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/main/config/dynamic-scaling-configuration.yaml
