redkubes / otomi-core

Application Platform for Kubernetes

Home Page: https://otomi.io

License: Apache License 2.0

Shell 3.93% Smarty 19.94% Dockerfile 0.62% Open Policy Agent 4.03% Mustache 53.58% Python 0.42% TypeScript 16.69% JavaScript 0.73% Makefile 0.06%
kubernetes paas developer-selfservice devops self-hosted gitops platform-engineering

otomi-core's Introduction

Website

This website is built using Docusaurus 2, a modern static website generator.

Installation

$ yarn

Local Development

$ yarn start

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

Build

$ yarn build

This command generates static content into the build directory, which can be served using any static content hosting service.

Deployment

Using SSH:

$ USE_SSH=true yarn deploy

Not using SSH:

$ GIT_USER=<Your GitHub username> yarn deploy

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push it to the gh-pages branch.

otomi-core's People

Contributors

0-sv, ani1357, bartusz01, ben10k, caslubbers, dennisvankekem, dependabot[bot], diabhey, dunky13, eldermatt, ferruhcihan, githubcdr, j-zimnowoda, k7o, k8sbee, leiarenee, martijncalker, merll, mojtabaimani, morriz, oshah97, panpan0000, rawc0der, renovate-bot, renovate[bot], srodenhuis, staticvoid255, tre7roja


otomi-core's Issues

create post-deploy script

Create a post-deploy script to restart certain pods (preferably only when conditions match, such as when a team is added); a pipeline-step sketch follows the list:

  • istio-pilot.istio-system: done in drone pipeline
  • loki-0.monitoring (investigate if it's needed)
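
A minimal sketch of what such a post-deploy step could look like as a Drone pipeline step; the step image, deployment names and conditions below are illustrative, not the actual pipeline:

- name: post-deploy-restart
  image: bitnami/kubectl        # illustrative image
  commands:
    # restart istio-pilot so it picks up newly added team namespaces
    - kubectl -n istio-system rollout restart deployment istio-pilot
    # restart loki-0 only if it turns out to be needed
    - kubectl -n monitoring delete pod loki-0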

Template to deploy WordPress container with MySQL database

Add a template to the template repository to create a WordPress setup with (a storage-tier sketch follows the list):

  1. a WordPress image (multiple versions)
  2. a MySQL database (with persistent storage)
  3. persistent file storage (multiple tiers)
  • fast (SSD)
  • regular (HDD)
  4. backup plan (file and db)
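
For the storage tiers, a rough sketch of what the two options could map to; the StorageClass names and GCE provisioner parameters below are illustrative (other clouds would use their own provisioner):

# Illustrative StorageClasses for the two file-storage tiers
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast            # SSD tier
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regular         # HDD tier
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard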

Add / support Promitor (Azure)

In our AKS setup we utilise an App Gateway in combination with a WAF feature. BCT has asked if logs from the WAF can be shown in a Grafana dashboard.

Maybe https://promitor.io/ can be a solution. Can we do a small test to see if we can get relevant logs/metrics out of Azure into a Grafana dashboard? We would like to have a single pane of glass for all metrics/logs.

Customer CRD

Should reflect what a customer has chosen (a hypothetical instance sketch follows the list):

  • one or more projects (results in namespaces)
  • shared services (in the {custId}-shared namespace)
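
A hypothetical example of what an instance could look like; the API group, kind and field names below are assumptions, not an existing API:

# Hypothetical Customer resource, for illustration only
apiVersion: otomi.io/v1alpha1
kind: Customer
metadata:
  name: acme
spec:
  projects:            # each project results in a namespace
    - frontend
    - backend
  sharedServices:      # deployed in the acme-shared namespace
    - redis
    - rabbitmq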

Upgrade istio

Upgrade Istio with (a values sketch follows the list):

  1. SDS for Azure (Azure needs a feature flag to be able to use secret projection)
  2. full mTLS: figure out what is currently not working correctly with mTLS
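
A sketch of the relevant Helm values, assuming the Istio 1.x chart layout; the exact keys differ per Istio version, so treat this as indicative only:

# Illustrative Istio 1.x Helm values
global:
  sds:
    enabled: true      # secret discovery service (needs the projected token feature on AKS)
  mtls:
    enabled: true      # enforce mesh-wide mTLS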

Slack tuning

We currently get too many Slack alerts; here is how we can reduce them (an Alertmanager routing sketch follows the lists):

CRIT:

  • Kubelet down: do we need to monitor this? Is it only an active rule? Can we disable this?

NON CRIT:

  • KubeQuotaExceeded: the limit can be removed once we have tuned OPA to not allow teams to edit their quota (see #19)
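
One way to cut the noise is in the Alertmanager routing; a minimal sketch, assuming the alert names above and hypothetical receiver names:

# Illustrative Alertmanager routes
route:
  receiver: slack-crit
  routes:
    - match:
        alertname: KubeletDown
      receiver: 'null'            # silence until we decide we need it
    - match:
        alertname: KubeQuotaExceeded
      receiver: slack-non-crit    # route to the non-critical channel
receivers:
  - name: 'null'
  - name: slack-crit
  - name: slack-non-crit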

Containers in external-dns and cert-manager-cainjector pods do not start

Describe the bug
After deploying otomi-stack on GKE cluster:

➜  otomi-stack git:(master) ✗ ki get po                                                                                                                                                                                  (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
NAME                                             READY   STATUS             RESTARTS   AGE
cert-manager-795c889b5d-dxjlp                    1/1     Running            0          15m
cert-manager-cainjector-84565c968b-tvssk         0/2     CrashLoopBackOff   7          15m
external-dns-54687bdf76-gvs2x                    0/2     CrashLoopBackOff   6          15m
nginx-ingress-controller-55cd9d6867-j9fck        1/1     Running            0          15m
nginx-ingress-default-backend-67bfdcffcc-7jr8k   1/1     Running            0          15m
➜  otomi-stack git:(master) ✗ ki logs external-dns-54687bdf76-ngssv -c external-dns                                                                                                                                      (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
time="2020-02-26T13:28:04Z" level=info msg="config: {Master: KubeConfig: RequestTimeout:30s IstioIngressGatewayServices:[istio-system/istio-ingressgateway] Sources:[ingress] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false Compatibility: PublishInternal:false PublishHostIP:false ConnectorSourceServer:localhost:8080 Provider:google GoogleProject:otomi-cloud DomainFilter:[otomi.cloud] ExcludeDomains:[] ZoneIDFilter:[otomi] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false CloudflareZonesPerPage:50 RcodezeroTXTEncrypt:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: InfobloxMaxResults:0 DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix: Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:info TXTCacheInterval:0s ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] CFAPIEndpoint: CFUsername: CFPassword: RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false NS1Endpoint: NS1IgnoreSSL:false TransIPAccountName: TransIPPrivateKeyFile:}"
time="2020-02-26T13:28:04Z" level=info msg="Created Kubernetes client https://10.64.0.1:443"
time="2020-02-26T13:29:04Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"
➜  otomi-stack git:(master) ✗ ki logs cert-manager-cainjector-84565c968b-scnt7 -c cert-manager                                                                                                                           (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
I0226 13:32:14.717843       1 start.go:82] starting ca-injector v0.12.0 (revision 0e384f5d0)
E0226 13:32:14.719970       1 manager.go:238] cert-manager/controller-runtime/manager "msg"="Failed to get API Group-Resources" "error"="Get https://10.64.0.1:443/api?timeout=32s: dial tcp 10.64.0.1:443: connect: connection refused"
F0226 13:32:14.719999       1 start.go:118] error creating manager: Get https://10.64.0.1:443/api?timeout=32s: dial tcp 10.64.0.1:443: connect: connection refused

Additional context
Killing pods does not help

Team namespace: knative services

System input:

  • docker image info
  • https domain
  • certArn?

System output (a sketch of the resulting Knative Service follows the list):

  • running app on domain
  • cert generated by cert-manager? (if not certArn)
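
A minimal sketch of the Knative Service that could be generated from the input above; namespace, image and port are illustrative:

# Illustrative generated Knative Service
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: team-demo
spec:
  template:
    spec:
      containers:
        - image: eu.gcr.io/some-project/hello:1.0.0   # from the docker image info
          ports:
            - containerPort: 8080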

network policies in charts/team-ns

We need to limit the following for team namespaces (a NetworkPolicy sketch follows the list):

  • no cross namespace traffic except to "shared" namespace
  • disable all egress, but only if a more specific egress rule for apps can be deployed later, resulting in allowed egress to targets specified in a service config's egressTargets
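
A sketch of what the ingress side of this could look like per team namespace, assuming namespaces carry a label such as name (the label key is an assumption):

# Illustrative default policy for a team namespace: only allow traffic
# from the namespace itself and from the "shared" namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
spec:
  podSelector: {}              # applies to all pods in the team namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}      # same-namespace traffic
        - namespaceSelector:
            matchLabels:
              name: shared     # traffic from the shared namespace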

App CRD

An App CRD which results in:

  1. Autoscaled knative service from docker image / git repo
  2. Stateful service connection from Service Catalog (redis, mongo, mysql)

TODO:

  1. Research & choose: OAM (with dapr & rudr) or CNAB?

Enhance OPA policies

We want our OPA policies to also limit access to the following resources:

  • istio stuff
  • ResourceQuota
  • ?

Additional otomi-stack tests

Pipeline tests for dev cluster only:

  1. lint (already there)
  2. diff

Add additionalConfigs target in prometheus-operator for prom-blackbox-exporter
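
Assuming this refers to the additionalScrapeConfigs value of the prometheus-operator chart, a blackbox probe job could look roughly like this (exporter service name and probed targets are illustrative):

# Illustrative prometheus-operator values snippet
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: blackbox
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          - targets:
              - https://otomi.io                 # endpoints to probe
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prometheus-blackbox-exporter:9115   # the exporter service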

Team namespace

  • ingress
  • network policies
  • opa policies
  • istio authorization policies
  • knative services (if container given)

otomi-stack-api can be deployed via helm chart

The Pod configuration (a spec sketch follows):

  1. has an initContainer that:
  • pulls the git repo with otomi-stack and stores it in an EmptyDir volume
  • prepares .kube and stores it in an EmptyDir volume
  2. The otomi-stack-api container should:
  • mount the otomi-stack volume
  • mount the kube volume
  • use env from a ConfigMap

The ConfigMap:

  • PORT
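
A rough sketch of the pod spec this describes; image names, repo URL and mount paths are illustrative:

# Illustrative pod spec for otomi-stack-api
spec:
  volumes:
    - name: otomi-stack
      emptyDir: {}
    - name: kube
      emptyDir: {}
  initContainers:
    - name: init
      image: alpine/git                 # illustrative image
      command: [sh, -c]
      # clones the otomi-stack repo into the shared volume; preparing /kube/config is omitted here
      args: ['git clone https://github.com/redkubes/otomi-stack.git /otomi-stack']
      volumeMounts:
        - name: otomi-stack
          mountPath: /otomi-stack
        - name: kube
          mountPath: /kube
  containers:
    - name: otomi-stack-api
      image: otomi/api                  # illustrative image
      envFrom:
        - configMapRef:
            name: otomi-stack-api       # provides PORT
      volumeMounts:
        - name: otomi-stack
          mountPath: /otomi-stack
        - name: kube
          mountPath: /home/app/.kube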

Drone pods stuck in terminating state

Describe the bug

➜  otomi-stack git:(master) ✗ kap                                                                                                                                                     (⎈ otomi-aks-dev-admin:default)
NAMESPACE           NAME                                                      READY   STATUS        RESTARTS   AGE
drone-pipelines     drone-12ny47i784r9wmssru66                                4/7     Terminating   2          8d
drone-pipelines     drone-cn0falbrifj4kanetbk1                                3/7     Terminating   2          8d
drone-pipelines     drone-hlj1pvowss6eyfd1ha5t                                4/7     Terminating   2          8d
drone-pipelines     drone-kql4jpfqxnvb8s2cxz3q                                3/7     Terminating   2          8d
drone-pipelines     drone-o3uwwjm5z0009g8gxcsy                                4/7     Terminating   2          8d
drone-pipelines     drone-ov3c9pc8ynykr4atoi4p                                4/7     Terminating   2          8d
drone-pipelines     drone-ytz1e3ow1w00guko80k5                                3/7     Terminating   2          8d

Add cluster auto scaler to be used on AKS (Azure)

It seems the cluster auto scaling feature for AKS (still in preview) is not working. We need to make sure the cluster auto scaler add-on can be used instead.

Also test the auto-scaling feature on both AKS and EKS

Drone leaves unterminated pods

➜  istio-operator k -n drone-pipelines get po --show-labels                                                                                                                                                 (⎈ aks-elemenz-ota-admin:default)
NAME                         READY   STATUS        RESTARTS   AGE    LABELS
drone-1eff6bd9z1lz3q3bur1v   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=7,io.drone.name=drone-1eff6bd9z1lz3q3bur1v,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-4amhypaz59acbew602xn   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=6,io.drone.name=drone-4amhypaz59acbew602xn,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-4ozf776086btrsy16oj3   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=4,io.drone.name=drone-4ozf776086btrsy16oj3,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-9kv76nkffsywtlvx6t2k   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=3,io.drone.name=drone-9kv76nkffsywtlvx6t2k,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-b8oys2tdykp77vuq1she   2/7     Terminating   4          5d5h   io.drone.build.event=push,io.drone.build.number=25,io.drone.name=drone-b8oys2tdykp77vuq1she,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-csmtayx4oi13u380qtni   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=2,io.drone.name=drone-csmtayx4oi13u380qtni,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-omtr6s7w48angst7h5g3   2/5     Terminating   2          19d    io.drone.build.event=push,io.drone.build.number=3,io.drone.name=drone-omtr6s7w48angst7h5g3,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-oz79r874kkw26z3k0qem   2/7     Terminating   4          9d     io.drone.build.event=push,io.drone.build.number=14,io.drone.name=drone-oz79r874kkw26z3k0qem,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-uab7pw1virhpaf6e1t1f   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=8,io.drone.name=drone-uab7pw1virhpaf6e1t1f,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true

Create Demo for Mediawiki application (with Knative)

For a (potential) customer, I would like to give a demo for the following use case:

  • they have a private Gitlab CI instance
  • they manage their own images (for MediaWiki instances) based on a MediaWiki base image
  • We demonstrate the deployment of a new app. This results in a Knative deployment and a URL to access the app is provided.
  • We demonstrate how to update (add plug-ins and PHP extensions) the image and redeploy the updated image
  • Requires a MySQL instance

Azure Monitor in Grafana

User story

As an Azure user, I want to use Azure Monitor, so I can see Azure related metrics and logs.


Acceptance criteria

  • View datasource in Grafana and see that Azure Monitor is there
  • View Dashboards in Grafana and see that the following dashboards are there:
    • azure monitor
    • azure appgw
    • azure mariadb
    • azure redis

Tasks


    • Add datasource and make configurable (a provisioning sketch follows)
    • Import dashboards into stack and make configurable
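
A sketch of the Grafana datasource provisioning entry this would need; the credential placeholders would have to be wired in from values/secrets:

# Illustrative Grafana datasource provisioning for Azure Monitor
apiVersion: 1
datasources:
  - name: Azure Monitor
    type: grafana-azure-monitor-datasource
    access: proxy
    jsonData:
      cloudName: azuremonitor
      tenantId: <tenant-id>
      clientId: <client-id>
      subscriptionId: <subscription-id>
    secureJsonData:
      clientSecret: <client-secret>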

Team dashboards

Features for team dashboards:

  • Not 'Service Index', but 'Team [$TEAM_NAME] dashboard' (when dashboard is for a specific team)
  • Not 'Service Index', but Cluster Admin Dashboard (when dashboard is for admins with full cluster scope)
  • Otomi Stack logo on page

Other:
Grafana should only show the dashboards relevant to the team (no cluster resources such as nodes, no k8s resources)

Unable to redeploy Drone

Reason:

Events:
  Type     Reason              Age   From                                         Message
  ----     ------              ----  ----                                         -------
  Normal   Scheduled           3m    default-scheduler                            Successfully assigned team-admin/drone-server-75bd9df5bc-bkwwq to aks-agentpool1-36062263-vmss000004
  Warning  FailedAttachVolume  3m    attachdetach-controller                      Multi-Attach error for volume "pvc-44c5170c-0995-495a-81eb-98f550c56da9" Volume is already used by pod(s) drone-server-78bfd656cc-sg4qx
  Warning  FailedMount         57s   kubelet, aks-agentpool1-36062263-vmss000004  Unable to mount volumes for pod "drone-server-75bd9df5bc-bkwwq_team-admin(648cf00b-92f0-4bd2-bcac-7bb2f8409b52)": timeout expired waiting for volumes to attach or mount for pod "team-admin"/"drone-server-75bd9df5bc-bkwwq". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-g7jdk istio-envoy istio-certs]

Workaround:

Find and delete the old drone ReplicaSet:

k -n team-admin get rs drone-<old-rs>

chart resource checkup & tuning

Most charts have preconfigured sane resource specifications, but we need to find out which don't have sane values, and which ones have none.

Team services will get a LimitRange as a fallback, but we really want to tune all of our own workloads.
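
For the LimitRange fallback mentioned above, a minimal sketch; the actual numbers are illustrative and would need tuning:

# Illustrative LimitRange applied per team namespace as a fallback
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
spec:
  limits:
    - type: Container
      default:             # default limits when a container sets none
        cpu: 200m
        memory: 256Mi
      defaultRequest:      # default requests when a container sets none
        cpu: 100m
        memory: 128Mi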

A service by default is not exposed to the public domain.

EXAMPLE

Add isExposed field:

teamConfig:
  teams:
    otomi:
      name: otomi
      services:
        - name: hello
          isPublic: false # does not need oauth2 sso
          isExposed: false # Service is not going to be exposed
          domain: custom.doma.in
          hasCert: true 
        - name: hello2
          isPublic: false # does not need oauth2 sso
          isExposed: true # service is going to be exposed 
          domain: custom.doma.in
          hasCert: true 

Adding this feature involves changes in the following files (a template sketch follows below):

  • conditionally add the service to nginx-ingress in charts/team-ns/templates/nginx-ingress.yaml
  • conditionally add the hosts field for the VirtualService in charts/team-ns/templates/istio-virtualservices.yaml
  • conditionally add the host to the ingressgateway in charts/team-ns/templates/istio-gateway.yaml

Moreover, a customer values migration will be needed for services that are already exposed!

Need to talk to @Morriz about above changes.
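
A sketch of the kind of Helm template conditional this would add in the team-ns chart; the value path and surrounding structure are assumptions:

# Illustrative conditional, e.g. in charts/team-ns/templates/istio-virtualservices.yaml
{{- range $s := .Values.services }}
{{- if $s.isExposed }}
    # ...render the host/ingress entry for this service...
    - {{ $s.domain }}
{{- end }}
{{- end }}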

Implement dashboards for admins

Admin:

Landing page: the service dashboard

Top menu with items:

  1. Team list:
  • list of teams, with create/edit/delete leading to

1.1 Team:

  • name
  • password (used for multi-tenancy proxies; should eventually be generated)
  • oidc details
  • base domain
  • list of services, create/edit/delete leading to

1.1.1 Team services:

  • name (used for url creation)
  • service toggle:
    • k8s svc (predeployed k8s service):
      • name
      • port
    • docker image:
      • location:
      • pull secret
      • semver to deploy automatically
  • domain
  • certArn

Add AAD Pod Identity (Azure) and Kube2IAM (AWS)

Both NS and BCT have asked us to give k8s applications access to specific cloud resources (databases) based on role-based access. To support this feature we need to integrate/support the following (a usage sketch follows):

AAD Pod Identity for Azure
Kube2IAM for AWS
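
For reference, the consumption side with Kube2IAM is just a pod annotation (the role ARN is illustrative); AAD Pod Identity works similarly via an AzureIdentityBinding plus a pod label:

# Illustrative Kube2IAM usage: the pod assumes the annotated IAM role
apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/team-db-access
spec:
  containers:
    - name: app
      image: nginx   # illustrative image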

Execution of hfp command sometimes fails with error

Describe the bug
Execution of hfp command sometimes fails with error

Get https://34.90.25.94/api/v1/namespaces/tillerless/secrets?labelSelector=NAME%3Dweave-scope%2COWNER%3DTILLER: error executing access token command "/Users/jehoszafatzimnowoda/google-cloud-sdk/bin/gcloud config config-helper --format=json": err=fork/exec /Users/jehoszafatzimnowoda/google-cloud-sdk/bin/gcloud: no such file or directory output= stderr=

It might be related to an old token stored in ~/.kube/config, since I am able to fix it by calling the kap command.

To Reproduce
It usually happens if I have not used the command for about an hour.

Expected behavior
It always works :)

The create-gke-cluster script has a dependency on a directory that I mounted from the otomi-values repo

➜  otomi-stack git:(master) ✗ ./bin/create-gke-cluster.sh                                                                                                  (⎈ gks_otomi-cloud_europe-west4_otomi-gke-dev:default)
bin/env.sh: line 5: cd: /Users/jehoszafatzimnowoda/workspace/otomi/otomi-stack/env/env: No such file or directory
ERROR: The value of CLOUD env must be one of the following: bin charts helmfile.d helmfile.tpl k8s test tests tools values
bin/env.sh: line 21: /Users/jehoszafatzimnowoda/workspace/otomi/otomi-stack/env/env/google/dev.sh: No such file or directory
WARNING: From 1.14, legacy Stackdriver GKE logging is deprecated. Thus, flag `--enable-cloud-logging` is also deprecated. Please use `--enable-stackdriver-kubernetes` instead, to migrate to new Stackdriver Kubernetes Engine monitoring and logging. For more details, please read: https://cloud.google.com/monitoring/kubernetes-engine/migration.
WARNING: From 1.14, legacy Stackdriver GKE monitoring is deprecated. Thus, flag `--enable-cloud-monitoring` is also deprecated. Please use `--enable-stackdriver-kubernetes` instead, to migrate to new Stackdriver Kubernetes Engine monitoring and logging. For more details, please read: https://cloud.google.com/monitoring/kubernetes-engine/migration.
ERROR: (gcloud.container.clusters.create) could not parse resource []
ERROR: (gcloud.container.clusters.get-credentials) argument --region: expected one argument
Usage: gcloud container clusters get-credentials NAME [optional flags]
  optional flags may be  --help | --internal-ip | --region | --zone

For detailed information on this command and its flags, run:
  gcloud container clusters get-credentials --help


Cannot redeploy drone

Events:
  Type     Reason              Age    From                                         Message
  ----     ------              ----   ----                                         -------
  Normal   Scheduled           3m24s  default-scheduler                            Successfully assigned team-admin/drone-server-78bfd656cc-bdl7f to aks-agentpool1-23650041-vmss000004
  Warning  FailedAttachVolume  3m24s  attachdetach-controller                      Multi-Attach error for volume "pvc-c4ce7cc3-25c4-4129-b8bd-66e3a674c0bc" Volume is already used by pod(s) drone-server-5cdf564c56-vtl8p
  Warning  FailedMount         81s    kubelet, aks-agentpool1-23650041-vmss000004  Unable to mount volumes for pod "drone-server-78bfd656cc-bdl7f_team-admin(491871ef-fab4-4bcd-9925-7b323ae764d3)": timeout expired waiting for volumes to attach or mount for pod "team-admin"/"drone-server-78bfd656cc-bdl7f". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-gw7cn istio-envoy istio-certs]

The prometheus-operator-prometheus-node-exporter resource limits change with each deployment

monitoring, prometheus-operator-prometheus-node-exporter, DaemonSet (apps) has changed:
  # Source: prometheus-operator/charts/prometheus-node-exporter/templates/daemonset.yaml
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: prometheus-operator-prometheus-node-exporter
    namespace: monitoring
    labels:     
      app: prometheus-node-exporter
      heritage: Helm
      release: prometheus-operator
      chart: prometheus-node-exporter-1.8.1
      jobLabel: node-exporter
  spec:
    selector:
      matchLabels:
        app: prometheus-node-exporter
        release: prometheus-operator
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
    template:
      metadata:
        labels:         
          app: prometheus-node-exporter
          heritage: Helm
          release: prometheus-operator
          chart: prometheus-node-exporter-1.8.1
          jobLabel: node-exporter
      spec:
        serviceAccountName: prometheus-operator-prometheus-node-exporter
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        containers:
          - name: node-exporter
            image: "quay.io/prometheus/node-exporter:v0.18.1"
            imagePullPolicy: IfNotPresent
            args:
              - --path.procfs=/host/proc
              - --path.sysfs=/host/sys
              - --web.listen-address=0.0.0.0:9100
              - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
              - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
            ports:
              - name: metrics
                containerPort: 9100
                protocol: TCP
            livenessProbe:
              httpGet:
                path: /
                port: 9100
            readinessProbe:
              httpGet:
                path: /
                port: 9100
            resources:
-             limits:
-               cpu: 200m
-               memory: 50Mi
-             requests:
-               cpu: 100m
-               memory: 30Mi
+             {}
            volumeMounts:
              - name: proc
                mountPath: /host/proc
                readOnly:  true
              - name: sys
                mountPath: /host/sys
                readOnly: true
        hostNetwork: true
        hostPID: true
        tolerations:
          - effect: NoSchedule
            operator: Exists
        volumes:
          - name: proc
            hostPath:
              path: /proc
          - name: sys
            hostPath:
              path: /sys

Add online form to admin dashboard to submit support tickets

When a customer uses otomi-stack, they will always get support. We can provide a form, automatically configured with the correct customer information (customer, support level, cluster name, et cetera), that can be used to submit tickets to us.

Analysis

  • Find out connectivity with zoho desk

A customer can upgrade otomi-stack

Prerequisites:

  • A customer has its own repo that contains only values
  • values has appVersion
  • FluxCD with Redis backend (link)

Upgrade scenario:

  • an otomi-stack version is released as a docker image
  • the scanner in otomi-api detects it and patches drone with the new STACK_VERSION env var; a Redis client listens (pub-sub) to new image deployments detected by FluxCD
  • the pipeline needs to be changed to use the new STACK_VERSION, and the next values commit will trigger the pipeline to deploy the new stack; potentially an otomi-api is also released as a new docker image
  • drone pulls a new otomi-stack image
  • drone upgrades otomi-api
  • otomi-api sees that there is an appVersion mismatch and performs a values upgrade
  • drone deploys new stack after values upgrade
