redkubes / otomi-core

Application Platform for Kubernetes

Home Page: https://otomi.io

License: Apache License 2.0

Shell 3.93% Smarty 19.94% Dockerfile 0.62% Open Policy Agent 4.03% Mustache 53.58% Python 0.42% TypeScript 16.69% JavaScript 0.73% Makefile 0.06%
kubernetes paas developer-selfservice devops self-hosted gitops platform-engineering

otomi-core's Introduction

Website

This website is built using Docusaurus 2, a modern static website generator.

Installation

$ yarn

Local Development

$ yarn start

This command starts a local development server and opens up a browser window. Most changes are reflected live without having to restart the server.

Build

$ yarn build

This command generates static content into the build directory, which can be served using any static content hosting service.

Deployment

Using SSH:

$ USE_SSH=true yarn deploy

Not using SSH:

$ GIT_USER=<Your GitHub username> yarn deploy

If you are using GitHub Pages for hosting, this command is a convenient way to build the website and push it to the gh-pages branch.

otomi-core's People

Contributors

0-sv, ani1357, bartusz01, ben10k, caslubbers, dennisvankekem, dependabot[bot], diabhey, dunky13, eldermatt, ferruhcihan, githubcdr, j-zimnowoda, k7o, k8sbee, leiarenee, martijncalker, merll, mojtabaimani, morriz, oshah97, panpan0000, rawc0der, renovate-bot, renovate[bot], srodenhuis, staticvoid255, tre7roja


otomi-core's Issues

create post-deploy script

Create a post-deploy script to restart certain pods (preferably only when conditions match, such as when a team is added); a pipeline-step sketch follows the list:

  • istio-pilot.istio-system: done in drone pipeline
  • loki-0.monitoring (investigate if it's needed)
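
A minimal sketch of what such a post-deploy step could look like as a Drone pipeline step; the step image, deployment names and conditions below are illustrative, not the actual pipeline:

- name: post-deploy-restart
  image: bitnami/kubectl        # illustrative image
  commands:
    # restart istio-pilot so it picks up newly added team namespaces
    - kubectl -n istio-system rollout restart deployment istio-pilot
    # restart loki-0 only if it turns out to be needed
    - kubectl -n monitoring delete pod loki-0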

Template to deploy WordPress container with MySQL database

Add a template to the template repository to create a WordPress setup with (a storage-tier sketch follows the list):

  1. a WordPress image (multiple versions)
  2. a MySQL database (with persistent storage)
  3. persistent file storage (multiple tiers)
  • fast (SSD)
  • regular (HDD)
  4. backup plan (file and db)
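
For the storage tiers, a rough sketch of what the two options could map to; the StorageClass names and GCE provisioner parameters below are illustrative (other clouds would use their own provisioner):

# Illustrative StorageClasses for the two file-storage tiers
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast            # SSD tier
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regular         # HDD tier
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard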

Add / support Promitor (Azure)

In our AKS setup we utilise an App Gateway in combination with a WAF feature. BCT has asked if logs from the WAF can be shown in a Grafana dashboard.

Maybe https://promitor.io/ can be a solution. Can we do a small test to see if we can get relevant logs/metrics out of Azure into a Grafana dashboard? We would like to have a single pane of glass for all metrics/logs.

Customer CRD

Should reflect what a customer has chosen (a hypothetical instance sketch follows the list):

  • one or more projects (results in namespaces)
  • shared services (in the {custId}-shared namespace)
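
A hypothetical example of what an instance could look like; the API group, kind and field names below are assumptions, not an existing API:

# Hypothetical Customer resource, for illustration only
apiVersion: otomi.io/v1alpha1
kind: Customer
metadata:
  name: acme
spec:
  projects:            # each project results in a namespace
    - frontend
    - backend
  sharedServices:      # deployed in the acme-shared namespace
    - redis
    - rabbitmq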

Upgrade istio

Upgrade Istio with (a values sketch follows the list):

  1. SDS for Azure (Azure needs a feature flag to be able to use secret projection)
  2. full mTLS: figure out what is currently not working correctly with mTLS
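
A sketch of the relevant Helm values, assuming the Istio 1.x chart layout; the exact keys differ per Istio version, so treat this as indicative only:

# Illustrative Istio 1.x Helm values
global:
  sds:
    enabled: true      # secret discovery service (needs the projected token feature on AKS)
  mtls:
    enabled: true      # enforce mesh-wide mTLS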

Slack tuning

We currently get too many Slack alerts; here is how we can reduce them (an Alertmanager routing sketch follows the lists):

CRIT:

  • Kubelet down: do we need to monitor this? Is it only an active rule? Can we disable this?

NON CRIT:

  • KubeQuotaExceeded: the limit can be removed once we have tuned OPA to not allow teams to edit their quota (see #19)
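
One way to cut the noise is in the Alertmanager routing; a minimal sketch, assuming the alert names above and hypothetical receiver names:

# Illustrative Alertmanager routes
route:
  receiver: slack-crit
  routes:
    - match:
        alertname: KubeletDown
      receiver: 'null'            # silence until we decide we need it
    - match:
        alertname: KubeQuotaExceeded
      receiver: slack-non-crit    # route to the non-critical channel
receivers:
  - name: 'null'
  - name: slack-crit
  - name: slack-non-crit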

Containers in external-dns and cert-manager-cainjector pods do not start

Describe the bug
After deploying otomi-stack on GKE cluster:

➜  otomi-stack git:(master) ✗ ki get po                                                                                                                                                                                  (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
NAME                                             READY   STATUS             RESTARTS   AGE
cert-manager-795c889b5d-dxjlp                    1/1     Running            0          15m
cert-manager-cainjector-84565c968b-tvssk         0/2     CrashLoopBackOff   7          15m
external-dns-54687bdf76-gvs2x                    0/2     CrashLoopBackOff   6          15m
nginx-ingress-controller-55cd9d6867-j9fck        1/1     Running            0          15m
nginx-ingress-default-backend-67bfdcffcc-7jr8k   1/1     Running            0          15m
➜  otomi-stack git:(master) ✗ ki logs external-dns-54687bdf76-ngssv -c external-dns                                                                                                                                      (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
time="2020-02-26T13:28:04Z" level=info msg="config: {Master: KubeConfig: RequestTimeout:30s IstioIngressGatewayServices:[istio-system/istio-ingressgateway] Sources:[ingress] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false Compatibility: PublishInternal:false PublishHostIP:false ConnectorSourceServer:localhost:8080 Provider:google GoogleProject:otomi-cloud DomainFilter:[otomi.cloud] ExcludeDomains:[] ZoneIDFilter:[otomi] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false CloudflareZonesPerPage:50 RcodezeroTXTEncrypt:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: InfobloxMaxResults:0 DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix: Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:info TXTCacheInterval:0s ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] CFAPIEndpoint: CFUsername: CFPassword: RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false NS1Endpoint: NS1IgnoreSSL:false TransIPAccountName: TransIPPrivateKeyFile:}"
time="2020-02-26T13:28:04Z" level=info msg="Created Kubernetes client https://10.64.0.1:443"
time="2020-02-26T13:29:04Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"
➜  otomi-stack git:(master) ✗ ki logs cert-manager-cainjector-84565c968b-scnt7 -c cert-manager                                                                                                                           (⎈ gke_otomi-cloud_europe-west4_otomi-gke-dev:default)
I0226 13:32:14.717843       1 start.go:82] starting ca-injector v0.12.0 (revision 0e384f5d0)
E0226 13:32:14.719970       1 manager.go:238] cert-manager/controller-runtime/manager "msg"="Failed to get API Group-Resources" "error"="Get https://10.64.0.1:443/api?timeout=32s: dial tcp 10.64.0.1:443: connect: connection refused"
F0226 13:32:14.719999       1 start.go:118] error creating manager: Get https://10.64.0.1:443/api?timeout=32s: dial tcp 10.64.0.1:443: connect: connection refused

Additional context
Killing pods does not help

Team namespace: knative services

System input:

  • docker image info
  • https domain
  • certArn?

System output (a sketch of the resulting Knative Service follows the list):

  • running app on domain
  • cert generated by cert-manager? (if not certArn)
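
A minimal sketch of the Knative Service that could be generated from the input above; namespace, image and port are illustrative:

# Illustrative generated Knative Service
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: team-demo
spec:
  template:
    spec:
      containers:
        - image: eu.gcr.io/some-project/hello:1.0.0   # from the docker image info
          ports:
            - containerPort: 8080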

network policies in charts/team-ns

We need to limit the following for team namespaces (a NetworkPolicy sketch follows the list):

  • no cross namespace traffic except to "shared" namespace
  • disable all egress, but only if a more specific egress rule for apps can be deployed later, resulting in allowed egress to targets specified in a service config's egressTargets
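
A sketch of what the ingress side of this could look like per team namespace, assuming namespaces carry a label such as name (the label key is an assumption):

# Illustrative default policy for a team namespace: only allow traffic
# from the namespace itself and from the "shared" namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
spec:
  podSelector: {}              # applies to all pods in the team namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}      # same-namespace traffic
        - namespaceSelector:
            matchLabels:
              name: shared     # traffic from the shared namespace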

App CRD

An App CRD which results in:

  1. Autoscaled knative service from docker image / git repo
  2. Stateful service connection from Service Catalog (redis, mongo, mysql)

TODO:

  1. Research & choose: OAM (with dapr & rudr) or CNAB?

Enhance OPA policies

We want our OPA policies to also limit access to the following resources:

  • istio stuff
  • ResourceQuota
  • ?

Additional otomi-stack tests

Pipeline tests for dev cluster only:

  1. lint (already there)
  2. diff

Add additionalConfigs target in prometheus-operator for prom-blackbox-exporter
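
Assuming this refers to the additionalScrapeConfigs value of the prometheus-operator chart, a blackbox probe job could look roughly like this (exporter service name and probed targets are illustrative):

# Illustrative prometheus-operator values snippet
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: blackbox
        metrics_path: /probe
        params:
          module: [http_2xx]
        static_configs:
          - targets:
              - https://otomi.io                 # endpoints to probe
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prometheus-blackbox-exporter:9115   # the exporter service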

Team namespace

  • ingress
  • network policies
  • opa policies
  • istio authorization policies
  • knative services (if container given)

otomi-stack-api can be deployed via helm chart

The Pod configuration (a spec sketch follows):

  1. has an initContainer that:
  • pulls the git repo with otomi-stack and stores it in an EmptyDir volume
  • prepares .kube and stores it in an EmptyDir volume
  2. The otomi-stack-api container should:
  • mount the otomi-stack volume
  • mount the kube volume
  • use env from a ConfigMap

The ConfigMap:

  • PORT
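
A rough sketch of the pod spec this describes; image names, repo URL and mount paths are illustrative:

# Illustrative pod spec for otomi-stack-api
spec:
  volumes:
    - name: otomi-stack
      emptyDir: {}
    - name: kube
      emptyDir: {}
  initContainers:
    - name: init
      image: alpine/git                 # illustrative image
      command: [sh, -c]
      # clones the otomi-stack repo into the shared volume; preparing /kube/config is omitted here
      args: ['git clone https://github.com/redkubes/otomi-stack.git /otomi-stack']
      volumeMounts:
        - name: otomi-stack
          mountPath: /otomi-stack
        - name: kube
          mountPath: /kube
  containers:
    - name: otomi-stack-api
      image: otomi/api                  # illustrative image
      envFrom:
        - configMapRef:
            name: otomi-stack-api       # provides PORT
      volumeMounts:
        - name: otomi-stack
          mountPath: /otomi-stack
        - name: kube
          mountPath: /home/app/.kube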

Drone pods stuck in terminating state

Describe the bug

➜  otomi-stack git:(master) ✗ kap                                                                                                                                                     (⎈ otomi-aks-dev-admin:default)
NAMESPACE           NAME                                                      READY   STATUS        RESTARTS   AGE
drone-pipelines     drone-12ny47i784r9wmssru66                                4/7     Terminating   2          8d
drone-pipelines     drone-cn0falbrifj4kanetbk1                                3/7     Terminating   2          8d
drone-pipelines     drone-hlj1pvowss6eyfd1ha5t                                4/7     Terminating   2          8d
drone-pipelines     drone-kql4jpfqxnvb8s2cxz3q                                3/7     Terminating   2          8d
drone-pipelines     drone-o3uwwjm5z0009g8gxcsy                                4/7     Terminating   2          8d
drone-pipelines     drone-ov3c9pc8ynykr4atoi4p                                4/7     Terminating   2          8d
drone-pipelines     drone-ytz1e3ow1w00guko80k5                                3/7     Terminating   2          8d

Add cluster auto scaler to be used on AKS (Azure)

It seems the cluster auto scaling feature for AKS (still in preview) is not working. We need to make sure the cluster auto scaler add-on can be used instead.

Also test the auto-scaling feature on both AKS and EKS

Drone leaves unterminated pods

➜  istio-operator k -n drone-pipelines get po --show-labels                                                                                                                                                 (⎈ aks-elemenz-ota-admin:default)
NAME                         READY   STATUS        RESTARTS   AGE    LABELS
drone-1eff6bd9z1lz3q3bur1v   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=7,io.drone.name=drone-1eff6bd9z1lz3q3bur1v,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-4amhypaz59acbew602xn   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=6,io.drone.name=drone-4amhypaz59acbew602xn,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-4ozf776086btrsy16oj3   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=4,io.drone.name=drone-4ozf776086btrsy16oj3,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-9kv76nkffsywtlvx6t2k   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=3,io.drone.name=drone-9kv76nkffsywtlvx6t2k,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-b8oys2tdykp77vuq1she   2/7     Terminating   4          5d5h   io.drone.build.event=push,io.drone.build.number=25,io.drone.name=drone-b8oys2tdykp77vuq1she,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-csmtayx4oi13u380qtni   2/6     Terminating   3          16d    io.drone.build.event=push,io.drone.build.number=2,io.drone.name=drone-csmtayx4oi13u380qtni,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-omtr6s7w48angst7h5g3   2/5     Terminating   2          19d    io.drone.build.event=push,io.drone.build.number=3,io.drone.name=drone-omtr6s7w48angst7h5g3,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-oz79r874kkw26z3k0qem   2/7     Terminating   4          9d     io.drone.build.event=push,io.drone.build.number=14,io.drone.name=drone-oz79r874kkw26z3k0qem,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true
drone-uab7pw1virhpaf6e1t1f   2/7     Terminating   4          16d    io.drone.build.event=push,io.drone.build.number=8,io.drone.name=drone-uab7pw1virhpaf6e1t1f,io.drone.repo.name=otomi-values-bct,io.drone.repo.namespace=redkubes,io.drone=true

Create Demo for Mediawiki application (with Knative)

For a (potential) customer, I would like to give a demo for the following use case:

  • they have a private Gitlab CI instance
  • they manage their own images (for MediaWiki instances) based on a MediaWiki base image
  • We demonstrate the deployment of a new app. This results in a Knative deployment and a URL to access the app is provided.
  • We demonstrate how to update (add plug-ins and PHP extensions) the image and redeploy the updated image
  • Requires a MySQL instance

Azure Monitor in Grafana

User story

As an Azure user, I want to use Azure Monitor, so I can see Azure related metrics and logs.


Acceptance criteria

  • View datasource in Grafana and see that Azure Monitor is there
  • View Dashboards in Grafana and see that the following dashboards are there:
    • azure monitor
    • azure appgw
    • azure mariadb
    • azure redis

Tasks


    • Add datasource and make configurable (a provisioning sketch follows)
    • Import dashboards into stack and make configurable
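
A sketch of the Grafana datasource provisioning entry this would need; the credential placeholders would have to be wired in from values/secrets:

# Illustrative Grafana datasource provisioning for Azure Monitor
apiVersion: 1
datasources:
  - name: Azure Monitor
    type: grafana-azure-monitor-datasource
    access: proxy
    jsonData:
      cloudName: azuremonitor
      tenantId: <tenant-id>
      clientId: <client-id>
      subscriptionId: <subscription-id>
    secureJsonData:
      clientSecret: <client-secret>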

Team dashboards

Features for team dashboards:

  • Not 'Service Index', but 'Team [$TEAM_NAME] dashboard' (when dashboard is for a specific team)
  • Not 'Service Index', but Cluster Admin Dashboard (when dashboard is for admins with full cluster scope)
  • Otomi Stack logo on page

Other:
Grafana should only show the dashboards relevant to the team (no cluster resources such as nodes, no k8s resources)

Unable to redeploy Drone

Reason:

Events:
  Type     Reason              Age   From                                         Message
  ----     ------              ----  ----                                         -------
  Normal   Scheduled           3m    default-scheduler                            Successfully assigned team-admin/drone-server-75bd9df5bc-bkwwq to aks-agentpool1-36062263-vmss000004
  Warning  FailedAttachVolume  3m    attachdetach-controller                      Multi-Attach error for volume "pvc-44c5170c-0995-495a-81eb-98f550c56da9" Volume is already used by pod(s) drone-server-78bfd656cc-sg4qx
  Warning  FailedMount         57s   kubelet, aks-agentpool1-36062263-vmss000004  Unable to mount volumes for pod "drone-server-75bd9df5bc-bkwwq_team-admin(648cf00b-92f0-4bd2-bcac-7bb2f8409b52)": timeout expired waiting for volumes to attach or mount for pod "team-admin"/"drone-server-75bd9df5bc-bkwwq". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-g7jdk istio-envoy istio-certs]

Workaround:

Find and delete the old drone ReplicaSet:

k -n team-admin get rs drone-<old-rs>

chart resource checkup & tuning

Most charts have preconfigured sane resource specifications, but we need to find out which don't have sane values, and which ones have none.

Team services will get a LimitRange as a fallback, but we really want to tune all of our own workloads.
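
For the LimitRange fallback mentioned above, a minimal sketch; the actual numbers are illustrative and would need tuning:

# Illustrative LimitRange applied per team namespace as a fallback
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
spec:
  limits:
    - type: Container
      default:             # default limits when a container sets none
        cpu: 200m
        memory: 256Mi
      defaultRequest:      # default requests when a container sets none
        cpu: 100m
        memory: 128Mi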

A service by default is not exposed to the public domain.

EXAMPLE

Add isExposed field:

teamConfig:
  teams:
    otomi:
      name: otomi
      services:
        - name: hello
          isPublic: false # does not need oauth2 sso
          isExposed: false # Service is not going to be exposed
          domain: custom.doma.in
          hasCert: true 
        - name: hello2
          isPublic: false # does not need oauth2 sso
          isExposed: true # service is going to be exposed 
          domain: custom.doma.in
          hasCert: true 

Adding this feature involves changes in the following files (a template sketch follows below):

  • conditionally add the service to nginx-ingress in charts/team-ns/templates/nginx-ingress.yaml
  • conditionally add the hosts field for the VirtualService in charts/team-ns/templates/istio-virtualservices.yaml
  • conditionally add the host to the ingressgateway in charts/team-ns/templates/istio-gateway.yaml

Moreover, a customer values migration will be needed for services that are already exposed!

Need to talk to @Morriz about above changes.
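
A sketch of the kind of Helm template conditional this would add in the team-ns chart; the value path and surrounding structure are assumptions:

# Illustrative conditional, e.g. in charts/team-ns/templates/istio-virtualservices.yaml
{{- range $s := .Values.services }}
{{- if $s.isExposed }}
    # ...render the host/ingress entry for this service...
    - {{ $s.domain }}
{{- end }}
{{- end }}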

Implement dashboards for admins

Admin:

Landing page: the service dashboard

Top menu with items:

  1. Team list:
  • list of teams, with create/edit/delete leading to

1.1 Team:

  • name
  • password (used for multi-tenancy proxies; should eventually be generated)
  • oidc details
  • base domain
  • list of services, create/edit/delete leading to

1.1.1 Team services:

  • name (used for url creation)
  • service toggle:
    • k8s svc (predeployed k8s service):
      • name
      • port
    • docker image:
      • location:
      • pull secret
      • semver to deploy automatically
  • domain
  • certArn

Add AAD Pod Identity (Azure) and Kube2IAM (AWS)

Both NS and BCT have asked us to give k8s applications access to specific cloud resources (databases) based on role-based access. To support this feature we need to integrate/support the following (a usage sketch follows):

AAD Pod Identity for Azure
Kube2IAM for AWS
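
For reference, the consumption side with Kube2IAM is just a pod annotation (the role ARN is illustrative); AAD Pod Identity works similarly via an AzureIdentityBinding plus a pod label:

# Illustrative Kube2IAM usage: the pod assumes the annotated IAM role
apiVersion: v1
kind: Pod
metadata:
  name: app
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::123456789012:role/team-db-access
spec:
  containers:
    - name: app
      image: nginx   # illustrative image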

Execution of hfp command sometimes fails with error

Describe the bug
Execution of hfp command sometimes fails with error

Get https://34.90.25.94/api/v1/namespaces/tillerless/secrets?labelSelector=NAME%3Dweave-scope%2COWNER%3DTILLER: error executing access token command "/Users/jehoszafatzimnowoda/google-cloud-sdk/bin/gcloud config config-helper --format=json": err=fork/exec /Users/jehoszafatzimnowoda/google-cloud-sdk/bin/gcloud: no such file or directory output= stderr=

It might be related to an old token stored in ~/.kube/config, since I am able to fix it by calling the kap command.

To Reproduce
It usually happens if I have not used the command for about an hour.

Expected behavior
It always works :)

The create-gke-cluster script has a dependency on a directory that I mounted from the otomi-values repo

➜  otomi-stack git:(master) ✗ ./bin/create-gke-cluster.sh                                                                                                  (⎈ gks_otomi-cloud_europe-west4_otomi-gke-dev:default)
bin/env.sh: line 5: cd: /Users/jehoszafatzimnowoda/workspace/otomi/otomi-stack/env/env: No such file or directory
ERROR: The value of CLOUD env must be one of the following: bin charts helmfile.d helmfile.tpl k8s test tests tools values
bin/env.sh: line 21: /Users/jehoszafatzimnowoda/workspace/otomi/otomi-stack/env/env/google/dev.sh: No such file or directory
WARNING: From 1.14, legacy Stackdriver GKE logging is deprecated. Thus, flag `--enable-cloud-logging` is also deprecated. Please use `--enable-stackdriver-kubernetes` instead, to migrate to new Stackdriver Kubernetes Engine monitoring and logging. For more details, please read: https://cloud.google.com/monitoring/kubernetes-engine/migration.
WARNING: From 1.14, legacy Stackdriver GKE monitoring is deprecated. Thus, flag `--enable-cloud-monitoring` is also deprecated. Please use `--enable-stackdriver-kubernetes` instead, to migrate to new Stackdriver Kubernetes Engine monitoring and logging. For more details, please read: https://cloud.google.com/monitoring/kubernetes-engine/migration.
ERROR: (gcloud.container.clusters.create) could not parse resource []
ERROR: (gcloud.container.clusters.get-credentials) argument --region: expected one argument
Usage: gcloud container clusters get-credentials NAME [optional flags]
  optional flags may be  --help | --internal-ip | --region | --zone

For detailed information on this command and its flags, run:
  gcloud container clusters get-credentials --help


Cannot redeploy drone

Events:
  Type     Reason              Age    From                                         Message
  ----     ------              ----   ----                                         -------
  Normal   Scheduled           3m24s  default-scheduler                            Successfully assigned team-admin/drone-server-78bfd656cc-bdl7f to aks-agentpool1-23650041-vmss000004
  Warning  FailedAttachVolume  3m24s  attachdetach-controller                      Multi-Attach error for volume "pvc-c4ce7cc3-25c4-4129-b8bd-66e3a674c0bc" Volume is already used by pod(s) drone-server-5cdf564c56-vtl8p
  Warning  FailedMount         81s    kubelet, aks-agentpool1-23650041-vmss000004  Unable to mount volumes for pod "drone-server-78bfd656cc-bdl7f_team-admin(491871ef-fab4-4bcd-9925-7b323ae764d3)": timeout expired waiting for volumes to attach or mount for pod "team-admin"/"drone-server-78bfd656cc-bdl7f". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-gw7cn istio-envoy istio-certs]

The prometheus-operator-prometheus-node-exporter resource limits change with each deployment

monitoring, prometheus-operator-prometheus-node-exporter, DaemonSet (apps) has changed:
  # Source: prometheus-operator/charts/prometheus-node-exporter/templates/daemonset.yaml
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: prometheus-operator-prometheus-node-exporter
    namespace: monitoring
    labels:     
      app: prometheus-node-exporter
      heritage: Helm
      release: prometheus-operator
      chart: prometheus-node-exporter-1.8.1
      jobLabel: node-exporter
  spec:
    selector:
      matchLabels:
        app: prometheus-node-exporter
        release: prometheus-operator
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 1
    template:
      metadata:
        labels:         
          app: prometheus-node-exporter
          heritage: Helm
          release: prometheus-operator
          chart: prometheus-node-exporter-1.8.1
          jobLabel: node-exporter
      spec:
        serviceAccountName: prometheus-operator-prometheus-node-exporter
        securityContext:
          runAsNonRoot: true
          runAsUser: 65534
        containers:
          - name: node-exporter
            image: "quay.io/prometheus/node-exporter:v0.18.1"
            imagePullPolicy: IfNotPresent
            args:
              - --path.procfs=/host/proc
              - --path.sysfs=/host/sys
              - --web.listen-address=0.0.0.0:9100
              - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
              - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
            ports:
              - name: metrics
                containerPort: 9100
                protocol: TCP
            livenessProbe:
              httpGet:
                path: /
                port: 9100
            readinessProbe:
              httpGet:
                path: /
                port: 9100
            resources:
-             limits:
-               cpu: 200m
-               memory: 50Mi
-             requests:
-               cpu: 100m
-               memory: 30Mi
+             {}
            volumeMounts:
              - name: proc
                mountPath: /host/proc
                readOnly:  true
              - name: sys
                mountPath: /host/sys
                readOnly: true
        hostNetwork: true
        hostPID: true
        tolerations:
          - effect: NoSchedule
            operator: Exists
        volumes:
          - name: proc
            hostPath:
              path: /proc
          - name: sys
            hostPath:
              path: /sys

Add online form to admin dashboard to submit support tickets

When a customer uses otomi-stack, they will always get support. We can provide a form, automatically configured with the correct customer information (customer, support level, cluster name, et cetera), that can be used to submit tickets to us.

Analysis

  • Find out connectivity with zoho desk

A customer can upgrade otomi-stack

Prerequisites:

  • A customer has its own repo that contains only values
  • values has appVersion
  • FluxCD with Redis backend (link)

Upgrade scenario:

  • an otomi-stack version is released as a docker image
  • the scanner in otomi-api detects it and patches drone with the new STACK_VERSION env var; a Redis client listens (pub-sub) to new image deployments detected by FluxCD
  • the pipeline needs to be changed to use the new STACK_VERSION, and the next values commit will trigger the pipeline to deploy the new stack; potentially an otomi-api is also released as a new docker image
  • drone pulls a new otomi-stack image
  • drone upgrades otomi-api
  • otomi-api sees that there is an appVersion mismatch and performs a values upgrade
  • drone deploys new stack after values upgrade
