
concourse-chart's Introduction

Concourse: the continuous thing-doer.


Concourse is an automation system written in Go. It is most commonly used for CI/CD, and is built to scale to any kind of automation pipeline, from simple to complex.

[Image: the booklit example pipeline rendered in the Concourse web UI]

Concourse is very opinionated about a few things: idempotency, immutability, declarative config, stateless workers, and reproducible builds.

The road to Concourse v10

Concourse v10 is the code name for a set of features which, when used in combination, will have a massive impact on Concourse's capabilities as a generic continuous thing-doer. These features, and how they interact, are described in detail in the Core roadmap: towards v10 and Re-inventing resource types blog posts. (These posts are slightly out of date, but they get the idea across.)

Notably, v10 will make Concourse not suck for multi-branch and/or pull-request driven workflows - examples of spatial change, where the set of things to automate grows and shrinks over time.

Because v10 is really an alias for a ton of separate features, there's a lot to keep track of - here's an overview:

| Feature | RFC | Status |
| --- | --- | --- |
| `set_pipeline` step | #31 | ✔ v5.8.0 (experimental) |
| Var sources for creds | #39 | ✔ v5.8.0 (experimental), TODO: #5813 |
| Archiving pipelines | #33 | ✔ v6.5.0 |
| Instanced pipelines | #34 | ✔ v7.0.0 (experimental) |
| Static across step 🚧 | #29 | ✔ v6.5.0 (experimental) |
| Dynamic across step 🚧 | #29 | ✔ v7.4.0 (experimental, not released yet) |
| Projects 🚧 | #32 | 🙏 RFC needs feedback! |
| `load_var` step | #27 | ✔ v6.0.0 (experimental) |
| `get_var` step | #27 | 🚧 #5815 in progress! |
| Prototypes | #37 | ⚠ Pending first use of protocol (any of the below) |
| `run` step 🚧 | #37 | ⚠ Pending its own RFC, but feel free to experiment |
| Resource prototypes | #38 | 🙏 #5870 looking for volunteers! |
| Var source prototypes 🚧 | | 🙏 #6275 planned, may lead to RFC |
| Notifier prototypes 🚧 | #28 | ⚠ RFC not ready |

The Concourse team at VMware will be working on these features; however, in the interest of growing a healthy community of contributors, we would really appreciate any volunteers. This roadmap is easy to parallelize, as it comprises many orthogonal features, so the faster we can power through it, the faster we can all benefit. We want these for our own pipelines too! 😆

If you'd like to get involved, hop in Discord or leave a comment on any of the issues linked above so we can coordinate. We're more than happy to help figure things out or pick up any work that you don't feel comfortable doing (e.g. UI, unfamiliar parts, etc.).

Thanks to everyone who has contributed so far, whether in code or in the community, and thanks to everyone for their patience while we figure out how to support such common functionality the "Concoursey way!" 🙏

Installation

Concourse is distributed as a single concourse binary, available on the Releases page.

If you want to just kick the tires, jump ahead to the Quick Start.

In addition to the concourse binary, there are a few other supported formats, such as the Docker image, the BOSH release, and this Helm chart. Consult their GitHub repos for more information.

Quick Start

$ wget https://concourse-ci.org/docker-compose.yml
$ docker-compose up
Creating docs_concourse-db_1 ... done
Creating docs_concourse_1    ... done

Concourse will be running at 127.0.0.1:8080. You can log in with the username test and password test.

⚠️ If you are using an M1 Mac: M1 Macs are incompatible with the containerd runtime. After downloading the docker-compose file, change CONCOURSE_WORKER_RUNTIME: "containerd" to CONCOURSE_WORKER_RUNTIME: "houdini". This feature is experimental.
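For clarity, here is a hedged sketch of the relevant fragment after that edit; the service name and surrounding keys are assumptions and should follow whatever the downloaded docker-compose.yml actually contains:

```yaml
services:
  concourse:
    environment:
      CONCOURSE_WORKER_RUNTIME: "houdini"   # was "containerd"; houdini support is experimental
```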

Next, install fly by downloading it from the web UI and target your local Concourse as the test user:

$ fly -t ci login -c http://127.0.0.1:8080 -u test -p test
logging in to team 'main'

target saved

Configuring a Pipeline

There is no GUI for configuring Concourse. Instead, pipelines are configured as declarative YAML files:

resources:
- name: booklit
  type: git
  source: {uri: "https://github.com/vito/booklit"}

jobs:
- name: unit
  plan:
  - get: booklit
    trigger: true
  - task: test
    file: booklit/ci/test.yml

Most operations are done via the accompanying fly CLI. If you've got Concourse installed, try saving the above example as booklit.yml, target your Concourse instance, and then run:

fly -t ci set-pipeline -p booklit -c booklit.yml
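Newly set pipelines start out paused, so a typical follow-up looks like this (standard fly subcommands):

```bash
fly -t ci unpause-pipeline -p booklit          # un-pause so triggers take effect
fly -t ci trigger-job -j booklit/unit --watch  # kick off the job and stream its output
```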

These pipeline files are self-contained, maximizing portability from one Concourse instance to the next.

Learn More

Contributing

Our user base is basically everyone that develops software (and wants it to work).

It's a lot of work, and we need your help! If you're interested, check out our contributing docs.


concourse-chart's Issues

No logs for wrong password attempts, and setting log levels

Hello colleagues,
Why are basic auth wrong-password attempts not logged by Concourse? I don't find a 401 Unauthorized entry when I check via kubectl logs po/concourse-we | grep admin_concourse.

Also, I am not sure how to set multiple log levels. I want to see all the logs: debug, info, and error.
How can I get this?

Will the settings below work?

    ## Minimum level of logs to see. Possible options: debug, info, error.
    ##
    logLevel: info,debug,error

https://github.com/concourse/concourse-chart/blob/master/values.yaml#L149-L151
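For what it's worth, a hedged sketch of how this is usually configured: the underlying --log-level flag takes a single minimum level rather than a comma-separated list, so picking the lowest level you care about should include everything above it:

```yaml
## Minimum level of logs to see. Possible options: debug, info, error.
##
logLevel: debug   # a single value; debug also emits info and error messages
```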

Default values for secrets in values.yaml is not safe

The way secrets are being handled in this Helm chart is not optimal and can lead to security holes. One might decide to provide a values.yaml file during deployment overriding the default keys (like this one: https://github.com/concourse/concourse-chart/blob/master/values.yaml#L1962), but if the chart at some point changes the name of the key, the override will do nothing and the deployment will happen with the default keys, which could be a security problem for public-facing instances. No default private key or password should ever be in the default values.yaml file; otherwise one would have to go through the whole file before every deployment or upgrade to make sure no default was added that should be overridden.

A better approach would be to run a job that generates all the needed secrets if they are not already there. Let me know what you think.

README updates for Helm 3.x

I can make a PR, but I'm not sure if you want to use Helm 3.x as the default.

Changes found so far:
Helm 2: helm install --name my-release concourse/concourse
Helm 3: helm install my-release concourse/concourse

Concourse gets into a restart loop if the web nodes take long to start up

In our large scale environment, the web nodes get into a restarting loop whenever we do an upgrade or purely restarting the web node.

In our case, this is usually whenever we upgrade and we see that the upgraded web nodes will have a status of CrashLoopBackOff and will switch to Running state and then eventually go back to CrashLoopBackOff. There is usually one web node that is still up and running, which we assume is the node that is kept so it can be a rolling deploy.

The failures we see on the crashing web nodes are Liveness probe failed: Get http://<ip>:80/api/v1/info: dial tcp <ip>:80: connect: connection refused, which made us think that because the web nodes were taking so long to come up (possibly due to migrations), the liveness probe started before it could get a response and ended up killing the node. We eventually configured the initialDelaySeconds on the liveness probe so that it starts checking the health of the web node after 5 minutes, and this fixed the crashing web nodes.

Having to configure the initialDelaySeconds on the liveness probe isn't an optimal solution for the problem of slow migrations. In most cases, the startup of the web nodes should be fairly quick, so configuring a long initial delay will cause k8s to take much longer to determine if the web nodes are healthy after they start up. Maybe we can look into configuring a default for the startupProbe https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#when-should-you-use-a-startup-probe that will allow for slower starting of web nodes due to slow migrations? Reading the docs https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes it seems like we can configure a failureThreshold * periodSeconds that will be long enough to cover the worst-case startup time.
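For illustration, a hedged sketch of such a startupProbe (standard Kubernetes probe fields; the endpoint and port must match what the chart's web probes actually use):

```yaml
startupProbe:
  httpGet:
    path: /api/v1/info
    port: 8080
  periodSeconds: 10
  failureThreshold: 60   # tolerates up to 60 * 10s = 10 minutes of startup/migrations
```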

Ingress is not created

I have installed Concourse using the Helm chart.
I have added custom ingress data, but two things happen:

  • first of all, the ingress isn't created
  • second, the web pod is ignoring the externalUrl flag.

The release is installed using this helm command:

helm install builder -f concourse-settings.yaml concourse/concourse --namespace builder

My config file:

concourse:
  web:
    enabled: true
    externalUrl: builder.my.domain.name
    bindPort: 80
    ingress:
      hosts:
        - builder.my.domain.name
      enabled: true
      annotations:
        external-dns.alpha.kubernetes.io/hostname: builder.my.domain.name
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: ip
        alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
        alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
        alb.ingress.kubernetes.io/certificate-arn: <arn-here>
  worker:
    baggageclaim:
      driver: btrfs
  persistence:
    worker:
      storageClass: gp2
  postgresql:
    persistence:
      storageClass: gp2
  secrets:
    bitbucketCloudClientId: <token>
    bitbucktetCloudClientSecret: <secret>

Because ingress creation isn't working, I created the ingress myself:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: "concourse-ingress"
  namespace: "builder"
  annotations:
    external-dns.alpha.kubernetes.io/hostname: builder.my.domain.name
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/certificate-arn: <arn-here>
spec:
  rules:
    - http:
        paths:
          - path: /*
            backend:
              serviceName: "builder-web"
              servicePort: 80

Everything is working - I can access the dashboard at https://builder.my.domain.name - but when I click the login button it redirects me to 127.0.0.1.

I have checked, and the web pod has the environment variable CONCOURSE_EXTERNAL_URL: builder.my.domain.name configured.

So as far as I can tell it should work correctly.
Do you have any advice on why it doesn't?
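A hedged observation: in this chart the ingress block appears to live under the top-level web key rather than under concourse.web (the configuration quoted in the beacon-runner issue further down uses that layout), and externalUrl typically needs the scheme included. Key placement should be verified against values.yaml; a sketch under those assumptions:

```yaml
web:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
    hosts:
      - builder.my.domain.name

concourse:
  web:
    externalUrl: https://builder.my.domain.name   # include the scheme
```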

Wrong imageTag for chart

Looks like afe4388 changed the imageTag to a non-existent version of the concourse/concourse image.

Failed to pull image "concourse/concourse:8.4.1": rpc error: code = Unknown desc = Error reading manifest 8.4.1

Ways to force HTTPS?

Hi, would it be possible to extend the chart with a way to force HTTPS for requests to the Concourse web UI? Not sure if it's possible without ingress. For those using an ingress controller that supports IngressRoute, it would be enough if the chart came with an (optional) IngressRoute manifest. Not sure how support for IngressRoute in Kubernetes will develop over time.
Any other options?

Liveness probe fails on `web` pod

The web pod fails to run; I'm seeing this event:

  Warning  Unhealthy       15s (x2 over 25s)      kubelet, minikube  Readiness probe failed: Get http://172.17.0.5:8080/api/v1/info: dial tcp 172.17.0.5:8080: connect: connection refused

I'm using the default values.yml with the test user.

On minikube

minikube version: v1.6.2
commit: 54f28ac5d3a815d1196cd5d57d707439ee4bb392

Create Git Tags to Line up with Charts on `hub.helm.sh`

The latest version I see tagged in this repo is v9.0.0, but on hub.helm.sh I see versions later than that.

It would be helpful for me to be able to line up versions to git commits.

Thanks for working on this project!

Include garden networkpool configuration

Hi there,

Are we able to include the CONCOURSE_GARDEN_NETWORK_POOL environment variable as part of the Helm chart?

It would support --garden-network-pool: https://bosh.io/jobs/garden?source=github.com/cloudfoundry/garden-runc-release#p%3dgarden.network_pool

i.e.

garden:
  ## Path to the 'gdn' executable (or leave as 'gdn' to find it in $PATH)
  ##
  bin: gdn

  ## Path to a config file to use for Garden in INI format.
  ##
  ## For example, in a ConfigMap:
  ##
  ##   [server]
  ##     max-containers = 100
  ##
  ## For information about the possible values:
  ## Ref: https://bosh.io/jobs/garden?source=github.com/cloudfoundry/garden-runc-release
  ##
  config:

  ## Enable a proxy DNS server for Garden
  ##
  dnsProxyEnable:

  ## Use the insecure Houdini Garden backend.
  ##
  useHoudini:

  ## How long to wait for requests to Garden to complete. 0 means no timeout.
  ##
  requestTimeout:

  ## Network range to use for dynamically allocated container subnets. (Default value: 10.254.0.0/22)
  ## Ref: https://bosh.io/jobs/garden?source=github.com/cloudfoundry/garden-runc-release#p%3dgarden.network_pool
  networkPool:

Thanks!

Bump the chart continuously

Right now the chart needs to be manually published by someone on the Concourse team hitting this button: https://ci.concourse-ci.org/teams/main/pipelines/concourse/jobs/ship-chart/

I don't like this workflow having manual components so I'd like to work toward having the chart bump automatically on every new commit merged into master.

One thing the Concourse team does right now is add feature flags that aren't part of the concourse/concourse:latest image to the master branch of this repo. I'm thinking of having those PRs go into a dev branch and then merging dev into master once those features are released in the concourse/concourse:latest image.

Postgres dependency is deprecated

The current postgres dependency is pulled from the helm/charts stable registry. That chart is deprecated, and in fact the whole repository will soon be deprecated.

The chart is now hosted in Bitnami's registry; it would be good to switch over to this new version, as it also adds new functionality such as creating PodSecurityPolicies automatically for you.
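A hedged sketch of what the switch could look like on the packaging side; the version constraint and condition name are illustrative and would need to match what the chart actually pins:

```yaml
# Chart.yaml (Helm 3) or requirements.yaml (Helm 2)
dependencies:
  - name: postgresql
    version: "8.x.x"                                # illustrative
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
```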

Issue with configRBAC and auth config

Hi, I am trying to deploy 5.8.0 using the Helm chart. While deploying, it fails to apply the configRBAC and auth config. I have entered them as below, but my web node is failing.

➜  ~ k logs po/concourse-web-58997b67fd-4xkjc
{"timestamp":"2020-01-09T11:45:59.269737359Z","level":"info","source":"atc","message":"atc.cmd.start","data":{"session":"1"}}
{"timestamp":"2020-01-09T11:45:59.339357808Z","level":"info","source":"atc","message":"atc.credential-manager.configured credentials manager","data":{"name":"kubernetes","session":"6"}}
error: default team auth not configured: error converting YAML to JSON: yaml: did not find expected ',' or ']'
{"timestamp":"2020-01-09T11:45:59.344813625Z","level":"info","source":"atc","message":"atc.cmd.finish","data":{"duration":77408,"session":"1"}}
configRBAC:
  owner:
  - SetTeam
  - RenameTeam
  - DestroyTeam
      mainTeam:
        ## Configuration file for specifying team params.
        ## Ref: https://concourse-ci.org/managing-teams.html#setting-roles
        ##
        config:
          - name: owner
            local:
              users: ["admin"]

Without setting the value, helm install fails:

error validating data: unknown object type "nil" in ConfigMap.data.config-rbac.yml; if you choose to ignore these errors, turn validation off with --validate=false

Can you please help with how to use this feature? The documentation is not clear.

Best practices for vaultAuthParam?

Apologies if this is a stupid question but we have concourse and vault setup working really well but whenever we do an upgrade of the chart this secret is being removed because we don't store it anywhere. Is there a way of setting it and then having it ignored by any future upgrades or do we need another secrets manager for this? It's a bit of a chicken vs egg situation I think.

Any guidance would be really appreciated.

Add psp name as parameter

The podsecuritypolicy used in worker-role.yaml line 18 should be parameterised. The reason is that there are k8s deployments that use different psp names for default privileged access than the one that is hardcoded.

concourse 6.0 service monitor is not auto-discovered by prometheus operator

Hello colleagues,

I have enabled the Prometheus monitor as well as the scraper. Concourse is running in my default namespace and the Prometheus Operator is running in the monitoring namespace.

Not sure why the service monitor is not auto-discovered by the Prometheus Operator.


prometheus:
  enabled: true

  ## IP to listen on to expose Prometheus metrics.
  ##
  bindIp: "0.0.0.0"

  ## Port to listen on to expose Prometheus metrics.
  ##
  bindPort: 9391

  ## If Prometheus operator is used, also create a servicemonitor object
  serviceMonitor:
    enabled: true
    interval: "30s"
    # Namespace the servicemonitor object should be in
    namespace: monitoring

@KYannick can you help with this issue? Am I missing anything?

BRs, Gowrisankar
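One hedged thing to check, independent of this chart: the Prometheus Operator only discovers ServiceMonitor objects that its Prometheus resource selects, so both the monitor's labels and its namespace have to be covered, roughly like this (label values are illustrative):

```yaml
# Excerpt of the Prometheus custom resource on the operator side, not part of this chart
spec:
  serviceMonitorSelector:
    matchLabels:
      release: prometheus-operator        # the ServiceMonitor must carry this label
  serviceMonitorNamespaceSelector: {}     # empty selector = look in all namespaces
```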

Secrets generating commands not compatible with openssh 7.9

Hi,

we are following the README section on generating secrets (https://github.com/concourse/concourse-chart#secrets). Using OpenSSH 7.9,
the documented commands (e.g. ssh-keygen -t rsa -f host-key -N '') generate keys that don't work with the Helm chart. The web pod dies with the following error:

kubectl logs -f -n concourse concourse-web-776f984dfc-gv2wr                                                                                                                                   
{"timestamp":"2020-01-14T11:37:52.542320328Z","level":"info","source":"atc","message":"atc.cmd.start","data":{"session":"1"}}                                                                 
{"timestamp":"2020-01-14T11:37:52.617416433Z","level":"info","source":"atc","message":"atc.credential-manager.configured credentials manager","data":{"name":"kubernetes","session":"6"}}     
{"timestamp":"2020-01-14T11:37:52.641506010Z","level":"info","source":"atc","message":"atc.cmd.finish","data":{"duration":93901,"session":"1"}}                                               
panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                       
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1100805]                                                                                                                       
                                                                                                                                                                                              
goroutine 1 [running]:                                                                                                                                                                        
crypto/rsa.(*PrivateKey).Public(0x0, 0x0, 0x0)                                                                                                                                                
        /usr/local/go/src/crypto/rsa/rsa.go:100 +0x5                                                                                                                                          
golang.org/x/crypto/ssh.NewSignerFromSigner(0x7f2db321ba48, 0xc000010010, 0xc000010010, 0x7f2db321ba48, 0xc000010010, 0xeb0c01)                                                               
        /tmp/build/1c3187db/gopath/pkg/mod/golang.org/x/[email protected]/ssh/keys.go:720 +0x35                                                                       
golang.org/x/crypto/ssh.NewSignerFromKey(0x2b7f860, 0xc000010010, 0x7f2db32506d0, 0x0, 0xef2fc8, 0xc00118d198)                                                                                
        /tmp/build/1c3187db/gopath/pkg/mod/golang.org/x/[email protected]/ssh/keys.go:695 +0x167                                                                      
github.com/concourse/concourse/tsa/tsacmd.(*TSACommand).configureSSHServer(0xc0001ac540, 0xc00005ac00, 0xc000d54ea0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, ...)                                  
        /tmp/build/1c3187db/concourse/tsa/tsacmd/command.go:186 +0x2a1                                                                                                                        
github.com/concourse/concourse/tsa/tsacmd.(*TSACommand).Runner(0xc0001ac540, 0xc000d54970, 0x0, 0x1, 0x36a28e0, 0xc000dc7080, 0x0, 0x0)                                                       
        /tmp/build/1c3187db/concourse/tsa/tsacmd/command.go:97 +0x27a                                                                                                                         
main.(*WebCommand).Runner(0xc000690d88, 0xc000d54970, 0x0, 0x1, 0x7, 0x6, 0xc000d547e0, 0xc00027d0c0)                                                                                         
        /tmp/build/1c3187db/concourse/cmd/concourse/web.go:56 +0xf4 

After reading these docs: https://concourse-ci.org/concourse-generate-key.html we realized that the concourse binary can be used to generate the same secrets. Using concourse to generate the secrets makes it work. We used commands like:

docker run -v $PWD:/keys --rm -it concourse/concourse generate-key -t rsa -f /keys/session-signing-key
docker run -v $PWD:/keys --rm -it concourse/concourse generate-key -t ssh -f /keys/worker-key
docker run -v $PWD:/keys --rm -it concourse/concourse generate-key -t ssh -f /keys/host-key

It seems like the concourse binary implements the ssh-keygen command internally but is probably not compatible with the latest OpenSSH (?).

In any case, the README should be aligned with the concourse-ci.org documentation, so it might be better if the ssh-keygen commands are replaced with ones using the concourse binary (such as the working ones listed above).
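A likely explanation, hedged: newer OpenSSH releases write private keys in their own key format by default, which older RSA parsers reject; if you want to keep using ssh-keygen, forcing the classic PEM format is usually enough:

```bash
ssh-keygen -t rsa -m PEM -N '' -f host-key              # -m PEM emits the classic PEM format
ssh-keygen -t rsa -m PEM -N '' -f worker-key
ssh-keygen -t rsa -m PEM -N '' -f session-signing-key
```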

Unable to do helm dependency build

@taylorsilva I am getting an error while doing helm dependency build

➜  concourse-chart git:(release/5.7.x) helm dependency build
Error: Chart.lock is out of sync with Chart.yaml

Can you check?
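Assuming the checked-in Chart.lock is simply stale, regenerating it usually clears this:

```bash
helm dependency update   # re-resolves Chart.yaml and rewrites Chart.lock
helm dependency build    # should now succeed against the fresh lock file
```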

Add Microsoft auth and Conjur credential manager configuration options to Helm chart

New parameters introduced by recently merged PRs need to be included in Helm packaging.

From concourse/concourse#4684:

  • CONCOURSE_MICROSOFT_CLIENT_ID
  • CONCOURSE_MICROSOFT_CLIENT_SECRET
  • CONCOURSE_MICROSOFT_TENANT
  • CONCOURSE_MICROSOFT_GROUPS
  • CONCOURSE_MICROSOFT_ONLY_SECURITY_GROUPS
  • CONCOURSE_MAIN_TEAM_MICROSOFT_USER
  • CONCOURSE_MAIN_TEAM_MICROSOFT_GROUP

BOSH changes are here

From concourse/concourse#4693:

  • CONCOURSE_CONJUR_APPLIANCE_URL
  • CONCOURSE_CONJUR_ACCOUNT
  • CONCOURSE_CONJUR_CERT_FILE
  • CONCOURSE_CONJUR_AUTHN_LOGIN
  • CONCOURSE_CONJUR_AUTHN_API_KEY
  • CONCOURSE_CONJUR_AUTHN_TOKEN_FILE
  • CONCOURSE_CONJUR_PIPELINE_SECRET_TEMPLATE
  • CONCOURSE_CONJUR_TEAM_SECRET_TEMPLATE
  • CONCOURSE_CONJUR_SECRET_TEMPLATE

BOSH packaging changes are here

While we're here, we may as well also fix this issue for the Vault namespace parameter (CONCOURSE_VAULT_NAMESPACE); the BOSH changes are merged already.

Is this the new "official" place for the Concourse chart?

Hey there-- not trying to pin down absolutes, just trying to do my due diligence; do you all expect the "official" concourse-chart to be hosted out of this repo (now or in the future) rather than the helm/charts upstream repo?

Thanks! 👍

install failing on kubernetes 1.17

Error: unable to build kubernetes objects from release manifest: [unable to recognize "": no matches for kind "Deployment" in version "extensions/v1beta1", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta2", unable to recognize "": no matches for kind "StatefulSet" in version "apps/v1beta1"]
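For context, those API groups were removed in Kubernetes 1.16+; charts targeting 1.17 need to emit the current group instead, along the lines of this minimal sketch (not the chart's actual templates):

```yaml
apiVersion: apps/v1   # replaces extensions/v1beta1, apps/v1beta1 and apps/v1beta2
kind: StatefulSet     # the same applies to Deployment
```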

Helm3 Support

Are there any plans to support Helm 3? If there is interest or ongoing work, I would love to contribute.

Login redirects to 127.0.0.1:8080

Hi,

When I run this chart, access the web UI, and attempt to log in, the login redirects from my domain name to http://127.0.0.1:8080/sky/issuer/auth, which subsequently fails.
I do not have any issues with other links on the Concourse web UI.

I have not found any way to configure the external hostname or IP. How would I go about this?

Worth mentioning that I have not enabled this chart's ingress.
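A hedged pointer: the redirect target comes from the external URL the web node advertises, which this chart exposes as concourse.web.externalUrl (the same key other configurations in this document set); the hostname below is illustrative:

```yaml
concourse:
  web:
    externalUrl: https://concourse.example.com
```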

Remove `authorizedKeys`

We should remove authorizedKeys from values.yaml, if it's not used anywhere. Otherwise let's make sure that it's wired up correctly if we do need it.

Cannot change worker persistence disk size

I've deployed the chart with default configuration which sets the workers disk size to 20Gi.

The disks filled up and I wanted to increase disk size by setting persistence.worker.size. When I do this, I'm getting:

Error: UPGRADE FAILED: cannot patch "concourse-worker" with kind StatefulSet: StatefulSet.apps "concourse-worker" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden
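A hedged workaround for the immutability error above: delete the StatefulSet object without deleting its pods, then re-apply the chart with the new size (the existing PVCs themselves only grow if the storage class allows volume expansion); release and object names below are illustrative:

```bash
kubectl delete statefulset concourse-worker --cascade=orphan   # --cascade=false on older kubectl
helm upgrade concourse concourse/concourse -f values.yaml      # re-creates it with the new size
```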

Migrating from one chart reference to another

Forgive me, I haven't spent time experimenting with this yet, but I've been having a difficult time finding an answer:

If we still have Concourse deployed referencing stable/concourse, is it possible to update the chart deployment to point to a new chart repo (concourse/concourse-chart) without any problems or changes in the deployment? Or will it try to do something like delete/recreate the chart deployment? Or cause unforeseen problems in any other way?

Again, I know, I should just try it, it'd take like 15 minutes, apologies for blindly asking-- but I'm curious if anyone has anything to say about it.

Thanks for anyone's time 👍

Readme/post-install warning feedback

We successfully installed concourse on a GKE cluster today.

Install warning feedback:

The post-install baggage claim driver feedback is useful, but the values.yml is so big that it's hard to track down the parent keys so that you can actually set it. (i.e. concourse.worker.baggageclaim.driver). That configuration value is not listed in the README page, which it probably should be if there's going to be a post-install warning about it. The key should probably be printed in the warning message to make it easy on users.

We also got this warning:

"You're using the default "test" user with the default "test" password."

I think this is spurious in our case, because we also set secrets.create: false. However we did find the code in the chart that was printing this, and realized we could set secrets.localUsers: "" to make the warning go away. I did confirm that test:test does not work on our helm deployed concourse.

README feedback

I found that the README section about secrets was inaccurate with respect to creating secrets for local users, github secrets, and the postgres username/password. The README indicates that you put these in the $HELM_RELEASE-concourse secret, but we found that we had to put them into the $HELM_RELEASE-web secret.

PreStop Hook exited with 137 blocking clean `kubectl delete pod`

Running the following command gets stuck for a long time:

smoke@rkirilov-work-pc ~ $ kubectl delete pod -n ci concourse-ci-worker-0 
pod "concourse-ci-worker-0" deleted

When I describe the pod, it is clear that the PreStop hook did not exit cleanly:

smoke@rkirilov-work-pc ~ $ kubectl describe pod -n ci concourse-ci-worker-0 | cat | tail -n 12
Events:
  Type     Reason             Age   From                                  Message
  ----     ------             ----  ----                                  -------
  Normal   Scheduled          79s   default-scheduler                     Successfully assigned ci/concourse-ci-worker-0 to ip-10-200-3-38.ec2.internal
  Normal   Pulled             78s   kubelet, ip-10-200-3-38.ec2.internal  Container image "concourse/concourse:5.8.0" already present on machine
  Normal   Created            78s   kubelet, ip-10-200-3-38.ec2.internal  Created container concourse-ci-worker-init-rm
  Normal   Started            78s   kubelet, ip-10-200-3-38.ec2.internal  Started container concourse-ci-worker-init-rm
  Normal   Pulled             72s   kubelet, ip-10-200-3-38.ec2.internal  Container image "concourse/concourse:5.8.0" already present on machine
  Normal   Created            72s   kubelet, ip-10-200-3-38.ec2.internal  Created container concourse-ci-worker
  Normal   Started            72s   kubelet, ip-10-200-3-38.ec2.internal  Started container concourse-ci-worker
  Normal   Killing            54s   kubelet, ip-10-200-3-38.ec2.internal  Stopping container concourse-ci-worker
  Warning  FailedPreStopHook  11s   kubelet, ip-10-200-3-38.ec2.internal  Exec lifecycle hook ([/bin/bash /pre-stop-hook.sh]) for Container "concourse-ci-worker" in Pod "concourse-ci-worker-0_ci(8688f7aa-6444-11ea-9917-0ad140727ba9)" failed - error: command '/bin/bash /pre-stop-hook.sh' exited with 137: , message: ""

So the only workaround is to now force delete the pod:

smoke@rkirilov-work-pc ~ $ kubectl delete pod --force --grace-period=0 -n ci concourse-ci-worker-0 
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "concourse-ci-worker-0" force deleted

Maybe /pre-stop-hook.sh should be patched to handle (trap) the relevant signals (e.g. SIGTERM, SIGINT, SIGHUP) and exit cleanly. I assume that when dumb-init is signaled, it tries to cleanly terminate /pre-stop-hook.sh, and since the script does not terminate cleanly it gets killed with exit code 137, which then blocks K8s.

I will give it a try and will update the ticket, hopefully with a PR.

Actually, K8s only waits for the PreStop hook for the terminationGracePeriodSeconds amount of time, then sends a SIGTERM to the containers and SIGKILLs all remaining processes after 2 more seconds, as per kubernetes/kubernetes#39170 (comment) and https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

However, the strange thing is that the pod is left in a terminating state for many more minutes and doesn't seem to restart.

So maybe the best course of action would be to use timeout -k {.Values.worker.terminationGracePeriodSeconds} bash -c 'while [ -e /proc/1 ]; do sleep 1; done' or something similar. This way at least the delete command will not be blocked.

Also it is important to increase the .Values.worker.terminationGracePeriodSeconds to something that makes sense for your own Pipelines.
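A hedged sketch of the timeout idea, not the chart's actual pre-stop-hook.sh; the grace period would be wired to .Values.worker.terminationGracePeriodSeconds in the template:

```bash
#!/bin/bash
# Bound the wait on the worker process so `kubectl delete pod` is never blocked
# past the grace period; exit 0 either way so the hook is not reported as failed.
GRACE="${TERMINATION_GRACE_PERIOD_SECONDS:-60}"
timeout --signal=TERM --kill-after=5 "$GRACE" \
  bash -c 'while [ -e /proc/1 ]; do sleep 1; done' || true
exit 0
```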

Error `worker.beacon-runner.beacon.forward-conn.failed-to-dial`

I'm getting hundreds of these errors a second in my worker pod with a fairly minimal configuration of this chart:

{"timestamp":"2019-12-19T10:10:49.596891570Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.forward-conn.failed-to-dial","data":{"addr":"127.0.0.1:7777","error":"dial tcp 127.0.0.1:7777: connect: connection refused","network":"tcp","session":"4.1.4"}}

The configuration is:

web:
  replicas: 1
  ingress:
    annotations:
      kubernetes.io/ingress.class: "nginx"
      nginx.ingress.kubernetes.io/proxy-body-size: "0"
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    enabled: true
    tls:
    - secretName: tls-secret
      hosts:
      - XXXXXXX
    hosts:
    - XXXXXXX

concourse:
  web:
    tsa:
      heartbeatInterval: 120s
    kubernetes:
      createTeamNamespaces: false
    externalUrl: "https://XXXXXXX"
    localAuth:
      enabled: true
    auth:
      mainTeam:
        localUser: XXXXXXX

worker:
  replicas: 1
  emptyDirSize: 20Gi

secrets:
  create: false

persistence:
  worker:
    size: 256Gi

I'm working off of commit 8c45b70dc559e65fd0a0a2953254873ee222a49a (which is tag v8.4.1) with a minor modification to the stateful set:

diff --git a/templates/worker-statefulset.yaml b/templates/worker-statefulset.yaml
index 80c5bb0..dd35bb2 100644
--- a/templates/worker-statefulset.yaml
+++ b/templates/worker-statefulset.yaml
@@ -56,7 +56,7 @@ spec:
           {{- end }}
           imagePullPolicy: {{ .Values.imagePullPolicy | quote }}
           securityContext:
-            privileged: true
+            allowPrivilegeEscalation: true
           command:
             - /bin/bash
           args:
@@ -280,7 +280,7 @@ spec:
 {{ toYaml .Values.worker.resources | indent 12 }}
 {{- end }}
           securityContext:
-            privileged: true
+            allowPrivilegeEscalation: true
           volumeMounts:
             - name: concourse-keys
               mountPath: {{ .Values.worker.keySecretsPath | quote }}

Upgrade chart to Support Helm 3

With the release of Helm 3 and v2 of the API, we should look at migrating the chart to the new version. Looking to get an idea if people are ready to make that change and/or if there's a reason to hold off on migrating the chart yet. #53 already does this migration so taking it in now would resolve this issue. We want to get an idea if this is something users are okay with. The only negative thing about this change is that you can't use the Helm 2 CLI with the chart once it goes to v2 of the API.

EDIT: We're pretty far into the lifecycle of Helm 3 now and Helm 2 only has three months of security patches left. We should cut over now. https://helm.sh/blog/helm-v2-deprecation-timeline/

Notes:

I'm unsure whether any issues currently exist when trying to use Helm 3 with the current version of the chart.


Goal

Using Helm v3 with the chart works without any issues.

Tasks

Updating the chart first is probably a good first step. Then let the jobs that fail in CI guide the rest of this story/epic.
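For reference, a hedged sketch of what the API bump looks like on the packaging side (version numbers and pinned repository are illustrative):

```yaml
# Chart.yaml under the Helm 3 chart API; dependencies move here from requirements.yaml
apiVersion: v2
name: concourse
version: 0.0.1        # illustrative
appVersion: "6.0.0"   # illustrative
dependencies:
  - name: postgresql
    repository: https://charts.helm.sh/stable   # or wherever the chart currently pins it
    version: 8.x.x
    condition: postgresql.enabled
```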

failure to clean concourse-work-dir without btrfs disk

This is a movement of this issue: helm/charts#17803

Chart
stable/concourse

The Bug
Launch the chart on any kubernetes without having a storage provider specify btrfs as the fsType, and the concourse-worker pods will fail to start with a clean disk.

You can't clear the concourse workDir with just rm -rf, as this previous commit attempted to fix:
helm/charts@4060f5b#diff-33a16a0789cde854c77e65e7774aa2beR61

The problem, though, is that the previous commit assumes the workDir disk is btrfs. On Kubernetes clusters with a default storage class like AWS EBS gp2, it's ext4. So without the attached disk being btrfs, you get this.

On a worker pod:

# mount | grep work-dir
/dev/xvdcu on /concourse-work-dir type ext4 (rw,relatime,data=ordered)
/dev/loop0 on /concourse-work-dir/volumes type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)

The concourse-worker-init-rm container:

    Command:
      /bin/bash
    Args:
      -ce
      for v in $((btrfs subvolume list --sort=-ogen "/concourse-work-dir" || true) | awk '{print $9}'); do
        (btrfs subvolume show "/concourse-work-dir/$v" && btrfs subvolume delete "/concourse-work-dir/$v") || true
      done
      rm -rf "/concourse-work-dir/*"
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
# btrfs subvolume list --sort=-ogen "/concourse-work-dir"
ERROR: not a btrfs filesystem: /concourse-work-dir
ERROR: can't access '/concourse-work-dir'
# echo $?
1

So while the pod terminated cleanly, it didn't in fact delete anything. Concourse doesn't seem to like a new worker joining with old state.

Workaround
Confirmed working for us.

Make a new storage class for Concourse set to fsType: btrfs per these docs: https://kubernetes.io/docs/concepts/storage/storage-classes/#parameters
Then specify that storage class for the chart here in the values:
https://github.com/helm/charts/blob/master/stable/concourse/values.yaml#L1733

Make sure the underlying kubernetes node has btrfs-progs installed.
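A hedged sketch of that StorageClass workaround, assuming the in-tree AWS EBS provisioner mentioned above; parameters vary by provisioner:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: concourse-btrfs
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: btrfs   # requires btrfs-progs on the nodes, as noted above
```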

Background
What you'll actually see: the concourse workers run out of subnets because they think too many containers are running.

> fly -t main workers
name                containers  platform  tags  team  state    version  age
concourse-worker-1  256         linux     none  none  running  2.2      59s
concourse-worker-2  249         linux     none  none  running  2.2      23h1m

What you'll see in the logs:

08T19:18:16.751663080Z","level":"info","source":"guardian","message":"guardian.create.create-failed-cleaningup.destroy.delete.finished","data":{"cause":"insufficient subnets remaining in the pool","handle":"fe03bd40-66fa-4919-6b16-ebb63215dec9","session":"637.8.1.1"}}

Why does this happen? Because I think something's broken in Concourse's baggageclaim related to time jobs, as listed here: concourse/concourse#847. That's a separate problem upstream with Concourse that for some reason the maintainers may not be seeing, probably because they run much more ephemeral Concourse workers than this Helm chart does. Maybe starting new workers with old state in the workDir is the cause of the whole thing. I'm not sure.

In our case, to fix this bug we should just be able to restart the pod and clear out the persistent disk right? Nope. Thus the bug.

Why this is an issue and not a PR

Mostly because I just started working on concourse a few weeks ago and have little idea what I'm doing.

So this link lists the problem that concourse workers need disks or there's a problem with ImageGC. https://github.com/helm/charts/tree/master/stable/concourse#persistence

OK, so the disks are stateful, but that's a problem, because new workers need to come up with no state from the previous run of the worker, as mentioned
here:
https://github.com/helm/charts/blob/5a33da9adf31ee802ca6e1247b0b5fdac2bb9aca/stable/concourse/values.yaml#L1060-L1075
Here:
https://github.com/helm/charts/blob/5a33da9adf31ee802ca6e1247b0b5fdac2bb9aca/stable/concourse/values.yaml#L1538-L1540
and this comment here:
concourse/concourse#1194 (comment)

Which brings me to ask: why aren't workers run as a standard Deployment where, by default, the persistent disk would be attached and then deleted when the pod is cycled? That would still provide separate disks but remove the need for the concourse-worker-init-rm container entirely.

Or based on a conversation in concourse's discord, upstream might change away from btrfs entirely and then everything might work fine?
concourse/concourse#4071

`prs` pipeline for `concourse-chart`?

Since we have our own chart repo now, we do not have to go through the process:
branch merged -> helm/charts/stable/concourse -> our master branch

So the code in the legacy branch maintenance of concourse/charts does not quite fit our new repo, e.g. the pipeline reconfigure.

In concourse/concourse, when a new PR is drafted, the prs pipeline runs the tests against the PR and updates the PR status once the tests finish.

Do we need similar for the Helm chart?

Versioning in the presence of backports

Hey,

Given that we have two versions for the Helm chart (the version of the chart itself - i.e., packaging - as well as the version of the underlying software, Concourse), versioning a backport is quite weird: should we just bump the patch number? Would we end up with collisions? This becomes especially important as we create a repository to store all of our chart versions.

What I think we could do is establish a little process that would allow us to keep those versions sane without having to spend too much time thinking about it:

  • backward-incompatible changes that break the contract of interacting with the chart mean a bump in major
    e.g.: renaming a field in values.yaml

  • new functionality that is backward-compatible gets a minor bump
    e.g.: adding a new field to values.yaml, or changing certain templates in a way that doesn't affect any of the interaction

  • changes in the underlying version of concourse get a minor bump
    • except when backporting: changes in backports always produce a patch

e.g.:


REGULAR RELEASES

			bump concourse to 5.5.6
			   -----------> 

		before				after

	version: 1.2.3				version: 1.3.0
	appVersion: 5.5.5			appVersion: 5.5.6



"LTS" VERSION (one receiving a backport)



			bump concourse to 5.5.6
			   -----------> 

		before				after

	version: 1.2.3				version: 1.2.4
	appVersion: 5.5.5			appVersion: 5.5.6

Wdyt?

I tried to deviate as little as possible from semver, but it seems like there's a little bit of compromising that we'd need to do 😬

thanks!

Why did k8s-topgun not catch the broken configmap?

We should figure out why the broken configmap (fixed by #67) was not caught by our integration test suite and ensure future major issues like that are caught.

Idea: Add test that tests the default install steps provided in the README.md

Installation issue on mac

I am following the readme.md documentation, trying to install Concourse on a Mac.

helm repo add concourse https://concourse-charts.storage.googleapis.com/
"concourse" has been added to your repositories

helm install concourse/concourse
Error: must either provide a name or specify --generate-name

Providing a name or specifying --generate-name results in the same error either way:

helm install concourse/concourse --generate-name
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: [unknown object type "nil" in ConfigMap.data.config-rbac.yml, unknown object type "nil" in ConfigMap.data.main-team.yml]

helm version
version.BuildInfo{Version:"v3.0.2", GitCommit:"19e47ee3283ae98139d98460de796c1be1e3975f", GitTreeState:"clean", GoVersion:"go1.13.5"}

kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:30:10Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:12:17Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

So not much luck in installing Concourse.

There's no way to pass --ssh flag when building image use image_resource vito/oci-build-task

Hi Vito,

Here's a frustrating issue. In my Dockerfile, some pip dependencies come from private repositories. For a manual build, I pass the key as the following:

  1. Inside the Dockerfile:
    RUN --mount=type=ssh,id=github_ssh_key $VIRTUAL_ENV/bin/pip3 install -U -r ./hats/requirements-test.txt
  2. When executing the docker command:
    DOCKER_BUILDKIT=1 docker build --progress=plain -t image:tag -f Dockerfile --ssh github_ssh_key=<PATH-TO-KEY> .

Now, my Concourse uses K8s credentials manager as it runs as Helm Chart. My job looks like the following:

jobs:

- name: build-image
  plan:
  - get: repo
  - task: build-image-task
    privileged: true
    config:
      platform: linux
      image_resource:
        type: registry-image
        source:
          repository: vito/oci-build-task
      inputs:
      - name: repo
      outputs:
      - name: image
      run:
        path: build
      caches:
      - path: cache
      params:
        CONTEXT: repo
        BUILD_ARG_ssh: github_ssh_key=((github-key))

((github-key)) is a K8s secret resource.

I was hoping the line BUILD_ARG_ssh: github_ssh_key=((github-key)) could pass the ssh flag as --ssh github_ssh_key=<MY-KEY>, but I only realized that the --ssh flag only accepts the path to the key, not the key value. Now since I'm using K8s credentials manager from Helm Chart, I have no idea how to find the path to the key.

[stable/concourse] Cannot set empty string value for "xFrameOptions"

We set the value of xFrameOptions to an empty string "" in our ci deployment. This allows us to embed example pipelines in our docs page. But helm does not recognize a diff between empty strings and not setting it.

+      xFrameOptions: ""

but helm diff does not indicate there are any changes to apply.

Namespace creation is not very optional

Describe the bug
When setting concourse.web.kubernetes.createTeamNamespaces: false, namespaces are not created. But, web-rolebindings.yaml does not contain a proper conditional to survive the absence of team namespaces, thus causing tiller to error:

$ helm install --set concourse.web.kubernetes.createTeamNamespaces=false /path/to/concourse-chart
Error: release concourse failed: namespaces "concourse-main" not found

This used to be helm/charts#18008
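A hedged sketch of the kind of guard being asked for in web-rolebindings.yaml (template structure is illustrative, not the chart's actual file):

```yaml
{{- if .Values.concourse.web.kubernetes.createTeamNamespaces }}
# ... RoleBinding manifests that reference the per-team namespaces ...
{{- end }}
```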

Version of Helm and Kubernetes:

$ helm version
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:13:54Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
