orange-cloudfoundry / cf-ops-automation
A collaboration framework for operating Cloud Foundry and services at scale.
cf-ops-automation is tied to a fixed folder structure, which can be a bottleneck for adoption: each team uses its own structure to handle BOSH/CF deployments.
This proposal also intends to:
In this project each deployment must have a deployment-dependencies.yml, which actually describes the deployment more than its dependencies.
We could use this file as a more generic solution to describe all deployments and where to find them. The file could live at the root of the repository, the way .travis.yml does for Travis or any other CI system.
Proposal for the new format: a .deployments.yml file placed in the root folder.

- a path key to know where the stubs and templates are for the deployment; a name key can be added
- instead of the enable-cf-app.yml marker file, prefer an enum called type, which can have 2 possible values: bosh (for a BOSH deployment) or cf (for a Cloud Foundry app)
- a sites key which contains the list of sites available for the deployment (e.g.: cloudwatt, openwatt, sophia, ...)
- a types key inside each site declared by sites, which contains the list of deployment types (e.g.: prod, preprod, dev, ...)
- a targets key could be added for the Cloud Foundry app use case

Here is what the manifest could look like:
```yaml
deployments:
  - name: micro-bosh
    type: bosh
    cli_version: v2
    path: path/to/my/deployment
    stemcells:
      bosh-openstack-kvm-ubuntu-trusty-go_agent: latest
    releases:
      route-registrar-boshrelease:
        base_location: https://bosh.io/d/github.com/
        repository: cloudfoundry-community/route-registrar-boshrelease
      shield:
        base_location: https://bosh.io/d/github.com/
        repository: starkandwayne/shield-boshrelease
      xxx_boshrelease:
        base_location: https://bosh.io/d/github.com/
        repository: xxx/yyyy
    errands:
      smoke_tests: ~
    sites:
      - name: cloudwatt
        types: [prod, dev]
  - name: my-app
    type: cf
    path: path/to/my/deployment
    targets:
      - cf_api_url: ~
        cf_username: ~
        cf_password: ~
        cf_organization: ~
        cf_space: ~
```
Currently, a deployment must always follow a fixed structure and an implicit manifest style (like using spruce).
Instead of obliging ops to follow this structure, a simple shell script inside the deployment could be called.
The script should always receive the same input from the Concourse call and always produce the same kind of output, but ops would be free to construct the final manifest as they see fit.
The script should always be named generate_manifest.sh so it can then be used by CI systems.
As a side benefit, using a script lets us deploy with CI systems other than Concourse.
$ ./generate_manifest.sh <sitename> <type>

- <sitename>: the site (e.g.: cloudwatt, openwatt, sophia, bagnolet, ...)
- <type>: the deployment type (e.g.: prod, preprod, qa, dev, ...)

These inputs let ops generate the manifest keyed by these identifiers, but they are not obliged to use them.
The inputs will be provided by Concourse from the sites and types keys of the .deployments.yml file.
The output should always be one or more absolute paths to the manifests to deploy for the deployment.
If there is more than one, each path is separated by a new line:
$ ./generate_manifest.sh cloudwatt prod
/path/to/my/first/manifest
/path/to/my/second/manifest
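A minimal sketch of what such a generate_manifest.sh contract could look like. The function body and file names below are illustrative assumptions, with plain cat standing in for whatever tooling the ops team prefers:

```shell
# Hypothetical sketch of a deployment's generate_manifest.sh.
# Contract: inputs are <sitename> and <type>; output is one absolute
# manifest path per line on stdout. How the manifest is built is up to
# ops (spruce, bosh int, plain concatenation, ...).
generate_manifest() {
  site="$1"
  type="$2"
  out="$(pwd)/final-manifest-${site}-${type}.yml"
  # stand-in assembly: base manifest plus optional site/type overrides
  cat template/manifest.yml > "$out"
  if [ -f "template/${site}-${type}-overrides.yml" ]; then
    cat "template/${site}-${type}-overrides.yml" >> "$out"
  fi
  # contract: print the absolute path(s) of the generated manifest(s)
  echo "$out"
}
```

A real script would end with `generate_manifest "$1" "$2"` so Concourse can invoke it as `./generate_manifest.sh cloudwatt prod`.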
To be able to push an application to CF, we need the app binaries. We use the pre-cf-push.sh mechanism to download them. It would be better to use a dedicated resource to handle binary downloads (i.e. maven, github-release, etc.).
Updating an xx-tpl.yml file in template triggers the git scan and the Concourse deployment.
It seems an xx.yml file (a non-spruced file) does not trigger the git scan.
(context: deploy cloudfoundry job, ops-automation v1.3-prod)
If confirmed, this is a blocker for bosh2 manifest files.
Introduce a shared/models dir in paas-templates to allow sharing across root deployments.
The origin model could be described in deployment-dependencies.yml:
```yaml
---
deployment:
  bosh-ops:
    resources:
      template:
        base_path: ["shared/models/bosh"]
      secrets:
        ...
```
CC: @JCL38-ORANGE
We support BOSH releases not hosted on bosh.io via the deploy.sh script mechanism, but such a release is not described as a dependency. This may require additional manual operations to complete an upgrade.
We could use the git+https format.
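As a sketch, such a dependency could be declared alongside the existing bosh.io entries. The git+https keys below are a proposed shape, not an implemented format, and the hostname is a placeholder:

```shell
# illustrative only: proposed git+https release location in
# deployment-dependencies.yml (keys mirror the bosh.io entries)
cat > deployment-dependencies.yml <<'EOF'
deployment:
  my-deployment:
    releases:
      my-private-boshrelease:
        base_location: git+https://gitlab.internal.example/
        repository: ops/my-private-boshrelease
        version: 1.2.3
EOF
```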
Today, errand jobs (if a deployment has any) are all grouped together in a single Concourse box.
The request is to have one box per errand job, in order to trigger each one separately.
Moreover, we would like to categorize errand jobs:
* A manual category means "errand triggered by a human (click)"
* An automatic category means "errand triggered by Concourse"
When reusing an off-the-shelf manifest, deploy fails when a vars-file property is defined with a leading /.
See cloudfoundry/bosh-cli#300
The Concourse resource used by cf-ops-automation should probably leverage the latest bosh CLI v2:
https://github.com/cloudfoundry/bosh-deployment-resource
NB: bosh CLI version 2.0.36 is OK with this syntax.
Currently, when pushing secret or template changes related to CF or Terraform, the pipeline user needs to browse through the ops-depls-generated pipeline to check proper execution of the update-pipeline-ops-depls-generated job, and then browse to the TF pipelines/groups (such as ops-depls-generated?groups=Terraform).
It would be great to have the update-pipeline-ops-depls-generated job be part of the ops-depls-generated?groups=Terraform group.
Cloud and runtime config support ops and var files. Add this support to cf-ops-automation.
https://bosh.io/docs/cli-v2.html#update-cloud-config
Cloud config: update the current cloud config on the Director.
$ bosh -e my-env update-cloud-config config.yml [-v ...] [-o ...] (Alias: ucc)
Runtime config: update the current runtime config on the Director.
$ bosh -e my-env update-runtime-config config.yml [-v ...] [-o ...] (Alias: urc)
The old bosh deployment resource is broken on concourse 3.5.x and 3.6.0. The switch from busybox to alpine doesn't include bosh-cli v1. It is fixed by vmware-archive/bosh-deployment-resource#49, but no concourse version includes it yet.
FYI: @szaouam @poblin-orange
Now that the pipeline jobs are common among different providers (cf, openstack, ...), the naming including "cf" in the micro-depls pipeline is confusing.
What about removing the "cf" prefix, renaming the jobs to check-terraform-consistency, tf-manual-approval, and enforce-terraform-consistency?
When a root deployment is only composed of disabled deployments, it cannot be loaded.
Log output:
Dependencies loaded:
---
cf-rabbit:
  status: disabled
mongo-docker:
  status: disabled
cloudfoundry-mysql:
  status: disabled
cf-operator:
  status: disabled
admin-ui:
  status: disabled
Error:
exit status 1 - error: invalid configuration:
invalid resources:
resource 'bosh-stemcell' is not used
As a result, we have the following duplicated credentials across apps:
cf-app:
  cf-etherpad:
    cf_api_url:
    cf_username:
    cf_password:
    cf_organization:
    cf_space:
It would be great to have defaults in shared/secrets.yml for these properties.
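One possible shape (an assumption, not current behavior) is a YAML anchor in shared/secrets.yml that each app merges and overrides:

```shell
# illustrative only: shared defaults via a YAML anchor + merge key, so
# each cf-app only declares what differs (key names taken from the
# duplicated credentials listed above)
mkdir -p shared
cat > shared/secrets.yml <<'EOF'
cf-defaults: &cf-defaults
  cf_api_url: https://api.example.com
  cf_username: automation
  cf_organization: my-org

cf-app:
  cf-etherpad:
    <<: *cf-defaults
    cf_space: etherpad
EOF
```

Note this assumes the consumer (spruce or the pipeline generator) honors YAML merge keys.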
This is a follow up of #40 and more generally the terraform pipelines
In the context of on-demand deployments, we need to have root deployments that would leverage TF and not bosh.
Currently, the generated pipeline always includes the bosh-related jobs recreate-all, cloud-config-and-runtime-config-for-cloudflare-depls, and execute-deploy-script. These jobs will remain failing/red.
Additionally, the update-pipeline-cloudflare-depls-generated job fails if the bosh director credentials are missing:
- unbound variable in template: 'bosh-target'
- unbound variable in template: 'bosh-username'
- unbound variable in template: 'bosh-password'
We use a custom task to push applications to CF.
Consider using a native Concourse resource to push CF apps: https://github.com/concourse/cf-resource.
This resource doesn't support dynamic configuration, so it doesn't match our current requirement:
api, username, password, organization, and space should be evaluated at runtime, not during pipeline generation.
For a given xxx deployment, a single xxx.yml in the xxx/template directory is not used for the bosh deployment.
The Concourse automation expects an xxx-tpl.yml file, to be spruced then deployed. However, the new credhub / var-store syntax is not spruce compatible.
A default behavior could be to just deploy xxx.yml if present (without spruce templating).
IaaS-specific ops and var files are also required for the bosh deployment.
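A sketch of the suggested fallback (the helper name is hypothetical, and cp stands in for the real spruce merge, which depends on stubs):

```shell
# Sketch: prefer xxx-tpl.yml (spruce path) when present; otherwise fall
# back to deploying a plain xxx.yml untouched, which keeps bosh 2 /
# credhub ((placeholders)) intact.
select_manifest() {
  depl="$1"
  if [ -f "${depl}/template/${depl}-tpl.yml" ]; then
    # real pipeline: spruce merge "${depl}-tpl.yml" <stubs...> > manifest
    cp "${depl}/template/${depl}-tpl.yml" "${depl}-manifest.yml"
  elif [ -f "${depl}/template/${depl}.yml" ]; then
    # suggested default: non-spruced manifest deployed as-is
    cp "${depl}/template/${depl}.yml" "${depl}-manifest.yml"
  else
    echo "no manifest found for ${depl}" >&2
    return 1
  fi
  echo "${depl}-manifest.yml"
}
```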
Related to #50
Such support would be quite useful both at the root-deployment and deployment level
http://bosh.io/docs/cli-ops-files.html
Support in the bosh deployment resource: https://github.com/cloudfoundry/bosh-deployment-resource#out-deploy-a-bosh-deployment
The paas-template repo aims at being public (at least its main remote, see #62) with submodules fetching public github repos (such as cf-deployment).
COA needs to be able to operate in a transiently disconnected/offline mode (i.e. be robust to transient internet connectivity loss).
Our strategy is to have local replicas of all resources fetched from the internet, supporting this offline mode, as well as saving internet bandwidth and local caching.
A pipeline would replicate a number of upstream repos in the local gitlab repo.
Additionally, during clone/update of git repos (such as paas-template), we need the sync-feature-branch pipeline to configure git to fetch and update submodules from a local replica instead of straight from the internet. This may be achieved using git config insteadOf syntax https://git-scm.com/docs/git-config#git-config-urlltbasegtinsteadOf in the /home/user/.gitconfig
[url "https://gitlab.internal.paas"]
insteadOf = "https://github.com"
The COAB GitProcessor would maintain the same .gitconfig.
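The same rewrite rule can be set from the git command line. Here it is written to an explicit file for illustration; dropping --file targets ~/.gitconfig as described above:

```shell
# Rewrite any github.com fetch URL to the local replica, so submodule
# updates work while internet connectivity is down.
git config --file ./offline.gitconfig \
  url."https://gitlab.internal.paas".insteadOf "https://github.com"
```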
For all deployments (manifest files) contained in a BOSH director, the stemcell and bosh versions are defined in the file XXX-depls-versions.yml:
...
stemcell-version = XXX
bosh-version = YYY
...
The requirement is to enable the overriding capability for a dedicated deployment.
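A sketch of what such an override could look like in deployment-dependencies.yml. The version key and its value are illustrative assumptions; this is a proposal, not implemented behavior:

```shell
# illustrative only: a per-deployment stemcell version overriding the
# root-deployment-wide value from XXX-depls-versions.yml
cat > deployment-dependencies.yml <<'EOF'
deployment:
  my-deployment:
    stemcells:
      bosh-openstack-kvm-ubuntu-trusty-go_agent:
        version: "3445.11"   # overrides stemcell-version from XXX-depls-versions.yml
EOF
```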
When a new deployment is added, it freezes until the first pipeline step is executed.
Due to the first pipeline step (update-pipeline-xxx-depls-generated), the pipeline auto-updates. After the auto-update, the new deployment is added with its new dependencies. But these dependencies haven't passed the first step, so the pipeline freezes.
Feature #50 brought the ability to have a clean naming/directory convention to separate ops-files per iaas-type.
e.g.:
template/
  openstack/
    openstack-operators.yml
  vsphere/
    vsphere-operators.yml
This is very convenient. The mechanism could be extended, with multiple values specified in the iaas_type property.
e.g.:
iaas_type: openstack,preprod
This should apply the operators in both template/openstack AND template/preprod.
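The lookup could be as simple as splitting the comma-separated value into an ordered list of template subdirectories. A sketch (the helper name is hypothetical):

```shell
# Resolve a comma-separated iaas_type value into the ordered list of
# ops-file directories to apply.
ops_dirs_for() {
  (
    IFS=','
    for t in $1; do
      echo "template/$t"
    done
  )
}
```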
cf-ops-automation is able to retrieve bosh releases from bosh.io, and we can express the deployment => bosh releases dependency, which generates a correct pipeline.
However, we sometimes need a specific bosh release only for the runtime-config.
There is no way to express these dependent bosh releases.
As a workaround, we do a manual bosh upload command in template/deploy.sh, but it's limited (scheduled daily, not triggered when the runtime-config is updated).
cc @aveyrenc @shoudusse
Currently, bootstrapping a bosh director is a manual operation. The pipeline micro-bosh-init-pipeline.yml applies the inception terraform and generates an unused manifest.
Some steps are missing to have full bootstrap support, such as using the bosh CLI v2 bosh create-env command to set up the new director.
We need to persist as secrets: bosh.pem.
The location could be parameterized using spruce and an environment variable.
CC: @poblin-orange
The bosh director needs a scheduled clean up of its unused resources (stemcells and releases), otherwise its persistent disk eventually becomes full.
The framework already supports managing the bosh director resources (through cloud-config, runtime-config, and the deploy.sh script), see diagram. Having first-class systematic support for periodic clean up would avoid an explicit per-director script in deploy.sh.
Related bosh support:
The bosh deployment Concourse resource also supports invoking the bosh clean up: https://github.com/concourse/bosh-deployment-resource/blob/master/spec/out_spec.rb#L186-L196
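For reference, a sketch of the per-director snippet this feature would make unnecessary; `bosh clean-up --all` removes all unused releases, stemcells, and orphaned disks from the director, and `-n` skips interactive confirmation:

```shell
# Sketch: the clean-up step a scheduled job (or deploy.sh) could run.
# BOSH_TARGET is assumed to be the environment alias/URL provided by the
# pipeline; the script only gets written here, not executed.
cat > cleanup-director.sh <<'EOF'
#!/bin/sh
bosh -e "${BOSH_TARGET:?}" -n clean-up --all
EOF
chmod +x cleanup-director.sh
```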
Currently the sync-feature-branches pipeline watches a single paas-template git resource.
As we open-source common features into paas-template, and each Orange entity keeps specific automation in its own fork, we'll need the pipeline to fetch and merge from more than one remote (see related #83).
Additionally, the COAB broker intends to commit to such an additional clone (to avoid needing git push permissions on the full paas-templates repo).
We need to upgrade Terraform from 0.7.3 to the latest 0.9.1 (maybe 0.8.1) to manage space security groups within TF.
Steps:
Terraform providers are no longer distributed as part of the main Terraform distribution. Instead, they are installed automatically as part of running terraform init
A new flag -auto-approve has been added to terraform apply. We suggest that anyone running terraform apply in wrapper scripts or automation refer to the upgrade guide to learn how to prepare such wrapper scripts for the later breaking change.
Currently, a bosh deploy failure only notifies on Slack. It would be useful to have on_failure invoke multiple steps (with do?), including one that would dump the previously generated manifest.
Example for admin-ui:
```yaml
- put: deploy-admin-ui
  attempts: 3
  params:
    manifest: final-release-manifest/admin-ui.yml
    stemcells:
      - bosh-stemcell/stemcell.tgz
    releases:
      - admin-ui-boshrelease/release.tgz
  on_failure:
    put: failure-alert
    params:
      channel: {{slack-channel}}
      text: Failed to run [[$BUILD_PIPELINE_NAME/$BUILD_JOB_NAME ($BUILD_NAME)]($ATC_EXTERNAL_URL/teams/main/pipelines/$BUILD_PIPELINE_NAME/jobs/$BUILD_JOB_NAME/builds/$BUILD_NAME)].
      icon_url: http://cl.ly/image/3e1h0H3H2s0P/concourse-logo.png
      username: Concourse
```
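A sketch of the aggregated on_failure using a `do` step; the dump-generated-manifest task and its file path are hypothetical names, not existing COA tasks:

```shell
# illustrative only: on_failure running several steps in sequence
cat > on-failure-fragment.yml <<'EOF'
on_failure:
  do:
    - task: dump-generated-manifest          # hypothetical task
      file: cf-ops-automation/concourse/tasks/dump_manifest.yml
      params:
        MANIFEST: final-release-manifest/admin-ui.yml
    - put: failure-alert
      params:
        channel: {{slack-channel}}
        username: Concourse
EOF
```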
The following error occurs:
cd updated-git-resource
git checkout -b develop -t origin/develop
fatal: A branch named 'develop' already exists.
There is no update of the docker image used by this task.
/cc @poblin-orange
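A possible fix is `git checkout -B` (capital B), which resets the branch when it already exists instead of failing, making the task safe to re-run on a reused container. A self-contained demonstration:

```shell
# Demonstrate that `git checkout -B` is idempotent where `-b` fails
# with "A branch named 'develop' already exists." on the second run.
# (With a remote, `git checkout -B develop -t origin/develop` keeps the
# tracking behavior of the original command.)
demo_checkout() {
  git init -q repo
  cd repo
  git config user.email ci@example.com
  git config user.name ci
  git commit -q --allow-empty -m init
  git checkout -q -B develop   # first run: creates the branch
  git checkout -q -B develop   # second run: resets, no error
  git rev-parse --abbrev-ref HEAD
}
```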
The documented syntax for bosh operators files is broken. In fact, operators files are not spruce compatible (they are an array of actions, while spruce requires a root yaml element).
The solution is to use a plain operators name (xxx-operators.yml instead of xxx-operators-tpl.yml), and use an associated var file to populate the secrets (xxx-vars-tpl.yml, with spruce grab if required).
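To illustrate the convention (file names and contents are examples, not taken from an actual deployment):

```shell
# xxx-operators.yml: plain bosh ops file, a root-level array spruce
# cannot process, so it is deployed without spruce templating.
mkdir -p template
cat > template/my-depl-operators.yml <<'EOF'
- type: replace
  path: /instance_groups/name=web/instances
  value: ((web_instances))
EOF

# xxx-vars-tpl.yml: spruced var file populating the values the ops file
# references, via spruce grab when needed.
cat > template/my-depl-vars-tpl.yml <<'EOF'
web_instances: (( grab secrets.web_instances ))
EOF
```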
The delete lifecycle currently requires human approval for bosh deployment deletion.
In the context of user-triggered creation and deletion, such human approval does not scale, and will lead to IaaS resource leaks and quota exhaustion. We need an option (e.g. in ci-deployment-overview.yml) to configure a periodic automatic delete approval per root deployment (say every 24 hours), leaving enough time for an emergency undelete manual action upon user request.
It would be quite useful to have the failure of the cf push dump the recent logs as suggested TIP: use 'cf logs cf-etherpad --recent' for more information
It seems [skip ci] is ignored when a resource is updated with a put.
According to Concourse doc and git resource:
"A build can produce a new version of a resource by running a put step. This version is automatically detected and becomes available in later build steps and later pipeline jobs."
"Note that if you want to push commits that change these files via a put, the commit will still be "detected", as check and put both introduce versions. To avoid this you should define a second resource that you use for commits that change files that you don't want to feed back into your pipeline - think of one as read-only (with ignore_paths) and one as write-only (which shouldn't need it)."
To improve auditability of secrets generated and used by the bosh director, it would be useful to have in the secrets repo a fingerprint audit of the variables.
e.g.:
$ bosh variables
ID Name
8d9da341-8fe8-4557-88bd-9c4e7dd9cbbb /bosh-expe/bosh-remote-iaas/admin_password
297c53d6-b712-4e1a-9a02-8021c7bdf6e1 /bosh-expe/bosh-remote-iaas/blobstore_agent_password
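A sketch of how such a fingerprint file could be derived from a captured `bosh variables` listing: hashing the credential ID (which changes on rotation) gives a committable fingerprint while the name stays stable. The helper name is hypothetical:

```shell
# Read "<id> <name>" lines (as printed by `bosh variables`) on stdin and
# emit "<name> <sha1-of-id>" lines suitable for committing to the
# secrets repo as an audit trail.
fingerprint_variables() {
  while read -r id name; do
    printf '%s %s\n' "$name" "$(printf '%s' "$id" | sha1sum | cut -d' ' -f1)"
  done
}
```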
Currently, the pipeline generation and autoloading from ops-depls-generated are not included in the ops-depls-cf-apps-generated pipeline.
As a result, a change to a secret that triggers a change to the pipeline often fails to be picked up automatically by the ops-depls-cf-apps-generated pipeline (both pipelines are competing/racing).
By having update-pipeline-ops-depls-generated push to a Concourse resource (say a git repository resource holding the generated pipeline source code), ops-depls-cf-apps-generated could watch this resource and automatically retrigger a build on any new pipeline version.
Currently, a failure of execute-deploy-script is hard to diagnose, as the executed deploy.sh script isn't displayed; only the script's standard output is (see sample trace below).
The cf-ops-automation/concourse/tasks/execute_deploy_script.yml task could be changed to display the executed commands, e.g.:

```yaml
run:
  path: sh
  args:
    - -exc
    - |
      cp -r script-resource/. run-resource
      cp -r templates/. run-resource
      source ./script-resource/scripts/bosh_cli_v2_login.sh ${BOSH_TARGET}
      run-resource/${CURRENT_DEPLS}/deploy.sh
```

Alternatively, each template deploy.sh script could be enhanced with a set -x command, in non-confidential scripts or only parts of them.
Task 667926 done
Succeeded

Using environment 'https://192.168.116.158:25555' as client 'admin'

Task 667927
10:17:48 | Downloading remote release: Downloading remote release (00:00:08)
10:17:56 | Extracting release: Extracting release (00:00:00)
10:17:56 | Verifying manifest: Verifying manifest (00:00:00)
10:17:57 | Resolving package dependencies: Resolving package dependencies (00:00:00)
10:17:57 | Processing 3 existing packages: Processing 3 existing packages (00:00:00)
10:17:57 | Processing 1 existing job: Processing 1 existing job (00:00:00)
10:17:57 | Release has been created: logsearch-shipper/4+dev.2 (00:00:00)

Started  Wed Aug 16 10:17:48 UTC 2017
Finished Wed Aug 16 10:17:57 UTC 2017
Duration 00:00:09

Task 667927 done
Succeeded

Using environment 'https://192.168.116.158:25555' as client 'admin'

Task 667928
10:17:58 | Downloading remote release: Downloading remote release (00:00:01)
10:17:59 | Extracting release: Extracting release (00:00:00)
10:17:59 | Verifying manifest: Verifying manifest (00:00:00)
10:17:59 | Error: Release manifest not found

Started  Wed Aug 16 10:17:58 UTC 2017
Finished Wed Aug 16 10:17:59 UTC 2017
Duration 00:00:01

Task 667928 error
Uploading remote release 'https://github.com/orange-cloudfoundry/spring-microservices-toolbox-boshrelease/archive/v4.tar.gz':
  Expected task '667928' to succeed but state is 'error'
To ease complex deployment troubleshooting, it would be useful to offer a preview of the combination of the files provided in the manifest (including operators order).
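One lightweight approach: print the exact `bosh int` command (base manifest plus ops files in application order) as a dry run, so operators can inspect it or pipe it to sh to obtain the combined manifest. The helper is a sketch, not an existing COA task:

```shell
# Build (without running) the interpolation command that would produce
# the combined manifest, preserving the ops-file application order.
preview_manifest_cmd() {
  base="$1"
  shift
  cmd="bosh int $base"
  for op in "$@"; do
    cmd="$cmd -o $op"
  done
  echo "$cmd"
}
```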
https://www.terraform.io/docs/internals/debugging.html
This is leveraged by https://github.com/orange-cloudfoundry/terraform-provider-cloudfoundry/#provider-configuration that can display CC_API verbose traces (equivalent of CF_TRACE=true) as well as recent app logs during app deployment (similar as #5)
Note that terraform-resource supports this via the env param. So this issue might be solved with #1.
Currently, to guard against errors when applying terraform config on sensitive providers (e.g. openstack or CF), the TF specs are only applied after a manual invocation of the "cf-manual-approval" job, which then flows to the "enforce-terraform-cf-consistency" job.
The current UX is 3-fold:
1. Trigger check-cf-consistency, check that dependent resources were properly updated, and check the terraform plan output.
2. Trigger cf-manual-approval and check its output.
3. Wait for the enforce-terraform-cf-consistency job to self-trigger, and verify its successful completion.
This creates the following UX problems: the enforce-terraform-cf-consistency job can misleadingly be invoked directly by impatient contributors, unexpectedly applying an old version of the terraform specs.
Suggested changes:
* merge the cf-manual-approval and enforce-terraform-cf-consistency jobs into a single approve-and-enforce-terraform-cf-consistency job
* add a reenforce-current-terraform-cf-consistency job to trigger reapplication of the "terraform-plan" on the previous version of the paas-template and paas-secret resources

Currently, a single team is used with all pipelines included in it.
This includes some pipelines that may not be useful to contributors to templates/secrets (e.g. bootstrapping pipelines). This pollutes the Concourse UX and may open the door to mistakes if some pipelines get manually triggered by error.
The delete-lifecycle current implementation compares the paas-template deployment templates and the paas-secret deployment config to identify candidate deployments for deletion.
In the context of on-demand deployments that produce both dynamic templates in paas-template and deployment config in paas-secret, the current implementation prevents COAB from cleaning up the dynamic service instance deployment in paas-template, otherwise no candidate deployments for deletion get identified.
This results in COAB-generated files leaking into paas-template. For example, a smoke test on each COAB deployment would leak 5-10 files in paas-template each.
For each bosh deployment, the generated pipeline commits the generated manifest to the paas-secret repo. However, the delete lifecycle does not remove the generated bosh manifest, resulting in a leak in paas-secret.
cf-ops-automation/concourse/pipelines/template/depls-pipeline.yml.erb
Lines 281 to 286 in ba94fc6
We would need to automate invocation of "terraform state mv" when the tf configuration changes impact resources that should be preserved on infra.
Background:
Use cases:
Some potential inspirations:
The latest version of spruce ignores bosh 2 vars (credhub var files).
This should help with paas-templates manifests (no more ! escaping).
https://github.com/geofffranks/spruce/releases/tag/v1.14.0
/cc @o-orand
When a deployment doesn't contain any releases, pipeline generation crashes. The deployment-dependencies.yml may look like this:

```yaml
deployment:
  my_deployment:
    stemcells:
      bosh-openstack-kvm-ubuntu-trusty-go_agent:
    errands:
      smoke-tests:
```

It may happen when a bosh-release is manually added using the deploy.sh script.
The error message is:
(erb):439:in `block in load_context_into_a_binding': undefined method `each' for nil:NilClass (NoMethodError)
from (erb):419:in `each'
from (erb):419:in `load_context_into_a_binding'
from /home/wooj7232/.rbenv/versions/2.3.1/lib/ruby/2.3.0/erb.rb:861:in `eval'
from /home/wooj7232/.rbenv/versions/2.3.1/lib/ruby/2.3.0/erb.rb:861:in `block in result'
from /home/wooj7232/.rbenv/versions/2.3.1/lib/ruby/2.3.0/erb.rb:862:in `result'
from /home/wooj7232/Projects/Elpaaso/cf-ops-automation/lib/template_processor.rb:26:in `block in process'
from /home/wooj7232/Projects/Elpaaso/cf-ops-automation/lib/template_processor.rb:22:in `each'
from /home/wooj7232/Projects/Elpaaso/cf-ops-automation/lib/template_processor.rb:22:in `process'
from ./scripts/generate-depls.rb:122:in `block in <main>'
from ./scripts/generate-depls.rb:121:in `each'
from ./scripts/generate-depls.rb:121:in `<main>'
CC: @aveyrenc
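Until the generator guards against the missing key, a possible workaround (an untested assumption) is to declare an explicitly empty releases hash so the ERB `each` has something to iterate:

```shell
# illustrative only: empty releases hash to avoid the nil `each` crash
cat > deployment-dependencies.yml <<'EOF'
deployment:
  my_deployment:
    stemcells:
      bosh-openstack-kvm-ubuntu-trusty-go_agent:
    releases: {}
    errands:
      smoke-tests:
EOF
```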
This is a refinement of the iaas-specifics support to include the terraform resource.
Currently, the automation picks all TF config files:
cf-ops-automation/concourse/tasks/terraform_plan_cloudfoundry.yml
Lines 50 to 57 in 7a09101
Sharing the same cf-ops-automation based templates across root deployments leveraging distinct TF config (e.g. openstack and cloudstack) is likely to damage TF config readability.
The apparent TF syntax support is the conditional support in interpolation.
A common use case for conditionals is to enable/disable a resource by conditionally setting the count:

```hcl
resource "aws_instance" "vpn" {
  count = "${var.something ? 1 : 0}"
}
```

In the example above, the "vpn" resource will only be included if "var.something" evaluates to true. Otherwise, the VPN resource will not be created at all.
Wrapping this logic into modules only factors out the conditional variable and does not alleviate the readability issue caused by every invoked resource having the conditional count interpolation, as illustrated in hashicorp/terraform#12906 (comment).
It may make more sense to conditionally load different TF config files depending on a per root deployment flag (such as ci-deployment.x-depl.terraform_config.iaas_spec-prefix in ci-deployment-overview.yml).
This may for instance translate into naming convention on additional directories where to fetch iaas-specific spec files such as "specs-openstack" and "specs-cloudstack"
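A sketch of how such a convention could be resolved at task time; the specs-* directory names follow the suggestion above, and the helper itself is hypothetical:

```shell
# Assemble the terraform working directory from the shared TF config
# plus the one iaas-specific directory selected by the root deployment
# flag (e.g. "openstack" -> specs-openstack).
assemble_tf_specs() {
  iaas="$1"
  dest="$2"
  mkdir -p "$dest"
  if [ -d spec ]; then
    cp -r spec/. "$dest"/                 # shared TF config
  fi
  if [ -d "specs-$iaas" ]; then
    cp -r "specs-$iaas/." "$dest"/        # iaas-specific TF config
  fi
}
```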
Currently, the generated pipelines include deploy and recreate. The delete phase is still a manual operation done interactively with each resource provider (bosh, cf, iaas).
Adding a paused delete pipeline would help automate deletion.
A pre_delete hook would be useful, e.g. to backup/export data before delete.
Possible content:
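A sketch of what such a pre_delete hook could look like; the script name, its location under template/, and the export errand are assumptions, not an existing COA contract:

```shell
# hypothetical pre-delete hook: back up deployment data before the
# delete pipeline destroys the resources
mkdir -p template
cat > template/pre-delete.sh <<'EOF'
#!/bin/sh
set -e
DEPLOYMENT="${1:?usage: pre-delete.sh <deployment>}"
BACKUP_DIR="backups/${DEPLOYMENT}-$(date +%Y%m%d%H%M%S)"
mkdir -p "$BACKUP_DIR"
# export whatever must survive deletion, e.g. via a backup errand:
# bosh -d "$DEPLOYMENT" run-errand export-data --download-logs-to "$BACKUP_DIR"
echo "backup stored in $BACKUP_DIR"
EOF
chmod +x template/pre-delete.sh
```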
The generated pipeline includes automatic merging of branches from the template and secrets git repos matching the regular expression develop,WIP-,wip-,feature-,Feature- (see source).
In case of a merge conflict or rewritten history, you may trigger a reset by manually triggering the reset-merged-wip-features job.
The current practice is to create a feature branch matching the regular expression above and to push incremental commits into it. Once satisfied and ready to share/PR, then:
git branch archived-feature-etherpad feature-etherpad; git push gitlab-preprod --set-upstream archived-feature-etherpad; git push gitlab-preprod :feature-etherpad
Currently, applying terraform specs downloads the terraform binary and providers from the internet, in a task applying sh -c "$(curl -fsSL https://raw.github.com/orange-cloudfoundry/terraform-provider-cloudfoundry/master/bin/install.sh)".
This is slow and vulnerable to failure when the internet connection is broken or github is down/slow.
The suggestion made by @o-orand and @ArthurHlt is to use a Concourse terraform resource such as https://github.com/ljfranklin/terraform-resource. The terraform binary and provider binaries would be cached locally in the docker image. See Image Variants for steps to build a custom image with custom providers.
Edit: in addition, there is currently a manual activity to propagate the resources provisioned by terraform down to the pipeline. We don't yet leverage the Terraform output concept to flow produced resources down.