argoproj / argo-workflows

Workflow Engine for Kubernetes

Home Page: https://argo-workflows.readthedocs.io/

License: Apache License 2.0

Makefile 0.78% Go 82.87% Shell 0.31% Python 0.08% Dockerfile 0.08% TypeScript 12.78% HTML 0.01% JavaScript 0.09% SCSS 0.65% Java 0.05% Nix 2.29%
airflow argo argo-workflows batch-processing cloud-native cncf dag data-engineering gitops hacktoberfest k8s knative kubernetes machine-learning mlops pipelines workflow workflow-engine

argo-workflows's Introduction

Argoproj - Get stuff done with Kubernetes

What is Argoproj?

Argoproj is a collection of tools for getting work done with Kubernetes.

  • Argo Workflows - Container-native Workflow Engine
  • Argo CD - Declarative GitOps Continuous Delivery
  • Argo Events - Event-based Dependency Manager
  • Argo Rollouts - Progressive Delivery with support for Canary and Blue Green deployment strategies

Also, argoproj-labs is a separate GitHub org that we set up for community contributions related to the Argoproj ecosystem. Repos in argoproj-labs are administered by the owners of each project. Please reach out to us on the Argo Slack channel if you have a project that you would like to add to the org to make it easier for others in the Argo community to find, use, and contribute back.

Community Blogs and Presentations

Project-specific community blogs and presentations are at:

Adopters

Each Argo sub-project maintains its own list of adopters. Those lists are available in the respective project repositories:

Contributing

To learn about how to contribute to Argoproj, see our contributing documentation. Argo contributors must follow the CNCF Code of Conduct.

For help contributing, visit the #argo-contributors channel in CNCF Slack.

To learn about Argoproj governance, see our community governance document.

Project Resources

argo-workflows's People

Contributors

agilgur5, alexec, alexmt, blkperl, changhc, crenshaw-dev, dcherman, dependabot[bot], dinever, dpadhiar, dtaniwaki, edlee2121, github-actions[bot], isubasinghe, jessesuen, joibel, juliev0, markterm, nikenano, rbreeze, rohankmr414, sarabala1979, shuangkun, simster7, snyk-bot, tczhao, terrytangyuan, tico24, toyamagu-2021, whynowy

argo-workflows's Issues

Fix and update saas unit tests

Many unit tests were broken/disabled as a result of the YAML redesign. This issue is to fix and update the saas unit tests and re-enable them in Argo CI.

Provide way to override the notification setting for manually submitted job

Currently the UI uses the same predefined notification settings for all manually submitted jobs: notify the submitter of success or failure. The UI should allow the user to change notification settings.

The use case:
A developer manually runs a job which is supposed to build/test a release candidate and wishes to notify a manager about the result. Automatic notification would improve the user experience.

Invalid margins on several pages

@rahuldhide :

Some components are overlapping due to the new margins. In particular, the commit cards do not look good, with Create New Job and the job numbers on two different lines. If we tweak the margins as per the attached specs, we will be able to resolve all these issues.

Add Eclipse specific files to .gitignore

Importing argo in Eclipse (using PyDev) adds a ".classpath" directory at the root level and also updates the existing .gitignore to include /bin.

The existing .gitignore should therefore be updated to include these changes.

Improve job creation panel

The UI has to supply different types of parameters while creating a new job:

  1. regular string parameters
  2. artifacts (the workflow ID which exports the artifact)
  3. DNS domain
  4. volumes

In order to improve the user experience, the UI should provide auto-completion for all complex parameters (e.g. volumes, domains, etc.).

Yaml validation failure from tip of master

Please fix the format

$ argo yaml validate ./.argo/
Verifying all yaml files in directory: ./.argo/
[.argo/test-yamls/fixtures-dynamic.yaml]
 - ERROR test-fixtures-dynamic-outputs: outputs.artifacts.WF_OUTPUTS.from invalid format '%%fixtures.DYN_FIX_WITH_OUTPUTS.outputs.artifacts.BIN-DIR%%'. expected format: '%%steps.<step_name>.outputs.artifacts.<artifact_name>%%'

Convert a Task to run as a Pod instead of a kubernetes job

When we query the status of a task, we also care about Pod details, such as whether the pod has an image pull failure and the phase of the pod (running, pending, etc.). To do this, we need to query the Job status and then the pod status of the pod that is created for the Job. It is also possible for the Pod to run into some error. In this case, the Job controller creates another Pod and restarts the Job. Thus, there can be many old Pods that had some error lying around. These pods should be deleted so that we clean up resources.

These are some of the reasons why Job queries take long, as we do the following:

  1. Look up job status
  2. Find the pod for this job
  3. Delete old pods

There is no mapping that allows O(1) lookup of Job to Pod that I am aware of. To find the Pod for a Job, we need to filter Pods with the label selector job-name=THE_NAME_OF_JOB. For n pods, this is an O(n) lookup. For argo, each Task is mapped to a Job and each Job creates at least 1 Pod, so for n Tasks we will have at least n Pods. The status of each such Task is periodically queried by the workflow executor, which makes the sum of these operations O(n^2).

Since the workflow executor already knows how to recreate the Pod on failure, node termination, etc., we do not need to use the Job controller. If we use Pods directly, we avoid the O(n) iteration to look for pods. We also avoid the cleanup required for old Pods created by the Job controller.
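For illustration, a minimal sketch of the label-selector lookup described above, using the official kubernetes Python client (the namespace and job name are illustrative):

from kubernetes import client, config

def find_pods_for_job(job_name, namespace="default"):
    config.load_kube_config()  # use load_incluster_config() when running in a pod
    v1 = client.CoreV1Api()
    # The API server does the filtering, but conceptually this is an O(n)
    # scan over pods rather than an O(1) Job -> Pod mapping.
    return v1.list_namespaced_pod(
        namespace, label_selector="job-name=" + job_name).items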

Redeploy button on the app page.

For each deployment there is a redeploy quick link on the app page. It is needed on the deployment page too, along with stop and terminate, to be consistent.

Improve date range picker

The date range picker should be improved according to the attached mockups:

  1. When the date range picker is expanded, the user sees a menu with several predefined items and a "Custom Date Range" menu item.
  2. If the user clicks the "Custom Date Range" menu item, the menu expands into a calendar view with the ability to select the required date range.

Add minion-manager unit tests

Unit tests for the minion-manager component need to be added back. These should also be added to the Argo CI workflow.

Improve template filtering/grouping on metrics page

The metrics screen shows aggregated run stats for each template. Statistics are aggregated by repo, template name, and branch. From the user's point of view there is one template per repo and the branches contain different versions of it, so statistics should be aggregated by repo/template name.

The following changes are required:

  1. Implement an API which returns a unique list of template names (see the sketch after this list).
  2. Filter jobs by template name instead of template ID: #183
  3. Add a template name filter to the jobs global search: #184
  4. Allow navigating to jobs from the metrics page: #185
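A minimal sketch of item 1, assuming a Flask service; the route and the fetch_templates() data-access helper are hypothetical names, not the real API:

from flask import Flask, jsonify

app = Flask(__name__)

def fetch_templates():
    # Hypothetical data-access helper; the real service would query its store.
    return [
        {"repo": "repo-a", "name": "build", "branch": "master"},
        {"repo": "repo-a", "name": "build", "branch": "feature-x"},
        {"repo": "repo-b", "name": "deploy", "branch": "master"},
    ]

@app.route("/v1/templates/names")
def unique_template_names():
    # Deduplicate by (repo, template name), ignoring branch, per the issue.
    unique = sorted({(t["repo"], t["name"]) for t in fetch_templates()})
    return jsonify([{"repo": r, "name": n} for r, n in unique])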

Always set instance region but force SigV4 authentication when trying to get bucket location

The S3 boto3 client needs to be initialized with the bucket's region if the region uses Signature V4. Currently we determine the region as follows:

  1. Try HeadBucket, ignoring errors, as the region name is included in a response header. If that does not work,
  2. Try GetBucketLocation to find the region. If that does not work,
  3. Use the HTTP version of the virtual host URL <bucket-name>.s3.amazonaws.com according to the discussion under aws/aws-sdk-go#720 (comment) as a last resort.

This solution works perfectly fine in non-GovCloud partitions, but does not work in GovCloud, as reported by people in the community (@lorengordon).

A better solution would be (assuming the cluster doesn't access resources across partitions, which is a valid assumption; a runnable sketch follows this list):

  1. Use the AWS metadata service to get the instance region.
  2. Instantiate a boto3 Session object with that region, and then an S3 client forcing SigV4 auth (according to http://docs.aws.amazon.com/general/latest/gr/signing_aws_api_requests.html, all regions support Signature V4):
     s3 = boto3.Session(profile_name=<prof>, region_name=<reg>).client('s3', config=Config(signature_version='s3v4'))
  3. Use HeadBucket to get the bucket region.
  4. Instantiate the actual bucket client using the actual region.
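A runnable sketch of this flow, assuming the code runs on an EC2 instance where the metadata endpoint is reachable (the bucket name is illustrative):

import boto3
import requests
from botocore.config import Config

# 1. Instance region from EC2 instance metadata.
identity = requests.get(
    "http://169.254.169.254/latest/dynamic/instance-identity/document",
    timeout=2).json()
instance_region = identity["region"]

# 2. SigV4-only client in the instance region; all regions accept SigV4.
s3 = boto3.Session(region_name=instance_region).client(
    "s3", config=Config(signature_version="s3v4"))

# 3. Bucket region from the HeadBucket response header.
resp = s3.head_bucket(Bucket="my-bucket")  # bucket name is illustrative
bucket_region = resp["ResponseMetadata"]["HTTPHeaders"]["x-amz-bucket-region"]

# 4. Actual bucket client pinned to the bucket's actual region.
bucket_client = boto3.Session(region_name=bucket_region).client(
    "s3", config=Config(signature_version="s3v4"))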

Unable to use fixture output artifact

The YAML validator does not allow using a fixture output artifact. The following example demonstrates the issue:

https://github.com/alexmt/appstore/blob/34b19a9119148a99ed9548cd0c1e550322c1a636/.argo/selenium_all_ax.yaml#L42-L72

The argo yaml validator reports the following error:

 - ERROR selenium-test-workflow: outputs.artifacts.video.from invalid format '%%fixtures.vnc_recorder.outputs.artifact.video%%'. expected format: '%%steps.<step_name>.outputs.artifacts.<artifact_name>%%'

selenium-server example fails to satisfy arguments dynamic fixture

When submitting the Selenium Demo template, the submission fails during preprocessing with the following error.

{
"code":"ERR_AX_ILLEGAL_ARGUMENT",
"message":"selenium-server 'inputs.parameters.BROWSER' was not satisfied by caller",
"detail":""
}

I suspect preprocessing is not handling argument substitution for dynamic fixtures correctly.

Gateway improvement

The idea of restructuring the gateway deployment has been around for a while.

Current problems with the gateway include:

  • It is a pod containing three containers: gateway, repomanager, and eventtrigger. However, we do not have any restart mechanism for when repomanager or eventtrigger accidentally exits.
  • Logically, there are correlations between the three, and separating them makes debugging hard.
  • The original idea of the gateway being a portal for devops components no longer holds. The gateway mostly just deals with commit-related services.
  • The communication between the three components is rather inefficient and inaccurate. While living in the same pod, they rely on Kafka for simple communications.
  • The use of the Django framework is both resource-heavy and hard to maintain, as all other components use Flask.

Proposed fixes include:

  • Combine the three containers into one for commit-related services
  • Slowly move non-commit-related APIs to axops (as far as I know, there are only the approval API and the Jira API)
  • Convert the gateway to use Flask

Cannot set default cloud profile to "default"

Use case: a user wants to install from an AWS EC2 instance, and all their AWS credentials come from the node's IAM profile.

The following cases, where the user does not give a profile, should be carefully handled:

  1. User runs argocluster on a local Mac:
    • If the user does not provide an aws profile and the host has a "default" profile, use "default".
    • If the user does not provide an aws profile and the host does not have a "default" profile, FAIL.
  2. User runs argocluster on an AWS instance:
    • If the user does not provide an aws profile and the host has a "default" profile, use "default".
    • If the user does not provide an aws profile, we should set the aws profile to None and try to use the host IAM role.

I would propose a special input for the user: --cloud-profile None. If we get the literal string "None", we set the profile to None, as sketched below.
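A minimal sketch of the proposed handling, assuming an argparse-based CLI (the flag wiring is illustrative):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--cloud-profile", default=None)
args = parser.parse_args()

# The literal string "None" means "no profile": fall back to the host
# IAM role instead of a named profile from ~/.aws/credentials.
profile = None if args.cloud_profile == "None" else args.cloud_profile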

Support notification events push delivery

Currently the UI has to periodically poll for user events using the v1/notification_center/events?recipient= API. This should switch to a push model to improve performance. The following changes are required:

  • Implement a /v1/notification_center/events_stream API which implements the SSE protocol. The API should support a recipient query parameter to filter events by recipient (a minimal sketch follows this list).
  • Update the UI to use the /v1/notification_center/events_stream API instead of periodic polling.
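A minimal sketch of the proposed SSE endpoint, assuming a Flask service; events_for() is a hypothetical event source standing in for the real notification queue:

import json

from flask import Flask, Response, request

app = Flask(__name__)

def events_for(recipient):
    # Hypothetical event source; the real service would subscribe to its
    # notification queue and yield events as they arrive.
    yield {"recipient": recipient, "message": "job finished"}

@app.route("/v1/notification_center/events_stream")
def events_stream():
    recipient = request.args.get("recipient")

    def stream():
        for event in events_for(recipient):
            # SSE wire format: each event is "data: <payload>\n\n".
            yield "data: %s\n\n" % json.dumps(event)

    return Response(stream(), mimetype="text/event-stream")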

Argo site SEO improvements

Argo site page titles and descriptions should be updated according to the following table:

  Home
  URL: https://argoproj.github.io/argo-site/#/
  Title (60 char): Argo | Open source container-native workflow engine for Kubernetes
  Description (100-120 char): Argo is an open source container-native workflow engine for developers working with Kubernetes to orchestrate pipelines and jobs for continuous integration, deployment, and microservices.
  URL keywords: n/a

  Get Started
  URL: https://argoproj.github.io/argo-site/#/get-started/overview
  Title: Argo | Get started with tutorials, training on workflows for Kubernetes
  Description: Get started with an overview of Argo, an open source workflow engine for Kubernetes. Step through Installation, Architecture, Features and Tutorials.
  URL keywords: looks good as is (leaf pages have keywords such as 'overview', 'get started' and 'tutorials' in the URL)

  Docs
  URL: https://argoproj.github.io/argo-site/#/docs
  Title: Argo | Documentation for open source workflow engine on Kubernetes
  Description: Documentation for Argo, open source workflow engine for Kubernetes. Tutorials, User guide, CLI reference, FAQs, release notes.
  Note: EACH LEAF PAGE (or component page) OF DOCS NEEDS A UNIQUE <TITLE>

  Community
  URL: https://argoproj.github.io/argo-site/#/community/overview
  Title: Argo | Community support, forums, release notes
  Description: Discuss and contribute to Argo, an open source workflow engine for Kubernetes. GitHub, Slack and Google groups.
  URL keywords: looks good as is - has community in the URL

  Install
  URL: https://argoproj.github.io/argo-site/#/get-started/installation
  Title: Argo | Install open source workflow engine for Kubernetes on AWS
  Description: Install Argo, an open source engine for Kubernetes, on AWS with CLI commands for Mac and Linux.
  URL keywords: looks good as is.

Argo CLI features and improvements

This issue is for general improvements to argocli including:

  • argo app list to display running apps
  • argo app show <appname> to show details about an app
  • argo job logs <service_id> to retrieve logs for container in a job

Also, improve the argo job show --tree view to distinguish parallel vs. sequential steps in a workflow.

Notification boxes are hidden by page toolbar

Steps to reproduce:

  1. Navigate to timeline -> commits and launch any job

Expected result:

A notification message should be shown.

Actual result:

The notification box is hidden by the page toolbar.

Policy documentation update

https://argoproj.github.io/argo-site/#/docs;doc=yaml%2Fpolicy_templates.md

when:

multiple triggers can be specified

options: on_push, on_pull_request, on_pull_request_merge, on_cron

  • event: on_push
  • event: on_pull_request
  • event: on_pull_request_merge
  • event: on_cron

    cron expression:

    0 1 * * *
    | | | | |
    | | | | +---- Run every day of the week
    | | | +------ Run every month of the year
    | | +-------- Run every day of the month
    | +---------- Run at 1 Hour (1AM)
    +------------ Run at 0 Minute

    schedule: "0 * * * *"
    timezone: "US/Pacific"

The cron expression explanation is not aligned properly in the documentation.

Cannot export output artifacts of dynamic fixtures

The YAML validator and embedded template generator currently do not allow exporting output artifacts of dynamic fixtures. This is a feature that is actually supported by the workflow executor. The use case is in the selenium examples, which can capture the video output of a VNC recorder.

Upgrade failed because target_cloud was not set

When a cluster is upgraded using the argo cli, the upgrade fails with the error "'Cloud' object has no attribute '_target_cloud', program might be running locally"

$ argocluster upgrade --cluster-name dev-new --cloud-profile prod --cloud-provider aws
...

2017-08-25T16:14:40 WARNING ax.cloud.cloud MainThread: Cannot determine own cloud: 'Cloud' object has no attribute '_target_cloud', program might be running locally
Traceback (most recent call last):
  File "/ax/bin/master_manager", line 31, in <module>
    run()
  File "/ax/bin/master_manager", line 24, in run
    m = AXMasterManager(usr_args.cluster_name_id, profile=usr_args.profile, region=usr_args.region)
  File "/ax/python/ax/platform/ax_master_manager.py", line 72, in __init__
    self.cluster_info = AXClusterInfo(cluster_name_id=cluster_name_id, aws_profile=profile)
  File "/ax/python/ax/util/singleton.py", line 15, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "/ax/python/ax/platform/ax_cluster_info.py", line 50, in __init__
    self._config = AXClusterConfig(cluster_name_id=cluster_name_id, aws_profile=aws_profile)
  File "/ax/python/ax/util/singleton.py", line 15, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "/ax/python/ax/platform/cluster_config/cluster_config.py", line 31, in __init__
    self._cluster_name_id = AXClusterId(name=cluster_name_id, aws_profile=aws_profile).get_cluster_name_id()
  File "/ax/python/ax/meta/cluster_id.py", line 103, in get_cluster_name_id
    self._load_cluster_name_id_if_needed()
  File "/ax/python/ax/meta/cluster_id.py", line 120, in _load_cluster_name_id_if_needed
    self._load_cluster_name_id()
  File "/ax/python/ax/meta/cluster_id.py", line 148, in _load_cluster_name_id
    self._lookup_id_from_bucket()
  File "/ax/python/ax/meta/cluster_id.py", line 153, in _lookup_id_from_bucket
    name, requested_cid = self._format_name_id(self._input_name)
  File "/ax/python/ax/meta/cluster_id.py", line 175, in _format_name_id
    if Cloud().target_cloud_aws():
  File "/ax/python/ax/cloud/cloud.py", line 101, in target_cloud_aws
    return self._target_cloud == self.AX_CLOUD_AWS
AttributeError: 'Cloud' object has no attribute '_target_cloud'
2017-08-25T16:14:40 ERROR ax.cluster_management.app.cluster_upgrader MainThread: Command '['upgrade-kubernetes']' returned non-zero exit status 1
Traceback (most recent call last):
  File "/ax/python/ax/cluster_management/app/cluster_upgrader.py", line 117, in run
    self._upgrade_kube()
  File "/ax/python/ax/cluster_management/app/cluster_upgrader.py", line 199, in _upgrade_kube
    subprocess.check_call(["upgrade-kubernetes"], env=env)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['upgrade-kubernetes']' returned non-zero exit status 1
2017-08-25T16:14:40 ERROR ax.cluster_management.argo_cluster_manager MainThread: Command '['upgrade-kubernetes']' returned non-zero exit status 1
Traceback (most recent call last):
  File "/ax/python/ax/cluster_management/argo_cluster_manager.py", line 66, in parse_args_and_run
    getattr(self, cmd)(args)
  File "/ax/python/ax/cluster_management/argo_cluster_manager.py", line 127, in upgrade
    ClusterUpgrader(upgrade_config).run()
  File "/ax/python/ax/cluster_management/app/cluster_upgrader.py", line 128, in run
    raise RuntimeError(e)
RuntimeError: Command '['upgrade-kubernetes']' returned non-zero exit status 1

 !!! Operation failed due to runtime error: Command '['upgrade-kubernetes']' returned non-zero exit status 1

Timeline should use template branch instead of commit branch to identify job branch

Currently the timeline overview and job details pages use the job's commit branch to identify which branch a job belongs to. This behavior causes issues in a few cases:

  1. Some jobs do not have a commit, but still belong to a branch.
  2. /v1/service?branch=master filters jobs by the template's branch, not the commit's branch.

The timeline overview and job details pages should use the template branch instead of the commit branch.

Improvements to logs endpoints (job and app)

I have grievances with the service/<service_id>/logs as well as the deployments/<deployment_id>/livelog endpoints.

  1. They follow live containers by default, with no option not to follow. The ability to not follow is needed by the CLI.
  2. The endpoint names are inconsistent (i.e. livelog vs. logs).
  3. They use text/event-stream as the content type; the world has moved on to WebSockets.
  4. Depending on when the user/UI hits the job logs endpoint, there may be no logs while the container is in a transitioning phase -- ideally we could hold the connection open while we wait and retry asking Kubernetes/AWS for the logs.
  5. The log format is inconsistent between live logs and completed containers, which use the JSON log format. The JSON log format is irritating to parse for most use cases and should not be the default behavior.

All of this does not make for a consistent/friendly API.

on_pull_request policy triggered jobs should use template from incoming branch instead of destination

The following scenario happened: a developer made changes to both the build scripts and the argo build template which invoked the corresponding changes to the build scripts. Both changes need to execute in tandem.

Currently our policy trigger will correctly use the source code of the incoming commit during checkout, but it will not use the templates from the incoming commit. In the above scenario, the job immediately fails because of the mismatch.

This issue is to improve policy execution to use the templates from the incoming commit. This may not be possible to do if the repo is not attached to the cluster, but should be possible if it is. The policy trigger should understand to use the policy ID of the incoming repo/branch when submitting the job.

Repo information is not populated when workflows are accessed directly

Steps to repro:

  1. Create a template that does not have COMMIT as a parameter and commit it into the .argo directory:
---
type: container
version: 1
name: my-test-checkout
description: Checks out a source repository to /src
resources:
  mem_mib: 500
  cpu_cores: 0.1
image: argoproj/argoscm:v2.0
command: ["axscm"]
args: ["clone", "%%inputs.parameters.REPO%%", "/src"]
inputs:
  parameters:
    REPO:
      default: "%%session.repo%%"
outputs:
  artifacts:
    CODE:
      path: /src
  2. Since this workflow does not have a COMMIT param, it doesn't show up when the user clicks on a commit and then clicks "Create new job". The only way to access it is to go directly to the "Templates" option on the left side panel.

  3. Click on Templates and select the one just created above.

  4. Click on the "+" sign to go to the "Review workflow parameters" page.

  5. Notice that the repo information is not populated. It continues to show %%session.repo%%. If the workflow is run, it fails because there is no such repo as %%session.repo%%.
