argoproj / argo-workflows

Workflow Engine for Kubernetes

Home Page: https://argo-workflows.readthedocs.io/

License: Apache License 2.0

Makefile 0.78% Go 82.87% Shell 0.31% Python 0.08% Dockerfile 0.08% TypeScript 12.78% HTML 0.01% JavaScript 0.09% SCSS 0.65% Java 0.05% Nix 2.29%
airflow argo argo-workflows batch-processing cloud-native cncf dag data-engineering gitops hacktoberfest k8s knative kubernetes machine-learning mlops pipelines workflow workflow-engine

argo-workflows's Introduction

Argoproj - Get stuff done with Kubernetes

What is Argoproj?

Argoproj is a collection of tools for getting work done with Kubernetes.

  • Argo Workflows - Container-native Workflow Engine
  • Argo CD - Declarative GitOps Continuous Delivery
  • Argo Events - Event-based Dependency Manager
  • Argo Rollouts - Progressive Delivery with support for Canary and Blue Green deployment strategies

Also, argoproj-labs is a separate GitHub org that we set up for community contributions related to the Argoproj ecosystem. Repos in argoproj-labs are administered by the owners of each project. Please reach out to us on the Argo Slack channel if you have a project that you would like to add to the org to make it easier for others in the Argo community to find, use, and contribute back.

Community Blogs and Presentations

Project-specific community blogs and presentations are at:

Adopters

Each Argo sub-project maintains its own list of adopters. Those lists are available in the respective project repositories:

Contributing

To learn about how to contribute to Argoproj, see our contributing documentation. Argo contributors must follow the CNCF Code of Conduct.

For help contributing, visit the #argo-contributors channel in CNCF Slack.

To learn about Argoproj governance, see our community governance document.

Project Resources

argo-workflows's People

Contributors

agilgur5, alexec, alexmt, blkperl, changhc, crenshaw-dev, dcherman, dependabot[bot], dinever, dpadhiar, dtaniwaki, edlee2121, github-actions[bot], isubasinghe, jessesuen, joibel, juliev0, markterm, nikenano, rbreeze, rohankmr414, sarabala1979, shuangkun, simster7, snyk-bot, tczhao, terrytangyuan, tico24, toyamagu-2021, whynowy

argo-workflows's Issues

Fix and update saas unit tests

Many unit tests were broken/disabled as a result of the YAML redesign. This issue is to fix and update the saas unit tests and re-enable them in Argo CI.

Provide way to override the notification setting for manually submitted job

Currently the UI uses the same predefined notification settings for all manually submitted jobs: notify the submitter of success or failure. The UI should allow the user to change notification settings.

The use case:
A developer manually runs a job which is supposed to build/test a release candidate and wishes to notify a manager about the result. Automatic notification would improve the user experience.

Invalid margins on several pages

@rahuldhide :

Some components are overlapping due to the new margins. In particular, the commit cards do not look good, with Create New Job and the job numbers on two different lines. If we tweak the margins as per the attached specs, we will be able to resolve all these issues.

Add Eclipse specific files to .gitignore

Importing argo in Eclipse (using PyDev) adds a ".classpath" directory at the root level and also updates the existing .gitignore to include /bin.

The existing .gitignore should therefore be updated to include these changes.

Improve job creation panel

The UI has to supply different types of parameters while creating a new job:

  1. regular string parameters
  2. artifacts (the workflow ID which exports the artifact)
  3. DNS domain
  4. volumes

In order to improve the user experience, the UI should provide auto-completion for all complex parameters (e.g. volumes, domains, etc.).

Yaml validation failure from tip of master

Please fix the format

$ argo yaml validate ./.argo/
Verifying all yaml files in directory: ./.argo/
[.argo/test-yamls/fixtures-dynamic.yaml]
 - ERROR test-fixtures-dynamic-outputs: outputs.artifacts.WF_OUTPUTS.from invalid format '%%fixtures.DYN_FIX_WITH_OUTPUTS.outputs.artifacts.BIN-DIR%%'. expected format: '%%steps.<step_name>.outputs.artifacts.<artifact_name>%%'

Convert a Task to run as a Pod instead of a kubernetes job

When we query the status of a task, we also care about Pod details, such as whether the pod has an image pull failure and the phase of the pod (running, pending, etc.). To do this, we need to query the Job status and then the pod status of the pod that is created for the Job. It is also possible for the Pod to run into some error. In this case, the Job controller creates another Pod and restarts the Job. Thus, there can be many old Pods that had some error lying around. These pods should be deleted so that we clean up resources.

These are some of the reasons why Job queries take long, as we do the following:

  1. Look up job status
  2. Find the pod for this job
  3. Delete old pods

There is no mapping that allows O(1) lookup of Job to Pod that I am aware of. To find the Pod for a Job, we need to filter Pods with the label selector job-name=THE_NAME_OF_JOB. For n pods, this is an O(n) lookup. For argo, each Task is mapped to a Job and each Job creates at least 1 Pod, so for n Tasks we will have at least n Pods. The status of each such Task is periodically queried by the workflow executor, which makes the sum of these operations O(n^2).

Since the workflow executor already knows how to recreate the Pod on failure, node termination, etc., we do not need to use the Job controller. If we use Pods directly, we avoid the O(n) iteration to look for pods. We also avoid the cleanup required for old Pods created by the Job controller.
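For illustration, a minimal sketch of the label-selector lookup described above, using the official kubernetes Python client (the namespace and job name are illustrative):

from kubernetes import client, config

def find_pods_for_job(job_name, namespace="default"):
    config.load_kube_config()  # use load_incluster_config() when running in a pod
    v1 = client.CoreV1Api()
    # The API server does the filtering, but conceptually this is an O(n)
    # scan over pods rather than an O(1) Job -> Pod mapping.
    return v1.list_namespaced_pod(
        namespace, label_selector="job-name=" + job_name).items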

Redeploy button on the app page.

For each deployment there is a redeploy quick link on the app page. It is needed on the deployment page too, along with stop and terminate, to be consistent.

Improve date range picker

The date range picker should be improved according to the attached mockups:

  1. When the date range picker is expanded, the user sees a menu with several predefined items and a "Custom Date Range" menu item.
  2. If the user clicks the "Custom Date Range" menu item, the menu expands into a calendar view with the ability to select the required date range.

Add minion-manager unit tests

Unit tests for the minion-manager component need to be added back. These should also be added to the Argo CI workflow.

Improve template filtering/grouping on metrics page

The metrics screen shows aggregated run stats for each template. Statistics are aggregated by repo, template name, and branch. From the user's point of view there is one template per repo and the branches contain different versions of it, so statistics should be aggregated by repo/template name.

The following changes are required:

  1. Implement an API which returns a unique list of template names (see the sketch after this list).
  2. Filter jobs by template name instead of template ID: #183
  3. Add a template name filter to the jobs global search: #184
  4. Allow navigating to jobs from the metrics page: #185
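A minimal sketch of item 1, assuming a Flask service; the route and the fetch_templates() data-access helper are hypothetical names, not the real API:

from flask import Flask, jsonify

app = Flask(__name__)

def fetch_templates():
    # Hypothetical data-access helper; the real service would query its store.
    return [
        {"repo": "repo-a", "name": "build", "branch": "master"},
        {"repo": "repo-a", "name": "build", "branch": "feature-x"},
        {"repo": "repo-b", "name": "deploy", "branch": "master"},
    ]

@app.route("/v1/templates/names")
def unique_template_names():
    # Deduplicate by (repo, template name), ignoring branch, per the issue.
    unique = sorted({(t["repo"], t["name"]) for t in fetch_templates()})
    return jsonify([{"repo": r, "name": n} for r, n in unique])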

Always set instance region but force SigV4 authentication when trying to get bucket location

The S3 boto3 client needs to be initialized with the bucket's region if the region uses Signature V4. Currently we determine the region as follows:

  1. Try HeadBucket, ignoring errors, as the region name is included in a response header. If that does not work,
  2. Try GetBucketLocation to find the region. If that does not work,
  3. Use the HTTP version of the virtual host URL <bucket-name>.s3.amazonaws.com according to the discussion under aws/aws-sdk-go#720 (comment) as a last resort.

This solution works perfectly fine in non-GovCloud partitions, but does not work in GovCloud, as reported by people in the community (@lorengordon).

A better solution would be (assuming the cluster doesn't access resources across partitions, which is a valid assumption; a runnable sketch follows this list):

  1. Use the AWS metadata service to get the instance region.
  2. Instantiate a boto3 Session object with that region, and then an S3 client forcing SigV4 auth (according to http://docs.aws.amazon.com/general/latest/gr/signing_aws_api_requests.html, all regions support Signature V4):
     s3 = boto3.Session(profile_name=<prof>, region_name=<reg>).client('s3', config=Config(signature_version='s3v4'))
  3. Use HeadBucket to get the bucket region.
  4. Instantiate the actual bucket client using the actual region.
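A runnable sketch of this flow, assuming the code runs on an EC2 instance where the metadata endpoint is reachable (the bucket name is illustrative):

import boto3
import requests
from botocore.config import Config

# 1. Instance region from EC2 instance metadata.
identity = requests.get(
    "http://169.254.169.254/latest/dynamic/instance-identity/document",
    timeout=2).json()
instance_region = identity["region"]

# 2. SigV4-only client in the instance region; all regions accept SigV4.
s3 = boto3.Session(region_name=instance_region).client(
    "s3", config=Config(signature_version="s3v4"))

# 3. Bucket region from the HeadBucket response header.
resp = s3.head_bucket(Bucket="my-bucket")  # bucket name is illustrative
bucket_region = resp["ResponseMetadata"]["HTTPHeaders"]["x-amz-bucket-region"]

# 4. Actual bucket client pinned to the bucket's actual region.
bucket_client = boto3.Session(region_name=bucket_region).client(
    "s3", config=Config(signature_version="s3v4"))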

Unable to use fixture output artifact

The YAML validator does not allow using a fixture output artifact. The following example demonstrates the issue:

https://github.com/alexmt/appstore/blob/34b19a9119148a99ed9548cd0c1e550322c1a636/.argo/selenium_all_ax.yaml#L42-L72

The argo yaml validator reports the following error:

 - ERROR selenium-test-workflow: outputs.artifacts.video.from invalid format '%%fixtures.vnc_recorder.outputs.artifact.video%%'. expected format: '%%steps.<step_name>.outputs.artifacts.<artifact_name>%%'

selenium-server example fails to satisfy arguments dynamic fixture

When submitting the Selenium Demo template, the submission fails during preprocessing with the following error.

{
"code":"ERR_AX_ILLEGAL_ARGUMENT",
"message":"selenium-server 'inputs.parameters.BROWSER' was not satisfied by caller",
"detail":""
}

I suspect preprocessing is not handling argument substitution for dynamic fixtures correctly.

Gateway improvement

The idea of restructuring the gateway deployment has been around for a while.

Current problems with the gateway include:

  • It is a pod containing three containers: gateway, repomanager, and eventtrigger. However, we do not have any restart mechanism for when repomanager or eventtrigger accidentally exits.
  • Logically, there are correlations between the three, and separating them makes debugging hard.
  • The original idea of the gateway being a portal for devops components no longer holds. The gateway mostly just deals with commit-related services.
  • The communication between the three components is rather inefficient and inaccurate. While living in the same pod, they rely on Kafka for simple communications.
  • The use of the Django framework is both resource-heavy and hard to maintain, as all other components use Flask.

Proposed fixes include:

  • Combine the three containers into one for commit-related services
  • Slowly move non-commit-related APIs to axops (as far as I know, there are only the approval API and the Jira API)
  • Convert the gateway to use Flask

Cannot set default cloud profile to "default"

Use case: a user wants to install from an AWS EC2 instance, and all their AWS credentials come from the node's IAM profile.

The following cases, where the user does not give a profile, should be carefully handled:

  1. User runs argocluster on a local Mac:
    • If the user does not provide an aws profile and the host has a "default" profile, use "default".
    • If the user does not provide an aws profile and the host does not have a "default" profile, FAIL.
  2. User runs argocluster on an AWS instance:
    • If the user does not provide an aws profile and the host has a "default" profile, use "default".
    • If the user does not provide an aws profile, we should set the aws profile to None and try to use the host IAM role.

I would propose a special input for the user: --cloud-profile None. If we get the literal string "None", we set the profile to None, as sketched below.
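A minimal sketch of the proposed handling, assuming an argparse-based CLI (the flag wiring is illustrative):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--cloud-profile", default=None)
args = parser.parse_args()

# The literal string "None" means "no profile": fall back to the host
# IAM role instead of a named profile from ~/.aws/credentials.
profile = None if args.cloud_profile == "None" else args.cloud_profile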

Support notification events push delivery

Currently the UI has to periodically poll for user events using the v1/notification_center/events?recipient= API. This should switch to a push model to improve performance. The following changes are required:

  • Implement a /v1/notification_center/events_stream API which implements the SSE protocol. The API should support a recipient query parameter to filter events by recipient (a minimal sketch follows this list).
  • Update the UI to use the /v1/notification_center/events_stream API instead of periodic polling.
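A minimal sketch of the proposed SSE endpoint, assuming a Flask service; events_for() is a hypothetical event source standing in for the real notification queue:

import json

from flask import Flask, Response, request

app = Flask(__name__)

def events_for(recipient):
    # Hypothetical event source; the real service would subscribe to its
    # notification queue and yield events as they arrive.
    yield {"recipient": recipient, "message": "job finished"}

@app.route("/v1/notification_center/events_stream")
def events_stream():
    recipient = request.args.get("recipient")

    def stream():
        for event in events_for(recipient):
            # SSE wire format: each event is "data: <payload>\n\n".
            yield "data: %s\n\n" % json.dumps(event)

    return Response(stream(), mimetype="text/event-stream")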

Argo site SEO improvements

Argo site page titles and descriptions should be updated according to the following table:

  Home
  URL: https://argoproj.github.io/argo-site/#/
  Title (60 char): Argo | Open source container-native workflow engine for Kubernetes
  Description (100-120 char): Argo is an open source container-native workflow engine for developers working with Kubernetes to orchestrate pipelines and jobs for continuous integration, deployment, and microservices.
  URL keywords: n/a

  Get Started
  URL: https://argoproj.github.io/argo-site/#/get-started/overview
  Title: Argo | Get started with tutorials, training on workflows for Kubernetes
  Description: Get started with an overview of Argo, an open source workflow engine for Kubernetes. Step through Installation, Architecture, Features and Tutorials.
  URL keywords: looks good as is (leaf pages have keywords such as 'overview', 'get started' and 'tutorials' in the URL)

  Docs
  URL: https://argoproj.github.io/argo-site/#/docs
  Title: Argo | Documentation for open source workflow engine on Kubernetes
  Description: Documentation for Argo, open source workflow engine for Kubernetes. Tutorials, User guide, CLI reference, FAQs, release notes.
  Note: EACH LEAF PAGE (or component page) OF DOCS NEEDS A UNIQUE <TITLE>

  Community
  URL: https://argoproj.github.io/argo-site/#/community/overview
  Title: Argo | Community support, forums, release notes
  Description: Discuss and contribute to Argo, an open source workflow engine for Kubernetes. GitHub, Slack and Google groups.
  URL keywords: looks good as is - has community in the URL

  Install
  URL: https://argoproj.github.io/argo-site/#/get-started/installation
  Title: Argo | Install open source workflow engine for Kubernetes on AWS
  Description: Install Argo, an open source engine for Kubernetes, on AWS with CLI commands for Mac and Linux.
  URL keywords: looks good as is.

Argo CLI features and improvements

This issue is for general improvements to argocli including:

  • argo app list to display running apps
  • argo app show <appname> to show details about an app
  • argo job logs <service_id> to retrieve logs for container in a job

Also, improve the argo job show --tree view to distinguish parallel vs. sequential steps in a workflow.

Notification boxes are hidden by page toolbar

Steps to reproduce:

  1. Navigate to timeline -> commits and launch any job

Expected result:

A notification message should be shown.

Actual result:

The notification box is hidden by the page toolbar.

Policy documentation update

https://argoproj.github.io/argo-site/#/docs;doc=yaml%2Fpolicy_templates.md

when:

multiple triggers can be specified

options: on_push, on_pull_request, on_pull_request_merge, on_cron

  • event: on_push
  • event: on_pull_request
  • event: on_pull_request_merge
  • event: on_cron

    cron expression:

    0 1 * * *
    | | | | |
    | | | | +---- Run every day of the week
    | | | +------ Run every month of the year
    | | +-------- Run every day of the month
    | +---------- Run at 1 Hour (1AM)
    +------------ Run at 0 Minute

    schedule: "0 * * * *"
    timezone: "US/Pacific"

The cron expression explanation is not aligned properly in the documentation.

Cannot export output artifacts of dynamic fixtures

The YAML validator and embedded template generator currently do not allow exporting output artifacts of dynamic fixtures. This is a feature that is actually supported by the workflow executor. The use case is in the selenium examples, which can capture the video output of a VNC recorder.

Upgrade failed because target_cloud was not set

When a cluster is upgraded using the argo cli, the upgrade fails with the error "'Cloud' object has no attribute '_target_cloud', program might be running locally"

$ argocluster upgrade --cluster-name dev-new --cloud-profile prod --cloud-provider aws
...

2017-08-25T16:14:40 WARNING ax.cloud.cloud MainThread: Cannot determine own cloud: 'Cloud' object has no attribute '_target_cloud', program might be running locally
Traceback (most recent call last):
  File "/ax/bin/master_manager", line 31, in <module>
    run()
  File "/ax/bin/master_manager", line 24, in run
    m = AXMasterManager(usr_args.cluster_name_id, profile=usr_args.profile, region=usr_args.region)
  File "/ax/python/ax/platform/ax_master_manager.py", line 72, in __init__
    self.cluster_info = AXClusterInfo(cluster_name_id=cluster_name_id, aws_profile=profile)
  File "/ax/python/ax/util/singleton.py", line 15, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "/ax/python/ax/platform/ax_cluster_info.py", line 50, in __init__
    self._config = AXClusterConfig(cluster_name_id=cluster_name_id, aws_profile=aws_profile)
  File "/ax/python/ax/util/singleton.py", line 15, in __call__
    cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
  File "/ax/python/ax/platform/cluster_config/cluster_config.py", line 31, in __init__
    self._cluster_name_id = AXClusterId(name=cluster_name_id, aws_profile=aws_profile).get_cluster_name_id()
  File "/ax/python/ax/meta/cluster_id.py", line 103, in get_cluster_name_id
    self._load_cluster_name_id_if_needed()
  File "/ax/python/ax/meta/cluster_id.py", line 120, in _load_cluster_name_id_if_needed
    self._load_cluster_name_id()
  File "/ax/python/ax/meta/cluster_id.py", line 148, in _load_cluster_name_id
    self._lookup_id_from_bucket()
  File "/ax/python/ax/meta/cluster_id.py", line 153, in _lookup_id_from_bucket
    name, requested_cid = self._format_name_id(self._input_name)
  File "/ax/python/ax/meta/cluster_id.py", line 175, in _format_name_id
    if Cloud().target_cloud_aws():
  File "/ax/python/ax/cloud/cloud.py", line 101, in target_cloud_aws
    return self._target_cloud == self.AX_CLOUD_AWS
AttributeError: 'Cloud' object has no attribute '_target_cloud'
2017-08-25T16:14:40 ERROR ax.cluster_management.app.cluster_upgrader MainThread: Command '['upgrade-kubernetes']' returned non-zero exit status 1
Traceback (most recent call last):
  File "/ax/python/ax/cluster_management/app/cluster_upgrader.py", line 117, in run
    self._upgrade_kube()
  File "/ax/python/ax/cluster_management/app/cluster_upgrader.py", line 199, in _upgrade_kube
    subprocess.check_call(["upgrade-kubernetes"], env=env)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['upgrade-kubernetes']' returned non-zero exit status 1
2017-08-25T16:14:40 ERROR ax.cluster_management.argo_cluster_manager MainThread: Command '['upgrade-kubernetes']' returned non-zero exit status 1
Traceback (most recent call last):
  File "/ax/python/ax/cluster_management/argo_cluster_manager.py", line 66, in parse_args_and_run
    getattr(self, cmd)(args)
  File "/ax/python/ax/cluster_management/argo_cluster_manager.py", line 127, in upgrade
    ClusterUpgrader(upgrade_config).run()
  File "/ax/python/ax/cluster_management/app/cluster_upgrader.py", line 128, in run
    raise RuntimeError(e)
RuntimeError: Command '['upgrade-kubernetes']' returned non-zero exit status 1

 !!! Operation failed due to runtime error: Command '['upgrade-kubernetes']' returned non-zero exit status 1

Timeline should use template branch instead of commit branch to identify job branch

Currently the timeline overview and job details pages use the job's commit branch to identify which branch a job belongs to. This behavior causes issues in a few cases:

  1. Some jobs do not have a commit, but still belong to a branch.
  2. /v1/service?branch=master filters jobs by the template's branch, not the commit's branch.

The timeline overview and job details pages should use the template branch instead of the commit branch.

Improvements to logs endpoints (job and app)

I have grievances with the service/<service_id>/logs as well as the deployments/<deployment_id>/livelog endpoints.

  1. They follow live containers by default, with no option not to follow. The ability to not follow is needed by the CLI.
  2. The endpoint names are inconsistent (i.e. livelog vs. logs).
  3. They use text/event-stream as the content type; the world has moved on to WebSockets.
  4. Depending on when the user/UI hits the job logs endpoint, there may be no logs while the container is in a transitioning phase -- ideally we could hold the connection open while we wait and retry asking Kubernetes/AWS for the logs.
  5. The log format is inconsistent between live logs and completed containers, which use the JSON log format. The JSON log format is irritating to parse for most use cases and should not be the default behavior.

All of this does not make for a consistent/friendly API.

on_pull_request policy triggered jobs should use template from incoming branch instead of destination

The following scenario happened: a developer made changes to both the build scripts and the argo build template which invoked the corresponding changes to the build scripts. Both changes need to execute in tandem.

Currently our policy trigger will correctly use the source code of the incoming commit during checkout, but it will not use the templates from the incoming commit. In the above scenario, the job immediately fails because of the mismatch.

This issue is to improve policy execution to use the templates from the incoming commit. This may not be possible to do if the repo is not attached to the cluster, but should be possible if it is. The policy trigger should understand to use the policy ID of the incoming repo/branch when submitting the job.

Repo information is not populated when workflows are accessed directly

Steps to repro:

  1. Create a template that does not have COMMIT as a parameter and commit it into the .argo directory:
---
type: container
version: 1
name: my-test-checkout
description: Checks out a source repository to /src
resources:
  mem_mib: 500
  cpu_cores: 0.1
image: argoproj/argoscm:v2.0
command: ["axscm"]
args: ["clone", "%%inputs.parameters.REPO%%", "/src"]
inputs:
  parameters:
    REPO:
      default: "%%session.repo%%"
outputs:
  artifacts:
    CODE:
      path: /src
  2. Since this workflow does not have a COMMIT param, it doesn't show up when the user clicks on a commit and then clicks "Create new job". The only way to access it is to go directly to the "Templates" option on the left side panel.

  3. Click on Templates and select the one just created above.

  4. Click on the "+" sign to go to the "Review workflow parameters" page.

  5. Notice that the repo information is not populated. It continues to show %%session.repo%%. If the workflow is run, it fails because there is no such repo as %%session.repo%%.
