Giter Site home page Giter Site logo

kestra-io / kestra Goto Github PK

View Code? Open in Web Editor NEW
6.4K 60.0 353.0 35.13 MB

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

Home Page: https://kestra.io

License: Apache License 2.0

Java 76.26% HTML 0.06% JavaScript 3.66% Vue 18.49% Dockerfile 0.02% Shell 0.03% Batchfile 0.07% SCSS 0.77% CSS 0.05% PLpgSQL 0.47% Makefile 0.12%
workflow workflow-engine orchestration scheduler data-pipeline elt etl data data-engineering data-orchestration

kestra's Introduction

Kestra workflow orchestrator

Event-Driven Declarative Orchestrator

Last Version License Github star
Kestra infinitely scalable orchestration and scheduling platform Slack

twitter   linkedin   youtube  


Get started in 4 minutes with Kestra

"Click on the image to get started in 4 minutes with Kestra."

Live Demo

Try Kestra using our live demo.

What is Kestra

Kestra is a universal open-source orchestrator that makes both scheduled and event-driven workflows easy. By bringing Infrastructure as Code best practices to data, process, and microservice orchestration, you can build reliable workflows and manage them with confidence.

In just a few lines of code, you can create a flow directly from the UI. Thanks to the declarative YAML interface for defining orchestration logic, business stakeholders can participate in the workflow creation process.

Kestra offers a versatile set of language-agnostic developer tools while simultaneously providing an intuitive user interface tailored for business professionals. The YAML definition gets automatically adjusted any time you make changes to a workflow from the UI or via an API call. Therefore, the orchestration logic is always managed declaratively in code, even if some workflow components are modified in other ways (UI, CI/CD, Terraform, API calls).

Adding new tasks in the UI

Key concepts

  1. Flow is the main component in Kestra. It's a container for your tasks and orchestration logic.
  2. Namespace is used to provide logical isolation, e.g., to separate development and production environments. Namespaces are like folders on your file system — they organize flows into logical categories and can be nested to provide a hierarchical structure.
  3. Tasks are atomic actions in a flow. By default, all tasks in the list will be executed sequentially, with additional customization options, a.o. to run tasks in parallel or allow a failure of specific tasks when needed.
  4. Triggers define when a flow should run. In Kestra, flows are triggered based on events. Examples of such events include:
    • a regular time-based schedule
    • an API call (webhook trigger)
    • ad-hoc execution from the UI
    • a flow trigger - flows can be triggered from other flows using a flow trigger or a subflow, enabling highly modular workflows.
    • custom events, including a new file arrival (file detection event), a new message in a message bus, query completion, and more.
  5. Inputs allow you to pass runtime-specific variables to a flow. They are strongly typed, and allow additional validation rules.

Extensible platform via plugins

Most tasks in Kestra are available as plugins, but many type of tasks are available in the core library, including a.o. script tasks supporting various programming languages (e.g., Python, Node, Bash) and the ability to orchestrate your business logic packaged into Docker container images.

To create your own plugins, check the plugin developer guide.

Rich orchestration capabilities

Kestra provides a variety of tasks to handle both simple and complex business logic, including:

  • subflows
  • retries
  • timeout
  • error handling
  • conditional branching
  • dynamic tasks
  • sequential and parallel tasks
  • skipping tasks or triggers when needed by setting the flag disabled to true.
  • configuring dependencies between tasks, flows and triggers
  • advanced scheduling and trigger conditions
  • backfills
  • blueprints
  • documenting your flows, tasks and triggers by adding a markdown description to any component
  • adding labels to add additional metadata to your flows such as the flow owner or team:
id: getting_started
namespace: dev

description: |
  # Getting Started
  Let's `write` some **markdown** - [first flow](https://t.ly/Vemr0) 🚀

labels:
  owner: rick.astley
  project: never-gonna-give-you-up

tasks:
  - id: hello
    type: io.kestra.core.tasks.log.Log
    message: Hello world!
    description: a *very* important task
    disabled: false
    timeout: PT10M
    retry:
      type: constant # type: string
      interval: PT15M # type: Duration
      maxDuration: PT1H # type: Duration
      maxAttempt: 5 # type: int
      warningOnRetry: true # type: boolean, default is false

  - id: parallel
    type: io.kestra.core.tasks.flows.Parallel
    concurrent: 3
    tasks:
      - id: task1
        type: io.kestra.plugin.scripts.shell.Commands
        commands:
          - 'echo "running {{task.id}}"'
          - 'sleep 2'
      - id: task2
        type: io.kestra.plugin.scripts.shell.Commands
        commands:
          - 'echo "running {{task.id}}"'
          - 'sleep 1'
      - id: task3
        type: io.kestra.plugin.scripts.shell.Commands
        commands:
          - 'echo "running {{task.id}}"'
          - 'sleep 3'

triggers:
  - id: schedule
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "*/15 * * * *"
    backfill:
      start: 2023-10-05T14:00:00Z

Built-in code editor

You can write workflows directly from the UI. When writing your workflows, the UI provides:

  • autocompletion
  • syntax validation
  • embedded plugin documentation
  • example flows provided as blueprints
  • topology view (view of your dependencies in a Directed Acyclic Graph) that get updated live as you modify and add new tasks.

Stay up to date

We release new versions every month. Give the repository a star to stay up to date with the latest releases and get notified about future updates.

Star the repo

Getting Started

Follow the steps below to start local development.

Prerequisites

Make sure that Docker is installed and running on your system. The default installation requires the following:

Launch Kestra

Download the Docker Compose file:

curl -o docker-compose.yml https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml

Alternatively, you can use wget https://raw.githubusercontent.com/kestra-io/kestra/develop/docker-compose.yml.

Start Kestra:

docker compose up -d

Open http://localhost:8080 in your browser and create your first flow.

Hello-World flow

Here is a simple example logging hello world message to the terminal:

id: getting_started
namespace: dev

tasks:
  - id: hello_world
    type: io.kestra.core.tasks.log.Log
    message: Hello World!

For more information:

Plugins

Kestra is built on a plugin system. You can find your plugin to interact with your provider; alternatively, you can follow these steps to develop your own plugin.

For a full list of plugins, check the plugins page.

Here are some examples of the available plugins:

Airbyte Cloud Airbyte OSS Amazon Athena
Amazon CLI Amazon DynamoDb Amazon Redshift
Amazon S3 Amazon SNS Amazon SQS
AMQP Apache Avro Apache Cassandra
Apache Kafka Apache Pinot Apache Parquet
Apache Pulsar Apache Spark Apache Tika
Azure Batch Azure Blob Storage Azure Blob Table
CSV ClickHouse Compression
Couchbase Databricks dbt cloud
dbt core Debezium Microsoft SQL Server Debezium MYSQL
Debezium Postgres DuckDb ElasticSearch
Email Fivetran FTP
FTPS Git Google Big Query
Google Pub/Sub Google Cloud Storage Google DataProc
Google Firestore Google Cli Google Vertex AI
Google Kubernetes Engines Google Drive Google Sheets
Groovy Http JSON
Julia Jython Kubernetes
Microsoft SQL Server Microsoft Teams MongoDb
MQTT MySQL Nashorn
NATS Neo4j Node
OpenAI Open PGP Oracle
PostgreSQL Power BI PowerShell
Python Rockset RScript
SFTP ServiceNow Singer
Shell Slack Snowflake
Soda SSH Telegram
Trino XML Vertica

This list is growing quickly and we welcome contributions.

Community Support

If you need help or have any questions, reach out using one of the following channels:

  • Slack - join the community and get the latest updates.
  • GitHub discussions - useful to start a conversation that is not a bug or feature request.
  • Twitter - to follow up with the latest updates.

Contributing

We love contributions, big or small. Check out our contributor guide for details on how to contribute to Kestra.

See our Plugin Developer Guide for details on developing and publishing Kestra plugins.

License

Apache 2.0 © Kestra Technologies

kestra's People

Contributors

aliczin avatar anna-geller avatar ben8t avatar brahimalm avatar brian-mulier-p avatar corentinghigny avatar dependabot[bot] avatar edwardli-coder avatar eregnier avatar fhussonnois avatar kination avatar kriko avatar loicmathieu avatar matvey-ososkov avatar mengzhuo avatar npranav10 avatar nsteinmetz avatar olla-dev avatar sagunn-echo avatar shrutimantri avatar simonpic avatar skraye avatar smantri-moveworks avatar tanay1337 avatar tchiotludo avatar teqsk1514 avatar v1nc3n4 avatar yuri-lima avatar yuri1969 avatar yvrng avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

kestra's Issues

Handle large log output

Seems that with Kafka runner, some very long line will kill a consuming worker silently.
No error will be in logs and the consumer is remove for Kafka silently.

Really need to investigate

hello test

Task create temporary file like that :

        // temp file
        File tempFile = File.createTempFile(this.getClass().getSimpleName().toLowerCase() + "_", ".csv");
        ObjectOutputStream output = new ObjectOutputStream(new FileOutputStream(tempFile));

It's mostly done for now to handle in order to forward to StorageInterface with a : runContext.putFile(tempFile).getUri()

We need to find a better way to have StorageObject directly without intermediate temporary files or at least delete the temporary files.

Inconsistent metric

In some metrics, namespace is in a label "namespace" (eg: kestra_org_kestra_core_tasks_debugs_return_duration_seconds_count) and in other, it is in the label "namespace_id" (eg: kestra_executor_execution_duration_seconds_count)

Display inputs on the general page

On the main page, add a inputs table below the actual with key / value of inputs.
If possible, add a download link on any inputs of type file

Trigger execution from webui don't display the execution when some lag

When we create the execution from the webui, the execution is created and return directly.
But the indexer could not have the time to index the execution on the backend.
This lead by a 404 on /api/v1/executions/{{executionId}} and the ui freeze here.

Difficult use case here, I think the better will be to use only the execution return during the creation and reach the updated information from the listen url (without trying to reach the url below).
This will lead to no dependency on indexer on this part of the webui.

Execution status randomly stuck in "created" state just after a flow is triggered

After triggering a flow, the Ui calls 2 http routes, first executions/{executionId}/trigger then executions/{executionId}/follow

This problem occurs sometimes when an execution is so fast that the second call (follow) is done when the execution is already terminated.

Currently the follow route, does not handle this case and always wait for the passed execution to terminate ...

What must be done :

  • Update the follow route to handle this case by checking the execution state and returning without waiting ...

StorageObject have to many ///

Currently the uri generated by StorageInterface have to many slash, example :
floworc:////org/floworc/tests/inputs/executions/test/inputs/file/application.yml.

We need to have only : floworc:///org/floworc/tests/inputs/executions/test/inputs/file/application.yml. like that : scheme://{nohost}/org/floworc

Add "restartable" property to TaskRun

Add a "restartable" property to TaskRun.

Currently the fact that a taskRun can be restarted for an execution is computed on Ui side ...

Ui Side : See 'Restart' component

Capture full stacktrace to logs

For now only the message exception is saved on execution.
Capture recursively cause message and add the full stacktrace is TRACE log

This is a test issue to validate my self hosted instance

id: errors
namespace: org.floworc.tests

tasks:
- id: failed
  type: org.floworc.core.tasks.scripts.Bash
  commands:
  - 'exit 1'
  errors:
  - id: 2nd
    type: org.floworc.core.tasks.debugs.Echo
    format: second {{task.id}}

The errors will not be triggers, need to be global to work or on FlowableTask

Start from source

Hello~
Is there a guide to start working via source code? I couldn't find in document...

Thanks.

Every command on cli will generate temp folders

for every command on cli, this code from TestCommand will be run :

    @SuppressWarnings("unused")
    public static Map<String, Object> propertiesOverrides() {
        try {
            Path tempDirectory = Files.createTempDirectory(TestCommand.class.getSimpleName());

            return ImmutableMap.of(
                "kestra.repository.type", "memory",
                "kestra.queue.type", "memory",
                "kestra.storage.type", "local",
                "kestra.storage.local.base-path", tempDirectory.toAbsolutePath().toString()
            );
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

and will create a temp folders, must be created only on test command

Update a flow and changing id or namespace failed

Changing the namespace or the id will led to a 404 on the update from the webui.
It's a normal behaviour on the FlowControler we ask and update on a flow that don't exists.

This led to 2 issues :

  • 404 are not catch and loading bar keep growing, this must be capture and displayed as error.
  • How can we rename a flow ? is this a possible action ?

Since there is a lot of "foreign keys" (see #60), we need to disabled the edition of namespace and flow id for now (waiting #60)

Just curious about choice of language

Hi,

I wanted to reach out over email, but your site is under development. I'm curious about one thing -

Why are most of the orchestration engines I see on GitHub built using Java? This seems to be a common trait among them.

Could you please detail some of the reasons why you chose Java over, say, go or python? Is it just that you are more familiar with the language?

Log must be realtime

For now, log are emitted only at the end of the current task.
They must be real time and emitted as soon as possible to be displayed on the web ui

Display revision on flow

Display a new tab on flow with a list of revision.
On click one revision, display a popup with comparison between previous & selected version.
Add a select on top in order to allow user to change revision for comparison.

Cleanup RunContext

Refactor the RunContext, the object is mutable right now and will not be usable in whole case.
For example : ApplicationContext is need for some method and can be null.

Maybe implement a WorkerRunContext that extends RunContext

final test

Allow to configure :

  • errors
  • timeout
  • retry

for :

  • global configuration
  • per tasks type configuration
  • per namespace
  • per flow

Switch task with no possible case throw a NPE

If there isn't any possibility in the Switch FlowableTask, the task throw an NPE, and the task remain "running"

2019-12-26 15:12:01,892 INFO  mory-queue-4 l.parent-seq [execution: 3Pf6uUWsqTo2jjynz1T9As] [taskrun: 2M4fE75in0WRObmtLX8lTJ] Task parent-seq (type: Switch) started
Exception in thread "memory-queue-7" java.lang.NullPointerException
	at org.kestra.core.runners.FlowableUtils.resolveState(FlowableUtils.java:76)
	at org.kestra.core.tasks.flows.Switch.resolveState(Switch.java:58)
	at org.kestra.core.runners.AbstractExecutor.childWorkerTaskResult(AbstractExecutor.java:60)
	at org.kestra.runner.memory.MemoryExecutor.lambda$handlChild$5(MemoryExecutor.java:133)
	at java.base/java.util.stream.ReferencePipeline$11$1.accept(ReferencePipeline.java:441)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
	at org.kestra.runner.memory.MemoryExecutor.handlChild(MemoryExecutor.java:136)
	at org.kestra.runner.memory.MemoryExecutor.lambda$run$1(MemoryExecutor.java:59)
	at org.kestra.runner.memory.MemoryQueue.lambda$emit$0(MemoryQueue.java:45)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

This must failed the task properly

API route for Flow revisions

FlowRepository implements a method :

    List<Flow> findRevisions(String namespace, String id);

It would be great to be able to use it 😄 (I need to find a culprit for wasting 3 hours of my life - might be me, probably me, but it least I'll be able to sleep on both ears)

Prerequisite for #62

⭐ Allow users to rename Flow namespace & id

Since there is a lot of "foreign keys" on flow namespace + flow id, we need to implement a special method to move a flow and change the name.

The ui must be like on github with its "danger zone" (it will break all app using the flow).
The move method must changed the flow namespace & id, since the operation can be very long, it must be async.

Also, when a user want to change the namespace it encounters multiple “Page not found” before understanding that it wasn’t possible. We should at least improve the error message.

Command test don't stop the process at end of the flow

running a simple flow ./kestra test flow.yaml that open a connection to a server for example (keep resource open) will hang the command test (reproduced with SlackExecution command)

We need to find a way to close resource :

  • Keep resource open for a frequent tasks can lead to some performance improvements.
  • But that can lead to overflow the destination server with many connection open and never used, also that can lead to some exception when the server close the connection.

I think the proper way will to close all connection for now.

FlowableTask : handle Exception properly

For now, some exception can be throw during evaluation of any method of FlowableTask interface.
If this exception occured, it will break the flow and maybe break the application.

We need to properly capture these exception to be resilient in case of an invalid flow

Uniformise version number in the whole app

Actually, some files have version hardcoded :

  • org/kestra/webserver/Application.java
  • org/kestra/cli/App.java
  • package.json

We need to find a way to only rely on gradle.properties

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.