commsor / titanoboa

Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for the JVM: distributed, highly scalable and fault-tolerant.

Home Page: https://titanoboa.io

License: GNU Affero General Public License v3.0

Languages: Clojure 99.82%, Shell 0.15%, Batchfile 0.03%

workflow workflow-engine esb distributed distributed-systems ipaas integrations jvm service-bus workflow-platform

titanoboa's People

c0debrain koolhead17 ebottabi ivanpierre nickstares gitter-badger huiwenhan udhayamgit omarajmi ideaplexus mznsolucoes lucyio jaywalker76 lesliesibbs nsk muniao alex-doerfler freezeburger kapware dermonarch marqueeeeeee ryaryu dat7e aadrian pushpen zeta1999 spepakay cybernetics amarpaka jeffatatl prabhugopal bookvik moayyaed michcim joaonart ahmadfikrimasyhur jiegao1977 khurhao em3ndez sunhan2emily kotlenik yalayo topsgod rmvids nagyist jiyouyou125

titanoboa's Issues

Ship extracted ext-dependencies file (and updated start scripts)

As discussed in #26

Add API endpoint to stop particular worker

Currently the API only has an endpoint to add new workers:
https://github.com/mikub/titanoboa/blob/1d596c2e7fe0d94e1581e39732d5923177ebf9c3/src/clj/titanoboa/handler.clj#L96

Being able to terminate a worker is much needed, especially when a job gets stuck.
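Stopping a single worker needs a per-worker handle that an API handler can look up by id. Below is a minimal Java sketch. It assumes each worker runs its own polling loop and checks a cooperative stop flag between steps, so a step in progress finishes instead of being interrupted. WorkerRegistry, register and stop are hypothetical names, not titanoboa's API, and the HTTP route itself is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical registry of running workers, keyed by worker id.
final class WorkerRegistry {
    private final Map<String, AtomicBoolean> stopFlags = new ConcurrentHashMap<>();

    // Registers a worker and returns the flag its processing loop should check.
    AtomicBoolean register(String workerId) {
        AtomicBoolean flag = new AtomicBoolean(false);
        stopFlags.put(workerId, flag);
        return flag;
    }

    // Requests a cooperative stop; returns false if no such worker is registered.
    boolean stop(String workerId) {
        AtomicBoolean flag = stopFlags.remove(workerId);
        if (flag == null) {
            return false;
        }
        flag.set(true);
        return true;
    }

    int activeCount() {
        return stopFlags.size();
    }

    public static void main(String[] args) throws InterruptedException {
        WorkerRegistry registry = new WorkerRegistry();
        AtomicBoolean stopRequested = registry.register("worker-1");
        Thread worker = new Thread(() -> {
            // The loop only checks the flag between steps, never mid-step.
            while (!stopRequested.get()) {
                Thread.onSpinWait(); // placeholder for "poll for a step and process it"
            }
        });
        worker.start();
        System.out.println("stopped: " + registry.stop("worker-1"));   // stopped: true
        worker.join();
        System.out.println("unknown id: " + registry.stop("worker-2")); // unknown id: false
    }
}
```

A flag avoids interrupting the thread while a step is running. The trade-off is that a worker stuck inside a single step will not notice the request until that step returns.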

Step retries should take precedence over triggering the :next step defined for the "ERROR" condition

Currently, retries are performed if a step throws an exception and there is no next step with an "ERROR" condition: https://github.com/mikub/titanoboa/blob/cluster/src/clj/titanoboa/processor.clj#L577

Though this design is good and has its use cases, it would be great if the behaviour were configurable. Trying N retries first, before moving on to the "ERROR" step, would be an ideal setup for the circuit breaker pattern, e.g. retry 3 times and, if all attempts fail, sleep for 5 seconds before retrying again.
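Below is a minimal sketch of the configurable behaviour, assuming a step can be modelled as a Callable. The step is tried up to maxAttempts times with a fixed pause between attempts. Only if every attempt fails is the last exception rethrown, so the caller can then route the job to the "ERROR" step. RetryPolicy, maxAttempts and pauseMillis are hypothetical names. A full circuit breaker would also track failures across jobs, which this sketch does not do.

```java
import java.util.concurrent.Callable;

// Hypothetical retry policy applied before falling back to the "ERROR" step.
final class RetryPolicy {
    private final int maxAttempts;
    private final long pauseMillis;

    RetryPolicy(int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.pauseMillis = pauseMillis;
    }

    // Runs the step; rethrows the last failure once all attempts are used up.
    <T> T run(Callable<T> step) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // back off before the next attempt
                }
            }
        }
        throw last; // the caller would now trigger the :next step for "ERROR"
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        RetryPolicy policy = new RetryPolicy(3, 10);
        String result = policy.run(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```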

When the condition in a next step is a Boolean, it is not shown in the UI

Consider adding support for Swagger or OpenAPI

As mentioned in #18, this might be worthwhile, e.g. using https://github.com/metosin/compojure-api or something similar.

Test GitHub integration

Keep calm, this is just a test!

workload-fn with a Java lambda expression cannot access external Maven dependencies

Additional external Maven dependencies loaded at runtime are accessible to all workflow step functions, i.e. to Clojure library functions, Java classes and Clojure anonymous functions.

However, they do not seem to be visible to Java lambda code (see 4 in the wiki).

Steps to reproduce:

1. I load new Maven dependencies at runtime (using the GUI or just by updating the ext-dependencies file).
2. I then prototype a new workflow step with a Java lambda expression that references the library loaded in step 1.
3. I get an error:

   Compiler messages:
   ERROR: package xxx.xxx.xxx does not exist
   Compiler standard error output:

Expected behaviour: no error in step 3; instead, the Java lambda code should see the library on the classpath.

Example:

1. Load the new Maven dependency [com.github.librepdf/openpdf "1.3.11"].
2. Create a new workflow step with the following Java expression:

p -> {
    com.lowagie.text.Document doc = new com.lowagie.text.Document();
    java.util.Map m = new java.util.HashMap();
    m.put("doc", doc);
    return m;
}

The error message is as follows:

Compiler messages:
ERROR: package com.lowagie.text does not exist
Compiler standard error output:

If I don't use a Java lambda and just switch to a Clojure fn, I can instantiate com.lowagie.text.Document without issues:

(fn [p]
  {:doc (com.lowagie.text.Document.)})
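The issue does not confirm a cause, but one plausible explanation is this. The in-memory Java compiler may be invoked with a -classpath that reflects only the JVM's startup class path, while jars resolved at runtime live in a separate dynamic class loader. Below is a minimal sketch of that kind of fix, assuming the file paths of the runtime-loaded jars are known: append them to the classpath option handed to javax.tools.JavaCompiler. LambdaClasspath, compilerOptions and the jar path are hypothetical.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class LambdaClasspath {
    // Builds javac options whose -classpath also includes jars loaded at runtime.
    static List<String> compilerOptions(String baseClasspath, List<Path> runtimeJars) {
        List<String> entries = new ArrayList<>();
        if (baseClasspath != null && !baseClasspath.isEmpty()) {
            entries.add(baseClasspath);
        }
        for (Path jar : runtimeJars) {
            entries.add(jar.toString());
        }
        return List.of("-classpath", String.join(File.pathSeparator, entries));
    }

    public static void main(String[] args) {
        // Hypothetical location of the openpdf jar fetched at runtime.
        List<String> options = compilerOptions(
                System.getProperty("java.class.path"),
                List.of(Paths.get("/tmp/ext-deps/openpdf-1.3.11.jar")));
        // These options would go into the `options` argument of
        // JavaCompiler.getTask(out, fileManager, diagnostics, options, classes, units).
        System.out.println(options);
    }
}
```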

Support strings (with spaces) as step type names

Currently step type names are derived from their folder or file names and should not contain spaces.
To improve the UX, it is desirable for step types to have more human-readable names.

This could be solved e.g. by adding a :name key-value pair to the step type's map and showing it in the UI.
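Below is a minimal sketch of the proposed fallback, assuming a step type is available as a map. The :name value is used when present and non-blank; otherwise the folder- or file-derived name is kept. The "name" key spelling and the displayName helper are hypothetical.

```java
import java.util.Map;

final class StepTypeNames {
    // Prefers an explicit human-readable :name, falling back to the derived name.
    static String displayName(Map<String, Object> stepType, String derivedName) {
        Object name = stepType.get("name");
        if (name instanceof String && !((String) name).trim().isEmpty()) {
            return (String) name;
        }
        return derivedName;
    }

    public static void main(String[] args) {
        System.out.println(displayName(Map.of("name", "Generate PDF report"), "pdf-report")); // Generate PDF report
        System.out.println(displayName(Map.of(), "pdf-report"));                             // pdf-report
    }
}
```

Keeping the derived name as the identifier and using :name only for display means existing workflow definitions that reference step types by folder name would not need to change.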

Support query opts in jdbc tasklet

Currently the JDBC tasklet does not support passing query options to the clojure.java.jdbc/query function (see doc).

This would be highly beneficial, as it would allow people to use :row-fn and other options, making DB querying very flexible and removing the need for further ETL steps.

Does not start

I installed it with the GUI according to the README.
Ubuntu, OpenJDK 11.0.3

root@51ccf4932c3d:~/titanoboa# ./start.sh
INFO [main] - Logging initialized @2396ms
INFO [main] - Running titanoboa with parameters: nil
INFO [main] - Starting Titanoboa server...
INFO [main] - Loading database extensions...
INFO [main] - Loading system definitions...
INFO [main] - Loading tasklet definitions...
INFO [main] - Initialization: Copying external dependencies from read-only file on classpath...
INFO [main] - Initialization: Loading external dependencies from /root/titanoboa
INFO [main] - Loading external dependencies:
[[io.titanoboa.tasklet/pdf 0.1.0] [io.titanoboa.tasklet/aws-s3 0.1.0]]
from repositories {central https://repo1.maven.org/maven2/, clojars https://clojars.org/repo}
INFO [main] - Requiring external namespaces: [[io.titanoboa.tasklet.aws.s3] [io.titanoboa.tasklet.pdf]]
WARNING: read already refers to: #'clojure.core/read in namespace: io.titanoboa.tasklet.aws.s3, being replaced by: #'io.titanoboa.tasklet.aws.s3/read
INFO [main] - Starting to watch external dependencies file for changes: /root/titanoboa
INFO [main] - Hello, I am core.async server-config and I am being loaded...
Exception in thread "main" Syntax error compiling var at (5:17).
at clojure.lang.Compiler.analyzeSeq(Compiler.java:7114)
at clojure.lang.Compiler.analyze(Compiler.java:6789)
at clojure.lang.Compiler.analyze(Compiler.java:6745)
at clojure.lang.Compiler$MapExpr.parse(Compiler.java:3104)
at clojure.lang.Compiler.analyze(Compiler.java:6797)
at clojure.lang.Compiler.analyze(Compiler.java:6745)
at clojure.lang.Compiler$MapExpr.parse(Compiler.java:3104)
at clojure.lang.Compiler.analyze(Compiler.java:6797)
at clojure.lang.Compiler.analyze(Compiler.java:6745)
at clojure.lang.Compiler$MapExpr.parse(Compiler.java:3104)
at clojure.lang.Compiler.analyze(Compiler.java:6797)
at clojure.lang.Compiler.analyze(Compiler.java:6745)
at clojure.lang.Compiler$InvokeExpr.parse(Compiler.java:3888)
at clojure.lang.Compiler.analyzeSeq(Compiler.java:7108)
at clojure.lang.Compiler.analyze(Compiler.java:6789)
at clojure.lang.Compiler.analyze(Compiler.java:6745)
at clojure.lang.Compiler$InvokeExpr.parse(Compiler.java:3888)
at clojure.lang.Compiler.analyzeSeq(Compiler.java:7108)
at clojure.lang.Compiler.analyze(Compiler.java:6789)
at clojure.lang.Compiler.analyze(Compiler.java:6745)
at clojure.lang.Compiler$BodyExpr$Parser.parse(Compiler.java:6120)
at clojure.lang.Compiler$FnMethod.parse(Compiler.java:5467)
at clojure.lang.Compiler$FnExpr.parse(Compiler.java:4029)
at clojure.lang.Compiler.analyzeSeq(Compiler.java:7104)
at clojure.lang.Compiler.analyze(Compiler.java:6789)
at clojure.lang.Compiler.eval(Compiler.java:7173)
at clojure.lang.Compiler.load(Compiler.java:7635)
at clojure.lang.Compiler.load(Compiler.java:7582)
at clojure.core$load_reader.invokeStatic(core.clj:4087)
at clojure.core$load_string.invokeStatic(core.clj:4089)
at clojure.core$load_string.invoke(core.clj:4089)
at titanoboa.server$init_config_BANG_.invokeStatic(server.clj:104)
at titanoboa.server$init_config_BANG_.doInvoke(server.clj:98)
at clojure.lang.RestFn.invoke(RestFn.java:421)
at titanoboa.server$start_BANG_.invokeStatic(server.clj:138)
at titanoboa.server$start_BANG_.doInvoke(server.clj:129)
at clojure.lang.RestFn.invoke(RestFn.java:397)
at titanoboa.server$_main.invokeStatic(server.clj:175)
at titanoboa.server$_main.doInvoke(server.clj:169)
at clojure.lang.RestFn.invoke(RestFn.java:397)
at clojure.lang.AFn.applyToHelper(AFn.java:152)
at clojure.lang.RestFn.applyTo(RestFn.java:132)
at titanoboa.server.main(Unknown Source)
Caused by: java.lang.RuntimeException: Unable to resolve var: titanoboa.system.local/local-core-system in this context
at clojure.lang.Util.runtimeException(Util.java:221)
at clojure.lang.Compiler$TheVarExpr$Parser.parse(Compiler.java:720)
at clojure.lang.Compiler.analyzeSeq(Compiler.java:7106)
... 42 more

stop.sh script does not work properly on Ubuntu

It seems SIGTERM should be used there instead of SIGINT

Jobs in :initial state might not get evicted from job cache in clustered setup

A common scenario in a large cluster setup: one node starts a job and updates a state-agent accordingly:
https://github.com/mikub/titanoboa/blob/master/src/clj/titanoboa/processor.clj#L72

However, this job id is never marked for eviction. This node will then keep broadcasting the state of this job as :initial, even long after the job has finished and been evicted from the other nodes' caches.
This is not just a cosmetic issue, since it might lead to uncontrolled cache growth in certain scenarios.

There is also a downside to the obvious solution: if the :initial state of the job is evicted from the cache BEFORE the job gets processed by another node (e.g. under heavy load), the job will disappear from the GUI completely. This is still an acceptable solution, since it can be avoided if the eviction interval is set properly and the cluster is sized accordingly (and no DDoS happens).
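Below is a minimal sketch of the "obvious solution", assuming each node keeps a local map from job id to its last known state and the time of the last update. Every update, including the one that records :initial, refreshes the timestamp. A periodic sweep then evicts entries that have not been touched for longer than the eviction interval. JobStateCache and its methods are hypothetical stand-ins for the state-agent, and timestamps are passed explicitly to keep the example deterministic.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-node cache of job states with time-based eviction.
final class JobStateCache {
    private static final class Snapshot {
        final String state;
        final long updatedAtMillis;

        Snapshot(String state, long updatedAtMillis) {
            this.state = state;
            this.updatedAtMillis = updatedAtMillis;
        }
    }

    private final Map<String, Snapshot> jobs = new ConcurrentHashMap<>();
    private final long evictAfterMillis;

    JobStateCache(long evictAfterMillis) {
        this.evictAfterMillis = evictAfterMillis;
    }

    // Every update, including the one recording :initial, restarts the eviction clock.
    void update(String jobId, String state, long nowMillis) {
        jobs.put(jobId, new Snapshot(state, nowMillis));
    }

    // Last known state, or null if the job is not (or no longer) cached.
    String state(String jobId) {
        Snapshot s = jobs.get(jobId);
        return s == null ? null : s.state;
    }

    // Drops jobs not updated within the eviction interval; returns how many were dropped.
    int evictStale(long nowMillis) {
        int evicted = 0;
        Iterator<Snapshot> it = jobs.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().updatedAtMillis > evictAfterMillis) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        JobStateCache cache = new JobStateCache(60_000);
        cache.update("job-1", "initial", 0);      // this node only ever saw :initial
        cache.update("job-2", "running", 50_000);
        System.out.println("evicted: " + cache.evictStale(90_000));             // evicted: 1
        System.out.println(cache.state("job-1") + " / " + cache.state("job-2")); // null / running
    }
}
```

Keying eviction on the time of the last update, rather than on reaching a terminal state, is what ensures that entries stuck at :initial eventually go away. The GUI trade-off mentioned above is then governed by evictAfterMillis.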

Allow using env variables to configure tboa

Currently, mainly Java system properties are used to configure tboa.
This makes configuring Docker images harder and sometimes requires rebuilding the Docker image.

Add an environment variable lookup for tboa's server config path as well as for the external dependencies file.
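Below is a minimal sketch of one possible lookup order: an environment variable wins over a Java system property, which in turn wins over a built-in default. The variable name, property name and default used in main (TBOA_SERVER_CONFIG_PATH, boa.server.config.path, server-config.clj) are hypothetical placeholders, not names titanoboa defines. The environment and properties are passed in so the logic can be tested without touching the real process environment.

```java
import java.util.Map;
import java.util.Properties;

final class ConfigLookup {
    // Resolves a setting: environment variable first, then system property, then default.
    static String resolve(Map<String, String> env, Properties sysProps,
                          String envName, String propName, String defaultValue) {
        String fromEnv = env.get(envName);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv;
        }
        return sysProps.getProperty(propName, defaultValue);
    }

    public static void main(String[] args) {
        // Hypothetical names, for illustration only.
        String configPath = resolve(System.getenv(), System.getProperties(),
                "TBOA_SERVER_CONFIG_PATH", "boa.server.config.path", "server-config.clj");
        System.out.println("server config: " + configPath);
    }
}
```

With this order, a Docker image can be reconfigured with `docker run -e ...` alone, while existing `-D` system properties keep working when no variable is set.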

add Dockerfile

I see there's an image here https://hub.docker.com/layers/titanoboa/titanoboa/0.9.0/images/sha256-367dac2bcc85717872a06646c0605a9ff339663d8b92ced748f9322b3be6b600?context=explore

but where's the Dockerfile?

Updates to the workflow repository may not be noticed by all nodes

It seems that in an ECS Fargate/EFS cluster setup, the directory watcher used in titanoboa.repo/RepoWatcherComponent does not get triggered every time a new workflow (aka job definition) revision is saved.
This leads to a situation where some nodes may have an obsolete view of the repo:

e.g. node A sees that the latest :head revision of workflow X is 5
while node B sees that the latest :head revision of workflow X is 2

This is interesting, since this DOES work in an EC2/EFS setup, so it may be related to some idiosyncrasy of Fargate.

We can fix this by periodically refreshing the workflow repo atom.
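Below is a minimal sketch of the periodic refresh, assuming the repo view can be rebuilt by re-reading the repository directory. A ScheduledExecutorService calls a loader at a fixed interval and swaps the result into an AtomicReference, which plays the role of the Clojure atom here. RepoRefresher and the loader are hypothetical. In this sketch the directory watcher would stay in place, with the refresh acting as a safety net.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical periodic refresh of each workflow's :head revision.
final class RepoRefresher implements AutoCloseable {
    private final Supplier<Map<String, Integer>> loader;
    private final AtomicReference<Map<String, Integer>> headRevisions;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    RepoRefresher(Supplier<Map<String, Integer>> loader) {
        this.loader = loader;
        this.headRevisions = new AtomicReference<>(loader.get());
    }

    // Re-reads the repo and replaces the cached view in one atomic swap.
    void refresh() {
        try {
            headRevisions.set(loader.get());
        } catch (RuntimeException e) {
            // Keep the previous view; an escaping exception would cancel future runs.
        }
    }

    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::refresh, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    Map<String, Integer> current() {
        return headRevisions.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        int[] head = {2};
        try (RepoRefresher refresher = new RepoRefresher(() -> Map.of("workflow-x", head[0]))) {
            head[0] = 5;          // another node saves a new revision
            refresher.refresh();  // what each scheduled run does
            System.out.println(refresher.current()); // {workflow-x=5}
        }
    }
}
```

The try/catch inside refresh matters. If a task submitted with scheduleAtFixedRate throws, all of its subsequent executions are suppressed, so a single failed read would otherwise stop the refresh for good.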

Show health of each worker

Whenever a worker finishes a step (or polls for a new step), refresh its last-heartbeat timestamp to now. This could be kept in the systems' atom.

This way we will know if a worker thread has failed or is stuck polling, on I/O, etc.
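Below is a minimal sketch of the heartbeat bookkeeping, assuming a shared concurrent map plays the role of the systems' atom. Workers record a timestamp whenever they finish or poll for a step. A worker counts as unhealthy once its last heartbeat is older than a threshold. WorkerHealth, beat and staleWorkers are hypothetical names, and timestamps are passed explicitly to keep the example deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical heartbeat registry for workers.
final class WorkerHealth {
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a worker finishes a step or polls for a new one.
    void beat(String workerId, long nowMillis) {
        lastHeartbeat.put(workerId, nowMillis);
    }

    // Workers whose last heartbeat is older than the threshold, sorted by id.
    List<String> staleWorkers(long nowMillis, long thresholdMillis) {
        return lastHeartbeat.entrySet().stream()
                .filter(e -> nowMillis - e.getValue() > thresholdMillis)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        WorkerHealth health = new WorkerHealth();
        health.beat("worker-1", 0);
        health.beat("worker-2", 0);
        health.beat("worker-2", 25_000); // worker-2 keeps polling
        System.out.println(health.staleWorkers(30_000, 10_000)); // [worker-1]
    }
}
```

A worker busy with one long step also stops beating. The threshold therefore has to exceed the longest expected step duration, unless workers also beat while a step is running.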

MQ session pool macro does not return a session to the pool upon error

with-mq-sessionpool should wrap its body in a try and use a finally block to return the session to the "pool" core.async channel.
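Below is the same try/finally pattern as a minimal Java sketch. A BlockingQueue stands in for the core.async "pool" channel, and the finally block hands the session back whether the body returns normally or throws. SessionPool and withSession are hypothetical names, and the session type is left generic.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical pool; the queue plays the role of the "pool" channel.
final class SessionPool<S> {
    private final BlockingQueue<S> pool;

    SessionPool(Collection<S> sessions) {
        pool = new ArrayBlockingQueue<>(sessions.size(), false, sessions);
    }

    // Borrows a session, runs the body and always returns the session.
    <R> R withSession(Function<S, R> body) throws InterruptedException {
        S session = pool.take(); // blocks until a session is free
        try {
            return body.apply(session);
        } finally {
            pool.offer(session); // runs on normal return and on exceptions alike
        }
    }

    int available() {
        return pool.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SessionPool<String> sessions = new SessionPool<>(List.of("s1", "s2"));
        try {
            sessions.withSession(s -> { throw new IllegalStateException("send failed"); });
        } catch (IllegalStateException e) {
            System.out.println("body failed: " + e.getMessage()); // body failed: send failed
        }
        System.out.println("available: " + sessions.available()); // available: 2
    }
}
```

offer is used in the finally block because it never blocks or throws InterruptedException. The slot was freed by the earlier take, so the session always fits back, and the body's original exception is not masked.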

External dependencies: imports are not visible from java lambdas

Imports specified in ext-dependencies (under the :import key) are not visible from Java lambdas.

Create folder property on workflow definition should be false by default

Currently the :create-folder? workflow definition property is set to true unless explicitly set to false.
Change the default value to false to avoid unnecessary creation of job folders: since people may not know this setting exists, they might keep it on and unknowingly flood their file systems with lots of unnecessary (and empty) job folders.

Add ability for Workers to self-initiate restart in case of a fatal error.

Normally, when a worker encounters an error that can't be handled (e.g. an error is thrown again from inside the catch clause), both the processing loop and the worker thread terminate.

A good example of this type of error in a distributed setup is the termination of the underlying MQ connection: basically all the workers stop and all workflow processing grinds to a halt. Currently such a situation has to be monitored, discovered and handled by admins.

Proposed solution:
Add a :restart-workers-on-error property to systems so that restarting workers after such a fatal error becomes configurable.
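Below is a minimal sketch of what :restart-workers-on-error could control, assuming the worker's processing loop can be wrapped in a supervising loop. When the processing loop dies with an unhandled error and restarts are enabled, the supervisor waits briefly and starts it again, up to a cap, instead of letting the worker thread end. SupervisedWorker and its parameters are hypothetical. In the MQ case, the restarted loop would also have to re-open the connection.

```java
// Hypothetical supervisor around a worker's processing loop.
final class SupervisedWorker implements Runnable {
    private final Runnable processingLoop;
    private final boolean restartOnError;
    private final int maxRestarts;
    private final long restartDelayMillis;
    private int restarts = 0;

    SupervisedWorker(Runnable processingLoop, boolean restartOnError,
                     int maxRestarts, long restartDelayMillis) {
        this.processingLoop = processingLoop;
        this.restartOnError = restartOnError;
        this.maxRestarts = maxRestarts;
        this.restartDelayMillis = restartDelayMillis;
    }

    @Override
    public void run() {
        while (true) {
            try {
                processingLoop.run(); // returns normally only on a regular shutdown
                return;
            } catch (RuntimeException fatal) { // Errors such as OutOfMemoryError are not restarted
                if (!restartOnError || restarts >= maxRestarts) {
                    throw fatal; // previous behaviour: the worker thread terminates
                }
                restarts++;
                try {
                    Thread.sleep(restartDelayMillis); // e.g. give the MQ broker time to recover
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    int restarts() {
        return restarts;
    }

    public static void main(String[] args) {
        int[] runs = {0};
        SupervisedWorker worker = new SupervisedWorker(() -> {
            runs[0]++;
            if (runs[0] < 3) {
                throw new IllegalStateException("MQ connection closed");
            }
        }, true, 5, 10);
        worker.run();
        System.out.println("loop runs: " + runs[0] + ", restarts: " + worker.restarts()); // loop runs: 3, restarts: 2
    }
}
```

The restart cap keeps a permanently broken connection from producing an endless restart loop. Once the cap is reached, the error propagates as it does today, so admins can still notice it.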

Add support for private s3 mvn repos

Currently, private S3 Maven repo support is achievable via the following workaround, which requires a rebuild:
https://github.com/mikub/titanoboa/wiki/Server-Configuration#using-private-s3-maven-repositories

It would be nice if this worked out of the box; currently people have to maintain a separate build/fork to make it work.

Add ability to suspend/resume workflow jobs

Though this is trivial in the local setup, in the distributed setup (e.g. on RabbitMQ) it can get more complicated.
The implementation should be generic enough to allow for distributed setup support.

Database drivers added to ext-dependencies are not found

Titanoboa uses the c3p0 connection pool, and due to bug swaldman/c3p0#105 any DB driver classes have to be loaded by the main class loader.

This means that any DB drivers you put into ext-dependencies will not be recognized by c3p0 and have to be put into the project.clj dependencies instead.
This requires a rebuild and is not in line with how Titanoboa is intended to be used (i.e. modular and easily extensible).

E.g. the Postgres driver used in the following example https://github.com/mikub/titanoboa/wiki/Server-Configuration#non-core-systems currently has to be put into the project.clj dependencies, or simply copied into the /lib folder (the easiest workaround).

Update API docs: add Job Suspension operation & also add all the missing Archive and Systems API endpoints

The wiki describes a PATCH call to /systems/:system/jobs/:jobid in https://github.com/mikub/titanoboa/wiki/Designing-Workflows#suspendable, but the API docs do not mention it (https://github.com/mikub/titanoboa/wiki/API-Documentation).

Terminating a worker via thread interrupt can lead to a workflow job failure

Terminating a worker on which a job is running should be a transparent operation:
the worker should stop and nack the current step, which should then be picked up by a different worker and processed as if nothing had happened (unless the workflow itself has been suspended).

It seems, however, that thread interruptions sometimes cause the current step to fail: