impresario's People

Contributors

kyleburton, rn-ci

impresario's Issues

DBI Driver should prompt for user/password

If a username or password is not supplied, the DBI driver should prompt for it at the terminal (provided stdin is not a pipe). These prompts must go to stderr so that they do not interfere with the command line tools being used in a pipeline. In fact, prompting should only be performed when using the command line tools!
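A minimal sketch of the prompting behavior, assuming a hypothetical helper named prompt_for_credentials (not part of the DBI driver today). It only prompts when its input is a terminal and writes the prompts to stderr:

```ruby
require "io/console"

# Hypothetical helper: fill in missing credentials by prompting on stderr.
# Returns [user, password]; values are left untouched when input is not a terminal.
def prompt_for_credentials(user, password, input: $stdin, err: $stderr)
  return [user, password] unless input.tty? # piped input: never prompt

  if user.nil? || user.empty?
    err.print "Username: "
    user = input.gets.to_s.chomp
  end

  if password.nil? || password.empty?
    err.print "Password: "
    # noecho (from io/console) keeps the typed password off the screen
    password = input.noecho(&:gets).to_s.chomp
    err.puts
  end

  [user, password]
end
```

The command line tools would call this before connecting; library callers would skip it entirely, which keeps prompting out of non-interactive use.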

Re-Implement Tab Driver in C/FFI

Based on prior experience, re-implementing the tab driver in C and using it via Ruby's FFI will provide a very valuable performance improvement. This is especially true since the tab format is the default used by all of the tools.

Utility: atsort

Implement a sort utility. Allow it to shell out to GNU sort so we can leverage all of the engineering that it embodies.

  1. Specify keys and ascending/descending order for each key.
  2. Write the data out to a tab-delimited file, prepending the necessary columns so that GNU sort can perform the sort.
  3. Shell out to GNU sort and read/stream the output from its stdout.

Use temporary files.

Ensure they're cleaned up when the driver is finished with them.

This will be challenging because the temporary files have to be shared between the driver and GNU sort.

Make them easy to clean up by hand, e.g., put them into a subdirectory.

Ensure that the temporary directory path can be specified by the user as an option to atsort (don't just assume /tmp has enough space).
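One way the shell-out could look, as a rough sketch. The helper names (gnu_sort_argv, sort_rows) and the [column, direction] key representation are assumptions. This version sorts on columns already present in the data rather than prepending sort columns, and uses only the -t, -k and -T options of sort:

```ruby
require "tmpdir"

# Build the argv for GNU sort. `keys` is a list of [column_number, :asc | :desc]
# pairs with 1-based column numbers (a hypothetical representation).
def gnu_sort_argv(keys, tmpdir)
  argv = ["sort", "-t", "\t", "-T", tmpdir]
  keys.each do |col, dir|
    argv << "-k" << "#{col},#{col}#{dir == :desc ? 'r' : ''}"
  end
  argv
end

# Sort rows (arrays of strings) by shelling out to sort; yields each sorted row.
# tmp_root is where the scratch subdirectory is created, so the user can
# point it at a volume with enough space.
def sort_rows(rows, keys, tmp_root: Dir.tmpdir)
  # The block form of mktmpdir removes the subdirectory when the block exits.
  Dir.mktmpdir("atsort-", tmp_root) do |dir|
    input = File.join(dir, "input.tab")
    File.open(input, "w") { |f| rows.each { |r| f.puts(r.join("\t")) } }
    IO.popen(gnu_sort_argv(keys, dir) + [input]) do |out|
      out.each_line { |line| yield line.chomp.split("\t", -1) }
    end
  end
end
```

Keeping everything under one "atsort-" subdirectory also makes leftovers easy to remove by hand if the process is killed before cleanup runs.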

Driver: Excel Spreadsheets

Implement a driver for Excel workbooks.

URL needs to support specifying:

  • the worksheet to read from / write to
  • the row number and column name to start at -- data tables are often not at the top/left
    though that should be the default
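A sketch of how those URL options might be read. The excel:// scheme and the parameter names sheet, row and col are placeholders, since the issue does not fix them:

```ruby
require "uri"

# Parse worksheet / start-cell options from a driver URL's query string.
# The parameter names (sheet, row, col) are hypothetical.
def excel_options(url)
  _path, query = url.split("?", 2)
  params = URI.decode_www_form(query.to_s).to_h
  {
    sheet: params["sheet"],                   # nil means "use the first worksheet"
    row:   Integer(params.fetch("row", "1")), # default: top row
    col:   params.fetch("col", "A")           # default: leftmost column
  }
end
```

With no query string the result defaults to the top/left of the first worksheet, matching the default described above.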

Utility: atanlyze

Basic file / table analysis.

Try to guess the file's format.

Fill rates for the columns.

Identify highly duplicated values within columns.

Identify min/max/mean/mode/median column widths.

Use heuristics to try to guess at the column type.

Use that column type information to guess whether the first line is a header.
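A rough sketch of the fill rate, type guess and header heuristics. The function names and the three-way integer/float/string classification are assumptions, and header_row? assumes every row has the same number of fields:

```ruby
# Fraction of non-empty values in a column.
def fill_rate(values)
  return 0.0 if values.empty?
  values.count { |v| !v.nil? && !v.strip.empty? }.fdiv(values.size)
end

# Very rough type guess: every non-empty value must match the pattern.
def guess_type(values)
  present = values.compact.map(&:strip).reject(&:empty?)
  return :empty   if present.empty?
  return :integer if present.all? { |v| v.match?(/\A[-+]?\d+\z/) }
  return :float   if present.all? { |v| v.match?(/\A[-+]?(\d+\.?\d*|\.\d+)\z/) }
  :string
end

# Guess a header: the first row is all strings, but some column below it is not.
def header_row?(rows)
  return false if rows.size < 2
  first, *rest = rows
  first_types = first.map { |v| guess_type([v]) }
  body_types  = rest.transpose.map { |col| guess_type(col) }
  first_types.all? { |t| t == :string } && body_types.any? { |t| t != :string }
end
```

A file whose columns are all strings cannot be decided this way, so a real tool would need a second signal (for example, first-row values that never recur in the body).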

State Tracking Bug: Serialized Contexts and Newly Added States

Impresario supports serialized contexts for long-lived / long-running workflows. When the workflow definition changes (a new state is added) and a previously serialized context is used by the new version, the state tracking throws an exception when attempting to update the tracking map. This should be handled more gracefully so that long-running, evolving workflows are supported.

Implement Driver for Fixed Width format files

Implement a driver for fixed width files.

  • Support optional record separators - some fixed width files have no record boundaries.
    The default is to support all of: CR, LF, CRLF
  • Support an optional header line, default is none (is this the most common default?)

The columns in the file will be specified as the query string parameter 'columns':

atcat "fixed://some/file.dat?columns=first:3,second:8,third:1"
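A sketch of how the 'columns' value could be parsed and applied to one record. The helper names parse_columns and parse_fixed_line are illustrative only:

```ruby
# Parse "first:3,second:8,third:1" into [["first", 3], ["second", 8], ["third", 1]].
def parse_columns(spec)
  spec.split(",").map do |field|
    name, width = field.split(":", 2)
    [name, Integer(width)]
  end
end

# Slice one fixed width record into a hash of column name => raw text.
def parse_fixed_line(line, columns)
  offset = 0
  columns.each_with_object({}) do |(name, width), record|
    record[name] = line[offset, width].to_s # to_s covers records that are too short
    offset += width
  end
end
```

For files with no record separators, the sum of the widths gives the record length, so records could be read in chunks of that many bytes instead of line by line.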

Support Bzip Compression

All file based drivers should recognize when the file name ends in .bz2 and automatically decompress when reading and compress when writing.

Refactor file based drivers to have a common base class

Within this class implement most of the common functionality: opening, closing files, handling compression, common options (record separators, column separators, skip lines, if a header is specified or not, specification of column names when no header is present -- fold in optional support for widths).

Try to have the 'parse_line' function be the only part that needs customization (if possible).
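A minimal sketch of that shape, assuming hypothetical class names FileDriver and TabDriver and only two of the common options (header and skip_lines). The base class owns opening, skipping and iteration; the subclass supplies parse_line:

```ruby
# Hypothetical base class: owns file handling and iteration; subclasses
# only supply parse_line.
class FileDriver
  def initialize(path, header: true, skip_lines: 0)
    @path = path
    @header = header
    @skip_lines = skip_lines
  end

  # Yields each record as a hash when there is a header,
  # otherwise as an array of fields.
  def each_record
    File.open(@path, "r") do |io|
      @skip_lines.times { io.gets }
      names = @header ? parse_line(io.gets.to_s.chomp) : nil
      io.each_line do |line|
        fields = parse_line(line.chomp)
        yield(names ? names.zip(fields).to_h : fields)
      end
    end
  end

  def parse_line(_line)
    raise NotImplementedError, "#{self.class} must implement parse_line"
  end
end

class TabDriver < FileDriver
  def parse_line(line)
    line.split("\t", -1) # -1 keeps trailing empty fields
  end
end
```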

Support Gzip Compression

All file based drivers should recognize when files end in .gz and automatically decompress when reading and compress when writing.
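A sketch of extension-based detection using Zlib from Ruby's standard library. The helper names open_for_read and open_for_write are assumptions. Ruby's standard library has no bzip2 reader, so .bz2 would need a gem or a pipe to the bzip2 command and is not covered here:

```ruby
require "zlib"

# Open a file for reading, decompressing transparently when the name ends in .gz.
def open_for_read(path, &block)
  if path.end_with?(".gz")
    Zlib::GzipReader.open(path, &block)
  else
    File.open(path, "r", &block)
  end
end

# Open a file for writing, compressing transparently when the name ends in .gz.
def open_for_write(path, &block)
  if path.end_with?(".gz")
    Zlib::GzipWriter.open(path, &block)
  else
    File.open(path, "w", &block)
  end
end
```

Both branches yield an object that responds to puts, read and each_line, so the drivers would not need to know which one they were given.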

Validation of states and transitions

Impresario does not currently validate that the state names mentioned in the transitions-to clauses are actually declared as states. There should be a validation check used as a guard before workflows are placed into the registry.

This should be a hard error at workflow registration time. This can also be performed at DSL declaration time by the defmachine DSL macro.
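To illustrate the check itself (this is not Impresario's code, and its real workflow representation is not shown in this issue), here is a sketch over a hypothetical hash mapping each state name to the states it transitions to:

```ruby
require "set"

# Return the transition targets that are not declared as states.
# `workflow` is a hypothetical hash: state name => list of target state names.
def undeclared_targets(workflow)
  declared = workflow.keys.to_set
  workflow.values.flatten.uniq.reject { |target| declared.include?(target) }
end

# Guard to run before a workflow is placed into the registry.
def validate_workflow!(workflow)
  missing = undeclared_targets(workflow)
  return workflow if missing.empty?
  raise ArgumentError, "transitions to undeclared states: #{missing.join(', ')}"
end
```

Reporting every undeclared target at once, rather than failing on the first, saves a round trip when a definition has several typos.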

dotty generation: include triggers

Can we add trigger info to the dotty generated output so we can see on what nodes and on what edges triggers have been defined? Perhaps add a [!] to the node name or the edge name?

It'd be nice to show the difference between: enter triggers (on a node), exit triggers (on a node) and transition triggers (on an edge).
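A sketch of the "[!]" idea, assuming a hypothetical input shape (a hash of states with a :triggers flag and a list of edges with a :trigger flag). It is not tied to Impresario's actual generator:

```ruby
# Render a workflow as DOT text, marking nodes and edges that have triggers
# with "[!]". Input shapes are hypothetical:
#   states: { name => { triggers: true/false } }
#   edges:  [{ from: name, to: name, trigger: true/false }]
def to_dot(states, edges)
  lines = ["digraph workflow {"]
  states.each do |name, info|
    label = info[:triggers] ? "#{name} [!]" : name.to_s
    lines << %(  "#{name}" [label="#{label}"];)
  end
  edges.each do |e|
    attrs = e[:trigger] ? %( [label="[!]"]) : ""
    lines << %(  "#{e[:from]}" -> "#{e[:to]}"#{attrs};)
  end
  lines << "}"
  lines.join("\n")
end
```

Enter and exit triggers could be told apart by using different markers in the node label; the single flag here does not distinguish them.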

Driver for MS Access Databases

Ruby has libraries that wrap mdbtools, which is a C library for reading from MS Access database files. This isn't supported by Ruby's DBI, so it will have to be supported separately.
