multiprocessio / datastation

App to easily query, script, and visualize data from every database, file, and API.

Home Page: https://datastation.multiprocess.io

License: Other

TypeScript 49.89% Python 0.84% HTML 0.03% CSS 3.03% Shell 1.25% JavaScript 20.73% PowerShell 0.22% PLpgSQL 0.04% Dockerfile 0.02% Go 23.52% Jsonnet 0.43%
sql mysql postgresql sqlite3 cockroachdb mariadb sql-server data-visualization data-analysis nginx

datastation's People

Contributors

0michalsokolowski0, akhenakh, dependabot[bot], eatonphil, fritzgrabo, gl28, jimsparkman, krymtkts, posrabi, sajuno, steirico, tooolbox, xwjdsh

datastation's Issues

Allow referencing panels by name

User reported it would be more convenient to reference panels by name: DM_getPanelByName("Untitled Panel #0"). This is an improvement on fetching by integer index because the name doesn't change as panels are added/moved and because it contains more info in the function call than just fetching an integer.

https://discord.com/channels/852998104931631115/852998104931631118/892793099460419635

The only concern with doing this is making sure that the panel names are unique. Or maybe it doesn't matter and we just fetch the first one by that name if fetched by name.
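The "first one by that name" option could be sketched as below. This is a minimal illustration, not DataStation's implementation: the `Panel` shape and both function names are hypothetical.

```typescript
// Hypothetical minimal panel shape; the real DataStation types carry more fields.
interface Panel {
  id: string;
  name: string;
}

// Returns the index of the first panel with the given name, or -1 if none match.
function findPanelIndexByName(panels: Panel[], name: string): number {
  return panels.findIndex((p) => p.name === name);
}

// Lists names used by more than one panel, so a UI could warn about ambiguity.
function duplicatePanelNames(panels: Panel[]): string[] {
  const counts = new Map<string, number>();
  panels.forEach((p) => counts.set(p.name, (counts.get(p.name) || 0) + 1));
  const duplicates: string[] = [];
  counts.forEach((count, name) => {
    if (count > 1) duplicates.push(name);
  });
  return duplicates;
}
```

Returning the first match keeps lookups predictable without enforcing uniqueness, while the duplicate check leaves room to surface a warning instead of silently picking one.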

Support .zip, .tar, .tar.gz files and HTTP endpoints

Right now you need to manually script unarchiving these files. It would be better if DataStation could unarchive them and deal with a particular interior file directly in the workflow.

Selecting the interior file could work the same way as selecting a particular sheet in an Excel file with more than one sheet, which is also not handled well right now.

Add support for querying CrateDB

For a beginner read ARCHITECTURE.md, HACKING.md and git grep snowflake or git grep prometheus and copy the basics from one of these existing systems.

Stream process stdout/stderr back when running any panel eval

Now that all panels are run in a subprocess this should be even easier.

The tricky part here is that streaming data back will work differently in server mode vs desktop mode. In desktop mode you just have to work around the rpc library in desktop/rpc.ts and desktop/preload.ts. But for server mode this might mean using websockets. Except that the streaming is only in one direction. So maybe SSE is ok in this case which may be simpler than websockets. Not sure.
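If SSE turns out to be the route for server mode, the wire format is simple enough to sketch. The function below only formats one Server-Sent Events message; the event names `stdout` and `stderr` are assumptions for illustration, and nothing here reflects existing DataStation code.

```typescript
// Formats one Server-Sent Events message. Multi-line payloads are split into
// one "data:" line each, and a blank line terminates the message.
function formatSSE(event: string, data: string): string {
  const dataLines = data.split(/\r?\n/).map((line) => `data: ${line}`);
  return [`event: ${event}`].concat(dataLines, ["", ""]).join("\n");
}

// Hypothetical usage: forward a chunk of subprocess output to the browser.
const message = formatSSE("stdout", "row 1\nrow 2");
```

A server handler would write such messages to a response with the `text/event-stream` content type, and the browser side would listen with `EventSource`. Since the data only flows one way, this avoids the connection upgrade that websockets need.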

[DSQ] Passing multiple files as different "tables" and doing joins

For the DSQ tool:

This tool would be a lot more useful if I could pass multiple files, name each of them as a "table", and then join across them. It would be especially useful across different data types: maybe I have a couple of Excel documents, a CSV, and some JSON data coming from different data sources, and I'd like to run one query over all of them.

My current workflow involves writing data importers to get all of the data into a proper SQL database server and then running queries against it to generate a report.

Followup tracking after #158

  • Test Snowflake
  • Test Prometheus
  • Fix macos http test

Tutorials:

  • Influx, Influx2 and FluxQL
  • Prometheus
  • Cassandra
  • Snowflake
  • SQLite (not read-only copy when doing remote proxy)

Flatten all objects being used in database panels

Currently any objects pulled into database panels have nested fields skipped. That's not ideal since there are plenty of good reasons for nested structures.

This will require bringing the Go port of shape completely up to date with the original JavaScript implementation since the Go shape library completely skips nested fields while the JS one does not.

Once we know about nested fields in the shape the DB panels can do the collapse when they ingest panels.

This means you'd be able to query like SELECT "x.y.z" FROM DM_getPanel('My JSON rows'). You'll need to quote the field though with dots in it because the column name will literally have dots in it which would conflict with the natural SQL parser unless the column is escaped.
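The collapse step could look roughly like the sketch below. It is written in TypeScript for illustration only and is not the Go shape library or the DB panel ingest code; `flattenRow` is a hypothetical name, and it assumes rows are plain JSON objects.

```typescript
type Row = Record<string, unknown>;

// Collapses nested plain objects into a single level, joining keys with dots.
// Arrays and null are kept as leaf values. Assumes JSON-like input.
function flattenRow(row: Row, prefix: string = ""): Row {
  const out: Row = {};
  for (const key of Object.keys(row)) {
    const value = row[key];
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into nested objects and merge their flattened columns.
      Object.assign(out, flattenRow(value as Row, path));
    } else {
      out[path] = value;
    }
  }
  return out;
}
```

For example, `flattenRow({ x: { y: { z: 1 } }, a: 2 })` produces an object with the keys `x.y.z` and `a`, which is why the column has to be quoted as "x.y.z" in SQL.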

Required for multiprocessio/dsq#10

Use panel.id (uuid) for panelSource instead of panel index

Everywhere that a panel references another one in configuration (table, graph, and visual transform panels) they use the index of the panel. This is fragile and I can't remember any reason for it to be like this. Instead these panels should reference panel.id, a uuid, instead of the panel index. This way when panels are reordered or added/removed the reference stays constant.
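A rough sketch of the difference, with hypothetical type and function names (the real panel types have more fields): a one-time migration maps an old index reference to the panel's id, and lookups by id then survive reordering.

```typescript
// Hypothetical minimal panel shape for illustration.
interface Panel {
  id: string; // uuid
  name: string;
}

// One-time migration: turn an index-based reference into an id-based one.
function indexToId(panels: Panel[], index: number): string | undefined {
  const panel = panels[index];
  return panel ? panel.id : undefined;
}

// Id-based lookup: unaffected by panels being added, removed, or reordered.
function resolveSource(panels: Panel[], panelSource: string): Panel | undefined {
  return panels.find((p) => p.id === panelSource);
}
```

With index references, inserting a panel at the top silently shifts every reference by one; with ids, a missing panel shows up as `undefined` instead of pointing at the wrong data.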

HTTPS, TLS support corporate certificate issuer

I am trying to access a resource over HTTPS that uses a certificate signed by my company. This certificate authority (CA) is not known to DataStation, so the request fails:

Error evaluating panel:
Error: [ERROR] 2021-12-22T09:11:31.713Z request to https://internal.server/resource failed, reason: unable to get local issuer certificate 
    at ClientRequest.<anonymous> (...\datastation-win32-x64-0.4.0\resources\app.asar\node_modules\node-fetch\src\index.js:95:4)
    at ClientRequest.emit (node:events:394:28)
    at TLSSocket.socketErrorListener (node:_http_client:447:9)
    at TLSSocket.emit (node:events:394:28)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

    at ChildProcess.<anonymous> (...\datastation-win32-x64-0.4.0\resources\app.asar\desktop\panel\eval.ts:120:28)
    at ChildProcess.emit (node:events:394:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)

A solution could be to allow bypassing server certificate verification or, better, to allow adding the CA in the tool.
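For the "add the CA" option, one possible approach in Node is to pass an agent whose `ca` list includes the corporate certificate. This is a sketch under assumptions: the function names are hypothetical, and how DataStation would store or load the PEM is not decided here. Setting `ca` replaces Node's default trust store, so the bundled roots from `tls.rootCertificates` are kept alongside the extra one.

```typescript
import * as https from "https";
import * as tls from "tls";

// Builds a CA list containing Node's bundled root certificates plus one
// extra PEM-encoded certificate (e.g. a corporate issuer).
function buildCaList(extraCaPem: string): string[] {
  return tls.rootCertificates.concat(extraCaPem);
}

// Creates an HTTPS agent that trusts the bundled roots and the extra CA.
function agentWithExtraCa(extraCaPem: string): https.Agent {
  return new https.Agent({ ca: buildCaList(extraCaPem) });
}
```

The resulting agent can be handed to an HTTP client that accepts one, such as the `agent` option of node-fetch, which appears in the stack trace above. Bypassing verification entirely (`rejectUnauthorized: false`) is simpler but removes protection against impersonated servers.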

0.2.0 release checklist

Maybe:

  • Break out relative and fixed dropdowns into radio buttons
  • Add info helper to inputs
    • Clarify that time range filter gets converted to UTC
  • Restrict SQL editor to one line

Visual transform maxes out at 4GB (and probably lower than that)

This is because of Node's default 4GB limits and the 1GB string limit. We need to find some ways around this.

One solution would be to build a fuzzy JSON parser for large data files where individual rows are less than 1GB but there are millions or billions of rows. The fuzzy parser would be aware of only the outer array and would pass each interior object to regular JSON.parse.

Or maybe we could store arrays as newline-delimited JSON.
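The fuzzy parser idea could be sketched like this. It is a minimal illustration with a hypothetical name, assuming the input is a single top-level JSON array delivered in string chunks. It tracks only nesting depth and string state, and hands each top-level element to `JSON.parse`, so no single string ever holds the whole file.

```typescript
// Splits a top-level JSON array into its elements and parses each one
// separately. State persists across chunk boundaries.
function parseOuterArray(chunks: string[], onRow: (row: unknown) => void): void {
  let depth = 0; // 1 means directly inside the outer array
  let inString = false;
  let escaped = false;
  let current = ""; // text of the element being accumulated

  const flush = () => {
    if (current.trim()) onRow(JSON.parse(current));
    current = "";
  };

  for (const chunk of chunks) {
    for (let i = 0; i < chunk.length; i++) {
      const ch = chunk[i];
      if (inString) {
        current += ch;
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
        current += ch;
      } else if (ch === "[" || ch === "{") {
        depth++;
        if (depth > 1) current += ch; // skip the outer array's own bracket
      } else if (ch === "]" || ch === "}") {
        depth--;
        if (depth >= 1) current += ch;
        else flush(); // outer array closed: emit the last element
      } else if (ch === "," && depth === 1) {
        flush(); // separator between top-level elements
      } else if (depth >= 1) {
        current += ch;
      }
    }
  }
}
```

Character-by-character concatenation is slow, so a real version would likely scan for boundaries and slice. With newline-delimited JSON the splitting step becomes trivial, since each line is already one row.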

Add support for querying Cassandra

For a beginner read ARCHITECTURE.md, HACKING.md and git grep snowflake or git grep prometheus and copy the basics from one of these existing systems.

Column with name `alter` crashes Visual Transform panel

Array of
  Object with
    'baum_id' of
      number,
    'anlage' of
      string,
    'anlage_id' of
      number,
    'lateinischer_name' of
      string or
      null,
    'deutscher_name' of
      string or
      null,
    'alter' of
      number or
      null

Reported by schmidt_fu on Discord.

Missing integration test checklist

There are missing integration tests for:

  • Connecting to databases over SSH
  • Reading SQLite files over SSH (database panel)
  • Prometheus
  • Snowflake

Go runner compatibility checklist

Databases

  • Oracle support tested
  • Elasticsearch support
  • Snowflake tested
  • Prometheus support
  • Influx support

General

  • Test SQL over SSH Program panel

Elasticsearch HTTPS Connection

Hi,

Upon attempting to query an ElasticSearch server configured to only support HTTPS (no protocol specified in the Host config value for the data source):

Error: 
[INFO] 2022-01-13T19:05:20.118Z DataStation Community Edition Panel Runner 0.5.0 DEBUG
[INFO] 2022-01-13T19:05:20.273Z Connecting to http://xyz.us-east-1.aws.found.io:9243 for elasticsearch query
[INFO] 2022-01-13T19:05:20.275Z Elasticsearch request: {"size":1000,"index":["par_document_promoted_stag"],"q":"","body":{}}
[ERROR] 2022-01-13T19:05:20.431Z Client sent an HTTP request to an HTTPS server.

Upon attempting to query an ElasticSearch server configured to only support HTTPS (https:// specified in the Host config value for the data source):

Error: 
[INFO] 2022-01-13T19:08:17.683Z DataStation Community Edition Panel Runner 0.5.0 DEBUG
[INFO] 2022-01-13T19:08:17.837Z Connecting to https://64c5f893d5fc400f9d3354820a0dbe81.us-east-1.aws.found.io:9243 for elasticsearch query
[INFO] 2022-01-13T19:08:17.838Z Elasticsearch request: {"size":1000,"index":["par_document_promoted_stag"],"q":"","body":{}}
[ERROR] 2022-01-13T19:08:17.950Z Client sent an HTTP request to an HTTPS server.
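Both logs end in the same error even though the second one connects to an https:// URL, so the scheme seems to get lost somewhere between the Host field and the actual request. As a sketch of the expected behaviour only (this is not the runner's code, and `normalizeHost` is a hypothetical helper), a host value with an explicit scheme would keep it, and a bare host would get a configurable default:

```typescript
// Keeps an explicit http:// or https:// scheme from the configured host and
// applies a default only when none is given. Trailing slashes are trimmed.
function normalizeHost(host: string, defaultProtocol: "http" | "https" = "http"): string {
  const trimmed = host.trim();
  const withScheme = /^https?:\/\//i.test(trimmed)
    ? trimmed
    : `${defaultProtocol}://${trimmed}`;
  return withScheme.replace(/\/+$/, "");
}
```

The default is `http` here only because that is what the first log shows for a bare host. Whatever client performs the request would then need to be given this full URL rather than just the host and port.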

Add support for querying AWS Athena

For a beginner read ARCHITECTURE.md, HACKING.md and git grep snowflake or git grep prometheus and copy the basics from one of these existing systems.

Add support for scatterplots

Unlike all the existing charts that graph a string (most likely) against a number, this graphs numbers on both axes. So in addition to the configuration changes needed for passing the right field to chartjs, the PR for this should also change the "preferred type" to "number" for the x axis when the chart type is scatter plot.
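Chart.js scatter datasets take their data as `{ x, y }` points, so the panel would need to turn rows into numeric pairs. A minimal sketch, with hypothetical names and no claim about how the graph panel is actually structured:

```typescript
type Row = Record<string, unknown>;

interface Point {
  x: number;
  y: number;
}

// Converts a value to a finite number, or undefined if that is not possible.
// null, undefined, and empty strings are rejected rather than coerced to 0.
function toNumber(value: unknown): number | undefined {
  if (value === null || value === undefined || value === "") return undefined;
  const n = Number(value);
  return isFinite(n) ? n : undefined;
}

// Maps rows to scatter points, skipping rows where either field is not numeric.
function toScatterPoints(rows: Row[], xField: string, yField: string): Point[] {
  const points: Point[] = [];
  for (const row of rows) {
    const x = toNumber(row[xField]);
    const y = toNumber(row[yField]);
    if (x !== undefined && y !== undefined) points.push({ x, y });
  }
  return points;
}
```

Skipping non-numeric rows is one choice among several; surfacing an error to the user when the chosen x field is not numeric would be another.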

Investigate runner slowness

Right now the runner runs through Electron which makes panel evaluation take way longer than it needs to.

I did this in 0.2.0 because I wasn't sure if there was a real Node process bundled with Electron. But it is way too slow this way.

Also, breaking the runner into its own directory/package and bundling with pkg may significantly reduce the time to unzip on Windows since the deps will be bundled into a binary.

Link to Online Environment is broken

In the feature list on datastation.multiprocess.io, the link to app.datastation.multiprocess.io is broken.

The href of the <a>-tag starts with https//app... (note the missing colon), and should be replaced with https://app...

Release 0.3.0 checklist

Definitely:

  • Systemd service for running the server and export crons
  • Server install instructions/documentation
  • TLS not required, migrations should be handled by the server
  • Builtin functions explainer is impossible to read
  • Copy datastation-documentation into site
  • Docs portion of site should have link back to repo and should have working header anchor/links

Maybe?

  • Fix up UI for scheduling and dashboards
  • Invalid dependent results in filter aggregate panel are not formatted correctly, whole stacktrace is shown
  • Changing tabs is slow
  • Server reloads a few times when loading onto a project

Bugs

  • Editing input text hangs pretty frequently, maybe on save

Draft of release notes

  • Dashboard mode
  • Scheduled email exports
  • Faster panel runs
  • Named DM_getPanel('my panel name') calls
  • All panel details are collapsible
  • Install scripts and built artifacts for server deploy
  • Graphing panel updates (size, unique color, line charts)

Test out coldbrew for load times compared to pyodide

Right now the inMemoryEval for Python uses pyodide. It takes 10-20 seconds to load though. All python panels fail while it's loading.

I just heard about https://github.com/plasticityai/coldbrew so it might be worth comparing load times of coldbrew and pyodide.

Importantly, you must still be able to pass JavaScript objects to the Python panel and send Python objects back to JavaScript reasonably (i.e. with DM_getPanel, DM_setPanel). Pyodide improved on Brython here: with Brython, objects from DM_getPanel only allowed obj.x and not obj['x'], which is not very Pythonic and makes for a bad user experience.

Add support for querying Splunk

For a beginner read ARCHITECTURE.md, HACKING.md and git grep snowflake or git grep prometheus and copy the basics from one of these existing systems.

Sign windows builds

I don't think this is something an external contributor can help with because most of it is me signing up for all the stuff and putting keys in Github Actions.

Program eval can't handle medium sized results (200-500mb)

Error evaluating panel:
Error: [ERROR] 2021-12-04T19:23:16.794Z Cannot create a string longer than 0x1fffffe8 characters 
    at Buffer.toString (node:buffer:783:17)
    at evalProgram (C:\Users\philn\multiprocess\datastation\desktop\panel\program.ts:85:34)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at Object.handler (C:\Users\philn\multiprocess\datastation\desktop\panel\eval.ts:181:17)
    at main (C:\Users\philn\multiprocess\datastation\desktop\runner.ts:100:24)

    at ChildProcess.<anonymous> (C:\Users\philn\multiprocess\datastation\desktop\panel\eval.ts:120:18)
    at ChildProcess.emit (node:events:394:28)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)
