multiprocessio / datastation
App to easily query, script, and visualize data from every database, file, and API.
Home Page: https://datastation.multiprocess.io
License: Other
A user reported it would be more convenient to reference panels by name: DM_getPanelByName("Untitled Panel #0"). This is an improvement over fetching by integer index because the name doesn't change as panels are added or moved, and because the function call carries more information than a bare integer.
https://discord.com/channels/852998104931631115/852998104931631118/892793099460419635
The only concern with doing this is making sure that panel names are unique. Or maybe it doesn't matter and we just fetch the first panel with that name.
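A minimal sketch of what name-based lookup could do, assuming panels are held in an ordered array (the `Panel` shape and `getPanelByName` helper here are hypothetical, not the actual implementation):

```typescript
// Hypothetical panel record; the real project's panel type differs.
interface Panel {
  id: string;
  name: string;
  result: unknown;
}

// Return the result of the first panel whose name matches, mirroring the
// "just fetch the first one by that name" option when names are not unique.
function getPanelByName(panels: Panel[], name: string): unknown {
  const match = panels.find((p) => p.name === name);
  if (!match) {
    throw new Error(`No panel named: ${name}`);
  }
  return match.result;
}
```

First-match semantics keep the call deterministic even with duplicate names, at the cost of silently shadowing later panels.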
Right now you need to manually script unarchiving these files. It would be better if DataStation could unarchive them and work with a particular interior file directly in the workflow.
Selecting the interior file could be handled in a similar manner to picking a particular Excel sheet when a workbook has more than one, since that case is not handled well right now either.
The user will need to pick a paging methodology; the couple that should be supported up front are:
result.next_url
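The result.next_url style can be sketched as a cursor loop: keep requesting until the response omits the next URL. This is a sketch only; the fetcher is injected so the loop stays independent of the HTTP client, and the `Page` shape is an assumption:

```typescript
// Hypothetical page shape: a batch of results plus an optional cursor URL.
interface Page<T> {
  results: T[];
  next_url?: string;
}

// Follow result.next_url until it is absent, accumulating all rows.
async function fetchAllPages<T>(
  firstUrl: string,
  fetchPage: (url: string) => Promise<Page<T>>
): Promise<T[]> {
  const all: T[] = [];
  let url: string | undefined = firstUrl;
  while (url) {
    const page: Page<T> = await fetchPage(url);
    all.push(...page.results);
    url = page.next_url;
  }
  return all;
}
```

A real implementation would also want a page limit or timeout so a misbehaving API can't loop forever.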
For a beginner: read ARCHITECTURE.md and HACKING.md, then git grep snowflake or git grep prometheus and copy the basics from one of those existing systems.
Now that all panels are run in a subprocess this should be even easier.
The tricky part here is that streaming data back will work differently in server mode vs desktop mode. In desktop mode you just have to work around the RPC library in desktop/rpc.ts and desktop/preload.ts. But server mode might mean using websockets, except that the streaming is only in one direction, so SSE may be OK in this case and simpler than websockets. Not sure.
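Since the data only flows server-to-client here, SSE does have the advantage that the wire format is trivial: each message is one or more "data:" lines ended by a blank line. A sketch of the framing (the `sseFrame` helper is hypothetical, not tied to any actual DataStation endpoint):

```typescript
// Build one Server-Sent Events message. Multi-line payloads become
// multiple "data:" lines; an optional event name tags the message.
function sseFrame(payload: string, event?: string): string {
  const lines = payload.split('\n').map((l) => `data: ${l}`);
  const head = event ? [`event: ${event}`] : [];
  return [...head, ...lines].join('\n') + '\n\n';
}
```

The server would write frames like this to a response with Content-Type: text/event-stream, which browsers consume via the standard EventSource API.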
I was just introduced to https://datastation.multiprocess.io/.
This seems like a really cool approach and I like that it supports mixing and matching data sources and scripting languages.
I just thought it would be pretty cool if this project supported Prometheus and PromQL (https://prometheus.io/docs/prometheus/latest/querying/basics/) down the road.
For the DSQ tool:
It seems like this tool would be a lot more useful if I could pass multiple files, naming each of them as a "table", and then do joins on the data. It would be especially useful if one could do this across different data types: maybe I have a couple of Excel documents, a CSV, and some JSON data coming from different data sources, and I'd like to run a query over all of them.
My current workflow involves writing data importers to get all of the data into a proper SQL database server, and then running queries against it to generate a report.
Tutorials:
Currently any objects pulled into database panels have nested fields skipped. That's not ideal since there are plenty of good reasons for nested structures.
This will require bringing the Go port of shape completely up to date with the original JavaScript implementation, since the Go shape library completely skips nested fields while the JS one does not.
Once we know about nested fields in the shape the DB panels can do the collapse when they ingest panels.
This means you'd be able to query like SELECT "x.y.z" FROM DM_getPanel('My JSON rows'). You'll need to quote a field with dots in it, though, because the column name will literally contain dots, which would conflict with the natural SQL parser unless the column is escaped.
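The collapse step amounts to flattening nested objects into dotted column names. A sketch under the assumption that only plain objects are collapsed (the real ingestion code would also need a policy for arrays, name collisions, etc.):

```typescript
// Collapse nested objects into dotted column names, so {x: {y: {z: 1}}}
// becomes {"x.y.z": 1} and can be queried as SELECT "x.y.z" ...
function flattenRow(
  row: Record<string, unknown>,
  prefix = ''
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    const column = prefix ? `${prefix}.${key}` : key;
    if (value && typeof value === 'object' && !Array.isArray(value)) {
      // Recurse into plain objects; nulls and arrays stay as leaf values.
      Object.assign(out, flattenRow(value as Record<string, unknown>, column));
    } else {
      out[column] = value;
    }
  }
  return out;
}
```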
Required for multiprocessio/dsq#10
Everywhere that a panel references another one in configuration (table, graph, and visual transform panels), it uses the index of the panel. This is fragile and I can't remember any reason for it to be like this. Instead these panels should reference panel.id, a UUID, rather than the panel index. This way the reference stays constant when panels are reordered, added, or removed.
I am trying to access a resource through HTTPS using a certificate signed by my company. This certificate authority (CA) is not known to DataStation. It fails:
Error evaluating panel:
Error: [ERROR] 2021-12-22T09:11:31.713Z request to https://internal.server/resource failed, reason: unable to get local issuer certificate
at ClientRequest.<anonymous> (...\datastation-win32-x64-0.4.0\resources\app.asar\node_modules\node-fetch\src\index.js:95:4)
at ClientRequest.emit (node:events:394:28)
at TLSSocket.socketErrorListener (node:_http_client:447:9)
at TLSSocket.emit (node:events:394:28)
at emitErrorNT (node:internal/streams/destroy:157:8)
at emitErrorCloseNT (node:internal/streams/destroy:122:3)
at processTicksAndRejections (node:internal/process/task_queues:83:21)
at ChildProcess.<anonymous> (...\datastation-win32-x64-0.4.0\resources\app.asar\desktop\panel\eval.ts:120:28)
at ChildProcess.emit (node:events:394:28)
at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)
A solution could be to allow bypassing server certificate verification or, better, to allow adding the CA to the tool.
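Node's TLS stack already supports both requests: an extra `ca` bundle for a private CA, and (as a last resort) disabling verification with `rejectUnauthorized: false`. A sketch of how the runner could build its HTTPS options from hypothetical data-source settings (`caPath` and `allowInsecure` are made up; `ca` and `rejectUnauthorized` are real Node TLS options):

```typescript
import { readFileSync } from 'fs';

// Subset of Node https/tls connection options this sketch sets.
interface TlsOptions {
  ca?: string;
  rejectUnauthorized?: boolean;
}

// caPath points at a PEM bundle for the company CA; allowInsecure disables
// certificate verification entirely (convenient, but unsafe outside testing).
function buildTlsOptions(caPath?: string, allowInsecure = false): TlsOptions {
  const options: TlsOptions = {};
  if (caPath) {
    options.ca = readFileSync(caPath, 'utf8');
  }
  if (allowInsecure) {
    options.rejectUnauthorized = false;
  }
  return options;
}
```

Offering the `ca` route in the UI is the better default, since skipping verification silently removes MITM protection.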
The runner already supports TSV and ODS, but they don't show up in the UI dropdown for the file type selector. Both should be added there: https://github.com/multiprocessio/datastation/blob/main/ui/components/ContentTypePicker.tsx.
Maybe:
This is because of Node's default 4GB heap limit and the 1GB string limit. We need to find some ways around this.
One solution would be to build a fuzzy JSON parser for large data files where individual rows are less than 1GB but there are millions/billions of rows. The fuzzy parser could be aware of only the outer array and pass the internal objects to regular JSON.parse.
Or maybe we could store arrays as newline-delimited JSON.
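The "outer array only" idea can be sketched as a scanner that tracks nesting depth and string/escape state, slices out each top-level element, and hands just that slice to the regular JSON.parse. A sketch under the assumption that elements are objects or arrays (matching the millions-of-rows use case); a real version would read from a stream rather than holding the whole file in one string:

```typescript
// Yield the elements of a top-level JSON array without parsing the whole
// document at once. Braces inside strings are ignored via string/escape
// tracking; only object/array elements are handled, not bare scalars.
function* iterateOuterArray(json: string): Generator<unknown> {
  let depth = 0;
  let inString = false;
  let escaped = false;
  let start = -1;
  for (let i = 0; i < json.length; i++) {
    const c = json[i];
    if (escaped) { escaped = false; continue; }
    if (inString) {
      if (c === '\\') escaped = true;
      else if (c === '"') inString = false;
      continue;
    }
    if (c === '"') { inString = true; continue; }
    if (c === '{' || c === '[') {
      if (depth === 1 && start === -1) start = i; // element begins
      depth++;
    } else if (c === '}' || c === ']') {
      depth--;
      if (depth === 1 && start !== -1) {
        // Element closed: parse just this slice with the normal parser.
        yield JSON.parse(json.slice(start, i + 1));
        start = -1;
      }
    }
  }
}
```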
For a beginner: read ARCHITECTURE.md and HACKING.md, then git grep snowflake or git grep prometheus and copy the basics from one of those existing systems.
For example, running CREATE TABLE or INSERT will succeed but the panel will still show an error.
Array of
  Object with
    'baum_id' of number,
    'anlage' of string,
    'anlage_id' of number,
    'lateinischer_name' of string or null,
    'deutscher_name' of string or null,
    'alter' of number or null
Reported by schmidt_fu on Discord.
There are missing integration tests for:
Right now this will just generate a stack trace. If instead all of these errors are caught, with an error registered here, the UI will show just a message in a warning box rather than a stack trace in an error box.
Hopefully this can be easily integration tested in desktop/panel/program.test.js by running the program panel with PATH="" on the parent so that no programs can be found.
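The core assertion such a test could make is that spawning by bare program name with an emptied PATH fails with ENOENT, which the runner can then surface as a friendly "program not found" warning. A sketch using Node's real spawnSync API (the wrapper function is hypothetical):

```typescript
import { spawnSync } from 'child_process';

// With PATH emptied, spawning by bare name cannot resolve any program,
// so Node reports an ENOENT error instead of running it.
function runWithEmptyPath(program: string) {
  return spawnSync(program, [], { env: { ...process.env, PATH: '' } });
}
```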
Databases
General
Otherwise these fields are basically impossible to reach by SQL calls to DM_getPanel().
Keep the on-disk format private.
Each panel stores its resulting shape. This should be usable in CodeEditor components to fill autocomplete information from previous panel results.
Hi,
Upon attempting to query an ElasticSearch server configured to only support HTTPS (no protocol specified in the Host config value for the data source):
Error:
[INFO] 2022-01-13T19:05:20.118Z DataStation Community Edition Panel Runner 0.5.0 DEBUG
[INFO] 2022-01-13T19:05:20.273Z Connecting to http://xyz.us-east-1.aws.found.io:9243 for elasticsearch query
[INFO] 2022-01-13T19:05:20.275Z Elasticsearch request: {"size":1000,"index":["par_document_promoted_stag"],"q":"","body":{}}
[ERROR] 2022-01-13T19:05:20.431Z Client sent an HTTP request to an HTTPS server.
Upon attempting to query an ElasticSearch server configured to only support HTTPS (https:// specified in the Host config value for the data source):
Error:
[INFO] 2022-01-13T19:08:17.683Z DataStation Community Edition Panel Runner 0.5.0 DEBUG
[INFO] 2022-01-13T19:08:17.837Z Connecting to https://64c5f893d5fc400f9d3354820a0dbe81.us-east-1.aws.found.io:9243 for elasticsearch query
[INFO] 2022-01-13T19:08:17.838Z Elasticsearch request: {"size":1000,"index":["par_document_promoted_stag"],"q":"","body":{}}
[ERROR] 2022-01-13T19:08:17.950Z Client sent an HTTP request to an HTTPS server.
When you create a new project, it keeps the same size as the create project dialog window. It should grow to be the normal default size once the project is created.
The window size is defined here: https://github.com/multiprocessio/datastation/blob/master/desktop/project.ts#L103.
The make project handler is here: https://github.com/multiprocessio/datastation/blob/master/desktop/store.ts#L122.
Simon Willison has some good notes here https://til.simonwillison.net/electron/sign-notarize-electron-macos.
I don't think this is something an external contributor can help with because most of it is me signing up for all the stuff and putting keys in Github Actions.
For a beginner: read ARCHITECTURE.md and HACKING.md, then git grep snowflake or git grep prometheus and copy the basics from one of those existing systems.
Unlike all the existing charts, which graph a string (most likely) against a number, this graphs numbers on both axes. So in addition to the configuration changes needed to pass the right field to Chart.js, the PR for this should also change the "preferred type" of the x axis to "number" when the chart type is scatter plot.
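For reference, a Chart.js scatter dataset takes {x, y} point objects with both coordinates numeric, which is what the panel config would need to emit. A sketch of the mapping (the `buildScatterConfig` helper and `Row` shape are hypothetical; `type: 'scatter'` and point-object data are real Chart.js conventions):

```typescript
// Hypothetical row shape: numeric fields keyed by column name.
interface Row { [field: string]: number; }

// Map two numeric fields of panel rows onto the {x, y} points a
// Chart.js scatter dataset expects.
function buildScatterConfig(rows: Row[], xField: string, yField: string) {
  return {
    type: 'scatter' as const,
    data: {
      datasets: [
        {
          label: `${yField} vs ${xField}`,
          data: rows.map((r) => ({ x: r[xField], y: r[yField] })),
        },
      ],
    },
  };
}
```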
Right now the runner runs through Electron which makes panel evaluation take way longer than it needs to.
I did this in 0.2.0 because I wasn't sure if there was a real Node process bundled with Electron. But it is way too slow this way.
Also, breaking the runner into its own directory/package and bundling with pkg may significantly reduce the time to unzip on Windows since the deps will be bundled into a binary.
Definitely:
Maybe?
Bugs
Right now the inMemoryEval for Python uses Pyodide. It takes 10-20 seconds to load, though, and all Python panels fail while it's loading.
I just heard about https://github.com/plasticityai/coldbrew so it might be worth comparing load times of coldbrew and pyodide.
Importantly, you must still be able to pass JavaScript objects to the Python panel reasonably and send Python objects back to JavaScript reasonably (i.e. with DM_getPanel, DM_setPanel). Pyodide improved on Brython, which didn't allow you to do obj['x'] in Python on DM_getPanel objects; it only allowed obj.x, which is not very Pythonic and so a bad user experience.
Please add OpenOffice format support.
For a beginner: read ARCHITECTURE.md and HACKING.md, then git grep snowflake or git grep prometheus and copy the basics from one of those existing systems.
Right now I'm bundling the server and desktop code together with tons of exceptions. It probably doesn't make sense to do this.
I don't think this is something an external contributor can help with because most of it is me signing up for all the stuff and putting keys in Github Actions.
Right now you have to actually make a query to test whether the connection was set up correctly. It would be more convenient if there were a signal in the connection area showing whether the connection is valid.
https://discord.com/channels/852998104931631115/852998104931631118/892791999097372682
Error evaluating panel:
Error: [ERROR] 2021-12-04T19:23:16.794Z Cannot create a string longer than 0x1fffffe8 characters
at Buffer.toString (node:buffer:783:17)
at evalProgram (C:\Users\philn\multiprocess\datastation\desktop\panel\program.ts:85:34)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at Object.handler (C:\Users\philn\multiprocess\datastation\desktop\panel\eval.ts:181:17)
at main (C:\Users\philn\multiprocess\datastation\desktop\runner.ts:100:24)
at ChildProcess.<anonymous> (C:\Users\philn\multiprocess\datastation\desktop\panel\eval.ts:120:18)
at ChildProcess.emit (node:events:394:28)
at Process.ChildProcess._handle.onexit (node:internal/child_process:290:12)