nytlabs / st-core
a data flow graphical programming language for data science
License: Apache License 2.0
there needs to be an example that shows how to control flow with gate and latch
assets used for display should either be embedded in the binary or placed in a path somewhere. st-core should not rely on relative paths for assets.
an example that demonstrates blocking, what it is, and how to work with it. this should probably include several different circumstances for examples.
how pressure works in a chain of connected blocks.
There should be the option, even if it's not the default, to hide a group's internal connections. These are the ones that have been given a value, or are connected to other blocks in the same group. The reason is that the group, at the moment, is only doing half a job of simplifying the pattern. For example: the `==publish`, the `peek and shift`, and the `kvIncrement` groups aren't particularly complex, but look like these big fat ICs lying around in the pattern. While it's certainly better than them all being exposed, the `==publish` filter could just have two inputs and one out, the `peek and shift` could be one out and two value connections, and the `kvIncrement` could be two ins and two value connections.
an example that shows the ingest of some data, splits the data up and operates on it through different means, and then joins it at the end.
An inbound route should:
A path allows the user to specify which bit of a message to use in the block. Paths use go-fetch by @nikhan, which is a specific subset of jq/gojee that is only for access and not for manipulation at all. A path will default to `.`, which indicates the whole message.
Each route has a path which can be set by the UI. A path is always a string like `.foo[3].bar`, though the parsed `fetch.Query` is actually stored in the `Input`. Type checking of an input occurs once the query has been executed, by way of type assertions inside the block.
The user should be able to set a value for any route. The value will be treated as satisfying that route, allowing the block to proceed when all other routes are satisfied. If a block has only one route, and that route's value is set, then the block will process that value continually until something downstream blocks it - effectively turning that block into a pusher block. Similarly if a block has multiple routes that are all set to values the block will operate on all those values together constantly.
In the background, the set values will appear to the block as messages that are always ready, by instantiating a small pusher goroutine against each route. The goroutine will run until a connection is made or until the value is changed. If the value changes a new goroutine will appear in its place.
The UI will allow the user to set the value as a JSON string; the API is responsible for parsing the string and sending it to the Input's `SetValue` method.
Internally to the block, values will be treated exactly the same as messages that came from another block. They still appear as `Message` objects on an inbound `Connection`. This keeps everything nice and simple.
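The pusher-goroutine mechanism described above can be sketched with Go channels. This is a minimal illustration under assumptions, not st-core's actual implementation: the `Message` type, the channel standing in for a `Connection`, and the `pusher` name are all stand-ins invented here.

```go
package main

import "fmt"

// Message is a placeholder for st-core's message type (assumption).
type Message interface{}

// pusher continually offers a fixed value on out until quit is closed,
// mimicking how a set route value appears as an always-ready message.
// When the value changes, the caller closes quit and starts a new pusher.
func pusher(value Message, out chan Message, quit chan struct{}) {
	for {
		select {
		case out <- value:
		case <-quit:
			return
		}
	}
}

func main() {
	out := make(chan Message)
	quit := make(chan struct{})
	go pusher(42, out, quit)

	// The block reads the route like any other inbound connection.
	for i := 0; i < 3; i++ {
		fmt.Println(<-out) // prints 42 each time
	}
	close(quit) // value changed or connection made: stop this pusher
}
```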
blocks: operate on data
stores: store data
streams: endpoints that events are delivered from or to
pollables: state is requested from elsewhere and turned into messages
This is to suggest how priority queues - an important language construct for dealing with streaming data - should appear in the language.
A priority queue is a store, and will live in memory in `core`.
Highest priority is the lowest number in the queue.
It will have associated blocks:
Sample patterns to come below.
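A minimal sketch of such an in-memory store, assuming Go's standard `container/heap` and taking the lowest number as the highest priority, as stated above. The `item`, `pqueue`, `insert`, and `next` names are invented here for illustration.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item pairs a message with its priority; the lowest number is the
// highest priority, per the proposal above.
type item struct {
	priority int
	msg      interface{}
}

type pqueue []item

func (q pqueue) Len() int            { return len(q) }
func (q pqueue) Less(i, j int) bool  { return q[i].priority < q[j].priority }
func (q pqueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *pqueue) Push(x interface{}) { *q = append(*q, x.(item)) }
func (q *pqueue) Pop() interface{} {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// newPQ, insert, and next wrap container/heap for the associated blocks.
func newPQ() *pqueue                          { q := &pqueue{}; heap.Init(q); return q }
func (q *pqueue) insert(p int, m interface{}) { heap.Push(q, item{p, m}) }
func (q *pqueue) next() item                  { return heap.Pop(q).(item) }

func main() {
	q := newPQ()
	q.insert(5, "low")
	q.insert(1, "urgent")
	q.insert(3, "medium")
	fmt.Println(q.next().msg) // "urgent": lowest number comes out first
}
```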
a single writer can possibly be sent to multiple blocks at the same time. we need to double check that this is OK.
If errors are on a different output then
fix regression in #115
Each block has a field "Description", containing a short description of the block's function.
there is a memory race somewhere in the server with the websocket that sends state updates. it is causing failures with chrome, bytes out of order, frame problems, etc.
spending a lot of time trying to figure out what the hell it was I was doing. Need a way of leaving myself hints.
Groups should be unnamed until the user names them.
Sources should have their "route" be the name of the source, removing the need to name them, too.
need a palette of blocks, if for no other reason than I have completely forgotten which blocks we've made. I guess it should be generaliseable so we can paste new blocks into it, and they get stored somewhere for re-use.
/ : GET streamtools UI
/library : GET all possible blocks
/version : GET current version
/status : GET current status
/import : POST add collection of blocks
/export : GET receive
/blocks : GET lists all blocks; POST adds new block; DELETE clears all blocks
/blocks/{id} : GET gets block; PUT updates block; DELETE deletes block
/blocks/{id}/{route} : POST send message to route
/ws/{id} : output stream for block {id}
/log : ui/log websocket
/connections : GET lists all connections; POST creates connection
/connections/{id} : GET describes connection; DELETE deletes connection
'nuff said
A rather intense document that should cover how the following works, and how the following can work together:
There will be a core library with an absolute minimal number of blocks:
Additional libraries will exist in other repositories. Additional libraries can be imported by POSTing repo url to /library.
fix regression #115
A block that, when triggered, emits the current time as ms epoch
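A sketch of what that block's kernel might look like in Go; the `nowMillis` name is hypothetical, and this is just the core function, not a full block.

```go
package main

import (
	"fmt"
	"time"
)

// nowMillis is the kernel of the proposed block: when triggered,
// it emits the current time as milliseconds since the Unix epoch.
func nowMillis() int64 {
	return time.Now().UnixNano() / int64(time.Millisecond)
}

func main() {
	fmt.Println(nowMillis())
}
```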
the canvas rebuild needs to be finished
streamtools needs a new splash page at nytlabs.github.com/streamtools that reflects the new hotness
fix regression #115
currently, groups do not save translations and do not have relative coordinates when grouping/ungrouping.
the latter then the former
need a block to round a number to the nearest integer
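Such a block's kernel could be a thin wrapper over the standard library; a sketch, with the `round` name invented here (note `math.Round` rounds halves away from zero).

```go
package main

import (
	"fmt"
	"math"
)

// round is the kernel of the proposed block: round a number to the
// nearest integer (halves round away from zero, per math.Round).
func round(x float64) float64 {
	return math.Round(x)
}

func main() {
	fmt.Println(round(2.4), round(2.5), round(-2.5)) // 2 3 -3
}
```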
This issue is to describe those blocks we currently feel should be included in st-core. Each comment should describe a block or a group of related blocks.
Let's try and get some nice vocab together under one issue so we can resolve any strangenesses, unify as many concepts as we can, and get st-core's concepts into a really tight subset.
autocomplete box is dysfunctionally ugly. needs minimal styling.
there needs to be an example that details how information is put in and taken out of st-core
The following document intends to illustrate what Streamtools might look like if it was actually a language.
Every time we want to do some kind of operation that is not currently afforded by Streamtools we need to write Go, recompile, and rerelease. This is especially annoying for situations where we are limited by the grammar of ST. The lack of modularity also impacts people who may wish to curate their own work environment with blocks that are specific to their work.
There are several problems with how ST currently uses flow for control.
The inconsistent block API is a product of inconsistent use of flow for control as well as inconsistent access to state. For instance, the `cache`, `set`, and `histogram` blocks should be very closely coupled. They are also lacking, in different ways, many operations we expect from a key-value data structure.
Diagram 1
The above diagram represents a system where two messages enter the `+` block to produce a sum. This layout is impossible in the current instantiation of ST without some kind of workaround, since it is impossible for the 2 input streams to ever arrive at the same time. An example workaround would be to associate the message containing '1' with the message containing '2', with a delay to wait for the message that has yet to arrive. Another workaround would be something like our `join` block, where we assume order is correct.
This is undesirable, as it introduces all kinds of issues of synchronicity in the stream or unnecessary metadata embedded in each message. For instance, we could associate the wrong messages with one another or we could require a whole new system to deal with metadata in each message to resolve how the messages should be paired. Either way, this is added complexity that does not help in the development/understanding of a grammar consistent across blocks. This leads us to our first proposal:
This means that blocks like `timeseries`, which seem to allow asynchronous access (the pushing to the timeseries and the polling of a timeseries), are no longer valid. The funny thing is that this rule actually isn't changing how blocks currently work internally: currently each block does one operation at a time; all this idea does is express the synchronous aspect already internal to the block to the input routes.
What the above means is that all input routes block until all input routes have received a message. Once all routes have been satisfied the operation of the block is triggered.
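This rule can be sketched with Go channels standing in for input routes. A minimal illustration of the proposal, not st-core code: reading from both channels before computing is exactly "all routes must be satisfied before the operation fires".

```go
package main

import "fmt"

// add models the rule above: the block's operation fires only once every
// input route has received a message.
func add(left, right <-chan float64, out chan<- float64) {
	for {
		a, ok := <-left // blocks until the left route is satisfied
		if !ok {
			close(out)
			return
		}
		b := <-right // blocks until the right route is satisfied
		out <- a + b // only now is the operation triggered
	}
}

func main() {
	l, r, out := make(chan float64), make(chan float64), make(chan float64)
	go add(l, r, out)
	go func() { l <- 1; r <- 2 }()
	fmt.Println(<-out) // 3
}
```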
The above suggestion is a bit at odds with our current instantiation of the idea of "rules", which are used to give parameters to a block. Many of our blocks have "rules" that persist throughout the duration of the block's life and never need to be changed. Which leads us to...
Diagram 2
Since our core blocks depend on having all inputs satisfied once per block operation, this presents a problem when we have situations where we'd like a constant value to be applied as a parameter for a given block. We don't want to send one message per message in the above addition example if we want to add a constant to a stream.
The ability to configure how a route accepts and treats its value fixes this problem. In diagram 2, two modes of configuration are presented. The dialog on top is for a situation where a block is meant to accept a stream and retrieve a specific value from within an input object. In the case of the diagram, the route is configured to accept the number for the `+` block from {foo:{bar:{baz:}}}. This is a "Path" route.
The `preserve value` checkbox is meant to cover the case where a block needs to be modified out-of-step with the other input stream. This means that if you have the left input stream running at 50hz, and you modify the parameter for the right stream at 1hz, the value parameter is preserved for all messages until the right stream is updated again.
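A sketch of that preserve-value behaviour, assuming Go channels for the two routes; the `addWithPreservedParam` name is invented here. The parameter is held and reused for every message until a new one arrives, so the two streams may run at different rates.

```go
package main

import "fmt"

// addWithPreservedParam holds the right-hand parameter across messages:
// each left-hand message uses the most recent parameter value.
func addWithPreservedParam(msgs <-chan float64, params <-chan float64, out chan<- float64) {
	param := <-params // wait for an initial parameter
	for m := range msgs {
		select {
		case param = <-params: // out-of-step update, if one is pending
		default: // otherwise keep the preserved value
		}
		out <- m + param
	}
	close(out)
}

func main() {
	msgs := make(chan float64, 3)
	params := make(chan float64, 1)
	out := make(chan float64, 3)
	params <- 10
	msgs <- 1
	msgs <- 2
	msgs <- 3
	close(msgs)
	go addWithPreservedParam(msgs, params, out)
	for v := range out {
		fmt.Println(v) // 11, 12, 13
	}
}
```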
The bottom dialog in Diagram 2 presents an option to input JSON directly into the right stream input, so that it is always constant across operations. This is a "Constant" route. This provides us with 3 types of patterns:
Diagram 3
The middle pattern in Diagram 3 presents a problem: how can we guarantee two streams appear at that block in a way that is useful to us?
Diagram 4
Diagram 4 introduces some new blocks:
`source`: let's consider this some magical place where messages come from.

`unpack`: given the key-path configured in the input route, emit 1 message per array element found at that key-path location on the left output stream. Before starting a stream of elements, it sends the number of elements that it will be unpacking out its right output.

`pack`: pack accepts the length of elements expected to be packed into an array on the right input. Once a value is accepted, the right input blocks until the left input has satisfied the correct length. Upon reaching the length, the array is emitted, and the right input becomes unblocked.

`set`: set accepts a JSON object on the right input. This JSON object serves as a template into which the values (left input) are inserted. If the right input contains the value {foo: ""} (the value can be anything, it will be overwritten), every message emitted will be in the format {foo: value}.

`merge`: like `+`, merge requires an input on both left input and right input to commit an operation. `merge` combines two maps, privileging the right input.

This pattern takes advantage of blocking in order to maintain consistency across events. When a message is broadcast from `source`, it is immediately sent to `merge`, which blocks `source` from sending any more messages into the system. Next, a message is sent to `unpack`, which immediately sends a message to `pack`. `pack` then blocks until the array length is satisfied, blocking `unpack` from sending any more messages downstream. Each message is added to 1 and sent to `pack`. Once `pack` emits the array, `unpack` is unblocked, and the message is `set` into an object. This object is sent to `merge`, which unblocks `source`.
NOTE: The right input for `pack` may seem at odds with proposals 1 & 2. Instead of having a 1:1 relationship with its fellow left input, it has a 1:N relationship. Considering this block as a portal between many messages and a single message, I feel less icky about the situation.
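The pack behaviour described above can be sketched in Go; an illustrative model only, with the `pack` function signature invented here. The right input supplies the expected length, then the left input is drained until that many elements have arrived.

```go
package main

import "fmt"

// pack models the pack block: for each length received on the right
// input, collect that many elements from the left input, then emit the
// array and become unblocked for the next length.
func pack(length <-chan int, elements <-chan interface{}, out chan<- []interface{}) {
	for n := range length {
		arr := make([]interface{}, 0, n)
		for len(arr) < n {
			arr = append(arr, <-elements) // blocks until satisfied
		}
		out <- arr
	}
	close(out)
}

func main() {
	lengths := make(chan int, 1)
	elems := make(chan interface{}, 3)
	out := make(chan []interface{}, 1)
	lengths <- 3
	close(lengths)
	elems <- 1
	elems <- 2
	elems <- 3
	go pack(lengths, elems, out)
	fmt.Println(<-out) // [1 2 3]
}
```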
Diagram 5
One of the deficiencies in our current implementation is that complex operations with state are rather difficult. This is because we can only access the state via one block -- a block that has one output. Determining what the output means in regards to the state can be rather tricky -- and has resulted in the inclusion of metadata in our outputs.
One way to amend the current situation is to afford multiple outputs that signal various circumstances. Similar to the `unpack` block in Diagram 4, various output routes can be added to afford the control of flow. For example, if a lookup query returns no result when burying a set, then instead of returning null (which is valid JSON that should be able to be included in a set) or nothing (which is unhelpful), we should send a signal out of a separate output channel, signaling that no message was found.
In addition to multiple outputs, another way to fix the problem of having to worry about how many different state queries work together is to divorce the state from the block. This amounts to having a completely new element in ST -- a data store.
The data store does not participate in message flow. It is a node that is associated with various blocks that act as an API to its contents. Diagram 5 illustrates what some of these blocks may look like. The "association" is illustrated as a route that exists on the side -- this is for illustration purposes only. There is no directionality when associating a block with a store.
This affords us the ability to run multiple control flows with a single data store and affords the construction of complex patterns.
Diagram 6
The above diagram illustrates a timeseries. Values come in through `source`; `source` broadcasts one message to `set`, which puts the value into an object, like {"value":22}. `source` also sends a message to `now`, which produces a time in epoch ms. This epoch ms is also set into an object, like {"time": 140092818110000}. The two are merged, and then pushed to an array. `push` returns the new length of the array, and if the new length of the array is greater than our sample size (60), then pop an element off the array. If at any time we would like to view the timeseries, we send a message to `dump`, which does not interfere with our push/pop flow at all.
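The push/pop flow above can be condensed into a small Go sketch; a model of the pattern, not of st-core's actual store, with the `timeseries` type invented here.

```go
package main

import "fmt"

// timeseries models the pattern above: push appends a sample and, when
// the length exceeds the sample size, pops the oldest one.
type timeseries struct {
	samples []map[string]interface{}
	size    int
}

// push returns the new length, after trimming to the sample size.
func (t *timeseries) push(s map[string]interface{}) int {
	t.samples = append(t.samples, s)
	if len(t.samples) > t.size {
		t.samples = t.samples[1:] // pop the oldest element
	}
	return len(t.samples)
}

// dump returns the current contents without disturbing the push/pop flow.
func (t *timeseries) dump() []map[string]interface{} {
	out := make([]map[string]interface{}, len(t.samples))
	copy(out, t.samples)
	return out
}

func main() {
	ts := &timeseries{size: 60}
	for i := 0; i < 100; i++ {
		ts.push(map[string]interface{}{"time": i, "value": i * i})
	}
	fmt.Println(len(ts.dump())) // 60: only the most recent samples remain
}
```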
Diagram 7
Building blocks is still not something I've entirely figured out. The above diagram displays how a `map` would work. It's a lot more work, but the data is much more clearly expressed. One thing you'll see is that `merge` has 3 inputs. I am of the mind that `merge` should be a magic block that can allow infinite inputs. I imagine this UI-wise as just dragging the block wider.
Here are some scattered notes on how I think custom blocks should work:
Errata:
No more DSLs, for anything. All key paths are represented as JSON -- no go-fetch. I realized this at the very end here, but there is no reason to represent a key-path as a string. Just use a JSON object. Just imagine Diagram 2 has {foo:{bar:{baz:""}}} instead of .foo.bar.baz.
A route's:
should be queryable.
when importing a pattern, the entire pattern should remain in a "stopped" state until the pattern has finished its import.
when panning inside a group, a block is created at a position not relative to the translated space. it should appear wherever the autocomplete box was invoked, taking into account the group translation.
For the life of me I can't think how to draw the cache block (or set or histogram or count etc) as a marble diagram (as in https://gist.github.com/staltz/868e7e9bc2a7b8c1f754). But, I reckon I could draw some diagrams that would describe a mechanism that implements the flow modification of a cache (if not the subsequent feedback):
I'm wondering then if it makes sense to draw the diagram first, then write the block.
A kernel is the core function of a block. It is a function that processes a set of inputs to produce a set of outputs. It is escapable and alerts on error.
A typical block contains a single kernel, however, a user may be able to compose a set of kernels into a single block.
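One way to sketch this composition in Go, under the assumption that a kernel is simply a function from an input set to an output set that can fail; the `kernel` type and `compose` helper are invented here.

```go
package main

import "fmt"

// kernel: a function that processes a set of inputs to produce a set of
// outputs. It is escapable and alerts on error, per the definition above.
type kernel func(in map[string]interface{}) (map[string]interface{}, error)

// compose chains kernels so the outputs of one feed the inputs of the
// next, forming a single block out of several kernels.
func compose(ks ...kernel) kernel {
	return func(in map[string]interface{}) (map[string]interface{}, error) {
		var err error
		for _, k := range ks {
			in, err = k(in)
			if err != nil {
				return nil, err // escape: alert on error
			}
		}
		return in, nil
	}
}

// two toy kernels for demonstration
var double kernel = func(in map[string]interface{}) (map[string]interface{}, error) {
	return map[string]interface{}{"x": in["x"].(float64) * 2}, nil
}
var addOne kernel = func(in map[string]interface{}) (map[string]interface{}, error) {
	return map[string]interface{}{"x": in["x"].(float64) + 1}, nil
}

func main() {
	block := compose(double, addOne) // a block composed of two kernels
	out, _ := block(map[string]interface{}{"x": 3.0})
	fmt.Println(out["x"]) // 7
}
```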
A block composed of multiple kernels:
This is different than what is proposed in the language spec (#8). That document proposes "blocks" that can be composed of multiple blocks -- however it does not say anything about composing a block's function. Because of the need to compose groups of blocks together, many of which may contain cycles, I propose that those are called something different -- like "groups"
A group:
We need to be able to release binaries for all platforms
a state is a marshalable variable that lives inside an instantiated block.
a block maintains multiple states.
each inbound route has an associated callback function.
each callback function can only see one input function.
each callback function can see all the states and all the outbound routes.
each state has an auto-generated accessor
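The notes above can be sketched as a Go struct; a loose model under assumptions, with the `block` type, `newCounter` constructor, and `State` accessor all invented for illustration.

```go
package main

import "fmt"

// block models the notes above: marshalable state lives inside the
// instantiated block, each inbound route has a callback, and callbacks
// can see all the states and all the outbound routes.
type block struct {
	states    map[string]interface{}           // marshalable state
	callbacks map[string]func(msg interface{}) // one callback per inbound route
	outbound  map[string]chan interface{}      // outbound routes
}

// State plays the role of an auto-generated accessor for a named state.
func (b *block) State(name string) interface{} { return b.states[name] }

// newCounter builds a block whose single "in" route callback mutates
// state and forwards the message downstream.
func newCounter() *block {
	b := &block{
		states:    map[string]interface{}{"count": 0},
		callbacks: map[string]func(interface{}){},
		outbound:  map[string]chan interface{}{"out": make(chan interface{}, 8)},
	}
	b.callbacks["in"] = func(msg interface{}) {
		b.states["count"] = b.states["count"].(int) + 1 // touch state
		b.outbound["out"] <- msg                        // touch an outbound route
	}
	return b
}

func main() {
	b := newCounter()
	b.callbacks["in"]("hello")
	fmt.Println(b.State("count"), <-b.outbound["out"]) // 1 hello
}
```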
fix regression from #115
because that would be lovely!
Code is here:
https://github.com/mikedewar/st-core/blob/master/thinking/api/main.go
The following test code:

```
curl localhost:7071/block -d'{"name":"A"}'
curl localhost:7071/block -d'{"name":"B"}'
curl localhost:7071/block -d'{"name":"C"}'
printf "\n"
printf "\n"
curl localhost:7071
printf "\n"
curl localhost:7071/group -d'{"ParentID":0, "ChildIDs":[1,2]}'
printf "\n"
printf "\n"
curl localhost:7071
```

returns

```
OKOKOK
0 -
1
2
3
OK
0 -
4-
1
2
3
```
No more query routes and rules.
Instead, every block has a number of inputs, a number of outputs, and a number of states.
Inputs
The only parameter an input has is path. This allows a user to specify where in the incoming JSON object to look. This path parameter is a simplified version of gojee, such that it always returns a single value, or it returns an error if a value is not found. It removes the capability of doing a query across array elements, such that `.foo[].bar` is no longer valid (`.foo[0].bar` is still valid, however).
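The simplified single-value behaviour could look roughly like the following sketch, which walks a decoded JSON value by map keys and array indices. Illustrative only; the `fetchPath` function is invented here and the real gojee/go-fetch APIs differ.

```go
package main

import "fmt"

// fetchPath walks a decoded JSON value by map keys and array indices,
// returning a single value or an error if nothing is found -- the
// behaviour proposed for the simplified path parameter.
func fetchPath(v interface{}, keys ...interface{}) (interface{}, error) {
	for _, k := range keys {
		switch kk := k.(type) {
		case string:
			m, ok := v.(map[string]interface{})
			if !ok {
				return nil, fmt.Errorf("not an object at %v", kk)
			}
			if v, ok = m[kk]; !ok {
				return nil, fmt.Errorf("no key %q", kk)
			}
		case int:
			a, ok := v.([]interface{})
			if !ok || kk < 0 || kk >= len(a) {
				return nil, fmt.Errorf("bad index %d", kk)
			}
			v = a[kk]
		}
	}
	return v, nil
}

func main() {
	// equivalent of the path .foo[0].bar
	doc := map[string]interface{}{
		"foo": []interface{}{map[string]interface{}{"bar": 42.0}},
	}
	v, err := fetchPath(doc, "foo", 0, "bar")
	fmt.Println(v, err) // 42 <nil>
}
```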
Instead of rules, there are a number of inputs. For instance: a count block would have a `msg` input and a `window` input. The `window` input does not accept rules; it accepts a stream of parameters that are used to affect the state of the block. These parameters would look like a stream of strings, e.g. `30s`.
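A parameter string like `30s` maps directly onto Go's standard duration parsing; a sketch of how the `window` input might consume it (the `setWindow` name is hypothetical).

```go
package main

import (
	"fmt"
	"time"
)

// setWindow sketches how a count block's window input might turn a
// parameter string like "30s" into state, using time.ParseDuration.
func setWindow(param string) (time.Duration, error) {
	return time.ParseDuration(param)
}

func main() {
	w, err := setWindow("30s")
	fmt.Println(w, err) // 30s <nil>
}
```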
(depends on #3)
State
Each block can have a number of states. These states can optionally have a pollable input pin/output pin emitter as well as an HTTP endpoint. The benefit of having a state over our current implementation is that it combines the production of a message for an HTTP endpoint with production for the rest of the ST system, meaning a block author does not need to write multiple bits of code that do the same thing.
Outputs
Instead of having a single output, a block can have multiple outputs. These should be able to be named.
fix regression #115
if you double click on a group, it should show you what's inside
since the reboot, blocks don't have names. It would be ace to be able to see which blocks are which.