maxiv-kitscontrols / web-maxiv-hdbppviewer

A web-based viewer for HDB++/Cassandra data. Project hosted on the MAX IV internal GitLab.

License: GNU General Public License v3.0



Introduction

This is a web-based viewer for HDB++ archive data, currently supporting only the Cassandra backend.

It is currently in a "beta" stage, with basic functionality in place but very limited testing. Bug reports are welcome!

Features

Basic functionality

  • Searching for stored attributes
  • Selecting which attributes to add
  • Free scrolling/zooming the time scale in the plot
  • Two separate Y axes (no hard restriction, but needs UI)
  • Y axes autoscale
  • Encodes current view in URL (e.g. for saving as a bookmark)
  • Display min/max etc. on mouseover
  • Linear and logarithmic Y axes
  • Cache database queries in memory

Missing functionality

  • Configure color, Y axis, etc. for each line
  • Periodical updates
  • Display attribute configuration
  • Display errors
  • "Special" datatypes: String, Boolean, State, Spectrum, ...
  • Cassandra authentication (?)
  • General robustness
  • Allow downloading "raw" data
  • Displaying data as a table
  • Manual scaling of Y axes
  • Rescale the UI when the window size changes
  • Handling different keyspaces

Improvements needed

  • Optimize data readout and processing
  • UI is pretty basic
  • Mouseover stuff is a mess
  • Server configuration
  • Not sure about encoding the view state as JSON in the URL hash...

Ideas

  • Use websocket to send data incrementally?
  • Use canvas for plotting
  • The view is currently re-loaded each time anything changes; maybe it's possible to be smarter here?
  • Would it be useful (or just confusing) to allow more than two Y-axes?
  • Other ways of browsing for attributes; e.g. a tree?
  • Mobile optimized view? The plot actually works pretty well on a mobile screen, but the rest is unusable as it is.

Requirements

Note: the repo includes a Dockerfile that can be used to build a Docker container image for easy deployment together with all dependencies. Have a look in the file for instructions; you will probably need to modify some things to suit your needs.

Python (for running)

  • python >= 3.5
  • aiohttp
  • aiohttp_cors
  • aiohttp_utils
  • cassandra-driver >= 3.6 (needs to be built with numpy support!)
  • datashader

Datashader has a bunch of scientific Python dependencies; the easiest way to get it is probably through Anaconda (https://www.continuum.io/downloads).

Javascript (for frontend development)

  • node.js
  • npm

Cassandra

You also, obviously, need to have access to a Cassandra installation somewhere, containing HDB++ formatted data.

Building

The frontend is written using Babel, React and Redux, and managed with webpack. To build it, the following steps should work:

$ npm install
$ webpack

Docker Image build

Before building the Docker image, please version your release by editing the VERSION variable in the Makefile. Then you can build by simply executing:

$ make build

and publish your image:

$ make publish

Configuration

By default, the server will load the config file "hdbppviewer.conf". It contains some example configuration and comments. You can create your own configuration file and put it wherever you like, and point the server to it using the "-c" argument.

Running

$ python server.py

Then point a web browser at http://localhost:5005/index.html

Local installation

To disable the database, comment out the HDBPlusPlusConnection in server.py and put 'hdbpp = None' instead. The project (front-end only) can then be compiled and served locally. Currently we need to run two processes/commands simultaneously: one re-generates bundle.js and one serves it. The commands are:

  1. npm run dev
  2. npm run watch

You can run these commands in separate terminals (until we come up with one command that does both parts).

Querying

It's also possible to access data from the server in JSON or CSV format, e.g. using httpie from the command line. Searching for attributes matching a given string (which may be a regex):

$ http --json POST localhost:5005/search target=mag cs='my.control.system:10000'

Get data for a given period of time for one or more attributes, resampled to 5m intervals, in CSV format:

$ http --json POST localhost:5005/query targets:='[{"target": "r3/mag/dia-01/current", "cs": "my.control.system:10000"}]' range:='{"from": "2017-06-16T15:00:00", "to": "2017-06-16T15:30:00"}' interval=5m Accept:text/csv

This API is intentionally compatible with Grafana, making it easy to create a Grafana plugin for HDB++ data (in progress).
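For scripting, the same /query payload can also be built and sent from Python. A minimal sketch (the attribute name, control system address and server URL are just the illustrative values from the examples above; the actual request is commented out since it requires a running server):

```python
import json

# Build the same /query payload as the httpie example above.
# The target and cs values are the illustrative ones from this README.
payload = {
    "targets": [
        {"target": "r3/mag/dia-01/current", "cs": "my.control.system:10000"}
    ],
    "range": {
        "from": "2017-06-16T15:00:00",
        "to": "2017-06-16T15:30:00",
    },
    "interval": "5m",
}

# To actually send it (requires a running server), something like:
#   import requests
#   r = requests.post("http://localhost:5005/query", json=payload,
#                     headers={"Accept": "text/csv"})
#   print(r.text)

print(json.dumps(payload, indent=2))
```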

web-maxiv-hdbppviewer's People

Contributors

13bscsaamjad, antoinedupre, beenje, hardion, henquist, johanfforsberg, maxlvmt, meguiraun, muhammad-saad-maxiv, saad17com


web-maxiv-hdbppviewer's Issues

Plot configuration

The user should be able to configure some things about the plot, most importantly:

  • Line colors (widths?)
  • Y axis limits
  • It should be possible to hide/show individual lines without having to add/remove them from the plot

Zoom out crash?

If the user zooms out too quickly, way too many requests are made and everything becomes too slow. We should limit the rate of scroll events somehow.

Display "events"

The plot should show important archiving "events", such as when an attribute was started/stopped, added, removed, paused etc. This should help distinguish between errors and manual actions.

Range input values

Initialize the input values with the ranges that come from the current plots.

Markers for curves

Allow different markers (square, dot, ...) for a curve, selectable by the user.

Goal: better differentiate each plot

Display errors

The plot needs to somehow indicate where there were errors recorded.

More y-axes

A request from the users is to be able to let each attribute have its own Y axis, so that they can all be scaled independently. This is very useful for seeing how lots of different things correlate in time. The EPICS "StripTool" https://epics.anl.gov/EpicsDocumentation/ExtensionsManuals/StripTool/StripTool.html#GraphWindow that was used at the old MaxLab has been brought up as an example.


Today the viewer has only two independent Y scales, right and left, and each attribute is displayed on one of them. Technically there should be no problem supporting more Y axes; it's mostly a matter of UI and how to present it. Displaying many axes at the same time doesn't seem like a good idea, since it becomes a mess visually. StripTool apparently displays only one Y axis at a time, depending on which attribute is selected.

I think this would require some changes in the UI but it would be a powerful feature.

Mouseover information

The mouse cursor should reveal more specific information about the closest line, such as the name of the attribute, value (min/max/avg?) at the point, etc.
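A sketch of the kind of nearest-line lookup this would need (illustrative only; the function name and sample data are made up):

```python
def closest_series(series, x, y):
    """Return the name of the series whose sample nearest to x is
    closest in value to y -- the lookup a mouseover handler needs."""
    best_name, best_dist = None, float("inf")
    for name, points in series.items():
        # points: list of (x, value) samples
        px, pv = min(points, key=lambda p: abs(p[0] - x))
        dist = abs(pv - y)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Two made-up series with samples at x = 0, 1, 2.
series = {
    "a": [(0, 1.0), (1, 1.1), (2, 1.2)],
    "b": [(0, 5.0), (1, 5.5), (2, 6.0)],
}
print(closest_series(series, 1, 5.4))  # -> b
```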

Logging

Add some kind of logging mechanism, writing to a file (or to Kibana).

Download image seems broken

When clicking the button I see the following in the console:

Uncaught TypeError: can't access property 0, _this3.plot.svg._groups is undefined onClick webpack:///./js/plotwrapper.js?:139

Color selection improvements

  • Show the currently selected color next to 'pick color'
  • An 'OK' button for closing the color picker panel
  • Be able to change the color of already existing plots; be aware that the user may want to re-color several plots, so keep the changes on hold and apply them via a new button
  • Reset the color selection when loading the attribute list
  • When the user does not select a color, also display what the default color will be

The user can break the server

It's pretty easy for a user to basically break the server by e.g. zooming out a lot. I think the main reason is that there's no limitation on the number of pending queries to the DB, which means that they will queue up and the server will have too much to do for a long time, possibly also running out of memory.

Not sure what's the best way to deal with this. Perhaps simply keeping track of each user session to prevent queuing up new queries until the previous one completed. This is not quite trivial since it's important that the user at least eventually gets the latest data requested. Maybe keeping track of the "current" and the "latest" request per user session, and whenever the "current" is complete, if no "latest" request exists we return the result, otherwise discard it and process the "latest". If a new request comes in before the "current" request is completed, we just replace the "latest" with the new. As far as I can tell, it's not really possible to manually abort a Cassandra query that is already running in the cluster.
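The "current"/"latest" scheme described above could look roughly like this (a minimal asyncio illustration, not the project's actual server code; all names are made up):

```python
import asyncio

class LatestRequestQueue:
    """Per-session coalescing: while one query runs, newer requests
    replace each other, and only the most recent one is executed."""

    def __init__(self):
        self._latest = None   # the most recently submitted request
        self._running = False

    async def submit(self, coro_factory):
        self._latest = coro_factory
        if self._running:
            return None       # the running submit() will pick it up
        self._running = True
        try:
            result = None
            while self._latest is not None:
                factory, self._latest = self._latest, None
                result = await factory()
            return result
        finally:
            self._running = False

async def demo():
    queue = LatestRequestQueue()

    async def query(n):
        await asyncio.sleep(0.01)   # stand-in for a slow Cassandra query
        return n

    # Fire three "requests" in quick succession; the middle one is dropped.
    tasks = [asyncio.create_task(queue.submit(lambda n=n: query(n)))
             for n in range(3)]
    return await asyncio.gather(*tasks)

results = asyncio.run(demo())
print(results)  # -> [2, None, None]
```

Only the first caller actually drives queries; later callers just replace the pending request, matching the "discard it and process the latest" idea.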

Attribute list saving

Like the old archiver, the user should be able to save the currently displayed attributes and recover them later. Everyone sees everyone else's lists.

Handle dates with timezone

The backend currently gives an error if a query specifies a time with timezone.
'from':"2021-02-16T13:55:53Z" does not work, but 'from':"2021-02-16T13:55:53" does.
Specifying a timezone results in an exception from Pandas about comparing dates with and without timezone.
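Since the error comes from Pandas, one possible fix is to normalize incoming timestamps to naive UTC before comparing. A sketch (the function name is made up):

```python
import pandas as pd

def normalize_timestamp(ts_string):
    """Accept naive and timezone-aware ISO timestamps alike, returning
    a naive UTC timestamp so comparisons with naive archive timestamps
    don't raise."""
    ts = pd.Timestamp(ts_string)
    if ts.tzinfo is not None:
        # Convert to UTC, then drop the timezone information.
        ts = ts.tz_convert("UTC").tz_localize(None)
    return ts

print(normalize_timestamp("2021-02-16T13:55:53Z"))
print(normalize_timestamp("2021-02-16T13:55:53"))
```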

Plot drawing method

I'd like to discuss the pros and cons of the current method of drawing, now that the application has been in use for a while. Feel free to add your observations and opinions to this issue. If the drawbacks are too large, we should start thinking about a new solution.

The plots are currently completely drawn on the server, as an image in whatever size the client requests, and sent to the client as a single base64 encoded PNG. To do this we use datashader (http://datashader.org/).

Advantages

  • Datashader is able to plot huge datasets without downsampling. By this I mean that all points are always drawn no matter the total number of points. For example, a single pressure spike lasting less than a second will be visible even if you plot an entire year of otherwise even pressure. Downsampling data is a very tricky subject, so I think this is a huge practical advantage and it's the main reason I went with this method.
  • A very important factor is how many points we can expect to plot. As a start, let's consider an attribute that is stored once per second. This means ~2.5 million points in a month. I think this is not a very unusual case, e.g. when looking at a long-term trend. So we should at least aim to handle tens of millions of points routinely. Datashader is advertised as handling datasets of hundreds of millions of points or more, and so far I think it has handled things very well. It should even be possible to distribute datashader in a computation cluster (using "dask") if performance is not good enough.

Disadvantages

  • Drawing everything on the server means that changing simple properties of the plot (e.g. line color) requires redrawing the entire image. This is probably solvable without changing the whole architecture (see #17) but it will complicate things a bit.
  • We can't use any of the existing third party JS plotting libraries since they depend on getting point data (we do use D3, but only for axis drawing and such).
  • The plots don't look great; no antialiasing on lines, and no line styles. This can be a real problem for people with color blindness.
  • There's no straightforward way to cache data in the client, since images need to be redrawn when the Y axis changes.

Other notes

  • The images may be fairly large, but not huge; according to some quick measurements, realistic data at HD resolution will typically transfer ~50-100 kb per attribute plotted, with compression. Not sure it would be that much more efficient to send raw data points though. Also, the size of the images is essentially independent of the size of the raw data, so plotting a million points can take the same bandwidth as a thousand. I.e. the bandwidth usage is limited.

Alternative solutions

I haven't checked the options during the last year so I may be missing some important ones, but to me there are two main options:

  • Bokeh https://bokeh.pydata.org/en/latest/ is a library that basically solves the same problem of having large datasets on the server and plotting them in the browser. I started the first prototype of the HDB++ viewer using Bokeh, but I ran into some problems with updating data that made me switch; I don't remember exactly why. However, Bokeh has developed a lot since then and is definitely worth looking into again. If it worked, it could simplify both server and client code.
  • Implementing our own way of downsampling/compressing data, sending it to the client and drawing it with a third party javascript plotting library. Probably tricky, but not impossible.

Dropdown list with attr categories

Provide a predefined list of patterns such as 'VAC', 'CRY', 'FLOW', 'TEMP', ... so that the attribute list is populated only with the relevant parameters. This is for ease of access; the same can be done via a pattern search.

Clear plot

Add a button that clears everything, instead of going one by one and clicking remove.

Downloaded image extras

Currently the downloaded image does not contain any legend/labels, so it is impossible to know which curve is which.

TODO: add a legend on the plot
TODO: rename the downloaded file; provide a default name (something + date), but let the user change it
TODO: check axis background colour

Display write values

The plot should also contain write values for R/W attributes.

This should be pretty easy, since the data and the plotting mechanism are already there. However, it's not obvious visually how to do this without it becoming confusing. Maybe the line for the write values should be shown in the same color but "dotted", or with 0.5 alpha or something, to make it clear which is which.

Axis decimals

Fix the decimals; it can happen that several ticks show the same number (due to rounding). Be smart about this.
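One simple way to be "smart" here (an illustrative sketch, not existing code): pick the smallest number of decimals at which all tick labels become distinct:

```python
def tick_decimals(ticks):
    """Smallest number of decimals that makes all tick labels distinct."""
    for nd in range(10):
        labels = [f"{t:.{nd}f}" for t in ticks]
        if len(set(labels)) == len(labels):
            return nd
    return 10

print(tick_decimals([0.001, 0.002, 0.003]))  # -> 3
print(tick_decimals([1, 2, 3]))              # -> 0
```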

Display attribute configuration

The database contains the attribute settings, such as label, unit, etc. This information should be made available in the user interface somehow.

Auto Update the view

Add a selectable option to auto-update the view in real time (same as in the old archiver).

Split plot image into one per attribute

This is an internal change that would make it easier to solve some practical problems (related issues #17, #28, #29).

The idea is to stop rendering all attributes into one image on the server, and instead make one separate image per attribute. This way, the client can choose how to draw each attribute (by some clever canvas operations), and e.g. changing the color of one won't require fetching data from the server. It would also be possible to do other tricks like drawing the line several times, to suggest increased line thickness.

Note: this may seem like a waste of bandwidth, but due to the way PNG compression works, it makes a surprisingly small difference whether it's one image with N colors or N images with one color each. At least some initial measurements suggest that the difference is not significant.

Display server status

Notify the user about the status of the server: 'BUSY', 'PROCESSING_REQUEST', 'DOWN'...
