slac-lcls / ami
License: Other
Enabling/Disabling nodes in the gui should add/remove them from the graph since we no longer push the entire graph and only push updates.
Linregress should handle floats and arrays. Right now it can't, and it crashes when you try to connect a PickN into it.
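One way to make a fit tolerant of both scalars and arrays is to coerce the inputs before fitting. The sketch below is not AMI's Linregress node; `linregress_any` is a hypothetical helper, and it uses `numpy.polyfit` as a stand-in for whatever fit the node performs.

```python
import numpy as np


def linregress_any(x, y):
    """Least-squares line fit that accepts floats, lists, or arrays.

    Hypothetical helper for illustration; returns (slope, intercept).
    """
    # atleast_1d turns a bare float into a length-1 array; ravel flattens N-d input.
    x = np.atleast_1d(np.asarray(x, dtype=float)).ravel()
    y = np.atleast_1d(np.asarray(y, dtype=float)).ravel()
    if x.shape != y.shape:
        raise ValueError(f"x and y must have the same length, got {x.size} and {y.size}")
    # A line is undetermined with fewer than two points or with identical x values,
    # so return NaNs instead of raising.
    if x.size < 2 or x.max() == x.min():
        return float("nan"), float("nan")
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)
```

Returning NaN for a single sample keeps the graph running until enough points have accumulated; whether that is the right behaviour for AMI is a design choice, not something the issue specifies.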
AMI crashes when detector list changes in psana.
Support blob finder in AMI. Make example with 2D histogram of x,y positions using tmoopal in exp=tmoc00118,run=13.
Could we support detector calibration parameters like common-mode in the "red box"?
Would be nice to access ami plots in a browser and/or Jupyter.
We could see this in the Prometheus plots: after we replaced a complex graph with a simple one (with a single hsd red-box display), the global collector continued to use the same significant CPU time as with the complex graph. Restarting AMI from scratch returned the global collector CPU usage to near zero.
In a Python editor box, it would be nice if the user could send output back to the GUI with a function call (e.g. for debugging).
When an external library is loaded in AMI, if the library is modified, there is no way to reload it.
If a user clicks "Manage Libraries" and then selects the library file again, the file is not really reloaded, and the old content of the file is shown in the "Manage Libraries" window.
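A possible approach to reloading is to re-read and re-execute the file every time it is selected, rather than relying on Python's import cache. The sketch below assumes nothing about how AMI loads libraries; `load_library` and the module name `user_library` are hypothetical.

```python
import types
from pathlib import Path


def load_library(path, name="user_library"):
    """Execute the file at `path` into a brand-new module object.

    Hypothetical helper: reading the source directly bypasses both
    sys.modules and any cached bytecode, so edits on disk are always seen.
    """
    source = Path(path).read_text()
    module = types.ModuleType(name)
    module.__file__ = str(path)
    exec(compile(source, str(path), "exec"), module.__dict__)
    return module
```

Calling `load_library` again after the file changes returns a new module with the updated definitions. Anything still holding a reference to the old module keeps the old functions, so the caller would need to swap in the new one.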
We should investigate using graphtik instead of networkfox: https://github.com/pygraphkit/graphtik
It has a much more sophisticated graph execution planner.
How to handle slow collector code that introduces plot latency.
When a box has an exception it is turned red, but the line is quite thin so the users miss it. Also, it is not clear which messages in the Status window are significant errors. Would be good if both of these could be made more visible to the users.
From Xiang: when hovering over a 1D or a 2D plot with the mouse it would be nice to be able to see the coordinates (using the axis scale).
When running from shared memory Average1d of an HSD blows up. This is not reproducible when running offline.
If one does a time history plot of a variable from multiple workers, the data is not put into proper time order.
Worked with @valmar and @xianglgithub, who had inadvertently included these two library paths in his .fc file:
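If each worker's own stream is already in time order (an assumption, not something the issue confirms), the streams can be merged into one time-ordered sequence without a full sort. A minimal sketch with a hypothetical `merge_time_ordered` helper operating on `(timestamp, value)` pairs:

```python
import heapq


def merge_time_ordered(*worker_streams):
    """Merge per-worker lists of (timestamp, value) pairs into one sorted list.

    Assumes each input list is already sorted by timestamp.
    """
    return list(heapq.merge(*worker_streams, key=lambda item: item[0]))
```

For example, `merge_time_ordered([(1, "a"), (4, "d")], [(2, "b"), (3, "c")])` interleaves the two workers' points by timestamp. If a worker can deliver events out of order, its list would need sorting first.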
"library": {
    "paths": [
        "/cds/home/x/xiangli/ami/ami_tmo/TMO.py",
        "/cds/group/pcds/dist/pds/tmo/scripts/TMO/TMO.py"
    ]
},
We think that if these two files contain identical function definitions, an error is thrown but the user never sees it: the only symptom is that the plots don't show up.
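A check for clashing names could run before the libraries are loaded, so the conflict can be reported to the user. The sketch below only illustrates the idea; `find_duplicate_functions` is a hypothetical helper and it looks at top-level `def` statements only.

```python
import ast
from collections import defaultdict


def top_level_functions(source):
    """Return the names of functions defined at module level in `source`."""
    return {
        node.name
        for node in ast.parse(source).body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def find_duplicate_functions(sources):
    """Map each function name defined in more than one library to its owners.

    `sources` maps a label (for example a file path) to that file's text.
    """
    owners = defaultdict(list)
    for label, text in sources.items():
        for name in sorted(top_level_functions(text)):
            owners[name].append(label)
    return {name: labels for name, labels in owners.items() if len(labels) > 1}
```

A non-empty result could then be shown in the Status window or a pop-up instead of failing silently.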
I believe two ami-local processes on one node fight for a zmq port. See below.
psanagpu106:$ ps -ef | grep ami-local$
cpo 1631 1552 0 18:45 pts/20 00:00:00 grep --color=auto ami-local
roibabar 30426 21928 0 17:05 pts/42 00:00:04 /cds/sw/ds/ana/conda2/inst/envs/ps-4.3.0/bin/python3.7 /cds/home/opr/rixopr/git/lcls2_032421/install/bin/ami-local -l xqfs.fc
psanagpu106:
psanagpu106:~$ cd git/lcls2
psanagpu106:lcls2$ source setup_env.sh
(ps-4.3.0) psanagpu106:lcls2$ ami-local -l rix_width.fc -b 1 -f interval=.3 psana://exp=rixx43518,run=45
No module named 'ndarray'
No module named 'pyfftw'
[ 2021-03-30 18:45:41,969 | ami.worker | INFO ] Starting worker # 0, sending to collector at tcp://127.0.0.1:5557 PID: 2651
[ 2021-03-30 18:45:41,973 | ami.collector | INFO ] Starting collector on node # 0 PID: 2652
Process nodecol-n0:
[ 2021-03-30 18:45:41,981 | ami.collector | INFO ] Starting collector on node # 0 PID: 2663
[ 2021-03-30 18:45:41,982 | ami.comm | INFO ] worker000: Started Prometheus client on port: 9205
Process globalcol:
[ 2021-03-30 18:45:41,992 | ami.manager | INFO ] Starting manager, controlling 1 workers on 1 nodes PID: 2669
Process manager:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/cds/sw/ds/ana/conda2/inst/envs/ps-4.3.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/cds/sw/ds/ana/conda2/inst/envs/ps-4.3.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/cds/sw/ds/ana/conda2/inst/envs/ps-4.3.0/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/cds/home/c/cpo/git/ami/ami/multiproc.py", line 33, in run
self._target(*self._args, **self._kwargs)
File "/cds/home/c/cpo/git/ami/ami/multiproc.py", line 33, in run
self._target(*self._args, **self._kwargs)
File "/cds/home/c/cpo/git/ami/ami/multiproc.py", line 33, in run
self._target(*self._args, **self._kwargs)
File "/cds/home/c/cpo/git/ami/ami/local.py", line 172, in _sys_exit
sys.exit(func(*args, **kwargs))
File "/cds/home/c/cpo/git/ami/ami/local.py", line 172, in _sys_exit
sys.exit(func(*args, **kwargs))
File "/cds/home/c/cpo/git/ami/ami/local.py", line 172, in _sys_exit
sys.exit(func(*args, **kwargs))
File "/cds/home/c/cpo/git/ami/ami/collector.py", line 186, in run_global_collector
hutch)
File "/cds/home/c/cpo/git/ami/ami/manager.py", line 666, in run_manager
hutch) as manager:
File "/cds/home/c/cpo/git/ami/ami/collector.py", line 171, in run_node_collector
hutch)
File "/cds/home/c/cpo/git/ami/ami/collector.py", line 154, in run_collector
prometheus_dir, hutch) as collector:
File "/cds/home/c/cpo/git/ami/ami/manager.py", line 49, in __init__
super().__init__(results_addr, hutch=hutch)
File "/cds/home/c/cpo/git/ami/ami/collector.py", line 154, in run_collector
prometheus_dir, hutch) as collector:
File "/cds/home/c/cpo/git/ami/ami/collector.py", line 23, in __init__
Collector.__init__(self, collector_addr, ctx=self.ctx, hutch=hutch)
File "/cds/home/c/cpo/git/ami/ami/comm.py", line 897, in __init__
self.collector.bind(addr)
File "/cds/home/c/cpo/git/ami/ami/collector.py", line 23, in __init__
Collector.__init__(self, collector_addr, ctx=self.ctx, hutch=hutch)
File "/cds/home/c/cpo/git/ami/ami/comm.py", line 897, in __init__
self.collector.bind(addr)
File "/cds/sw/ds/ana/conda2/inst/envs/ps-4.3.0/lib/python3.7/site-packages/zmq/sugar/socket.py", line 172, in bind
super().bind(addr)
File "/cds/home/c/cpo/git/ami/ami/comm.py", line 897, in __init__
self.collector.bind(addr)
File "/cds/sw/ds/ana/conda2/inst/envs/ps-4.3.0/lib/python3.7/site-packages/zmq/sugar/socket.py", line 172, in bind
super().bind(addr)
File "zmq/backend/cython/socket.pyx", line 540, in zmq.backend.cython.socket.Socket.bind
File "/cds/sw/ds/ana/conda2/inst/envs/ps-4.3.0/lib/python3.7/site-packages/zmq/sugar/socket.py", line 172, in bind
super().bind(addr)
File "zmq/backend/cython/socket.pyx", line 540, in zmq.backend.cython.socket.Socket.bind
File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
File "zmq/backend/cython/socket.pyx", line 540, in zmq.backend.cython.socket.Socket.bind
File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use
zmq.error.ZMQError: Address already in use
zmq.error.ZMQError: Address already in use
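The "Address already in use" errors above are what a bind on an occupied TCP port produces. As a rough illustration of how a launcher could avoid fixed ports, the standard-library sketch below checks a port and asks the OS for a free one. Both helper names are hypothetical, and this is not how ami-local chooses its ports.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

Both helpers release the port before returning, so another process can still grab it in the meantime. Binding the real socket to port 0 and reading back the assigned port avoids that race.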
When changing the number of points in a timeplot, the plot does not always pick up the new setting. I believe Dan D. has also seen this.
Reuse this blob-finder code: https://github.com/slac-lcls/lcls2/blob/master/psana/psana/peakFinder/find_blobs.py
Add an image viewer to the data sources in the GUI
Add size of transfers and latency to grafana
Ming-Fu says these should be in log files /reg/g/pcds/pds/tmo/logfiles/2020/10/manager between 3am and 6am on Oct. 18, although I grep'd for "except" and only found these two (at earlier times):
(ps-3.1.16) tmo-daq:scripts> grep -i except /reg/g/pcds/pds/tmo/logfiles/2020/10/manager
/reg/g/pcds/pds/tmo/logfiles/2020/10/17_14:34:14_drp-neh-cmp011:ami-manager.log:During handling of the above exception, another exception occurred:
/reg/g/pcds/pds/tmo/logfiles/2020/10/17_14:34:14_drp-neh-cmp011:ami-manager.log:During handling of the above exception, another exception occurred:
/reg/g/pcds/pds/tmo/logfiles/2020/10/17_23:48:20_drp-neh-cmp011:ami-manager.log:During handling of the above exception, another exception occurred:
/reg/g/pcds/pds/tmo/logfiles/2020/10/17_23:48:20_drp-neh-cmp011:ami-manager.log:During handling of the above exception, another exception occurred:
We need to be able to load saved graphs and execute them for regression tests.
When values change drastically on a scalarplot you can see when different workers process events. This may be a time order problem in psana.
It looks like events/second is calculated incorrectly in some circumstances. Saw 7217 just now and other large numbers? This was with 3 meb nodes and 4 cores per node.
The events/sec label is missing when running with procmgr.
Would be nice to dump a snapshot of the AMI plot data to a file (e.g. .npz) for later analysis.
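As an illustration of what such a snapshot could look like, the sketch below writes a dictionary of named arrays to a compressed `.npz` file and reads it back. The helpers `save_snapshot` and `load_snapshot` are hypothetical; only `numpy.savez_compressed` and `numpy.load` are real library calls.

```python
import numpy as np


def save_snapshot(path, plots):
    """Write a dict of {plot name: array-like} to a compressed .npz file.

    Plot names become keyword arguments, so they must be strings and
    must not be "file" (the name of savez_compressed's first parameter).
    """
    arrays = {name: np.asarray(data) for name, data in plots.items()}
    np.savez_compressed(path, **arrays)


def load_snapshot(path):
    """Read a snapshot back into a dict of {plot name: ndarray}."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}
```

Note that `numpy` appends `.npz` to the file name if it is missing, so passing a path that already ends in `.npz` keeps the save and load paths identical.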
It's possible for one client to crash AMI when it pushes graph changes.
Show a pop up when failing to apply graph changes.
Plots should reset on new run with psana
Make terminals on boxes larger.
Prefer text in gui instead of icons.
WindowAverage uses Pick-N to average arrays over a time window. This box would average over all arrays since the beginning of the run (so local worker nodes could keep their own sums and event-counts).
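The idea of workers keeping their own sums and event counts can be sketched as a small accumulator whose partial results are combined at the end. The `RunningSum` class below is hypothetical and uses plain Python lists; it is not an AMI node.

```python
class RunningSum:
    """Element-wise sum and event count for equal-length arrays."""

    def __init__(self):
        self.total = None  # element-wise sum so far; None until the first event
        self.count = 0     # number of events accumulated

    def add(self, values):
        """Accumulate one event's array on a worker."""
        values = [float(v) for v in values]
        if self.total is None:
            self.total = values
        else:
            if len(values) != len(self.total):
                raise ValueError("array length changed between events")
            self.total = [t + v for t, v in zip(self.total, values)]
        self.count += 1

    def merge(self, other):
        """Combine two partial results, as a collector might do across workers."""
        merged = RunningSum()
        for part in (self, other):
            if part.total is None:
                continue
            if merged.total is None:
                merged.total = list(part.total)
            else:
                merged.total = [a + b for a, b in zip(merged.total, part.total)]
            merged.count += part.count
        return merged

    def mean(self):
        """Average over every event seen so far."""
        if self.count == 0:
            raise ValueError("no events accumulated")
        return [t / self.count for t in self.total]
```

Because only a sum and a count travel between processes, the data sent to the collector stays the same size no matter how many events each worker has seen.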
Add an apply button on configuration windows in gui.
AMI is still crashing when it sees missing data. Look at psana://exp=tmox43218,run=21 and the attached graph as an example.
There are performance issues with asyncqt and pyzmq. We should investigate moving away from asyncqt: zeromq/pyzmq#1406
It's possible to place boxes on top of each other in the GUI. This should not be allowed.
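Rejecting an overlapping placement comes down to an axis-aligned rectangle intersection test. The sketch below is independent of the GUI toolkit; `Box`, `boxes_overlap`, and `can_place` are hypothetical names, with positions taken as the top-left corner in scene coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float
    y: float
    width: float
    height: float


def boxes_overlap(a, b):
    """True if the two rectangles share any interior area (touching edges do not count)."""
    return (
        a.x < b.x + b.width
        and b.x < a.x + a.width
        and a.y < b.y + b.height
        and b.y < a.y + a.height
    )


def can_place(new_box, existing):
    """True if `new_box` overlaps none of the boxes already in the graph."""
    return not any(boxes_overlap(new_box, box) for box in existing)
```

When `can_place` returns False, the GUI could refuse the drop or move the box to the nearest free position.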
Print a message in the status window when saving plots.
We should have a status window that shows how many graphs AMI has, which processes have connected, and other useful information.
Would be useful to have subgraphs of boxes.
Use prometheus for monitoring.
Right now it's not clear to users that Sum (for example) is a local operation but Binning triggers the collectors. Maybe we put "global reduction" operations in a separate category in the operations menu?
It would be nice if the manager saved purged graphs so you can get them with ami-console and debug.