bluesky / databroker Goto Github PK

Unified API pulling data from multiple sources

Home Page: https://blueskyproject.io/databroker

License: BSD 3-Clause "New" or "Revised" License

Python 99.81% Shell 0.13% Dockerfile 0.06%

databroker's Introduction

Bluesky — An Experiment Specification & Orchestration Engine

Source	https://github.com/bluesky/bluesky
PyPI	`pip install bluesky`
Documentation	https://bluesky.github.io/bluesky
Releases	https://github.com/bluesky/bluesky/releases

Bluesky is a library for experiment control and collection of scientific data and metadata. It emphasizes the following virtues:

Live, Streaming Data: Available for inline visualization and processing.
Rich Metadata: Captured and organized to facilitate reproducibility and searchability.
Experiment Generality: Seamlessly reuse a procedure on completely different hardware.
Interruption Recovery: Experiments are "rewindable," recovering cleanly from interruptions.
Automated Suspend/Resume: Experiments can be run unattended, automatically suspending and resuming if needed.
Pluggable I/O: Export data (live) into any desired format or database.
Customizability: Integrate custom experimental procedures and commands, and get the I/O and interruption features for free.
Integration with Scientific Python: Interface naturally with numpy and Python scientific stack.

Bluesky Documentation.

The Bluesky Project enables experimental science at the lab-bench or facility scale. It is a collection of Python libraries that are co-developed but independently useful and may be adopted a la carte.

Bluesky Project Documentation.

See https://bluesky.github.io/bluesky for more detailed documentation.

databroker's People

Contributors

Stargazers

Watchers

Forkers

danielballan tacaswell chiahaoliu cj-wright mrakitin licode stuwilkins stuartcampbell ronpandolfi cadair jackrblack adolfoeliazat jorgediazjr ke-zhang-rd gwbischof australiansynchrotron zyzelda hz-b malitsky russberg klauer dylanmcreynolds mikehart85 cryos jklynch untzag ericdill claracoo abbygi jmaruland dmgav maffettone mihir008 junaishima sunn23 clintonroy kadykov subindevs btroidl hyperrealist kezzsim padraic-shafer

databroker's Issues

Control flow in broker.py needs a rethink

There is a code path where selected_fields is an unbound local. attn @hhslepicka

db.call return a result proxy?

It's becoming pretty common for queries to return large results sets that take a long time to return. We might want to change db.__call__ to return a result proxy that needs to be fetched like db(**query).fetch(). In early days we were reluctant to do that in the original pass because we were getting pushback on the verbosity of the API, but ultimately I think that was the wrong call. Returning a result proxy would make things fast and also allow us to incorporate things like limit which are hard to fit into our elided query API.

I think we can leave db.__getitem__ as it is because those query results sets scale with the size of the input argument. Since that's the API that people use the most, this might not even break very much user code. Of course it will break some.

Headers don't pretty print

It seems that ipython auto pretty print broke in the move to Header objects.
Also when print(hdr) is called I get back 3 printed copies.

In [108]: print(hdr)

header
======
  
  EventDescriptor
  ---------------
  configuration   :
        pe1_number_of_se: 1                                       
        pe1_images_per_s: 600.0                                   
          source          : SIM:pe1_number_of_sets                  
          shape           : []                                      
          dtype           : number                                  
          source          : SIM:pe1_images_per_set                  
          shape           : []                                      
          dtype           : number                                  
        pe1_number_of_se: 1475709371.6344597                      
        pe1_images_per_s: 1475737775.2205236                      
  +------------------+--------+------------+------------------+-------------+-----------+-----------------+-------------------------------------------+-------+------------------+
  | data keys        | dtype  |  external  | lower_ctrl_limit | object_name | precision |      shape      |                   source                  | units | upper_ctrl_limit |
  +------------------+--------+------------+------------------+-------------+-----------+-----------------+-------------------------------------------+-------+------------------+
  | pe1_image        | array  | FILESTORE: |                  |     pe1     |           | [2048, 2048, 0] |         PV:XF:28IDC-ES:1{Det:PE1}         |       |                  |
  | pe1_stats1_total | number |            |       0.0        |     pe1     |     0     |        []       | PV:XF:28IDC-ES:1{Det:PE1}Stats1:Total_RBV |       |       0.0        |
  +------------------+--------+------------+------------------+-------------+-----------+-----------------+-------------------------------------------+-------+------------------+
  name            : primary                                 
  object_keys     :
    pe1             : ['pe1_image', 'pe1_stats1_total']       
  time            : 1475737908.980231                       
  uid             : cde61981-7ae0-4761-9f4b-f1e10a39b1b4    
      
      RunStart
      ````````
      beamline_id     : xpd                                     
      bkgd_geometry   : kapton_bkgd_1mmOD                       
      bt_experimenters: ['Maxwell', 'Terban', 'Bertolotti', 'Federica', 'Chia-Hao', 'Liu']
      bt_piLast       : Billinge                                
      bt_safN         : 300874                                  
      bt_uid          : bbbf141c                                
      bt_wavelength   : 0.183463                                
      calibration_coll: 14301160-06eb-4807-b346-58ddaf8c80f1    
      calibration_md  :
        time            : 20161005-1911                           
        rot2            : -0.0005143679573110852                  
        tilt            : 0.5107862954405932                      
        directDist      : 211.11527183961394                      
        pixelY          : 200.0                                   
        rot1            : 0.008900051768602826                    
        splineFile      : None                                    
        tiltPlanRotation: -176.69229377493303                     
        pixel1          : 0.0002                                  
        poni1           : 0.20874577192878496                     
        centerX         : 996.7282173197707                       
        rot3            : 1.3330806893924516e-08                  
        dist            : 0.21110688265045421                     
        wavelength      : 1.83463e-11                             
        pixelX          : 200.0                                   
        pixel2          : 0.0002                                  
        poni2           : 0.20122455525862507                     
        detector        : Perkin detector                         
        centerY         : 1043.1859050122002                      
        file_name       : pyFAI_calib_Ni_20161005-1911            
      cif_name        : ['nan']                                 
      collaborators   : ['nan']                                 
      database_id     :
        nan             : N/A                                     
      detectors       : ['pe1']                                 
      group           : XPD                                     
      lead_experimente: ['Federica', 'Bertolotti']              
      my_field_name   : nan                                     
      notes           : Solvent                                 
      num_steps       : 36                                      
      owner           : xf28id1                                 
      plan_args       :
        num             : 36                                      
        detectors       : ["PerkinElmerContinuous(prefix='XF:28IDC-ES:1{Det:PE1}', name='pe1', read_attrs=['tiff', 'stats1'], configuration_attrs=['images_per_set', 'number_of_sets'])"]
      plan_name       : count                                   
      plan_type       : generator                               
      sa_uid          : 7f5a142f                                
      sample_composition:
        H               : 8.0                                     
        C               : 7.0                                     
      sample_maker    : ['nan']                                 
      sample_name     : toluene                                 
      sample_phase    :
        C7H8            : 1.0                                     
      sc_dk_field_uid : 09c6ef6c-c91c-45a5-9093-48a1432e674f    
      scan_id         : 93                                      
      sp_computed_expo: 60.0                                    
      sp_num_frames   : 600.0                                   
      sp_plan_name    : tseries                                 
      sp_requested_exp: 60                                      
      sp_time_per_fram: 0.1                                     
      sp_type         : tseries                                 
      sp_uid          : 673f5786-8a34-4876-906b-961e306de9fa    
      tags            : ['blank']                               
      time            : 1475737844.1481683                      
      uid             : dff9ca9d-906d-40d6-911f-6fc2d8d47677    
  
  RunStart
  --------
  beamline_id     : xpd                                     
  bkgd_geometry   : kapton_bkgd_1mmOD                       
  bt_experimenters: ['Maxwell', 'Terban', 'Bertolotti', 'Federica', 'Chia-Hao', 'Liu']
  bt_piLast       : Billinge                                
  bt_safN         : 300874                                  
  bt_uid          : bbbf141c                                
  bt_wavelength   : 0.183463                                
  calibration_coll: 14301160-06eb-4807-b346-58ddaf8c80f1    
  calibration_md  :
    time            : 20161005-1911                           
    rot2            : -0.0005143679573110852                  
    tilt            : 0.5107862954405932                      
    directDist      : 211.11527183961394                      
    pixelY          : 200.0                                   
    rot1            : 0.008900051768602826                    
    splineFile      : None                                    
    tiltPlanRotation: -176.69229377493303                     
    pixel1          : 0.0002                                  
    poni1           : 0.20874577192878496                     
    centerX         : 996.7282173197707                       
    rot3            : 1.3330806893924516e-08                  
    dist            : 0.21110688265045421                     
    wavelength      : 1.83463e-11                             
    pixelX          : 200.0                                   
    pixel2          : 0.0002                                  
    poni2           : 0.20122455525862507                     
    detector        : Perkin detector                         
    centerY         : 1043.1859050122002                      
    file_name       : pyFAI_calib_Ni_20161005-1911            
  cif_name        : ['nan']                                 
  collaborators   : ['nan']                                 
  database_id     :
    nan             : N/A                                     
  detectors       : ['pe1']                                 
  group           : XPD                                     
  lead_experimente: ['Federica', 'Bertolotti']              
  my_field_name   : nan                                     
  notes           : Solvent                                 
  num_steps       : 36                                      
  owner           : xf28id1                                 
  plan_args       :
    num             : 36                                      
    detectors       : ["PerkinElmerContinuous(prefix='XF:28IDC-ES:1{Det:PE1}', name='pe1', read_attrs=['tiff', 'stats1'], configuration_attrs=['images_per_set', 'number_of_sets'])"]
  plan_name       : count                                   
  plan_type       : generator                               
  sa_uid          : 7f5a142f                                
  sample_composition:
    H               : 8.0                                     
    C               : 7.0                                     
  sample_maker    : ['nan']                                 
  sample_name     : toluene                                 
  sample_phase    :
    C7H8            : 1.0                                     
  sc_dk_field_uid : 09c6ef6c-c91c-45a5-9093-48a1432e674f    
  scan_id         : 93                                      
  sp_computed_expo: 60.0                                    
  sp_num_frames   : 600.0                                   
  sp_plan_name    : tseries                                 
  sp_requested_exp: 60                                      
  sp_time_per_fram: 0.1                                     
  sp_type         : tseries                                 
  sp_uid          : 673f5786-8a34-4876-906b-961e306de9fa    
  tags            : ['blank']                               
  time            : 1475737844.1481683                      
  uid             : dff9ca9d-906d-40d6-911f-6fc2d8d47677    
  
  RunStop
  -------
  exit_status     : success                                 
  time            : 1475761955.610756                       
  uid             : 1291a1a1-3334-444b-b8e7-ef718713c93f    
      
      RunStart
      ````````
      beamline_id     : xpd                                     
      bkgd_geometry   : kapton_bkgd_1mmOD                       
      bt_experimenters: ['Maxwell', 'Terban', 'Bertolotti', 'Federica', 'Chia-Hao', 'Liu']
      bt_piLast       : Billinge                                
      bt_safN         : 300874                                  
      bt_uid          : bbbf141c                                
      bt_wavelength   : 0.183463                                
      calibration_coll: 14301160-06eb-4807-b346-58ddaf8c80f1    
      calibration_md  :
        time            : 20161005-1911                           
        rot2            : -0.0005143679573110852                  
        tilt            : 0.5107862954405932                      
        directDist      : 211.11527183961394                      
        pixelY          : 200.0                                   
        rot1            : 0.008900051768602826                    
        splineFile      : None                                    
        tiltPlanRotation: -176.69229377493303                     
        pixel1          : 0.0002                                  
        poni1           : 0.20874577192878496                     
        centerX         : 996.7282173197707                       
        rot3            : 1.3330806893924516e-08                  
        dist            : 0.21110688265045421                     
        wavelength      : 1.83463e-11                             
        pixelX          : 200.0                                   
        pixel2          : 0.0002                                  
        poni2           : 0.20122455525862507                     
        detector        : Perkin detector                         
        centerY         : 1043.1859050122002                      
        file_name       : pyFAI_calib_Ni_20161005-1911            
      cif_name        : ['nan']                                 
      collaborators   : ['nan']                                 
      database_id     :
        nan             : N/A                                     
      detectors       : ['pe1']                                 
      group           : XPD                                     
      lead_experimente: ['Federica', 'Bertolotti']              
      my_field_name   : nan                                     
      notes           : Solvent                                 
      num_steps       : 36                                      
      owner           : xf28id1                                 
      plan_args       :
        num             : 36                                      
        detectors       : ["PerkinElmerContinuous(prefix='XF:28IDC-ES:1{Det:PE1}', name='pe1', read_attrs=['tiff', 'stats1'], configuration_attrs=['images_per_set', 'number_of_sets'])"]
      plan_name       : count                                   
      plan_type       : generator                               
      sa_uid          : 7f5a142f                                
      sample_composition:
        H               : 8.0                                     
        C               : 7.0                                     
      sample_maker    : ['nan']                                 
      sample_name     : toluene                                 
      sample_phase    :
        C7H8            : 1.0                                     
      sc_dk_field_uid : 09c6ef6c-c91c-45a5-9093-48a1432e674f    
      scan_id         : 93                                      
      sp_computed_expo: 60.0                                    
      sp_num_frames   : 600.0                                   
      sp_plan_name    : tseries                                 
      sp_requested_exp: 60                                      
      sp_time_per_fram: 0.1                                     
      sp_type         : tseries                                 
      sp_uid          : 673f5786-8a34-4876-906b-961e306de9fa    
      tags            : ['blank']                               
      time            : 1475737844.1481683                      
      uid             : dff9ca9d-906d-40d6-911f-6fc2d8d47677

Set up Archiver Appliance in Travis so we can test Archiver integration

We keep accidentally breaking it. Can you tackle this, @arkilic?

ENH: override filter

Would it be possible to override existing elements of the filter without rewriting from scratch?

Question: Can I always expect a `filled` key in events?

I'd like to know if I can build code on filled.

Time-based searches are apparently still broken

Queries like db(start_time='2017') and db(start_time='2017-01-01') returned zero results at CHX yesterday.

ENH: add amostra plugin

It would be good to also view amostra data which is referenced in the run header. Ideally this would include a recursive external lookup so that filestore data in amostra records can be auto-loaded.

Reinstate Archiver Appliance integration on top of EventStore interface

requested by @ambarb

Background/Calibration handeling

Two issues which might need to be in MDS but I am not certain.

How to associate calibration data with actual data
How to associate background data with actual data

Both these are issues because they might be performed at any point during your total beamtime but need to be associated with all your data. Also the calibration/background might be different for different samples/runs. Finally, calibration data is most useful in the form of (at least for XPD) a dictionary of distances and detector angles, it would be very tedious (without an auto calibrator) to need to calibrate every time you touch your data. Thus it would be great to store this processed data with the calibration data somehow.

Thoughts?

ANN: Splitting up dataportal

As discussed offline with some of you, it's become clear between teaching at the beamlines and overhauling the top-level documentation that "dataportal" as an umbrella over databroker and datamuxer is confusing and adds no value.

From a developer point of view, it was convenient for us to lump them together while all three pieces (broker, muxer, replay) were evolving fast,. But now muxer and broker are really quite separate, and replay is retired. Broker and muxer have no dependencies on each other, and they provide very different functionality that is only sometimes needed together.

The dataportal package will live on as a skeleton of shims, so that all old user code will continue to work. (This is what IPython did when they split up the main package.)

ENH: track databroker or (MDS/FS) upon header return

It might be extremely useful to include some sort of database identifying uid(s) (or connection information?) in the returned headers, that way we could access the database that the header came from only knowing the header. An example of this is getting all the events from an otherwise unknown databroker given the header.

Moved from NSLS-II/metadatastore#260

ENH: broker data from analysisstore

Would it be possible to extract data from analysisstore via the databroker?

Revisit decision to use toolz/cytoolz

Starting with @klauer's comment here: NSLS-II/dataportal#41 (comment)

get_table of run with no Events fails with TypeError

TypeError: data type not understood

In [11]: metadatastore.__version__
Out[11]: '0.3.2'

In [12]: databroker.__version__
Out[12]: '0.3.0+45.gd66915f'



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-c4afefaab4e4> in <module>()
----> 1 for rs in run_starts: print(rs.uid, databroker.get_table(db[rs.uid], fill=False))

/home/klauer/ramdisk/mc/envs/ophyd0/lib/python3.5/site-packages/databroker/databroker.py in get_table(headers, fields, fill, convert_times)
    420             dfs.append(df)
    421     if dfs:
--> 422         return pd.concat(dfs)
    423     else:
    424         # edge case: no data

/home/klauer/ramdisk/mc/envs/ophyd0/lib/python3.5/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    811                        verify_integrity=verify_integrity,
    812                        copy=copy)
--> 813     return op.get_result()
    814
    815

/home/klauer/ramdisk/mc/envs/ophyd0/lib/python3.5/site-packages/pandas/tools/merge.py in get_result(self)
    993
    994             new_data = concatenate_block_managers(
--> 995                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)
    996             if not self.copy:
    997                 new_data._consolidate_inplace()

/home/klauer/ramdisk/mc/envs/ophyd0/lib/python3.5/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4454                                                 copy=copy),
   4455                          placement=placement)
-> 4456               for placement, join_units in concat_plan]
   4457
   4458     return BlockManager(blocks, axes)

/home/klauer/ramdisk/mc/envs/ophyd0/lib/python3.5/site-packages/pandas/core/internals.py in <listcomp>(.0)
   4454                                                 copy=copy),
   4455                          placement=placement)
-> 4456               for placement, join_units in concat_plan]
   4457
   4458     return BlockManager(blocks, axes)

/home/klauer/ramdisk/mc/envs/ophyd0/lib/python3.5/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
   4551     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4552                                          upcasted_na=upcasted_na)
-> 4553                  for ju in join_units]
   4554
   4555     if len(to_concat) == 1:

/home/klauer/ramdisk/mc/envs/ophyd0/lib/python3.5/site-packages/pandas/core/internals.py in <listcomp>(.0)
   4551     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4552                                          upcasted_na=upcasted_na)
-> 4553                  for ju in join_units]
   4554
   4555     if len(to_concat) == 1:

/home/klauer/ramdisk/mc/envs/ophyd0/lib/python3.5/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
   4799
   4800             if self.is_null and not getattr(self.block,'is_categorical',None):
-> 4801                 missing_arr = np.empty(self.shape, dtype=empty_dtype)
   4802                 if np.prod(self.shape):
   4803                     # NumPy 1.6 workaround: this statement gets strange if all

TypeError: data type not understood

Access convention between `db[]` and `db()`

db[-1] gives latest entry.
However, db(param="foo")[-1] gives earliest entry.

Will this continue to be the convention in databroker? Just wanted to verify.

Issue with filtering by time

Hi,

I'm trying out filters, available v 0.6 onwards.
I'm running databroker 0.7.0+11.g71804f3 (latest fetched master branch as of now).

The time filters don't seem to work as expected. For instance, in this code:

import datetime
def timestring(unixtime):
    timestr = datetime.datetime.fromtimestamp(
        int(unixtime)).strftime('%Y-%m-%d %H:%M:%S')
    return timestr

#### Clear filters just in case
chxdb.clear_filters()
#### this search works
headers = chxdb(comment="H3 R360nm #1")
header1 = headers[0]
print("header 1 time : {}".format(header1['start']['time']))
print("header 1 time : {}".format(timestring(header1['start']['time'])))
#### add filters
chxdb.add_filter(start_time="2017-02-01")
chxdb.add_filter(stop_time = "2017-03-00")
#### This search returns nothing, even though time is within filter (see output)
headers = chxdb(comment="H3 R360nm #1")

header2 = headers[0]
print("header 2 time : {}".format(header2['start']['time']))
print("header 2 time : {}".format(timestring(header['start']['time'])))

The output is

header 1 time : 1486929765.6251316
header 1 time : 2017-02-12 15:02:45
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/.../somefile.py in <module>()
     21 chxdb.add_filter(stop_time = "2017-03-00")
     22 headers = chxdb(comment="H3 R360nm #1")
---> 23 header2 = headers[0]
     24 print("header 2 time : {}".format(header2['start']['time']))
     25 print("header 2 time : {}".format(timestring(header['start']['time'])))

Basically, after adding running add_filter, data is no longer found found. Other filters seem to work (chxdb.add_filter(comment="foo"), works). Seems to have something to do with time. thanks!

databroker returns error on searchs earlier than 2015-10-24

As I noticed, some changes seem to happen just now. An hour ago, databorker worked normally as I can trace back to 2015-09, but now it returns error message after I go earlier than 2015-10-24

screenshot is attached here

ENH: filter with slicing

Would it be possible to slice the databroker with filters on?
eg

db.add_filter(bt_piLast='Billinge')
a = db[-1]
# a is the latest header from the Billinge beamtime

Recurse documents to find configuration data

I have a WIP branch, but I found this neat snippet that might be relevant: https://github.com/aaronchall/usefulpython/blob/master/assemblehierarchydict.py

Discussion: should we have coverage on tests?

My argument is yes we should. Testing is very important and doubly so that all the test code gets written. Consider the following test:

def test_iter():
    a  = db.restream(hdr)
    for b in a:
        assert something

This test will not fail if a is empty, we will simply miss the assert. One way to deal with this is to throw a known fail into the code, run it, check that it failed, then remove it because it is messy. Another way to deal with this would be to run coverage on the test code. Coverage would show that the for loop line was ran but none of the internals were executed thus hinting to us that the internal asserts are not being tested.

One potential solution is to run coverage twice, one without the test code (for our line tallys) and one with the tests covered to test the health of the test codebase.

ENH: Switchable case sensitivity in filters/searches

Would it be possible to change the case sensitivity on a per key basis in either searches or filters?

filter get_events

Would it be possible to add a kwarg to filter the output of get_events?
I know that it can be done via list comprehension but that kinda defeats the purpose of having get_events as a generator.

ENH: fill filled when filling events

It seems that the filled event key does not get switched when pulling from the databroker with restream(hdr, fill=True). Would it be possible to fix this?

Summary of db and mds changes considered

An issue for tracking the API-breaking changes being considered in databroker and related libs

To improve performance:

Break db.__call__ API to return a generator or some generator-like lazy iterable

To make documents from RE and db match exactly:

Do not dereference by default. That is, currently header.descriptors[0].start == header.start but we would change to header.descriptors[0].start == header.start.uid. Likewise event.descriptor would become a uid.
Do not fill in place by default (so that callbacks don't interfere with one another)

To avoid configuration-file spaghetti:

Break from databroker import db and replace it with something like import nsls2utils; nsls2utils.get_broker('2-ID')

[Question]: what is the best way to get an event given an event uid?

What is the best way to get an event given an event uid?
Methods exist to do the same for starts, stops, and descriptors but not events.

Proposed modification to `databroker.restream`

The restream docstring says that it "can be used as a drop-in replacement for the output of the bluesky Run Engine." Now that is not strictly true. restream will emit the RunStart, followed by the Descriptors, followed by the Events (sorted by time because get_events is sorted by time because the cursor is sorted by time. Is that the right lingo @arkilic ?) followed by the RunStop. However, if there is more than one Descriptor, it will not be emitted in the same order that the RunEngine emitted it because the RunEngine emitted it right before it was about to save an Event that was different from the previous events.

Question: Do we care? Also relevant: Do we care right now?

It would not be difficult to match the output of the RunEngine; just sort all documents by document.time (or document.timstamp, I don't remember off the top of my head) and then just emit them in that order. Probably makes sense to fix this once someone has a use case where it is relevant that the order is something more like:

RunStart
Descriptor0
Event0-0
Event0-1
Event0-1
Descriptor1
Event1-0
Event0-2
Event1-1
...
RunStop

instead of (currently implemented)

RunStart
Descriptor0
Descriptor1
Event0-0
Event0-1
Event0-1
Event1-0
Event0-2
Event1-1
...
RunStop

thoughts?

Can no longer plot data while it is being collected

We used to be able to do this but now get an error about a stop document missing. This makes sense as the scan isn't complete, but this defeats the reason for SWMR for csx-tools, @stuwilkins knows more.

What is the reason for the change?

python interactive environment problem

Inside an iterative calculation loop, the current result is displayed using plt.imshow(), plt.show(), plt.draw() commands. It work in regular ipython environment. When switch to 'analysis' environment in order to extract data through databroker, the live display function fails to refresh the image during iteration. It pops a blank window, and wait until the calculation finishes to display it.

Do you know what causes this problem? Thank you.

ENH: selectivly clear filter

Is it possible to remove only parts of a filter?

Time-based search may be broken

Could #96 have broken time-based search? The relevant test is potentially sensitive to the time of day that it is run. I notice that the time-based search happened to fail in #88 because it searched for scans between 2016-11-15 and 2016-11-16 and found none because it was still 2016-11-14 (local time) though already 2015-11-15 in Greenwich.

Either we have a timezone problem or Travis-CI was excited to celebrate my birthday a little early.

list of headers should be reversed

The list of headers should be reversed, currently they are in chronological order.
This means that db[-1] and db()[-1] give back different answers, this is confusing.

Make fill=False by default (for all functions)

It sounds like @tacaswell, @klauer, and I all agree on this. OK with everyone else?

Proximate Motivation: restream and process pass the documents to bluesky callbacks, which expect the un-filled documents that the RunEngine provides. Filled Documents cause them to blow up when the try to (redundant) fill them.

General Motivation: Filling is expensive and we shouldn't do it by default.

Is there any chance I can bait someone else into looking at this? @arkilic or @ericdill? I could use a break from databroker work for awhile, but this should get done soon.

TST: test against both RW and RO versions of backing databases

This might help with issues of which methods go where.

ENH: check total filesize for db.export()

When exporting it might be nice to check/print the total size of the files to be exported, otherwise we might run out of disk space. This way we can let the users know that they need to clear some space or figure something else out to complete the export.

Change how we package runs with no RunStop document into a Header.

Currently, if a run has no run stop (which means something went seriously wrong during a scan) we set header['stop'] = None. This leads to some annoying checks throughout the codebase. Simply leaving off the 'stop' key such that header['stop'] raises a KeyError would actually be simpler.

Typo in Exception

This is just a placeholder for an oops found at 11id:

In [28]: events = get_events(hdr, fields=['data'])

In [29]: list(events)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-29-fa903d44c89f> in <module>()
----> 1 list(events)

/home/xf11id/conda_envs/collection/lib/python3.5/site-packages/databroker/databroker.py in get_events(headers, fields, fill)
    345                 raise ValueError(('{!r} field is not in any descriptor '
    346                                   'of run start {rs.uid}').
--> 347                                  format(k, header['run_star']))
    348 
    349     for header in headers:

KeyError: 'run_star'

Definitely not a "run_star"...

Finish design spec

The Header methods and Helper methods documented in the spec need to be added.

Generate sample data with bluesky

cc NSLS-II/metadatastore#180

Thanks for the awesome suggestion @danielballan

from bluesky.tests.utils import setup_test_run_engine
from bluesky.register_mds import register_mds
RE = setup_test_run_engine()
register_mds(RE)

from bluesky.plans import AbsScanPlan
from bluesky.examples import motor, det
plan = AbsScanPlan([det], motor, 1, 5, 5)
RE(plan)

Remove channelarchiver from this project until we actually implement it

Right now channelarchiver is in a number of installation places (travis, conda recipe) and the watermark function in utils.py, but it is not actually used anywhere in the databroker package.

$ git grep channelarchiver
.travis.yml:  # conda-forge is activated on metadatastore, filestore and channelarchiver
.travis.yml:  # - conda install -n testenv metadatastore filestore channelarchiver
.travis.yml:  - pip install https://github.com/NSLS-II/channelarchiver/zipball/master#egg=channelarchiver
README.rst:- channelarchiver
conda-recipe/meta.yaml:    - channelarchiver
databroker/utils/diagnostics.py:                'channelarchiver', 'xray_vision']

I would advocate for removing it until we actually use it in this project...

Restore _repr_html_ on new implementation of Header

It got deleted in the refactor.

Consider flipping the default of fill_events' `inplace` kwarg.

Broker.fill_event has inplace=True for backward compatibility. If we reverse the sign we avoid confusing mutations but definitely break some user code.

Tag v0.4.0

metadatastore.doc no longer exists. Need to tag a new release on this and metadatastore

TST: use yield for pytest taredown

Just as a suggestion would it be possible to use yield in conftest? It seems like it may be cleaner without injecting any horrible problems into the code.

Filled does not match expected value

This test fails:

def defensive_filestore_call_hfi(stream, db):
    for name, doc in stream:
        if name == 'event':
            print([v for v in doc['filled'].values()])
            print(all([v for v in doc['filled'].values()]))
            print(doc['data'])
            db.fill_event(doc)
            print([v for v in doc['filled'].values()])
            print(all([v for v in doc['filled'].values()]))
            print(doc['data'])
        yield name, doc

def test_defensive_filestore_hfi(exp_db):
    hdr = exp_db[-1]
    raw_stream = exp_db.restream(hdr, fill=False)  # looks like the RE Queue
    proc_stream = defensive_filestore_call_hfi(raw_stream, exp_db)
    fill_stream = exp_db.restream(hdr, fill=True)
    for s1, s2 in zip(proc_stream, fill_stream):
        if s1[0] != 'event':
            assert s1[1] == s2[1]
        else:
            assert_array_equal(s1[1]['data']['pe1_image'],
                               s2[1]['data']['pe1_image'])

Conda Install Issue

We are trying to work with DataBroker for some hackathon related topics and get the following error on a relatively clean Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-59-generic x86_64) with conda version 4.3.13-py36_0. Any help is appreciated.

<<<

$ conda install databroker -c lightsource2-tag
Fetching package metadata ...........
Solving package specifications:
PackageNotFoundError: Package not found: '' Dependencies missing in current linux-64 channels:

databroker -> boltons
databroker -> doct -> humanize
databroker -> doct -> prettytable
databroker -> filestore >=0.3.0 -> pims -> slicerator >=0.9.4
databroker -> filestore >=0.3.0 -> tifffile
databroker -> metadatastore >=0.5.1
Close matches found; did you mean one of these?
boltons: boto
prettytable: pytables
You can search for packages on anaconda.org with
anaconda search -t conda tifffile
(and similarly for the other packages)

<<<

$ conda install databroker -c lightsource2-dev
Fetching package metadata ...........
Solving package specifications:
PackageNotFoundError: Package not found: '' Dependencies missing in current linux-64 channels:

databroker -> channelarchiver -> tzlocal
databroker -> filestore >=v0.2.0 -> doct -> humanize
databroker -> filestore >=v0.2.0 -> doct -> prettytable
databroker -> filestore >=v0.2.0 -> pims
databroker -> filestore >=v0.2.0 -> tifffile
databroker -> metadatastore >=v0.2.0
Close matches found; did you mean one of these?
prettytable: pytables
You can search for packages on anaconda.org with
anaconda search -t conda tzlocal
(and similarly for the other packages)

Confusing time display format in pandas DataFrame

attn @yugangzhang

Tom reports that the timezones coming out of the databroker at CHX are not correct. I wrote this simple script to investigate. It prints the local time, runs a dummy scan (no real hardware involved) and shows the table with timestamps. I find that the times agree.

I get this output:

Current Time: 2016-05-25 21:03:32.966762-04:00
                              time       det  motor
1 2016-05-25 21:03:33.645711-04:00  0.606531    1.0
2 2016-05-25 21:03:33.703810-04:00  0.135335    2.0
3 2016-05-25 21:03:33.765744-04:00  0.011109    3.0

As I run this, it is 5:03 PM in New York. The timestamps show GMT (Greenwich Mean Time) with a -04:00 that denotes the adjustment to local time, which is set to 'US/Eastern' in metadatastore's configuration.

I am guessing that the times are correct for you as well, but that the display is confusing -- what with the -04:00 off to the side. It seems we should just display the local time instead, not GMT - 04:00.

from bluesky import RunEngine
from databroker import db
from metadatastore.commands import insert
from bluesky.plans import scan
from bluesky.examples import motor, det
import time
import pandas as pd


# Setup a RunEngine with metadatastore.
RE = RunEngine({})
RE.subscribe('all', insert)

# Print the local time, for comparison.
secs = time.time()  # seconds since 1 January 1970
localized_time = pd.to_datetime(secs, unit='s').tz_localize('US/Eastern')
print('Current Time:', localized_time)

# Show timestamps from a simple scan.
RE(scan([det], motor, 1, 3, 3))
print(db.get_table(db[-1]))

add `summary` function

def summary(list_of_headers, *keys_to_show):

prints out a formatted prettytable with that info.

cc @tacaswell

Time-based searching seems to still be broken

I don't have a reproducible example at this time, but I think time-based searches that use both a start and end time fail. This is a reminder to look into it.

Implement a 'watch' method that yields new documents by polling

Recall the zmq-based subscribe PR in bluesky (not to be confused with the zmq-based "RunEngine as a service"). The RunEngine pushes documents to a central hub which (caches them and) pushes them to consumers over the network:

for name, doc in subscribe(host):
    do_stuff(name, doc)

As a complement to this (as discussed offline), we need:

for name, doc in db.watch(optional_query_to_narrow_results):
    do_stuff(name, doc)

This addresses one of @klauer's concerns with bluesky/bluesky#395 : what happens if the user forgets to configure the zmq stuff until after the scan has started? I am personally against a hybrid solution that was once discussed: to pull from databroker to "catch up" and then subscribe to pushes from zmq. But I think having a puller is a good idea, and if someone wanted to build a smart hybrid from these two pieces, they could.

Details to be worked out:

What happens if db is receiving documents from more than one RunEngine at a time? Maybe the RunEngine should insert its hostname + process number + Python ojbect id into the RunStart so that the search query can watch RunEngine instance at a time, if needed.
How frequent to do the polling? Proposal: multiply polling interval 2 if you don't see a new document on the first try; divide by 2 if you do.