idsia / sacred
Sacred is a tool developed at IDSIA to help you configure, organize, log and reproduce experiments.
License: MIT License
License: MIT License
Maybe like this:

@ex.config
def cfg():
    dataset = {'path': '/', 'variant': 'foo'}

@ex.capture('dataset')
def load(path, variant):
    pass
There are warnings about type changes in the config that are not really helpful.
We should use GridFS to save big numpy arrays to the database. That would enable saving the weights of the network alongside the experiment.
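A minimal sketch of how the array could be serialized for GridFS storage. The helper names (`array_to_bytes`, `bytes_to_array`) are illustrative, not Sacred's API; the actual GridFS call is shown only as a comment since it needs a live MongoDB connection.

```python
# Hypothetical sketch: serialize a numpy array so it can be stored in
# GridFS alongside the experiment entry.
import io

import numpy as np


def array_to_bytes(arr):
    """Serialize an array to raw bytes suitable for gridfs.GridFS.put()."""
    buf = io.BytesIO()
    np.save(buf, arr)  # .npy format keeps dtype and shape
    return buf.getvalue()


def bytes_to_array(data):
    """Inverse of array_to_bytes."""
    return np.load(io.BytesIO(data))

# With a pymongo database handle `db`, the bytes could then be stored via:
#   file_id = gridfs.GridFS(db).put(array_to_bytes(weights),
#                                   filename='weights.npy')
```

Using the `.npy` format (rather than raw bytes) preserves dtype and shape, so the observer can restore the weights without extra metadata.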
If a config update from the command line adds a new entry, we should issue a warning.
If you execute a command with a missing value, some parameters with default values will be reported as missing too.
Right now the Experiment class is used for the set-up, and then a Run is created for initialization and running. These two steps should be split.
So far we support lazy strings (without quotes) only in direct assignments, e.g. name=bob. But it would be nice to also have that in lists and dicts:

a=[bob,lisa,hugo]
b={name:bob,age:23}

instead of

a=["bob","lisa","hugo"]
b={"name":"bob","age":23}
This would be a big gain, because entering quotes on the command line is painful.
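One way the lazy-string fallback could work is to try normal evaluation first, and only then quote bare words and retry. This is a sketch, not Sacred's actual parser, and it has known limits (e.g. bare keywords like True would also be quoted and need special-casing):

```python
# Hypothetical sketch of parsing lazy (unquoted) strings inside lists/dicts.
import ast
import re


def parse_value(text):
    """Parse a config value, quoting bare words like bob or lisa if needed."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Quote identifiers that are not already part of a quoted string,
        # then retry.  Note: would also quote keywords like True/None.
        quoted = re.sub(r'(?<![\w"\'])([A-Za-z_]\w*)(?![\w"\'])',
                        r'"\1"', text)
        return ast.literal_eval(quoted)
```

So `parse_value('[bob,lisa,hugo]')` yields `['bob', 'lisa', 'hugo']`, while already-valid literals such as `[1, 2]` pass through untouched.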
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/local/lib/python2.7/dist-packages/matplotlib/_pylab_helpers.py", line 86, in destroy_all
manager.destroy()
File "/usr/local/lib/python2.7/dist-packages/matplotlib/backends/backend_gtk3.py", line 427, in destroy
self.canvas.destroy()
AttributeError: FigureManagerGTK3Agg instance has no attribute 'canvas'
The fewer dependencies the better
Provide an option to change the log level on the command line. Maybe:

-l LEVEL, --logging LEVEL

where LEVEL can be either an int or a level string (debug, info, warn, error, ...).
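A minimal sketch of the proposed option using argparse; the flag names follow the proposal above, and the converter `to_loglevel` is an illustrative helper, not part of Sacred:

```python
# Hypothetical sketch: accept either an int or a level name on the
# command line and translate it for the logging module.
import argparse
import logging


def to_loglevel(value):
    """Convert '10' -> 10, 'debug' -> logging.DEBUG, etc."""
    if value.isdigit():
        return int(value)
    level = logging.getLevelName(value.upper())
    if not isinstance(level, int):
        raise argparse.ArgumentTypeError('unknown log level: %s' % value)
    return level


parser = argparse.ArgumentParser()
parser.add_argument('-l', '--logging', metavar='LEVEL', type=to_loglevel,
                    default=logging.WARNING,
                    help='log level as int or name (debug, info, warn, ...)')
```

`logging.getLevelName()` maps registered level names to their numeric values, so both spellings (`warn`, `warning`) resolve without a hand-written lookup table.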
Because there is no Run, the captured functions won't be set up, so dependency injection does not work for them. So, for example, this will fail:

python hello_config.py with message="Hello World"
If you try to add a config entry to ./hello_world.py it does not work, because it has no ConfigScope.
main could maybe just be a regular command.
If so, we would have to define what the default command is, or do away with default commands altogether.
This should be decided soon, as it might break the API.
Because loading the sourcefile fails
Implement the print_config command
The experiment will then have a mainfile entry like this:
u'/home/greff/<ipython-input-1-762a1b9c0e46>'
There should be a way of manually setting the mainfile.
Have the experiment fire up a pdb post-mortem debugger upon failure.
This should only happen on an automain trigger.
This should be deactivatable with a command-line option.
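The post-mortem behaviour described above could be wrapped around the automain entry point roughly like this. The wrapper name `run_main` and the `debug` flag are illustrative, not Sacred's API:

```python
# Hypothetical sketch: run the main function and drop into a pdb
# post-mortem session on failure, unless disabled.
import pdb
import sys
import traceback


def run_main(main, debug=True):
    """Call main(); on an exception, optionally start pdb.post_mortem."""
    try:
        return main()
    except Exception:
        traceback.print_exc()
        if debug:  # e.g. turned off via a command-line option
            pdb.post_mortem(sys.exc_info()[2])
        raise
```

Re-raising after the debugger session keeps the normal failure path (exit code, observers being notified) intact.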
If a config element is changed via the command line and its type changes, we should issue a warning.
Right now only the mainfile is added to the DB entry. Modules from different files are ignored.
It would be useful to be able to restart the seeding at any point, even while running an experiment.
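Restarting the seeding deterministically could mean deriving a fresh sub-seed from the experiment's root seed plus a counter, so any point in the run can be replayed. A sketch with illustrative names, not Sacred's seeding mechanism:

```python
# Hypothetical sketch: derive a reproducible sub-seed from the root seed,
# so the random state can be restarted mid-run.
import random


def spawn_seed(root_seed, counter):
    """Derive the counter-th sub-seed from the experiment's root seed."""
    rng = random.Random(root_seed + counter)
    return rng.randrange(2 ** 31)
```

Because the sub-seed depends only on `(root_seed, counter)`, re-running the experiment with the same root seed reproduces every restart point.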
Allow reusing code from a base experiment with some configuration.
It would be very useful if one could add files to an experiment that can then be stored by the observers alongside the information for the run.
I can think of several different types of files to add:
This includes additional source files etc. They would be part of the info about the experiment, so they could potentially be shared among many runs.
ex.add_dependency("/my/path/filname.foo")
Files that are written once and never changed, but belong to the run not to the experiment. For example a report that is written after the run is finished would fall in this category. They might be a special case of Dynamic Files.
Files that are updated continuously. So they should be updated as part of the heartbeat. Examples include current weights or log files.
These could go through the run object:

f = run.create_tracked_file(filename_in_db)

or

run.track_file(filename, filename_in_db=None)

might be sufficient. Or maybe

f = run.get_tracked_file(filename_in_db)

that returns the file if present or otherwise creates it?

For those cases we should format the stacktrace ourselves and remove the noise.
Capture the console output of a program and write it to the database
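Capturing console output can be done by teeing sys.stdout into a buffer that an observer later writes to the database. The `Tee` class below is a sketch under that assumption, not Sacred's implementation:

```python
# Hypothetical sketch: tee sys.stdout into a buffer so the captured
# output can later be written to the database by an observer.
import io
import sys


class Tee:
    """Write-through wrapper that also keeps a copy of everything written."""

    def __init__(self, stream):
        self.stream = stream
        self.captured = io.StringIO()

    def write(self, text):
        self.stream.write(text)     # still visible on the console
        self.captured.write(text)   # ...and kept for the observer

    def flush(self):
        self.stream.flush()


sys.stdout = tee = Tee(sys.stdout)
print('hello')
sys.stdout = tee.stream  # restore the original stdout
# tee.captured.getvalue() now contains 'hello\n'
```

A real implementation would also need to handle output written at the file-descriptor level (e.g. by C extensions), which bypasses sys.stdout entirely.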
The term module is used a lot for Python's internal packaging system and might lead to confusion. We should use a different term. Suggestions:
In case of multiple ConfigScopes we could allow using the non-JSON locals from one cfg in the others. This would ease implementing the network architecture module providing inputLayer and outputLayer for later usage in config scopes.
But it will complicate the implementation. Is it worth the trouble?
Some people might prefer using regular dictionaries
The current behaviour is to override assignments, which in this most basic case is fine:

@ConfigScope
def cfg():
    a = [1, 2]

cfg(fixed={'a': [7, 8, 9]})['a']  # returns [7, 8, 9]

But as soon as you start modifying your list it gets weird:

@ConfigScope
def cfg():
    a = [1, 2]
    a += [3, 4]

cfg(fixed={'a': [7, 8, 9]})['a']  # returns [7, 8, 9, 3, 4]
Similar things happen with append, insert, extend, *=, pop, remove, del, sort, and reverse. This can be fixed by injecting an immutable list, which then simply ignores all the changes.
This removes the weirdness from before, but it also completely removes the possibility of adding something to a list afterwards. It also seems a little inconsistent to me to have a blocking dictionary that is open to changes apart from the forced ones, but a list that is completely immutable. For lack of better ideas, this is still the path I'll take.
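The injected immutable list could look roughly like this. `FixedList` is an illustrative name, not Sacred's actual class:

```python
# Hypothetical sketch: a list whose content is fixed -- all mutating
# operations are silently ignored, as proposed above.
class FixedList(list):
    """List that ignores every attempt to modify it after construction."""

    def _ignore(self, *args, **kwargs):
        pass  # silently drop the modification

    # All in-place mutators become no-ops.
    append = extend = insert = remove = clear = sort = reverse = _ignore
    __setitem__ = __delitem__ = _ignore

    def pop(self, index=-1):
        return self[index]  # return the element, but do not remove it

    def __iadd__(self, other):
        return self  # a += [...] leaves the list unchanged

    def __imul__(self, n):
        return self
```

With this, the second ConfigScope example above would return [7, 8, 9] for the fixed value, because the `a += [3, 4]` inside the scope is ignored.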
Adding or type-changing nested values leads to wrong coloring of the parent.
Right now info belongs to Run but is accessed through Experiment. This is not very clean. We could maybe use the dependency-injected run to store info?
This would:
Problems might be:
Maybe also extract out host info?