
asappresearch / flambe

261 stars, 9 watchers, 28 forks, 13.59 MB

An ML framework to accelerate research and its path to production.

Home Page: https://flambe.ai

License: MIT License

Shell 0.32% Python 97.16% JavaScript 1.39% HTML 1.13%
machine-learning research pytorch python deep-learning distributed ml

flambe's Introduction





Welcome to Flambé, a PyTorch-based library that allows users to:

  • Run complex experiments with multiple training and processing stages
  • Search over hyperparameters, and select the best trials
  • Run experiments remotely over many workers, including full AWS integration
  • Easily share experiment configurations, results, and model weights with others

Installation

From PIP:

pip install flambe

From source:

git clone git@github.com:asappresearch/flambe.git
cd flambe
pip install .

Getting started

Define an Experiment:

!Experiment

name: sst-text-classification

pipeline:

  # Stage 0 - Load the Stanford Sentiment Treebank dataset and run preprocessing
  dataset: !SSTDataset # a simple Python object; the arguments below are used to build it
    transform: # these arguments are passed to the init method
      text: !TextField
      label: !LabelField

  # Stage 1 - Define a model
  model: !TextClassifier
    embedder: !Embedder
      embedding: !torch.Embedding  # automatically use PyTorch classes
        num_embeddings: !@ dataset.text.vocab_size  # link to other components and their attributes
        embedding_dim: 300
      embedding_dropout: 0.3
      encoder: !PooledRNNEncoder
        input_size: 300
        n_layers: !g [2, 3, 4]  # grid search over any parameter
        hidden_size: 128
        rnn_type: sru
        dropout: 0.3
    output_layer: !SoftmaxLayer
      input_size: !@ model[embedder][encoder].rnn.hidden_size  # links can also point inside objects
      output_size: !@ dataset.label.vocab_size

  # Stage 2 - Train the model on the dataset
  train: !Trainer
    dataset: !@ dataset
    model: !@ model
    train_sampler: !BaseSampler
    val_sampler: !BaseSampler
    loss_fn: !torch.NLLLoss
    metric_fn: !Accuracy
    optimizer: !torch.Adam
      params: !@ train[model].trainable_params
    max_steps: 100
    iter_per_step: 100

  # Stage 3 - Eval on the test set
  eval: !Evaluator
    dataset: !@ dataset
    model: !@ train.model
    metric_fn: !Accuracy
    eval_sampler: !BaseSampler

# Define how to schedule variants
schedulers:
  train: !ray.HyperBandScheduler

All objects in the pipeline are subclasses of Component, which are automatically registered for use with YAML. Custom Component implementations must implement run to add custom behavior when executed.
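
For illustration, a custom stage could look roughly like the following sketch (it assumes Component can be imported from flambe.compile.component and that run returns a continue flag; the exact API may differ):

from flambe.compile.component import Component


class PrintVocabSize(Component):
    """Hypothetical stage that logs the vocabulary size of a linked dataset."""

    def __init__(self, dataset) -> None:
        super().__init__()
        self.dataset = dataset

    def run(self) -> bool:
        # run() holds whatever custom behavior the stage should execute.
        print(f"Vocab size: {self.dataset.text.vocab_size}")
        return False  # assumption: returning False signals the stage has no more work to do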

Now just execute:

flambe example.yaml

Note that defining objects like model and dataset ahead of time is optional; it's useful if you want to reference the same model architecture multiple times later in the pipeline.

Progress can be monitored via the Report Site (with full Tensorboard integration).

Features

  • Native support for hyperparameter search: using search tags (see !g in the example), users can define multi-variant pipelines. More advanced search algorithms will be available in an upcoming release!
  • Remote and distributed experiments: users can submit Experiments to Clusters which will execute in a distributed way. Full AWS integration is supported.
  • Visualize all your metrics and meaningful data using Tensorboard: log scalars, histograms, images, hparams and much more.
  • Add custom code and objects to your pipelines: extend flambé functionality using our easy-to-use extensions mechanism.
  • Modularity with hierarchical serialization: save different components from pipelines and load them safely anywhere.

Next Steps

Full documentation, tutorials, and much more are available at https://flambe.ai

Contact

You can reach us at [email protected]

flambe's People

Contributors

alin-asapp, cdfox-asapp, ciguaran-asapp, cle-ros, ethan-asapp, fwph, iconix, iitzco-asapp, jeremyasapp, kball-asapp, kswanson-asapp, laberastury-asapp, mschilman-asapp, msclar-asapp, nmatthews-asapp, sean-asapp, siddharth-asapp, yukw777


flambe's Issues

Error when activation is specified in MLPEncoder

Describe the bug
When I specify activation layers for the MLPEncoder, I see the following error:

2019-09-03 11:51:52,138	ERROR trial_runner.py:487 -- Error processing event.
Traceback (most recent call last):
  File "/home/peter/code/flambe/venv/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 436, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/peter/code/flambe/venv/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 323, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/peter/code/flambe/venv/lib/python3.6/site-packages/ray/worker.py", line 2195, in get
    raise value
ray.exceptions.RayTaskError: ray_worker (pid=9937, host=peter-MS-7758)
  File "/home/peter/code/flambe/venv/lib/python3.6/site-packages/ray/tune/trainable.py", line 87, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/peter/code/flambe/flambe/experiment/tune_adapter.py", line 76, in _setup
    block.load_state(state)
  File "/home/peter/code/flambe/flambe/compile/component.py", line 1093, in load_state
    load(self)
  File "/home/peter/code/flambe/flambe/compile/component.py", line 1092, in load
    load(child, prefix + name + STATE_DICT_DELIMETER)
  File "/home/peter/code/flambe/flambe/compile/component.py", line 1092, in load
    load(child, prefix + name + STATE_DICT_DELIMETER)
  File "/home/peter/code/flambe/flambe/compile/component.py", line 1092, in load
    load(child, prefix + name + STATE_DICT_DELIMETER)
  [Previous line repeated 1 more time]
  File "/home/peter/code/flambe/flambe/compile/component.py", line 1083, in load
    unexpected_keys, error_msgs)
  File "/home/peter/code/flambe/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 685, in _load_from_state_dict
    hook(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
  File "/home/peter/code/flambe/flambe/compile/component.py", line 977, in _load_state_dict_hook
    version = local_metadata[VERSION_KEY].split('.')
KeyError: '_flambe_version'

To Reproduce
Steps to reproduce the behavior:
Use the following yaml file:

!Experiment

name: sst-text-classification

pipeline:

  # stage 0 - Load the Stanford Sentiment Treebank dataset and run preprocessing
  dataset: !SSTDataset
    transform:
      text: !TextField
      label: !LabelField

  # Stage 1 - Define a model
  model: !TextClassifier
    embedder: !Embedder
      embedding: !torch.Embedding  # automatically use pytorch classes
        num_embeddings: !@ dataset.text.vocab_size
        embedding_dim: 300
      embedding_dropout: 0.3
      encoder: !MLPEncoder
        input_size: 300
        output_size: 128
        output_activation: !torch.ReLU
        n_layers: 2
        hidden_size: 256
        hidden_activation: !torch.ReLU
    output_layer: !SoftmaxLayer
        input_size: 128
        output_size: !@ dataset.label.vocab_size

  # Stage 2 - Train the model on the dataset
  train: !Trainer
    dataset: !@ dataset
    model: !@ model
    train_sampler: !BaseSampler
    val_sampler: !BaseSampler
    loss_fn: !torch.NLLLoss
    metric_fn: !Accuracy
    optimizer: !torch.Adam
      params: !@ train.model.trainable_params
    max_steps: 10
    iter_per_step: 100

  # Stage 3 - Eval on the test set
  eval: !Evaluator
    dataset: !@ dataset
    model: !@ train.model
    metric_fn: !Accuracy
    eval_sampler: !BaseSampler

# Define how to schedule variants
schedulers:
  train: !tune.HyperBandScheduler
  1. Run the experiment.
  2. You'll see the error.

Expected behavior
No error should be raised.

Screenshots

Software Versions (please complete the following information):

  • OS: Ubuntu 18.04
  • Python Version 3.6.8
  • PyTorch Version 1.1.0
  • Flambé Version master

Additional context

can flambe save all tensorboard plots to a specified dir?

Is your feature request related to a problem? Please describe.

No.

Describe the solution you'd like

I'd like a tb_plots_save_dir option in the Flambe config; if given, the runner saves all TensorBoard plots to that dir.

Describe alternatives you've considered

Writing a tool to do this myself.

Additional context

Erm, no.

Cannot add custom attributes to a flambe module

Describe the bug
If you try to follow this example, you get an Unexpected key(s) in state_dict error. This is because of a bug in this if clause:

if isinstance(module, torch.nn.Module):
    module_load_fn = module._load_from_state_dict
else:
    module_load_fn = module._load_state_dict_hook
module_load_fn(state_dict, prefix, local_metadata, True, missing_keys,
               unexpected_keys, error_msgs)

The else clause is never hit when module is a flambe.nn.Module, because it inherits from torch.nn.Module, and therefore any custom attribute is treated as an unexpected key.
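
One possible direction for a fix, sketched only and not the actual patch, is to check for the flambe-specific hook first:

# A possible fix (sketch): prefer the flambe-specific hook when the module defines one,
# and only fall back to the standard torch loading path otherwise.
if hasattr(module, '_load_state_dict_hook'):
    module_load_fn = module._load_state_dict_hook
else:
    module_load_fn = module._load_from_state_dict
module_load_fn(state_dict, prefix, local_metadata, True, missing_keys,
               unexpected_keys, error_msgs)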

To Reproduce
Follow the example in the doc.

Expected behavior
The custom attribute should be loaded correctly.

Screenshots
If applicable, add screenshots to help explain your problem.

Software Versions (please complete the following information):

  • OS: macOS 10.14
  • Python Version: 3.6
  • PyTorch Version: 1.1
  • Flambé Version: 0.4.6

Additional context
Add any other context about the problem here.

Allow pickle for saving checkpoints in Experiment

Sometimes our custom save format doesn't work well for certain architectures, users may simply prefer pickle, or, because the save format isn't as mature as pickle, it may have bugs. For all of these reasons, we should enable pickle-based checkpointing in Experiment.
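
A minimal sketch of what opt-in pickle checkpointing could look like (the save_checkpoint helper and use_pickle flag are hypothetical, not part of the current Experiment API):

import pickle

import torch


def save_checkpoint(module: torch.nn.Module, path: str, use_pickle: bool = False) -> None:
    """Hypothetical helper: fall back to plain pickle when the custom format misbehaves."""
    if use_pickle:
        with open(path, 'wb') as f:
            pickle.dump(module, f)
    else:
        torch.save(module.state_dict(), path)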

python-dateutil version conflict in 0.4.6

Describe the bug
I just did a clean, fresh install of flambe 0.4.6 and I see this error message:

botocore 1.13.13 has requirement python-dateutil<2.8.1,>=2.1; python_version >= "2.7", but you'll have python-dateutil 2.8.1 which is incompatible.

flambe itself works fine.

To Reproduce
Steps to reproduce the behavior:

  1. pip install flambe==0.4.6
  2. see the error message

Expected behavior
We shouldn't see any error message.

Software Versions (please complete the following information):

  • OS: macOS 10.14.6
  • Python Version: 3.7.4
  • PyTorch Version: 1.1.0
  • Flambé Version: 0.4.6

Clearer error message when an extension is not installed.

Describe the bug
When you run a Runnable whose components use an extension that hasn't been installed (via pip install or flambe -i), flambe throws the following error, which does not tell the user what is actually going on:

11:35:30 | AttributeError("'CommentedMap' object has no attribute 'add_extensions_metadata'",)
Traceback (most recent call last):
  File "/home/peter/code/sci-summary/venv/lib/python3.6/site-packages/flambe/runner/run.py", line 87, in main
    runnable.run(force=args.force, verbose=args.verbose)
  File "/home/peter/code/sci-summary/venv/lib/python3.6/site-packages/flambe/experiment/experiment.py", line 279, in run
    schema_block.add_extensions_metadata(self.extensions)
AttributeError: 'CommentedMap' object has no attribute 'add_extensions_metadata'

To Reproduce
Steps to reproduce the behavior:

  1. Create a yaml file with an extension.
  2. Run it (do not install the extension)

Expected behavior
A better error message that tells the user to install the extension.

Screenshots
If applicable, add screenshots to help explain your problem.

Software Versions (please complete the following information):

  • OS: Ubuntu 18.04
  • Python Version 3.6.8
  • PyTorch Version 1.1.0
  • Flambé Version 0.4.1

Additional context
Add any other context about the problem here.

No in-built functionality for tracking of metrics during training

This is a feature request bordering on a bug. Right now, flambe does not allow tracking metrics during training. This, however, is essential for monitoring learning.

One problem that I see is that it does not make sense to compute the train metrics after an entire train epoch, as flambe does for test/eval metrics. Given the size of some datasets, this is not really feasible.

Consequently, the metric interface needs to accommodate incremental computation. That, in turn, requires a decision on how this should be implemented, partly because not every metric supports incremental computation (think: AUC).
Unfortunately, incremental computation requires keeping track of previous computations - i.e., we need state that is updated incrementally.

Off the top of my head, these are the choices we have:

First option: make the metrics state-ful.

  • The metrics would then have to be "reset" at the beginning of each epoch
  • An incremental method, added to the metric, could be used to update the metric

Second option: add a metric-state object.

  • Flambe initializes a metric-state object at the beginning of each epoch.
  • This metric-state object is passed into each incremental call of the metric (and possibly into every other call, to keep the interface uniform)
  • Logging can happen automatically in a method of that state-object

Third option: add local tracking for each metric
(I don't think this is a good option, but I wanted to mention it for completeness)

  • This works like the metric state object, but with individual state objects per metric.
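
For illustration, a minimal sketch of the first option, a state-ful metric with reset/update (names are illustrative only):

import torch


class StatefulAccuracy:
    """Illustrative state-ful metric: reset at the start of each epoch, updated per batch."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, preds: torch.Tensor, targets: torch.Tensor) -> None:
        # Accumulate statistics incrementally instead of recomputing over the full epoch.
        self.correct += (preds.argmax(dim=-1) == targets).sum().item()
        self.total += targets.size(0)

    def compute(self) -> float:
        return self.correct / max(self.total, 1)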

Callback based Trainer

Is your feature request related to a problem? Please describe.
Currently, customizing a model sometimes requires overriding the Trainer. It would be best if the Trainer were an object that users didn't have to override. Furthermore, it would be good to be able to set more default values across the board. The Trainer is very generic, which is great, but relying more on the model would simplify the interfaces and improve the user experience. The Trainer would be in charge of executing training, fp16, model parallelization, some logging, etc.

Describe the solution you'd like
The solution is to create a Model class with a set of methods (some of them optional) to be called by the Trainer. This has the added benefit that the new model objects can implement defaults for common parameters such as the loss function or sampler.

Here is a set of potential methods for the model class:

  • forward: takes as input the data and outputs predictions
  • batch_train: takes as input a batch, returns the loss (uses forward)
  • batch_eval: takes as input a batch, and returns the metrics (uses forward)
  • aggregate_metrics: takes the outputs of the compute metrics function and aggregates
  • validation_metric: takes the output of aggregate_metrics and returns the main validation metric used for model selection, or maybe just a string?
  • optimize: called after backward to update the model
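
A rough sketch of such a Model class (the method names mirror the list above; this is a proposal, not an existing API):

import torch


class Model(torch.nn.Module):
    """Proposed interface the Trainer would call into; every method here is a proposal."""

    def forward(self, *data):
        raise NotImplementedError

    def batch_train(self, batch):
        """Take a batch and return the loss (uses forward)."""
        raise NotImplementedError

    def batch_eval(self, batch):
        """Take a batch and return per-batch metric values (uses forward)."""
        raise NotImplementedError

    def aggregate_metrics(self, metric_outputs):
        """Aggregate the per-batch metric outputs."""
        raise NotImplementedError

    def validation_metric(self, aggregated):
        """Return the single value (or name) used for model selection."""
        raise NotImplementedError

    def optimize(self):
        """Called after backward() to update the parameters."""
        raise NotImplementedError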

Some questions to answer:

  1. How far do we take this? We can imagine having the data sampling defined as part of the model. How about data processing? That could be useful for inference.
  2. We could add an unbounded number of methods; how do we want to proceed in that regard? I propose an "add when needed" policy.

Improve checks when launching processes using tmux

tmux is currently used to launch the processes in the clusters (flambe and flambe-site), and there are occasions where the process returns a non-success exit code even though it was actually launched correctly.
There is a current fix that checks whether the process is running (regardless of the exit code), but this may not be the best solution to the problem.

RNN (sru) is incompatible with torch 1.3

Describe the bug

output, state = self.rnn(data, state, mask_pad=(-padding_mask + 1).byte())

raises this error with torch 1.3:

RuntimeError: Negation, the `-` operator, on a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.
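
A minimal sketch of the equivalent, torch 1.3-friendly inversion (assuming padding_mask marks real tokens with 1):

import torch

padding_mask = torch.tensor([1, 1, 1, 0, 0])   # 1 = real token, 0 = padding (assumption)
old_style = (-padding_mask + 1).byte()         # what the current code computes
new_style = (~padding_mask.bool()).byte()      # inversion that torch 1.3 accepts
assert torch.equal(old_style, new_style)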

Software Versions (please complete the following information):

  • OS: macOS 10.14
  • Python Version 3.7.5
  • PyTorch Version 1.3.1
  • Flambé Version 0.2.10

Registry refactor to resolve naming conflicts

Is your feature request related to a problem? Please describe.
Currently, registering multiple classes of the same name with YAML will not work properly. We should allow multiple distinct classes with the same name to be registered in different namespaces, e.g. "NLLLoss" and "torch.NLLLoss".

Describe the solution you'd like
The registry should be a separate entity (we should not rely on ruamel.yaml to maintain the registry) so that we can easily manage these cases. The registry will be implemented to organize namespaces as a mapping from tags to classes. Whenever ruamel.yaml is needed, the registry can be synced with yaml to ensure it's up to date.
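
A bare-bones sketch of such a standalone, namespaced registry (illustrative only; names are not from the codebase):

from typing import Dict, Type


class Registry:
    """Illustrative registry keyed by namespace, then tag, so 'NLLLoss' and 'torch.NLLLoss' can coexist."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Dict[str, Type]] = {}

    def register(self, cls: Type, tag: str, namespace: str = '') -> None:
        self._namespaces.setdefault(namespace, {})[tag] = cls

    def lookup(self, tag: str, namespace: str = '') -> Type:
        return self._namespaces[namespace][tag]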

Make `extra_validation_metrics` a key-value mapping rather than a list

Is your feature request related to a problem? Please describe.
Each metric in extra_validation_metrics (provided to the default Trainer) must be of a different class, because each is written to Tensorboard using its class name. This prevents logging different metrics of the same class with different configurations, because they'll overwrite each other.

Describe the solution you'd like
Make extra_validation_metrics a key-value mapping rather than a list.

Describe alternatives you've considered
This could be handled by a custom Trainer, or each metric with different parameters could be implemented as a separate class.

Additional context

An example of __init__.py in custom extensions

Is your feature request related to a problem? Please describe.
I forgot to set up __init__.py properly in my custom extension, and the error message did not make it obvious what went wrong.

Describe the solution you'd like
It'd be great to have an example of an __init__.py in the docs, just like the one for setup.py.
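
Something along these lines would already help (a sketch; the package, module, and class names are hypothetical):

# my_extension/__init__.py
# Expose the extension's Components at the package top level so that flambe
# can resolve tags such as !my_extension.MyEncoder from a config.
from my_extension.model import MyEncoder
from my_extension.dataset import MyDataset

__all__ = ['MyEncoder', 'MyDataset']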

Describe alternatives you've considered
Asking @nmatthews-asapp, who helped me out!

Additional context

Fix link reference resolution inside of lists

Describe the bug
Currently links inside of lists don't get their targets updated properly before compilation. This should be easily fixable by having the traverse method used in this utility function also recurse on sequences.
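
For illustration, a traverse helper that recurses on sequences as well as mappings could look like this (a sketch, not the actual flambe utility):

def traverse(obj):
    """Yield every leaf value, descending into both mappings and sequences."""
    if isinstance(obj, dict):
        for value in obj.values():
            yield from traverse(value)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            yield from traverse(value)
    else:
        yield obj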

Improving error messages in case of double key in config

Is your feature request related to a problem? Please describe.
When a config contains a duplicate key, flambe fails with:

ruamel.yaml.constructor.ConstructorError: could not determine a constructor for the tag xxxx

I expect this happens with configs that have a Runnable from an extension as the top-level object.

Describe the solution you'd like
The error message should be clear on what's happening:

The provided configuration file contains duplicated keys 'xxxx'

Make all Flambe base-models torchscript compatible

It would be great if we could extract TorchScript models from flambe models after training. That requires all flambe base models to be compatible with TorchScript. After a very superficial attempt to make this work, I found that at least some typing annotations in flambe.component are TorchScript-incompatible.

Reporting server not logging requests when tensorboard is installed.

Describe the bug
If you run the reporting server without tensorboard, you see Flask logs like so:

 * Serving Flask app "flambe.experiment.webapp.app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://localhost:12345/ (Press CTRL+C to quit)
127.0.0.1 - - [23/Nov/2019 04:50:33] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Nov/2019 04:50:34] "GET /state HTTP/1.1" 200 -

However, if you run the reporting server with tensorboard installed, for some reason, the reporting server stops showing Flask logs.

 * Serving Flask app "flambe.experiment.webapp.app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off

The reporting server and tensorboard otherwise function fine. However, this is a bit annoying because I can't see which port the reporting server runs on if I don't specify it.

To Reproduce

  1. Run the reporting server without tensorboard and see all the Flask logging.
  2. Install tensorboard
  3. Run the reporting server again. Flask doesn't log stuff anymore.

Expected behavior
Flask should log even with tensorboard.

Screenshots
N/A

Software Versions (please complete the following information):

  • OS: macOS 10.14.6, Ubuntu 16.04.6 LTS
  • Python Version 3.6
  • PyTorch Version 1.1
  • Flambé Version 0.4.7

Additional context
N/A

Checkpointing sometimes fails because of new safety check

Describe the bug
Because of a new safety check introduced in #136, checkpointing may fail when overwriting a previous save file. It should be possible to overwrite a file when saving; whether it should be opt-in or opt-out is open for discussion.

Better data handling for datasets

Is your feature request related to a problem? Please describe.
There is no clean unified interface for datasets.

Describe the solution you'd like
A unified interface should include the following properties (a rough sketch follows the list):

  • downloading data
  • extraction of archives
  • flattening data folders, or otherwise creating uniform folder structures
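
For illustration, such an interface could look roughly like the following (a sketch; the class and method names are hypothetical):

import os
import tarfile
import urllib.request


class DownloadableDataset:
    """Hypothetical unified dataset base covering the properties listed above."""

    url: str = ''            # where to fetch the raw data from
    archive_name: str = ''   # file name of the downloaded archive

    def download(self, root: str) -> str:
        os.makedirs(root, exist_ok=True)
        path = os.path.join(root, self.archive_name)
        if not os.path.exists(path):
            urllib.request.urlretrieve(self.url, path)
        return path

    def extract(self, archive_path: str, root: str) -> None:
        with tarfile.open(archive_path) as archive:
            archive.extractall(root)

    def flatten(self, root: str) -> None:
        # Normalize the extracted folders into a uniform structure.
        raise NotImplementedError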

Replaces, for now, the following PRs:
#259
#258

Warning when copying a tensor in the base sampler

Describe the bug
The base sampler produces the following warning:

py.warnings [block_train] /home/ubuntu/.local/lib/python3.6/site-packages/flambe/sampler/base.py:167: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  tensors = [torch.tensor(example) for example in column]

To Reproduce
Steps to reproduce the behavior:
Simply run the base sampler with verbose logging.

Expected behavior
There should be no warning.

Screenshots
N/A

Software Versions (please complete the following information):

  • OS: Ubuntu 18.04
  • Python Version 3.6.8
  • PyTorch Version 1.1
  • Flambé Version 0.4.8

Additional context
N/A

Recurring ValueError when refreshing TensorBoard

I'm often seeing the following ValueError:

  File "/persist/git/flambe/flambe/experiment/progress.py", line 66, in refresh
    k, v = h.split('=')
ValueError: not enough values to unpack (expected 2, got 1)

This happens when viewing TensorBoard in the browser and refreshing the page; the above error is logged to the console by the server. It doesn't break anything, but it seems like a bug all the same.

`flambe clean` option

Is your feature request related to a problem? Please describe.

The flambe-output folders take up a lot of space.

Describe the solution you'd like

An interactive flambe clean tool; upon invocation, it prints the locations of all flambe-output folders in a specified directory, their respective sizes, and a Y/n option to delete them en masse.
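
A rough sketch of what such a tool could do (illustrative only; flambe clean does not exist yet):

import os
import shutil


def flambe_clean(root: str = '.') -> None:
    """List every flambe-output folder under `root` with its size, then offer to delete them."""
    targets = []
    for dirpath, dirnames, _ in os.walk(root):
        if 'flambe-output' in dirnames:
            path = os.path.join(dirpath, 'flambe-output')
            size = sum(os.path.getsize(os.path.join(dp, f))
                       for dp, _, files in os.walk(path) for f in files)
            targets.append((path, size))
            dirnames.remove('flambe-output')  # do not descend into the folder we just recorded
    for path, size in targets:
        print(f'{path}  ({size / 1e6:.1f} MB)')
    if targets and input('Delete all of the above? [Y/n] ').strip().lower() in ('', 'y'):
        for path, _ in targets:
            shutil.rmtree(path)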

Describe alternatives you've considered

Writing a script to do this myself.

Early checks for constructor argument incompatibilities

Is your feature request related to a problem? Please describe.
Given that an experiment can take a long time to reach the last object-initialization step, any incompatibilities in the arguments passed to constructors should be caught as early as possible.

Describe the solution you'd like
I think that flambe.component should have a classmethod check_constructor_args(*args, **kwargs) that is automatically called very early after an experiment starts. Overriding that method would let any kind of exception be raised very early in the experiment's pipeline.
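
A minimal sketch of the proposed hook (hypothetical API, not in flambe today):

class MyEncoder:
    """Illustrative component that validates its constructor arguments up front."""

    def __init__(self, input_size: int, hidden_size: int) -> None:
        self.input_size = input_size
        self.hidden_size = hidden_size

    @classmethod
    def check_constructor_args(cls, *args, **kwargs) -> None:
        # Would be called by the runner right after parsing the config,
        # long before any expensive stage runs.
        if kwargs.get('hidden_size', 1) <= 0:
            raise ValueError('hidden_size must be a positive integer')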

flambe expects `transformers`, still imports from pytorch_transformers

Describe the bug

As one example, Flambe passes attention_mask into the forward method of GPT2Model.

This parameter is only available in the version of this model as found in the transformers library.

Flambe imports from pytorch_transformers, and so does not interact with this model (and potentially others?).

To Reproduce

!Experiment

name: sst-text-classification

pipeline:

  # stage 0 - Load the Stanford Sentiment Treebank dataset and run preprocessing
  dataset: !SSTDataset
    transform:
      text: !GPT2TextField
        alias: gpt2
      label: !LabelField

  # Stage 1 - Define a model
  model: !TextClassifier
    embedder: !GPT2Embedder
      alias: gpt2
    output_layer: !SoftmaxLayer
      input_size: !@ model[embedder].hidden_size
      output_size: !@ dataset.label.vocab_size

  # Stage 2 - Train the model on the dataset
  train: !Trainer
    dataset: !@ dataset
    model: !@ model
    train_sampler: !BaseSampler
      batch_size: 3
      # drop_last: true
    val_sampler: !BaseSampler
      batch_size: 3
      # drop_last: true
    loss_fn: !torch.NLLLoss
    metric_fn: !Accuracy
    optimizer: !torch.Adam
      params: !@ train[model].trainable_params
    max_steps: 1
    iter_per_step: 1

Error

...
    pred, target = self.model(*batch)
  File "/persist/conda/envs/flambe/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/persist/git/flambe/flambe/nlp/classification/model.py", line 71, in forward
    encoding = self.embedder(data)
  File "/persist/conda/envs/flambe/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/persist/git/flambe/flambe/nlp/transformers/utils.py", line 162, in forward
    head_mask=head_mask)
  File "/persist/conda/envs/flambe/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'attention_mask'

Option to write predictions to disk

Is your feature request related to a problem? Please describe.

Use-case: I use flambé to debug models and then grid-search over the configurations I'm happy with.

To debug, I often need to see the predictions the model is making. This includes (in a classification problem) the predicted index and a map from the index to its label.

Describe the solution you'd like

Some option in Trainer (for predicting on the val set) and Evaluator (for predicting on the test set) that logs predictions thoroughly: the inputs, the full predicted output, and the target; in other words, everything I'd want to inspect offline.

Thereafter, I would load this data into (say) a notebook, and start to inspect what's going on.

Generating configs from templates

Is your feature request related to a problem? Please describe.

Yes. I'm trying to dynamically inject pathnames into a config, then run an experiment using that config. Out of the box, there is no clean way to do this.

Describe the solution you'd like

A function that accepts a path to a Jinja2-templated flambe config, an output path, and key:val pairs to inject.

Describe alternatives you've considered

Loading the config into memory with YAML or flambe-YAML tools, editing in memory, then writing a new config to disk.

Additional context

Here's what I've come up with:

import os
import re

import jinja2


def generate_config_from_template(template_path, config_path, remove_comments=False, **template_kwargs):
    dirname = os.path.dirname(template_path)
    basename = os.path.basename(template_path)
    loader = jinja2.FileSystemLoader(searchpath=dirname)
    env = jinja2.Environment(loader=loader)
    template = env.get_template(basename)
    with open(config_path, 'w') as f:
        for line in template.render(**template_kwargs).split('\n'):
            if remove_comments:
                line = re.sub('# .*', '', line).rstrip()
            if line:
                f.write(line + '\n')

Where a config template might look like:

post_process_preds: 'post_process_preds_ext'
---
!Experiment

name: ada-text-classification
pipeline:

  # stage 0 - Load the dataset object SSTDataset and run preprocessing
  0_dataset: !Foo
    train_path: {{ train_path }}
    test_path: {{ test_path }}
    transform:
      text: !TextField
      label: !LabelField

  # stage 1 - train the text classifier on the SSTDataset
  1_train: !Trainer
    dataset: !@ 0_dataset  # link back to the existing dataset
    train_sampler: !BaseSampler  # define a way of sampling dataset
    val_sampler: !BaseSampler
    model: !TextClassifier
      embedder: !Embedder
        embedding: !torch.Embedding  # automatically use pytorch classes
          num_embeddings: !@ 0_dataset.text.vocab_size  # reference vocab size
          embedding_dim: 300
        encoder: !PooledRNNEncoder
          input_size: 300
          rnn_type: lstm
          n_layers: !g [2]
          hidden_size: 256
      output_layer: !SoftmaxLayer
        input_size: !@ 1_train.model.embedder.encoder.rnn.hidden_size
        output_size: !@ 0_dataset.label.vocab_size
        take_log: false
    loss_fn: !torch.NLLLoss  # Use existing PyTorch negative log likelihood
    metric_fn: !torch.NLLLoss  # Used for validation set evaluation
    optimizer: !torch.Adam
      params: !@ 1_train.model.trainable_params  # Link to model parameters
    max_steps: 2  # Each step runs `iter_per_step` iterations
    iter_per_step: 2  # Eval and checkpoint every 50 iterations
  2_eval: !Evaluator
    dataset: !@ 0_dataset
    model: !@ 1_train.model
    metric_fn: !torch.NLLLoss
    output_path: {{ preds_path }}
    eval_sampler: !BaseSampler
    eval_data: test
  3_post_process_preds: !post_process_preds.PostProcessPreds
    preds_path: !@ 2_eval.output_path
    preds_id_path: {{ preds_id_path }}
    post_processed_preds_path: {{ post_processed_preds_path }}
    label_vocab: !@ 0_dataset.label.vocab

Tokenize arrays of strings

Is your feature request related to a problem? Please describe.

The BaseSampler handles arrays of tensors elegantly. However, these arrays only exist if you construct them manually.

Describe the solution you'd like

To be able to have an array-as-one-observation, and for the TextField to handle that gracefully (then pass it along to the BaseSampler).

Additional context

Models that embed a large sequence of context utterances, where simply joining those utterances into one string makes for one really long string.

BERT fine-tuning example in docs seems broken

Describe the bug

I run the example, and it doesn't work.

To Reproduce

Steps to reproduce the behavior:

Paste the following into a file called foo.yaml:

!Experiment

name: foo
pipeline:
  dataset: !SSTDataset
    transform:
        text: !BERTTextField.from_alias
            alias: 'bert-base-uncased'
            lower: true
        label: !LabelField

  teacher: !TextClassifier
    embedder: !Embedder
      embedding: !BERTEmbeddings.from_alias
        path: 'bert-base-uncased'
        embedding_freeze: True
      encoder: !BERTEncoder.from_alias
        path: 'bert-base-uncased'
        pool_last: True
    output_layer: !SoftmaxLayer
      input_size: !@ teacher.embedder.encoder.config.hidden_size
      output_size: !@ dataset.label.vocab_size  # We link the to size of the label space

  student: !TextClassifier
    embedder: !Embedder
      embedding: !BERTEmbeddings.from_alias
        path: 'bert-base-uncased'
        embedding_freeze: True
      encoder: !PooledRNNEncoder
        rnn_type: sru
        n_layers: 2
        hidden_size: 256
        pooling: last
    output_layer: !SoftmaxLayer
      input_size: !@ student.embedder.encoder.hidden_size
      output_size: !@ dataset.label.vocab_size

  finetune: !Trainer
    dataset: !@ dataset
    train_sampler: !BaseSampler
      batch_size: 16
    val_sampler: !BaseSampler
      batch_size: 16
    model: !@ teacher
    loss_fn: !torch.NLLLoss
    metric_fn: !Accuracy
    optimizer: !AdamW
      params: !@ finetune.model.trainable_params
      lr: 0.00005

  distill: !DistillationTrainer
    dataset: !@ dataset
    train_sampler: !BaseSampler
      batch_size: 16
    val_sampler: !BaseSampler
      batch_size: 16
    teacher_model: !@ finetune.model
    student_model: !@ student
    loss_fn: !torch.NLLLoss
    metric_fn: !Accuracy
    optimizer: !torch.Adam
      params: !@ distill.student_model.trainable_params
      lr: 0.00005
    alpha_kl: 0.5
    temperature: 1

Then, run flambe foo.yaml.

NB: I changed the line input_size: !@ model.embedder.encoder.config.hidden_size to input_size: !@ teacher.embedder.encoder.config.hidden_size.

Expected behavior

Experiment works.

Screenshots

Error: AttributeError: 'BERTTextField' object has no attribute 'embeddings'

Software Versions (please complete the following information):

AvgPooling fails even when input dimensions match expected ones.

Describe the bug
An error pops up when using the forward method of AvgPooling.

To Reproduce
Steps to reproduce the behavior:
Use the layer with any matching dimensions.
return data / padding_mask.sum(dim=1)
RuntimeError: The size of tensor a (300) must match the size of tensor b (64) at non-singleton dimension 1

Expected behavior
The result should be the average, given the padding_mask.

Software Versions (please complete the following information):

  • Flambé Version 0.4.7

Allow more fine-grained control of links in YAML configs

Is your feature request related to a problem? Please describe.
Currently it's not very explicit how links are resolved, i.e. whether they are resolved against the config (the nested dictionary structure) or against the attributes of the objects once they are initialized. The answer is both; it depends on what you're linking to in the config and where that link is.

Describe the solution you'd like
Change links to be more powerful and more intuitive by supporting pointing to a specific object in the config and then accessing attributes on that object. This would look like:

!@ train_stage[model][embedder][encoder].rnn.hidden_size

for example. The brackets work similarly to brackets in normal Python: they access a specific object in the nested dictionary structure created from the YAML config. The dot notation is then used to access attributes on the initialized object.

Describe alternatives you've considered
We've considered leaving it implicit, or dropping the attribute access altogether, but these limitations are confusing and, well, limiting. We also considered other syntax for exactly the same logic, but this familiar syntax should be intuitive to new users as it mirrors how things are done in Python.

`Script` should support positional arguments

Is your feature request related to a problem? Please describe.
Currently, you can't pass in positional arguments when running a script.

Describe the solution you'd like
Let's add a new argument to Script called pos_args which is a list of strings. So it'd look like:

my_project: /path/to/my_pip_installable

---

!Experiment

pipeline:
  stage_0: !Script
    script: my_project.train  # my_project is the name of the module
    args:
      pos_args:
        - pos1
        - pos2
      arg1: 'foo'
      arg2: 'bar'

which would result in a call script pos1 pos2 --arg1 foo --arg2 bar.

Describe alternatives you've considered
Making my script only take keyword arguments. This is possible, but I think it'd help flambe to support positional arguments.

Additional context
N/A

Richer debug mode

The debug mode is already very useful in its current state. However, it would be great if it was extended to include the following features:

  • adding a parameter in the config.yaml that will only load the first k% of the data (i.e., only load a small part from disk). This will make debugging much faster, as the model's forward call will be triggered much earlier
  • adding debugging overrides for parameters in the config.yaml, such as debugging-specific dimensionalities that could be much smaller, allowing for faster load time and the use of smaller GPUs.
  • when a cluster.yaml is specified, it would make sense to issue an early warning, but otherwise disable debug mode and proceed normally. This would override this issue.

Possible solutions:

First solution:

  • add a --debug parameter to the runner/run.py / flambe scripts that triggers debug mode instead of setting the mode via the config
  • then have a "debug" section in the config.yaml that can override each parameter
    OR
  • have a --debug-file parameter that enables debug mode and takes the overrides from a 2nd file

Second solution:
Add a special character combination (like !g) that allows you to set two values in the config.yaml, where the first is for non-debug mode, and the second is for debug mode. E.g.:

model: !RNN
    hidden_dim: !d [600, 100]
    num_layers: !d [4, 1]

would set the hidden dim to 100 and the num layers to 1 in debug mode.

Check cluster compatibility of the yaml config before starting the cluster

It is frustrating that the debug: True flag is only tested once the cluster has started up. It would make more sense to check for debug: True - and other cluster-relevant flags - before starting the cluster.

I think that the same could be said for any error that results from a problem with the experiment yaml. Another example is passing a wrong argument name.

A possible solution would be a "dry run" of the config locally first, maybe.

Separate embedder and encoder in flambe.nn.embedder

The current implementation, where the encoder is part of the embedder, prevents "one embedder, two encoders" implementations. As the embedding is oftentimes the largest single matrix in an NLP model, this can lead to an unnecessary increase in memory usage.

Add a fail fast option for experiments

Is your feature request related to a problem? Please describe.
Currently the experiment always progresses to later stages even if there was a trial error. This is sometimes okay, but there should be some kind of opt-in feature to stop as soon as a trial fails to avoid wasted computation.

Describe the solution you'd like
Opt-in flag on Experiment that can be set in either the config or via the CLI

TaggedScalar instead of Schema when using make_component

Hi,

If I try to load a Component registered through make_component, I obtain a TaggedScalar instead of a Schema instance. For example:

test.yaml:

!ray.HyperBandScheduler

test.py:

schema = yaml.load(open("test.yaml"))
print(schema)
# Output: <ruamel.yaml.comments.TaggedScalar at 0x7fe8db40b310>
# Expected output: flambe.compile.component.Schema(...)

How can I obtain a Schema instance?

Out-of-the-box experiments with standard datasets

Is your feature request related to a problem? Please describe.
For most research, new models, algorithms, etc. are tested on specific "standard" datasets.

Describe the solution you'd like
flambe-integrated datasets that could be used by, e.g., having the following config.yaml entry: dataset: !flambe.MultiNLI

Additionally, it would be great to even have this functionality over entire suites of datasets, such as: dataset: !flambe.GLUE_ExperimentSuite

Reporting script can die on the orchestrator

When running on a cluster, clicking both the "download" buttons on the report site can lead to a crash of parts of the webapp.

steps to reproduce:

  1. start flambe on cluster
  2. wait until reporting becomes available
  3. click on both "download" buttons
  4. click on console. There, you'll see Oops! Results not available

More useful stage dependency graph

Is your feature request related to a problem? Please describe.
Currently, the flambe report webpage displays a dependency graph of all the stages in the pipeline. While it looks cool, once you have more than three stages, it's pretty much impossible to read it.

Describe the solution you'd like
I think a simpler tree like the ones you see on CircleCI and other CI/CD websites is a lot easier to read and more useful.

Describe alternatives you've considered
I don't pay attention to the graph because it's too complex to understand anyway.

Additional context

Relative paths no longer work for local resources

Describe the bug
If you use relative paths for local resources, flambe components can no longer find those files because the current working directory is the artifact directory.

To Reproduce
Use relative paths when specifying local resources. Components that use them would raise a file not found error.

Expected behavior
Components that use these local resources should be able to find them.

Screenshots
N/A

Software Versions (please complete the following information):

  • OS: macOS 10.14
  • Python Version 3.6.9
  • PyTorch Version 1.1
  • Flambé Version master

Additional context
N/A

Relative paths don't work with `TabularDataset.from_path`

Describe the bug
You can't quite use relative paths with TabularDataset.from_path because the current working directory is the output directory.

os.getcwd()
'/path/to/your/project/flambe-output/output__experiment/dataset/0_2019-08-06_20-56-50cyx6metq'

If you use a relative path like ../../../../data, it'd work, but this is not really intuitive.

To Reproduce
Steps to reproduce the behavior:

  1. Create a yaml file similar to the one below:
!Experiment

name: experiment
pipeline:
  dataset: !TabularDataset.from_path
    train_path: data/train.csv  # any local path
    val_path: data/val.csv  # any local path
    test_path: data/test.csv  # any local path
  2. You'll get the following error:
(pid=15596) Warning: failed to load file {file_path}
(pid=15596) [Errno 2] File b'data/train.csv' does not exist: b'data/train.csv'

Expected behavior
The expectation is that the relative path would start from the directory where the yaml file is.

Screenshots
If applicable, add screenshots to help explain your problem.

Software Versions (please complete the following information):

  • OS: Ubuntu 18.04
  • Python Version 3.6.8
  • PyTorch Version 1.1
  • Flambé Version 0.4.1

Additional context
Add any other context about the problem here.

Extensions not getting added to the serialized config

Describe the bug
When running a Builder, the extensions are not getting registered as in: https://github.com/Open-ASAPP/flambe/blob/ccba9762d2f27e8898d17e26bf3a14eae0b55355/flambe/experiment/experiment.py#L279

This causes the config.yaml file not to contain the extensions section, and therefore the output folder fails when used with flambe.load.

To Reproduce
Execute any Builder where the component comes from an extension.

Software Versions (please complete the following information):

  • OS: macOS 10.13.6
  • Python Version: 3.6.5
  • PyTorch Version: 1.1.0
  • Flambé Version: 0.4.3
