
dhruvdcoder / wandb-allennlp


Utilities and boilerplate code to use wandb with allennlp

License: MIT License

Python 94.27% Jsonnet 2.30% HTML 3.43%
allennlp sweep wandb machine-learning deep-learning

wandb-allennlp's Introduction

wandb-allennlp


Utilities and boilerplate code that let you use Weights & Biases to tune the hyperparameters of any AllenNLP model without writing a single line of extra code!

What does it do?

  1. Log a single run or a hyperparameter search sweep without any extra code, just using configuration files.

  2. Use Weights & Biases' Bayesian hyperparameter search engine together with Hyperband in any AllenNLP project.

Quick start

Installation

$ pip install wandb-allennlp
$ echo wandb_allennlp >> .allennlp_plugins

Log a single run

  1. Create your model using AllenNLP along with a training configuration file as you would normally do.

  2. Add a trainer callback to your config file, as shown below:

...,

trainer: {
    type: 'callback',
    callbacks: [
      ...,
      {
        type: 'wandb_allennlp',
        files_to_save: ['config.json'],
        files_to_save_at_end: ['*.tar.gz'],
      },
      ...,
    ],
    ...,
}
...
  3. Execute the allennlp train-with-wandb command instead of allennlp train. It supports all the arguments of allennlp train. However, the overrides have to be specified in the --kw value or --kw=value form, where kw is the parameter to override and value is its value. Use dot notation for nested parameters. For instance, {'model': {'embedder': {'type': xyz}}} can be provided as --model.embedder.type xyz; see the example after the command below.
allennlp  train-with-wandb model_configs/my_config.jsonnet --include-package=package_with_my_registered_classes --include-package=another_package --wandb-run-name=my_first_run --wandb-tags=any,set,of,non-unique,tags,that,identify,the,run,without,spaces
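For example, a hypothetical invocation that overrides nested parameters with dot notation (the trainer parameters come from the sample config in the next section; xyz is a placeholder):

allennlp train-with-wandb model_configs/my_config.jsonnet \
    --include-package=package_with_my_registered_classes \
    --model.embedder.type=xyz \
    --trainer.optimizer.lr=0.001 \
    --trainer.num_epochs=5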

Hyperparameter Search

  1. Create your model using AllenNLP along with a training configuration file as you would normally do. For example:
local data_path = std.extVar('DATA_PATH');
local a = std.parseJson(std.extVar('a'));
local bool_value = std.parseJson(std.extVar('bool_value'));
local int_value = std.parseJson(std.extVar('int_value'));

{
  type: 'train_test_log_to_wandb',
  evaluate_on_test: true,
  dataset_reader: {
    type: 'snli',
    token_indexers: {
      tokens: {
        type: 'single_id',
        lowercase_tokens: true,
      },
    },
  },
  train_data_path: data_path + '/snli_1.0_test/snli_1.0_train.jsonl',
  validation_data_path: data_path + '/snli_1.0_test/snli_1.0_dev.jsonl',
  test_data_path: data_path + '/snli_1.0_test/snli_1.0_test.jsonl',
  model: {
    type: 'parameter-tying',
    a: a,
    b: a,
    d: 0,
    bool_value: bool_value,
    bool_value_not: !bool_value,
    int_value: int_value,
    int_value_10: int_value + 10,

  },
  data_loader: {
    batch_sampler: {
      type: 'bucket',
      batch_size: 64,
    },
  },
  trainer: {
    optimizer: {
      type: 'adam',
      lr: 0.001,
      weight_decay: 0.0,
    },
    cuda_device: -1,
    num_epochs: 2,
    callbacks: [
      {
        type: 'wandb_allennlp',
        files_to_save: ['config.json'],
        files_to_save_at_end: ['*.tar.gz'],
      },
    ],
  },
}
  2. Create a sweep configuration file and generate a sweep on the wandb server. Note that tied parameters, which are passed in through environment variables, are specified using the env. prefix in the sweep config. For example:
name: parameter_tying_test_console_script_v0.2.4
program: allennlp
command:
  - ${program} # omit the interpreter since we invoke the allennlp command directly
  - "train-with-wandb" # subcommand
  - "configs/parameter_tying_v0.2.4.jsonnet"
  - "--include-package=models" # add all packages containing your registered classes here
  - "--include-package=allennlp_models"
  - ${args}
method: bayes
metric:
  name: training_loss
  goal: minimize
parameters:
  # Hyperparameters start here.
  # Use the env. prefix for values read through environment variables
  # in the jsonnet; plain dot notation (e.g. model.d) overrides the
  # config directly.
  env.a:
    min: 1
    max: 10
    distribution: uniform
  env.bool_value:
    values: [true, false]
  env.int_value:
    values: [-1, 0, 1, 10]
  model.d:
    value: 1
  3. Create the sweep on wandb:
$ wandb sweep path_to_sweep.yaml
  4. Set the other environment variables required by your jsonnet:
export DATA_PATH=./data
  5. Start the search agents:
wandb agent <sweep_id>
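The sweep ID printed by wandb sweep typically includes the entity and project, so a full invocation might look like this (names are hypothetical):

$ wandb agent my-team/my-project/a1b2c3d4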

wandb-allennlp's People

Contributors

dhruvdcoder, mfa


wandb-allennlp's Issues

Parameter tying through external variables in jsonnet.

The config jsonnet has to be enclosed in a top-level function, as shown in the Top-Level Function section. The common parameters that tie several other parameters together have to be defined as arguments of this top-level function so that their values are taken from the "ext" mechanism (environment variables in our case).
Moreover, their default values can also be specified in the jsonnet.

UPDATE: It turns out the above method does not work, because top-level arguments always have to be passed explicitly to the jsonnet call. So we will use environment variables instead. This means that the config for a sweep has to be written differently from the one for a single run.
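A minimal jsonnet sketch of the two approaches (the parameter a is illustrative, mirroring the sample config earlier in this README):

// Top-level-function style: does NOT work with sweeps, because
// top-level arguments must always be passed explicitly to the jsonnet call.
// function(a=0.5) { model: { a: a, b: a } }

// Environment-variable style used instead: the sweep agent sets the
// variable and the config reads and parses it.
local a = std.parseJson(std.extVar('a'));
{ model: { a: a, b: a } }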

Multi-GPU training errors

Every multi-GPU run I have tried so far results in:

wandb.errors.error.Error: You must call wandb.init() before wandb.config.update

I think this is due to each process calling the update_config method, which tries to send the config to wandb even in worker processes where wandb.init() has not been called.

I thought a possible solution would be to pass the is_master argument from __call__ to update_config:

def update_config(self, trainer: GradientDescentTrainer) -> None:

and only log the config to wandb if is_master is set. A minimal sketch of this change follows.
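Assuming the class and method shapes implied by the traceback below (this is not the actual library code), the change could look like:

from allennlp.training import GradientDescentTrainer


class WandbLogger:  # hypothetical stand-in for the package's callback class
    def __call__(self, trainer: GradientDescentTrainer, metrics=None,
                 epoch: int = -1, is_master: bool = False) -> None:
        if is_master:
            # Only the master process has called wandb.init(), so only it
            # may push the config to the wandb server.
            self.update_config(trainer)

    def update_config(self, trainer: GradientDescentTrainer) -> None:
        self.wandb.config.update(self.config)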

Full Trace:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/opt/conda/lib/python3.7/site-packages/allennlp/commands/train.py", line 443, in _train_worker
metrics = train_loop.run()
File "/opt/conda/lib/python3.7/site-packages/allennlp/commands/train.py", line 505, in run
return self.trainer.train()
File "/opt/conda/lib/python3.7/site-packages/allennlp/training/trainer.py", line 863, in train
callback(self, metrics={}, epoch=-1, is_master=self._master)
File "/opt/conda/lib/python3.7/site-packages/wandb_allennlp/training/callbacks/log_to_wandb.py", line 63, in call
self.update_config(trainer)
File "/opt/conda/lib/python3.7/site-packages/wandb_allennlp/training/callbacks/log_to_wandb.py", line 48, in update_config
self.wandb.config.update(self.config)
File "/opt/conda/lib/python3.7/site-packages/wandb/lib/preinit.py", line 29, in getattr
"You must call wandb.init() before {}.{}".format(self._name, key)
wandb.errors.error.Error: You must call wandb.init() before wandb.config.update

Each run finishes with an error (AllenNLP 1.0.0)

Every run using wandb_allennlp --subcommand=train ends with an error in TeeHandler:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_run.py", line 1172, in _atexit_cleanup
self._on_finish()
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_run.py", line 1292, in _on_finish
self._console_stop() # TODO: there's a race here with jupyter console logging
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_run.py", line 1203, in _console_stop
self._restore()
File "/opt/conda/lib/python3.7/site-packages/wandb/sdk/wandb_run.py", line 1140, in _restore
self._out_redir.uninstall()
File "/opt/conda/lib/python3.7/site-packages/wandb/lib/redirect.py", line 207, in uninstall
self._redirect(to_fd=self._old_fp.fileno(), close=True)
File "/opt/conda/lib/python3.7/site-packages/wandb/lib/redirect.py", line 161, in _redirect
fp.close()
AttributeError: 'TeeHandler' object has no attribute 'close'
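One possible direction for a fix, sketched under the assumption that TeeHandler is a file-like object that duplicates writes to several streams (the real class may differ): give it the close() method that wandb's redirect code calls during cleanup.

class TeeHandler:
    # Hypothetical sketch of a stream tee; the internals are assumed.
    def __init__(self, *streams):
        self._streams = streams  # e.g. the real stdout and a log file

    def write(self, data):
        for stream in self._streams:
            stream.write(data)

    def flush(self):
        for stream in self._streams:
            stream.flush()

    def close(self):
        # wandb's redirect.uninstall() calls fp.close() on the installed
        # stream object, so the method must exist; flushing is enough here.
        self.flush()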

Python yaml package does not parse all float values correctly.

For example, the value 1e-5 will be parsed as the string "1e-5" instead of a float.

Possible solutions:

  1. Use a special regex to convert such values before parsing with yaml.load.
  2. Use another YAML package, such as ruamel.yaml.

How to bypass the issue without fixing it?
Use an environment variable to pass the value to the jsonnet instead of a command-line override.

Ref: https://stackoverflow.com/questions/30458977/yaml-loads-5e-6-as-string-and-not-a-number
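A quick demonstration of the behavior, assuming PyYAML and ruamel.yaml are installed:

from io import StringIO

import yaml  # PyYAML
from ruamel.yaml import YAML

# PyYAML's implicit float resolver does not match '1e-5' (it expects a
# dot in the number), so the value is loaded as a string:
print(type(yaml.safe_load("lr: 1e-5")["lr"]))    # <class 'str'>
print(type(yaml.safe_load("lr: 1.0e-5")["lr"]))  # <class 'float'>

# ruamel.yaml implements YAML 1.2, which accepts the dotless form:
print(type(YAML(typ="safe").load(StringIO("lr: 1e-5"))["lr"]))  # <class 'float'>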

Integrating into AllenNLP package

Have you thought about integrating this functionality into the main AllenNLP package? At least the Wandb training logging should be easy.
