
aim's Introduction

A super-easy way to record, search and compare AI experiments.



Play with the live demo and check out a short intro video.

Integrate seamlessly with your favorite tools

Getting started in three steps

1. Install Aim in your training environment

$ pip install aim

2. Integrate Aim with your code

Flexible integration for any Python script
import aim

# Save inputs, hparams or any other `key: value` pairs
aim.set_params(hyperparam_dict, name='hparams') # Passing name argument is optional

...
for step in range(10):
    # Log metrics to visualize performance
    aim.track(metric_value, name='metric_name', epoch=epoch_number)
...

See documentation here.

PyTorch Lightning integration
from aim.pytorch_lightning import AimLogger

...
trainer = pl.Trainer(logger=AimLogger(experiment='experiment_name'))
...

See documentation here.

Keras & tf.keras integrations
import aim

# Save inputs, hparams or any other `key: value` pairs
aim.set_params(param_dict, name='params_name') # Passing name argument is optional

...
model.fit(x_train, y_train, epochs=epochs, callbacks=[
    # For standalone Keras:
    aim.keras.AimCallback(aim.Session(experiment='experiment_name'))

    # For tf.keras, use this callback instead:
    # aim.tensorflow.AimCallback(aim.Session(experiment='experiment_name'))
])
...

See documentation here.

3. Run the training like you are used to and start Aim UI

$ aim up


Installation

To install Aim, you need Python 3 and pip3 installed in your environment.

  1. Install the Aim Python package
$ pip install aim

To start Aim UI, you also need Docker installed.

$ aim up

Concepts

  • Run - a single training run
  • Experiment - a group of associated training runs

Where is the Data Stored

When training code instrumented with the Aim Python Library is run, Aim automatically creates a .aim directory where the project is located. All the metadata tracked during training via the Python Library is stored in .aim. See also aim init, an optional alternative way to initialize an Aim repository.
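
As a quick check of the behavior described above, the following sketch tests whether a project directory already contains a .aim directory. The helper name has_aim_repo is hypothetical and not part of Aim, and the temporary directory only simulates a project.

```python
import tempfile
from pathlib import Path


def has_aim_repo(project_dir: str) -> bool:
    """Return True if project_dir contains a .aim directory."""
    return (Path(project_dir) / '.aim').is_dir()


with tempfile.TemporaryDirectory() as project:
    before = has_aim_repo(project)
    # Simulate what Aim is described as doing on first use (assumption).
    (Path(project) / '.aim').mkdir()
    after = has_aim_repo(project)

print(before, after)
```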

Python Library

Use the Python Library to instrument your training code and record experiments. Start by importing it:

import aim

Afterwards, use the following two functions to track metrics and params, respectively.

...
aim.track(metric_val, name='metric_name', epoch=current_epoch)
aim.set_params(hyperparam_dict, name='dict_name')
...

track

aim.track(value, name='metric_name' [, epoch=epoch] [, **context_args])

Parameters

  • value - the metric value to track, of type int or float
  • name - the name of the metric to track, of type str (snake_case preferred)
  • epoch - an optional value of the epoch being tracked
  • context_args - any other keyword arguments, stored as key-value context for the metric

Examples

aim.track(0.01, name='loss', epoch=43, subset='train', dataset='train_1')
aim.track(0.003, name='loss', epoch=43, subset='val', dataset='val_1')

Once metrics are tracked this way, the following search expressions become available:

loss if context.subset in (train, val) # Retrieve all losses in both train and val phase
loss if context.subset == train and context.dataset in (train_1) # Retrieve all losses in train phase with given datasets

Any key-value pair can be passed this way to enrich the context of a metric and enable more detailed search.
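
To make the idea of context concrete, here is a minimal sketch of how key-value context can drive filtering. InMemoryTracker and its select method are hypothetical stand-ins written for this illustration; they are not part of the aim package and say nothing about how Aim stores data internally.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TrackedPoint:
    """One recorded value, mirroring the arguments of aim.track()."""
    value: float
    name: str
    epoch: Optional[int] = None
    context: Dict[str, Any] = field(default_factory=dict)


class InMemoryTracker:
    """Hypothetical stand-in for a metric store; not part of the aim package."""

    def __init__(self) -> None:
        self.points: List[TrackedPoint] = []

    def track(self, value: float, name: str,
              epoch: Optional[int] = None, **context_args: Any) -> None:
        # Every extra keyword argument becomes part of the metric's context.
        self.points.append(TrackedPoint(value, name, epoch, dict(context_args)))

    def select(self, name: str, **context_filter: Any) -> List[TrackedPoint]:
        # Keep points of the given metric whose context matches every filter key.
        return [
            p for p in self.points
            if p.name == name
            and all(p.context.get(k) == v for k, v in context_filter.items())
        ]


tracker = InMemoryTracker()
tracker.track(0.01, name='loss', epoch=43, subset='train', dataset='train_1')
tracker.track(0.003, name='loss', epoch=43, subset='val', dataset='val_1')

# Roughly what `loss if context.subset == train` asks for.
train_losses = tracker.select('loss', subset='train')
print([p.value for p in train_losses])
```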


set_params

aim.set_params(dict_value, name)

Parameters

  • dict_value - any dictionary relevant to the training
  • name - a name for the dictionary

Examples

# Any dictionary can go here
hyperparam_dict = {
    'learning_rate': 0.0001,
    'batch_size': 32,
}
aim.set_params(hyperparam_dict, name='params')

These params can later be used in search expressions such as the following:

loss if params.learning_rate < 0.01 # All the runs where learning rate is less than 0.01
loss if params.learning_rate == 0.0001 and params.batch_size == 32 # all the runs where learning rate is 0.0001 and batch_size is 32

Note: if set_params is called several times with the same name, all the dictionaries are combined in one place on the UI.

flush

aim.flush()

Aim calculates intermediate values of metrics for aggregation during tracking. This method is called automatically at a given frequency (see Session) and at the end of the run. Use it to flush those values to disk manually.
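
The interplay between automatic and manual flushing can be sketched as follows. BufferedAggregator is a hypothetical class written for this illustration, and the choice of min and max as the aggregated values is an assumption; the source does not say which aggregates Aim keeps.

```python
from typing import Dict, List


class BufferedAggregator:
    """Hypothetical illustration of periodic flushing; not Aim's implementation."""

    def __init__(self, flush_frequency: int = 128) -> None:
        self.flush_frequency = flush_frequency
        self.steps_since_flush = 0
        self.pending: Dict[str, float] = {}         # aggregates held in memory
        self.flushed: List[Dict[str, float]] = []   # stands in for the disk

    def track(self, value: float) -> None:
        # Update the in-memory aggregates (min and max are assumed here).
        self.pending['min'] = min(value, self.pending.get('min', value))
        self.pending['max'] = max(value, self.pending.get('max', value))
        self.steps_since_flush += 1
        # Flush automatically once enough steps have accumulated.
        if self.steps_since_flush >= self.flush_frequency:
            self.flush()

    def flush(self) -> None:
        # Persist a snapshot of the aggregates and restart the step counter.
        if self.steps_since_flush:
            self.flushed.append(dict(self.pending))
            self.steps_since_flush = 0


agg = BufferedAggregator(flush_frequency=3)
for value in [0.9, 0.7, 0.8, 0.5]:
    agg.track(value)

# Three steps triggered one automatic flush; the fourth value is still pending.
agg.flush()  # manual flush, analogous in spirit to aim.flush()
print(agg.flushed)
```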

Session

Use Session to specify a custom .aim directory or an experiment from code.

Class aim.Session()

Parameters

  • repo - Full path to the parent directory of the Aim repo, that is, of the .aim directory. (optional)
  • experiment - A name of the experiment. See Concepts. (optional)
  • flush_frequency - How often, in steps, intermediate aggregated values of metrics are flushed to disk. Defaults to every 128 steps. (optional)

Returns

  • Session object to attribute recorded training run to.

Methods

  • track() - Tracks metrics within the session

  • set_params() - Sets session params

  • flush() - Flushes intermediate aggregated metrics to disk. This method is called at a given frequency and at the end of the run automatically.

  • close() - Closes the session. If not invoked, the session will be automatically closed when the training is done.

Examples

  • Here are a few examples of how to use the aim.Session in code
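
As a rough sketch of the Session lifecycle, the function below records params and a per-epoch loss on any object that offers the methods listed above. FakeSession is a hypothetical stand-in so the example runs without Aim installed; with Aim available, an aim.Session(experiment='experiment_name') instance would be the natural argument, although that path is not exercised here.

```python
from typing import Any, Dict, List, Tuple


class FakeSession:
    """Hypothetical stand-in exposing the Session methods listed above."""

    def __init__(self, experiment: str) -> None:
        self.experiment = experiment
        self.params: Dict[str, Dict[str, Any]] = {}
        self.tracked: List[Tuple[str, float, int, Dict[str, Any]]] = []
        self.closed = False

    def set_params(self, dict_value: Dict[str, Any], name: str) -> None:
        self.params.setdefault(name, {}).update(dict_value)

    def track(self, value: float, name: str, epoch: int, **context_args: Any) -> None:
        self.tracked.append((name, value, epoch, context_args))

    def flush(self) -> None:
        pass  # nothing is buffered in this stand-in

    def close(self) -> None:
        self.closed = True


def train(session: Any, learning_rate: float, epochs: int) -> None:
    """Record params and a dummy loss per epoch on a Session-like object."""
    session.set_params({'learning_rate': learning_rate}, name='hparams')
    try:
        for epoch in range(epochs):
            loss = 1.0 / (epoch + 1)  # placeholder for a real training step
            session.track(loss, name='loss', epoch=epoch, subset='train')
    finally:
        # Close explicitly instead of relying on automatic closing.
        session.close()


session = FakeSession(experiment='experiment_name')
train(session, learning_rate=0.001, epochs=3)
print(session.tracked)
```

Closing in a finally block means the session is closed even if a training step raises, instead of relying on the automatic close at the end of the run.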

Automatic Tracking

Automatic tracking allows you to track metrics without the need for explicit track statements.

TensorFlow and Keras

Pass an instance of aim.tensorflow.AimCallback to the trainer callbacks list.

Note: Logging for pure keras is handled by aim.keras.AimCallback

Parameters

  • session - Aim Session instance (optional)

Examples

from aim import Session
from aim.tensorflow import AimCallback 
# Use `from aim.keras import AimCallback` in case of keras

...
aim_session = Session(experiment='experiment_name')
model.fit(x_train, y_train, epochs=epochs, callbacks=[
    AimCallback(aim_session)
])
...

TensorFlow v1 full example here
TensorFlow v2 full example here
Keras full example here

PyTorch Lightning

Pass aim.pytorch_lightning.AimLogger instance as logger to pl.Trainer to log metrics and parameters automatically.

Parameters

  • repo - Full path to parent directory of Aim repo - the .aim directory (optional)
  • experiment - A name of the experiment (optional)
  • train_metric_prefix - The prefix of metric names collected in the training loop. Default: train_ (optional)
  • test_metric_prefix - The prefix of metric names collected in the test loop. Default: test_ (optional)
  • val_metric_prefix - The prefix of metric names collected in the validation loop. Default: val_ (optional)
  • flush_frequency - How often, in steps, intermediate aggregated values of metrics are flushed to disk. Defaults to every 128 steps. (optional)

Examples

from aim.pytorch_lightning import AimLogger

...
aim_logger = AimLogger(experiment='pt_lightning_exp')
trainer = pl.Trainer(logger=aim_logger)
trainer.fit(model, train_loader, val_loader)
...

Full example here
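
One plausible reading of the metric prefixes in the parameter list above is that they let the logger tell training, validation and test metrics apart by name. The sketch below illustrates that idea only. The split_metric_name helper is hypothetical, and mapping a prefix to a subset label is an assumption rather than documented AimLogger behavior.

```python
from typing import Dict, Optional, Tuple

# Default prefixes from the parameter list, mapped to assumed subset labels.
DEFAULT_PREFIXES: Dict[str, str] = {
    'train_': 'train',
    'val_': 'val',
    'test_': 'test',
}


def split_metric_name(name: str) -> Tuple[str, Optional[str]]:
    """Split a prefixed metric name into (bare name, subset)."""
    for prefix, subset in DEFAULT_PREFIXES.items():
        if name.startswith(prefix):
            return name[len(prefix):], subset
    # Names without a known prefix are returned unchanged.
    return name, None


print(split_metric_name('val_loss'))
print(split_metric_name('learning_rate'))
```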

Searching Experiments

AimQL is a simple, Python-like query language for searching experiments. Here are the ways you can search on Aim:

  • Search by experiment name - experiment == {name}
  • Search by run - run.hash == "{run_hash}" or run.hash in ("{run_hash_1}", "{run_hash_2}") or run.archived is True
  • Search by param - params.{key} == {value}
  • Search by context - context.{key} == {value}

Search Examples

  • Display the losses and accuracy metrics of experiments whose learning rate is 0.001:
    • loss, accuracy if params.learning_rate == 0.001
  • Display the train loss of experiments whose learning rate is greater than 0.0001:
    • loss if context.subset == train and params.learning_rate > 0.0001
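
To show what such a query selects, here is a minimal Python sketch that applies the second example to in-memory data. The run hashes, params and metric values are made up, and the search function is a hypothetical illustration rather than how AimQL is implemented.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical data: two runs with made-up hashes, params and metric series.
RUNS: List[Dict[str, Any]] = [
    {
        'hash': 'run_a',
        'params': {'learning_rate': 0.001},
        'metrics': [
            {'name': 'loss', 'context': {'subset': 'train'}, 'values': [0.9, 0.4]},
            {'name': 'loss', 'context': {'subset': 'val'}, 'values': [1.0, 0.6]},
        ],
    },
    {
        'hash': 'run_b',
        'params': {'learning_rate': 0.00005},
        'metrics': [
            {'name': 'loss', 'context': {'subset': 'train'}, 'values': [0.8, 0.7]},
        ],
    },
]


def search(
    runs: List[Dict[str, Any]],
    metric_names: List[str],
    condition: Callable[[Dict[str, Any], Dict[str, Any]], bool],
) -> List[Tuple[str, str, List[float]]]:
    """Return (run hash, metric name, values) for every series passing the condition."""
    results = []
    for run in runs:
        for metric in run['metrics']:
            if metric['name'] in metric_names and condition(run['params'], metric['context']):
                results.append((run['hash'], metric['name'], metric['values']))
    return results


# Python equivalent of:
#   loss if context.subset == train and params.learning_rate > 0.0001
matches = search(
    RUNS,
    ['loss'],
    lambda params, context: context.get('subset') == 'train'
    and params.get('learning_rate', 0) > 0.0001,
)
print(matches)
```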

Check out this demo project deployment to play around with search.

Command Line Interface

Aim CLI offers a simple interface to organize and record your experiments. Paired with the Python Library, it lets you record, search and compare AI experiments. The following commands are supported:

Command      Description
init         Initialize the aim repository.
version      Display the version of the Aim CLI currently installed.
experiment   Create a new experiment to group similar training runs into.
up           Run Aim web UI for the given repo.
down         Turn off the UI.
upgrade      Upgrade the UI to its latest version.
pull         Pull the UI of the given version.

init

This step is optional. Initialize the aim repo to record the experiments.

$ aim init

Creates .aim directory to save the recorded experiments to. Running aim init in an existing repository will prompt the user for re-initialization.

Beware: Re-initializing the repo clears previously saved data from the .aim folder and initializes a new repo.

Note: This command is not required to get started with Aim, as Aim initializes automatically on the first Aim function call.

version

Display the Aim version installed.

$ aim version

experiment

Create new experiments to organize the training runs. Here is how it works:

$ aim experiment COMMAND [ARGS]
Command    Args                     Description
add        -n | --name <exp_name>   Add a new experiment with the given name.
checkout   -n | --name <exp_name>   Switch to the experiment with the given name.
ls                                  List all the experiments of the repo.
rm         -n | --name <exp_name>   Remove the experiment with the given name.

Disclaimer: Removing the experiment also removes the recorded experiment runs data.

up

Start the Aim web UI locally. Aim UI is a Docker container that mounts the .aim folder and lets researchers manage, search and start new training runs.

$ aim up [ARGS]
Args                        Description
-h | --host <host>          Specify host address.
-p | --port <port>          Specify port to listen on.
-v | --version <version>    Version of Aim UI to run. Default: latest.
--repo <repo_path>          Path to parent directory of .aim repo. Default: current working directory.
-d | --detach               Run Aim UI in detached mode.
--tf_logs <logs_dir_path>   Use Aim to search and compare TensorBoard experiments. More details in TensorBoard Experiments.

Disclaimer: Aim UI runs in a Docker container, so Docker must be installed in the training environment for the UI to start.

Please make sure to run aim up in the directory where .aim is located.
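
When launching the UI from a script, the documented options can be assembled into an argument list. The sketch below only builds and prints the command without running it. The helper build_aim_up_command is hypothetical, and port 8080 is an arbitrary example value.

```python
import shlex
from typing import List, Optional


def build_aim_up_command(
    host: Optional[str] = None,
    port: Optional[int] = None,
    version: Optional[str] = None,
    repo: Optional[str] = None,
    detach: bool = False,
    tf_logs: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `aim up` from the options documented above."""
    command = ['aim', 'up']
    if host is not None:
        command += ['--host', host]
    if port is not None:
        command += ['--port', str(port)]
    if version is not None:
        command += ['--version', version]
    if repo is not None:
        command += ['--repo', repo]
    if detach:
        command.append('--detach')
    if tf_logs is not None:
        command += ['--tf_logs', tf_logs]
    return command


command = build_aim_up_command(port=8080, detach=True)
print(shlex.join(command))
```

On a machine with Aim and Docker installed, the resulting list could be passed to subprocess.run(command, check=True).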

down

Turn off Aim UI manually:

$ aim down [ARGS]
Args                 Description
--repo <repo_path>   Path to parent directory of .aim repo. Default: current working directory.

upgrade

Upgrade Aim UI to its latest version:

$ aim upgrade

pull

Pulls Aim UI of the given version:

$ aim pull -v <version>

TensorBoard Experiments

Easily run Aim on experiments visualized by TensorBoard. Here is how:

$ aim up --tf_logs path/to/logs

This command will spin up Aim on the TensorFlow summary logs and load the logs recursively from the given path. Use tf: prefix to select and display metrics logged with tf.summary in the dashboard, for example tf:accuracy.

Tensorboard search example here
