
chanfig's Introduction


Introduction

CHANfiG aims to make your configuration easier.

There are tons of configurable parameters in training a Machine Learning model. To configure all these parameters, researchers usually need to write gigantic config files, sometimes even thousands of lines. Most of the configs are just replicas of the default arguments of certain functions, resulting in many unnecessary declarations. It is also very hard to alter the configurations: one needs to navigate to and open the right configuration file, make changes, save, and exit. This has wasted an uncountable[^1] amount of precious time and is no doubt a crime. Using [argparse][argparse] could relieve the burden to some extent. However, it takes a lot of work to make it compatible with existing config files, and its lack of nesting limits its potential.

CHANfiG would like to make a change.

You just type the alterations in the command line, and leave everything else to CHANfiG.

CHANfiG is highly inspired by YACS. Different from the paradigm of YACS (your code + a YACS config for experiment E (+ external dependencies + hardware + other nuisance terms ...) = reproducible experiment E), the paradigm of CHANfiG is:

your code + command line arguments (+ optional CHANfiG config + external dependencies + hardware + other nuisance terms ...) = reproducible experiment E (+ optional CHANfiG config for experiment E)

Components

A Config is basically a nested dict structure.

However, the default Python dict is hard to manipulate.

The only way to access a dict member is through dict['name'], which is obviously extremely complex. Even worse, if the dict is nested like a config, member access could be something like dict['parent']['children']['name'].

Enough is enough, it is time to make a change.

We need attribute-style access, and we need it now. dict.name and dict.parent.children.name is all you need.

There have been other works that achieve similar attribute-style access to dict members. However, their Config objects either use a separate dict to store information from attribute-style access (EasyDict), which may lead to inconsistency between attribute-style access and dict-style access, or reuse the existing __dict__ and redirect dict-style access (ml_collections), which may result in conflicts between attributes and members of Config.

To overcome the aforementioned limitations, we inherit the Python built-in [dict][dict] to create [FlatDict][chanfig.FlatDict], [DefaultDict][chanfig.DefaultDict], [NestedDict][chanfig.NestedDict], [Config][chanfig.Config], and [Registry][chanfig.Registry]. We also introduce [Variable][chanfig.Variable] to allow sharing a value across multiple places, and [ConfigParser][chanfig.ConfigParser] to parse command line arguments.
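To make the idea of inheriting from dict concrete, here is a minimal sketch of a dict subclass in which attribute-style and dict-style access share a single storage. The class name `AttrDict` is hypothetical and this is not CHANfiG's actual implementation; it only illustrates why a single storage avoids the inconsistency described above.

```python
class AttrDict(dict):
    """Hypothetical dict subclass: attributes and items share one storage."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so real dict
        # methods such as .keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as items so both access styles stay in sync.
        self[name] = value


config = AttrDict()
config.parent = AttrDict()
config.parent.children = AttrDict(name="resnet")

print(config.parent.children.name)           # attribute-style access
print(config["parent"]["children"]["name"])  # dict-style access, same value
```

One limitation of this sketch is that a member named like an existing dict method (for example `keys`) is only reachable through dict-style access.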

FlatDict

[FlatDict][chanfig.FlatDict] improves the default [dict][dict] in 3 aspects.

Dict Operations

[FlatDict][chanfig.FlatDict] supports variable interpolation. Set a member's value to another member's name wrapped in ${}, then call the [interpolate][chanfig.FlatDict.interpolate] method. The value of this member will be automatically replaced with the value of the other member.
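The interpolation idea can be sketched with the standard library alone. The function below is a hypothetical stand-in, not the library's `interpolate` method: it assumes a flat mapping, a single pass, and no chained references.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def interpolate(mapping):
    """Return a copy where ``${name}`` is replaced by the value of ``name``."""
    result = dict(mapping)
    for key, value in result.items():
        if not isinstance(value, str):
            continue
        whole = PLACEHOLDER.fullmatch(value)
        if whole:
            # The value is exactly one placeholder: keep the referenced type.
            result[key] = result[whole.group(1)]
        else:
            # Placeholder embedded in a longer string: substitute as text.
            result[key] = PLACEHOLDER.sub(lambda m: str(result[m.group(1)]), value)
    return result


resolved = interpolate({"lr": 0.01, "warmup_lr": "${lr}", "name": "run-${lr}"})
print(resolved)
```

A value that is exactly one placeholder keeps the type of the referenced member (here a float), while a placeholder embedded in a longer string is substituted as text.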

[dict][dict] in Python has been ordered since Python 3.7, but there isn't a built-in method to help you sort a [dict][dict]. [FlatDict][chanfig.FlatDict] supports [sort][chanfig.FlatDict.sort] to help you manage your dict.

[FlatDict][chanfig.FlatDict] incorporates a [merge][chanfig.FlatDict.merge] method that allows you to merge a Mapping, an Iterable, or a path into the [FlatDict][chanfig.FlatDict]. Different from the built-in [update][dict.update], [merge][chanfig.FlatDict.merge] assigns values instead of replacing them, which makes it work better with [DefaultDict][chanfig.DefaultDict].

Moreover, [FlatDict][chanfig.FlatDict] comes with [difference][chanfig.FlatDict.difference] and [intersect][chanfig.FlatDict.intersect], which make it very easy to compare a [FlatDict][chanfig.FlatDict] with another Mapping, Iterable, or path.

ML Operations

[FlatDict][chanfig.FlatDict] supports a [to][chanfig.FlatDict.to] method similar to that of a PyTorch Tensor. You can convert all member values of a [FlatDict][chanfig.FlatDict] to a certain type, or move them to a device, in the same way.

[FlatDict][chanfig.FlatDict] also integrates [cpu][chanfig.FlatDict.cpu], [gpu][chanfig.FlatDict.gpu] ([cuda][chanfig.FlatDict.cuda]), and [tpu][chanfig.FlatDict.tpu] ([xla][chanfig.FlatDict.xla]) methods for easier access.

IO Operations

[FlatDict][chanfig.FlatDict] provides [json][chanfig.FlatDict.json], [jsons][chanfig.FlatDict.jsons], [yaml][chanfig.FlatDict.yaml] and [yamls][chanfig.FlatDict.yamls] methods to dump [FlatDict][chanfig.FlatDict] to a file or string. It also provides [from_json][chanfig.FlatDict.from_json], [from_jsons][chanfig.FlatDict.from_jsons], [from_yaml][chanfig.FlatDict.from_yaml] and [from_yamls][chanfig.FlatDict.from_yamls] methods to build a [FlatDict][chanfig.FlatDict] from a string or file.

[FlatDict][chanfig.FlatDict] also includes dump and load methods which determine the type by their extension and dump/load [FlatDict][chanfig.FlatDict] to/from a file.

DefaultDict

To facilitate the needs of default values, we incorporate [DefaultDict][chanfig.DefaultDict] which accepts default_factory and works just like a [collections.defaultdict][collections.defaultdict].
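For reference, this is how the standard-library `collections.defaultdict` behaves with a `default_factory`. The example uses only the standard library, not CHANfiG itself, and the key names are made up for illustration.

```python
from collections import defaultdict

counts = defaultdict(int)    # missing keys start at int() == 0
counts["epoch"] += 1

groups = defaultdict(list)   # missing keys start as an empty list
groups["layers"].append("encoder")

print(dict(counts))
print(dict(groups))
```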

NestedDict

Since most Configs are in a nested structure, we further propose a [NestedDict][chanfig.NestedDict].

Based on [DefaultDict][chanfig.DefaultDict], [NestedDict][chanfig.NestedDict] provides [all_keys][chanfig.NestedDict.all_keys], [all_values][chanfig.NestedDict.all_values], and [all_items][chanfig.NestedDict.all_items] methods to allow iterating over the whole nested structure at once.
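Iterating over a whole nested structure at once can be sketched as a recursive generator. The function below is a hypothetical stand-alone illustration, not the library's method, and the dotted key format it produces is an assumption.

```python
def all_items(mapping, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf of a nested dict."""
    for key, value in mapping.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Descend into nested dicts, carrying the path so far.
            yield from all_items(value, path)
        else:
            yield path, value


nested = {"model": {"encoder": {"num_layers": 6}, "dropout": 0.1}, "lr": 1e-3}
flat = list(all_items(nested))
print(flat)
```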

[NestedDict][chanfig.NestedDict] also comes with [apply][chanfig.NestedDict.apply] and [apply_][chanfig.NestedDict.apply_] methods, which make it easier to manipulate the nested structure.

Config

[Config][chanfig.Config] extends the functionality by supporting [freeze][chanfig.Config.freeze] and [defrost][chanfig.Config.defrost], and by adding a built-in [ConfigParser][chanfig.ConfigParser] to parse command line arguments.

Note that [Config][chanfig.Config] also has default_factory=Config() by default for convenience.
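The freeze and defrost idea can be sketched as a dict that refuses item assignment while locked. `FreezableDict` is a hypothetical name, and the choice of `ValueError` is an assumption rather than the library's documented behavior.

```python
class FreezableDict(dict):
    """Hypothetical dict that can be locked against modification."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._frozen = False  # plain instance attribute, not a dict item

    def freeze(self):
        self._frozen = True
        return self

    def defrost(self):
        self._frozen = False
        return self

    def __setitem__(self, key, value):
        if self._frozen:
            raise ValueError(f"cannot set {key!r}: dict is frozen")
        super().__setitem__(key, value)

    def __delitem__(self, key):
        if self._frozen:
            raise ValueError(f"cannot delete {key!r}: dict is frozen")
        super().__delitem__(key)


settings = FreezableDict(lr=0.1).freeze()
try:
    settings["lr"] = 1.0
except ValueError as error:
    print(error)

settings.defrost()["lr"] = 1.0  # allowed again after defrosting
```

This sketch only guards `d[key] = value` and `del d[key]`; methods such as `update` and `pop` bypass `__setitem__` on dict subclasses, so a complete implementation would need to guard them as well.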

Registry

[Registry][chanfig.Registry] extends the [NestedDict][chanfig.NestedDict] and supports [register][chanfig.Registry.register], [lookup][chanfig.Registry.lookup], and [build][chanfig.Registry.build] to help you register constructors and build objects from a [Config][chanfig.Config].

[ConfigRegistry][chanfig.ConfigRegistry] is a subclass of [Registry][chanfig.Registry] that is specifically designed for building objects from a [Config][chanfig.Config] or a [dataclass][dataclasses.dataclass]. Just specify the key when creating the registry and pass config to the build method, and you will get the object you want.
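The register, lookup, and build pattern can be sketched in a few lines. `SimpleRegistry`, the `"name"` key, and the `MLP` class are all hypothetical; the real signatures in CHANfiG may differ.

```python
class SimpleRegistry(dict):
    """Hypothetical registry mapping names to constructors."""

    def register(self, name=None):
        def decorator(obj):
            # Fall back to the object's own name when none is given.
            self[name or obj.__name__] = obj
            return obj
        return decorator

    def lookup(self, name):
        return self[name]

    def build(self, config):
        # Assumption: the config stores the constructor's name under "name"
        # and every other entry is a keyword argument.
        kwargs = dict(config)
        name = kwargs.pop("name")
        return self.lookup(name)(**kwargs)


MODELS = SimpleRegistry()


@MODELS.register("mlp")
class MLP:
    def __init__(self, hidden_size=64, dropout=0.1):
        self.hidden_size = hidden_size
        self.dropout = dropout


model = MODELS.build({"name": "mlp", "hidden_size": 128})
print(type(model).__name__, model.hidden_size, model.dropout)
```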

Variable

Have one value for multiple names at multiple places? We got you covered.

Just wrap the value with [Variable][chanfig.Variable], and one alteration will be reflected everywhere.

[Variable][chanfig.Variable] supports type, choices, validator, and required to ensure the correctness of the value.

To make it even easier, [Variable][chanfig.Variable] also supports help to provide a description when using [ConfigParser][chanfig.ConfigParser].
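The sharing idea can be illustrated with a small mutable wrapper: several places hold the same object, so one change is visible everywhere. `SharedValue` is a hypothetical class that only mimics the `choices` check; it is not the library's `Variable`.

```python
class SharedValue:
    """Hypothetical wrapper; every holder sees the same underlying value."""

    def __init__(self, value, choices=None):
        self.choices = choices
        self.set(value)

    def set(self, value):
        # Reject values outside the allowed choices, if any were given.
        if self.choices is not None and value not in self.choices:
            raise ValueError(f"{value!r} is not one of {self.choices!r}")
        self.value = value


num_classes = SharedValue(10)
config = {
    "data": {"num_classes": num_classes},
    "model": {"num_classes": num_classes},
}

num_classes.set(100)  # one alteration, reflected in both places
print(config["data"]["num_classes"].value, config["model"]["num_classes"].value)
```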

ConfigParser

[ConfigParser][chanfig.ConfigParser] extends [ArgumentParser][argparse.ArgumentParser] and provides [parse][chanfig.ConfigParser.parse] and [parse_config][chanfig.ConfigParser.parse_config] to parse command line arguments.

Usage

CHANfiG has great backward compatibility with previous configs.

Whether your old config is JSON or YAML, you can read from it directly.

And if you are using yacs, just replace CfgNode with [Config][chanfig.Config] and enjoy all the additional benefits that CHANfiG provides.

Moreover, if you find a name in the config is too long for the command line, you can simply call [self.add_argument][chanfig.Config.add_argument] with a proper dest to use a shorter name on the command line, as you would with argparse.

--8<-- "demo/config.py"

All you need to do is just run a line:

python main.py --model.encoder.num_layers 8 --model.dropout=0.2 --lr 5e-3
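To illustrate how dotted option names like those in the command above can map onto a nested config, here is a rough standard-library sketch. `parse_overrides` is a hypothetical helper, not how [ConfigParser][chanfig.ConfigParser] is implemented, and the literal-based value conversion is an assumption.

```python
import ast


def parse_overrides(argv):
    """Turn ['--a.b', '1', '--c=2'] into {'a': {'b': 1}, 'c': 2}."""
    config = {}
    tokens = iter(argv)
    for token in tokens:
        if not token.startswith("--"):
            raise ValueError(f"expected an option, got {token!r}")
        if "=" in token:
            name, raw = token[2:].split("=", 1)
        else:
            name = token[2:]
            raw = next(tokens, None)  # the value is the following token
            if raw is None:
                raise ValueError(f"missing value for --{name}")
        try:
            value = ast.literal_eval(raw)  # numbers, booleans, lists, ...
        except (ValueError, SyntaxError):
            value = raw                    # fall back to a plain string
        *parents, leaf = name.split(".")
        node = config
        for part in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


overrides = parse_overrides(
    ["--model.encoder.num_layers", "8", "--model.dropout=0.2", "--lr", "5e-3"]
)
print(overrides)
# {'model': {'encoder': {'num_layers': 8}, 'dropout': 0.2}, 'lr': 0.005}
```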

You could also load a default configuration file and make changes based on it:

Note, you must specify config.parse(default_config='config') to correctly load the default config.

python main.py --config meow.yaml --model.encoder.num_layers 8 --model.dropout=0.2 --lr 5e-3

If you have made it dump the current configuration, this should be in the written file:

=== "yaml"

``` yaml
--8<-- "demo/config.yaml"
```

=== "json"

``` json
--8<-- "demo/config.json"
```

Define the default arguments in functions, put alterations in the CLI, and leave the rest to CHANfiG.

Installation

=== "Install the most recent stable version on pypi"

```shell
pip install chanfig
```

=== "Install the latest version from source"

```shell
pip install git+https://github.com/ZhiyuanChen/CHANfiG
```

It works the way it should have worked.

License

CHANfiG is multi-licensed under the following licenses:

=== "The Unlicense"

```
--8<-- "LICENSES/LICENSE.Unlicense"
```

=== "GNU Affero General Public License v3.0 or later"

```
--8<-- "LICENSES/LICENSE.AGPL"
```

=== "GNU General Public License v2.0 or later"

```
--8<-- "LICENSES/LICENSE.GPLv2"
```

=== "BSD 4-Clause 'Original' or 'Old' License"

```
--8<-- "LICENSES/LICENSE.BSD"
```

=== "MIT License"

```
--8<-- "LICENSES/LICENSE.MIT"
```

=== "Apache License 2.0"

```
--8<-- "LICENSES/LICENSE.Apache"
```

You can choose any (one or more) of these licenses if you use this work.

SPDX-License-Identifier: Unlicense OR AGPL-3.0-or-later OR GPL-2.0-or-later OR BSD-4-Clause OR MIT OR Apache-2.0

Footnotes

[^1]: Fun fact: time is always uncountable.


chanfig's Issues

load config from command arguments

Hi, I want to use --from_yaml to specify which config file to use, load it into a Config, and then use Config.parse() to modify the loaded config.

The current code is:

from chanfig import Config
from rich import print


if __name__ == '__main__':
    yaml_config = Config()
    yaml_config.add_argument('--from_yaml', type=str, required=True)
    yaml_config.parse()
    config = Config.from_yaml(yaml_config.from_yaml)
    config = config.parse()
    print(config)

However, this is a little ugly, and when run with --help, it only shows the help message of yaml_config, since the first parser is used.

Is there a more elegant way to do so?

Can a Config be wrapped with Variable?

For now, I am using this code to get train_args.

from chanfig import Config, Variable
from . import data as data_cfg


class BaseConfig(Config):
    def __init__(self) -> None:
        super().__init__()
        self.train_data: Config = data_cfg.ImageNet

    def get_train_args(self) -> Config:
        train_args: Config = Config({
            'data': self.train_data,
        })
        return train_args

Can the Config be wrapped with Variable such that we could do so?

from chanfig import Config, Variable
from . import data as data_cfg


class BaseConfig(Config):
    def __init__(self) -> None:
        super().__init__()
        self.train_data: Config = data_cfg.ImageNet
        self.train_args: Config = Config({
            'data': self.train_data,
        })

so that when run with --train_data.name, train_args.data.name also changes?

Strange error when dict key is a list

Traceback (most recent call last):
  File "/home/v-zhichen/chanfig/chanfig/flat_dict.py", line 195, in __getitem__
    return self.get(name, default=Null)
  File "/home/v-zhichen/chanfig/chanfig/flat_dict.py", line 188, in get
    if name in self:
TypeError: unhashable type: 'list'

Support to read multiple config file and combine them

But we cannot write "load model.yaml" in a yaml config and have it loaded correctly, right? I ran into this issue when writing configurations for my experiments. Since a model's parameters are numerous (refer to transformers or diffusers) and relatively fixed, and a dataset's parameters can be reused across many experiments, I decided to separate these configuration files. But without a grammar like ${import xx.yaml}$, I ended up writing a Python config, as shown in #16.

Originally posted by @dc3ea9f in #14 (comment)

from_yamls() is broken with chanfig[include]

For chanfig with yaml include support, the following code snippet

with open('foo.yaml', 'r') as f:
    foo_str = f.read()

c = Config().from_yamls(foo_str)

is broken with the following error:

File "/home/dev/.local/lib/python3.10/site-packages/chanfig/flat_dict.py", line 1083, in from_yamls
    return cls.from_dict(yaml_load(string, *args, **kwargs))
  File "/opt/miniconda/lib/python3.10/site-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
  File "/opt/miniconda/lib/python3.10/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/opt/miniconda/lib/python3.10/site-packages/yaml/constructor.py", line 60, in construct_document
    for dummy in generator:
  File "/opt/miniconda/lib/python3.10/site-packages/yaml/constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/opt/miniconda/lib/python3.10/site-packages/yaml/constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/opt/miniconda/lib/python3.10/site-packages/yaml/constructor.py", line 143, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/opt/miniconda/lib/python3.10/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/opt/miniconda/lib/python3.10/site-packages/yamlinclude/constructor.py", line 105, in __call__
    return self.load(loader, *args, **kwargs)
  File "/opt/miniconda/lib/python3.10/site-packages/yamlinclude/constructor.py", line 192, in load
    raise YamlIncludeFileTypeException(
yamlinclude.constructor.YamlIncludeFileTypeException: Relative include only supported for regular files, got <byte string> instead.

wrong output when use with rich.print

Hi, thank you for your awesome project. I noticed there is a bug when using rich.print.

Test Code:

from chanfig import Config
from rich import print as pprint


if __name__ == '__main__':
    config = Config(**{'hello': 'world'})
    print('print', config)
    pprint('rich.print', config)
    print(config.__rich__)
    print(config.keys())

Output:

print Config(<class 'chanfig.config.Config'>,
  ('hello'): 'world'
)
rich.print Config(<class 'chanfig.config.Config'>,
  ('hello'): 'world'
  ('__rich__'): Config(<class 'chanfig.config.Config'>, )
  ('aihwerij235234ljsdnp34ksodfipwoe234234jlskjdf'): Config(<class 'chanfig.config.Config'>, )
)
Config(<class 'chanfig.config.Config'>, )
dict_keys(['hello', '__rich__', 'aihwerij235234ljsdnp34ksodfipwoe234234jlskjdf'])

Expected behavior: rich.print should output the same as print.

I don't know if the weird behavior is caused by rich or chanfig, but it seems that rich.print injects some keys into my config. I hope you can help me fix this issue.

Thanks.

Feature Request: variable interpolation in yaml file

Hi, I just found an exciting and very useful feature, "variable interpolation", provided by OmegaConf.

Detailed documents can be found here. And here is a comparison: [image]

In short, it can lazily interpolate values that are annotated with ${}. I think this kind of reference would be useful when creating configurations from yaml, just like Variable in chanfig.

How do you like it?

Best!

Repetitive setting for model, dataset,...etc

Hi, I love the ideas you introduce in this repo. In the best practice you demo in the README.md, I noticed something like:
model = Model(**config.model); optimizer=Optimizer(**config.optimizer); ...
which is just a perfect idea.
However, there are many settings that are used by several components, such as the model and the optimizer.
How should I handle this when using CHANfiG?

Best!

Feature Request: help message, type checker and input restrictions

When using this library, I typically write a yaml or json file as a configuration and load it using chanfig.Config. However, I have encountered two issues that make the process a bit uncomfortable:

  1. When running the program with the --help option, only the parameters added with config.add_argument are displayed, which is inconvenient. It would be better if the full help message, similar to what argparse provides, could be shown.
  2. After writing the configuration file, I always need to write a lot of code to manually verify the input type and check any other restrictions, to ensure that the user hasn't entered invalid input (e.g., using --type and --choices in argparse). It would be more convenient if we could use a comment (use JSONC for JSON) to restrict the input type and other conditions. For example, a configuration file in YAML format could look like this:
model:
  activation: 'ReLU' // **choices: ['ReLU', 'ELU', 'GeLU']; type: str**
exp:
  seed: 42 // **type: int**

Comments quoted with ** would be parsed into an input checker by CHANfiG. When there is no comment, the CHANfiG parser would fall back to the current behavior. Although it is a bit ugly, I haven't found a better solution so far.

I believe addressing these points would greatly enhance the development experience with CHANfiG. I would appreciate any comments or suggestions you may have on these matters.
