shelf's Introduction

shelf - a lightweight Python artefact store client

What is it?

shelf combines the pytree registry from JAX with the fsspec project.

As with pytrees in JAX, registering a pair of serialization and deserialization callbacks lets you save your custom Python types as files anywhere fsspec can reach!
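To make the registry idea concrete, here is a minimal sketch of a type registry with save/load callbacks. It is an illustration only, not shelf's implementation: `register`, `lookup`, `roundtrip`, and the `Point` type are hypothetical names, and it writes to a local temporary directory instead of going through fsspec.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

SaveFn = Callable[[Any, Path], None]
LoadFn = Callable[[Path], Any]

# Hypothetical stand-in for a type registry; not shelf's actual code.
_REGISTRY: Dict[type, Tuple[SaveFn, LoadFn]] = {}


def register(t: type, save: SaveFn, load: LoadFn) -> None:
    """Associate a save/load callback pair with the type ``t``."""
    _REGISTRY[t] = (save, load)


def lookup(t: type) -> Tuple[SaveFn, LoadFn]:
    """Find the callbacks for ``t``, walking its MRO so subclasses inherit them."""
    for base in t.__mro__:
        if base in _REGISTRY:
            return _REGISTRY[base]
    raise KeyError(f"no callbacks registered for {t.__name__}")


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


def save_point(p: Point, directory: Path) -> None:
    (directory / "point.json").write_text(json.dumps({"x": p.x, "y": p.y}))


def load_point(directory: Path) -> Point:
    return Point(**json.loads((directory / "point.json").read_text()))


register(Point, save_point, load_point)


def roundtrip(obj: Any) -> Any:
    """Save ``obj`` to a temporary directory and load it back via the registry."""
    save, load = lookup(type(obj))
    with tempfile.TemporaryDirectory() as tmp:
        save(obj, Path(tmp))
        return load(Path(tmp))
```

The callbacks only ever see a directory, so the same pair could in principle be reused for any storage backend that can stage files locally before uploading them.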

A ⚡️-quick demo

Here's how you register a custom neural network type that uses pickle to store trained models on disk.

# my_model.py
import pickle

import numpy as np
import shelf


class MyModel:
    def __call__(self):
        return 42
    
    def train(self, data: np.ndarray):
        pass
    
    def score(self, data: np.ndarray):
        return 1.


def save_to_disk(model: MyModel, ctx: shelf.Context) -> None:
    """Dumps the model to the file ``my-model.pkl`` using `pickle`."""
    fp = ctx.file("my-model.pkl", mode="wb")
    pickle.dump(model, fp)


def load_from_disk(ctx: shelf.Context) -> MyModel:
    """Reloads the previously pickled model."""
    fname, = ctx.filenames
    fp = ctx.file(fname, mode="rb")
    model: MyModel = pickle.load(fp)
    return model


shelf.register_type(MyModel, save_to_disk, load_from_disk)

Now, for example in your training loop, you can save the model anywhere using a Shelf:

import numpy as np
from shelf import Shelf

from my_model import MyModel


def train():
    # Initialize a `Shelf` to handle remote I/O.
    shelf = Shelf()
    
    model = MyModel()
    data = np.random.randn(100)

    # Train your model...
    for epoch in range(10):
        model.train(data)
    
    # and save it to S3...
    shelf.put(model, "s3://my-bucket/my-model.pkl")
    # ... or GCS if you prefer...
    shelf.put(model, "gs://my-bucket/my-model.pkl")
    # ... or Azure!
    shelf.put(model, "az://my-blob/my-model.pkl")

Conversely, if you want to reinstantiate a remotely stored model:

def score():
    # Create a `Shelf` here as well; the one in `train()` is local to that function.
    shelf = Shelf()
    model = shelf.get("s3://my-bucket/my-model.pkl", MyModel)
    accuracy = model.score(np.random.randn(100))
    
    print(f"And here's how accurately it predicts: {accuracy:.2%}")

Just like that, you can push and pull your custom models and data artifacts anywhere you like, as long as your service of choice has an fsspec filesystem implementation available.

Installation

⚠️ shelf is an experimental project - expect bugs and sharp edges.

Install it directly from source, for example with pip or poetry:

pip install git+https://github.com/nicholasjng/shelf.git
# or
poetry add git+https://github.com/nicholasjng/shelf.git

A PyPI package release is planned for the future.

shelf's Issues

Check path qualification before prepending prefix

Currently, we unconditionally prepend self.prefix before the rpath in Shelf.(get|put).

This will of course break if self.prefix is not the empty string and the rpath is already fully qualified (i.e. has a leading protocol and separator).

Hence, we need to omit the prefix if the input path is already fully qualified.
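A minimal sketch of such a check, assuming that "fully qualified" means the path starts with a URL-style protocol followed by `://`. The helper names `is_fully_qualified` and `resolve_path` are hypothetical and not part of shelf:

```python
import re

# A protocol is a letter followed by letters, digits, "+", "." or "-", then "://".
_PROTOCOL_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def is_fully_qualified(rpath: str) -> bool:
    """Return True if ``rpath`` starts with a protocol and separator."""
    return _PROTOCOL_RE.match(rpath) is not None


def resolve_path(prefix: str, rpath: str) -> str:
    """Prepend ``prefix`` only when ``rpath`` is not already fully qualified."""
    if not prefix or is_fully_qualified(rpath):
        return rpath
    return prefix.rstrip("/") + "/" + rpath.lstrip("/")


print(resolve_path("s3://my-bucket", "my-model.pkl"))
print(resolve_path("s3://my-bucket", "gs://other-bucket/my-model.pkl"))
```

With this rule, a relative path is joined onto the prefix, while a path that already names its own protocol is passed through untouched.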

Add support for multi-file type instantiation

Currently, types that require multiple distinct files to load (think neural networks with a parameter file and a hyperparameter file) are not supported, and if they are, it's probably by accident.

def load_from_disk(param_file: str, hparam_file: str) -> MyModel:  # needs both files to load successfully
    params = load_params(param_file)
    hparams = load_hparams(hparam_file)
    return MyModel(params, hparams)

Loading artifacts from multiple files should look something like this:

m = shelf.get(["s3://my-bucket/my-file1.txt", "s3://my-bucket/my-file2.txt"], MyModel)

BUT, in case the files are saved in a directory prefix, shelf could allow the following shortcut:

# in S3:
# my-bucket/my-file1.txt
#          /my-file2.txt

m = shelf.get("s3://my-bucket/", MyModel)  # assumes that there are no other files in "my-bucket".

The latter case is not trivial at all, though. What happens if the files are not passed to the type deserializer in the correct order? I'm not sure that they are currently sorted in any order, actually. This might require a custom type order field to be set on the IO struct.
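One way to sidestep the ordering question is to match files by name rather than by position. A minimal sketch, assuming the deserializer declares the basenames it expects in argument order; `order_files` and `expected` are hypothetical names, not part of shelf:

```python
import posixpath
from typing import List, Sequence


def order_files(filenames: Sequence[str], expected: Sequence[str]) -> List[str]:
    """Reorder ``filenames`` so their basenames follow the order in ``expected``.

    Raises FileNotFoundError if an expected basename is missing. Files whose
    basenames are not listed in ``expected`` are ignored.
    """
    by_name = {posixpath.basename(f): f for f in filenames}
    missing = [name for name in expected if name not in by_name]
    if missing:
        raise FileNotFoundError(f"missing files: {missing}")
    return [by_name[name] for name in expected]


# A directory listing may come back in any order.
listing = ["s3://my-bucket/my-file2.txt", "s3://my-bucket/my-file1.txt"]
print(order_files(listing, ["my-file1.txt", "my-file2.txt"]))
```

Matching by name also makes the directory shortcut tolerant of unrelated files in the same prefix, since anything not listed in `expected` is simply skipped.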
