Giter Site home page Giter Site logo

slim-trees's Introduction

Slim Trees

CI conda-forge pypi-version python-version

slim-trees is a Python package for saving and loading compressed sklearn Tree-based and lightgbm models. The compression is performed by modifying how the model is pickled by Python's pickle module.

We presented this library at PyData Berlin 2023, check out the slides!

Installation

pip install slim-trees
# or
micromamba install slim-trees -c conda-forge
# or
pixi add slim-trees

Usage

Using slim-trees does not affect your training pipeline. Simply call dump_sklearn_compressed or dump_lgbm_compressed to save your model.

Warning

slim-trees does not save all the data that would be saved by sklearn: only the parameters that are relevant for inference are saved. If you want to save the full model including impurity etc. for analytic purposes, we suggest saving both the original using pickle.dump for analytics and the slimmed down version using slim-trees for production.

Example for a RandomForestClassifier:

# example, you can also use other Tree-based models
from sklearn.ensemble import RandomForestClassifier
from slim_trees import dump_sklearn_compressed

# load training data
X, y = ...
model = RandomForestClassifier()
model.fit(X, y)

dump_sklearn_compressed(model, "model.pkl")
# or alternatively with compression
dump_sklearn_compressed(model, "model.pkl.lzma")

Example for a LGBMRegressor:

from lightgbm import LGBMRegressor
from slim_trees import dump_lgbm_compressed

# load training data
X, y = ...
model = LGBMRegressor()
model.fit(X, y)

dump_lgbm_compressed(model, "model.pkl")
# or alternatively with compression
dump_lgbm_compressed(model, "model.pkl.lzma")

Later, you can load the model using load_compressed or pickle.load.

import pickle
from slim_trees import load_compressed

model = load_compressed("model.pkl")

# or alternatively with pickle.load
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

Save your model as bytes

You can also save the model as bytes instead of in a file similar to the pickle.dumps method.

from slim_trees import dumps_sklearn_compressed, loads_compressed

X, y = ...
model = RandomForestClassifier()
model.fit(X, y)

data = dumps_sklearn_compressed(model, compression="lzma")
...
model_loaded = loads_compressed(data, compression="lzma")

Drop-in replacement for pickle

You can also use the slim_trees.sklearn_tree.dump or slim_trees.lgbm_booster.dump functions as drop-in replacements for pickle.dump.

from slim_trees import sklearn_tree, lgbm_booster

# for sklearn models
with open("model.pkl", "wb") as f:
    sklearn_tree.dump(model, f)  # instead of pickle.dump(...)

# for lightgbm models
with open("model.pkl", "wb") as f:
    lgbm_booster.dump(model, f)  # instead of pickle.dump(...)

Development Installation

You can install the package in development mode using the new conda package manager pixi:

❯ git clone https://github.com/quantco/slim-trees.git
❯ cd slim-trees

❯ pixi install
❯ pixi run postinstall
❯ pixi run test
[...]
❯ pixi run py312 python
>>> import slim_trees
[...]

Benchmark

As a general overview on what you can expect in terms of savings: This is a 1.2G large sklearn RandomForestRegressor.

benchmark

The new file is 9x smaller than the original pickle file.

slim-trees's People

Contributors

pavelzw avatar dependabot[bot] avatar quant-ranger[bot] avatar jonashaag avatar yyyasin19 avatar stefansorgqc avatar

Stargazers

Parsa Bahraminejad avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.