preocts / softboiled

A dataclass decorator that cleans the parameters on instance creation to account for missing or extra keyword arguments. Allows for faster, if messier, modeling of API responses that lack a firm schema.

License: MIT License

SoftBoiled - Making overly flexible dataclasses

Requirements

  • Python >= 3.8

Installation

Installation Note: Replace 1.x.x with the desired version number, or with main for the latest (unstable) version.

Install via pip with GitHub:

# Linux/MacOS
python3 -m pip install git+https://github.com/preocts/softboiled@1.x.x

# Windows
py -m pip install git+https://github.com/preocts/softboiled@1.x.x

Known Limitations

  • All dataclass objects within a SoftBoiled dataclass must also be SoftBoiled

For example, suppose an API's documentation says to expect the following response:

EXAMPLE01 = {
  "id": 1,
  "name": "Example Response v1",
  "details": {
    "color": "blue",
    "number": 42,
    "true": False
  },
  "more": False
}

However, the API response is actually:

EXAMPLE02 = {
  "id": 1,
  "name": "Example Response v1",
  "status": "depreciated",
  "details": {
    "color": "blue",
    "number": 42,
    "size": "medium"
  },
  "more": False
}

The additional field status and the missing field details.true do not appear consistently across the API responses, so they cannot be safely mapped. Time for a SoftBoiled dataclass:

from __future__ import annotations

import dataclasses

from softboiled import SoftBoiled


@SoftBoiled
@dataclasses.dataclass
class ExampleAPIModel:
    id: int
    name: str
    details: ExampleAPISubModel
    more: bool


@SoftBoiled
@dataclasses.dataclass
class ExampleAPISubModel:
    color: str
    number: int
    true: bool
    size: str


EXAMPLE01 = {
    "id": 1,
    "name": "Example Response v1",
    "details": {"color": "blue", "number": 42, "true": False},
    "more": False,
}

EXAMPLE02 = {
    "id": 1,
    "name": "Example Response v1",
    "status": "depreciated",
    "details": {"color": "blue", "number": 42, "size": "medium"},
    "more": False,
}

valid_model01 = ExampleAPIModel(**EXAMPLE01)
valid_model02 = ExampleAPIModel(**EXAMPLE02)

print(valid_model01)
print(valid_model02)

Output:

Type Warning: required key missing, now None 'size'
Type Warning: required key missing, now None 'true'
ExampleAPIModel(id=1, name='Example Response v1', details=ExampleAPISubModel(color='blue', number=42, true=False, size=None), more=False)
ExampleAPIModel(id=1, name='Example Response v1', details=ExampleAPISubModel(color='blue', number=42, true=None, size='medium'), more=False)

Both models are created without errors. The extra field status is dropped, and the missing field details.true is set to None for valid_model02; likewise, details.size is set to None for valid_model01.

The Type Warning indicates that a required value was missing and has been replaced with None. Type-hinting the attribute as optional (size: Optional[str]) eliminates the warning. Giving the attribute a default value in the dataclass also removes the warning, because the default is used instead.



Why:

Because data isn't perfect.

I started using dataclasses to model API responses. The self-constructing nature of the dataclass made the task very simple. In addition, having a model and not just a dict of the API response made the working code much cleaner.

This:

{
  "id": 1,
  "name": "Some API Response",
  "details": {
    "id": 1,
    "name": "Some details"
  }
}

Was easy to create as a data model with:

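from __future__ import annotations  # lets APIResponse reference APIDetails before it is defined

import dataclasses

@dataclasses.dataclass
class APIResponse:
    id: int
    name: str
    details: APIDetails

@dataclasses.dataclass
class APIDetails:
    id: int
    name: str

# json_response is the decoded API response shown above
json_response = {
    "id": 1,
    "name": "Some API Response",
    "details": {"id": 1, "name": "Some details"},
}

mapped_response = APIResponse(**json_response)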

Of course, this is vastly simplified. The API schemas I started working with were dozens of key/value pairs long, with nested arrays and objects. But the pattern was there and it worked! The ** unpacking took care of the parameters and dataclasses did all the work. I could even convert the model back to a dict with dataclasses.asdict(mapped_response).
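One caveat about the plain-dataclass pattern: the generated __init__ does not enforce type hints, so ** unpacking leaves details as the raw nested dict rather than an APIDetails instance. The sketch below uses only the standard library; it builds the nested model explicitly and then converts back with dataclasses.asdict(), which recurses into nested dataclasses.

```python
import dataclasses


@dataclasses.dataclass
class APIDetails:
    id: int
    name: str


@dataclasses.dataclass
class APIResponse:
    id: int
    name: str
    details: APIDetails


json_response = {
    "id": 1,
    "name": "Some API Response",
    "details": {"id": 1, "name": "Some details"},
}

# Plain ** unpacking: type hints are not enforced, so details stays a dict
naive = APIResponse(**json_response)
print(type(naive.details).__name__)  # → dict

# Building the nested model explicitly gives a real APIDetails instance
mapped_response = APIResponse(
    id=json_response["id"],
    name=json_response["name"],
    details=APIDetails(**json_response["details"]),
)
print(type(mapped_response.details).__name__)  # → APIDetails

# asdict() recurses into nested dataclasses and returns plain dicts
round_trip = dataclasses.asdict(mapped_response)
```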

The issues started when the API I was working with began adding key/value pairs that were not in the official documentation. Some objects gained key/value data depending on how they had been used. It wasn't important information, but a single extra key broke the model with this error:

TypeError: __init__() got an unexpected keyword argument '[keyname]'
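That failure is easy to reproduce with the standard library alone. In this sketch the extra key "color" is hypothetical, standing in for an undocumented field; the exact wording of the message can vary slightly between Python versions (newer versions may prefix the class name).

```python
import dataclasses


@dataclasses.dataclass
class APIDetails:
    id: int
    name: str


# "color" is a hypothetical undocumented key the API started sending
payload = {"id": 1, "name": "Some details", "color": "blue"}

error_message = ""
try:
    APIDetails(**payload)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```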

The solution is conceptually straightforward: scrub the data before creating the dataclass instance. The dataclasses module even provides a helper for this, fields(). Just remove whatever isn't expected before creating the model. That is easy to apply at the top-level APIResponse in my simple example, but how do you apply the same cleaning logic to the nested APIDetails?
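A minimal sketch of that top-level scrub, assuming a hypothetical helper named scrub(). It keeps only the keys that match the dataclass's init fields as reported by dataclasses.fields(). Note that the nested details dict passes through untouched, extra keys and all, which is exactly the gap the question above raises.

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass
class APIResponse:
    id: int
    name: str
    details: dict  # nested data is left as a raw dict in this sketch


def scrub(cls: type, data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: drop keys that are not init fields of cls."""
    expected = {f.name for f in dataclasses.fields(cls) if f.init}
    return {key: value for key, value in data.items() if key in expected}


json_response = {
    "id": 1,
    "name": "Some API Response",
    "status": "active",  # undocumented extra key at the top level
    "details": {"id": 1, "name": "Some details", "color": "blue"},
}

mapped_response = APIResponse(**scrub(APIResponse, json_response))
```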

The immediate solution seemed to be to skip the dataclass's generated __init__ and define my own __init__ that ignored extra values. This quickly led to bulky __init__ definitions with duplicated code in every new dataclass model. There had to be a more programmatic solution.
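For illustration, this is roughly what a hand-written __init__ looks like for one model; it is a sketch, not code from the project. When the class body already defines __init__, @dataclasses.dataclass keeps it instead of generating one, so unknown keys can be swallowed by **_extra. Every model needs its own copy of this boilerplate, which is the duplication described above.

```python
import dataclasses
from typing import Any, Optional


@dataclasses.dataclass
class APIDetails:
    id: Optional[int]
    name: Optional[str]

    # Hand-written __init__: dataclass keeps it instead of generating one
    def __init__(
        self, id: Optional[int] = None, name: Optional[str] = None, **_extra: Any
    ) -> None:
        # Missing keys fall back to None; unknown keys land in _extra and are ignored
        self.id = id
        self.name = name


details = APIDetails(**{"id": 1, "color": "blue"})
```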

That led me to SoftBoiled, a decorator for dataclasses. Once a dataclass is wrapped, its incoming key/value data is scrubbed: extra pairs are removed to avoid the TypeError, and missing pairs are added with a value of None unless they have a default assignment. Nested dataclasses are treated with the same care.
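The behavior described above can be sketched with the standard library. This is not SoftBoiled's implementation: soft_boiled is a hypothetical name, the sketch only accepts keyword arguments, and it assumes evaluated annotations (no from __future__ import annotations) because it reads each field's type straight from dataclasses.fields(). The warning text mimics the README's output but is an assumption.

```python
import dataclasses
import functools
import typing
import warnings


def soft_boiled(cls):
    """Hypothetical decorator: drop extra kwargs, fill missing ones with None."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self, /, **kwargs):
        # self is positional-only, so a data key named "self" lands in kwargs
        cleaned = {}
        for field in dataclasses.fields(cls):
            if not field.init:
                continue
            if field.name in kwargs:
                value = kwargs[field.name]
                # Build nested dataclasses from plain dicts; the nested class
                # must be decorated too for its own extras to be dropped
                if dataclasses.is_dataclass(field.type) and isinstance(value, dict):
                    value = field.type(**value)
                cleaned[field.name] = value
            elif (
                field.default is dataclasses.MISSING
                and field.default_factory is dataclasses.MISSING
            ):
                # Required but absent: use None, warning unless typed Optional
                if type(None) not in typing.get_args(field.type):
                    warnings.warn(f"required key missing, now None {field.name!r}")
                cleaned[field.name] = None
            # Absent fields with a default are skipped so the default applies
        # Keys that match no field are never copied, so extras are dropped
        original_init(self, **cleaned)

    cls.__init__ = __init__
    return cls


@soft_boiled
@dataclasses.dataclass
class SubModel:
    color: str
    true: bool
    size: typing.Optional[str]


@soft_boiled
@dataclasses.dataclass
class Model:
    id: int
    details: SubModel
    more: bool = False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = Model(id=1, status="extra", details={"color": "blue", "size": "medium"})

print(model)
```

Here status is dropped, details.true becomes None with a single warning, details.size needs no warning because it is typed Optional, and more falls back to its default.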

This leaves creating the model as simple as defining the structure of the dataclass and then unpacking an API's JSON response into it.



Local developer installation

It is highly recommended to use a virtual environment (venv) for installation. A venv ensures the installed dependencies will not affect other Python projects.

Clone this repo and enter its root directory:

$ git clone https://github.com/preocts/softboiled
$ cd softboiled

Create and activate venv:

# Linux/MacOS
python3 -m venv venv
. venv/bin/activate

# Windows
python -m venv venv
venv\Scripts\activate.bat
# or
py -m venv venv
venv\Scripts\activate.bat

Your command prompt should now have a (venv) prefix on it.

Install the editable library and the development requirements:

# Linux/MacOS
pip install -r requirements-dev.txt
pip install --editable .

# Windows
python -m pip install -r requirements-dev.txt
python -m pip install --editable .
# or
py -m pip install -r requirements-dev.txt
py -m pip install --editable .

Install the pre-commit hooks into the local repo:

pre-commit install
pre-commit autoupdate

Run tests

tox

To exit the venv:

deactivate

Makefile

This repo has a Makefile with some quality-of-life scripts, if your system supports make.

  • install : Clean all artifacts, update pip, install requirements with no updates
  • update : Clean all artifacts, update pip, update requirements, install everything
  • clean-pyc : Deletes python/mypy artifacts
  • clean-tests : Deletes tox, coverage, and pytest artifacts
  • build-dist : Build source distribution and wheel distribution

Contributors

  • preocts
  • pre-commit-ci[bot]

softboiled's Issues

`self` being a part of the loaded data will break things

We likely need to break the self convention in the __call__ method of SoftBoiled to account for a possible self key in the input key/value data.

The alternative is to force the user to sanitize the input, and that just spits in the face of the way SoftBoiled was created.

dataclasses appear to handle an attribute named self just fine, though the linters do get cranky.
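A small illustration of the problem and one possible workaround, assuming a keyword-collecting wrapper like the one this issue describes; both wrapper functions are hypothetical, not SoftBoiled's __call__. Since Python 3.8, a positional-only parameter (PEP 570) lets a keyword argument with the same name, such as self, fall through into **kwargs instead of colliding. The Record class also shows the point above: the dataclass-generated __init__ accepts a field named self.

```python
import dataclasses


@dataclasses.dataclass
class Record:
    self: int  # linters complain, but the generated __init__ accepts it
    name: str


def naive_wrapper(self, **kwargs):
    """Hypothetical wrapper: a data key named 'self' collides with the parameter."""
    return kwargs


def positional_only_wrapper(self, /, **kwargs):
    """Hypothetical wrapper: 'self' is positional-only, so a 'self' key goes to kwargs."""
    return kwargs


data = {"self": 1, "name": "example"}

naive_error = ""
try:
    naive_wrapper(object(), **data)
except TypeError as exc:
    naive_error = str(exc)  # "... got multiple values for argument 'self'"

record = Record(**positional_only_wrapper(object(), **data))
```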

What is up with `annotations`

Without from __future__ import annotations, the registration system for decorated classes appears to break.

To reproduce: remove the annotations import from the tests and watch them fail. Inner dataclasses are not being built.
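This sketch does not diagnose the reported bug; it only shows the standard-library behavior that makes annotation handling differ with and without the import. Under from __future__ import annotations every annotation is stored as a string, so dataclasses.fields() reports field.type as a string such as "Inner" rather than the class, while typing.get_type_hints() resolves those strings. Here a quoted annotation stands in for what the future import does to all annotations.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Inner:
    x: int


@dataclasses.dataclass
class Outer:
    evaluated: Inner  # stored as the class object
    quoted: "Inner"  # stored as a string, as every annotation is under the future import


# field.type is whatever was written in the annotation: a class or a string
types_from_fields = {f.name: f.type for f in dataclasses.fields(Outer)}

# get_type_hints() evaluates string annotations against the given namespace
resolved = typing.get_type_hints(Outer, globalns=globals())

print(types_from_fields)
print(resolved)
```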
