Giter Site home page Giter Site logo

When type hint is not recognized (e.g. `pd.Timestamp`), we recursively try to convert into a dataclass which mutates the constructor throughout the codebase about marshmallow_dataclass HOT 11 OPEN

lovasoa avatar lovasoa commented on May 16, 2024
When type hint is not recognized (e.g. `pd.Timestamp`), we recursively try to convert into a dataclass which mutates the constructor throughout the codebase

from marshmallow_dataclass.

Comments (11)

lovasoa avatar lovasoa commented on May 16, 2024 1

I think this is a request for marshmallow itself.

from marshmallow_dataclass.

dairiki avatar dairiki commented on May 16, 2024 1

I think this auto-dataclassification is too much magic, and likely to create hard-to-suss bugs.

Note that the dataclass decorator does not construct and return a new class, rather, it munges the decorated class in-place. So current behavior results in mangling other peoples code in a hidden, fairly magic, manner.

Here's a possible fix.

====

I think we can handle "simple annotated classes"(*) as in the above example without converting them to dataclasses.

For the sake of concreteness, let's say "simple annotated classes" are defined as objects, cls, for which:

  1. isinstance(cls, type)
  2. cls.__init__ is object.__init__
  3. cls.__new__ is object.__new__
  4. typing.get_type_hints(cls) returns non-empty

(This definition could perhaps be tweaked/widened a bit.)

Our class_schema currently uses the fact that its argument is a dataclass for the following purposes.

  1. We currently use dataclasses.fields to get at:

    a. field names — for "simple classes" we can get these from typing.get_type_hints instead
    b. field default values — for "simple classes", the value of the class variable (if any) will do
    c. field metadata — we can safely assume there is no metadata for simple classes

  2. We use the dataclasses constructor to construct the return value in the load method of our generated schemas. Instead, for "simple classes", we can construct them manually: create an instance by calling their __new__, then manually initialize its attributes.

So the proposed fix is to change class_schema so that:

  1. If the argument is a dataclass, do like we do now.
  2. Otherwise, if the argument is a "simple annotated class", construct a schema using the alternate methods described above (notably, without mangling the argument into a dataclass).
  3. Otherwise, raise a TypeError("do not know how to construct schema for X")

from marshmallow_dataclass.

lovasoa avatar lovasoa commented on May 16, 2024

marshmallow_dataclass allows you to have simple classes, like :

class A:
    a: int

class B:
    b: A

marshmallow_dataclass.class_schema(B)

In order to create a schema for B, it also has to create a schema for A. And in order to apply dataclasses.fields to A, it has to make it a dataclass: https://github.com/lovasoa/marshmallow_dataclass/blob/master/marshmallow_dataclass/__init__.py#L281

I don't think it's possible to fix this issue without breaking backward compatibility...
What behavior would you expect for your example above ?

from marshmallow_dataclass.

lovasoa avatar lovasoa commented on May 16, 2024

If you want a custom type in your class, the best way is to use a NewType declaration

from marshmallow_dataclass.

evanfwelch avatar evanfwelch commented on May 16, 2024

Ok I see your point. In my (and my colleagues) use case, we would already have made both A and B a dataclass (because we want all the benefits of dataclasses in our application code), and then when it comes to serializing an instance of B, we'd generate a marshmallow schema by doing:

marshmallow_dataclass.class_schema(B).dump(...)

However I guess you've primarily designed for the case where folks are using the marshmallow_dataclass.dataclass decorator to do both at once....

What if we modified the logic for checking whether B should be converted into dataclass to check if it has already has an __init__ defined. I can't think of many cases where someone has a class with a constructor and wants to make it a dataclass

from marshmallow_dataclass.

lovasoa avatar lovasoa commented on May 16, 2024

I think that would be acceptable...

from marshmallow_dataclass.

lovasoa avatar lovasoa commented on May 16, 2024

In order to avoid breaking compatibility, I am simply adding a new warning that will be displayed before marshmallow_dataclass tries to alter a class :

UserWarning: marshmallow_dataclass was called on the class Path, which is not a dataclass. It is going to try and convert the class into a dataclass, which may have undesirable side effects. To avoid this message, make sure all your classes and all the classes of their fields are either explicitly supported by marshmallow_datcalass, or are already dataclasses. For more information, see #51

To anyone who would end up here after seeing this warning: to fix this , you should make sure that all the fields of all your classes are either simple base types, or dataclasses themselves.

For instance, the following is good :

import marshmallow_dataclass
import dataclasses

@dataclasses.dataclass
class A:
    a: int

class B:
    b: A

marshmallow_dataclass.class_schema(B)

and the following is bad :

import marshmallow_dataclass

class A:
    a: int

class B:
    b: A

marshmallow_dataclass.class_schema(B)

and this is the worse, because marshmallow_dataclass will change the Path class, and break it globally in your whole application :

from pathlib import Path

class B:
    b: Path

marshmallow_dataclass.class_schema(B)

from marshmallow_dataclass.

prihoda avatar prihoda commented on May 16, 2024

Hi all, I think this is quite a serious side effect, for example, it breaks JSON encoding of classes that are handled by a custom JSON encoder, because they are now considered to be dataclasses so they are converted natively.

Can you add an even bigger warning? I didn't actually notice it because it's printed with other messages at flask startup. Also, the warning could contain a suggestion to fix it by adding a marshmallow_field to the field metadata, because that's ultimately the only way to support fields with custom non-dataclass classes, right?

from marshmallow_dataclass.

lovasoa avatar lovasoa commented on May 16, 2024

@prihoda : Could you make a PR with the changes you want to see on the message ? We'll discuss the changes there.

from marshmallow_dataclass.

mivade avatar mivade commented on May 16, 2024

I got burned by this one trying to use pathlib.Path as a type. Seeing as pathlib is seeing increased usage is there any way to have this type treated correctly? As a workaround I can make a base class like this:

class _HasPaths:
    """Workaround to ensure we get ``pathlib.Path`` types where needed."""

    def __post_init__(self) -> None:
        for fld in fields(self):

            if fld.type == Path:
                attr = getattr(self, fld.name)
                setattr(self, fld.name, Path(attr))

and have any dataclass that uses Path as a type inherit from it but this is less than ideal when pathlib.Path should probably be a type that's handled by default.

The warnings introduced in #130 were helpful in figuring this out but for several hours I was confounded by a bunch of pytest INTERNALERROR messages that were pretty impenetrable.

from marshmallow_dataclass.

phyrwork avatar phyrwork commented on May 16, 2024
  1. Otherwise, raise a TypeError("do not know how to construct schema for X")

I agree that the library should not be optimistically trying to create dataclasses out of types it does not understand. I didn't even realise the library could do that until today. Raise an error and let the user wrap with @dataclass themselves - it's so trivial.

Today I lost several hours to Path getting clobbered and not getting the warning on the console because of reasons.

I couldn't even debug it properly because the debugger was crashing trying to use the clobbered type to show values.

Maybe to keep backwards compatibility there could be some kind of strict mode flag / settable module option?

from marshmallow_dataclass.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.