marshmallow-code / marshmallow
A lightweight library for converting complex objects to and from simple Python datatypes.
Home Page: https://marshmallow.readthedocs.io/
License: MIT License
Using SQLAlchemy, Flask & marshmallow.
I have an issue when serializing a SQLAlchemy query.
I seem to get empty dictionaries when I serialize the query result.
If I try to specify column names to serialize, it errors.
Using a query such as:
modules = db.session.query(Hosts.hostname, Modules.name, HostMatrix.enabled).filter(Hosts.hostname == host).all()
To recreate:
import sqlalchemy
from marshmallow import Serializer
modules = [
    sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 1)),
    sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 0)),
    sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 0)),
    sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 1)),
]
Serializer(modules, many=True).data
[OrderedDict(), OrderedDict(), OrderedDict(), OrderedDict()]
Serializer(modules, only=('name', 'enabled'), many=True).data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    Serializer(modules, only=('name', 'enabled'), many=True).data
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 359, in __filter_fields
    attribute_type = type(obj_dict[key])
TypeError: tuple indices must be integers, not str
I also created a class to specify the fields to include, but that results in the same error.
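One workaround (a sketch of mine, not from the original thread) is to convert each row to a plain dict before serializing. SQLAlchemy's KeyedTuple, like collections.namedtuple, exposes an _asdict() method; the sketch below uses a namedtuple as a stand-in for the query rows:

```python
from collections import namedtuple

# Stand-in for sqlalchemy.util._collections.KeyedTuple, which offers
# the same _asdict() method on its rows.
Row = namedtuple('Row', ['hostname', 'name', 'enabled'])

modules = [Row(u'HostA', u'Backup', 1), Row(u'HostA', u'Backup', 0)]

# Convert each row to a plain dict so the serializer can look up
# attributes by field name instead of integer index.
module_dicts = [dict(row._asdict()) for row in modules]
# module_dicts[0] == {'hostname': 'HostA', 'name': 'Backup', 'enabled': 1}
```

The resulting list of dicts serializes cleanly with only=(...) since key lookup now works.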
I'd like to parse incoming request data using schema.load() and have my framework handle any parsing errors.
It seems that returning a clear error message to the user is hard to do, because I cannot figure out from an UnmarshallingError which field actually failed.
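The idea being asked for can be sketched independently of marshmallow's actual API at the time: have the unmarshalling layer attach the field name to the exception it raises (all names below are illustrative):

```python
class UnmarshallingError(Exception):
    """Illustrative only: a variant that records which field failed."""
    def __init__(self, message, field_name=None):
        super().__init__(message)
        self.field_name = field_name

def load_field(name, raw_value, converter):
    # Wrap conversion errors so callers can report the failing field.
    try:
        return converter(raw_value)
    except (TypeError, ValueError) as err:
        raise UnmarshallingError(str(err), field_name=name)

try:
    load_field('age', 'not-a-number', int)
except UnmarshallingError as err:
    failed = err.field_name  # 'age'
```

A framework could then map err.field_name to a per-field error message in its response.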
If any dictionary within a nested dictionary is an OrderedDict, it too should be pretty-printed.
Example:
fields.DateTime(default=datetime.datetime.utcnow)
Currently, data handler functions are passed the serialized data as-is. This means that if you pass many=True
when serializing data, you have to handle a list instead of a single dictionary.
class AuthorSerializer(Serializer):
    first = fields.String()
    last = fields.String()

@AuthorSerializer.data_handler
def add_fullname(ser, data, obj):
    if ser.many:  # data is a list
        for each in data:
            each['fullname'] = ' '.join([each['first'], each['last']])
    else:
        data['fullname'] = ' '.join([data['first'], data['last']])
    return data
It may be more user-friendly to always pass a single dictionary to the data handler function and have marshmallow handle the many
parameter automatically. So the following code would work whether you serialize a list or a single dict:
class AuthorSerializer(Serializer):
    first = fields.String()
    last = fields.String()

@AuthorSerializer.data_handler
def add_fullname(ser, data, obj):
    data['fullname'] = ' '.join([data['first'], data['last']])
    return data
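The proposal can be sketched framework-independently: a small wrapper that applies a handler written for one dict to either a single dict or a list of them (the helper names here are illustrative, not marshmallow API):

```python
def apply_per_item(handler, data):
    """Apply a single-dict handler to data that may be a list
    (the many=True case)."""
    if isinstance(data, list):
        return [handler(item) for item in data]
    return handler(data)

def add_fullname(data):
    # The handler only ever sees one dict at a time.
    data['fullname'] = ' '.join([data['first'], data['last']])
    return data

result = apply_per_item(add_fullname, {'first': 'Steven', 'last': 'Loria'})
# result == {'first': 'Steven', 'last': 'Loria', 'fullname': 'Steven Loria'}
```

For reference, later marshmallow releases adopted essentially this behavior in the @post_dump decorator, which passes a single item by default and takes pass_many=True to opt back into receiving the whole list.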
It seems that the skip_missing option only works if a field's value is None.
If the input dict is missing a key that is declared as a String type, we get an empty string in the result.
Sample code:
class UserSchema(Schema):
    first = String()
    last = String()

    class Meta:
        skip_missing = True

test_data = dict(
    first='Name',
)
sch = UserSchema()
print sch.dump(test_data)
Hi there,
First let me apologize by stating that my attempts to create a small, reproducible example have failed. I'm hoping to instead provide examples of what I'm seeing and perhaps you'll be able to tell me what I'm doing incorrectly!
Serialize two items of a list, individually:
(Pdb) EventSerializer(events[0]).data
OrderedDict([('event_id', 11), ('index', None), ('contact_id', 1), ('profile_id', None), ('action', 'updated'), ('type', 'contact')])
(Pdb) EventSerializer(events[1]).data
OrderedDict([('event_id', 13), ('index', None), ('contact_id', None), ('profile_id', 2), ('action', 'added'), ('type', 'profile')])
Notice how the 'contact_id' key is 1 in the first, and None in the second. This is as I'd expect. Now, when I serialize the entire list as a whole:
(Pdb) EventSerializer(events, many=True).data
[OrderedDict([('event_id', 11), ('index', None), ('contact_id', 1), ('profile_id', None), ('action', 'updated'), ('type', 'contact')]), OrderedDict([('event_id', 13), ('index', None), ('contact_id', 0), ('profile_id', 2), ('action', 'added'), ('type', 'profile')])]
Notice that the 'contact_id' of the second list element is now 0 and not None. Odd!
My serializer definition looks like this:
class EventSerializer(Serializer):
    action = fields.Method('action_to_text')
    type = fields.Method('type_to_text')
    event_id = fields.Integer(attribute='id')

    class Meta:
        fields = ['event_id', 'action', 'profile_id',
                  'index', 'contact_id', 'type']

    def action_to_text(self, obj):
        return ActionType.to_text(obj.action)

    def type_to_text(self, obj):
        return EventType.to_text(obj.type)
I'm probably missing something obvious...
Thank you for your time!
This has been a known bug for a while; finally posting it here.
Mock objects from the mock package (or Py3's unittest.mock) are not serialized correctly.
from unittest.mock import Mock
from marshmallow import Schema, fields, pprint
class UserSchema(Schema):
    name = fields.Str()
    email = fields.Email()
schema = UserSchema()
mock_user = Mock()
mock_user.email = 'hi guys'
pprint(schema.dump(mock_user).data)
# {"name": "<Mock name='mock.name' id='4379527880'>", "email": null}
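A workaround (my sketch, not from the report) is to build the Mock with a spec, so that only the attributes you name exist; anything else raises AttributeError instead of returning an auto-generated child Mock that ends up in the serialized output:

```python
from unittest.mock import Mock

# With spec, only the listed attributes exist on the mock.
mock_user = Mock(spec=['email'])
mock_user.email = 'hi guys'

has_email = hasattr(mock_user, 'email')  # True
has_name = hasattr(mock_user, 'name')    # False -- no auto-created child Mock
```

A serializer that skips missing attributes would then leave "name" out rather than dumping the Mock's repr.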
You have an error which seems to be the result of accidentally iterating over a string as if it were an array. Here is the code to produce the error:
from marshmallow import Serializer
from marshmallow import fields
class UserInputSerializer(Serializer):
    email = fields.String()
    username = fields.String()

json = {"email": "blah"}
user = UserInputSerializer(json, only=('email'))
Error generated:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<string>", line 12, in <module>
  File "/Users/miles/.../marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Users/miles/.../marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Users/miles/.../marshmallow/serializer.py", line 362, in __filter_fields
    '"{0}" is not a valid field for {1}.'.format(key, self.obj))
AttributeError: "e" is not a valid field for {'email': 'blah'}.
For diagnostic purposes, consider the following code (it shouldn't run -- but it should give a different error). If I change the name of the "email" field to "e", like so:
from marshmallow import Serializer
from marshmallow import fields
class UserInputSerializer(Serializer):
    e = fields.String()
    username = fields.String()

json = {"email": "blah"}
user = UserInputSerializer(json, only=('email'))
... and run this script, I get a similar error (notice the difference is "m" not "e"):
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<string>", line 12, in <module>
  File "/Users/miles/.../marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Users/miles/.../marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Users/miles/.../marshmallow/serializer.py", line 362, in __filter_fields
    '"{0}" is not a valid field for {1}.'.format(key, self.obj))
AttributeError: "m" is not a valid field for {'email': 'blah'}.
If I change the field name to simply "e" and pass only=('e'), the code does not generate an error:
from marshmallow import Serializer
from marshmallow import fields
class UserInputSerializer(Serializer):
    e = fields.String()
    username = fields.String()

json = {"e": "blah"}
user = UserInputSerializer(json, only=('e'))
(No error)
The good news is, it seems like the problem is only with tuples. The following code, using a list for the parameter, executes with no errors:
from marshmallow import Serializer
from marshmallow import fields
class UserInputSerializer(Serializer):
    email = fields.String()
    username = fields.String()

json = {"email": "blah"}
user = UserInputSerializer(json, only=['email'])
The problem appears to be in the __filter_fields function of serializer.py.
Please let me know if this is expected behavior and I'm doing something wrong...
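For what it's worth, the root cause is a Python gotcha rather than a tuple-specific bug: ('email') is not a one-element tuple, it is just the string 'email' in parentheses, so iterating over it yields the characters 'e', 'm', 'a', ... A trailing comma fixes it:

```python
only_str = ('email')     # a plain string -- parentheses alone don't make a tuple
# type(only_str) is str, and iterating gives individual characters:
chars = list(only_str)   # ['e', 'm', 'a', 'i', 'l']

only_tuple = ('email',)  # the trailing comma makes it a one-element tuple
items = list(only_tuple) # ['email']
```

This also explains why only=['email'] works: a one-element list needs no trailing comma, so iterating it yields the whole field name.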
I'm trying to create a very flexible serializer, such that users can generate additional fields in the future. Let's say that today they only need the defaults I've provided
class PostSerializer(Serializer):
    id = fields.String()
    title = fields.String(default="Untitled")
    body = fields.String(default=None)
    author = fields.List(fields.String)
The user creates several posts, and they decide they want a field for "category." I provide an interface where they set a new category field. Now perhaps I store this field in a dictionary.
additional_fields = {
    "category": "list"
}
When I modify the serializer on the fly (the only way that seems to work is via Meta.additional; setattr never seems to work):
s = PostSerializer
PostSerializer.Meta.additional = additional_fields.keys()
Posts which were created without the 'category' field will cause the following AttributeError:
AttributeError: "category" is not a valid field for {'id': '123456', 'title': 'Cool Post', 'body': 'Lorem Ipsum...', 'author': ['John', 'Steve']}
How can I maintain flexibility to add user generated fields, but also protect myself in the future? Is there a way to set a global default for additional fields?
Example:
class UserSerializer(Serializer):
    class Meta:
        fields = ('id', 'name')

class BlogSerializer(Serializer):
    title = fields.String()
    user = fields.Nested(UserSerializer())
However,
user = fields.Nested(UserSerializer)
works fine.
Is it OK that required fields don't work in the load() method?
From quickstart example:
class UserSchema(Schema):
    name = fields.String(required=True)
    email = fields.Email()

user = {'name': None, 'email': '[email protected]'}
data, errors = UserSchema().dump(user)
errors  # {'name': 'Missing data for required field.'}

user = {'name': None, 'email': '[email protected]'}
data, errors = UserSchema().load(user)
errors  # {}
I thought that the load() method is used for loading model objects from input data and SHOULD enforce required fields. Conversely, the dump() method is used to serialize internal data and shouldn't require validation at all. Do I understand everything correctly?
This should be an option for the Schema class: something like MySchema(envelope="things") to wrap the generated output / assume an envelope on the input, like this:
schema = AlbumSchema(envelope="album")
result = schema.dump(album)
pprint(result.data, indent=2)
# {'album':
# { 'artist': {'name': 'David Bowie'},
# 'release_date': '1971-12-17',
# 'title': 'Hunky Dory'}
The reasoning is partly security (http://flask.pocoo.org/docs/0.10/security/#json-security), though this is becoming outdated, and partly because some APIs actually work like this. I think this is a proper thing for marshmallow to have.
I can't seem to find in the docs if there is a way to make certain fields required. Is there no present implementation that marks the is_valid
call as invalid if a certain field is missing? Would be willing to contribute!
I ran into this when I accidentally omitted many=True in my serializer when it was actually a many relation.
Traceback (most recent call last):
  File "./shell.py", line 26, in <module>
    serializers.MyBSerializer(b).data
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 183, in __init__
    raw_data = self.marshal(self.obj, self.fields, many=self.many)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/fields.py", line 106, in marshal
    item = (key, field_obj.output(attr_name, data))
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/fields.py", line 306, in output
    self.serializer._update_fields(nested_obj)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 234, in _update_fields
    ret = self.__filter_fields(field_names)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 284, in __filter_fields
    print('type(obj_dict[key]): ', type(obj_dict[key]))  # Error as obj_dict is a query, not an ORM object
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/sqlalchemy/orm/dynamic.py", line 255, in __getitem__
    return self._clone(sess).__getitem__(index)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/sqlalchemy/orm/query.py", line 2206, in __getitem__
    return list(self[item:item + 1])[0]
TypeError: Can't convert 'int' object to str implicitly
This happens at https://github.com/sloria/marshmallow/blob/dev/marshmallow/serializer.py#L280
Perhaps this exception handling should also catch TypeError and then (using the value of self.many) raise an informative message, basically a suggestion that many=True may have been omitted.
Minimal test case:
class MyB(db.Model):
    __tablename__ = 'myb'
    id = db.Column(db.Integer, primary_key=True)
    myas = db.relationship('MyA', backref='myb', lazy='dynamic')

class MyA(db.Model):
    __tablename__ = 'mya'
    id = db.Column(db.Integer, primary_key=True)
    myb_id = db.Column(db.Integer, db.ForeignKey('myb.id'),
                       nullable=False)

class MyASerializer(Serializer):
    class Meta:
        fields = ('id', 'myb_id')

class MyBSerializer(Serializer):
    myas = fields.Nested(MyASerializer)  # Accidentally broken
    # myas = fields.Nested(MyASerializer, many=True)  # Correct

    class Meta:
        fields = ('id', 'myas')

b = models.MyB()
db.session.add(b)
a1 = models.MyA(myb=b)
a2 = models.MyA(myb=b)
db.session.add(a1)
db.session.add(a2)
db.session.commit()
serializers.MyBSerializer(b).data
The way that marshmallow checks to see if it should marshal a list is perhaps a bit error-prone when dealing with object instances that implement the __iter__ magic method. For example, MongoEngine document instances implement this method, so any time I try to serialize a MongoEngine document instance it always returns a list. The first thing that came to mind was a flag for the serializer constructor that would force a single instance. However, I'm wary of suggesting you should pollute that space with more args.
Another approach I just thought of is to make the Serializer._marshal property configurable by passing one to the constructor. Otherwise, use the default implementation.
For instance, if I forget to import ValidationError itself, I get a list like this:
{ "username": [ "'Marshmallow' object has no attribute 'ValidationError'" ] }
Now, this exception obviously should not be caught by marshmallow. I'm not sure why that even works. Any idea whether this is a bug or a problem on my side?
My code:
def duplicate_email_validator(email):
    <logic>
    raise ma.ValidationError("Email already exists")

class UserInputSchema(ma.Schema):
    username = ma.Email(validate=duplicate_email_validator, required=True)
    password = ma.String(required=True)

result, errors = UserInputSchema(strict=False).load(request.json)
In version 2.0, the pre-1.0 legacy API will be completely removed from the codebase.
This includes:
- the Schema constructor
- the data and errors properties of Schema
- the error param of Fields (still in question)
- the Arbitrary, Fixed and Price fields (remove in 2.0)
- the Select field (remove in 2.0)
- the context argument of Method fields? (in question)
- @Schema.preprocessor, @Schema.data_handler, etc.
- MarshallingError and UnmarshallingError (remove in 2.1)
- QuerySelect and QuerySelectList (remove in 2.2)
- allow_none and required string arguments (remove in 2.2)
EDIT: Updated checklist based on comments.
SQLAlchemy triggers a query when an attribute is accessed with getattr in utils.to_marshallable_type.
I know that it is done this way to keep it non-specific to one ORM -- but would it be possible to pass one (or all) of the Meta.fields, only, or exclude lists to this function?
I don't have time to investigate further at the moment, but I'll try to work up a patch tonight.
Add a class Meta option that specifies the format for every DateTime field in a serializer.
Fields are returned in a random order after marshalling. I would not pay attention to it if the return type were not OrderedDict. Why use OrderedDict if the fields are still returned in random order?
I think the problem is the use of an unordered set here.
It would be great if the fields were returned in the order in which they are declared in the serializer. It is much prettier for RESTful APIs.
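Declaration order can be recovered with a class-level counter on Field instances. A minimal sketch of the idea (marshmallow later took essentially this approach, stamping each field with a _creation_index):

```python
import itertools

class Field:
    _counter = itertools.count()

    def __init__(self):
        # Record the order in which Field instances are created, which
        # is the order they appear in the class body.
        self._creation_index = next(Field._counter)

class UserSerializer:
    name = Field()
    email = Field()
    created_at = Field()

declared = sorted(
    ((attr, f) for attr, f in vars(UserSerializer).items() if isinstance(f, Field)),
    key=lambda pair: pair[1]._creation_index,
)
field_order = [attr for attr, _ in declared]  # ['name', 'email', 'created_at']
```

Sorting by the creation index instead of iterating a set gives a stable, declaration-ordered field list.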
Because marshmallow's pprint function JSON-encodes OrderedDicts, booleans display as JavaScript booleans, with lowercase letters.
from collections import OrderedDict
from marshmallow import pprint
>>> d = OrderedDict([('foo', True), ('bar', False)])
>>> pprint(d)
{"foo": true, "bar": false}
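This is inherent to JSON encoding rather than a formatting bug: json.dumps always emits the JSON literals, while repr gives the Python ones.

```python
import json

json_true = json.dumps(True)   # 'true'  -- JSON/JavaScript literal
py_true = repr(True)           # 'True'  -- Python literal
json_null = json.dumps(None)   # 'null'
```

If Python-style output is wanted, the fix would be to pretty-print the dict directly (e.g. with the stdlib pprint module) instead of JSON-encoding it first.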
Hello! First, I'm sorry for my English.
Why can't the Nested field process a dict? It only accepts object instances.
class Book(object):
    title = ''
    author = ''

class BookSerializer(Serializer):
    title = fields.String()
    author = fields.String()

class BookList(object):
    items = list()

class BookListSerializer(Serializer):
    items = fields.Nested(BookSerializer, many=True)
The solo BookSerializer class accepts both types, object and dict, without problems.
For example:
# 1st case: using an object
book = Book()
book.title = 'hello android'
book.author = 'leejaycoke'
return jsonify(BookSerializer(book).data)

# 2nd case: using a dict
book = {'title': 'hello android', 'author': 'leejaycoke'}
return jsonify(BookSerializer(book).data)
But the Nested field can't accept a dict for listing books, though an object is OK.
For example:
# 1st case: using an object
book1 = {'title': 'hello android', 'author': 'leejaycoke'}
book2 = {'title': 'hello iOS', 'author': 'tommy'}
book_list = BookList()
book_list.items = [book1, book2]
return jsonify(BookListSerializer(book_list).data)
"""
{
"items": [
{
"title": "hello android",
"author": "leejaycoke"
},
{
"title": "hello iOS",
"author": "tommy"
}
]
}
"""
# 2nd case: using a dict -- it fails
book1 = {'title': 'hello android', 'author': 'leejaycoke'}
book2 = {'title': 'hello iOS', 'author': 'tommy'}
book = {'items': [book1, book2]}
return jsonify(BookListSerializer(book).data)
"""
TypeError: Could not marshal nested object due to error:
"'builtin_function_or_method' object is not iterable"
If the nested object is a collection, you need to set "many=True".\
"""
can you help me?
When serializing a dict with a key called "items", marshmallow fails to get the correct "items" value and instead gets the items method of the dict object.
The problem is in utils line 298:
if isinstance(key, basestring) and hasattr(obj, key):
For a dict, hasattr(obj, key) == True when key is "items".
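A sketch of the fix (illustrative, not the project's actual patch): try mapping-style access before falling back to getattr, so dict keys such as "items" or "keys" can't be shadowed by dict methods:

```python
def get_value(obj, key, default=None):
    # Mapping access wins, so a dict key named "items" is returned
    # instead of the built-in dict.items method.
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)

value = get_value({'items': [1, 2, 3]}, 'items')
# value == [1, 2, 3], not <built-in method items>
```

The same ordering also handles other method-name collisions like "values", "get", and "update".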
Having two serializers that nest each other is quite awkward. For example, for a many-to-one relationship between Books and Authors, you'd have to do something like the following:
class BaseBookMarshal(Serializer):
    date_created = fields.DateTime()
    isbn = fields.String()

class AuthorMarshal(Serializer):
    created = fields.DateTime(attribute='date_created')
    books = fields.Nested(BaseBookMarshal, many=True)

class BookMarshal(BaseBookMarshal):
    author = fields.Nested(AuthorMarshal, allow_null=True)
While this certainly works, having to create the extra BaseBookMarshal class is a bit clunky. It would be nice if you could declare nested serializers without having to worry about declaration order, and just pass class names into the Nested field:
class AuthorMarshal(Serializer):
    created = fields.DateTime(attribute='date_created')
    books = fields.Nested('BookMarshal', many=True)

class BookMarshal(Serializer):
    author = fields.Nested('AuthorMarshal', allow_null=True)
    date_created = fields.DateTime()
    isbn = fields.String()
I'm still undecided on whether this is a good idea. Not only would this require more metaclass magicks, but it would necessarily involve implicit removal of fields in order to prevent infinite recursion.
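A rough sketch of how string-based lazy resolution could work (all names illustrative; marshmallow did later add a class registry so that fields.Nested('BookMarshal') resolves at serialization time):

```python
_registry = {}

class SerializerMeta(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        _registry[name] = cls  # register every serializer class by name
        return cls

class Serializer(metaclass=SerializerMeta):
    pass

class Nested:
    def __init__(self, nested):
        self._nested = nested  # a class object *or* a class name

    @property
    def schema(self):
        # Resolve string names lazily, after all classes are defined.
        if isinstance(self._nested, str):
            return _registry[self._nested]
        return self._nested

class AuthorMarshal(Serializer):
    books = Nested('BookMarshal')   # forward reference by name

class BookMarshal(Serializer):
    author = Nested('AuthorMarshal')

resolved = AuthorMarshal.books.schema  # resolves to BookMarshal
```

Because resolution happens on first use rather than at class definition, declaration order no longer matters; recursion still has to be cut off separately (e.g. by excluding the back-reference or capping depth).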
Say I pass in a list of objects as the obj along with many=True; if I also pass in extra, it treats it as it would a dict, calling update.
I have tried to find the answer myself but failed; it seems it is not supported right now.
In case I've missed it: is it possible to skip missing fields instead of assigning default values during serialization?
I will try to describe it with an example:
some_data = dict(
    first_name='Joe',
    age=20,
)

class TestSchema(Schema):
    first_name = String()
    family_name = String()
    age = Integer()

schema = TestSchema()
print(schema.dump(some_data).data)
Current result: OrderedDict([('first_name', u'Joe'), ('family_name', ''), ('age', 20)])
Desired result: OrderedDict([('first_name', u'Joe'), ('age', 20)])
Of course it is possible to filter the result afterwards. Although it is quite tricky due to the different default values (i.e. for strings, integers), it is possible and I've already done it.
I am just curious if I've missed some core functionality.
For nested items with many=True, not only do I want to allow null; if the field is in fact null, I'd like to return an empty array using default.
This will allow API users to skip checks for null.
Is there a way to accomplish this with the current feature set?
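Framework aside, the behavior being asked for is just "treat None as an empty collection"; a minimal sketch (helper names are mine, not marshmallow API):

```python
def dump_many(items, serialize):
    # Treat a missing/None collection as empty so API clients always
    # receive an array and never have to null-check.
    return [serialize(item) for item in (items or [])]

empty = dump_many(None, str)    # []
filled = dump_many([1, 2], str) # ['1', '2']
```

A Nested-style field could apply the same normalization before marshalling its children.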
"Namespaces are one honking great idea -- let's do more of those!"
Currently, the class registry uses a global dictionary. This works. However, when developing a versioned API (my current situation), this will potentially lead to schema names like V1_SomeSchema or SomeSchema_V2.
I understand why the class registry was implemented in this fashion. However, I'm proposing one of two changes.
Easy: Add a schema_group (or similarly named) attribute on schemas that groups schemas into...well, groups, with schema_group defaulting to something sane (such as default or base). This could be implemented on either the actual schema or (even better) on the Meta options for the Schema.
Example
# v1/schemas/__init__.py:
class SomeSchema(BaseSchema):
    class Meta:
        schema_group = 'v1'

# v2/schemas/__init__.py:
class SomeSchema(BaseSchema):
    class Meta:
        schema_group = 'v2'
And then _registry would resemble:
{
    'v1': {'SomeSchema': [v1.schemas.SomeSchema]},
    'v2': {'SomeSchema': [v2.schemas.SomeSchema]}
}
Of course, class_registry.get_class and how it's used by things such as fields.Nested will also have to change to accommodate this.
Harder: Somehow create instances of the registry and explicitly pass them around, or somehow tie them to Schemas (think SQLAlchemy's metadata object). This would be more difficult to implement, as things like class_registry would need to change completely. Again, the most likely home for this would be on the Meta class:
v1_reg = Registry()

class SomeSchema(BaseSchema):
    class Meta:
        registry = v1_reg
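The "easy" variant can be sketched with a two-level dict (all names are illustrative, not marshmallow's class_registry API):

```python
from collections import defaultdict

_registry = defaultdict(dict)

def register(cls, group='default', name=None):
    # File each schema class under its group, then under its class name.
    _registry[group][name or cls.__name__] = cls
    return cls

def get_class(name, group='default'):
    return _registry[group][name]

# Two versions of "SomeSchema"; in real code these would live in
# v1/schemas and v2/schemas and share the same class name.
class SomeSchemaV1: pass
class SomeSchemaV2: pass

register(SomeSchemaV1, group='v1', name='SomeSchema')
register(SomeSchemaV2, group='v2', name='SomeSchema')

v1_cls = get_class('SomeSchema', group='v1')  # SomeSchemaV1
```

Nested-field lookups would then pass the group of the schema doing the lookup, so 'SomeSchema' resolves within the right API version.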
Based on your example
https://github.com/sloria/marshmallow/blob/dev/examples/flask_example.py
What if we get Authors from the db along with all related Quotes, using a query like this:
authorQuotes = session.query(Author, Quote).join(Quote.author).filter(Author.id == 1).all()
How would we serialize this object?
Currently, marshmallow.utils.get_value is used to pull values from many different types of objects (both simple and complex types).
It may be useful to override this behavior, e.g. via a class Meta option, when you know exactly what type of objects you will be serializing and how to pull values from them.
I see two use cases for this:
get_value will not work with
It would be nice to have the ability to mark fields as read-only. When deserializing, validation should fail if fields marked as read-only are present in the target dictionary.
Thanks
I've been toying around with the idea of a factory that allows you to generate serialization functions
serialize_user = UserSerializer.factory()
serialize_user(user) # {'name': 'Steve Loria' ...}
# Pass in default params
serialize_user = UserSerializer.factory(strict=True)
serialize_user(invalid_user) # MarshallingError
Doing
schema = MySchema(many=True)
print(schema.dump(mythings))
I get the correct behavior and everything works fine. However, doing
schema = MySchema()
print(schema.dump(mythings, many=True))
results in
myfile.py:29: in get
    print(schema.dump(mythings, many=True))
env/lib/python3.4/site-packages/marshmallow/schema.py:435: in dump
    self._update_fields(obj)
env/lib/python3.4/site-packages/marshmallow/schema.py:583: in _update_fields
    ret = self.__filter_fields(field_names, obj)
env/lib/python3.4/site-packages/marshmallow/schema.py:630: in __filter_fields
    attribute_type = type(obj_dict[key])
E   TypeError: list indices must be integers, not str
Something's up here.
I'm using SQLAlchemy's polymorphic identities and have been trying to figure out how to get the UserMarshal to use the BusinessProfileMarshal if the Profile attached to the User is actually a BusinessProfile:
class User(db.Model):
    profile = db.relationship('Profile', backref='users')

class Profile(db.Model):
    __mapper_args__ = {
        'polymorphic_identity': 'profile',
        'polymorphic_on': type
    }
    ...

class BusinessProfile(Profile):
    __tablename__ = 'profile_business'
    __mapper_args__ = {
        'polymorphic_identity': 'business',
    }
    ...

class UserMarshal(ma.Serializer):
    class Meta:
        fields = (
            'email',
            'profile',
        )
    profile = fields.Nested(ProfileMarshal)

class ProfileMarshal(ma.Serializer):
    class Meta:
        fields = (
            'first_name',
        )

class BusinessProfileMarshal(ma.Serializer):
    class Meta:
        fields = (
            'first_name',
            'company_name',
        )
I worked up a quick test using the nose timed decorator.
class TestSerializerTime(unittest.TestCase):
    def setUp(self):
        self.users = []
        self.blogs = []
        letters = list(string.ascii_letters)
        for i in range(500):
            self.users.append(User(''.join(random.sample(letters, 15)),
                email='[email protected]', age=random.randint(10, 50)))
        for i in range(500):
            self.blogs.append(Blog(''.join(random.sample(letters, 50)),
                user=random.choice(self.users)))

    @timed(.2)
    def test_small_blog_set(self):
        res = BlogSerializer(self.blogs[:20], many=True)

    @timed(.4)
    def test_medium_blog_set(self):
        res = BlogSerializer(self.blogs[:250], many=True)

    @timed(1)
    def test_large_blog_set(self):
        res = BlogSerializer(self.blogs, many=True)

    @timed(.1)
    def test_small_user_set(self):
        res = UserSerializer(self.users[:20], many=True)

    @timed(.2)
    def test_medium_user_set(self):
        res = UserSerializer(self.users[:250], many=True)

    @timed(.5)
    def test_large_user_set(self):
        res = UserSerializer(self.users, many=True)
The user tests all pass, but the medium and large blog tests do not. Obviously, these could pass on some machines, but it's still rather slow.
I did a little bit more testing with profile. Serializing the whole blog collection was running between 5 and 6s.
It looks like the bottleneck is the deepcopy operation in serializer.py, and it doesn't seem like the call can be removed or changed to a pickle/unpickle operation.
I'm going to keep digging to see what I can do. If you have any insight, I'd appreciate the help. Thanks!
As of 1.0.0, DateTime fields serialize to ISO 8601 format by default. This makes a number of the examples in the docs show incorrect output (the former default was RFC 822). These examples should be updated.
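For reference, the two formats look like this (a quick stdlib sketch; email.utils.format_datetime stands in for the RFC 822-style formatter):

```python
from datetime import datetime
from email.utils import format_datetime

dt = datetime(2014, 6, 6, 20, 59, 56)

iso = dt.isoformat()        # '2014-06-06T20:59:56'  -- ISO 8601, the 1.0 default
rfc = format_datetime(dt)   # 'Fri, 06 Jun 2014 20:59:56 -0000'  -- the old default style
```

Doc examples showing the comma-and-weekday form need updating to the 'T'-separated ISO form.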
You may want to do:
some_strict_field = fields.String(validate=[func1, func2])
I'm not sure if it can be considered as an issue, but I think that supporting deserialization would really benefit marshmallow.
It's obvious that not every serializer can provide reverse operation, but it's true in many cases. If it's not against your view on what this library should be then I can work on extending marshmallow to support it and prepare pull request.
As of 1.0.0, the correct way to serialize objects is to use the Serializer.dump method.
Usage of Serializer(some_obj).data will be deprecated, as will the related Serializer.errors and Serializer.is_valid members (dump returns both the serialized data and a dictionary of errors, so these validation methods are redundant).
For the 1.0.0 release, deprecation warnings should be raised.
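A minimal sketch of how the deprecation period could work (illustrative, not the actual 1.0 patch): the legacy property keeps returning data but emits a DeprecationWarning:

```python
import warnings

class Serializer:
    def __init__(self, obj=None):
        self._obj = obj

    @property
    def data(self):
        # Warn but still return the data, so existing callers keep
        # working during the deprecation window.
        warnings.warn(
            "Serializer(obj).data is deprecated; use Serializer().dump(obj).",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.dump(self._obj)

    def dump(self, obj):
        return obj  # a real implementation would marshal obj here

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Serializer({'x': 1}).data  # triggers the DeprecationWarning
```

stacklevel=2 makes the warning point at the caller's line rather than the property body.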
When serializing a None with a field specified as Integer, you get a 0.0 float back. I think this is kind of strange: you should either get 0 back, or perhaps None. I will be happy to submit a patch after discussing a bit first.
I suspect that the behavior comes from this line, as Integer inherits from Number, which has a default of 0.0 here:
https://github.com/sloria/marshmallow/blob/dev/marshmallow/fields.py#L348
What do you think @sloria? I would think that serializing None would yield None back.
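The proposed behavior is easy to state in isolation (a sketch, not the actual fields.py patch):

```python
def serialize_int(value, default=None):
    # Pass None through (or use an explicit default) instead of
    # silently substituting the Number fallback of 0.0.
    if value is None:
        return default
    return int(value)

none_result = serialize_int(None)  # None, not 0.0
int_result = serialize_int('42')   # 42
```

Callers who want the old behavior could still opt in with default=0.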
Email field type validation does not appear to work when using Schema.dump() but works fine for Schema.load(). Working example included below:
from datetime import datetime
from marshmallow import Schema, fields, pprint

# model
class Person(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.date_born = datetime.now()

# serializer schema
class PersonSchema(Schema):
    name = fields.String()
    email = fields.Email()
    date_born = fields.DateTime()

person = Person(name='Guido van Rossum', email='invalid-email')
schema = PersonSchema()

dumps = schema.dump(person)
print '--DUMPS--'
pprint(dumps.data)
pprint(dumps.errors)

loads = schema.load({'name': 'Guido van Rossum', 'email': 'invalid-email'})
print '--LOADS--'
pprint(loads.data)
pprint(loads.errors)
So I have an everyday query like:
things = Thing.query.all()
ThingSerializer(things, many=True).data
This results in contained DateTime objects getting correctly serialized while Date objects don't get serialized!
Example output:
[OrderedDict([('end_date', datetime.date(2011, 1, 4)), ('updated_at', 'Fri, 06 Jun 2014 20:59:56 -0000')])]
Note that end_date is from my SQLAlchemy declarative model and it's a Column.Date type while updated_at is a Column.DateTime type.
However, doing
thing = Thing.query.first()
ThingSerializer(thing).data
results in
OrderedDict([('end_date', '2011-01-01'), ('updated_at', 'Fri, 06 Jun 2014 20:59:56 -0000')])
Note how end_date gets serialized in this case but not in the other and how updated_at always get serialized correctly.
I suspect a typo somewhere that has to do with Date not getting tested a lot or something. Hopefully you can find the issue quickly. :D
from marshmallow import Schema, fields, pprint

class User(object):
    def __init__(self, name, email=None, age=None):
        self.name = name

class ChildSchema(Schema):
    name = fields.String()

class ParentSchema(Schema):
    name = fields.String()
    children = fields.Nested(ChildSchema, many=True)

user = User(name="Monty")
schema = ParentSchema()
result = schema.dump(user)
pprint(result.data)
# -> {'children': {'name': ''}, 'name': u'Monty'}
I would expect the result to be {'children': [], 'name': u'Monty'}
. If I set a field to many=True, it should always be a list, no exceptions. What I'm getting instead looks like a mistake.