marshmallow-code / marshmallow
A lightweight library for converting complex objects to and from simple Python datatypes.
Home Page: https://marshmallow.readthedocs.io/
License: MIT License
Using SQLAlchemy, Flask & marshmallow.
I have an issue when serializing a SQLAlchemy query.
I seem to get empty dictionaries when I serialize the query result.
If I try to specify column names to serialize, it errors.
Using a query such as:
modules = db.session.query(Hosts.hostname, Modules.name, HostMatrix.enabled).filter(Hosts.hostname == host).all()
To recreate:
import sqlalchemy
from marshmallow import Serializer
modules = [
    sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 1)),
    sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 0)),
    sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 0)),
    sqlalchemy.util._collections.KeyedTuple((u'HostA', u'Backup', 1)),
]
Serializer(modules, many=True).data
[OrderedDict(), OrderedDict(), OrderedDict(), OrderedDict()]
Serializer(modules, only=('name', 'enabled'), many=True).data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    Serializer(modules, only=('name', 'enabled'), many=True).data
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/marshmallow/serializer.py", line 359, in __filter_fields
    attribute_type = type(obj_dict[key])
TypeError: tuple indices must be integers, not str
I also created a class to specify the fields to include, but that results in the same error.
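One workaround (a sketch of mine, not from the original thread) is to convert each row to a plain dict before serializing. SQLAlchemy's KeyedTuple, like collections.namedtuple, exposes an _asdict() method; the sketch below uses a namedtuple as a stand-in for the query rows:

```python
from collections import namedtuple

# Stand-in for sqlalchemy.util._collections.KeyedTuple, which offers
# the same _asdict() method on its rows.
Row = namedtuple('Row', ['hostname', 'name', 'enabled'])

modules = [Row(u'HostA', u'Backup', 1), Row(u'HostA', u'Backup', 0)]

# Convert each row to a plain dict so the serializer can look up
# attributes by field name instead of integer index.
module_dicts = [dict(row._asdict()) for row in modules]
# module_dicts[0] == {'hostname': 'HostA', 'name': 'Backup', 'enabled': 1}
```

The resulting list of dicts serializes cleanly with only=(...) since key lookup now works.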
I'd like to parse incoming request data using schema.load() and have my framework handle any parsing errors.
It seems that returning a clear error message to the user is hard to do, because I cannot figure out from an UnmarshallingError which field actually failed.
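The idea being asked for can be sketched independently of marshmallow's actual API at the time: have the unmarshalling layer attach the field name to the exception it raises (all names below are illustrative):

```python
class UnmarshallingError(Exception):
    """Illustrative only: a variant that records which field failed."""
    def __init__(self, message, field_name=None):
        super().__init__(message)
        self.field_name = field_name

def load_field(name, raw_value, converter):
    # Wrap conversion errors so callers can report the failing field.
    try:
        return converter(raw_value)
    except (TypeError, ValueError) as err:
        raise UnmarshallingError(str(err), field_name=name)

try:
    load_field('age', 'not-a-number', int)
except UnmarshallingError as err:
    failed = err.field_name  # 'age'
```

A framework could then map err.field_name to a per-field error message in its response.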
If any dictionary within a nested dictionary is an OrderedDict, it too should be pretty-printed.
Example:
fields.DateTime(default=datetime.datetime.utcnow)
Currently, data handler functions are passed the serialized data as-is. This means that if you pass many=True
when serializing data, you have to handle a list instead of a single dictionary.
class AuthorSerializer(Serializer):
    first = fields.String()
    last = fields.String()

@AuthorSerializer.data_handler
def add_fullname(ser, data, obj):
    if ser.many:  # data is a list
        for each in data:
            each['fullname'] = ' '.join([each['first'], each['last']])
    else:
        data['fullname'] = ' '.join([data['first'], data['last']])
    return data
It may be more user-friendly to always pass a single dictionary to the data handler function and have marshmallow handle the many
parameter automatically. So the following code would work whether you serialize a list or a single dict:
class AuthorSerializer(Serializer):
    first = fields.String()
    last = fields.String()

@AuthorSerializer.data_handler
def add_fullname(ser, data, obj):
    data['fullname'] = ' '.join([data['first'], data['last']])
    return data
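The proposal can be sketched framework-independently: a small wrapper that applies a handler written for one dict to either a single dict or a list of them (the helper names here are illustrative, not marshmallow API):

```python
def apply_per_item(handler, data):
    """Apply a single-dict handler to data that may be a list
    (the many=True case)."""
    if isinstance(data, list):
        return [handler(item) for item in data]
    return handler(data)

def add_fullname(data):
    # The handler only ever sees one dict at a time.
    data['fullname'] = ' '.join([data['first'], data['last']])
    return data

result = apply_per_item(add_fullname, {'first': 'Steven', 'last': 'Loria'})
# result == {'first': 'Steven', 'last': 'Loria', 'fullname': 'Steven Loria'}
```

For reference, later marshmallow releases adopted essentially this behavior in the @post_dump decorator, which passes a single item by default and takes pass_many=True to opt back into receiving the whole list.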
It seems that the skip_missing option only works if a field's value is None.
If the input dict is missing a key that is declared as a String type, we get an empty string in the result.
Sample code:
class UserSchema(Schema):
    first = String()
    last = String()

    class Meta:
        skip_missing = True

test_data = dict(
    first='Name',
)
sch = UserSchema()
print sch.dump(test_data)
Hi there,
First let me apologize by stating that my attempts to create a small, reproducible example have failed. I'm hoping to instead provide examples of what I'm seeing and perhaps you'll be able to tell me what I'm doing incorrectly!
Serialize two items of a list, individually:
(Pdb) EventSerializer(events[0]).data
OrderedDict([('event_id', 11), ('index', None), ('contact_id', 1), ('profile_id', None), ('action', 'updated'), ('type', 'contact')])
(Pdb) EventSerializer(events[1]).data
OrderedDict([('event_id', 13), ('index', None), ('contact_id', None), ('profile_id', 2), ('action', 'added'), ('type', 'profile')])
Notice how the 'contact_id' key is 1 in the first, and None in the second. This is as I'd expect. Now, when I serialize the entire list as a whole:
(Pdb) EventSerializer(events, many=True).data
[OrderedDict([('event_id', 11), ('index', None), ('contact_id', 1), ('profile_id', None), ('action', 'updated'), ('type', 'contact')]), OrderedDict([('event_id', 13), ('index', None), ('contact_id', 0), ('profile_id', 2), ('action', 'added'), ('type', 'profile')])]
Notice that the 'contact_id' of the second list element is now 0 and not None. Odd!
My serializer definition looks like this:
class EventSerializer(Serializer):
    action = fields.Method('action_to_text')
    type = fields.Method('type_to_text')
    event_id = fields.Integer(attribute='id')

    class Meta:
        fields = ['event_id', 'action', 'profile_id',
                  'index', 'contact_id', 'type']

    def action_to_text(self, obj):
        return ActionType.to_text(obj.action)

    def type_to_text(self, obj):
        return EventType.to_text(obj.type)
I'm probably missing something obvious...
Thank you for your time!
This has been a known bug for a while; finally posting it here.
Mock objects from the mock package (or Py3's unittest.mock) are not serialized correctly.
from unittest.mock import Mock
from marshmallow import Schema, fields, pprint
class UserSchema(Schema):
    name = fields.Str()
    email = fields.Email()
schema = UserSchema()
mock_user = Mock()
mock_user.email = 'hi guys'
pprint(schema.dump(mock_user).data)
# {"name": "<Mock name='mock.name' id='4379527880'>", "email": null}
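A workaround (my sketch, not from the report) is to build the Mock with a spec, so that only the attributes you name exist; anything else raises AttributeError instead of returning an auto-generated child Mock that ends up in the serialized output:

```python
from unittest.mock import Mock

# With spec, only the listed attributes exist on the mock.
mock_user = Mock(spec=['email'])
mock_user.email = 'hi guys'

has_email = hasattr(mock_user, 'email')  # True
has_name = hasattr(mock_user, 'name')    # False -- no auto-created child Mock
```

A serializer that skips missing attributes would then leave "name" out rather than dumping the Mock's repr.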
You have an error which seems to be the result of accidentally iterating over a string as if it were an array. Here is the code to produce the error:
from marshmallow import Serializer
from marshmallow import fields
class UserInputSerializer(Serializer):
    email = fields.String()
    username = fields.String()

json = {"email": "blah"}
user = UserInputSerializer(json, only=('email'))
Error generated:
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<string>", line 12, in <module>
  File "/Users/miles/.../marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Users/miles/.../marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Users/miles/.../marshmallow/serializer.py", line 362, in __filter_fields
    '"{0}" is not a valid field for {1}.'.format(key, self.obj))
AttributeError: "e" is not a valid field for {'email': 'blah'}.
For diagnostic purposes, consider the following code (it shouldn't run -- but it should give a different error). If I change the name of the "email" field to "e", like so:
from marshmallow import Serializer
from marshmallow import fields
class UserInputSerializer(Serializer):
    e = fields.String()
    username = fields.String()

json = {"email": "blah"}
user = UserInputSerializer(json, only=('email'))
... and run this script, I get a similar error (notice the difference is "m" not "e"):
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "<string>", line 12, in <module>
  File "/Users/miles/.../marshmallow/serializer.py", line 193, in __init__
    self._update_fields(obj)
  File "/Users/miles/.../marshmallow/serializer.py", line 294, in _update_fields
    ret = self.__filter_fields(self.only)
  File "/Users/miles/.../marshmallow/serializer.py", line 362, in __filter_fields
    '"{0}" is not a valid field for {1}.'.format(key, self.obj))
AttributeError: "m" is not a valid field for {'email': 'blah'}.
If I change the field name to simply "e" and pass only=('e'), the code does not generate an error:
from marshmallow import Serializer
from marshmallow import fields
class UserInputSerializer(Serializer):
    e = fields.String()
    username = fields.String()

json = {"e": "blah"}
user = UserInputSerializer(json, only=('e'))
(No error)
The good news is, it seems like the problem is only with tuples. The following code, using a list for the parameter, executes with no errors:
from marshmallow import Serializer
from marshmallow import fields
class UserInputSerializer(Serializer):
    email = fields.String()
    username = fields.String()

json = {"email": "blah"}
user = UserInputSerializer(json, only=['email'])
The problem appears to be in the __filter_fields function of serializer.py.
Please let me know if this is expected behavior and I'm doing something wrong...
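For what it's worth, the root cause is a Python gotcha rather than a tuple-specific bug: ('email') is not a one-element tuple, it is just the string 'email' in parentheses, so iterating over it yields the characters 'e', 'm', 'a', ... A trailing comma fixes it:

```python
only_str = ('email')     # a plain string -- parentheses alone don't make a tuple
# type(only_str) is str, and iterating gives individual characters:
chars = list(only_str)   # ['e', 'm', 'a', 'i', 'l']

only_tuple = ('email',)  # the trailing comma makes it a one-element tuple
items = list(only_tuple) # ['email']
```

This also explains why only=['email'] works: a one-element list needs no trailing comma, so iterating it yields the whole field name.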
I'm trying to create a very flexible serializer, such that users can generate additional fields in the future. Let's say that today they only need the defaults I've provided
class PostSerializer(Serializer):
    id = fields.String()
    title = fields.String(default="Untitled")
    body = fields.String(default=None)
    author = fields.List(fields.String)
The user creates several posts, and they decide they want a field for "category." I provide an interface where they set a new category field. Now perhaps I store this field in a dictionary.
additional_fields = {
    "category": "list"
}
When I modify the serializer on the fly (the only way that seems to work is via Meta.additional; setattr never seems to work):
s = PostSerializer
PostSerializer.Meta.additional = additional_fields.keys()
Posts which were created without the 'category' field will cause the following AttributeError:
AttributeError: "category" is not a valid field for {'id': '123456', 'title': 'Cool Post', 'body': 'Lorem Ipsum...', 'author': ['John', 'Steve']}
How can I maintain flexibility to add user generated fields, but also protect myself in the future? Is there a way to set a global default for additional fields?
Example:
class UserSerializer(Serializer):
    class Meta:
        fields = ('id', 'name')

class BlogSerializer(Serializer):
    title = fields.String()
    user = fields.Nested(UserSerializer())
However,
user = fields.Nested(UserSerializer)
works fine.
Is it OK that required fields don't work in the load() method?
From quickstart example:
class UserSchema(Schema):
    name = fields.String(required=True)
    email = fields.Email()

user = {'name': None, 'email': '[email protected]'}
data, errors = UserSchema().dump(user)
errors  # {'name': 'Missing data for required field.'}

user = {'name': None, 'email': '[email protected]'}
data, errors = UserSchema().load(user)
errors  # {}
I thought that the load() method is used for loading model objects from input data and SHOULD enforce required fields. Conversely, the dump() method is used to serialize internal data and shouldn't require validation at all. Do I understand everything correctly?
This should be an option for the Schema class: something like MySchema(envelope="things") to wrap the generated output / assume an envelope on the input, like this:
schema = AlbumSchema(envelope="album")
result = schema.dump(album)
pprint(result.data, indent=2)
# {'album':
# { 'artist': {'name': 'David Bowie'},
# 'release_date': '1971-12-17',
# 'title': 'Hunky Dory'}
The reasoning is partly security (http://flask.pocoo.org/docs/0.10/security/#json-security), though this is becoming outdated, and partly because some APIs actually work like this. I think this is a proper thing for marshmallow to have.
I can't seem to find in the docs if there is a way to make certain fields required. Is there no present implementation that marks the is_valid
call as invalid if a certain field is missing? Would be willing to contribute!
I ran into this when I accidentally omitted many=True in my serializer when it was actually a many relation.
Traceback (most recent call last):
  File "./shell.py", line 26, in <module>
    serializers.MyBSerializer(b).data
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 183, in __init__
    raw_data = self.marshal(self.obj, self.fields, many=self.many)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/fields.py", line 106, in marshal
    item = (key, field_obj.output(attr_name, data))
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/fields.py", line 306, in output
    self.serializer._update_fields(nested_obj)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 234, in _update_fields
    ret = self.__filter_fields(field_names)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/marshmallow/serializer.py", line 284, in __filter_fields
    print('type(obj_dict[key]): ', type(obj_dict[key]))  # Error as obj_dict is a query, not an ORM object
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/sqlalchemy/orm/dynamic.py", line 255, in __getitem__
    return self._clone(sess).__getitem__(index)
  File "/Users/dpwrussell/.virtualenvs/accounts/lib/python3.3/site-packages/sqlalchemy/orm/query.py", line 2206, in __getitem__
    return list(self[item:item + 1])[0]
TypeError: Can't convert 'int' object to str implicitly
This happens at https://github.com/sloria/marshmallow/blob/dev/marshmallow/serializer.py#L280
Perhaps this exception handling should also catch TypeError and then (using the value of self.many) raise an informative message, basically a suggestion that many=True may have been omitted.
Minimal test case:
class MyB(db.Model):
    __tablename__ = 'myb'
    id = db.Column(db.Integer, primary_key=True)
    myas = db.relationship('MyA', backref='myb', lazy='dynamic')

class MyA(db.Model):
    __tablename__ = 'mya'
    id = db.Column(db.Integer, primary_key=True)
    myb_id = db.Column(db.Integer, db.ForeignKey('myb.id'),
                       nullable=False)

class MyASerializer(Serializer):
    class Meta:
        fields = ('id', 'myb_id')

class MyBSerializer(Serializer):
    myas = fields.Nested(MyASerializer)  # Accidentally broken
    # myas = fields.Nested(MyASerializer, many=True)  # Correct

    class Meta:
        fields = ('id', 'myas')

b = models.MyB()
db.session.add(b)
a1 = models.MyA(myb=b)
a2 = models.MyA(myb=b)
db.session.add(a1)
db.session.add(a2)
db.session.commit()
serializers.MyBSerializer(b).data
The way that marshmallow checks to see if it should marshal a list is perhaps a bit error-prone when dealing with object instances that implement the __iter__ magic method. For example, MongoEngine document instances implement this method, so any time I try to serialize a MongoEngine document instance it always returns a list. The first thing that came to mind was a flag for the serializer constructor that would force a single instance. However, I'm wary of suggesting you should pollute that space with more args.
Another approach I just thought of is to make the Serializer._marshal property configurable by passing one to the constructor. Otherwise, use the default implementation.
For instance, if I forget to import ValidationError itself, I get a list like this:
{ "username": [ "'Marshmallow' object has no attribute 'ValidationError'" ] }
Now, this exception obviously should not be caught by marshmallow. I'm not sure why that even works. Any idea whether this is a bug or a problem on my side?
My code:
def duplicate_email_validator(email):
    <logic>
    raise ma.ValidationError("Email already exists")

class UserInputSchema(ma.Schema):
    username = ma.Email(validate=duplicate_email_validator, required=True)
    password = ma.String(required=True)

result, errors = UserInputSchema(strict=False).load(request.json)
In version 2.0, the pre-1.0 legacy API will be completely removed from the codebase.
This includes:
- the Schema constructor
- the data and errors properties of Schema
- the error param of Fields (still in question)
- the Arbitrary, Fixed and Price fields (remove in 2.0)
- the Select field (remove in 2.0)
- the context argument of Method fields? (in question)
- @Schema.preprocessor, @Schema.data_handler, etc.
- MarshallingError and UnmarshallingError (remove in 2.1)
- QuerySelect and QuerySelectList (remove in 2.2)
- allow_none and required string arguments (remove in 2.2)
EDIT: Updated checklist based on comments.
SQLAlchemy triggers a query when an attribute is accessed with getattr in utils.to_marshallable_type.
I know that it is done this way to keep it non-specific to one ORM -- but would it be possible to pass one (or all) of the Meta.fields, only, or exclude lists to this function?
I don't have time to investigate further at the moment, but I'll try to work up a patch tonight.
Add a class Meta option that specifies the format for every DateTime field in a serializer.
Fields are returned in a random order after marshalling. I would not pay attention to it if the return type were not OrderedDict. Why use OrderedDict if the fields are still returned in random order?
I think the problem is the use of an unordered set here.
It would be great if the fields were returned in the order in which they are declared in the serializer. It is much prettier for RESTful APIs.
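Declaration order can be recovered with a class-level counter on Field instances. A minimal sketch of the idea (marshmallow later took essentially this approach, stamping each field with a _creation_index):

```python
import itertools

class Field:
    _counter = itertools.count()

    def __init__(self):
        # Record the order in which Field instances are created, which
        # is the order they appear in the class body.
        self._creation_index = next(Field._counter)

class UserSerializer:
    name = Field()
    email = Field()
    created_at = Field()

declared = sorted(
    ((attr, f) for attr, f in vars(UserSerializer).items() if isinstance(f, Field)),
    key=lambda pair: pair[1]._creation_index,
)
field_order = [attr for attr, _ in declared]  # ['name', 'email', 'created_at']
```

Sorting by the creation index instead of iterating a set gives a stable, declaration-ordered field list.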
Because marshmallow's pprint function JSON-encodes OrderedDicts, booleans display as JavaScript booleans, with lowercase letters.
from collections import OrderedDict
from marshmallow import pprint
>>> d = OrderedDict([('foo', True), ('bar', False)])
>>> pprint(d)
{"foo": true, "bar": false}
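This is inherent to JSON encoding rather than a formatting bug: json.dumps always emits the JSON literals, while repr gives the Python ones.

```python
import json

json_true = json.dumps(True)   # 'true'  -- JSON/JavaScript literal
py_true = repr(True)           # 'True'  -- Python literal
json_null = json.dumps(None)   # 'null'
```

If Python-style output is wanted, the fix would be to pretty-print the dict directly (e.g. with the stdlib pprint module) instead of JSON-encoding it first.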
Hello! First, I'm sorry for my English.
Why can't the Nested field process a dict? It only accepts object instances.
class Book(object):
    title = ''
    author = ''

class BookSerializer(Serializer):
    title = fields.String()
    author = fields.String()

class BookList(object):
    items = list()

class BookListSerializer(Serializer):
    items = fields.Nested(BookSerializer, many=True)
The solo BookSerializer class accepts both types, object and dict, without problems.
For example:
# 1st case: using an object
book = Book()
book.title = 'hello android'
book.author = 'leejaycoke'
return jsonify(BookSerializer(book).data)

# 2nd case: using a dict
book = {'title': 'hello android', 'author': 'leejaycoke'}
return jsonify(BookSerializer(book).data)
But the Nested field can't accept a dict for listing books, though an object is OK.
For example:
# 1st case: using an object
book1 = {'title': 'hello android', 'author': 'leejaycoke'}
book2 = {'title': 'hello iOS', 'author': 'tommy'}
book_list = BookList()
book_list.items = [book1, book2]
return jsonify(BookListSerializer(book_list).data)
"""
{
"items": [
{
"title": "hello android",
"author": "leejaycoke"
},
{
"title": "hello iOS",
"author": "tommy"
}
]
}
"""
# 2nd case: using a dict -- it fails
book1 = {'title': 'hello android', 'author': 'leejaycoke'}
book2 = {'title': 'hello iOS', 'author': 'tommy'}
book = {'items': [book1, book2]}
return jsonify(BookListSerializer(book).data)
"""
TypeError: Could not marshal nested object due to error:
"'builtin_function_or_method' object is not iterable"
If the nested object is a collection, you need to set "many=True".\
"""
can you help me?
When serializing a dict with a key called "items", marshmallow fails to get the correct "items" value and instead gets the items method of the dict object.
The problem is in utils line 298:
if isinstance(key, basestring) and hasattr(obj, key):
For a dict, hasattr(obj, key) == True when key is "items".
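A sketch of the fix (illustrative, not the project's actual patch): try mapping-style access before falling back to getattr, so dict keys such as "items" or "keys" can't be shadowed by dict methods:

```python
def get_value(obj, key, default=None):
    # Mapping access wins, so a dict key named "items" is returned
    # instead of the built-in dict.items method.
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)

value = get_value({'items': [1, 2, 3]}, 'items')
# value == [1, 2, 3], not <built-in method items>
```

The same ordering also handles other method-name collisions like "values", "get", and "update".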
Having two serializers that nest each other is quite awkward. For example, for a many-to-one relationship between Books and Authors, you'd have to do something like the following:
class BaseBookMarshal(Serializer):
    date_created = fields.DateTime()
    isbn = fields.String()

class AuthorMarshal(Serializer):
    created = fields.DateTime(attribute='date_created')
    books = fields.Nested(BaseBookMarshal, many=True)

class BookMarshal(BaseBookMarshal):
    author = fields.Nested(AuthorMarshal, allow_null=True)
While this certainly works, having to create the extra BaseBookMarshal class is a bit clunky. It would be nice if you could declare nested serializers without having to worry about declaration order, and just pass class names into the Nested field:
class AuthorMarshal(Serializer):
    created = fields.DateTime(attribute='date_created')
    books = fields.Nested('BookMarshal', many=True)

class BookMarshal(Serializer):
    author = fields.Nested('AuthorMarshal', allow_null=True)
    date_created = fields.DateTime()
    isbn = fields.String()
I'm still undecided on whether this is a good idea. Not only would this require more metaclass magicks, but it would necessarily involve implicit removal of fields in order to prevent infinite recursion.
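A rough sketch of how string-based lazy resolution could work (all names illustrative; marshmallow did later add a class registry so that fields.Nested('BookMarshal') resolves at serialization time):

```python
_registry = {}

class SerializerMeta(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        _registry[name] = cls  # register every serializer class by name
        return cls

class Serializer(metaclass=SerializerMeta):
    pass

class Nested:
    def __init__(self, nested):
        self._nested = nested  # a class object *or* a class name

    @property
    def schema(self):
        # Resolve string names lazily, after all classes are defined.
        if isinstance(self._nested, str):
            return _registry[self._nested]
        return self._nested

class AuthorMarshal(Serializer):
    books = Nested('BookMarshal')   # forward reference by name

class BookMarshal(Serializer):
    author = Nested('AuthorMarshal')

resolved = AuthorMarshal.books.schema  # resolves to BookMarshal
```

Because resolution happens on first use rather than at class definition, declaration order no longer matters; recursion still has to be cut off separately (e.g. by excluding the back-reference or capping depth).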
Say I pass in a list of objects as the obj along with many=True; if I also pass in extra, it treats it as it would a dict, calling update.
I have tried to find the answer myself but failed; it seems it is not supported right now.
In case I've missed it: is it possible to skip missing fields instead of assigning default values during serialization?
I will try to describe it with an example:
some_data = dict(
    first_name='Joe',
    age=20,
)

class TestSchema(Schema):
    first_name = String()
    family_name = String()
    age = Integer()

schema = TestSchema()
print(schema.dump(some_data).data)
Current result: OrderedDict([('first_name', u'Joe'), ('family_name', ''), ('age', 20)])
Desired result: OrderedDict([('first_name', u'Joe'), ('age', 20)])
Of course it is possible to filter the result afterwards. Although it is quite tricky due to the different default values (i.e. for strings, integers), it is possible and I've already done it.
I am just curious if I've missed some core functionality.
For nested items with many=True, not only do I want to allow null; if the field is in fact null, I'd like to return an empty array using default.
This will allow API users to skip checks for null.
Is there a way to accomplish this with the current feature set?
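Framework aside, the behavior being asked for is just "treat None as an empty collection"; a minimal sketch (helper names are mine, not marshmallow API):

```python
def dump_many(items, serialize):
    # Treat a missing/None collection as empty so API clients always
    # receive an array and never have to null-check.
    return [serialize(item) for item in (items or [])]

empty = dump_many(None, str)    # []
filled = dump_many([1, 2], str) # ['1', '2']
```

A Nested-style field could apply the same normalization before marshalling its children.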
"Namespaces are one honking great idea -- let's do more of those!"
Currently, the class registry uses a global dictionary. This works. However, when developing a versioned API (my current situation), this will potentially lead to schema names like V1_SomeSchema or SomeSchema_V2.
I understand why the class registry was implemented in this fashion. However, I'm proposing one of two changes.
Easy: Add a schema_group (or similarly named) attribute on schemas that groups schemas into...well, groups, with schema_group defaulting to something sane (such as default or base). This could be implemented on either the actual schema or (even better) on the Meta options for the Schema.
Example
# v1/schemas/__init__.py:
class SomeSchema(BaseSchema):
    class Meta:
        schema_group = 'v1'

# v2/schemas/__init__.py:
class SomeSchema(BaseSchema):
    class Meta:
        schema_group = 'v2'
And then _registry would resemble:
{
    'v1': {'SomeSchema': [v1.schemas.SomeSchema]},
    'v2': {'SomeSchema': [v2.schemas.SomeSchema]}
}
Of course, class_registry.get_class and how it's used by things such as fields.Nested will also have to change to accommodate this.
Harder: Somehow create instances of the registry and explicitly pass them around, or somehow tie them to Schemas (think SQLAlchemy's metadata object). This would be more difficult to implement, as things like class_registry would need to change completely. Again, the most likely home for this would be on the Meta class:
v1_reg = Registry()

class SomeSchema(BaseSchema):
    class Meta:
        registry = v1_reg
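The "easy" variant can be sketched with a two-level dict (all names are illustrative, not marshmallow's class_registry API):

```python
from collections import defaultdict

_registry = defaultdict(dict)

def register(cls, group='default', name=None):
    # File each schema class under its group, then under its class name.
    _registry[group][name or cls.__name__] = cls
    return cls

def get_class(name, group='default'):
    return _registry[group][name]

# Two versions of "SomeSchema"; in real code these would live in
# v1/schemas and v2/schemas and share the same class name.
class SomeSchemaV1: pass
class SomeSchemaV2: pass

register(SomeSchemaV1, group='v1', name='SomeSchema')
register(SomeSchemaV2, group='v2', name='SomeSchema')

v1_cls = get_class('SomeSchema', group='v1')  # SomeSchemaV1
```

Nested-field lookups would then pass the group of the schema doing the lookup, so 'SomeSchema' resolves within the right API version.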
Based on your example
https://github.com/sloria/marshmallow/blob/dev/examples/flask_example.py
What if we get Authors from the db along with all related Quotes, using a query like this:
authorQuotes = session.query(Author, Quote).join(Quote.author).filter(Author.id == 1).all()
How would we serialize this object?
Currently, marshmallow.utils.get_value is used to pull values from many different types of objects (both simple and complex types).
It may be useful to override this behavior, e.g. via a class Meta option, when you know exactly what type of objects you will be serializing and how to pull values from them.
I see two use cases for this:
get_value will not work with
It would be nice to have the ability to mark fields as read-only. When deserializing, validation should fail if fields marked as read-only are present in the target dictionary.
Thanks
I've been toying around with the idea of a factory that allows you to generate serialization functions
serialize_user = UserSerializer.factory()
serialize_user(user) # {'name': 'Steve Loria' ...}
# Pass in default params
serialize_user = UserSerializer.factory(strict=True)
serialize_user(invalid_user) # MarshallingError
Doing
schema = MySchema(many=True)
print(schema.dump(mythings))
I get the correct behavior and everything works fine. However, doing
schema = MySchema()
print(schema.dump(mythings, many=True))
results in
myfile.py:29: in get
    print(schema.dump(mythings, many=True))
env/lib/python3.4/site-packages/marshmallow/schema.py:435: in dump
    self._update_fields(obj)
env/lib/python3.4/site-packages/marshmallow/schema.py:583: in _update_fields
    ret = self.__filter_fields(field_names, obj)
env/lib/python3.4/site-packages/marshmallow/schema.py:630: in __filter_fields
    attribute_type = type(obj_dict[key])
E   TypeError: list indices must be integers, not str
Something's up here.
I'm using SQLAlchemy's polymorphic identities and have been trying to figure out how to get the UserMarshal to use the BusinessProfileMarshal if the Profile attached to the User is actually a BusinessProfile:
class User(db.Model):
    profile = db.relationship('Profile', backref='users')

class Profile(db.Model):
    __mapper_args__ = {
        'polymorphic_identity': 'profile',
        'polymorphic_on': type
    }
    ...

class BusinessProfile(Profile):
    __tablename__ = 'profile_business'
    __mapper_args__ = {
        'polymorphic_identity': 'business',
    }
    ...

class UserMarshal(ma.Serializer):
    class Meta:
        fields = (
            'email',
            'profile',
        )
    profile = fields.Nested(ProfileMarshal)

class ProfileMarshal(ma.Serializer):
    class Meta:
        fields = (
            'first_name',
        )

class BusinessProfileMarshal(ma.Serializer):
    class Meta:
        fields = (
            'first_name',
            'company_name',
        )
I worked up a quick test using the nose timed decorator.
class TestSerializerTime(unittest.TestCase):
    def setUp(self):
        self.users = []
        self.blogs = []
        letters = list(string.ascii_letters)
        for i in range(500):
            self.users.append(User(''.join(random.sample(letters, 15)),
                email='[email protected]', age=random.randint(10, 50)))
        for i in range(500):
            self.blogs.append(Blog(''.join(random.sample(letters, 50)),
                user=random.choice(self.users)))

    @timed(.2)
    def test_small_blog_set(self):
        res = BlogSerializer(self.blogs[:20], many=True)

    @timed(.4)
    def test_medium_blog_set(self):
        res = BlogSerializer(self.blogs[:250], many=True)

    @timed(1)
    def test_large_blog_set(self):
        res = BlogSerializer(self.blogs, many=True)

    @timed(.1)
    def test_small_user_set(self):
        res = UserSerializer(self.users[:20], many=True)

    @timed(.2)
    def test_medium_user_set(self):
        res = UserSerializer(self.users[:250], many=True)

    @timed(.5)
    def test_large_user_set(self):
        res = UserSerializer(self.users, many=True)
The user tests all pass, but the medium and large blog tests do not. Obviously, these could pass on some machines, but it's still rather slow.
I did a little bit more testing with profile. Serializing the whole blog collection was running between 5 and 6s.
It looks like the bottleneck is the deepcopy operation in serializer.py, and it doesn't seem like the call can be removed or changed to a pickle/unpickle operation.
I'm going to keep digging to see what I can do. If you have any insight, I'd appreciate the help. Thanks!
As of 1.0.0, DateTime fields serialize to ISO 8601 format by default. This makes a number of the examples in the docs show incorrect output (the former default was RFC 822). These examples should be updated.
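For reference, the two formats look like this (a quick stdlib sketch; email.utils.format_datetime stands in for the RFC 822-style formatter):

```python
from datetime import datetime
from email.utils import format_datetime

dt = datetime(2014, 6, 6, 20, 59, 56)

iso = dt.isoformat()        # '2014-06-06T20:59:56'  -- ISO 8601, the 1.0 default
rfc = format_datetime(dt)   # 'Fri, 06 Jun 2014 20:59:56 -0000'  -- the old default style
```

Doc examples showing the comma-and-weekday form need updating to the 'T'-separated ISO form.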
You may want to do:
some_strict_field = fields.String(validate=[func1, func2])
I'm not sure if it can be considered as an issue, but I think that supporting deserialization would really benefit marshmallow.
It's obvious that not every serializer can provide reverse operation, but it's true in many cases. If it's not against your view on what this library should be then I can work on extending marshmallow to support it and prepare pull request.
As of 1.0.0, the correct way to serialize objects is to use the Serializer.dump method.
Usage of Serializer(some_obj).data will be deprecated, as will the related Serializer.errors and Serializer.is_valid members (dump returns both the serialized data and a dictionary of errors, so these validation methods are redundant).
For the 1.0.0 release, deprecation warnings should be raised.
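A minimal sketch of how the deprecation period could work (illustrative, not the actual 1.0 patch): the legacy property keeps returning data but emits a DeprecationWarning:

```python
import warnings

class Serializer:
    def __init__(self, obj=None):
        self._obj = obj

    @property
    def data(self):
        # Warn but still return the data, so existing callers keep
        # working during the deprecation window.
        warnings.warn(
            "Serializer(obj).data is deprecated; use Serializer().dump(obj).",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.dump(self._obj)

    def dump(self, obj):
        return obj  # a real implementation would marshal obj here

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Serializer({'x': 1}).data  # triggers the DeprecationWarning
```

stacklevel=2 makes the warning point at the caller's line rather than the property body.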
When serializing a None with a field specified as Integer, you get a 0.0 float back. I think this is kind of strange: you should either get 0 back, or perhaps None. I will be happy to submit a patch after discussing a bit first.
I suspect that the behavior comes from this line, as Integer inherits from Number, which has a default of 0.0 here:
https://github.com/sloria/marshmallow/blob/dev/marshmallow/fields.py#L348
What do you think @sloria? I would think that serializing None would yield None back.
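The proposed behavior is easy to state in isolation (a sketch, not the actual fields.py patch):

```python
def serialize_int(value, default=None):
    # Pass None through (or use an explicit default) instead of
    # silently substituting the Number fallback of 0.0.
    if value is None:
        return default
    return int(value)

none_result = serialize_int(None)  # None, not 0.0
int_result = serialize_int('42')   # 42
```

Callers who want the old behavior could still opt in with default=0.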
Email field type validation does not appear to work when using Schema.dump() but works fine for Schema.load(). Working example included below:
from datetime import datetime
from marshmallow import Schema, fields, pprint

# model
class Person(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email
        self.date_born = datetime.now()

# serializer schema
class PersonSchema(Schema):
    name = fields.String()
    email = fields.Email()
    date_born = fields.DateTime()

person = Person(name='Guido van Rossum', email='invalid-email')
schema = PersonSchema()

dumps = schema.dump(person)
print '--DUMPS--'
pprint(dumps.data)
pprint(dumps.errors)

loads = schema.load({'name': 'Guido van Rossum', 'email': 'invalid-email'})
print '--LOADS--'
pprint(loads.data)
pprint(loads.errors)
So I have an everyday query like:
things = Thing.query.all()
ThingSerializer(things, many=True).data
This results in contained DateTime objects getting correctly serialized while Date objects don't get serialized!
Example output:
[OrderedDict([('end_date', datetime.date(2011, 1, 4)), ('updated_at', 'Fri, 06 Jun 2014 20:59:56 -0000')])]
Note that end_date is from my SQLAlchemy declarative model and it's a Column.Date type while updated_at is a Column.DateTime type.
However, doing
thing = Thing.query.first()
ThingSerializer(thing).data
results in
OrderedDict([('end_date', '2011-01-01'), ('updated_at', 'Fri, 06 Jun 2014 20:59:56 -0000')])
Note how end_date gets serialized in this case but not in the other and how updated_at always get serialized correctly.
I suspect a typo somewhere that has to do with Date not getting tested a lot or something. Hopefully you can find the issue quickly. :D
from marshmallow import Schema, fields, pprint

class User(object):
    def __init__(self, name, email=None, age=None):
        self.name = name

class ChildSchema(Schema):
    name = fields.String()

class ParentSchema(Schema):
    name = fields.String()
    children = fields.Nested(ChildSchema, many=True)

user = User(name="Monty")
schema = ParentSchema()
result = schema.dump(user)
pprint(result.data)
# -> {'children': {'name': ''}, 'name': u'Monty'}
I would expect the result to be {'children': [], 'name': u'Monty'}
. If I set a field to many=True, it should always be a list, no exceptions. What I'm getting instead looks like a mistake.