guyskk / validr

A simple, fast, extensible python library for data validation.

License: Other

Python 72.44% Shell 0.21% Cython 27.36%

json-schema validation validator

validr's Issues

Failed to create custom choice validator

I'm trying to make a choice validator like test_custom_validator in test_custom_validator.py,
but it would be better with a "choices" parameter, so I wrote this:

def choice_validator(choices: list or tuple):
    @validator_wrap(string=False)
    def _validator(value):
        if value not in choices:
            raise Invalid('invalid choice')
        return value

    return _validator

and used it like:

SP = SchemaParser(validators={'choice': choice_validator})
SP.parse({'type?choice(["A", "B"])': 'blahblah'})

Unfortunately, this raises an exception:

validr._exception.SchemaError: invalid JSON value in '["A", "B"]'

I found that the parameters between "()" are split by ",", so is there no way to pass an array to a custom validator?

It would be very helpful to have an example of a custom validator with parameters.

Goals

Easy to write schema in python/high-level programming language, fewer mistakes
Easy to share and refer schema, and define openapi
Compatible with YAML, replace & with . and remove @ grammar

Overview:

# old
name?str&strip&default="world"&desc="Your name"
# new-yaml
name: str.strip.default="world".desc="Your name"
# new-python
name: T.str.strip.default("world").desc("your name")
# new refer
pet: T.ref("http://example.com/schema.json#Pet").optional.desc('description')

Syntax in YAML

scalar:
    validator.bool.key=value
list:
    - validator.bool.key=value
    - arg0
    - arg1
dict:
    $self: validator.bool.key=value
    key0: value
    key1: value
refer:
    pet: ref("http://example.com/schema.json#Pet").optional.desc('description')
    pet:
        - ref.optional.desc('description')
        - http://example.com/schema.json#Pet

Syntax in Python

from validr import T

Welcome = T.dict(
    # old: message='str.desc="Welcome message"'
    message=T.str.desc("Welcome message")
).optional.desc('Welcome Object')

@route('/')
def welcome(
    # old: name: 'str.strip.default="world".desc="Your name"',
    name: T.str.strip.default("world").desc("Your name"),
) -> T.list(Welcome).minlen(3):
    return [{'message': 'hello ' + name}] * 3

Feature: use ciso8601 to implement datetime validator

Currently the datetime.strptime method is not flexible; it only supports a very strict format.
It would be better to use https://github.com/closeio/ciso8601 to parse the ISO 8601 datetime format.

Also add a tzaware option: if tzaware=True, return a datetime object with timezone info (UTC).

The change is backward compatible, with no breaking changes.

Cannot compile for Python 3.10 and 3.11

Steps to reproduce:

$ python3.10 -m venv py310
$ py310/bin/pip install validr

Output:

Installing collected packages: validr
  Running setup.py install for validr ... error
  error: subprocess-exited-with-error
  
  × Running setup.py install for validr did not run successfully.
  │ exit code: 1
  ╰─> [1381 lines of output]
      VALIDR_SETUP_MODE=c
      running install
      /home/krat/Projects/py310/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.10
      creating build/lib.linux-x86_64-3.10/validr
      copying src/validr/exception.py -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/schema.py -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/validator.py -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/__init__.py -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/model.py -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/_validator_py.py -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/_exception_py.py -> build/lib.linux-x86_64-3.10/validr
      creating build/lib.linux-x86_64-3.10/validr/_vendor
      copying src/validr/_vendor/email_validator.py -> build/lib.linux-x86_64-3.10/validr/_vendor
      copying src/validr/_vendor/__init__.py -> build/lib.linux-x86_64-3.10/validr/_vendor
      copying src/validr/_vendor/durationpy.py -> build/lib.linux-x86_64-3.10/validr/_vendor
      copying src/validr/_vendor/fqdn.py -> build/lib.linux-x86_64-3.10/validr/_vendor
      running egg_info
      writing src/validr.egg-info/PKG-INFO
      writing dependency_links to src/validr.egg-info/dependency_links.txt
      writing requirements to src/validr.egg-info/requires.txt
      writing top-level names to src/validr.egg-info/top_level.txt
      reading manifest file 'src/validr.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      adding license file 'LICENSE'
      writing manifest file 'src/validr.egg-info/SOURCES.txt'
      copying src/validr/_exception_c.c -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/_exception_c.pyx -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/_validator_c.c -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/_validator_c.pyx -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/model.pyi -> build/lib.linux-x86_64-3.10/validr
      copying src/validr/schema.pyi -> build/lib.linux-x86_64-3.10/validr
      running build_ext
      building 'validr._exception_c' extension
      creating build/temp.linux-x86_64-3.10
      creating build/temp.linux-x86_64-3.10/src
      creating build/temp.linux-x86_64-3.10/src/validr
      x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/krat/Projects/py310/include -I/usr/include/python3.10 -c src/validr/_exception_c.c -o build/temp.linux-x86_64-3.10/src/validr/_exception_c.o
      src/validr/_exception_c.c: In function ‘__Pyx_call_return_trace_func’:
      src/validr/_exception_c.c:1075:15: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘use_tracing’; did you mean ‘tracing’?
       1075 |       tstate->use_tracing = 0;
            |               ^~~~~~~~~~~
            |               tracing


      *** a lot of similar lines go here ***


      /usr/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

For Python 3.11 the output differs, but it's still a compilation error.

Error using T.enum (v1.1.3)

python3 --version
Python 3.6.9

pip3 --version
pip 20.0.2 from /home/yumi/.local/lib/python3.6/site-packages/pip (python 3.6)

pip3 install validr
Processing => validr-1.1.3-cp36-cp36m-linux_x86_64.whl

python3
>>> from validr import T
>>> T.enum([1,2,3])
....
  File "/home/yumi/.local/lib/python3.6/site-packages/validr/schema.py", line 424, in _check_items
    raise SchemaError('items must be bool, int, float or str')
validr._exception_c.SchemaError: items must be bool, int, float or str

>>> T.enum(1,2,3)
...
File "/home/matu/.local/lib/python3.6/site-packages/validr/schema.py", line 362, in __call__
    raise SchemaError("can't call with more than one positional argument")
validr._exception_c.SchemaError: can't call with more than one positional argument

I found your validr_uncython.py script quite useful. After modifying it for my needs, I published it as a standalone project here: https://github.com/JohannesBuchner/uncythonize and also uploaded it to PyPI. I am sure I will use it in the future.

Thank you!

rename validater to validr

Some changes:

change the package name to validr; use import validr instead of import validater
rename variables from validater to validator, including ValidatorString, validr.validators, build_re_validator and builtin_validators

support coerce invalid value, support report invalid value in exception

To handle invalid values more flexibly, I will add two parameters to all validators:

invalid_to(value): replace an invalid value with the specified value
invalid_to_default: replace an invalid value with the default value; the default must be set

Also add a value attribute to the Invalid exception, so that the error message includes the invalid value (long text will be truncated).
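The invalid_to behaviour described above could be sketched roughly as a wrapper around a validate function. This is only an illustration in plain Python, not validr's implementation: with_invalid_to, parse_int and the use of ValueError in place of validr's Invalid are all assumptions made for the example.

```python
_MISSING = object()  # sentinel so that None can still be used as a replacement value


def with_invalid_to(validate, invalid_to=_MISSING):
    """Wrap `validate` so that invalid input is replaced instead of rejected.

    ValueError stands in for validr's Invalid exception in this sketch.
    """
    def wrapper(value):
        try:
            return validate(value)
        except ValueError:
            if invalid_to is _MISSING:
                raise  # no replacement configured: keep the original behaviour
            return invalid_to
    return wrapper


def parse_int(value):
    """Hypothetical inner validator: raises ValueError for non-numeric strings."""
    return int(value)


lenient_int = with_invalid_to(parse_int, invalid_to=0)
```

Here lenient_int("abc") falls back to the replacement value, while with_invalid_to(parse_int) without a replacement still raises.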

rename validater to ?

validatr
valdating

Allow option to throw Invalid exception as either AsciiTable or dictionary.

Validr is currently impractical for server applications that need to verify POST requests. Because Validr renders the Invalid exception as an AsciiTable rather than a dictionary or a list of errors, the thrown exception is useless when building RESTful APIs.

A better implementation might be the ability to define an attribute on the base Model class to give better control over the error:

@modelclass
class Model:
    error_type = dict

class Person(Model):
    name=T.str.maxlen(16).desc('at most 16 chars')
    website=T.url.optional.desc('website is optional')

This way when an error is found, the server could easily return back what is going on.

try:
    test = Person(name=True, website='')
except Exception as e:
    return json({'error': 'Invalid key(s) input.', 'keys': e})

Where the error e would be:

{'name': 'invalid string'}

And therefore the server is able to respond:

{
    'error': 'Invalid key(s) input.',
    'keys': {
        'name': 'invalid string'
    }
}

For now, unfortunately, I will have to go back to another json verifier. Great project nonetheless!

Propose: union schema

To support validating data against multiple schemas, similar to the anyOf and oneOf features in JSON Schema, I propose a union schema.

A union schema covers two usage scenarios.

Distinguish schemas by type or dict keys

schema_by_type_or_keys = T.list(T.union([
    T.str,
    T.list(T.str),
    T.dict(key1=T.str),
    T.dict(key2=T.str, key3=T.str),
]))

valid_values = [
    "string",
    ["list", "of", "string"],
    {"key1": "key1 value"},
    {"key2": "key2 value", "key3": "key3 value"}
]

schema in json format:

[
    "union",
    "str",
    ["list", "str"],
    {"key1": "str"},
    {"key2": "str", "key3": "str"}
]

validate process:

def union_validator(compiler, items):
    scalar_inner = None
    list_inner = None
    dict_inners = {}
    for schema in items:
        assert schema.validator != 'union', 'ambiguous schema'
        assert not schema.optional and not schema.default, 'ambiguous schema'
        if schema.validator == 'list':
            assert list_inner is None, 'ambiguous schema'
            list_inner = compiler.compile(schema)
        elif schema.validator == 'dict':
            key = required_fields_of(schema)
            assert key not in dict_inners, 'ambiguous schema'
            # TODO: make sure only one inner schema will be selected
            dict_inners[key] = compiler.compile(schema)
        else:
            assert scalar_inner is None, 'ambiguous schema'
            scalar_inner = compiler.compile(schema)
    def validate(value):
        if isinstance(value, list):
            return list_inner(value)
        elif isinstance(value, dict):
            # TODO: optimize select inner schema
            for keys, inner in dict_inners.items():
                if keys.issubset(value.keys()):
                    return inner(value)
            return 
        else:
            return scalar_inner(value)
    return validate

Example of selecting the inner schema:

dict schema and keys:

schema1: k1,k2
schema2: k1,k2,k3
schema3: k1,k2,k4
schema4: k1,k2,k5,k6

value keys and matched schema:

k1,k2,k3,k4,k5,k6 -> schema4
k1,k2,k3,k4,k5    -> schema3
k1,k2,k3          -> schema2
k1,k2             -> schema1

Logic: match the longest subset schema.
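The "longest subset" rule can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the proposal: select_schema and the schemas mapping are hypothetical names, and each schema is reduced to the set of its required keys.

```python
def select_schema(schemas, value_keys):
    """Return the name of the schema whose required keys are the largest
    subset of `value_keys`, or None if no schema matches.

    Ties are resolved in favour of the schema declared first.
    """
    value_keys = set(value_keys)
    best_name, best_size = None, -1
    for name, required in schemas.items():
        # A schema is a candidate only if all of its keys are present.
        if required <= value_keys and len(required) > best_size:
            best_name, best_size = name, len(required)
    return best_name


# Required keys per schema, taken from the example above.
schemas = {
    "schema1": frozenset({"k1", "k2"}),
    "schema2": frozenset({"k1", "k2", "k3"}),
    "schema3": frozenset({"k1", "k2", "k4"}),
    "schema4": frozenset({"k1", "k2", "k5", "k6"}),
}
```

Note that size alone does not settle every case: for the keys k1..k5, schema2 and schema3 are both matching subsets of the same length. This sketch picks the first one declared, whereas the table above lists schema3, so a real implementation would need an explicit tie-break rule (or would reject such schemas as ambiguous).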

Distinguish schemas by specified field

schema_by_specified_field = T.list(T.union(
    smtp=T.dict(
        host=T.str,
        port=T.int,
        username=T.str,
        password=T.str,
    ),
    slack=T.dict(
        endpoint=T.url,
        token=T.str,
    )
).by('type'))

valid_values = [
    {
        "type": "smtp",
        "host": "localhost",
        "port": 25,
        "username": "guyskk",
        "password": "123456",
    },
    {
        "type": "slack",
        "endpoint": "https://api.slack.com",
        "token": "xxxxxx",
    },
]

schema in json format:

{
    "$self": "union.by('type')",
    "smtp": {
        "host": "str",
        "port": "int",
        "username": "str",
        "password": "str"
    },
    "slack": {
        "endpoint": "str",
        "token": "str"
    }
}

validate process:

def union_validator(compiler, items, by):
    inners = {}
    for key, schema in items.items():
        assert not schema.optional and not schema.default, 'ambiguous schema'
        inners[key] = compiler.compile(schema)
    def validate(value):
        by_type = value.get(by)
        return inners[by_type](value)
    return validate

Release v1.2.0

Check List:

Can't call with keyword argument

I am trying out a simple schema and I realize it is not possible to use optional while having additional attributes declared in a dict schema. Am I doing it correctly? I cannot find any relevant example regarding this usage.

As such,

from validr import T, modelclass, asdict, ValidrError

@modelclass
class Model:
    """Base Model"""

class Person(Model):

    url=T.url.desc("url")
    myObj = T.dict.optional(
        idx=T.int.optional.default(0)
    )

try:
    result = Person(url="http://test.com", myObj={
        "idx": 1
    })
    print(asdict(result))
except ValidrError as err:
    print(err.message)

Installation error

Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-qqy3vem8/validr/setup.py", line 8, in
long_description = f.read()
File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 576: ordinal not in range(128)

It looks like installation is broken with some system encodings.
I suggest replacing line 7 of setup.py

with open(os.path.join(dirname(__file__), 'README.md')) as f:

with

with open(os.path.join(dirname(__file__), 'README.md'), 'r', encoding='utf-8') as f:

Propose: dynamic dict keys

Similar to the JSON Schema patternProperties and "additionalProperties": false/true features.

These are important features for supporting docker-compose config schemas: https://github.com/docker/compose/tree/master/compose/config

T.dict.key(T.str.match("^[a-zA-Z0-9._-]+$")).value(T.int)

{
    "abc.A-B-C_123": 123,
    "ABC.abc-4_5_6": 456,
}

T.dict(key=T.int).extra('discard')
T.dict(key=T.int)   # default: discard
{"key": 123, "xxx": 123}  ->  {"key": 123}

T.dict(key=T.int).extra('keep')
{"key": 123, "xxx": 123}  ->  {"key": 123, "xxx": 123}

T.dict(key=T.int).extra('error')
{"key": 123, "xxx": 123}  ->  raise Invalid("xxx fields")
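The three extra policies above could behave roughly like the following plain-Python sketch. It only illustrates the intended semantics: apply_extra and known_keys are hypothetical names, and ValueError stands in for validr's Invalid.

```python
def apply_extra(value, known_keys, extra="discard"):
    """Handle keys that the schema does not declare, according to `extra`."""
    unknown = sorted(k for k in value if k not in known_keys)
    if extra == "keep":
        # Undeclared keys pass through untouched.
        return dict(value)
    if extra == "discard":
        # Undeclared keys are silently dropped (the proposed default).
        return {k: v for k, v in value.items() if k in known_keys}
    if extra == "error":
        # Undeclared keys make the whole value invalid.
        if unknown:
            raise ValueError("unexpected fields: " + ", ".join(unknown))
        return dict(value)
    raise ValueError("unknown extra policy: {!r}".format(extra))


data = {"key": 123, "xxx": 123}
discarded = apply_extra(data, {"key"})
kept = apply_extra(data, {"key"}, extra="keep")
```

In a real validator the declared keys would of course still be validated by their own schemas; this sketch covers only the handling of the extra keys.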

stable benchmark

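The horizontal axis is the raw test result (time), and the vertical axis is the frequency of occurrence.
The horizontal line marks the data selected by stable_timeit; that segment of the data is relatively stable and reliable.
The maximum deviation is within 3%, and results usually fluctuate by about 1%.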

validator string DSL parser by pyparsing

https://gist.github.com/guyskk/1630149017f3eaa59c7e30c009741634

English document

I am working on the English documentation; suggestions and pull requests are welcome.

control validator accept and output type

Currently a validator can output either a string or a (non-string) object, but there is no convenient way to control its accepted and output types. For example, the datetime validator always outputs a string, but sometimes it is better to output a datetime object.

To solve the problem, I will introduce accept and output parameters to @validator().

accept parameter:

str: the validator accepts only strings and treats both None and the empty string as None, e.g. email, phone, idcard
object: the validator accepts only objects, e.g. dict, list
(str, object): (default) the validator accepts both strings and objects and treats both None and the empty string as None, e.g. datetime, date, time, url, uuid, ipv4, ipv6

output parameter:

str: (default) the validator always outputs a string and converts None to the empty string, e.g. str, email, phone, idcard
object: the validator always outputs an object, e.g. dict, list, int, float
(str, object): the validator can output either a string or an object, and has an object parameter to control which one, e.g. datetime, date, time, url, uuid, ipv4, ipv6. To reduce conflicts, the object parameter will be renamed to output_object in the validator's signature.

Usage example:

@validator(accept=(str, object), output=(str, object))
def datetime_validator(output_object=False):
    def validate(value):
        # do validation
        if output_object:
            # return datetime object
            ...
        else:
            # return datetime string
            ...
    return validate

datetime_str_schema = T.datetime
datetime_obj_schema = T.datetime.object

Special case for str validator:

By default the str validator accepts only the str type, because every Python object implements the __str__ method and simply converting an object to str would cause unwanted behavior.
So the str validator will have an accept_object parameter to control whether it should convert objects to str.

Backward compatibility:

The original string parameter will be deprecated but not removed, so there are no breaking changes.

string=True is equivalent to accept=(str, object), output=str
string=False is equivalent to accept=(str, object), output=object
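The compatibility mapping above amounts to a small translation step. A minimal sketch, assuming a hypothetical helper name translate_string_flag (nothing here is validr's actual code):

```python
def translate_string_flag(string):
    """Map the deprecated `string` flag to the proposed (accept, output) pair."""
    accept = (str, object)              # both values of the old flag accept str and object
    output = str if string else object  # only the output type differs
    return accept, output


legacy_true = translate_string_flag(True)
legacy_false = translate_string_flag(False)
```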

The feature will be added in v1.1, maybe in a few months.

list.unique validator is SLOW

I have a schema like

plain_rule_schema_in = T.dict(
    enabled=T.bool.optional.default,
    ttl=T.int.optional,
    settings=T.dict(
        addr=T.netaddr
    )
)

plain_rule_list_in = T.dict(
    name=T.str,
    enabled=T.bool,
    description=T.str.optional.default(''),
    category=T.enum(
        ','.join([category.name for category in RuleListCategoriesEnum])
    ),
    rules=T.list(
        plain_rule_schema_in,
    ).maxlen(100000)
)

The schema describes a relatively big list of rule entities inside the list container. It processes fast enough (0.23 sec for 20000 entities), but that's before I add a unique check to the list. The time instantly jumps to 15-16 s!
I can imagine that a unique check is a little slow, yes, but not THAT slow. There may be a catch here, of course: for example, we could serialize list/dict objects to JSON, and many other objects to str, to make the checks substantially faster. You may think of something even faster, but for now this great check is literally unusable.
I use validr==1.0.4; the netaddr check is performed via

@validator(string=True)
def netaddr_validator(compiler):
    """Custom validator

    Params:
        compiler: can be used for compile inner schema
        items: optional, and can only be scalar type, passed by schema in `T.validator(items)` form
        some_param: other params
    Returns:
        validate function
    """
    def validate(value):
        """Validate function

        Params:
            value: data to be validate
        Returns:
            valid value or converted value
        Raises:
            Invalid: value invalid
        """
        try:
            value = str(netaddr.IPNetwork(value))
        except netaddr.core.AddrFormatError as ex:
            raise Invalid('{} is invalid addr'.format(value))

        return value
    return validate
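The idea mentioned above, serializing list/dict items to JSON so they can be compared through a set, might look like the sketch below. It assumes the slowdown comes from pairwise comparison of unhashable items, which this issue does not confirm; find_duplicate is a hypothetical helper and not part of validr.

```python
import json


def find_duplicate(items):
    """Return the index of the first item that repeats an earlier one,
    or None if all items are unique.

    Each item is reduced to a canonical JSON string, so the whole check
    is a single pass with set lookups instead of pairwise comparisons.
    """
    seen = set()
    for index, item in enumerate(items):
        # sort_keys makes dicts with equal content serialize identically.
        key = json.dumps(item, sort_keys=True)
        if key in seen:
            return index
        seen.add(key)
    return None


rules = [
    {"enabled": True, "settings": {"addr": "10.0.0.0/8"}},
    {"settings": {"addr": "10.0.0.0/8"}, "enabled": True},  # same content, different key order
]
duplicate_index = find_duplicate(rules)
```

This only works for JSON-serializable items, and values that compare equal in Python but serialize differently (such as 1 and 1.0) are treated as distinct.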

Error while installing the package

I get this error while installing in an empty environment (virtual environment) with python 3.8.10:

pip install validr
Collecting validr
  Using cached validr-1.2.1.tar.gz (291 kB)
Collecting idna>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting pyparsing>=2.1.0
  Downloading pyparsing-3.0.8-py3-none-any.whl (98 kB)
     |████████████████████████████████| 98 kB 154 kB/s 
Collecting terminaltables>=3.1.0
  Downloading terminaltables-3.1.10-py2.py3-none-any.whl (15 kB)
Building wheels for collected packages: validr
  Building wheel for validr (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/shayan/dev/test-validr/env/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-mtopyp8x/validr/setup.py'"'"'; __file__='"'"'/tmp/pip-install-mtopyp8x/validr/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-kx_fpz1i
       cwd: /tmp/pip-install-mtopyp8x/validr/
  Complete output (7 lines):
  VALIDR_SETUP_MODE=c
  usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
     or: setup.py --help [cmd1 cmd2 ...]
     or: setup.py --help-commands
     or: setup.py cmd --help
  
  error: invalid command 'bdist_wheel'
  ----------------------------------------
  ERROR: Failed building wheel for validr
  Running setup.py clean for validr
Failed to build validr
Installing collected packages: idna, pyparsing, terminaltables, validr
    Running setup.py install for validr ... done
Successfully installed idna-3.3 pyparsing-3.0.8 terminaltables-3.1.10 validr-1.2.1

I also can't install it in an env created with pipenv on Python 3.10.3; I get:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 41: ordinal not in range(128)

Example request: Nesting validators

Is there a way to create a custom validator that can nest other validators (as list and dict do)?
I know I'm using validr in an unexpected way, but the case is this: I have DB results that I want to pack into an arbitrary data structure. One of the fields in the list of nested dicts (representing DB rows) is a bytea column in PostgreSQL (Python bytes) holding a packed JSON dump (JSON -> str -> bytes). So I need to unpack it and, ideally, check the result as a dict.
Is it possible?
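Independently of how validr wires custom validators together, the unpack-then-check step described here can be sketched in plain Python as a validate function that decodes the bytes and hands the result to an inner validate function. All names are hypothetical, and ValueError stands in for validr's Invalid; how the inner function would be obtained from a validr schema is left open.

```python
import json


def make_json_bytes_validator(inner_validate):
    """Build a validate function that decodes bytes -> str -> JSON and then
    passes the decoded object to `inner_validate`."""
    def validate(value):
        if not isinstance(value, (bytes, bytearray)):
            raise ValueError("expected bytes")
        try:
            decoded = json.loads(bytes(value).decode("utf-8"))
        except ValueError as ex:
            # Covers both UnicodeDecodeError and json.JSONDecodeError,
            # which are subclasses of ValueError.
            raise ValueError("invalid JSON payload") from ex
        return inner_validate(decoded)
    return validate


def require_dict(value):
    """Hypothetical inner validator: accept only dicts."""
    if not isinstance(value, dict):
        raise ValueError("expected a dict")
    return value


validate_row_payload = make_json_bytes_validator(require_dict)
```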

improve refer syntax

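Optimize the refer syntax.

Currently a refer cannot take parameters, so there is no way to express that the referenced data is optional.
Referring to multiple schemas is not supported either, so several schemas cannot be combined the way multiple inheritance (mixins) would allow.
In addition, the ? in "$self?&optional" is not ideal and should be removed.

The improved syntax will support the following forms: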

"?validater(arg1,arg2...)&key=value&..."
"(arg1,arg2...)&key=value&..."
"&key=value&..."
"@refer&key=value&..."
"@refer@refer&key=value&..."

idcard and phone validater are specific to China, consider improving them

rename idcard to cn-idcard, or find another better solution
internationalize the phone validater