stranger6667 / jsonschema-rs

JSON Schema validation library

Home Page: https://docs.rs/jsonschema

License: MIT License

Rust 97.27% Shell 0.12% Python 2.61%
hacktoberfest jsonschema python rust

jsonschema-rs's Introduction

Hi, I am Dmitry 👋

Software engineer with more than 12 years of experience, specializing in Rust and Python with a focus on writing parsers and fuzzing.

  • ๐ŸŒ Based in Prague, Czech Republic ๐Ÿ‡จ๐Ÿ‡ฟ
  • ๐Ÿ’ก Interested in software testing & building reliable systems
  • ๐ŸŽ“ Studied information security
  • ๐Ÿšฒ Love traveling
  • ๐Ÿ‘‹ Reach me on LinkedIn, Twitter or Telegram

jsonschema-rs's People

Contributors

aaron-makowski, alexjg, blacha, dependabot[bot], derridda, djmitche, duckontheweb, ermakov-oleg, gavadinov, jacobmischka, jayvdb, jqnatividad, jrdngr, kgutwin, leoschwarz, macisamuele, matteopolak, orangetux, pdogr, qrayven, rafaelcaricio, samgqroberts, samwilsn, stranger6667, syheliel, tamasfe, thebearingedge, tobz, wolfgangwalther, zhiburt


jsonschema-rs's Issues

Refactor benchmarks

At the moment I see these disadvantages of the current implementation:

  • They test only the performance of is_valid; we should benchmark validate as well
  • Benchmark names are hardcoded and often duplicated; we should autogenerate them so they are not accidentally overwritten during a run
  • There are many duplicated schemas; they could be reorganized with a macro
  • There is a lot of duplicated code in the benches implementation
  • Commented-out code; it would be better to uncomment it and then select benchmarks by name

Generate validators without dispatching

Even though compiling validators gives pretty good results, it is not the fastest way to perform validation in all circumstances. If we know the schema at build time, we can generate code that is more efficient than the current approach.

For example, if we have this schema:

{"type": "string", "maxLength": 5}

then our current approach will basically iterate over a vector of trait objects and call their validate / is_valid methods.

The idea is to generate code like this:

fn is_valid(instance: &Value) -> bool {
    match instance {
        Value::String(value) => value.len() <= 5,
        _ => false
    }
}

https://github.com/horejsek/python-fastjsonschema does this.

Avoid copying to ValidationError

In most cases, we copy data into the ValidationError instance, as in this snippet taken from the implementation of the required keyword:

    fn validate<'a>(&self, _: &'a JSONSchema, instance: &'a Value) -> ErrorIterator<'a> {
        if let Value::Object(item) = instance {
            for property_name in &self.required {
                if !item.contains_key(property_name) {
                    return error(ValidationError::required(instance, property_name.clone()));
                }
            }
        }
        no_error()
    }

instance is later wrapped in Cow::Borrowed but property_name is cloned. Sometimes instance is cloned too via ValidationError::into_owned (e.g. in additional_properties keyword implementation) so it can be used in our error iterator.

I assume that it is possible to avoid cloning, but it will require some lifetime tweaks which I have failed to implement (a couple of times).

Optimize check_time

It might be faster with a single regex rather than with four calls to `parse_from_str`.
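As a rough sketch of a single-pass alternative (avoiding both a regex and `parse_from_str`; the function name and the restriction to the `HH:MM:SS` core are assumptions for illustration, not the library's real API):

```rust
// Simplified single-pass check for the `HH:MM:SS` core of a time string;
// a real implementation would also handle fractional seconds and offsets.
fn is_valid_time_core(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 8 || b[2] != b':' || b[5] != b':' {
        return false;
    }
    // parse the two-digit number starting at byte `i`
    let num = |i: usize| -> Option<u32> {
        let (hi, lo) = (b[i], b[i + 1]);
        if hi.is_ascii_digit() && lo.is_ascii_digit() {
            Some(u32::from(hi - b'0') * 10 + u32::from(lo - b'0'))
        } else {
            None
        }
    };
    matches!(
        (num(0), num(3), num(6)),
        (Some(h), Some(m), Some(sec)) if h < 24 && m < 60 && sec < 61 // 61 allows a leap second
    )
}
```

Whether this actually beats a compiled regex would need to be confirmed by a benchmark.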

Restructure project

  • Rename Validator -> Validate
  • Rename Schema -> Validator
  • Move Scope and related things to a separate module
  • Put types.rs to relevant places
  • Rename validate_sub -> descend
  • Rename validators -> keywords
  • Move compile to the root
  • Move validators/mod.rs to the root

Bug in AdditionalPropertiesFalseValidator

There is no test case for it yet, but instead of:

fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
    if let Value::Object(item) = instance {
        return item.iter().next().is_some();
    }
    true
}

it should be:

fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
    if let Value::Object(item) = instance {
        return item.iter().next().is_none();
    }
    true
}

I.e. the instance is only valid if it is an object without properties.

Restructure project

  • resolver & validators should be at the same level
  • errors should be grouped in the same file
  • move format checkers into a separate file
  • types separately

Setup CI

  • GitHub actions
  • Each commit - cargo fmt & cargo clippy
  • Test build

Update "Performance" section

It would be fairer to have two groups: compiled and non-compiled. Currently, the results from jsonschema_valid and valico are produced with compiled validators. So, basically we need to move the jsonschema (not compiled) column into a new table and compare it with the non-compiled versions of jsonschema_valid and valico.

The Rust compiler version & options will also be useful there. For the benchmarks it is probably better to compile with LTO and RUSTFLAGS="--emit=asm".

Handle errors instead of `unwrap`

In some cases, it might be better to return an error to the client instead; these cases are mostly in the resolver.

But for regexes & URLs that are known to be valid we can use `expect`.
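As a sketch of the direction (the `ResolveError` type and `fetch_document` helper below are hypothetical stand-ins, not the library's real API), errors can be propagated with `?` instead of unwrapped:

```rust
// Hypothetical stand-ins for the resolver's error type and lookup routine.
#[derive(Debug)]
struct ResolveError(String);

fn fetch_document(url: &str) -> Result<String, ResolveError> {
    // stand-in for a real network/filesystem lookup
    if url.starts_with("http") {
        Ok("{}".to_string())
    } else {
        Err(ResolveError(format!("cannot resolve `{}`", url)))
    }
}

fn resolve(url: &str) -> Result<String, ResolveError> {
    // `?` propagates the failure to the caller instead of panicking via `unwrap`
    let document = fetch_document(url)?;
    Ok(document)
}
```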

Improve compilation

Currently, all possible sub-schemas are built. Maybe only subschemas for existing refs should be built instead?

Avoid a mutable context - with one, it is harder to parallelize compilation; simple clones should work.

Macros to return validation error

Instead of

let message = format!("'{}' is too long", item);
return Err(ValidationError::ValidationError(message));

it can be

return validation_error!("`{}` is too long", item)
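A minimal sketch of such a macro, with a simplified stand-in for the real `ValidationError` type (and a hypothetical `check_length` caller for illustration):

```rust
// Simplified stand-in for the library's real error type.
#[derive(Debug)]
pub enum ValidationError {
    ValidationError(String),
}

// The macro expands to the `Err(...)` value, so call sites keep their `return`.
macro_rules! validation_error {
    ($($arg:tt)*) => {
        Err(ValidationError::ValidationError(format!($($arg)*)))
    };
}

fn check_length(item: &str) -> Result<(), ValidationError> {
    if item.len() > 5 {
        return validation_error!("`{}` is too long", item);
    }
    Ok(())
}
```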

Improve validators debug representation

I think it would be better if this representation were closer to the original schema, e.g.:

<unique_items> vs {"uniqueItems": true}

It might be less confusing since it would use the same keywords as the original schema.

Store & use meta-schemas

If we validate the input schemas for conformance to the respective specs, then:

  • We can probably skip a lot of our own checks during the compilation process
  • There will be an understandable error message in case the input schema is not valid

Regarding the implementation details: it can be done via lazy_static! so the meta-schema is not re-compiled. Ideally, I'd like to have it done via code generation (as described in #46).
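For illustration, the lazy compile-once idea can be sketched with std's `OnceLock` (an alternative to `lazy_static!`); the stored "compiled" meta-schema here is just a placeholder string:

```rust
use std::sync::OnceLock;

// Compiled at most once, then reused on every subsequent call.
static META_SCHEMA: OnceLock<String> = OnceLock::new();

fn meta_schema() -> &'static str {
    META_SCHEMA.get_or_init(|| {
        // in the real library this would parse and compile the bundled draft schema
        r#"{"$schema": "http://json-schema.org/draft-07/schema#"}"#.to_string()
    })
}
```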

Do not include empty nodes in the validation tree

When a schema is compiled it is possible to have empty nodes. Example

{
    "items": {"additionalProperties": true}
}

It compiles to items: {} because true is the default value for this keyword, which makes this sub-schema empty.
Such cases should be detected and removed from the tree.

Cache for loaded documents

Once a remote reference is resolved, it makes sense to cache it somewhere. I assume it might be done with RefCell - some kind of LRU cache with a small capacity (usually there are not many remote schemas under the same document).
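A minimal sketch of such a bounded cache behind `RefCell` (names are hypothetical, and eviction here is arbitrary rather than true LRU, to keep the example short):

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical bounded cache for resolved remote documents.
struct DocumentCache {
    capacity: usize,
    docs: RefCell<HashMap<String, String>>,
}

impl DocumentCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, docs: RefCell::new(HashMap::new()) }
    }

    // Return the cached document, or load and cache it on a miss.
    fn get_or_insert_with(&self, url: &str, load: impl FnOnce() -> String) -> String {
        let mut docs = self.docs.borrow_mut();
        if let Some(doc) = docs.get(url) {
            return doc.clone();
        }
        if docs.len() >= self.capacity {
            // evict an arbitrary entry; a real LRU would track access order
            if let Some(key) = docs.keys().next().cloned() {
                docs.remove(&key);
            }
        }
        let doc = load();
        docs.insert(url.to_string(), doc.clone());
        doc
    }
}
```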

Canonicalise schemas during compilation

We can eliminate some inefficient constructs, e.g.:

{"anyOf": [{"type": "string"}, {"type": "number"}]}

can be simplified to:

{"type": ["string", "number"]}

And if both integer and number are present, the list can be reduced to {"type": "number"} since number includes integer.
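The integer/number collapse can be sketched as a small pass over the list of type names (a simplified illustration, not the library's API):

```rust
// If "number" is present, "integer" is redundant, since every integer
// already satisfies "number".
fn canonicalize_types(mut types: Vec<&'static str>) -> Vec<&'static str> {
    if types.contains(&"number") {
        types.retain(|t| *t != "integer");
    }
    types
}
```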

Additionally, empty nodes produced during compilation should be removed from the tree, as described in "Do not include empty nodes in the validation tree".

Possible truncation & panic

e.g. in min_properties:

let limit = limit.as_u64().unwrap() as usize;

If the schema contains a negative or float number for this keyword, this line will panic.

On a 32-bit platform, an integer that exceeds usize will be truncated, which may lead to wrong results during validation.

Affected validators:

  • max_items
  • max_length
  • max_properties
  • min_items
  • min_length
  • min_properties

As a result, we should enable the clippy::cast_possible_truncation lint + add test cases.
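A hedged sketch of the guarded conversion (`limit_to_usize` is a hypothetical helper; `raw` stands in for the number taken from the schema):

```rust
// Reject invalid limits instead of panicking, and fail instead of
// silently truncating the cast on 32-bit targets.
fn limit_to_usize(raw: f64) -> Option<usize> {
    // reject negative and fractional limits instead of panicking in `unwrap`
    if raw < 0.0 || raw.fract() != 0.0 {
        return None;
    }
    // `try_from` fails instead of truncating when u64 doesn't fit in usize
    usize::try_from(raw as u64).ok()
}
```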

Group validators by the input type

The idea is to store validators in groups by the input type, e.g. all validators that can be applied to a number, object, array, string, etc.

What we can get from it

Less pattern matching on the instance type

Consider this schema: {"minimum": 1, "maximum": 10}

Essentially we have 2 validators that together roughly do the following:

if let Value::Number(item) = instance {
    let item = item.as_f64().unwrap();
    if item < self.limit {
        return false;
    }
}
if let Value::Number(item) = instance {
    let item = item.as_f64().unwrap();
    if item > self.limit {
        return false;
    }
}

That is pattern matching twice, and item.as_f64().unwrap() twice. Instead, we can do this in the root validation method (and in nodes where it is appropriate):

... // some common validators for any type here
match instance {
    Value::Number(item) => {
        let item = item.as_f64().unwrap();
        // first validator inlined for illustration
        if item < self.limit {
            return false;
        };
        if item > self.limit {
            return false;
        }
        true
    }
    ...
}

In this arm, we can apply exclusiveMaximum, exclusiveMinimum, minimum, maximum, and multipleOf.

Much simpler validators

Instead of this:

    fn is_valid(&self, _: &JSONSchema, instance: &Value) -> bool {
        if let Value::Number(item) = instance {
            let item = item.as_f64().unwrap();
            if item < self.limit {
                return false;
            }
        }
        true
    }

we can do this:

    fn is_valid(&self, item: f64) -> bool {
        item < self.limit
    }

And there is no need to pass an unused reference to the JSONSchema instance. The same simplification can be applied to the validate method.

Faster execution for not-matching types

Currently, if we pass null to the validators above, we still call both of them in a loop, and they both return true. With this idea, there will be only one pattern match in the root + maybe some small checks, which I'll describe below.

More insights where to apply parallel execution

We know for sure that there is no point in applying parallel execution to numeric validators, since they are fast and there are only five of them. In other words, the surface of possibilities becomes smaller and more visible (only applicable to arrays and objects).

As a downside, there could be some extra logic to iterate over two vectors (common & type-specific validators), which may add overhead for small schemas with a single keyword.

Also, the implementation will require splitting into multiple traits.

But anyway, this option is worth exploring; maybe other optimizations will become visible along the way.

I think this idea can also be applied to the compilation phase.
