Giter Site home page Giter Site logo

mqlite's Introduction

About

MQLite is a small pattern matching language loosely based on MQL, the Metaweb Query Language. The main difference is that MQLite can be used to pattern match arbitrary JSON data instead of a particular schema (such as Freebase).

MQLite is a simple program, it reads JSON from stdin and outputs JSON to stdout. The pattern is specified as a parameter. Here is an example, using the Github API to find repositories that have forks:

$ curl https://api.github.com/users/Beluki/repos |
| MQlite.py '[{"name": null, "forks": null, "forks >": 0}]'
[
    {
        "name": "GaGa",
        "forks": 2
    },
    {
        "name": "MQLite",
        "forks": 1
    }
]

Also included is MQLiteSH, an interactive shell that can be used to easily query local JSON files.

Installation

MQLite and MQLiteSH are single, small Python 3.3.+ files with no dependencies other than the Python 3 standard library. You can just put them in your PATH.

Installation is only needed to import MQLite as a library in your own programs (e.g. custom pattern matching or extensions). This can be done using setuptools:

$ cd Source
$ python setup.py install

This will install MQLite as a module and both MQLite and MQLiteSH as scripts.

MQLite specification

The MQLite language is very similar to the MQL read API with a few changes here and there (needed to match arbitrary JSON). The best way to learn it is by example.

All the examples in this README use the following JSON data as input:

[
    {
        "name": "Anna",
        "age": 25,
        "student": true,
        "grades": { "chemistry": "A", "math": "C" },
        "hobbies": ["reading", "chess", "swimming"]
    },
    {
        "name": "James",
        "age": 23,
        "student": false,
        "hobbies": ["chess", "football", "basketball"]
    },
    {
        "name": "John",
        "age": 35,
        "student": true,
        "grades": { "chemistry": "C", "english": "A" },
        "hobbies": ["reading", "swimming", "painting"]
    }
]

Basic pattern matching rules:

  • null matches anything.

  • booleans, numbers and strings match themselves.

  • lists match another list if each element in the query list matches at least one element of the data list.

  • dicts match another dict if all the keys in the query dict are present in the data and the values for those keys match.

Here are some examples (using MQLiteSH with the above dataset as input).

# give me the name and the age for everyone:
>>> [{ "name": null, "age": null }]
[
    {
        "age": 25,
        "name": "Anna"
    },
    {
        "age": 23,
        "name": "James"
    },
    {
        "age": 35,
        "name": "John"
   }
]

# find the names of those that are students:
>>> [{ "name": null, "student": true }]
[
    {
        "student": true,
        "name": "Anna"
    },
    {
        "student": true,
        "name": "John"
    }
]

# who plays both chess and basketball?
>>> [{ "name": null, "hobbies": ["chess", "basketball"] }]
[
    {
        "name": "James",
        "hobbies": [
            "chess",
            "basketball"
        ]
    }
]

MQLite specification: constraints

Basic pattern matching only allows to test datatypes for equality. Constraints make it possible to match on arbitrary operations.

Implemented constraints are:

  • >, >=, <, <=, ==, != compare data with the usual arithmetic operations.

  • contain, in test that the data contains a value or is contained in a set of values.

  • regex matches data against regular expressions.

  • is tests that the data is of a particular type.

  • match recursively matches its argument.

Constrains are written after the key name they apply to in the query dict. Unlike basic patterns, constraints don't add anything to the final query result.

Examples:

# who is more than 25 years old?
>>> [{ "name": null, "age >": 25 }]
[
    {
        "name": "John"
    }
]

# give me the names of the people whose name starts with a J
>>> [{ "name": null, "name regex": "^J" }]
[
    {
        "name": "James"
    },
    {
        "name": "John"
    }
]

# what are the hobbies of John and Anna?
>>> [{ "name in": ["John", "Anna"], "name": null, "hobbies": null }]
[
    {
        "name": "Anna",
        "hobbies": [
            "reading",
            "chess",
            "swimming"
        ]
    },
    {
        "name": "John",
        "hobbies": [
            "reading",
            "swimming",
            "painting"
        ]
    }
]

# who got an A in chemistry? what's his/her math grade?
>>> [{ "name": null, "grades match": {"chemistry": "A"}, "grades": {"math": null} }]
[
    {
        "name": "Anna",
        "grades": {
            "math": "C"
        }
    }
]

MQLite specification: constraint modifiers

To be able to express NOT, AND, OR and XOR, MQLite allows constraints to be prefixed and suffixed with additional operators:

  • not as a prefix negates the constraint.

  • all as a suffix requires the constraint to match every item in a list (e.g. AND).

  • any as a suffix requires the constraint to match at least one of the elements in a list (e.g. OR).

  • one as a suffix requires the constraint to match only one of the elements in a list (e.g. XOR).

Examples:

# whose age is not 23?
>>> [{ "name": null, "age": null, "age not ==": 23 }]
[
    {
        "name": "Anna",
        "age": 25
    },
    {
        "name": "John",
        "age": 35
    }
]

# who likes reading or painting?
>>> [{ "name": null, "hobbies contain any": ["reading", "painting"] }]
[
    {
        "name": "Anna"
    },
    {
        "name": "John"
    }
]

# who likes swimming or painting but not both?
>>> [{ "name": null, "hobbies contain one": ["swimming", "painting"] }]
[
    {
        "name": "Anna"
    }
]

In both MQL and MQLite it's impossible to write an OR clause using multiple keys, e.g. age > 20 or name in .... You need to write multiple queries to do it.

Other than that it's possible to specify as many constraints as you want for each key, with or without prefixes/suffixes.

MQLite specification: directives

Directives are special keys that change results in a particular way. The currently implemented directives are:

  • __limit__ returns a subset of the results.

  • __sort__ sorts the results by a given key.

  • __order__ sorts the results randomly or in reverse order.

  • * returns all the keys in a query or a list of particular keys.

Examples:

# give me a random name
>>> [{ "name": null, "__order__": "random", "__limit__": 1 }]
[
    {
        "name": "Anna"
    }
]

# give me all the names and ages, sort the results by age in reverse order
>>> [{ "name": null, "age": null, "__sort__": "age", "__order__": "reverse" }]
[
    {
        "name": "John",
        "age": 35
    },
    {
        "name": "Anna",
        "age": 25
    },
    {
        "name": "James",
        "age": 23
    }
]

# who is at least 25 years old? give me all the properties
>>> [{ "age >": 25, "*": "*" }]
[
    {
        "name": "John",
        "student": true,
        "age": 35,
        "grades": {
            "english": "A",
            "chemistry": "C"
        },
        "hobbies": [
            "reading",
            "swimming",
            "painting"
        ]
    }
]

# who got at least a B in chemistry? give me all the grades
>>> [{ "name": null, "grades": { "chemistry <=": "B", "*": "*" }}]
[
    {
        "name": "Anna",
        "grades": {
            "math": "C",
            "chemistry": "A"
        }
    }
]

# what are John's hobbies and age?
>>> [{ "name ==": "John", "*": ["hobbies", "age"] }]
[
    {
        "age": 35,
        "hobbies": [
            "reading",
            "swimming",
            "painting"
        ]
    }
]

Note that the order of the directives is important. If __limit__ is specified before __sort__, only the subset of the results returned by limit will be considered for sorting. MQLite uses an OrderedDict under the hood to maintain the query order in JSON patterns.

Command-line options

MQLite has some options that can be used to change the behavior:

  • --strict exits with an error message and status 1 when there are no matches instead of producing an empty output. Useful for scripts.

  • --ascii escapes non-ascii characters in output.

  • --indent N uses N spaces of indentation for output. Use -1 to disable indentation.

  • --sort-keys sorts dictionary keys by name before printing the results.

  • --newline [dos, mac, unix, system] changes the newline format. I tend to use Unix newlines everywhere, even on Windows. The default is system, which uses the current platform newline format.

MQLiteSH has the same options except --strict (no matches don't produce output) and --newline (it always uses system newlines).

Portability

Information and error messages are written to stdout and stderr respectively, using the current platform newline format and encoding.

The input JSON is expected to be UTF-8. Both MQLite and MQLiteSH accept input with or without a BOM signature.

The output JSON is always written in UTF-8 without BOM. When using the same --newline format, it should be byte by byte identical between platforms.

MQLite is tested on Windows 7 and 8 and on Debian (both x86 and x86-64) using Python 3.3+. Older versions are not supported.

Status

This program is finished!

MQLite is feature-complete and has no known bugs. Unless issues are reported I plan no further development on it other than maintenance.

License

Like all my hobby projects, this is Free Software. See the Documentation folder for more information. No warranty though.

mqlite's People

Contributors

beluki avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

jy4618272

mqlite's Issues

Suggestion for refactoring

Apologies if I sound a bit blocky and bot-like in my comment, it's like 12:33AM where I am. See MQLite.py:425, a conditional is not needed as you have already stated all of the possible conditions in the __init__ function. Also in line 438, not wanting to sound like a pendant it is generally not recommended to put spaces after equals sign in function arguments. For WrapConstraintAnd you may implement it as the following which would be more intuitive:

class WrapConstraintAnd(object):
    def __init__(self, *conds):
        self.conds = conds

    def match(self, data):
        return all(k(data) for k in self.conds)

The long block of conditionals at line 529 may also be refactored into a dict:

class Compiler(object):
    # code here
    compilers = {bool: MatchEqual, dict: compile_dict, ...}

Also for caching classes like Pattern I usually implement the caching like the following:

class Pattern(object):
    def __init__(self, data):
        self._compiler = Compiler()
        self._data = data
        self._compiled = None

    def compile(self):
        pattern = self._compiled = self._compiler.compile(self._data)
        return pattern

    def match(self, data):
        return self._compiled.match(data) if self._compiled else self.compile().match(data)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.