
dotty's Introduction

EFILTER Query Language

EFILTER is a general-purpose query language designed to be embedded in Python applications and libraries. It supports SQL-like syntax to filter your application's data and provides a convenient way to directly search through the objects your application manages.

A second use case for EFILTER is to translate queries from one query language to another, such as from SQL to OpenIOC and so on. A basic SQL-like syntax and a POC lisp implementation are included with the language, and others are relatively simple to add.

Projects using EFILTER:

Quick examples of integration.

from efilter import api
api.apply("5 + 5") # => 10

# Returns [{"name": "Alice"}, {"name": "Eve"}]
api.apply("SELECT name FROM users WHERE age > 10",
          vars={"users": ({"age": 10, "name": "Bob"},
                          {"age": 20, "name": "Alice"},
                          {"age": 30, "name": "Eve"}))

You can also filter custom objects:

# Step 1: have a custom class.

class MyUser(object):
    ...

# Step 2: Implement a protocol (like an interface).

from efilter.protocols import structured
structured.IStructured.implement(
    for_type=MyUser,
    implementations={
        structured.resolve: lambda user, key: getattr(user, key)
    }
)

# Step 3: EFILTER can now use my class!
from efilter import api
api.apply("SELECT name FROM users WHERE age > 10 ORDER BY age",
          vars={"users": [MyUser(...), MyUser(...)]})

Avoid SQL injection.

EFILTER supports query templates, which can interpolate unescaped strings safely.

# Replacements are applied before the query is compiled.
search_term = dangerous_user_input["name"]
api.apply("SELECT * FROM users WHERE name = ?",
          vars={"users": [...]},
          replacements=[search_term])

# We also support keyword replacements.
api.apply("SELECT * FROM users WHERE name = {name}",
          vars={"users": [...]},
          replacements={"name": search_term})

Basic IO is supported, including CSV data sets.

# Builtin IO functions need to be explicitly enabled.
api.apply("SELECT * FROM csv(users.csv) WHERE name = 'Bob'", allow_io=True)

Language Reference

Work in progress.

Protocol documentation

Work in progress.

Example projects

Several sample projects are provided.

  • examples/star_catalog: filters a large CSV file with nearby star systems
  • examples/tagging: use a custom query format

License and Copyright

Copyright 2015 Google Inc. All Rights Reserved

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributors

Adam Sindelar

dotty's Issues

Membership operator with None causes a TypeError.

This query:

IfElse(
Union(Membership(
 Literal('last message repeated')
 Var('body')))
Literal('repeated')
Literal(None))

Causes a TypeError when the "body" var is None. Stacktrace:

  File "/usr/lib/python2.7/dist-packages/efilter/api.py", line 125, in apply
    results = solve.solve(query, vars).value
  File "/usr/lib/python2.7/dist-packages/efilter/dispatch.py", line 193, in __call__
    return implementation(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/efilter/transforms/solve.py", line 208, in solve_query
    return solve(query.root, vars)
  File "/usr/lib/python2.7/dist-packages/efilter/dispatch.py", line 193, in __call__
    return implementation(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/efilter/transforms/solve.py", line 403, in solve_ifelse
    if boolean.asbool(solve(condition, vars).value):
  File "/usr/lib/python2.7/dist-packages/efilter/dispatch.py", line 193, in __call__
    return implementation(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/efilter/transforms/solve.py", line 644, in solve_union
    result = solve(child, vars)
  File "/usr/lib/python2.7/dist-packages/efilter/dispatch.py", line 193, in __call__
    return implementation(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/efilter/transforms/solve.py", line 771, in solve_membership
    return Result(needle in values, ())
  File "/usr/lib/python2.7/dist-packages/efilter/transforms/solve.py", line 162, in __solve_and_destructure_repeated
    values = iter(__solve_for_repeated(expr, vars))
TypeError: 'NoneType' object is not iterable
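
The last frame of the trace can be reproduced without EFILTER: calling iter() on None raises the same TypeError. Below is a minimal sketch of one possible guard. The helper names iter_repeated and contains are hypothetical and do not exist in efilter, and treating None as an empty sequence is an assumption about the desired behavior, not a confirmed fix.

```python
def iter_repeated(value):
    """Return an iterator over value, treating None as empty.

    Hypothetical helper for illustration; not part of efilter.
    """
    if value is None:
        return iter(())
    return iter(value)


def contains(needle, values):
    # Membership test that no longer fails when values is None.
    return needle in iter_repeated(values)


# iter(None) is the call that raises in the trace above.
try:
    iter(None)
except TypeError as error:
    message = str(error)

none_result = contains("last message repeated", None)
list_result = contains("a", ["a", "b"])
```

With a guard of this kind the membership test would evaluate to false for a missing "body", so the IfElse would take its else branch instead of raising.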

Applying the output of one query to the input of another is broken

This might be a design problem with the efilter implementation. When Apply() is called, it assigns a list if there is more than one result and a scalar if there is only one result. As a result, whether a query succeeds depends on the number of results returned.

For example:

[1] Live (API) 00:50:29> select Hashes, Path from hash(paths: (select path from glob("/home/*/.ssh/*")).path.filename)
2017-08-06 00:50:29,880:DEBUG:rekall.1:Running plugin (search) with args (()) kwargs ({'query': u'select Hashes, Path from hash(paths: (select path from glob("/home/*/.ssh/*")).path.filename)'})
                    Hashes                                     Path                                                                                                                     
---------------------------------------------- ------------------------------------                                                                                                     
sha1 41b5e39b28e3c39ec9e376624b40a0c3544b6958  /home/scudette/.ssh/authorized_keys                                                                                                      
sha1 188e0f24996ee87b6082e786aacd48bb565fc997  /home/scudette/.ssh/id_rsa.pub                                                                                                           
sha1 ced1cfb8e52b7fc0a75df8bd13ac0466e8bad2a6  /home/scudette/.ssh/id_rsa                          

This works fine because the glob expands to three files and this produces a list which is passed to the paths arg of the hash plugin (which expects a list).

However the following query breaks:

[1] Live (API) 00:51:44> select Hashes, Path from hash(paths: (select path from glob("/home/*/.ssh/authorized_keys")).path.filename)
2017-08-06 00:51:44,181:DEBUG:rekall.1:Running plugin (search) with args (()) kwargs ({'query': u'select Hashes, Path from hash(paths: (select path from glob("/home/*/.ssh/authorized_keys")).path.filename)'})
2017-08-06 00:51:44,382:CRITICAL:rekall.1:Traceback (most recent call last):
  File "/home/scudette/rekall/rekall-core/rekall/session.py", line 862, in RunPlugin
    result = plugin_obj.render(ui_renderer) or plugin_obj
  File "/home/scudette/rekall/rekall-core/rekall/plugins/common/efilter_plugins/search.py", line 777, in render
    rows = self.collect() or []
  File "/home/scudette/rekall/rekall-core/rekall/plugins/common/efilter_plugins/search.py", line 677, in collect
    result = self.solve()
....
  File "/home/scudette/rekall/rekall-core/rekall/plugin.py", line 608, in __init__
    value, session=kwargs.get("session"))
  File "/home/scudette/rekall/rekall-core/rekall/plugin.py", line 156, in parse
    raise TypeError("Arg %s must be a list of strings" % self.name)
TypeError: Arg paths must be a list of strings

This happens because this time glob returns only one result, so the hash plugin receives a single entry rather than a list with one entry:

> /home/scudette/projects/dotty/efilter/transforms/solve.py(348)solve_apply()
    346             args.append(solve(arg, vars).value)
    347 
--> 348     result = applicative.apply(func, args, kwargs)
    349 
    350     return Result(result, ())

ipdb> p func
<CommandWrapper: 'IRHash'>
ipdb> p kwargs
{u'paths': <rekall.plugins.response.common.FileSpec object at 0x7f0994a70710>}
ipdb> p args
[]
ipdb> expr.args
(Pair(
 Var(u'paths')
 Resolve(
  Resolve(
   Map(
    Apply(
     Var(u'glob')
     Literal(u'/home/*/.ssh/authorized_keys'))
    Bind(Pair(
     Literal(u'path')
     Var(u'path'))))
   Literal(u'path'))
  Literal(u'filename'))),)
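
The scalar-versus-list mismatch described in this issue can be illustrated in plain Python. The sketch below is not efilter or Rekall code: as_list and hash_paths are hypothetical names, and normalizing at the call boundary is only one possible way to make the result shape independent of the result count.

```python
def as_list(value):
    """Wrap a scalar in a list; return lists and tuples as lists.

    Hypothetical helper for illustration; not part of efilter or Rekall.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def hash_paths(paths):
    # Stand-in for a plugin argument that must be a list of strings.
    if not isinstance(paths, list):
        raise TypeError("Arg paths must be a list of strings")
    return ["hashed:" + path for path in paths]


# One result and several results now take the same code path.
one = hash_paths(as_list("/home/user/.ssh/authorized_keys"))
many = hash_paths(as_list(["/a", "/b"]))
```

Note that a string is deliberately treated as a scalar here, even though it is iterable, so a single path is wrapped rather than split into characters.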

Python 3 support is currently broken

Collecting efilter>=1-1.5 (from -r requirements.txt (line 14))
  Using cached efilter-1453815385.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/tmp/pip-build-6uvgthat/efilter/setup.py", line 18, in <module>
        from efilter import version
      File "./efilter/__init__.py", line 3, in <module>
        from efilter import ext
      File "./efilter/ext/__init__.py", line 1, in <module>
        from efilter.ext import indexset
      File "./efilter/ext/indexset.py", line 26, in <module>
        from efilter.protocols import indexable
      File "./efilter/protocols/indexable.py", line 22, in <module>
        from efilter import protocol
      File "./efilter/protocol.py", line 66, in <module>
        BUILTIN_TYPES = (int, float, long, complex, basestring, tuple, list, dict, set,
    NameError: name 'long' is not defined
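
The NameError arises because Python 3 has neither long nor basestring. A minimal sketch of a version-neutral type tuple follows. The name BUILTIN_TYPES_SKETCH is hypothetical, and it lists only the types visible in the truncated traceback line; the rest of the original tuple is not shown and is not guessed here. The six library offers six.integer_types and six.string_types for the same purpose.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3 folds long into int and has no basestring.
    integer_types = (int,)
    string_types = (str,)
else:
    # Only evaluated on Python 2, where these names exist.
    integer_types = (int, long)  # noqa: F821
    string_types = (basestring,)  # noqa: F821

# Assumption: only the types visible in the traceback are included.
BUILTIN_TYPES_SKETCH = (
    integer_types + (float, complex) + string_types + (tuple, list, dict, set)
)
```

Because the Python 2 names sit inside a branch that Python 3 never executes, the module imports cleanly on both interpreters.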

hello world example not working for me.

vars={"users": ({"age": 10, "name": "Bob"},{"age": 20, "name": "Alice"},{"age": 30, "name": "Eve"})}
print api.apply("SELECT name FROM users WHERE age > 10",vars)

ERROR: efilter.errors.EfilterKeyError: EfilterKeyError (users) in query 'SELECT name FROM >>> users <<< WHERE age > 10'

Efilter can't install

Efilter's setup.py is broken because it tries to import efilter code before the package is actually installed.

The symptom is that the installation breaks if six is not installed yet, because efilter tries to transitively import it.
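
One common way to avoid importing a package from its own setup.py is to read the version string from the source file as text. The sketch below assumes a version module that defines a literal __version__ string; that layout, the read_version helper, and the demo file are all hypothetical, and efilter's real version module may compute its version differently.

```python
import os
import re
import tempfile


def read_version(path):
    """Extract __version__ from a Python source file without importing it."""
    with open(path) as handle:
        source = handle.read()
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source, re.MULTILINE)
    if match is None:
        raise RuntimeError("No __version__ found in %s" % path)
    return match.group(1)


# Demonstration with a throwaway file standing in for a version module.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "version.py")
with open(demo_path, "w") as handle:
    handle.write("import six  # never executed by read_version\n"
                 "__version__ = '1.2.3'\n")

version = read_version(demo_path)
```

Since the file is parsed rather than executed, its import of six is never triggered, so installation would not depend on six already being present.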

Membership operator doesn't work as expected for Var expressions

This looks to me like two issues:

  • expressions that are values that contain strings don't trigger the string membership logic
  • expression values that are unicode strings don't trigger the string membership logic, as six.string_types contains only str, and not unicode.
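
The first point can be illustrated in plain Python: once a string is wrapped in a generator, the in operator compares whole elements instead of searching for a substring. This is a sketch written for Python 3; wrap_as_repeated and contains are hypothetical names rather than efilter functions, and the generator wrapping is inferred from the debugger output in this issue, where values is a generator.

```python
def wrap_as_repeated(value):
    # Hypothetical stand-in for a solver that yields a scalar as a
    # one-element sequence.
    yield value


body = u"this is a message"
needle = "a message"

# Substring search on the string itself.
substring_match = needle in body
# Element comparison once the string is wrapped in a generator.
element_match = needle in wrap_as_repeated(body)


def contains(needle, value):
    """Check the value's type before deciding how to test membership."""
    if isinstance(value, str):
        return needle in value
    return needle in list(value)
```

Here substring_match is true while element_match is false, which matches the behavior reported above. On Python 2 the isinstance check would also need to cover unicode.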

Debugging values from breakpoint at

return Result(needle in values, ())

expr = {Membership} Membership(\n Literal('a message')\n Var('body'))
 _BinaryExpression__abstract = {bool} True
 _Expression__abstract = {bool} True
 arity = {int} 2
 children = {tuple} <type 'tuple'>: (Literal('a message'), Var('body'))
 element = {Literal} Literal('a message')
 end = {int} 25
 lhs = {Literal} Literal('a message')
 return_signature = {ABCMeta} <class 'efilter.protocols.boolean.IBoolean'>
 rhs = {Var} Var('body')
 set = {Var} Var('body')
  _Expression__abstract = {bool} True
  _ValueExpression__abstract = {bool} True
  arity = {int} 1
  children = {tuple} <type 'tuple'>: ('body',)
  end = {int} 4
  return_signature = {type} <class 'efilter.protocol.AnyType'>
  source = {str} 'body contains \\'a message\\''
  start = {int} 0
  type_signature = {tuple} <type 'tuple'>: (<type 'basestring'>,)
  value = {str} 'body'
 source = {str} 'body contains \\'a message\\''
 start = {int} 0
 type_signature = {tuple} <type 'tuple'>: (<class 'efilter.protocols.eq.IEq'>, <class 'efilter.protocols.iset.ISet'>)
needle = {str} 'a message'
values = {generator} <generator object __solve_and_destructure_repeated at 0x7fe778690550>
 gi_code = {code} <code object __solve_and_destructure_repeated at 0x7fe7944a81b0, file "/usr/lib/python2.7/dist-packages/efilter/transforms/solve.py", line 146>
 gi_frame = {frame} __solve_and_destructure_repeated [solve.py:146]  id:49887728
 gi_running = {int} 0
vars = {ScopeStack} ScopeStack(LibraryModule(name='stdcore', vars={'materialize': <efilter.stdlib.core.Materialize object at 0x7fe7944d2550>, 'singleton': <efilter.stdlib.core.SingletonReducer object at 0x7fe7944d24d0>, 'int': <type 'int'>, 'float': <type 'float'>, 'find': <e
 globals = {LibraryModule} LibraryModule(name='stdcore', vars={'materialize': <efilter.stdlib.core.Materialize object at 0x7fe7944d2550>, 'singleton': <efilter.stdlib.core.SingletonReducer object at 0x7fe7944d24d0>, 'int': <type 'int'>, 'float': <type 'float'>, 'find': <efilter.stdl
 locals = {TestEvtRecordEvent} <tests.analysis.tagging.TestEvtRecordEvent object at 0x7fe778684290>
  COMPARE_EXCLUDE = {frozenset} frozenset([u'store_number', u'display_name', u'uuid', u'data_type', u'timestamp', u'filename', u'store_index', u'tag', u'pathspec', u'inode'])
  CONTAINER_TYPE = {unicode} u'event'
  DATA_TYPE = {unicode} u'windows:evt:record'
  body = {unicode} u'this is a message'
  data_type = {unicode} u'windows:evt:record'
  display_name = {NoneType} None
  event_identifier = {int} 16
  filename = {NoneType} None
  hostname = {NoneType} None
  inode = {NoneType} None
  offset = {NoneType} None
  pathspec = {NoneType} None
  source_name = {unicode} u'Messaging'
  store_index = {NoneType} None
  store_number = {NoneType} None
  tag = {NoneType} None
  timestamp = {int} 1464181206000000
  uuid = {unicode} u'c3f4974cacc44430b8a1b63d17308e90'
 scopes = {list} <type 'list'>: [LibraryModule(name='stdcore', vars={'materialize': <efilter.stdlib.core.Materialize object at 0x7fe7944d2550>, 'singleton': <efilter.stdlib.core.SingletonReducer object at 0x7fe7944d24d0>, 'int': <type 'int'>, 'float': <type 'float'>, 'find

Flaky test

Minor issue: while running tox -epy27, I noticed that one of the tests is flaky:

======================================================================
FAIL: testQuery (unit.transforms.aslisp.AsLispTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "dotty/efilter_tests/unit/transforms/aslisp.py", line 50, in testQuery
    self.assertEqual(aslisp.aslisp(query), expected)
AssertionError: Tuples differ: ('map', ('filter', ('apply', (... != ('map', ('filter', ('apply', (...

First differing element 2:
('bind', (':', 'pid', ('.', ('var', 'proc'), 'pid')), (':', 1, ('.', ('.', ('var', 'proc'), 'parent'), 'pid')))
('bind', ('pair', 'pid', ('.', ('var', 'proc'), 'pid')), ('pair', 1, ('.', ('.', ('var', 'proc'), 'parent'), 'pid')))

  ('map',
   ('filter',
    ('apply', ('var', 'pslist')),
    ('==', ('.', ('var', 'proc'), 'command'), 'init')),
   ('bind',
-   (':', 'pid', ('.', ('var', 'proc'), 'pid')),
?     ^

+   ('pair', 'pid', ('.', ('var', 'proc'), 'pid')),
?     ^^^^

-   (':', 1, ('.', ('.', ('var', 'proc'), 'parent'), 'pid'))))
?     ^

+   ('pair', 1, ('.', ('.', ('var', 'proc'), 'parent'), 'pid'))))
?     ^^^^

pypi versions inconsistent - old version installed by default

It looks like the versioning has messed things up a little:

root@plaso-dev:~# pip install efilter -v
Downloading/unpacking efilter
  Ignoring link https://pypi.python.org/packages/b0/e2/70de27d289869013e2ae50381f6ba8fb2c07c3a4c955c48853604bde6770/efilter-1%211.3-py2-none-any.whl#md5=aaa22e3f5b2488b4de4586e9e3ba2e1e (from https://pypi.python.org/simple/efilter/), version 1%211.3 is a pre-release (use --pre to allow).
  Ignoring link https://pypi.python.org/packages/d7/6b/55b5fdc72dd620bac5b54120466fd624c98b14d8991ec9dce5ad7ffe6a58/efilter-1%211.2-py2-none-any.whl#md5=9bf50576b2b8e82811388badb214e83f (from https://pypi.python.org/simple/efilter/), version 1%211.2 is a pre-release (use --pre to allow).
  Using version **1453815385 (newest of versions: 1453815385, 1450268920, 1449139184, 1449128552, 1446301913, 1445943458, 1445495565, 1445494810, 1440489265, 1438631774, 1438631350, 1438624661, 1438555622, 1438555278, 1438554658, 1.2.post1, 1-1.0, 1, 1, 1, 1, 1)**
  Downloading efilter-1453815385.tar.gz (47kB): 

1453815385 is a version from January, which isn't current.
