Please see http://pyelasticsearch.readthedocs.org/ or the docs folder.
License: BSD 3-Clause "New" or "Revised" License
python elasticsearch client
Hi,
As you all know, Python has had built-in json support since version 2.6 (which I think is the lowest version pyelasticsearch supports), so my question is: why do you list simplejson as a library requirement?
For support decimals? As you use custom JSON Encoder class is very easy to add decimal conversion there, like:
def default(self, value):
    ...
    if isinstance(value, decimal.Decimal):
        return str(value)
    ...
For catching JSON decode errors? I think it's easy to switch from JSONDecodeError to ValueError.
For speed? I'm not sure simplejson is meaningfully faster or slower than the built-in json; they perform about the same. If you need speed, you may want to look at https://github.com/esnme/ultrajson for example.
So, what is the point of installing and using simplejson when I install pyelasticsearch?
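To make the suggestion above concrete, here is a minimal, stdlib-only sketch of a JSON encoder that handles Decimal the way the snippet proposes. The class name is invented for illustration; pyelasticsearch's actual encoder class may differ.

```python
import decimal
import json


class DecimalFriendlyEncoder(json.JSONEncoder):
    """A stdlib-only encoder that serializes Decimal values as strings."""

    def default(self, value):
        if isinstance(value, decimal.Decimal):
            return str(value)
        # Fall back to the base class, which raises TypeError for
        # anything it cannot handle.
        return super(DecimalFriendlyEncoder, self).default(value)


print(json.dumps({"price": decimal.Decimal("19.99")}, cls=DecimalFriendlyEncoder))
# → {"price": "19.99"}
```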
http://www.elasticsearch.org/guide/reference/api/bulk.html
That says the endpoints for _bulk are:
/_bulk
/{index}/_bulk
/{index}/{type}/_bulk
pyelasticsearch has a bulk_index method on ElasticSearch which takes an index and a doc_type, puts both of them into the action of the request body, and then uses the middle endpoint /{index}/_bulk.
That works with 0.19.11 (and probably later--I haven't sat down and tested much), but fails with 0.17.9 with this:
requests.packages.urllib3.connectionpool: DEBUG: "POST /inputindex/_bulk?op_type=create HTTP/1.1" 400 120
I did some skulking around and found this:
That suggests that the _bulk endpoint pyelasticsearch is using is new as of ES 0.18.0.
I kind of need to support 0.17.9 because I'm pretty sure Mozilla hasn't moved all our sites over to a later version, yet.
Anyone mind if I do a patch that changes the endpoint used to just /_bulk? Since we're putting the index and doctype in the action, that would work fine.
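A sketch of what that request body looks like, with the index and doc type embedded in each action line so the bare /_bulk endpoint suffices. The helper name and id_field convention are illustrative, not pyelasticsearch's actual internals.

```python
import json


def build_bulk_body(docs, index, doc_type, id_field='id'):
    """Build the newline-delimited body for a POST to /_bulk, putting
    the index and doc type into each action line so the bare /_bulk
    endpoint can be used."""
    lines = []
    for doc in docs:
        action = {'index': {'_index': index, '_type': doc_type}}
        if id_field in doc:
            action['index']['_id'] = doc[id_field]
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline.
    return '\n'.join(lines) + '\n'


print(build_bulk_body([{'id': 1, 'name': 'Joe'}], 'inputindex', 'doc'))
```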
…and perhaps elsewhere. This would let callers build their documents-to-index on the fly, with generators. See http://docs.python-requests.org/en/latest/user/advanced/#streaming-uploads.
pyelasticsearch should use json instead of simplejson, which is removed in Django 1.5.
My test document is a simple serial number/MAC address mapping. These are indexed as strings and show up via the usual query-all (search?q=*:*) via curl. Every combination of query string and query DSL I have tried always produces 0 hits, but the debug output shows a curl query which produces the correct (expected) results (see gist at https://gist.github.com/neurobashing/6024159).
My assumption is that the query DSL is correct, since curl produces correct results, and pyelasticsearch is somehow passing a bad value to the search.
It freaks me out. :-) (This is really a note to self, unless somebody else wants to hack on it.)
I've been playing a little with this project and deployment on Heroku, and it sounds like pyelasticsearch only partially supports basic authentication.
I mean, by inspecting the requests made, there is no use of the header:
Authorization: Basic dXNlcjpwYXNzd29yZA0K
where dXNlcjpwYXNzd29yZA0K is the base64-encoded string user:password.
Do you think, like I do, that it would improve the library if we provided the authentication with the auth param, as suggested in the requests docs (http://docs.python-requests.org/en/latest/user/authentication.html#basic-authentication)?
All of this without adding any new parameter to the lib, but just by parsing the given URL (e.g. http://user:[email protected]).
I will provide some code soon, as I need this to work for a project of mine.
--- client.py.orig 2013-06-13 17:25:59.000000000 +0930
+++ client.py 2013-06-13 17:31:04.000000000 +0930
@@ -106,7 +106,7 @@
This object is thread-safe. You can create one instance and share it
among all threads.
"""
- def __init__(self, urls, timeout=60, max_retries=0, revival_delay=300):
+ def __init__(self, urls, timeout=60, max_retries=0, revival_delay=300, auth=None):
"""
:arg urls: A URL or iterable of URLs of ES nodes. These are full URLs
with port numbers, like ``http://elasticsearch.example.com:9200``.
@@ -127,6 +127,7 @@
self.logger = getLogger('pyelasticsearch')
self.session = requests.session()
self.json_encoder = JsonEncoder
+ self.auth = auth
def _concat(self, items):
"""
@@ -235,7 +236,7 @@
try:
resp = req_method(
url,
- timeout=self.timeout,
+ timeout=self.timeout, auth=self.auth,
**({'data': request_body} if body else {}))
except (ConnectionError, Timeout):
self.servers.mark_dead(server_url)
We should have an unambiguous way of saying the equivalent of pyes's create_index_unless_exists. And delete_index_if_exists, for that matter.
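One possible shape for such a wrapper, shown against a stub connection since the exact exception pyelasticsearch raises for a duplicate create is version-dependent; the exception and class names here are stand-ins, not the library's real API.

```python
class IndexAlreadyExistsError(Exception):
    """Stand-in for the error a duplicate create_index call raises."""


def create_index_unless_exists(conn, index):
    """Create the index, treating 'already exists' as success."""
    try:
        conn.create_index(index)
    except IndexAlreadyExistsError:
        pass


class FakeConn:
    """Minimal stub standing in for an ElasticSearch connection."""
    def __init__(self):
        self.created = set()

    def create_index(self, index):
        if index in self.created:
            raise IndexAlreadyExistsError(index)
        self.created.add(index)


conn = FakeConn()
create_index_unless_exists(conn, 'test-index')
create_index_unless_exists(conn, 'test-index')  # second call is a no-op
print(sorted(conn.created))  # → ['test-index']
```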
In the docs count should be like this:
As opposed to this:
index and doc_type are (newly?) optional in ES, so query should come first. We might be able to get away with this, since delete_by_query might not work as is: #29.
The conn.search(query) example in the readme doesn't seem to work:
conn.search({ "query_string": { "query": "name:tester" }, "filtered": { "filter": { "range": { "age": { "from": 27, "to": 37\
DEBUG 2012-08-30 22:41:39,938 connectionpool "GET /_search?q=%7B%27query_string%27%3A+%7B%27query%27%3A+%27name%3Atester%27%7D%2C+%27filtered%27%3A+%7B%27filter%27%3A+%7B%27range%27%3A+%7B%27age%27%3A+%7B%27to%27%3A+37%2C+%27from%27%3A+27%7D%7D%7D%7D%7D HTTP/1.1" 500 64354
*** ElasticSearchError: Non-OK status code returned (500) containing u'SearchPhaseExecutionException[Failed to execute phase [query], total failure; shardFailures {[dH2uhViYRnyupZTQrGM0IQ][api_small][16]: ...
In short, it tried to stick the query in a querystring arg rather than in the request body. Maybe we should change the example to use the body kwarg. At any rate, I got my code working, but I thought I'd mention it.
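The failure mode is easy to reproduce with the stdlib alone: a query dict percent-encoded into the querystring is the repr of a Python dict, not JSON, which is why ES returns a 500.

```python
import json
from urllib.parse import urlencode

query = {'query_string': {'query': 'name:tester'}}

# Passed positionally, the dict ends up percent-encoded into the
# querystring, which ES cannot parse as a query:
print(urlencode({'q': query}))

# Sent as the request body instead, it is proper JSON:
print(json.dumps(query))
```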
Please do not use the root logger in your package (e.g. logging.debug(...)).
Instead, create your own logger instance and use it (all logging.debug(...) calls become log.debug(...)):
log = logging.getLogger(__name__)

class ElasticSearch(object):
    ...
    def _send_request(self, method, path, body="", querystring_args={}):
        ...
        log.debug("making %s request to path: %s %s %s with body: %s" % (method, self.host, self.port, path, body))
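A runnable illustration of why this matters: with a named logger, applications can silence or redirect the library's output without touching their own root-logger configuration.

```python
import logging

# A module-level logger, as the issue suggests, instead of the root logger.
log = logging.getLogger('pyelasticsearch')

# An application can now tune just this library's verbosity:
logging.getLogger('pyelasticsearch').setLevel(logging.WARNING)

log.debug('this is suppressed')
log.warning('this still gets through')
print(log.name, log.isEnabledFor(logging.DEBUG), log.isEnabledFor(logging.WARNING))
```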
I have a query like this:
query = {'query': {'term': {'url.exact': 'https://my.url.com'}}}
When I call it with es.search(query), it does return the correct result.
However, using the same query with: es.delete_by_query('my_index_name', 'my_doc_type', query), nothing gets deleted.
Here's the log:
b.es.delete_by_query('my_index_name', 'my_doc_type', query)
making DELETE request to path: http://localhost:9200/my_index_name/my_doc_type/_query /my_index_name/my_doc_type/_query with body: {"query": {"term": {"url.exact": "https://my.url.com"}}}
response status: 200
got response {u'ok': True, u'_indices': {u'my_index_name': {u'_shards': {u'successful': 0, u'failed': 5, u'total': 5}}}}
I'm a little flummoxed at what the extensibility story is supposed to be for JSON encoding. It looks like there are 2 equivalent hook points:
ElasticSearch.json_encoder
Have 1.
es.index("test","rest",{'name':'dd'})
{u'_type': u'rest', u'_id': u'None', u'ok': True, u'_version': 2, u'_index': u'test'}
id is None!!!
es.index("test","rest",{'name':'dd'},id='')
pyelasticsearch.exceptions.InvalidJsonResponseError: <Response [400]>
What should I do if I don't want to provide my own ids? Or is this a deliberate design decision to force choosing your own ids?
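For context, the ES REST convention is that PUT /{index}/{type}/{id} indexes under an explicit id, while POST /{index}/{type} (no id segment) asks ES to generate one. A tiny sketch of that dispatch; the helper name index_request is hypothetical, not the client's API.

```python
def index_request(index, doc_type, doc_id=None):
    """Return the (method, path) pair ES expects: PUT with an explicit
    id, POST without one so ES generates the id server-side."""
    if doc_id is None:
        return 'POST', '/%s/%s' % (index, doc_type)
    return 'PUT', '/%s/%s/%s' % (index, doc_type, doc_id)


print(index_request('test', 'rest'))      # → ('POST', '/test/rest')
print(index_request('test', 'rest', 42))  # → ('PUT', '/test/rest/42')
```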
Timeout support on the HTTP connections (which isn't present in the Python stdlib in Python 2.5 & below - http://docs.python.org/library/httplib.html#httplib.HTTPConnection) would be a very welcome addition. I'm liking Requests (http://docs.python-requests.org/) quite a bit these days, though httplib2 is also decent. If I offered to add this & submit a pull request, would you be interested?
It'd be nice, especially for my end users, if this module were installable off PyPI using pip or easy_install.
Assuming you have (or set up) a PyPI account, it should be just python setup.py register then python setup.py sdist upload.
Unfortunately the module can't be used with Python 2.5, since the dependency requests-1.1.0 uses the as keyword in "except Empty as a" (requests-1.1.0/requests/packages/urllib3/connectionpool.py, line 441).
While this is out of reach to pyelasticsearch, this could be mentioned in the README.
After 2 solid days without a single problem on the connection pooling commit, I suddenly get this:
'IndexMissingException[[modelresult] missing]'
in the console. With code 404, obviously.
I can't (yet) tell you how to reproduce this. It just suddenly happened. Maybe a connection got dropped or something?
The current version on PyPI has not been updated, and the line "prefetch=True" leads to an exception in Django.
It would be nice if the package index were updated with the current 1.0 version.
Each time I try to index or search using pyelasticsearch, it takes nearly 1 second to establish the connection. I can't imagine what will happen if I use requests to index 40,000,000 MongoDB records.
When I switch from requests to pycurl, it takes only 1/1000 of a second.
Also, there's a minor error in your setup.py: a file that doesn't exist blocks the install.
What do they default to? Do they have reasonable defaults? What was my thinking there? Should they be args like in the other methods?
https://github.com/rhec/pyelasticsearch/blob/master/pyelasticsearch/client.py#L562
There's a TODO there that suggests that the code is wrong. I'm fairly sure that my testing indicates this is correct with at least ElasticSearch 0.19.11 (I haven't checked earlier versions of ES, yet).
def delete_by_query(self, index, doc_type, query):
    """
    Delete typed JSON documents from a specific index based on a query.
    """
    path = self._make_path([index, doc_type, '_query'])
    response = self._send_request('DELETE', path, query)
    return response
Hi there,
This might seem obvious, but your documentation does not seem to have any installation instructions whatsoever, i.e. things to do before actually starting to play with the API. Most people will probably figure out to use pip or python setup.py install, but some best-practice guidelines would still be nice to have.
Cheers,
Martin
We currently cannot:
The same interface design should be then used for the multi_get proposal (#37) and would fix #68.
-- (Erik starts talking here.) --
Using http://www.elasticsearch.org/guide/reference/api/bulk.html as an example, here are where various params need to land. As you can see, we have quite a lawn-sprinkler effect.
In doc:
In meta:
Dunno:
Once per bulk call:
Problem: right now, index and doc_type are required args for bulk_index. These need to be optional so they can be specified per document. But docs needs to be required, so the argument order needs to change. Fun! We might end up making a whole new routine to handle the power-user cases.
Rather than muck up (and break backward compatibility of) bulk_index for the unusual power-user use case, we'll introduce a new routine, bulk, which does arbitrary bulk operations. Then the naming is more accurate anyway. It might look something like this:
def bulk(self, docs, index=None, doc_type=None, id_field='id'):  # maybe id_field should die
    """
    :arg docs: An iterable of things, each of which is either a dict representing a doc or a tuple (or something, maybe) representing (action dict, doc dict). Or something or something.
    """
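One way the docs normalization could work, sketched as a standalone generator. All names here are hypothetical; this is just the "doc dict or (action, doc) tuple" idea from the signature above made concrete.

```python
import json


def iter_bulk_lines(docs, index=None, doc_type=None):
    """Yield _bulk body lines from an iterable whose items are either a
    plain doc dict or an (action dict, doc dict) tuple."""
    for item in docs:
        if isinstance(item, tuple):
            action, doc = item
        else:
            action, doc = {'index': {}}, item
        op = next(iter(action))          # e.g. 'index' or 'delete'
        meta = action[op]
        # Per-call index/doc_type are defaults; per-action values win.
        if index is not None:
            meta.setdefault('_index', index)
        if doc_type is not None:
            meta.setdefault('_type', doc_type)
        yield json.dumps(action)
        if doc is not None:              # delete actions carry no doc line
            yield json.dumps(doc)


lines = list(iter_bulk_lines(
    [{'name': 'Joe'}, ({'delete': {'_id': 7}}, None)],
    index='test-index', doc_type='test-type'))
print(lines)
```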
We should. Otherwise, we'll happily return None for unhandle-able values rather than raising TypeError.
In client.py, line 46, doc = method.__doc__ will raise an error when the optimize option is used, since the doc is None and cannot be iterated at line 49.
We seem to get different results from ES than we expected. willkg suggests that perhaps we're outrunning ES and should stick some sleep() calls in or something. refresh() is async, unfortunately.
Using django 1.4.3
Using haystack - dev version
Using pyelasticsearch - latest zip
(venv) C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo>manage.py shell
>>> from pyelasticsearch import ElasticSearch
>>> conn = ElasticSearch('http://localhost:9200')
>>> conn.status()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo/../Lib/site-packages\pyelasticsearch\client.py", line 89, in decorate
return func(*args, query_params=query_params, **kwargs)
File "C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo/../Lib/site-packages\pyelasticsearch\client.py", line 598, in status
query_params=query_params)
File "C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo/../Lib/site-packages\pyelasticsearch\client.py", line 232, in send_request
prepped_response = self._decode_response(resp)
File "C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo/../Lib/site-packages\pyelasticsearch\client.py", line 249, in _decode_response
json_response = response.json()
TypeError: 'dict' object is not callable
>>>
However, I can go to my browser and...
I hope I'm just missing some configuration variable or something! Going nuts here!
Thank you - thank you - thank you for your awesome work with this and haystack.
Upgrading to this version fixes an issue I was having with phrase based searching.
Could you make one small change? Indicate that pyelasticsearch 0.2 requires:
requests==0.14.1
Otherwise with requests==0.13.2 you get the following error:
from requests.compat import json
ImportError: cannot import name json
Thanks!
I am using pyelasticsearch and am doing a good amount of bulk indexing (bulk_index).
I see the following in the logs.
2013-05-05 05:21:34,100 INFO requests.packages.urllib3.connectionpool:191 Starting new HTTP connection (403): [ip address of es host]
...
2013-05-05 05:31:32,348 INFO requests.packages.urllib3.connectionpool:191 Starting new HTTP connection (522): [ip address of es host]
Some quick questions:
How would I set the max number of connections per (ES) host before blocking, so as to not flood it with connections? Any code files/classes to start looking into?
Thanks !
Hi,
I'm using a Gentoo system and there is a requests library in version 0.13.1 where requests.Response.json is a property, not a function.
so, calling:
es.index(..)
causes an exception in line 280, in _decode_response:
json_response = response.json()
Could you please first check whether response.json is callable?
json_response = response.json
if callable(json_response):
json_response = json_response()
According to https://github.com/kennethreitz/requests/blob/master/HISTORY.rst the change has been made in 1.0.0 (2012-12-17).
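The callable check above can be wrapped in a small compatibility shim that works with both requests generations. The stub classes below stand in for the two response shapes; decode_json is a hypothetical helper name.

```python
def decode_json(response):
    """Handle both requests < 1.0, where ``response.json`` is a plain
    attribute, and >= 1.0, where it is a method."""
    json_attr = response.json
    return json_attr() if callable(json_attr) else json_attr


class OldResponse:          # requests 0.x style: .json is an attribute
    json = {'ok': True}


class NewResponse:          # requests 1.x style: .json() is a method
    def json(self):
        return {'ok': True}


print(decode_json(OldResponse()), decode_json(NewResponse()))
```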
Thanks!
You need to close the HTTPConnection in the _send_request method, because when indexing a large collection of documents you may get a socket.error "Address already in use" exception.
So simply add the following lines after response = self._prep_response(response.read()):
os.close(conn.sock.fileno())
try:
    conn.close()
except Exception, e:
    log.debug("_send_request(): got an exception during closing connection: %r" % e)
This is needed because conn.close() does not close the socket.
An API-stable version of requests is out. We should upgrade to it and require it, as there are some backward-incompatible changes that affect us.
There are a bunch of tests that fail for me with Elasticsearch 0.20.4:
(pyelasticsearch) (106-info=88950 run_tests.sh) saturn ~/mozilla/pyelasticsearch> ./run_tests.sh
ERROR: pyelasticsearch.tests.client_tests:IndexingTestCase.test_percolate
vim +324 pyelasticsearch/tests/client_tests.py # test_percolate
result = self.conn.percolate('test-index','test-type', document)
vim +96 pyelasticsearch/client.py # decorate
return func(*args, query_params=query_params, **kwargs)
vim +984 pyelasticsearch/client.py # percolate
doc, query_params=query_params)
vim +254 pyelasticsearch/client.py # send_request
self._raise_exception(resp, prepped_response)
vim +268 pyelasticsearch/client.py # _raise_exception
raise error_class(response.status_code, error_message)
ElasticHttpError: (500, u'NoShardAvailableActionException[[_na][_na] No shard available for [org.elasticsearch.action.percolate.PercolateRequest@25aaaa]]')
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index' -d 'null'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index HTTP/1.1" 200 31
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'acknowledged': True, u'ok': True}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/_percolator/test-index/id_1' -d '{"query": {"match": {"name": "Joe"}}}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /_percolator/test-index/id_1 HTTP/1.1" 201 81
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-index', u'_id': u'id_1', u'ok': True, u'_version': 1, u'_index': u'_percolator'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/_percolator/test-index/id_2' -d '{"query": {"match": {"name": "not_that_guy"}}}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /_percolator/test-index/id_2 HTTP/1.1" 201 81
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-index', u'_id': u'id_2', u'ok': True, u'_version': 1, u'_index': u'_percolator'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/test-type/_percolate' -d '{"doc": {"name": "Joe"}}'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/_percolate HTTP/1.1" 500 152
pyelasticsearch: DEBUG: response status: 500
--------------------- >> end captured logging << ---------------------
FAIL: pyelasticsearch.tests.client_tests:SearchTestCase.test_mlt
vim +402 pyelasticsearch/tests/client_tests.py # test_mlt
self.assert_result_contains(result, {'hits': {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '3', '_source': {'name': 'Joe Test'}, '_index': 'test-index'}], 'total': 1, 'max_score': 0.19178301}})
vim +27 pyelasticsearch/tests/__init__.py # assert_result_contains
eq_(value, result[key])
AssertionError: {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '3', '_index': 'test-index', '_source': {'name': 'Joe Test'}}], 'total': 1, 'max_score': 0.19178301} != {u'hits': [{u'_score': 0.18116833, u'_type': u'test-type', u'_id': u'3', u'_source': {u'name': u'Joe Test'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.18116833}
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/1' -d '{"name": "Joe Tester"}'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"name": "Bill Baloney"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/3' -d '{"name": "Joe Test"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/3 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'3', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/test-type/1/_mlt?min_doc_freq=1&mlt_fields=name&min_term_freq=1' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/1/_mlt?min_doc_freq=1&mlt_fields=name&min_term_freq=1 HTTP/1.1" 200 235
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'hits': {u'hits': [{u'_score': 0.18116833, u'_type': u'test-type', u'_id': u'3', u'_source': {u'name': u'Joe Test'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.18116833}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 1, u'timed_out': False}
--------------------- >> end captured logging << ---------------------
FAIL: pyelasticsearch.tests.client_tests:SearchTestCase.test_mlt_fields
vim +435 pyelasticsearch/tests/client_tests.py # test_mlt_fields
{u'hits': {u'hits': [{u'_score': 0.30685282, u'_type': u'test-type', u'_id': u'4', u'_source': {u'sport': u'football', u'name': u'Cam'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.30685282}})
vim +27 pyelasticsearch/tests/__init__.py # assert_result_contains
eq_(value, result[key])
AssertionError: {u'hits': [{u'_score': 0.30685282, u'_type': u'test-type', u'_id': u'4', u'_index': u'test-index', u'_source': {u'sport': u'football', u'name': u'Cam'}}], u'total': 1, u'max_score': 0.30685282} != {u'hits': [{u'_score': 1.5108256, u'_type': u'test-type', u'_id': u'4', u'_source': {u'sport': u'football', u'name': u'Cam'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 1.5108256}
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/1' -d '{"name": "Joe Tester"}'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"name": "Bill Baloney"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/3' -d '{"sport": "football", "name": "Angus"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/3 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'3', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/4' -d '{"sport": "football", "name": "Cam"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/4 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'4', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/5' -d '{"sport": "baseball", "name": "Sophia"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/5 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'5', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/test-type/3/_mlt?min_doc_freq=1&mlt_fields=sport&min_term_freq=1' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/3/_mlt?min_doc_freq=1&mlt_fields=sport&min_term_freq=1 HTTP/1.1" 200 249
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'hits': {u'hits': [{u'_score': 1.5108256, u'_type': u'test-type', u'_id': u'4', u'_source': {u'sport': u'football', u'name': u'Cam'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 1.5108256}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 1, u'timed_out': False}
--------------------- >> end captured logging << ---------------------
FAIL: pyelasticsearch.tests.client_tests:SearchTestCase.test_mlt_with_body
vim +424 pyelasticsearch/tests/client_tests.py # test_mlt_with_body
{'hits': {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '3', '_source': {'age': 16, 'name': 'Joe Justin'}, '_index': 'test-index'}], 'total': 1, 'max_score': 0.19178301}})
vim +27 pyelasticsearch/tests/__init__.py # assert_result_contains
eq_(value, result[key])
AssertionError: {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '3', '_index': 'test-index', '_source': {'age': 16, 'name': 'Joe Justin'}}], 'total': 1, 'max_score': 0.19178301} != {u'hits': [{u'_score': 0.15891947, u'_type': u'test-type', u'_id': u'3', u'_source': {u'age': 16, u'name': u'Joe Justin'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.15891947}
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/1' -d '{"name": "Joe Tester"}'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"name": "Bill Baloney"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"age": 22, "name": "Joe Test"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 200 76
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 2, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/3' -d '{"age": 16, "name": "Joe Justin"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/3 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'3', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/test-type/1/_mlt?min_doc_freq=1&mlt_fields=name&min_term_freq=1' -d '{"filter": {"fquery": {"query": {"range": {"age": {"to": 20, "from": 10}}}}}}'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/1/_mlt?min_doc_freq=1&mlt_fields=name&min_term_freq=1 HTTP/1.1" 200 248
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'hits': {u'hits': [{u'_score': 0.15891947, u'_type': u'test-type', u'_id': u'3', u'_source': {u'age': 16, u'name': u'Joe Justin'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.15891947}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 2, u'timed_out': False}
--------------------- >> end captured logging << ---------------------
FAIL: pyelasticsearch.tests.client_tests:SearchTestCase.test_search_by_field
vim +362 pyelasticsearch/tests/client_tests.py # test_search_by_field
self.assert_result_contains(result, {'hits': {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '1', '_source': {'name': 'Joe Tester'}, '_index': 'test-index'}], 'total': 1, 'max_score': 0.19178301}})
vim +27 pyelasticsearch/tests/__init__.py # assert_result_contains
eq_(value, result[key])
AssertionError: {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '1', '_index': 'test-index', '_source': {'name': 'Joe Tester'}}], 'total': 1, 'max_score': 0.19178301} != {u'hits': [{u'_score': 0.625, u'_type': u'test-type', u'_id': u'1', u'_source': {u'name': u'Joe Tester'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.625}
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/1' -d '{"name": "Joe Tester"}'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"name": "Bill Baloney"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/_search?q=name%3Ajoe' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/_search?q=name%3Ajoe HTTP/1.1" 200 227
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'hits': {u'hits': [{u'_score': 0.625, u'_type': u'test-type', u'_id': u'1', u'_source': {u'name': u'Joe Tester'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.625}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 1, u'timed_out': False}
--------------------- >> end captured logging << ---------------------
60 tests, 4 failures, 1 error in 11.6s
real 0m11.895s
user 0m0.796s
sys 0m0.092s
I'm pretty sure the failures are all related to different scores in the actual results than in the expected results.
I mentioned this to @erikrose a while back, and one of us suggested it could be due to differing Elasticsearch versions.
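If score drift across Elasticsearch versions is the cause, the assertions could compare hits while ignoring scores. A minimal sketch of such a helper (my own suggestion, not part of the test suite):

```python
def strip_scores(hits):
    """Remove _score from each hit so comparisons ignore ranking drift
    across Elasticsearch versions. (Hypothetical helper, not test-suite code.)"""
    return [dict((k, v) for k, v in hit.items() if k != '_score')
            for hit in hits]
```

The same would need doing for the top-level max_score before comparing whole result bodies.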
Anyhow, two things:
Example:
$> wget http://pypi.python.org/packages/source/p/pyelasticsearch/pyelasticsearch-0.0.5.tar.gz#md5=c24cd85848a057ec80010f22a1a063f6
$> pip install pyelasticsearch-0.0.5.tar.gz
Unpacking ./pyelasticsearch-0.0.5.tar.gz
Running setup.py egg_info for package from file:///Users/matt/python-bundle/libs/pyelasticsearch-0.0.5.tar.gz
Traceback (most recent call last):
File "<string>", line 14, in <module>
File "/var/folders/+H/+HaoHbGjELiGx5lKc2j60E+++TI/-Tmp-/pip-gYZ0ww-build/setup.py", line 8, in <module>
long_description=open('README.rst', 'r').read(),
IOError: [Errno 2] No such file or directory: 'README.rst'
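The failure is setup.py reading README.rst unconditionally when the file isn't in the sdist. A hedged sketch of a guard (the real fix is presumably also adding `include README.rst` to MANIFEST.in so the file ships at all; the fallback text here is just the project's short description):

```python
import os

def read_long_description(path='README.rst'):
    # Fall back to the short description when the README was not
    # packaged into the sdist, instead of crashing egg_info.
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return 'python elasticsearch client'
```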
It would be nice to expose dates that ES sends us as dates or datetimes or whatever. This would provide better symmetry, since we do this in the opposite direction. However, we're too low-level a lib to know what field types things are, and it'd be possible to guess wrong and turn a text field which happens to contain a date-like string into a date. Worse, this error could be triggered by user input.
We should either expose a callable for explicitly converting date-like strings or…
We could make a Frankentype that conforms to both the string and datetime interfaces, and you can treat it as you expect. Muhahaha.
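The first option could look something like this: a caller-driven converter where the caller names the date fields, since the client can't know the mapping. A sketch under those assumptions (the helper name and signature are hypothetical):

```python
from datetime import datetime

def convert_dates(doc, date_fields):
    """Convert ISO-formatted strings in the named fields of a result doc
    to datetime objects. The caller supplies the field names, so nothing
    is guessed from the data itself."""
    converted = dict(doc)
    for field in date_fields:
        value = converted.get(field)
        if isinstance(value, str):
            converted[field] = datetime.strptime(value, '%Y-%m-%dT%H:%M:%S')
    return converted
```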
Some applications, like Kibana, use indices aggressively: one index per day.
Currently, bulk_index accepts only one index.
Maybe bulk_index could take the index from inside each document, like id or parent_id.
Or maybe a transaction-like API: you get a transaction object (mocking the client API), use it, and finally commit it. The Python Redis client exposes a similar API.
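The per-document index idea maps cleanly onto the _bulk wire format, where each action line can carry its own _index. A sketch of the payload building (a hypothetical extension, using reserved `_index`/`_type`/`_id` keys in each doc; not the current bulk_index behavior):

```python
import json

def build_bulk_lines(docs):
    """Serialize docs into a _bulk request body, letting each doc carry
    its own destination index via reserved metadata keys. Assumes every
    doc supplies _index, _type, and _id."""
    lines = []
    for doc in docs:
        doc = dict(doc)  # don't mutate the caller's dict
        action = {'index': {'_index': doc.pop('_index'),
                            '_type': doc.pop('_type'),
                            '_id': doc.pop('_id')}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return '\n'.join(lines) + '\n'  # _bulk bodies must end with a newline
```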
If you do:
curl localhost:9200
It spits out some helpful information about the Elasticsearch you're connecting to that isn't available elsewhere:
saturn ~/> curl localhost:9200
{
"ok" : true,
"status" : 200,
"name" : "Darkhawk",
"version" : {
"number" : "0.20.4",
"snapshot_build" : false
},
"tagline" : "You Know, for Search"
}
It'd really help to have that available via pyelasticsearch.
I don't see this in the Elasticsearch docs anywhere. I bumped into it in the forums:
https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/ZlcSVi4VrA8
I'm game for implementing it, but don't know offhand what the method should be called. Maybe info?
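Whatever the name, the method would presumably just GET the cluster root and return the parsed JSON. Parsing the sample payload above (pure simulation, no client call):

```python
import json

# The body a hypothetical info() would get back from GET /
raw = '''{"ok": true, "status": 200, "name": "Darkhawk",
          "version": {"number": "0.20.4", "snapshot_build": false},
          "tagline": "You Know, for Search"}'''
info = json.loads(raw)
assert info['version']['number'] == '0.20.4'
```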
http://pyelasticsearch.readthedocs.org/en/latest/
That showed that the documentation was for pyelasticsearch 1.0.
I kicked off a rebuild on rtfd (not sure why I have access to do that), and that updated the docs. I suspect the docs aren't being rebuilt on each commit to master, or something isn't set up right.
Not to be snooty, but this is a big problem if this is the canonical documentation source.
Also, it's worth adding a separate set of docs following the 0.3 tag or a 0.3 branch.
Results of a filtered range query, as in tests.py, do not seem to respect the range filter, giving inaccurate results. In the following examples, only the "AgeBill Baloney" record should be returned.
CURL Examples:
curl -XDELETE 'http://127.0.0.1:9200/test-index/?pretty=1'
curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_bulk?pretty=1' -d '
{"index" : {"_id" : 1}}
{"name" : "AgeJoe Tester", "age" : 25}
{"index" : {"_id" : 2}}
{"name" : "AgeBill Baloney", "age" : 35}
'
curl -XPOST 'http://127.0.0.1:9200/test-index/_refresh?pretty=1'
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=0' -d '
{ query:
{ "query_string": { "query": "name:Age*" },
"filtered" : {
"filter" : {
"range" : {
"age" : { "from" : 27, "to" : 37 }
}
}
}
}
}
'
results : 1
def testSearchByDSL(self):
    import simplejson as json
    self.conn.delete_index("test-index")
    self.conn.index({"name": "AgeJoe Tester", "age": 25}, "test-index", "test-type", 1)
    self.conn.index({"name": "AgeBill Baloney", "age": 35}, "test-index", "test-type", 2)
    self.conn.refresh(["test-index"])
    query = {
        "query_string": {"query": "name:Age*"},
        "filtered": {
            "filter": {
                "range": {
                    "age": {"from": 27, "to": 37},
                },
            },
        },
    }
    result = self.conn.search("", body=json.dumps(query), indexes=['test-index'])
    print result.get('hits')  # should be 1
    self.assertEqual(result.get('hits').get('total'), 1)  # fails with 2
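My reading of the queries above (an observation, not a confirmed diagnosis): "query_string" and "filtered" sit as siblings, so ES runs the query_string and silently ignores the filter. The filtered-query form I believe is intended nests the query inside "filtered":

```python
import json

# Conventional filtered-query nesting, as I understand the DSL
# (unverified against the exact ES version used in this report).
query = {
    "query": {
        "filtered": {
            "query": {"query_string": {"query": "name:Age*"}},
            "filter": {"range": {"age": {"from": 27, "to": 37}}},
        }
    }
}
body = json.dumps(query)
```

With that nesting, only the age-35 record should match.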
Sorry if this is just a usage question, but perhaps it will become a feature request.
Is upsert supported? Or is the best approach to catch an exception on update and insert otherwise?
Not sure this is the exact feature:
elastic/elasticsearch#2008
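The fallback pattern amounts to update-or-insert semantics. A toy model of those semantics against a plain dict standing in for the index (no real client calls; with the actual client you'd attempt the update and fall back to indexing on a not-found error):

```python
def upsert(store, doc_id, changes):
    """Update the doc if it exists, otherwise create it. The dict `store`
    is a stand-in for the index in this sketch."""
    if doc_id in store:
        store[doc_id].update(changes)
        return 'updated'
    store[doc_id] = dict(changes)
    return 'created'
```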
I am using Django 1.4, Haystack 2.0.0-dev, pyelasticsearch 0.4.1 and elasticsearch 0.20.6, attempting to add geographical data to my search index, following the guidelines laid out here: http://django-haystack.readthedocs.org/en/latest/spatial.html#indexing
Specifically:
# models.py
from django.contrib.gis.geos import Point
from django.db import models

class Business(models.Model):
    latitude = models.FloatField()
    longitude = models.FloatField()

    def get_location(self):
        # Remember, longitude FIRST!
        return Point(self.longitude, self.latitude)

# search_indexes.py
from haystack import indexes
from models import Business

class BusinessIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    location = indexes.LocationField()

    def get_model(self):
        return Business

    def prepare_location(self, obj):
        return obj.get_location()
However, when I go to build my index I get:
ValueError: Circular reference detected
Is this something that would need a custom encoder somehow attached, or is this simply the wrong place for this bug?
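My guess is that GEOS Point objects aren't JSON-serializable, and the encoder's attempt to walk them triggers the circular-reference error. A sketch of a custom encoder that could be attached (the Point class here is a stand-in for django.contrib.gis.geos.Point, and the "lat,lon" string is one of the geo_point formats ES accepts; none of this is confirmed pyelasticsearch behavior):

```python
import json

class Point(object):
    """Stand-in for django.contrib.gis.geos.Point: x is longitude, y is latitude."""
    def __init__(self, x, y):
        self.x, self.y = x, y

class GeoJSONEncoder(json.JSONEncoder):
    def default(self, value):
        # Serialize Point-like objects as a "lat,lon" string instead of
        # recursing into the object graph.
        if isinstance(value, Point):
            return '%s,%s' % (value.y, value.x)
        return json.JSONEncoder.default(self, value)

json.dumps({'location': Point(-95.23, 38.97)}, cls=GeoJSONEncoder)
```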
Haystack uses the now-removed to_python function (https://github.com/toastdriven/django-haystack/blob/master/haystack/backends/elasticsearch_backend.py#L586).
Any chance of adding it back in? Currently anyone who installs Celery with a fresh copy of pyelasticsearch will be hit in the face with errors. I know it's technically not your fault (it was undocumented), but Celery has over 290 issues and 90 pull requests pending, so I don't think there's a chance of getting this fixed any time soon.
See a14a0fd#commitcomment-3429492. I didn't want to forget about it.
Get the rest of the TODOs out of the code—or at least the ones that hint at things that would effect backward-incompatible API changes and thus block 1.0.
When calling the index() method without an id parameter, an exception is thrown. This is because, according to the ES documentation ( http://www.elasticsearch.com/docs/elasticsearch/rest_api/index/ ), the POST method should be used instead of PUT when we want automatic id generation.
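The verb selection the client would need is tiny. A sketch (a hypothetical helper, not the actual pyelasticsearch code):

```python
def method_for_index(doc_id):
    """Pick the HTTP verb for an index request: PUT needs an explicit id,
    while POST lets Elasticsearch generate one."""
    return 'PUT' if doc_id is not None else 'POST'
```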
Creating a defined query results in a 500 error:
self.conn.index({"name":"Joe Tester", "age":30}, "test-index", "test-type", 1)
self.conn.index({"name":"Bill Baloney", "age":35}, "test-index", "test-type", 2)
query = { "query": { "query_string": { "query": "name:joe" } } }
result = self.conn.search(query)
fails. But a plain string query such as query = "name:Tester" works.
The same query run via curl works:
curl -XGET "http://localhost:9200/_all/_search?pretty=true" -d '
{"query": {"query_string": {"query": "name:joe"}}}'
The example DSL-type query in the README doesn't work either. Perhaps a clearer example could be provided of how to use DSL-type queries. It's not yet clear to me whether there's a bug that prevents DSL-type requests from being processed correctly, or whether I'm passing arguments incorrectly.