Please see http://pyelasticsearch.readthedocs.org/ or the docs folder.
License: BSD 3-Clause "New" or "Revised" License
python elasticsearch client
Hi,
As you all know, Python has had built-in json support since version 2.6 (which I think is the lowest version pyelasticsearch supports), so my question is: why do you list simplejson as a library requirement?
For support decimals? As you use custom JSON Encoder class is very easy to add decimal conversion there, like:
def default(self, value):
    ...
    if isinstance(value, decimal.Decimal):
        return str(value)
    ...
For catching JSON decode errors? I think it's easy to switch from JSONDecodeError to ValueError.
For speed? I'm not sure simplejson is meaningfully faster or slower than the built-in json; they perform about the same. If you need speed, you may want to look at https://github.com/esnme/ultrajson for example.
So, what is the point of installing and using simplejson when I install pyelasticsearch?
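To make the suggestion above concrete, here is a minimal, stdlib-only sketch of a JSON encoder that handles Decimal the way the snippet proposes. The class name is invented for illustration; pyelasticsearch's actual encoder class may differ.

```python
import decimal
import json


class DecimalFriendlyEncoder(json.JSONEncoder):
    """A stdlib-only encoder that serializes Decimal values as strings."""

    def default(self, value):
        if isinstance(value, decimal.Decimal):
            return str(value)
        # Fall back to the base class, which raises TypeError for
        # anything it cannot handle.
        return super(DecimalFriendlyEncoder, self).default(value)


print(json.dumps({"price": decimal.Decimal("19.99")}, cls=DecimalFriendlyEncoder))
# → {"price": "19.99"}
```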
http://www.elasticsearch.org/guide/reference/api/bulk.html
That says the endpoints for _bulk are:
/_bulk
/{index}/_bulk
/{index}/{type}/_bulk
pyelasticsearch has a bulk_index method on ElasticSearch which takes an index and a doc_type, puts both of them into the action of the request body, and then uses the middle endpoint /{index}/_bulk.
That works with 0.19.11 (and probably later--I haven't sat down and tested much), but fails with 0.17.9 with this:
requests.packages.urllib3.connectionpool: DEBUG: "POST /inputindex/_bulk?op_type=create HTTP/1.1" 400 120
I did some skulking around and found this:
That suggests that the _bulk endpoint pyelasticsearch is using is new as of ES 0.18.0.
I kind of need to support 0.17.9 because I'm pretty sure Mozilla hasn't moved all our sites over to a later version, yet.
Anyone mind if I do a patch that changes the endpoint used to just /_bulk? Since we're putting the index and doctype in the action, that would work fine.
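A sketch of what that request body looks like, with the index and doc type embedded in each action line so the bare /_bulk endpoint suffices. The helper name and id_field convention are illustrative, not pyelasticsearch's actual internals.

```python
import json


def build_bulk_body(docs, index, doc_type, id_field='id'):
    """Build the newline-delimited body for a POST to /_bulk, putting
    the index and doc type into each action line so the bare /_bulk
    endpoint can be used."""
    lines = []
    for doc in docs:
        action = {'index': {'_index': index, '_type': doc_type}}
        if id_field in doc:
            action['index']['_id'] = doc[id_field]
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline.
    return '\n'.join(lines) + '\n'


print(build_bulk_body([{'id': 1, 'name': 'Joe'}], 'inputindex', 'doc'))
```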
…and perhaps elsewhere. This would let callers build their documents-to-index on the fly, with generators. See http://docs.python-requests.org/en/latest/user/advanced/#streaming-uploads.
pyelasticsearch should use json instead of simplejson, which is removed in Django 1.5.
My test document is a simple serial number/MAC address mapping. These are indexed as strings and show up via the usual query-all (search?q=*:*) via curl. Every combination of query string and query DSL I have tried always produces 0 hits, but the debug output shows a curl query which produces the correct (expected) results (see gist at https://gist.github.com/neurobashing/6024159).
My assumption is that the query DSL is correct, since curl produces correct results, and pyelasticsearch is somehow passing a bad value to the search.
It freaks me out. :-) (This is really a note to self, unless somebody else wants to hack on it.)
I've been playing a little with this project and deployment on Heroku, and it sounds like pyelasticsearch only partially supports basic authentication.
I mean, by inspecting the requests made, there is no use of the header:
Authorization: Basic dXNlcjpwYXNzd29yZA0K
where dXNlcjpwYXNzd29yZA0K is the base64-encoded string user:password.
Do you think, like I do, that it would improve the library if we provided the authentication with the auth param, as suggested in the requests docs (http://docs.python-requests.org/en/latest/user/authentication.html#basic-authentication)?
All of this without adding any new parameter to the lib, but just by parsing the given URL (e.g. http://user:[email protected]).
I will provide some code soon, as I need this to work for a project of mine.
--- client.py.orig 2013-06-13 17:25:59.000000000 +0930
+++ client.py 2013-06-13 17:31:04.000000000 +0930
@@ -106,7 +106,7 @@
This object is thread-safe. You can create one instance and share it
among all threads.
"""
- def __init__(self, urls, timeout=60, max_retries=0, revival_delay=300):
+ def __init__(self, urls, timeout=60, max_retries=0, revival_delay=300, auth=None):
"""
:arg urls: A URL or iterable of URLs of ES nodes. These are full URLs
with port numbers, like ``http://elasticsearch.example.com:9200``.
@@ -127,6 +127,7 @@
self.logger = getLogger('pyelasticsearch')
self.session = requests.session()
self.json_encoder = JsonEncoder
+ self.auth = auth
def _concat(self, items):
"""
@@ -235,7 +236,7 @@
try:
resp = req_method(
url,
- timeout=self.timeout,
+ timeout=self.timeout, auth=self.auth,
**({'data': request_body} if body else {}))
except (ConnectionError, Timeout):
self.servers.mark_dead(server_url)
We should have an unambiguous way of saying the equivalent of pyes's create_index_unless_exists. And delete_index_if_exists, for that matter.
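One possible shape for such a wrapper, shown against a stub connection since the exact exception pyelasticsearch raises for a duplicate create is version-dependent; the exception and class names here are stand-ins, not the library's real API.

```python
class IndexAlreadyExistsError(Exception):
    """Stand-in for the error a duplicate create_index call raises."""


def create_index_unless_exists(conn, index):
    """Create the index, treating 'already exists' as success."""
    try:
        conn.create_index(index)
    except IndexAlreadyExistsError:
        pass


class FakeConn:
    """Minimal stub standing in for an ElasticSearch connection."""
    def __init__(self):
        self.created = set()

    def create_index(self, index):
        if index in self.created:
            raise IndexAlreadyExistsError(index)
        self.created.add(index)


conn = FakeConn()
create_index_unless_exists(conn, 'test-index')
create_index_unless_exists(conn, 'test-index')  # second call is a no-op
print(sorted(conn.created))  # → ['test-index']
```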
In the docs count should be like this:
As opposed to this:
index and doc_type are (newly?) optional in ES, so query should come first. We might be able to get away with this, since delete_by_query might not work as is: #29.
The conn.search(query) example in the readme doesn't seem to work:
conn.search({ "query_string": { "query": "name:tester" }, "filtered": { "filter": { "range": { "age": { "from": 27, "to": 37\
DEBUG 2012-08-30 22:41:39,938 connectionpool "GET /_search?q=%7B%27query_string%27%3A+%7B%27query%27%3A+%27name%3Atester%27%7D%2C+%27filtered%27%3A+%7B%27filter%27%3A+%7B%27range%27%3A+%7B%27age%27%3A+%7B%27to%27%3A+37%2C+%27from%27%3A+27%7D%7D%7D%7D%7D HTTP/1.1" 500 64354
*** ElasticSearchError: Non-OK status code returned (500) containing u'SearchPhaseExecutionException[Failed to execute phase [query], total failure; shardFailures {[dH2uhViYRnyupZTQrGM0IQ][api_small][16]: ...
In short, it tried to stick the query in a querystring arg rather than in the request body. Maybe we should change the example to use the body kwarg. At any rate, I got my code working, but I thought I'd mention it.
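The failure mode is easy to reproduce with the stdlib alone: a query dict percent-encoded into the querystring is the repr of a Python dict, not JSON, which is why ES returns a 500.

```python
import json
from urllib.parse import urlencode

query = {'query_string': {'query': 'name:tester'}}

# Passed positionally, the dict ends up percent-encoded into the
# querystring, which ES cannot parse as a query:
print(urlencode({'q': query}))

# Sent as the request body instead, it is proper JSON:
print(json.dumps(query))
```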
Please do not use the root logger in your package (e.g. logging.debug(...)).
Instead, create your own logger instance and use it (all logging.debug(...) calls become log.debug(...)):
log = logging.getLogger(__name__)

class ElasticSearch(object):
    ...
    def _send_request(self, method, path, body="", querystring_args={}):
        ...
        log.debug("making %s request to path: %s %s %s with body: %s" % (method, self.host, self.port, path, body))
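A runnable illustration of why this matters: with a named logger, applications can silence or redirect the library's output without touching their own root-logger configuration.

```python
import logging

# A module-level logger, as the issue suggests, instead of the root logger.
log = logging.getLogger('pyelasticsearch')

# An application can now tune just this library's verbosity:
logging.getLogger('pyelasticsearch').setLevel(logging.WARNING)

log.debug('this is suppressed')
log.warning('this still gets through')
print(log.name, log.isEnabledFor(logging.DEBUG), log.isEnabledFor(logging.WARNING))
```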
I have a query like this:
query = {'query': {'term': {'url.exact': 'https://my.url.com'}}}
When I call it with es.search(query), it does return the correct result.
However, using the same query with: es.delete_by_query('my_index_name', 'my_doc_type', query), nothing gets deleted.
Here's the log:
b.es.delete_by_query('my_index_name', 'my_doc_type', query)
making DELETE request to path: http://localhost:9200/my_index_name/my_doc_type/_query /my_index_name/my_doc_type/_query with body: {"query": {"term": {"url.exact": "https://my.url.com"}}}
response status: 200
got response {u'ok': True, u'_indices': {u'my_index_name': {u'_shards': {u'successful': 0, u'failed': 5, u'total': 5}}}}
I'm a little flummoxed at what the extensibility story is supposed to be for JSON encoding. It looks like there are 2 equivalent hook points:
ElasticSearch.json_encoder
Have 1.
es.index("test","rest",{'name':'dd'})
{u'_type': u'rest', u'_id': u'None', u'ok': True, u'_version': 2, u'_index': u'test'}
id is None!!!
es.index("test","rest",{'name':'dd'},id='')
pyelasticsearch.exceptions.InvalidJsonResponseError: <Response [400]>
What should I do if I don't want to provide my own ids? Or is this a deliberate design decision to force choosing your own ids?
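For context, the ES REST convention is that PUT /{index}/{type}/{id} indexes under an explicit id, while POST /{index}/{type} (no id segment) asks ES to generate one. A tiny sketch of that dispatch; the helper name index_request is hypothetical, not the client's API.

```python
def index_request(index, doc_type, doc_id=None):
    """Return the (method, path) pair ES expects: PUT with an explicit
    id, POST without one so ES generates the id server-side."""
    if doc_id is None:
        return 'POST', '/%s/%s' % (index, doc_type)
    return 'PUT', '/%s/%s/%s' % (index, doc_type, doc_id)


print(index_request('test', 'rest'))      # → ('POST', '/test/rest')
print(index_request('test', 'rest', 42))  # → ('PUT', '/test/rest/42')
```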
Timeout support on the HTTP connections (which isn't present in the Python stdlib in Python 2.5 & below - http://docs.python.org/library/httplib.html#httplib.HTTPConnection) would be a very welcome addition. I'm liking Requests (http://docs.python-requests.org/) quite a bit these days, though httplib2 is also decent. If I offered to add this & submit a pull request, would you be interested?
It'd be nice, especially for my end users, if this module were installable off PyPI using pip or easy_install.
Assuming you have (or set up) a PyPI account, it should be just python setup.py register then python setup.py sdist upload.
Unfortunately the module can't be used with Python 2.5, since the dependency requests-1.1.0 uses the as keyword in "except Empty as a" (requests-1.1.0/requests/packages/urllib3/connectionpool.py, line 441).
While this is out of reach to pyelasticsearch, this could be mentioned in the README.
After 2 solid days without a single problem on the connection pooling commit, I suddenly get this:
'IndexMissingException[[modelresult] missing]'
in the console. With code 404, obviously.
I can't (yet) tell you how to reproduce this. It just suddenly happened. Maybe a connection got dropped or something?
The current version on PyPI has not been updated, and the line "prefetch=True" leads to an exception in Django.
It would be nice if the package index were updated with the current 1.0 version.
Each time I try to index or search using pyelasticsearch, it takes nearly 1 second to establish the connection. I can't imagine what will happen if I use requests to index 40,000,000 MongoDB records.
When I switch from requests to pycurl, it takes only 1/1000 of a second.
Also, there's a minor error in your setup.py: a file that doesn't exist blocks the install.
What do they default to? Do they have reasonable defaults? What was my thinking there? Should they be args like in the other methods?
https://github.com/rhec/pyelasticsearch/blob/master/pyelasticsearch/client.py#L562
There's a TODO there that suggests that the code is wrong. I'm fairly sure that my testing indicates this is correct with at least ElasticSearch 0.19.11 (I haven't checked earlier versions of ES, yet).
def delete_by_query(self, index, doc_type, query):
    """
    Delete typed JSON documents from a specific index based on a query.
    """
    path = self._make_path([index, doc_type, '_query'])
    response = self._send_request('DELETE', path, query)
    return response
Hi there,
This might seem obvious, but your documentation does not seem to have any installation instructions whatsoever, i.e. things to do before actually starting to play with the API. Most people will probably figure out to use pip or python setup.py install, but some best-practice guidelines would still be nice to have.
Cheers,
Martin
We currently cannot:
The same interface design should be then used for the multi_get proposal (#37) and would fix #68.
-- (Erik starts talking here.) --
Using http://www.elasticsearch.org/guide/reference/api/bulk.html as an example, here are where various params need to land. As you can see, we have quite a lawn-sprinkler effect.
In doc:
In meta:
Dunno:
Once per bulk call:
Problem: right now, index and doc_type are required args for bulk_index. These need to be optional so they can be specified per document. But docs needs to be required, so the argument order needs to change. Fun! We might end up making a whole new routine to handle the power-user cases.
Rather than muck up (and break backward compatibility of) bulk_index for the unusual power-user use case, we'll introduce a new routine, bulk, which does arbitrary bulk operations. Then the naming is more accurate anyway. It might look something like this:
def bulk(self, docs, index=None, doc_type=None, id_field='id'):  # maybe id_field should die
    """
    :arg docs: An iterable of things, each of which is either a dict representing a doc or a tuple (or something, maybe) representing (action dict, doc dict). Or something or something.
    """
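One way the docs normalization could work, sketched as a standalone generator. All names here are hypothetical; this is just the "doc dict or (action, doc) tuple" idea from the signature above made concrete.

```python
import json


def iter_bulk_lines(docs, index=None, doc_type=None):
    """Yield _bulk body lines from an iterable whose items are either a
    plain doc dict or an (action dict, doc dict) tuple."""
    for item in docs:
        if isinstance(item, tuple):
            action, doc = item
        else:
            action, doc = {'index': {}}, item
        op = next(iter(action))          # e.g. 'index' or 'delete'
        meta = action[op]
        # Per-call index/doc_type are defaults; per-action values win.
        if index is not None:
            meta.setdefault('_index', index)
        if doc_type is not None:
            meta.setdefault('_type', doc_type)
        yield json.dumps(action)
        if doc is not None:              # delete actions carry no doc line
            yield json.dumps(doc)


lines = list(iter_bulk_lines(
    [{'name': 'Joe'}, ({'delete': {'_id': 7}}, None)],
    index='test-index', doc_type='test-type'))
print(lines)
```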
We should. Otherwise, we'll happily return None for unhandle-able values rather than raising TypeError.
In client.py, line 46, doc = method.__doc__ will raise an error when the optimize option is used, since the doc is None and cannot be iterated at line 49.
We seem to get different results from ES than we expected. willkg suggests that perhaps we're outrunning ES and should stick some sleep() calls in or something. refresh() is async, unfortunately.
Using django 1.4.3
Using haystack - dev version
Using pyelasticsearch - latest zip
(venv) C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo>manage.py shell
>>> from pyelasticsearch import ElasticSearch
>>> conn = ElasticSearch('http://localhost:9200')
>>> conn.status()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo/../Lib/site-packages\pyelasticsearch\client.py", line 89, in decorate
return func(*args, query_params=query_params, **kwargs)
File "C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo/../Lib/site-packages\pyelasticsearch\client.py", line 598, in status
query_params=query_params)
File "C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo/../Lib/site-packages\pyelasticsearch\client.py", line 232, in send_request
prepped_response = self._decode_response(resp)
File "C:\Users\Eric\Desktop\Projects\Python\engineereddemo\venv\engineereddemo/../Lib/site-packages\pyelasticsearch\client.py", line 249, in _decode_response
json_response = response.json()
TypeError: 'dict' object is not callable
>>>
However, I can go to my browser and...
I hope I'm just missing some configuration variable or something! Going nuts here!
Thank you - thank you - thank you for your awesome work with this and haystack.
Upgrading to this version fixes an issue I was having with phrase based searching.
Could you make one small change? Indicate that pyelasticsearch 0.2 requires:
requests==0.14.1
Otherwise with requests==0.13.2 you get the following error:
from requests.compat import json
ImportError: cannot import name json
Thanks!
I am using pyelasticsearch and am doing a good amount of bulk indexing (bulk_index).
I see the following in the logs.
2013-05-05 05:21:34,100 INFO requests.packages.urllib3.connectionpool:191 Starting new HTTP connection (403): [ip address of es host]
...
2013-05-05 05:31:32,348 INFO requests.packages.urllib3.connectionpool:191 Starting new HTTP connection (522): [ip address of es host]
Some quick questions:
How would I set the max number of connections per (ES) host before blocking, so as to not flood it with connections? Any code files/classes to start looking into?
Thanks !
Hi,
I'm using a Gentoo system and there is a requests library in version 0.13.1 where requests.Response.json is a property, not a function.
so, calling:
es.index(..)
causes an exception in line 280, in _decode_response:
json_response = response.json()
Could you please first check whether response.json is callable?
json_response = response.json
if callable(json_response):
json_response = json_response()
According to https://github.com/kennethreitz/requests/blob/master/HISTORY.rst the change has been made in 1.0.0 (2012-12-17).
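The callable check above can be wrapped in a small compatibility shim that works with both requests generations. The stub classes below stand in for the two response shapes; decode_json is a hypothetical helper name.

```python
def decode_json(response):
    """Handle both requests < 1.0, where ``response.json`` is a plain
    attribute, and >= 1.0, where it is a method."""
    json_attr = response.json
    return json_attr() if callable(json_attr) else json_attr


class OldResponse:          # requests 0.x style: .json is an attribute
    json = {'ok': True}


class NewResponse:          # requests 1.x style: .json() is a method
    def json(self):
        return {'ok': True}


print(decode_json(OldResponse()), decode_json(NewResponse()))
```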
Thanks!
You need to close the HTTPConnection in the _send_request method, because when indexing a large collection of documents you may get a socket.error "Address already in use" exception.
So simply add the following lines after response = self._prep_response(response.read()):
os.close(conn.sock.fileno())
try:
    conn.close()
except Exception, e:
    log.debug("_send_request(): got an exception during closing connection: %r" % e)
This is needed because conn.close() does not close the socket.
An API-stable version of requests is out. We should upgrade to it and require it, as there are some backward-incompatible changes that affect us.
There are a bunch of tests that fail for me with Elasticsearch 0.20.4:
(pyelasticsearch) (106-info=88950 run_tests.sh) saturn ~/mozilla/pyelasticsearch> ./run_tests.sh
ERROR: pyelasticsearch.tests.client_tests:IndexingTestCase.test_percolate
vim +324 pyelasticsearch/tests/client_tests.py # test_percolate
result = self.conn.percolate('test-index','test-type', document)
vim +96 pyelasticsearch/client.py # decorate
return func(*args, query_params=query_params, **kwargs)
vim +984 pyelasticsearch/client.py # percolate
doc, query_params=query_params)
vim +254 pyelasticsearch/client.py # send_request
self._raise_exception(resp, prepped_response)
vim +268 pyelasticsearch/client.py # _raise_exception
raise error_class(response.status_code, error_message)
ElasticHttpError: (500, u'NoShardAvailableActionException[[_na][_na] No shard available for [org.elasticsearch.action.percolate.PercolateRequest@25aaaa]]')
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index' -d 'null'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index HTTP/1.1" 200 31
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'acknowledged': True, u'ok': True}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/_percolator/test-index/id_1' -d '{"query": {"match": {"name": "Joe"}}}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /_percolator/test-index/id_1 HTTP/1.1" 201 81
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-index', u'_id': u'id_1', u'ok': True, u'_version': 1, u'_index': u'_percolator'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/_percolator/test-index/id_2' -d '{"query": {"match": {"name": "not_that_guy"}}}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /_percolator/test-index/id_2 HTTP/1.1" 201 81
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-index', u'_id': u'id_2', u'ok': True, u'_version': 1, u'_index': u'_percolator'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/test-type/_percolate' -d '{"doc": {"name": "Joe"}}'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/_percolate HTTP/1.1" 500 152
pyelasticsearch: DEBUG: response status: 500
--------------------- >> end captured logging << ---------------------
FAIL: pyelasticsearch.tests.client_tests:SearchTestCase.test_mlt
vim +402 pyelasticsearch/tests/client_tests.py # test_mlt
self.assert_result_contains(result, {'hits': {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '3', '_source': {'name': 'Joe Test'}, '_index': 'test-index'}], 'total': 1, 'max_score': 0.19178301}})
vim +27 pyelasticsearch/tests/__init__.py # assert_result_contains
eq_(value, result[key])
AssertionError: {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '3', '_index': 'test-index', '_source': {'name': 'Joe Test'}}], 'total': 1, 'max_score': 0.19178301} != {u'hits': [{u'_score': 0.18116833, u'_type': u'test-type', u'_id': u'3', u'_source': {u'name': u'Joe Test'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.18116833}
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/1' -d '{"name": "Joe Tester"}'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"name": "Bill Baloney"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/3' -d '{"name": "Joe Test"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/3 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'3', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/test-type/1/_mlt?min_doc_freq=1&mlt_fields=name&min_term_freq=1' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/1/_mlt?min_doc_freq=1&mlt_fields=name&min_term_freq=1 HTTP/1.1" 200 235
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'hits': {u'hits': [{u'_score': 0.18116833, u'_type': u'test-type', u'_id': u'3', u'_source': {u'name': u'Joe Test'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.18116833}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 1, u'timed_out': False}
--------------------- >> end captured logging << ---------------------
FAIL: pyelasticsearch.tests.client_tests:SearchTestCase.test_mlt_fields
vim +435 pyelasticsearch/tests/client_tests.py # test_mlt_fields
{u'hits': {u'hits': [{u'_score': 0.30685282, u'_type': u'test-type', u'_id': u'4', u'_source': {u'sport': u'football', u'name': u'Cam'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.30685282}})
vim +27 pyelasticsearch/tests/__init__.py # assert_result_contains
eq_(value, result[key])
AssertionError: {u'hits': [{u'_score': 0.30685282, u'_type': u'test-type', u'_id': u'4', u'_index': u'test-index', u'_source': {u'sport': u'football', u'name': u'Cam'}}], u'total': 1, u'max_score': 0.30685282} != {u'hits': [{u'_score': 1.5108256, u'_type': u'test-type', u'_id': u'4', u'_source': {u'sport': u'football', u'name': u'Cam'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 1.5108256}
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/1' -d '{"name": "Joe Tester"}'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"name": "Bill Baloney"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/3' -d '{"sport": "football", "name": "Angus"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/3 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'3', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/4' -d '{"sport": "football", "name": "Cam"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/4 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'4', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/5' -d '{"sport": "baseball", "name": "Sophia"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/5 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'5', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/test-type/3/_mlt?min_doc_freq=1&mlt_fields=sport&min_term_freq=1' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/3/_mlt?min_doc_freq=1&mlt_fields=sport&min_term_freq=1 HTTP/1.1" 200 249
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'hits': {u'hits': [{u'_score': 1.5108256, u'_type': u'test-type', u'_id': u'4', u'_source': {u'sport': u'football', u'name': u'Cam'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 1.5108256}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 1, u'timed_out': False}
--------------------- >> end captured logging << ---------------------
FAIL: pyelasticsearch.tests.client_tests:SearchTestCase.test_mlt_with_body
vim +424 pyelasticsearch/tests/client_tests.py # test_mlt_with_body
{'hits': {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '3', '_source': {'age': 16, 'name': 'Joe Justin'}, '_index': 'test-index'}], 'total': 1, 'max_score': 0.19178301}})
vim +27 pyelasticsearch/tests/__init__.py # assert_result_contains
eq_(value, result[key])
AssertionError: {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '3', '_index': 'test-index', '_source': {'age': 16, 'name': 'Joe Justin'}}], 'total': 1, 'max_score': 0.19178301} != {u'hits': [{u'_score': 0.15891947, u'_type': u'test-type', u'_id': u'3', u'_source': {u'age': 16, u'name': u'Joe Justin'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.15891947}
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/1' -d '{"name": "Joe Tester"}'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"name": "Bill Baloney"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"age": 22, "name": "Joe Test"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 200 76
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 2, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/3' -d '{"age": 16, "name": "Joe Justin"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/3 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'3', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/test-type/1/_mlt?min_doc_freq=1&mlt_fields=name&min_term_freq=1' -d '{"filter": {"fquery": {"query": {"range": {"age": {"to": 20, "from": 10}}}}}}'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/test-type/1/_mlt?min_doc_freq=1&mlt_fields=name&min_term_freq=1 HTTP/1.1" 200 248
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'hits': {u'hits': [{u'_score': 0.15891947, u'_type': u'test-type', u'_id': u'3', u'_source': {u'age': 16, u'name': u'Joe Justin'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.15891947}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 2, u'timed_out': False}
--------------------- >> end captured logging << ---------------------
FAIL: pyelasticsearch.tests.client_tests:SearchTestCase.test_search_by_field
vim +362 pyelasticsearch/tests/client_tests.py # test_search_by_field
self.assert_result_contains(result, {'hits': {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '1', '_source': {'name': 'Joe Tester'}, '_index': 'test-index'}], 'total': 1, 'max_score': 0.19178301}})
vim +27 pyelasticsearch/tests/__init__.py # assert_result_contains
eq_(value, result[key])
AssertionError: {'hits': [{'_score': 0.19178301, '_type': 'test-type', '_id': '1', '_index': 'test-index', '_source': {'name': 'Joe Tester'}}], 'total': 1, 'max_score': 0.19178301} != {u'hits': [{u'_score': 0.625, u'_type': u'test-type', u'_id': u'1', u'_source': {u'name': u'Joe Tester'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.625}
-------------------- >> begin captured logging << --------------------
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/1' -d '{"name": "Joe Tester"}'
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/1 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'1', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPUT 'http://localhost:9200/test-index/test-type/2' -d '{"name": "Bill Baloney"}'
requests.packages.urllib3.connectionpool: DEBUG: "PUT /test-index/test-type/2 HTTP/1.1" 201 76
pyelasticsearch: DEBUG: response status: 201
pyelasticsearch: DEBUG: got response {u'_type': u'test-type', u'_id': u'2', u'ok': True, u'_version': 1, u'_index': u'test-index'}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XPOST 'http://localhost:9200/test-index/_refresh' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "POST /test-index/_refresh HTTP/1.1" 200 59
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'ok': True, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}}
pyelasticsearch: DEBUG: Making a request equivalent to this: curl -XGET 'http://localhost:9200/test-index/_search?q=name%3Ajoe' -d '""'
requests.packages.urllib3.connectionpool: DEBUG: "GET /test-index/_search?q=name%3Ajoe HTTP/1.1" 200 227
pyelasticsearch: DEBUG: response status: 200
pyelasticsearch: DEBUG: got response {u'hits': {u'hits': [{u'_score': 0.625, u'_type': u'test-type', u'_id': u'1', u'_source': {u'name': u'Joe Tester'}, u'_index': u'test-index'}], u'total': 1, u'max_score': 0.625}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 1, u'timed_out': False}
--------------------- >> end captured logging << ---------------------
60 tests, 4 failures, 1 error in 11.6s
real 0m11.895s
user 0m0.796s
sys 0m0.092s
I'm pretty sure the failures are all related to different scores in the actual results than in the expected results.
I mentioned this to @erikrose a while back, and one of us suggested it could be due to differing Elasticsearch versions.
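If score drift across Elasticsearch versions is the cause, the assertions could compare hits while ignoring scores. A minimal sketch of such a helper (my own suggestion, not part of the test suite):

```python
def strip_scores(hits):
    """Remove _score from each hit so comparisons ignore ranking drift
    across Elasticsearch versions. (Hypothetical helper, not test-suite code.)"""
    return [dict((k, v) for k, v in hit.items() if k != '_score')
            for hit in hits]
```

The same would need doing for the top-level max_score before comparing whole result bodies.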
Anyhow, two things:
Example:
$> wget http://pypi.python.org/packages/source/p/pyelasticsearch/pyelasticsearch-0.0.5.tar.gz#md5=c24cd85848a057ec80010f22a1a063f6
$> pip install pyelasticsearch-0.0.5.tar.gz
Unpacking ./pyelasticsearch-0.0.5.tar.gz
Running setup.py egg_info for package from file:///Users/matt/python-bundle/libs/pyelasticsearch-0.0.5.tar.gz
Traceback (most recent call last):
File "<string>", line 14, in <module>
File "/var/folders/+H/+HaoHbGjELiGx5lKc2j60E+++TI/-Tmp-/pip-gYZ0ww-build/setup.py", line 8, in <module>
long_description=open('README.rst', 'r').read(),
IOError: [Errno 2] No such file or directory: 'README.rst'
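The failure is setup.py reading README.rst unconditionally when the file isn't in the sdist. A hedged sketch of a guard (the real fix is presumably also adding `include README.rst` to MANIFEST.in so the file ships at all; the fallback text here is just the project's short description):

```python
import os

def read_long_description(path='README.rst'):
    # Fall back to the short description when the README was not
    # packaged into the sdist, instead of crashing egg_info.
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return 'python elasticsearch client'
```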
It would be nice to expose dates that ES sends us as dates or datetimes or whatever. This would provide better symmetry, since we do this in the opposite direction. However, we're too low-level a lib to know what field types things are, and it'd be possible to guess wrong and turn a text field which happens to contain a date-like string into a date. Worse, this error could be triggered by user input.
We should either expose a callable for explicitly converting date-like strings or…
We could make a Frankentype that conforms to both the string and datetime interfaces, and you can treat it as you expect. Muhahaha.
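The first option could look something like this: a caller-driven converter where the caller names the date fields, since the client can't know the mapping. A sketch under those assumptions (the helper name and signature are hypothetical):

```python
from datetime import datetime

def convert_dates(doc, date_fields):
    """Convert ISO-formatted strings in the named fields of a result doc
    to datetime objects. The caller supplies the field names, so nothing
    is guessed from the data itself."""
    converted = dict(doc)
    for field in date_fields:
        value = converted.get(field)
        if isinstance(value, str):
            converted[field] = datetime.strptime(value, '%Y-%m-%dT%H:%M:%S')
    return converted
```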
Some applications, like Kibana, use indices aggressively: one index per day.
Currently, bulk_index accepts only one index.
Maybe bulk_index could take the index from inside each document, like id or parent_id.
Or maybe a transaction-like API: you get a transaction object (mocking the client API), use it, and finally commit it. The Python Redis client exposes a similar API.
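The per-document index idea maps cleanly onto the _bulk wire format, where each action line can carry its own _index. A sketch of the payload building (a hypothetical extension, using reserved `_index`/`_type`/`_id` keys in each doc; not the current bulk_index behavior):

```python
import json

def build_bulk_lines(docs):
    """Serialize docs into a _bulk request body, letting each doc carry
    its own destination index via reserved metadata keys. Assumes every
    doc supplies _index, _type, and _id."""
    lines = []
    for doc in docs:
        doc = dict(doc)  # don't mutate the caller's dict
        action = {'index': {'_index': doc.pop('_index'),
                            '_type': doc.pop('_type'),
                            '_id': doc.pop('_id')}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return '\n'.join(lines) + '\n'  # _bulk bodies must end with a newline
```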
If you do:
curl localhost:9200
It spits out some helpful information about the Elasticsearch you're connecting to that isn't available elsewhere:
saturn ~/> curl localhost:9200
{
"ok" : true,
"status" : 200,
"name" : "Darkhawk",
"version" : {
"number" : "0.20.4",
"snapshot_build" : false
},
"tagline" : "You Know, for Search"
}
It'd really help to have that available via pyelasticsearch.
I don't see this in the Elasticsearch docs anywhere. I bumped into it in the forums:
https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/ZlcSVi4VrA8
I'm game for implementing it, but don't know offhand what the method should be called. Maybe info?
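Whatever the name, the method would presumably just GET the cluster root and return the parsed JSON. Parsing the sample payload above (pure simulation, no client call):

```python
import json

# The body a hypothetical info() would get back from GET /
raw = '''{"ok": true, "status": 200, "name": "Darkhawk",
          "version": {"number": "0.20.4", "snapshot_build": false},
          "tagline": "You Know, for Search"}'''
info = json.loads(raw)
assert info['version']['number'] == '0.20.4'
```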
http://pyelasticsearch.readthedocs.org/en/latest/
That showed that the documentation was for pyelasticsearch 1.0.
I kicked off a rebuild on rtfd (not sure why I have access to do that), and that updated the docs. I suspect the docs aren't being rebuilt on each commit to master, or something isn't set up right.
Not to be snooty, but this is a big problem if this is the canonical documentation source.
Also, it's worth adding a separate set of docs following the 0.3 tag or a 0.3 branch.
Results of a filtered range query, as in tests.py, do not seem to respect the range filter, giving inaccurate results. In the following examples, only the "AgeBill Baloney" record should be returned.
CURL Examples:
curl -XDELETE 'http://127.0.0.1:9200/test-index/?pretty=1'
curl -XPOST 'http://127.0.0.1:9200/test-index/test-type/_bulk?pretty=1' -d '
{"index" : {"_id" : 1}}
{"name" : "AgeJoe Tester", "age" : 25}
{"index" : {"_id" : 2}}
{"name" : "AgeBill Baloney", "age" : 35}
'
curl -XPOST 'http://127.0.0.1:9200/test-index/_refresh?pretty=1'
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=0' -d '
{ query:
{ "query_string": { "query": "name:Age*" },
"filtered" : {
"filter" : {
"range" : {
"age" : { "from" : 27, "to" : 37 }
}
}
}
}
}
'
results : 1
def testSearchByDSL(self):
    import simplejson as json
    self.conn.delete_index("test-index")
    self.conn.index({"name": "AgeJoe Tester", "age": 25}, "test-index", "test-type", 1)
    self.conn.index({"name": "AgeBill Baloney", "age": 35}, "test-index", "test-type", 2)
    self.conn.refresh(["test-index"])
    query = {
        "query_string": {"query": "name:Age*"},
        "filtered": {
            "filter": {
                "range": {
                    "age": {"from": 27, "to": 37},
                },
            },
        },
    }
    result = self.conn.search("", body=json.dumps(query), indexes=['test-index'])
    print result.get('hits')  # should be 1
    self.assertEqual(result.get('hits').get('total'), 1)  # fails with 2
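My reading of the queries above (an observation, not a confirmed diagnosis): "query_string" and "filtered" sit as siblings, so ES runs the query_string and silently ignores the filter. The filtered-query form I believe is intended nests the query inside "filtered":

```python
import json

# Conventional filtered-query nesting, as I understand the DSL
# (unverified against the exact ES version used in this report).
query = {
    "query": {
        "filtered": {
            "query": {"query_string": {"query": "name:Age*"}},
            "filter": {"range": {"age": {"from": 27, "to": 37}}},
        }
    }
}
body = json.dumps(query)
```

With that nesting, only the age-35 record should match.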
Sorry if this is just a usage question, but perhaps it will become a feature request.
Is upsert supported? Or is the best approach to catch an exception on update and insert otherwise?
Not sure this is the exact feature:
elastic/elasticsearch#2008
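The fallback pattern amounts to update-or-insert semantics. A toy model of those semantics against a plain dict standing in for the index (no real client calls; with the actual client you'd attempt the update and fall back to indexing on a not-found error):

```python
def upsert(store, doc_id, changes):
    """Update the doc if it exists, otherwise create it. The dict `store`
    is a stand-in for the index in this sketch."""
    if doc_id in store:
        store[doc_id].update(changes)
        return 'updated'
    store[doc_id] = dict(changes)
    return 'created'
```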
I am using Django 1.4, Haystack 2.0.0-dev, pyelasticsearch 0.4.1 and elasticsearch 0.20.6, attempting to add geographical data to my search index, following the guidelines laid out here: http://django-haystack.readthedocs.org/en/latest/spatial.html#indexing
Specifically:
# models.py
from django.contrib.gis.geos import Point
from django.db import models

class Business(models.Model):
    latitude = models.FloatField()
    longitude = models.FloatField()

    def get_location(self):
        # Remember, longitude FIRST!
        return Point(self.longitude, self.latitude)

# search_indexes.py
from haystack import indexes
from models import Business

class BusinessIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    location = indexes.LocationField()

    def get_model(self):
        return Business

    def prepare_location(self, obj):
        return obj.get_location()
However, when I go to build my index I get:
ValueError: Circular reference detected
Is this something that would need a custom encoder somehow attached, or is this simply the wrong place for this bug?
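My guess is that GEOS Point objects aren't JSON-serializable, and the encoder's attempt to walk them triggers the circular-reference error. A sketch of a custom encoder that could be attached (the Point class here is a stand-in for django.contrib.gis.geos.Point, and the "lat,lon" string is one of the geo_point formats ES accepts; none of this is confirmed pyelasticsearch behavior):

```python
import json

class Point(object):
    """Stand-in for django.contrib.gis.geos.Point: x is longitude, y is latitude."""
    def __init__(self, x, y):
        self.x, self.y = x, y

class GeoJSONEncoder(json.JSONEncoder):
    def default(self, value):
        # Serialize Point-like objects as a "lat,lon" string instead of
        # recursing into the object graph.
        if isinstance(value, Point):
            return '%s,%s' % (value.y, value.x)
        return json.JSONEncoder.default(self, value)

json.dumps({'location': Point(-95.23, 38.97)}, cls=GeoJSONEncoder)
```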
Haystack uses the now-removed to_python function (https://github.com/toastdriven/django-haystack/blob/master/haystack/backends/elasticsearch_backend.py#L586).
Any chance of adding it back in? Currently anyone who installs Celery with a fresh copy of pyelasticsearch will be hit in the face with errors. I know it's technically not your fault (it was undocumented), but Celery has over 290 issues and 90 pull requests pending, so I don't think there's a chance of getting this fixed any time soon.
See a14a0fd#commitcomment-3429492. I didn't want to forget about it.
Get the rest of the TODOs out of the code—or at least the ones that hint at things that would effect backward-incompatible API changes and thus block 1.0.
When calling the index() method without an id parameter, an exception is thrown. This is because, according to the ES documentation ( http://www.elasticsearch.com/docs/elasticsearch/rest_api/index/ ), the POST method should be used instead of PUT when we want automatic id generation.
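The verb selection the client would need is tiny. A sketch (a hypothetical helper, not the actual pyelasticsearch code):

```python
def method_for_index(doc_id):
    """Pick the HTTP verb for an index request: PUT needs an explicit id,
    while POST lets Elasticsearch generate one."""
    return 'PUT' if doc_id is not None else 'POST'
```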
Creating a defined query results in a 500 error:
self.conn.index({"name":"Joe Tester", "age":30}, "test-index", "test-type", 1)
self.conn.index({"name":"Bill Baloney", "age":35}, "test-index", "test-type", 2)
query = { "query": { "query_string": { "query": "name:joe" } } }
result = self.conn.search(query)
fails. But a plain string query such as query = "name:Tester" works.
The same query run via curl works:
curl -XGET "http://localhost:9200/_all/_search?pretty=true" -d '
{"query": {"query_string": {"query": "name:joe"}}}'
The example DSL-type query in the README doesn't work either. Perhaps a clearer example could be provided of how to use DSL-type queries. It's not yet clear to me whether there's a bug that prevents DSL-type requests from being processed correctly, or whether I'm passing arguments incorrectly.