
Official Python client for Elasticsearch

Home Page: https://ela.st/es-python

License: Apache License 2.0


elasticsearch-py's Introduction

Elasticsearch Python Client


The official Python client for Elasticsearch.

Features

  • Translating basic Python data types to and from JSON
  • Configurable automatic discovery of cluster nodes
  • Persistent connections
  • Load balancing (with pluggable selection strategy) across available nodes
  • Failed connection penalization (time based - failed connections won't be retried until a timeout is reached)
  • Support for TLS and HTTP authentication
  • Thread safety across requests
  • Pluggable architecture
  • Helper functions for idiomatically using APIs together

Installation

Download the latest version of Elasticsearch or sign-up for a free trial of Elastic Cloud.

Refer to the Installation section of the getting started documentation.

Connecting

Refer to the Connecting section of the getting started documentation.
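A minimal connection sketch (the endpoint and API key below are placeholders, not values from this document):

from elasticsearch import Elasticsearch

# Hypothetical endpoint and credentials; substitute your own.
client = Elasticsearch(
    "https://localhost:9200",
    api_key="YOUR_API_KEY",
)
print(client.info())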

Usage

Compatibility

Language clients are forward compatible, meaning that a client supports communicating with greater or equal minor versions of Elasticsearch without breaking. This does not mean that a client automatically supports new features of newer Elasticsearch versions; that is only possible after a release of a new client version. For example, an 8.12 client won't automatically support the new features of Elasticsearch 8.13; the 8.13 client version is required for that. Elasticsearch language clients are only backwards compatible with default distributions, and without guarantees.

Elasticsearch version    elasticsearch-py branch    Supported
main                     main
8.x                      8.x                        8.x
7.x                      7.x                        7.17

If you need multiple versions installed at the same time, older versions are also released as elasticsearch7 and elasticsearch8.
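For instance, a sketch of using the versioned packages side by side (assuming both are installed, e.g. via pip install elasticsearch7 elasticsearch8):

from elasticsearch7 import Elasticsearch as Elasticsearch7
from elasticsearch8 import Elasticsearch as Elasticsearch8

es7 = Elasticsearch7("http://localhost:9200")
es8 = Elasticsearch8("http://localhost:9200")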

Documentation

Documentation for the client is available on elastic.co and Read the Docs.

Feedback 🗣️

The engineering team here at Elastic is looking for developers to participate in research and feedback sessions to learn more about how you use our Python client and what improvements we can make to its design and your workflow. If you're interested in sharing your insights into developer experience and language client design, please fill out this short form. Depending on the number of responses, we may contact you for a 1:1 conversation or a focus group with other developers who use the same client. Thank you in advance - your feedback is crucial to improving the user experience for all Elasticsearch developers!

License

Copyright 2023 Elasticsearch B.V. Licensed under the Apache License, Version 2.0.

elasticsearch-py's People

Contributors

alvarolmedo, bleskes, cxmcc, dmvass, elasticmachine, fbacchella, fxdgear, glenrsmith, honzakral, iuliaferoli, jmcarp, joshmock, jrodewig, leemthompo, marshallmain, miriam-eid, mpdreamz, nfsec, philkra, picandocodigo, pquentin, rooterkyberian, schiermike, sethmlarson, szabosteve, technige, v1nay8, veatch, vepiphyte, xd-deng


elasticsearch-py's Issues

Support for an aggregation query builder

The new features in 1.0 could have better integration, specifically aggregations. A query builder and parser might be a good solution here, given the nested nature of aggregation queries. Yo @honzakral, what are your thoughts on this?

Ranges in range aggregation return float values for integer fields

I have an integer field int_field defined in the mapping.

For the following aggregation:

{
  "_source": "false",
  "query": {
    "match_all": {}
  },
  "aggs": {
    "agg1": {
      "range": {
        "field": "int_field",
        "ranges": [
          {
            "to": 1
          },
          {
            "from": 1,
            "to": 201
          }
        ]
      }
    }
  }
}

The result from elasticsearch-py is:

{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
 u'aggregations': {u'agg1': {u'buckets': [{u'doc_count': 2, u'to': 1.0},
    {u'doc_count': 4, u'from': 1.0, u'to': 201.0}]}},
 u'hits': {u'hits': [...],
  u'max_score': 1.0,
  u'total': 8},
 u'timed_out': False,
 u'took': 2}

So the values of from and to in each bucket become floats instead of integers, as defined in the mapping.
I've also tried passing from and to as strings; the result is always a float.

Tested on elasticsearch v 1.0.1 and elasticsearch-py v 1.0.0 and v0.4.4.

API support `put_aliases`

Can support be added for adding multiple aliases at once? Is there a reason it hasn't been implemented yet?

I can work on a pull request, but wasn't sure whether it was left out intentionally or development was already in progress.
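A sketch of one possible workaround using the existing update_aliases API, which accepts multiple actions in a single call (the index and alias names here are illustrative only):

from elasticsearch import Elasticsearch

es = Elasticsearch()
# One atomic call adding two aliases at once.
es.indices.update_aliases(body={
    "actions": [
        {"add": {"index": "my-index", "alias": "alias-one"}},
        {"add": {"index": "my-index", "alias": "alias-two"}},
    ]
})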

Less Specificity on Dependencies

Requiring requests==1.2.0 conflicts with other packages. As a library, it should be more tolerant of versions and not pin a precise release.

Support for 1.0.0.RC1?

This is more of a question than an issue. Is the client compatible with the recent 1.0.0.RC1 release?

Thanks,
Mike

Unicode keys of dictionary partially fail

Hi,

I found (in a very painful way :) ) that if there are unicode keys in an inner dictionary, ES or the Python library fails to add the object. E.g., for the following dictionary:

{
    u'city': u'Toronto',
    u'name': u'PostBeyond',
    u'events': {
        u'title': u'ExtremeCachingwithPHP',
        u'start_date': u'2014-01-08T00: 00: 00+00: 00'
    }
}

ES will correctly process/add the city and name fields. However, the inner events.title and events.start_date will not be added.

This will be correctly processed:

{
    'city': u'Toronto',
    'name': u'PostBeyond',
    'events': {
        'title': u'ExtremeCachingwithPHP',
        'start_date': u'2014-01-08T00: 00: 00+00: 00'
    }
}

As a workaround I tried the following, which for some reason does not work. I guess 'events' is still somehow in memory.

temp_events = {str(key): val for key, val in doc['events'].items()} #make unicode keys strings
doc.pop('events',None)
doc['events'] = temp_events  #this will not be added either
doc['events2'] = temp_events #this will be added

Anyway it's 6am

Best,
D
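A sketch of a recursive workaround for the report above: convert every key in the document (including nested dictionaries and lists) before indexing. This assumes Python 2, where str(key) turns an ASCII-safe unicode key into a byte string:

def stringify_keys(obj):
    # Recursively convert unicode dict keys to plain strings (Python 2).
    if isinstance(obj, dict):
        return dict((str(key), stringify_keys(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return [stringify_keys(item) for item in obj]
    return obj

doc = stringify_keys(doc)  # 'doc' is the document from the report above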

Inconsistent behavior when inserting indexes

I noticed that when I specify an id when indexing a document (newbie user, please excuse me) in a small Flask application, I get a RequestError/TransportError with error code 400 and message 'No handler found for uri […] and method [PUT]'.

Incidentally, when I do the same thing in a Python shell, I don't get an error.

ActionRequestValidationException on bulk insert of single item

Hi, I'm sometimes getting the following error when bulk updating a single item:

WARNING:elasticsearch:POST /themuse-prod/_bulk?refresh=false [status:500 request:0.006s]
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python2.7/site-packages/tornado/web.py", line 1270, in _when_complete 
    callback() 
  File "/app/.heroku/python/lib/python2.7/site-packages/tornado/web.py", line 1291, in _execute_method 
    self._when_complete(method(*self.path_args, **self.path_kwargs), 
  File "./themuse/admin/controllers/admin_base_controller.py", line 207, in post 
    callback(model) 
  File "./themuse/admin/controllers/admin_base_controller.py", line 225, in on_submit 
    form, error_message = self._submit(model) 
  File "./themuse/admin/controllers/admin_base_controller.py", line 259, in _submit 
    self.ranking_cls(self.redis(), self.db()).reindex((model,)) 
  File "./themuse/analytics/rankings.py", line 98, in reindex 
    self.search.update(self.category, serialized_models, priority=priority) 
  File "./themuse/search_client.py", line 51, in update 
    bulk(self.conn, actions, index=self.index, raise_on_error=True, refresh=priority) 
  File "/app/.heroku/python/lib/python2.7/site-packages/elasticsearch/helpers.py", line 148, in bulk 
    for ok, item in streaming_bulk(client, actions, **kwargs): 
  File "/app/.heroku/python/lib/python2.7/site-packages/elasticsearch/helpers.py", line 107, in streaming_bulk 
    resp = client.bulk(bulk_actions, **kwargs) 
  File "/app/.heroku/python/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 70, in _wrapped 
    return func(*args, params=params, **kwargs) 
  File "/app/.heroku/python/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 568, in bulk 
    params=params, body=self._bulk_body(body)) 
  File "/app/.heroku/python/lib/python2.7/site-packages/elasticsearch/transport.py", line 274, in perform_request 
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore) 
  File "/app/.heroku/python/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 55, in perform_request 
    self._raise_error(response.status, raw_data) 
  File "/app/.heroku/python/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 83, in _raise_error 
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info) 
TransportError: TransportError(500, u'ActionRequestValidationException[Validation Failed: 1: no requests added;]') 

I'm bulk updating a single item because I have a method that can take one or more models to update; that way I can reuse the same code for everything. I haven't isolated why this happens only sometimes, but the error only started shortly after switching to v1.0.0 of the library.
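One possible explanation, sketched below: 'no requests added' is what Elasticsearch returns when the bulk body is empty, so a guard against an empty action list (for example, when a generator yields nothing) would avoid the 500. This is an assumption about the cause, not a confirmed fix; conn and index follow the call in the traceback above:

from elasticsearch.helpers import bulk

actions = list(actions)  # materialize to check for emptiness
if actions:  # skip the request entirely when there is nothing to send
    bulk(conn, actions, index=index, raise_on_error=True)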

Python 3.2 Support

I see that a54d703 adds support for Python 3.2, but this isn't in the 1.0.0 release on PyPI. Will there be another PyPI release soon so projects that depend on elasticsearch-py will also be able to support Python 3.2?

Function Score Query functionality

Looks like there is no support in the Python library for the function score query introduced in the latest Elasticsearch release, as there is in Java. Is anyone already working on this feature?

AttributeError: 'str' object has no attribute 'transport'

Hi,
I'm testing the Python client and the snapshot functionality introduced in 1.0, and I've got this:

Traceback (most recent call last):
  File "./es_test.py", line 11, in <module>
    res = snap.get(repository=["backup"],snapshot=["snapshot-2014-04-11"])
  File "/var/tmp/elasticsearch/client/utils.py", line 65, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/var/tmp/elasticsearch/client/snapshot.py", line 48, in get
    _, data = self.transport.perform_request('GET', _make_path('_snapshot',
  File "/var/tmp/elasticsearch/client/utils.py", line 76, in transport
    return self.client.transport
AttributeError: 'str' object has no attribute 'transport'

while trying to execute this code:

from elasticsearch import client

snap = client.SnapshotClient('localhost')

res = snap.get(repository=["backup"],snapshot=["snapshot-2014-04-11"])
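The traceback suggests SnapshotClient was given a host string where it expects a client object. A sketch of the corrected usage, since the namespaced snapshot API is available on the client itself:

from elasticsearch import Elasticsearch

es = Elasticsearch("localhost")
res = es.snapshot.get(repository="backup", snapshot="snapshot-2014-04-11")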

python 3.2 compatibility problems

I'm unable to use this library with Python 3.2.5 (also tested with 3.2.3). I'm not sure, but I assume this is because the u'' string prefix is only valid again from Python 3.3 (PEP 414) and is a syntax error on 3.2. Can anyone confirm or deny that this is the problem? I had no trouble using it with 3.3. I would like to fix this for 3.2 if anyone has suggestions.

pip --version
pip 1.5.2 from /Users/bryan.nelson/.python/lib/python3.2/site-packages (python 3.2)

pip install elasticsearch
Downloading/unpacking elasticsearch
Downloading elasticsearch-1.0.0-py2.py3-none-any.whl (47kB): 47kB downloaded
Downloading/unpacking urllib3>=1.5,<2.0 (from elasticsearch)
Downloading urllib3-1.7.1.tar.gz (67kB): 67kB downloaded
Running setup.py (path:/Users/bryan.nelson/.python/build/urllib3/setup.py) egg_info for package urllib3

Installing collected packages: elasticsearch, urllib3
*** Error compiling '/Users/bryan.nelson/.python/build/elasticsearch/elasticsearch/client/__init__.py'...
File "/Users/bryan.nelson/.python/build/elasticsearch/elasticsearch/client/__init__.py", line 25
if isinstance(hosts, (type(''), type(u''))):
^
SyntaxError: invalid syntax

*** Error compiling '/Users/bryan.nelson/.python/build/elasticsearch/elasticsearch/client/utils.py'...
File "/Users/bryan.nelson/.python/build/elasticsearch/elasticsearch/client/utils.py", line 21
value = u','.join(value)
^
SyntaxError: invalid syntax

*** Error compiling '/Users/bryan.nelson/.python/build/elasticsearch/elasticsearch/serializer.py'...
File "/Users/bryan.nelson/.python/build/elasticsearch/elasticsearch/serializer.py", line 14
if isinstance(data, (type(''), type(u''))):
^
SyntaxError: invalid syntax

Running setup.py install for urllib3

Successfully installed elasticsearch urllib3
Cleaning up...

easy_install fails on missing README.rst

Installing via easy_install fails:

$ env/bin/easy_install elasticsearch
Searching for elasticsearch
Reading http://pypi.python.org/simple/elasticsearch/
Best match: elasticsearch 1.0.0
Downloading https://pypi.python.org/packages/source/e/elasticsearch/elasticsearch-1.0.0.tar.gz#md5=ac087d3f7a704b2c45079e7e25b56b9f
Processing elasticsearch-1.0.0.tar.gz
Running elasticsearch-1.0.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-tIpohP/elasticsearch-1.0.0/egg-dist-tmp-N5Yo_u
error: /tmp/easy_install-tIpohP/elasticsearch-1.0.0/README.rst: No such file or directory

You could argue one should simply use pip instead, of course, but since tools like buildout use easy_install under the hood, it's not always easy to switch to pip in practice.

Can you use elasticsearch.helpers.scan for an aggregation query result?

I'm not sure this is the right place to post this type of question, but...

Can you use elasticsearch.helpers.scan when the query returns no results, but instead an aggregation?

For example,

query = {"size" : 0,
            "query": {"match_all": {}},
              "aggs": {
                  "idCnt": {
                      "terms": { "field": "userId", "size": 0}
                    }
                }
            }

So I'm counting documents in the index per userId, and I only want the results of the aggregation, not the hits of the query.

If I use something like:

result = es.search(index=myIndex,body=query,doc_type=myDocType,size=1000000)

I get the results, but note I have to put in a bogus large size to get everything back.

If I use something like:

result = elasticsearch.helpers.scan(myEs, index=myIndex,query=query,doc_type=myDocType)

I don't seem to get anything.

Is it me (am I doing it wrong?), or is scan not actually set up for this? Or is it meant to work and I'm doing it right, but something else is wrong?

Thanks!
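For what it's worth, a sketch of the aggregation-only approach: scan() streams hits and never exposes aggregations, which come back on the first search response, so a single search is enough (the query above already sets size to 0 for hits):

result = es.search(index=myIndex, doc_type=myDocType, body=query)
for bucket in result["aggregations"]["idCnt"]["buckets"]:
    print("%s %d" % (bucket["key"], bucket["doc_count"]))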

no support to delete a type

There's no way of doing this:
curl -XDELETE 'http://{server}/{index_name}/{type_name}/'

Although it's not documented anywhere, it works and is a convenient way to delete all documents of a specific type and their mapping.

I'm glad to add it, if you tell me where to add it.

How to check max_file_descriptors ?

I can use :

curl -s 'http://:9200/_nodes?pretty=true&process=true'

to get max_file_descriptors, but I still can't figure out how to get this information with ClusterClient.node_stats().

Thanks
-Ryan
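A sketch, assuming the nodes-info API (rather than node_stats) mirrors the curl call above; max_file_descriptors is reported per node under the process section:

info = es.nodes.info(metric="process")
for node_id, node in info["nodes"].items():
    print("%s %s" % (node_id, node["process"].get("max_file_descriptors")))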

make bulk or streaming_bulk adaptive (to max http request size)

It would be extremely useful if streaming_bulk could predict whether the bulk payload would exceed a maximum request size parameter and chunk the request for you.

It seems we already have most of what's needed: bulk builds the payload via client._bulk_body, and looking at the resulting bytes gives a close approximation of the request size.

I'm putting this up for consideration, as I think it benefits users in general (it would be nice if chunk_size were a maximum and streaming_bulk figured out a reasonable size where possible). It is also a rather large change: streaming_bulk and bulk would need a decent amount of reworking to make this passably efficient while working with action lists that might be one-shot (i.e. a generator).

Would this type of feature be better off in a separate library?

For clarification, this is how I envision it working:

es = elasticsearch.Elasticsearch()
for x in elasticsearch.helpers.streaming_bulk(es, huge_actions, chunk_size=1000, max_http_request_size=1024*1024*50):
    # attempt a bulk_index
    # if bulk_index fails with a socket error, determine size of huge_actions
    # if huge_actions violates max_http_request_size, attempt multiple smaller bulk_indexes until a small enough chunk size is found (some error handling for chunk_size <= 1)
    print x
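Pending such a feature, a size-aware chunking generator can be layered on top of the existing helpers. A minimal sketch, assuming actions are JSON-serializable dicts (each action is serialized once here and again by the client):

import json

def chunk_by_bytes(actions, max_bytes):
    # Yield lists of actions whose serialized size stays under max_bytes.
    chunk, size = [], 0
    for action in actions:
        action_size = len(json.dumps(action)) + 1  # +1 for the trailing newline
        if chunk and size + action_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(action)
        size += action_size
    if chunk:
        yield chunk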

Dictionary Order

Has anybody found an intuitive way to enforce the dictionary order used when creating a document? (I know about OrderedDict in Python 2.7, but that seems like overkill for the Elasticsearch use case, and I don't know if it even works with this ES Python client.)

for example, indexing a document using python as:

doc = { "a":"a", "b":"b", "c":"c" }

doc's keys may actually be stored by Python in any order, and looking inside Elasticsearch you may get

 { "a":"a", "c":"c", "b":"b" }

connection failure to heroku node?

Hello elasticsearch-py folks - I just tried to use it as follows

esurl = 'http://USER:PASS@HEROKU_BONSAI_HOST' # modified
es = Elasticsearch(esurl)
dir(es)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_bulk_body', 'bulk', 'cluster', 'count', 'create', 'delete', 'delete_by_query', 'exists', 'explain', 'get', 'get_source', 'index', 'indices', 'info', 'mget', 'mlt', 'msearch', 'percolate', 'ping', 'scroll', 'search', 'suggest', 'transport', 'update']

es.count()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/envs/datacat/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 65, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/envs/datacat/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 415, in count
    params=params, body=body)
  File "/home/envs/datacat/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 217, in perform_request
    status, raw_data = connection.perform_request(method, url, params, body)
  File "/home/envs/datacat/local/lib/python2.7/site-packages/elasticsearch/connection/http.py", line 65, in perform_request
    raise ConnectionError('N/A', str(e), e)
elasticsearch.exceptions.ConnectionError: ConnectionError([Errno -2] Name or service not known) caused by: gaierror([Errno -2] Name or service not known)

But, it works fine with pyelasticsearch

es2 = ElasticSearch(esurl)

es2.status()
{u'indices': {u'contacts': {u'index': {u'primary_size': u'2.2kb', u'primary_size_in_bytes': 2298, u'size_in_bytes': 4576, u'size': u'4.4kb'}, u'docs': {u'max_doc': 1, u'num_docs': 1, u'deleted_docs': 0}, u'translog': {u'operations': 2}, u'merges': {u'total_time': u'0s', u'total_docs': 0, u'total_size_in_bytes': 0, u'total_size': u'0b', u'current_size': u'0b', u'current_size_in_bytes': 0, u'current_docs': 0, u'current': 0, u'total_time_in_millis': 0, u'total': 0}, u'shards': {u'0': [{u'index': {u'size_in_bytes': 2298, u'size': u'2.2kb'}, u'translog': {u'operations': 1, u'id': 1380138337607}, u'merges': {u'total_time': u'0s', u'total_docs': 0, u'total_size_in_bytes': 0, u'total_size': u'0b', u'current_size': u'0b', u'current_size_in_bytes': 0, u'current_docs': 0, u'current': 0, u'total_time_in_millis': 0, u'total': 0}, u'flush': {u'total_time': u'0s', u'total_time_in_millis': 0, u'total': 0}, u'refresh': {u'total_time': u'5ms', u'total_time_in_millis': 5, u'total': 2}, u'state': u'STARTED', u'routing': {u'node': u'U6mKGBltRCOuVwy9ABjz_Q', u'index': u'contacts', u'primary': True, u'shard': 0, u'state': u'STARTED', u'relocating_node': None}, u'docs': {u'max_doc': 1, u'num_docs': 1, u'deleted_docs': 0}}, {u'index': {u'size_in_bytes': 2278, u'size': u'2.2kb'}, u'translog': {u'operations': 1, u'id': 1380138337607}, u'merges': {u'total_time': u'0s', u'total_docs': 0, u'total_size_in_bytes': 0, u'total_size': u'0b', u'current_size': u'0b', u'current_size_in_bytes': 0, u'current_docs': 0, u'current': 0, u'total_time_in_millis': 0, u'total': 0}, u'flush': {u'total_time': u'0s', u'total_time_in_millis': 0, u'total': 0}, u'refresh': {u'total_time': u'4ms', u'total_time_in_millis': 4, u'total': 1}, u'state': u'STARTED', u'routing': {u'node': u'tdZ0joDcS729FXPQFplz_g', u'index': u'contacts', u'primary': False, u'shard': 0, u'state': u'STARTED', u'relocating_node': None}, u'docs': {u'max_doc': 1, u'num_docs': 1, u'deleted_docs': 0}}]}, u'refresh': {u'total_time': u'9ms', u'total_time_in_millis': 9, u'total': 3}, u'flush': {u'total_time': u'0s', u'total_time_in_millis': 0, u'total': 0}}}, u'ok': True, u'_shards': {u'successful': 2, u'failed': 0, u'total': 2}}

Maybe I just did something wrong? I didn't see any example in the docs yet of making a remote connection; I tried several permutations of the URL without success.

Thanks

Nginx 411 Content-Length missing on streaming bulk

When using streaming_bulk against an ES behind nginx, I get rejected by nginx with a 411 error: missing Content-Length.
It looks like httplib does not add the corresponding header by itself.
When I added the header by hand in streaming_bulk, nginx accepted the stream.

there's no documentation on exceptions

There are several exception classes defined in the elasticsearch.exceptions module. However, these aren't mentioned anywhere in the elasticsearch-py docs, and it's not clear when they might be raised.

I think at least two things should happen:

  1. There should be a chapter in the documentation on exceptions that lists the possible exceptions, what they mean and generally when they're used. I'm thinking something like the exceptions chapter in the Python docs: http://docs.python.org/2/library/exceptions.html
  2. Methods on classes that raise exceptions that are important should have those exceptions documented. For example, there's no mention of what happens if the index doesn't already exist if you try to delete it: https://elasticsearch-py.readthedocs.org/en/latest/api.html#elasticsearch.client.IndicesClient.delete In other elasticsearch clients, this throws a NotFound kind of error. Does it do that in elasticsearch-py?
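On the second point: deleting a missing index does raise a not-found error, and the per-request ignore parameter can suppress it. A short sketch:

from elasticsearch import Elasticsearch, NotFoundError

es = Elasticsearch()
try:
    es.indices.delete(index="no-such-index")
except NotFoundError:
    pass

# Equivalently, selected status codes can be ignored per request:
es.indices.delete(index="no-such-index", ignore=[404])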

No handlers could be found for logger "elasticsearch"

I might be missing something obvious. I am trying to call

if es.indices.exists_type(resource_index, resource_type):
    pass  # delete my resource_type

but I always receive the error: No handlers could be found for logger "elasticsearch".
What does it mean? How can I call exists_type correctly?
Thanks,
Chen
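That message is Python 2's standard logging complaint when a logger emits a record but no handler is configured; it is unrelated to exists_type itself. A sketch of the usual fix, configuring any handler before making client calls:

import logging

# Any handler works; basicConfig attaches one to the root logger.
logging.basicConfig(level=logging.WARNING)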

TransportError exceptions cannot be logged or cast to strings

TransportError expects three arguments, which are defined as status_code, error and info. However, in transport.py this exception is being raised with only one parameter, the actual error string:

https://github.com/elasticsearch/elasticsearch-py/blob/c4f7cb6182835ea9baf633afdecf4e946cdd5504/elasticsearch/transport.py#L178

...and in at least one more location.

This means that casting the exception to a string doesn't work, which is exactly what happens during logging:

import logging
import logging.config

from elasticsearch import TransportError

LOGGING = { 
        'version': 1,
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
            }   
        },  
        'root': {
            'handlers': ['console'],
        }   
}

logging.config.dictConfig(LOGGING)

try:
    raise TransportError("Enable to sniff hosts.")
except TransportError, e:
    logging.error('es error: %s', e)

Will result in:

Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 846, in emit
    msg = self.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 723, in format
    return fmt.format(record)
  File "/usr/lib/python2.7/logging/__init__.py", line 464, in format
    record.message = record.getMessage()
  File "/usr/lib/python2.7/logging/__init__.py", line 328, in getMessage
    msg = msg % self.args
  File "/home/vagrant/venv/kabuto_server/local/lib/python2.7/site-packages/elasticsearch/exceptions.py", line 35, in __str__
    return 'TransportError(%d, %r)' % (self.status_code, self.error)
  File "/home/vagrant/venv/kabuto_server/local/lib/python2.7/site-packages/elasticsearch/exceptions.py", line 27, in error
    return self.args[1]
IndexError: tuple index out of range

msearch doesn't work

trying with
es = elasticsearch.Elasticsearch()
es.msearch(body=[{"query" : {"match_all" : {}}}],index=['tvshows'],doc_type=['tvshows'])

and getting error:
TransportError(500, u'ActionRequestValidationException[Validation Failed: 1: no requests added;]')

Could you please point out what I'm doing wrong, or is this an issue?
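One likely cause, sketched below: the msearch body is a sequence of header/body pairs, so each query dict needs to be preceded by a header line (even an empty one). The index and type names follow the report above:

es.msearch(body=[
    {"index": "tvshows", "type": "tvshows"},  # header line
    {"query": {"match_all": {}}},             # matching body line
])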

kwargs issue on es.index()

es = elasticsearch.Elasticsearch()
Basically, this works:
print es.index('all', 'id', {}, 'ssd0', params=({'routing': 'yahoo'}))

While this doesn't:
print es.index('all', 'id', {}, 'ssd0', ({'routing': 'yahoo'}))

TypeError: index() got multiple values for keyword argument 'params'

While es.get() works with both.
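A sketch of a way to sidestep the collision entirely: pass everything by keyword, including routing, which the client accepts as a query parameter on index():

es.index(index="all", doc_type="id", body={}, id="ssd0", routing="yahoo")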

Null values in helpers.bulk()

I copied my text originally from this SO question [1] that I created. Could someone try to replicate this?

I use py-elasticsearch with the bulk feature, e.g.:

actions = []
for item in items:  # loop over the source documents
    action = {
        "_index": "venue",
        "_type": "venue",
        "_id": tempid,
        "_source": item,
    }
    actions.append(action)
helpers.bulk(es, actions)

However in my results I see:

"hits": [
     {
        "_index": "venue",
        "_type": "venue",
        "_id": "52e6d42fc36b4408dbe907d1",
        "_score": 4.0977435,
        "_source": {
           "city": "Athens",
           "name": "Half Note Jazz Club",
           "address": "17 Trivonianou Street",
        }
     },
     {
        "_index": "venue",
        "_type": "venue",
        "_id": "530391abc36b442b25e8a514",
        "_score": 4.023251,
        "_source": {
           "city": null,   <--- this
           "name": "life Jazz Cafe-bar",               
           "address": null  <---- and this
        }
     }
  ]

Assuming that when I "bulk-feed" ES I have not defined the city and the address and I use schema-less dictionaries directly from a MongoDB.

My full mapping is the following:

{
    "<index>": {
        "mappings": {
            "<type>": {
                "properties": {
                    "address": {"type": "string"},
                    "city": {"type": "string"},
                    "name": {"type": "string"}
                }
            }
        }
    }
}

So I have not touched the null_value option. Shouldn't the results be schema-less and not contain those null values?

[1]http://stackoverflow.com/questions/22139046/handle-undefined-fields-elasticsearch

put_mapping does not work

The put_mapping function is broken, the parameters for the document type and _mapping need to be swapped. I'll add a pull request shortly.

Don't get any results if value starts with a number and '-'

es.search(
    index=INDEX,
    doc_type=DOC_TYPE,
    body={
        "query": {
            "filtered": {
                "query": {"term": {"slug_id": slug_id}},
                "filter": {"term": {"slug": slug}}
            }
        }
    }
)

I don't get any results if the 'slug' value from the example starts with a number followed by '-'.
So I can retrieve data if the slug is 'some-text', but not if it is '5-some-text'.
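A plausible explanation (an assumption, not confirmed here): the slug field is analyzed, so '5-some-text' is stored as separate tokens and an exact term lookup never matches the whole value. A sketch of two ways around it:

# Option 1: use a match query, which analyzes the input the same way.
es.search(index=INDEX, doc_type=DOC_TYPE, body={
    "query": {"match": {"slug": slug}}
})

# Option 2: map slug as {"type": "string", "index": "not_analyzed"}
# in the mapping and keep using the term filter.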

Pooling question

Hello,

I am not sure that this is the best place to ask questions, so redirect me if needed.
I wanted more information about persistence and pooling.
I have a set of REST services. Should I create a new instance of es = Elasticsearch() in each call, or should I create one shared by all calls; is it thread safe?
If I do es = Elasticsearch(), will it use pooling? What is the default number of connections in the pool, and is there a way to change it?

Thanks
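A sketch of the commonly recommended setup, assuming the default urllib3-based transport: create one client for the whole process and share it, since it is thread safe and keeps a persistent connection pool per node; maxsize is passed through to the connection class to size that pool:

from elasticsearch import Elasticsearch

# One shared, thread-safe client with up to 25 pooled connections per node.
es = Elasticsearch(["localhost"], maxsize=25)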

put_mapping fails on elasticsearch 1.0.1

I'm getting an error when attempting to use put_mapping with Elasticsearch 1.0.1:

es.indices.put_mapping(
    index="myindex",
    doc_type="mytype",
    body={
        "mytype": {
            "properties": {
                "_timestamp": {"enabled": True},
            }
        }
    })

Error:

InvalidTypeNameException[mapping type name [_mapping] can't start with '_']

I think it's caused by change 6c29fca, which swapped the position of _mapping in the URI. Should it be switched back?

count() helper doesn't include simple query parameter

Hi,
We can only use the query DSL with the count() helper. It would be helpful to also support the simple 'q=' query parameter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-count.html).

The following patch addresses that and enables calls like:
es.count(index=INDEX, q='user:gelim')

Cheers,
-- Mathieu

diff --git a/elasticsearch/client/utils.py b/elasticsearch/client/utils.py
index fc33c43..b9b654c 100644
--- a/elasticsearch/client/utils.py
+++ b/elasticsearch/client/utils.py
@@ -49,7 +49,7 @@ def _make_path(*parts):
         quote_plus(escape(p), ',') for p in parts if p not in SKIP_IN_PATH)
 
 # parameters that apply to all methods
-GLOBAL_PARAMS = ('pretty', )
+GLOBAL_PARAMS = ('pretty', 'q', )
 
 def query_params(*es_query_params):
     """

Helpers - Scan

Hi,

why aren't the helpers available in the lib distributed on the "pip" repository? My current installation of elasticsearch-py does not seem to have them available.

If they aren't, how can I use them with my current installation?

Thanks
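Until the packaged helpers are available, a minimal stand-in for scan can be written against the raw scroll API. A sketch using the scan search type of that era:

def scan(es, index, query, scroll="5m"):
    # Start a scan-type search, then page through with the scroll API.
    resp = es.search(index=index, body=query, search_type="scan", scroll=scroll)
    scroll_id = resp["_scroll_id"]
    while True:
        resp = es.scroll(scroll_id=scroll_id, scroll=scroll)
        hits = resp["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit
        scroll_id = resp["_scroll_id"]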

Search offset argument is not honored

Apparently, the offset argument in an Elasticsearch.search call is ignored.

As in the Elasticsearch server documentation for _search, the argument used is from, which is also a Python reserved keyword.

My assumption is there's no internal argument translation, because elasticsearch-py accepts offset instead of from.

A quick and dirty diff fixes the issue:

diff --git a/elasticsearch/client/__init__.py b/elasticsearch/client/__init__.py
index 2fdafa3..397c333 100644
--- a/elasticsearch/client/__init__.py
+++ b/elasticsearch/client/__init__.py
@@ -294,7 +294,7 @@ class Elasticsearch(object):
     @query_params('_source', '_source_exclude', '_source_include',
         'analyze_wildcard', 'analyzer', 'default_operator', 'df',
         'explain', 'fields', 'ignore_indices', 'indices_boost', 'lenient',
-        'lowercase_expanded_terms', 'offset', 'preference', 'q', 'routing',
+        'lowercase_expanded_terms', 'from', 'preference', 'q', 'routing',
         'scroll', 'search_type', 'size', 'sort', 'source', 'stats',
         'suggest_field', 'suggest_mode', 'suggest_size', 'suggest_text', 'timeout',
        'version')

I will be more than happy to fix that. @HonzaKral, what would be the right strategy to fix the issue?

Thanks,
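As a workaround until the kwarg is sorted out, from can always be passed in the request body, which needs no Python identifier at all. A sketch:

es.search(index="my-index", body={
    "query": {"match_all": {}},
    "from": 20,   # the body key avoids the reserved-word problem entirely
    "size": 10,
})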

small timeout in transport.py sniff_hosts request causing failures

I've started to hit frequent urllib3 ReadTimeoutErrors in some scripts that are periodically run by a monitoring system. My attempts to pass timeout values to the connection_class via **kwargs had no effect. Eventually, I traced the problem down to a hard-coded timeout value in the sniff_hosts function in transport.py.

The call in question is on line 172 of v0.4.4 of transport.py:

# use small timeout for the sniffing request, should be a fast api call
 _, headers, node_info = c.perform_request('GET', '/_cluster/nodes', timeout=.1)

I am able to work around the problem by replacing the hard-coded number with a larger value. In fairness, my ES cluster is running on smallish VMs, so the response time could be worse than typical. However, I would be surprised if others didn't eventually get bitten by this.

use itertools.imap in streaming_bulk rather than map

As far as I can tell, the point of using streaming_bulk is to make use of lazy evaluation so that not all of the material to be indexed needs to be kept in memory or preprocessed at once.

So to that end, on this line: https://github.com/elasticsearch/elasticsearch-py/blob/1.0.0/elasticsearch/helpers.py#L90

wouldn't it be much more helpful to use itertools.imap()? If I'm understanding correctly, map() will pre-evaluate the entire iterator passed in.

JSON in Unicode Form

When (in Python 3) passing Unicode strings as the document body, the serializer will (rightly) leave them alone. However, when such strings contain special characters, an error is raised later, as the HTTP connection tries to transform the body from str to bytes using ISO-8859-1 encoding. The solution is to encode the document body to UTF-8 bytes before passing it down to the connection. I've written a patch for this.

Py3: tracing logs binary strings

Just noticed a small issue on Python 3.4 when using the tracing logger. It breaks a little due to str vs. bytes issues.

elasticsearch.trace: INFO: curl -XGET 'http://localhost:9200/myindex/mydoc/_search?pretty&size=100&from=0' -d 'b'{"query": {"match_all": {}}}''

The following diff shows how I hacked it on my box to get nice tracing, but I'm not sure this is the proper way to address it.

diff --git a/elasticsearch/connection/base.py b/elasticsearch/connection/base.py
index 16c0b97..3240e71 100644
--- a/elasticsearch/connection/base.py
+++ b/elasticsearch/connection/base.py
@@ -56,7 +56,7 @@ class Connection(object):
             path = path.replace('?', '?pretty&', 1) if '?' in path else path + '?pretty'
             if self.url_prefix:
                 path = path.replace(self.url_prefix, '', 1)
-            tracer.info("curl -X%s 'http://localhost:9200%s' -d '%s'", method, path, _pretty_json(body) if body else '')
+            tracer.info("curl -X%s 'http://localhost:9200%s' -d '%s'", method, path, _pretty_json(body.decode()) if body else '')

         if tracer.isEnabledFor(logging.DEBUG):
             tracer.debug('#[%s] (%.3fs)\n#%s', status_code, duration, _pretty_json(response).replace('\n', '\n#') if response else '')
@@ -81,5 +81,3 @@ class Connection(object):
             pass

         raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)

Helpers.bulk_index exception

When I try to do a bulk load with a bad index config, I get this error:

  File "......./src/common/ElasticUtils.py", line 37, in csv_to_index
    helpers.bulk_index(self.es, document_iterator())
  File "......./local/lib/python2.7/site-packages/elasticsearch/helpers.py", line 66, in bulk_index
    if item[act]['ok']:
KeyError: 'ok'

If I print "item"

{
    'create':{
        '_type':'listing',
        '_id':'zsuuZ0QlTvGQiikD0t8KYw',
        'error':'IllegalArgumentException[enablePositionIncrements=false is not supported anymore as of Lucene 4.4 as it can create broken token streams]',
        '_index':'test'
    }
}

It looks like there is no 'ok' key at all when an error is returned by Elasticsearch in this case.
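A defensive sketch for handling both response shapes, treating the presence of an 'error' key as failure when 'ok' is absent (an assumption based on the item printed above):

for op, info in item.items():  # e.g. op == 'create'
    ok = info.get("ok", "error" not in info)
    if not ok:
        print(info.get("error"))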

ids query seems to be failing.

It works via curl but not via the client using urllib3. The client call just returns all documents.

I hacked tracing on (I haven't managed to turn on trace logging correctly), so here's the curl request generated by the client, which returns the correct document:

curl -XGET 'http://localhost:9200/ttt/cv/_search?pretty&fields=first_name%2Clast_name%2Cprofile%2Cskills%2Cemplyment%2Ceducation' -d '{

"query": {
"ids": {
"type": "cv",
"values": [
"Ur7xuDINTI2OfUMFeX2ucw"
]
}
}
}'
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,

The call to the client that generates this request returns all the docs for this type. The query dictionary I pass to it looks like:

{
    "query": {
        "ids": {"type": cls.doc_type, "values": [doc_id]}
    }
}

Positional vs keyword arg doc improvements

When working on projects that use the Python Elasticsearch client, I find myself jumping into the code to see which arguments are required (positional) and which are optional with reasonable default values.

I think it would benefit the docs to make this clearer.
