Comments (8)
Hello @cjdoumas, and thanks for the report.
Sorting within nested objects requires special syntax, please see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/sort-search-results.html#nested-sorting. Do you get results when using nested in sort? If not, I'd appreciate a more complete example, including mappings and actual docs that reproduce the issue.
Also note that using version 8.x of the client against a 7.x cluster can cause issues. (But probably not in this specific case.) Clients are forward-compatible, though, so the opposite is OK.
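As a rough illustration of the version caveat above, here is a minimal sketch that compares the client's major version with the cluster's. The helper name client_is_newer_than_server is hypothetical, and the sketch assumes the client version is available as a tuple (for example elasticsearch.__version__) and the server version as a string (for example client.info()["version"]["number"]). The values used below are made up.

```python
def client_is_newer_than_server(client_version, server_version_number):
    """Return True when the client's major version exceeds the server's.

    client_version: tuple such as (8, 13, 1), e.g. elasticsearch.__version__
    server_version_number: string such as "7.17.18",
        e.g. client.info()["version"]["number"]
    """
    server_major = int(server_version_number.split(".")[0])
    return client_version[0] > server_major


# Hypothetical values standing in for a real client and cluster.
print(client_is_newer_than_server((8, 13, 1), "7.17.18"))  # True
print(client_is_newer_than_server((7, 17, 9), "8.13.0"))   # False
```

A True result corresponds to the combination described above as one that can cause issues.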
from elasticsearch-py.
So, I tried that syntax, and have had either empty or unsorted results. Honestly, though, I am pretty sure I am not using it correctly.
So, here are 4 records:
{'_index': 'my_index', '_type': '_doc', '_id': '55tgr45s-ffh6-f8ik-aae1-llkf345shdu5', '_common': {'eventTime': '2024-03-22 10:31:45.125', 'entryId': 'tfg4435', 'entryGroup': 505, 'entryNumber': 58, 'name': 'ecd_1234'}},
{'_index': 'my_index', '_type': '_doc', '_id': 'sdfkjyu5-554h-fft4-eg35-llkfhut758hf', '_common': {'eventTime': '2024-03-22 09:26:33.122', 'entryId': 'tfg4435', 'entryGroup': 215, 'entryNumber': 7, 'name': 'tmx1234'}},
{'_index': 'my_index', '_type': '_doc', '_id': 'xvnmjdue-44gd-dddf-57fh-lprtuericd3s', '_common': {'eventTime': '2024-03-22 09:55:16.654', 'entryId': 'tfg4435', 'entryGroup': 215, 'entryNumber': 11, 'name': 'ecd_1234'}},
{'_index': 'my_index', '_type': '_doc', '_id': 'asdfe553-0gj4-998f-fd34-vnbmc643hfd2', '_common': {'eventTime': '2024-03-22 08:59:03.258', 'entryId': 'l57fhvg', 'entryGroup': 77, 'entryNumber': 3, 'name': 'ecd_1234'}}
So, for the end result, I want to sort by 3 vars in the following order: _common.entryId, _common.entryGroup, and _common.entryNumber. However, first, I am just trying to sort on _common.entryNumber to figure out the nested syntax. The code below does not throw an error, but returns 0 results. I am certain I am misunderstanding something.
# Set the query
query_model = {
    "range": {"common.eventTime": {"gte": "now-2h", "lte": "now"}},
}
# Set the sort instructions
sort_instructions = '{" _common.entryNumber": {"order": "asc", "nested": {"path": "_common"}}}'
index_data = client.search(
    index="my_index",
    query=query_model,
    sort=sort_instructions,
    size=10
)
So, the first step is getting the nested sort to work for just one var, then getting it to work with all three in that order. (Afterwards, I'd need to incorporate the search_after argument to be able to retrieve datasets beyond 10,000 entries.)
Am I providing what you need?
from elasticsearch-py.
Note, I have also formatted the sort_instructions without the all-encompassing quotes, and that also returned 0 results:
sort_instructions = {"_common.entryNumber": {"order": "asc", "nested": {"path": "_common"}}}
from elasticsearch-py.
Any updates on this issue?
I'm also not familiar with submitting issues on GitHub, so just let me know if I need to be patient or if there is something else I should be doing.
from elasticsearch-py.
Sorry for the delay, you've been more than patient enough. Your example wasn't quite complete: I did not have the mappings and had to guess at them. I believe the issue was that the sort instruction should not be a string and that your query also needs to take into account that _common is nested. Here's a working example.
We begin by setting up the connection to the cluster. I tested this both with Elasticsearch 8.x and Elasticsearch 7.17, but only with the latest version of the client.
from elasticsearch import Elasticsearch
client = Elasticsearch("http://localhost:9200")
print(client.info()["version"])
Next we define the mappings and index your 4 sample docs:
client.options(ignore_status=[404]).indices.delete(index="my_index")
client.indices.create(
    index="my_index",
    mappings={
        "properties": {
            "_common": {
                "type": "nested",
                "properties": {
                    "eventTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS"},
                    "entryId": {"type": "keyword"},
                    "entryGroup": {"type": "integer"},
                    "entryNumber": {"type": "integer"},
                    "name": {"type": "keyword"},
                },
            }
        }
    },
)
for doc in [
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 10:31:45.125",
                "entryId": "tfg4435",
                "entryGroup": 505,
                "entryNumber": 58,
                "name": "ecd_1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 09:26:33.122",
                "entryId": "tfg4435",
                "entryGroup": 215,
                "entryNumber": 7,
                "name": "tmx1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 09:55:16.654",
                "entryId": "tfg4435",
                "entryGroup": 215,
                "entryNumber": 11,
                "name": "ecd_1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "asdfe553-0gj4-998f-fd34-vnbmc643hfd2",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 08:59:03.258",
                "entryId": "l57fhvg",
                "entryGroup": 77,
                "entryNumber": 3,
                "name": "ecd_1234",
            }
        },
    },
]:
    client.create(index=doc["_index"], id=doc["_id"], body=doc["_source"])
And finally we execute the query, modifying query_model and sort_instructions:
import json
query_model = {
    "nested": {
        "path": "_common",
        "query": {
            "range": {
                "_common.eventTime": {
                    "gte": "2024-03-22 09:00:00.000",
                    "lte": "now",
                }
            }
        },
    }
}
# Set the sort instructions
sort_instructions = {
    "_common.entryNumber": {
        "order": "asc",
        "nested": {
            "path": "_common",
        },
    }
}
index_data = client.search(
    index="my_index", query=query_model, sort=sort_instructions, size=10
)
print(json.dumps(index_data.body, indent=2))
Which prints 3 out of 4 docs, sorted by entryNumber:
{
  "took": 79,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 09:26:33.122",
            "entryId": "tfg4435",
            "entryGroup": 215,
            "entryNumber": 7,
            "name": "tmx1234"
          }
        },
        "sort": [
          7
        ]
      },
      {
        "_index": "my_index",
        "_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 09:55:16.654",
            "entryId": "tfg4435",
            "entryGroup": 215,
            "entryNumber": 11,
            "name": "ecd_1234"
          }
        },
        "sort": [
          11
        ]
      },
      {
        "_index": "my_index",
        "_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 10:31:45.125",
            "entryId": "tfg4435",
            "entryGroup": 505,
            "entryNumber": 58,
            "name": "ecd_1234"
          }
        },
        "sort": [
          58
        ]
      }
    ]
  }
}
Now you can take that code and use search_after with the point-in-time API if you need a large number of results: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html. Note that, as mentioned on that page, sorting can slow this operation down, so only use it if you actually need it.
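For the three-key ordering requested earlier in the thread (entryId, then entryGroup, then entryNumber), the single nested sort clause above can be extended to a list with one clause per field. The following is a minimal sketch that only applies when _common really is mapped as the nested type; the helper name nested_sort is hypothetical and not part of the client.

```python
def nested_sort(path, fields, order="asc"):
    """Build one sort clause per field, each scoped to the given nested path."""
    return [
        {f"{path}.{field}": {"order": order, "nested": {"path": path}}}
        for field in fields
    ]


# Earlier entries take priority; later ones only break ties.
sort_instructions = nested_sort("_common", ["entryId", "entryGroup", "entryNumber"])
print(sort_instructions)

# The list can then be passed unchanged to the search call shown above:
# client.search(index="my_index", query=query_model, sort=sort_instructions, size=10)
```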
from elasticsearch-py.
So, maybe I just don't understand the mapping or know how to find the mapping. I tried your syntax, but that still returned 0 results. However, simply removing the nested argument seemed to work.
sort_instructions = { "_common.entryNumber": { "order": "asc", } }
I thought that the period and structure of the record meant that entryNumber is nested under _common. However, does the fact that the above code worked imply that it is not nested, but something else?
from elasticsearch-py.
Oh, when you mentioned nested I assumed you meant the nested data type: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html. Sorry for the misleading response. I will try to take a look later taking this new information into account.
from elasticsearch-py.
"I thought that the period and structure of the record meant that entryNumber is nested under _common. However, does the fact that the above code worked imply that it is not nested, but something else?"
So, as mentioned in the previous comment, "nested" is a very specific Elasticsearch term. In the context of Elasticsearch, what you have is not "nested"; it's simply an object field, which is the default for JSON objects if you don't specify mappings: https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
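Since the question of how to find the mapping came up, here is a small sketch for telling the two cases apart. It assumes a mapping dictionary shaped like the response of client.indices.get_mapping(index="my_index"); the helper name field_kind and both sample dictionaries are hypothetical and only stand in for a real response.

```python
def field_kind(mapping_response, index, field):
    """Return "nested", "object", another field type, or None if the field is unmapped."""
    properties = mapping_response[index]["mappings"].get("properties", {})
    field_mapping = properties.get(field)
    if field_mapping is None:
        return None
    # Object fields usually carry only "properties" and no explicit "type".
    return field_mapping.get("type", "object")


# Hypothetical responses: the first resembles a plain object mapping,
# the second an explicit nested mapping.
sample_object = {
    "my_index": {
        "mappings": {
            "properties": {
                "_common": {"properties": {"entryNumber": {"type": "long"}}}
            }
        }
    }
}
sample_nested = {
    "my_index": {
        "mappings": {
            "properties": {
                "_common": {
                    "type": "nested",
                    "properties": {"entryNumber": {"type": "integer"}},
                }
            }
        }
    }
}

print(field_kind(sample_object, "my_index", "_common"))  # object
print(field_kind(sample_nested, "my_index", "_common"))  # nested

# Against a live cluster, something like this should work:
# field_kind(client.indices.get_mapping(index="my_index"), "my_index", "_common")
```

If the result is "object", the plain dotted-path sort without the nested argument is the one to use.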
Here's the complete working example from above with object fields.
We begin by setting up the connection to the cluster. I tested this both with Elasticsearch 8.x and Elasticsearch 7.17, but only with the latest version of elasticsearch-py, the client.
from elasticsearch import Elasticsearch
client = Elasticsearch("http://localhost:9200")
print(client.info()["version"])
Let's set up the index from scratch, define mappings to be explicit, and then index the docs you sent me:
client.options(ignore_status=[404]).indices.delete(index="my_index")
client.indices.create(
    index="my_index",
    mappings={
        "properties": {
            "_common": {
                "properties": {
                    "eventTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS"},
                    "entryId": {"type": "keyword"},
                    "entryGroup": {"type": "integer"},
                    "entryNumber": {"type": "integer"},
                    "name": {"type": "keyword"},
                },
            }
        }
    },
)
for doc in [
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 10:31:45.125",
                "entryId": "tfg4435",
                "entryGroup": 505,
                "entryNumber": 58,
                "name": "ecd_1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 09:26:33.122",
                "entryId": "tfg4435",
                "entryGroup": 215,
                "entryNumber": 7,
                "name": "tmx1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 09:55:16.654",
                "entryId": "tfg4435",
                "entryGroup": 215,
                "entryNumber": 11,
                "name": "ecd_1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "asdfe553-0gj4-998f-fd34-vnbmc643hfd2",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 08:59:03.258",
                "entryId": "l57fhvg",
                "entryGroup": 77,
                "entryNumber": 3,
                "name": "ecd_1234",
            }
        },
    },
]:
    client.create(index=doc["_index"], id=doc["_id"], body=doc["_source"])
And, finally, let's make the query:
import json
query_model = {
    "range": {
        "_common.eventTime": {
            "gte": "2024-03-22 09:00:00.000",
            "lte": "now",
        }
    }
}
# Set the sort instructions
sort_instructions = {
    "_common.entryNumber": {
        "order": "asc",
    }
}
index_data = client.search(
    index="my_index", query=query_model, sort=sort_instructions, size=10
)
print(json.dumps(index_data.body, indent=2))
Which still prints 3 out of 4 docs, sorted by entryNumber:
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 09:26:33.122",
            "entryId": "tfg4435",
            "entryGroup": 215,
            "entryNumber": 7,
            "name": "tmx1234"
          }
        },
        "sort": [
          7
        ]
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 09:55:16.654",
            "entryId": "tfg4435",
            "entryGroup": 215,
            "entryNumber": 11,
            "name": "ecd_1234"
          }
        },
        "sort": [
          11
        ]
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 10:31:45.125",
            "entryId": "tfg4435",
            "entryGroup": 505,
            "entryNumber": 58,
            "name": "ecd_1234"
          }
        },
        "sort": [
          58
        ]
      }
    ]
  }
}
Again, you can take that code and use search_after with the point-in-time API if you need a large number of results: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html. As mentioned on that page, sorting can slow this operation down: only use it if you actually need it.
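To tie the remaining pieces together, here is a hedged sketch of the three-key sort on object fields combined with search_after and a point in time. It assumes the keyword-argument style of the 8.x Python client (open_point_in_time, search with pit and search_after, close_point_in_time); the function name iter_sorted_hits, the page size, and the keep-alive value are illustrative choices, not something taken from this thread.

```python
# Sort by entryId first, then entryGroup, then entryNumber (object fields,
# so no "nested" clause is needed).
sort_instructions = [
    {"_common.entryId": {"order": "asc"}},
    {"_common.entryGroup": {"order": "asc"}},
    {"_common.entryNumber": {"order": "asc"}},
]


def iter_sorted_hits(client, index, query, sort, page_size=1000, keep_alive="1m"):
    """Yield every matching hit in sort order, fetching one page at a time."""
    pit_id = client.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    search_after = None
    try:
        while True:
            kwargs = {
                "query": query,
                "sort": sort,
                "size": page_size,
                # The index is implied by the point in time, so it is not passed.
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            if search_after is not None:
                kwargs["search_after"] = search_after
            resp = client.search(**kwargs)
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
            # Resume after the sort values of the last hit on this page.
            search_after = hits[-1]["sort"]
            # The point-in-time id can change between requests; keep the newest.
            pit_id = resp["pit_id"]
    finally:
        # Release the point in time even if the caller stops iterating early.
        client.close_point_in_time(id=pit_id)


# Possible usage against a live cluster:
# for hit in iter_sorted_hits(client, "my_index", query_model, sort_instructions):
#     print(hit["_source"]["_common"]["entryNumber"])
```

Because each page continues from the last hit's sort values, this approach is not bound by the 10,000-result window that applies to from/size paging.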
from elasticsearch-py.