Giter Site home page Giter Site logo

Comments (8)

pquentin avatar pquentin commented on July 18, 2024

Hello @cjdoumas, and thanks for the report.

Sorting within nested objects requires special syntax, please see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/sort-search-results.html#nested-sorting. Do you get results when using nested in sort? If not, I'd appreciate a more complete example, including mappings and actual docs that reproduce the issue.

Also note that using version 8.x of the client against a 7.x cluster can cause issues. (But probably not in this specific case.) Clients are forward-compatible, though, so the opposite is OK.

from elasticsearch-py.

cjdoumas avatar cjdoumas commented on July 18, 2024

So, I tried that syntax, and have had either empty or not sorted results. Honestly, though I am pretty sure I am not using it correctly.

So, here are 4 records:

{'_index': 'my_index', '_type': '_doc', '_id': '55tgr45s-ffh6-f8ik-aae1-llkf345shdu5', '_common': {'eventTime': '2024-03-22 10:31:45.125', 'entryId': 'tfg4435', 'entryGroup': 505, 'entryNumber': 58, 'name': 'ecd_1234'}},

{'_index': 'my_index', '_type': '_doc', '_id': 'sdfkjyu5-554h-fft4-eg35-llkfhut758hf', '_common': {'eventTime': '2024-03-22 09:26:33.122', 'entryId': 'tfg4435', 'entryGroup': 215, 'entryNumber': 7, 'name': 'tmx1234'}},

{'_index': 'my_index', '_type': '_doc', '_id': 'xvnmjdue-44gd-dddf-57fh-lprtuericd3s', '_common': {'eventTime': '2024-03-22 09:55:16.654', 'entryId': 'tfg4435', 'entryGroup': 215, 'entryNumber': 11, 'name': 'ecd_1234'}}, 

{'_index': 'my_index', '_type': '_doc', '_id': 'asdfe553-0gj4-998f-fd34-vnbmc643hfd2', '_common': {'eventTime': '2024-03-22 08:59:03.258', 'entryId': 'l57fhvg', 'entryGroup': 77, 'entryNumber': 3, 'name': 'ecd_1234'}}

So, for the end result, I want to sort by 3 vars in the following order: _common.entryID, _common.entryGroup, and _common.entryNumber. However, first, I am just trying to sort on _common.entryNumber to figure out the nested syntax. The code below does not throw an error, but results in 0 results. I am certain I am misunderstanding something.

# Set the query
query_model = {
    "range": {"common.eventTime": {"gte": "now-2h", "lte": "now"}},
}

# Set the sort instructions
sort_instructions = '{" _common.entryNumber": {"order": "asc", "nested": {"path": "_common"}}}'

index_data = client.search(
    index="my_index",
    query=query_model,
    sort=sort_instructions,
    size=10
)

So, first step is getting the nested to work for just one var, then to get it to work with all three in that order. (Afterwards, I'd need to then incorporate the search_after argument to be able datasets beyond 10,000 entries)

Am I providing what you need?

from elasticsearch-py.

cjdoumas avatar cjdoumas commented on July 18, 2024

Note, I have also formatted the sort_instructions without the all encompassing quotes, and that also returned 0 results, as such:

sort_instructions = {"_common.entryNumber": {"order": "asc", "nested": {"path": "_common"}}}

from elasticsearch-py.

cjdoumas avatar cjdoumas commented on July 18, 2024

Any updates on this issue?

I'm also not familiar with submitting issues in Github, so if just let me know if I need to be patient or there is something else I should be doing.

from elasticsearch-py.

pquentin avatar pquentin commented on July 18, 2024

Sorry for the delay, you've been more than patient enough. Your code wasn't really complete enough as I did not get the mapping and had to guess about them. I believe the issue was that sort instruction should not be a string and that your query also needs to take into account that _common is nested. Here's a working example.

We begin by setting up the connection to the cluster. I tested this both with Elasticsearch 8.x and Elasticsearch 7.17, but only with the latest version of the client.

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")
print(client.info()["version"])

Next we define the mappings and index your 4 sample docs:

client.options(ignore_status=[404]).indices.delete(index="my_index")
client.indices.create(
    index="my_index",
    mappings={
        "properties": {
            "_common": {
                "type": "nested",
                "properties": {
                    "eventTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS"},
                    "entryId": {"type": "keyword"},
                    "entryGroup": {"type": "integer"},
                    "entryNumber": {"type": "integer"},
                    "name": {"type": "keyword"},
                },
            }
        }
    },
)


for doc in [
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 10:31:45.125",
                "entryId": "tfg4435",
                "entryGroup": 505,
                "entryNumber": 58,
                "name": "ecd_1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 09:26:33.122",
                "entryId": "tfg4435",
                "entryGroup": 215,
                "entryNumber": 7,
                "name": "tmx1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 09:55:16.654",
                "entryId": "tfg4435",
                "entryGroup": 215,
                "entryNumber": 11,
                "name": "ecd_1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "asdfe553-0gj4-998f-fd34-vnbmc643hfd2",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 08:59:03.258",
                "entryId": "l57fhvg",
                "entryGroup": 77,
                "entryNumber": 3,
                "name": "ecd_1234",
            }
        },
    },
]:
    client.create(index=doc["_index"], id=doc["_id"], body=doc["_source"])

And finally we execute the query, modifying query_model and sort_instructions:

import json

query_model = {
    "nested": {
        "path": "_common",
        "query": {
            "range": {
                "_common.eventTime": {
                    "gte": "2024-03-22 09:00:00.000",
                    "lte": "now",
                }
            }
        },
    }
}


# Set the sort instructions
sort_instructions = {
    "_common.entryNumber": {
        "order": "asc",
        "nested": {
            "path": "_common",
        },
    }
}

index_data = client.search(
    index="my_index", query=query_model, sort=sort_instructions, size=10
)
print(json.dumps(index_data.body, indent=2))

Which prints 3 out of 4 docs, sorted by entryNumber:

{
  "took": 79,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 09:26:33.122",
            "entryId": "tfg4435",
            "entryGroup": 215,
            "entryNumber": 7,
            "name": "tmx1234"
          }
        },
        "sort": [
          7
        ]
      },
      {
        "_index": "my_index",
        "_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 09:55:16.654",
            "entryId": "tfg4435",
            "entryGroup": 215,
            "entryNumber": 11,
            "name": "ecd_1234"
          }
        },
        "sort": [
          11
        ]
      },
      {
        "_index": "my_index",
        "_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 10:31:45.125",
            "entryId": "tfg4435",
            "entryGroup": 505,
            "entryNumber": 58,
            "name": "ecd_1234"
          }
        },
        "sort": [
          58
        ]
      }
    ]
  }
}

Now you can take that code and use search_after with the point-in-time API if you need a large amount of results: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html. Note that as mentioned in that page, sorting can slow this operation down, and only use it if you actually need it.

from elasticsearch-py.

cjdoumas avatar cjdoumas commented on July 18, 2024

So, maybe I just don't understand the mapping or know how to find the mapping. I tried your syntax, however that was returning 0 values still. However, by just simply removing the nested argument, it seemed to work.

sort_instructions = { "_common.entryNumber": { "order": "asc", } }

I thought that the period and structure of the record meant that entryNumber is nested under _common. However, does the fact that the above code worked imply that is is not nested, but something else?

from elasticsearch-py.

pquentin avatar pquentin commented on July 18, 2024

Oh, when you mentioned nested I assumed you meant the nested data type: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html. Sorry for the misleading response. I will try to take a look later taking this new information into account.

from elasticsearch-py.

pquentin avatar pquentin commented on July 18, 2024

I thought that the period and structure of the record meant that entryNumber is nested under _common. However, does the fact that the above code worked imply that is is not nested, but something else?

So, as mentioned in the previous comment, "nested" is a very specific Elasticsearch term. In the context, of Elasticsearch, what you have is not "nested", it's simply an object field, which is the default for arrays if you don't specify mappings: https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/

Here's the complete working example from above with object fields.

We begin by setting up the connection to the cluster. I tested this both with Elasticsearch 8.x and Elasticsearch 7.17, but only with the latest version of elasticsearch-py, the client.

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")
print(client.info()["version"])

Let's setup the index from scratch, define mappings to be explicit, and then index the docs you sent me:

client.options(ignore_status=[404]).indices.delete(index="my_index")
client.indices.create(
    index="my_index",
    mappings={
        "properties": {
            "_common": {
                "properties": {
                    "eventTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss.SSS"},
                    "entryId": {"type": "keyword"},
                    "entryGroup": {"type": "integer"},
                    "entryNumber": {"type": "integer"},
                    "name": {"type": "keyword"},
                },
            }
        }
    },
)


for doc in [
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 10:31:45.125",
                "entryId": "tfg4435",
                "entryGroup": 505,
                "entryNumber": 58,
                "name": "ecd_1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 09:26:33.122",
                "entryId": "tfg4435",
                "entryGroup": 215,
                "entryNumber": 7,
                "name": "tmx1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 09:55:16.654",
                "entryId": "tfg4435",
                "entryGroup": 215,
                "entryNumber": 11,
                "name": "ecd_1234",
            }
        },
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "asdfe553-0gj4-998f-fd34-vnbmc643hfd2",
        "_source": {
            "_common": {
                "eventTime": "2024-03-22 08:59:03.258",
                "entryId": "l57fhvg",
                "entryGroup": 77,
                "entryNumber": 3,
                "name": "ecd_1234",
            }
        },
    },
]:
    client.create(index=doc["_index"], id=doc["_id"], body=doc["_source"])

And, finally, let's make the query:

import json

query_model = {
    "range": {
        "_common.eventTime": {
            "gte": "2024-03-22 09:00:00.000",
            "lte": "now",
        }
    }
}


# Set the sort instructions
sort_instructions = {
    "_common.entryNumber": {
        "order": "asc",
    }
}

index_data = client.search(
    index="my_index", query=query_model, sort=sort_instructions, size=10
)
print(json.dumps(index_data.body, indent=2))

Which still prints 3 out of 4 docs, sorted by entryNumber:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "sdfkjyu5-554h-fft4-eg35-llkfhut758hf",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 09:26:33.122",
            "entryId": "tfg4435",
            "entryGroup": 215,
            "entryNumber": 7,
            "name": "tmx1234"
          }
        },
        "sort": [
          7
        ]
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "xvnmjdue-44gd-dddf-57fh-lprtuericd3s",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 09:55:16.654",
            "entryId": "tfg4435",
            "entryGroup": 215,
            "entryNumber": 11,
            "name": "ecd_1234"
          }
        },
        "sort": [
          11
        ]
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "55tgr45s-ffh6-f8ik-aae1-llkf345shdu5",
        "_score": null,
        "_source": {
          "_common": {
            "eventTime": "2024-03-22 10:31:45.125",
            "entryId": "tfg4435",
            "entryGroup": 505,
            "entryNumber": 58,
            "name": "ecd_1234"
          }
        },
        "sort": [
          58
        ]
      }
    ]
  }
}

Again, you can take that code and use search_after with the point-in-time API if you need a large amount of results: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html. As mentioned in that page, sorting can slow this operation down: only use it if you actually need it.

from elasticsearch-py.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.