Comments (3)
Thanks @teuneboon, I can reproduce this! My observations:
- Please don't do that. :) HTTP clients reuse connections to amortize the cost of TCP and TLS handshakes. Creating one new client per request defeats that. I still would like to fix the memory usage if possible. The slower leak when you create a single instance is probably a totally different issue.
- The required ingredients are 1/ AsyncElasticsearch 2/ SSL. The actual request does not matter (I reproduce with `client.info()`) and using a single node is also enough to reproduce. AsyncElasticsearch uses aiohttp, which has known memory leaks.
- After about 30 seconds, the memory usage stabilizes (see figure), which suggests it's maybe not an actual leak, but a reference cycle. However, running `gc.collect()` in the loop did not help.
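For contrast with the one-client-per-request pattern, here is a minimal sketch of reusing a single client for every request. The names `reuse_single_client` and `main` are illustrative only and not part of elasticsearch-py; the URL and credentials are the local test values used in the reproduction scripts in this thread.

```python
import asyncio


async def reuse_single_client(es, iterations=1500):
    """Issue `iterations` info() calls on one already-open client."""
    completed = 0
    for _ in range(iterations):
        await es.info()
        completed += 1
        if completed % 100 == 0:
            print(completed)
    return completed


async def main():
    # Imported here so the helper above can also be exercised with a stand-in client.
    from elasticsearch import AsyncElasticsearch

    # One client for the whole run, so its connections can be reused.
    async with AsyncElasticsearch(
        "https://localhost:9200",
        basic_auth=("elastic", "changeme"),
        verify_certs=False,
    ) as es:
        await reuse_single_client(es)
```

Running `asyncio.run(main())` against a local node should pay the TCP and TLS handshake cost once rather than on every iteration.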
The next steps are using memray to understand the peak usage in more detail and trying to reproduce with aiohttp.
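Before reaching for memray, a rough first look is possible with the standard library. The sketch below uses `tracemalloc` (a swapped-in tool, not memray) and a hypothetical helper `sample_memory` that forces a `gc.collect()` before each sample. One caveat: `tracemalloc` only sees memory allocated through Python's allocators, so growth inside native code (for example the TLS layer) may not show up here.

```python
import gc
import tracemalloc


def sample_memory(step, iterations, sample_every=100):
    """Call step() repeatedly and return (iteration, current_bytes, peak_bytes) samples."""
    samples = []
    tracemalloc.start()
    try:
        for i in range(1, iterations + 1):
            step()
            if i % sample_every == 0:
                gc.collect()  # collect reference cycles before measuring
                current, peak = tracemalloc.get_traced_memory()
                samples.append((i, current, peak))
    finally:
        tracemalloc.stop()
    return samples
```

If the `current` value keeps growing from sample to sample even after `gc.collect()`, objects are still being held by live references; if it plateaus, the growth is more likely caching or allocator behaviour than an unbounded leak.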
from elasticsearch-py.
Here's my current attempt with aiohttp:
import asyncio

import aiohttp


async def leaky():
    i = 0
    while i <= 1500:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                "https://localhost:9200/",
                auth=aiohttp.BasicAuth("elastic", "changeme"),
                ssl=False,
            ) as response:
                assert response.status == 200
                await response.text()
        i += 1
        if i % 100 == 0:
            print(i)


if __name__ == "__main__":
    asyncio.run(leaky())
It inexplicably fails after 1000 connections with:
Traceback (most recent call last):
  File "/.../.virtualenvs/elasticsearch-py/lib64/python3.12/site-packages/aiohttp/connector.py", line 1173, in _create_direct_connection
    hosts = await asyncio.shield(host_resolved)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.virtualenvs/elasticsearch-py/lib64/python3.12/site-packages/aiohttp/connector.py", line 884, in _resolve_host
    addrs = await self._resolver.resolve(host, port, family=self._family)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../.virtualenvs/elasticsearch-py/lib64/python3.12/site-packages/aiohttp/resolver.py", line 33, in resolve
    infos = await self._loop.getaddrinfo(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/base_events.py", line 899, in getaddrinfo
    return await self.run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/socket.py", line 963, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 16] Device or resource busy
And it only partly reproduces the leak.
I just remembered that the upcoming release later this month will include HTTPX support, so I tried it too.
import asyncio

from elasticsearch import AsyncElasticsearch


async def leaky():
    i = 0
    while i <= 1500:
        async with AsyncElasticsearch(
            "https://localhost:9200",
            basic_auth=("elastic", "changeme"),
            verify_certs=False,
            node_class="httpxasync",
        ) as es:
            await es.info()
        i += 1
        if i % 100 == 0:
            print(i)


if __name__ == "__main__":
    asyncio.run(leaky())
There's still a leak, maybe? But it's smaller in magnitude, and memory usage reaches the same ceiling at some point.