It looks like we can conclude that there is no bug to fix here, just a PostgreSQL setup that needs to be optimised.
from minion.
Out of curiosity, could you share what kind of hardware you are running this on? This can help with comparing throughput / capacity between what you have and what the rest of us have set up. Also, which version of PostgreSQL are you on?
2-3 million jobs should not be a problem; I have quite a bit more in a $work setup. Lowering the retention time is just not recommended because of job results, if I remember correctly.
Please show us an `EXPLAIN ANALYZE` of the queries that are running too slowly; that might already give some hints about what's going on.
Now that you bring it up, I may need more resources on the backend. The PostgreSQL database may be underpowered.
PostgreSQL version: PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
Hardware: 4 CPUs / 16 GB RAM
CPU usage is spiking, so I'll talk to the DBAs about getting this bumped up and see whether that resolves the contention.
Looking at resource usage and reviewing how PostgreSQL is tuned can have a dramatic effect on performance. As @kraih said, the number of processed jobs should not be a problem in itself, so something specific is probably triggering the bottleneck you're seeing. Check swap usage, by the way: I've seen instances where incorrect kernel parameter tuning made PostgreSQL performance incredibly bad, with heavy swapping and extreme CPU usage.
It would be great if you could share any findings you or your DBA make, so that this information is available to others who run into the same thing later.
I will.
I was letting a junior dev use my database, and through some investigation I discovered that he has a badly misconfigured query, running constantly from a WebSocket application, that is using the majority of my CPU. I'm going to move him onto another PostgreSQL instance, which may resolve my issues.
Just a quick follow-up. I had the other app removed from the PostgreSQL instance and it immediately made a difference. After the reset we got up to about 500k records before I noticed slowdowns again. On investigation, I found the other dev deleting records from his old database instead of just truncating the table or dropping the database. CPU and memory usage seemed within a decent range, but the deletes were probably causing some I/O wait: during his delete, the stats query went from sub-second to ~10 seconds.
I'm considering isolating this entire database from everything but Minion in this particular environment because of the amount of work that's getting pushed through it.
I would like to revisit this.
This morning I have about 5.5 million rows in the table, approximately 14 GB of data. I've upgraded the database three times: to 8 CPU / 16 GB, then 8 CPU / 32 GB, and this morning to 16 CPU / 32 GB of memory. What I'm seeing is that as the queue grows, so does the latency of the stats query. I analyzed the query; below is what I found, along with a suggested rewrite and its analysis.
The current state of the queue:
| inactive | active | failed | finished | delayed | active_locks | active_workers | enqueued_jobs | workers | uptime (s) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 172 | 28763 | 5443465 | 0 | 0 | 23 | 6324456 | 24 | 5290.031395 |
The current call to the stats query takes about 8-10 seconds:
https://explain.dalibo.com/plan/3633h4g79884d2g6#plan
```sql
SELECT COUNT(*) FILTER (WHERE state = 'inactive' AND (expires IS NULL OR expires > NOW())) AS inactive_jobs,
  COUNT(*) FILTER (WHERE state = 'active') AS active_jobs,
  COUNT(*) FILTER (WHERE state = 'failed') AS failed_jobs,
  COUNT(*) FILTER (WHERE state = 'finished') AS finished_jobs,
  COUNT(*) FILTER (WHERE state = 'inactive' AND delayed > NOW()) AS delayed_jobs,
  (SELECT COUNT(*) FROM minion_locks WHERE expires > NOW()) AS active_locks,
  COUNT(DISTINCT worker) FILTER (WHERE state = 'active') AS active_workers,
  (SELECT CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) AS enqueued_jobs,
  (SELECT COUNT(*) FROM minion_workers) AS workers,
  EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) AS uptime
FROM minion_jobs
```
Rewriting the query with a separate scalar subquery per count, instead of `FILTER` aggregates, brings it down to about 1 second:
https://explain.dalibo.com/plan/17ac42f69a623353
```sql
select
  (select count(*) from minion_jobs mj where state = 'inactive' and (expires IS NULL OR expires > NOW())) as inactive_jobs,
  (select count(*) from minion_jobs mj where state = 'active') as active_jobs,
  (select count(*) from minion_jobs mj where state = 'failed') as failed_jobs,
  (select count(*) from minion_jobs mj where state = 'finished') as finished_jobs,
  (select count(*) from minion_jobs mj where state = 'inactive' and delayed > NOW()) as delayed_jobs,
  (select COUNT(*) from minion_locks where expires > NOW()) as active_locks,
  (select count(DISTINCT worker) from minion_jobs mj where state = 'active') as active_workers,
  (select CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) as enqueued_jobs,
  (select COUNT(*) from minion_workers) as workers,
  EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) AS uptime
```
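To check that the per-state rewrite returns the same numbers as the single-pass query, and to illustrate why it can be faster, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for PostgreSQL so it runs without a server. Assumptions: the table is trimmed to the columns the stats query reads; `NOW()` becomes `CURRENT_TIMESTAMP`; `COUNT(*) FILTER (WHERE ...)` becomes the equivalent `COUNT(CASE WHEN ... THEN 1 END)`; the `minion_locks`, `minion_workers`, sequence and uptime parts are left out because they do not read `minion_jobs`; and `minion_jobs_state_idx` is a hypothetical index on `state` (check which indexes your Minion schema actually has). The plan comparison only illustrates the idea: the single-pass form has no `WHERE` clause and must read every row, while each per-state subquery can search an index on `state`. SQLite plans prove nothing definitive about PostgreSQL; on the real database, running `EXPLAIN (ANALYZE, BUFFERS)` on both queries is the evidence that counts.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Single pass over minion_jobs, mirroring the original stats query.
# COUNT(CASE WHEN ... THEN 1 END) counts the same rows as COUNT(*) FILTER (WHERE ...).
SINGLE_SCAN = """
SELECT
  COUNT(CASE WHEN state = 'inactive' AND (expires IS NULL OR expires > CURRENT_TIMESTAMP) THEN 1 END) AS inactive_jobs,
  COUNT(CASE WHEN state = 'active' THEN 1 END) AS active_jobs,
  COUNT(CASE WHEN state = 'failed' THEN 1 END) AS failed_jobs,
  COUNT(CASE WHEN state = 'finished' THEN 1 END) AS finished_jobs,
  COUNT(CASE WHEN state = 'inactive' AND delayed > CURRENT_TIMESTAMP THEN 1 END) AS delayed_jobs,
  COUNT(DISTINCT CASE WHEN state = 'active' THEN worker END) AS active_workers
FROM minion_jobs
"""

# One scalar subquery per count, mirroring the proposed rewrite.
PER_STATE = """
SELECT
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'inactive' AND (expires IS NULL OR expires > CURRENT_TIMESTAMP)) AS inactive_jobs,
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'active') AS active_jobs,
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'failed') AS failed_jobs,
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'finished') AS finished_jobs,
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'inactive' AND delayed > CURRENT_TIMESTAMP) AS delayed_jobs,
  (SELECT COUNT(DISTINCT worker) FROM minion_jobs WHERE state = 'active') AS active_workers
"""

STATES = ("finished", "finished", "finished", "failed", "active", "inactive")
TS = "%Y-%m-%d %H:%M:%S"  # same text format as SQLite's CURRENT_TIMESTAMP (UTC)


def build_db() -> sqlite3.Connection:
    """Create an in-memory minion_jobs table filled with 600 synthetic jobs."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE minion_jobs ("
        "id INTEGER PRIMARY KEY, state TEXT NOT NULL, worker INTEGER, "
        "delayed TEXT NOT NULL, expires TEXT)"
    )
    # Hypothetical index; the real Minion schema may index state differently.
    db.execute("CREATE INDEX minion_jobs_state_idx ON minion_jobs (state)")
    now = datetime.now(timezone.utc)
    past = (now - timedelta(days=1)).strftime(TS)
    future = (now + timedelta(days=1)).strftime(TS)
    rows = []
    for i in range(600):
        delayed = future if i % 12 == 5 else past  # half of the inactive jobs are delayed
        expires = past if i % 18 == 11 else None   # about a third of the inactive jobs have expired
        rows.append((i + 1, STATES[i % 6], i % 7, delayed, expires))
    db.executemany("INSERT INTO minion_jobs VALUES (?, ?, ?, ?, ?)", rows)
    return db


def stats(db: sqlite3.Connection, sql: str) -> dict:
    """Run a stats query and return its single row as a column -> value dict."""
    cur = db.execute(sql)
    names = [col[0] for col in cur.description]
    return dict(zip(names, cur.fetchone()))


def plan(db: sqlite3.Connection, sql: str) -> str:
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql))


if __name__ == "__main__":
    db = build_db()
    print(stats(db, SINGLE_SCAN) == stats(db, PER_STATE))
    print("single scan:", plan(db, SINGLE_SCAN))  # a full table scan, no state=? lookup
    print("per state:  ", plan(db, PER_STATE))    # index searches on state=?
```

Even with an index, the `finished` count likely still has to visit most of the table's entries in either form, so how much the rewrite helps depends on how the rows are distributed across states.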