Comments (9)

kraih commented on June 28, 2024

Looks like we can conclude that there is no bug to be fixed here, just a PostgreSQL setup that needs to be optimised.

christopherraa commented on June 28, 2024

Out of curiosity, could you share what kind of hardware you are running this on? This can help with comparing throughput / capacity between what you have and what the rest of us have set up. Also, which version of PostgreSQL are you on?

kraih commented on June 28, 2024

2-3 million should not be a problem; I have quite a bit more in a $work setup. Setting the retention time lower is just not recommended because of job results, if I remember correctly.

kraih commented on June 28, 2024

Please show us an EXPLAIN ANALYZE of the queries that are running too slow, that might already give some hints for what's going on.
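
For anyone following along: such a plan is produced by prefixing the slow statement with EXPLAIN. A minimal sketch, assuming psql access to the Minion database; the statement below is only a trimmed-down piece of the stats query, for illustration. ANALYZE really executes the query, and the BUFFERS option reports shared-buffer hits versus reads per plan node, which helps tell CPU-bound from IO-bound slowness. Both the default text output and FORMAT JSON output can be pasted into a plan visualizer such as explain.dalibo.com.

```sql
-- ANALYZE executes the statement and reports actual row counts and timings;
-- BUFFERS adds shared-buffer hit/read counts for every plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FILTER (WHERE state = 'active')   AS active_jobs,
       COUNT(*) FILTER (WHERE state = 'finished') AS finished_jobs
FROM minion_jobs;
```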

rshingleton commented on June 28, 2024

Now that it has been brought up, I may need more resources on the backend. The PostgreSQL database may be underpowered.

PostgreSQL version:
PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit

Hardware: 4 CPU / 16 GB RAM

CPU usage is spiking, so I'll ask the DBAs whether they can bump up the resources and see if that resolves the contention.

christopherraa commented on June 28, 2024

Looking at resource usage and reviewing how PostgreSQL is tuned can have pretty dramatic effects on performance. As @kraih said, the number of processed jobs should not be a problem in itself, so there is probably something specific that triggers the bottleneck you're seeing. Check swap usage, btw. I've seen some instances of incorrect kernel parameter tuning resulting in incredibly bad PostgreSQL performance, with heavy swapping and extreme CPU usage.
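
For the tuning review, one possible starting point (a sketch; which parameters matter most, e.g. shared_buffers, work_mem or effective_cache_size, depends on the workload and the machine) is to list every setting that was changed from its built-in default. Swap usage and kernel parameters have to be checked at the OS level; they are not exposed through this view.

```sql
-- Settings that differ from the compiled-in defaults, and where each value comes from
-- (configuration file, command line, session, ...).
SELECT name, setting, unit, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;
```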

It would be very cool if you could share any findings you or your DBA make, so that this information is available to others who might run into the same thing later.

rshingleton commented on June 28, 2024

I will.
I was letting a junior dev use my database, and through some investigation I discovered that he has a massively misconfigured query, coming from a websocket application, constantly running in there and using the majority of my CPU resources. I'm going to move him onto another PG instance, which may resolve my issues.
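
A runaway query like that is usually easy to spot in pg_stat_activity. A sketch (the column selection is just one reasonable choice) that lists non-idle sessions, longest-running first, together with any wait event they are blocked on:

```sql
-- Non-idle backends other than this one, longest-running first.
SELECT pid,
       usename,
       application_name,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS runtime,
       left(query, 80)     AS query_snippet
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()
ORDER BY runtime DESC NULLS LAST;
```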

rshingleton commented on June 28, 2024

Just a quick follow-up: I had the other app removed from the PG instance, and it immediately made a difference. After a reset, I got up to about 500k records and noticed slowdowns again. On investigation, I found the other dev deleting records from his old database instead of just truncating the table or dropping the database. CPU and memory usage seemed within a decent range, but the deletes must have been creating some IO waits. During his delete, the stats query went from sub-second to ~10 seconds.
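
One likely reason the deletes hurt: a DELETE marks every row dead individually (writing WAL for each one), and the dead row versions remain until vacuum removes them, whereas TRUNCATE replaces the table's storage outright and leaves nothing to vacuum, at the cost of an ACCESS EXCLUSIVE lock on the table. A sketch for spotting leftover dead tuples, run inside the database that was being cleaned up:

```sql
-- Tables with the most (estimated) dead row versions, i.e. deleted or
-- updated rows that vacuum has not reclaimed yet.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```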

I'm considering isolating this entire database from everything but Minion in this particular environment because of the amount of work that's getting pushed through it.

rshingleton commented on June 28, 2024

@kraih

I would like to revisit this.

This morning I have about 5.5 million rows in the table, approximately 14G of data. I've upgraded the database three times: to 8 CPU / 16 GB, then 8 CPU / 32 GB, and this morning to 16 CPU / 32 GB of memory. What I'm seeing is that as the queue grows, so does the latency of the stats query. I analyzed the query; below is what I see, along with a suggested rewrite and its analysis.

The current state of the queue:

inactive  active  failed  finished  delayed  active_locks  active_workers  enqueued_jobs  workers  uptime
0         172     28763   5443465   0        0             23              6324456        24       5290.031395
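
Because the FILTER version of the stats query reads the whole table in one pass, its cost grows with the table, so it can help to know how the ~14G mentioned above splits up. A sketch using standard PostgreSQL size functions:

```sql
-- main_table:       heap only, roughly what a sequential scan has to read
-- table_incl_toast: heap plus TOAST storage and the free-space/visibility maps
-- indexes:          all indexes on minion_jobs
-- total:            everything above combined
SELECT pg_size_pretty(pg_relation_size('minion_jobs'))       AS main_table,
       pg_size_pretty(pg_table_size('minion_jobs'))          AS table_incl_toast,
       pg_size_pretty(pg_indexes_total_size('minion_jobs'))  AS indexes,
       pg_size_pretty(pg_total_relation_size('minion_jobs')) AS total;
```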

The current call to the stats query takes about 8-10 seconds:
https://explain.dalibo.com/plan/3633h4g79884d2g6#plan

SELECT COUNT(*) FILTER (WHERE state = 'inactive' AND (expires IS NULL OR expires > NOW())) AS inactive_jobs,
       COUNT(*) FILTER (WHERE state = 'active') AS active_jobs, COUNT(*) FILTER (WHERE state = 'failed') AS failed_jobs,
       COUNT(*) FILTER (WHERE state = 'finished') AS finished_jobs,
       COUNT(*) FILTER (WHERE state = 'inactive' AND delayed > NOW()) AS delayed_jobs,
       (SELECT COUNT(*) FROM minion_locks WHERE expires > NOW()) AS active_locks,
       COUNT(DISTINCT worker) FILTER (WHERE state = 'active') AS active_workers,
       (SELECT CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) AS enqueued_jobs,
       (SELECT COUNT(*) FROM minion_workers) AS workers,
       EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) AS uptime
     FROM minion_jobs

Rewriting the query to use a separate scalar subquery for each count, instead of the FILTER aggregates, brings it down to about 1 second:
https://explain.dalibo.com/plan/17ac42f69a623353

SELECT
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'inactive' AND (expires IS NULL OR expires > NOW())) AS inactive_jobs,
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'active') AS active_jobs,
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'failed') AS failed_jobs,
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'finished') AS finished_jobs,
  (SELECT COUNT(*) FROM minion_jobs WHERE state = 'inactive' AND delayed > NOW()) AS delayed_jobs,
  (SELECT COUNT(*) FROM minion_locks WHERE expires > NOW()) AS active_locks,
  (SELECT COUNT(DISTINCT worker) FROM minion_jobs WHERE state = 'active') AS active_workers,
  (SELECT CASE WHEN is_called THEN last_value ELSE 0 END FROM minion_jobs_id_seq) AS enqueued_jobs,
  (SELECT COUNT(*) FROM minion_workers) AS workers,
  EXTRACT(EPOCH FROM NOW() - PG_POSTMASTER_START_TIME()) AS uptime
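
A plausible (unconfirmed) explanation for the difference: the FILTER version has to visit every row of minion_jobs, while each scalar subquery only needs the rows of one state and can use an index whose leading column is state, possibly as an index-only scan. A sketch for checking that, assuming psql access: list the indexes on the table, then look at the plan of one per-state count.

```sql
-- Indexes on minion_jobs; look for one whose first column is state.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'minion_jobs';

-- Plan for a single per-state count. An "Index Only Scan" node with a low
-- "Heap Fetches" value means the heap was largely skipped; a "Seq Scan"
-- means the whole table was read anyway.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM minion_jobs WHERE state = 'finished';
```

Index-only scans rely on the visibility map, so their advantage shrinks on a table with many recently modified pages that vacuum has not processed yet.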
