Giter Site home page Giter Site logo

Comments (6)

kraih avatar kraih commented on June 16, 2024

Interesting. I'll have to double check, but it looks like a good optimization.

from minion.

kraih avatar kraih commented on June 16, 2024

Full explain analyze results for one of my larger Minion setups:

# explain analyze select id FROM minion_jobs AS j WHERE (finished <= NOW() - INTERVAL '2 days' AND NOT EXISTS ( SELECT 1 FROM minion_jobs WHERE parents @> ARRAY[j.id] AND state != 'finished' ) AND state = 'finished') OR (expires <= NOW() AND state = 'inactive');
                                                                                      QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on minion_jobs j  (cost=0.00..15210224159.25 rows=517279 width=8) (actual time=0.280..89508.393 rows=1026181 loops=1)
   Filter: (((finished <= (now() - '2 days'::interval)) AND (NOT (SubPlan 1)) AND (state = 'finished'::minion_state)) OR ((expires <= now()) AND (state = 'inactive'::minion_state)))
   Rows Removed by Filter: 10818
   SubPlan 1
     ->  Bitmap Heap Scan on minion_jobs  (cost=86.99..14629.99 rows=1 width=0) (actual time=0.086..0.086 rows=0 loops=1026182)
           Recheck Cond: (parents @> ARRAY[j.id])
           Filter: (state <> 'finished'::minion_state)
           Rows Removed by Filter: 1
           Heap Blocks: exact=873921
           ->  Bitmap Index Scan on minion_jobs_parents_idx  (cost=0.00..86.99 rows=5198 width=0) (actual time=0.084..0.084 rows=1 loops=1026182)
                 Index Cond: (parents @> ARRAY[j.id])
 Planning Time: 1.512 ms
 Execution Time: 89563.174 ms
(13 rows)
# explain analyze select j.id FROM minion_jobs AS j LEFT JOIN minion_jobs children ON children.state != 'finished' and array_length(children.parents,1)>0 and j.id = any(children.parents) WHERE (j.finished <= NOW() - INTERVAL '2 day' AND children.id IS NULL AND j.state = 'finished') OR (j.expires <= NOW() AND j.state = 'inactive');
                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=1000.00..144554.87 rows=1 width=8) (actual time=140.286..13444.584 rows=1026181 loops=1)
   Join Filter: (j.id = ANY (children.parents))
   Rows Removed by Join Filter: 8295507
   Filter: (((j.finished <= (now() - '2 days'::interval)) AND (children.id IS NULL) AND (j.state = 'finished'::minion_state)) OR ((j.expires <= now()) AND (j.state = 'inactive'::minion_state)))
   Rows Removed by Filter: 10831
   ->  Seq Scan on minion_jobs j  (cost=0.00..51092.56 rows=1039656 width=28) (actual time=0.028..418.200 rows=1037011 loops=1)
   ->  Materialize  (cost=1000.00..49276.93 rows=1 width=35) (actual time=0.000..0.001 rows=8 loops=1037011)
         ->  Gather  (cost=1000.00..49276.93 rows=1 width=35) (actual time=19.233..140.259 rows=8 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on minion_jobs children  (cost=0.00..48276.83 rows=1 width=35) (actual time=17.599..135.238 rows=3 loops=3)
                     Filter: ((state <> 'finished'::minion_state) AND (array_length(parents, 1) > 0))
                     Rows Removed by Filter: 345668
 Planning Time: 0.569 ms
 Execution Time: 13491.001 ms
(15 rows)

And for completeness, this is what the table looks like currently:

# \d minion_jobs
                                      Table "public.minion_jobs"
  Column  |           Type           | Collation | Nullable |                 Default                 
----------+--------------------------+-----------+----------+-----------------------------------------
 id       | bigint                   |           | not null | nextval('minion_jobs_id_seq'::regclass)
 args     | jsonb                    |           | not null | 
 created  | timestamp with time zone |           | not null | now()
 delayed  | timestamp with time zone |           | not null | 
 finished | timestamp with time zone |           |          | 
 priority | integer                  |           | not null | 
 result   | jsonb                    |           |          | 
 retried  | timestamp with time zone |           |          | 
 retries  | integer                  |           | not null | 0
 started  | timestamp with time zone |           |          | 
 state    | minion_state             |           | not null | 'inactive'::minion_state
 task     | text                     |           | not null | 
 worker   | bigint                   |           |          | 
 queue    | text                     |           | not null | 'default'::text
 attempts | integer                  |           | not null | 1
 parents  | bigint[]                 |           |          | '{}'::bigint[]
 notes    | jsonb                    |           | not null | '{}'::jsonb
 expires  | timestamp with time zone |           |          | 
 lax      | boolean                  |           | not null | false
Indexes:
    "minion_jobs_pkey" PRIMARY KEY, btree (id)
    "minion_jobs_expires_idx" btree (expires)
    "minion_jobs_notes_idx" gin (notes)
    "minion_jobs_parents_idx" gin (parents)
    "minion_jobs_state_priority_id_idx" btree (state, priority DESC, id)
Check constraints:
    "args" CHECK (jsonb_typeof(args) = 'array'::text)
    "minion_jobs_notes_check" CHECK (jsonb_typeof(notes) = 'object'::text)
Triggers:
    minion_jobs_notify_workers_trigger AFTER INSERT OR UPDATE OF retries ON minion_jobs FOR EACH ROW EXECUTE FUNCTION minion_jobs_notify_workers()

from minion.

ilmari avatar ilmari commented on June 16, 2024

Visualisations for the original plan and the updated one.

from minion.

ilmari avatar ilmari commented on June 16, 2024

It would be interesting to see what the result is of using UNION ALL instead of OR:

SELECT j.id
  FROM minion_jobs AS j 
  LEFT JOIN minion_jobs children
    ON children.state != 'finished' AND array_length(children.parents,1)>0 AND j.id = ANY(children.parents)
 WHERE j.state = 'finished' AND j.finished <= NOW() - INTERVAL '2 day' AND children.id IS NULL
UNION ALL
SELECT j.id FROM minion_jobs AS j
 WHERE j.state = 'inactive' AND j.expires <= NOW();

This might also benefit from indexes on (expires) WHERE state = 'inactive' and (finished) WHERE state = 'finished'.

from minion.

kraih avatar kraih commented on June 16, 2024

Results for the UNION ALL version:

# explain analyze SELECT j.id FROM minion_jobs AS j LEFT JOIN minion_jobs children ON children.state != 'finished' AND array_length(children.parents,1)>0 AND j.id = ANY(children.parents) WHERE j.state = 'finished' AND j.finished <= NOW() - INTERVAL '2 day' AND children.id IS NULL UNION ALL SELECT j.id FROM minion_jobs AS j WHERE j.state = 'inactive' AND j.expires <= NOW();
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=1000.00..136994.01 rows=2 width=8) (actual time=129.375..11762.892 rows=1026222 loops=1)
   ->  Nested Loop Left Join  (cost=1000.00..136987.78 rows=1 width=8) (actual time=129.374..11676.382 rows=1026222 loops=1)
         Join Filter: (j.id = ANY (children.parents))
         Rows Removed by Join Filter: 4104888
         Filter: (children.id IS NULL)
         ->  Seq Scan on minion_jobs j  (cost=0.00..61625.54 rows=1041423 width=8) (actual time=0.056..575.302 rows=1026222 loops=1)
               Filter: ((state = 'finished'::minion_state) AND (finished <= (now() - '2 days'::interval)))
               Rows Removed by Filter: 10833
         ->  Materialize  (cost=1000.00..49326.67 rows=1 width=35) (actual time=0.000..0.000 rows=4 loops=1026222)
               ->  Gather  (cost=1000.00..49326.66 rows=1 width=35) (actual time=22.857..129.393 rows=4 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Parallel Seq Scan on minion_jobs children  (cost=0.00..48326.56 rows=1 width=35) (actual time=16.749..124.557 rows=1 loops=3)
                           Filter: ((state <> 'finished'::minion_state) AND (array_length(parents, 1) > 0))
                           Rows Removed by Filter: 345684
   ->  Index Scan using minion_jobs_state_priority_id_idx on minion_jobs j_1  (cost=0.42..6.20 rows=1 width=8) (actual time=0.077..0.078 rows=0 loops=1)
         Index Cond: (state = 'inactive'::minion_state)
         Filter: (expires <= now())
         Rows Removed by Filter: 1
 Planning Time: 0.674 ms
 Execution Time: 11806.823 ms

from minion.

andrii-suse avatar andrii-suse commented on June 16, 2024

It would be interesting to see what the result is of using UNION ALL instead of OR

It didn't do significant difference in my case, but I am glad that you are looking at it as well.
But since query takes too long in extreme cases - it can split it into two and delete expired and completed jobs in separate queries.

from minion.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.