Giter Site home page Giter Site logo

pg_vacuum's Introduction

pg_vacuum

This python program does vacuum/analyze/freeze actions on tables and materialized views based on the user inputs provided.

(c) 2018-2022 SQLEXEC LLC
GNU V3 and MIT licenses are conveyed accordingly.
Bugs can be reported @ [email protected]

History

The first version of this program was created in 2018.
Program renamed from optimize_db.py to pg_vacuum.py (December 2020)

Overview

This program is useful to identify and vacuum tables. Most inputs are optional, and either an optional parameter is not used or a default value is used if not provided. That means you can override internal parameters by specifying them on the command line. The latest version of pg_vacuum incorporates parallel vacuuming if maintenance workers are available. Here are the parameters:
-H --host host name
-d --dbname database name
-p --dbport database port
-U --dbuser database user
-s --maxsize max table size that will be considered (default, 400GB)
-y --analyzemaxdays Analyzes older than this will be considered
-x --vacuummaxdays Vacuums older than this will be considered
-t --mindeadtups minimum dead tups before considering a vacuum
-z --minmodanalyzed minimum mod since analyzed tups before considering an analyze
-b --maxtables max number of tables to vacuum (default 9999)
-i --ignoreparts ignore partitioned tables
-a --async run async jobs ignoring thresholds
-m --schema if provided, perform actions only on this schema
-c --check Only Check stats on tables
-r --dryrun do a dry run for analysis before actually running it.
-q --inquiry show stats to validate run. Best used with dryrun. Values: "all" | "found"
-v --verbose Used primarily for debugging
-e --autotune specifies scale_factor to use for both vaccums and analyzes (range: 0.00001 to 0.2)
-f --freeze perform freeze in percentage provided if near the wraparound threshold (range: 10 - 99)
-n --nullsonly Only consider tables with no vacuum or analyze history

Requirements

  1. python 2.7 or above
  2. python packages: psycopg2
  3. Works on Linux and Windows.
  4. PostgreSQL versions 9.6 and up

Assumptions

  1. Only when a table is within 25 million of reaching the wraparound threshold is it considered a FREEZE candidate.
  2. By default, catalog tables are ignored unless specified explicitly with the --schema option.
  3. If passwords are required (authentication <> trust), then you must define credentials in the .pgpass (linux)/pgpass.conf (windows) files.
  4. The less parameters you supply, the more wide-open the vacuum operation, i.e., more tables qualify

Vacuuming Best Practices

Once you have your autovacuum tuned for your specific SQL workload, it is usually good to combine that with some nightly cronjobs to take some of the load off of the autovacuum daemon. The following 2 pg_vacuum.py jobs are an example of one setup to run on a nightly basis in which the 1st one checks for tables with no history of vacuums or analyzes and the 2nd one does them based on how long since the previous ones were done and perhaps on n_dead_tups/n_mod_since_analyze thresholds:
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 --nullsonly
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 -x2 -y 20 -t 10000

Basically, always check for tables without vacuums and analyzes. This can happen on newly created tables or after a PG service crashed, invalidating all the vacuum stats. The second job just makes sure we do vacuuming at least every 2 days if dead tuples has reached out maximum. Do analyzes for tables that haven't been analyzed in the last 20 days.

Examples

NOTE: all examples shown are in --dryrun mode since this is a best practice before actually running the command.
Vacuum all tables that don't have any vacuums/analyzes. Only do tables less that 100MB in size. Bypass partitioned tables. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres --maxsize 1000000000 --nullsonly --ignoreparts --dryrun

Same as before but only do it for the first 50 tables.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --nullsonly --ignoreparts --dryrun -b 50

Same as before but only do it for a specific schema.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --nullsonly --ignoreparts --dryrun -b 50 --schema concept

Vacuum tables that haven't been vacuumed in 10 days, 20 days for analyzes. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 -x 20 -y 20 --dryrun

Vacuum tables that have more than 1000 dead tuples and haven't been vacuumed in 20 days. Dryrun first.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 -x 20 -y 20 -t 1000 --dryrun

Run a check to get the overall status of vacuuming in the database.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --check

Vacuum Freeze tables that are at the 90% threshold for transaction wrap-around to kick in.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --pctfreeze 90 --freeze --dryrun

Vacuum/analyze tables whose dead tups/tups since analyzed > threshold factor.
pg_vacuum.py -H localhost -d testing -p 5432 -U postgres -s 1000000000 --autotune 0.1 --dryrun

pg_vacuum's People

Contributors

michaeldba avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.