Giter Site home page Giter Site logo

gridl / postgres-checkup Goto Github PK

View Code? Open in Web Editor NEW

This project forked from postgres-ai/postgres-checkup

0.0 0.0 0.0 1014 KB

Postgres Health Check and SQL Performance Analysis. :point_right: THIS IS A MIRROR OF https://gitlab.com/postgres-ai/postgres-checkup

License: Other

Shell 38.30% TSQL 1.67% Dockerfile 0.06% Makefile 0.43% Go 32.04% Smarty 27.49%

postgres-checkup's Introduction

Demo

Auto-generated demonstration based on the code in the master branch (only single node analyzed): https://gitlab.com/postgres-ai-team/postgres-checkup-tests/tree/master/master. Go to md_reports/TIMESTAMP and then open 0_Full_report.md.

Disclaimer: This Tool is Designed for DBA Experts

Each report consists of 3 sections: Observations, Conclusions, and Recommendations. As of March 2019, only Observations are filled automatically. To treat the data correctly, you need deep Postgres knowledge.

You can get Conclusions, and Recommendations from the Postgres.ai team for free, send us your .json and .md with filled Observations sections: [email protected]. Limited time only. We're a small team, so "restrictions apply".

About

Postgres Checkup (postgres-checkup) is a new-generation diagnostics tool that allows users to collect deep analysis of the health of a Postgres database. It aims to detect and describe all current and potential issues in the fields of database performance, scalability, and security, providing advices how to resolve or prevent them.

Compared to a monitoring system, postgres-checkup goes deeper into the analysis of the database system and environment. It combines numerous internal characteristics of the database with data about resources and OS, producing multiple comprehensive reports. These reports use formats which are easily readable both by humans and machines and which are extremely oriented to DBA problem-solving. Monitoring systems constantly collect telemetry, help to react to issues more quickly, and are useful for post-mortem analyses. At the same time, checkups are needed for a different purpose: detect issues at a very early stage, advising on how to prevent them. This procedure is to be done on a regular basis — weekly, monthly, or quarterly. Additionally, it is recommended to run it immediately before and after any major change in the database server.

The three key principles behind postgres-checkup:

  • Unobtrusiveness: postgres-checkup’s impact on the observing system is close to zero. It does not use any heavy queries, keeping resource usage very low, and avoiding having the “observer effect.”

  • Zero install (on observed machines): it is able to analyze any Linux machine (including virtual machines), as well as Cloud Postgres instances (such as Amazon RDs or Google Cloud SQL), not requiring any additional setup or any changes. It does, hovewer, require a privileged access (a DBA usually has it anyway).

  • Complex analysis: unlike most monitoring tools, which provide raw data, postgres-checkup combines data from various parts of the system (e.g., internal Postgres stats are combined with knowledge about system resources in autovacuum setting and behavior analysis). Also, it analyzes the master database server together with all its replicas (e.g. to build the list of unused indexes).

Reports Structure

Postgres-checkup produces two kinds of reports for every check:

  • JSON reports (*.json) — can be consumed by any program or service, or stores in some database.

  • Markdown reports (*.md) — the main format for humans, may contain lists, tables, pictures. Being of native format for GitLab and GitHub, such reports are ready to be used, for instance, in their issue trackers, simplifying workflow. Markdown reports are derived from JSON reports.

Markdown reports can be converted to different formats such as HTML or PDF.

Each report consists of three sections:

  1. "Observations": automatically collected data. This is to be consumed by an expert DBA.
  2. "Conclusions": what we conclude from the Observations—what is good, what is bad (right now, it is to be manually filled for most checks).
  3. "Recommendations": action items, what to do to fix the discovered issues. Both "Conclusions" and "Recommendations" are to be consumed by engineers who will make decisions what, how and when to optimize, and how to react to the findings.

Installation and Usage

Requirements

The supported OS of the observer machine (those from which the tool is to be executed):

  • Linux (modern RHEL/CentOS or Debian/Ubuntu; others should work as well, but are not yet tested);
  • MacOS.

The following programs must be installed on the observer machine:

  • bash
  • psql
  • coreutils
  • jq >= 1.5
  • golang >= 1.8 (no binaries are shipped at the moment)
  • awk
  • sed
  • pandoc *
  • wkhtmltopdf *

Pandoc and wkhtmltopdf optional, they need for generating HTML and PDF versions of report (see --pdf and --html).

Nothing special has to be installed on the observed machines. However, these machines must run Linux (again: modern RHEL/CentOS or Debian/Ubuntu; others should work as well, but are not yet tested).

⚠️ Only Postgres version 9.6 and higher are currently supported.

How to Install

1. Install required programs

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install postgresql
sudo apt-get install coreutils
sudo apt-get install jq
sudo apt-get install golang
# Optional
sudo apt install pandoc
sudo apt-get install wkhtmltopdf

MacOS (assuming that Homebrew is installed):

brew install postgresql
brew install coreutils
brew install jq
brew install golang
# Optional
brew install pandoc
brew install Caskroom/cask/wkhtmltopdf

2. Clone this repo

Use git clone. This is the only method of installation currently supported.

Example of Use

Let's make a report for a project named prod1: Cluster slony contains two servers - db1.vpn.local and db1.vpn.local. Postgres-checkup automatically detects which one is a master:

./checkup -h db1.vpn.local -p 5432 --username postgres --dbname postgres --project prod1 -e 1
./checkup -h db2.vpn.local -p 5432 --username postgres --dbname postgres --project prod1 -e 1

Which literally means: connect to the server with given credentials, save data into prod1 project directory, as epoch of check 1. Epoch is a numerical (integer) sign of current iteration. For example: in half a year we can switch to "epoch number 2".

-h db2.vpn.local means: try to connect to host via SSH and then use remote psql command to perform checks.
If SSH is not available the local 'psql' will be used (non-psql reports will be skipped).

For comprehensive analysis, it is recommended to run the tool on the master and all its replicas – postgres-checkup is able to combine all the information from multiple nodes to a single report.

Some reports (such as K003) require two snapshots, to calculate "deltas" of metrics. So, for better results, use the following example, executing it during peak working hours, with $DISTANCE values from 10 min to a few hours:

$DISTANCE="1800" # 30 minutes

# Assuming that db2 is the master, db3 and db4 are its replicas
for host in db2.vpn.local db3.vpn.local db4.vpn.local; do
  ./checkup \
    -h "$host" \
    -p 5432 \
    --username postgres \
    --dbname postgres \
    --project prod1 \
    -e 1 \
    --file resources/checks/K000_query_analysis.sh # the first snapshot is needed only for reports K***
done
  
sleep "$DISTANCE"

for host in db2.vpn.local db3.vpn.local db4.vpn.local; do
  ./checkup \
    -h "$host" \
    -p 5432 \
    --username postgres \
    --dbname postgres \
    --project prod1 \
    -e 1
done

As a result of execution, two directories containing .json and .md files will be created:

./artifacts/prod1/json_reports/1_2018_12_06T14_12_36_+0300/
./artifacts/prod1/md_reports/1_2018_12_06T14_12_36_+0300/

Each of generated files contains information about "what we check" and collected data for all instances of the postgres cluster prod1.

A human-readable report can be found at:

./artifacts/prod1/md_reports/1_2018_12_06T14_12_36_+0300/Full_report.md

Open it with your favorite Markdown files viewer or just upload to a service such as gist.github.com.

Docker 🐳

It's possible to use the postgres-checkup from a docker container. The container will run, execute all checks and stop itself. The check result can be found inside the artifacts folder in current directory (pwd).

Usage with docker run

First of all we need a postgres. You can use any local or remote running instance. For this example we run postgres in a separate docker container:

docker run \
    --name postgres \
    -e POSTGRES_PASSWORD=postgres \
    -d postgres

We need to know a hostname or an ip address of target database to be used with -h parameter:

PG_HOST=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' postgres)

You can use official images or build an image yourself. Run this command to build an image:

docker build -t postgres-checkup .

Then run a container with postgres-checkup. This command run the tool using Postgres connection only (without SSH):

docker run --rm \
  --name postgres-checkup \
  -e PGPASSWORD="postgres" \
  -v `pwd`/artifacts:/artifacts \
  postgres-checkup \
    ./checkup \
      -h hostname \
      -p 5432 \
      --username postgres \
      --dbname postgres \
      --project c \
      -e "$(date +'%Y%m%d')001"

In this case some checks (those requiring SSH connection) will be skipped.

If you want to have all supported checks, you have to use SSH access to the target machine with Postgres database.

If SSH connection to the Postgres server is available, it is possible to pass SSH keys to the docker container, so postgres-checkup will switch to working via remote SSH calls, generating all reports:

docker run --rm \
  --name postgres-checkup \
  -v "$(pwd)/artifacts:/artifacts" \
  -v "$(echo ~)/.ssh/id_rsa:/root/.ssh/id_rsa:ro" \
  postgres-checkup ./checkup \
    -h sshusername@hostname \
    --username my_postgres_user \
    --dbname my_postgres_database \
    --project docker_test_with_ssh \
    -e "$(date +'%Y%m%d')001"

If you try to check the local instance of postgres on your host from a container, you cannot use localhost in -h parameter. You have to use a bridge between host OS and Docker Engine. By default, host IP is 172.17.0.1 in docker0 network, but it vary depending on configuration. More information here.

Usage with docker-compose

It will run an empty postgres database and postgres-checkup application that will stop once it's done. The local folder named artifacts will contain the docker subfolder with checkup reports.

docker-compose build
docker-compose up -d

docker-compose down

Credits

Some reports are based on or inspired by useful queries created and improved by various developers, including but not limited to:

Docker support implemented by Ivan Muratov.

The Full List of Reports

А. General / Infrastructural

  • A001 System, CPU, RAM, disks, virtualization #6 , #56 , #57 , #86
  • A002 PostgreSQL versions (Simple) #68, #21, #86
  • A003 Collect pg_settings #15, #167, #86
  • A004 General cluster info #7, #58, #59, #86, #162
  • A005 Extensions #8, #60, #61, #86, #167
  • A006 Config diff #9, #62, #63, #86
  • A007 ALTER SYSTEM vs postgresql.conf #18, #86
  • A008 Disk usage and file system type #19, #20
  • A010 Data checksums, wal_log_hints #22
  • A011 Connection pooling. pgbouncer #23
  • A012 Anti-crash checks #177

B. Backups and DR

  • B001 SLO/SLA, RPO, RTO #24
  • B002 File system, mount flags #25
  • B003 Full backups / incremental #26
  • B004 WAL archiving (GB/day?) - #27
  • B005 Restore checks, monitoring, alerting #28

C. Replication and HA

  • C001 SLO/SLA #29
  • C002 Sync/async, Streaming / wal transfer; logical decoding #30
  • C003 SPOFs; “-1 datacenter”, standby with traffic #31
  • C004 Failover #32
  • C005 Switchover #33
  • C006 Delayed replica (replay of 1 day of WALs) - #34

D. Monitoring / Troubleshooting

  • D001 Logging (syslog?), log_*** #35
  • D002 Useful Linux tools #36
  • D003 List of monitoring metrics #37
  • D004 pg_stat_statements, tuning opts, pg_stat_kcache #38
  • D005 track_io_timing, …, auto_explain #39
  • D006 Recommended DBA toolsets: postgres_dba, pgCenter, pgHeroother #40
  • D007 Postgres-specific tools for troubleshooting #137

E. WAL, Checkpoints

  • E001 WAL/checkpoint settings, IO #41
  • E002 Checkpoints, bgwriter, IO #42

F. Autovacuum, Bloat

  • F001 < F003 Current autovacuum-related settings #108, #164
  • F002 < F007 Transaction wraparound check #16, #171
  • F003 < F006 Dead tuples #164
  • F004 < F001 Heap bloat estimation #87, #122
  • F005 < F002 Index bloat estimation #88
  • F006 < F004 Precise heap bloat analysis
  • F007 < F005 Precise index bloat analysis
  • F008 < F008 Resource usage (CPU, Memory, disk IO) #44

G. Performance / Connections / Memory-related Settings

  • G001 Memory-related settings #45, #190
  • G002 Connections #46
  • G003 Timeouts, locks, deadlocks (amount) #47
  • G004 Query planner (diff) #48
  • G005 I/O settings #49
  • G006 Default_statistics_target (plus per table?) #50

H. Index Analysis

  • H001 Indexes: invalid #192, #51
  • H002 Unused and redundant indexes #51, #180, #170, #168, #322
  • H003 Missing FK indexes #52, #142, #173

J. Capacity Planning

  • J001 Capacity planning - #54

K. SQL query Analysis

  • K001 Globally aggregated query metrics #158, #178, #182, #184
  • K002 Workload type ("first word" analysis) #159, #178, #179, #182, #184
  • K003 Top queries by total_time #160, #172, #174, #178, #179, #182, #184, #193

L. DB Schema Analysis

  • L001 (was: H003) Current sizes of DB objects (tables, indexes, mat. views) #163
  • L002 (was: H004) Data types being used #53
  • L003 Integer (int2, int4) out-of-range risks in PKs // calculate capacity remained; optional: predict when capacity will be fully used) https://gitlab.com/postgres-ai-team/postgres-checkup/issues/237

TODO:

  • DB schema, DDL, DB schema migrations

Ideas 💡 💡 💡 :thinking_face:

  • analyze all FKs and check if data types of referencing column and referenced one match (same thing for multi-column FKs)
  • tables w/o PKs? tables not having even unique index?

PostgreSQL:

  • ready to archive WAL files (count) (need FS access) on master
  • standby lag in seconds

OS:

  • FS settings (mount command parsing)
  • meltdown/spectre patches
  • swap settings
  • memory pressure settings
  • overcommit settings
  • NUMA enabled?
  • Huge pages?
  • Transparent huge pages?

postgres-checkup's People

Contributors

anatolystansler avatar ane4ka avatar binakot avatar dmius avatar nastradamus avatar nikolays avatar o2eg avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.