Giter Site home page Giter Site logo

pg_query_state's Introduction

Build Status codecov

pg_query_state

The pg_query_state module provides facility to know the current state of query execution on working backend. To enable this extension you have to patch the stable version of PostgreSQL, recompile it and deploy new binaries. All patch files are located in patches/ directory and tagged with suffix of PostgreSQL version number.

Overview

Each nonutility query statement (SELECT/INSERT/UPDATE/DELETE) after optimization/planning stage is translated into plan tree which is kind of imperative representation of SQL query execution algorithm. EXPLAIN ANALYZE request allows to demonstrate execution statistics gathered from each node of plan tree (full time of execution, number rows emitted to upper nodes, etc). But this statistics is collected after execution of query. This module allows to show actual statistics of query running gathered from external backend. At that, format of resulting output is almost identical to ordinal EXPLAIN ANALYZE. Thus users are able to track of query execution in progress.

In fact, this module is able to explore external backend and determine its actual state. Particularly it's helpful when backend executes a heavy query and gets stuck.

Use cases

Using this module there can help in the following things:

  • detect a long query (along with other monitoring tools)
  • overwatch the query execution

Installation

To install pg_query_state, please apply corresponding patches custom_signal_(PG_VERSION).patch and runtime_explain_(PG_VERSION).patch (or runtime_explain.patch for PG version <= 10.0) from the patches/ directory to reqired stable version of PostgreSQL and rebuild PostgreSQL.

To do this, run the following commands from the postgresql directory:

patch -p1 < path_to_pg_query_state_folder/patches/runtime_explain_(PG_VERSION).patch
patch -p1 < path_to_pg_query_state_folder/patches/custom_signal_(PG_VERSION).patch

Then execute this in the module's directory:

make install USE_PGXS=1

To execute the command correctly, make sure you have the PATH or PG_CONFIG variable set.

export PATH=path_to_your_bin_folder:$PATH
# or
export PG_CONFIG=path_to_your_bin_folder/pg_config

Add module name to the shared_preload_libraries parameter in postgresql.conf:

shared_preload_libraries = 'pg_query_state'

It is essential to restart the PostgreSQL instance. After that, execute the following query in psql:

CREATE EXTENSION pg_query_state;

Done!

Tests

Test using parallel sessions with Python 3+ compatible script:

python3 tests/pg_qs_test_runner.py [OPTION]...

prerequisite packages:

  • psycopg2 version 2.6 or later
  • PyYAML version 3.11 or later
  • progressbar2 for stress test progress reporting

options:

  • - -host --- postgres server host, default value is localhost
  • - -port --- postgres server port, default value is 5432
  • - -database --- database name, default value is postgres
  • - -user --- user name, default value is postgres
  • - -password --- user's password, default value is empty
  • - -tpc-ds-setup --- setup database to run TPC-DS benchmark
  • - -tpc-ds-run --- runs only stress tests on TPC-DS benchmark

Or run all tests in Docker using:

export LEVEL=hardcore
export USE_TPCDS=1
export PG_VERSION=12

./mk_dockerfile.sh

docker-compose build
docker-compose run tests

There are different test levels: hardcore, nightmare (runs tests under valgrind) and stress (runs tests under TPC-DS load).

Function pg_query_state

pg_query_state(
        integer     pid,
        verbose     boolean DEFAULT FALSE,
        costs       boolean DEFAULT FALSE,
        timing      boolean DEFAULT FALSE,
        buffers     boolean DEFAULT FALSE,
        triggers    boolean DEFAULT FALSE,
        format      text    DEFAULT 'text'
) returns TABLE (
    pid             integer,
    frame_number    integer,
    query_text      text,
    plan            text,
    leader_pid      integer
)

extracts the current query state from backend with specified pid. Since parallel query can spawn multiple workers and function call causes nested subqueries so that state of execution may be viewed as stack of running queries, return value of pg_query_state has type TABLE (pid integer, frame_number integer, query_text text, plan text, leader_pid integer). It represents tree structure consisting of leader process and its spawned workers identified by pid. Each worker refers to leader through leader_pid column. For leader process the value of this column is null. The state of each process is represented as stack of function calls. Each frame of that stack is specified as correspondence between frame_number starting from zero, query_text and plan with online statistics columns.

Thus, user can see the states of main query and queries generated from function calls for leader process and all workers spawned from it.

In process of execution some nodes of plan tree can take loops of full execution. Therefore statistics for each node consists of two parts: average statistics for previous loops just like in EXPLAIN ANALYZE output and statistics for current loop if node have not finished.

Optional arguments:

  • verbose --- use EXPLAIN VERBOSE for plan printing;
  • costs --- add costs for each node;
  • timing --- print timing data for each node, if collecting of timing statistics is turned off on called side resulting output will contain WARNING message timing statistics disabled;
  • buffers --- print buffers usage, if collecting of buffers statistics is turned off on called side resulting output will contain WARNING message buffers statistics disabled;
  • triggers --- include triggers statistics in result plan trees;
  • format --- EXPLAIN format to be used for plans printing, possible values: {text, xml, json, yaml}.

If callable backend is not executing any query the function prints INFO message about backend's state taken from pg_stat_activity view if it exists there.

Warning: Calling role have to be superuser or member of the role whose backend is being called. Otherwise function prints ERROR message permission denied.

Configuration settings

There are several user-accessible GUC variables designed to toggle the whole module and the collecting of specific statistic parameters while query is running:

  • pg_query_state.enable --- disable (or enable) pg_query_state completely, default value is true
  • pg_query_state.enable_timing --- collect timing data for each node, default value is false
  • pg_query_state.enable_buffers --- collect buffers usage, default value is false

This parameters is set on called side before running any queries whose states are attempted to extract. Warning: if pg_query_state.enable_timing is turned off the calling side cannot get time statistics, similarly for pg_query_state.enable_buffers parameter.

Examples

Set maximum number of parallel workers on gather node equals 2:

postgres=# set max_parallel_workers_per_gather = 2;

Assume one backend with pid = 49265 performs a simple query:

postgres=# select pg_backend_pid();
 pg_backend_pid
 ----------------
          49265
(1 row)
postgres=# select count(*) from foo join bar on foo.c1=bar.c1;

Other backend can extract intermediate state of execution that query:

postgres=# \x
postgres=# select * from pg_query_state(49265);
-[ RECORD 1 ]+-------------------------------------------------------------------------------------------------------------------------
pid          | 49265
frame_number | 0
query_text   | select count(*) from foo join bar on foo.c1=bar.c1;
plan         | Finalize Aggregate (Current loop: actual rows=0, loop number=1)                                                         +
             |   ->  Gather (Current loop: actual rows=0, loop number=1)                                                               +
             |         Workers Planned: 2                                                                                              +
             |         Workers Launched: 2                                                                                             +
             |         ->  Partial Aggregate (Current loop: actual rows=0, loop number=1)                                              +
             |               ->  Nested Loop (Current loop: actual rows=12, loop number=1)                                             +
             |                     Join Filter: (foo.c1 = bar.c1)                                                                      +
             |                     Rows Removed by Join Filter: 5673232                                                                +
             |                     ->  Parallel Seq Scan on foo (Current loop: actual rows=12, loop number=1)                          +
             |                     ->  Seq Scan on bar (actual rows=500000 loops=11) (Current loop: actual rows=173244, loop number=12)
leader_pid   | (null)
-[ RECORD 2 ]+-------------------------------------------------------------------------------------------------------------------------
pid          | 49324
frame_number | 0
query_text   | <parallel query>
plan         | Partial Aggregate (Current loop: actual rows=0, loop number=1)                                                          +
             |   ->  Nested Loop (Current loop: actual rows=10, loop number=1)                                                         +
             |         Join Filter: (foo.c1 = bar.c1)                                                                                  +
             |         Rows Removed by Join Filter: 4896779                                                                            +
             |         ->  Parallel Seq Scan on foo (Current loop: actual rows=10, loop number=1)                                      +
             |         ->  Seq Scan on bar (actual rows=500000 loops=9) (Current loop: actual rows=396789, loop number=10)
leader_pid   | 49265
-[ RECORD 3 ]+-------------------------------------------------------------------------------------------------------------------------
pid          | 49323
frame_number | 0
query_text   | <parallel query>
plan         | Partial Aggregate (Current loop: actual rows=0, loop number=1)                                                          +
             |   ->  Nested Loop (Current loop: actual rows=11, loop number=1)                                                         +
             |         Join Filter: (foo.c1 = bar.c1)                                                                                  +
             |         Rows Removed by Join Filter: 5268783                                                                            +
             |         ->  Parallel Seq Scan on foo (Current loop: actual rows=11, loop number=1)                                      +
             |         ->  Seq Scan on bar (actual rows=500000 loops=10) (Current loop: actual rows=268794, loop number=11)
leader_pid   | 49265

In example above working backend spawns two parallel workers with pids 49324 and 49323. Their leader_pid column's values clarify that these workers belong to the main backend. Seq Scan node has statistics on passed loops (average number of rows delivered to Nested Loop and number of passed loops are shown) and statistics on current loop. Other nodes has statistics only for current loop as this loop is first (loop number = 1).

Assume first backend executes some function:

postgres=# select n_join_foo_bar();

Other backend can get the follow output:

postgres=# select * from pg_query_state(49265);
-[ RECORD 1 ]+------------------------------------------------------------------------------------------------------------------
pid          | 49265
frame_number | 0
query_text   | select n_join_foo_bar();
plan         | Result (Current loop: actual rows=0, loop number=1)
leader_pid   | (null)
-[ RECORD 2 ]+------------------------------------------------------------------------------------------------------------------
pid          | 49265
frame_number | 1
query_text   | SELECT (select count(*) from foo join bar on foo.c1=bar.c1)
plan         | Result (Current loop: actual rows=0, loop number=1)                                                              +
             |   InitPlan 1 (returns $0)                                                                                        +
             |     ->  Aggregate (Current loop: actual rows=0, loop number=1)                                                   +
             |           ->  Nested Loop (Current loop: actual rows=51, loop number=1)                                          +
             |                 Join Filter: (foo.c1 = bar.c1)                                                                   +
             |                 Rows Removed by Join Filter: 51636304                                                            +
             |                 ->  Seq Scan on bar (Current loop: actual rows=52, loop number=1)                                +
             |                 ->  Materialize (actual rows=1000000 loops=51) (Current loop: actual rows=636355, loop number=52)+
             |                       ->  Seq Scan on foo (Current loop: actual rows=1000000, loop number=1)
leader_pid   | (null)

First row corresponds to function call, second - to query which is in the body of that function.

We can get result plans in different format (e.g. json):

postgres=# select * from pg_query_state(pid := 49265, format := 'json');
-[ RECORD 1 ]+------------------------------------------------------------
pid          | 49265
frame_number | 0
query_text   | select * from n_join_foo_bar();
plan         | {                                                          +
             |   "Plan": {                                                +
             |     "Node Type": "Function Scan",                          +
             |     "Parallel Aware": false,                               +
             |     "Function Name": "n_join_foo_bar",                     +
             |     "Alias": "n_join_foo_bar",                             +
             |     "Current loop": {                                      +
             |       "Actual Loop Number": 1,                             +
             |       "Actual Rows": 0                                     +
             |     }                                                      +
             |   }                                                        +
             | }
leader_pid   | (null)
-[ RECORD 2 ]+------------------------------------------------------------
pid          | 49265
frame_number | 1
query_text   | SELECT (select count(*) from foo join bar on foo.c1=bar.c1)
plan         | {                                                          +
             |   "Plan": {                                                +
             |     "Node Type": "Result",                                 +
             |     "Parallel Aware": false,                               +
             |     "Current loop": {                                      +
             |       "Actual Loop Number": 1,                             +
             |       "Actual Rows": 0                                     +
             |     },                                                     +
             |     "Plans": [                                             +
             |       {                                                    +
             |         "Node Type": "Aggregate",                          +
             |         "Strategy": "Plain",                               +
             |         "Partial Mode": "Simple",                          +
             |         "Parent Relationship": "InitPlan",                 +
             |         "Subplan Name": "InitPlan 1 (returns $0)",         +
             |         "Parallel Aware": false,                           +
             |         "Current loop": {                                  +
             |           "Actual Loop Number": 1,                         +
             |           "Actual Rows": 0                                 +
             |         },                                                 +
             |         "Plans": [                                         +
             |           {                                                +
             |             "Node Type": "Nested Loop",                    +
             |             "Parent Relationship": "Outer",                +
             |             "Parallel Aware": false,                       +
             |             "Join Type": "Inner",                          +
             |             "Current loop": {                              +
             |               "Actual Loop Number": 1,                     +
             |               "Actual Rows": 610                           +
             |             },                                             +
             |             "Join Filter": "(foo.c1 = bar.c1)",            +
             |             "Rows Removed by Join Filter": 610072944,      +
             |             "Plans": [                                     +
             |               {                                            +
             |                 "Node Type": "Seq Scan",                   +
             |                 "Parent Relationship": "Outer",            +
             |                 "Parallel Aware": false,                   +
             |                 "Relation Name": "bar",                    +
             |                 "Alias": "bar",                            +
             |                 "Current loop": {                          +
             |                   "Actual Loop Number": 1,                 +
             |                   "Actual Rows": 611                       +
             |                 }                                          +
             |               },                                           +
             |               {                                            +
             |                 "Node Type": "Materialize",                +
             |                 "Parent Relationship": "Inner",            +
             |                 "Parallel Aware": false,                   +
             |                 "Actual Rows": 1000000,                    +
             |                 "Actual Loops": 610,                       +
             |                 "Current loop": {                          +
             |                   "Actual Loop Number": 611,               +
             |                   "Actual Rows": 73554                     +
             |                 },                                         +
             |                 "Plans": [                                 +
             |                   {                                        +
             |                     "Node Type": "Seq Scan",               +
             |                     "Parent Relationship": "Outer",        +
             |                     "Parallel Aware": false,               +
             |                     "Relation Name": "foo",                +
             |                     "Alias": "foo",                        +
             |                     "Current loop": {                      +
             |                       "Actual Loop Number": 1,             +
             |                       "Actual Rows": 1000000               +
             |                     }                                      +
             |                   }                                        +
             |                 ]                                          +
             |               }                                            +
             |             ]                                              +
             |           }                                                +
             |         ]                                                  +
             |       }                                                    +
             |     ]                                                      +
             |   }                                                        +
             | }
leader_pid   | (null)

Feedback

Do not hesitate to post your issues, questions and new ideas at the issues page.

Authors

Maksim Milyutin
Alexey Kondratov [email protected] Postgres Professional Ltd., Russia

pg_query_state's People

Contributors

anna-akenteva avatar antamel avatar cherkashinsergey avatar daniel-95 avatar dlepikhova avatar funny-falcon avatar ildus avatar knizhnik avatar kovdb75 avatar maksl avatar maksm90 avatar ololobus avatar sokolcati avatar vbwagner avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pg_query_state's Issues

create a CURSOR function cause the server to die

HI:
I occur a problem about pg_query_state extension and hope receive your replay,
environment context:
1. ENVIRONMENT
postgresql10.5 + pg_query_state(PG10) ;
2.add "shared_preload_libraries = 'pg_query_state' on postgresql.conf.
Whether or not you create an extension will cause this problem.
3. exec the following SQL:

  • create table test(id int, value text);

  • insert into test values(1, 'one'),(2,'two');

  • create or replace function test_fun() returns void AS 'declare cursor_name cursor for select value from test;' LANGUAGE SQL;

  • select test_fun();
    and cause the server to die:
    # select test_fun(); TRAP: FailedAssertion("!(list->type == T_List || list->type == T_IntList || list->type == T_OidList)", File: "list.c", Line: 45) 2018-11-06 16:03:10.157 CST [31276] LOG: server process (PID 31285) was terminated by signal 6: Aborted 2018-11-06 16:03:10.157 CST [31276] DETAIL: Failed process was running: select test_fun(); 2018-11-06 16:03:10.157 CST [31276] LOG: terminating any other active server processes 2018-11-06 16:03:10.159 CST [31281] WARNING: terminating connection because of crash of another server process 2018-11-06 16:03:10.159 CST [31281] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

  1. Executing the following two SQL will not cause this problem:
  • create or replace function test_fun()
    returns void
    AS 'declare cursor_name cursor for select value from test;
    close cursor_name;' #add more
    LANGUAGE SQL;

  • create or replace function test_fun2()
    RETURNS void AS
    $$
    DECLARE
    cursor_name2 CURSOR FOR select * from test;
    c_id INT;
    c_value TEXT;
    BEGIN
    OPEN cursor_name2;
    FETCH cursor_name2 INTO c_id, c_value;
    close cursor_name2;
    END;
    $$
    LANGUAGE PLPGSQL;

                                                                                                       From  LI Guangxian
    
                                                                                                                 2018-11-07
    

Используется memcpy() для строк, не добавляется '\0'

Статический анализатор PVS Studio выдал такие ошибки:
contrib/pg_query_state/signal_handler.c 125 err V575 The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null.
contrib/pg_query_state/signal_handler.c 129 err V575 The 'memcpy' function doesn't copy the whole string. Use 'strcpy / strcpy_s' function to preserve terminal null.

Вот, собственно, тот кусок, на который он ругается:

static void
serialize_stack_frame(char **dest, stack_frame *qs_frame)
{
	SET_VARSIZE(*dest, strlen(qs_frame->query) + VARHDRSZ);
	memcpy(VARDATA(*dest), qs_frame->query, strlen(qs_frame->query));
	*dest += INTALIGN(VARSIZE(*dest));

	SET_VARSIZE(*dest, strlen(qs_frame->plan) + VARHDRSZ);
	memcpy(VARDATA(*dest), qs_frame->plan, strlen(qs_frame->plan));
	*dest += INTALIGN(VARSIZE(*dest));
}

Я, если честно, так сразу не понимаю, false positive это или нет. Зависит от контекста.

Например, если данные из dest где-то печатаются, то в конце должен быть '\0', иначе это undefined behaviour. И если под dest изначально выделен фиксированный участок памяти, а то, что мы туда копируем, может быть неограниченно длинным, то неплохо было бы проверять длину и убеждаться, что мы не выйдем за этот фиксированный участок памяти.

cann't userd in postgresql 9.6 ?

HI,
I compile this contrib into postgresql 9.6, but when create extension pg_query_state, postgres crashed.

step :

   git clone ...
   git checkout PGPRO9_6
   patch postgresql9.6 with custom_signals.patch 
   patch postgresql9.6 with runtime_explain.patch
   recompile postgresql9.6
   compile pg_query_state 
   modify postgresql.conf 
      shared_preload_libraries = 'pg_query_state,pg_pathman,pg_stat_statements'
   restart database
   psql -c create extension pg_query_state, then crashed
CentOS 6.x x64
gcc 4.8.2

best regards,
digoal

Test corner_cases fails due to timing issues.

On "slow" runs we can get something like this:

test corner_cases                 ... FAILED     8738 ms

======================
 1 of 1 tests failed. 
======================

The differences that caused some tests to fail can be viewed in the
file "/bf/ENT_master/pgsql.build/contrib/pg_query_state/isolation_output/regression.diffs".  A copy of the test summary that you see
above is saved in the file "/bf/ENT_master/pgsql.build/contrib/pg_query_state/isolation_output/regression.out".

make[1]: *** [Makefile:35: installcheck] Ошибка 1
make[1]: выход из каталога «/bf/ENT_master/pgsql.build/contrib/pg_query_state»
make: *** [Makefile:116: installcheck-pg_query_state-recurse] Ошибка 2


================= pgsql.build/contrib/pg_query_state/isolation_output/regression.diffs ===================
diff -U3 /bf/ENT_master/pgsql.build/contrib/pg_query_state/expected/corner_cases.out /bf/ENT_master/pgsql.build/contrib/pg_query_state/isolation_output/results/corner_cases.out
--- /bf/ENT_master/pgsql.build/contrib/pg_query_state/expected/corner_cases.out	2021-11-05 04:05:56.529952097 +0300
+++ /bf/ENT_master/pgsql.build/contrib/pg_query_state/isolation_output/results/corner_cases.out	2021-11-05 06:53:58.446478561 +0300
@@ -66,7 +66,8 @@
 (1 row)
 
 s2: INFO:  state of backend is idle
-step s2_pg_qs_counterpart: select pg_query_state(counterpart_pid(0));
+step s2_pg_qs_counterpart: select pg_query_state(counterpart_pid(0)); <waiting ...>
+step s2_pg_qs_counterpart: <... completed>
 pg_query_state
 --------------
 (0 rows)```

License

Is this PostgreSQL License or GNU or APACHE ? Could you please update it ?

вопрос по установке патчей custom_signal и runtime_explain

Добрый день.
не могли бы вы привести краткую инструкцию по установке патчей? или какую-нибудь ссылку где можно найти как правильно это делать?

попробовал создать сразу новый кластер с патчами:

1 ) в папке с распакованным дистрибутивом постгреса создал директорию patches в которую положил два файла с патчами

2 )
./configure
make
su
make install

но при установке расширения получаю ошибку:

postgres@student:~/pg_query_state$ make install USE_PGXS=1
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -O2 -fPIC -I. -I./ -I/usr/local/pgsql/include/server -I/usr/local/pgsql/include/internal  -D_GNU_SOURCE   -c -o pg_query_state.o pg_query_state.c
pg_query_state.c:67:44: error: ‘INVALID_PROCSIGNAL’ undeclared here (not in a function)
   67 | static ProcSignalReason UserIdPollReason = INVALID_PROCSIGNAL;
      |                                            ^~~~~~~~~~~~~~~~~~
pg_query_state.c: In function ‘_PG_init’:
pg_query_state.c:203:21: warning: implicit declaration of function ‘RegisterCustomProcSignalHandler’ [-Wimplicit-function-declaration]
  203 |  UserIdPollReason = RegisterCustomProcSignalHandler(SendCurrentUserId);
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [<builtin>: pg_query_state.o] Error 1
postgres@student:~/pg_query_state$

Several issues observed on Postgres 14.1

Thanks for the great extension.

Tried it on PG 14.1, and had the following issues:

  1. TIMING and BUFFERS are not working – tried to set it everywhere, on both sides, globally – doesn't matter, I don't get those numbers in the pg_query_state output
  2. leader_pid is not working as described in README. Maybe there is something that has changed in recent postgres versions (when leader_pid appeared in pg_stat_activity?), but for parallelized queries, pg_query_state always returns a single entry for me -- parallel workers are not displayed. I ended up using this workaround:
     select pid, leader_pid, (select plan from pg_query_state(pid))
     from pg_stat_activity
     where :PID in (pid, leader_pid)  \watch .5

Segfault with TPC-DS query

I apologize in advance for this messy bug report.

Executing the following SQL query, an instance of a TPC-DS template, occasionally causes a segmentation fault in PG10.3 when pg_query_state is called. Here is the query:

select  *
from
 (select count(*) h8_30_to_9
 from store_sales, household_demographics , time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk   
     and ss_hdemo_sk = household_demographics.hd_demo_sk 
     and ss_store_sk = s_store_sk
     and time_dim.t_hour = 8
     and time_dim.t_minute >= 30
     and ((household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2) or
          (household_demographics.hd_dep_count = 2 and household_demographics.hd_vehicle_count<=2+2) or
          (household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2)) 
     and store.s_store_name = 'ese') s1,
 (select count(*) h9_to_9_30 
 from store_sales, household_demographics , time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk
     and ss_hdemo_sk = household_demographics.hd_demo_sk
     and ss_store_sk = s_store_sk 
     and time_dim.t_hour = 9 
     and time_dim.t_minute < 30
     and ((household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2) or
          (household_demographics.hd_dep_count = 2 and household_demographics.hd_vehicle_count<=2+2) or
          (household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2))
     and store.s_store_name = 'ese') s2,
 (select count(*) h9_30_to_10 
 from store_sales, household_demographics , time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk
     and ss_hdemo_sk = household_demographics.hd_demo_sk
     and ss_store_sk = s_store_sk
     and time_dim.t_hour = 9
     and time_dim.t_minute >= 30
     and ((household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2) or
          (household_demographics.hd_dep_count = 2 and household_demographics.hd_vehicle_count<=2+2) or
          (household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2))
     and store.s_store_name = 'ese') s3,
 (select count(*) h10_to_10_30
 from store_sales, household_demographics , time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk
     and ss_hdemo_sk = household_demographics.hd_demo_sk
     and ss_store_sk = s_store_sk
     and time_dim.t_hour = 10 
     and time_dim.t_minute < 30
     and ((household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2) or
          (household_demographics.hd_dep_count = 2 and household_demographics.hd_vehicle_count<=2+2) or
          (household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2))
     and store.s_store_name = 'ese') s4,
 (select count(*) h10_30_to_11
 from store_sales, household_demographics , time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk
     and ss_hdemo_sk = household_demographics.hd_demo_sk
     and ss_store_sk = s_store_sk
     and time_dim.t_hour = 10 
     and time_dim.t_minute >= 30
     and ((household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2) or
          (household_demographics.hd_dep_count = 2 and household_demographics.hd_vehicle_count<=2+2) or
          (household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2))
     and store.s_store_name = 'ese') s5,
 (select count(*) h11_to_11_30
 from store_sales, household_demographics , time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk
     and ss_hdemo_sk = household_demographics.hd_demo_sk
     and ss_store_sk = s_store_sk 
     and time_dim.t_hour = 11
     and time_dim.t_minute < 30
     and ((household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2) or
          (household_demographics.hd_dep_count = 2 and household_demographics.hd_vehicle_count<=2+2) or
          (household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2))
     and store.s_store_name = 'ese') s6,
 (select count(*) h11_30_to_12
 from store_sales, household_demographics , time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk
     and ss_hdemo_sk = household_demographics.hd_demo_sk
     and ss_store_sk = s_store_sk
     and time_dim.t_hour = 11
     and time_dim.t_minute >= 30
     and ((household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2) or
          (household_demographics.hd_dep_count = 2 and household_demographics.hd_vehicle_count<=2+2) or
          (household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2))
     and store.s_store_name = 'ese') s7,
 (select count(*) h12_to_12_30
 from store_sales, household_demographics , time_dim, store
 where ss_sold_time_sk = time_dim.t_time_sk
     and ss_hdemo_sk = household_demographics.hd_demo_sk
     and ss_store_sk = s_store_sk
     and time_dim.t_hour = 12
     and time_dim.t_minute < 30
     and ((household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2) or
          (household_demographics.hd_dep_count = 2 and household_demographics.hd_vehicle_count<=2+2) or
          (household_demographics.hd_dep_count = 4 and household_demographics.hd_vehicle_count<=4+2))
     and store.s_store_name = 'ese') s8
;

I apologize for the length... I am not sure what elements of the query causes the failure.

Here are the steps to reproduce:

  1. Download PostgresQL 10.3
  2. Download the PG10 branch
  3. Patch and build:
cd postgresql-10.3
patch -p1 < /home/postgres/pg_query_state/custom_signals.patch
patch -p1 < /home/postgres/pg_query_state/runtime_explain.patch
./configure --prefix=/home/postgres/local/
make -j 2

make install
 
export PATH=$PATH:/home/postgres/local/bin
 
cd /home/postgres/pg_query_state
make install USE_PGXS=1
  1. Enable the extension:
# wherever your postgresql.conf is...
echo "shared_preload_libraries = 'pg_query_state'">> /media/data/pg/postgresql.conf
  1. Start the DB and load TPC-DS data (scale factor of 1GB will work)
  2. Start a PSQL session, get the backend PID. Use this session to execute the above query over and over again
  3. In another PSQL section, run select * from pg_query_state($PID); while the query is running. After 1-5 tries, Postgres has a segfault.

Is there a way I can build PG to provide more useful info about what is going on? Are there log files I can provide?

what's the use of n_peers

Hello, pg_query_state is really a cool tool and I'm recently learning through the code. I noticed the process needs to check if n_peers is zero or wait a timeout before doing the actual work. It seems n_peers is used to protect something. Could anyone possibly explain what's the use of n_peers? Thanks for your time.

postgresql crash when call ```pg_query_state($pid,true,true,true,true,true)``` very freq.

HI

when i set

pg_query_state.enable = on
pg_query_state.enable_buffers = off
pg_query_state.enable_timing = off

and test this case :

session a

do language plpgsql                                          
$$
declare
begin
for i in 1..100 loop
  perform 1 from pg_class;
  perform pg_sleep(10);
  perform t1.relname from pg_class t1 join pg_class t2 on (t1.oid=t2.oid) where t1.oid::int8 <100000;
end loop;
end;
$$;

session b
repeat run this query in psql:
select * from pg_query_state(22617,true,true,true,true,true);
\watch 0.000001

then database crashed.

when i set

pg_query_state.enable = on
pg_query_state.enable_buffers = on
pg_query_state.enable_timing = on

and test the same case, there is also has problem when call pg_query_state() function very freq.

                                                            Thu 08 Dec 2016 11:08:38 PM CST (every 1e-06s)

  pid  | frame_number |     query_text      |                                                    plan                                                     | leader_pid 
-------+--------------+---------------------+-------------------------------------------------------------------------------------------------------------+------------
 98079 |            0 | SELECT pg_sleep(10) | Result  (cost=0.00..0.01 rows=1 width=4) (Current loop: running time=3262.896 actual rows=0, loop number=1)+|           
       |              |                     |   Output: pg_sleep('10'::double precision)                                                                  | 
(1 row)

                                                            Thu 08 Dec 2016 11:08:38 PM CST (every 1e-06s)

  pid  | frame_number |     query_text      |                                                    plan                                                     | leader_pid 
-------+--------------+---------------------+-------------------------------------------------------------------------------------------------------------+------------
 98079 |            0 | SELECT pg_sleep(10) | Result  (cost=0.00..0.01 rows=1 width=4) (Current loop: running time=3263.361 actual rows=0, loop number=1)+|           
       |              |                     |   Output: pg_sleep('10'::double precision)                                                                  | 
(1 row)

......
then will crash

best regards,
digoal

Вопрос по патчам

Добрый день,
в описании инсталяции сказано:

To install pg_query_state, please apply patches custom_signal.patch, executor_hooks.patch and runtime_explain.patch to the latest stable version of PostgreSQL and rebuild PostgreSQL.

скажите пожалуйста, указанные патчи в ванильной версии постгреса 10.2 уже есть или их как-то надо туда применить?
Можно ли безпроблемно использовать текущую версию кода с ванильным постгрессом 10.2 ?

Tests stability

There are two types of errors occurring from time to time when running docker tests.

  1. Assertion failure with basic python tests:
test when two backends compete with each other to extract state from third running backend...
ok!
test statistics under calling function...
[(9429, 0, 'select * from n_join_foo_bar()', 'Function Scan on n_join_foo_bar (Current loop: actual rows=0, loop number=1)', None)]
Traceback (most recent call last):
  File "tests/pg_qs_test_runner.py", line 129, in <module>
    main(args)
  File "tests/pg_qs_test_runner.py", line 112, in main
    test(conn_params)
  File "/pg/testdir/tests/test_cases.py", line 122, in test_nested_call
    assert len(qs) == 2 \
AssertionError
  1. And division by zero with TPC-DS:
Database is setup successfully
Starting stress test
Preparing TPC-DS queries...
Starting TPC-DS queries...
 91% (94 of 103) |##############################################################################################################################################################                | Elapsed Time: 0:01:06 ETA:   0:00:06Traceback (most recent call last):
  File "tests/pg_qs_test_runner.py", line 129, in <module>
    main(args)
  File "tests/pg_qs_test_runner.py", line 98, in main
    tpcds.run_tpcds(conn_params)
  File "/pg/testdir/tests/tpcds.py", line 107, in run_tpcds
    common.wait(acon)
  File "/pg/testdir/tests/common.py", line 17, in wait
    state = conn.poll()
psycopg2.errors.DivisionByZero: division by zero

100% (103 of 103) |#############################################################################################################################################################################| Elapsed Time: 0:01:07 Time:  0:01:07

First one looks like a real issue to fix or at least we have to correct our expectations.

Second one looks like a problem with test infrastructure and does not actually lead to a test failure.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.