tritondatacenter / pg_prefaulter

Faults pages into PostgreSQL shared_buffers or filesystem caches in advance of WAL apply

License: Apache License 2.0

Makefile 6.78% Go 92.01% Shell 0.38% DTrace 0.83%

pg_prefaulter's Issues

Stream WAL information to a sidecar pg_prefaulter process via a bgworker...

It would be clever to hook into PostgreSQL as a background worker. There are two candidate paths for this enhancement:

  1. stream the WAL information over an RPC socket to a sidecar pg_prefaulter process
  2. outright convert pg_prefaulter into a background worker

The former has benefits because you could stop and upgrade the sidecar process without restarting PostgreSQL. The latter is more in line with the traditional monolithic C mindset. Given the low volume of RPCs, the former is probably preferable, but additional investigation is required before making a determination.

Inspiration taken from: https://github.com/prest/bgworker
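
For the first path, a minimal sketch of what the sidecar's receiving end might look like, assuming a hypothetical line-oriented protocol in which the background worker writes one WAL position per line over a Unix socket (the socket path and message format are illustrative, not an existing interface):

package main

import (
	"bufio"
	"log"
	"net"
)

func main() {
	// Illustrative socket path; not an existing pg_prefaulter interface.
	l, err := net.Listen("unix", "/tmp/.s.pg_prefaulter")
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()

	for {
		conn, err := l.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			scanner := bufio.NewScanner(c)
			for scanner.Scan() {
				// e.g. "0/3000148": hand the received LSN to the prefault workers.
				log.Printf("received WAL position: %s", scanner.Text())
			}
		}(conn)
	}
}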

panic: bad, open vs close count not the same after purge

During a recent incident, the prefaulter panicked on more than 200 PostgreSQL instances, with the following set of log messages:

19:01:00.598Z ERROR pg_prefaulter: unable to find WAL files (next step=retrying)
    error: unable to query PostgreSQL checkpoint information: write unix ->/tmp/.s.PGSQL.5432: write: broken pipe: (retriable: true, purge cache: true
19:01:01.599Z LVL70 pg_prefaulter: bad, open vs close count not the same after purge (close-count=769548809, open-count=981315695)
panic: bad, open vs close count not the same after purge

Core files for the failures are tagged with this ticket in thoth.

Pre-fault in the entire WAL file for the pg_last_xlog_receive_location() segment...

pg_prefaulter should pre-fault in the entire WAL segment referenced by pg_last_xlog_receive_location(), as well as any additional segments called for by the read-ahead setting.

It is conceivable that the WAL receiver could be blocked on a read-modify-write (RMW) cycle when the underlying storage uses a block size different from PostgreSQL's 8K pages. Warming the filesystem cache in advance of the receiver writing a WAL record would prevent the synchronous read(2) call from blocking replication when synchronous_commit=remote_write is in use. This problem will likely only manifest after PostgreSQL has begun recycling its WAL files.
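
A minimal sketch of the warming step, assuming all that is needed is to touch each page of the 16 MB segment so it is resident in the filesystem cache (the package and function names are illustrative):

package agent

import (
	"io"
	"os"
)

// prefaultSegment reads a WAL segment sequentially so that its pages are
// resident in the filesystem cache before the WAL receiver needs them.
func prefaultSegment(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, 8192) // mirror PostgreSQL's 8K page size
	for {
		if _, err := f.Read(buf); err != nil {
			if err == io.EOF {
				return nil
			}
			return err
		}
	}
}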

close all files when PostgreSQL is not running

When PostgreSQL is not running, the prefaulter appears to retain a large number of open files within the PostgreSQL data directory. So that we can cleanly unmount and remount the data file system in Manatee, the prefaulter needs to close these descriptors promptly, and avoid opening any new ones, until it is determined that PostgreSQL is running again. It would also be good to make sure the current working directory of the prefaulter process never resides within the PostgreSQL data directory.
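
A rough sketch of the shape this could take, using hypothetical names (fileHandleCache and onPostgreSQLDown are illustrative, not the prefaulter's actual types): when the liveness check fails, purge every cached descriptor, stop opening new ones, and move the working directory out of the data directory.

package agent

import (
	"log"
	"os"
)

// fileHandleCache is a stand-in for the prefaulter's descriptor cache.
type fileHandleCache struct{ files []*os.File }

// Purge closes and forgets every cached descriptor.
func (c *fileHandleCache) Purge() {
	for _, f := range c.files {
		f.Close()
	}
	c.files = nil
}

// onPostgreSQLDown is what the agent could do when its liveness check fails.
func onPostgreSQLDown(cache *fileHandleCache, prefaultingEnabled *bool) {
	*prefaultingEnabled = false // stop opening new files until PG is back
	cache.Purge()
	// Never let our working directory pin the data filesystem.
	if err := os.Chdir("/"); err != nil {
		log.Printf("unable to chdir out of the data directory: %v", err)
	}
}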

Any updates?

Hey guys,

I've been wondering why this project seems to have been left behind. I find it pretty interesting, but I'm curious about the reason for the inactivity. Is it just a loss of interest from the contributors, or has it somehow proven to be harmful to use?

I would be glad to hear from you.
Thanks.

Is this project still being maintained?

I am considering using this tool to alleviate lag on slaves due to heavy primary activity, but I see many issues created 3 years ago with no responses and the last update was about a year and a half ago.

Can I get a response, please, to let me know about the future of this project?

Add a SIGINFO handler...

Change the SIGINFO handler to dump stats instead of showing the stack. The current behavior dumps the stack of the process, which is useful, but not necessarily what is desired.
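
A minimal sketch of the desired behavior, assuming a platform where golang.org/x/sys/unix defines SIGINFO (the BSDs and illumos do) and using a hypothetical dumpStats function as a stand-in for the real cache/counter output:

package main

import (
	"log"
	"os"
	"os/signal"

	"golang.org/x/sys/unix" // SIGINFO is defined on the BSDs and illumos
)

// dumpStats is a stand-in for printing cache hit rates and other counters.
func dumpStats() {
	log.Print("filehandle/walcache/iocache stats would be dumped here")
}

func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, unix.SIGINFO)
	go func() {
		for range sigCh {
			dumpStats() // dump stats instead of the default stack dump
		}
	}()
	select {} // the real agent would do its work here
}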

logs should be in the bunyan format

Though at least some log messages are presently formatted as linefeed-delimited JSON records, the content of the messages is not quite right. The log records must contain at least the Bunyan core fields in order for the bunyan tools to parse them correctly.
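
For reference, a record the bunyan tools will accept needs at least the core fields v, level, name, hostname, pid, time, and msg; for example (values illustrative):

{"v":0,"level":30,"name":"pg_prefaulter","hostname":"example-host","pid":12345,"time":"2017-09-13T03:20:11.207Z","msg":"Starting pg_prefaulter"}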

Add status command

Add a generic status command that shows:

  • how far behind a given follower is when connected to the primary
  • how far behind a given follower is when connected to the follower
  • various cache stats

prefaulter must start up even if database is not available

It would appear that the prefaulter does not start up correctly if the database is not available initially. When run as an SMF service, it fails fast this way a few times and then the service goes into maintenance:

...
unable to start agent: unable to initialize db connection pool: unable to create a new DB connection pool: dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory
[ Sep 13 03:18:10 Stopping because all processes in service exited. ]
[ Sep 13 03:18:10 Executing stop method (:kill). ]
[ Sep 13 03:18:10 Executing start method ("/opt/pg_prefaulter/pg_prefaulter run --config=/opt/pg_prefaulter/pg_prefaulter.toml &"). ]
[ Sep 13 03:18:10 Method "start" exited with status 0. ]
{"time":"2017-09-13T03:18:10.84142588Z","level":"debug","config-file":"/opt/pg_prefaulter/pg_prefaulter.toml"}
{"time":"2017-09-13T03:18:10.841509168Z","level":"debug","message":"args: []"}
{"time":"2017-09-13T03:18:10.841778917Z","level":"debug","message":"starting gops(1) agent"}
{"time":"2017-09-13T03:18:10.841794234Z","level":"debug","postgresql.pgdata":"/manatee/pg/data","postgresql.host":"/tmp","postgresql.port":5432,"postgresql.user":"postgres","postgresql.xlog.mode":"pg","postgresql.xlog.pg_xlogdump-path":"/opt/postgresql/current/bin/pg_xlogdump","postgresql.poll-interval":1000,"message":"flags"}
{"time":"2017-09-13T03:18:10.842199269Z","level":"info","message":"Starting pg_prefaulter"}
{"time":"2017-09-13T03:18:10.842205727Z","level":"info","message":"Stopped pg_prefaulter"}
Error: unable to start agent: unable to initialize db connection pool: unable to create a new DB connection pool: dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory
Usage:
  pg_prefaulter run [flags]

Flags:
  -h, --help                   help for run
  -m, --mode string            Mode of operation of the database: "auto", "primary", "follower" (default "auto")
  -N, --num-io-threads uint    Number of IO threads to spawn for IOs
  -i, --poll-interval string   Interval to poll the database for state change (default "1s")
  -n, --wal-readahead uint     Number of WAL entries to perform read-ahead into (default 4)
  -X, --xlog-mode string       pg_xlogdump(1) variant: "xlog" or "pg" (default "pg")
  -x, --xlogdump-bin string    Path to pg_xlogdump(1) (default "/usr/local/bin/pg_xlogdump")

Global Flags:
  -a, --circonus-api-key string                         Circonus API token
      --circonus-api-url string                         Circonus API URL (default "https://api.circonus.com/v2")
      --circonus-broker-id string                       Circonus Broker ID
      --circonus-broker-max-response-time string        Circonus Broker Max Response Time (default "500ms")
      --circonus-broker-select-tag string               Circonus Broker Select Tag
      --circonus-check-display-name string              Circonus Check Display Name (default "pg_prefaulter")
      --circonus-check-force-metric-activation string   Circonus Check Force Metric Activation (default "false")
      --circonus-check-id string                        Circonus Check ID
      --circonus-check-instance-id string               Circonus Check Instance ID (default "4042552b-5fe0-45c4-9df4-f21eca88464d:pg_prefaulter")
      --circonus-check-max-url-age string               Circonus Check Max URL Age (default "5m")
      --circonus-check-search-tag string                Circonus Check Search Tag (default "app:pg_prefaulter,host:4042552b-5fe0-45c4-9df4-f21eca88464d")
      --circonus-check-secret string                    Circonus Check Secret
      --circonus-check-tags string                      Circonus Check Tags (default "app:pg_prefaulter")
      --circonus-check-target-host string               Circonus Check Target Host (default "4042552b-5fe0-45c4-9df4-f21eca88464d")
      --circonus-debug                                  Enable Circonus Debug
      --circonus-enable-metrics                         Enable Circonus metrics (default true)
      --circonus-submission-url string                  Circonus Check Submission URL
      --config string                                   config file (default "pg_prefaulter.toml")
  -d, --database string                                 postgres (default "postgres")
      --enable-agent                                    Enable the gops(1) agent interface (default true)
  -H, --hostname string                                 Hostname to connect to PostgreSQL (default "/tmp")
  -l, --log-level string                                Log level (default "INFO")
  -D, --pgdata string                                   Path to PGDATA (default "pgdata")
  -p, --port uint                                       Hostname to connect to PostgreSQL (default 5432)
  -U, --username string                                 Username to connect to PostgreSQL (default "postgres")

unable to start agent: unable to initialize db connection pool: unable to create a new DB connection pool: dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory
[ Sep 13 03:18:10 Stopping because all processes in service exited. ]
[ Sep 13 03:18:10 Executing stop method (:kill). ]
[ Sep 13 03:18:10 Restarting too quickly, changing state to maintenance. ]

Once the database is running, I can clear the maintenance status and the service appears to start correctly:

[ Sep 13 03:20:11 Leaving maintenance because clear requested. ]
[ Sep 13 03:20:11 Enabled. ]
[ Sep 13 03:20:11 Executing start method ("/opt/pg_prefaulter/pg_prefaulter run --config=/opt/pg_prefaulter/pg_prefaulter.toml &"). ]
[ Sep 13 03:20:11 Method "start" exited with status 0. ]
{"time":"2017-09-13T03:20:11.206137022Z","level":"debug","config-file":"/opt/pg_prefaulter/pg_prefaulter.toml"}
{"time":"2017-09-13T03:20:11.20630263Z","level":"debug","message":"args: []"}
{"time":"2017-09-13T03:20:11.20648666Z","level":"debug","message":"starting gops(1) agent"}
{"time":"2017-09-13T03:20:11.20670297Z","level":"debug","postgresql.pgdata":"/manatee/pg/data","postgresql.host":"/tmp","postgresql.port":5432,"postgresql.user":"postgres","postgresql.xlog.mode":"pg","postgresql.xlog.pg_xlogdump-path":"/opt/postgresql/current/bin/pg_xlogdump","postgresql.poll-interval":1000,"message":"flags"}
{"time":"2017-09-13T03:20:11.207273131Z","level":"info","message":"Starting pg_prefaulter"}
{"time":"2017-09-13T03:20:11.226491329Z","level":"debug","backend-pid":63146,"version":"PostgreSQL 9.6.4 on i386-pc-solaris2.11, compiled by gcc (GCC) 4.9.3, 64-bit","message":"established DB connection"}
{"time":"2017-09-13T03:20:11.226914389Z","level":"debug","rlimit-nofile":65536,"filehandle-cache-size":65486,"filehandle-cache-ttl":3600000,"message":"filehandle cache initialized"}
{"time":"2017-09-13T03:20:11.243983658Z","level":"info","io-worker-threads":1000,"message":"started IO worker threads"}
{"time":"2017-09-13T03:20:11.244136082Z","level":"debug","message":"Starting wait"}
{"time":"2017-09-13T03:20:11.244148015Z","level":"debug","wal-worker-thread-id":2,"message":"starting WAL worker thread"}
{"time":"2017-09-13T03:20:11.244151999Z","level":"debug","wal-worker-thread-id":1,"message":"starting WAL worker thread"}
{"time":"2017-09-13T03:20:11.244147788Z","level":"debug","wal-worker-thread-id":3,"message":"starting WAL worker thread"}
{"time":"2017-09-13T03:20:11.244208579Z","level":"debug","wal-worker-thread-id":0,"message":"starting WAL worker thread"}
{"time":"2017-09-13T03:20:11.244208705Z","level":"debug","message":"Starting pg_prefaulter agent"}
{"time":"2017-09-13T03:21:11.227143486Z","level":"debug","hit":23176,"miss":646,"lookup":23822,"hit-rate":0.9728822097221056,"message":"filehandle-stats"}
{"time":"2017-09-13T03:21:11.24432601Z","level":"debug","hit":1,"miss":2,"lookup":3,"hit-rate":0.3333333333333333,"message":"walcache-stats"}
{"time":"2017-09-13T03:21:11.244413444Z","level":"debug","hit":8178,"miss":41757,"lookup":49935,"hit-rate":0.16377290477620907,"message":"iocache-stats"}
{"time":"2017-09-13T03:22:11.228855455Z","level":"debug","hit":113623,"miss":734,"lookup":114357,"hit-rate":0.9935815035371687,"message":"filehandle-stats"}
{"time":"2017-09-13T03:22:11.244911682Z","level":"debug","hit":3,"miss":4,"lookup":7,"hit-rate":0.42857142857142855,"message":"walcache-stats"}
{"time":"2017-09-13T03:22:11.244947642Z","level":"debug","hit":57646,"miss":191404,"lookup":249050,"hit-rate":0.23146356153382855,"message":"iocache-stats"}
{"time":"2017-09-13T03:23:11.229500586Z","level":"debug","hit":199725,"miss":754,"lookup":200479,"hit-rate":0.9962390075768535,"message":"filehandle-stats"}
{"time":"2017-09-13T03:23:11.245253655Z","level":"debug","hit":6,"miss":8,"lookup":14,"hit-rate":0.42857142857142855,"message":"walcache-stats"}
{"time":"2017-09-13T03:23:11.246306122Z","level":"debug","hit":128811,"miss":331991,"lookup":460802,"hit-rate":0.279536547150403,"message":"iocache-stats"}
{"time":"2017-09-13T03:24:11.230050485Z","level":"debug","hit":240793,"miss":759,"lookup":241552,"hit-rate":0.9968578194343247,"message":"filehandle-stats"}
{"time":"2017-09-13T03:24:11.245531156Z","level":"debug","hit":8,"miss":10,"lookup":18,"hit-rate":0.4444444444444444,"message":"walcache-stats"}
{"time":"2017-09-13T03:24:11.246573786Z","level":"debug","hit":165312,"miss":400260,"lookup":565572,"hit-rate":0.29229169760879253,"message":"iocache-stats"}
{"time":"2017-09-13T03:25:11.230626347Z","level":"debug","hit":282746,"miss":766,"lookup":283512,"hit-rate":0.9972981743277181,"message":"filehandle-stats"}
{"time":"2017-09-13T03:25:11.24583358Z","level":"debug","hit":10,"miss":12,"lookup":22,"hit-rate":0.45454545454545453,"message":"walcache-stats"}
{"time":"2017-09-13T03:25:11.246880287Z","level":"debug","hit":203032,"miss":468442,"lookup":671474,"hit-rate":0.3023676270414044,"message":"iocache-stats"}
{"time":"2017-09-13T03:26:11.231247847Z","level":"debug","hit":325980,"miss":773,"lookup":326753,"hit-rate":0.9976342986904482,"message":"filehandle-stats"}
{"time":"2017-09-13T03:26:11.2461834Z","level":"debug","hit":12,"miss":14,"lookup":26,"hit-rate":0.46153846153846156,"message":"walcache-stats"}
{"time":"2017-09-13T03:26:11.247218878Z","level":"debug","hit":241664,"miss":538432,"lookup":780096,"hit-rate":0.3097875133316925,"message":"iocache-stats"}
{"time":"2017-09-13T03:27:11.231782623Z","level":"debug","hit":367433,"miss":778,"lookup":368211,"hit-rate":0.9978870810486379,"message":"filehandle-stats"}
{"time":"2017-09-13T03:27:11.246419942Z","level":"debug","hit":14,"miss":16,"lookup":30,"hit-rate":0.4666666666666667,"message":"walcache-stats"}
{"time":"2017-09-13T03:27:11.247501626Z","level":"debug","hit":278505,"miss":607739,"lookup":886244,"hit-rate":0.3142531853530179,"message":"iocache-stats"}
{"time":"2017-09-13T03:28:11.232316668Z","level":"debug","hit":430939,"miss":780,"lookup":431719,"hit-rate":0.9981932692329964,"message":"filehandle-stats"}
{"time":"2017-09-13T03:28:11.246682366Z","level":"debug","hit":18,"miss":20,"lookup":38,"hit-rate":0.47368421052631576,"message":"walcache-stats"}
{"time":"2017-09-13T03:28:11.247703691Z","level":"debug","hit":356427,"miss":744053,"lookup":1100480,"hit-rate":0.32388321459726666,"message":"iocache-stats"}
...
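
A sketch of the startup behavior this issue asks for, with a hypothetical connectDB helper standing in for creating the connection pool: rather than exiting when the initial connection fails, keep retrying until the database comes up or shutdown is requested, so SMF never sees a fast exit.

package agent

import (
	"context"
	"log"
	"time"
)

// connectDB is a stand-in for creating the DB connection pool; it fails while
// PostgreSQL's Unix socket does not exist yet.
func connectDB(ctx context.Context) error { /* ... */ return nil }

// waitForDB retries the initial connection instead of exiting, so the SMF
// service does not flap into maintenance while PostgreSQL is still down.
func waitForDB(ctx context.Context) error {
	for {
		if err := connectDB(ctx); err == nil {
			return nil
		} else {
			log.Printf("database not available yet, retrying: %v", err)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}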

Reduce ARC usage of the IO cache...

While writing this I defaulted to using strings as the arguments, which was expedient, but not necessarily ideal.

The current _IOCacheKey is too large:

type _IOCacheKey struct {
        Tablespace string
        Database   string
        Relation   string
        Block      string
}

_IOCacheKey.Tablespace string: 0-16 (size 16, align 8)
_IOCacheKey.Database string: 16-32 (size 16, align 8)
_IOCacheKey.Relation string: 32-48 (size 16, align 8)
_IOCacheKey.Block string: 48-64 (size 16, align 8)

Plus the size of the byte array backing each string. It could/should be:

// type'ed uint64's
type _IOCacheKey struct {
        Tablespace uint64
        Database   uint64
        Relation   uint64
        Block      uint64
}

_IOCacheKey.Tablespace uint64: 0-8 (size 8, align 8)
_IOCacheKey.Database uint64: 8-16 (size 8, align 8)
_IOCacheKey.Relation uint64: 16-24 (size 8, align 8)
_IOCacheKey.Block uint64: 24-32 (size 8, align 8)

This change will incur a parsing overhead, but we have CPU to burn in the common case and are memory limited, not CPU limited. The current CPU overhead of pg_prefaulter running on a busy follower is between 1% and 2%, so we have CPU budget to spare.
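
The parsing overhead amounts to one strconv.ParseUint call per field; a sketch of the conversion, assuming the tablespace/database/relation/block values arrive as the decimal strings the xlogdump parsers currently produce (names illustrative):

package agent

import "strconv"

// ioCacheKey mirrors the proposed fixed-size key.
type ioCacheKey struct {
	Tablespace uint64
	Database   uint64
	Relation   uint64
	Block      uint64
}

// newIOCacheKey converts the string form into the fixed-size key, trading a
// little CPU for four fewer string headers (and their backing arrays) per
// cache entry.
func newIOCacheKey(tablespace, database, relation, block string) (ioCacheKey, error) {
	var k ioCacheKey
	var err error
	if k.Tablespace, err = strconv.ParseUint(tablespace, 10, 64); err != nil {
		return k, err
	}
	if k.Database, err = strconv.ParseUint(database, 10, 64); err != nil {
		return k, err
	}
	if k.Relation, err = strconv.ParseUint(relation, 10, 64); err != nil {
		return k, err
	}
	if k.Block, err = strconv.ParseUint(block, 10, 64); err != nil {
		return k, err
	}
	return k, nil
}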

prefaulter's child process hangs onto xlog files after database is shut down

We've found another case where the prefaulter implicitly hangs onto references to a database's filesystem after the database itself has shut down. This is a problem for Manatee because Manatee attempts to unmount and remount the filesystem as part of starting the database (itself a separate problem), but it can't unmount it while the prefaulter is holding these references. This is similar to the problem in #13, but a different way that it can happen.

In this case, there was only one process holding open a file on the database's dataset:

[root@fa0e1075 (postgres) ~]$ fuser -c /manatee/pg
/manatee/pg:     8115o

It's a pg_xlogdump process forked by the prefaulter:

[root@HA99RHND2 (eu-central-1b) ~]# ptree 8115
892   zsched
  975   /sbin/init
    7127  /opt/pg_prefaulter/pg_prefaulter run --config=/opt/pg_prefaulter/pg_prefau
      8115  /opt/postgresql/current/bin/pg_xlogdump -f /manatee/pg/data/pg_xlog/0000

It's not holding a file via its working directory:

[root@fa0e1075 (postgres) ~]$ pwdx 8115
8115:   /root

But rather it seems to be one of its open fds:

[root@HA99RHND2 (eu-central-1b) ~]# pfiles 8115
8115:   /opt/postgresql/current/bin/pg_xlogdump -f /manatee/pg/data/pg_xlog/00
  Current rlimit: 65536 file descriptors
   0: S_IFCHR mode:0666 dev:561,8 ino:3001421048 uid:0 gid:3 rdev:38,2
      O_RDONLY|O_LARGEFILE
      /zones/fa0e1075-8c65-4e2c-a062-e23013e40247/root/dev/null
      offset:0
   1: S_IFIFO mode:0000 dev:558,0 ino:29756267 uid:0 gid:0 rdev:0,0
      O_RDWR
   2: S_IFIFO mode:0000 dev:558,0 ino:29756268 uid:0 gid:0 rdev:0,0
      O_RDWR
   3: S_IFREG mode:0600 dev:90,65658 ino:65565 uid:907 gid:0 size:16777216
      O_RDONLY|O_LARGEFILE
      offset:14827520

Unfortunately we don't have the path for fd 3, but it is almost certainly the WAL file that was given as an argument to the program:

[root@HA99RHND2 (eu-central-1b) /var/tmp/dap]# pargs core.8115 
core 'core.8115' of 8115:	/opt/postgresql/current/bin/pg_xlogdump -f /manatee/pg/data/pg_xlog/00000001000
argv[0]: /opt/postgresql/current/bin/pg_xlogdump
argv[1]: -f
argv[2]: /manatee/pg/data/pg_xlog/000000010000118600000084

The inode number reported by stat on that file matches the one that pfiles reported:

[root@HA99RHND2 (eu-central-1b) /var/tmp/dap]# stat /zones/fa0e1075-8c65-4e2c-a062-e23013e40247/root/manatee/pg/data/pg_xlog/000000010000118600000084
  File: `/zones/fa0e1075-8c65-4e2c-a062-e23013e40247/root/manatee/pg/data/pg_xlog/000000010000118600000084'
  Size: 16777216  	Blocks: 14213      IO Block: 8192   regular file
Device: 169007ah/23658618d	Inode: 65565       Links: 1
Access: (0600/-rw-------)  Uid: (  907/ UNKNOWN)   Gid: (    0/    root)
Access: 2017-10-13 16:34:10.812840776 +0000
Modify: 2017-10-13 16:34:44.373763334 +0000
Change: 2017-10-13 16:34:44.373763334 +0000
 Birth: -

The "-f" flag appears to block when it reaches the end of valid WAL data. And it does look like we're at the end:

[root@HA99RHND2 (eu-central-1b) /var/tmp/dap]# ls -l /zones/fa0e1075-8c65-4e2c-a062-e23013e40247/root/manatee/pg/data/pg_xlog/ | tail
-rw-------   1 907      root     16777216 Oct 13 16:33 00000001000011860000007D
-rw-------   1 907      root     16777216 Oct 13 16:33 00000001000011860000007E
-rw-------   1 907      root     16777216 Oct 13 16:33 00000001000011860000007F
-rw-------   1 907      root     16777216 Oct 13 16:33 000000010000118600000080
-rw-------   1 907      root     16777216 Oct 13 16:33 000000010000118600000081
-rw-------   1 907      root     16777216 Oct 13 16:33 000000010000118600000082
-rw-------   1 907      root     16777216 Oct 13 16:34 000000010000118600000083
-rw-------   1 907      root     16777216 Oct 13 16:34 000000010000118600000084
-rw-------   1 907      root     16777216 Oct 13 05:46 000000010000118600000085
drwx------   2 907      907        20002 Oct 13 16:35 archive_status

There's one more file, but it has a much older mtime -- it looks like that might be a recycled file. It's not valid at all:

[root@fa0e1075 (postgres) /manatee/pg/data/pg_xlog]$ pg_xlogdump 000000010000118600000085 
pg_xlogdump: FATAL:  could not find a valid record after 1186/85000000

and xxd shows that file "84" ends in all zeros, which suggests it's all invalid:

[root@HA99RHND2 (eu-central-1b) /var/tmp/dap]# xxd -a 000000010000118600000084 | tail
...
0e22000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0fffff0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Offset e22000 is 14819328, just 8K behind where the pg_xlogdump process was in the pfiles output. And pg_xlogdump is in a loop sleeping from main():

[root@HA99RHND2 (eu-central-1b) /var/tmp/dap]# pstack 8115
8115:	/opt/postgresql/current/bin/pg_xlogdump -f /manatee/pg/data/pg_xlog/00
 ffffbf7fff25532a pollsys  (ffffbf7fffdfd6a0, 0, ffffbf7fffdfd790, 0)
 ffffbf7fff1ea27b pselect (0, 0, 0, 0, ffffbf7fffdfd790, 0) + 26b
 ffffbf7fff1ea62a select (0, 0, 0, 0, ffffbf7fffdfd7f0) + 5a
 00000000004102cb pg_usleep () + 92
 000000000040853c main () + a5e
 000000000040603c _start () + 6c
[root@HA99RHND2 (eu-central-1b) /var/tmp/dap]# truss -p 8115
pollsys(0xFFFFBF7FFFDFD6A0, 0, 0xFFFFBF7FFFDFD790, 0x00000000) = 0
lseek(3, 0, SEEK_SET)				= 0
read(3, "93D007\001\0\0\0\0\0\084".., 8192)	= 8192
lseek(3, 14811136, SEEK_SET)			= 14811136
read(3, "93D005\001\0\0\0\0\0E284".., 8192)	= 8192
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 8192)	= 8192

which is clearly what it does when it reaches the end of the WAL file:

1007         for (;;)
1008         {
1009                 /* try to read the next record */
1010                 record = XLogReadRecord(xlogreader_state, first_record, &errormsg);
1011                 if (!record)
1012                 {
1013                         if (!config.follow || private.endptr_reached)
1014                                 break;
1015                         else
1016                         {
1017                                 pg_usleep(1000000L);    /* 1 second */
1018                                 continue;
1019                         }
1020                 }

On timing, this process started at 16:34:11:

[root@HA99RHND2 (eu-central-1b) /var/tmp/dap]# ps -opid,stime,args  -p 8115
  PID    STIME COMMAND
 8115 16:34:11 /opt/postgresql/current/bin/pg_xlogdump -f /manatee/pg/data/pg_xlog/00000001000

and was still running until we killed it, well after 16:58Z. The manatee-sitter log shows that we brought down postgres at 16:40, and started it again shortly after that.

I have saved:

  • core file, pfiles output, and the raw binary from pid 8115 (pg_xlogdump)
  • core file, pfiles output, and the raw binary from pid 7127 (pg_prefaulter)
  • the manatee-sitter log showing a timeline of events leading up to this.
  • the WAL file itself, 000000010000118600000084

I will tar all of these up and upload them to "/thoth/stor/tickets/pg_prefaulter#40" in the JPC Manta.
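
One plausible mitigation (a sketch only, not what the prefaulter does today) is to run pg_xlogdump under a context that is cancelled as soon as the prefaulter notices the database is gone, so the follow-mode child is killed rather than left sleeping with the WAL file open:

package agent

import (
	"bufio"
	"context"
	"os/exec"
)

// runXlogdump starts pg_xlogdump -f and kills it when ctx is cancelled, so a
// lingering child cannot keep the pg_xlog filesystem busy after shutdown.
func runXlogdump(ctx context.Context, xlogdumpPath, walFile string, lines chan<- string) error {
	cmd := exec.CommandContext(ctx, xlogdumpPath, "-f", walFile)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		lines <- scanner.Text()
	}
	return cmd.Wait()
}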

Getting error while starting pg_prefaulter.

Getting the following error while starting pg_prefaulter:
Error: unable to start agent: unable to create a stats agent: invalid check manager configuration
I tried disabling Circonus, but still no luck.

Any pointers would be greatly appreciated.
Thanks
J

Commits for pg10+ change functional behaviour in a possibly unintended manner.

In this commit, a translation layer is added to enable pg_prefaulter to work with postgres 10+.

However, the old SQL query

SELECT timeline_id, redo_location, pg_last_xlog_replay_location() FROM pg_control_checkpoint()

is translated as

SELECT timeline_id, redo_lsn, pg_last_wal_receive_lsn() FROM pg_control_checkpoint()

This seems to have the effect of causing the code to attempt to prefault files just ahead of the most-recently-received WAL. The old behaviour appears to have been to prefault just ahead of the most-recently-replayed WAL. I am unsure whether this change in functionality was intended, but the old behaviour does seem more logical to me.

To revert to the old behaviour, change pg_last_wal_receive_lsn() to pg_last_wal_replay_lsn().
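
Applying that change, the pg10+ query would read:

SELECT timeline_id, redo_lsn, pg_last_wal_replay_lsn() FROM pg_control_checkpoint()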

program handles SIGTERM but does not shut down

The prefaulter was running under SMF. I disabled the service with the usual svcadm disable pg_prefaulter, but the stop method timed out. It appears that the software does not correctly shut down on SIGTERM. Log entries from the failure, including SMF log messages:

[ Sep 13 03:16:25 Stopping because service disabled. ]
[ Sep 13 03:16:25 Executing stop method (:kill). ]
{"time":"2017-09-13T03:16:25.340818386Z","level":"info","signal":"terminated","message":"Received signal"}
{"time":"2017-09-13T03:16:25.341335305Z","level":"debug","message":"Shutting down"}
{"time":"2017-09-13T03:16:26.167822291Z","level":"error","error":"unable to execute primary check: dial unix /tmp/.s.PGSQL.5432: connect: connection refused","message":"unable to determine if database is primary or not, retrying"}
[ Sep 13 03:17:25 Method or service exit timed out.  Killing contract 1830449. ]
[ Sep 13 03:17:26 Method or service exit timed out.  Killing contract 1830449. ]
[ Sep 13 03:17:27 Method or service exit timed out.  Killing contract 1830449. ]
[ Sep 13 03:17:28 Method or service exit timed out.  Killing contract 1830449. ]
[ Sep 13 03:17:29 Method or service exit timed out.  Killing contract 1830449. ]

Note that the failure to connect to the database is expected, as this was during zone shutdown. The manatee-sitter service (and thus PostgreSQL) were already offline at the time.
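
From the log, the shutdown path begins but then appears to wait behind the primary-check retry loop. A sketch of how such a loop can stay responsive to the termination signal, assuming a hypothetical checkPrimary helper and a context cancelled by the SIGTERM handler:

package agent

import (
	"context"
	"time"
)

// checkPrimary stands in for the "is this instance the primary?" query that
// keeps failing while PostgreSQL is offline.
func checkPrimary(ctx context.Context) error { /* ... */ return nil }

// retryPrimaryCheck keeps retrying, but returns immediately once ctx is
// cancelled by the SIGTERM handler, so `svcadm disable` does not time out.
func retryPrimaryCheck(ctx context.Context) error {
	for {
		if err := checkPrimary(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // shutdown requested; stop retrying
		case <-time.After(time.Second):
		}
	}
}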

expose statistics through the Prometheus protocol

As part of the broader effort to provide statistics and monitoring for Manta components, we are moving towards exposing a Prometheus client in each software component. The rationale and some details about the plan for this are described in RFD 99.

If this were a Node program, I'd say we would use artedi. I'm not sure what the Prometheus client library landscape is like in the Go ecosystem, but I have to imagine it's pretty reasonable, as Prometheus itself is written in Go!

The listen port for Prometheus connections should be configurable.
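
The standard Go client is github.com/prometheus/client_golang; a minimal sketch of exposing a scrape endpoint on a configurable port (the metric name and flag are illustrative):

package main

import (
	"flag"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var prefaultedPages = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "pg_prefaulter_prefaulted_pages_total", // illustrative metric name
	Help: "Number of heap pages prefaulted into the filesystem cache.",
})

func main() {
	listen := flag.String("prometheus-listen", ":9187", "Prometheus listen address")
	flag.Parse()

	prometheus.MustRegister(prefaultedPages)
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(*listen, nil))
}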

Scrape the WAL file from the process args when PostgreSQL is starting up...

When PostgreSQL is starting up, it's not possible to connect to PG to query the WAL file that it's replaying. Fortunately, this information is available in the process title. Unfortunately, this is very OS-specific. Doubly unfortunately, this information is often most important in recovery situations, when PG is busy applying WAL files, which means it's absolutely worthwhile to do something OS-specific and dirty (read: scraping information from the OS to figure out which WAL file we need to interrogate).
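
A sketch of the dirty approach, assuming an OS where ps(1) reports process arguments and that the startup process title includes the segment being replayed (e.g. "postgres: startup process   recovering 000000010000118600000084"); the regular expression simply looks for a 24-hex-digit WAL segment name:

package main

import (
	"log"
	"os/exec"
	"regexp"
)

// walNameRE matches a 24-hex-digit WAL segment name in a process title.
var walNameRE = regexp.MustCompile(`recovering ([0-9A-F]{24})`)

func main() {
	// OS-specific and dirty, as described above: scrape ps(1) output for the
	// startup process title while PostgreSQL is not yet accepting connections.
	out, err := exec.Command("ps", "-A", "-o", "args").Output()
	if err != nil {
		log.Fatal(err)
	}
	if m := walNameRE.FindSubmatch(out); m != nil {
		log.Printf("PostgreSQL is currently replaying WAL segment %s", m[1])
	} else {
		log.Print("no startup process found in ps output")
	}
}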

may miss some pages depending on xlog-mode

There are two regular expressions used to parse the output of either pg_xlogdump or https://github.com/snaga/xlogdump.

https://github.com/joyent/pg_prefaulter/blob/master/agent/walcache/cache.go#L50
and
https://github.com/joyent/pg_prefaulter/blob/master/agent/walcache/cache.go#L56

When operating in xlog mode (i.e. using https://github.com/snaga/xlogdump), the output differs between operations, and the regular expression may not match all of them.

HEAP_INSERT https://github.com/snaga/xlogdump/blob/master/xlogdump_rmgr.c#L726
HEAP_DELETE https://github.com/snaga/xlogdump/blob/master/xlogdump_rmgr.c#L769

pg mode (i.e. pg_xlogdump) is the default, and that mode does not appear to have this problem.

Dump caches on signal...

When the running process receives a signal (probably SIGHUP or SIGUSR2), it should purge all of its caches.
