pgmemento / pgmemento

Audit trail with schema versioning for PostgreSQL using transaction-based logging

License: GNU Lesser General Public License v3.0

PLpgSQL 97.98% Shell 1.84% Makefile 0.18%
postgresql versioning audit-trail plpgsql transaction-log jsonb

pgmemento's Introduction

pgMemento


pgMemento provides an audit trail for your data inside a PostgreSQL database using triggers and server-side functions written in PL/pgSQL. It also tracks DDL changes to enable schema versioning and offers powerful algorithms to restore or repair past revisions.

Only deltas of changes are logged and they are stored in one data log table as JSONB. Transaction and table event metadata is recorded in separate log tables making it easier to browse through past write operations.

Quickstart

To use pgMemento as an extension, download the pgmemento-.zip archive either from the releases on GitHub or from PGXN, change to the extracted folder and run the following commands from within a shell environment:

make
sudo make install

Then, connect to your database where you want to use auditing and run:

CREATE EXTENSION pgmemento;

Alternatively, you can also add pgMemento to your database the classic way by running the INSTALL_PGMEMENTO.sql script, which simply executes all SQL files in the right order and updates the search path.

psql -h localhost -p 5432 -U my_user -d my_database -f INSTALL_PGMEMENTO.sql

All of pgMemento's log tables and functions are created in a separate database schema called pgmemento. Auditing can be started per schema with the init command, e.g. for the public schema it could be:

SELECT pgmemento.init('public');

A new column pgmemento_audit_id is added to every table in the given schema to trace different row versions in the central data log - the pgmemento.row_log table. Each write transaction is also logged in the pgmemento.transaction_log table, which can consist of multiple table events stored in the pgmemento.table_event_log table. The event_key column in the latter links to the entries in the data log.
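For example, a sketch of browsing recent write activity across the three log tables (the table_event_log column names transaction_id and table_operation are assumed from the v0.7 layout and may differ in other versions):

-- most recent changes first: who did what, with old and new values
SELECT t.txid_time, t.user_name, e.table_operation,
       r.old_data, r.new_data
  FROM pgmemento.transaction_log t
  JOIN pgmemento.table_event_log e ON e.transaction_id = t.id
  JOIN pgmemento.row_log r ON r.event_key = e.event_key
 ORDER BY t.txid_time DESC;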

Schema versioning takes place in pgmemento.audit_table_log and pgmemento.audit_column_log. Transaction ranges show the lifetime of an audited table and its columns. Changing the table schema, e.g. altering data types or dropping entire columns, will produce data logs as well. Auditing can also be started automatically for newly created tables in schemas where pgMemento has been initialized. This is tracked (and configurable) in the pgmemento.audit_schema_log table.
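For example, a quick sketch of inspecting the recorded lifetimes (an open upper bound in txid_range is assumed to mean the table is still audited):

SELECT table_name, schema_name, txid_range
  FROM pgmemento.audit_table_log
 ORDER BY schema_name, table_name;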

System requirements

  • PostgreSQL 9.5+
  • PL/pgSQL language

Documentation

Documentation can be found in the wiki section of this repository.

License

The scripts for pgMemento are open source under GNU Lesser General Public License Version 3.0. See the file LICENSE for more details.

Disclaimer

pgMemento IS PROVIDED "AS IS" AND "WITH ALL FAULTS." I MAKE NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE QUALITY, SAFETY OR SUITABILITY OF THE SCRIPTS, EITHER EXPRESSED OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

IN NO EVENT WILL I BE LIABLE FOR ANY INDIRECT, PUNITIVE, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES HOWEVER THEY MAY ARISE AND EVEN IF I HAVE BEEN PREVIOUSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. STILL, I WOULD FEEL SORRY.

pgmemento's People

Contributors

dnsl48, ekeuus, elouin, francoricci, fxku, gtors, kh0r, myabc, nextstopsun, ol-teuto


pgmemento's Issues

Possible issue with restore_recordset

I am a little confused - when I run this query I get the correct state of the data before transaction 32 - i.e. grade:1 and login_count:3

SELECT * FROM pgmemento.restore_record(1,32,'user','public', 1048, TRUE) AS (log JSONB);

However, when I run this query:

SELECT * FROM pgmemento.restore_recordset(1,32,'user','public',TRUE) AS (log JSONB);

I don't get any rows. I would have thought that restore_recordset was a superset of restore_record, or have I misunderstood?

row_log:
1247 32 1048 "{"grade": 1, "login_count": 3}"

Restore previous table state also as MATERIALIZED VIEWs

Had it in the beginning. Dropped it then. But it should not be hard to bring it back. The question is what literal shall be used for the target_table_type parameter?

  • MATERIALIZED VIEW - probably good for integration in scripts
  • MVIEW or MATVIEW - shorter and faster to write when executed manually

Maybe both forms should be supported.

pgmemento.init generates invalid session_info

The session object stored in pgmemento.transaction_log is not a valid object after calling pgmemento.init(). The session is there, but it is stored as a string rather than an object:

  id  | txid |           txid_time           | process_id | user_name |  client_name  | client_port | application_name |                                                                                                                                            session_info                                                                                                                                             
------+------+-------------------------------+------------+-----------+---------------+-------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1 | 1112 | 2020-08-20 15:05:16.323992+00 |         74 | user      | 172.26.0.1/32 |       40024 |                  | "{\"pgmemento_init\": {\"schema_name\": '\"public\"', \"default_audit_id_column\": '\"pgmemento_audit_id\"', \"default_log_old_data\": 'true', \"default_log_new_data\": 'true', \"log_state\": 'true', \"trigger_create_table\": 'true', \"except_tables\": '[\"SequelizeMeta\", \"Sessions\"]'}}"
...
  296 | 1553 | 2020-08-20 15:14:59.944585+00 |         80 | user      | 172.27.0.1/32 |       34734 |                  | {"message": "Added test users", "client_user": "seed"}

Setting the session_info for further changes does work correctly; it is just the init that creates a string instead of an object.
The test for this is also broken: it attempts to check for a pgmemento_init session_info but doesn't actually work correctly.

ASSERT (
    SELECT EXISTS (
      SELECT session_info ? 'pgmemento_init'
        FROM pgmemento.transaction_log
       WHERE id = current_setting('pgmemento.' || txid_current())::numeric
    )
  ), 'Error: Could not find entry in transaction_log for stopping audit trail in schema %!', tab_schema;

The SELECT EXISTS just hides the false value returned from SELECT session_info ? 'pgmemento_init'.
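A minimal sketch of the corrected assertion, assuming the boolean should be asserted directly instead of being wrapped in EXISTS (which is true as soon as any row exists, regardless of the ? result):

ASSERT (
    SELECT session_info ? 'pgmemento_init'
      FROM pgmemento.transaction_log
     WHERE id = current_setting('pgmemento.' || txid_current())::numeric
  ), 'Error: Could not find pgmemento_init entry in transaction_log!';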

support logging DDL events that contain comments

The context of ALTER TABLE events is retrieved with GET DIAGNOSTICS stack = PG_CONTEXT; and then parsed character by character for the table name and word by word for column names.

So far, whenever comment symbols such as -- and /* ... */ are found in the context, an error is thrown, because these comments could contain SQL commands which would then be parsed as well. But there should be a way to handle these comments.

  • The context could be split into lines and everything starting from -- could be cut out (see the sketch after this list)
  • /*...*/ can be handled later in the word-wise parsing stage
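A minimal sketch of the first bullet as it could look inside the trigger function, assuming the context text sits in the stack variable from the GET DIAGNOSTICS call above; everything from -- to the end of each line is cut away:

-- strip line comments while preserving line order
SELECT string_agg(split_part(line, '--', 1), E'\n' ORDER BY ord)
  FROM unnest(string_to_array(stack, E'\n')) WITH ORDINALITY AS s(line, ord);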

Error when dropping newly created table with double quotes

Environment

Postgres 11.1
pgMemento scripts executed from the latest commit available at the moment of reporting this issue:
cc0f1ff

Expected Behaviour

Dropping table without error

Current Behaviour

After creating a table and dropping it, an error occurs:

create table "scheduled_messages" ("id" BIGSERIAL NOT NULL PRIMARY KEY);
drop table "scheduled_messages";
sql> create table "scheduled_messages" ("id" BIGSERIAL NOT NULL PRIMARY KEY)
[2019-01-30 13:42:07] completed in 30 ms
sql> drop table "scheduled_messages"
[2019-01-30 13:42:10] [42704] ERROR: unrecognized configuration parameter "pgmemento.688"
[2019-01-30 13:42:10] Where: SQL statement "UPDATE
[2019-01-30 13:42:10] pgmemento.audit_table_log
[2019-01-30 13:42:10] SET
[2019-01-30 13:42:10] txid_range = numrange(lower(txid_range), current_setting('pgmemento.' || txid_current())::numeric, '(]')
[2019-01-30 13:42:10] WHERE
[2019-01-30 13:42:10] table_name = $1
[2019-01-30 13:42:10] AND schema_name = $2
[2019-01-30 13:42:10] AND upper(txid_range) IS NULL
[2019-01-30 13:42:10] AND lower(txid_range) IS NOT NULL
[2019-01-30 13:42:10] RETURNING
[2019-01-30 13:42:10] id"
[2019-01-30 13:42:10] PL/pgSQL function unregister_audit_table(text,text) line 6 at SQL statement
[2019-01-30 13:42:10] SQL statement "SELECT pgmemento.unregister_audit_table(obj.object_name, obj.schema_name)"
[2019-01-30 13:42:10] PL/pgSQL function table_drop_post_trigger() line 11 at PERFORM

Could anyone confirm the same behaviour in their environment?
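Not a confirmed fix, just a sketch of a possible mitigation: current_setting has a two-argument form (PostgreSQL 9.6+) whose missing_ok flag returns NULL instead of raising the error above when the parameter was never set in the current transaction:

-- returns NULL instead of "unrecognized configuration parameter"
SELECT current_setting('pgmemento.' || txid_current(), TRUE)::numeric;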

errors when checking for data types for ADD COLUMN events

TL;DR: the to_regtype call in the event trigger procedure table_alter_pre_trigger either throws syntax errors or doesn't catch data types with spaces (= no log)

Details
Details
The event trigger procedure table_alter_pre_trigger tries to parse different kinds of DDL statements that modify columns. The general idea is to split the DDL text by spaces into an array. Whenever the keywords ADD, ALTER or DROP are found within a loop over the array, it is expected to find the column name in the next iteration or the one after the next. For ALTER and DROP events the extracted column name can be checked against the audit_column_log table.

For ADD COLUMN events this is not possible as the column does not exist yet. Therefore, the current assumption is to look for a data type in the next iteration. I thought this could easily be checked with Postgres' to_regtype function: if the current text input resembles a data type, to_regtype returns the name, else NULL. Unfortunately, splitting up the DDL text can produce input that causes syntax errors, as reported by @coutoPL in his PR #11.

What's even worse is that data types with spaces do not produce the error but can bypass the logging. For example, a data type like double precision results in two iterations with double and precision. to_regtype('double') returns NULL, so the ADD COLUMN event is not logged. There would also be no entry in the audit_column_log table.

Feature request: start auditing on some columns of table

Hi,

this is a feature request: I would like to start auditing for some columns of table.

For instance:

SELECT pgmemento.create_table_audit(
  tablename := 'contacts',
  schemaname := 'public',
  audit_id_column_name := 'audit_trail_id',
  log_old_data := TRUE,
  log_new_data := TRUE,
  log_state := TRUE,
  include_columns := ARRAY['firstname', 'lastname']
);

or

SELECT pgmemento.create_table_audit(
  tablename := 'contacts',
  schemaname := 'public',
  audit_id_column_name := 'audit_trail_id',
  log_old_data := TRUE,
  log_new_data := TRUE,
  log_state := TRUE,
  exclude_columns := ARRAY['foobar']
);

Best regards,
Stéphane

Logging multiple table events of the same type

Idea: Add a timestamp column to table_event_log and feed it with statement_timestamp(). This way multiple events of the same type can be logged per transaction. The important question is: can INSERTs into row_log be referenced correctly and fast enough (statement_timestamp needs to be the same in the BEFORE and AFTER phase!)? Previous attempts to use transaction variables also for table events decreased the logging performance. So, this might not be an option then.

A change in the event logging would also affect the code in DDL_LOG, REVERT and SETUP where logging only one event per transaction is used as a feature (by calling log_table_event manually).

RFC | Make audit_id less invasive

Description

pgMemento v0.6 implements row level tracking by adding a field named audit_id to tables.

This RFC proposes to improve the audit_id implementation to make it less invasive.

Problem statement

audit_id is a quite common name for a table column, which may conflict with native table data structure.
Currently, pgMemento does not provide any means to configure or change that behaviour.

Customization of the column name might seem to be too complicated and even over-engineering. However, it could be done some time in the future without introducing breaking changes to the module.

On the other hand, picking a more unique name for the column introduces somewhat breaking changes, whereas it is quite easy to implement without much complexity and effort.

Proposal

I suggest we change the audit_id column name to something quite unique so that it reduces chances of conflicting with native data column names. Ideally it should contain the name of the extension as prefix.

Possible options

  • Option A is "pgmemento audit_id". Use space to separate prefix and column name
  • Option B is "pgmemento__audit_id". Use two underscores to separate prefix and column name

JSONB columns are always NULL in pgmemento.restore_record(s)

I have a JSONB column in one of my records. Edits to this column are recorded correctly in the row_log, but when I use restore_record on that record, the JSONB column is always null. Have I set something up incorrectly, or is this a bug?

I have attached a transcript of what I did to produce this:

testdb=# create table test (id SERIAL PRIMARY KEY, values JSONB NOT NULL DEFAULT '{}');
CREATE TABLE
testdb=# insert into test (values) values ('{ "test": 3 }');
INSERT 0 1
testdb=# select * from test;
 id |   values    | pgmemento_audit_id 
----+-------------+--------------------
  1 | {"test": 3} |                 53
(1 row)

testdb=# select * from pgmemento.restore_record(1, 50, 'test', 'public', 53, true) as (log JSONB);
                         log                         
-----------------------------------------------------
 {"id": 1, "values": null, "pgmemento_audit_id": 53}
(1 row)

testdb=# update test set values = '{ "test": 4 }';
UPDATE 1
testdb=# select * from pgmemento.restore_records(1, 50, 'test', 'public', 53, true) as (log JSONB);
                                                                           log                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------
 {"id": 1, "values": null, "stmt_time": "2020-07-09T14:15:34.823561-04:00", "transaction_id": 44, "table_operation": "INSERT", "pgmemento_audit_id": 53}
 {"id": 1, "values": null, "stmt_time": "2020-07-09T14:16:53.936469-04:00", "transaction_id": 45, "table_operation": "UPDATE", "pgmemento_audit_id": 53}
(2 rows)

testdb=# select * from pgmemento.row_log where audit_id = 53;
 id | audit_id |                        event_key                        |        old_data         |                          new_data                          
----+----------+---------------------------------------------------------+-------------------------+------------------------------------------------------------
 57 |       53 | 1594318534.82356;1594318534.82356;2295516;3;test;public |                         | {"id": 1, "values": {"test": 3}, "pgmemento_audit_id": 53}
 58 |       53 | 1594318613.93647;1594318613.93647;2295517;4;test;public | {"values": {"test": 3}} | {"values": {"test": 4}}

I would expect in the output that I see the full JSONB object, but the values are always null. They are correctly recorded in the row_log though.

I have installed PgMemento to my database using the provided INSTALL_PGMEMENTO.sql script, both master and v0.7 versions.

Error when uninstalling pgMemento

When uninstalling pgMemento from my PostgreSQL 10 database, using the UNINSTALL_PGMEMENTO.sql script, I get the following error message:

C:\Program Files\PostgreSQL\10\bin\pgMemento>"C:\Program Files\PostgreSQL\10\bin

\psql.exe" -h localhost -d ccdr -p 5432 -U pgadmin-mschn79534 -f "UNINSTALL_PGMEMENTO.sql"
Password for user pgadmin-mschn79534:
SET
 
Removing event triggers ...
drop_schema_event_trigger

 
 
 
Removing audit_id columns from audited tables ...
psql:UNINSTALL_PGMEMENTO.sql:40: ERROR:  relation "slp20110006.d20180328_sa_births_pslk_source" does not exist
CONTEXT:  SQL statement "SELECT EXISTS (
SELECT
1
FROM
pg_attribute
WHERE
attrelid = ($2 || '.' || $1)::regclass::oid
AND attname = 'audit_id'
AND attislocal = 't'
AND NOT attisdropped
)"
PL/pgSQL function drop_table_audit_id(text,text) line 4 at IF
SQL statement "SELECT pgmemento.drop_table_audit_id($1, $2)"
PL/pgSQL function drop_table_audit(text,text) line 4 at PERFORM

I have checked table pgmemento.audit_table_log and found an entry for table slp20110006.d20180328_sa_births_pslk_source with txid_range [1248,]. However, there is indeed no table d20180328_sa_births_pslk_source in schema slp20110006. This is a table name that previously existed but that has now been renamed.

Should the uninstall script cater for this eventuality of tables being deleted or renamed?

Is there any other way to safely uninstall/remove pgMemento from my PostgreSQL database?

Thanks in advance for any help.
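Until the script handles this case, stale registrations can be located beforehand; a sketch using to_regclass, which returns NULL for relations that no longer exist:

-- audit_table_log entries whose table has been dropped or renamed
SELECT schema_name, table_name
  FROM pgmemento.audit_table_log
 WHERE upper(txid_range) IS NULL
   AND to_regclass(format('%I.%I', schema_name, table_name)) IS NULL;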

Endless loop in table_drop_pre_trigger function

I've got an endless loop from a simple command:
DROP TABLE IF EXISTS myschema.nonexisting_table CASCADE;

Digging into the code of the table_drop_pre_trigger function, I've found that this line causes the trouble.
It sets the fetch_next variable to TRUE and this results in an endless loop.

@FxKu Can you please explain the logic behind this EXCEPTION block? Why does it set fetch_next to FALSE on invalid_name and to TRUE on undefined_table? Isn't NULL an invalid name?

Adding columns to a wide table breaks tracking

Hi all,

When we add new columns to a wide(ish) table that is tracked, we get the following issue which makes the migration fail.

2022-08-23 08:57:26.084 UTC [34] ERROR:  cannot pass more than 100 arguments to a function at character 104
2022-08-23 08:57:26.084 UTC [34] QUERY:  INSERT INTO pgmemento.row_log AS r (audit_id, event_key, new_data)
SELECT audit_trail_id, $1, jsonb_build_object('col', "col", 'col2', col2, ...) AS content

The statement itself is
2022-08-23 08:57:26.084 UTC [34] STATEMENT: ALTER TABLE "public"."table" ADD COLUMN "column" TIMESTAMP WITH TIME ZONE DEFAULT NULL;

Please let us know how to proceed with this as we might have to disable tracking in PROD due to this issue if we want to release these new columns.
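For context, the 100-argument ceiling is PostgreSQL's built-in FUNC_MAX_ARGS limit on a single function call, which jsonb_build_object hits at around 50 columns (one key plus one value per column). A hypothetical workaround (not necessarily how the project resolved it) is to serialize the whole row at once instead of enumerating columns; the table and audit column names below are placeholders:

-- one call regardless of column count; drop the audit column from the payload
SELECT to_jsonb(t) - 'audit_trail_id' AS content
  FROM public.wide_table t;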

Renaming multiple times in one transaction is not recorded by auditing system

When changing a table name, column name or type name, nothing is changed in either audit_table_log or audit_column_log. This prevents properly restoring data on the renamed table.

I have tried the following types of queries and not seen the updates to audit tables as I would expect:

ALTER TABLE old_table RENAME TO new_table;
ALTER TYPE old_enum RENAME TO new_enum;
ALTER TABLE table RENAME COLUMN old_column TO new_column;

Not sure if there are more queries that are being missed.

TEST.sql fails in clean database

TEST.sql fails with

TEST 1: pgMemento setup
psql:test/setup/TEST_INSTALL.sql:182: ERROR:  Error: Incorrect number of audit tables!
CONTEXT:  PL/pgSQL function inline_code_block line 16 at ASSERT

Fails because the spatial_ref_sys table is automatically created in the pgmemento schema when creating the postgis extension.

Postgis also creates these views: 'geography_columns', 'geometry_columns', 'raster_columns', 'raster_overviews'
and a bunch of stored procedures.

ENV:
Clean Postgres 10 database
postgis-2.5

0.7 release date

Any time estimates on when 0.7 would be released?
It'd be great to have an extension instead of an installation script.
An extension is way more migration friendly.

Changing a column multiple times in one transaction breaks restoring

If a column has been changed multiple times in one transaction, multiple entries are added to audit_column_log. All but one of these entries will have an empty txid_range. This kind of makes sense: there isn't a transaction that would match those intermediate columns, so the range is empty.

However, when these empty range entries are present, the restore functionality no longer works for these tables. All of the restore functions end up using pgmemento.restore_query() and that function returns duplicate statements for the columns that were changed multiple times in one transaction earlier. pgmemento.get_column_list() does not duplicate the columns, so I'm thinking it's something to do with the way pgmemento.restore_query() deals with ranges that is incompatible with empty ranges.
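A sketch for spotting the problematic intermediate entries described above (isempty() is the standard PostgreSQL range function):

SELECT audit_table_id, column_name, txid_range
  FROM pgmemento.audit_column_log
 WHERE isempty(txid_range);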

fix testrun.sh

TL;DR: test returns expected value but is not passed correctly to assertion

Final test in testrun.sh does not work properly during travis build (passing only locally).
Problem: The executed SQL file sets a temporary configuration parameter that only lives for one session. For some reason it is not known when running the DO block passed with the same psql command (with -1 set).

Logging application user information?

Maybe pgMemento can already do this, and I just didn't see it, but is there a way to specify on the client side what user is making the changes? My applications by and large all connect as the same PG user in a pool, and every query runs from that pool. So every change through the app would just log the user_name as myapp or whatever, which isn't super useful.

I could implement an updated_by or updated_by_id field on each audited table, and that would record the user that made the change in the audit log, but it's not at the top level with the connection username where it would be more convenient to access. Also, does pgMemento store a diff of the changes? If so, then if the same user updated a row multiple times, that updated_by field wouldn't be present in the diff, making it difficult to track.

I'm also thinking about things such as logging maybe the ip address of the application user which has similar problems.

Is there any way for pgMemento to lift this information up to the transaction_log table? Maybe PG has some kind of connection context/session that could be used to set this information that pgMemento could then read in? Even if it just dumped this information into a JSONB field like transaction_log.client_session or something could be really useful.
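For what it's worth, the session_info column shown in an issue further up ({"message": "Added test users", "client_user": "seed"}) suggests arbitrary per-transaction metadata can be injected. A sketch, assuming pgmemento.session_info is the setting read when the transaction is logged (table and values are placeholders):

BEGIN;
SELECT set_config('pgmemento.session_info',
                  '{"client_user": "alice", "client_ip": "10.0.0.5"}',
                  TRUE);  -- TRUE = setting is local to this transaction
UPDATE my_table SET name = 'changed' WHERE id = 1;
COMMIT;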

new event_key in table log has redundant data

In line

event_key = concat_ws(';', extract(epoch from t.txid_time), extract(epoch from t.txid_time), t.txid, e.op_id)

value

extract(epoch from t.txid_time)

appears twice. This is unnecessary and doesn't add any uniqueness to the key values.
So when you have two events of the same type in one transaction, this value won't be unique.

I guess what really was meant here is

extract(epoch from e.stmt_time)

So the whole key looks like:

event_key = concat_ws(';', extract(epoch from t.txid_time), extract(epoch from e.stmt_time), t.txid, e.op_id) 

@FxKu Is that true?

revert_transaction cannot revert updates with array types

When using pgmemento.revert_transaction(id) on a transaction with changes for an array column, the update to the array column fails. In this case, the table has a default value of []

NOTICE:  Could not revert UPDATE event for table public.Table: null value in column "arrayColumn" violates not-null constraint
NOTICE:  Could not revert UPDATE event for table public.Table: malformed array literal: "arrayValue"

The audit_column_log does look correct for this column.

db=# select * from pgmemento.audit_column_log where column_name = 'arrayColumn';
 id  | audit_table_id | column_name | ordinal_position |        data_type         |                     column_default                     | not_null | txid_range 
-----+----------------+----------------+------------------+--------------------------+--------------------------------------------------------+----------+------------
 214 |             26 | arrayColumn |               53 | character varying(255)[] | ARRAY[]::character varying[]::character varying(255)[] | t        | (1,)
(1 row)

Installing to a different location

Hi, I'm trying to install pgMemento as an extension.

The make install puts the file down at this location: /usr/share/pgsql/extension/pgmemento.control

However, my Postgres instance wants to look for it at this location: /usr/pgsql-11/share/extension/pgmemento.control

I copied the file there, and can successfully run SELECT * FROM pg_available_extensions WHERE name = 'pgmemento'; - However, if I now run CREATE EXTENSION pgmemento; I get the following error:
ERROR: extension "pgmemento" has no installation script nor update path for version "0.7.3"

How do I resolve this?
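A likely cause: copying only the control file by hand leaves the versioned SQL script behind (pgmemento--0.7.3.sql, following PostgreSQL's extension naming convention), hence "no installation script". A sketch of pointing the build at the right server instead, using the PG_CONFIG override that PGXS Makefiles accept (the pg_config path is an example):

make PG_CONFIG=/usr/pgsql-11/bin/pg_config
sudo make install PG_CONFIG=/usr/pgsql-11/bin/pg_config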

Any best practices for frequent bulk import?

I would like to import data frequently. About every 5 minutes.

I would like to track when the data changes, but without filling up transaction_log and table_event_log.

Any ideas?

`REFERENCES` constraints on table creation leads to duplicate creation of `pgmemento_audit_id` column

Hey there :)

I believe I have found a bug where a trigger triggers once too often or something like that.

This is something I noticed when updating the tests for postgraphile-audit-plugin to 0.7 and it also occurs on the current master.

It can be reproduced by running INSTALL_PGMEMENTO.sql followed by:

CREATE SCHEMA postgraphile_audit_plugin;

SELECT pgmemento.init(
    'postgraphile_audit_plugin', -- schema_name
    'pgmemento_audit_id', -- audit_id_column_name
    TRUE, -- log_old_data
    TRUE, -- log_new_data
    FALSE, -- log_state
    TRUE, -- trigger_create_table
    '{}' -- except_tables
  );

CREATE TABLE postgraphile_audit_plugin.users (
	id SERIAL PRIMARY KEY
);

CREATE TABLE postgraphile_audit_plugin.users_emails (
	user_id INT NOT NULL REFERENCES postgraphile_audit_plugin.users(id)
);

The second CREATE TABLE statement will fail with

psql:__tests__/setupSchema.sql:27: ERROR:  column "pgmemento_audit_id" of relation "users_emails" already exists
CONTEXT:  SQL statement "ALTER TABLE postgraphile_audit_plugin.users_emails ADD COLUMN pgmemento_audit_id BIGINT DEFAULT nextval('pgmemento.audit_id_seq'::regclass) UNIQUE NOT NULL"
PL/pgSQL function create_table_audit_id(text,text,text) line 8 at EXECUTE
SQL statement "SELECT pgmemento.create_table_audit_id($1, $2, $3)"
PL/pgSQL function create_table_audit(text,text,text,boolean,boolean,boolean) line 41 at PERFORM
SQL statement "SELECT pgmemento.create_table_audit(
          tablename,
          schemaname,
          current_default_column,
          current_log_old_data,
          current_log_new_data,
          FALSE
        )"
PL/pgSQL function table_create_post_trigger() line 42 at PERFORM

To reproduce this, you can use our test setup:

Just clone https://github.com/mayflower/postgraphile-audit-plugin and run yarn run-tests (you'll need yarn and docker installed).

Improve logging of ALTER COLUMN events

So far, an ALTER COLUMN event is only logged if the data type changes and the DDL command contains the keyword USING. The idea behind this was that it's not necessary to log the whole data of the affected column if it would still work against the old data type. However, the changes in the table layout should still be logged in the audit_column_log, which is not the case, unfortunately.

The alter_table_pre_trigger function should also care for commands where

  • no USING is used
  • a user changes the column default
  • a user changes the nullable behaviour

If this is parsed correctly and the event is logged, the necessary updates of audit_column_log will be applied by the alter_table_post_trigger procedure.

New table audit_schema_log

Have an audit_schema_log table to control for which schemas pgMemento has been initialized and when to handle CREATE TABLE events. With this it should also be possible to add support for CREATE TABLE AS DDL events.

Create init function

So far, pgMemento can be initialized per schema via an interactive INIT script. But there should also be a function one can use.

This function could also init logging for more schemas at once by passing a text array (or all schemas if the array is empty). When realizing #24, init should also register schemas for logging.

ERROR: permission denied for table transaction_log when deleting from table.

I'm getting an error when trying to delete from a table with enabled audit.

[2019-05-30 19:40:08] [42501] ERROR: permission denied for table transaction_log
[2019-05-30 19:40:08] Where: SQL statement "INSERT INTO pgmemento.transaction_log
[2019-05-30 19:40:08] (txid, stmt_date, process_id, user_name, client_name, client_port, application_name, session_info)
[2019-05-30 19:40:08] VALUES
[2019-05-30 19:40:08] ($1, transaction_timestamp(), pg_backend_pid(), current_user, inet_client_addr(), inet_client_port(),
[2019-05-30 19:40:08] current_setting('application_name'), session_info_obj
[2019-05-30 19:40:08] )
[2019-05-30 19:40:08] ON CONFLICT (txid, stmt_date)
[2019-05-30 19:40:08] DO NOTHING
[2019-05-30 19:40:08] RETURNING id"
[2019-05-30 19:40:08] PL/pgSQL function pgmemento.log_table_event(bigint,oid,text) line 29 at SQL statement
[2019-05-30 19:40:08] SQL statement "SELECT pgmemento.log_table_event(txid_current(), TG_RELID, TG_OP)"
[2019-05-30 19:40:08] PL/pgSQL function pgmemento.log_transaction() line 3 at PERFORM
[2019-05-30 19:40:08] SQL statement "DELETE FROM ONLY "auth"."lnk_users_user_roles" WHERE $1 OPERATOR(pg_catalog.=) "user_guid""

I'm logged in and executing the query as a postgres superuser, so this looks quite strange.
I've looked into the pgmemento.log_transaction() definition and found just a plain insert into the transaction_log table. Nothing that could prevent inserting into it.

Any clues on how to debug this?

ERROR: extension "pgmemento" has no installation script nor update path for version "0.7"

Hi, I installed pgmemento (pre-compiled) in dockerfile image bitnami/postgres:12.2 like so:

FROM bitnami/postgresql:12
USER root

RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get -y --no-install-recommends install unzip wget make

RUN wget https://github.com/pgMemento/pgMemento/releases/download/v0.7/pgmemento-0.7.zip

RUN unzip pgmemento-0.7.zip

RUN cd pgmemento-0.7 \ 
  && make install PG_CONFIG=/opt/bitnami/postgresql/bin/pg_config

COPY ./init/ /docker-entrypoint-initdb.d/
USER 1001

When trying to install the extension, I'm getting error
SQL Error [22023]: ERROR: extension "pgmemento" has no installation script nor update path for version "0.7"

input:

SELECT * FROM pg_available_extensions WHERE name = 'pgmemento';
CREATE EXTENSION pgmemento;

output:

"name","default_version","installed_version","comment"
pgmemento,"0.7",,Audit trail with schema versioning for PostgreSQL using transaction-based logging
SQL Error [22023]: ERROR: extension "pgmemento" has no installation script nor update path for version "0.7"

Fix COALESCE on restore

While #71 might have fixed restoring JSONB and array values, it introduced a bug when restoring NULL values.

Before: COALESCE( '{"a":null}'::jsonb->'a') = NOT NULL ->> 0 = NULL
Now: COALESCE( '{"a":null}'::jsonb->>'a') = NULL -> query recent state from table)

Instead it must be:

CASE WHEN jsonb_typeof( '{"a":null}'::jsonb->'a') = 'null'
     THEN NULL::columntype
     ELSE COALESCE( '{"a":null}'::jsonb->>'a')::columntype -> query recent state from table)
END

Function register_audit_table not called (in create_table_audit)?

Hi,

I've called the create_table_audit function, and the audit_table_log/audit_column_log tables are all empty. I saw you have the register_audit_table function but it's called only in the modify_ddl_log_tables function (which is called in the trigger which modifies the table itself).

In that function you do have this line:
-- remember audit_id_column when registering table in audit_table_log later

but there is no call to register_audit_table?

In the opposite function, drop_table_audit, you do call unregister_audit_table (for some reason).

BR

0.7.2 update error

Hi,
I've got this error message:

ERROR: extension "pgmemento" has no update path from version "0.7.1" to version "0.7.2"
SQL state: 22023

Cannot create new table that contains a foreign key

Hey again, I believe I found another issue. I tried to create a table containing a foreign key, but I get an error in the create table trigger:

> createdb testing
> psql -f INSTALL_PGMEMENTO.sql -d testing
> psql -f INIT.sql -d testing
SET

Please enter the name of the schema to be used along with pgMemento: public
Store new data in audit logs, too? (y|N): y
Log existing data as inserted (baseline)? (y|N): y
Trigger CREATE TABLE statements? (y|N): y
Specify tables to be excluded from logging processes (separated by comma): 

Initializing pgMemento in public schema ...
                    init                     
---------------------------------------------
 pgMemento is initialized for public schema.

> psql -d testing
testing=# create table test (id serial primary key, value integer);
CREATE TABLE
testing=# create table test2 (id serial primary key, test_id integer references test (id));
ERROR:  column "pgmemento_audit_id" of relation "test2" already exists
CONTEXT:  SQL statement "ALTER TABLE public.test2 ADD COLUMN pgmemento_audit_id BIGINT DEFAULT nextval('pgmemento.audit_id_seq'::regclass) UNIQUE NOT NULL"
PL/pgSQL function create_table_audit_id(text,text,text) line 8 at EXECUTE
SQL statement "SELECT pgmemento.create_table_audit_id($1, $2, $3)"
PL/pgSQL function create_table_audit(text,text,text,boolean,boolean,boolean) line 42 at PERFORM
SQL statement "SELECT pgmemento.create_table_audit(
          tablename,
          schemaname,
          current_default_column,
          current_log_old_data,
          current_log_new_data,
          FALSE
        )"
PL/pgSQL function table_create_post_trigger() line 42 at PERFORM

This occurred both in v0.7 and master.

Suggestion: Provide an (optional) way to track the post-modification value.

Hi there!

It would be great if there were a way to let pgMemento not only save the value before a transaction, but also the value after a transaction - currently we're calculating the value after by executing restore_record, but this is quite expensive for a lot of changes at once, so making the storage <-> runtime tradeoff would be well worth it.

Thank you very much!

(Here's what we are currently doing)

https://github.com/mayflower/postgraphile-audit-plugin/blob/e070f88eb9fc7cb352341e9c7e049966fffbf5fb/Setup.sql#L18-L61

create_table_audit function throws an error

pgmemento.create_table_audit function throws an error.
It's easy to replicate:

CREATE DATABASE test;
--switch to database test;

CREATE EXTENSION pgmemento;

CREATE TABLE test_tab (
  id INTEGER
);

SELECT pgmemento.create_table_audit(
         tablename := 'test_tab',
         schemaname := 'public',
         audit_id_column_name := 'audit_id');

You'll get:

[42701] ERROR: column "audit_id" of relation "test_tab" already exists Where: SQL statement "ALTER TABLE public.test_tab ADD COLUMN audit_id BIGINT DEFAULT nextval('pgmemento.audit_id_seq'::regclass) UNIQUE NOT NULL" PL/pgSQL function pgmemento.create_table_audit_id(text,text,text) line 8 at EXECUTE SQL statement "SELECT pgmemento.create_table_audit_id($1, $2, $3)" PL/pgSQL function pgmemento.create_table_audit(text,text,text,boolean,boolean,boolean) line 41 at PERFORM

I guess the create_table_audit_id function is called twice here:
First, create_table_audit finds nothing in the audit_schema_log table and calls create_schema_audit,

which in turn calls create_table_audit for this particular table. That call creates the id column, so the outer call fails.

My guess is there should be an ELSE block here, so the standalone create_table_audit_id is called only when the schema audit is already set up.

Installation instructions are broken, what to "make"?

https://github.com/pgMemento/pgMemento#quickstart says

To use pgMemento as an extension download and unzip the
pgmemento-.zip archive, change to extracted folder and
run the following commands from within a shell environment:

make
sudo make install

However, there is no Makefile in the root of the archive, nor in src. There is one in extension/pgxn but trying to use that results in an error:

~/pgMemento-0.7.1/extension/pgxn# make
grep: pgmemento.control: No such file or directory
grep: pgmemento.control: No such file or directory
make: *** No rule to make target 'sql/pgmemento.sql', needed by 'sql/pgmemento--.sql'.  Stop.

It's probably not the Makefile the README was referencing, though.

Installation worked perfectly fine for me using the described "classic way" but I wanted to let you know. :)

Summary of currently monitored schemas

Would you be able to tell me how I can find out which of the schemas of my database are monitored at any one time? I have multiple schemas in my databases and have started and stopped auditing for some of them multiple times (using the stop_auditing.sql and start_auditing.sql scripts) and lost track of it. Is there a way to find out which ones are currently monitored?
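A sketch against the log tables described in the README, assuming an open-ended txid_range marks a schema (or table) as currently audited (column names follow the v0.7 layout):

-- schemas where auditing is currently active
SELECT schema_name, txid_range
  FROM pgmemento.audit_schema_log
 WHERE upper(txid_range) IS NULL;

-- individual tables currently audited
SELECT schema_name, table_name
  FROM pgmemento.audit_table_log
 WHERE upper(txid_range) IS NULL;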

Bug: some updates are not tracked when a table is updated via function calls (in the same transaction)

Hi,

With this setup:

create table users (id int, name text, age int);

select pgmemento.create_table_audit(
  schemaname => 'public',
  tablename => 'users',
  audit_id_column_name => 'audit_log_id',
  log_old_data => TRUE,
  log_new_data => TRUE,  
  log_state => TRUE
);

create function update_age (_id int, _age int) returns void language 'sql' as $$ 
  update users set age = _age where id = _id; 
$$;

create function update_user (_id int, _name text, _age int) returns void language 'sql' as $$ 
  update users set name = _name where id = _id; 
  select update_age(_id, _age); 
$$;

insert into users values (1, 'foo', 10);

Data in the row_log:

select old_data, new_data from pgmemento.row_log;

-- old_data |                         new_data                          
-- ---------+-----------------------------------------------------------
--          | {"id": 1, "age": 10, "name": "foo", "audit_log_id": 1}

Update via the function above:

select update_user(1, 'foo updated', 20);
select old_data, new_data from pgmemento.row_log;

--     old_data    |                         new_data                          
-- ----------------+-----------------------------------------------------------
--                 | {"id": 1, "age": 10, "name": "foo", "audit_log_id": 1}
-- {"name": "foo"} | {"age": 20}

Whereas the expected data would be:

--     old_data               |                         new_data                          
-- ---------------------------+-----------------------------------------------------------
--                            | {"id": 1, "age": 10, "name": "foo", "audit_log_id": 1}
-- {"name": "foo", "age": 10} | {"name": "foo updated", "age": 20}

Or perhaps:

--     old_data    |                         new_data                          
-- ----------------+-----------------------------------------------------------
--                 | {"id": 1, "age": 10, "name": "foo", "audit_log_id": 1}
-- {"name": "foo"} | {"name": "foo updated"}
-- {"age": 10}     | {"age": 20}

Which is what happens when one runs the updates directly:

begin;
update users set name = 'foo updated' where id = 1; 
update users set age = 20 where id = 1; 
commit;

Tested on PostgreSQL version 12.2 with version 0.7.2 of pgMemento. Is this a valid bug or am I doing something wrong?

Prevent users from editing log tables directly

I'd like to prevent certain users from editing log tables.
This should be possible by revoking editing permissions on the log tables from a user and setting the trigger functions to run as SECURITY DEFINER so the log tables cannot be edited directly.
Is it reasonable to set trigger functions to SECURITY DEFINER by default?
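A minimal sketch of that approach; app_user is a placeholder role, and pgmemento.log_transaction() is taken from the error traces quoted elsewhere on this page:

-- take away direct write access to the log tables
REVOKE INSERT, UPDATE, DELETE
    ON ALL TABLES IN SCHEMA pgmemento
  FROM app_user;

-- let the trigger function write with its owner's privileges
ALTER FUNCTION pgmemento.log_transaction() SECURITY DEFINER;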

Getting view of data how it looked at specific event

Nice work you have done.

Not really an issue, but I hope I am on the right track.
Will this work? (customer with 4 fields)

CREATE TABLE public.customer
(
name text,
email text,
phone text,
address text,
id serial
)

-- all the events that occurred
select event_id from pgmemento.row_log where audit_id = ( select audit_id from public.customer where id = 1)

-- and now I want to see how the data looked at a specific event (7)

SELECT
event_id,
first_value(name) over (partition by name_partition order by event_id) as name,
first_value(email) over (partition by email_partition order by event_id) as email,
first_value(phone) over (partition by phone_partition order by event_id) as phone,
first_value(address) over (partition by address_partition order by event_id) as address
FROM (
select
event_id,
rl.changes->>'name' as name,
sum(case when (rl.changes->'name')::text is null then 0 else 1 end) over (order by event_id) as name_partition,
rl.changes->>'email' as email,
sum(case when (rl.changes->'email')::text is null then 0 else 1 end) over (order by event_id) as email_partition,
rl.changes->>'phone' as phone,
sum(case when (rl.changes->'phone')::text is null then 0 else 1 end) over (order by event_id) as phone_partition,
rl.changes->>'address' as address,
sum(case when (rl.changes->'address')::text is null then 0 else 1 end) over (order by event_id) as address_partition
from row_log rl
inner join public.customer target
on target.audit_id = rl.audit_id
where
rl.event_id = 7
) as q

Crash in pgmemento.restore_record with an array column.

There appears to be a bug with array columns when trying to restore records, giving this error message:

ERROR:  malformed array literal: "abc"
DETAIL:  Array value must start with "{" or dimension information.
CONTEXT:  PL/pgSQL function restore_records(integer,integer,text,text,bigint,boolean) line 11 at RETURN QUERY

This is a transcript of what I did to produce this behaviour. I have tried using both master version and v0.7, installing using the provided INSTALL_PGMEMENTO.sql script.

testdb=# create table test2 (id serial primary key, values text[]);
CREATE TABLE
testdb=# insert into test2 (values) values ('{abc,def}');
INSERT 0 1
testdb=# select * from test2;
 id |  values   | pgmemento_audit_id 
----+-----------+--------------------
  1 | {abc,def} |                  2
(1 row)

testdb=# update test2 set values = '{ghi,jkl}';
UPDATE 1
testdb=# select * from test2;
 id |  values   | pgmemento_audit_id 
----+-----------+--------------------
  1 | {ghi,jkl} |                  2
(1 row)

testdb=# select * from pgmemento.row_log;
 id | audit_id |                        event_key                         |          old_data          |                           new_data                           
----+----------+----------------------------------------------------------+----------------------------+--------------------------------------------------------------
  1 |        1 | 1594320164.98008;1594320164.98008;2298892;3;test;public  |                            | {"id": 1, "values": {"Test": 3}, "pgmemento_audit_id": 1}
  2 |        2 | 1594320573.5166;1594320573.5166;2301281;3;test2;public   |                            | {"id": 1, "values": ["abc", "def"], "pgmemento_audit_id": 2}
  3 |        2 | 1594320585.49882;1594320585.49882;2301282;4;test2;public | {"values": ["abc", "def"]} | {"values": ["ghi", "jkl"]}
(3 rows)

testdb=# select * from pgmemento.restore_records(1, 100, 'test2', 'public', 2, true) as (log jsonb);
ERROR:  malformed array literal: "abc"
DETAIL:  Array value must start with "{" or dimension information.
CONTEXT:  PL/pgSQL function restore_records(integer,integer,text,text,bigint,boolean) line 11 at RETURN QUERY

Did you consider adding the package to PGXN

Hi @FxKu, thank you for creating such an awesome tool!

Did you consider adding it to the PGXN registry? I guess this might help more people to find it.
If you don't have time, I could perhaps write an initial META.json spec file and make a PR to this repo, but I guess you'd still have to upload the final package to PGXN as the author.

Let me know what you think.
Cheers

Full record history between two dates

Hi all!

First of all... congrats on this amazing script! I just love it!

I have a newbie question (sorry)... I would like to have a "full history" of a record (or even better, a range of records). So, given an audit_id (or a set of them) and a "from date" and "to date", I would like to see the series of records ordered by date, where each record has the full state (all fields with their data) at a given date.

date        audit_id  field 1           field 2           ...  field n
2020/09/18  1         original value 1  original value 2  ...  original value n
2020/09/20  1         original value 1  updated value 2   ...  original value n
2020/09/25  1         updated value 1   updated value 2   ...  updated value n

This is kind of what you can do with temporal_tables.

Is there any "easy" way to achieve that?

I've seen that if I call

SELECT
  *
FROM
  pgmemento.restore_record(1, 9, 'company_limits_assets', 'platform', 1, TRUE)
  AS (log jsonb);

I get the final status... so I would have to query the transaction table, get all transactions between two dates, and call pgmemento.restore_record once per transaction... however, I think this is far from efficient.

Is there any other way?

Thank you!

-- Jorge
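Based on the pgmemento.restore_records output shown in an earlier issue on this page (one row per version, including stmt_time and transaction_id keys), something close to this history is available in a single call; a sketch that filters by date in the outer query:

SELECT *
  FROM pgmemento.restore_records(1, 9, 'company_limits_assets', 'platform', 1, TRUE)
    AS (log jsonb)
 WHERE (log ->> 'stmt_time')::timestamptz
       BETWEEN '2020-09-18' AND '2020-09-25';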
