carbonite's People

Contributors

anagromataf, andreasknoepfle, bamorim, boon-wego, carlosmorette, cschmatzler, hannahbit, kianmeng, laiboonh, maltoe

carbonite's Issues

Transaction "reentrancy"

Is the following use case supported? I cannot tell for sure from the documentation. Consider the case where multiple functions each call Carbonite.Transaction.changeset(), in varying orders depending on the workflow, all during a single Ecto transaction. All changes should be associated with the same Carbonite transaction row. I'm guessing a Carbonite transaction id is added to the changeset, so the question is whether Carbonite.Transaction.changeset() looks for it and thus creates only a single transaction row no matter how many times it is called. Is that what happens? That would be fantastic, but then I suppose another problem arises when one Ecto transaction involves multiple changesets. Perhaps Carbonite puts the transaction id in, say, the process dictionary?
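For what it's worth, Carbonite's docs (quoted in a later issue on this page) state that insert_transaction/3 ignores subsequent calls within the same database transaction, equivalent to ON CONFLICT DO NOTHING. A minimal sketch under that assumption; the changeset names, meta, and outer wrapping transaction are hypothetical:

```elixir
# Two code paths, each defensively registering a carbonite transaction.
# When both run inside one outer Repo.transaction, only the first
# insert_transaction call creates a row; the second is ignored, so all
# changes attach to the same transactions row (the first call's meta wins).
MyApp.Repo.transaction(fn ->
  Ecto.Multi.new()
  |> Carbonite.Multi.insert_transaction(%{meta: %{type: "workflow"}})
  |> Ecto.Multi.update(:step_one, changeset_one)
  |> MyApp.Repo.transaction()

  Ecto.Multi.new()
  |> Carbonite.Multi.insert_transaction(%{meta: %{type: "workflow"}})
  |> Ecto.Multi.update(:step_two, changeset_two)
  |> MyApp.Repo.transaction()
end)
```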

Track replaced data in changes

Hi! Thank you for such an awesome library.

I have a question regarding the workflow with old data. I already have a lot of records in the database and don't want to lose any change history.

So my problem is: when I integrate Carbonite into an existing table and it tracks the first change (e.g. a record update), I won't know what data I had before that update (since there is no initial insert transaction).

I saw that in earlier versions you stored both new and old data, so it was possible to build a diff from it. But then you removed the old data for some reason.

So what is the suggested workflow here? Should I backfill my transactions/changes from existing data? Or are there better options?

Thank you!
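Not an official recipe, but one conceivable backfill sketch: the trigger schema exposes a store_changed_from column, so a one-off no-op update of every existing row inside a dedicated Carbonite transaction might capture each row's prior values. MyApp.User, the updated_at touch, and whether a metadata-only update records enough under your trigger's column filters are all assumptions to verify:

```elixir
# Hypothetical backfill: touch all existing rows once so the UPDATE
# trigger records them, grouped under a single "backfill" transaction.
now = DateTime.utc_now()

Ecto.Multi.new()
|> Carbonite.Multi.insert_transaction(%{meta: %{type: "carbonite_backfill"}})
|> Ecto.Multi.update_all(:touch, MyApp.User, set: [updated_at: now])
|> MyApp.Repo.transaction()
```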

Make trigger raise message more clear

Hello,

we just spent some time debugging a missing Carbonite.insert_transaction, since the message the DB raises about the FK constraint does not clearly say "carbonite" anywhere.

Since one only sees:

(Postgrex.Error) ERROR 23503 (foreign_key_violation) DELETE on table foo.bar without prior INSERT into foo.transactions

it only becomes apparent that this is Carbonite-related once you take a closer look at where the transactions table comes from.
Someone who doesn't know Carbonite so well might think that this is an internal Postgres table or something else.

Maybe we can add "from one of your carbonite triggers" to the raised message to make it clear to everyone what to search for.

Wdyt @maltoe

Array changes are not always detected

Hi 👋

When tracking changes for an array field, Carbonite doesn't always detect a difference. I think it's due to how the fields are compared (<@).

When an item is added to the list and the previous item is not removed, the field is not detected as changed.

-- the comparison in the trigger (roughly): NOT change_row.data @> old_field_jsonb

-- detected as changed (returns true)
SELECT NOT '["new"]'::jsonb @> '["old"]'::jsonb;

-- detected as changed (returns true)
SELECT NOT '["new"]'::jsonb @> '["old", "new"]'::jsonb;

-- NOT detected, although it should be (returns false)
SELECT NOT '["new", "old"]'::jsonb @> '["old"]'::jsonb;

-- a symmetric comparison would catch it (returns true)
SELECT '["new", "old"]'::jsonb IS DISTINCT FROM '["old"]'::jsonb;

Optionally tracking SELECT statements

Hello there,

First of all, thank you so much for this amazing library. It has been a real relief for us to have it!

Right now, Carbonite tracks all mutations in the database (insert, update, and delete), but there is no option to (optionally) track reads as well for audit logging purposes (e.g. knowing which user has read which data). My questions are:

  1. Are you folks planning on adding this capability?
  2. If not, do you have an interest in having it?

Using @schema_prefix breaks prefix option

Since we set a @schema_prefix in the schemas, some of our functions that use the :prefix option of the Ecto.Repo functions are broken:

  • override_mode/2

etc.

Reason

This is due to the fact that the @schema_prefix takes priority over any "query-level" prefix set, see for example:

iex(1)> from(t in Carbonite.Trigger) |> put_query_prefix("foobar") |> Map.from_struct()
%{
  aliases: %{},
  assocs: [],
  combinations: [],
  distinct: nil,
  from: %Ecto.Query.FromExpr{
    source: {"triggers", Carbonite.Trigger},
    file: "iex",
    line: 8,
    as: nil,
    prefix: "carbonite_default",
    params: [],
    hints: []
  },
  group_bys: [],
  havings: [],
  joins: [],
  limit: nil,
  lock: nil,
  offset: nil,
  order_bys: [],
  prefix: "foobar",
  preloads: [],
  select: nil,
  sources: nil,
  updates: [],
  wheres: [],
  windows: [],
  with_ctes: nil
}

iex(2)> from(t in Carbonite.Trigger) |> put_query_prefix("foobar") |> R.one()
[debug] QUERY ERROR source="triggers" db=0.0ms queue=1.0ms idle=1494.7ms
SELECT t0."id", t0."table_prefix", t0."table_name", t0."primary_key_columns", t0."excluded_columns", t0."filtered_columns", t0."store_changed_from", t0."mode", t0."override_xact_id", t0."inserted_at", t0."updated_at" FROM "carbonite_default"."triggers" AS t0 []
** (Postgrex.Error) ERROR 42P01 (undefined_table) relation "carbonite_default.triggers" does not exist

Same goes for Repo.one(Carbonite.Trigger, prefix: "foobar").

Side-note 1

Apparently the only way to override the @schema_prefix is by specifying the prefix option directly on the from expression. However, that one does not allow variables but expects a string at compile time.

iex(1)> from(t in Carbonite.Trigger, prefix: "foobar") |> Map.from_struct()
%{
  ...
  from: %Ecto.Query.FromExpr{
    source: {"triggers", Carbonite.Trigger},
    file: "iex",
    line: 11,
    as: nil,
    prefix: "foobar",
    params: [],
    hints: []
  },
  ...
}

Side-note 2

While this is correctly explained in some parts of the Ecto docs, e.g. for Repo.one, it's definitely wrong for Repo.update_all/3 where it says

The prefix to run the query on (such as the schema path in Postgres or the database in MySQL). This overrides the prefix set in the query and any @schema_prefix set in the schema.

Repo.delete/3 and delete_all/2 carry the same statement; it's likely incorrect everywhere.

EDIT: It's correct for everything in the "Schema API" (Ecto.Repo.Schema), where the :prefix option does override the @schema_prefix. For any "Query API" operation (Ecto.Repo.Queryable), the @schema_prefix takes precedence. 🤷

Possible fixes

  • Remove the @schema_prefix again (and live with the fact that a simple Repo.all(Carbonite.Foo) will never do)
  • Remove the @schema_prefix again and change the default prefix name to public 🤷 -> probably too late for that

ERROR 42P01 (undefined_table) relation "transactions" does not exist

Hello, sorry for the n00b question.

I'm getting the error "relation "transactions" does not exist" with the following code. The transactions table exists in the carbonite_default schema. How do I get Ecto to prefix the table name with the schema name?

Thank you

%{meta: %{type: "signed_in"}, carbonite_prefix: "carbonite_default"}
|> Carbonite.Transaction.changeset()
|> Repo.insert!()
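One likely cause (an assumption, since it depends on the Carbonite version in use): carbonite_prefix inside the attrs map has no effect on which schema Ecto targets. Passing the prefix to the repo call directly may be what's missing here:

```elixir
# Sketch: hand the schema prefix to Ecto itself via the :prefix option,
# instead of placing it in the attrs map where the changeset ignores it.
%{meta: %{type: "signed_in"}}
|> Carbonite.Transaction.changeset()
|> Repo.insert!(prefix: "carbonite_default")
```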

Can we save in meta the id of the entity being created?

For example, in my existing multi:

Multi.new()
|> Multi.insert(:insert_user, user_changeset)
|> Multi.run(:auth_providers, fn _repo, %{insert_user: user} ->
  insert_user_profile(user.id, profile_changeset)
end)

What if I want to save the user id inside meta?

Multi.new()
# |> Carbonite.Multi.insert_transaction(%{meta: %{user_id: user.id}}) <- How do I do this?
|> Multi.insert(:insert_user, user_changeset)
|> Multi.run(:auth_providers, fn _repo, %{insert_user: user} ->
  insert_user_profile(user.id, profile_changeset)
end)
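Since the transactions row is inserted before the user exists, one approach is to merge the id into the transaction's meta in a later step. A sketch building on Carbonite.Query.current_transaction/1 (quoted in another issue on this page); the step name and the meta-merging update are my assumptions:

```elixir
alias Ecto.Multi

Multi.new()
|> Carbonite.Multi.insert_transaction(%{meta: %{type: "user_created"}})
|> Multi.insert(:insert_user, user_changeset)
|> Multi.update(:put_user_id_into_meta, fn %{insert_user: user} ->
  # Load the carbonite transaction belonging to the current DB
  # transaction and merge the freshly generated user id into its meta.
  transaction = MyApp.Repo.one!(Carbonite.Query.current_transaction())
  Ecto.Changeset.change(transaction, meta: Map.put(transaction.meta, "user_id", user.id))
end)
|> MyApp.Repo.transaction()
```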

Schema doesn't drop

Running Carbonite.Migrations.drop_schema/1 doesn't work if the tables have not been dropped before. Therefore the function should be updated to drop the tables first.

Add drop_tables migration function

At the moment there is only drop_schema/1. When the carbonite tables share a schema with other tables, drop_schema/1 cannot be used, and the tables, functions, and types created by Carbonite have to be dropped manually. Therefore it would be nice to have a drop_tables function for that.

Transaction is logged even when no change is involved

When a mutation is attempted that updates an entity without any actual changes, the transaction is still recorded, but it has no change record associated with it. I would expect no transaction to be inserted, since the changes in the Ecto.Changeset are %{}.

[debug] QUERY OK db=0.1ms idle=289.6ms
begin []
[debug] QUERY OK db=3.9ms
INSERT INTO "carbonite_default"."transactions" ("meta","inserted_at") VALUES ($1,$2) ON CONFLICT ("id") DO UPDATE SET "id" = EXCLUDED."id" RETURNING "inserted_at","meta","xact_id","id" [%{type: "user_updated"}, ~U[2022-10-31 16:36:13.186498Z]]
[debug] QUERY OK db=1.1ms
commit []

I figured at least updated_at for the entity would be logged in the change for the transaction, but that doesn't even seem to be what triggers the transaction. I tried excluding updated_at through the trigger configuration, and that does not stop the transaction from being created.

What is the best way to address this?
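As an application-level workaround (a sketch; the empty-changes check and all names are mine), the transaction insert can simply be skipped when the changeset carries no changes:

```elixir
# Only register a carbonite transaction when the changeset actually
# changes something; with no changes, the UPDATE is a no-op anyway.
multi =
  if changeset.changes == %{} do
    Ecto.Multi.new()
  else
    Carbonite.Multi.insert_transaction(Ecto.Multi.new(), %{meta: %{type: "user_updated"}})
  end

multi
|> Ecto.Multi.update(:user, changeset)
|> MyApp.Repo.transaction()
```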

Approaches for using fixtures with unit tests

I'm currently evaluating Carbonite (the library is extremely elegant, by the way -- thank you!).

Ideally I'd like my unit tests to catch any cases where the Carbonite.Transaction is missing, but am running into some difficulty here. In more detail:

I set up Carbonite and added a trigger to a single table (in a Phoenix 1.7 project, on the off chance that matters), and then immediately ran the unit tests. The test that creates a new record in that table fails:

** (Postgrex.Error) ERROR 23503 (foreign_key_violation) INSERT on table public.foos without prior INSERT into carbonite_default.transactions

Perfect, exactly what I was hoping to see. I updated the code that inserts a new record to call Carbonite.Multi.insert_transaction as part of the transaction, and re-ran the tests.

That test now passes, but I also have some other failures related to tests that update and delete from this table. The underlying error is the same as above, but the difference is that these tests use a fixture that first creates a record (so that the tests can modify or delete the record and then assert on the outcome).

I updated the test fixture to call Carbonite.Multi.insert_transaction as part of the transaction, and re-ran the tests.

However, they all passed... which is not what I wanted to see, because my update and delete context methods haven't been updated yet. The unit tests pass, but the production app crashes when updating or deleting records from this table.

I tried modifying the fixture to call Carbonite.Multi.override_mode instead of Carbonite.Multi.insert_transaction, like so:

Ecto.Multi.new()
|> Carbonite.Multi.override_mode()
|> Ecto.Multi.insert(:foo, foo)
|> Repo.transaction()

However, the tests still pass with this change.

From the docs:

In tests using Ecto's SQL sandbox, subsequent calls to transactional operations (even to the same operation twice) are wrapped inside the overarching test transaction, and hence also effectively call insert_transaction/3 within the same transaction.

...

Therefore, insert_transaction/3 ignores subsequent calls within the same database transaction (equivalent to ON CONFLICT DO NOTHING), discarding metadata passed to all calls but the first.

I'm assuming that the behaviour I'm seeing is because of the above?

Digging around in the source, I came across:

@doc """
Returns an `t:Ecto.Query.t/0` that can be used to select or delete the "current" transaction.

This function is useful when your tests run in a database transaction using Ecto's SQL sandbox.

## Example: Asserting on the current transaction

When you insert your `Carbonite.Transaction` record somewhere inside your domain logic, you do
not wish to return it to the caller only to be able to assert on its attributes in tests. This
example shows how you could assert on the metadata inserted.

    # Test running inside Ecto's SQL sandbox.
    test "my test" do
      some_operation_with_a_transaction()
      assert current_transaction_meta() == %{"type" => "some_operation"}
    end

    defp current_transaction_meta do
      Carbonite.Query.current_transaction()
      |> MyApp.Repo.one!()
      |> Map.fetch(:meta)
    end

...

"""

If I use override_mode when creating a record (in the fixture), then Carbonite.Query.current_transaction() |> Repo.one!() fails with:

** (Ecto.NoResultsError) expected at least one result but got none in query:

     from t0 in Carbonite.Transaction,
       prefix: "carbonite_default",
       where: t0.id == fragment("CURRVAL(CONCAT(?::VARCHAR, '.transactions_id_seq'))", ^"carbonite_default"),
       where: t0.xact_id == fragment("pg_current_xact_id()")

So I could assert on Carbonite.Query.current_transaction() returning a result in my tests, but (with the exception of the fixtures) I'd rather not have them know anything about Carbonite.

Which brings me to my questions... am I doing something wrong? Is there a better way to approach this?

Mode condition is broken

The mode condition compares the current transaction id to override_transaction_id which can be NULL. This case needs to be handled explicitly.

Transaction ID not stable across Postgres installations

Problem

Turns out pg_current_xact_id is not stable (as in, persisted) across Postgres installations / database backups from pg_dump. Who would have thought. The docs say something about "unique for the lifespan of an installation". 🤦

Consequently, when a database is restored from a backup into a new Postgres installation, Carbonite begins to insert rows into the transactions table with lower id values. 💥

Solution

  • We switch to using a sequence for generating unique, monotonically increasing id values for the transactions. Sequences are persisted in database dumps.
  • We reference the transactions table from the changes table by calling currval() on the sequence. currval() keeps returning the same number within a database session since the last time nextval() was called. In other words: With two transactions incrementing the sequence value, all inserts into the changes table will still see the correct value as retrieved within its own transaction.
  • However, only using a sequence would break the "transaction grouping" of changes.
    • Scenario: transaction A writes an entry to the transactions table, setting the current sequence value. Transaction B fails to write an entry to the transactions table (= a bug in the code); all of its inserts into changes would call currval() and get transaction A's value. 💥
    • Solution for this: we "tag" the id values with the current transaction id in a new xact_id field.

We discarded this alternative:

Pseudo-code, with << being a bit-shift left and % a modulo:

id of transaction = (nextval(sequence) << 8) + (txid_current() % 256)

Instead of shifting, we can set the increment option of the sequence to 256.

How to add seed data

Hello,

A project seeds data by calling various functions that use Ecto. The script is, for example, located at priv/repo/migrations/seed.exs. A snippet:

MyApp.Repo.delete_all(User)
UserSeeder.admin()

When the project is converted to use Carbonite, the script fails because of the foreign key constraints to the transaction table.

Carbonite's README references insert_migration_transaction/1 as a potential solution but the example doesn't pass any arguments to it. I tried passing a function:

** (RuntimeError) could not find migration runner process for #PID<0.94.0>
    (ecto_sql 3.9.0) lib/ecto/migration/runner.ex:304: Ecto.Migration.Runner.runner/0
    (ecto_sql 3.9.0) lib/ecto/migration/runner.ex:84: Ecto.Migration.Runner.migrator_direction/0
    (carbonite 0.5.0) lib/carbonite/migrations.ex:330: Carbonite.Migrations.insert_migration_transaction/2

Then I realized the seed script is not a migration. Thanks to migrations_test.exs, I converted the script into a migration (see below). I'm not sure what insert_migration_transaction_after_begin/0 does, but at least there are no errors! 🥇 I have no idea what I'm doing. 😸

defmodule PetalPro.Repo.Migrations.SeedUsers do
  alias MyApp.Accounts.{User, UserSeeder}
  use Ecto.Migration

  import Carbonite.Migrations
  insert_migration_transaction_after_begin()

  def up do
    if Mix.env() == :dev do
      MyApp.Repo.delete_all(User)
      UserSeeder.admin()
      UserSeeder.normal_user(%{
        email: "[email protected]",
        name: "Sara",
        password: "password",
        confirmed_at: Timex.now() |> Timex.to_naive_datetime()
      })
      UserSeeder.random_users(20)
    end
  end
end

Incompatible with PostgreSQL 16.0?

With PostgreSQL 16.0 we started seeing the following in our CI, which was using the postgres:latest Docker image:

[...]
15:29:12.091 [info] alter table carbonite_default.triggers

15:29:12.092 [info] create table carbonite_default.outboxes
** (Postgrex.Error) ERROR 22P02 (invalid_text_representation) invalid input syntax for type xid8: "0::TEXT::xid8"
    (ecto_sql 3.10.2) lib/ecto/adapters/sql.ex:1047: Ecto.Adapters.SQL.raise_sql_call_error/1
    (elixir 1.15.5) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (ecto_sql 3.10.2) lib/ecto/adapters/sql.ex:1154: Ecto.Adapters.SQL.execute_ddl/4
    (ecto_sql 3.10.2) lib/ecto/migration/runner.ex:327: Ecto.Migration.Runner.log_and_execute_ddl/3
    (elixir 1.15.5) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (elixir 1.15.5) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2
    (ecto_sql 3.10.2) lib/ecto/migration/runner.ex:290: Ecto.Migration.Runner.perform_operation/3
    (stdlib 5.0.2) timer.erl:270: :timer.tc/2
Error: Process completed with exit code 1.

Have you verified that Carbonite is compatible with PG 16.0? We've pinned postgres:15.4 in our CI, so it's not an issue for us for now, FYI.

Multiple Carbonite inserts in a single transaction

Hello, how does one make multiple carbonite inserts in a transaction that touches multiple tables?

Ecto.Multi.new()
|> Carbonite.Multi.insert_transaction(%{meta: %{type: "account_updated"}})
|> Ecto.Multi.update("account", account_changeset)
|> Carbonite.Multi.insert_transaction(%{meta: %{type: "create_wallet"}})
|> Ecto.Multi.insert("wallet", %Wallet{create_params | wallet_id: wallet_id, uri: uri})
|> Test.Repo.transaction()

This always fails with ":carbonite_transaction is already a member of the Ecto.Multi". In this case, what would you recommend as best practice for including Carbonite in a transaction that hits multiple tables in one go?
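Since an Ecto.Multi cannot contain the same step name twice, and (per the docs quoted in another issue here) a second insert within the same database transaction would be ignored anyway, the usual shape is presumably a single transaction row whose meta describes the whole unit of work. A sketch; the combined meta layout is my assumption:

```elixir
Ecto.Multi.new()
# One carbonite transaction per database transaction; describe the
# whole unit of work in a single meta map.
|> Carbonite.Multi.insert_transaction(%{meta: %{type: "account_updated", actions: ["update_account", "create_wallet"]}})
|> Ecto.Multi.update(:account, account_changeset)
|> Ecto.Multi.insert(:wallet, wallet_changeset)
|> Test.Repo.transaction()
```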

Thanks.

Outbox redo

  • explicit outboxes table with memo
  • lock these records instead of advisory lock
  • purge only if all outboxes have seen a transaction
  • order transactions by id, not inserted_at
  • make min_age optional
