Giter Site home page Giter Site logo

superfly / fly_postgres_elixir Goto Github PK

View Code? Open in Web Editor NEW
91.0 9.0 10.0 93 KB

Library for working with local read-replica postgres databases and performing writes through RPC calls to other nodes in the primary Fly.io region.

Home Page: https://hex.pm/packages/fly_postgres

License: Apache License 2.0

Elixir 100.00%

fly_postgres_elixir's Introduction

Fly Postgres

Helps take advantage of geographically distributed Elixir applications using Ecto and PostgreSQL in a primary/replica configuration on Fly.io.

Online Documentation

Mark Ericksen's ElixirConf 2022 presentation explains more about what this library is for and the problems it helps solve.

Installation

Fly.Postgres is published on Hex. Add it to your list of dependencies in mix.exs:

def deps do
  [
    {:fly_postgres, "~> 0.3.0"}
  ]
end

Then run mix deps.get to install Fly.Postgres and its dependencies. Note that fly_postgres depends on fly_rpc so it is pulled in as well. The configuration section below includes the relevant parts for fly_rpc.

After the packages are installed, you must create a database migration to add the watch_for_lsn_change stored procedure to the database:

mix ecto.gen.migration add_fly_postgres_proc

Open the generated migration in your editor and call the up and down functions like this:

defmodule MyApp.Repo.Local.Migrations.AddFlyPostgresProc do
  use Ecto.Migration

  def up do
    Fly.Postgres.Migrations.V01.up()
  end

  def down do
    Fly.Postgres.Migrations.V01.down()
  end
end

NOTE: The stored procedure is only executed when in remote (non-primary) regions. It actively watches for replication changes with the Postgres WAL (Write Ahead Log).

Now, run the migration to create the stored procedure:

mix ecto.migrate

Before you can use Fly.Postgres, it must be configured.

Configuration

Repo

This assumes your project already has an Ecto.Repo. To start using the Fly.Repo, here are the changes to make.

For a project named "MyApp", change it from this...

defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres
end

To something like this...

defmodule MyApp.Repo.Local do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres

  @env Mix.env()

  # Dynamically configure the database url based on runtime and build
  # environments.
  def init(_type, config) do
    Fly.Postgres.config_repo_url(config, @env)
  end
end

defmodule MyApp.Repo do
  use Fly.Repo, local_repo: MyApp.Repo.Local
end

This renames your existing repo to "move it out of the way" and adds a new repo to the same file. The new repo uses the Fly.Repo and links back to your project's Ecto.Repo. The new repo has the same name as your original Ecto.Repo, so your application will be referring to it now when talking to the database.

The other change adds the init function to your Ecto.Repo. This dynamically configures your Ecto.Repo to connect to the primary (writable) database when your application is running in the primary region. When your application is not in the primary region, it is configured to connect to the nearest read-only replica. The replica is like a fast local cache of all your data. The .Local idea is that your Ecto.Repo is configured to talk to it's physically "local" database.

The Fly.Repo performs all read operations like all, one, and get_by directly on the local replica. Other modifying functions like insert, update, and delete are performed on the primary database through proxy calls to a node in your Elixir cluster running in the primary region. That ability is provided by the fly_rpc library.

The value of the Mix.env() is set at build time to @env and passed in to let fly_postgres know about the project's build environment. Fly.Postgres only attempts to rewrite the database URL when your app is running in :prod mode. When running in :dev or :test, no Ecto.Repo configuration changes are made.

Migration Files

After changing your repo name, generating migrations can end up in the wrong place, or at least not where we want them.

You can override the inferred location in your config. The following keeps looking for migrations in the default location.

config :my_app, MyApp.Repo.Local,
  priv: "priv/repo"

Repo References

The goal with using this repo wrapper, is to leave the majority of your application code and business logic unchanged. However, there are a few places that need to be updated to make it work smoothly.

The following examples are places in your project code that need reference your actual Ecto.Repo. Following the above example, it should point to MyApp.Repo.Local.

  • test_helper.exs files make references like this Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo.Local, :manual)
  • data_case.exs files start the repo using Ecto.Adapters.SQL.Sandbox.start_owner! calls.
  • channel_case.exs need to start your local repo.
  • conn_case.exs need to start your local repo.
  • config/config.exs needs to identify your local repo module. Ex: ecto_repos: [MyApp.Repo.Local]
  • config/dev.exs, config/test.exs, config/runtime.exs - any special repo configuration should refer to your local repo.

With these project plumbing changes, your application code remains largely untouched!

Primary Region

If your application is deployed to multiple Fly.io regions, the instances (or nodes) must be clustered together.

Through ENV configuration, you tell the app which region is the "primary" region.

This example configuration says that the Sydney Australia region is the "primary" region. This is where the primary postgres database lives and where our application has fast write access to it.

fly.toml

[env]
  PRIMARY_REGION = "syd"

Application

Changes are needed in your application supervision tree.

application.ex

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    # ...

    children = [
      # Start the RPC server
      {Fly.RPC, []},
      # Start the Ecto repository
      MyApp.Repo.Local,
      # Start the supervisor for LSN tracking
      {Fly.Postgres.LSN.Supervisor, repo: MyApp.Repo.Local},
      #...
    ]

    # ...
  end
end

The following changes were made:

  • Added the Fly.RPC GenServer
  • Start your Repo
  • Added Fly.Postgres.LSN.Supervisor and told it which Repo to use.

Multiple Ecto.Repos?

If you have multiple Ecto.Repos setup in your application, you can still use Fly.Postgres. You will need an LSN Supervisor Tracker for each repository that you want to work with.

In your application file, it would be similar to this:

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    # ...

    children = [
      # Start the RPC server
      {Fly.RPC, []},
      # Start Ecto repositories
      MyApp.Repo.Local_1,
      MyApp.Repo.Local_2,
      # Start the supervisor for LSN tracking and name them.
      {Fly.Postgres.LSN.Supervisor, repo: MyApp.Repo.Local_1, name: :repo_tracker_1},
      {Fly.Postgres.LSN.Supervisor, repo: MyApp.Repo.Local_2, name: :repo_tracker_2},
      #...
    ]

    # ...
  end
end

Usage

Local Development

If you get an error like (ArgumentError) could not fetch environment variable "PRIMARY_REGION" because it is not set then see the README docs in fly_rpc for details on setting up your local development environment.

Automatic Usage

Normal calls like MyApp.Repo.all(User) are performed on the local replica repo. They are unchanged and work exactly as you'd expect.

Calls that modify the database like "insert, update, and delete", are performed through an RPC (Remote Procedure Call) in your application running in the primary region.

In order for this to work, the application must be clustered together and configured to identify which region is the "primary" region. Additionally, the application needs to be deployed to multiple regions. This assumes an instance of the application is running in the primary region as well.

A call to MyApp.Repo.insert(changeset) is proxied to perform the insert in the primary region. If the function is already running in the primary region, it just executes normally locally. If the function is running in a non-primary region, it makes a RPC execution to run on the primary.

The magic bits are that it additionally fetches the Postgres LSN (Log Sequence Number) for the database after making the change. The calling function then blocks, waits for the async database replication process to complete, and continues on once the data modification has replayed on the local replica.

In this way, it becomes seamless for you and your code! You get the benefits of being globally distributed and running closer to your users without re-designing your application!

By default, a Repo function that modifies the database is proxied to a primary server and waits for the data to be replicated locally before continuing. Passing the await: false option instructs the proxy code to not wait for replication to complete.

This is helpful when you only need the function result or the data is not immediately needed locally.

MyApp.Repo.insert(changeset, await: false)

MyApp.Repo.update(changeset, await: false)

MyApp.Repo.delete(item, await: false)

Explicit RPC Usage

When business logic code makes a number of changes or does some back and forth with the database, the "Automatic Usage" is too slow. An example is looping through a list and performing a database insert on each iteration. Waiting for each insert to complete and be locally replicated before performing the next iteration could be very slow!

For those cases, execute the function that does all the database work but do it in the primary region where it is physically close to the database.

Fly.Postgres.rpc_and_wait(MyModule, :do_complex_work, [arg1, arg2])

The function is executed in the primary region and locally, blocks until any relevant database changes are replicated.

Explicit RPC but don't Wait for Replication

Sometimes you might not want to wait for DB replication. Perhaps it's a fire-and-forget or the function result is enough.

For this case, you can use the fly_rpc library directly.

Fly.RPC.rpc_primary({MyModule, :do_work, [arg1, arg2]})

This is a convenience function which is equivalent to the following:

Fly.RPC.rpc_region(:primary, {MyModule, :do_work, [arg1, arg2]})

This also works when modifying the database too.

Using Ecto Queries in Migrations

If you are trying to run an Ecto Query in a Migration, it will fail when using MyApp.Repo.insert(...) or MyApp.Repo.update(...).

The solution is to explicitly use the local repo instead.

MyApp.Repo.Local.insert(...) or MyApp.Repo.Local.update(...)

Explanation:

It doesn't work to use the repo wrapper in a migration because the Tracker started in your MyApp.Application hasn't been started. When running migrations, the "Application" is not started because we could have GenServers that make queries and interact with the database. When running migrations, those parts of the application shouldn't be running because the very structure of the database can change.

It is safe to use MyApp.Repo.Local because on Fly.io, migrations are run in the primary region that already has direct access to the writable database.

In general, it is discouraged to use .update(...), .insert(...), and .delete(...) statements in migrations. For more information on why that presents problems and alternative options, check out this section of the Safe Ecto Migrations series.

Production Environment

Prevent temporary outages during deployments

When deploying on Fly.io, a new instance is rolled out before removing the old instance. This creates a period of time where both new and old instances are deployed together. By default, when deploying a Phoenix application, a new BEAM cookie is generated for each deployment. When the new instance rolls out with a new BEAM cookie, the old and new instances will not cluster together. BEAM instances must have the same cookie in order to connect. This is by design.

This means a newly deployed application running in a secondary region using fly_postgres is unable to perform writes to the older application running in the primary region. It is possible for writes to fail during that rollout window.

To prevent this problem, the BEAM cookie can be explicitly set instead of using a randomly generated one for new builds. When explicitly set, the newly deployed application is still able to connect and cluster with the older application running in the primary region.

Here is a guide to setting a static cookie for your project that is written into the code itself. This is fine to do because the cookie isn't considered a secret used for security.

fly.io/docs/app-guides/elixir-static-cookie/

When the cookie is static and unchanged from one deployment to the next, then applications can continue to cluster and access the applications running in primary region.

LSN Polling

The library polls the local database for what point in the replication process it has gotten to. It uses the LSN (Log Sequence Number) to determine that. Using this information, a process making changes against the primary database can request to be notified once the LSN it cares about has been replicated. This enables blocking operations that pause and wait for replication to complete.

The active polling only happens once a process has requested to be notified. When there are no pending requests, there is no active polling. Also, there can be many active pending requests and still there will be only 1 polling process. So each waiting process isn't polling the database itself.

The polling design scales well and doesn't perform work when there is nothing to track.

Backup Regions

By default, Fly.io defines some "backup regions" where your app may be deployed. This can happen even during a normal deploy where the app instance created for running database migrations could come up in a backup region.

The fly_postgres library makes an important assumption: that the app instance is always running in a region with either the primary or replica database.

To make your deployments reliably end up in a desired region, we'll disable the backup regions. Or rather, explicitly set which regions to use as backup regions.

To see the current set of backup regions:

fly regions backup list

If we want to serve 2 regions like lax and syd, then we can set the backup regions like this:

fly regions backup lax syd

This makes the backup regions only use the desired regions.

Ensuring a Deployment in your Primary Region

Currently it is possible to be configured for two regions, scale your app to 2 instances and end up with both app instances in the same region! Clearly, when one app needs to be running in the primary region in order to receive the RPC calls, it's really important that it consistently be there!

A Fly.io config option lets us have more control over that.

fly scale count 2 --max-per-region 1

Using the --max-per-region 1 ensures each region will not get an unbalanced number like 2 in one place.

If your scale count matches up with your desired number of regions, then they will be evenly distributed.

fly_postgres_elixir's People

Contributors

brainlid avatar oo6 avatar rudebono avatar sorentwo avatar tvdfly avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

fly_postgres_elixir's Issues

Feature request: Repo.transaction_on_primary()

Currently, it is recommended that you change your 'non-distributed' code:

def update(some, params) do
  Myapp.Repo.transaction(fn ->
    # ... complicated work involving DB manipulation here.
  end)
end

into

def update(some, params) do
  Fly.Postgres.rpc_and_wait(__MODULE__, :do_update, [some, params])
end

def do_update(some, params) do
  Myapp.Repo.transaction(fn ->
    # ... complicated work involving DB manipulation here.
  end)
end

However, the following alternative would require much less boilerplate and therefore fit the library's goal as 'easy to introduce, easy to rip out':

def update(some, params) do
  Myapp.Repo.transaction_on_primary(fn ->
    # ... complicated work involving DB manipulation here.
  end)
end

The main required change, would be to amend the fly_rpc library to allow passing an anonymous function (which would desugar to calling e.g. Fly.RPC.rpc(node, :erlang, :apply, [fun, []]).

The one drawback of this change, is that logging an anonymous function is not as insightful as logging a MFA-tuple.
However, inspecting an anonymous function like the one constructed above will still look like:

#Function<3.103361360/1 in YourModule.update/2>

In my opinion, that should be good enough if you need to debug the RPC.

Fly Postgres breaks with Fly.RPC 0.2.1

Hi.
I updated my dependencies and it failed saying that Fly.is_primary?() does not exist.
Its because of a breaking change in Fly RPC:
superfly/fly_rpc_elixir@5f0add3

I updated all instances in my code to Fly.RPC.is_primary?
But it's still failing saying the code in this library also has calls to Fly.is_primary?

E.g. Here:
https://github.com/superfly/fly_postgres_elixir/blob/00378fa89e72cd655ef4397adbeed304bd51ea1b/lib/lsn/lsn_reader.ex#L87C8-L87C25

I haven't checked all the code, but there may be other lines where this is called.
Thanks in advance!

DB DNS URL config assumes running inside a Fly network

Currently, there is an implied dependence on Fly.io databases because of how the DNS is modified so a "nearest" database can found. This helps when an instance is auto-deployed to a backup region other than an expected region where a Postgres primary or replica database can be found.

Backup regions can be disabled, or limited to regions where there are databases present.

The goal with a change is to enable the library to work more easily outside of Fly.io networks.

Configure timeout for RPC calls

Hi @brainlid! 😄
I continue play with the lib. I have some timeout errors on some queries:

19:45:50.563 [error] RPC notification timeout calling Elixir.Brandkit.Repo.Local.insert/2}
19:45:50.566 [debug] QUERY OK db=2.4ms
rollback []
19:45:50.567 [error] GenServer #PID<0.3459.0> terminating
** (stop) time out
    (fly_postgres 0.1.11) lib/fly_postgres.ex:136: Fly.Postgres.rpc_and_wait/4
    ...

It might be a good idea to be able to increase the timeout from a configuration for methods:

def request_and_await_notification(%Fly.Postgres.LSN{source: :insert} = lsn, timeout \\ 5_000) do

or
def await_notification(%Fly.Postgres.LSN{source: :insert} = lsn, timeout \\ 5_000) do

What do you think?

Writes fail (RPC timeouts) whenever the DB is not available in the current region

Attempting to use the Fly repo from any region in which a replica is not yet running, seems to work perfectly fine...

... that is, until any mutating calls (insert, update etc), or calls like Fly.Postgres.rpc_and_wait are made. Instead of transparently falling back to directly using the sole available DB connection (and maybe logging a warning?), these RPC calls will always time out.


At first I thought this was just a 'principle of least surprise' kind of issue that users which follow the guide to the letter would not stumble across. But actually the problematic situation can also arise during normal operation: When the read replica in the local region fails, DB traffic from that region is again transparently rerouted to the primary DB and the same situation occurs!

Tightly bound to Fly.io?

Hello, this looks like a great lib! I have been looking at it and it seems like a pretty elegant solution to multiread/write over high latency deployments. I do understand why its currently tightly integrated to fly_rpc, but I am wondering if there is any appetite to mutate this module into a more general solution that is configurable where fly.io dependency is just one of the backends/configurations possible?

For instance a backend being created for onprem where the write head and locals can be configured and maybe failover via config, and a aws or other PAASs?

I think this could be a more general solution to this type of problem, but as it stands its very bound to fly (don't get me wrong I really like Fly.io but for various reasons it may not be a usable environment for certain projects). Totally understand if there is resistance for this -- I just like this design and it seems like it could be a much more general solution in the toolbox (to the end that I am thinking about reimplementing a similar idea with pluggable backends).

Recent change in flyctl makes url rewrite unnecessary?

This could be a total non-issue, but I couldn't help but notice:

I recently created a multi-region PostgreSQL cluster, and apparently fly pg attach set the host part of the DATABASE_URL to top2.nearest.of.<pg app>.internal (as far as I understand, this was a recent change in flyctl). So the way url rewrite works now, app instance in "primary" region connects to <primary-region>.top2.nearest.of.<pg app>.internal, and instances in other regions connect to top1.nearest.of.top2.nearest.of.<pg app>.internal.

Everything seems to be working fine, but I'm wondering if this new default version of DATABASE_URL makes primary/replica host rewrites unnecessary, or maybe breaks some assumption and potentially introduces a problem?

For example, I have an app with instances in fra and ord, and a pg cluster with leader and one replica in fra and another replica in ord.

Here's what the rewritten hostname resolves to for app instance in fra :

root@3871ba0f:/# dig +short aaaa fra.top2.nearest.of.redexor-pg-cluster.internal
fdaa:0:344e:a7b:23c3:0:71e1:2
fdaa:0:344e:a7b:23c6:0:71e0:2

the fra subdomain seems to have no effect (the third IP is a replica in ord):

root@3871ba0f:/# dig +short aaaa fra.top3.nearest.of.redexor-pg-cluster.internal
fdaa:0:344e:a7b:23c3:0:71e1:2
fdaa:0:344e:a7b:23c6:0:71e0:2
fdaa:0:344e:a7b:2052:0:71e2:2

Same machine, without the rewrite:

root@3871ba0f:/# dig +short aaaa top2.nearest.of.redexor-pg-cluster.internal
fdaa:0:344e:a7b:23c3:0:71e1:2
fdaa:0:344e:a7b:23c6:0:71e0:2

And in ord :

root@01b4e350:/# dig +short aaaa top1.nearest.of.top2.nearest.of.redexor-pg-cluster.internal
fdaa:0:344e:a7b:2052:0:71e2:2

(so apparently top1.nearest.of here - unlike fra - does provide additional filtering?)
and without the rewrite:

root@01b4e350:/# dig +short aaaa top2.nearest.of.redexor-pg-cluster.internal
fdaa:0:344e:a7b:2052:0:71e2:2
fdaa:0:344e:a7b:23c3:0:71e1:2

Maybe I'm missing something - I'm just exploring the fly.io platform and this library - but as far as I understand, the region name is injected into the DB url for primary app instances because we can't rely on hostnames set by older versions of flyctl (that is, something.internal without top2.nearest.of) to consistently resolve to pg instance in primary region - but maybe with the new version, which explicitly asks for the "nearest" pg instance, we can?

And if we can't, then I guess we would need to replace top2.nearest.of with primary region name, rather than prepend?

Confusion regarding db read performance after the switch fo fly_postgres_elixir.

Hello,

I have a test application that is essentially based on Hello-Phoenix (https://github.com/fly-apps/hello-phoenix ).
I initially had this test application including postgres db deployed exclusively in the fra region. Using the classic Repo and running a simple read request (Repo.all(User)), the response time was several hours at an average of 30ms. After I switched the test application to use fly_postgres_elixir and deployed the application including the postgres volumes to the fra and lax regions, the application's read performance deteriorated from 30ms to an average of 300ms. Unfortunately, I cannot explain this performance degradation.

App-Status

vscode ➜ /workspace (main ✗) $ fly status
App
  Name     = billowing-star-1538          
  Owner    = personal                     
  Version  = 13                           
  Status   = running                      
  Hostname = billowing-star-1538.fly.dev  

Deployment Status
  ID          = 2ab52089-3db7-6cc9-5ef8-77c266ffa7fa         
  Version     = v13                                          
  Status      = successful                                   
  Description = Deployment completed successfully            
  Instances   = 2 desired, 2 placed, 2 healthy, 0 unhealthy  

Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS      RESTARTS CREATED   
54bf2e53 app     13      fra    run     running 1 total, 1 passing 0        2h26m ago 
1b8ff1ed app     13      lax    run     running 1 total, 1 passing 0        2h27m ago 

Database Status

vscode ➜ /workspace (main ✗) $ fly status -a hello-phoenix-database

App
Name = hello-phoenix-database
Owner = personal
Version = 3
Status = running
Hostname = hello-phoenix-database.fly.dev

Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
3e6d8469 app 3 lax run running (replica) 3 total, 3 passing 0 3h13m ago
ebaa9d9a app 3 fra run running (replica) 3 total, 3 passing 0 2021-11-01T18:17:40Z
ae18bd04 app 3 fra run running (leader) 3 total, 3 passing 0 2021-11-01T18:17:30Z

log Region fra

2021-11-03T11:47:40.774 app[54bf2e53] fra [info] 11:47:40.774 request_id=FrQG0VRHT3tGIS4AAJkB [info] GET /
2021-11-03T11:47:40.775 app[54bf2e53] fra [info] 11:47:40.775 request_id=FrQG0VRHT3tGIS4AAJkB [info] Sent 200 in 808µs
2021-11-03T11:47:49.533 app[54bf2e53] fra [info] 11:47:49.532 request_id=FrQG015Tpj4V_H8AAJkR [info] GET /users
2021-11-03T11:47:49.838 app[54bf2e53] fra [info] 11:47:49.838 request_id=FrQG015Tpj4V_H8AAJkR [info] Sent 200 in 305ms
2021-11-03T11:47:49.953 app[54bf2e53] fra [info] 11:47:49.953 request_id=FrQG03dit6HuymYAAJkh [info] GET /users
2021-11-03T11:47:50.258 app[54bf2e53] fra [info] 11:47:50.258 request_id=FrQG03dit6HuymYAAJkh [info] Sent 200 in 304ms
2021-11-03T11:47:52.420 app[54bf2e53] fra [info] 11:47:52.419 request_id=FrQG1ApnEQCQShcAAJkx [info] GET /users
2021-11-03T11:47:52.725 app[54bf2e53] fra [info] 11:47:52.725 request_id=FrQG1ApnEQCQShcAAJkx [info] Sent 200 in 305ms
2021-11-03T11:47:54.173 app[54bf2e53] fra [info] 11:47:54.173 request_id=FrQG1HLsVgC4PokAAJlB [info] GET /
2021-11-03T11:47:54.174 app[54bf2e53] fra [info] 11:47:54.174 request_id=FrQG1HLsVgC4PokAAJlB [info] Sent 200 in 871µs

Region lax

2021-11-03T11:44:17.573 app[1b8ff1ed] lax [info] 11:44:17.571 request_id=FrQGogRxGvi3RxQAAFhB [info] GET /
2021-11-03T11:44:17.574 app[1b8ff1ed] lax [info] 11:44:17.572 request_id=FrQGogRxGvi3RxQAAFhB [info] Sent 200 in 1ms
2021-11-03T11:44:17.838 app[1b8ff1ed] lax [info] 11:44:17.836 request_id=FrQGohQ_qQBdPLgAAFhR [info] GET /users
2021-11-03T11:44:18.009 app[1b8ff1ed] lax [info] 11:44:18.006 request_id=FrQGohQ_qQBdPLgAAFhR [info] Sent 200 in 170ms

Connection will fail when deployed to backup region or no replica in current region

Hi!

I'm currently testing this library and am pretty happy with how it works.
There's one issue for me: the selection of the replica to use is pretty naive.

replica_uri = %URI{uri | host: "#{current}.#{uri.host}", port: 5433}

Here, we infer that the region our app is running on has to have a replica available. If not, the DNS lookup will fail and we will get a 502. This is okay under normal circumstances, but fails if we don't have a replica in all of our backup regions as well.

For example, I have the following regions:

❯ flyctl regions list
Region Pool: 
dfw
ewr
fra
Backup Region: 
ams
atl
cdg
iad
ord
vin

I also have my attached pg cluster replicated in dfw, ewr and fra - no problem, they all work fine.
Now, I redeployed and for some reason, vin was used for one of the VMs. Outside of using this library, no problem, my users would still have a good experience. With it, though, our database URL was set to vin.postgres.internal, which does not exist. This can only work if we have a postgres replica in every backup region as well, even if there is no app instance there and it goes completely unused. This is inefficient and wasteful.
I don't know what solution we could implement here, but we should probably find a way to find the closest database instance instead of just assuming that one is available in the same region.

Documentation request: Oban

Hey Mark! Awesome repo! I'm working on adding this to a project and it all seems pretty straightforward except for Oban.

The Problem

For those who may not be familiar, Oban allows you to do job processing with Postgres as its data store. Normally you configure Oban with something like this in your config/config.exs file:

config :my_app, Oban,
  repo: MyApp.Repo,
  plugins: [Oban.Plugins.Pruner],
  queues: [default: 10]

After installing fly_postgres_elixir, Oban fails to start with the following exception:

** (Mix) Could not start application my_app: MyApp.Application.start(:normal, []) returned an error: shutdown: failed to start child: Oban
    ** (EXIT) an exception was raised:
        ** (ArgumentError) expected :repo to be an Ecto.Repo, got: MyApp.Repo
            (oban 2.10.1) lib/oban/config.ex:193: Oban.Config.validate_opt!/1
            (elixir 1.12.3) lib/enum.ex:930: Enum."-each/2-lists^foreach/1-0-"/2
            (oban 2.10.1) lib/oban/config.ex:55: Oban.Config.new/1
            (oban 2.10.1) lib/oban.ex:151: Oban.start_link/1
            (stdlib 3.15.2) supervisor.erl:414: :supervisor.do_start_child_i/3
            (stdlib 3.15.2) supervisor.erl:400: :supervisor.do_start_child/2
            (stdlib 3.15.2) supervisor.erl:384: anonymous fn/3 in :supervisor.start_children/2
            (stdlib 3.15.2) supervisor.erl:1234: :supervisor.children_map/4
            (stdlib 3.15.2) supervisor.erl:350: :supervisor.init_children/2
            (stdlib 3.15.2) gen_server.erl:423: :gen_server.init_it/2
            (stdlib 3.15.2) gen_server.erl:390: :gen_server.init_it/6
            (stdlib 3.15.2) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

If I change the configuration to point to MyApp.Repo.Local then the application will start up, but if I understand correctly this will cause problems down the line. Oban jobs are inserted into the database using Oban.insert/2 which would then try to insert into the local database which may or may not be a read-only replica.

My Solution

I would love feedback on this solution as I'm sure there might be a better way to do it, but I updated my application module to look like this:

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      {Fly.RPC, []},
      MyApp.Repo.Local,
      {Fly.Postgres.LSN.Tracker, repo: MyApp.Repo.Local},
      ...
      oban(),
      MyAppWeb.Endpoint
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]

    children
    |> Enum.reject(&is_nil/1)
    |> Supervisor.start_link(opts)
  end
  ...
  defp oban do
    if Fly.is_primary?() do
      config = Application.fetch_env!(:my_app, Oban)
      {Oban, config}
    end
  end
end

Then I created a new Background module that uses a macro to wrap all the Oban functions so that they will all be called against the primary region every time.

defmodule MyApp.Background do
  @moduledoc """
  This module serves as a wrapper around Oban so that it will work properly
  with `Fly.Repo` instead of an `Ecto.Repo`.
  """

  for {func, arity} <- Oban.__info__(:functions), func not in [:child_spec, :init, :start_link] do
    args = Macro.generate_arguments(arity, __MODULE__)

    @doc """
    See documentation for Oban.#{func}/#{arity}
    """
    def unquote(func)(unquote_splicing(args)) do
      if Fly.is_primary?() do
        Oban.unquote(func)(unquote_splicing(args))
      else
        Fly.RPC.rpc_region(Fly.primary_region(), Oban, unquote(func), unquote(args))
      end
    end
  end
end

Am I overlooking a simpler solution? I was thinking about making a video about how to do this, but I was wondering if it might make sense to have it live in the official documentation to make it easier for others to find. I also considered making the Background module a supervisor and have it determine whether or not to start the Oban process so that everything would be contained within that one module. Then I think it could maybe make sense to release it as a separate library if that would have value to others. I wasn't sure if it made sense to do that considering the relative simplicity of it. But I would love to know others' thoughts.

opentelemetry_ecto does not write traces when using fly_postgres

I attempted to add opentelemetry_ecto to a project using fly_postgres and am getting no errors, but also no traces. When I am not using fly_postgres, I get the traces I expect. My hope is that this is easy to get working; however, if that is not the case, if you could give me an idea of how to approach adding the functionality I would appreciate it immensely. Thanks so much.

Lib crashes on startup without `base_name` option provided in childspec

Due to this change - https://github.com/superfly/fly_postgres_elixir/blob/main/lib/lsn/lsn_tracker.ex#L53 the Fly.Postgres.LSN.Tracker process now requires a base_name parameter on startup as there is no default value. Without this provided, the application referencing the library will fail to start up.

Expected result

Error logs

16:44:43.747 [notice] Application App exited: Explorer.Application.start(:normal, []) returned an error: shutdown: failed to start child: Fly.Postgres.LSN.Tracker
    ** (EXIT) an exception was raised:
        ** (KeyError) key :base_name not found in: [repo: App.Repo.Local]
            (elixir 1.13.4) lib/keyword.ex:559: Keyword.fetch!/2
            (fly_postgres 0.3.0) lib/lsn/lsn_tracker.ex:53: Fly.Postgres.LSN.Tracker.start_link/1
            (stdlib 3.17.2) supervisor.erl:414: :supervisor.do_start_child_i/3
            (stdlib 3.17.2) supervisor.erl:400: :supervisor.do_start_child/2
            (stdlib 3.17.2) supervisor.erl:384: anonymous fn/3 in :supervisor.start_children/2
            (stdlib 3.17.2) supervisor.erl:1250: :supervisor.children_map/4
            (stdlib 3.17.2) supervisor.erl:350: :supervisor.init_children/2
            (stdlib 3.17.2) gen_server.erl:423: :gen_server.init_it/2

Error running migrations on version 2.0

Hi! :D

I am trying to update the package to the latest version.

When I run the migrations (with an update query) using the new settings, I get the following error:

❯ MIX_ENV=test mix ecto.migrate
Compiling 1 file (.ex)

18:43:07.232 [info]  == Running 20220123222023 MyApp.Repo.Local.Migrations.AddCodeToAlbums.up/0 forward

18:43:07.236 [info]  alter table albums

18:43:07.305 [debug] QUERY OK source="albums" db=2.0ms
UPDATE "albums" AS a0 SET "code" = $1 [<<1, 127, 35, 99, 86, 236, 122, 124, 253, 61, 216, 142, 130, 189, 151, 143>>]
** (ArgumentError) errors were found at the given arguments:

  * 1st argument: the table identifier does not refer to an existing ETS table

    (stdlib 3.17) :ets.lookup(:"lsn_tracker_ets_cache_Elixir.Fly.Postgres.LSN.Tracker", :repo)
    (fly_postgres 0.2.0) lib/lsn/lsn_tracker.ex:47: Fly.Postgres.LSN.Tracker.get_repo/1
    (fly_postgres 0.2.0) lib/fly_postgres.ex:173: Fly.Postgres.__rpc_lsn__/4
    (fly_postgres 0.2.0) lib/fly_postgres.ex:150: Fly.Postgres.rpc_and_wait/4
    priv/repo/migrations/20220123222023_add_code_to_albums.exs:16: MyApp.Repo.Local.Migrations.AddCodeToAlbums.up/0
    (ecto_sql 3.7.1) lib/ecto/migration/runner.ex:279: Ecto.Migration.Runner.perform_operation/3
    (stdlib 3.17) timer.erl:166: :timer.tc/1
    (ecto_sql 3.7.1) lib/ecto/migration/runner.ex:25: Ecto.Migration.Runner.run/8

I don't understand why this error appears only when running migrations.
Do I have to load the app before running the migrations?

I added the following line on my application.ex

children =
      [
        {Cluster.Supervisor, [topologies, [name: MyApp.ClusterSupervisor]]},
        {Fly.RPC, []},
        MyApp.Repo.Local,
        {Fly.Postgres.LSN.Tracker, repo: MyApp.Repo.Local},
       ...
      ]

I'm missing something?

Thanks!

Do I need a replica in every backup region?

When fly deploys my app to a backup region I get a connection error because I only have databases for every region in the non-backup region pool.

It seems to me this library could either have a mapping of "nearby" regions and fallback to a nearby database if there isn't a replica in the exact region, or take control of creating replicas itself.

Is there something obvious I'm missing that makes this not a big deal?

Queries are still made in a failed transaction, leading to errors being thrown

Description

When an sql transaction fails, fly_postgres_elixir will still try to call select CAST(pg_current_wal_insert_lsn() AS TEXT) in the context of the transaction, resulting in the error

** (Postgrex.Error) ERROR 25P02 (in_failed_sql_transaction) current transaction is aborted, commands ignored until end of transaction block

This results in the process crashing and prevents the error response result of an Ecto.Multi from being made available to the application code and being handled appropriately.

Expected result

  • Transactions / Ecto.Multi works as with default App.Repo.Local on the primary
    • Ecto.Multi returns with a standard {:error, stage, error_response, changes_so_far} tuple
  • The library doesn't try to wait for lsn on a failed transaction

Actual result

  • Postgrex error

Notes

Failed test output

  2) test register_owner/1 returns error if owner already exists (app.AdminTest)
     test/app/admin_test.exs:36
     ** (Postgrex.Error) ERROR 25P02 (in_failed_sql_transaction) current transaction is aborted, commands ignored until end of transaction block

         query: select CAST(pg_current_wal_insert_lsn() AS TEXT)
     code: assert {:error, changeset} = Admin.register_owner(@valid_registration_params)
     stacktrace:
       (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:756: Ecto.Adapters.SQL.raise_sql_call_error/1
       (fly_postgres 0.2.6) lib/lsn/lsn.ex:67: Fly.Postgres.LSN.current_wal_insert/1
       (fly_postgres 0.2.6) lib/fly_postgres.ex:192: Fly.Postgres.__rpc_lsn__/4
       (fly_postgres 0.2.6) lib/fly_postgres.ex:169: Fly.Postgres.rpc_and_wait/4
       (ecto 3.7.1) lib/ecto/multi.ex:716: Ecto.Multi.apply_operation/5
       (elixir 1.12.3) lib/enum.ex:2385: Enum."-reduce/3-lists^foldl/2-0-"/3
       (ecto 3.7.1) lib/ecto/multi.ex:690: anonymous fn/5 in Ecto.Multi.apply_operations/5
       (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:1013: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
       (db_connection 2.4.0) lib/db_connection.ex:1512: DBConnection.run_transaction/4
       (ecto 3.7.1) lib/ecto/repo/transaction.ex:20: Ecto.Repo.Transaction.transaction/4
       (explorer 0.0.1) lib/app/admin.ex:69: app.Admin.register_owner/1
       test/app/admin_test.exs:38: (test)

Can't start the local repo locally

Hi 👋

I'm trying to add read replicas to my application with this library but I'm struggling to make it work locally.

I'm starting the LocalRepo in my application.ex.

children = [
      # Start the RPC server
      {Fly.RPC, []},
      # Start the Ecto repository
      Homepage.LocalRepo,
      # Start the tracker after your DB.
      {Fly.Postgres.LSN.Tracker, []},
     ...
    ]

The LocalRepo looks like this:

defmodule Homepage.LocalRepo do
  use Ecto.Repo,
    otp_app: :homepage,
    adapter: Ecto.Adapters.Postgres

  # Dynamically configure the database url based on runtime environment.
  def init(_type, config) do
    {:ok, Keyword.put(config, :url, Fly.Postgres.database_url())}
  end
end


defmodule Homepage.Repo do
  use Fly.Repo, local_repo: Homepage.LocalRepo
end

But it seems that calling Fly.Postgres.database_url() requires to set the variables: PRIMARY_REGION and FLY_REGION.
Which is a bit odd given that those variables are set by Fly.

If I run mix phx.server, I get: (ArgumentError) could not fetch environment variable "PRIMARY_REGION" because it is not set

I only have config :fly_postgres, :rewrite_db_url, true set in runtime.exs.

Maybe I'm missing something, do you have any suggestion ?

Thank you.

Maybe a bug: Trying to connect to nearest.of.localhost

Hi.
We updated fly_postgres to 0.2.0 and ran into a problem.
When runing mix ecto.reset locally I got:

09:16:33.228 [error] GenServer #PID<0.436.0> terminating
** (DBConnection.ConnectionError) tcp connect (top2.nearest.of.localhost:5432): non-existing domain - :nxdomain
    (db_connection 2.4.1) lib/db_connection/connection.ex:100: DBConnection.Connection.connect/2
    (connection 1.1.0) lib/connection.ex:622: Connection.enter_connect/5
    (stdlib 3.17) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: nil
State: Postgrex.Protocol

The problem was that it was trying to connect to top2.nearest.of.localhost:5432 instead of localhost:5432.

I made it work locally by adding an alias in /etc/hosts

127.0.0.1.   top2.nearest.of.localhost:5432

But I think this is not the expected behaviour when running the app locally. Am I right?

Please let me know if you need any more information.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.