
Stream your PostgreSQL, MySQL or MongoDB databases anywhere, fast.

License: MIT License


Lapidus

Getting Started

MySQL and PostgreSQL databases are currently fully supported. MongoDB supports inserts and deletes; updates, however, return a copy of the operation (for example, a $rename operation will return a $set for the new field and an $unset for the old field) rather than the object as it exists in the database. Redis support is on the way. Lapidus can be used as a daemon or as a Node.js module. Support for piping line-delimited JSON to other processes is a high priority.

To install

npm install -g lapidus

PostgreSQL

You'll need PostgreSQL 9.4 or higher with logical replication configured and the JSONCDC plugin installed and loaded. Any PostgreSQL fork that ships with pg_recvlogical should be compatible.

To install the JSONCDC logical decoding plugin using pgxn:

sudo easy_install pgxnclient
pgxn install jsoncdc --testing

NOTE: JSONCDC also publishes .deb packages in its releases repo if you prefer not to install through pgxn.

To enable logical decoding and the JSONCDC plugin add the following lines to your postgresql.conf:

wal_level = logical
max_wal_senders = 10
max_replication_slots = 10

shared_preload_libraries = 'jsoncdc'

Create a user with replication privileges and add them to your pg_hba.conf file.

Afterwards, restart PostgreSQL and verify that it starts correctly:

service postgresql restart
service postgresql status

WARNING: PostgreSQL retains WAL segments until every logical replication slot has consumed its data. If you try out Lapidus and fail to delete your replication slot, you will likely run out of disk space.
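To avoid orphaned slots while experimenting, you can create and drop slots with pg_recvlogical itself; your_database and lapidus_slot below are placeholders for your own database and slot names:

```shell
# Create a logical replication slot that uses the jsoncdc output plugin
pg_recvlogical -d your_database --slot lapidus_slot -P jsoncdc --create-slot

# Drop the slot when you are done, so PostgreSQL can recycle WAL segments
pg_recvlogical -d your_database --slot lapidus_slot --drop-slot
```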

For information on managing replication slots, consult the PostgreSQL documentation.

MySQL

You'll need MySQL 5.1.15 or higher with binary logging configured. Forks of MySQL should be compatible but have not been tested.

Add the following lines to your my.cnf:

server-id        = 1
log_bin          = /var/log/mysql/mysql-bin.log
max_binlog_size  = 100M  # WARNING: make sure to set this to a sane value or you may fill your disk
expire_logs_days = 10    # Optional
binlog_format    = row

Create a user with replication permissions and select permissions:

GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'lapidus'@'localhost' IDENTIFIED BY 'secure-password';

Restart MySQL and verify that it starts correctly:

service mysql restart
service mysql status

MongoDB

We test against MongoDB 3.x; however, older versions should work. You'll need to set up MongoDB as a replica set. If you're not actually using replication during development, you will need to connect and run:

// DO NOT DO THIS IN PRODUCTION
rs.initiate()
db.getMongo().setSlaveOk();

For more information on setting up replication in MongoDB check out the docs.

Configuration

Lapidus will search for lapidus.json in the current working directory. You can specify a different configuration by passing it to the constructor or using the -c flag on the terminal. For a list of command line options run lapidus --help.

Here is a sample configuration file that connects to two PostgreSQL backends, two MySQL backends, and one MongoDB backend, and publishes all events to NATS using the NATS plugin:

{
  "backends": [
    {
      "type": "mysql",
      "hostname": "127.0.0.1",
      "username": "jacob",
      "database": "jacob",
      "password": "2PQM9aiKMJX5chv76gYdFJNi",
      "serverId": 1,
      "excludeTables": [
        "sessions"
      ]
    },

    {
      "type": "mysql",
      "hostname": "127.0.0.1",
      "username": "sawyer",
      "database": "sawyer",
      "password": "2PQM9aiKMJX5chv76gYdFJNi",
      "serverId": 2
    },

    {
      "type": "postgresql",
      "host": "127.0.0.1",
      "user": "lapidus",
      "database": "lapidus",
      "password": "2PQM9aiKMJX5chv76gYdFJNi",
      "slot": "lapidus_slot"
    },

    {
      "type": "postgresql",
      "host": "127.0.0.1",
      "user": "hurley",
      "database": "hurley",
      "password": "2PQM9aiKMJX5chv76gYdFJNi",
      "slot": "hurley_slot"
    },

    {
      "type": "mongo",
      "hostname": "127.0.0.1",
      "username": "lapidus",
      "database": "lapidus",
      "password": "2PQM9aiKMJX5chv76gYdFJNi",
      "replicaSet": "rs0"
    }
  ],
  "plugins": {
    "nats": {
      "server": "nats://localhost:4222"
    }
  }
}

Plugins

NATS

Lapidus ships with a lightweight NATS plugin. NATS is an open-source, high-performance, lightweight cloud native messaging system.

Configuration

Publish to NATS for all backends (one connection per backend):

{
  "backends": [...],
  "plugins": {
    "nats": {
      "server": "nats://localhost:4222"
    }
  }
}

Publish to NATS for a specific backend:

{
  "backends": [
    {
      "type": "postgresql",
      "host": "the.hatch",
      "user": "desmond",
      "database": "darma",
      "password": "notpennysboat123",
      "slot": "walts_raft",

      "plugins": {
        "nats": {
          "server": "nats://localhost:4222"
        }
      }
    }
  ]
}

Events

Insert, Update and Delete events are published using the subject schema.table.pk. Here are example events:
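The subject scheme can be illustrated with a small helper. The event shape ({ schema, table, pk }) is assumed for illustration and may not match Lapidus's internal event objects:

```javascript
// Build a NATS subject from an event using the schema.table.pk scheme
// described above. The event shape here is a hypothetical example.
function subjectFor(evt) {
  return [evt.schema, evt.table, evt.pk].join('.');
}

console.log(subjectFor({ schema: 'public', table: 'users', pk: 42 })); // public.users.42
```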

Insert:

// TODO: sample insert

Update:

// TODO: sample update

Delete:

// TODO: sample delete

Production usage

Lapidus worked well for 5,000 concurrent clients on $5-10 Digital Ocean droplets using the MySQL and PostgreSQL backends and per-tenant socket.io servers fed by the NATS plugin. Typical latency for MySQL -> Lapidus -> NATS within a datacenter was 1-3 ms. Because our messages were fire-and-forget, we never solved for the gotchas below, and end-user lag and CPU usage were negligible. Because NATS disconnects slow consumers, we got good tenant isolation without much work.

Gotchas

To prevent a deluge of events during bulk data loads or backup restoration, make sure your procedures/scripts stop Lapidus and/or delete/reset any persistent subscriptions or message queues external to Lapidus as needed.

Resource requirements

CPU

CPU usage is light. As a rule of thumb, measure your peak MySQL CPU usage (after enabling binary logging) and multiply it by 0.075; that's roughly how much CPU Lapidus will use at peak.
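As a worked example of that rule of thumb (the 40% measurement below is made up):

```javascript
// Hypothetical peak MySQL CPU usage, measured with binary logging enabled.
const mysqlPeakCpu = 40; // percent

// Rule of thumb from the text: Lapidus peaks at roughly 7.5% of that.
const lapidusPeakCpu = mysqlPeakCpu * 0.075;

console.log(lapidusPeakCpu); // ~3 percent
```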

Memory

Generally speaking, each worker requires 10-15 MB of RAM.

Your peak memory usage is dictated by V8's garbage collection. When running the TPC-C benchmark against the MySQL worker on 8 cores, memory sat around 70 MB and peaked at 120 MB before garbage collection brought it back down to 70 MB.

I tested for memory leaks by running 5 million transactions through the TPC-C benchmark and things look solid; if you notice any issues, please report them.

License

Lapidus is MIT licensed. The artwork in the header is Copyright Matt Greenholt.

Contributors

Matt Greenholt has kindly allowed the use of his artwork. Check out his blog and flickr.

Contributors

dependabot[bot], jmealo, timwis


Issues

Oracle support?

This project looks insanely awesome and I can't wait to try it out. Did you happen to come across any Oracle LogMiner parsers while building this?

Production ready MySQL

  • ZongJi error handling improvements
  • ZongJi connection handling
  • ZongJi should write the last known binlog position in a resilient way that protects against partial writes, enabling Lapidus to restart at the last known good position.
  • When configured, Lapidus should gracefully read the last known binlog position and pass that as the position to start()
  • Write tests for connection related error handling and events
  • Write tests to verify "auto-resume" works properly

Outdated NPM version

Just realised that the latest npm package version and the master branch aren't aligned, and there are significant changes and bugfixes between the two versions.

Would it be possible to publish a new npm version, or at least make clear (with tags) which version is deployed?

Production Ready PostgreSQL

  • Write test(s) to verify that spawned pg_recvlogical processes die immediately if Node exits.
  • Find way to get the LSN for each event (this will have to happen in the logical decoding plugin)
  • After each event write the latest LSN to file in such a way that a partial write will allow Lapidus to start at the last known good position.
  • When configured, Lapidus should gracefully read the last known LSN from file and use that as the starting position when spawning pg_recvlogical
  • If pg_recvlogical dies, Node should respawn it
  • Write tests for connection related error handling and events
  • Write test suite for logical decoding plugins to verify all built-in data types work as expected; this should be plugin agnostic and be architected in such a way that it can be used to compare multiple plugins for accuracy/performance comparison and regression tests.

Before and after update record

Is it possible to send event data from both before and after a record is updated, so that we can get only the changed column values?
Thanks

Support schema change notifications and introspection reloading from PostgreSQL

This issue serves as a discussion for adding schema change notifications and introspection reloading support to PostgreSQL.

Relevant documentation pages:

The following event triggers seem like a good start for most introspection/schema tasks.

ALTER TABLE
ALTER SEQUENCE
ALTER SCHEMA
ALTER VIEW

CREATE SCHEMA
CREATE SEQUENCE
CREATE TABLE
CREATE TABLE AS
CREATE VIEW

DROP SCHEMA
DROP SEQUENCE
DROP TABLE
DROP VIEW

GRANT

It would be trivial to support all events and push filtering into the consumer, or to use code generation.

It appears that the plpgsql example shows a good path forward as we can emit generic WAL messages easily.

I think it makes sense to specify an introspection function or materialized view and either refresh that view or emit the output of the function as a generic WAL message.

Optionally, we could emit the event trigger as a generic WAL message or NOTIFY and allow processing elsewhere.

The question becomes whether we package schema reloading as an installable extension or suggest that this functionality is pushed into jsoncdc.

Pinging relevant parties on the following issues:

Readme: Link to advice on how to protect against/recover from WAL slot deletion failure

This project sounds interesting. However, the current readme's warning:

WARNING: PostgreSQL will hold onto the WAL logs until all logical replication slots have consumed their data. This means that if you try out Lapidus and fail to delete your slot that you'll likely run out of disk space on your system.

sounds like my server could easily go on a rampage if the process using lapidus crashes or is killed for whatever reason. The readme should either explain how that's a wrong interpretation, or give a link to advice on how to prevent this scenario (e.g. monitor and auto-fix).
The link to the manual for general WAL slot management doesn't currently look like it answers that particular concern.

Known Issues

Should fix

  • MySQL worker doesn't attempt to reconnect if the initial connection times out (add retry behavior/configurable timeout)
  • Event objects are not immutable

Nice to have

  • zstd compression is not supported for MySQL (this would need to be added to zongji)
