benbjohnson / litestream

Streaming replication for SQLite.

Home Page: https://litestream.io

License: Apache License 2.0

Go 98.78% Makefile 0.54% HCL 0.09% PowerShell 0.15% Dockerfile 0.16% Python 0.28%
Topics: sqlite, replication, s3

litestream's Introduction

Litestream

Litestream is a standalone disaster recovery tool for SQLite. It runs as a background process and safely replicates changes incrementally to another file or S3. Litestream only communicates with SQLite through the SQLite API so it will not corrupt your database.

If you need support or have ideas for improving Litestream, please join the Litestream Slack or visit the GitHub Discussions. Please visit the Litestream web site for installation instructions and documentation.

If you find this project interesting, please consider starring the project on GitHub.

Acknowledgements

While the Litestream project does not accept external code patches, many of the most valuable contributions come in the form of testing, feedback, and documentation. These help harden the software and streamline usage for other users.

I want to give special thanks to individuals who invest much of their time and energy into the project to help make it better:

Huge thanks to fly.io for their support and for contributing credits for testing and development!

Contribution Policy

Initially, Litestream was closed to outside contributions. The goal was to reduce burnout by limiting the maintenance overhead of reviewing and validating third-party code. However, this policy is overly broad and has prevented small, easily testable patches from being contributed.

Litestream is now open to code contributions for bug fixes only. Features carry a long-term maintenance burden so they will not be accepted at this time. Please submit an issue if you have a feature you'd like to request.

If you find mistakes in the documentation, please submit a fix to the documentation repository.

litestream's People

Contributors

asg017, benbjohnson, btoews, ekristen, evanphx, hifi, josegonzalez, kalafut, lstoll, schnz, testwill, tydavis


litestream's Issues

Errors in replication?

Hi, I'm getting the following errors when running replicate:

/home/xxx/datadrive/code/nzquarantine/storage.db: sync error: create generation: initialize shadow wal: read header: EOF

/home/xxx/datadrive/code/nzquarantine/storage.db: sync: new generation "21b638752c28dcb5", wal overwritten by another process

/home/xxx/datadrive/code/nzquarantine/storage.db(s3): snapshot: creating 21b638752c28dcb5/00000000 t=56.330497ms

Command run: litestream replicate storage.db s3://xxx

I am able to restore but it is missing latest data.

Tested with AWS S3 on 64-bit Linux. SQLite is used via the Python standard library, no SQLAlchemy or anything.

Only a single, single-threaded Python process is using the database.

Container Image

Ciao!

I see some issues regarding running litestream as a sidecar container (as an example): #29

But I couldn't find a Dockerfile or any other container image definition in the repo. Do you have plans to create one and publish the container image to a public registry?

Thanks!

cannot verify wal state: ...-wal: no such file or directory

I'm testing out a litestream setup on k8s, where a python app container and litestream container share the db via a local volume. The db is created via an init container that just runs CREATE TABLE before either of them start.

Here is what init looks like:

litestream v0.3.2
initialized db: /db/hello.sqlite3
replicating to: name="s3" type="s3" bucket="..." path="hello.sqlite3" region=""
litestream /db/hello.sqlite3: init: cannot determine last wal position, clearing generation (primary wal header: EOF)
litestream /db/hello.sqlite3: sync: new generation "b7852795bc0c7d42", no generation exists

Afterwards, I get no logs until I run a query through the app (read or write). Then I get a steady stream of litestream /db/hello.sqlite3: sync error: cannot verify wal state: stat /db/hello.sqlite3-wal: no such file or directory until I shut down litestream. I've tried issuing the WAL pragma from the app's connection and enabling autocommit mode, but neither seems to make a difference.

Similar to #58, I'm wondering if I'm doing something wrong or whether this is a litestream bug. If you think it's the latter, I can try to work on a simpler repro (I haven't tried it locally yet).

Replicate only on WAL changes

Thanks again for this software!

I'm using litestream with a service that has infrequent database writes (like a handful of times per day).

I noticed my AWS dashboard reporting many PUTs and revisited the documentation and realized that litestream replicates the WAL every 10s.

A nice to have feature would be if litestream skips the WAL replication if no local changes have occurred since last sync.

An even nicer feature for my scenario would be a "sync only on write" mode where instead of replicating every N seconds, it replicates immediately after each change to the WAL, but otherwise does not replicate.

Low priority, since my understanding is that even 300k S3 PUTs is only ~$1.50/month, but it'd be cool if litestream could run entirely in the free tier for infrequent-write scenarios.

Replicate whole directories

This is a great project! Would it be possible to configure replication for whole directories containing multiple database files, with an arbitrary number of nested directories?

Client libraries for replication (iOS, etc)

Hi,

Thanks for releasing such an interesting and exciting project. I'm curious whether you have any input on using this project for single-writer, multi-reader replication of data to mobile devices for offline use. I know the question has been asked before about using this for replication, but mostly as it relates to clustering or data-at-edge use cases.

I've done this in the past at the API layer but found there to be performance and reliability challenges. Wondering if simply feeding the device as a read replica would be simpler and easier to deliver.

2-Way Replication to S3

Hey, this is awesome!

As a potential enhancement, would it be possible to periodically watch the S3 bucket for updates and restore whenever a new version is published?

This would be excellent for low-load distributed systems. Multiple nodes would be able to write to their local SQLite databases; each change is pushed up to the S3 bucket, and litestream then sees this and updates the local SQLite databases on all of the other nodes within that system.

Document how to configure Prometheus exporter port.

Thanks for opensourcing this project!

I have an existing Prometheus exporter on port 9090 so I need to change it.

❯ ./dist/litestream replicate -config ../litestream_testbed/config.yml
litestream (development build)
initialized db: /home/darreng/code/litestream_testbed/sourcedb/db.sqlite
replicating to: name="file" type="file" path="/home/darreng/code/litestream_testbed/replicadb/db.sqlite"
serving metrics on http://localhost:9090/metrics
cannot start metrics server: listen tcp :9090: bind: address already in use

I modified my config to include the "addr" setting and I was able to change the Prometheus metrics port.

addr: 0.0.0.0:9091
dbs:
  - path: /home/xxx/code/litestream_testbed/sourcedb/db.sqlite
    replicas:
      - path: /home/xxx/code/litestream_testbed/replicadb

Thanks

Darren

Parallelize restore

Currently, WAL files are loaded/read/applied sequentially during restore. We could parallelize some of the processing (transferring over the network, decompression, etc) so restores can happen faster.
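A minimal sketch of the idea (not Litestream's actual code; fetchWAL is a hypothetical stand-in for the network read and decompression): fetch segments concurrently with a bounded worker pool, but apply them strictly in index order, since WAL frames must be replayed sequentially.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchWAL stands in for downloading and decompressing one WAL segment.
func fetchWAL(index int) []byte {
	return []byte(fmt.Sprintf("wal-%08d", index))
}

// restore fetches WAL segments in parallel but applies them strictly in
// index order, preserving the sequential replay semantics of a restore.
func restore(n, workers int, apply func(int, []byte)) {
	results := make([]chan []byte, n)
	for i := range results {
		results[i] = make(chan []byte, 1)
	}

	// Bounded worker pool: at most `workers` fetches in flight.
	sem := make(chan struct{}, workers)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			results[i] <- fetchWAL(i)
		}(i)
	}

	// Apply sequentially as each segment becomes available in order.
	for i := 0; i < n; i++ {
		apply(i, <-results[i])
	}
	wg.Wait()
}

func main() {
	var order []int
	restore(8, 4, func(i int, b []byte) { order = append(order, i) })
	fmt.Println(order) // segments are always applied 0..7 in order
}
```

The expensive parts (network transfer, decompression) overlap, while the apply step stays serial.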

Disconnect DB if init() fails

The DB.init() function can fail because of a write lock on the database; however, the DB.db field has already been assigned the database handle, so a subsequent call will not try to re-initialize:

litestream/db.go

Lines 384 to 389 in f2d3acc

// Connect to SQLite database & enable WAL.
if db.db, err = sql.Open("sqlite3", dsn); err != nil {
	return err
} else if _, err := db.db.Exec(`PRAGMA journal_mode = wal;`); err != nil {
	return fmt.Errorf("enable wal: %w", err)
}

The check in DB.init():

litestream/db.go

Lines 359 to 362 in f2d3acc

// Exit if already initialized.
if db.db != nil {
	return nil
}

The db field should only be assigned if init() completes successfully.

See also: #58

Small typo in tutorial

In the Continuous replication section you insert a blue grape:

INSERT INTO fruits (name, color) VALUES ('grape', 'blue');

But you get back a purple one when restoring fruits3.db:

SELECT * FROM fruits;
apple|red
banana|yellow
grape|purple

Nothing major of course, but it confused me for a second.

Thanks for this project, it looks really useful and simple to use.

Rules for how to develop an application with a litestream-able SQLite database

Hello,

First of all, thank you for this great project! I have never used SQLite in a position where I would have a live replication set up, so I wanted to have a hands-on test scenario to see how the tool works. Unfortunately, I cannot get it to work in a meaningful way with what I have tried so far:

  • Brave history
  • Firefox history

I see lots of different errors. I understand fully that what I am trying to do is... questionable. But I would like to understand how the tool works under some kind of load and I don't have any so far. What are the rules that I need to follow in my application to make sure litestream replication does not fail in all the ways I have seen it fail?

So far I can see these rules:

  • Do not lock the database file (Brave and FF)
  • Enable WAL, journal is not enough (Brave, though there seems to be a History-wal file)
  • Do not truncate the WAL on shutdown or restart (or at all?) (FF)

With Brave:

litestream v0.3.2
initialized db: /Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History
replicating to: name="s3" type="s3" bucket="mybucket" path="Brave-History.sqlite" region=""
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History: sync error: enable wal: database is locked
...
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History: sync error: ensure wal exists: no such table: _litestream_seq
...
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History: sync: new generation "d051be38c06dbfe0", no generation exists
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History(s3): snapshot: creating d051be38c06dbfe0/00000000 t=190.91946ms
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History: sync error: cannot verify wal state: stat /Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History-wal: no such file or directory
...

With Firefox

$ litestream replicate places.sqlite s3://mybucket/Firefox-History.sqlite
litestream v0.3.2
initialized db: /Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite
replicating to: name="s3" type="s3" bucket="mybucket" path="Firefox-History.sqlite" region=""
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: enable wal: database is locked
...
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: _litestream_lock: no such table: _litestream_lock
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: ensure wal exists: no such table: _litestream_seq
...
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: cannot verify wal state: stat /Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite-wal: no such file or directory
...
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync: new generation "bdf080e3a79aab8f", wal truncated by another process
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite(s3): snapshot: creating bdf080e3a79aab8f/00000000 t=15.509471996s
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: checkpoint: mode=PASSIVE err=disk I/O error: errno 260
...

Grafana dashboard

Ciao!

I am trying this project and I just discovered that it exposes some (Prometheus) metrics. So I looked at the repo for a Grafana dashboard, but I didn't find one.

So I took on the task of creating a simple one:

(screenshot: Grafana dashboard, 2021-02-26)

I'm sure it can be improved, but if you think it's a good starting point, I can share it with you in a PR.

Thanks!

Allow replica URL for subcommands

Currently, the litestream CLI's subcommands reference DB paths, so they require a configuration file. However, it would be useful to accept a replica URL directly so that users can pull down a replica locally without a config file.

For example, restoring a replica from S3:

$ litestream restore -o db s3://mybkt/path/to/db

32-bit ARM Support

Ran compilation as per instructions:

./db.go:1374:46: constant 9223372036854775807 overflows int
./db.go:1376:22: constant 9223372036854775807 overflows int
./db.go:1699:3: constant 9223372036854775807 overflows int
./replica.go:999:14: constant 9223372036854775807 overflows int

I think this is because int is only 32 bits on this platform. I attempted a patch, but I'm not a Go hacker and got stuck!

Linux raspberrypi 5.4.79-v7l+ #1373 SMP Mon Nov 23 13:27:40 GMT 2020 armv7l GNU/Linux
go version go1.15.8 linux/arm

ARM64 Build

Hi,

Great project! I’m dreaming up use cases for it already...

It would be great if this project had ARM64 Linux builds available. This could potentially be quite easy with gox: https://github.com/mitchellh/gox

Thanks!

Google Cloud Storage Support?

Hello! This is a super-awesome project, you're inspiring me to build more of my sideprojects on SQLite. Thank you for making it!

Are there any plans for Google Cloud/GCS support in the future?

Thanks again for the great project!

Write Docker guide

Litestream should already work inside Docker but testing & documentation are needed.

Live read replicas

Currently, litestream replicates a database to cold storage (file or S3). It would be nice to provide a way to replicate streams of WAL frames to other live nodes to serve as read replicas.

This requires a network replication endpoint (probably HTTP) as well as figuring out how to apply WAL writes to a live database.

Document NFS usage

While SQLite recommends against NFS usage because of broken lock implementations, it seems that locking has improved in NFS v4 & v4.1. If SQLite can work on recent NFS versions, it could be interesting to run SQLite on a shared NFS mount for lower traffic clusters of machines. For example, implementing a work queue across a cluster of machines.

Snapshot Interval

Currently, litestream performs a snapshot when the retention period rolls over, but this means that a long retention period can extend restore times. It would be useful to separate the retention interval from the snapshot interval so a user can have multiple snapshots within their retention period (e.g. snapshot every day, keep for one week).
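If this were configured per replica, it might look something like the following (hypothetical keys, sketching the daily-snapshot / one-week-retention example above; the actual option names would be up to the project):

```yaml
dbs:
  - path: /path/to/db.sqlite
    replicas:
      - url: s3://mybkt/db
        retention: 168h          # keep snapshots + WAL for one week
        snapshot-interval: 24h   # take a fresh snapshot daily
```

A restore would then replay at most one day of WAL on top of the most recent snapshot, instead of a full week.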

Bucket Autocreation

Currently, you must create a bucket on an S3-compatible store before replicating. This adds an extra step that could be handled by litestream if it finds that a bucket doesn't exist. This would simplify the Getting Started page as well.

Files replication

The design of Litestream is pretty nice.

I am planning to kick the tires and use it in a master-slave setup with MinIO.

One thing I am curious about is file replication: I also need to store files and replicate changes. MinIO is excellent at that, but I was curious whether SQLite can store files too?

It could be an anti-pattern, so tell me if you think it's a bad idea.

Disable fsync() during restore

Databases are restored to a temporary file so there is no benefit to syncing each WAL checkpoint to disk until the restore is complete.

Log WAL frame header checksum mismatch errors

The test server occasionally sees a sync error: checksum mismatch, which indicates that a page is not fully written. It causes the sync to fail momentarily, but the subsequent sync a second later succeeds. Currently this shows as an error, but it should be changed to a log notification since it is temporary.

replicate ignores -config flag when database is specified

Example command

litestream replicate -config /home/mike/litestream.yml "${PWD}/data/store.db" s3://scratch.tinypilotkvm.com/db

For the command above, litestream silently ignores the -config flag, which caught me by surprise. The issue seems to be here:

https://github.com/benbjohnson/litestream/blob/f652186adf7470256b6a7bf7e96194dde49c2af2/cmd/litestream/replicate.go#L43-L61

It only reads the config flag if it's the only argument, like:

litestream replicate -config /home/mike/litestream.yml

Desired behavior

If I've specified a -config file but litestream is not using it, I'd like litestream to error out or print a warning.

It'd also be handy if litestream allowed me to specify a db, and then if there's only one replica URL in my config file, it just uses that one, so I could say:

litestream replicate -config /home/mike/litestream.yml "${PWD}/data/store.db"

And then if my config file had an entry for store.db, litestream just picks the first replica URL.

Enforce maximum checkpoint index

The index is encoded as a uint32 in the shadow WAL filename, which restricts the maximum value to 2^32 − 1. Theoretically, with 4MB WAL files you could reach this limit after writing 16 petabytes of WAL data without a new generation.

We should enforce a new generation after a maximum index value is reached.
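A minimal sketch of the guard (illustrative; nextIndex is a hypothetical helper, not Litestream's API), plus a sanity check of the 16 PiB figure: 2^32 WAL files × 4 MiB each = 2^54 bytes.

```go
package main

import (
	"fmt"
	"math"
)

// maxWALIndex is the largest index encodable as a uint32 in the shadow
// WAL filename.
const maxWALIndex = math.MaxUint32

// nextIndex returns the next WAL index, or ok=false when the maximum is
// reached and a new generation must be started.
func nextIndex(index uint64) (next uint64, ok bool) {
	if index >= maxWALIndex {
		return 0, false // force a new generation
	}
	return index + 1, true
}

func main() {
	// Sanity-check the 16 PiB figure: 2^32 WAL files x 4 MiB each.
	const walSize = 4 << 20 // 4 MiB
	fmt.Println((uint64(1)<<32)*walSize == 1<<54) // 2^54 bytes = 16 PiB

	_, ok := nextIndex(maxWALIndex)
	fmt.Println(ok) // false: rollover into a new generation required
}
```

The guard is cheap to evaluate on every sync, and starting a new generation at the cap reuses the existing generation-rotation path rather than inventing a wider filename encoding.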

Monitor error - InvalidArgument, status code: 400

Hi !

First of all, thank you for your great project. I am really excited to start new projects with SQLite.
I tried your latest commits (d6ece0b) on master in order to upload generations to Google Cloud Storage.

Everything worked correctly until I tried to import a ~10MB CSV dataset into SQLite.

yann@xps:~/Projects/Perso/poc-litestream$ sqlite3 test.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> .mode csv
sqlite> .import ../Shakespeare_data.csv shakespear
_________________________________________
yann@xps:~/Projects/Perso/poc-litestream$ litestream replicate -config ~/Golang/bin/litestream.yml
litestream (development build)
initialized db: /home/yann/Projects/Perso/poc-litestream/test.db
replicating to: name="s3" type="s3" bucket="test-litestream-regional" path="db" region="europe-west1" endpoint="https://storage.googleapis.com" sync-interval=10s
/home/yann/Projects/Perso/poc-litestream/test.db(s3): monitor error: InvalidArgument: Invalid argument.
        status code: 400, request id: , host id:
/home/yann/Projects/Perso/poc-litestream/test.db(s3): monitor error: InvalidArgument: Invalid argument.
        status code: 400, request id: , host id:
_________________________________________
yann@xps:~/Projects/Perso/poc-litestream$ tree -ah
.
├── [9.3M]  test.db
├── [4.0K]  .test.db-litestream
│   ├── [  17]  generation
│   └── [4.0K]  generations
│       └── [4.0K]  310573930e9594aa
│           └── [4.0K]  wal
│               ├── [197K]  00000006.wal
│               ├── [9.4M]  00000007.wal
│               └── [4.1K]  00000008.wal
├── [ 32K]  test.db-shm
└── [9.4M]  test.db-wal

4 directories, 7 files

Here is my configuration:

dbs:
 - path: /home/yann/Projects/Perso/poc-litestream/test.db
   replicas:
    - type: s3
      bucket: test-litestream-regional
      path: db
      endpoint: https://storage.googleapis.com
      forcePathStyle: true
      region: europe-west1
yann@xps:~/Projects/Perso/poc-litestream$ env | grep AWS
AWS_SECRET_ACCESS_KEY=<<REDACTED>>
AWS_ACCESS_KEY_ID=<<REDACTED>>

I wonder if this issue is GCP-related or if it is more general. I'm pretty sure the error is raised here: https://github.com/benbjohnson/litestream/blob/main/s3/s3.go#L818-L824

If I can do anything to help, I would love to!
Have a great day, and thank you!

Kubernetes guide

Add documentation for how to deploy Litestream as a sidecar in a stateful set.

Allow configuration-less replication mode

Litestream should allow simplified usage for common cases (e.g. replicate a single database to a single replica). This would allow users testing out the software to get up and running more quickly.

Usage:

$ litestream replicate /path/to/db s3://mybkt/db

Depends on #2

Support hot file replicas

Currently, the file replica only supports snapshots with a stream of WAL files. This works well but can cause longer restore times because all WAL files must be replayed. Instead, it would be good to have an option to keep an up-to-date hot replica in addition to the snapshot/WAL. This would allow restores to be instant.
