benbjohnson / litestream

Streaming replication for SQLite.

Home Page: https://litestream.io

License: Apache License 2.0

Go 98.78% Makefile 0.54% HCL 0.09% PowerShell 0.15% Dockerfile 0.16% Python 0.28%
Topics: sqlite, replication, s3

litestream's Introduction

Litestream

Litestream is a standalone disaster recovery tool for SQLite. It runs as a background process and safely replicates changes incrementally to another file or S3. Litestream only communicates with SQLite through the SQLite API so it will not corrupt your database.

If you need support or have ideas for improving Litestream, please join the Litestream Slack or visit the GitHub Discussions. Please visit the Litestream web site for installation instructions and documentation.

If you find this project interesting, please consider starring the project on GitHub.

Acknowledgements

While the Litestream project does not accept external code patches, many of the most valuable contributions come in the form of testing, feedback, and documentation. These help harden the software and streamline usage for other users.

I want to give special thanks to individuals who invest much of their time and energy into the project to help make it better:

Huge thanks to fly.io for their support and for contributing credits for testing and development!

Contribution Policy

Initially, Litestream was closed to outside contributions. The goal was to reduce burnout by limiting the maintenance overhead of reviewing and validating third-party code. However, this policy is overly broad and has prevented small, easily testable patches from being contributed.

Litestream is now open to code contributions for bug fixes only. Features carry a long-term maintenance burden so they will not be accepted at this time. Please submit an issue if you have a feature you'd like to request.

If you find mistakes in the documentation, please submit a fix to the documentation repository.

litestream's People

Contributors

asg017, benbjohnson, btoews, ekristen, evanphx, hifi, josegonzalez, kalafut, lstoll, schnz, testwill, tydavis


litestream's Issues

Errors in replication?

Hi, I'm getting the following errors when running replicate:

/home/xxx/datadrive/code/nzquarantine/storage.db: sync error: create generation: initialize shadow wal: read header: EOF

/home/xxx/datadrive/code/nzquarantine/storage.db: sync: new generation "21b638752c28dcb5", wal overwritten by another process

/home/xxx/datadrive/code/nzquarantine/storage.db(s3): snapshot: creating 21b638752c28dcb5/00000000 t=56.330497ms

Command run: litestream replicate storage.db s3://xxx

I am able to restore but it is missing latest data.

Tested with AWS S3 on 64-bit Linux. SQLite is used via the Python standard library, no SQLAlchemy or anything.

Only a single, single-threaded Python process is using the database.

Container Image

Ciao!

I see some issues regarding running litestream as a sidecar container (as an example): #29

But I couldn't find a Dockerfile or any other container image definition in the repo. Do you have plans to create one and publish the container image to a public registry?

Thanks!

cannot verify wal state: ...-wal: no such file or directory

I'm testing out a litestream setup on k8s, where a python app container and litestream container share the db via a local volume. The db is created via an init container that just runs CREATE TABLE before either of them start.

Here is what init looks like:

litestream v0.3.2
initialized db: /db/hello.sqlite3
replicating to: name="s3" type="s3" bucket="..." path="hello.sqlite3" region=""
litestream /db/hello.sqlite3: init: cannot determine last wal position, clearing generation (primary wal header: EOF)
litestream /db/hello.sqlite3: sync: new generation "b7852795bc0c7d42", no generation exists

Afterwards, I get no logs until I run a query through the app (read or write). Then I get a steady stream of litestream /db/hello.sqlite3: sync error: cannot verify wal state: stat /db/hello.sqlite3-wal: no such file or directory until I shut down litestream. I've tried issuing the WAL pragma from the app's connection and enabling autocommit mode, but neither seems to make a difference.

Similar to #58, I'm wondering if I'm doing something wrong or whether this is a litestream bug. If you think it's the latter, I can try to work on a simpler repro (I haven't tried it locally yet).

Replicate only on WAL changes

Thanks again for this software!

I'm using litestream with a service that has infrequent database writes (like a handful of times per day).

I noticed my AWS dashboard reporting many PUTs and revisited the documentation and realized that litestream replicates the WAL every 10s.

A nice to have feature would be if litestream skips the WAL replication if no local changes have occurred since last sync.

An even nicer feature for my scenario would be a "sync only on write" mode where instead of replicating every N seconds, it replicates immediately after each change to the WAL, but otherwise does not replicate.

Low priority, since my understanding is that even 300k S3 PUTs is only ~$1.50/month, but it'd be cool if litestream could run entirely in the free tier for infrequent-write scenarios.

Replicate whole directories

This is a great project! Would it be possible to configure replication for whole directories containing multiple database files, with an arbitrary number of nested directories?

Client libraries for replication (iOS, etc)

Hi,

Thanks for releasing such an interesting and exciting project. I'm curious whether you have any input on using this project for single-writer, multi-reader replication of data to mobile devices for offline use. I know the question has been asked before about using this for replication, but mostly as it relates to clustering or data-at-edge use cases.

I've done this in the past at the API layer but found there to be performance and reliability challenges. Wondering if simply feeding the device as a read replica would be simpler and easier to deliver.

2-Way Replication to S3

Hey, this is awesome!

As a potential enhancement, would it be possible to periodically watch the S3 bucket for updates and restore whenever a new version is published?

This would be excellent for low-load distributed systems. Multiple nodes would be able to write to their local SQLite databases; each change is pushed up to the S3 bucket, and litestream then sees this and updates the local SQLite databases on all of the other nodes within that system.

Document how to configure Prometheus exporter port.

Thanks for opensourcing this project!

I have an existing Prometheus exporter on port 9090 so I need to change it.

❯ ./dist/litestream replicate -config ../litestream_testbed/config.yml
litestream (development build)
initialized db: /home/darreng/code/litestream_testbed/sourcedb/db.sqlite
replicating to: name="file" type="file" path="/home/darreng/code/litestream_testbed/replicadb/db.sqlite"
serving metrics on http://localhost:9090/metrics
cannot start metrics server: listen tcp :9090: bind: address already in use

I modified my config to include the "addr" setting and I was able to change the Prometheus metrics port.

addr: 0.0.0.0:9091
dbs:
  - path: /home/xxx/code/litestream_testbed/sourcedb/db.sqlite
    replicas:
      - path: /home/xxx/code/litestream_testbed/replicadb

Thanks

Darren

Parallelize restore

Currently, WAL files are loaded/read/applied sequentially during restore. We could parallelize some of the processing (transferring over the network, decompression, etc) so restores can happen faster.
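A minimal sketch of the idea (not Litestream's actual code; fetchWAL is a hypothetical stand-in for the network read and decompression): fetch segments concurrently with a bounded worker pool, but apply them strictly in index order, since WAL frames must be replayed sequentially.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchWAL stands in for downloading and decompressing one WAL segment.
func fetchWAL(index int) []byte {
	return []byte(fmt.Sprintf("wal-%08d", index))
}

// restore fetches WAL segments in parallel but applies them strictly in
// index order, preserving the sequential replay semantics of a restore.
func restore(n, workers int, apply func(int, []byte)) {
	results := make([]chan []byte, n)
	for i := range results {
		results[i] = make(chan []byte, 1)
	}

	// Bounded worker pool: at most `workers` fetches in flight.
	sem := make(chan struct{}, workers)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			results[i] <- fetchWAL(i)
		}(i)
	}

	// Apply sequentially as each segment becomes available in order.
	for i := 0; i < n; i++ {
		apply(i, <-results[i])
	}
	wg.Wait()
}

func main() {
	var order []int
	restore(8, 4, func(i int, b []byte) { order = append(order, i) })
	fmt.Println(order) // segments are always applied 0..7 in order
}
```

The expensive parts (network transfer, decompression) overlap, while the apply step stays serial.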

Disconnect DB if init() fails

The DB.init() function can fail because of a write lock on the database; however, the DB.db field has already been assigned the database handle, so a subsequent call will not try to re-initialize:

litestream/db.go

Lines 384 to 389 in f2d3acc

// Connect to SQLite database & enable WAL.
if db.db, err = sql.Open("sqlite3", dsn); err != nil {
	return err
} else if _, err := db.db.Exec(`PRAGMA journal_mode = wal;`); err != nil {
	return fmt.Errorf("enable wal: %w", err)
}

The check in DB.init():

litestream/db.go

Lines 359 to 362 in f2d3acc

// Exit if already initialized.
if db.db != nil {
	return nil
}

The db field should only be assigned if init() completes successfully.

See also: #58

Small typo in tutorial

In the Continuous replication section you insert a blue grape:

INSERT INTO fruits (name, color) VALUES ('grape', 'blue');

But you get back a purple one when restoring fruits3.db:

SELECT * FROM fruits;
apple|red
banana|yellow
grape|purple

Nothing major of course, but it confused me for a second.

Thanks for this project, it looks really useful and simple to use.

Rules for how to develop an application with a litestream-able SQLite database

Hello,

First of all, thank you for this great project! I have never used SQLite in a position where I would have a live replication set up, so I wanted to have a hands-on test scenario to see how the tool works. Unfortunately, I cannot get it to work in a meaningful way with what I have tried so far:

  • Brave history
  • Firefox history

I see lots of different errors. I understand fully that what I am trying to do is... questionable. But I would like to understand how the tool works under some kind of load and I don't have any so far. What are the rules that I need to follow in my application to make sure litestream replication does not fail in all the ways I have seen it fail?

So far I can see these rules:

  • Do not lock the database file (Brave and FF)
  • Enable WAL, journal is not enough (Brave, though there seems to be a History-wal file)
  • Do not truncate the WAL on shutdown or restart (or at all?) (FF)

With Brave:

litestream v0.3.2
initialized db: /Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History
replicating to: name="s3" type="s3" bucket="mybucket" path="Brave-History.sqlite" region=""
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History: sync error: enable wal: database is locked
...
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History: sync error: ensure wal exists: no such table: _litestream_seq
...
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History: sync: new generation "d051be38c06dbfe0", no generation exists
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History(s3): snapshot: creating d051be38c06dbfe0/00000000 t=190.91946ms
/Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History: sync error: cannot verify wal state: stat /Users/me/Library/Application Support/BraveSoftware/Brave-Browser/Default/History-wal: no such file or directory
...

With Firefox

$ litestream replicate places.sqlite s3://mybucket/Firefox-History.sqlite
litestream v0.3.2
initialized db: /Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite
replicating to: name="s3" type="s3" bucket="mybucket" path="Firefox-History.sqlite" region=""
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: enable wal: database is locked
...
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: _litestream_lock: no such table: _litestream_lock
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: ensure wal exists: no such table: _litestream_seq
...
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: cannot verify wal state: stat /Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite-wal: no such file or directory
...
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync: new generation "bdf080e3a79aab8f", wal truncated by another process
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite(s3): snapshot: creating bdf080e3a79aab8f/00000000 t=15.509471996s
/Users/me/Library/Application Support/Firefox/Profiles/myprofile.dev-edition-default/places.sqlite: sync error: checkpoint: mode=PASSIVE err=disk I/O error: errno 260
...

Grafana dashboard

Ciao!

I am trying this project and I just discovered that it exposes some (Prometheus) metrics. So I looked at the repo for a Grafana dashboard, but I didn't find one.

So I took on the task of creating a simple one:

(screenshot: Grafana dashboard, 2021-02-26)

I'm sure it can be improved, but if you think it's a good starting point, I can share it with you in a PR.

Thanks!

Allow replica URL for subcommands

Currently, the litestream CLI's subcommands reference DB paths, so they require a configuration file. However, it would be useful to accept a replica URL directly so that users can pull down a replica locally without a config file.

For example, restoring a replica from S3:

$ litestream restore -o db s3://mybkt/path/to/db

32-bit ARM Support

Ran compilation as per instructions:

./db.go:1374:46: constant 9223372036854775807 overflows int
./db.go:1376:22: constant 9223372036854775807 overflows int
./db.go:1699:3: constant 9223372036854775807 overflows int
./replica.go:999:14: constant 9223372036854775807 overflows int

I think this is because int is only 32 bits on this platform. I attempted a patch, but I'm not a Go hacker and got stuck!

Linux raspberrypi 5.4.79-v7l+ #1373 SMP Mon Nov 23 13:27:40 GMT 2020 armv7l GNU/Linux
go version go1.15.8 linux/arm

ARM64 Build

Hi,

Great project! I’m dreaming up use cases for it already...

It would be great if this project had ARM64 Linux builds available. This could potentially be quite easy with gox: https://github.com/mitchellh/gox

Thanks!

Google Cloud Storage Support?

Hello! This is a super-awesome project, you're inspiring me to build more of my sideprojects on SQLite. Thank you for making it!

Are there any plans for Google Cloud/GCS support in the future?

Thanks again for the great project!

Write Docker guide

Litestream should already work inside Docker but testing & documentation are needed.

Live read replicas

Currently, litestream replicates a database to cold storage (file or S3). It would be nice to provide a way to replicate streams of WAL frames to other live nodes to serve as read replicas.

This requires a network replication endpoint (probably HTTP) as well as figuring out how to apply WAL writes to a live database.

Document NFS usage

While SQLite recommends against NFS usage because of broken lock implementations, it seems that locking has improved in NFS v4 & v4.1. If SQLite can work on recent NFS versions, it could be interesting to run SQLite on a shared NFS mount for lower traffic clusters of machines. For example, implementing a work queue across a cluster of machines.

Snapshot Interval

Currently, litestream performs a snapshot when the retention period rolls over, but this means that a long retention period can extend restore times. It would be useful to separate the retention interval from the snapshot interval so a user can have multiple snapshots within their retention period (e.g. snapshot every day, keep for one week).
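If this were configured per replica, it might look something like the following (hypothetical keys, sketching the daily-snapshot / one-week-retention example above; the actual option names would be up to the project):

```yaml
dbs:
  - path: /path/to/db.sqlite
    replicas:
      - url: s3://mybkt/db
        retention: 168h          # keep snapshots + WAL for one week
        snapshot-interval: 24h   # take a fresh snapshot daily
```

A restore would then replay at most one day of WAL on top of the most recent snapshot, instead of a full week.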

Bucket Autocreation

Currently, you must create a bucket on an S3-compatible store before replicating. This adds an extra step that could be handled by litestream if it finds that a bucket doesn't exist. This would simplify the Getting Started page as well.

Files replication

The design of Litestream is pretty nice.

I am planning to kick the tires and use it in a master-slave setup with MinIO.

One thing I am curious about is file replication: I also need to store files and replicate changes. MinIO is excellent at that, but I was curious whether SQLite can store files too?

It could be an anti-pattern, so tell me if you think it's a bad idea.

Disable fsync() during restore

Databases are restored to a temporary file so there is no benefit to syncing each WAL checkpoint to disk until the restore is complete.

Log WAL frame header checksum mismatch errors

The test server occasionally sees a sync error: checksum mismatch, which indicates that a page is not fully written. It causes the sync to fail momentarily, but the subsequent sync a second later succeeds. Currently this shows as an error, but it should be changed to a log notification since it is temporary.

replicate ignores -config flag when database is specified

Example command

litestream replicate -config /home/mike/litestream.yml "${PWD}/data/store.db" s3://scratch.tinypilotkvm.com/db

For the command above, litestream silently ignores the -config flag, which caught me by surprise. The issue seems to be here:

https://github.com/benbjohnson/litestream/blob/f652186adf7470256b6a7bf7e96194dde49c2af2/cmd/litestream/replicate.go#L43-L61

It only reads the config flag if it's the only argument, like:

litestream replicate -config /home/mike/litestream.yml

Desired behavior

If I've specified a -config file but litestream is not using it, I'd like litestream to error out or print a warning.

It'd also be handy if litestream allowed me to specify a db, and then if there's only one replica URL in my config file, it just uses that one, so I could say:

litestream replicate -config /home/mike/litestream.yml "${PWD}/data/store.db"

And then if my config file had an entry for store.db, litestream just picks the first replica URL.

Enforce maximum checkpoint index

The index is encoded as a uint32 in the shadow WAL filename, which restricts the maximum value to 2^32 − 1. Theoretically, with 4MB WAL files you could reach this limit after writing 16 petabytes of WAL data without a new generation.

We should enforce a new generation after a maximum index value is reached.
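A minimal sketch of the guard (illustrative; nextIndex is a hypothetical helper, not Litestream's API), plus a sanity check of the 16 PiB figure: 2^32 WAL files × 4 MiB each = 2^54 bytes.

```go
package main

import (
	"fmt"
	"math"
)

// maxWALIndex is the largest index encodable as a uint32 in the shadow
// WAL filename.
const maxWALIndex = math.MaxUint32

// nextIndex returns the next WAL index, or ok=false when the maximum is
// reached and a new generation must be started.
func nextIndex(index uint64) (next uint64, ok bool) {
	if index >= maxWALIndex {
		return 0, false // force a new generation
	}
	return index + 1, true
}

func main() {
	// Sanity-check the 16 PiB figure: 2^32 WAL files x 4 MiB each.
	const walSize = 4 << 20 // 4 MiB
	fmt.Println((uint64(1)<<32)*walSize == 1<<54) // 2^54 bytes = 16 PiB

	_, ok := nextIndex(maxWALIndex)
	fmt.Println(ok) // false: rollover into a new generation required
}
```

The guard is cheap to evaluate on every sync, and starting a new generation at the cap reuses the existing generation-rotation path rather than inventing a wider filename encoding.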

Monitor error - InvalidArgument, status code: 400

Hi !

First of all, thank you for your great project. I am really excited to start new projects with SQLite.
I tried your latest commits (d6ece0b) on master in order to upload generations to Google Cloud Storage.

Everything worked correctly until I tried to import a ~10MB CSV dataset into SQLite.

yann@xps:~/Projects/Perso/poc-litestream$ sqlite3 test.db
SQLite version 3.34.1 2021-01-20 14:10:07
Enter ".help" for usage hints.
sqlite> .mode csv
sqlite> .import ../Shakespeare_data.csv shakespear
_________________________________________
yann@xps:~/Projects/Perso/poc-litestream$ litestream replicate -config ~/Golang/bin/litestream.yml
litestream (development build)
initialized db: /home/yann/Projects/Perso/poc-litestream/test.db
replicating to: name="s3" type="s3" bucket="test-litestream-regional" path="db" region="europe-west1" endpoint="https://storage.googleapis.com" sync-interval=10s
/home/yann/Projects/Perso/poc-litestream/test.db(s3): monitor error: InvalidArgument: Invalid argument.
        status code: 400, request id: , host id:
/home/yann/Projects/Perso/poc-litestream/test.db(s3): monitor error: InvalidArgument: Invalid argument.
        status code: 400, request id: , host id:
_________________________________________
yann@xps:~/Projects/Perso/poc-litestream$ tree -ah
.
├── [9.3M]  test.db
├── [4.0K]  .test.db-litestream
│   ├── [  17]  generation
│   └── [4.0K]  generations
│       └── [4.0K]  310573930e9594aa
│           └── [4.0K]  wal
│               ├── [197K]  00000006.wal
│               ├── [9.4M]  00000007.wal
│               └── [4.1K]  00000008.wal
├── [ 32K]  test.db-shm
└── [9.4M]  test.db-wal

4 directories, 7 files

Here is my configuration:

dbs:
 - path: /home/yann/Projects/Perso/poc-litestream/test.db
   replicas:
    - type: s3
      bucket: test-litestream-regional
      path: db
      endpoint: https://storage.googleapis.com
      forcePathStyle: true
      region: europe-west1
yann@xps:~/Projects/Perso/poc-litestream$ env | grep AWS
AWS_SECRET_ACCESS_KEY=<<REDACTED>>
AWS_ACCESS_KEY_ID=<<REDACTED>>

I wonder if this issue is GCP-related or if it is more general. I'm pretty sure the error is raised here: https://github.com/benbjohnson/litestream/blob/main/s3/s3.go#L818-L824

If I can do anything to help, I would love to!
Have a great day, and thank you!

Kubernetes guide

Add documentation for how to deploy Litestream as a sidecar in a stateful set.

Allow configuration-less replication mode

Litestream should allow simplified usage for common cases (e.g. replicate a single database to a single replica). This would allow users testing out the software to get up and running more quickly.

Usage:

$ litestream replicate /path/to/db s3://mybkt/db

Depends on #2

Support hot file replicas

Currently, the file replica only supports snapshots with a stream of WAL files. This works well but can cause longer restore times because all WAL files must be replayed. Instead, it would be good to have an option to keep an up-to-date hot replica in addition to the snapshot/WAL. This would allow restores to be instant.
