rocicorp / replicache

Realtime Sync for Any Backend Stack

sync offline javascript typescript offline-first multiplayer realtime collaborative-editing replicache replicache-js-sdk

replicache's Introduction

👋🏼 Hi, and welcome. This repo is the place to file issues for Replicache. Bug reports and feature requests welcome. See also other ways to contact us, including our Discord.

To get started with Replicache, see doc.replicache.dev.

Confused? Not sure if Replicache is the right thing? Start here.

replicache's Issues

Redo sample app: all the PKs should be UUID

I think it's a good precedent for us to set in the sample code.

lit-todo does not work with 2.0

The order is currently a float64 (see #148), but we do not create indexes unless the value is a string.

Scan is slower than expected

The performance of scan matters a lot to our product. If scan is very fast (like if you can reasonably scan the entire cache in 5ms) then complex filter operators and indexes don't matter. If you cannot, then indexes and filter operators become much more important.

Using the existing scan() method of Replicache and setting the limit to a high number, I'm able to pull about 5MB of JSON (uncompressed) in 200ms, for a rate of 25MB/s. This seems slow to me, but I always have trouble keeping in mind what should be possible, so I did some quick research for baselines:

LevelDB advertised (9 years ago) 1M sequential reads of 100b values per second for a read bandwidth of 100MB/s. However, the values we store are quite a lot larger -- 1-3kb. Not sure which way that would make the benchmark go.
On my local machine, in a best case scenario of reading a single file, I get a read bandwidth of almost 2GB/s (!) with a cold cache and 6.7GB/s with a warm cache [1]

I'm not sure what else to compare to.

[1] I found this script for Linux and modified it for OSX, like so:

Write Test (600MB/s)

# Drop disk cache
> sync && sudo purge

> sync; dd if=/dev/zero of=tempfile bs=1048576 count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 1.563227 secs (686875205 bytes/sec)

Read Test (1.8 GB/s)

# Drop disk cache
> sync && sudo purge

> dd if=tempfile of=/dev/null bs=1048576 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 0.584459 secs (1837155626 bytes/sec)

Read Test with Warm Disk Cache (6.7 GB/s)

> dd if=tempfile of=/dev/null bs=1048576 count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 0.160131 secs (6705397293 bytes/sec)

wasmModule load error if loaded from /foo/bar.html

See: https://rocicorp.slack.com/archives/CMQQ9EDML/p1605450570063600, but the crux of it is that customer sees:

 CompileError: WebAssembly.instantiate(): expected magic word 00 61 73 6d, found 3c 21 44 4f @+0

This happens when instantiating Replicache from a path that contains multiple segments, like /foo/bar.html. In that case it ends up trying to load /foo/replicache/....wasm.

I'm not sure what's going on because the code looks correct to me.

Anyway, there are actually a few issues here:

Correct the defaulting mechanism.
replicache_client.js should not try and load at all if the response is 404. Can we easily put up a better error in that case?
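
A note on the second item: the bytes in the error, 3c 21 44 4f, are ASCII for "<!DO", which suggests an HTML page (for example a 404 page beginning with <!DOCTYPE) was served in place of the wasm module. A minimal sketch of a pre-flight check is below. `hasWasmMagic` and `describeWasmLoadFailure` are hypothetical helper names for illustration only; they are not part of replicache_client.js.

```typescript
// Hypothetical helpers; not part of Replicache's actual loader.

// Every WebAssembly binary starts with the four bytes 00 61 73 6d ("\0asm").
const WASM_MAGIC = [0x00, 0x61, 0x73, 0x6d];

function hasWasmMagic(bytes: Uint8Array): boolean {
  if (bytes.length < WASM_MAGIC.length) {
    return false;
  }
  return WASM_MAGIC.every((b, i) => bytes[i] === b);
}

// Returns a readable message if the response cannot be a wasm module,
// or undefined if it looks loadable.
function describeWasmLoadFailure(
  url: string,
  status: number,
  bytes: Uint8Array,
): string | undefined {
  if (status < 200 || status >= 300) {
    return `Could not load wasm module from ${url}: HTTP ${status}`;
  }
  if (!hasWasmMagic(bytes)) {
    return `Response from ${url} is not a wasm module (wrong magic bytes)`;
  }
  return undefined;
}
```

A loader could call `describeWasmLoadFailure` with the response status and body before instantiating, and throw the returned message instead of letting the CompileError surface.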

Benchmarks are too noisy

We sometimes see large moves in the benchmarks because we're running them in GitHub Actions, where the runners aren't well isolated. We should either move to self-hosted runners or, if we could somehow run the benchmarks a bunch of times on different VMs and take the mean, that would work too:

https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners

I actually have a preference for running them ourselves (in EC2 on a dedicated or isolated instance) so that we can control the performance of the machine we're running on.

Redo sample app: order number is getting serialized weirdly

22:58:20.698390 repm.go:64: <-- 200 http://localhost:7001/pull {"stateID":"ia2puj0gebgdvj6u367r3a1q52gvu361","lastMutationID":0,"patch":[{"op":"remove","path":"/"},{"op":"add","path":"/~1list~129395","value":{"id":29395,"ownerUserID":3}},{"op":"add","path":"/~1todo~11717351655","value":{"complete":false,"id":1717351655,"listId":29395,"order":89884656743115790000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,"text":"todo one"}}],"checksum":"35478836","clientViewInfo":{"httpStatusCode":200,"errorMessage":""}}

Scan janks the UI thread

~~Scan currently pulls in batches of 500~~. This ends up blocking the UI thread for 60ms at a time. We need to improve this, and will. But in the meantime we could change this code to be adaptive -- meaning it should run for 1-2ms and then yield and let other UI stuff happen.

What is the best way to do this kind of work these days? Is it https://developer.mozilla.org/en-US/docs/Web/API/Background_Tasks_API? or requestAnimationFrame?
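
As a rough illustration of the adaptive idea above (work for a small time budget, then yield), here is a minimal sketch. It does not answer which scheduling primitive is best: it yields with `setTimeout` by default, and the yield function is injectable so that `requestAnimationFrame` or the Background Tasks API could be swapped in. `processInSlices` and its parameters are hypothetical names, not Replicache API.

```typescript
// Hypothetical helper; not part of Replicache.
// Handles items one by one, and yields to the event loop whenever the
// current slice has used up its time budget. Returns the number of yields.
async function processInSlices<T>(
  items: readonly T[],
  handle: (item: T) => void,
  budgetMs: number = 2,
  now: () => number = () => Date.now(),
  yieldToEventLoop: () => Promise<void> = () =>
    new Promise<void>(resolve => setTimeout(resolve, 0)),
): Promise<number> {
  let yields = 0;
  let sliceStart = now();
  for (const item of items) {
    handle(item);
    if (now() - sliceStart >= budgetMs) {
      // Budget spent: let other UI work run before continuing.
      await yieldToEventLoop();
      yields++;
      sliceStart = now();
    }
  }
  return yields;
}
```

Because the batch size is driven by elapsed time rather than a fixed count like 500, slow devices yield more often and fast devices less often.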

Document error behavior in API docs

Seems like Next.js is not working

@elsigh tried to use Replicache in a Next.js app and just imported it through npm the naive way. It correctly resolved to the cjs bundle, but the cjs bundle contains code with export, which doesn't seem correct.

This blocks @elsigh from trying Replicache. We should maybe make a sample Next.js app to ensure this works.

"list" is not supported

Only Chrome supports IDBFactory databases().

We can either remove support for list or we need to manage the names ourselves in another idb database.

If we remove list then I suggest we move drop from a static method to a prototype method.

Sample app works ... weirdly ... in twitter ios in-app browser

le sigh.

I don't even have a consistent reproduction but these are the symptoms I see:

Click on link to redo.replicache.dev in Twitter/iOS
Click on the "open in safari" icon in the bottom right of the UI
Make an edit in Safari
Switch back to Twitter

Typically, but not always, sync stops working in the Twitter app at step 4. The edit from step 3 doesn't appear.

But I also see other glitches. Sometimes the UI goes completely unresponsive at step 4: tapping checkmarks in the Twitter app doesn't work. It appears there's a hang inside the web app. I can sometimes reproduce this by letting the entire phone lock, then unlocking it and going back to the Twitter app.

I don't know how to debug this since I can't attach a web inspector to this web browser. I assume that what's happening in the first case is that the web socket connection is getting closed and for some reason Pusher isn't reopening it (I think it is supposed to do that by itself).

API Nit: ScanBound.start.id should be called "key"

We use the term "key" elsewhere in the API, and we refer to Replicache as a "key/value store".

Of course we should check if existing customers are using this field and if so keep it for them for a short time for backward compat.

Remove support for go binary completely

Scan is difficult to use

It's pretty hard to use right now. This is my fault, I just threw it together because it wasn't the most important thing at the time. Some things that stand out right away:

Users shouldn't have to batch the calls themselves. They should get a cursor thingy that intelligently pages in results behind the scenes ahead of them. The way we have it now is both hard to use and slow.
There should be the ability to scan in reverse.
Further out, we should consider adding some kind of basic filtering that can be implemented in the client so useless results don't have to be returned up to JS. This is a big one, but I'm hopeful that there's some library we can take off the shelf. There's no value in Replicache having its own filter syntax.

new phone who dis

It is frequently the case, especially during development, but occasionally in production too, that a server just wants to drop a particular client's state. Just... forget it. rm -rf, start over.

Unfortunately, when that happens, the client is then wedged. It goes to sync, uses its existing client ID, and the data layer doesn't recognize them.

We need a way for the server and client to negotiate this, and end up with the client getting a fresh state too.

Automatically sync on mutation

Most apps are going to want to push shortly or immediately after mutations are made. See e.g., the calls to _replicache.sync in each of the mutators here:

https://github.com/rocicorp/replicache-sdk-flutter/blob/master/sample/redo/lib/main.dart#L277

It's tempting to just put that in the SDK, however, some counter-arguments:

Many apps will want to do some level of debouncing of writes. The details are perhaps application-specific.
The SDK doesn't currently automatically call downstream sync - user must wire up server push. So it's somewhat consistent that it doesn't do upstream either.

Leaving this bug here as a placeholder as we'll likely do something someday, but I don't think we need to do it now.

Polymer/Material todo demo

Let's move the existing react-based todo demo to Polymer. Then we'll have one easy es6 demo and one bigger React demo.

iloop spotted in Songbook Studio when mutator name not found

See attached:

This error was happening in a tight loop, we were calling maybeEndSync thousands of times.

The HTTP status is not included in beginSync

https://github.com/rocicorp/replicache-sdk-js/blob/49b2ec9b35b464dbd5378d12a790a7e1e3a65a2f/src/replicache.ts#L272-L273

Build repm server

We need to pull in a build of test_server from rocicorp/replicache-client (similar to how we do it for replicache-sdk-flutter)

lit-todo/redo - tapping checkmarks works weirdly in mobile Safari

You have to tap once to focus, then again to actually toggle.

Indexes followup

add some unit tests!
re-enable perf tests
fix glitchyness in lit-html sample app (create todo doesn't show up until reload)

let's not deadlock the protocol (was: push endpoint error policies)

when the batch endpoint hits an error it has to decide whether to treat it as temporary (stop processing mutations) or permanent (skip this mutation and increment last mutation id).
current policy is to treat errors as temporary by default. this is good because it doesn't lose user data. this is bad because it stalls push until the problem is fixed.
in order to minimize push downtime customers have to be really on top of A) responding to temporary errors as they arise and B) converting unhandled permanent errors (which will surface as temporary errors) into permanent errors so they do the right thing going forward

we need a discussion of this in the user manual

also we have considered a customer configurable timeout that causes the client to remove a mutation that is causing a temporary error after some time. this presupposes a synchronous endpoint. we are not sold on this idea.
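
The temporary versus permanent policy described above can be sketched roughly as follows. This is a simplified illustration, not the actual batch endpoint: `Mutation`, `Outcome`, and `processBatch` are hypothetical names, and a real endpoint would classify errors inside `apply`.

```typescript
// Hypothetical types and names for illustration; not the real endpoint.
type Mutation = {id: number; name: string; args: unknown};
type Outcome = 'ok' | 'temporary-error' | 'permanent-error';

// Applies mutations in order and returns the new last mutation ID.
function processBatch(
  mutations: readonly Mutation[],
  lastMutationID: number,
  apply: (m: Mutation) => Outcome,
): number {
  for (const m of mutations) {
    if (m.id <= lastMutationID) {
      continue; // already processed by an earlier push
    }
    const outcome = apply(m);
    if (outcome === 'temporary-error') {
      // Stop here: this mutation and everything after it stays pending,
      // so no user data is lost, but push stalls until the cause is fixed.
      break;
    }
    // 'ok' and 'permanent-error' both advance the ID. A permanent error
    // means the mutation is skipped for good.
    lastMutationID = m.id;
  }
  return lastMutationID;
}
```

The stall described above is visible here: one mutation that keeps returning 'temporary-error' blocks every mutation queued behind it.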

Feat: index keyPrefix -> prefix?

Should we rename the keyPrefix option of createIndex to prefix to be more consistent with scan?

Mutations made during sync are not pushed

This bug was also reported in the Flutter repo, but re-reporting here as it's more important now:

rocicorp/replicache-sdk-flutter#94

Re-stating: if a mutation occurs during sync(), it won't get pushed. That's because the SDK ignores calls to sync() if one is in progress.

But it's difficult for user code to properly detect this and sync again at the right time. They basically have to keep a bit of state that tracks whether a mutation was made during the sync window, and if so, call sync again.

(note: the fix proposed in rocicorp/replicache-sdk-flutter#94 won't work because a mutation can be made in the window between replicache-client deciding the return value and bindings code receiving it)

I think the simplest solution here is simply to honor every call to sync. We still need them to be serialized but we can use promise chaining to do that.
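
A minimal sketch of the promise-chaining idea, assuming a `doSync` function that performs one full sync. `SyncQueue` is a hypothetical name, not the SDK's implementation.

```typescript
// Hypothetical sketch; not the actual SDK code.
// Every call to sync() is honored, but runs are serialized: each one
// starts only after the previous one has settled.
class SyncQueue {
  private tail: Promise<void> = Promise.resolve();
  private readonly doSync: () => Promise<void>;

  constructor(doSync: () => Promise<void>) {
    this.doSync = doSync;
  }

  sync(): Promise<void> {
    const run = this.tail.then(() => this.doSync());
    // Swallow errors on the chain itself so one failed sync does not
    // prevent later ones; the caller still sees the rejection via `run`.
    this.tail = run.catch(() => {});
    return run;
  }
}
```

With this shape, a mutation made while a sync is in flight is picked up by the next queued sync, so user code no longer needs to track whether a mutation landed inside the sync window. One trade-off is that many rapid calls queue many syncs, so an app may still want to coalesce them.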

Maybe `npm install` should pull the right binary version of replicache-client repo?

It's confusing right now because the samples get updated in lockstep with the source, but the release doesn't ... so it's easy for the sample code to not actually be a sample that works with "released" code.

Mock out servers in tests

We currently talk to the live servers during tests. We talk to:

https://replicache-sample-todo.now.sh/serve/replicache-batch
https://serve.replicache.dev/pull

We should mock these resources

Test out ServiceWorkerizing Replicache

This bug is more about technical details and experiments related to the RFC

#226

Plan of attack

Make IdbStore use a transaction for every get/has/scan
Add performance tests for get/has/put/del
Implement WorkerInvoker from https://github.com/arv/replicache-sdk-js/tree/wasm-service-worker
Special case scan.
- On the worker side it can create the receiver and either
  - post each result back immediately or
  - accumulate them all in an array and then post the whole array
- On the main thread side we also need to hack things a bit...
Look into using array buffers all the way (main -> worker -> wasm -> worker -> main) and use transferables

It turns out that ServiceWorker is available on all the browsers we want to support. If we could make use of it there could be several large benefits:

1/ We have only one process reading/writing, which makes correctness much easier to reason about across the board
2/ We don't have to worry so much about jank (#114)
3/ We could make use of (poorly named) background sync to ensure writes make it through to servers.
4/ In PWAs, we could make use of (similarly named, but dramatically different feature) periodic background sync.

We should try it out and see what the impact on performance is. It seems there must be some overhead due to the context switch to svcworker, but it might be really small and worth it for other benefits.

Pending local state lost on refresh of containing web page 😑

You can replicate the bug like this:

Use one of the JS sample apps
Make an edit
Before sync happens, refresh the page
The pending change is lost

This is because we're assigning a new client ID at startup in the replicache-client, so we get a new local DB on each startup.

It seems like the client ID needs to be stored somewhere. How did we do it in Flutter?
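
One possible shape for this, as a sketch only: generate the client ID once and persist it, so a refresh reopens the same local DB. The storage key, `KeyValueStorage`, and `getOrCreateClientID` are assumptions for illustration, and this is not necessarily how the Flutter SDK does it.

```typescript
// Hypothetical sketch; key name and helper names are assumptions.
type KeyValueStorage = {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
};

const CLIENT_ID_KEY = 'replicache-client-id'; // assumed key name

// Returns the stored client ID, creating and persisting one on first use.
function getOrCreateClientID(
  storage: KeyValueStorage,
  generateID: () => string,
): string {
  const existing = storage.getItem(CLIENT_ID_KEY);
  if (existing !== null) {
    return existing;
  }
  const id = generateID();
  storage.setItem(CLIENT_ID_KEY, id);
  return id;
}
```

In a browser, `window.localStorage` has the `getItem` and `setItem` methods this sketch needs.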

Replicache scan never closes transaction

Found this when performance checking... It used to work.

async function scan() {
  const start = performance.now();
  let sum = 0;
  for await (const v of rep.scan()) {
    sum += v.a;
  }
  const total = performance.now() - start;
  console.log(total);
}

Looking at the log of repm test server closeTransaction is never called.

Fix usages of JSONType

There is some room for tightening the types here.

When the value eventually goes into JSON.stringify, we can accept ToJSON(). When the value comes from a JSON string, it cannot be one.

Don't refire subscriptions that haven't changed

Minor thing, but it's annoying to developers to re-render for no reason. For a simple thing, we can just keep a hash of the data and only fire if it changes.
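
A minimal sketch of the simple version, using the serialized JSON string as a stand-in for a hash. `onlyWhenChanged` is a hypothetical wrapper, not Replicache API. Note that `JSON.stringify` is sensitive to key order, so this assumes query results are built with a stable key order.

```typescript
// Hypothetical wrapper; not part of the Replicache API.
// Wraps a subscription callback so it fires only when the data differs
// from the previous delivery.
function onlyWhenChanged<T>(onData: (data: T) => void): (data: T) => void {
  let hasFired = false;
  let lastFingerprint = '';
  return (data: T) => {
    // A real implementation might hash this string to save memory.
    const fingerprint = String(JSON.stringify(data));
    if (hasFired && fingerprint === lastFingerprint) {
      return; // unchanged: skip the re-render
    }
    hasFired = true;
    lastFingerprint = fingerprint;
    onData(data);
  };
}
```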

Further out we can do pretty fun optimizations to avoid re-executing queries when the db hasn't changed in a way that might affect the result.

(My notes on this were written down somewhere but I can't find them -- @arv?)

API Nit: ScanBound.start.exclusive is marked as optional but it isn't.

If you omit exclusive, you get an error coming up through repc.

subscribe API

In Dart we use Stream<R> which provides a pretty ergonomic API.

JS does not, of course, have anything like this. The DOM has ReadableStream and Node has Readable streams.

See #9 for an implementation using ReadableStream. There are some issues with ReadableStream:

ReadableStream only allows a single error. After the stream errors it becomes unusable.
The API is not as convenient as the Dart one. If using async iterator it is nice but there is no imperative API around it.

Dying after an error seems like a blocker to me.

I'm tempted to use a much simpler API. Strawman:

type Subscription<Data extends JsonType> = {
  onData: (data: Data) => void;
  onError: (error: any) => void;
  close(): void;
};

subscribe(...) : Subscription {}

It would just call these methods over and over until the subscription is closed.

We could have this be an AsyncIterator too (it can be done in the future too) which would allow writing:

let sub = subscribe(...);
for await (const x of sub) {
  setState({x});
}

but it is not clear how error handling should be done here either.

What about relations?

Examples:

Todo -> Author

Author -> Team

Cannot easily switch to debug build of replicache

You can change wasmModule to e.g., ./output/wasm/debug/replicache_client_bg.wasm, but that doesn't change the version of the JS wrapper that is compiled, so you get compile errors.

These two actually need to change together so maybe we need a way to point to the directory that contains the two of them, not the wasm module at all?

Add a (info) log line before we start replaying pending transactions

It's unusual to replay pending transactions in the sense that, even when things are working well, we only see it in rare circumstances.

Because of this, it's slightly unexpected to developers. They see some of their code getting called, and they weren't the caller. I think that we should put a little log line in there just to help explain what they are seeing.

lit-todo/redo: reordering not supported on mobile

I'm not going to prioritize fixing it though. I feel like it's enough of a sample without.

redo.replicache.dev does not work on Edge Android

All I see is a blank page

It works on Chrome on Android

Empty hash check in beginSync response can probably go away

this can probably go now

Originally posted by @phritz in #60 (comment)

Use Benchmark.js for performance tests

If we use benchmark.js I think it will handle things better when it comes to scaling. No need to hard code the number of iterations etc.

scan should return ScanItem[] instead of iterable

scan really returns an array and annotating it as Iterable just leads to an extra copy

lit-todo/redo: need to recache resources

Right now we just cache them once, when sw.js changes.

Rename JsonType to JSONType

We should try to match these rules in our JS API:

https://w3ctag.github.io/design-principles/#casing-rules

Track size of bundle on benchmarks page

We're already tracking the size of the bundle in a gh action, just need to wire it up to the benchmark UI.

Expose more direct index API on Replicache class

Add convenience methods on the Replicache class that both register and invoke the "mutator" functions, and maybe remove the methods on WriteTransaction?

Strawman:

class Replicache {
  ...
  createIndex(def: {name: string, keyPrefix?: string, jsonPointer: string}): Promise<void>;
  dropIndex(name: string): Promise<void>;
}

class Replicache {
  ...
  createIndex(...def: {name: string, keyPrefix?: string, jsonPointer: string}[]): Promise<void>;
  dropIndex(...name: string[]): Promise<void>;
}

Mutate functions API

The initial plan was for register to take a function with var args and return a function that strips the tx param. Something along the lines of:

const setTodoText = rep.register(
  async (tx: WriteTransaction, id: Id, text: string): Promise<void> => {

  });

...

setTodoText(42, 'Hello');

This design was discarded because of limitations in Dart (no var args), so we ended up with:

const setTodoText = rep.register(
  async (tx: WriteTransaction, {id, text}: {id: Id; text: string}): Promise<void> => {

  });

...

setTodoText({id: 42, text: 'Hello'});

...where we have a single arg that can be any JSON value.

It is not a big deal because the developer can easily wrap this as needed:

const setTodoTextInner = rep.register(
  async (tx: WriteTransaction, {id, text}: {id: Id; text: string}): Promise<void> => {

  });

const setTodoText = (id: Id, text: string) => setTodoTextInner({id, text});

...

setTodoText(42, 'Hello');

The problem

We need to store the arguments in the write transaction so that we can replay the transaction later. And we need to store these arguments in a language agnostic way.

Possible Solutions

To allow flexibility between the language bindings we need to either:

Force Dart to use a Dart List as the arg
Allow JS to rewrite the Array to a Map as needed.

Suggested Path Forward

Do nothing.

It is not a big deal, and with TypeScript the type info/autocomplete makes working with objects as maps less painful.

Sync performance not counting network. That is "how long does it take to write a 20MB client view?". This is different than our current write benchmarks because it doesn't have to go back and forth through the JS boundary.
Single byte write. This is different from our current write benchmarks because those try to write a ton in a single transaction. This is basically measuring the cost of a single small write transaction.
createIndex on an existing large cache. Our existing index benchmarks measure the effect of writing a bunch of data via the JS API in one transaction when indexes are present.
