Comments (7)

sgwilym commented on May 21, 2024

There is a rough version of this in earthstar-graphql, where each side of a sync is informed of what the other side wants and sends only the subset of documents built from that information. Each side trusts that the other side is giving it only what it asked for. There is no outgoing filter yet.

  1. Peer A sends a GraphQL query to Peer B
      • Peer A queries for documents Peer B holds that conform to its sync filters
      • Peer A also queries Peer B’s sync filters
  2. Peer A compiles a list of documents conforming to Peer B’s sync filters
  3. Peer A sends these documents to Peer B using an ingestDocuments mutation
  4. Peer B ingests the documents from Peer A
      • Peer B currently trusts Peer A to have filtered the documents correctly.
  5. Peer A ingests the documents received from step 1
      • Peer A currently trusts Peer B to have filtered the documents correctly.
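
The exchange above can be sketched in memory as follows. This is only an illustration: the `Peer` class, the `SyncFilter` shape and the `documentsMatching` method are hypothetical stand-ins, not the earthstar-graphql schema, and plain method calls take the place of the GraphQL query and the ingestDocuments mutation.

```typescript
// Hypothetical, simplified shapes (assumptions, not the real schema).
interface Doc {
  path: string;
  content: string;
}

interface SyncFilter {
  pathPrefixes: string[];
}

class Peer {
  readonly name: string;
  readonly syncFilter: SyncFilter;
  private docs = new Map<string, Doc>();

  constructor(name: string, syncFilter: SyncFilter) {
    this.name = name;
    this.syncFilter = syncFilter;
  }

  // Stand-in for the query the other peer would send.
  documentsMatching(filter: SyncFilter): Doc[] {
    return Array.from(this.docs.values()).filter((doc) =>
      filter.pathPrefixes.some((prefix) => doc.path.startsWith(prefix))
    );
  }

  // Stand-in for the ingestDocuments mutation. The receiver does not
  // re-check its own filter here: it trusts the sender.
  ingestDocuments(incoming: Doc[]): void {
    for (const doc of incoming) {
      this.docs.set(doc.path, doc);
    }
  }

  paths(): string[] {
    return Array.from(this.docs.keys()).sort();
  }
}

function sync(a: Peer, b: Peer): void {
  // Step 1: A asks B for documents matching A's filters, and for B's filters.
  const fromB = b.documentsMatching(a.syncFilter);
  const filtersOfB = b.syncFilter;
  // Step 2: A compiles the documents that conform to B's filters.
  const forB = a.documentsMatching(filtersOfB);
  // Steps 3 and 4: A sends them and B ingests them.
  b.ingestDocuments(forB);
  // Step 5: A ingests what it received in step 1.
  a.ingestDocuments(fromB);
}
```

Because neither `ingestDocuments` call checks the receiver's own filter, a peer that sends too much would have all of it accepted, which is the trust assumption noted in the list.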

Some thoughts on the questions in the first post:

Would these filters need to be different for different peers / pubs you're syncing with, instead of universal?

The earthstar-graphql implementation applies globally to all workspaces stored on a pub, but rather than having settings per peer/pub, I think having separate sync filters per workspace is the one to aim for.

Would we specify a sort order (like newest first) in the query?

If all the documents are going to be ingested anyway, what could the sort order be used for?

Would we want to sync in stages of priority, like "/about/" first, then "/wiki/"?

Is the idea that smaller requests would be more resilient to adverse network conditions?

from earthstar.

cinnamon-bun commented on May 21, 2024

separate sync filters per workspace is the one to aim for.

I agree, yeah.

sort order & stages of priority

The idea here was to improve the initial sync experience for new users and people on slow connections. The first time they sync, there might be a lot of data to fetch -- we want to fetch the most important data first.

The best user experience would hypothetically be something like

first {pathPrefix: '/about/', sort: 'newestFirst'}
then {pathPrefix: '/wiki/', sort: 'newestFirst'}
then {pathPrefix: '/largeImageData/', sort: 'newestFirst'}

...but it would depend on the application.

The efficient sync algorithm will depend on both sides agreeing on a sort order... maybe we can let the pulling side win (e.g. the incoming filter), since that filter will be known by both sides.

When supplying multiple queries like this, I'm not sure if it will happen as multiple iterations of the sync algorithm, or one giant iteration. Probably multiple.
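
A rough sketch of the "multiple iterations" reading, where each staged query runs as its own pass in priority order. The `Doc` shape, the `timestamp` field and both function names are assumptions made for illustration; the sort order is not yet defined in Earthstar.

```typescript
// Hypothetical document shape; `timestamp` is an assumed field.
interface Doc {
  path: string;
  timestamp: number;
}

interface StageQuery {
  pathPrefix: string;
  sort: "newestFirst";
}

// One pass: keep documents under the prefix, newest first.
function runStage(remoteDocs: Doc[], query: StageQuery): Doc[] {
  return remoteDocs
    .filter((doc) => doc.path.startsWith(query.pathPrefix))
    .sort((x, y) => y.timestamp - x.timestamp);
}

// Run the stages one after another, so that higher-priority
// documents arrive before lower-priority ones.
function syncInStages(remoteDocs: Doc[], stages: StageQuery[]): Doc[] {
  const received: Doc[] = [];
  for (const stage of stages) {
    received.push(...runStage(remoteDocs, stage));
  }
  return received;
}
```

With the three stages listed above, everything under /about/ would arrive before anything under /wiki/, regardless of timestamps.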

Is the idea that smaller requests would be more resilient to adverse network conditions?

Not really, but it's easier to write code to get batches of data instead of trying to stream it. I think the sync algorithm will fetch a batch of documents at a time, iteratively, using the {limit: 1000} query setting.
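
A minimal sketch of batch-by-batch fetching. The `limit` setting comes from the comment above; the `continueAfter` cursor, the ordering by path and the function names are assumptions added so the loop has a way to ask for the next batch.

```typescript
interface Doc {
  path: string;
}

// `limit` is mentioned in the thread; `continueAfter` is a hypothetical cursor.
interface BatchQuery {
  limit: number;
  continueAfter: string | null;
}

// Stand-in for the remote side: sort by path, skip past the cursor,
// and return at most `limit` documents.
function queryRemote(remoteDocs: Doc[], query: BatchQuery): Doc[] {
  const cursor = query.continueAfter;
  return remoteDocs
    .slice()
    .sort((x, y) => (x.path < y.path ? -1 : x.path > y.path ? 1 : 0))
    .filter((doc) => cursor === null || doc.path > cursor)
    .slice(0, query.limit);
}

// Pull one batch at a time until the remote side has nothing left.
function pullInBatches(remoteDocs: Doc[], batchSize: number): Doc[][] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const batches: Doc[][] = [];
  let cursor: string | null = null;
  while (true) {
    const batch = queryRemote(remoteDocs, { limit: batchSize, continueAfter: cursor });
    if (batch.length === 0) {
      break;
    }
    batches.push(batch);
    cursor = batch[batch.length - 1].path;
  }
  return batches;
}
```

Each iteration is an ordinary request and response, so no streaming machinery is needed on either side.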

This goes with my general principle of "avoid streams". I know some people swear by them but they're hard to figure out, especially in some languages.


cinnamon-bun commented on May 21, 2024

How to query for nothing?

In https://github.com/earthstar-project/earthstar-graphql/releases/tag/v4.0.1, @sgwilym asks:

I think these two should be semantically different: an undefined sync filter means the pub has no preference on documents, whereas an empty one would mean the pub is accepting nothing. (???)

Currently queries work like this:

{
    // An empty query object returns all documents.

    // Each of the following adds an additional filter,
    // narrowing down the results further.

    pathPrefix?: string,  // Paths starting with prefix.
    // etc
}

So an incoming sync filter of {} means "I want all documents"; an outgoing filter of {} means "I will give all documents I have".
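
To make the "each field narrows the results" behaviour concrete, here is a small sketch of a matcher. Only `pathPrefix` appears in the query type above; the `author` field and the `matchesQuery` name are hypothetical, added to show a second narrowing condition.

```typescript
interface Doc {
  path: string;
  author: string;
}

// `pathPrefix` is from the thread; `author` is a hypothetical extra filter.
interface QueryOpts {
  pathPrefix?: string;
  author?: string;
}

// Every field that is present must match; absent fields impose nothing,
// so an empty query object matches every document.
function matchesQuery(doc: Doc, query: QueryOpts): boolean {
  if (query.pathPrefix !== undefined && !doc.path.startsWith(query.pathPrefix)) {
    return false;
  }
  if (query.author !== undefined && doc.author !== query.author) {
    return false;
  }
  return true;
}
```

Under this reading, a query such as {pathPrefix: "nope"} can never match, because every valid path starts with /.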

If we don't want to give / receive ANY documents in a sync, here are 4 ways to do that:

  • A. only do pushes or pulls, not bidirectional sync
  • B. allow setting the sync filter to null, meaning "nothing".
  • C. add a new query parameter that means "match nothing"
  • D. just make a query that won't ever match anything, like {pathPrefix: "nope"} (since paths have to start with /)

My feelings are:

  • A. this would require additional pub configuration besides the queries. It would be nice to accomplish this using just the queries.
  • B. ⭐ my favorite
  • C. it's a bit weird, could work
  • D. it's a hack, it works, it's not elegant

This is made more complicated because pubs are supposed to have an array of incoming queries, and an array of outgoing queries. Documents that match ANY query in the array will be sent.

So...

  • incomingQueries = [{}] -- match everything
  • incomingQueries = null -- match nothing, if we allow this
  • incomingQueries = [] -- everything? or nothing?
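
One way to see the ambiguity: if "match ANY query in the array" is implemented in the most direct way, with `Array.prototype.some`, then an empty array matches nothing, because `some` returns false for an empty array. The sketch below assumes that implementation; the names `matchesQuery` and `matchesAny` are hypothetical, and nothing here settles which meaning Earthstar should choose.

```typescript
interface QueryOpts {
  pathPrefix?: string;
}

// A single query: an empty object matches every path.
function matchesQuery(path: string, query: QueryOpts): boolean {
  return query.pathPrefix === undefined || path.startsWith(query.pathPrefix);
}

// "Documents that match ANY query in the array", taken literally.
function matchesAny(path: string, queries: QueryOpts[]): boolean {
  return queries.some((query) => matchesQuery(path, query));
}

const matchEverything: QueryOpts[] = [{}];
const emptyList: QueryOpts[] = [];

console.log(matchesAny("/wiki/home", matchEverything)); // true
console.log(matchesAny("/wiki/home", emptyList)); // false
```

So with this particular implementation [] would mean "nothing", while a special case for the empty array would be needed to make it mean "everything".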

Queries in other places in Earthstar

It's tempting to generalize B, to accept null queries anywhere that queries are used in Earthstar. But this would mean changing the Storage query functions, since I don't like mixing undefined and null (seems like a recipe for mistakes):

// currently: omitting the query means the same as setting it to {}: get all documents
documents(query?: QueryOpts)

// the new way?  an argument is required.
// null means "no documents", {} means all documents
documents(query: QueryOpts | null)

In conclusion: 🤷 ?


sgwilym commented on May 21, 2024

Using null to signify something so meaningful seems laden with danger to me 😬

  • Conceptually, I feel like it doesn't match up: having null filters seems like saying you have no filters, which to me means that you let anything through.
  • Because using checks like if (syncFilters) { ... } is so common in JS, I think it's highly likely for devs to be caught out by null meaning something.
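
The second point can be shown with a short sketch. It assumes the "null means nothing" proposal is adopted and that an omitted filter means "no preference"; the `SyncFilters` shape and both function names are hypothetical.

```typescript
// Assumed semantics for this sketch: undefined = no preference (accept all),
// null = accept nothing, otherwise accept paths under any listed prefix.
type SyncFilters = { pathPrefixes: string[] } | null | undefined;

// The common truthiness check: null and undefined take the same branch,
// so "accept nothing" silently turns into "accept everything".
function acceptsNaive(filters: SyncFilters, path: string): boolean {
  if (filters) {
    return filters.pathPrefixes.some((prefix) => path.startsWith(prefix));
  }
  return true;
}

// The careful version has to tell the two apart explicitly.
function acceptsExplicit(filters: SyncFilters, path: string): boolean {
  if (filters === null) {
    return false;
  }
  if (filters === undefined) {
    return true;
  }
  return filters.pathPrefixes.some((prefix) => path.startsWith(prefix));
}

console.log(acceptsNaive(null, "/wiki/home")); // true, not what null was meant to say
console.log(acceptsExplicit(null, "/wiki/home")); // false
```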

On reflection, an empty array meaning something is similarly ambiguous to me. I wonder if a more explicit typing would work better?

type SyncFilters = {
    pathPrefixes: string[],
    authorsByVersions: string[],
} | "FILTER_EVERYTHING"

type PubConfig = {
    otherStuff?: Whatever,
    incomingFilters?: SyncFilters,
    outgoingFilters: SyncFilters,
}

// All documents are accepted and sent with these configs:

{
    incomingFilters: {},
}

{
    incomingFilters: { 
        pathPrefixes: []
    },
    outgoingFilters: null
}

// Documents are not accepted

{
    incomingFilters: 'FILTER_EVERYTHING',
}

// Documents are not sent

{
    outgoingFilters: 'FILTER_EVERYTHING',
    incomingFilters: {
        pathPrefixes: ["/gossip"]
    }
}
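
A sketch of how a pub might evaluate filters typed this way. It is one interpretation of the example configs above rather than a settled design: it assumes an omitted filter, an empty object and an empty `pathPrefixes` list all mean "no restriction", and it leaves out `authorsByVersions` because its matching rules are not described here. The `passesFilters` name is hypothetical.

```typescript
// Fields are optional here so that `{}` is a valid value, as in the examples.
type SyncFilters = { pathPrefixes?: string[] } | "FILTER_EVERYTHING";

function passesFilters(path: string, filters?: SyncFilters): boolean {
  // No filter configured: no preference, so everything passes.
  if (filters === undefined) {
    return true;
  }
  // The explicit sentinel: nothing passes.
  if (filters === "FILTER_EVERYTHING") {
    return false;
  }
  // An absent or empty prefix list is treated as "no restriction".
  const prefixes = filters.pathPrefixes ?? [];
  if (prefixes.length === 0) {
    return true;
  }
  return prefixes.some((prefix) => path.startsWith(prefix));
}
```

Because the "nothing" case is a distinct string literal, a truthiness check cannot confuse it with a missing filter, and the type checker narrows `filters` to the object shape after the sentinel comparison.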


cinnamon-bun commented on May 21, 2024

How about

{ limit: 0 }

...as a query that matches nothing?


sgwilym commented on May 21, 2024

How would that be applied? Like this?

{
  incoming: { limit: 0 },
  outgoing: { pathPrefixes: ["/wiki"] }
}

It's simple. But is there any meaning/use to setting an incoming filter of { limit: 10 }?


cinnamon-bun commented on May 21, 2024

But is there any meaning/use to setting an incoming filter of { limit: 10 }?

Maybe if you wanted to have "just the 10 most recently edited docs"?

{
    limit: 10,
    sort: "recent",  // (sort order hasn't been defined yet, but is probably coming soon)
}
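
A sketch of how `limit` and a sort could combine, which also shows why { limit: 0 } matches nothing. The sort order has not been defined yet, so the `"recent"` value, the `timestamp` field and the `runQuery` function are all assumptions for illustration.

```typescript
// Hypothetical document shape; `timestamp` is an assumed field.
interface Doc {
  path: string;
  timestamp: number;
}

interface Query {
  limit?: number;
  sort?: "recent"; // assumed name; the sort order is not defined yet
}

function runQuery(docs: Doc[], query: Query): Doc[] {
  // Copy first so the caller's array is left untouched.
  let results = docs.slice();
  if (query.sort === "recent") {
    results.sort((x, y) => y.timestamp - x.timestamp);
  }
  // Apply the limit after sorting; a limit of 0 leaves nothing.
  if (query.limit !== undefined) {
    results = results.slice(0, query.limit);
  }
  return results;
}
```

Without a defined sort, a positive limit would return an arbitrary subset, which is why the two settings are shown together here.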

