Comments (3)

cinnamon-bun commented on September 21, 2024

Working out scenarios in detail

Let's say we have 2 paths, and each has a history of 3 documents (o)

/path1  o   o   o
/path2  o   o   o
                ^ heads (latest versions in each path)

We do a basic query that matches some of them...

/path1  x   x   o    // "x" matched query, "o" did not
/path2  x   o   x

Here are ways we might want to filter further with a fancier query:

// [ ] means it was returned by the fancy query

/path1  x   x   o   // A. only heads
/path2  x   o  [x]

/path1  x   x   o   // B. only heads, then expand to entire history
/path2 [x] [o] [x]

/path1  x  [x]  o   // C. latest individual match per path
/path2  x   o  [x]

/path1 [x] [x]  o   // D. all individual matches
/path2 [x]  o  [x]

/path1 [x] [x] [o]  // E. every path that matched somehow.  return all history.
/path2 [x] [o] [x]

/path1  x   x  [o]  // F. every path that matched somehow.  only return the head.
/path2  x   o  [x]

Use cases

Why do all these things?

Some apps will be doing custom conflict resolution so they'll generally want full histories. Some apps will just use the simple last-write-wins that's built into Earthstar.

Besides in-app searching and filtering, these might also be useful for sync queries.

Here are some examples for a wiki app, searching for pages authored by me:

  • A: Pages where I'm the most recent editor (current version)
  • B: Pages where I'm the most recent editor (plus full edit history)
  • C: Edits I made (my most recent version per document)
  • D: Edits I made (all my revisions)
  • E: Every document I've touched (plus full edit history)
  • F: Every document I've touched (current version only, even if not by me)

Which queries are fast vs slow?

Assuming we've already tagged the head document of each path with "isHead = true" in the database...

  • A: ✅✅ document-wise + check isHead column
  • B: ⌛ expand to all history (subquery or group-by?)
  • C: ✅ group-by for latest, or iterate all matches & keep only the latest match in each path
  • D: ✅✅✅ document-wise
  • E: ⌛ expand to all history (subquery or group-by?)
  • F: ⌛ expand to all history and group-by for latest

Note "head" means the latest overall, and "latest" means the latest of the matches.

The operations are:

  • match: Do basic matching
  • heads: Only keep heads?
  • expand: Expand to all history?
  • latest: Only keep latest doc in each path (might not be overall head)?

Expressed as those operations, the six query types are:

  • A: ✅✅ match, heads
  • B: ⌛ match, heads, expand
  • C: ✅ match, latest
  • D: ✅✅✅ match
  • E: ⌛ match, expand
  • F: ⌛ match, expand, heads (or match, find-heads-for-each-path)
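As a rough illustration, the four operations can be sketched over an in-memory array and composed into the six query types. This is only a sketch: the `Doc` shape, the `version` field, and every function name here are assumptions made for the example, not Earthstar's actual types or API.

```typescript
// Hypothetical in-memory model, not Earthstar's real document type.
interface Doc {
  path: string;
  version: number; // position in the path's history; higher = newer
  author: string;
}

type Pred = (d: Doc) => boolean;

// Newest doc per path among the docs given.
function newestPerPath(docs: Doc[]): Doc[] {
  const newest = new Map<string, Doc>();
  for (const d of docs) {
    const cur = newest.get(d.path);
    if (cur === undefined || d.version > cur.version) newest.set(d.path, d);
  }
  return Array.from(newest.values());
}

// match: basic document-wise matching
const match = (docs: Doc[], p: Pred): Doc[] => docs.filter(p);

// heads: keep only docs that are the overall head of their path
// (compares by object identity, so docs must come from `all`)
const heads = (docs: Doc[], all: Doc[]): Doc[] => {
  const headSet = new Set(newestPerPath(all));
  return docs.filter((d) => headSet.has(d));
};

// expand: every version of every path that appears in docs
const expand = (docs: Doc[], all: Doc[]): Doc[] => {
  const paths = new Set(docs.map((d) => d.path));
  return all.filter((d) => paths.has(d.path));
};

// latest: newest doc per path among docs (might not be the overall head)
const latest = (docs: Doc[]): Doc[] => newestPerPath(docs);

// The scenario from the diagrams: "me" wrote the x docs, "other" the o docs.
const all: Doc[] = [
  { path: "/path1", version: 1, author: "me" },
  { path: "/path1", version: 2, author: "me" },
  { path: "/path1", version: 3, author: "other" },
  { path: "/path2", version: 1, author: "me" },
  { path: "/path2", version: 2, author: "other" },
  { path: "/path2", version: 3, author: "me" },
];

const m = match(all, (d) => d.author === "me");

const A = heads(m, all);              // /path2 v3
const B = expand(heads(m, all), all); // all of /path2
const C = latest(m);                  // /path1 v2, /path2 v3
const D = m;                          // the four docs by "me"
const E = expand(m, all);             // all six docs
const F = heads(expand(m, all), all); // /path1 v3, /path2 v3
```

The order of composition matters: B applies heads before expand, while F applies expand before heads, which is why they return different sets.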

Turning that into query parameters

We could have a query parameter like this:

// in same order as above
historyMode:
      'matching-heads'
    | 'matching-heads-plus-all-history'
    | 'latest-matching-versions'
    | 'matching-versions'
    | 'matching-versions-plus-all-history'
    | 'any-heads-that-have-matches-in-history'

Or would it be better to break this into 2 or 3 separate query parameters?
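One possible two-parameter split, purely as a sketch: one parameter says which versions select a path, the other says what to return for each selected path. The names `select` and `returns` and their values are hypothetical and not part of Earthstar.

```typescript
// The single-parameter version proposed above.
type HistoryMode =
  | "matching-heads"
  | "matching-heads-plus-all-history"
  | "latest-matching-versions"
  | "matching-versions"
  | "matching-versions-plus-all-history"
  | "any-heads-that-have-matches-in-history";

// Hypothetical split into two independent parameters.
type Select = "matching-heads" | "latest-matches" | "all-matches";
type Returns = "selected" | "full-history" | "head";

interface SplitParams {
  select: Select;   // which matching versions qualify
  returns: Returns; // what to return for each qualifying path
}

// How each of the six modes would map onto the split parameters.
const splitFor: Record<HistoryMode, SplitParams> = {
  "matching-heads": { select: "matching-heads", returns: "selected" },                   // A
  "matching-heads-plus-all-history": { select: "matching-heads", returns: "full-history" }, // B
  "latest-matching-versions": { select: "latest-matches", returns: "selected" },         // C
  "matching-versions": { select: "all-matches", returns: "selected" },                   // D
  "matching-versions-plus-all-history": { select: "all-matches", returns: "full-history" }, // E
  "any-heads-that-have-matches-in-history": { select: "all-matches", returns: "head" },  // F
};
```

Under this split, the three remaining combinations add nothing new: matching-heads + head equals A, latest-matches + full-history equals E, and latest-matches + head equals F. That redundancy is one argument for keeping a single enumerated parameter.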

from earthstar.

cinnamon-bun commented on September 21, 2024

Besides authors, we can do other operations on the set of document versions for a given path. For example, timestamps:

Timestamps

/path/1:  m n o    // three versions
/path/2:  p q r
  • To get o and r, the most recent edits of each doc, is query type A (match, heads)
  • To get m and p, the oldest edit of each doc (creation time), is a variant of query type C (match, oldest instead of latest)
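The two timestamp cases differ only in which version wins within each path, so they can share one helper. A minimal sketch, assuming a hypothetical `Version` shape and made-up timestamps:

```typescript
// Hypothetical shape for this example only.
interface Version {
  path: string;
  timestamp: number;
  content: string;
}

// Keep one version per path; `prefer(a, b)` says whether a should replace b.
function pickPerPath(
  docs: Version[],
  prefer: (a: Version, b: Version) => boolean,
): Version[] {
  const chosen = new Map<string, Version>();
  for (const d of docs) {
    const cur = chosen.get(d.path);
    if (cur === undefined || prefer(d, cur)) chosen.set(d.path, d);
  }
  return Array.from(chosen.values());
}

const newest = (docs: Version[]) => pickPerPath(docs, (a, b) => a.timestamp > b.timestamp);
const oldest = (docs: Version[]) => pickPerPath(docs, (a, b) => a.timestamp < b.timestamp);

const versions: Version[] = [
  { path: "/path/1", timestamp: 1, content: "m" },
  { path: "/path/1", timestamp: 2, content: "n" },
  { path: "/path/1", timestamp: 3, content: "o" },
  { path: "/path/2", timestamp: 1, content: "p" },
  { path: "/path/2", timestamp: 2, content: "q" },
  { path: "/path/2", timestamp: 3, content: "r" },
];

const mostRecentEdits = newest(versions).map((v) => v.content); // ["o", "r"]
const creationEdits = oldest(versions).map((v) => v.content);   // ["m", "p"]
```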

cinnamon-bun commented on September 21, 2024

In the beta branch, v6, I decided what to do: asking for "latest" docs happens FIRST, and then filters are applied only to the results.

Using the language from previous comments above:

  • { history: 'latest' } is type A -- get latest docs first, then apply filters to those.
  • { history: 'all' } is type D -- all individual matching documents regardless of location in history.

Comments from the beta source code for further details:

https://github.com/earthstar-project/earthstar/blob/beta/src/storage/query.ts#L48-L57

/**
 * Query objects describe how to query a Storage instance for documents.
 * 
 * An empty query object returns all latest documents.
 * Each of the following properties adds an additional filter,
 * narrowing down the results further.
 * The exception is that history = 'latest' by default;
 * set it to 'all' to include old history documents also.
 */
export interface Query {
    /**
     * Document author.
     * 
     * With history:'latest' this only returns documents for which
     * this author is the latest author.
     * 
     * With history:'all' this returns all documents by this author,
     * even if those documents are not the latest ones anymore.
     */
    author?: AuthorAddress,

https://github.com/earthstar-project/earthstar/blob/beta/src/storage/storageSqlite.ts#L272-L299

         * If query.history === 'all', we can do an easy query:
         * 
         * ```
         *     SELECT * from DOCS
         *     WHERE path = "/abc"
         *         AND timestamp > 123
         *     ORDER BY path ASC, author ASC
         *     LIMIT 123
         * ```
         * 
         * If query.history === 'latest', we have to do something more complicated.
         * We don't want to filter out some docs, and THEN get the latest REMAINING
         * docs in each path.
         * We want to first get the latest doc per path, THEN filter those.
         * 
         * ```
         *     SELECT *, MAX(timestamp) from DOCS
         *     -- first level of filtering happens before we choose the latest doc.
         *     -- here we can only do things that are the same for all docs in a path.
         *     WHERE path = "/abc"
         *     -- now group by path and keep the newest one
         *     GROUP BY path
         *     -- finally, second level of filtering happens AFTER we choose the latest doc.
         *     -- these are things that can differ for docs within a path
         *     HAVING timestamp > 123
         *     ORDER BY path ASC, author ASC
         *     LIMIT 123
         * ```
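
The same two-level filtering can be sketched in memory to show why the order matters. This is an illustration only, not the code in storageSqlite.ts: `queryDocs`, `SketchQuery`, and the `timestampGt` field are hypothetical names, and the author strings are placeholders rather than real Earthstar author addresses.

```typescript
// Hypothetical shapes for this example only.
interface Doc {
  path: string;
  author: string;
  timestamp: number;
}

interface SketchQuery {
  history?: "latest" | "all"; // defaults to "latest", as the quoted comment says
  path?: string;
  author?: string;
  timestampGt?: number; // hypothetical name for "timestamp > N"
}

function queryDocs(docs: Doc[], q: SketchQuery): Doc[] {
  // Level 1: filters that are the same for every doc in a path.
  let result = docs.filter((d) => q.path === undefined || d.path === q.path);

  if ((q.history ?? "latest") === "latest") {
    // Keep only the newest doc per path BEFORE any doc-level filter runs.
    const newest = new Map<string, Doc>();
    for (const d of result) {
      const cur = newest.get(d.path);
      if (cur === undefined || d.timestamp > cur.timestamp) newest.set(d.path, d);
    }
    result = Array.from(newest.values());
  }

  // Level 2: filters that can differ between docs within a path.
  result = result.filter(
    (d) =>
      (q.author === undefined || d.author === q.author) &&
      (q.timestampGt === undefined || d.timestamp > q.timestampGt),
  );

  // Equivalent of ORDER BY path ASC, author ASC.
  return result.sort(
    (a, b) => a.path.localeCompare(b.path) || a.author.localeCompare(b.author),
  );
}

const docs: Doc[] = [
  { path: "/abc", author: "@bob", timestamp: 100 },
  { path: "/abc", author: "@amy", timestamp: 200 }, // head of /abc
  { path: "/xyz", author: "@bob", timestamp: 150 }, // head of /xyz
];

// Type A: only paths where @bob wrote the latest doc.
const latestByBob = queryDocs(docs, { author: "@bob" });
// Type D: everything @bob wrote, including the overwritten /abc version.
const allByBob = queryDocs(docs, { history: "all", author: "@bob" });
```

Here `latestByBob` contains only the /xyz doc, because @amy's newer write to /abc is chosen as the latest doc first and then fails the author filter. Filtering by author first and taking the latest afterwards would instead return @bob's stale /abc version, which is the type C behaviour from the earlier comments.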
