usegraffy / graffy
Live queries for graph-shaped data
Home Page: https://graffy.org
License: Apache License 2.0
Create a special "filtering link" which, when traversed, modifies the keys immediately under the link to add filtering parameters.
Consider the schema
{
posts: { [pid]: Post },
posts$$createdAt: { [filter]: { [createdAt]: link(`/posts/${pid}`) } },
users: { [uid]: User }
}
where the posts$$createdAt index can be filtered by authorId and tag.
Imagine we want to query the last 3 posts of a user, with a particular tag, alongside their name. While this is already possible with a query containing both a users branch and a posts$$createdAt branch, such a query would be unintuitive, duplicate userIds, and require additional post-processing of results.
Ideally this query should work:
# Query
{
users: { '123': {
name: 1,
posts: { [key({ tag: 'example' })]: [{ last: 3 }, {
title: 1, createdAt: 1
}] }
} }
}
and Graffy should send the following query to the posts$$createdAt provider:
{ [key({ tag: 'example', authorId: '123' })]: {
title: 1, createdAt: 1
} }
This could be done if the user provider returned a "filtering link" for the posts property:
{
name: 'Example',
posts: link(['posts$$createdAt', { authorId: '123' }])
}
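One way the traversal of such a filtering link might work is sketched below. This is illustrative only: JSON.parse/JSON.stringify stand in for Graffy's actual key encoding and decoding, and applyFilteringLink is a hypothetical name, not an existing API.

```javascript
// Hypothetical sketch: when a filtering link is traversed, merge the
// link's parameters (e.g. { authorId: '123' }) into each key of the
// query subtree under the link. JSON is a stand-in for Graffy's real
// key encoding.
function applyFilteringLink(linkParams, subQuery) {
  const filtered = {};
  for (const [key, value] of Object.entries(subQuery)) {
    const keyParams = JSON.parse(key); // stand-in for decoding key()
    filtered[JSON.stringify({ ...keyParams, ...linkParams })] = value;
  }
  return filtered;
}

// A query under the link keyed by { tag: 'example' } becomes a query
// on the index keyed by { tag: 'example', authorId: '123' }.
const rewritten = applyFilteringLink(
  { authorId: '123' },
  { '{"tag":"example"}': { title: 1, createdAt: 1 } },
);
```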
It's more useful, more succinct, and protects the user from having to deal with \0 and \uffff characters.
The change should happen in decorate.js
Instead of:
arr.pageInfo = {
hasNext: false,
hasPrev: true,
start: '',
end: 'foobaq\uffff'
}
we should have:
arr.prevRange = null;
arr.nextRange = { first: 10, after: 'foobar' };
The null prevRange indicates that this is the first page. The first / last should match the current page size, and before / after should use the keyAfter / keyBefore helpers from @graffy/common.
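The proposed shape could look something like the following sketch. How the boundary keys are derived (via keyAfter / keyBefore from @graffy/common) is left out; they are passed in already computed, and the function name is illustrative, not the actual decorate.js code.

```javascript
// Minimal sketch of the proposed decorate.js output shape.
// A null prevRange marks the first page; first / last match the
// current page size.
function pageRanges({ hasPrev, hasNext, firstKey, lastKey, size }) {
  return {
    prevRange: hasPrev ? { last: size, before: firstKey } : null,
    nextRange: hasNext ? { first: size, after: lastKey } : null,
  };
}

const ranges = pageRanges({
  hasPrev: false,
  hasNext: true,
  lastKey: 'foobar',
  size: 10,
});
// First page: prevRange is null, nextRange asks for the next 10 items.
```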
Currently, every node has a version value. In practice, in most (but not all) graphs and queries, all nodes have the same version. There is some redundancy here.
In subscription caches, we need to update the version number of the entire cache whenever there is a new update. With the current data structure, this takes O(size of cache) time, while other operations only take O(size of change).
It might be beneficial to rethink how version is stored and manipulated in the internal representation.
Version storage might be simplified by keeping only one writeVersion and one readVersion per tree, and packing multiple trees into "layers".
To stop this getting out of hand, queries are immutable and can only have one layer, so all parts of a query must have the same minimum version requirement. When merging queries, we can take the max(readVersion) and min(writeVersion) to ensure that the data required by all constituent queries is requested.
This is inextricably linked to #2.
Primary goal: Add typings to the published NPM modules.
Secondary goal: Get type checks into the development workflow for Graffy itself.
The preferred approach is to use JSDoc-style function annotations (which TypeScript supports) rather than converting to TypeScript syntax.
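For illustration, an annotation in this style might look like the following; the function itself is a made-up example, not a Graffy API. TypeScript can check files like this with checkJs enabled, without any conversion.

```javascript
// Example of a JSDoc-style annotation that TypeScript understands.
// `tsc --checkJs` can type-check this file while it remains plain JS.

/**
 * Split a slash-separated path into its segments.
 * @param {string} path - e.g. 'users/123'
 * @returns {string[]} the non-empty path segments
 */
function splitPath(path) {
  return path.split('/').filter(Boolean);
}
```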
The consumer APIs (read, write and watch) could gain a path argument to avoid having to implement aliases.
By and large, Graffy encourages granular queries; if a component has the sort of data need that requires aliases, it might be better served by just making two queries.
However using dynamic keys in queries comes with a bit of boilerplate that could be eliminated.
const postId = get_post_id_somewhere();
result = await gs.read({ posts: { [postId]: { ... } } });
const what_i_really_want = result.posts[postId];
It feels even worse when using filter parameters:
const filter = encodeKey({ tags: ['tech', 'javascript'] }); // This is some opaque string.
result = await gs.read({ filteredPostsByTime: { [filter]: [ { first: 10 }, {...} ] } });
const what_i_really_want = result.filteredPostsByTime[filter];
I have to store the encoded filter into a variable even though it has no meaning or use outside that query.
I feel that a better API might be:
const postId = get_post_id_somewhere();
const just_the_post = gs.read( ['posts', postId], { ... });
or with the filter:
const filteredPosts = gs.read([ 'filteredPostsByTime', encodeKey(...) ], [ { first: 10 }, { ... } ]);
What say?
In read/write/watch, we would wrap the query in the path before passing to .call(), and unwrap the results before returning.
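The wrap/unwrap step described above is mechanical; a minimal sketch (function names are illustrative, not the actual internals):

```javascript
// Sketch of the proposed path handling: nest the query under the path
// before passing it to the core, and dig the result back out before
// returning it to the caller.

function wrapQuery(path, query) {
  // ['posts', 'p1'] + { title: 1 }  →  { posts: { p1: { title: 1 } } }
  return path.reduceRight((inner, key) => ({ [key]: inner }), query);
}

function unwrapResult(path, result) {
  // Walk down the result along the path, tolerating missing nodes.
  return path.reduce((node, key) => node && node[key], result);
}

const path = ['posts', 'p1'];
const wrapped = wrapQuery(path, { title: 1 });
const unwrapped = unwrapResult(path, { posts: { p1: { title: 'Hi' } } });
```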
Consider a watch query:
{
users: [{
name: true,
email: true
}]
}
Currently there are two modes for this watch: "values" mode, where every response contains all users, and "raw" mode, where responses contain only the changes. A common use case calls for a "raw+" mode where you receive only the changed users, but each changed user includes both name and email, even if only one of them changed.
This is convenient for watching processes that would otherwise need to watch changes and then load every entity.
An index provider might be able to retrieve the necessary information at the link, not just its location, allowing the provider to do, for example:
store.onRead('/posts$', query => {
const posts = getPostsFromDb(query);
return _.fromPairs(posts.map(post => [
key([post.createdAt, post.id]),
link(`/posts/${post.id}`, post),
]));
})
Hello, it seems a js.org subdomain that was requested to target this repository no longer works. The subdomain requested was graffy.js.org and had the target of aravindet.github.io/graffy.
It produced the following failures when tested as part of the cleanup:
To keep the js.org subdomain you should add a page with reasonable content within a month so the subdomain passes the validation.
Failure to rectify the issues will result in the requested subdomain being removed from JS.ORG's DNS and the list of active subdomains.
If you want to keep the js.org subdomain and have added reasonable content, YOU MUST reply to the main cleanup issue with the response format detailed at the top.
🤖 Beep boop. I am a robot and performed this action automatically as part of the js.org cleanup process. If you have an issue, please contact the js.org maintainers.
This is a nice-to-have for 1.0.
As a prerequisite, we should add a soft convention for naming indexes, e.g. '$<index_name>'. Then:
GET /posts?by=time&first=10&fields=slug,title,at,authors(first:1,name,avatar)
should become:
{
'posts$time': [ { first: 10 }, {
slug: 1, title: 1, at: 1,
authors: [ { first: 1}, {
name: 1, avatar: 1
} ]
} ]
}
GET /posts/123?fields=slug,title,at,author(name,avatar)
should become:
{
'posts': { 123: {
slug: 1, title: 1, at: 1,
author: { name: 1, avatar: 1 }
} }
}
TL;DR: Replace watch() with incremental read() polling
The current implementation of watch() is complex to implement in providers and doesn't support back-pressure or resumption.
https://repeater.js.org/docs/repeater
This can be a replacement for @graffy/stream (which can then be deprecated) and mergeIterators. mapStream can also be replaced with an async generator.
In the subscription provider of the example mock visitor list, pushing the initial state (rather than undefined) should improve performance slightly by not requiring a separate get. However it looks like it reduces performance drastically.
Requires investigation.
When using a final-mode cache:
{ foo: { "1": "34" } }
the query:
{ foo: [ { first: 3 }, 1 ] }
returns
{ foo: [ null, "34" ] }
(roughly).
TL;DR: Some watch() providers may handle { after: '', before: 'b' } but not { first: 15 }. How do they communicate this?
Graffy providers often have limitations around what queries they can fulfil. They need to be able to signal these limitations, so graffy-fill can figure out ways to work around them.
Currently, we use some ad-hoc mechanisms to signal limitations. Perhaps we could design these in a more systematic way.
Consider the posts and users example. Let's say the posts resolver cannot fetch user data: if author info was requested, it ignores the nested fields and simply returns a link as the author field.
Graffy-fill makes a new (live) query for the linked data.
Imagine a subscription provider that can provide change streams but not the initial result (current state). It signals this by yielding undefined as the first value.
Graffy-fill makes a separate fetch to get the initial value.
Imagine a change stream provider pushing updates for users. Say it does not have access to the current state, but can access an event stream of user updates where each update specifies the user_id.
Say the query is for the first 30 users.
In a scenario where there are thousands of users, MOST user updates will be irrelevant for this query. However, there is no way for this provider to know that, because it cannot know the range of IDs that match "first 30".
Perhaps there should be a way for the provider to signal that it cannot serve "counted" pages (i.e. that use first / last parameters) but can serve "bounded" ones (i.e. those that only have before AND after, but no first / last).
Graffy fill could use the fetch results to convert a "counted" page into a "bounded" one.
NOTE: If the pagination happens in an "index" (nodes where all the children are links), it will work fine if the change stream provider ignores the bounded queries and just pretends there are no updates. However, it seems like this is working "by accident".
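The counted-to-bounded rewrite graffy-fill might perform could look something like this sketch. The function name and the exact choice of boundary key are assumptions, not the actual graffy-fill internals.

```javascript
// Sketch: once a fetch has returned the current page, rewrite a
// "counted" range ({ first: N }) into a "bounded" one using the
// fetched boundary keys, so a bounds-only change stream provider
// can serve it.
function toBoundedRange(countedRange, fetchedKeys) {
  if (!('first' in countedRange) && !('last' in countedRange)) {
    return countedRange; // already bounded
  }
  return {
    after: countedRange.after || '',
    // The last fetched key bounds the page; real code would likely
    // use the keyAfter helper here to keep that key inside the range.
    before: fetchedKeys[fetchedKeys.length - 1],
  };
}

const bounded = toBoundedRange({ first: 3 }, ['a', 'b', 'c']);
```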
@baopham Thread to discuss what sort of APIs the query object should have to make it easy for providers that might want to (1) construct a query, like SQL or ES, or (2) identify topics to subscribe to.
Say you want to write a provider for /users that needs to serve both queries like:
// 1
{ users: [ { first: 10 }, { name: true } ] }
// 2
{ users: { user_id_1: { email: true } } }
The provider might need to construct SQL queries:
# 1
SELECT name FROM users ORDER BY ID ASC LIMIT 10;
# 2
SELECT email FROM users WHERE id="user_id_1";
How would the "ideal" code to get from the query objects to the SQL look?
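As a starting point for the discussion, here is one possible shape, working directly on the porcelain objects; every name here is hypothetical, and real code must use parameterized queries rather than string interpolation.

```javascript
// Sketch: translate the two porcelain query shapes into SQL.
// An array [range, shape] is a paginated query; a plain object maps
// ids to shapes. WARNING: interpolating ids like this is
// SQL-injection-prone; use parameterized queries in real code.
function usersQueryToSql(usersQuery) {
  if (Array.isArray(usersQuery)) {
    const [range, shape] = usersQuery;
    const fields = Object.keys(shape).join(', ');
    return `SELECT ${fields} FROM users ORDER BY id ASC LIMIT ${range.first};`;
  }
  return Object.entries(usersQuery).map(([id, shape]) => {
    const fields = Object.keys(shape).join(', ');
    return `SELECT ${fields} FROM users WHERE id='${id}';`;
  });
}
```

With a richer query API, the Array.isArray and Object.keys plumbing here would ideally be replaced by purpose-built helpers.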
The pure JS "porcelain" query format currently in use is fairly verbose. This is a proposal to mitigate that with a Graffy query language. It aims to be similar enough to GraphQL to be familiar for those using it, but is not necessarily compatible with it.
Here is an example query:
{
books {
( tags: {foo, bar}, publishedUntil: '2000-01-01' ) [
( first: 10, after: ('1998-03-23', 4398) ) {
author {
name
photo
}
title
cover
description
}
]
}
}
which is equivalent to the current porcelain:
{
books: {
[key({
tags: {foo: true, bar: true},
publishedUntil: '2000-01-01',
})]: [
{
first: 10,
after: key(['1998-03-23', 4398]),
}, {
author: { name: true, photo: true },
title: true,
cover: true,
description: true,
}
]
}
}
The transformations (to the current porcelain structure) are quite straightforward:
- (foo: 1) becomes key({ foo: 1 })
- ('foo', 'bar') becomes key(['foo', 'bar'])
- { foo, bar } becomes { foo: true, bar: true }
- before, after etc. within [...] get collected into an object, and : are added as needed

Currently, graffy fill makes extra queries for subscriptions when resolving links. However, it does not clean those up when the link is updated.
This is currently planned to be fixed by extending slice() to return extraneous as well.
Currently, the device timestamp is used blindly. This is not resilient to timestamp decreasing (due to adjustments etc) and duplicate changes within 1ms.
We need to append a sequence number, remember the last used version, and use the last version with incremented sequence number if the timestamp is unchanged or has decreased.
This change should be made in the graph builder.
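A minimal sketch of such a generator, assuming the version is a number packing the timestamp and a sequence counter (the packing scheme and function name are illustrative):

```javascript
// Sketch of a monotonic version generator for the graph builder:
// remember the last timestamp, and when the clock stalls or goes
// backwards, reuse it with an incremented sequence number.
let lastTimestamp = 0;
let sequence = 0;

function nextVersion(now = Date.now()) {
  if (now > lastTimestamp) {
    lastTimestamp = now;
    sequence = 0;
  } else {
    sequence++; // clock unchanged or decreased: bump the sequence
  }
  // Pack timestamp and sequence; real code might use more bits, or a
  // string, to avoid overflowing Number.MAX_SAFE_INTEGER.
  return lastTimestamp * 1000 + sequence;
}

const v1 = nextVersion(1700000000000);
const v2 = nextVersion(1699999999999); // clock went backwards
// v2 is still greater than v1: same timestamp, sequence incremented.
```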