banterfm / graphql-crunch

Reduces the size of GraphQL responses by consolidating duplicate values

graphql-crunch

Optimizes JSON responses by minimizing duplication and improving compressibility.

On Banter.fm, we see a 76% reduction in raw JSON size and a 30% reduction in gzip'd size. This leads to reduced transfer time and faster JSON parsing on mobile.

Client support
Installation
How does it work?
Motivation
Example
- Small Example
- Large Example
Usage
- Server-side
- Client-side

Client support

graphql-crunch is client agnostic and can be used anywhere that sends or receives JSON. We provide examples for integration with apollo-client as we use this in a GraphQL environment.

Installation

This library is distributed on npm. In order to add it as a dependency, run the following command:

$ npm install graphql-crunch --save

or with Yarn:

$ yarn add graphql-crunch

How does it work?

We flatten the object hierarchy into an array using a post-order traversal of the object graph. As we traverse, we efficiently check whether we've come across a value before, including arrays and objects, and if so replace it with a reference to its earlier occurrence. Each value is only ever present in the array once.

Note: Crunching and uncrunching is an entirely lossless process. The final payload exactly matches the original.
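To make the traversal concrete, here is a minimal sketch of the idea in TypeScript. It is not the library's actual implementation: the names crunchSketch and uncrunchSketch are hypothetical, duplicate detection uses a simple JSON.stringify signature, and keys are sorted so that equal objects share a signature (an assumption of this sketch).

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical sketch: flatten a value into an array via post-order traversal.
// Children are emitted before their parents, so every reference points backwards.
function crunchSketch(root: Json): Json[] {
  const out: Json[] = [];
  const seen = new Map<string, number>(); // signature -> index in `out`

  const visit = (value: Json): number => {
    let flat: Json;
    if (Array.isArray(value)) {
      flat = value.map((item) => visit(item)); // array of indices
    } else if (value !== null && typeof value === "object") {
      const obj: { [key: string]: Json } = {};
      // Sorting keys is an assumption of this sketch, not a documented behavior.
      for (const key of Object.keys(value).sort()) obj[key] = visit(value[key]);
      flat = obj; // object whose values are indices
    } else {
      flat = value; // primitives are stored as-is
    }
    const signature = JSON.stringify(flat);
    const existing = seen.get(signature);
    if (existing !== undefined) return existing; // seen before: reuse the index
    out.push(flat);
    seen.set(signature, out.length - 1);
    return out.length - 1;
  };

  visit(root);
  return out; // the root is always the last entry
}

// Hypothetical inverse: rebuild values front to back, resolving indices.
function uncrunchSketch(flat: Json[]): Json {
  const resolved: Json[] = [];
  for (const entry of flat) {
    if (Array.isArray(entry)) {
      resolved.push(entry.map((ref) => resolved[ref as number]));
    } else if (entry !== null && typeof entry === "object") {
      const obj: { [key: string]: Json } = {};
      for (const key of Object.keys(entry)) obj[key] = resolved[entry[key] as number];
      resolved.push(obj);
    } else {
      resolved.push(entry);
    }
  }
  return resolved[resolved.length - 1];
}

// crunchSketch({ a: { x: 1 }, b: { x: 1 } }) yields [1, { x: 0 }, { a: 1, b: 1 }]
```

Because the traversal is post-order, a single forward pass is enough to rebuild the payload in this sketch: by the time an entry is read, everything it refers to has already been resolved.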

Motivation

Large JSON blobs can be slow to parse on some mobile platforms, especially older Android phones, so we set out to improve that. Along the way, we also made the payloads more amenable to gzip compression. GraphQL and RESTful API responses tend to contain a lot of duplication, which leads to huge payload sizes.

Example

In these examples, we use the SWAPI GraphQL demo.

Small Example

Using this query we'll fetch the first 2 people and their first 2 films and the first 2 characters in each of those films. We limit the connections to the first two items to keep the payload small:

{
  allPeople(first: 2) {
    people {
      name
      gender
      filmConnection(first: 2) {
        films {
          title
          characterConnection(first: 2) {
            characters {
              name
              gender
            }
          }
        }
      }
    }
  }
}

We get this response:

{
  "data": {
    "allPeople": {
      "people": [
        {
          "name": "Luke Skywalker",
          "gender": "male",
          "filmConnection": {
            "films": [
              {
                "title": "A New Hope",
                "characterConnection": {
                  "characters": [
                    {
                      "name": "Luke Skywalker",
                      "gender": "male"
                    },
                    {
                      "name": "C-3PO",
                      "gender": "n/a"
                    }
                  ]
                }
              },
              {
                "title": "The Empire Strikes Back",
                "characterConnection": {
                  "characters": [
                    {
                      "name": "Luke Skywalker",
                      "gender": "male"
                    },
                    {
                      "name": "C-3PO",
                      "gender": "n/a"
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "name": "C-3PO",
          "gender": "n/a",
          "filmConnection": {
            "films": [
              {
                "title": "A New Hope",
                "characterConnection": {
                  "characters": [
                    {
                      "name": "Luke Skywalker",
                      "gender": "male"
                    },
                    {
                      "name": "C-3PO",
                      "gender": "n/a"
                    }
                  ]
                }
              },
              {
                "title": "The Empire Strikes Back",
                "characterConnection": {
                  "characters": [
                    {
                      "name": "Luke Skywalker",
                      "gender": "male"
                    },
                    {
                      "name": "C-3PO",
                      "gender": "n/a"
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}

After we crunch it, we get:

{
  "data": [
    "male",
    "Luke Skywalker",
    { "gender": 0, "name": 1 },
    "n/a",
    "C-3PO",
    { "gender": 3, "name": 4 },
    [2, 5],
    { "characters": 6 },
    "A New Hope",
    { "characterConnection": 7, "title": 8 },
    "The Empire Strikes Back",
    { "characterConnection": 7, "title": 10 },
    [9, 11],
    { "films": 12 },
    { "filmConnection": 13, "gender": 0, "name": 1 },
    { "filmConnection": 13, "gender": 3, "name": 4 },
    [14, 15],
    { "people": 16 },
    { "allPeople": 17 }
  ]
}

The transformed payload is substantially smaller. After converting both payloads to JSON (with formatting removed), the transformed payload has 49% fewer bytes.
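If you want to take the same kind of measurement on your own payloads, a minimal sketch could look like the following. The helper names jsonByteLength and percentReduction are hypothetical and not part of graphql-crunch; the sketch assumes a runtime that provides the standard TextEncoder (modern browsers and Node.js) and counts UTF-8 bytes of the unformatted JSON.

```typescript
// Hypothetical helper: UTF-8 byte length of a value serialized without formatting.
function jsonByteLength(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Hypothetical helper: percentage saved, rounded to one decimal place.
function percentReduction(beforeBytes: number, afterBytes: number): number {
  return Math.round((1 - afterBytes / beforeBytes) * 1000) / 10;
}

// Compare an original payload with its crunched form.
function percentSaved(raw: unknown, crunched: unknown): number {
  return percentReduction(jsonByteLength(raw), jsonByteLength(crunched));
}
```

Gzipped sizes are not covered here; measuring those would require a compression library on top of this sketch.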

When the client receives this, we simply uncrunch it and get back the exact original version for the client to handle.

Large Example

In real-world scenarios, we'll have modularized our schema with fragments and have connections that contain more than two items. Here's a query similar to the one above, except we don't limit the size of the connections and we request a standard set of selections on Person objects.

{
  allPeople {
    people {
      ...PersonFragment
      filmConnection {
        films {
          ...FilmFragment
        }
      }
    }
  }
}

fragment PersonFragment on Person {
  name
  birthYear
  eyeColor
  gender
  hairColor
  height
  mass
  skinColor
  homeworld {
    name
    population
  }
}

fragment FilmFragment on Film {
  title
  characterConnection {
    characters {
      ...PersonFragment
    }
  }
}

The resulting response from this query is roughly 1MB of JSON (989,946 bytes), but with tons of duplication. Here is how crunching impacts the payload size:

Metric	Raw	Crunched	Improvement
Size	989,946 B	28,220 B	97.1%
Gzipped size	22,240 B	5,069 B	77.2%

This is an admittedly extreme result, but highlights the potential for crunching payloads with large amounts of duplication.

Usage

Server-side

With apollo-server you can supply a custom formatResponse function. We use this to crunch the data field of the response before sending it over the wire.

import { ApolloServer } from "apollo-server";
import { crunch } from "graphql-crunch";

const server = new ApolloServer({
  // schema, context, etc...
  formatResponse: (response) => {
    if (response.data) {
      response.data = crunch(response.data);
    }
    return response;
  },
});

server.listen({ port: 80 });

To maintain compatibility with clients that aren't expecting crunched payloads, we recommend conditioning the crunch on a query param. The example below assumes your context function exposes the incoming request as context.request:

import url from "url";
import querystring from "querystring";
import { ApolloServer } from "apollo-server";
import { crunch } from "graphql-crunch";

const server = new ApolloServer({
  // schema, context, etc...
  formatResponse: (response, options) => {
    const parsed = url.parse(options.context.request.url);
    const query = querystring.parse(parsed.query);

    if (query.crunch && response.data) {
      const version = parseInt(query.crunch, 10) || 1;
      response.data = crunch(response.data, version);
    }

    return response;
  },
});

server.listen({ port: 80 });

Now only clients that opt-in to crunched payloads via the ?crunch=2 query parameter will receive them.

Your client can specify the version of the crunch format to use in the query parameter. If the version isn't specified, or an unknown version is supplied, we default to v1.0.
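The opt-in and fallback rules above can be sketched as a small pure function. This is only an illustration: resolveCrunchVersion and KNOWN_VERSIONS are hypothetical names, and the list of known versions (1 and 2) is an assumption based on the versions mentioned in this document.

```typescript
// Assumption: the crunch format versions the server is willing to produce.
const KNOWN_VERSIONS = [1, 2];

// Hypothetical helper: map the raw ?crunch= value to a format version.
// Returns null when the client did not opt in, so the response stays uncrunched.
function resolveCrunchVersion(param: string | string[] | undefined): number | null {
  const raw = Array.isArray(param) ? param[0] : param;
  if (!raw) return null; // parameter missing or empty: no crunching
  const parsed = Number.parseInt(raw, 10);
  // Unparseable or unknown versions fall back to v1.
  return KNOWN_VERSIONS.indexOf(parsed) !== -1 ? parsed : 1;
}
```

Keeping this logic separate from formatResponse makes the fallback behavior easy to unit test without starting a server.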

Client-side

On the client, we uncrunch the server response before the GraphQL client processes it.

With apollo-client, use a link configuration to set up an afterware, e.g.

import { ApolloClient } from 'apollo-client';
import { ApolloLink, concat } from 'apollo-link';
import { HttpLink } from 'apollo-link-http';
import { uncrunch } from 'graphql-crunch';

const http = new HttpLink({
  credentials: 'include',
  uri: '/api'
});

// Afterware: uncrunch the data field of each response before Apollo processes it.
const uncruncher = new ApolloLink((operation, forward) =>
  forward(operation).map((response) => {
    response.data = uncrunch(response.data);
    return response;
  })
);

const client = new ApolloClient({link: concat(uncruncher, http)});
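If the same client may talk to servers that do not crunch, the afterware could check the payload shape before uncrunching. The guard below is a hedged sketch, not part of graphql-crunch: looksCrunched is a hypothetical name, and it assumes that v1 payloads are arrays (as in the small example above) and that v2 payloads are objects carrying version and crunched keys (as described in the issues further down).

```typescript
// Hypothetical guard: does this `data` value look like a crunched payload?
function looksCrunched(data: unknown): boolean {
  // Assumption: v1 crunched data is an array; plain GraphQL data is always an object.
  if (Array.isArray(data)) return true;
  if (data !== null && typeof data === "object") {
    // Assumption: v2 crunched data has a numeric `version` and a `crunched` field.
    const candidate = data as { version?: unknown; crunched?: unknown };
    return typeof candidate.version === "number" && candidate.crunched !== undefined;
  }
  return false;
}

// Inside the afterware, this could be used as:
//   if (looksCrunched(response.data)) response.data = uncrunch(response.data);
```

A schema whose root fields happen to be named version and crunched would confuse this check, so treat it as a starting point rather than a robust detector.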

graphql-crunch's Issues

Integrate directly into resolvers

I really like graphql-crunch; great work! Out of curiosity, have you considered extending the "crunching" to go all the way into the resolver runtime itself?

I.e. if I've already called authorResolver.name(a1), just don't call that name for a1 again during the rest of this request.

Granted, this would need to rely on __typename+id semantics for identity, and also an understanding from resolvers that they won't change their output based on "where in the graph the object was fetched", i.e. anything in the 4th/info param of the resolver.

But, assuming that was the case, it seems like this could slice off a whole slew of work that apollo/graphql tools puts into making a giant JSON-with-repeated-info tree that graphql-crunch is going to immediately throw away ~70% of.

Thanks!

Not compiled to es5

graphql-crunch doesn't follow the convention of publishing compiled libraries as ES5. This can lead to a site appearing to function normally while bombing outright on Microsoft Edge. :-(

Please follow this convention. It prevents accidents.

How do you use version 2 of api with apollo?

I found that Apollo errors on a { crunched: Object, version: number } response, because its parseAndCheckHttpResponse method looks for data and errors keys and finds the keys above instead. To make things worse, we have to make it work with batchedHttpLink, where it crunches across all responses in the batch array. What do you use crunch v2 with?

Publish new version (2.1.4) with TypeScript types to npm and create a new release

Since the codebase has been rewritten in TypeScript, it would be great to have a new version published to npm and a GitHub release created.

Add TypeScript definitions

To aid in adoption within graphql-yoga and beyond.

See here: dotansimha/graphql-yoga#303 (comment)

Further Crunching On Key Level

Hey there!

I noticed that the values are given numerical values, but not keys. There seems to be a lot of space savings on that front too.

What was the reason for not applying the same optimizations to keys? GraphQL properties can only be alphabetical characters, so we know that the keys can never be numbers.

Thanks for open sourcing this!

Wrong response if the response contains a date string

Actual response

{
  "data": {
    "notifications": [
      {
        "id": "16",
        "createdAt": "2019-05-04T17:10:40.509Z"
      }
    ]
  }
}

crunched response:

{
  "data": [
    "16",
    {},
    {
      "id": 0,
      "createdAt": 1
    },
    [
      2
    ]
  ]
}

Somehow the createdAt value is converted into an empty object.

version: 1

Problem with Apollo Client config

I am getting the following error:

Error: no value resolved
at Object.complete (chrome-extension://jdkknkkbebbapilgoeccciglkfbmbnfm/dist/devtools.js:1:695191)
at complete (chrome-extension://jdkknkkbebbapilgoeccciglkfbmbnfm/dist/devtools.js:1:376930)
at t.s (chrome-extension://jdkknkkbebbapilgoeccciglkfbmbnfm/dist/devtools.js:1:673115)
at t.n.emit (chrome-extension://jdkknkkbebbapilgoeccciglkfbmbnfm/dist/devtools.js:1:307830)
at chrome-extension://jdkknkkbebbapilgoeccciglkfbmbnfm/dist/devtools.js:1:381756

please suggest.

Thanks & Regards
gmchaturvedi

Any plans to provide TypeScript declarations?

Unexpected efficiency results with crunching

Hi, I tried graphql-crunch on our largest json response: a newsfeed with posts. Common repeating elements are the posting user and associated groups.

Version 1 crunch

Seems to extract the common elements properly
The response file size seems actually larger than without crunching

Version 2 crunch

It extracts common elements at a lower level when extracting at a higher level would be valid and more efficient.
E.g. a postUser contains a user and an optional impersonated group. Instead of referencing the post user as a whole, it only references the user and impersonated group, repeating the post user structure everywhere.

I do like the fact that v2 is much more readable.

I've attached example files.

posts+crunch.zip

[Feature-Request][How-To] Use with Apollo-Server-Express

Is it, or would it be, possible to use this with Apollo-Server-Express? I have it set up as follows:

Server:

const server = new ApolloServer({
  typeDefs,
  resolvers,
  context: ({ req, res }) => ({
    models,
    user: req.user,
    req,
    res
  }),
  introspection: process.env.NODE_ENV === 'production' ? false : true,
  onHealthCheck: async () => {
    try {
      const result = await onHealthCheck(req, res)
      res.json({ status: 'pass', ...(result || {}) })
    } catch (err) {
      res.status(503).json({ status: 'fail' });
    }
  },
  formatResponse: (response) => {
    if (response.data) {
      response.data = crunch(response.data);
      return response;
    }
  }
});

Client:

const middlewareLink = new ApolloLink((operation, forward) => {
  operation.setContext({
    headers: {
      accessToken: localStorage.getItem('x-access-token') || null,
      refreshToken: localStorage.getItem('x-refresh-token') || null
    }
  });
  // return forward(operation);
  return forward(operation).map((response) => {return uncrunch(response.data)});
});

But I am getting the following error message when I attempt to query data:
{"errors":[{"message":"crunch is not defined","extensions":{"code":"INTERNAL_SERVER_ERROR","exception":{"stacktrace":["ReferenceError: crunch is not defined"," at Object.formatResponse (D:\\Repos\\DBI\\graphql-apollo\\src\\server.js:71:7)"," at Object.<anonymous> (D:\\Repos\\DBI\\graphql-apollo\\node_modules\\apollo-server-core\\dist\\requestPipeline.js:193:50)"," at Generator.next (<anonymous>)"," at fulfilled (D:\\Repos\\DBI\\graphql-apollo\\node_modules\\apollo-server-core\\dist\\requestPipeline.js:5:58)"," at processTicksAndRejections (internal/process/task_queues.js:97:5)"]}}}]}

Integrate a chrome extension for previews

This is more of a suggestion, but it would be really nice to have a preview of the API response when using this with Apollo GraphQL. It would make debugging way easier.

Explanation of the v2 `encode` function

Hey!
I have most of my (GraphQL) servers written in Rust and wanted to port the algorithm used here to make it compatible with the JS world.

However, I do not really understand the need of the encode function when it comes down to numbers. Why are they multiplied by 2 (and sometimes incremented by 1)?
Additionally, wouldn't multiplying the value by 2 decrease the available number space to half its normal size? Meaning that servers can't use the whole number range?

formatResponse(response, options) field options.context.request is undefined

I want to access the request URL inside formatResponse handler.

const server = new ApolloServer({
  // schema, context, etc...
  formatResponse: (response, options) => {
    const parsed = url.parse(options.context.request.url);
    const query = querystring.parse(parsed.query);

    if(query.crunch && response.data) {
      const version = parseInt(query.crunch) || 1;
      response.data = crunch(response.data, version);
    }

    return response;
  },
});

But the request field is undefined in `const parsed = url.parse(options.context.request.url);`

Even TypeScript complains about it.

I know it's related to apollo-server, but this is an example provided by graphql-crunch, so I think there may be something wrong with either the example or apollo-server.

[Question] Can this be used in normal JSON minify, like Restful json response?

Dynamic require defined at line 5; not supported by Metro

Trying to implement this with React Native and Apollo Client, but dynamically calling require does not work.

What is the reason for creating graphql-crunch when graphql-deduplicator is available?

This package seems to mimic the behaviour of graphql-deduplicator, which was available for a long time before graphql-crunch. What is your motivation for maintaining your own implementation, and would you not rather contribute to the former library?