d12 / graphql-remote_loader

Performant remote GraphQL queries from within the resolvers of a Ruby GraphQL API.

Ruby 100.00%

graphql graphql-ruby graphql-server ruby ruby-gem schema-stitching

graphql-remote_loader's Issues

Queries for Nested Related Data

When querying a collection that has a remote relationship, I'm noticing an inefficient query in the remote service where the related objects live. In a GraphQL query requesting a paginated collection of Object 1s that has a foreign key for Object 2 (something like query { object1Collection {... object2 {...} } }), where Object 2 is located in a remote service, each instance of Object 2 is retrieved one by one rather than all together (like Object2.where(id: [array_of_ids])).

In looking through the graphql-remote_loader source code, it looks like this was an intentional design decision so that arguments aren't collapsed so you can request different properties for each instance. For our use of the gem, I don't think we'd ever have a need for that functionality, but rather we'd have multiple uses where we'd like arguments to be collapsed in such a way that instead of getting query { object2(id: 1) {...} } { object2(id: 2) {...} } etc, we would get query { object2(id: [1,2]) {...} }.
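A minimal sketch of the collapsed form described above, in plain Ruby. The helper name `collapsed_query` is hypothetical and not part of the gem, and it assumes the remote field accepts a list for its `id` argument, which depends entirely on the remote schema.

```ruby
# Hypothetical helper: build one remote selection for many IDs instead of
# one selection per ID. Assumes the remote field accepts a list for `id`.
def collapsed_query(field, ids, selection)
  # De-duplicate and coerce to integers so the list is safe to interpolate.
  id_list = ids.uniq.map { |id| Integer(id) }.join(", ")
  "#{field}(id: [#{id_list}]) { #{selection} }"
end

puts collapsed_query("object2", [1, 2, 2], "id name")
```

The resolver for each Object 1 would then pick its own Object 2 out of the shared response by ID.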

Hopefully I've explained this in a way that makes enough sense, and let me know if I need to expand further.

I'm wondering if I'm missing something about how to get the gem to work for that kind of scenario and if you have any advice on how to address it, or if this is just not an intended use of the gem (though I'd think this would be a fairly common scenario). If not, we could fork the gem and change how that part works for ourselves, but I wanted to check with you first to see if I was misunderstanding something.

Thanks

Support query variables

Setup GitHub Actions

Looks like Travis stopped working. We should move this project to GitHub Actions anyway.

Make selection alias prefixes more space-efficient

Background

Every selection in a GraphQL query that goes through the remote_loader gets an alias before it is sent to the remote API. The alias encodes information the gem needs, chiefly "who asked for this field".

We currently achieve this by assigning each input GraphQL query a unique prime number, then when we merge queries, each selection gets prefixed by the product of all primes that asked for that selection. An example:

Input query A: { viewer { login } } => Assign p=2
Input query B: { viewer { id } } => Assign p=3
Input query C: { user(login: "a"){ id } } => Assign p=5

Merged query:

{
  p6_viewer: viewer {
    p2_login: login
    p3_id: id
  }
  p5_user: user(login: "a") {
    p5_id: id
  }
}

Viewer is prefixed by 6 because both primes 2 and 3 requested selections on viewer.

Suppose the query returned this response JSON:

{
  "data": {
    "p6_viewer": {
      "p2_login": "foo",
      "p3_id": "M2v8P=="
    },
    "p5_user": {
      "p5_id": "Mw9LP=="
    }
  }
}

We can determine what data in the response JSON each promise asked for via prime factorization. 6 and 2 both have 2 as a prime factor, so p=2 should be resolved with:

{
  "data": {
    "viewer": {
      "login": "foo"
    }
  }
}
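The filtering step above can be sketched in plain Ruby as follows. This is an illustration of the idea, not the gem's actual implementation; the helper names `requested_by?` and `filter_for` are made up for this example.

```ruby
# True when the query with prime `caller_prime` asked for this aliased key,
# i.e. the product in the prefix is divisible by the caller's prime.
def requested_by?(aliased_key, caller_prime)
  product = aliased_key[/\Ap(\d+)_/, 1]
  return false unless product

  (product.to_i % caller_prime).zero?
end

# Keep only the keys the caller asked for and strip the prefixes.
def filter_for(data, caller_prime)
  data.each_with_object({}) do |(key, value), out|
    next unless requested_by?(key, caller_prime)

    name = key.sub(/\Ap\d+_/, "")
    out[name] = value.is_a?(Hash) ? filter_for(value, caller_prime) : value
  end
end

response = {
  "p6_viewer" => { "p2_login" => "foo", "p3_id" => "M2v8P==" },
  "p5_user"   => { "p5_id" => "Mw9LP==" }
}

p filter_for(response, 2) # data for input query A
```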

The problem

The primorial function grows exponentially. The product of the first 100 primes is 220 digits long. While GraphQL aliases don't have length caps, large numbers of aliases this size can cause the request size to bloat.

Instead of using prime factorization to determine who asked for what, we should use sums of powers of 2 (a bitmap written in base 10).

Assign each query an id = 2^n.

Query 0 => base2(0001) => 1
Query 1 => base2(0010) => 2
Query 2 => base2(0100) => 4
Query 3 => base2(1000) => 8

Now to build prefixes, we can sum these IDs instead of multiplying them. A selection from 0, 2 and 3 would be (0001) + (0100) + (1000) = 1101 = 13.

To figure out which promises need data given a sum of IDs, just split the number into its powers of 2. 13 = 2^0 + 2^2 + 2^3, so queries 0, 2 and 3 asked for the data.
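A minimal Ruby sketch of the bitmap scheme, assuming queries are numbered from 0. The method names are illustrative only. Because each ID is a distinct power of 2, summing the IDs is the same as combining them with bitwise OR.

```ruby
# Query n gets the id 2**n, i.e. a single set bit.
def query_id(n)
  1 << n
end

# Combine ids with bitwise OR (equivalent to summing distinct powers of 2).
def bitmap_prefix(query_numbers)
  query_numbers.inject(0) { |bits, n| bits | query_id(n) }
end

# Recover the query numbers whose bit is set in the prefix.
def queries_in(prefix)
  (0...prefix.bit_length).select { |n| prefix[n] == 1 }
end

prefix = bitmap_prefix([0, 2, 3])
puts prefix          # 13
p queries_in(prefix) # [0, 2, 3]
```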

Space complexity

I was fairly certain that the sum of the first n powers of 2 grew slower than the product of the first n primes, even though powers of 2 grow faster than primes. I graphed the two functions out to be sure.

Red is the log of the product of the first n primes; blue is the log of the sum of the first n powers of 2. The product-of-primes series actually stopped working after n = 15 or so, when Google Sheets started treating the results as strings rather than numbers.
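The same comparison can be reproduced without a spreadsheet, since Ruby integers have arbitrary precision. The sketch below counts decimal digits of the worst-case prefix under each scheme (a selection requested by all n queries); the helper names are made up for this example.

```ruby
# First `count` primes via trial division (fine for small counts).
def first_primes(count)
  primes = []
  candidate = 2
  while primes.size < count
    primes << candidate if primes.none? { |p| (candidate % p).zero? }
    candidate += 1
  end
  primes
end

# Digits in the product of the first n primes.
def prime_prefix_digits(n)
  first_primes(n).inject(1, :*).to_s.length
end

# Digits in the sum of the first n powers of 2, which is 2**n - 1.
def bitmap_prefix_digits(n)
  ((1 << n) - 1).to_s.length
end

[10, 50, 100].each do |n|
  puts "n=#{n}: primes=#{prime_prefix_digits(n)} digits, " \
       "bitmap=#{bitmap_prefix_digits(n)} digits"
end
```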

tldr

Replace product of prime alias prefixing with bitmap aliasing for space complexity reasons.

Mutation support

Currently the remote loader only offers Query support.

Look into whether mutations can be batched, how sending multiple mutations in one request works, etc.

Write documentation

Some simple use cases are documented in the README.md, but we could really use some thorough documentation.

Write example app

A simple example app may help to show how this can be used.

Make it easier to directly expose fields/types from the backing API

Currently, if you want to use a field from the backing API without doing any processing, you need to do this:

  field :login, String, null: false, description: "The currently authenticated GitHub user's login."

  def login
    GitHubLoader.load("viewer { login }").then do |results|
      results["viewer"]["login"]
    end
  end

We can make this way more concise with some sort of helper.
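One possible shape for such a helper, sketched in plain Ruby. The `remote_field` macro and `RemoteField` module are hypothetical and not part of the gem, and `StubLoader` stands in for a real loader such as GitHubLoader so the example runs on its own. A real loader returns a promise; the stub returns a Hash, and `Kernel#then` yields it to the block in the same way.

```ruby
# Hypothetical macro; not part of graphql-remote_loader.
module RemoteField
  # Defines `name` as a resolver that loads `path` from `loader` and digs
  # the value out of the result, e.g. path [:viewer, :login].
  def remote_field(name, loader:, path:)
    keys = path.map(&:to_s)
    # Build "viewer { login }" from ["viewer", "login"].
    query = keys.reverse.inject { |inner, key| "#{key} { #{inner} }" }

    define_method(name) do
      loader.load(query).then { |results| results.dig(*keys) }
    end
  end
end

# Stand-in for a real loader; records the query and returns canned data.
class StubLoader
  class << self
    attr_reader :last_query

    def load(query)
      @last_query = query
      { "viewer" => { "login" => "octocat" } }
    end
  end
end

class ViewerType
  extend RemoteField
  remote_field :login, loader: StubLoader, path: [:viewer, :login]
end

puts ViewerType.new.login
```

In a real type class the `field :login, String, ...` declaration would still be needed, or the macro could emit it as well.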

Also, there's no way to expose an entire type with all of its fields right now. This could be handy, especially to do something like "type A from my local API and type B from my remote API are the same, I want to include all fields on type B into type A". The only info we'd need is a function from object A to a query for object B in the remote API.

Write integration tests

The loader is currently untested. We need integration tests!

Remote loader does not work with graphql 1.9.2

Using the graphql-remote loader example app (https://github.com/d12/graphql-remote_loader_example), I get the following error:

NoMethodError (undefined method 'alias=' for #<GraphQL::Language::Nodes::Field:0x00007f8c5ede00c0>
Did you mean? alias):

app/controllers/graphql_controller.rb:12:in 'execute'

The graphql gem version is not specified in the gemfile, and it is using version 1.9.2. Downgrading to 1.8.13 makes everything work as expected.

Querying camelCase but receiving snake_case

Aliases are working fine, and I can rename things in my query. The problem is that everything in the result is snake_cased.

Is this line responsible for that?

graphql-remote_loader/lib/graphql/remote_loader/loader.rb

Line 156 in 6ee709b

  filtered_results[underscore(field_name)] = filter_keys_on_data(value, caller_id)

Is the underscore necessary?
I am processing the resulting data in JavaScript with existing code that expects camelCase data.
There was a different GraphQL server before. Now I am merging things together with your package and ran into this problem.

Provide better support for errors

There really isn't any testing for how errors are handled yet. Ideally, entries under the errors key are delivered only to the promises that requested the fields that errored.
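A rough sketch of how that routing could work, assuming the remote API follows the GraphQL spec and includes a `path` of response keys on each error, and assuming the prime-product alias prefixes described in the alias-prefix issue. The helper `errors_for` is hypothetical. Errors without a path (for example, syntax errors) are not delivered to anyone here; a real implementation would likely hand those to every caller.

```ruby
# Select the errors whose path runs only through selections that the
# caller (identified by its prime) asked for.
def errors_for(errors, caller_prime)
  errors.select do |error|
    # Ignore integer list indices in the path; keep aliased response keys.
    keys = Array(error["path"]).grep(String)
    next false if keys.empty?

    keys.all? do |key|
      product = key[/\Ap(\d+)_/, 1]
      !product.nil? && (product.to_i % caller_prime).zero?
    end
  end
end

errors = [
  { "message" => "boom", "path" => ["p6_viewer", "p3_id"] },
  { "message" => "no path" }
]

p errors_for(errors, 3).map { |e| e["message"] }
p errors_for(errors, 2).map { |e| e["message"] }
```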

GitHub syntax highlighting is broken on loader.rb

https://github.com/d12/graphql-remote_loader/blob/master/lib/graphql/remote_loader/loader.rb

A Heredoc is breaking the syntax highlighting on GitHub.com :/ If there's a simple way to fix that, that would be great.

Migrate to graphql-ruby query parsing

I rolled my own low-effort query tokenizer and AST generator, but graphql-ruby includes a parser that's far more robust. We should migrate to that ASAP.

Support field aliases

This gem assigns field aliases on every field and connection in the query as a way to a) resolve naming conflicts and b) determine which fields requested what data in the response payload.

To support field aliases, the original field alias will need to be encoded into the computed alias, then extracted when fulfilling promises.
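One simple way to do this, sketched below, is to append the user's alias (rather than the field name) to the computed prefix, so that stripping the prefix yields the key the caller expects. The prefix format follows the `p<id>_` convention from the alias-prefix issue; both helper names are hypothetical.

```ruby
# Build the selection sent to the remote API. The response key is the
# user's alias when one was given, otherwise the field name.
def aliased_selection(caller_id, field, user_alias = nil)
  response_key = user_alias || field
  "p#{caller_id}_#{response_key}: #{field}"
end

# Recover the key the caller expects from a key in the remote response.
def response_key_for(computed_alias)
  computed_alias.sub(/\Ap\d+_/, "")
end

puts aliased_selection(2, "login", "myLogin") # p2_myLogin: login
puts aliased_selection(3, "id")               # p3_id: id
puts response_key_for("p2_myLogin")           # myLogin
```

A consequence of this approach is that two callers requesting the same field under different aliases would no longer share a single merged selection.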

Support directives

Directives will probably break my parser. Once #1 is resolved, this should be easier to do.

Support fragments

Currently this works:

... on Type {
  fields
}

But something like this would not:

... FragmentName

fragment FragmentName on Type {
  fields
}

When #1 merges, this should be easier to do.

Internal index growth leading to long queries

We are using graphql-remote_loader gem to forward GraphQL queries internally and I have noticed in the logs that the longer the application is running and forwarding the queries, the size of the queries is growing, leading to queries like the one below.

I have traced it back to the @index member that is incremented on each load call. A simple fix was something along these lines:

class MyLoader < GraphQL::RemoteLoader::Loader
  def query(query_string, context:)
    response = client.query(query_string, context: context)
    MyLoader.reset_index
    response
  end
end

I only saw the reset_index method used in tests, so I am not sure what consequences this change might have for the behaviour of the loader. Assuming that each call of load returns a new object, I cannot think of any case where it would cause a problem, but I want to be 100% sure.

Another question: could we use something like a UUID instead of an incremental index to differentiate between queries?

Thanks in advance for the response!

Validate (syntax + type check) queries

Presently, if one field tries to load a syntactically invalid or type-invalid query, it will break the entire batched query and no data will be loaded.

This could be avoided by validating queries before the merging step.

This will incur a performance hit though, and it'll require users to provide a schema definition for the APIs they're dependent on. A good midway point might be to validate queries syntactically only and ignore type-correctness. This way, we don't need an SDL.

Field 'service' doesn't exist on type 'Query'

I get the following error:

{
  "error": {
    "message": "Field 'service' doesn't exist on type 'Query'",
    ...

service exists on the remote GraphQL schema. How does the base app schema get to know about that? I cannot duplicate all fields and types as I thought this is what graphql-remote_loader is for.

How is viewer known in the example app? Maybe this could help me.

Pagination

Is there a way to load cursor-paginated queries with this? My first thought would be to just make proxy params for the loading, like first and after, but another key requirement I have is that I also want to be able to load other objects. For instance, batch loading the GitHub issues query with repositories.

I can provide a sample of the query I'm trying to proxy to if that helps.

Allow contexts

Currently, there's no way to pass information from the #load or #load_value call to the loader. For example, we may need to pass authentication information from the local GraphQL API down to the loader that queries the remote API.