d12 / graphql-remote_loader
Performant remote GraphQL queries from within the resolvers of a Ruby GraphQL API.
When querying a collection that has a remote relationship, I'm noticing an inefficient query in the remote service where the related objects live. In a GraphQL query requesting a paginated collection of Object 1s that has a foreign key for Object 2 (something like query { object1Collection {... object2 {...} } }), where Object 2 is located in a remote service, each instance of Object 2 is retrieved one by one rather than all together (like Object2.where(id: [array_of_ids])).
Looking through the graphql-remote_loader source code, it seems this was an intentional design decision: arguments aren't collapsed, so you can request different properties for each instance. For our use of the gem, I don't think we'd ever need that functionality. Instead, we have multiple cases where we'd like arguments to be collapsed so that instead of getting query { object2(id: 1) {...} } { object2(id: 2) {...} } etc., we would get query { object2(id: [1,2]) {...} }.
Hopefully I've explained this in a way that makes enough sense, and let me know if I need to expand further.
I'm wondering if I'm missing something about how to get the gem to work for that kind of scenario and whether you have any advice on how to address it, or if this is just not an intended use of the gem (though I'd think this is a fairly common scenario). If not, we could obviously fork the gem and change how that part works for ourselves, but I wanted to check with you first to see if I was misunderstanding something.
Thanks
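The collapsing described above could be sketched in plain Ruby as follows. This is only an illustration of the desired query shape, not something the gem provides: the helper names are hypothetical, and it assumes the remote object2 field accepts a list for its id argument and returns objects that include their id.

```ruby
# Hypothetical helper, not part of graphql-remote_loader.
# Builds ONE remote query for many IDs instead of one query per ID.
# Assumes the remote `object2` field accepts a list-valued `id` argument.
def collapsed_object2_query(ids, selection)
  "query { object2(id: [#{ids.uniq.join(', ')}]) { id #{selection} } }"
end

# Hypothetical helper: index the returned objects by ID so each local
# resolver can pick out its own record from the single batched response.
def index_by_id(objects)
  objects.to_h { |object| [object["id"], object] }
end

query = collapsed_object2_query([1, 2, 2], "name")
by_id = index_by_id([{ "id" => 1, "name" => "a" }, { "id" => 2, "name" => "b" }])
```

Each resolver would then read by_id[its_id] rather than triggering its own remote selection.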
Looks like Travis stopped working, we should move this project to GitHub Actions anyways.
Every selection in a GraphQL query that goes through the remote_loader gets an alias before it gets sent to the remote API that includes some important information for the gem. The main piece of information is "who asked for this field".
We currently achieve this by assigning each input GraphQL query a unique prime number, then when we merge queries, each selection gets prefixed by the product of all primes that asked for that selection. An example:
Input query A: { viewer { login } }
=> Assign p=2
Input query B: { viewer { id } }
=> Assign p=3
Input query C: { user(login: "a"){ id } }
=> Assign p=5
Merged query:
{
  p6_viewer: viewer {
    p2_login: login
    p3_id: id
  }
  p5_user: user(login: "a") {
    p5_id: id
  }
}
Viewer is prefixed by 6 because both primes 2 and 3 requested selections on viewer.
Suppose the query returned this response JSON:
{
  "data": {
    "p6_viewer": {
      "p2_login": "foo",
      "p3_id": "M2v8P=="
    },
    "p5_user": {
      "p5_id": "Mw9LP=="
    }
  }
}
We can determine what data in the response JSON each promise asked for via prime factorization. 6 and 2 both have 2 as a prime factor, so p=2 should be resolved with:
{
  "data": {
    "viewer": {
      "login": "foo"
    }
  }
}
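The extraction step above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the gem's actual implementation: the method name extract_for is hypothetical, the response is the example JSON above as a Ruby hash, and lists in the response are not handled.

```ruby
# Hypothetical sketch: keep only the keys whose numeric prefix is divisible
# by `prime`, and strip the "p<N>_" prefix from each kept key.
def extract_for(prime, node)
  node.each_with_object({}) do |(key, value), out|
    match = key.match(/\Ap(\d+)_(.+)\z/)
    # Skip keys that are not prefixed or were not requested by this prime.
    next unless match && (match[1].to_i % prime).zero?

    out[match[2]] = value.is_a?(Hash) ? extract_for(prime, value) : value
  end
end

response = {
  "data" => {
    "p6_viewer" => { "p2_login" => "foo", "p3_id" => "M2v8P==" },
    "p5_user" => { "p5_id" => "Mw9LP==" }
  }
}

result_for_a = { "data" => extract_for(2, response["data"]) }
result_for_c = { "data" => extract_for(5, response["data"]) }
```

Here result_for_a matches the payload shown above for p=2, and result_for_c contains only the user selection.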
The primorial function grows faster than exponentially. The product of the first 100 primes is 220 digits long. While GraphQL aliases don't have length caps, large numbers of aliases this size can cause the request size to bloat.
Instead of using prime factorization to determine who asked for what, we should use sums of powers of 2 (a bitmap converted to base 10).
Assign each query an id = 2^n.
Query 0 => base2(0001) => 1
Query 1 => base2(0010) => 2
Query 2 => base2(0100) => 4
Query 3 => base2(1000) => 8
Now to build prefixes, we can sum these IDs instead of multiplying them. A selection from 0, 2 and 3 would be (0001) + (0100) + (1000) = 1101 = 13.
To figure out which promises need data given a sum of IDs, just split the number into its powers of 2. 13 = 2^0 + 2^2 + 2^3, so queries 0, 2 and 3 asked for the data.
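The encoding and decoding just described can be sketched in a few lines of Ruby. The method names are hypothetical and this is only an illustration of the arithmetic, not code from the gem. Because the IDs are distinct powers of 2, summing them is the same as a bitwise OR.

```ruby
# Hypothetical sketch of bitmap prefixes.
# Query n gets the ID 2^n; a selection's prefix is the sum (bitwise OR)
# of the IDs of every query that asked for it.
def bitmask_prefix(query_indexes)
  query_indexes.reduce(0) { |mask, n| mask | (1 << n) }
end

# Split a prefix back into the query indexes whose bit is set.
def decode_bitmask(prefix)
  (0...prefix.bit_length).select { |n| prefix[n] == 1 }
end

prefix = bitmask_prefix([0, 2, 3]) # 0001 + 0100 + 1000 = 1101
askers = decode_bitmask(prefix)
```

With the example above, prefix is 13 and askers is [0, 2, 3].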
I was fairly certain that the sum of the first n powers of 2 grew slower than the product of the first n primes, even though powers of 2 grow faster than primes. I graphed the two functions out to be sure.
Red is the log of the product of the first n primes, blue is the log of the sum of the first n powers of 2. The product of primes function actually stopped working after n = 15 or so, because Google Sheets started treating the results as strings rather than numbers.
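Since Ruby integers are arbitrary precision, the same comparison can be reproduced without a spreadsheet. The sketch below uses hypothetical method names and compares the worst-case prefix length in decimal digits when every one of n queries asks for the same selection.

```ruby
# Naive trial division; fine for the first few hundred primes.
def first_primes(count)
  primes = []
  candidate = 2
  while primes.size < count
    primes << candidate if primes.none? { |prime| (candidate % prime).zero? }
    candidate += 1
  end
  primes
end

# Worst case for prime prefixes: the product of the first n primes.
def prime_prefix_digits(count)
  first_primes(count).reduce(1, :*).to_s.length
end

# Worst case for bitmap prefixes: 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1.
def bitmask_prefix_digits(count)
  ((1 << count) - 1).to_s.length
end

[10, 50, 100].each do |count|
  puts "n=#{count}: primes #{prime_prefix_digits(count)} digits, " \
       "bitmap #{bitmask_prefix_digits(count)} digits"
end
```

For n = 100 this gives 220 digits for the prime product, matching the figure quoted earlier, versus 31 digits for the bitmap.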
Replace product of prime alias prefixing with bitmap aliasing for space complexity reasons.
Currently the remote loader only offers Query support.
Look into whether mutations can be batched, how sending multiple mutations in one request works, etc.
Some simple use cases are documented in the README.md, but we could really use some thorough documentation.
A simple example app may help to show how this can be used.
Currently, if you want to use a field from the backing API without doing any processing, you need to do this:
field :login, String, null: false, description: "The currently authenticated GitHub user's login."

def login
  GitHubLoader.load("viewer { login }").then do |results|
    results["viewer"]["login"]
  end
end
We can make this way more concise with some sort of helper.
Also, there's no way to expose an entire type with all of its fields right now. This could be handy, especially to do something like "type A from my local API and type B from my remote API are the same, I want to include all fields on type B into type A". The only info we'd need is a function from object A to a query for object B in the remote API.
The loader is currently untested. We need integration tests!
Using the graphql-remote loader example app (https://github.com/d12/graphql-remote_loader_example), I get the following error:
NoMethodError (undefined method 'alias=' for #<GraphQL::Language::Nodes::Field:0x00007f8c5ede00c0>
Did you mean? alias):
app/controllers/graphql_controller.rb:12:in 'execute'
The graphql gem version is not specified in the Gemfile, so it resolves to version 1.9.2. Downgrading to 1.8.13 makes everything work as expected.
Aliases are working fine; I can rename things in my query. The problem is that everything in the result is snake_cased.
Is this line responsible for that?
Is the underscore necessary?
I am processing the resulting data in JavaScript with existing code that expects camelCase data.
There was a different GraphQL server before. Now I am merging things together with your package and ran into this problem.
There really isn't any testing for how errors are handled yet. Ideally, entries in the errors key are only delivered to the promises that requested the fields that errored.
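One way this routing could work is sketched below. It is an assumption-laden illustration, not the gem's behaviour: it assumes the remote API populates the standard path entry on each error, that path segments are the prefixed aliases sent in the merged query, and that prefixes are prime products as described elsewhere on this page. The method name errors_for is hypothetical.

```ruby
# Hypothetical sketch: select the errors whose failing field was requested
# by the query that was assigned `prime`.
def errors_for(prime, errors)
  errors.select do |error|
    # The last string segment of the path is the aliased field that failed
    # (integer segments are list indexes).
    field_key = Array(error["path"]).reverse.find { |segment| segment.is_a?(String) }
    match = field_key&.match(/\Ap(\d+)_/)
    match && (match[1].to_i % prime).zero?
  end
end

errors = [
  { "message" => "boom", "path" => ["p6_viewer", "p3_id"] },
  { "message" => "not found", "path" => ["p5_user"] }
]

errors_for_b = errors_for(3, errors)
errors_for_a = errors_for(2, errors)
```

In this sketch, errors without a path (for example, a parse error for the whole request) match no promise, so a real implementation would need a separate rule for them.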
https://github.com/d12/graphql-remote_loader/blob/master/lib/graphql/remote_loader/loader.rb
A Heredoc is breaking the syntax highlighting on GitHub.com :/ If there's a simple way to fix that, that would be great.
I rolled my own low-effort query tokenizer and AST generator, but graphql-ruby includes a parser that's far more robust. We should migrate to that ASAP.
This gem assigns field aliases on every field and connection in the query as a way to a) resolve naming conflicts and b) determine which fields requested what data in the response payload.
To support field aliases, the original field alias will need to be encoded into the computed alias, then extracted when fulfilling promises.
Directives will probably break my parser. Once #1 is resolved, this should be easier to do.
Currently this works:
... on Type {
  fields
}
But something like this would not:
...FragmentName
fragment FragmentName on Type {
  fields
}
When #1 merges, this should be easier to do.
We are using the graphql-remote_loader gem to forward GraphQL queries internally, and I have noticed in the logs that the longer the application runs and forwards queries, the larger the queries grow, leading to queries like the one below.
I have traced it back to the @index member, which is incremented on each load call. A simple fix was something along these lines:
class MyLoader < GraphQL::RemoteLoader::Loader
  def query(query_string, context:)
    response = client.query(query_string, context: context)
    MyLoader.reset_index
    response
  end
end
I only saw the reset_index method used in tests, so I am not sure what consequences this change could have for the behaviour of the loader. Assuming that each call of load returns a new object, I cannot think of any case where it would matter, but I just want to be 100% sure.
Another question: could we use something like a UUID instead of an incremental index to differentiate between queries?
Thanks in advance for the response!
Presently, if one field tries to load a syntactically invalid or type-invalid query, it will break the entire batched query and no data will be loaded.
This could be avoided by validating queries before the merging step.
This will incur a performance hit though, and it'll require users to provide a schema definition for the APIs they're dependent on. A good midway point might be to validate queries syntactically only and ignore type-correctness. This way, we don't need an SDL.
I get the following error:
{
"error": {
"message": "Field 'service' doesn't exist on type 'Query'",
...
service exists on the remote GraphQL schema. How does the base app schema get to know about that? I cannot duplicate all fields and types, as I thought this is what graphql-remote_loader is for.
How is viewer known in the example app? Maybe this could help me.
Is there a way to load cursor-paginated queries with this? My first thought would be to just make proxy params for the loading, like first and after, but another key requirement I have is that I also want to be able to load other objects. For instance, batch loading the GitHub issues query with repositories.
I can provide a sample of the query I'm trying to proxy to if that helps.
Currently, there's no way to pass information from the #load or #load_value call to the loader. For example, we may need to pass down authentication information from the local GraphQL API to the loader which queries the remote API.