Batching Mechanisms for Distributed Executors
To implement efficient distributed executors for composite schemas, we need robust batching mechanisms. While introducing explicit batching fields for fetching entities by keys is a straightforward approach, it becomes challenging when entities have data dependencies on other schemas.
Consider the following GraphQL schema:
```graphql
type Query {
  orderById(id: ID!): Order

  # batching field
  ordersById(ids: [ID!]!): [Order]!
}
```
The issue arises once a directive like @require declares a data dependency on a field: a simple key-based batching field cannot carry the required data for each individual key.
Example Scenario:
Source Schema 1:
```graphql
type Query {
  orderById(id: ID!): Order
  ordersById(ids: [ID!]!): [Order]!
}

type Order {
  id: ID!
  deliveryEstimate(dimension: ProductDimensionInput! @require(fields: "product { dimension }")): Int!
}
```
Source Schema 2:
```graphql
type Query {
  orderById(id: ID!): Order
  ordersById(ids: [ID!]!): [Order]!
}

type Order {
  id: ID!
  product: Product
}
```
For the distributed executor, batching such lookups becomes problematic because there is no way to supply an individual requirement for each key:
```graphql
query($ids: [ID!]!, $requirement: ProductDimensionInput!) { # <-- we cannot have a requirement for each key
  ordersById(ids: $ids) {
    deliveryEstimate(dimension: $requirement)
  }
}
```
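To make the mismatch concrete, here is a small sketch (in Python, with hypothetical order ids and dimension values) of the state the distributed executor is in: it has already fetched product { dimension } from Source Schema 2 and now holds one requirement per order, while the batching field only accepts a single $requirement value.

```python
# Hypothetical per-order requirements gathered from Source Schema 2.
requirements_by_order = {
    "order-1": {"width": 10, "height": 20, "depth": 5},
    "order-2": {"width": 3, "height": 4, "depth": 1},
}

ids = list(requirements_by_order)

# The batched query exposes exactly one $requirement variable, so only
# a single dimension value can accompany the whole id list:
variables = {"ids": ids, "requirement": requirements_by_order["order-1"]}

# The pairing between each id and its own requirement is lost.
```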
Apollo Federation's _entities field introduces a workaround. It allows the gateway to pass in representations of partial object data. However, this works around the GraphQL type system by introducing untyped inputs. While effective, an ideal solution would not require subgraphs to introduce special fields like _entities.

```graphql
extend type Query {
  _entities(representations: [_Any!]!): [_Entity]!
}
```
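For illustration, a batched lookup through _entities might look like the following (the representation values are hypothetical; note that with Apollo Federation's @requires directive the required fields travel inside the representation rather than as a field argument):

```graphql
query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Order {
      deliveryEstimate
    }
  }
}
```

```json
{
  "representations": [
    { "__typename": "Order", "id": "1", "product": { "dimension": { "width": 10, "height": 20 } } },
    { "__typename": "Order", "id": "2", "product": { "dimension": { "width": 3, "height": 4 } } }
  ]
}
```

Each representation is an untyped _Any value, which is exactly the trade-off described above.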
Batching Approaches
The GraphQL ecosystem has devised various batching approaches, each with its own set of advantages and drawbacks.
Request Batching
Request Batching is the most straightforward approach: multiple GraphQL requests are sent in a single HTTP request. This method is widely adopted due to its simplicity and compatibility with many GraphQL servers. However, the lack of a semantic relationship between the batched requests limits optimization opportunities, as each request is executed in isolation. This can result in inefficiencies, especially when the data required by the individual requests overlaps.
```json
[
  {
    "query": "query getHero($a: Int!, $b: String!) { hero(a: $a, b: $b) { name } }",
    "operationName": "getHero",
    "variables": {
      "a": 1,
      "b": "abc"
    }
  },
  {
    "query": "query getHero($a: Int!, $b: String!) { hero(a: $a, b: $b) { name } }",
    "operationName": "getHero",
    "variables": {
      "a": 1,
      "b": "abc"
    }
  }
]
```
Pros:
- Broad adoption across GraphQL servers.
- Straightforward implementation.
Cons:
- Executes each request in isolation, with no semantic relationship between entries.
- Challenges in optimizing due to isolated execution.
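The isolation is easy to see in a minimal server-side sketch (the execute function below is a stand-in for a real GraphQL executor): the handler simply loops over the array, so no work is shared between entries.

```python
import json

def execute(query: str, variables: dict) -> dict:
    # Stand-in for a real GraphQL executor; parsing, validation and
    # resolution happen independently for every invocation.
    return {"data": {"echoedVariables": variables}}

def handle_request_batch(body: str) -> str:
    requests = json.loads(body)
    # Each entry is executed in isolation -- overlapping data needs
    # between entries cannot be deduplicated or shared.
    results = [execute(r["query"], r.get("variables", {})) for r in requests]
    return json.dumps(results)

batch = json.dumps([
    {"query": "query getHero { hero { name } }", "variables": {"a": 1}},
    {"query": "query getHero { hero { name } }", "variables": {"a": 2}},
])
responses = json.loads(handle_request_batch(batch))
```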
Operation Batching
Operation Batching, as shown by Lee Byron in 2016, leverages the @export directive to flow data between operations within a single HTTP request. This approach introduces the ability to use the result of one operation as input for another, enhancing flexibility and enabling more complex data-fetching strategies. The downside is the complexity of implementation and the fact that it’s not widely adopted, which may limit its practicality for some projects. Additionally, it does not really target our problem space.
```
POST /graphql?batchOperations=[Operation1,Operation2]

{
  "query": "query Operation1 { stories { id @export(as: \"storyIds\") } } query Operation2($storyIds: [Int!]!) { storiesById(ids: $storyIds) { name } }"
}
```
Pros:
- Facilitates data flow between requests.
Cons:
- Complex implementation.
- Limited adoption.
- Niche application (a precursor of @defer).
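The data flow can be sketched as follows (a toy illustration, not Lee Byron's implementation; the resolver results and story names are made up):

```python
# Toy illustration of @export-style operation batching: the values the
# first operation exports become the variables of the second.
def run_operation_1() -> dict:
    # Pretend result for: query Operation1 { stories { id @export(as: "storyIds") } }
    stories = [{"id": 1}, {"id": 2}, {"id": 3}]
    exported = {"storyIds": [s["id"] for s in stories]}
    return {"data": {"stories": stories}, "exported": exported}

def run_operation_2(variables: dict) -> dict:
    # Pretend result for: query Operation2($storyIds: [Int!]!) { storiesById(ids: $storyIds) { name } }
    names = {1: "a", 2: "b", 3: "c"}
    return {"data": {"storiesById": [{"name": names[i]} for i in variables["storyIds"]]}}

first = run_operation_1()
second = run_operation_2(first["exported"])  # exported values flow forward
```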
Variable Batching
Variable Batching addresses a specific batching use case by allowing a single request to carry multiple sets of variables, potentially enabling more optimized execution paths through the executor. In our experiments we could reduce the batching overhead to roughly the impact a DataLoader has on a request, which is promising.
```json
{
  "query": "query getHero($a: Int!, $b: String!) { field(a: $a, b: $b) }",
  "variables": [
    {
      "a": 1,
      "b": "abc"
    },
    {
      "a": 2,
      "b": "def"
    }
  ]
}
```
Pros:
- Optimizes a single request path.
- Relatively simple to implement.
Cons:
- Experimental; not yet widely adopted and requires explicit executor support.
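A sketch of how an executor can exploit this shape (the prepare and execute functions are hypothetical stand-ins for a real pipeline): the document is parsed, validated, and planned once, and only execution repeats per variable set.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def prepare(query: str) -> str:
    # Stand-in for parse + validate + plan; cached by document text,
    # so this cost is paid once for the whole batch.
    return f"plan::{query}"

def execute(plan: str, variables: dict) -> dict:
    # Stand-in for executing the prepared plan with one variable set.
    return {"data": {"a": variables["a"], "b": variables["b"]}}

def execute_variable_batch(query: str, variable_sets: list) -> list:
    plan = prepare(query)
    # Only execution repeats; a DataLoader sitting under the resolvers
    # could additionally batch data fetches across the variable sets.
    return [execute(plan, vs) for vs in variable_sets]

results = execute_variable_batch(
    "query getHero($a: Int!, $b: String!) { field(a: $a, b: $b) }",
    [{"a": 1, "b": "abc"}, {"a": 2, "b": "def"}],
)
```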
Alias Batching
Alias Batching uses field aliases to request multiple resources within a single GraphQL document, which works with every spec-compliant GraphQL server. This method’s strength lies in its compatibility and ease of use. However, it significantly hinders optimization: each batched document is essentially unique, preventing effective caching strategies (validation, parsing, query planning). While it might solve the immediate problem of batching requests, its impact on performance and scalability makes it far from ideal.
```graphql
{
  a: product(id: 1) {
    ...
  }
  b: product(id: 2) {
    ...
  }
  c: product(id: 3) {
    ...
  }
}
```
Pros:
- Compatible with all GraphQL servers.
- Simple to use for batching requests.
Cons:
- Hinders optimization due to treating each request as unique.
- Prevents effective caching strategies (validation, parsing, query planning).
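The caching drawback is easy to demonstrate: the document text itself is the cache key for parsing, validation, and planning, and every combination of ids produces a new document. A sketch with a hypothetical build_document helper:

```python
def build_document(ids: list) -> str:
    # Each id set yields a distinct document, so caches keyed by the
    # document text (parse, validation, query plan) rarely hit.
    fields = " ".join(
        f"f{i}: product(id: {pid}) {{ name }}" for i, pid in enumerate(ids)
    )
    return f"{{ {fields} }}"

doc_a = build_document([1, 2, 3])
doc_b = build_document([1, 2, 4])
# doc_a and doc_b differ, so each would be parsed and planned from scratch.
```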