Batching Mechanisms for Distributed Executors
To implement efficient distributed executors for composite schemas, we need robust batching mechanisms. While introducing explicit batching fields for fetching entities by keys is a straightforward approach, it becomes challenging when entities have data dependencies on other schemas.
Consider the following GraphQL schema:
```graphql
type Query {
  orderById(id: ID!): Order

  # batching field
  ordersById(ids: [ID!]!): [Order]!
}
```
The issue arises once a directive like @require declares a data dependency on a field: a simple key-based batching field cannot carry the required data for each individual key.
Example Scenario:
Source Schema 1:
```graphql
type Query {
  orderById(id: ID!): Order
  ordersById(ids: [ID!]!): [Order]!
}

type Order {
  id: ID!
  deliveryEstimate(dimension: ProductDimensionInput! @require(fields: "product { dimension }")): Int!
}
```
Source Schema 2:
```graphql
type Query {
  orderById(id: ID!): Order
  ordersById(ids: [ID!]!): [Order]!
}

type Order {
  id: ID!
  product: Product
}
```
For the distributed executor, batching such lookups becomes problematic because there is no way to supply an individual requirement for each key:
```graphql
query($ids: [ID!]!, $requirement: ProductDimensionInput!) { # <-- we cannot have a requirement for each key
  ordersById(ids: $ids) {
    deliveryEstimate(dimension: $requirement)
  }
}
```
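To make the mismatch concrete, here is a small sketch (in Python, with hypothetical order ids and dimension values) of the state the distributed executor is in: it has already fetched product { dimension } from Source Schema 2 and now holds one requirement per order, while the batching field only accepts a single $requirement value.

```python
# Hypothetical per-order requirements gathered from Source Schema 2.
requirements_by_order = {
    "order-1": {"width": 10, "height": 20, "depth": 5},
    "order-2": {"width": 3, "height": 4, "depth": 1},
}

ids = list(requirements_by_order)

# The batched query exposes exactly one $requirement variable, so only
# a single dimension value can accompany the whole id list:
variables = {"ids": ids, "requirement": requirements_by_order["order-1"]}

# The pairing between each id and its own requirement is lost.
```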
Apollo Federation's _entities field introduces a workaround. It allows the gateway to pass in representations of partial object data. However, this works around the GraphQL type system by introducing untyped inputs. While effective, an ideal solution would not require subgraphs to introduce special fields like _entities.

```graphql
extend type Query {
  _entities(representations: [_Any!]!): [_Entity]!
}
```
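For illustration, a batched lookup through _entities might look like the following (the representation values are hypothetical; note that with Apollo Federation's @requires directive the required fields travel inside the representation rather than as a field argument):

```graphql
query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Order {
      deliveryEstimate
    }
  }
}
```

```json
{
  "representations": [
    { "__typename": "Order", "id": "1", "product": { "dimension": { "width": 10, "height": 20 } } },
    { "__typename": "Order", "id": "2", "product": { "dimension": { "width": 3, "height": 4 } } }
  ]
}
```

Each representation is an untyped _Any value, which is exactly the trade-off described above.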
Batching Approaches
The GraphQL ecosystem has devised various batching approaches, each with its own set of advantages and drawbacks.
Request Batching
Request Batching is the most straightforward approach: multiple GraphQL requests are sent in a single HTTP request. This method is widely adopted due to its simplicity and compatibility with many GraphQL servers. However, the lack of a semantic relationship between the batched requests limits optimization opportunities, as each request is executed in isolation. This can result in inefficiencies, especially when the data required by the individual requests overlaps.
```json
[
  {
    "query": "query getHero($a: Int!, $b: String!) { hero(a: $a, b: $b) { name } }",
    "operationName": "getHero",
    "variables": {
      "a": 1,
      "b": "abc"
    }
  },
  {
    "query": "query getHero($a: Int!, $b: String!) { hero(a: $a, b: $b) { name } }",
    "operationName": "getHero",
    "variables": {
      "a": 1,
      "b": "abc"
    }
  }
]
```
Pros:
- Broad adoption across GraphQL servers.
- Straightforward implementation.
Cons:
- Executes each request in isolation, with no semantic relationship between entries.
- Challenges in optimizing due to isolated execution.
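The isolation is easy to see in a minimal server-side sketch (the execute function below is a stand-in for a real GraphQL executor): the handler simply loops over the array, so no work is shared between entries.

```python
import json

def execute(query: str, variables: dict) -> dict:
    # Stand-in for a real GraphQL executor; parsing, validation and
    # resolution happen independently for every invocation.
    return {"data": {"echoedVariables": variables}}

def handle_request_batch(body: str) -> str:
    requests = json.loads(body)
    # Each entry is executed in isolation -- overlapping data needs
    # between entries cannot be deduplicated or shared.
    results = [execute(r["query"], r.get("variables", {})) for r in requests]
    return json.dumps(results)

batch = json.dumps([
    {"query": "query getHero { hero { name } }", "variables": {"a": 1}},
    {"query": "query getHero { hero { name } }", "variables": {"a": 2}},
])
responses = json.loads(handle_request_batch(batch))
```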
Operation Batching
Operation Batching, as shown by Lee Byron in 2016, leverages the @export directive to flow data between operations within a single HTTP request. This approach introduces the ability to use the result of one operation as input for another, enhancing flexibility and enabling more complex data-fetching strategies. The downside is the complexity of implementation and the fact that it’s not widely adopted, which may limit its practicality for some projects. Additionally, it does not really target our problem space.
```
POST /graphql?batchOperations=[Operation1,Operation2]

{
  "query": "query Operation1 { stories { id @export(as: \"storyIds\") } } query Operation2($storyIds: [Int!]!) { storiesById(ids: $storyIds) { name } }"
}
```
Pros:
- Facilitates data flow between requests.
Cons:
- Complex implementation.
- Limited adoption.
- Niche application (a precursor of @defer).
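The data flow can be sketched as follows (a toy illustration, not Lee Byron's implementation; the resolver results and story names are made up):

```python
# Toy illustration of @export-style operation batching: the values the
# first operation exports become the variables of the second.
def run_operation_1() -> dict:
    # Pretend result for: query Operation1 { stories { id @export(as: "storyIds") } }
    stories = [{"id": 1}, {"id": 2}, {"id": 3}]
    exported = {"storyIds": [s["id"] for s in stories]}
    return {"data": {"stories": stories}, "exported": exported}

def run_operation_2(variables: dict) -> dict:
    # Pretend result for: query Operation2($storyIds: [Int!]!) { storiesById(ids: $storyIds) { name } }
    names = {1: "a", 2: "b", 3: "c"}
    return {"data": {"storiesById": [{"name": names[i]} for i in variables["storyIds"]]}}

first = run_operation_1()
second = run_operation_2(first["exported"])  # exported values flow forward
```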
Variable Batching
Variable Batching addresses a specific batching use case by allowing a single request to carry multiple sets of variables, potentially enabling more optimized execution paths through the executor. In our experiments we could reduce the batching overhead to roughly the impact a DataLoader has on a request, which is promising.
```json
{
  "query": "query getHero($a: Int!, $b: String!) { field(a: $a, b: $b) }",
  "variables": [
    {
      "a": 1,
      "b": "abc"
    },
    {
      "a": 2,
      "b": "def"
    }
  ]
}
```
Pros:
- Optimizes a single request path.
- Relatively simple to implement.
Cons:
- Experimental; not yet widely adopted and requires explicit executor support.
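A sketch of how an executor can exploit this shape (the prepare and execute functions are hypothetical stand-ins for a real pipeline): the document is parsed, validated, and planned once, and only execution repeats per variable set.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def prepare(query: str) -> str:
    # Stand-in for parse + validate + plan; cached by document text,
    # so this cost is paid once for the whole batch.
    return f"plan::{query}"

def execute(plan: str, variables: dict) -> dict:
    # Stand-in for executing the prepared plan with one variable set.
    return {"data": {"a": variables["a"], "b": variables["b"]}}

def execute_variable_batch(query: str, variable_sets: list) -> list:
    plan = prepare(query)
    # Only execution repeats; a DataLoader sitting under the resolvers
    # could additionally batch data fetches across the variable sets.
    return [execute(plan, vs) for vs in variable_sets]

results = execute_variable_batch(
    "query getHero($a: Int!, $b: String!) { field(a: $a, b: $b) }",
    [{"a": 1, "b": "abc"}, {"a": 2, "b": "def"}],
)
```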
Alias Batching
Alias Batching uses field aliases to request multiple resources within a single GraphQL document, which works with every spec-compliant GraphQL server. This method’s strength lies in its compatibility and ease of use. However, it significantly hinders optimization: each batched document is essentially unique, preventing effective caching strategies (validation, parsing, query planning). While it might solve the immediate problem of batching requests, its impact on performance and scalability makes it far from ideal.
```graphql
{
  a: product(id: 1) {
    ...
  }
  b: product(id: 2) {
    ...
  }
  c: product(id: 3) {
    ...
  }
}
```
Pros:
- Compatible with all GraphQL servers.
- Simple to use for batching requests.
Cons:
- Hinders optimization due to treating each request as unique.
- Prevents effective caching strategies (validation, parsing, query planning).
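The caching drawback is easy to demonstrate: the document text itself is the cache key for parsing, validation, and planning, and every combination of ids produces a new document. A sketch with a hypothetical build_document helper:

```python
def build_document(ids: list) -> str:
    # Each id set yields a distinct document, so caches keyed by the
    # document text (parse, validation, query plan) rarely hit.
    fields = " ".join(
        f"f{i}: product(id: {pid}) {{ name }}" for i, pid in enumerate(ids)
    )
    return f"{{ {fields} }}"

doc_a = build_document([1, 2, 3])
doc_b = build_document([1, 2, 4])
# doc_a and doc_b differ, so each would be parsed and planned from scratch.
```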