
persisted-operations's People

Contributors

benjie, enisdenjo, jemgillam, leo91000, none23

persisted-operations's Issues

Not providing a getter causes downstream error

Summary

As the title says: if none of the persisted-operations options are provided, the failure surfaces as a confusing error further downstream rather than at configuration time.

Steps to reproduce

!options.persistedOperationsGetter && !options.persistedOperations && !options.persistedOperationsDirectory

Actual results

This clause is triggered, the getter falls through to a null assignment, and the error only surfaces downstream.

Possible Solution

  • Bubble the error upstream by replacing the null assignment with throw new Error('No getter function was specified').
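
For illustration, a sketch of that fix. The option names come from the clause above, but the surrounding function is hypothetical, not the library's actual source:

function getterFromOptions(options) {
  if (
    !options.persistedOperationsGetter &&
    !options.persistedOperations &&
    !options.persistedOperationsDirectory
  ) {
    // Previously this case fell through to a null assignment, which only
    // blew up later; throwing here surfaces the misconfiguration immediately.
    throw new Error("No getter function was specified");
  }
  // ... otherwise resolve and return the getter as before ...
}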

Discussion: Identifying slow queries at build time

I'm not sure yet how much of this involves writing a tool versus just documenting a pattern, nor to what extent it relates to this repo versus graphile-engine etc.

It would be awesome to write tests that, for each persisted operation, execute the query with EXPLAIN against the server while it points to a DB that represents production (perhaps actually being production) with representative variables.

Something maybe very loosely like this:

// in your app
const myQuery = gql`
  query myQuery($condition: String)
    @explain(variables: { condition: "foo" }, maxcost: 1000)
    @explain(variables: { condition: "bar" }, maxcost: 5000)
  {
    getUsers(condition: $condition) {
      nodes {
        username
      }
    }
  }
`;

This would generate several files:

.persisted-operations/client.json:

{
  "myQuery": "xyzsha"
}

.persisted-operations/xyzsha.graphql:

query myQuery($condition: String) {
  ...
}

.persisted-operations/xyzsha-myQuery-foo.sql:

-- query: ./xyzsha.graphql (makes it easy to cmd+click through to the gql in editors like VS Code)
-- variable: condition=foo
select username from users where thing = 'foo';

/*
 EXPLAIN results here, showing the cost and query plan
*/

.persisted-operations/xyzsha-myQuery-bar.sql:

-- query: ./xyzsha.graphql
-- variable: condition=bar
select username from users where thing = 'bar';

/*
 EXPLAIN results here, showing the cost and query plan
*/

This way, developers can easily see the SQL generated from the queries they write, as well as the query plan (imagine actually seeing where a Seq Scan is used!). Reviewers can audit it. And the build can fail if EXPLAIN estimates that the cost of a query will be higher than the specified maxcost.
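
None of this tooling exists yet, but the build-time cost check could look very roughly like this, assuming node-postgres, the file layout above, and a maxcost collected from the @explain annotation (and glossing over details like stripping the appended plan comment from the .sql file):

import { readFileSync } from "fs";
import pg from "pg";

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Fail the build if EXPLAIN's estimated total cost exceeds the operation's maxcost.
async function checkCost(sqlFile, maxcost) {
  const sql = readFileSync(sqlFile, "utf8");
  // EXPLAIN (FORMAT JSON) returns the whole plan as a single JSON document.
  const { rows } = await pool.query(`EXPLAIN (FORMAT JSON) ${sql}`);
  const totalCost = rows[0]["QUERY PLAN"][0]["Plan"]["Total Cost"];
  if (totalCost > maxcost) {
    throw new Error(`${sqlFile}: estimated cost ${totalCost} exceeds maxcost ${maxcost}`);
  }
}

await checkCost(".persisted-operations/xyzsha-myQuery-foo.sql", 1000);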

Maybe this could even be part of graphile pro.

Problems / open questions:

  1. Of course, developers will have to be responsible for thinking of the appropriate variables to test, but that's a core skill of software development, and once something you didn't think of comes up, you can add it as a performance regression test.
  2. Working with query planner costs can be tricky, and sometimes the query planner is way off (or gives an unhelpfully wide range), but it is the best tool around.
  3. Naming the files gets weird; you'd want an intuitive and browsable scheme, ideally one that isn't hell in source control. Maybe specify a @name for each persisted operation instead of using a SHA? Maybe what I have here would be fine enough, and maybe even just using SHAs would be fine.
  4. Failing the build on a high EXPLAIN cost estimate could cause the build to sporadically fail, for example when the statistics change.
    a. Silly me, this would only be the case if you re-ran the SQL queries in CI, which isn't a good idea for both this reason and performance.
  5. Should you EXPLAIN ANALYZE on non-mutation queries? Perhaps with a low statement_timeout to prevent dev iteration from taking down prod? (See the sketch after this list.)
  6. You probably don't want to be running a bunch of EXPLAIN ANALYZEs against production all the time, especially as you're iterating in dev.
    a. This might not be nearly as bad as it sounds, though, because you'd only re-run it when the SHA of a gql query changes, and you'll probably only be doing one or two at a time.
    b. Of course, you can also just run this against a staging or secondary DB with similar data to production, or use a tool to regularly sync representative/censored prod data to dev DBs.
  7. The generated query plans and costs for a given gql query will fall out of date over time as the statistics in the DB change, so it's possible a maxcost annotation would be silently exceeded for a time.
    a. A command to regenerate all SQL files, run manually from time to time, would probably be fine for this.
    b. Substantial changes to query plans that affect performance should be caught by production observability tooling anyway. The point is to not actively write and ship queries that are slow from the beginning.
  8. Changing the backend implementation (for example, by wrapping a resolver) wouldn't automatically trigger regeneration of the SQL; you'd have to do that manually.
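
For point 5, a rough sketch of what a guarded EXPLAIN ANALYZE run might look like, again assuming node-postgres; the 200ms timeout is an arbitrary illustration:

import pg from "pg";

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

async function explainAnalyze(sql) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // SET LOCAL scopes the timeout to this transaction, so a pathological
    // query gets cancelled quickly instead of hammering the database.
    await client.query("SET LOCAL statement_timeout = '200ms'");
    const { rows } = await client.query(`EXPLAIN ANALYZE ${sql}`);
    return rows.map((r) => r["QUERY PLAN"]).join("\n");
  } finally {
    // Always roll back: EXPLAIN ANALYZE actually executes the query.
    await client.query("ROLLBACK");
    client.release();
  }
}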

Enable bypassing persisted operations

Feature description

Add a function, like allowExplain, that governs whether a request may use arbitrary operations versus only persisted operations.

Motivating example

In development it's common to want to send arbitrary queries from GraphiQL whilst also enforcing Persisted Operations from the application.

app.use(postgraphile(DATABASE_URL, SCHEMAS, {
  allowUnpersistedOperation(req) {
    return process.env.NODE_ENV === "development" && req.headers.referer?.endsWith("/graphiql");
  }
}));

(Should we pass the operation too, I wonder?)

Breaking changes

None

Prepared statements / debugging query performance?

Heya,

I'm considering postgraphile for a project, and one of my core anxieties about the tool (and GraphQL in general) is being able to detect and diagnose huge, expensive queries. Persisted operations seem like one helpful way to at least lock that down to queries known at build time, but the next step would be to actually understand the performance of those queries.

One approach might be prepared statements – preparing each persisted operation's SQL under a name derived from its SHA. This would save query parsing and planning time, which I assume isn't much.
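
For what it's worth, node-postgres already supports this shape: passing a name alongside the query text makes it a named prepared statement, prepared once per connection and reused. A sketch, with placeholder SQL and a hypothetical operationSha:

import pg from "pg";

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// operationSha would be the persisted operation's hash; the SQL is a placeholder.
async function runPersisted(operationSha, condition) {
  return pool.query({
    name: operationSha, // pg PREPAREs under this name once per connection, then reuses the plan
    text: "select username from users where thing = $1",
    values: [condition],
  });
}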

My thinking is that this might also help make it easier to see what queries are running frequently and expensively, debug their query plans, etc – but this may be totally off-base.

As an aside, it might be worth adding a section to Production Considerations talking about how to understand your query performance.

Cheers!
