Giter Site home page Giter Site logo

Comments (9)

jeffweiss avatar jeffweiss commented on July 19, 2024 1

@dillonkearns I think you have an incorrect assumption about the Absinthe logger. Yes, it is configurable (for explicitly filtering variables, as opposed to the entire payload), but the logger itself if not swappable; i.e. there is no way to tell Absinthe to use a custom logger, which filters entire payloads for misbehaving clients.

I understand you have no intention of making graphqelm/elm-graphql use variables instead of injecting the data directly into the payload body. That was clear from the wontfix. I'm not asking you to change it.

My comment was to inform future folks that:

  1. filtering sensitive values from graphqelm does not work
  2. aside from a user fork of Absinthe with a custom Logger there's no workaround.

from elm-graphql.

benwilson512 avatar benwilson512 commented on July 19, 2024 1

Hey folks,

Absinthe core member here. As I note in the issue on the Absinthe repo, a PR adding query string filtering would be welcome. The challenge here of course is that it requires parsing to happen before logging, whereas today it goes the other way around. Nonetheless, this is a solvable problem. Clients that pass in malformed GraphQL that also contain secrets are a bit out of luck, as would clients who pass parsable but invalid (by the schema) graphql. Still, this would be an improvement over the functionality today.

I did want to make a few comments however regarding what I understand to be the stance on variables here.

Notably, the single source of truth argument does NOT require that secrets sit as string literals to work. Suppose the feature worked as follows:

  1. A query comes in
  2. It's parsed and validated against a schema
  3. The schema (or other criterion) is used to determine which inputs are secrets
  4. A new query string is reconstructed from this information that excises the secret info
  5. This string is logged.

This is how I'm interpreting the "single source of truth" notion. Importantly, this works even if variables are still used. If I know what input values are secret, I can work backwards into the variables JSON to prevent printing of secret values.

The single source of truth idea is a good one because it prevents clients from needing to know about magic variable names that are hidden. However, that's just an argument for being smarter about logging in Absinthe, it isn't an argument for using inputs directly in the query string vs as variables.

Why Use Variables

  1. The GraphQL spec and educational literature includes and promotes variable use.
  2. Variables are more secure (no need to escape user input strings).
  3. Variable use makes it possible to cache the parsing and validation of documents.
  4. Variable use enables clients to send document SHAs (or other identifiers) instead of actual query strings if the backend supports registered documents.
  5. Variable use promotes intrinsically faster server side parsing time, since JSON grammar is simpler than GraphQL grammar.
  6. Variable use promotes intrinsically faster client side query building time for the same reason.
  7. Variable use makes GraphQL requests easier to introspect and manipulate by generic tooling.

About Elm

I don't know much about Elm (I've heard good things!), but it does look like they are building up a data structure that represents the query, and then ultimately constructing a GraphQL document from that structure. Is it possible for the Elm GraphQL client library to, before sending a document, extract parameter values into variables generally? This is similar to the pattern used commonly in SQL libaries. You have some kind of dictionary oriented querying interface where you're allowed to do User |> where(name: "Ben") but when the SQL library generates the actual SQL, user input is extracted out and passed in as parameters.

Conclusion

As I say, I think this is a useful feature, but mostly as a safeguard for GraphiQL users who are the most likely to type in password like values manually. A PR to Absinthe adding this feature would be welcome.

It seems to me that Elm-GraphQL, like most other GraphQL clients, should use variables. I'm totally on board with wanting to provide a simple query experience for elm users, but that doesn't, as far as I can tell, mean that the library itself can't extract inputs into variables for transmission over the wire.

from elm-graphql.

dillonkearns avatar dillonkearns commented on July 19, 2024 1

Hello @benwilson512, thank you so much for taking the time to share your thoughts here, I really appreciate it!

The single source of truth idea is a good one because it prevents clients from needing to know about magic variable names that are hidden.

Yes, my thoughts exactly. I will follow up on the Absinthe github repo and see if I can find someone to make a PR to add that on ๐Ÿ‘

However, that's just an argument for being smarter about logging in Absinthe, it isn't an argument for using inputs directly in the query string vs as variables.

I am 100% in agreement here ๐Ÿ‘It's not either-or. You can have a single source of truth where the server determines which arguments are sensitive, and at the same time use GraphQL Variables. I changed the name of this issue to better reflect that this issue isn't about GraphQL Variables in general, it's specifically about supporting masking on the server (with GraphQL Variables as a strategy for how to do that). I'm not saying that elm-graphql should never use GraphQL Variables. What I have tried to do as the maintainer is to drive all new features based on problems and use cases that users give me context on.

This particular problem (getting the server to mask sensitive arguments) does not seem to be the right use case to drive the design of GraphQL variables in elm-graphql. There are a lot of different directions that an API design to support GraphQL variables could go, and so it's important that it be done with the right context.

Reasons for using GraphQL Variables

There are absolutely some good reasons to consider GraphQL Variables in elm-graphql. I hope that my amendment to the issue name helps clarify that this issue is specifically about finding a solution for argument masking.

I'm speaking to some users about some ideas around server-side caching and persisted queries. And I totally agree, those are compelling reasons to use GraphQL Variables.

They're in an exploration phase for picking a server to support caching or query whitelisting. Once we have some more concrete details, I'll open up a new Github issue and move those discussions over there to explore different API design ideas.

elm-graphql vs. JS Clients

Here are some thoughts on a few of the points besides caching and whitelisting in your notes on Why Use Variables.

Working in elm is fundamentally different than JS in some ways. The problems are unique, too. With elm-graphql, some of the problems you point out for Why Use Variables don't happen as they would in a JavaScript context. For example:

  • It's impossible to build up a syntactically incorrect GraphQL query (it will even guarantee that it's valid according to the schema as long as the server and client are not out of sync).
  • Queries are not built up with string concatenation, it's built up with a query builder API

Also, elm-graphql has the goal of being a level of abstraction above direct GraphQL queries. Since elm-graphql isn't so much a direct port of GraphQL, but a layer of abstraction that sits above it, having GraphQL Variables is not binary, but rather a wide spectrum of possibilities. There are many different API design paths that could use GraphQL Variables under the hood. All that to say, that's why I do my best to find users who can help me understand use cases that help drive new feature development.

Thanks again for taking the time to respond @benwilson512!

from elm-graphql.

dillonkearns avatar dillonkearns commented on July 19, 2024

Hi @xtian, I appreciate you reporting this and starting this dialogue. However, I would be concerned about adding support in for using GraphQL variables because I want to honor the following core principles of this library:

  1. Whatever request you send is guaranteed to be valid according to the server's schema
  2. Extremely simple and intuitive to build requests
  3. Wherever possible, use Elm language features instead of GraphQL features (i.e. Elm variables, constants, and arguments instead of GraphQL variables. Aliases are a low-level detail that the user of the library does not need to think about. Etc.)

Take this generated code from the Star Wars example:

human : { id : Swapi.Scalar.Id } -> SelectionSet decodesTo Swapi.Object.Human -> Field (Maybe decodesTo) RootQuery
human requiredArgs object =
    Object.selectionField "human" [ Argument.required "id" requiredArgs.id (\(Swapi.Scalar.Id raw) -> Encode.string raw) ] object (identity >> Decode.maybe)

You would need the ability to pass in something other than a Swapi.Scalar.Id here. So you'd need to have something like { id : Argument Swapi.Scalar.Id } where argument has a definition like:

type Argument a
    = Variable String a
    | Raw a

(The String would hold the variable's name). The same would hold for using a variable for a field in an InputObject.

That would mean that instead of

query : SelectionSet Response RootQuery
query =
    Query.selection Response
        |> with (Query.human { id = Swapi.Scalar.Id "1001" } human)

you would have to wrap it in a type, whether you wanted a variable or not:

query : SelectionSet Response RootQuery
query =
    Query.selection Response
        |> with (Query.human { id = Raw (Swapi.Scalar.Id "1001") } human)

I think this violates 2). Now the user would have to wrap each argument, and with an input object they would have to wrap the input object itself as well as each value within the input object. This would be another concept to explain, which to me violates 3) (I find that the addition of a single concept to explain can often make the learning curve a lot steeper). And the error messages would become a little more cryptic.

This also introduces several issues around guaranteeing 1). If you introduce variable names, you can now have problems where you provide multiple variables of the same name. Do you change one of these names? This wouldn't solve the issue you're discussing. Otherwise, you will end up with an invalid request.

The addition of these special cases and extra types adds a ton of complexity for both the end user, and the library itself. It would be possible to generate separate files that mirror the other generated files but have all their arguments take the Argument wrapper type instead of raw values. But this is another concept to explain and more complexity for the user and the library as well.

I think the reasons that the Absinthe team gives for not supporting filtering except for variables is reasonable. However, I believe it would be possible to create a custom logger that uses a regex or something similar to do the filtering you require: https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/logger.ex.

So based only on this use case, the best solution seems to me to be to build a custom logger on the server-side rather than build out the infrastructure to send queries just for the purpose of filtering out sensitive info in the logs. But I would absolutely be open to hearing about anything that I haven't considered here, or other use cases, or other ideas for supporting variables in a way that is in line with the 3 principles I outlined above. Thanks again for opening the issue!

from elm-graphql.

xtian avatar xtian commented on July 19, 2024

@dillonkearns I totally agree with your goals for the project - I definitely wasnโ€™t suggesting a compromise there.

I guess itโ€™s not clear to me why passing input values up in the variables JSON key would require an API change on the Elm side. It seems like purely a serialization concern. Instead of Variable String a why not just auto-generate a variable name when building the query text?

from elm-graphql.

dillonkearns avatar dillonkearns commented on July 19, 2024

Hey @xtian, if there was a way to design it so that it "just works" then that would very much be in line with the design goals of this library. I'm having trouble imaging what that behavior would look like, though. Here are the questions that come to mind:

  • Would it use a variable for every argument? Or just for every input value?
  • What if two things have the same value but are being passed in to different parameters.
  • What if two arguments have the same name but different values? Like
{
  person(id: 123) {  name }
  person(id: 456) { name }
}

Would you have variables $id1 and $id2? And if you were depending on a variable of name $id being used under the hood (say for your server log masking similar to in your case), is it reasonable that this changes when you add the second person(id: 456) ... line?

  • Do you only generate variables for input objects? If so, what if you have nested input objects that repeat names.

Perhaps you have a different vision than what I'm picturing, though? I'm happy to explore any different approaches.

from elm-graphql.

jeffweiss avatar jeffweiss commented on July 19, 2024

For posterity, it is not currently possible to replace the default Absinthe.Logger to also filter out sensitive data from the body. There is no known way for both graphqelm and absinthe to play nicely together for filtering out sensitive content.

from elm-graphql.

dillonkearns avatar dillonkearns commented on July 19, 2024

The default Absinthe logger supports filtering based on variables because it's easier to parse the incoming JSON from variables. But it can also be manually modified because the way it's set up is extensible. I would suggest tweaking it to filter out any arguments named password (or other names of the secure input values). Here's the default module:

https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/logger.ex

A regex check would do a pretty good job. Something like checking for pasword: ... (you'd probably want to handle some more corner cases like having whitespace in between). But the logger could also be modified to parse the GraphQL query properly. The logger is extensible and it's elixir code, so you can do all the parsing you need there.

Should the client, the server, or both have knowledge about masking values?

My two cents on this topic is that filtering based on variables is the simplest thing for the server to do, but it is more error-prone because it depends on the client making requests with a certain variable naming convention. To me, it seems much more robust to keep the knowledge of which input values are secure to the server's knowledge. Then the client doesn't need to know the variable names that the server expects in order to filter it out.

If the filtering logic is written strictly on the server-side, telling it to filter the input values that are known to be sensitive, then you don't have the knowledge of how to mask values split between client and server. Splitting the knowledge seems error-prone and less robust to me. And conceptually, you don't really want to filter values with a certain variable name... you want to filter values that are passed in as a particular sensitive argument. So in my opinion, that's the pinch point where it is the most robust and least error-prone determine what to mask.

from elm-graphql.

dillonkearns avatar dillonkearns commented on July 19, 2024

@jeffweiss thanks for clarifying, you're right I was assuming that you could swap in your own custom Absinthe logger based on this syntax:

https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/logger.ex#L17-L18

It turns out that substituting your own module in there is actually a no-op. It just uses the module name by convention, so you can't change the implementation as you say.

I don't consider elm-graphql to be a misbehaving client, to me GraphQL variables seem like they are an implementation detail which Absinthe depends on. It's a totally reasonable design decision with a history that makes perfect sense. But I think there is a strong case for having the knowledge of what to filter having a single source of truth on the server. Imagine, for example, that you have a public-facing API. You would want to make sure that password arguments are always filtered, regardless of whether the users of the public API use a GraphQL variable of the name the server expects.

I opened up an issue to re-examine the issue with this new context in the Absinthe repo:

absinthe-graphql/absinthe#723

from elm-graphql.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.