Comments (9)
@dillonkearns I think you have an incorrect assumption about the Absinthe logger. Yes, it is configurable (for explicitly filtering variables, as opposed to the entire payload), but the logger itself if not swappable; i.e. there is no way to tell Absinthe to use a custom logger, which filters entire payloads for misbehaving clients.
I understand you have no intention of making graphqelm
/elm-graphql
use variables instead of injecting the data directly into the payload body. That was clear from the wontfix
. I'm not asking you to change it.
My comment was to inform future folks that:
- filtering sensitive values from
graphqelm
does not work - aside from a user fork of Absinthe with a custom Logger there's no workaround.
from elm-graphql.
Hey folks,
Absinthe core member here. As I note in the issue on the Absinthe repo, a PR adding query string filtering would be welcome. The challenge here of course is that it requires parsing to happen before logging, whereas today it goes the other way around. Nonetheless, this is a solvable problem. Clients that pass in malformed GraphQL that also contain secrets are a bit out of luck, as would clients who pass parsable but invalid (by the schema) graphql. Still, this would be an improvement over the functionality today.
I did want to make a few comments however regarding what I understand to be the stance on variables here.
Notably, the single source of truth argument does NOT require that secrets sit as string literals to work. Suppose the feature worked as follows:
- A query comes in
- It's parsed and validated against a schema
- The schema (or other criterion) is used to determine which inputs are secrets
- A new query string is reconstructed from this information that excises the secret info
- This string is logged.
This is how I'm interpreting the "single source of truth" notion. Importantly, this works even if variables are still used. If I know what input values are secret, I can work backwards into the variables JSON to prevent printing of secret values.
The single source of truth idea is a good one because it prevents clients from needing to know about magic variable names that are hidden. However, that's just an argument for being smarter about logging in Absinthe, it isn't an argument for using inputs directly in the query string vs as variables.
Why Use Variables
- The GraphQL spec and educational literature includes and promotes variable use.
- Variables are more secure (no need to escape user input strings).
- Variable use makes it possible to cache the parsing and validation of documents.
- Variable use enables clients to send document SHAs (or other identifiers) instead of actual query strings if the backend supports registered documents.
- Variable use promotes intrinsically faster server side parsing time, since JSON grammar is simpler than GraphQL grammar.
- Variable use promotes intrinsically faster client side query building time for the same reason.
- Variable use makes GraphQL requests easier to introspect and manipulate by generic tooling.
About Elm
I don't know much about Elm (I've heard good things!), but it does look like they are building up a data structure that represents the query, and then ultimately constructing a GraphQL document from that structure. Is it possible for the Elm GraphQL client library to, before sending a document, extract parameter values into variables generally? This is similar to the pattern used commonly in SQL libaries. You have some kind of dictionary oriented querying interface where you're allowed to do User |> where(name: "Ben")
but when the SQL library generates the actual SQL, user input is extracted out and passed in as parameters.
Conclusion
As I say, I think this is a useful feature, but mostly as a safeguard for GraphiQL users who are the most likely to type in password like values manually. A PR to Absinthe adding this feature would be welcome.
It seems to me that Elm-GraphQL, like most other GraphQL clients, should use variables. I'm totally on board with wanting to provide a simple query experience for elm users, but that doesn't, as far as I can tell, mean that the library itself can't extract inputs into variables for transmission over the wire.
from elm-graphql.
Hello @benwilson512, thank you so much for taking the time to share your thoughts here, I really appreciate it!
The single source of truth idea is a good one because it prevents clients from needing to know about magic variable names that are hidden.
Yes, my thoughts exactly. I will follow up on the Absinthe github repo and see if I can find someone to make a PR to add that on ๐
However, that's just an argument for being smarter about logging in Absinthe, it isn't an argument for using inputs directly in the query string vs as variables.
I am 100% in agreement here ๐It's not either-or. You can have a single source of truth where the server determines which arguments are sensitive, and at the same time use GraphQL Variables. I changed the name of this issue to better reflect that this issue isn't about GraphQL Variables in general, it's specifically about supporting masking on the server (with GraphQL Variables as a strategy for how to do that). I'm not saying that elm-graphql
should never use GraphQL Variables. What I have tried to do as the maintainer is to drive all new features based on problems and use cases that users give me context on.
This particular problem (getting the server to mask sensitive arguments) does not seem to be the right use case to drive the design of GraphQL variables in elm-graphql
. There are a lot of different directions that an API design to support GraphQL variables could go, and so it's important that it be done with the right context.
Reasons for using GraphQL Variables
There are absolutely some good reasons to consider GraphQL Variables in elm-graphql
. I hope that my amendment to the issue name helps clarify that this issue is specifically about finding a solution for argument masking.
I'm speaking to some users about some ideas around server-side caching and persisted queries. And I totally agree, those are compelling reasons to use GraphQL Variables.
They're in an exploration phase for picking a server to support caching or query whitelisting. Once we have some more concrete details, I'll open up a new Github issue and move those discussions over there to explore different API design ideas.
elm-graphql vs. JS Clients
Here are some thoughts on a few of the points besides caching and whitelisting in your notes on Why Use Variables.
Working in elm is fundamentally different than JS in some ways. The problems are unique, too. With elm-graphql
, some of the problems you point out for Why Use Variables don't happen as they would in a JavaScript context. For example:
- It's impossible to build up a syntactically incorrect GraphQL query (it will even guarantee that it's valid according to the schema as long as the server and client are not out of sync).
- Queries are not built up with string concatenation, it's built up with a query builder API
Also, elm-graphql
has the goal of being a level of abstraction above direct GraphQL queries. Since elm-graphql
isn't so much a direct port of GraphQL, but a layer of abstraction that sits above it, having GraphQL Variables is not binary, but rather a wide spectrum of possibilities. There are many different API design paths that could use GraphQL Variables under the hood. All that to say, that's why I do my best to find users who can help me understand use cases that help drive new feature development.
Thanks again for taking the time to respond @benwilson512!
from elm-graphql.
Hi @xtian, I appreciate you reporting this and starting this dialogue. However, I would be concerned about adding support in for using GraphQL variables because I want to honor the following core principles of this library:
- Whatever request you send is guaranteed to be valid according to the server's schema
- Extremely simple and intuitive to build requests
- Wherever possible, use Elm language features instead of GraphQL features (i.e. Elm variables, constants, and arguments instead of GraphQL variables. Aliases are a low-level detail that the user of the library does not need to think about. Etc.)
Take this generated code from the Star Wars example:
human : { id : Swapi.Scalar.Id } -> SelectionSet decodesTo Swapi.Object.Human -> Field (Maybe decodesTo) RootQuery
human requiredArgs object =
Object.selectionField "human" [ Argument.required "id" requiredArgs.id (\(Swapi.Scalar.Id raw) -> Encode.string raw) ] object (identity >> Decode.maybe)
You would need the ability to pass in something other than a Swapi.Scalar.Id
here. So you'd need to have something like { id : Argument Swapi.Scalar.Id }
where argument has a definition like:
type Argument a
= Variable String a
| Raw a
(The String
would hold the variable's name). The same would hold for using a variable for a field in an InputObject
.
That would mean that instead of
query : SelectionSet Response RootQuery
query =
Query.selection Response
|> with (Query.human { id = Swapi.Scalar.Id "1001" } human)
you would have to wrap it in a type, whether you wanted a variable or not:
query : SelectionSet Response RootQuery
query =
Query.selection Response
|> with (Query.human { id = Raw (Swapi.Scalar.Id "1001") } human)
I think this violates 2). Now the user would have to wrap each argument, and with an input object they would have to wrap the input object itself as well as each value within the input object. This would be another concept to explain, which to me violates 3) (I find that the addition of a single concept to explain can often make the learning curve a lot steeper). And the error messages would become a little more cryptic.
This also introduces several issues around guaranteeing 1). If you introduce variable names, you can now have problems where you provide multiple variables of the same name. Do you change one of these names? This wouldn't solve the issue you're discussing. Otherwise, you will end up with an invalid request.
The addition of these special cases and extra types adds a ton of complexity for both the end user, and the library itself. It would be possible to generate separate files that mirror the other generated files but have all their arguments take the Argument
wrapper type instead of raw values. But this is another concept to explain and more complexity for the user and the library as well.
I think the reasons that the Absinthe team gives for not supporting filtering except for variables is reasonable. However, I believe it would be possible to create a custom logger that uses a regex or something similar to do the filtering you require: https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/logger.ex.
So based only on this use case, the best solution seems to me to be to build a custom logger on the server-side rather than build out the infrastructure to send queries just for the purpose of filtering out sensitive info in the logs. But I would absolutely be open to hearing about anything that I haven't considered here, or other use cases, or other ideas for supporting variables in a way that is in line with the 3 principles I outlined above. Thanks again for opening the issue!
from elm-graphql.
@dillonkearns I totally agree with your goals for the project - I definitely wasnโt suggesting a compromise there.
I guess itโs not clear to me why passing input values up in the variables
JSON key would require an API change on the Elm side. It seems like purely a serialization concern. Instead of Variable String a
why not just auto-generate a variable name when building the query text?
from elm-graphql.
Hey @xtian, if there was a way to design it so that it "just works" then that would very much be in line with the design goals of this library. I'm having trouble imaging what that behavior would look like, though. Here are the questions that come to mind:
- Would it use a variable for every argument? Or just for every input value?
- What if two things have the same value but are being passed in to different parameters.
- What if two arguments have the same name but different values? Like
{
person(id: 123) { name }
person(id: 456) { name }
}
Would you have variables $id1
and $id2
? And if you were depending on a variable of name $id
being used under the hood (say for your server log masking similar to in your case), is it reasonable that this changes when you add the second person(id: 456) ...
line?
- Do you only generate variables for input objects? If so, what if you have nested input objects that repeat names.
Perhaps you have a different vision than what I'm picturing, though? I'm happy to explore any different approaches.
from elm-graphql.
For posterity, it is not currently possible to replace the default Absinthe.Logger
to also filter out sensitive data from the body. There is no known way for both graphqelm
and absinthe
to play nicely together for filtering out sensitive content.
from elm-graphql.
The default Absinthe logger supports filtering based on variables because it's easier to parse the incoming JSON from variables. But it can also be manually modified because the way it's set up is extensible. I would suggest tweaking it to filter out any arguments named password
(or other names of the secure input values). Here's the default module:
https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/logger.ex
A regex check would do a pretty good job. Something like checking for pasword: ...
(you'd probably want to handle some more corner cases like having whitespace in between). But the logger could also be modified to parse the GraphQL query properly. The logger is extensible and it's elixir code, so you can do all the parsing you need there.
Should the client, the server, or both have knowledge about masking values?
My two cents on this topic is that filtering based on variables is the simplest thing for the server to do, but it is more error-prone because it depends on the client making requests with a certain variable naming convention. To me, it seems much more robust to keep the knowledge of which input values are secure to the server's knowledge. Then the client doesn't need to know the variable names that the server expects in order to filter it out.
If the filtering logic is written strictly on the server-side, telling it to filter the input values that are known to be sensitive, then you don't have the knowledge of how to mask values split between client and server. Splitting the knowledge seems error-prone and less robust to me. And conceptually, you don't really want to filter values with a certain variable name... you want to filter values that are passed in as a particular sensitive argument. So in my opinion, that's the pinch point where it is the most robust and least error-prone determine what to mask.
from elm-graphql.
@jeffweiss thanks for clarifying, you're right I was assuming that you could swap in your own custom Absinthe logger based on this syntax:
https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/logger.ex#L17-L18
It turns out that substituting your own module in there is actually a no-op. It just uses the module name by convention, so you can't change the implementation as you say.
I don't consider elm-graphql
to be a misbehaving client, to me GraphQL variables seem like they are an implementation detail which Absinthe depends on. It's a totally reasonable design decision with a history that makes perfect sense. But I think there is a strong case for having the knowledge of what to filter having a single source of truth on the server. Imagine, for example, that you have a public-facing API. You would want to make sure that password
arguments are always filtered, regardless of whether the users of the public API use a GraphQL variable of the name the server expects.
I opened up an issue to re-examine the issue with this new context in the Absinthe repo:
from elm-graphql.
Related Issues (20)
- Enhancement - Add support for Dgraph HOT 1
- non null list of non null list ofโฆ HOT 6
- Include an exhaustive query for each type HOT 3
- elm-graphql generation fails if there are no columns selected (Hasura) HOT 6
- Add operationName to mutation request body HOT 1
- Enum with underscore as a leading character, changed after generating elm code. HOT 1
- Scalar decoder fails when the field contains an object. HOT 1
- Scalars should not be encoded as json in mutations HOT 3
- Enforce nonempty list in required list arguments HOT 3
- Selecting implementation attributes from interfaces HOT 2
- Corrupt package data for 5.0.5? HOT 1
- Requesting feedback on the following changes to the serializeChildren function regarding field hashing HOT 2
- Can not find ScalarCodecs HOT 1
- Feature request: Convenience functions for primitive comparisons HOT 3
- Idea: serialize invalid JSON Input Values as arguments
- Missing fragment in Interface `Fragments` type HOT 1
- Idea: include deprecation warnings into the generated code as comments HOT 1
- Building large queries takes a long time and pauses the UI HOT 2
- Encode function for enums is missing HOT 1
- Generate input object helpers HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ๐๐๐
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google โค๏ธ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from elm-graphql.