breus / json-masker

High-performance JSON masker library in Java with no runtime dependencies

License: MIT License

Java 100.00%
java json masking message anonymization pii-anonymization sensitive-data high-performance

High-performance JSON masker

A JSON masker library that masks (sensitive) values in JSON for a given set of keys (block-mode) or, alternatively, leaves only the values for a given set of keys unmasked while masking all others (allow-mode).

The library provides modern and convenient Java APIs which offer a wide range of masking customizations. Furthermore, the implementation focuses on maximizing throughput and minimizing heap memory allocations to reduce GC pressure.

Finally, no additional third-party runtime dependencies are required to use this library.

Features

  • Mask all primitive values by specifying the keys to mask; by default, any string is masked as "***", any number as "###" and any boolean as "&&&"
  • If the value of a targeted key corresponds to an object, all nested fields, including nested arrays and objects, will be masked, recursively
  • If the value of a targeted key corresponds to an array, all values of the array, including nested arrays and objects, will be masked, recursively
  • Ability to define a custom masking strategy per value type
    • (default) mask strings with a different string: "maskMe": "secret" -> "maskMe": "***"
    • mask characters of a string with a different character: "maskMe": "secret" -> "maskMe": "******" (preserves length)
    • (default) mask numbers with a string: "maskMe": 12345 -> "maskMe": "###" (changes number type to string)
    • mask numbers with a different number: "maskMe": 12345 -> "maskMe": 0 (preserves number type)
    • mask digits of a number with a different digit: "maskMe": 12345 -> "maskMe": 88888 (preserves number type and length)
    • (default) mask booleans with a string: "maskMe": true -> "maskMe": "&&&" (changes boolean type to string)
    • mask booleans with a different boolean: "maskMe": true -> "maskMe": false (preserves boolean type)
  • Ability to define a custom masking strategy per key
  • Ability to configure JSON type preserving masking configurations so the masked JSON can be deserialized back into a Java object it was serialized from
  • Target key case sensitivity configuration (default: false)
  • Use block-list (maskKeys) or allow-list (allowKeys) for masking
  • Limited support for JSONPath masking in both block-list (maskJsonPaths) and allow-list (allowJsonPaths) modes
  • Masking a valid JSON will always return a valid JSON

Note: Since RFC 8259 dictates that JSON exchanges between systems that are not part of an enclosed system MUST be encoded using UTF-8, the json-masker only supports UTF-8 encoding.
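
Input that arrives in another encoding therefore has to be transcoded before it is handed to the masker. The following is a minimal sketch using only the JDK; the class and method names are illustrative assumptions and not part of json-masker.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of json-masker: re-encodes JSON bytes as UTF-8.
final class Utf8Transcoder {
    private Utf8Transcoder() {}

    /** Decodes the bytes with the given source charset and re-encodes them as UTF-8. */
    static byte[] toUtf8(byte[] input, Charset sourceCharset) {
        if (StandardCharsets.UTF_8.equals(sourceCharset)) {
            return input; // already UTF-8, nothing to do
        }
        return new String(input, sourceCharset).getBytes(StandardCharsets.UTF_8);
    }
}
```

The resulting byte[] (or a String created from it) can then be passed to the masker.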

Using the json-masker package

The json-masker library is available from Maven Central.

To use the package, you can use the following Gradle dependency:

implementation("dev.blaauwendraad:json-masker:${version}")

Or using Maven:

<dependency>
    <groupId>dev.blaauwendraad</groupId>
    <artifactId>json-masker</artifactId>
    <version>${version}</version>
</dependency>

The package requires no additional runtime dependencies.

JDK Compatibility

The json-masker baseline JDK requirement is JDK 17. However, we might consider releasing a version that lowers this requirement to JDK 11 if requested.

Usage examples

A JsonMasker instance can be created using any of the following factory methods:

// block-mode, default masking config
var jsonMasker = JsonMasker.getMasker(Set.of("email", "iban"));

// block-mode, default masking config (using a builder)
var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .maskKeys(Set.of("email", "iban"))
                .build()
);

// block-mode, JSONPath
var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .maskJsonPaths(Set.of("$.email", "$.nested.iban", "$.organization.*.name"))
                .build()
);

// allow-mode, default masking config
var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .allowKeys(Set.of("id", "name"))
                .build()
);

// allow-mode, JSONPath
var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .allowJsonPaths(Set.of("$.id", "$.clients.*.phone", "$.nested.name"))
                .build()
);

Using JsonMaskingConfig allows customizing the masking behaviour per type, per key or per JSONPath, and mixing keys and JSONPaths.

Note

Whenever a simple key (maskKeys(Set.of("email", "iban"))) is specified, it is masked recursively regardless of the nesting, whereas a JSONPath (maskJsonPaths(Set.of("$.email", "$.iban"))) only masks those keys at the top level of the JSON.

After creating the JsonMasker instance, it can be used to mask a JSON as follows:

String maskedJson = jsonMasker.mask(json);

The mask method is thread-safe, and it is advised to reuse the JsonMasker instance as it pre-processes the masking (allowed) keys for faster lookup during the actual masking.

Default JSON masking

Example of masking fields (block-mode) with a default config

Usage

var jsonMasker = JsonMasker.getMasker(Set.of("email", "age", "visaApproved", "iban", "billingAddress"));

String maskedJson = jsonMasker.mask(json);

Input

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "[email protected]",
    "age": 29,
    "visaApproved": true
  },
  "payment": {
    "iban": "NL91 FAKE 0417 1643 00",
    "successful": true,
    "billingAddress": [
      "Museumplein 6",
      "1071 DJ Amsterdam"
    ]
  },
  "companyContact": {
    "email": "[email protected]"
  }
}

Output

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "***",
    "age": "###",
    "visaApproved": "&&&"
  },
  "payment": {
    "iban": "***",
    "successful": true,
    "billingAddress": [
      "***",
      "***"
    ]
  },
  "companyContact": {
    "email": "***"
  }
}

Allow-list approach

Example showing an allow-list based approach of masking a JSON.

Usage

var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .allowKeys(Set.of("orderId", "id", "travelPurpose", "successful"))
                .build()
);

String maskedJson = jsonMasker.mask(json);

Input

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "[email protected]",
    "age": 29,
    "visaApproved": true
  },
  "payment": {
    "iban": "NL91 FAKE 0417 1643 00",
    "successful": true,
    "billingAddress": [
      "Museumplein 6",
      "1071 DJ Amsterdam"
    ]
  },
  "companyContact": {
    "email": "[email protected]"
  }
}

Output

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "***",
    "age": "###",
    "visaApproved": "&&&"
  },
  "payment": {
    "iban": "***",
    "successful": true,
    "billingAddress": [
      "***",
      "***"
    ]
  },
  "companyContact": {
    "email": "***"
  }
}

Overriding default masks

The default masks can be overridden for any type.

Usage

var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .maskKeys(Set.of("email", "age", "visaApproved", "iban", "billingAddress"))
                .maskStringsWith("[redacted]")
                .maskNumbersWith("[redacted]")
                .maskBooleansWith("[redacted]")
                .build()
);

String maskedJson = jsonMasker.mask(json);

Input

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "[email protected]",
    "age": 29,
    "visaApproved": true
  },
  "payment": {
    "iban": "NL91 FAKE 0417 1643 00",
    "successful": true,
    "billingAddress": [
      "Museumplein 6",
      "1071 DJ Amsterdam"
    ]
  },
  "companyContact": {
    "email": "[email protected]"
  }
}

Output

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "[redacted]",
    "age": "[redacted]",
    "visaApproved": "[redacted]"
  },
  "payment": {
    "iban": "[redacted]",
    "successful": true,
    "billingAddress": [
      "[redacted]",
      "[redacted]"
    ]
  },
  "companyContact": {
    "email": "[redacted]"
  }
}

Masking with JSONPath

To have more control over the nesting, JSONPath can be used to specify the keys that need to be masked (or allowed).

The following JSONPath features are not supported:

  • Descendant segments.
  • Child segments.
  • Name selectors.
  • Array slice selectors.
  • Index selectors.
  • Filter selectors.
  • Function extensions.
  • Escape characters.

The library also imposes a number of additional restrictions:

  • Numbers as key names are disallowed.
  • JSONPath keys must not be ambiguous. For example, $.a.b and $.*.b combination is disallowed.
  • JSONPath must not end with a single trailing wildcard. Use $.a instead of $.a.*.
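
The three restrictions above can be sketched as simple checks over dot-separated segments. This is only an illustration: the class, the method names and the exact definition of "ambiguous" (same length, every position equal or a wildcard on either side) are assumptions, and the library's own validation may differ.

```java
// Hypothetical validator sketch; json-masker's real checks may differ.
final class JsonPathRules {
    private JsonPathRules() {}

    /** Splits "$.a.*.b" into ["a", "*", "b"]; only dot notation is handled. */
    static String[] segments(String path) {
        if (!path.startsWith("$.")) {
            throw new IllegalArgumentException("Expected a path starting with '$.': " + path);
        }
        return path.substring(2).split("\\.");
    }

    /** True if the last segment is a wildcard, e.g. "$.a.*". */
    static boolean endsWithWildcard(String path) {
        String[] s = segments(path);
        return s[s.length - 1].equals("*");
    }

    /** True if any segment consists of digits only, e.g. "$.a.1". */
    static boolean hasNumericKey(String path) {
        for (String segment : segments(path)) {
            if (!segment.isEmpty() && segment.chars().allMatch(Character::isDigit)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Assumed definition: two different paths are ambiguous when they have the
     * same number of segments and each pair of segments is equal or contains "*".
     */
    static boolean ambiguous(String first, String second) {
        String[] a = segments(first);
        String[] b = segments(second);
        if (a.length != b.length || first.equals(second)) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            boolean compatible = a[i].equals(b[i]) || a[i].equals("*") || b[i].equals("*");
            if (!compatible) {
                return false;
            }
        }
        return true;
    }
}
```

Under this definition, the $.a.b and $.*.b pair from the list is reported as ambiguous, while $.a.b and $.a.c is not.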

Usage

var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .maskJsonPaths(Set.of(
                        "$.customerDetails.email",
                        "$.customerDetails.age",
                        "$.customerDetails.visaApproved",
                        "$.payment.iban",
                        "$.payment.billingAddress",
                        "$.customerDetails.identificationDocuments.*.number"
                ))
                .build()
);

String maskedJson = jsonMasker.mask(json);

Input

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "[email protected]",
    "age": 29,
    "visaApproved": true,
    "identificationDocuments": [
      {
        "type": "passport",
        "country": "NL",
        "number": "1234567890"
      },
      {
        "type": "passport",
        "country": "US",
        "number": "E12345678"
      }
    ]
  },
  "payment": {
    "iban": "NL91 FAKE 0417 1643 00",
    "successful": true,
    "billingAddress": [
      "Museumplein 6",
      "1071 DJ Amsterdam"
    ]
  },
  "companyContact": {
    "email": "[email protected]"
  }
}

Output

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "***",
    "age": "###",
    "visaApproved": "&&&",
    "identificationDocuments": [
      {
        "type": "passport",
        "country": "NL",
        "number": "***"
      },
      {
        "type": "passport",
        "country": "US",
        "number": "***"
      }
    ]
  },
  "payment": {
    "iban": "***",
    "successful": true,
    "billingAddress": [
      "***",
      "***"
    ]
  },
  "companyContact": {
    "email": "[email protected]"
  }
}

Masking with preserving the type

The following configuration can be useful when a value must be masked but its type needs to be preserved, so that the resulting JSON can be parsed again, or when a strict JSON schema is required.

Usage

var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .maskKeys(Set.of("email", "age", "visaApproved", "iban", "billingAddress"))
                .maskNumbersWith(0)
                .maskBooleansWith(false)
                .build()
);

String maskedJson = jsonMasker.mask(json);

Input

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "[email protected]",
    "age": 29,
    "visaApproved": true
  },
  "payment": {
    "iban": "NL91 FAKE 0417 1643 00",
    "successful": true,
    "billingAddress": [
      "Museumplein 6",
      "1071 DJ Amsterdam"
    ]
  },
  "companyContact": {
    "email": "[email protected]"
  }
}

Output

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "***",
    "age": 0,
    "visaApproved": false
  },
  "payment": {
    "iban": "***",
    "successful": true,
    "billingAddress": [
      "***",
      "***"
    ]
  },
  "companyContact": {
    "email": "***"
  }
}

Masking with preserving the length

Example showing masking where the length of the original value (string or number) is preserved.

Usage

var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .maskKeys(Set.of("email", "age", "visaApproved", "iban", "billingAddress"))
                .maskStringCharactersWith("*")
                .maskNumberDigitsWith(8)
                .build()
);

String maskedJson = jsonMasker.mask(json);

Input

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "[email protected]",
    "age": 29,
    "visaApproved": true
  },
  "payment": {
    "iban": "NL91 FAKE 0417 1643 00",
    "successful": true,
    "billingAddress": [
      "Museumplein 6",
      "1071 DJ Amsterdam"
    ]
  },
  "companyContact": {
    "email": "[email protected]"
  }
}

Output

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "*******************************",
    "age": 88,
    "visaApproved": "&&&"
  },
  "payment": {
    "iban": "**********************",
    "successful": true,
    "billingAddress": [
      "*************",
      "*****************"
    ]
  },
  "companyContact": {
    "email": "*************"
  }
}
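
To make the length-preserving rule concrete, here is a minimal standalone sketch of the transformation applied to a single value. It is not the library's implementation: the class and method names are hypothetical, string length is counted in Java chars (the library's handling of multi-byte characters may differ), and non-digit characters in numbers are left untouched as an assumption.

```java
// Hypothetical sketch of length-preserving masking; not json-masker's implementation.
final class LengthPreservingMask {
    private LengthPreservingMask() {}

    /** Replaces every character of the string content with the mask character. */
    static String maskCharacters(String value, char mask) {
        return String.valueOf(mask).repeat(value.length());
    }

    /** Replaces every ASCII digit of a JSON number literal with the given digit. */
    static String maskDigits(String number, int digit) {
        // 0 is rejected here because "00" would not be a valid JSON number.
        if (digit < 1 || digit > 9) {
            throw new IllegalArgumentException("digit must be between 1 and 9");
        }
        char replacement = (char) ('0' + digit);
        StringBuilder out = new StringBuilder(number.length());
        for (int i = 0; i < number.length(); i++) {
            char c = number.charAt(i);
            out.append(c >= '0' && c <= '9' ? replacement : c);
        }
        return out.toString();
    }
}
```

With these rules, the 13-character "Museumplein 6" becomes 13 asterisks and 29 becomes 88, matching the output above.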

Masking with using a per-key masking configuration

When using a JsonMaskingConfig you can also define a per-key masking configuration, which lets you customize the way certain values are masked.

Usage

var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .maskKeys(Set.of("email", "age", "visaApproved", "billingAddress"))
                .maskKeys(Set.of("iban"), KeyMaskingConfig.builder()
                        .maskStringCharactersWith("*")
                        .build()
                )
                .build()
);

String maskedJson = jsonMasker.mask(json);

Note

When defining a config for a specific key whose value is an object or an array, the config applies recursively to all nested keys and values, unless a nested key defines its own masking configuration.

If a config is attached to a JSONPath, it takes precedence over a regular key.

Input

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "[email protected]",
    "age": 29,
    "visaApproved": true
  },
  "payment": {
    "iban": "NL91 FAKE 0417 1643 00",
    "successful": true,
    "billingAddress": [
      "Museumplein 6",
      "1071 DJ Amsterdam"
    ]
  },
  "companyContact": {
    "email": "[email protected]"
  }
}

Output

{
  "orderId": "789 123 456",
  "customerDetails": {
    "id": 1,
    "travelPurpose": "business",
    "email": "***",
    "age": "###",
    "visaApproved": "&&&"
  },
  "payment": {
    "iban": "**********************",
    "successful": true,
    "billingAddress": [
      "***",
      "***"
    ]
  },
  "companyContact": {
    "email": "***"
  }
}

Masking with a ValueMasker

In addition to standard options like maskStringsWith, maskNumbersWith and maskBooleansWith, the library offers ValueMasker, a functional interface for low-level value masking that allows fully customizing the masking process. It can be used for masking all values, specific JSON value types, or specific keys.

The ValueMasker operates on the full value on the byte level, i.e., the value is byte[].

Note

A ValueMasker can modify a JSON value in any way, which also means that the implementation needs to be careful when parsing a value of any JSON type and replacing the correct slice of it. Otherwise, the masking process could produce invalid JSON.

For convenience, a couple of out-of-the-box maskers are available in ValueMaskers, as well as adapters to Function<String, String>.

Usage

var jsonMasker = JsonMasker.getMasker(
        JsonMaskingConfig.builder()
                .maskKeys(Set.of("values"))
                .maskStringsWith(ValueMaskers.withRawValueFunction(value -> value.startsWith("\"secret:") ? "\"***\"" : value))
                .maskKeys(Set.of("email"), KeyMaskingConfig.builder()
                        .maskStringsWith(ValueMaskers.email(/* prefix */ 2, /* suffix */ 2, /* keep domain */ true, "***"))
                        .build()
                )
                .build()
);

String maskedJson = jsonMasker.mask(json);

Input

{
  "values": [
    "not a secret",
    "secret: very much"
  ],
  "email": "[email protected]"
}

Output

{
  "values": [
    "not a secret",
    "***"
  ],
  "email": "ag***[email protected]"
}
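
As the usage above suggests, a raw value function receives a string value including its surrounding quotes and must return a complete JSON value. The following is a minimal sketch of such a Function<String, String>, written against the JDK only; the class name, the method name and the "keep the last N characters" rule are assumptions made for illustration.

```java
import java.util.function.Function;

// Hypothetical raw-value function; only Function<String, String> comes from the text above.
final class RawValueFunctions {
    private RawValueFunctions() {}

    /** Masks a raw JSON string literal (quotes included), keeping its last `visible` characters. */
    static Function<String, String> keepLast(int visible) {
        return raw -> {
            // Leave anything that is not a quoted JSON string untouched.
            if (raw.length() < 2 || raw.charAt(0) != '"' || raw.charAt(raw.length() - 1) != '"') {
                return raw;
            }
            String content = raw.substring(1, raw.length() - 1);
            // Escape sequences would need JSON-aware slicing; this sketch masks such values fully.
            if (content.indexOf('\\') >= 0 || content.length() <= visible) {
                return "\"***\"";
            }
            return "\"***" + content.substring(content.length() - visible) + "\"";
        };
    }
}
```

A function like this could be passed to ValueMaskers.withRawValueFunction in the same way as the lambda in the usage example. Returning a properly quoted value in every branch is what keeps the masked document valid JSON.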

Dependencies

  • The library has no third-party runtime dependencies
  • The library only has a single JSR-305 compilation dependency for nullability annotations
  • The test/benchmark dependencies for this library are listed in the build.gradle

Performance

The json-masker library is optimized for a fast key lookup that scales well with a large key set to mask (or allow). The input is only scanned once and memory allocations are avoided whenever possible.

Benchmarks

For benchmarking, we compare the implementation against multiple baseline benchmarks, which are:

  • Counting the bytes of the JSON message without doing any other operation
  • Using Jackson to parse a JSON message into JsonNode and masking it by iterating over and replacing all values corresponding to the targeted keys
  • A naive regex masking (replacement) implementation.

Generally our implementation is ~15-25 times faster than using Jackson, in addition to having no runtime dependencies and a convenient API out of the box. In the 1kb run shown below, the measured throughput is roughly 5 to 12 times that of the Jackson baseline.

Benchmark                              (characters)  (jsonPath)  (jsonSize)  (maskedKeyProbability)   Mode  Cnt        Score        Error  Units
BaselineBenchmark.countBytes                unicode         N/A         1kb                     0.1  thrpt    4  2578523.937 ± 133325.274  ops/s
BaselineBenchmark.jacksonParseAndMask       unicode         N/A         1kb                     0.1  thrpt    4    30917.311 ±   1055.254  ops/s
BaselineBenchmark.regexReplace              unicode         N/A         1kb                     0.1  thrpt    4     5272.318 ±     48.701  ops/s
JsonMaskerBenchmark.jsonMaskerBytes         unicode       false         1kb                     0.1  thrpt    4   369819.788 ±   5381.612  ops/s
JsonMaskerBenchmark.jsonMaskerBytes         unicode        true         1kb                     0.1  thrpt    4   214893.887 ±   2143.556  ops/s
JsonMaskerBenchmark.jsonMaskerString        unicode       false         1kb                     0.1  thrpt    4   179303.261 ±   3833.357  ops/s
JsonMaskerBenchmark.jsonMaskerString        unicode        true         1kb                     0.1  thrpt    4   154621.472 ±   2132.929  ops/s

json-masker's People

Contributors

alexeyshary, breus, dependabot[bot], donavdey, gavlyukovskiy, robertblaauwendraad

json-masker's Issues

Keys and JsonPATH keys are not isolated from each other in the trie

Given the following configuration

maskKeys: $.a
maskJsonPaths: $.b, $.c

And the following json:

{
  "a": "do not mask",
  "b": "mask",
  "c": "mask"
}

The result will be:

{
  "a": "***",
  "b": "***",
  "c": "***"
}

The expected result should be:

{
  "a": "do not mask",
  "b": "***",
  "c": "***"
}

The bug is caused by not separating keys and JsonPATH keys in the trie. This can be solved by either using a separate trie instance for JsonPATH keys or ascribing a type to the endOfWord field in trie nodes (endOfJsonPathKeyWord and endOfKeyWord)
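
The second option can be sketched as follows. Only the two flag names come from the description above; the class, its methods and the use of a HashMap per node are assumptions chosen for brevity, not the library's actual trie.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one trie, two end-of-word flags.
final class KeyTrie {
    private static final class Node {
        final Map<Byte, Node> children = new HashMap<>();
        boolean endOfKeyWord;         // a plain key ends at this node
        boolean endOfJsonPathKeyWord; // a JSONPath key ends at this node
    }

    private final Node root = new Node();

    /** Inserts a word and marks it as a plain key or as a JSONPath key. */
    void insert(String word, boolean jsonPath) {
        Node node = root;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            node = node.children.computeIfAbsent(b, k -> new Node());
        }
        if (jsonPath) {
            node.endOfJsonPathKeyWord = true;
        } else {
            node.endOfKeyWord = true;
        }
    }

    /** Walks the trie and returns the node for the word, or null if no such path exists. */
    private Node find(String word) {
        Node node = root;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            node = node.children.get(b);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    boolean containsKey(String word) {
        Node node = find(word);
        return node != null && node.endOfKeyWord;
    }

    boolean containsJsonPath(String word) {
        Node node = find(word);
        return node != null && node.endOfJsonPathKeyWord;
    }
}
```

With the configuration from this issue, "$.a" is then found only as a plain key and no longer matches as a JSONPath.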

Disallow ambiguous JsonPath

With the introduction of wildcards, we want to disallow ambiguous JsonPath combinations. The main reason is to avoid loopbacks during matching that could occur for such combinations:

$.a.*.c
$.a.b.*

on a JSON

{
  "a": {
    "b": {
      "d": "masked"
    }
  }
}

The current implementation only matches forward, meaning that node "a.b" would already be matched against "a.*", but the next node "d" would fail to match the first JsonPath while satisfying the second JsonPath, which would require a loopback.

Until someone asks for this particular use case, providing a good reason, we should not allow these.

Replace RandomWhiteSpaceInjector with a Jackson PrettyPrinter implementation

Currently, the RandomJsonWhiteSpaceInjector is used to have fuzzing tests that also have random white spaces injected in different places.

However, Jackson is not able to deal with certain white spaces which are injected that way. For example, an unescaped carriage return cannot be used inside a JSON key in Jackson.

Right now, the Fuzzing test checks if the randomly generated, white space injected JSON can be parsed by Jackson. If it cannot, it is disregarded and a new random JSON is generated to do fuzzing testing on.

This, however, is a quick fix that wastes computing resources, which in turn decreases the number of fuzzing tests that can be run in a predetermined time frame (in our pipelines).

What we can do instead is implement a com.fasterxml.jackson.core.PrettyPrinter instance that also randomly injects white spaces and use that instead of our current RandomJsonWhiteSpaceInjector implementation.

Support wildcards in JsonPath

We want to support wildcards for array and object matching:

$.a.*.b

which should mask

{
  "a": [
    {
      "b": "masked",
      "c": "allowed"
    },
    "allowed"
  ]
}

and

{
  "a": {
    "d": {
      "b": "masked",
      "c": "allowed"
    },
    "e": "allowed"
  }
}

Support for recursive selectors $.a..b is not planned as it would require matching from the tail and require a loopback.
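
A forward-only wildcard match of this kind can be sketched as a single pass over the segments. The class and method names are hypothetical, and representing an array element by its index as a segment is an assumption made for this illustration.

```java
// Hypothetical sketch of forward-only matching; "*" matches exactly one segment.
final class WildcardMatcher {
    private WildcardMatcher() {}

    /** Returns true if the concrete path matches the pattern segment by segment. */
    static boolean matches(String[] pattern, String[] path) {
        if (pattern.length != path.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            // Each position is decided once and never revisited, so no loopback is needed.
            if (!pattern[i].equals("*") && !pattern[i].equals(path[i])) {
                return false;
            }
        }
        return true;
    }
}
```

For the pattern $.a.*.b, the segments a, 0, b (first example) and a, d, b (second example) both match, while a, d, c and a, e do not.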

Unify API runtime exceptions on masker when invalid JSON is provided

Currently, calling JsonMasker.mask(String or byte[]) can give two results:

  1. In case valid JSON was provided, valid JSON is returned according to the provided masking configurations
  2. In case invalid JSON was provided, some runtime exception is thrown

The second case can surface as different exception types (e.g. IllegalStateException or ArrayIndexOutOfBoundsException).

This should be unified such that the API can give two results:

  1. Unchanged
  2. An InvalidJsonException runtime exception is thrown
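
One way to get there is a thin wrapper that translates any runtime failure into a single exception type. The sketch below is an assumption about how this could look, not the project's implementation: only the name InvalidJsonException comes from this issue, and the masker is modelled as a plain UnaryOperator<String>.

```java
import java.util.function.UnaryOperator;

// Hypothetical exception type; only its name is taken from the issue text.
final class InvalidJsonException extends RuntimeException {
    InvalidJsonException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical wrapper, not part of json-masker.
final class UnifiedErrors {
    private UnifiedErrors() {}

    /** Wraps a masking function so every runtime failure surfaces as InvalidJsonException. */
    static UnaryOperator<String> unify(UnaryOperator<String> masker) {
        return json -> {
            try {
                return masker.apply(json);
            } catch (InvalidJsonException e) {
                throw e; // already the unified type, do not wrap twice
            } catch (RuntimeException e) {
                throw new InvalidJsonException("Invalid JSON input", e);
            }
        };
    }
}
```

A blanket catch like this also relabels genuine bugs as invalid input, so keeping the original exception as the cause matters for debugging.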
