
y-ken / fluent-plugin-anonymizer


Fluentd filter output plugin to anonymize records with MD5/SHA1/SHA256/SHA384/SHA512 algorithms. This data masking plugin protects private data such as IDs, email addresses, phone numbers, IPv4/IPv6 addresses and so on.

Home Page: http://rubygems.org/gems/fluent-plugin-anonymizer

License: Other

Ruby 100.00%
anonymize fluentd ruby

fluent-plugin-anonymizer's People

Contributors

bocytko, cosmo0920, jerezzprime, okkez, pavdmyt, tagomoris, y-ken

fluent-plugin-anonymizer's Issues

Can't anonymize fields containing <ip-addr>:<port>

I have multiple log entries with an ipPort field whose values look like this: 192.168.0.1:8080. Using the anonymizer with the following config does not produce the desired result:

<source>
  @type dummy
  tag raw.dummy
  dummy [ {"ipPort":"10.102.3.80:5555","member_id":"12345", "mail":"[email protected]"} ]
</source>

<filter raw.**>
  @type anonymizer

  # Specify rounding address keys with comma and subnet mask
  <mask network>
    key_pattern .*
    ipv4_mask_bits  0
    ipv6_mask_bits  0
  </mask>
</filter>

<match raw.**>
  @type stdout
</match>

Gives:

2020-06-22 17:17:46.099515554 +0000 raw.dummy: {"ipPort":"10.102.3.80:5555","member_id":"12345","mail":"[email protected]"}

The plugin probably does not recognize the value 10.102.3.80:5555 as an IP address and therefore skips it. Is there a workaround for such cases?
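
One possible workaround, sketched here in plain Ruby, is to split the port off, mask only the address part, and join the two again. The helper name mask_ipv4_with_port and the pattern IPV4_WITH_PORT are hypothetical and not part of fluent-plugin-anonymizer; how such a step would be wired into a Fluentd pipeline is left open.

```ruby
require "ipaddr"

# Hypothetical pattern (not from the plugin): "<ipv4>:<port>".
IPV4_WITH_PORT = /\A(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\z/

# Hypothetical helper: mask the address to its leading `bits` bits and
# re-attach the port. Values that do not look like "<ipv4>:<port>" are
# returned unchanged.
def mask_ipv4_with_port(value, bits)
  match = IPV4_WITH_PORT.match(value)
  return value unless match

  begin
    masked = IPAddr.new(match[1]).mask(bits).to_s
  rescue IPAddr::Error
    # e.g. an octet above 255: leave the value as it was
    return value
  end
  "#{masked}:#{match[2]}"
end

puts mask_ipv4_with_port("10.102.3.80:5555", 24)  # prints 10.102.3.0:5555
puts mask_ipv4_with_port("10.102.3.80:5555", 0)   # prints 0.0.0.0:5555
```

With 0 mask bits, as in the config above, the address part becomes 0.0.0.0 while the port is kept.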

Implementing a filter plugin support

This plugin is a great example of a useful output plugin that should really be a filter plugin. I suggest the following:

  1. Factor the anonymization functionality out into a reusable library used by both the existing out_anonymizer.rb and a new filter_anonymizer.rb.
  2. Implement the filter plugin version (filter_anonymizer).

Support for masking a part of string within a value

I am using value_pattern to anonymize the value of the msg key in a log record. I observed that fluent-plugin-anonymizer hashes the entire string that 'has' the matching pattern.

Can you add support for hashing only the matching part of the string rather than the entire value?

Sample Config:
<filter **>
  @type anonymizer
  <mask sha1>
    value_pattern UserId:\d{3}-?\d{3}
  </mask>
</filter>

Log Sample:
{ "level": "INFO", "timezone": "UTC", "threadname": "main", "msg": "Logged in UserId:123-456"}

Output:
{ "level": "INFO", "timezone": "UTC", "threadname": "main", "msg": "<hashed value>"}

Expected Output:
{"level": "INFO", "timezone": "UTC", "threadname": "main", "msg": "Logged in <hashed value>"}
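
The requested behaviour can be sketched in plain Ruby with String#gsub, which replaces only the matched substrings. The helper name mask_matches is hypothetical, and prepending the salt to the matched text is an assumption for illustration, not a statement about how the plugin salts its hashes.

```ruby
require "digest/sha1"

# Hypothetical helper (not part of the plugin): hash only the substrings
# that match `pattern` and leave the rest of the value untouched.
# Assumption: the salt is simply prepended to the matched text.
def mask_matches(value, pattern, salt = "")
  value.gsub(pattern) { |matched| Digest::SHA1.hexdigest(salt + matched) }
end

record = { "msg" => "Logged in UserId:123-456" }
record["msg"] = mask_matches(record["msg"], /UserId:\d{3}-?\d{3}/)
puts record["msg"]  # "Logged in " followed by a 40-character hex digest
```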

Anonymizer does not handle multi-level keys.

I have a complex log entry, tailed from a JSON-format file, such as:

 { "eventType": "logEvent", "context": { "email": "[email protected]" } }

When I try to anonymize the email field with the following config, the email field is not anonymized:

<match anonymize_event.**>
  type anonymizer
  remove_tag_prefix anonymize_event.
  sha1_keys context.email
  hash_salt salty_awesomeness
</match>

Am I configuring the plugin incorrectly?
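
For illustration, this is roughly what resolving a dotted key such as context.email against a nested record would involve. It is a minimal plain-Ruby sketch; the helper name mask_nested!, the salt handling, and the sample address user@example.com are all assumptions, not the plugin's behaviour.

```ruby
require "digest/sha1"

# Hypothetical helper (not part of the plugin): hash the value found at a
# dotted path such as "context.email". The record is left unchanged when
# the path does not exist.
def mask_nested!(record, dotted_key, salt = "")
  *parents, leaf = dotted_key.split(".")

  # Walk down the parent keys, stopping as soon as a level is not a Hash.
  container = parents.reduce(record) do |node, key|
    break nil unless node.is_a?(Hash)
    node[key]
  end
  return record unless container.is_a?(Hash) && container.key?(leaf)

  # Assumption: the salt is prepended to the value before hashing.
  container[leaf] = Digest::SHA1.hexdigest(salt + container[leaf].to_s)
  record
end

# Placeholder address, used only for this example.
record = { "eventType" => "logEvent", "context" => { "email" => "user@example.com" } }
mask_nested!(record, "context.email", "salty_awesomeness")
puts record["context"]["email"]  # a 40-character hex digest
```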

Is <mask ARGUMENT> style easy to mistake?

Copied from #25

@y-ken said:


I suggest that the <mask> section should not take an ARGUMENT as in <mask ARGUMENT>.
Sections that take an argument are not often used in other plugins.
From another point of view, the <mask network> style can be mistaken for masking a key named network.
What do you think about this?

I suggest this style. It makes it hard to set the mask type by mistake.

  <mask>
    type sha1
    keys user_id, member_id, mail
    # Set hash salt with any strings for more security
    salt mysaltstring
  </mask>

The current style is shown below.

  <mask sha1>
    keys user_id, member_id, mail
    # Set hash salt with any strings for more security
    salt mysaltstring
  </mask>

@y-ken
In v0.14/v1, the buffer configuration also uses an argument to specify chunk_keys:
https://docs.fluentd.org/v1.0/articles/buffer-section#chunk-keys
I think this argument feature is not well known, though...

Support label feature

This plugin seems to be using Fluent::Engine.emit.
It should use router.emit instead so that the label feature is supported.

I'll start working on this small improvement after #8 is merged.

Drop out_anonymizer

out_anonymizer has not been migrated to the v1 API and uses some deprecated APIs through the compat layer.
Therefore we cannot keep maintaining out_anonymizer.

What do you think about this? @y-ken

Improve filter_anonymizer configurations

The current configuration is redundant and very complicated.
We can simplify the parameters by using the record_accessor plugin helper.

My plan is as follows:

  • organize configurations
    • remove mask/{key,key_chain,key_chains}
    • use the record_accessor plugin helper
    • add the :deprecated or :obsoleted option to obsolete configuration parameters
    • refactor code
    • update README.md

If you are OK with this, I will create patches for this issue.

Endless loop caused by HandleTagNameMixin misuse

When HandleTagNameMixin is used in a chain of multiple filters, we must use tag.dup.

ref. kentaro/fluent-plugin-extract_query_params#1

sample config
<match app.postfix.sent>
  type anonymizer
  sha1_keys         mail
  remove_tag_prefix app.postfix.
  add_tag_prefix    mail.
</match>

sample log

2013-12-03 18:35:09 +0900 [warn]: no patterns matched tag="mail.mail.sent"
2013-12-03 18:35:09 +0900 [warn]: no patterns matched tag="mail.mail.mail.sent"
2013-12-03 18:35:09 +0900 [warn]: no patterns matched tag="mail.mail.mail.mail.sent"
2013-12-03 18:35:09 +0900 [warn]: no patterns matched tag="mail.mail.mail.mail.mail.sent"
2013-12-03 18:35:09 +0900 [warn]: no patterns matched tag="mail.mail.mail.mail.mail.mail.sent"
2013-12-03 18:35:09 +0900 [warn]: no patterns matched tag="mail.mail.mail.mail.mail.mail.mail.sent"
