ozontech / file.d

304 stars · 11 watchers · 66 forks · 7.11 MB

A blazing fast tool for building data pipelines: read, process and output events. Our community: https://t.me/file_d_community

Home Page: https://ozontech.github.io/file.d

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 0.05% Makefile 0.18% Go 99.30% HTML 0.10% Smarty 0.17% Shell 0.21%
pipeline processing go input output actions events logs observability tracing

file.d's People

Contributors

adokukin, akastav, alikhil, andrewmed, ansakharov, anuriq, d-ulyanov, dependabot[bot], dmitryromanov, dsmolonogov, goshansmails, headhunter483, juneezee, kamui26, kirillov6, mmatros, sashamelentyev, snyssfx, ssnd, vadimalekseev, vano144, vitkovskii, ythosa


file.d's Issues

Feature: contributing guidelines

A good and structured approach to contributions/PRs is crucial to making file.d development as easy, effective, and transparent as possible for everyone involved.

Here's a list of things that should be done initially:

  • Write an initial CONTRIBUTING.md that would contain some basic contributing guidelines that everyone should follow
  • Create issue templates for bug reports/feature requests
  • Choose a git branch naming convention

Ideally, the list would also include adding a code of conduct and choosing a license, but I think those are less important for now.

@vano144 @ansakharov @andrewmed would love your input on this one

Feature: Go 1.18

Is your feature request related to a problem? Please describe.
Increase the minimum version of Go to 1.18. Impact on code logic is not expected.

Add postgres output plugin

An output plugin for PostgreSQL is required.
The plugin must collect a batch of events and write it to the DB.
The config should describe the DB columns: name, type, and uniqueness.
The first implementation requires the integer, string, and timestamp pg types.
Columns with a unique constraint should be marked unique: true.
The DB request timeout, retry count, and retention should be configurable.

Config suggestion:

    output:
      type: postgres
      batch_flush_timeout: "5s"
      db_request_timeout: "3000ms"
      retention: "50ms"
      batch_size: 2 * 1
      strict: false 
      host: 0.0.0.0
      port: 54320
      dbname: postgres
      user: postgres
      password: my_password   
      table: s3index
      columns:
        - name: service
          type: string
          unique: false
        - name: host
          type: string
          unique: true
        - name: first_date 
          type: timestamp
          unique: false
        - name: last_date
          type: timestamp
          unique: false
        - name: s3_url
          type: string
          unique: false
        - name: domain
          type: string
          unique: false

Feature: Add optional commit to kafka in s3 plugin

Issue
Current implementation: the file plugin commits an event after it is written to a file; the s3 plugin writes zip files to storage.
Desired implementation: the file plugin commits an event after it is written to a file; the s3 plugin writes zip files to storage and also commits the upload metadata to kafka.
The feature must be optional.

Solution
Optionally embed the kafka output plugin into the s3 plugin and add the commit logic (see the sketch below).
Later, commits to other sinks can be supported as well.
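
A rough idea of what the optional config section could look like; the meta_kafka block and its field names are illustrative only and are not part of the current s3 plugin:

    output:
      type: s3
      # ...existing s3 settings (endpoint, bucket, credentials, etc.)...
      # hypothetical optional block for committing upload metadata to kafka
      meta_kafka:
        enabled: true
        brokers: ["localhost:9092"]
        topic: s3-uploads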

Feature: cfg.Unit

Add a cfg.Unit type for config parsing. It should handle MB, MiB, GB, GiB, and other units, and it should also support aliases; the kind of values it would accept is sketched below.
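
An illustration only (the field names are hypothetical) of values a cfg.Unit type would be expected to parse:

    settings:
      max_file_size: 1GB        # decimal unit: 1 000 000 000 bytes
      read_buffer_size: 256MiB  # binary unit: 268 435 456 bytes
      avg_log_size: 16K         # alias, e.g. K treated as KB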

Feature: mask action add field

Issue
Sometimes it's hard to find a masked event in storage. We need a solution that helps filter such events.

Solution
Add optional config props: masked_key, masked_value.
If masked_key is set, then masked_key: masked_value should be added to the event (see the config sketch below).
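
A hypothetical config with the proposed props; the masks section is written per my understanding of the current mask action and may differ, while masked_key/masked_value are the new fields:

    actions:
      - type: mask
        masks:
          - re: '\d{4}-\d{4}-\d{4}-\d{4}'
            groups: [0]
        # proposed: added to the event only if at least one mask was applied
        masked_key: mask_applied
        masked_value: "true"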

Bug: http input plugin cannot parse log

Describe the bug

The http input plugin cannot parse a log line without a trailing '\n'.

To Reproduce

To reproduce the behavior:

curl "localhost:9999/logger" -H 'Content-Type: application/json' -d \
'{"message": "hello", "kind": "normal"}'

But this one, with a trailing newline, works:

curl "localhost:9999/logger" -H 'Content-Type: application/json' -d \
'{"message": "hello", "kind": "normal"}
'

Bug: wrong regex for golang panic join

Describe the bug
Wrong regex for golang panic join.

To Reproduce
Steps to reproduce the behavior:

Run TestSimpleJoin with this stack trace:

panic: interface conversion: *card.CheckUserDeleteResponse is not protoreflect.ProtoMessage: missing method ProtoReflect

goroutine 1112 [running]:
gitlab.ozon.ru/platform/scratch/pkg/mw/grpc/callopts.unaryClientInterceptor.func1({0x18bda78, 0xc002e47080}, {0x162a945, 0x1a}, {0x158e100, 0xc002e46e40}, {0x158d840, 0xc002e46e70}, 0x1, 0xc002e471a0, ...)
    /builds/.cache/go/pkg/mod/gitlab.ozon.ru/platform/[email protected]/pkg/mw/grpc/callopts/interceptors.go:49 +0x205
google.golang.org/grpc.getChainUnaryInvoker.func1({0x18bda78, 0xc002e47080}, {0x162a945, 0x1a}, {0x158e100, 0xc002e46e40}, {0x158d840, 0xc002e46e70}, 0xc0007d14c0, {0x0, ...})
    /builds/.cache/go/pkg/mod/google.golang.org/[email protected]/clientconn.go:360 +0x154
gitlab.ozon.ru/platform/scratch/pkg/mw/grpc/circuitbreaker.(*RTCircuitBreakerGroup).UnaryClientInterceptor.func1.1({0x18bda78, 0xc002e47080})
    /builds/.cache/go/pkg/mod/gitlab.ozon.ru/platform/[email protected]/pkg/mw/grpc/circuitbreaker/interceptors.go:23 +0x6b
gitlab.ozon.ru/platform/circuit/v3.(*Circuit).run(0xc002e449c0, {0x18bda78, 0xc002e47080}, 0xc0007d1710)
    /builds/.cache/go/pkg/mod/gitlab.ozon.ru/platform/circuit/[email protected]/circuit.go:298 +0x2b7
gitlab.ozon.ru/platform/circuit/v3.(*Circuit).Execute(0xc002e449c0, {0x18bda78, 0xc002e47080}, 0x162a945, 0x1a)
    /builds/.cache/go/pkg/mod/gitlab.ozon.ru/platform/circuit/[email protected]/circuit.go:235 +0x65
gitlab.ozon.ru/platform/scratch/pkg/mw/grpc/circuitbreaker.(*RTCircuitBreakerGroup).UnaryClientInterceptor.func1({0x18bda78, 0xc002e47080}, {0x162a945, 0x1a}, {0x158e100, 0xc002e46e40}, {0x158d840, 0xc002e46e70}, 0xc0008bcf00, 0xc002e470b0, ...)
    /builds/.cache/go/pkg/mod/gitlab.ozon.ru/platform/[email protected]/pkg/mw/grpc/circuitbreaker/interceptors.go:20 +0x18c
google.golang.org/grpc.getChainUnaryInvoker.func1({0x18bda78, 0xc002e47080}, {0x162a945, 0x1a}, {0x158e100, 0xc002e46e40}, {0x158d840, 0xc002e46e70}, 0x7fd128bcbfff, {0x0, ...})
    /builds/.cache/go/pkg/mod/google.golang.org/[email protected]/clientconn.go:360 +0x154

Expected behavior
The test should pass.
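
For reference, a join action config of the kind this test exercises; the regexes are illustrative, and the parameter names follow my understanding of the join action, so they may differ from the actual plugin:

    actions:
      - type: join
        field: log
        # first line of a Go panic
        start: '/^panic:/'
        # continuation lines: blank lines, goroutine headers, file:line frames, "created by"
        continue: '/(^\s*$)|(goroutine \d+ \[)|(\.go:\d+)|(created by )/'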

Feature: add conditions to rename action

Is your feature request related to a problem? Please describe.
We have 2 apps running in Kubernetes: the first one produces logs like { "message": "hello" } and the second one like { "message": { "field": "hello", "otherField": "world" } }.
As a result, after these logs are written to the same Elasticsearch index, there is a conflict on the field type and Kibana does not show them.

Describe the solution you'd like
I would like to rename the message field to, for example, message_json when it is a JSON object, but neither rename nor any other action supports conditions/filters; a hypothetical config is sketched below.
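
A purely hypothetical sketch of a conditional rename; the if block does not exist today, while the message: message_json pair follows the current rename syntax as I understand it:

    actions:
      - type: rename
        # hypothetical condition: apply the rename only when the field is a JSON object
        if:
          field: message
          is: object
        message: message_json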

Describe alternatives you've considered
For now, we have separated logs from different apps into different indexes.

Bug: can't easily pass the pipeline

Describe the bug
Some tests fail, which makes it difficult to pass the CI pipeline.

To Reproduce
Steps to reproduce the behavior:

  1. Run test job

Expected behavior
Tests should not fail

Feature: set_time action

Is your feature request related to a problem? Please describe.
I want to set the event time using an action.

Describe the solution you'd like
A new set_time action plugin.

Additional context
Currently I can only convert the time, not set it.
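
A hypothetical config for the proposed plugin (all parameter names are illustrative only):

    actions:
      - type: set_time
        field: time            # where to write the timestamp
        format: rfc3339nano    # timestamp format
        override: true         # replace the field if it already exists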

Feature: Publish images to GHCR

Describe the solution you'd like
Let's add a CI step that automatically publishes the latest Docker image to the GitHub Container Registry for tags; a possible workflow is sketched below.
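
A possible GitHub Actions workflow, as a sketch (the workflow name and image tag are illustrative):

    name: publish-image
    on:
      push:
        tags: ["v*"]
    permissions:
      contents: read
      packages: write
    jobs:
      docker:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          # log in to GHCR with the built-in token
          - uses: docker/login-action@v2
            with:
              registry: ghcr.io
              username: ${{ github.actor }}
              password: ${{ secrets.GITHUB_TOKEN }}
          # build the Dockerfile and push the image tagged with the git tag
          - uses: docker/build-push-action@v4
            with:
              context: .
              push: true
              tags: ghcr.io/ozontech/file.d:${{ github.ref_name }}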

Feature: event_size limit support

Is your feature request related to a problem? Please describe.

I can limit a batch by event count, but I also need to limit a batch by its byte size in the elasticsearch output plugin; a hypothetical config is sketched below.
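
A hypothetical config; batch_size and endpoints are written per my understanding of the current elasticsearch output, while batch_size_bytes is the proposed new parameter:

    output:
      type: elasticsearch
      endpoints: ["http://localhost:9200"]
      batch_size: 256          # existing: limit batches by event count
      batch_size_bytes: 4MB    # proposed: also limit batches by total size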

Feature: improve code style

Is your feature request related to a problem? Please describe.
The file.d codebase mixes several code styles.

Describe the solution you'd like
Configure golangci-lint and fix the common styling issues; a possible starting config is sketched below.
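
A possible starting point for .golangci.yml, as a sketch (the exact linter set is open for discussion):

    run:
      timeout: 5m
    linters:
      enable:
        - gofmt
        - goimports
        - errcheck
        - staticcheck
        - revive
        - gocritic
    issues:
      max-same-issues: 0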

Feature: automated builds for new file.d releases

Automate the build and release process when new tags are created.
Binaries compiled for different platforms and architectures should be uploaded to the releases page, along with .deb packages and a changelog; one possible setup is sketched below.
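
One possible approach is GoReleaser triggered from CI on tags; a minimal .goreleaser.yml sketch (values are illustrative):

    builds:
      - env: [CGO_ENABLED=0]
        goos: [linux, darwin]
        goarch: [amd64, arm64]
    archives:
      - format: tar.gz
    nfpms:
      - formats: [deb]
        description: "file.d - a blazing fast tool for building data pipelines"
    changelog:
      sort: asc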

Bug: fix pg plugin for big timestamps

Describe the bug
Postgres can't save dates from the far future (> 294,000 AD) and the plugin crashes.
To Reproduce
Steps to reproduce the behavior:

  1. Create a config with a timestamp pg field
  2. Send an event from the far future: `{"last_date": 63746702582048}`

Expected behavior
The event should be discarded.

Feature: add version to metrics

To make working with metrics easier, the file.d version should be added to metric labels or tags.
This will allow grouping metrics by version and easily detecting outdated versions of file.d.

Feature: add metrics for events count, size for input/output

Is your feature request related to a problem? Please describe.
metrics_holder only creates metrics for i/o events when there are actions between the input and output.
We want to be able to see the count and size of input (and output?) events even when there are no configured actions (i.e. there's only an input and an output).

Describe the solution you'd like
Add a separate metric that shows input and output event count and size for each pipeline.

Describe alternatives you've considered
We could also create a dummy action plugin just to have the event count/size metrics created, though I don't think that's the right choice here.

Bug: eventPool.dump panics

Describe the bug
eventPool.dump panics when /pipelines/<name> is called.

To Reproduce
Steps to reproduce the behavior:

  1. eventPool must be busy;
  2. call /pipelines/<name> to trigger .dump();

Expected behavior
It shouldn't panic.

Additional context
Version: v0.5.2
Stack trace:

2022/05/26 08:15:03 http: panic serving 10.37.201.234:51522: runtime error: invalid memory address or nil pointer dereference
goroutine 1328031 [running]:
net/http.(*conn).serve.func1()
	/opt/homebrew/Cellar/go/1.17.6/libexec/src/net/http/server.go:1802 +0xb9
--
goroutine 1328031 [running]:
net/http.(*conn).serve.func1()
	/opt/homebrew/Cellar/go/1.17.6/libexec/src/net/http/server.go:1802 +0xb9
panic({0x144d100, 0x22cf530})
	/opt/homebrew/Cellar/go/1.17.6/libexec/src/runtime/panic.go:1047 +0x266
github.com/ozontech/file.d/pipeline.(*Event).String(0x15fb4a7)
	/Users/skiritsa/go/src/file.d/pipeline/event.go:189 +0x27
github.com/ozontech/file.d/pipeline.(*eventPool).dump.func1()
	/Users/skiritsa/go/src/file.d/pipeline/event.go:300 +0x53
github.com/ozontech/file.d/logger.Cond(...)
	/Users/skiritsa/go/src/file.d/logger/util.go:25
github.com/ozontech/file.d/pipeline.(*eventPool).dump(0xc000314200)
	/Users/skiritsa/go/src/file.d/pipeline/event.go:296 +0x65
github.com/ozontech/file.d/pipeline.(*Pipeline).servePipeline(0xc000134580, {0x18186d8, 0xc0057542a0}, 0x0)
	/Users/skiritsa/go/src/file.d/pipeline/pipeline.go:627 +0x106
net/http.HandlerFunc.ServeHTTP(0x0, {0x18186d8, 0xc0057542a0}, 0x0)
	/opt/homebrew/Cellar/go/1.17.6/libexec/src/net/http/server.go:2047 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0, {0x18186d8, 0xc0057542a0}, 0xc00ab80e00)
	/opt/homebrew/Cellar/go/1.17.6/libexec/src/net/http/server.go:2425 +0x149
net/http.serverHandler.ServeHTTP({0xc0079ab1d0}, {0x18186d8, 0xc0057542a0}, 0xc00ab80e00)
	/opt/homebrew/Cellar/go/1.17.6/libexec/src/net/http/server.go:2879 +0x43b
net/http.(*conn).serve(0xc007fca780, {0x1823eb8, 0xc000551a40})
	/opt/homebrew/Cellar/go/1.17.6/libexec/src/net/http/server.go:1930 +0xb08
created by net/http.(*Server).Serve
	/opt/homebrew/Cellar/go/1.17.6/libexec/src/net/http/server.go:3034 +0x4e8

Bug: Fix join action plugin behaviour for nil node

Currently the join action plugin does not handle a nil node: the value taken for a nil node is the empty string "", which can be valid for a regex like \s* (a valid continuation of a Go panic message). Thus, on a nil node the event is treated as a valid continuation of the log when it shouldn't be (e.g. check the behaviour of a continue regex for which the empty string is not valid).

Bug: Invalid Kafka events don't get acked in non-strict mode.

Describe the bug
If an invalid event is passed to the pipeline in non-strict mode, it is not acked. If such events fill the pipeline's capacity, processing locks up.

To Reproduce

  1. Create a config with the kafka input plugin and pipeline settings: { capacity: 8 } (see the sketch after this list).
  2. Send 9 invalid JSON messages: {"no_value": } - and so on.
  3. On the ninth message the pipeline gets locked.
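
A sketch of the repro config from step 1; the broker and topic values are placeholders, and the kafka parameter names follow my understanding of the plugin:

    pipelines:
      example:
        settings:
          capacity: 8
        input:
          type: kafka
          brokers: ["localhost:9092"]
          topics: ["logs"]
        output:
          type: stdout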

Expected behavior
In non-strict mode such events must not be passed to the output, but they should still be acked in Kafka.

Additional context
Version: v0.5.2 (does not matter)
Platform: macOS (does not matter)

Feature: redis throttle

Better separation of throttling requires a different limiting mechanism.

Solution
Use Redis to store the limit for each pod and renew it. If Redis is unavailable, fall back to the in-memory limiter. A hypothetical config is sketched below.
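
A hypothetical throttle config with a Redis backend; default_limit follows the existing throttle action as I understand it, and the Redis-related fields are illustrative only:

    actions:
      - type: throttle
        default_limit: 5000
        # proposed fields for the Redis-backed limiter
        limiter_backend: redis
        redis_backend_config:
          endpoint: "redis:6379"
          sync_interval: "5s"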

Feature: support es data streams

Is your feature request related to a problem? Please describe.
When trying to configure file.d to write to a data stream, the following error occurs:

error	fd.k8s.output elasticsearch	indexing error: {"type":"illegal_argument_exception","reason":"only write ops with an op_type of create are allowed in data streams"}

Describe the solution you'd like
Add an option to configure op_type; a hypothetical config is sketched below.
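
A hypothetical config; op_type is the proposed new option, and the rest follows the elasticsearch output as I understand it:

    output:
      type: elasticsearch
      endpoints: ["http://localhost:9200"]
      index_format: "logs-app-default"
      op_type: create    # proposed: data streams only allow the create op_type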
