Support input from STDIN (plumber issue, closed)

streamdal commented on August 25, 2024
Support input from STDIN

from plumber.

Comments (8)

blinktag commented on August 25, 2024

@christoshadjiaslanis piping data has been added in v0.30.0

By default, each newline-terminated line is treated as a separate message.
If you specify the --json-array flag, plumber expects a JSON array as input and treats each object in the top-level array as a separate message.
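
To make the two input modes concrete, here is a minimal Python sketch of how input could be split into messages. It is not plumber's actual implementation: the `iter_messages` helper and its `json_array` parameter are hypothetical names, and skipping blank lines is an assumption.

```python
import json
from typing import Iterable, Iterator


def iter_messages(stream: Iterable[str], json_array: bool = False) -> Iterator[str]:
    """Yield one message per input unit (hypothetical helper, not plumber code)."""
    if json_array:
        # Array mode: the whole input has to be read before it can be parsed.
        payload = json.loads("".join(stream))
        if not isinstance(payload, list):
            raise ValueError("expected a top-level JSON array")
        for item in payload:
            # Each element of the top-level array becomes its own message.
            yield json.dumps(item)
    else:
        # Default mode: every non-empty line is one message.
        for line in stream:
            line = line.rstrip("\r\n")
            if line:  # assumption: blank lines are skipped
                yield line
```

Passing `sys.stdin` as `stream` would consume piped input, since a text stream iterates line by line.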

dselans commented on August 25, 2024

Fantastic idea! How do you see event delimitation working? One event per line? Something else?

christos-h commented on August 25, 2024

There are 2 options I see:

  1. Separate events by newlines.
{ "a" : 5 }
{ "a" : 10 }

I believe this is how reading from a file works with plumber, so the behavior would be consistent. The main drawback is that I don't think newline-separated objects are part of the JSON spec.

  2. Events are submitted as a JSON array.
[ { "a" : 5 }, { "a" : 10 } ]

This would conform to the JSON spec (i.e., it can be parsed without string-fu). The question here is how to disambiguate between an 'array of events' and 'my event is literally an array'. I guess if your event is an array, you could have:

[ [...], [...] ]

Naively, I would opt for 2, but I'm not sure. Are there other tools out there that accept collections of JSON objects as newline-delimited input, or is this unique to plumber?
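
The disambiguation rule for option 2 can be sketched in a few lines of Python. This is only an illustration under the assumption that the top-level value is always the list of events; `split_top_level_array` is a hypothetical name.

```python
import json


def split_top_level_array(text: str) -> list:
    """Treat the top-level JSON array as the list of events (hypothetical helper)."""
    events = json.loads(text)
    if not isinstance(events, list):
        raise ValueError("input must be a JSON array of events")
    return events


# Two object events.
objects = split_top_level_array('[ { "a" : 5 }, { "a" : 10 } ]')

# Two events that are themselves arrays: each one is wrapped by the outer array.
arrays = split_top_level_array('[ [1, 2], [3] ]')
```

Under this rule, a single event that is an array still has to be wrapped in an outer array before it is sent.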

dselans commented on August 25, 2024

I'd opt for #1 as it doesn't require you to transform your input JSON. I think the fact that the stream as a whole is not valid JSON is irrelevant: it is just a transport stream, and the resulting JSON input (the 1 line) is what matters.

Another reason: if you're piping data into plumber via the CLI, you will be using various other tools to get that done. Minifying JSON and transforming it into a single line would be trivial, while transforming the input JSON into a single blob would be quite difficult. Also, if you're piping in 100M events, does that mean you have a single JSON array with 100 million entries?
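
As a rough illustration of how little work the minifying step is, the following Python sketch collapses a pretty-printed JSON document into one line using only the standard `json` module. The `to_single_line` name is hypothetical.

```python
import json


def to_single_line(pretty: str) -> str:
    """Re-serialize a JSON document without newlines or extra spaces."""
    # Compact separators drop the spaces after ',' and ':';
    # ensure_ascii=False keeps non-ASCII characters as they are.
    return json.dumps(json.loads(pretty), separators=(",", ":"), ensure_ascii=False)


pretty_event = """{
  "foo": {
    "bar": "baz"
  }
}"""

line = to_single_line(pretty_event)
```

Newlines inside string values are escaped as `\n` by the serializer, so the result always fits on a single line.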

Finally, there is streaming input data: if you use arrays, streaming would be rough.
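
The streaming point can be sketched in Python as follows. With one event per line, each event can be parsed and handed on as soon as its line arrives, even if the source never ends; a single JSON array parsed with a standard, non-incremental parser only becomes usable once the closing bracket has been read. The `stream_events` and `endless_source` names are hypothetical.

```python
import itertools
import json
from typing import Iterable, Iterator


def stream_events(lines: Iterable[str]) -> Iterator[object]:
    """Parse newline-delimited JSON lazily, one event per line."""
    for line in lines:
        line = line.strip()
        if line:  # assumption: blank lines are ignored
            yield json.loads(line)


def endless_source() -> Iterator[str]:
    """Stand-in for a pipe that never closes."""
    n = 0
    while True:
        yield json.dumps({"a": n}) + "\n"
        n += 1


# Events are available immediately; the source is never read to the end.
first_three = list(itertools.islice(stream_events(endless_source()), 3))
```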


As for other folks that use newline-delimited JSON: you can store newline-delimited JSON in S3 and have it be searchable using Athena (not optimal, but hey).


Another option I see as potentially viable is delimiters between entries:

{{BEGIN}} 
{"foo": 
  {
  "bar": "baz"
  }
}
{{END}}

I think I'd still go for option #1. Minifying is easy and less intrusive than having to manage delimiters.
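
For comparison, here is a minimal Python sketch of what managing delimiters would involve on the reading side. It is not something plumber implements; the marker handling, the `iter_delimited` name, and the choice to ignore lines outside a BEGIN/END pair are all assumptions.

```python
from typing import Iterable, Iterator, List, Optional

BEGIN = "{{BEGIN}}"
END = "{{END}}"


def iter_delimited(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text between each BEGIN/END pair (hypothetical helper)."""
    buffer: Optional[List[str]] = None  # None means we are outside an entry
    for raw in lines:
        line = raw.rstrip("\r\n")
        marker = line.strip()
        if marker == BEGIN:
            if buffer is not None:
                raise ValueError("BEGIN found inside an open entry")
            buffer = []
        elif marker == END:
            if buffer is None:
                raise ValueError("END found without a matching BEGIN")
            yield "\n".join(buffer)
            buffer = None
        elif buffer is not None:
            buffer.append(line)
        # Assumption: lines outside a BEGIN/END pair are ignored.
    if buffer is not None:
        raise ValueError("input ended inside an open entry")
```

Even this small reader needs state tracking and three error cases, which is the overhead that one event per line avoids.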

Thoughts?

christos-h commented on August 25, 2024

Option 1 sounds good to me :)

christos-h commented on August 25, 2024

> Are there other tools out there which receive collections of json objects as new-line delimited or is this unique to plumber?

MongoDB's mongoimport tool, which is used to import data into MongoDB, has an optional --jsonArray flag that toggles accepting JSON in array format. For whatever reason this is limited to 16MB, but I think this is a good compromise.

dselans commented on August 25, 2024

You know, we could just support both haha

christos-h commented on August 25, 2024

Yeah! That is the point of the --jsonArray flag 💯
