
ndjson-cli

Unix-y tools for operating on newline-delimited JSON streams.

npm install ndjson-cli

Command Line Reference

Options

All ndjson-cli commands support --help and --version. Commands that take an expression also support --require.

# ndjson-command -h
# ndjson-command --help

Output usage information.

# ndjson-command -V
# ndjson-command --version

Output the version number.

# ndjson-command -r [name=]module
# ndjson-command --require [name=]module

Requires the specified module, making it available for use in any expressions used by this command. The loaded module is available as the symbol name. If name is not specified, it defaults to module. (If module is not a valid identifier, you must specify a name.) For example, to sort using d3.ascending from d3-array:

ndjson-sort -r d3=d3-array 'd3.ascending(a, b)' < values.ndjson

Or to require all of d3:

ndjson-sort -r d3 'd3.ascending(a, b)' < values.ndjson

The required module is resolved relative to the current working directory. If the module is not found during normal resolution, the global root is also searched, allowing you to require global modules from the command line.

cat

# ndjson-cat [files…] <>

Sequentially concatenates one or more input files containing JSON into a single newline-delimited JSON stream on stdout. If files is not specified, it defaults to “-”, indicating stdin. This command is especially useful for converting pretty-printed JSON (which contains newlines) into newline-delimited JSON. For example, to print the binaries exported by this repository’s package.json:

ndjson-cat package.json | ndjson-split 'Object.keys(d.bin)'

filter

# ndjson-filter [expression] <>

Filters the newline-delimited JSON stream on stdin according to the specified expression: if the expression evaluates truthily for the given value d at the given zero-based index i in the stream, the resulting value is output to stdout; otherwise, it is ignored. If expression is not specified, it defaults to true. This program is much like array.filter.

For example, given a stream of GeoJSON features from shp2json, you can filter the stream to include only the multi-polygon features like so:

shp2json -n example.shp | ndjson-filter 'd.geometry.type === "MultiPolygon"'

Or, to skip every other feature:

shp2json -n example.shp | ndjson-filter 'i & 1'

Or to take a random 10% sample:

shp2json -n example.shp | ndjson-filter 'Math.random() < 0.1'

Side-effects during filter are allowed. For example, to delete a property:

shp2json -n example.shp | ndjson-filter 'delete d.properties.FID, true'
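The expression semantics above can be sketched in plain JavaScript with hypothetical data; this is only an illustration of the contract, not the tool's code:

```javascript
// A sketch of how ndjson-filter evaluates its expression: each value d is
// kept when the expression, given d and the zero-based index i, is truthy.
// Here 'i & 1' keeps every other value, as in the example above.
const values = [{v: 1}, {v: 2}, {v: 3}, {v: 4}];
const everyOther = values.filter((d, i) => i & 1);
console.log(JSON.stringify(everyOther)); // [{"v":2},{"v":4}]
```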

map

# ndjson-map [expression] <>

Maps the newline-delimited JSON stream on stdin according to the specified expression: outputs the result of evaluating the expression for the given value d at the given zero-based index i in the stream. If expression is not specified, it defaults to d. This program is much like array.map.

For example, given a stream of GeoJSON features from shp2json, you can convert the stream to geometries like so:

shp2json -n example.shp | ndjson-map 'd.geometry'

Or you can extract the properties, and then convert to tab-separated values:

shp2json -n example.shp | ndjson-map 'd.properties' | json2tsv -n > example.tsv

You can also add new properties to each object by assigning them and then returning the original object:

shp2json -n example.shp | ndjson-map 'd.properties.version = 2, d' | json2tsv -n > example.v2.tsv
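The assign-and-return idiom above relies on the comma operator; a minimal plain-JavaScript sketch with made-up data:

```javascript
// The comma operator evaluates both operands and yields the right-hand one,
// so 'd.properties.version = 2, d' mutates the object and then returns it.
const d = {properties: {name: "example"}};
const result = (d.properties.version = 2, d);
console.log(JSON.stringify(result)); // {"properties":{"name":"example","version":2}}
```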

reduce

# ndjson-reduce [expression [initial]] <>

Reduces the newline-delimited JSON stream on stdin according to the specified expression. For each value in the input stream, evaluates the expression for the given value d at the given zero-based index i in the stream and the previous value p, which is initialized to initial. If expression and initial are not specified, they default to p.push(d), p and [], respectively, merging all input values into a single array (the inverse of ndjson-split). Otherwise, if initial is not specified, the first time the expression is evaluated p will be equal to the first value in the stream (i = 0) and d will be equal to the second (i = 1). Outputs the last result when the stream ends. This program is much like array.reduce.

For example, to count the number of values in a stream of GeoJSON features from shp2json, like wc -l:

shp2json -n example.shp | ndjson-reduce 'p + 1' '0'

To merge a stream into a feature collection:

shp2json -n example.shp | ndjson-reduce 'p.features.push(d), p' '{type: "FeatureCollection", features: []}'

Or equivalently, in two steps:

shp2json -n example.shp | ndjson-reduce | ndjson-map '{type: "FeatureCollection", features: d}'

To convert a newline-delimited JSON stream of values to a JSON array, the inverse of ndjson-split:

ndjson-reduce < values.ndjson > array.json
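The default and counting behaviors above can be sketched with Array.prototype.reduce, which the program mirrors (illustrative data, not the tool's code):

```javascript
// ndjson-reduce's defaults: the expression '(p.push(d), p)' with initial
// value [] folds the stream into one array; 'p + 1' with initial '0' counts.
const stream = [{a: 1}, {a: 2}, {a: 3}];
const merged = stream.reduce((p, d) => (p.push(d), p), []);
const count = stream.reduce((p, d) => p + 1, 0);
console.log(JSON.stringify(merged)); // [{"a":1},{"a":2},{"a":3}]
console.log(count); // 3
```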

split

# ndjson-split [expression] <>

Expands the newline-delimited JSON stream on stdin according to the specified expression: outputs the results of evaluating the expression for the given value d at the given zero-based index i in the stream. The result of evaluating the expression must be an array (though it may be the empty array if no values should be output for the given input). If expression is not specified, it defaults to d, which assumes that the input values are arrays.

For example, given a single GeoJSON feature collection from shp2json, you can convert a stream of features like so:

shp2json example.shp | ndjson-split 'd.features'

To convert a JSON array to a newline-delimited JSON stream of values, the inverse of ndjson-reduce:

ndjson-split < array.json > values.ndjson
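The behavior described above is roughly Array.prototype.flatMap; a plain-JavaScript sketch with hypothetical data:

```javascript
// ndjson-split: the expression must yield an array (possibly empty), and
// each element of that array becomes its own output line.
const input = [{features: ["a", "b"]}, {features: []}, {features: ["c"]}];
const out = input.flatMap(d => d.features);
console.log(JSON.stringify(out)); // ["a","b","c"]
```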

join

# ndjson-join [expression₀ [expression₁]] file₀ file₁ <>

Joins the two newline-delimited JSON streams in file₀ and file₁ according to the specified key expressions expression₀ and expression₁. For each value d₀ at the zero-based index i₀ in the stream file₀, the corresponding key is the result of evaluating the expression₀. Similarly, for each value d₁ at the zero-based index i₁ in the stream file₁, the corresponding key is the result of evaluating the expression₁. When both input streams end, for each distinct key, the cartesian product of corresponding values d₀ and d₁ are output as an array [d₀, d₁].

If expression₀ is not specified, it defaults to i, joining the two streams by line number; in this case, the length of the output stream is the shorter of the two input streams. If expression₁ is not specified, it defaults to expression₀.

For example, consider the CSV file a.csv:

name,color
Fred,red
Alice,green
Bob,blue

And b.csv:

name,number
Fred,21
Alice,42
Bob,102

To merge these into a single stream by name using csv2json:

ndjson-join 'd.name' <(csv2json -n a.csv) <(csv2json -n b.csv)

The resulting output is:

[{"name":"Fred","color":"red"},{"name":"Fred","number":"21"}]
[{"name":"Alice","color":"green"},{"name":"Alice","number":"42"}]
[{"name":"Bob","color":"blue"},{"name":"Bob","number":"102"}]

To consolidate the results into a single object, use ndjson-map and Object.assign:

ndjson-join 'd.name' <(csv2json -n a.csv) <(csv2json -n b.csv) | ndjson-map 'Object.assign(d[0], d[1])'
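What the Object.assign step does to one joined pair from the output above, sketched in plain JavaScript:

```javascript
// Each joined line is an array [d0, d1]; Object.assign collapses the pair
// into a single object (later properties win on key collisions).
const d = [{name: "Fred", color: "red"}, {name: "Fred", number: "21"}];
const row = Object.assign(d[0], d[1]);
console.log(JSON.stringify(row)); // {"name":"Fred","color":"red","number":"21"}
```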

# ndjson-join --left
# ndjson-join --right
# ndjson-join --outer

Specify the type of join: left, right, or outer. Empty values are output as null. If none of these arguments are specified, defaults to inner.

sort

# ndjson-sort [expression] <>

Sorts the newline-delimited JSON stream on stdin according to the specified comparator expression. After reading the entire input stream, sorts the array of values with a comparator that evaluates the expression for two given values a and b from the input stream. If the resulting value is less than 0, then a appears before b in the output stream; if the value is greater than 0, then a appears after b in the output stream; any other value means that the partial order of a and b is undefined. If expression is not specified, it defaults to ascending natural order.

For example, to sort a stream of GeoJSON features by their name property:

shp2json -n example.shp | ndjson-sort 'a.properties.name.localeCompare(b.properties.name)'
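The comparator contract described above is the same one Array.prototype.sort uses; a minimal sketch with made-up feature names:

```javascript
// Negative means a sorts before b, positive means after; localeCompare
// returns exactly such values, which is why the expression above works.
const names = [{properties: {name: "Carol"}}, {properties: {name: "Alice"}}];
names.sort((a, b) => a.properties.name.localeCompare(b.properties.name));
console.log(names.map(d => d.properties.name).join(",")); // Alice,Carol
```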

top

# ndjson-top [expression] <>

Selects the top n values (see -n) from the newline-delimited JSON stream on stdin according to the specified comparator expression. (This selection algorithm is implemented using partial heap sort.) After reading the entire input stream, outputs the top n values in ascending order. As with ndjson-sort, the input values are compared by evaluating the expression for two given values a and b from the input stream. If the resulting value is less than 0, then a appears before b in the output stream; if the value is greater than 0, then a appears after b in the output stream; any other value means that the partial order of a and b is undefined. If expression is not specified, it defaults to ascending natural order. If the input stream has fewer than n values, this program is equivalent to ndjson-sort.

For example, to output the GeoJSON feature with the largest size property:

shp2json -n example.shp | ndjson-top 'a.properties.size - b.properties.size'

This program is equivalent to ndjson-sort expression | tail -n count, except it is much more efficient if n is smaller than the size of the input stream.

# ndjson-top -n count

Specify the maximum number of output values. Defaults to 1.
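The tail-based equivalent mentioned above can be sketched in plain JavaScript (the real tool uses a partial heap sort instead of a full sort):

```javascript
// Sort ascending, then keep the last n values -- here n = 2, analogous to
// ndjson-top -n 2 'a.size - b.size'.
const sizes = [{size: 3}, {size: 9}, {size: 1}, {size: 5}];
const top = sizes.slice().sort((a, b) => a.size - b.size).slice(-2);
console.log(JSON.stringify(top)); // [{"size":5},{"size":9}]
```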

Recipes

To count the number of values in a stream:

shp2json -n example.shp | wc -l

To reverse a stream:

shp2json -n example.shp | tail -r

To take the first 3 values in a stream:

shp2json -n example.shp | head -n 3

To take the last 3 values in a stream:

shp2json -n example.shp | tail -n 3

To take all but the first 3 values in a stream:

shp2json -n example.shp | tail -n +4

ndjson-cli's People

Contributors

aendra-rininsland, jstcki, mbostock


ndjson-cli's Issues

Invalid join when using --inner flag

Given:

a.json

{"name":"Fred","color":"red"}
{"name":"Alice","color":"green"}
{"name":"Bob","color":"blue"}

and b.json

{"name":"Fred","number":"21"}
{"name":"Bob","number":"102"}
{"name":"Alice","number":"42"}

the command ndjson-join --inner 'd.name' <(cat a.json) <(cat b.json) outputs:

[{"name":"Fred","color":"red"},{"name":"Fred","number":"21"}]
[{"name":"Alice","color":"green"},{"name":"Bob","number":"102"}]
[{"name":"Bob","color":"blue"},{"name":"Alice","number":"42"}]

Note that it works as expected when the --inner flag is omitted.

Feature Request: Option to ignore corrupt lines instead of SyntaxError?

With the file demo.json (contents below), which has some lines of incomplete or malformed JSON, ndjson-cat raises an error:

 $ ndjson-cat  demo.json 
demo.json: SyntaxError: Unexpected token a

Would it be worthwhile to add this feature as an option to ndjson-filter (or is it already possible there)?

# demo.json
aofa": 1}
{"five": 1}
{"wa

require() not supported

Using node v16.13.2 I get:

Error [ERR_REQUIRE_ESM]: require() of ES Module C:\Users\erwin\AppData\Roaming\npm\node_modules\d3\src\index.js from C:\Users\erwin\AppData\Roaming\npm\node_modules\ndjson-cli\requires.js not supported.
Instead change the require of index.js in C:\Users\erwin\AppData\Roaming\npm\node_modules\ndjson-cli\requires.js to a dynamic import() which is available in all CommonJS modules.
at module.exports (C:\Users\erwin\AppData\Roaming\npm\node_modules\ndjson-cli\requires.js:6:12)
at Command. (C:\Users\erwin\AppData\Roaming\npm\node_modules\ndjson-cli\node_modules\commander\index.js:412:13)
at Command.emit (node:events:390:28)
at Command.parseOptions (C:\Users\erwin\AppData\Roaming\npm\node_modules\ndjson-cli\node_modules\commander\index.js:730:14)
at Command.parse (C:\Users\erwin\AppData\Roaming\npm\node_modules\ndjson-cli\node_modules\commander\index.js:471:21)
at Object. (C:\Users\erwin\AppData\Roaming\npm\node_modules\ndjson-cli\ndjson-map:15:6) {
code: 'ERR_REQUIRE_ESM'
}

Default expressions?

  • filter could default to true
  • map could default to d
  • reduce could default to p.push(d), p and []?
  • sort could default to ascending
  • split could default to d

Question regarding filtering

(Cross-post from https://gis.stackexchange.com/questions/298144/filter-geojson-by-with-ndjson )

I would like to filter a GeoJSON with pipelines by the properties diameter and type (gas/oil):

{"type":"Feature","properties":{"OPERATOR":"CENTERPOINT ENERGY INTRA P/L,LLC","CONTACT_PHONE_NUMBER":"2819600076","COMMODITY_DESCRIPTION":"NATURAL GAS","SYSTEM_NAME":"102","SUBSYSTEM_NAME":"IGS SYSTEM","DIAMETER":10.75,"T4PERMIT":"03755","T4PERMIT_MILES":0.17,"TYPE":"NATURAL GAS","INTERSTATE":"No","SYSTEM_TYPE":"Gas Transmission              ","STATUS":"In Service","COUNTY_FIPS":"201","OBJECTID":114027,"COUNTY_NAME":"HARRIS","SHAPE.LEN":893.364654362696,"P5_NUM":"141067"},"geometry":{"type":"LineString","coordinates":[[-95.21162248240012,29.809304236846675],[-95.21188396247372,29.809229064820354],[-95.21213367287471,29.809216873014062],[-95.21253903873001,29.809090744025184],[-95.21428860240415,29.808534857775637]],"bbox":[-95.21428860240415,29.808534857775637,-95.21162248240012,29.809304236846675]}}
{"type":"Feature","properties":{"OPERATOR":"CENTERPOINT ENERGY INTRA P/L,LLC","CONTACT_PHONE_NUMBER":"2819600076","COMMODITY_DESCRIPTION":"NATURAL GAS","SYSTEM_NAME":"102","SUBSYSTEM_NAME":"IGS SYSTEM","DIAMETER":14,"T4PERMIT":"03755","T4PERMIT_MILES":0.72,"TYPE":"NATURAL GAS","INTERSTATE":"No","SYSTEM_TYPE":"Gas Transmission              ","STATUS":"In Service","COUNTY_FIPS":"201","OBJECTID":114028,"COUNTY_NAME":"HARRIS","SHAPE.LEN":3816.63490489555,"P5_NUM":"141067"},"geometry":{"type":"LineString","coordinates":[[-95.22946208114293,29.74733103507275],[-95.22730678727892,29.74736645770293],[-95.22697222682558,29.74738420770742],[-95.22619495077038,29.74739703963204],[-95.2256267402878,29.74741726953456],[-95.2251207496631,29.747439244223912],[-95.22454696130762,29.747516679799684],[-95.22400349999367,29.747569300304647],[-95.22396883251615,29.7475741127291],[-95.22391307079606,29.747581851133376],[-95.22354516023839,29.747846246752985],[-95.22323050181615,29.747754044689554],[-95.22297271764369,29.74765347102593],[-95.2228425411359,29.747664902158586],[-95.22214615184606,29.74773584988962],[-95.22188590793394,29.747755685837003],[-95.22138062865608,29.747898445049255],[-95.22099956452766,29.748003944007852],[-95.22067291708576,29.74808833541714],[-95.22017806203013,29.74818307157069],[-95.21889922086419,29.748424777644107],[-95.2178592681016,29.748662637063198],[-95.21774062648129,29.7486894877378]],"bbox":[-95.22946208114293,29.74733103507275,-95.21774062648129,29.7486894877378]}}
{"type":"Feature","properties":{"OPERATOR":"SHERIDAN PRODUCTION COMPANY, LLC","CONTACT_PHONE_NUMBER":"7135481027","COMMODITY_DESCRIPTION":"NATURAL GAS","SYSTEM_NAME":"STOCKMAN SYSTEM (TRANSFERRED FROM 07825)","SUBSYSTEM_NAME":"JONES #2 : SWGP","DIAMETER":0,"T4PERMIT":"09371","T4PERMIT_MILES":0,"TYPE":"NATURAL GAS","INTERSTATE":"No","SYSTEM_TYPE":"Gas Gathering                 ","STATUS":"In Service","COUNTY_FIPS":"419","OBJECTID":114029,"COUNTY_NAME":"SHELBY","SHAPE.LEN":16.530275254816498,"P5_NUM":"775854"},"geometry":{"type":"LineString","coordinates":[[-94.37263800261006,31.776053575981486],[-94.37263747379711,31.776008115196802]],"bbox":[-94.37263800261006,31.776008115196802,-94.37263747379711,31.776053575981486]}}

I've naively written

cat Pipelines.ndjson | ndjson-filter 'd.DIAMETER == "14"'

but this line gives me no output. What's my error?
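A likely cause, inferred from the sample features above: DIAMETER is nested under d.properties and stored as a number, so a top-level lookup yields undefined. A plain-JavaScript sketch of the difference (the suggested filter expression is an assumption based on the sample data):

```javascript
// With the feature shape shown above, 'd.DIAMETER == "14"' matches nothing;
// 'd.properties.DIAMETER === 14' is the likely fix.
const d = {type: "Feature", properties: {DIAMETER: 14, TYPE: "NATURAL GAS"}};
console.log(d.DIAMETER == "14");           // false: no top-level DIAMETER
console.log(d.properties.DIAMETER === 14); // true
```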

Require: Cannot find module

Hi,

Trying to use the d3.polygon module from the command line. Specifically, the idea is to utilize d3.polygonContains to determine whether a polygon contains a specific latitude / longitude pair.

I think the starting point is requiring the d3.polygon module.

However, I am receiving an error when trying to require the d3.polygon module, where d3Polygon.js is in the current directory:

$ ndjson-split --require=d3Polygon.js 'uscounties.ndjson' > testing
/usr/lib/node_modules/ndjson-cli/node_modules/resolve/lib/sync.js:45
throw err;
^

Error: Cannot find module 'd3Polygon.js' from '/home/A/B/C'
at Function.module.exports [as sync] (/usr/lib/node_modules/ndjson-cli/node_modules/resolve/lib/sync.js:43:15)
at module.exports (/usr/lib/node_modules/ndjson-cli/resolve.js:11:22)
at module.exports (/usr/lib/node_modules/ndjson-cli/requires.js:6:20)
at Command. (/usr/lib/node_modules/ndjson-cli/node_modules/commander/index.js:412:13)
at emitOne (events.js:96:13)
at Command.emit (events.js:188:7)
at Command.parseOptions (/usr/lib/node_modules/ndjson-cli/node_modules/commander/index.js:733:14)
at Command.parse (/usr/lib/node_modules/ndjson-cli/node_modules/commander/index.js:471:21)
at Object. (/usr/lib/node_modules/ndjson-cli/ndjson-split:15:6)
at Module._compile (module.js:577:32)

Allow blank lines?

The ndjson specification suggests that parsers should have an option for allowing (and ignoring) blank lines. It doesn’t explicitly specify the definition of an “empty line” but I’m guessing an all whitespace line is also ignored.
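A sketch of the suggested option in plain JavaScript (a hypothetical helper, not the tool's actual code): treat all-whitespace lines as blank and skip them before parsing.

```javascript
// Lines that are empty or all whitespace are dropped; the rest are parsed.
const lines = ['{"a":1}', '', '   \t', '{"a":2}'];
const parsed = lines.filter(line => line.trim() !== "").map(line => JSON.parse(line));
console.log(JSON.stringify(parsed)); // [{"a":1},{"a":2}]
```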

Warn about unmatched records in ndjson-join

I'm using ndjson-join to make 1 to 1 joins between two files and I'd like to check that after the operation each record of the LHS stream is present in the final recordset.

Counting the final records by piping into wc -l and comparing to the length of the recordset might be misleading as we could have 0 matches for some records and 2 or more matches for some other.

For my use case it would be great to have a command-line switch enabling warnings about unmatched records in the LHS stream, for example --check-joins or something like that.

hitting limits with large GeoJSON file

I have a large hexagon grid that I thought would be a perfect candidate for topojson.

But I seem to be hitting the limits of ndjson and geo2topo.

There are a few million (3 or 4) hexagons in the GeoJSON file and the file is about 1GB.

Thanks!

Can anything be done?
Running on Windows 7
node: 10.13
npm: 5.6
ndjson-split 0.3.1
geo2topo 3.0.0

D:>ndjson-split 'd.features' < output_hexgrid_oneline.json > hexgrid.ndjson
readline.js:421
string = this._line_buffer + string;
^
RangeError: Invalid string length
at Interface._normalWrite (readline.js:421:32)
at ReadStream.ondata (readline.js:149:10)
at ReadStream.emit (events.js:182:13)
at addChunk (_stream_readable.js:283:12)
at readableAddChunk (_stream_readable.js:264:11)
at ReadStream.Readable.push (_stream_readable.js:219:10)
at lazyFs.read (internal/fs/streams.js:181:12)
at FSReqWrap.wrapper [as oncomplete] (fs.js:460:17)

D:>geo2topo tracts=output_hexgrid_oneline.json > hex_topo.json
buffer.js:646
return this.utf8Slice(0, this.length);
^
Error: Cannot create a string longer than 0x3fffffe7 characters
at Buffer.toString (buffer.js:646:17)
at JSON.parse ()
at ReadStream. (C:\Users\steve.o'brien\AppData\Roaming\npm\node_modules\topojson\node_modules\topojson-server\bin\geo2topo:107:46)
at ReadStream.emit (events.js:187:15)
at endReadableNT (_stream_readable.js:1094:12)
at process._tickCallback (internal/process/next_tick.js:63:19)

Support for left joins

Right now it seems like ndjson-join always does an inner join, and will delete features that are not shared between the two files. But it's very common for an external dataset to be missing data for some features (counties, states, etc.) that you still want to display in your final map.

Could ndjson-join take a parameter to support some outer joins? For a left join where a feature was missing in the "right" file, I'd imagine that d[1] could be an empty string or simply non-existent.

Apologies if this is already possible!

Add ndjson-join?

It’d be cool to join two ndjson streams (e.g., a stream from a CSV file and a stream of features from a GeoJSON collection).

Support 'group by' functionality

I have a CSV file which contains place-names and their associated counties (i.e. UK-based). I'd like to be able to group the place-names by county so that I can join them to a GeoJSON dataset for rendering.

I don't see an obvious way to do this with the current CLI tools.

Here's what I have managed so far:

With csv2json I've created an ndjson file with JSON objects containing name and county properties.

Using reduce, I've managed to convert this into a single JSON object with place-name arrays keyed by counties:

ndjson-reduce '(p[d.county] = p[d.county] || []).push(d.name), p' '{}' \
    < uk_towns_and_counties.ndjson

However, I cannot see a way to convert this back to ndjson. Just for reference, the target format I am looking for is as follows:

{ county: 'Durham', places: ['Woodford', 'Fordham', ...] }
{ county: 'Lancaster', places: ['Bury', 'Burwell', ...] }
...

If this cannot easily be achieved with the current tools, is it worth adding an ndjson-groupby?
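One way back to an ndjson stream is to feed the reduced object to ndjson-split with an expression that turns it into an array of {county, places} records, e.g. `ndjson-split 'Object.keys(d).map(function(c) { return {county: c, places: d[c]}; })'` (an assumption about what the expression evaluator accepts; this is not from the docs). The transformation itself, sketched in plain JavaScript:

```javascript
// Turn the reduced {county: [names...]} object into one record per county.
const d = {Durham: ["Woodford", "Fordham"], Lancaster: ["Bury", "Burwell"]};
const records = Object.keys(d).map(c => ({county: c, places: d[c]}));
console.log(JSON.stringify(records));
// [{"county":"Durham","places":["Woodford","Fordham"]},{"county":"Lancaster","places":["Bury","Burwell"]}]
```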

SyntaxError: Unexpected end of JSON input

How to reproduce:

I'm trying to execute this example:
https://bl.ocks.org/mbostock/c57be63169694093cca028194068ea3b

but I cannot because of this: https://github.com/substack/shp2json/issues/23

So I modified the code to not use shp2json, like this:


# Download.
curl -z build/estados.zip -o build/estados.zip http://mapserver.inegi.org.mx/MGN/mge2010v5_0.zip
# Decompress.
unzip -od build build/estados.zip

# Convert the shapefile to GeoJSON:
ogr2ogr -f GeoJSON build/states.json build/Entidades_2010_5.shp

# Now use ndjson-map in the same way, on a cat of the GeoJSON file:
cat build/states.json | ndjson-map -r d3=d3-geo 'p = d3.geoIdentity().reflectY(true).fitExtent([[10, 10], [960 - 10, 600 - 10]], d), "d3.geoIdentity().reflectY(true).scale(" + p.scale() + ").translate([" + p.translate() + "])"'

but it resulting in:

stdin:1
{
^
SyntaxError: Unexpected end of JSON input

My environment:
lsb_release -a
No LSB modules are available.
Distributor ID: LinuxMint
Description: LMDE 2 Betsy
Release: 2
Codename: betsy

node --version
v6.11.2

ogr2ogr --version
GDAL 1.10.1, released 2013/08/26

ndjson-map --version
0.3.1

filter geojson with bbox

I manage a huge geojson file and only need a small portion of it which is inside a bounding box (the viewport in my case). Is it possible with ndjson-split and ndjson-map to remove the features outside the bbox? Thanks for your input on that.

Another option will be to filter the geojson on a distance (radius) from a point (lat/lng) but don't know if it's possible.

ndjson-join join using a function instead of ndjson-join expression₀ expression₁

I enjoy using your CLI. I'm stuck trying to make a join on a condition more complex than a simple col1 = col2 using ndjson-join expression₀ expression₁ file₀ file₁. In particular, I want to make a spatial comparison using something similar to the following pseudocode (to illustrate the idea):

ndjson-join 'col1' 'col2' -r turf=@turf/turf 'turf.booleanWithin(d0, d1)' 'd = Object.assign(d0.properties, d1.properties), d' <(ndjson-split 'd.features' < points.geojson) <(ndjson-split 'd.features' < polys.geojson)

d0 and d1 would be aliases for the 1st and 2nd columns.

Any idea how to achieve something similar using your CLI? Did I miss the obvious? Is this outside the intended scope?

Thanks for any ideas/feedback

Support --unquoted output for ndjson-map

There doesn't seem to be a way to use ndjson-map to transform JSON values into raw strings that aren't wrapped in double quotes.

Maybe an option like --unquoted:

echo '{ "foo": "blah" }' | ndjson-map --unquoted d.foo
blah

I'm not sure there are any other types where this is an issue: numbers, arrays, booleans, nulls, objects should probably always be output as currently. Any non-standard output could be achieved by transforming to a string, and then using this option.

Support object spread syntax ({...d, }) in ndjson-map

It would be really convenient if you could do this:

ndjson-map '{...d, newProp: 3}'

It currently throws an error:

ndjson-map:expression
({...d, newProp: 3})
^
SyntaxError: Unexpected token (1:3)

The current workaround is:

ndjson-map 'd.newProp = 3, d'

or

ndjson-map 'Object.assign(d, {newProp: 3})'

which feels less elegant.

Validate expressions strictly.

Unfortunately vm.Script compiles a program, and it makes more sense for us to limit the result to an expression. We need to

  • wrap the expression in parens to make sure that ambiguous statements such as {} are correctly interpreted as expressions (and not a block statement) and that semicolons are not allowed
  • parse the expression using Acorn rather than Esprima (because Acorn supports preserveParens) to restrict programs to individual expression statements
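Why the parens matter, sketched in plain JavaScript: at the start of a statement, `{…}` parses as a block, not an object literal, so an unwrapped object expression fails while the parenthesized form evaluates as intended.

```javascript
// Without parens: '{' opens a block statement, and the "object literal"
// body is a syntax error.
let threw = false;
try { eval('{type: "FeatureCollection", features: []}'); } catch (e) { threw = e instanceof SyntaxError; }
console.log(threw); // true

// With parens: the same text is an object literal expression.
const ok = eval('({type: "FeatureCollection", features: []})');
console.log(ok.type); // FeatureCollection
```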

Ignore EPIPE errors on write.

For example:

$ shp2json -n example.shp | ndjson-filter true | head -n2
{"type":"Feature","properties":{"FID":0},"geometry":{"type":"Point","coordinates":[1,2]}}
{"type":"Feature","properties":{"FID":1},"geometry":{"type":"Point","coordinates":[3,4]}}
events.js:160
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at exports._errnoException (util.js:1012:11)
    at WriteWrap.afterWrite (net.js:793:14)

If stdout throws an EPIPE exception we should just swallow it silently and abort.

Related mbostock/shapefile#26.
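A sketch of the proposed behavior (a hypothetical helper, not the tool's actual code): on a write error, swallow EPIPE and abort cleanly, but rethrow anything else.

```javascript
// EPIPE just means the downstream reader (e.g. head) closed the pipe;
// stop quietly. Any other write error remains fatal.
function handleWriteError(error) {
  if (error && error.code === "EPIPE") return "aborted"; // silent abort
  throw error;
}
console.log(handleWriteError({code: "EPIPE"})); // aborted
```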

ndjson-sort returns only the features

hey there

I'm working off of the census House map at this link here (cb_2016_us_cd115_500k.zip).

My command is: shp2json -n cb_2016_us_cd115_500k.shp | ndjson-sort 'a.properties.GEOID.localeCompare(b.properties.GEOID)' > us-house.json

I'm index sorting the GEOIDs for faster lookup in another application.

The command is only outputting the map features, one per line, with no comma separation between the objects. The bbox etc. are all stripped out.

Improve ndjson-map documentation

I've found ndjson-cli to be really useful in the context of dealing with ElasticSearch records, which bulk import in ndjson format. One task I frequently find myself needing to do (and which might also be relevant for handling ndjson in the context of topojson files) is to add a constant to each record.

I've found you can do this with the following syntax:

cat output.ldj | ndjson-map 'd["@imported-from"] = "file-dump", d'

But it took me a bit of experimenting to figure that out. Might it be worth adding a few notes on syntax to the README, or perhaps the cli --help docs? Happy to contribute if so!

Out of memory with ndjson-reduce. A streaming ndjson-merge for the rescue

I was experiencing an out of memory error when trying to process the US counties shapefile from census.gov with the following workflow:

geo2topo $OUTPUT_OBJECT_NAME=<(\
  ndjson-join  <(shp2json -n $INPUT_SHAPEFILE) <(dbf2json -n $INPUT_DBF) \
  | ndjson-map 'Object.assign(d[0], { properties: d[1] })' \
  | ndjson-map -r d3geo=d3-geo 'd.properties.centroid = d3geo.geoCentroid(d), d' \
  | ndjson-reduce 'p.features.push(d), p' '{type: "FeatureCollection", features: []}' \
) \
| toposimplify -f -s $SIMPLIFICATION_THRESHOLD \
| topoquantize 1e5

ndjson-reduce is obviously not streaming but loading everything in memory and that causes the overflow.

I could work around the issue by replacing the ndjson-reduce step with my own ndjson-merge utility as below:

| ./ndjson-merge '{"type": "FeatureCollection", "features": [' ',' ']}'

This is my implementation of ndjson-merge:

#!/usr/bin/env node
const readline = require('readline');

if (process.argv.length < 5) {
  console.log('Usage: ndjson-merge <prefix> <separator> <suffix>')
  process.exit(1);
}

var prefix = process.argv[2];
var separator = process.argv[3];
var suffix = process.argv[4];

process.stdout.write(prefix);
var count = 0;
readline.createInterface({
  input: process.stdin,
  output: null,
}).on('line', function(line) {
  if (count++ > 0) {
    process.stdout.write(separator);
  }
  process.stdout.write(line);
}).on('close', function() {
  process.stdout.write(suffix);
}).on('error', (err) => {
  console.error(err);
  process.exit(1);
});

Would it make sense to add a command like this one to ndjson-cli? If so, I'd try to put together a proper PR.

Example of ndjson-reduce to make a FeatureCollection.

To make an array:

shp2json -n example.shp | ndjson-reduce 'p.push(d), p' '[]'

To make a FeatureCollection:

shp2json -n example.shp | ndjson-reduce 'p.features.push(d), p' '({type:"FeatureCollection",features:[]})'

The extra parens are only necessary until #1 is fixed.

Require external libraries?

We could allow you to require external libraries. For example, maybe -r d3 does a require("d3") when initializing the context, and a d3 global is made available in the sandbox?

We’d need a way to specify both the module name and the sandbox variable name, since modules like “d3-dsv” aren’t valid symbol names. Rollup does that with -g d3-dsv:d3, but I feel like -r d3=d3-dsv might be another option.

ndjson-split output artifacts in PowerShell

Inspired by the Command Line Cartography series, I started out implementing the same sequence for the state of Texas (this issue relates to step 8 in that sequence). Following this, I decided to automate this process by writing a PowerShell script that generates a choropleth based on a provided FIPS code.

With this in mind, I ran into a pretty big roadblock when trying to generate the NDJSON version of the census data. I've created a GitHub Repo to demonstrate the issue, but here is the short of it:

When calling ndjson-cat "{json-file}" | ndjson-split "d.slice(1)", I'm getting odd output artifacts in PowerShell:

resulting .ndjson

E:\Jaime\Desktop\Work\Projects\ps-mapping-issue>node  "C:\Users\jpsti\AppData\Local\Yarn\Data\global\node_modules\.bin\\..\ndjson-cli\ndjson-split" d.slice(1) 
["1201","01","001","021000"]
["1293","01","001","021100"]
/* additional data removed */
["1880","01","133","965800"]
["834","01","133","965900"]

E:\Jaime\Desktop\Work\Projects\ps-mapping-issue>

I initially realized there was an issue when I would pipe this into the ndjson-map call directly after ndjson-split, and it was providing me with this error:

stdin:1

^
SyntaxError: Unexpected end of JSON input

If I run this from a command prompt instead, the desired output is provided.

The following block works just fine in PowerShell:

shp2json "maps/$($shape)/$($shape).shp" | `
      geoproject "d3.$($state.projection).fitSize([$width, $height], d)" | `
      ndjson-split "d.features" | `
      ndjson-map "d.id = d.properties.GEOID.slice(2), d" > "data/$($state.name).ndjson"

The only thing I can think of that would be different in this scenario is that this call to ndjson-split is simply accessing a property of the data, as opposed to calling a JavaScript Array function.

Error output may be truncated before the error message

Many of the ndjson tools use error reporting code of the following sort – this example is from ndjson-split:

  try {
    sandbox.d = JSON.parse(line);
  } catch (error) {
    console.error("stdin:" + (i + 1));
    console.error(line);
    console.error("^");
    console.error("SyntaxError: " + error.message);
    process.exit(1);
  }

When processing something like GeoJSON, line may contain a substantial amount of data. In that case the process will exit before the output buffer has been fully printed, with the effect that the error output is truncated before the actual error message has been printed.

The documentation for process.exit warns that this can happen:

It is important to note that calling process.exit() will force the process to exit as quickly as possible even if there are still asynchronous operations pending that have not yet completed fully, including I/O operations to process.stdout and process.stderr.

Obviously not printing the error message makes debugging more difficult.
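A sketch of one possible fix (a hypothetical helper, not the tool's actual code): set process.exitCode instead of calling process.exit(), so the process ends only after pending stderr writes have flushed.

```javascript
// Same error report as in the snippet above, but deferring the exit:
// Node exits with this code once all pending I/O completes.
function reportParseError(line, i, message) {
  console.error("stdin:" + (i + 1));
  console.error(line);
  console.error("^");
  console.error("SyntaxError: " + message);
  process.exitCode = 1; // rather than process.exit(1)
}
```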

Add ndjson-split?

Like ndjson-map, but each expression evaluation returns an array of objects, and each object is output separately. Allows you to take a FeatureCollection as input and then split it into a stream of Features, for example, which could be processed and then joined back together with ndjson-reduce.
