juanjoDiaz / streamparser-json
Streaming JSON parser in JavaScript for Node.js and the browser.
License: MIT License
Hey @juanjoDiaz,
Is it possible to stream incomplete values? Currently, both onToken and onValue provide only full tokens or values. However, I want to be able to easily access a string value before it is fully complete. That would require an artificial closing quote, generated dynamically, until the real closing quote finally arrives.
Any idea how to do it?
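A user-land sketch of the idea, assuming you accumulate the raw (still escaped) characters of the string as they stream in; previewPartialString is a hypothetical helper, not part of the library:
// Close the partial string artificially so it can be parsed as JSON.
// `raw` is the escaped string content seen so far, without the quotes.
function previewPartialString(raw) {
  // Drop a dangling backslash from an escape sequence that is still incomplete.
  let backslashes = 0;
  while (backslashes < raw.length && raw[raw.length - 1 - backslashes] === '\\') {
    backslashes += 1;
  }
  const end = backslashes % 2 === 1 ? raw.length - 1 : raw.length;
  // Caveat: a partially received \uXXXX escape would also need trimming.
  return JSON.parse('"' + raw.slice(0, end) + '"');
}
console.log(previewPartialString('hello \\t wor')); // 'hello \t wor' with a real tab
console.log(previewPartialString('incomplete \\')); // 'incomplete ' (dangling escape dropped)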
The Tokenizer outputs the wrong offset for tokens that follow a string token containing escaped special characters (e.g. \t, \n). The error in the offset is consistent with the number of such escape sequences in the input string.
This is the expected behaviour:
const streamParser = require('@streamparser/json');

test('testing string 1', async () => {
  const json = JSON.stringify({ "abcd": "abcd" });
  console.log('raw string length: ', json.length);
  const tokenizer = new streamParser.Tokenizer();
  tokenizer.onToken = (token) => console.log(token);
  tokenizer.write(json);
  console.log(json[7]);
});
// raw string length: 15
// { token: 0, value: '{', offset: 0 }
// { token: 9, value: 'abcd', offset: 1 }
// { token: 4, value: ':', offset: 7 } // Using this token as the reference
// { token: 9, value: 'abcd', offset: 8 }
// { token: 1, value: '}', offset: 14 }
// : // We print the expected character
Using a single \t special character:
test('testing string 2', async () => {
  const json = JSON.stringify({ "ab\t": "abcd" });
  console.log('raw string length: ', json.length);
  const tokenizer = new streamParser.Tokenizer();
  tokenizer.onToken = (token) => console.log(token);
  tokenizer.write(json);
  console.log(json[6]);
});
// raw string length: 15 // Same length as above
// { token: 0, value: '{', offset: 0 }
// { token: 9, value: 'ab\t', offset: 1 }
// { token: 4, value: ':', offset: 6 } // Off by 1 now
// { token: 9, value: 'abcd', offset: 7 }
// { token: 1, value: '}', offset: 13 }
// " // This isn't the character we expected
The difference in the expected offset is consistent with the number of escaped special characters:
test('testing string 3', async () => {
  const json = JSON.stringify({ "\t\n": "abcd" });
  console.log('raw string length: ', json.length);
  const tokenizer = new streamParser.Tokenizer();
  tokenizer.onToken = (token) => console.log(token);
  tokenizer.write(json);
  console.log(json[5]);
});
// raw string length: 15 // Same length
// { token: 0, value: '{', offset: 0 }
// { token: 9, value: '\t\n', offset: 1 }
// { token: 4, value: ':', offset: 5 } // Off by 2 now
// { token: 9, value: 'abcd', offset: 6 }
// { token: 1, value: '}', offset: 12 }
// n
My expectation is that the offset should be relative to the raw input. I understand that this is a niche use case, but is this something you can fix?
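For reference, the discrepancy in test 2 can be checked directly against the raw text (a quick sketch, not library code):
const json = JSON.stringify({ 'ab\t': 'abcd' }); // raw text: {"ab\t":"abcd"}
console.log(json.indexOf(':')); // 7 -- the colon's actual offset in the input
// The tokenizer reports 6: it counts the two-character escape sequence \t
// as the single character it decodes to.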
It's documented in the docs as of this commit; however, OptimisticJSONParser is nowhere to be found.
Any plans to materialize it?
I tried to use the package in a Vite project and I get the following error:
[vite] Internal server error: Failed to resolve entry for package "@streamparser/json". The package may have incorrect main/module/exports specified in its package.json.
It seems like the "module" key in package.json points to a non-existent file, ./dist/mjs/index.js.
Great product btw!
Had a problem with onValue not working if I set onToken. I don't think that was intended?
After upgrading to the latest version of all packages, I'm getting this type error:
../../common/temp/node_modules/.pnpm/@types+json2csv__[email protected]/node_modules/@types/json2csv__plainjs/src/StreamParser.d.ts:1:58 - error TS2307: Cannot find module '@streamparser/json/index' or its corresponding type declarations.
1 import { Tokenizer, TokenizerOptions, TokenParser } from '@streamparser/json/index';
~~~~~~~~~~~~~~~~~~~~~~~~~~
Found 1 error in ../../common/temp/node_modules/.pnpm/@types+json2csv__[email protected]/node_modules/@types/json2csv__plainjs/src/StreamParser.d.ts:1
Versions:
"@json2csv/plainjs": "6.1.3",
"@types/json2csv__plainjs": "6.1.0",
"@streamparser/json": "0.0.14"
I know this is probably out of scope for this library, but do you think it is possible to adjust the code to omit nested objects and arrays?
I have a large json object that looks like this:
{
  "cards": [
    {
      "id": 1,
      "name": "Some card name"
    },
    {},
    {}
  ],
  "meta": {
    "updated": "2022-12-31"
  }
}
The cards array is very large, so it won't fit into memory on its own (even when parsing in chunks).
I'd like to get all objects as flat objects that replace nested arrays with "[...]" and nested objects with "{...}".
The result would look like this:
{
  "cards": "[...]",
  "meta": "{...}"
},
{
  "id": 1,
  "name": "Some card name"
},
{},
{},
{
  "updated": "2022-12-31"
}
I'm aware that this is probably out of scope for this repo, but I would like to apply the changes in my fork.
Can you point me in the right direction, or to where those changes would fit best?
Best regards, and thanks a lot for the awesome parser :-)
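For what it's worth, something close to this can be approximated in user land with the paths option, flattening each emitted value before use. A rough sketch against the 0.0.x API; the exact onValue signature and the keepStack option should be verified against your version:
const { JSONParser } = require('@streamparser/json');

// Replace nested containers with placeholders, keeping scalars as-is.
function flatten(value) {
  const flat = {};
  for (const [k, v] of Object.entries(value)) {
    if (Array.isArray(v)) flat[k] = '[...]';
    else if (v !== null && typeof v === 'object') flat[k] = '{...}';
    else flat[k] = v;
  }
  return flat;
}

// Emit each card and the meta object individually so the full cards
// array never has to be materialized at once.
const parser = new JSONParser({ paths: ['$.cards.*', '$.meta'], keepStack: false });
parser.onValue = (value, key, parent, stack) => {
  if (value !== null && typeof value === 'object') console.log(flatten(value));
};
The top-level summary object ({"cards": "[...]", "meta": "{...}"}) would still have to be assembled by hand from the keys seen on the stack, since the emitted children are discarded.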
In our code base, when we try to import JSONParser with import { JSONParser } from '@streamparser/json'; we get the following error: TS2307: Cannot find module '@streamparser/json' or its corresponding type declarations.
Currently, as a workaround, we are importing it with const jsonStreamParsers = require('@streamparser/json'); which works fine.
My question is: do you have any insight into why the usual import is failing? Or is this likely a problem with the configuration of our project in some way?
Thanks.
The tokenizer walks the input buffer one element at a time:
const l = buffer.length;
for (let i = 0; i < l; i += 1) {
Which makes using buffers pretty slow. Follow https://bugs.chromium.org/p/v8/issues/detail?id=7161 and nodejs/node#17431 for possible solutions or workarounds.
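A rough way to see the gap (an illustrative micro-benchmark only; absolute numbers depend heavily on the engine and version):
const buf = new Uint8Array(10_000_000).fill(65);
const str = 'A'.repeat(10_000_000);

console.time('Uint8Array indexing');
let a = 0;
for (let i = 0; i < buf.length; i += 1) a += buf[i];
console.timeEnd('Uint8Array indexing');

console.time('string charCodeAt');
let b = 0;
for (let i = 0; i < str.length; i += 1) b += str.charCodeAt(i);
console.timeEnd('string charCodeAt');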
Thanks so much for making this library!
I was looking for a web-compatible version of https://github.com/node-geojson/geojson-stream, and luckily stream-reading GeoJSON is possible using the paths option.
P.S.: I made a small demo for streaming GeoJSON on Observable: https://observablehq.com/@chrispahm/streaming-geojson
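For anyone landing here, the approach looks roughly like this (a sketch assuming a standard FeatureCollection and the paths option; adjust to your version's API):
const { JSONParser } = require('@streamparser/json');

// Emit each GeoJSON Feature as soon as it is complete, without
// buffering the whole FeatureCollection in memory.
const parser = new JSONParser({ paths: ['$.features.*'] });
parser.onValue = (feature) => {
  console.log(feature.geometry.type, feature.properties);
};

// Feed it chunk by chunk, e.g. from fetch():
// const reader = (await fetch(url)).body.getReader();
// for (let r = await reader.read(); !r.done; r = await reader.read()) parser.write(r.value);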
I just tried migrating from the old json2csv package to the new one, and now I'm getting type errors from within node_modules:
../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenizer.ts:374:11 - error TS7029: Fallthrough case in switch.
374 case TokenizerStates.STRING_UNICODE_DIGIT_4:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenizer.ts:500:11 - error TS7029: Fallthrough case in switch.
500 case TokenizerStates.NUMBER_AFTER_E:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenizer.ts:697:18 - error TS6133: 'parsedToken' is declared but its value is never read.
697 public onToken(parsedToken: ParsedTokenInfo): void {
~~~~~~~~~~~
../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenparser.ts:324:18 - error TS6133: 'parsedElementInfo' is declared but its value is never read.
324 public onValue(parsedElementInfo: ParsedElementInfo): void {
~~~~~~~~~~~~~~~~~
Found 4 errors in 2 files.
Errors Files
3 ../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenizer.ts:374
1 ../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenparser.ts:324
How do I configure the stream parser to discard whitespace between JSON messages?
I am implementing an RPC system that uses JSONParser to parse incoming binary data into JSON for transmission over RPC. The issue I am facing is that input messages can be separated by a variety of whitespace characters, while the current implementation of separator in the lib only supports a single separator.
As it stands, we have to keep our input streams in the following form: {...message}{...message}.
However, to improve readability, we wish to be able to add whitespace between messages, as demonstrated below.
// Normal separation
{...message}{...message}{...message}
// Spaces
{...message} {...message} {...message}
// New lines
{...message}
{...message}
{...message}
// Any combination
{...message} {...message}
{...message}
{...message}
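One possible user-land workaround until the library supports this: normalize the stream before writing it to the parser, dropping whitespace that occurs between top-level values. A sketch that assumes messages are objects or arrays; it tracks string and escape state so whitespace inside strings is preserved:
function normalizeSeparators(chunk, state) {
  // state is shared across chunks: { depth: 0, inString: false, escaped: false }
  let out = '';
  for (const ch of chunk) {
    if (state.inString) {
      out += ch;
      if (state.escaped) state.escaped = false;
      else if (ch === '\\') state.escaped = true;
      else if (ch === '"') state.inString = false;
    } else if (state.depth === 0 && /\s/.test(ch)) {
      // Whitespace between messages: drop it.
    } else {
      out += ch;
      if (ch === '"') state.inString = true;
      else if (ch === '{' || ch === '[') state.depth += 1;
      else if (ch === '}' || ch === ']') state.depth -= 1;
    }
  }
  return out;
}

// const state = { depth: 0, inString: false, escaped: false };
// parser.write(normalizeSeparators(chunk, state));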
I ran into an issue where the tokenizer chokes on files with a BOM. This throws with Error: Unexpected "ï" at position "0" in state START.
I was able to patch the tokenizer with a quick-and-dirty addition of TokenizerStates.BOM. Unfortunately I don't have time to submit a formal PR, but I wanted to raise the issue for tracking.
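Until the tokenizer handles this natively, a workaround is to strip the BOM from the first chunk before writing it. A sketch covering the UTF-8 BOM for both string and byte inputs:
// UTF-8 BOM: U+FEFF as a string, or the bytes 0xEF 0xBB 0xBF.
function stripBom(chunk) {
  if (typeof chunk === 'string') {
    return chunk.charCodeAt(0) === 0xfeff ? chunk.slice(1) : chunk;
  }
  if (chunk[0] === 0xef && chunk[1] === 0xbb && chunk[2] === 0xbf) {
    return chunk.subarray(3);
  }
  return chunk;
}

// Apply only to the very first chunk of the stream:
// tokenizer.write(stripBom(firstChunk));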
A project that includes this package doesn't compile if the 'noFallthroughCasesInSwitch' flag is set to 'true' in tsconfig.json.
Hey @juanjoDiaz ,
Could the Tokenizer be extended to keep track of the jsonPath of each emitted object?
Something like this:
jsonparser.onValue = (value, key, parent, stack, jsonPath) => {
  console.log(jsonPath);
  // e.g. ['someProp', 0, 'someProp', ...]
};
What would be the right place to look at?
Thanks for this awesome parser!
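In the meantime, something similar can be approximated from the existing callback arguments, assuming each stack frame exposes the key under which its value is stored (worth double-checking against the typings):
jsonparser.onValue = (value, key, parent, stack) => {
  // Skip the root frame, collect the key of each enclosing container,
  // then append the key of the current value.
  const jsonPath = stack.slice(1).map((frame) => frame.key);
  if (key !== undefined) jsonPath.push(key);
  console.log(jsonPath); // e.g. ['someProp', 0, 'someProp']
};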