juanjoDiaz / streamparser-json
Streaming JSON parser in JavaScript for Node.js and the browser.
License: MIT License
Hey @juanjoDiaz,
Is it possible to stream incomplete values? Currently, both onToken and onValue provide only full tokens or values. However, I want to be able to easily access a string value before it is fully complete. That would require an artificial closing quote, generated dynamically, until the real closing quote finally arrives.
Any idea how to do it?
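A user-land sketch of the idea, assuming you accumulate the raw (still escaped) characters of the string as they stream in; previewPartialString is a hypothetical helper, not part of the library:
// Close the partial string artificially so it can be parsed as JSON.
// `raw` is the escaped string content seen so far, without the quotes.
function previewPartialString(raw) {
  // Drop a dangling backslash from an escape sequence that is still incomplete.
  let backslashes = 0;
  while (backslashes < raw.length && raw[raw.length - 1 - backslashes] === '\\') {
    backslashes += 1;
  }
  const end = backslashes % 2 === 1 ? raw.length - 1 : raw.length;
  // Caveat: a partially received \uXXXX escape would also need trimming.
  return JSON.parse('"' + raw.slice(0, end) + '"');
}
console.log(previewPartialString('hello \\t wor')); // 'hello \t wor' with a real tab
console.log(previewPartialString('incomplete \\')); // 'incomplete ' (dangling escape dropped)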
The Tokenizer outputs the wrong offset for tokens that follow a string token containing escaped special characters (e.g. \t, \n). The error in the offset is consistent with the number of such escape sequences in the input string.
This is the expected behaviour:
const streamParser = require('@streamparser/json');

test('testing string 1', async () => {
  const json = JSON.stringify({ "abcd": "abcd" });
  console.log('raw string length: ', json.length);
  const tokenizer = new streamParser.Tokenizer();
  tokenizer.onToken = (token) => console.log(token);
  tokenizer.write(json);
  console.log(json[7]);
});
// raw string length: 15
// { token: 0, value: '{', offset: 0 }
// { token: 9, value: 'abcd', offset: 1 }
// { token: 4, value: ':', offset: 7 } // Using this token as the reference
// { token: 9, value: 'abcd', offset: 8 }
// { token: 1, value: '}', offset: 14 }
// : // We print the expected character
Using a single \t special character:
test('testing string 2', async () => {
  const json = JSON.stringify({ "ab\t": "abcd" });
  console.log('raw string length: ', json.length);
  const tokenizer = new streamParser.Tokenizer();
  tokenizer.onToken = (token) => console.log(token);
  tokenizer.write(json);
  console.log(json[6]);
});
// raw string length: 15 // Same length as above
// { token: 0, value: '{', offset: 0 }
// { token: 9, value: 'ab\t', offset: 1 }
// { token: 4, value: ':', offset: 6 } // Off by 1 now
// { token: 9, value: 'abcd', offset: 7 }
// { token: 1, value: '}', offset: 13 }
// " // This isn't the character we expected
The difference in the expected offset is consistent with the number of escaped special characters:
test('testing string 3', async () => {
  const json = JSON.stringify({ "\t\n": "abcd" });
  console.log('raw string length: ', json.length);
  const tokenizer = new streamParser.Tokenizer();
  tokenizer.onToken = (token) => console.log(token);
  tokenizer.write(json);
  console.log(json[5]);
});
// raw string length: 15 // Same length
// { token: 0, value: '{', offset: 0 }
// { token: 9, value: '\t\n', offset: 1 }
// { token: 4, value: ':', offset: 5 } // Off by 2 now
// { token: 9, value: 'abcd', offset: 6 }
// { token: 1, value: '}', offset: 12 }
// n
My expectation is that the offset should be relative to the raw input. I understand that this is a niche use case, but is this something you can fix?
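For reference, the discrepancy in test 2 can be checked directly against the raw text (a quick sketch, not library code):
const json = JSON.stringify({ 'ab\t': 'abcd' }); // raw text: {"ab\t":"abcd"}
console.log(json.indexOf(':')); // 7 -- the colon's actual offset in the input
// The tokenizer reports 6: it counts the two-character escape sequence \t
// as the single character it decodes to.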
It's documented in the docs as of this commit; however, OptimisticJSONParser is nowhere to be found.
Any plans to materialize it?
I tried to use the package in a Vite project and I get the following error:
[vite] Internal server error: Failed to resolve entry for package "@streamparser/json". The package may have incorrect main/module/exports specified in its package.json.
It seems like the "module" key in package.json points to a non-existent file, ./dist/mjs/index.js.
Great product btw!
Had a problem with onValue not working if I set onToken. I don't think that was intended?
After upgrading to the latest version of all packages, I'm getting this type error:
../../common/temp/node_modules/.pnpm/@types+json2csv__[email protected]/node_modules/@types/json2csv__plainjs/src/StreamParser.d.ts:1:58 - error TS2307: Cannot find module '@streamparser/json/index' or its corresponding type declarations.
1 import { Tokenizer, TokenizerOptions, TokenParser } from '@streamparser/json/index';
~~~~~~~~~~~~~~~~~~~~~~~~~~
Found 1 error in ../../common/temp/node_modules/.pnpm/@types+json2csv__[email protected]/node_modules/@types/json2csv__plainjs/src/StreamParser.d.ts:1
Versions:
"@json2csv/plainjs": "6.1.3",
"@types/json2csv__plainjs": "6.1.0",
"@streamparser/json": "0.0.14"
I know this is probably out of scope for this library, but do you think it is possible to adjust the code to omit nested objects and arrays?
I have a large json object that looks like this:
{
  "cards": [
    {
      "id": 1,
      "name": "Some card name"
    },
    {},
    {}
  ],
  "meta": {
    "updated": "2022-12-31"
  }
}
The cards array is very large, so it won't fit into memory on its own (even when parsing in chunks).
I'd like to get all objects as flat objects that replace nested arrays with "[...]" and nested objects with "{...}".
The result would look like this:
{
  "cards": "[...]",
  "meta": "{...}"
},
{
  "id": 1,
  "name": "Some card name"
},
{},
{},
{
  "updated": "2022-12-31"
}
I'm aware that this is probably out of scope for this repo, but I would like to apply the changes in my fork.
Can you point me in the right direction, or to where those changes would fit best?
Best regards, and thanks a lot for the awesome parser :-)
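For what it's worth, something close to this can be approximated in user land with the paths option, flattening each emitted value before use. A rough sketch against the 0.0.x API; the exact onValue signature and the keepStack option should be verified against your version:
const { JSONParser } = require('@streamparser/json');

// Replace nested containers with placeholders, keeping scalars as-is.
function flatten(value) {
  const flat = {};
  for (const [k, v] of Object.entries(value)) {
    if (Array.isArray(v)) flat[k] = '[...]';
    else if (v !== null && typeof v === 'object') flat[k] = '{...}';
    else flat[k] = v;
  }
  return flat;
}

// Emit each card and the meta object individually so the full cards
// array never has to be materialized at once.
const parser = new JSONParser({ paths: ['$.cards.*', '$.meta'], keepStack: false });
parser.onValue = (value, key, parent, stack) => {
  if (value !== null && typeof value === 'object') console.log(flatten(value));
};
The top-level summary object ({"cards": "[...]", "meta": "{...}"}) would still have to be assembled by hand from the keys seen on the stack, since the emitted children are discarded.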
In our code base, when we try to import JSONParser with import { JSONParser } from '@streamparser/json'; we get the following error: TS2307: Cannot find module '@streamparser/json' or its corresponding type declarations.
Currently, as a workaround, we are importing it with const jsonStreamParsers = require('@streamparser/json'); which works fine.
My question is: do you have any insight into why the usual import is failing? Or is this likely a problem with the configuration of our project in some way?
Thanks.
The tokenizer walks the input buffer one element at a time:
const l = buffer.length;
for (let i = 0; i < l; i += 1) {
Which makes using buffers pretty slow. Follow https://bugs.chromium.org/p/v8/issues/detail?id=7161 and nodejs/node#17431 for possible solutions or workarounds.
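A rough way to see the gap (an illustrative micro-benchmark only; absolute numbers depend heavily on the engine and version):
const buf = new Uint8Array(10_000_000).fill(65);
const str = 'A'.repeat(10_000_000);

console.time('Uint8Array indexing');
let a = 0;
for (let i = 0; i < buf.length; i += 1) a += buf[i];
console.timeEnd('Uint8Array indexing');

console.time('string charCodeAt');
let b = 0;
for (let i = 0; i < str.length; i += 1) b += str.charCodeAt(i);
console.timeEnd('string charCodeAt');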
Thanks so much for making this library!
I was looking for a web-compatible version of https://github.com/node-geojson/geojson-stream, and luckily stream-reading GeoJSON is possible using the paths option.
P.S.: I made a small demo for streaming GeoJSON on Observable: https://observablehq.com/@chrispahm/streaming-geojson
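For anyone landing here, the approach looks roughly like this (a sketch assuming a standard FeatureCollection and the paths option; adjust to your version's API):
const { JSONParser } = require('@streamparser/json');

// Emit each GeoJSON Feature as soon as it is complete, without
// buffering the whole FeatureCollection in memory.
const parser = new JSONParser({ paths: ['$.features.*'] });
parser.onValue = (feature) => {
  console.log(feature.geometry.type, feature.properties);
};

// Feed it chunk by chunk, e.g. from fetch():
// const reader = (await fetch(url)).body.getReader();
// for (let r = await reader.read(); !r.done; r = await reader.read()) parser.write(r.value);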
I just tried migrating from the old json2csv package to the new one, and now I'm getting type errors from within node_modules:
../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenizer.ts:374:11 - error TS7029: Fallthrough case in switch.
374 case TokenizerStates.STRING_UNICODE_DIGIT_4:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenizer.ts:500:11 - error TS7029: Fallthrough case in switch.
500 case TokenizerStates.NUMBER_AFTER_E:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenizer.ts:697:18 - error TS6133: 'parsedToken' is declared but its value is never read.
697 public onToken(parsedToken: ParsedTokenInfo): void {
~~~~~~~~~~~
../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenparser.ts:324:18 - error TS6133: 'parsedElementInfo' is declared but its value is never read.
324 public onValue(parsedElementInfo: ParsedElementInfo): void {
~~~~~~~~~~~~~~~~~
Found 4 errors in 2 files.
Errors Files
3 ../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenizer.ts:374
1 ../../common/temp/node_modules/.pnpm/@[email protected]/node_modules/@streamparser/json/src/tokenparser.ts:324
How do I configure the stream parser to discard whitespace between JSON messages?
I am implementing an RPC system that uses JSONParser to parse incoming binary data into JSON for transmission over RPC. The issue I am facing is that input messages can be separated by a variety of whitespace characters, while the current implementation of separator in the lib only supports a single separator.
As it stands, we have to keep our input streams in the following form: {...message}{...message}.
However, to improve readability, we wish to be able to add whitespace between messages, as demonstrated below.
// Normal separation
{...message}{...message}{...message}
// Spaces
{...message} {...message} {...message}
// New lines
{...message}
{...message}
{...message}
// Any combination
{...message} {...message}
{...message}
{...message}
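One possible user-land workaround until the library supports this: normalize the stream before writing it to the parser, dropping whitespace that occurs between top-level values. A sketch that assumes messages are objects or arrays; it tracks string and escape state so whitespace inside strings is preserved:
function normalizeSeparators(chunk, state) {
  // state is shared across chunks: { depth: 0, inString: false, escaped: false }
  let out = '';
  for (const ch of chunk) {
    if (state.inString) {
      out += ch;
      if (state.escaped) state.escaped = false;
      else if (ch === '\\') state.escaped = true;
      else if (ch === '"') state.inString = false;
    } else if (state.depth === 0 && /\s/.test(ch)) {
      // Whitespace between messages: drop it.
    } else {
      out += ch;
      if (ch === '"') state.inString = true;
      else if (ch === '{' || ch === '[') state.depth += 1;
      else if (ch === '}' || ch === ']') state.depth -= 1;
    }
  }
  return out;
}

// const state = { depth: 0, inString: false, escaped: false };
// parser.write(normalizeSeparators(chunk, state));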
I ran into an issue where the tokenizer chokes on files with a BOM. This throws with Error: Unexpected "ï" at position "0" in state START.
I was able to patch the tokenizer with a quick-and-dirty addition of TokenizerStates.BOM. Unfortunately I don't have time to submit a formal PR, but I wanted to raise the issue for tracking.
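Until the tokenizer handles this natively, a workaround is to strip the BOM from the first chunk before writing it. A sketch covering the UTF-8 BOM for both string and byte inputs:
// UTF-8 BOM: U+FEFF as a string, or the bytes 0xEF 0xBB 0xBF.
function stripBom(chunk) {
  if (typeof chunk === 'string') {
    return chunk.charCodeAt(0) === 0xfeff ? chunk.slice(1) : chunk;
  }
  if (chunk[0] === 0xef && chunk[1] === 0xbb && chunk[2] === 0xbf) {
    return chunk.subarray(3);
  }
  return chunk;
}

// Apply only to the very first chunk of the stream:
// tokenizer.write(stripBom(firstChunk));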
A project that includes this package doesn't compile if the 'noFallthroughCasesInSwitch' flag is set to 'true' in tsconfig.json.
Hey @juanjoDiaz ,
Could the Tokenizer be extended to keep track of the jsonPath of each emitted object?
Something like this:
jsonparser.onValue = (value, key, parent, stack, jsonPath) => {
  console.log(jsonPath);
  // e.g. ['someProp', 0, 'someProp', ...]
};
What would be the right place to look at?
Thanks for this awesome parser!
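In the meantime, something similar can be approximated from the existing callback arguments, assuming each stack frame exposes the key under which its value is stored (worth double-checking against the typings):
jsonparser.onValue = (value, key, parent, stack) => {
  // Skip the root frame, collect the key of each enclosing container,
  // then append the key of the current value.
  const jsonPath = stack.slice(1).map((frame) => frame.key);
  if (key !== undefined) jsonPath.push(key);
  console.log(jsonPath); // e.g. ['someProp', 0, 'someProp']
};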