Comments (2)
Hello,
Thanks for filing the issue!
> doesn't seem to work with require
Indeed, it won't work with require. It's based on JavaScript modules.
To make it work out of the box in Node.js you can enable ESM. I recommend this if possible.
Otherwise you'd have to resort to transpilation or manual intervention.
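For completeness, a rough sketch of the options, none of which is specific to JsonHilo. ESM is usually enabled in Node.js by setting "type": "module" in package.json or by giving the script an .mjs extension. If the calling code has to stay CommonJS, dynamic import() is one kind of manual intervention that needs no transpilation. The loadJsonHilo name is made up for illustration, and the paths assume the module files sit next to your script:

```javascript
// Dynamic import() is available in CommonJS files too; it returns a promise
// that resolves to the module namespace object.
// Assumption: JsonLow.js and JsonLowToHigh.js are in the same directory.
async function loadJsonHilo() {
  const [{JsonLow}, {JsonLowToHigh}] = await Promise.all([
    import('./JsonLow.js'),
    import('./JsonLowToHigh.js'),
  ])
  return {JsonLow, JsonLowToHigh}
}
```

You would then await loadJsonHilo() once at startup and build the parser stream from the returned functions.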
I suspect you have performance issues because your buf is large and you are converting it to a string before passing it on. That may incur a significant penalty.
A more efficient, low-level way to do this could look like this:
import {JsonLow} from './JsonLow.js'
import {JsonLowToHigh} from './JsonLowToHigh.js'
// this is copied from https://github.com/xtao-org/utf8x2x
// it's a simple low-level library for handling utf-8 byte streams as code points
// which is compatible with JsonHilo; it's not prepared for use with Node.js yet, so
// I copied it in here, so you don't have to figure this out manually
// you can treat it as a black box
const Utf8bs2c = (next) => {
  const {codePoint} = next
  let partialCodePoint = 0
  let bytesRemain = 0
  return {
    bytes: (bytes) => {
      for (const byte of bytes) switch (bytesRemain) {
        case 0:
          if (byte < 128) codePoint(byte)
          else if ((byte >> 5) === 0b110) {
            bytesRemain = 1
            partialCodePoint = (byte & 0b00011111) << 6
          }
          else if ((byte >> 4) === 0b1110) {
            bytesRemain = 2
            partialCodePoint = (byte & 0b00001111) << 12
          }
          else if ((byte >> 3) === 0b11110) {
            bytesRemain = 3
            partialCodePoint = (byte & 0b00000111) << 18
          }
          else {
            throw Error(`Unexpected byte ${byte} (0x${byte.toString(16)} = 0b${byte.toString(2)})!`)
          }
          break
        case 1:
          bytesRemain = 0
          codePoint(partialCodePoint | (byte & 0b00111111))
          break
        case 2:
          bytesRemain = 1
          partialCodePoint |= (byte & 0b00111111) << 6
          break
        case 3:
          bytesRemain = 2
          partialCodePoint |= (byte & 0b00111111) << 12
          break
      }
    },
    end: () => {
      if (bytesRemain > 0) {
        throw Error(`Unexpected end! Expected at least ${bytesRemain} more bytes. Incomplete code point at the end: ${partialCodePoint}.`)
      }
      return next.end?.()
    }
  }
}
// creating a stream which will accept bytes and pass them on to successive parser streams
const stream = Utf8bs2c(JsonLow(JsonLowToHigh({
  openArray: () => {
    // do something
  },
  closeArray: () => {
    // do something
  },
  end() {console.log('ended')}
})))
process.stdin.on('data', (buf) => {
  stream.bytes(buf)
})
process.stdin.on('end', () => {
  stream.end()
})
This sketch script uses the Node.js standard input to receive the utf-8 bytes.
You'd run it like so:
node script.js < bigjsonfile.json
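A side note on why the decoder above keeps partialCodePoint and bytesRemain between calls: standard input arrives in arbitrary chunks, so a multi-byte UTF-8 sequence can be cut in two by a chunk boundary. The sketch below only illustrates that effect with the standard TextEncoder and TextDecoder globals; it is not part of JsonHilo, and the sample input is made up. As far as I know, calling buf.toString() on each chunk separately runs into the same problem.

```javascript
// 'é' encodes to two bytes (0xC3 0xA9); cut the input between them.
const bytes = new TextEncoder().encode('["é"]')
const first = bytes.subarray(0, 3) // 0x5B 0x22 0xC3
const second = bytes.subarray(3)   // 0xA9 0x22 0x5D

// Decoding each chunk on its own replaces the split character with U+FFFD.
const naive = new TextDecoder().decode(first) + new TextDecoder().decode(second)

// A decoder that carries state across chunks recovers the character.
const decoder = new TextDecoder()
const streamed = decoder.decode(first, {stream: true}) + decoder.decode(second)

console.log(naive === streamed) // false
console.log(streamed)           // ["é"]
```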
That's all I am able to explain for now. Hope it's better than nothing. Will try to expand later if you're interested.
In the long run I suspect you might be interested in a higher-level library based on JsonHilo that I've been sketching out:
https://github.com/xtao-org/jsonstrum
It may be sensible for your use case.
If you send me a sample file that is structured the way your input is and tell me what you want the output to look like, I can look into it specifically.
Last thing: you shouldn't call stream.end() inside the event handlers (closeArray in your example).
You only call end to signal to the parser stream that the input is finished. It will then verify that everything is valid and return the result of your handler.
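To make the calling pattern concrete, here is a toy sketch of how end travels through a chain of stages. PassThrough and CountOpenBrackets are hypothetical stand-ins written for this example, not JsonHilo functions; they only mimic the shape of a stage that wraps a next stage and delegates end to it.

```javascript
// Hypothetical middle stage: forwards every code point and delegates `end`,
// so whatever the sink returns from `end` reaches the caller.
const PassThrough = (next) => ({
  codePoint: (c) => next.codePoint(c),
  end: () => next.end?.(),
})

// Hypothetical sink: counts '[' code points and reports the total from `end`.
const CountOpenBrackets = () => {
  let count = 0
  return {
    codePoint: (c) => { if (c === 0x5b) count += 1 },
    end: () => count,
  }
}

const stream = PassThrough(CountOpenBrackets())

// Feed all the input first; no handler calls `end` along the way.
for (const ch of '[[1],[2]]') stream.codePoint(ch.codePointAt(0))

// Call `end` exactly once, after the last piece of input.
const result = stream.end()
console.log(result) // 3
```

The point of the design is that end is the only place where a final result can be produced, so it belongs in the code that knows the input is exhausted (the 'end' event of the input), not in a per-token handler.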
Thanks again for filing the issue and considering the library. It's a motivation to improve.
Cheers
from jsonhilo.
Thank you very much for your good advice! I was actually considering parsing based off of the buffer, but it turns out that buffer.toString is pretty swift (< 10% of the total time). Still, once I work out the kinks in my approach, it could become more significant.
Also, looks like I need to brush up on my module/script loading knowledge.
I can't send anything at the moment but when I get a chance I'll send some scrubbed data and a sample program.
I used SAX about 20 years ago, when I was primarily using Java; it was so much faster than the state-of-the-art DOM loaders at the time.
Nice work!