Giter Site home page Giter Site logo

Comments (2)

djedr avatar djedr commented on May 27, 2024

Hello,
Thanks for filing the issue!

doesn't seem to work with require

Indeed, it won't work with require. It's based on JavaScript modules.

To make it work out of the box in Node.js you can enable ESM. I recommend this if possible.

Otherwise you'd have to resort to transpilation or manual intervention.

I suspect you have performance issues because your buf is large and you are converting it to string before passing it on. That may incur a significant penalty.

An optimal low level way to do this instead could be like this:

import {JsonLow} from './JsonLow.js'
import {JsonLowToHigh} from './JsonLowToHigh.js'

// this is copied from https://github.com/xtao-org/utf8x2x
// it's a simple low-level library for handling utf-8 byte streams as code points
// which is compatible with JsonHilo; it's not prepared for use with Node.js yet, so
// I copied it in here, so you don't have to figure this out manually 
// you can treat it as a black box
const Utf8bs2c = (next) => {
  const {codePoint} = next

  let partialCodePoint = 0
  let bytesRemain = 0

  return {
    bytes: (bytes) => {
      for (const byte of bytes) switch (bytesRemain) {
      case 0:
        if (byte < 128) codePoint(byte)
        else if ((byte >> 5) === 0b110) {
          bytesRemain = 1
          partialCodePoint = (byte & 0b00011111) << 6
        }
        else if ((byte >> 4) === 0b1110) {
          bytesRemain = 2
          partialCodePoint = (byte & 0b00001111) << 12
        }
        else if ((byte >> 3) === 0b11110) {
          bytesRemain = 3
          partialCodePoint = (byte & 0b00000111) << 18
        }
        else {
          throw Error(`Unexpected byte ${byte} (0x${byte.toString(16)} = 0b${byte.toString(2)})!`)
        }
      break
      case 1:
        bytesRemain = 0
        codePoint(partialCodePoint | (byte & 0b00111111))
      break
      case 2:
        bytesRemain = 1
        partialCodePoint |= (byte & 0b00111111) << 6
      break
      case 3:
        bytesRemain = 2
        partialCodePoint |= (byte & 0b00111111) << 12
      break
      }
    },
    end: () => {
      if (bytesRemain > 0) {
        throw Error(`Unexpected end! Expected at least ${bytesRemain} more bytes. Incomplete code point at the end: ${partialCodePoint}.`)
      }
      return next.end?.()
    }
  }
}

// creating a stream which will accept bytes and pass them on to successive parser streams
const stream = Utf8bs2c(JsonLow(JsonLowToHigh({
  openArray: () => {
    // do something
  },
  closeArray: () => {
    // do something
  },
  end() {console.log('ended')}
})))

process.stdin.on('data', (buf) => { 
  stream.bytes(buf)
})
process.stdin.on('end', () => {
  stream.end()
})

This sketch script uses the Node.js standard input to receive the utf-8 bytes.

You'd run it like so:

node script.js < bigjsonfile.json

That's all I am able to explain for now. Hope it's better than nothing. Will try to expand later if you're interested.

In the long run I suspect you might be interested by a higher-level library based on JsonHilo I've been sketching out:

https://github.com/xtao-org/jsonstrum

It may be sensible for your use case.

If you send me a sample file which is structured the way your input is and tell me how you want the output to look like I can look into it specifically.

Last thing: you shouldn't call stream.end() inside the event handlers (closeArray in your example).

You only call end to signal to the parser stream that the input is finished. It will then verify that everything is valid and return the result of your handler.

Thanks again for filing the issue and considering the library. It's a motivation to improve.

Cheers

from jsonhilo.

iklotzko avatar iklotzko commented on May 27, 2024

@djedr,

Thank you very much for your good advice! I was actually considering parsing based off of the buffer, but it turns out the buffer.toString is pretty swift (< 10% of the total time) but still once I work out the kinks here in my approach could become more significant.

Also, looks like I need to brush up on my module/script loading knowledge.

I can't send anything at the moment but when I get a chance I'll send some scrubbed data and a sample program.

I used sax about 20 years ago when I was using Java primarily, it was so much faster than the state of the art DOM loaders at the time.

Nice work!

from jsonhilo.

Related Issues (1)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.