Comments (16)

inikulin commented on September 9, 2024

Yes, parse5 loads the HTML entities data into memory, and this is a requirement of the HTML5 parsing algorithm. BTW it's 2015, how can 30 MB of RAM usage be called "excessive" (especially for a sandboxed VM app)? Are you running it on an Apple Watch or something? =)

from parse5.

Sebmaster commented on September 9, 2024

@inikulin Do you have the script for the generation of the trie still lying around/could you put it in the repo? I was thinking of playing with the structure to try and reduce the memory requirement somewhat.

inikulin commented on September 9, 2024

@Sebmaster Seems like the generator died with my previous desktop. I'm ultra-busy right now, give me a couple of days and I'll roll out a new one. The only optimization idea I have is to keep the trie in JSON form, so v8 will not generate an AST representation for it. But I'm not sure we will win a lot with this approach. Moreover, I have concerns about the browserified version: as far as I know there is no easy way to mock FS with browserify.

Sebmaster commented on September 9, 2024

I was thinking of trying to either move it to an array-only structure or try a trie compacting algorithm (although that'd slow down lookups a tiny bit probably).
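
To illustrate what an "array-only structure" could look like, here is a minimal sketch that flattens an object-based trie into a single `Int32Array`. It is not parse5's actual format: the `ENTITIES` sample, the node layout, and the function names `buildObjectTrie`, `flatten`, and `lookup` are all assumptions made for this example.

```javascript
// Hypothetical sample; the real named character reference table is far larger.
const ENTITIES = { "amp;": "&", "amp": "&", "lt;": "<", "lt": "<", "gt;": ">", "gt": ">" };

// Step 1: build an ordinary object-based trie.
function buildObjectTrie(entries) {
  const root = { children: new Map(), value: null };
  for (const [name, value] of Object.entries(entries)) {
    let node = root;
    for (const ch of name) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), value: null });
      }
      node = node.children.get(ch);
    }
    node.value = value;
  }
  return root;
}

// Step 2: flatten it into one integer array.
// Assumed node layout: [valueIndex or -1, childCount, code1, offset1, code2, offset2, ...]
function flatten(root) {
  const data = [];
  const values = [];
  (function write(node) {
    const start = data.length;
    data.push(node.value === null ? -1 : values.push(node.value) - 1);
    const kids = [...node.children].sort((a, b) => a[0].charCodeAt(0) - b[0].charCodeAt(0));
    data.push(kids.length);
    const slots = data.length;
    for (const [ch] of kids) data.push(ch.charCodeAt(0), 0); // offset is patched below
    kids.forEach(([, child], i) => {
      data[slots + 2 * i + 1] = write(child);
    });
    return start;
  })(root);
  return { data: Int32Array.from(data), values };
}

// Step 3: look a name up by walking offsets instead of object references.
function lookup(flat, name) {
  const { data, values } = flat;
  let pos = 0; // the root node is written first, at offset 0
  for (let i = 0; i < name.length; i++) {
    const code = name.charCodeAt(i);
    const count = data[pos + 1];
    let next = -1;
    for (let k = 0; k < count; k++) {
      if (data[pos + 2 + 2 * k] === code) {
        next = data[pos + 3 + 2 * k];
        break;
      }
    }
    if (next === -1) return undefined;
    pos = next;
  }
  return data[pos] === -1 ? undefined : values[data[pos]];
}

const flat = flatten(buildObjectTrie(ENTITIES));
console.log(lookup(flat, "amp;"), lookup(flat, "nope"));
```

The linear scan over child slots is the simplest choice here; since the children are written in sorted order, a binary search would also work. Whether either variant is faster or smaller than nested objects in practice would have to be measured.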

inikulin commented on September 9, 2024

Yeah and this is not an option. CPU performance is the priority.

 avatar commented on September 9, 2024

What if CPU load increases by less than 1% but RAM usage decreases by over 80%? Would you still refuse to implement a theoretical change with proportions like those?

inikulin commented on September 9, 2024

> What if CPU load increases by less than 1% but RAM usage decreases by over 80%? Would you still refuse to implement a theoretical change with proportions like those?

No. Let's look at the problem using big O notation. The trie is constructed once at startup and never modified, so in our case its space complexity is constant: O(1). Currently the time complexity of a trie lookup is O(m), where m is the average length of the word. If the time increases by 1%, we get O(1.01m). Meanwhile, if we decrease the trie size by 80%, we still get constant space complexity: O(1). Put plainly, we gain nothing but lose speed. A constant 22.5 MB of consumption (7.5 MB is consumed by the runtime itself for me) doesn't seem like a big deal to me nowadays.

domenic commented on September 9, 2024

O(1.01m) = O(m)
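
Spelled out, this follows directly from the definition of big O, which only requires a bound up to some constant factor $c$:

$$
1.01\,m \le c \cdot m \quad \text{for all } m \ge 1, \text{ with } c = 1.01 \quad\Longrightarrow\quad O(1.01\,m) = O(m)
$$

So the notation treats a 1% slowdown the same as no slowdown at all, and by itself it cannot settle whether the trade-off is worthwhile.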

inikulin commented on September 9, 2024

@domenic Argh, yes, my bad, up to constants. Never do math in the morning =0.

inikulin commented on September 9, 2024

Taking my wrong math into account, it's worth a try.

inikulin commented on September 9, 2024

@Sebmaster Here is the trie generator https://github.com/inikulin/parse5/tree/master/tools

parse5 bootstrap consumed 15Mb for me, BTW

inikulin commented on September 9, 2024

@Sebmaster any progress on this?
@domenic Do you have any complaints about excessive RAM usage in jsdom? I mean, is it even worth the discussion and effort?

domenic commented on September 9, 2024

I have no complaints, but one of our users does, so if that user (or someone else) wants to do a PR that helps, and you're willing to review it, it seems like it would be a nice thing to do.

inikulin commented on September 9, 2024

I'm trying to figure out what we want to accomplish here. I mean, what memory usage can be considered non-excessive, and whether the current memory consumption causes any real-life problems (actually it does, but only in quite exotic environments - #54 ).

 avatar commented on September 9, 2024

As small as possible without compromising the main objective, that is performance.

inikulin commented on September 9, 2024

Well, the most significant part of parse5's memory footprint comes from the named entities trie. The only optimization that comes to mind is to replace it with a Patricia tree. But I'm quite sceptical about it: it might work well with long suffixes, but that's not the case for the named entities. I don't think we can win more than 10-20%. I'm wondering if it's even worth a try.
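
One way to test that estimate before committing to a rewrite would be to count nodes in a plain trie and in its path-compressed (Patricia-style) form. The sketch below does that for a tiny hypothetical sample of names; `SAMPLE`, `buildTrie`, `compress`, and `countNodes` are illustrative names, not parse5 code, and node count is only a rough proxy for real memory use.

```javascript
// Hypothetical sample of entity names; swap in the full list to get a real estimate.
const SAMPLE = ["amp", "amp;", "lt", "lt;", "gt", "gt;", "Aacute;", "aacute;"];

// Plain trie: one node per character.
function buildTrie(words) {
  const root = { children: {}, terminal: false };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.children[ch]) node.children[ch] = { children: {}, terminal: false };
      node = node.children[ch];
    }
    node.terminal = true;
  }
  return root;
}

// Path compression: merge chains of non-terminal, single-child nodes
// into one edge labelled with the whole substring.
function compress(node) {
  const out = { children: {}, terminal: node.terminal };
  for (const [ch, child] of Object.entries(node.children)) {
    let label = ch;
    let cur = child;
    let keys = Object.keys(cur.children);
    while (!cur.terminal && keys.length === 1) {
      label += keys[0];
      cur = cur.children[keys[0]];
      keys = Object.keys(cur.children);
    }
    out.children[label] = compress(cur);
  }
  return out;
}

// Count every node, including the root.
function countNodes(node) {
  let total = 1;
  for (const child of Object.values(node.children)) total += countNodes(child);
  return total;
}

const plain = buildTrie(SAMPLE);
const compressed = compress(plain);
console.log("plain nodes:", countNodes(plain));
console.log("compressed nodes:", countNodes(compressed));
```

The ratio on a small sample says little about the full table, where names share many short prefixes; running the same count over the complete entity list would show whether the saving is in the 10-20% range or not.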