Comments (25)
Well, it used to be this way, for zips at least; tar has to know the length beforehand. I had it collect because it simplifies the code for file queuing.
Really, I'm not sure how much difference it makes, because we need to feed ourselves in order. So if the stream is slow anyway, it will hold things up just as much as collecting does. The collector just makes a nice wrapper around the process.
from node-archiver.
It makes no difference if you want the zip in memory, but it should if you pipe it to another stream. If you collect, you send the destination stream one huge block of data, but if you stream it, you're sending little pieces. With the latter approach, the underlying stream should cope better.
It should also make a difference in memory usage (i.e. you don't have to hold the data in memory; you can feed it downstream in pieces).
At least that's what makes sense to me. I'll give this a shot sometime.
True about memory. I want to write some benchmarks in the near future to test with various data sizes.
With readable-stream, we are basically moving the emitting part out of archiver's realm. It simply pushes data into an internal buffer that readable-stream manages: a low watermark asks for data when needed, and a high watermark tells it when to emit. The internal push method is what we use to fill the buffer when we have data. This is a little different from how it was envisioned, I think, but since we're working with variable input sources, that's the way I've structured it to work.
I don't see how readable-stream's internal buffer is a solution to the first issue.
The way I think of it,
archive.pipe(net.createConnection(...))
will be faster if you send the net stream smaller chunks, and sooner, than if you deliver one large blob later.
Faster only for zips; it would be interesting to see benchmarks. I've mainly worked with fs and haven't done much testing with net.
It actually reads 16 KB chunks per read by default, my bad, but it looks like it'll read the whole thing if pipe doesn't tell it how much it wants. That's just the new streams2 design.
Also, since archiver is based on readable-stream, advanced users can control the buffer, watermarks, pipe sizes, etc. I do need to further test some parts of the concept though.
https://github.com/isaacs/readable-stream#new-readableoptions
Also, in regards to speed and piping: it's really controlled by the user, because there can be huge gaps between finishing reading one stream and when the user adds the next or calls finalize. That's one of the more complex parts, because most streams rely on one source; we are usually waiting on multiple sources, and for the user to define them.
Next time I'm in the code, I'll see what I can do for zip and pushing data earlier. I just think it's going to add complexity that I was trying to avoid by taking emitting out of each archiver sub-class.
Thanks for the feedback so far, much appreciated. I do think this new structure is for the best, and the fact that it lets us be v0.10-ready while supporting v0.6, v0.8, and v0.9 is a plus. I just got so tired of managing the emit queue.
OK, zip should now push at the first chance.
Was this reverted recently in 0b6d0fe due to #16?
I'm not saying it shouldn't be fixed. I'm saying it should be fixed properly. Reverting this feature so it "just works" is no solution.
To illustrate the difference it makes:
$ git checkout 0b6d0f
$ node benchmark/simple/pack-zip.js ~/Videos/joyent\ api.mp4
min: 71.66 mbits/s
avg: 115.61 mbits/s
max: 157.12 mbits/s
median: 133.36 mbits/s
$ git checkout HEAD^
$ node benchmark/simple/pack-zip.js ~/Videos/joyent\ api.mp4
min: 199.04 mbits/s
avg: 230.04 mbits/s
max: 246.55 mbits/s
median: 230.81 mbits/s
Where joyent api.mp4 is a 200 MB file. Of course, memory usage rises to 200+ MB, compared to a constant 50 MB with the latter.
Did you figure out what the issue with #16 was? Let's fix it properly.
Nope; it's most likely the compression level. The issue was that the way the zlib helper worked with callbacks, it was pushing data out of order, versus the old way where we listen on zlib's data events, which are always in order.
@danmilon I've added the ability to specify the compression level via a 2nd arg.
Compression levels.
#define Z_NO_COMPRESSION 0
#define Z_BEST_SPEED 1
#define Z_BEST_COMPRESSION 9
#define Z_DEFAULT_COMPRESSION (-1)
What's interesting is that I default to best speed (1), so maybe it's just that it handles the data differently from the helper functions (which we used before), although those should just be making a new instance internally (see below), so I'm not sure why there's such a drop.
Convenience Methods
All of these take a string or buffer as the first argument, and call the supplied callback with callback(error, result). The compression/decompression engine is created using the default settings in all convenience methods.
I was happy with the helper methods, even though they give a little less control; it's just that, depending on the data received from the stream, the callbacks would get fired out of order.
It could also be that we are writing data to zlib, which has its own buffer (chunkSize, default 16*1024) that fills before it triggers a data emit. That extra step could very well be slowing things down, as I'm not sure whether node tweaks the buffering when using the callback helpers.
Putting aside the fact that it's slower, my point is that it drains the whole stream into memory before processing it. The change is in 0b6d0fe, as shown above. This should be reopened, IMO.
@danmilon If you have a solution that works around the out-of-order callbacks, I'd be more than open to investigating the newer way.
trying to find the cause of the regression. I'll keep you posted.
I think it had to do with some data taking longer to compress, so its callback wouldn't get fired in order.
EDIT: I believe the only way around it would be to keep a buffer from the stream and have the zlib callback emit the next chunk in order, versus relying on data events.
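A minimal sketch of that re-ordering buffer (hypothetical names, not what archiver shipped): tag each chunk with a sequence number before compression, stash completions as they arrive, and only emit consecutive chunks.

```javascript
'use strict';

// Hypothetical re-ordering buffer: compression callbacks may complete
// out of order, but chunks are emitted strictly by sequence number.
class OrderedEmitter {
  constructor(onChunk) {
    this.onChunk = onChunk;
    this.next = 0;            // next sequence number to emit
    this.pending = new Map(); // completed-but-not-yet-emitted chunks
  }
  complete(seq, chunk) {
    this.pending.set(seq, chunk);
    // Flush every consecutive chunk starting at `next`.
    while (this.pending.has(this.next)) {
      this.onChunk(this.pending.get(this.next));
      this.pending.delete(this.next);
      this.next++;
    }
  }
}
```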
@danmilon Would you try out ea88dac and let me know if it makes much difference?
Actually, make that 9f0ba92.
Same. Here's an idea: why don't we pipe source to deflate (automatic back-pressure) and then listen with deflate.on('data', ...)? Seems like we're reinventing .pipe() here.
Hmm, are you sure the performance hit is actually coming from the deflate change? Because what I did in that commit definitely should have resolved the perf issues. The pipe approach may work, hmm.
EDIT: OK, so pipe does work, from an initial test, but we need the raw data from the stream to compute the checksum.
I definitely saw a speed improvement with pipe on a 100 MB file, but as soon as I listened for the raw data to compute the checksum, it dropped by half:
min: 0.00 mbits/s
avg: 5.76 mbits/s
max: 10.74 mbits/s
median: 9.93 mbits/s
vs
min: 0.00 mbits/s
avg: 3.58 mbits/s
max: 5.53 mbits/s
median: 4.54 mbits/s
EDIT: OK, the performance dip is actually CRC32.
EDIT2: the test file I'm using is pure text, so the numbers are way lower due to the sheer amount of compressing going on.
I'm wondering whether, to do this in a performant way, we're going to have to make a zlib-based class that can build the checksum while being piped data.
EDIT: this idea works, but the checksum kills the speed.
Can you give eab88eb a go and let me know the results you get? If you would, also try something other than video, as video doesn't compress much, so it skews things a bit.
See comments in f0bf860
@danmilon Can you give master a go and let me know if all our changes have helped at all?
All good. Memory steady at 10 MB, magnificent!