fs-make-simple / undup
This project forked from radii/undup
store fewer bytes thanks to backreferences
License: GNU General Public License v2.0
undup - compress files by consolidating duplicate data

undup tries to compress an input stream by watching for blocks that have
previously appeared.  It replaces the duplicated data with a backreference.
Integrity is ensured by validating a SHA256 across the entire stream at
reconstruction time.

undup is intended to be pipelined with a general-purpose compressor such as
gzip, bzip2, or xz.

USAGE
-----

    tar cf - dir | undup | xz > dir.tar.undup.xz

    xzcat dir.tar.undup.xz | undup -d | tar xv

SAMPLE RESULTS
--------------

    % for r in 3.0 3.1 3.2 3.3-rc1; do
        git archive --format=tar --prefix=linux-$r/ v$r |
          tar -C /tmp/linuxes -xf -
      done
    % tar -C /tmp -cf linuxes.tar linuxes
    % du -shc /tmp/linuxes/*
    500M    /tmp/linuxes/linux-3.0
    504M    /tmp/linuxes/linux-3.1
    511M    /tmp/linuxes/linux-3.2
    518M    /tmp/linuxes/linux-3.3-rc1
    2.0G    total

File sizes:

    1833635840 linuxes.tar
     937173504 linuxes.tar.undp
     404399664 linuxes.tar.gz
     316914845 linuxes.tar.bz2
     270460412 linuxes.tar.xz
     203023371 linuxes.tar.undp.gz
     167099750 linuxes.tar.lrz
     159673153 linuxes.tar.undp.bz2
     138929420 linuxes.tar.undp.xz

    format   ratio   pipelined w/ undup
    ------   -----   ------------------
    undp      1.95
    gzip      4.53    9.03
    bzip2     5.78   11.48
    xz        6.78   13.19
    lrzip    10.97

Timings for undup + compressors on Core i7 L 640 @ 2.13GHz (2.9 GHz Turbo)

First, we time the undup phase.  This consumes a significant amount of memory
(for undup 0.2, about 105 MB of RAM to store hashes for the 1.8 GB
linuxes.tar) and can be pipelined, but to get the most reproducible timing
results, we've run each phase separately.

    undup linuxes.tar  47.26s user 4.15s system 97% cpu 52.885 total

Second, we compare times for various compressors to compress
linuxes.tar.undp.

    gzip    35.81s user 0.72s system 96% cpu 37.817 total
    bzip2  117.79s user 0.45s system 99% cpu 1:58.66 total
    xz     606.51s user 1.31s system 99% cpu 10:09.72 total

undup + bzip2 achieves an 11.48x compression ratio while consuming only 165
seconds of CPU time; elapsed time for a pipeline is reasonably similar:

    undup   59.64s user 3.93s system 32% cpu 3:14.76 total
    bzip2  138.65s user 1.05s system 71% cpu 3:14.73 total

This compares favorably to lrzip 0.608, which achieves a 10.97x ratio after
consuming 913 seconds of CPU time (lrzip is multithreaded by default):

    lrzip -v -w 10 linuxes.tar  913.08s user 14.99s system 298% cpu 5:10.78 total
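
ILLUSTRATIVE SKETCH
-------------------

The idea described at the top of this README (replace blocks that have
already appeared with backreferences, then validate a SHA256 over the whole
stream on reconstruction) can be sketched in a few lines of Python.  This is
not undup's actual algorithm or file format.  The fixed 4096-byte block size,
the ("data", ...) / ("ref", ...) operation tuples, and the function names
encode and decode are all assumptions made up for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, not taken from undup


def encode(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and replace repeated blocks
    with backreferences to the offset of their first occurrence."""
    seen = {}  # block digest -> offset of first occurrence in the input
    ops = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = hashlib.sha256(block).digest()
        if len(block) == block_size and key in seen:
            # Duplicate of an earlier full block: store only its offset.
            ops.append(("ref", seen[key]))
        else:
            seen.setdefault(key, offset)
            ops.append(("data", block))
    # Digest of the entire original stream, checked again in decode().
    return ops, hashlib.sha256(data).hexdigest()


def decode(ops, expected_digest: str, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the stream from the operations and verify its SHA-256."""
    out = bytearray()
    for kind, value in ops:
        if kind == "data":
            out += value
        else:
            # Copy the referenced block from the output produced so far.
            out += out[value:value + block_size]
    if hashlib.sha256(out).hexdigest() != expected_digest:
        raise ValueError("SHA-256 mismatch: reconstructed stream is corrupt")
    return bytes(out)


if __name__ == "__main__":
    sample = b"A" * 4096 + b"B" * 4096 + b"A" * 4096 + b"tail"
    ops, digest = encode(sample)
    assert decode(ops, digest) == sample
    print([kind for kind, _ in ops])
```

The sketch treats two blocks as identical when their SHA-256 digests match,
and only finds duplicates that fall on the same fixed block boundaries.  The
output of such a pass still contains every unique block uncompressed, which
is why a stage like this is meant to be followed by a general-purpose
compressor such as gzip, bzip2, or xz.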