librsync's Introduction

README

http://librsync.sourcefrog.net/

\copyright

Copyright 1999-2016 Martin Pool and other contributors.

librsync is distributed under the GNU LGPL v2.1 (see COPYING), which basically means that you can dynamically link librsync into non-GPL programs, but you must redistribute the librsync source, with any modifications you have made.

librsync contains the BLAKE2 hash algorithm, written by Samuel Neves and released under the CC0 public domain dedication.

Introduction

librsync is a library for calculating and applying network deltas, with an interface designed to ease integration into diverse network applications.

librsync encapsulates the core algorithms of the rsync protocol, which help with efficient calculation of the differences between two files. The rsync algorithm differs from most differencing algorithms because it does not require both files to be present to calculate the delta. Instead, it requires a set of checksums of each block of one file, which together form a signature for that file. Blocks at any position in the other file that have the same checksum are likely to be identical, and whatever remains is the difference.

This algorithm transfers the differences between two files without needing both files on the same system.
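
To make the block-checksum idea concrete, here is a minimal sketch in C of an Adler-32-style rolling ("weak") checksum of the kind the rsync algorithm relies on. It is an illustration under simplifying assumptions, not librsync's actual implementation: the names `weak_sum_t`, `weak_init`, `weak_roll`, and `weak_digest` are hypothetical, and a real signature also pairs each weak sum with a strong hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; this is not the librsync API. */
typedef struct {
    uint32_t s1;   /* running sum of the bytes in the window */
    uint32_t s2;   /* running sum of the successive s1 values */
    size_t len;    /* window (block) length */
} weak_sum_t;

/* Compute the checksum of buf[0..len) from scratch. */
static void weak_init(weak_sum_t *w, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    w->s1 = 0;
    w->s2 = 0;
    w->len = len;
    for (size_t i = 0; i < len; i++) {
        w->s1 += buf[i];
        w->s2 += w->s1;
    }
}

/* Slide the window by one byte: drop `out` from the front, append `in`.
 * This costs O(1) instead of rescanning the whole block. */
static void weak_roll(weak_sum_t *w, unsigned char out, unsigned char in)
{
    w->s1 += (uint32_t)in - (uint32_t)out;
    w->s2 += w->s1 - (uint32_t)w->len * out;
}

/* Pack the two 16-bit halves into one 32-bit weak checksum. */
static uint32_t weak_digest(const weak_sum_t *w)
{
    return ((w->s2 & 0xffff) << 16) | (w->s1 & 0xffff);
}
```

Because the sum can be rolled forward one byte at a time, a block of the new file can be checked against the signature at every byte offset, which is how matching blocks are found "at any position".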

librsync is for building other programs that transfer files as efficiently as rsync. You can use librsync in a program you write to do backups, distribute binary patches to programs, or sync directories to a server or between peers.

This tree also produces the \ref page_rdiff that exposes the key operations of librsync: generating file signatures, generating the delta from a signature to a new file, and applying the delta to regenerate the new file given the old file.

librsync was originally written for the rproxy experiment in delta-compression for HTTP. librsync is used by: Dropbox, rdiff-backup, Duplicity, and others. (If you would like to be listed here, let me know.)

What librsync is not

  1. librsync does not implement the rsync wire protocol. If you want to talk to an rsync server to transfer files, you'll need to shell out to rsync; librsync cannot do this for you.

  2. librsync does not deal with file metadata or structure, such as filenames, permissions, or directories. To this library, a file is just a stream of bytes. Higher-level tools can deal with such issues in a way appropriate to their users.

  3. librsync also does not include any network functions for talking to SSH or any other server. To access a remote filesystem, you need to provide your own code or make use of some other virtual filesystem layer.

More information

  • \ref page_downloads
  • \ref page_versioning
  • \ref page_install
  • \ref page_rdiff
  • \ref page_librsync
  • \ref page_formats
  • \ref page_support
  • CONTRIBUTING
  • NEWS
  • TODO

librsync's People

Contributors

aaronm04, adsun701, andreas-schwab, ardovm, avdn, bje-, bkuhls, dbaarda, deajan, efidler, ffontaine, fornwall, ljusten, mbrt, meoo, paulharris, rizsotto, robert-scheck, salamek, santazhang, sourcefrog, telles-simbiose, texierp, therealmik, timothygu, victordenisov, wayned, wrar

librsync's Issues

Please add gzip/bzip compression for delta files in rdiff

Hello!

I tried to use the --gzip/--bzip flags for rdiff but got an error:

rdiff: ERROR: (rdiff_options) sorry, compression is not really implemented yet

For my data (VPS disks), compression shrinks the delta files dramatically:

source size: 4.6 Gb delta size: 2093.0 MB compressed size: 223.0
source size: 14.8 Gb delta size: 2205.0 MB compressed size: 998.7 MB

Thank you!

rdiff inefficient for zero sized signatures.

Hello!

I tried to use rdiff with zero-sized signatures (with a correct header but without any literal/copy items), and running rdiff delta against such signatures consumes a lot of CPU and takes a very long time.

popt.h issue

When running make install, we get the following (we are running RHEL 7):

libtool: link: ranlib .libs/librsync.a
libtool: link: ( cd ".libs" && rm -f "librsync.la" && ln -s "../librsync.la" "librsync.la" )
gcc -DHAVE_CONFIG_H -I. -Wall -Wshadow -Wundef -Wwrite-strings -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes -Wpointer-arith -Wcast-qual -Wcast-align -g -O2 -MT rdiff.o -MD -MP -MF .deps/rdiff.Tpo -c -o rdiff.o rdiff.c
rdiff.c:54:18: fatal error: popt.h: No such file or directory
#include <popt.h>
^
compilation terminated.
make[1]: *** [rdiff.o] Error 1
make[1]: Leaving directory `/apps/librsync'
make: *** [install-recursive] Error 1

Any help would be much appreciated.

Thanks

fails to build on clang with `undefined reference to 'rs_appendflush'`

On Debian Jessie with

Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)

libtool: link: clang -Wall -Wshadow -Wundef -Wwrite-strings -Wmissing-prototypes -Wmissing-declarations -Wstrict-prototypes -Wpointer-arith -Wcast-qual -Wcast-align -g -O2 -o rdiff rdiff.o isprefix.o  ./.libs/librsync.a /usr/lib/x86_64-linux-gnu/libpopt.so -lbz2 -lz
./.libs/librsync.a(delta.o): In function `rs_appendmatch':
/home/mbp/src/librsync/delta.c:294: undefined reference to `rs_appendflush'
./.libs/librsync.a(delta.o): In function `rs_appendmiss':
/home/mbp/src/librsync/delta.c:322: undefined reference to `rs_appendflush'
./.libs/librsync.a(delta.o): In function `rs_appendmatch':
/home/mbp/src/librsync/delta.c:294: undefined reference to `rs_appendflush'
./.libs/librsync.a(delta.o): In function `rs_appendmiss':
/home/mbp/src/librsync/delta.c:322: undefined reference to `rs_appendflush'
./.libs/librsync.a(delta.o): In function `rs_delta_s_flush':
/home/mbp/src/librsync/delta.c:221: undefined reference to `rs_appendflush'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:594: recipe for target 'rdiff' failed

It might be unhappy with something about the inline declaration.

Please add timestamps to -v mode of rdiff

Hello!

The current output is not very informative without timestamps:

rdiff: (rs_loadsig_add_sum) read in checksum: weak=0x25f88369, strong=e863a963d41eeb14
rdiff: (rs_scoop_readahead) got 4 bytes from input buffer
rdiff: (rs_scoop_advance) advance over 4 bytes from input buffer
rdiff: (rs_scoop_readahead) got 8 bytes from input buffer
rdiff: (rs_scoop_advance) advance over 8 bytes from input buffer
rdiff: (rs_loadsig_add_sum) read in checksum: weak=0xcefec257, strong=b47073ba9de402cb
rdiff: (rs_scoop_readahead) got 4 bytes from input buffer
rdiff: (rs_scoop_advance) advance over 4 bytes from input buffer
rdiff: (rs_scoop_readahead) got 8 bytes from input buffer
rdiff: (rs_scoop_advance) advance over 8 bytes from input buffer
rdiff: (rs_loadsig_add_sum) read in checksum: weak=0xb54749a3, strong=1ba24589bfe99c0e
rdiff: (rs_scoop_readahead) got 4 bytes from input buffer
rdiff: (rs_scoop_advance) advance over 4 bytes from input buffer
rdiff: (rs_scoop_readahead) got 8 bytes from input buffer
rdiff: (rs_scoop_advance) advance over 8 bytes from input buffer
rdiff: (rs_loadsig_add_sum) read in checksum: weak=0xb49349a5, strong=aa6e4b743b735a45

migrate tests from bash

Test drivers are currently mostly in bash, which makes it annoying to run them on Windows: probably possible, but they probably won't run by default for most people and they can easily break.

It'd be better if those for C functionality switched to being directly in C. Command-line tests I'm not sure: maybe C, or Python.

Crash in `rs_search_for_block`

Sometimes the l and r variables take the value sig->count when leaving the while loop. Then these lines are executed:

if (l == r) {
int i = sig->targets[l].i;
rs_block_sig_t *b = &(sig->block_sigs[i]);

Tested with these compilers, where the i variable goes out of bounds:
gcc (i686-posix-dwarf-rev1, Built by MinGW-W64 project) 4.9.2
gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 5.2.0

On Linux with different gcc versions I get a value that is within bounds.
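
The failure mode described above is a common one for hand-written binary searches: when the key is larger than every element, the loop ends with the index equal to the element count, and using it as an array index reads past the end. A minimal sketch of a bounds-checked lookup follows. `find_weak` and its plain sorted array are hypothetical stand-ins, not librsync's actual `sig->targets` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first element equal to `key` in the sorted
 * array sums[0..count), or -1 if there is none. Hypothetical helper. */
static long find_weak(const uint32_t *sums, size_t count, uint32_t key)
{
    assert(sums != NULL || count == 0);
    size_t l = 0, r = count;          /* search the half-open range [l, r) */
    while (l < r) {
        size_t m = l + (r - l) / 2;   /* avoids overflow of l + r */
        if (sums[m] < key)
            l = m + 1;
        else
            r = m;
    }
    /* Here l == r, and l may equal count: check before indexing. */
    if (l < count && sums[l] == key)
        return (long)l;
    return -1;
}
```

The `l < count` test is the important part: without it, a key greater than every stored sum would index one element past the array, which may happen to "work" on one platform and crash on another.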

SECURITY: MD4 collision/preimage attacks (CVE-2014-8242)

If you are syncing a mix of trusted and untrusted data (such as VM images or databases), an attacker could corrupt synced data.

The easier attack is to generate collisions of the combined MD4/rolling sum in order to corrupt the file. This attack has almost no complexity for MD4, and in general for a 64-bit hash there's a birthday attack of 2^32 complexity.

With some effort a preimage could be generated (with any 64-bit hash, this has 2^64 complexity - maybe a better attack is possible with MD4). This would allow for malicious changes in synced files.

CMake transition changed the soname to librsync.so.1

Assuming the 1.0.1 release is binary compatible with 1.0.0, it needs to retain the "librsync.so.2" soname and under no circumstances use "librsync.so.1" (which was the soname back in the librsync 0.9.7 days).

I believe a new release must be made with the above correction; 1.0.1 cannot be packaged by Linux distros.

`make test` doesn't build the tests

After merging the CMake patch from @Salamek (thanks), I see that make test doesn't implicitly build the tests. You have to make all first or they may fail.

I tried to get this working and could not. Maybe someone else can. For the moment I just added a note to README.md.

Question: License

I'm starting to work on bindings for this library in Rust. The problem is the following:

  • linking between Rust libraries is static;
  • linking between a Rust library and a C library can be static or dynamic. However dynamically linking means that the library (and thus all the derivative projects) needs to be rebuilt for every linux distribution, osx and windows systems (because librsync needs to be present in the system, and could be different);
  • the vast majority of Rust libraries are licensed under MIT.

So, if I want to provide the best experience to library users I should link librsync statically and provide a MIT license. However MIT and LGPL are not strictly compatible. So, to be pedantic I need to:

  • link dynamically (and not be portable across distributions without a rebuild);
  • or be LGPL myself (and be poorly integrated in the Rust ecosystem).

I'm not a lawyer, but since I will provide sources on GitHub and package through crates.io, I will be 100% open source. Is it a problem if the license is MIT?

Am I missing something? I don't know if I'm clear.

Lots of warnings when compiling current master

gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 5.3.0
mingw32-make

Scanning dependencies of target rsync
[  3%] Building C object CMakeFiles/rsync.dir/src/prototab.c.obj
[  6%] Building C object CMakeFiles/rsync.dir/src/base64.c.obj
[  9%] Building C object CMakeFiles/rsync.dir/src/buf.c.obj
[ 12%] Building C object CMakeFiles/rsync.dir/src/checksum.c.obj
[ 15%] Building C object CMakeFiles/rsync.dir/src/command.c.obj
[ 18%] Building C object CMakeFiles/rsync.dir/src/delta.c.obj
[ 21%] Building C object CMakeFiles/rsync.dir/src/emit.c.obj
[ 25%] Building C object CMakeFiles/rsync.dir/src/fileutil.c.obj
[ 28%] Building C object CMakeFiles/rsync.dir/src/hex.c.obj
[ 31%] Building C object CMakeFiles/rsync.dir/src/job.c.obj
In file included from librsync-master\src\job.c:56:0:
librsync-master\src\job.c: In function 'rs_job_iter':
librsync-master\src\trace.h:81:33: warning: unknown con
version type character 'l' in format [-Wformat=]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\job.c:143:13: note: in expansion of
 macro 'rs_log'
             rs_log(RS_LOG_ERR, "internal error: job made no progress "
             ^
librsync-master\src\trace.h:81:33: warning: unknown con
version type character 'l' in format [-Wformat=]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\job.c:143:13: note: in expansion of
 macro 'rs_log'
             rs_log(RS_LOG_ERR, "internal error: job made no progress "
             ^
librsync-master\src\trace.h:81:33: warning: unknown con
version type character 'l' in format [-Wformat=]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\job.c:143:13: note: in expansion of
 macro 'rs_log'
             rs_log(RS_LOG_ERR, "internal error: job made no progress "
             ^
librsync-master\src\trace.h:81:33: warning: unknown con
version type character 'l' in format [-Wformat=]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\job.c:143:13: note: in expansion of
 macro 'rs_log'
             rs_log(RS_LOG_ERR, "internal error: job made no progress "
             ^
librsync-master\src\trace.h:81:33: warning: too many ar
guments for format [-Wformat-extra-args]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\job.c:143:13: note: in expansion of
 macro 'rs_log'
             rs_log(RS_LOG_ERR, "internal error: job made no progress "
             ^
[ 34%] Building C object CMakeFiles/rsync.dir/src/mdfour.c.obj
librsync-master\src\mdfour.c: In function 'rs_mdfour_bl
ock':
librsync-master\src\mdfour.c:254:28: warning: cast from
 pointer to integer of different size [-Wpointer-to-int-cast]
     unsigned long ptrval = (unsigned long) p;
                            ^
[ 37%] Building C object CMakeFiles/rsync.dir/src/mksum.c.obj
[ 40%] Building C object CMakeFiles/rsync.dir/src/msg.c.obj
[ 43%] Building C object CMakeFiles/rsync.dir/src/netint.c.obj
In file included from librsync-master\src\netint.c:61:0
:
librsync-master\src\netint.c: In function 'rs_int_len':

librsync-master\src\trace.h:92:8: warning: unknown conv
ersion type character 'l' in format [-Wformat=]
        (s) , ##str);                    \
        ^
librsync-master\src\netint.c:181:9: note: in expansion
of macro 'rs_fatal'
         rs_fatal("can't encode integer " PRINTF_FORMAT_U64 " yet", PRINTF_CAST_
U64(val));
         ^
librsync-master\src\trace.h:92:8: warning: too many arg
uments for format [-Wformat-extra-args]
        (s) , ##str);                    \
        ^
librsync-master\src\netint.c:181:9: note: in expansion
of macro 'rs_fatal'
         rs_fatal("can't encode integer " PRINTF_FORMAT_U64 " yet", PRINTF_CAST_
U64(val));
         ^
[ 46%] Building C object CMakeFiles/rsync.dir/src/patch.c.obj
In file included from librsync-master\src\patch.c:37:0:

librsync-master\src\patch.c: In function 'rs_patch_s_li
teral':
librsync-master\src\trace.h:81:33: warning: unknown con
version type character 'l' in format [-Wformat=]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\patch.c:154:9: note: in expansion o
f macro 'rs_log'
         rs_log(RS_LOG_ERR, "invalid length=" PRINTF_FORMAT_U64 " on LITERAL com
mand", PRINTF_CAST_U64(len));
         ^
librsync-master\src\trace.h:81:33: warning: too many ar
guments for format [-Wformat-extra-args]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\patch.c:154:9: note: in expansion o
f macro 'rs_log'
         rs_log(RS_LOG_ERR, "invalid length=" PRINTF_FORMAT_U64 " on LITERAL com
mand", PRINTF_CAST_U64(len));
         ^
librsync-master\src\patch.c: In function 'rs_patch_s_co
py':
librsync-master\src\trace.h:81:33: warning: unknown con
version type character 'l' in format [-Wformat=]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\patch.c:181:9: note: in expansion o
f macro 'rs_log'
         rs_log(RS_LOG_ERR, "invalid length=" PRINTF_FORMAT_U64 " on COPY comman
d", PRINTF_CAST_U64(len));
         ^
librsync-master\src\trace.h:81:33: warning: too many ar
guments for format [-Wformat-extra-args]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\patch.c:181:9: note: in expansion o
f macro 'rs_log'
         rs_log(RS_LOG_ERR, "invalid length=" PRINTF_FORMAT_U64 " on COPY comman
d", PRINTF_CAST_U64(len));
         ^
librsync-master\src\trace.h:81:33: warning: unknown con
version type character 'l' in format [-Wformat=]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\patch.c:186:9: note: in expansion o
f macro 'rs_log'
         rs_log(RS_LOG_ERR, "invalid where=" PRINTF_FORMAT_U64 " on COPY command
", PRINTF_CAST_U64(where));
         ^
librsync-master\src\trace.h:81:33: warning: too many ar
guments for format [-Wformat-extra-args]
      rs_log0((l), __FUNCTION__, (s) , ##str);  \
                                 ^
librsync-master\src\patch.c:186:9: note: in expansion o
f macro 'rs_log'
         rs_log(RS_LOG_ERR, "invalid where=" PRINTF_FORMAT_U64 " on COPY command
", PRINTF_CAST_U64(where));
         ^
[ 50%] Building C object CMakeFiles/rsync.dir/src/readsums.c.obj
[ 53%] Building C object CMakeFiles/rsync.dir/src/rollsum.c.obj
[ 56%] Building C object CMakeFiles/rsync.dir/src/scoop.c.obj
[ 59%] Building C object CMakeFiles/rsync.dir/src/search.c.obj
[ 62%] Building C object CMakeFiles/rsync.dir/src/stats.c.obj
[ 65%] Building C object CMakeFiles/rsync.dir/src/stream.c.obj
[ 68%] Building C object CMakeFiles/rsync.dir/src/sumset.c.obj
[ 71%] Building C object CMakeFiles/rsync.dir/src/trace.c.obj
librsync-master\src\trace.c:69:4: warning: #warning siz
e_t is larger than a long integer, values in trace messages may be wrong [-Wcpp]

 #  warning size_t is larger than a long integer, values in trace messages may b
e wrong
    ^
[ 75%] Building C object CMakeFiles/rsync.dir/src/tube.c.obj
[ 78%] Building C object CMakeFiles/rsync.dir/src/util.c.obj
[ 81%] Building C object CMakeFiles/rsync.dir/src/version.c.obj
[ 84%] Building C object CMakeFiles/rsync.dir/src/whole.c.obj
[ 87%] Building C object CMakeFiles/rsync.dir/src/blake2b-ref.c.obj
[ 90%] Linking C shared library librsync.dll
[ 90%] Built target rsync
Scanning dependencies of target isprefix_test
[ 93%] Building C object CMakeFiles/isprefix_test.dir/tests/isprefix_test.c.obj
[ 96%] Building C object CMakeFiles/isprefix_test.dir/src/isprefix.c.obj
[100%] Linking C executable isprefix_test.exe
[100%] Built target isprefix_test

For comparison, on the 2.0 release:

Scanning dependencies of target rsync
[  3%] Building C object CMakeFiles/rsync.dir/src/prototab.c.obj
[  6%] Building C object CMakeFiles/rsync.dir/src/base64.c.obj
[  9%] Building C object CMakeFiles/rsync.dir/src/buf.c.obj
[ 12%] Building C object CMakeFiles/rsync.dir/src/checksum.c.obj
[ 15%] Building C object CMakeFiles/rsync.dir/src/command.c.obj
[ 18%] Building C object CMakeFiles/rsync.dir/src/delta.c.obj
[ 21%] Building C object CMakeFiles/rsync.dir/src/emit.c.obj
[ 25%] Building C object CMakeFiles/rsync.dir/src/fileutil.c.obj
[ 28%] Building C object CMakeFiles/rsync.dir/src/hex.c.obj
[ 31%] Building C object CMakeFiles/rsync.dir/src/job.c.obj
[ 34%] Building C object CMakeFiles/rsync.dir/src/mdfour.c.obj
librsync-2.0.0\src\mdfour.c: In function 'rs_mdfour_blo
ck':
librsync-2.0.0\src\mdfour.c:255:28: warning: cast from
pointer to integer of different size [-Wpointer-to-int-cast]
     unsigned long ptrval = (unsigned long) p;
                            ^
[ 37%] Building C object CMakeFiles/rsync.dir/src/mksum.c.obj
[ 40%] Building C object CMakeFiles/rsync.dir/src/msg.c.obj
[ 43%] Building C object CMakeFiles/rsync.dir/src/netint.c.obj
[ 46%] Building C object CMakeFiles/rsync.dir/src/patch.c.obj
[ 50%] Building C object CMakeFiles/rsync.dir/src/readsums.c.obj
[ 53%] Building C object CMakeFiles/rsync.dir/src/rollsum.c.obj
[ 56%] Building C object CMakeFiles/rsync.dir/src/scoop.c.obj
[ 59%] Building C object CMakeFiles/rsync.dir/src/search.c.obj
[ 62%] Building C object CMakeFiles/rsync.dir/src/stats.c.obj
[ 65%] Building C object CMakeFiles/rsync.dir/src/stream.c.obj
[ 68%] Building C object CMakeFiles/rsync.dir/src/sumset.c.obj
[ 71%] Building C object CMakeFiles/rsync.dir/src/trace.c.obj
librsync-2.0.0\src\trace.c:71:4: warning: #warning size
_t is larger than a long integer, values in trace messages may be wrong [-Wcpp]
 #  warning size_t is larger than a long integer, values in trace messages may b
e wrong
    ^
[ 75%] Building C object CMakeFiles/rsync.dir/src/tube.c.obj
[ 78%] Building C object CMakeFiles/rsync.dir/src/util.c.obj
[ 81%] Building C object CMakeFiles/rsync.dir/src/version.c.obj
[ 84%] Building C object CMakeFiles/rsync.dir/src/whole.c.obj
[ 87%] Building C object CMakeFiles/rsync.dir/src/blake2b-ref.c.obj
[ 90%] Linking C shared library librsync.dll
[ 90%] Built target rsync
Scanning dependencies of target isprefix_test
[ 93%] Building C object CMakeFiles/isprefix_test.dir/tests/isprefix_test.c.obj
[ 96%] Building C object CMakeFiles/isprefix_test.dir/src/isprefix.c.obj
[100%] Linking C executable isprefix_test.exe
[100%] Built target isprefix_test
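
The `unknown conversion type character 'l'` warnings in the master build are typical of a `%llu`-style format being checked against a C runtime that does not support it, and the `-Wpointer-to-int-cast` warning arises because `unsigned long` is only 32 bits wide on 64-bit Windows. A minimal sketch of the usual portable alternatives, the `PRIu64` macro from `<inttypes.h>` and the `uintptr_t` type from `<stdint.h>`, might look like this. The helper names `format_u64` and `is_aligned4` are hypothetical, and whether this matches how librsync addresses the warnings is an assumption.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value without assuming how wide `long` is.
 * PRIu64 expands to the conversion the platform's printf expects. */
static int format_u64(char *out, size_t outlen, uint64_t val)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen, "length=%" PRIu64, val);
}

/* Test pointer alignment through uintptr_t, an integer type wide
 * enough to hold a pointer, instead of casting to unsigned long. */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}
```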

types.h not found

As I've been having this rs_search_for_block crash problem ( #50 ) on 2.0.0 occasionally, I gave the master branch a shot. When I was compiling against the newly installed librsync, a compilation error pops out:

/usr/local/include/librsync.h:40:19: fatal error: types.h: No such file or directory

I can see that librsync.h is including a types.h, but there's no sight of this file under /usr/local/include. Maybe there's one step missing in installation process to copy this file?

signature.test fails on big endian

librsync 1.0.0 failed to build on Debian buildds for architectures mips, powerpc and s390x, because signature.test failed:

    ../rdiff --hash=md4 -I4096 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=md4 -I1 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=md4 -I2 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=md4 -I3 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=md4 -I7 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=md4 -I15 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=md4 -I100 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=md4 -I10000 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=md4 -I200000 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
    ../rdiff --hash=blake2 -I4096 signature ./signature.input/01.in /tmp/librsynctest_eR3ZJiKg/signature
./signature.input/blake2/01.sig /tmp/librsynctest_eR3ZJiKg/signature differ: char 17, line 1
: comparison failed from command:

I can provide additional info if you need.

Why did rs_search_for_block() switch to binary search?

First, sorry for my poor English.
It seems that in a6122f1, rs_search_for_block() changed greatly: it now uses a binary search to find weak checksum matches, instead of a hash table.
Was there any discussion about it?
Addendum: I opened this as an issue because it may affect performance.

Hashtable DoS: Use a better rolling hash.

[ Extremely low impact DoS condition ]

The hashtable implementation is vulnerable to collisions - if somebody were to make a file full of rolling sum collisions (where md4 didn't collide), this would cause an md4 to be generated at each colliding offset, then a sequential search through the list.

A tree of weak sums pointing to a tree of md4 sums would avoid some of this worst-case behaviour, but you'd still have to compute the md4 each collision.
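
To see why such collisions are cheap to produce, consider an Adler-32-style weak sum (a simplified, hypothetical stand-in for the rolling sum, not librsync's exact code). Adding 1, subtracting 2, and adding 1 to three consecutive bytes leaves both component sums unchanged, so distinct blocks end up sharing one weak sum:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified Adler-32-style weak sum; hypothetical, for illustration. */
static uint32_t weak_sum(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];   /* plain sum of the bytes */
        s2 += s1;       /* position-weighted sum of the bytes */
    }
    return ((s2 & 0xffff) << 16) | (s1 & 0xffff);
}

/* Turn `block` into a different block with the same weak sum by applying
 * +1, -2, +1 to the three bytes starting at `pos`. The deltas cancel in
 * s1 (1 - 2 + 1 = 0) and in s2, where the weights of neighbouring bytes
 * differ by exactly one. The bytes must not overflow or underflow. */
static void make_collision(unsigned char *block, size_t len, size_t pos)
{
    assert(pos + 3 <= len);
    assert(block[pos] < 255 && block[pos + 1] >= 2 && block[pos + 2] < 255);
    block[pos] += 1;
    block[pos + 1] -= 2;
    block[pos + 2] += 1;
}
```

Each such collision forces the strong hash to be computed before the candidate can be rejected, which is the cost the issue is concerned about.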

Not merging COPY changes for duplicate blocks

If you create a file that's (for example) all-sparse, the signatures will all have the same hash.

The delta file for a (slightly) changed version (eg. you append some data to the end) will contain a COPY of the first block repeated for each block, rather than a merged COPY.

The most efficient representation of the delta would be a single merged COPY (which would also produce correct results).
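
As a rough illustration of the merging step, the sketch below coalesces COPY commands whose source ranges are contiguous. The `copy_cmd` struct and `merge_copies` function are hypothetical and do not reflect librsync's internal delta representation. For the all-identical-blocks case described above, the matcher would additionally have to prefer the block that continues the previous COPY, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical COPY command: copy `len` bytes starting at offset `pos`
 * of the basis file. */
typedef struct {
    uint64_t pos;
    uint64_t len;
} copy_cmd;

/* Merge adjacent commands in place whenever one starts exactly where
 * the previous one ends. Returns the new number of commands. */
static size_t merge_copies(copy_cmd *cmds, size_t n)
{
    assert(cmds != NULL || n == 0);
    if (n == 0)
        return 0;
    size_t out = 0;                          /* index of the last merged command */
    for (size_t i = 1; i < n; i++) {
        if (cmds[out].pos + cmds[out].len == cmds[i].pos)
            cmds[out].len += cmds[i].len;    /* extend the previous COPY */
        else
            cmds[++out] = cmds[i];           /* start a new COPY */
    }
    return out + 1;
}
```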

Needs a more thorough test suite

The librsync tests are not what I would write today. I recall doing some manual testing or interactive testing through rproxy on previous releases, but it'd be better to build them into a larger test suite that can demonstrate a good amount of coverage.

excess cpu use in rs_mdfour / rs_search_for_block / rs_findmatch

Hello!

I'm using rdiff to create incremental backups, and every day it hangs on one file that is 5GB in size.

OS: CentOS 6

ps aux|grep rdiff
root      771334  0.0  0.0 103248   892 pts/0    S+   14:39   0:00 grep rdiff
root      945509 99.2  0.0  78444 59484 ?        R    Sep29 1207:21 rdiff --block-size=1048576 --input-size=16777216 --output-size=16777216 delta /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature /vz/private/4822/root.hdd/root.hdd /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature.delta

Source files:

ls -al /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature.delta
-rw-r--r-- 1 root root 191889408 Sep 30 13:41 /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature.delta

ls -alh /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature.delta
-rw-r--r-- 1 root root 183M Sep 30 13:41 /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature.delta

ls -la /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature
-rw-r--r-- 1 root root 49752 Sep 29 18:23 /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature

strace shows nothing:

strace -f -p 945509
Process 945509 attached - interrupt to quit

But gdb shows a lot of interesting information:

debuginfo-install librsync-0.9.7-15.el6.x86_64
(gdb) bt
#0  0x00007f2d96a53bbf in rs_mdfour64 (m=0x7fff58b760a0, p=0x7f2d946d33a4) at mdfour.c:113
#1  0x00007f2d96a5413f in rs_mdfour_block (md=0x7fff58b760a0, in_void=<value optimized out>, n=<value optimized out>) at mdfour.c:263
#2  rs_mdfour_update (md=0x7fff58b760a0, in_void=<value optimized out>, n=<value optimized out>) at mdfour.c:359
#3  0x00007f2d96a5442a in rs_mdfour (out=0x7fff58b76170 "\251\017\226/\035B\253\212\311lH<\222\222t\230", in=0x7f2d946c63a4, n=1048576) at mdfour.c:389
#4  0x00007f2d96a55bbc in rs_search_for_block (weak_sum=0, inbuf=<value optimized out>, block_len=<value optimized out>, sig=0xb3dd80, stats=<value optimized out>, match_where=<value optimized out>)
    at search.c:143
#5  0x00007f2d96a52b01 in rs_findmatch (job=0xb3dbd0) at delta.c:272
#6  rs_delta_s_scan (job=0xb3dbd0) at delta.c:156
#7  0x00007f2d96a5391e in rs_job_work (job=0xb3dbd0, buffers=0x7fff58b762e0) at job.c:187
#8  rs_job_iter (job=0xb3dbd0, buffers=0x7fff58b762e0) at job.c:145
#9  0x00007f2d96a53a39 in rs_job_drive (job=0xb3dbd0, buf=0x7fff58b762e0, in_cb=0x7f2d96a51ee0 <rs_infilebuf_fill>, in_opaque=<value optimized out>, out_cb=0x7f2d96a51d90 <rs_outfilebuf_drain>, 
    out_opaque=<value optimized out>) at job.c:241
#10 0x00007f2d96a56b72 in rs_whole_run (job=0xb3dbd0, in_file=<value optimized out>, out_file=0xb3d990) at whole.c:80
#11 0x00007f2d96a56cf7 in rs_delta_file (sig=<value optimized out>, new_file=0xb3d750, delta_file=0xb3d990, stats=0x7fff58b76380) at whole.c:154
#12 0x000000000040149b in rdiff_delta (argc=-1804782684, argv=0xb208f4c2) at rdiff.c:298
#13 rdiff_action (argc=-1804782684, argv=0xb208f4c2) at rdiff.c:358
#14 main (argc=-1804782684, argv=0xb208f4c2) at rdiff.c:374

Everything looks fine with the disk system:

cat  /vz/tmp/extracted_backup_34bdf94b-43eb-4491-bc65-196d8f624d48.signature |pv >/dev/null
48.6kB 0:00:00 [ 452MB/s] [   <=>                  
cat /vz/private/4822/root.hdd/root.hdd  |pv >/dev/null
4.05GB 0:00:02 [1.36GB/s] [          <=>                  

Please help me :(

trace is always compiled in on !GCC

trace.h conditionally compiles trace macros (and they're always off under CMake at present), but on !gcc they're always on. This is probably pretty slow.
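
One compiler-independent way to compile tracing out is a C99 variadic macro guarded by a build-time flag. The sketch below is a hypothetical illustration, not the contents of trace.h: `MY_TRACE_ENABLED`, `my_trace`, `my_trace_impl`, and `my_trace_calls` are made-up names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int my_trace_calls = 0;   /* counts messages actually emitted */

/* Emit one trace line to stderr, prefixed with the calling function. */
static void my_trace_impl(const char *fn, const char *fmt, ...)
{
    va_list ap;
    assert(fn != NULL && fmt != NULL);
    my_trace_calls++;
    fprintf(stderr, "(%s) ", fn);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

#ifdef MY_TRACE_ENABLED
/* C99 __VA_ARGS__ and __func__ are available on any conforming compiler. */
#  define my_trace(...) my_trace_impl(__func__, __VA_ARGS__)
#else
/* Expands to a no-op, so the arguments are never evaluated. */
#  define my_trace(...) ((void)0)
#endif
```

With the flag undefined, every `my_trace(...)` call disappears at preprocessing time regardless of the compiler, so hot loops pay nothing for tracing.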

get away from automake

automake is pretty archaic, hard to work with, and especially hard on people on Windows.

We should switch to something else. I looked previously at waf or scons, but perhaps cmake is a better plan, and is proposed in #27, with apparent agreement on cmake.

I'd much rather not have separate Windows build scripts, which won't be tested.

Builtin whole-file hash

It'd be helpful if librsync optionally computed a whole-file hash across the input, as rsync does.

It would also be useful, and easy to add at the same time, to record the length of the input file in signatures and deltas.

New release

Hi, it has been literally 10 years since the last release. It's good to see that it is still alive, but we would really appreciate a new release. I can see that this same issue has been raised in #5, but just to keep track of that, I'm opening a new issue. Thank you.

Build failure because of search.c

[ 32s] search.c: In function 'rs_build_hash_table':
[ 32s] search.c:131:1: warning: declaration of 'rs_search_for_block' shadows a global declaration [-Wshadow]
[ 32s] rs_search_for_block(rs_weak_sum_t weak_sum,
[ 32s] ^
[ 32s] In file included from search.c:47:0:
[ 32s] search.h:24:1: warning: shadowed declaration is here [-Wshadow]
[ 32s] rs_search_for_block(rs_weak_sum_t weak_sum,
[ 32s] ^
[ 32s] search.c: In function 'rs_search_for_block':
[ 32s] search.c:149:6: warning: declaration of 'i' shadows a previous local [-Wshadow]
[ 32s] int i = sig->targets[m].i;
[ 32s] ^
[ 32s] search.c:80:9: warning: shadowed declaration is here [-Wshadow]
[ 32s] int i;
[ 32s] ^
[ 32s] search.c:154:4: warning: implicit declaration of function 'rs_calc_strong_sum' [-Wimplicit-function-declaration]
[ 32s] rs_calc_strong_sum(inbuf, block_len, &strong_sum);
[ 32s] ^
[ 32s] search.c: In function 'rs_build_hash_table':
[ 32s] search.c:172:1: error: expected declaration or statement at end of input
[ 32s] }
[ 32s] ^
[ 32s] search.c:172:1: error: expected declaration or statement at end of input
[ 32s] search.c:172:1: warning: control reaches end of non-void function [-Wreturn-type]
[ 32s] }
[ 32s] ^

Not sure if it has something to do with this issue, but I have applied the patch from PR#14.

rs_fatal is overused

librsync has a function/macro rs_fatal etc, to indicate severe errors or internal bugs, which calls abort().

However as a well-behaved library, it's probably better if it doesn't do that unless there's no safe way to cleanly unwind. Probably most uses should be pulled out and it should just return an error.
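
A minimal sketch of the suggested direction: validate input and return an error code that the caller can handle, reserving an abort for states with no safe way to unwind. The `my_result` enum and `check_literal_len` function are hypothetical, only loosely modeled on the kind of length check a patch routine performs, and are not librsync code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result codes, in the spirit of a library error enum. */
typedef enum {
    MY_DONE = 0,
    MY_CORRUPT = 1
} my_result;

/* Validate a LITERAL command length read from untrusted input.
 * Bad input is reported to the caller instead of killing the process. */
static my_result check_literal_len(uint64_t len, uint64_t max_len)
{
    /* A caller bug rather than bad input, so an assertion is defensible. */
    assert(max_len > 0);
    if (len == 0 || len > max_len) {
        fprintf(stderr, "invalid length=%" PRIu64 " on LITERAL command\n", len);
        return MY_CORRUPT;
    }
    return MY_DONE;
}
```

An application embedding the library can then decide for itself whether a corrupt delta should end the process or just fail one transfer.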

CMake issue: Drop perl as a build dependency.

Hi,
I am trying to build from the latest master using CMake 3.6.0 and Visual Studio 2015.
Whenever I try and configure CMake I get these two errors:

PERL_EXECUTABLE-NOTFOUND
POPT_INCLUDE_DIRS-NOTFOUND

How can I make it work?

Building under windows

I am trying to build a Visual Studio project using CMake and I get the following errors:
... Cannot open include file: 'alloca.h': No such file or directory ...
... Cannot open include file: 'dlfcn.h': No such file or .....
... Cannot open include file: 'strings.h': No such file or directory ...
...
Cannot open include file: 'unistd.h': No such file or directory
Cannot open include file: 'bzlib.h': No such file or
Cannot open include file: 'mcheck.h': No such file
Cannot open include file: 'sys/file.h': No such file ..
Cannot open include file: 'zlib.h': No such file

At the end of the log file I also get a few of: fatal error LNK1120: 1 unresolved externals

I am using Windows 8.1 64-bit and Visual Studio 2013.

Any idea what I should do?

Compile error rs_stats

Hi,
I wrote a simple program that only calls "rs_sig_file", but I get this compile error:
error C3646: 'start': unknown override specifier
'start' is a member of "rs_stats" in "librsync.h". When I include <time.h> above the "librsync.h" include, the project compiles successfully.
Why do I need to include the time library in my own cpp file when it only calls "rs_sig_file"? I also tried including "stdlib.h", and that works too.
I am compiling with Visual Studio 15.

#include <stdlib.h>
//#include <ctime>
#include "rsync/librsync.h"
#include "sync_signature.hpp"

int SyncSignature::generateSignature(std::string inputFilePath, std::string sigFilePath)
{
    FILE *fpInput = fopen(inputFilePath.c_str(), "rb");
    if (fpInput == NULL)
        return -1;

    FILE *fpSig = fopen(sigFilePath.c_str(), "wb+");
    if (fpSig == NULL) {
        fclose(fpInput);
        return -1;
    }

    // The MD4 sig magic is deprecated.
    rs_result res = rs_sig_file(fpInput, fpSig, 2048, 32, RS_BLAKE2_SIG_MAGIC, NULL);

    fclose(fpInput);
    fclose(fpSig);

    // Report failure instead of discarding the result.
    return res == RS_DONE ? 0 : -1;
}

Infinite loop for truncated signature or delta files.

This might be related to #29
I was investigating why delta files seem to have additional characters on the end when generating deltas of identical files...

With the current rdiff,

echo "Hello world" > input
rdiff signature input sig
rdiff delta sig input delt
rdiff patch input delt output
# all is fine
# time to hang the process, truncate the delta file
dd if=delt of=delt2 bs=1 count=1
rdiff patch input delt2 output
# HANG, infinite loop somewhere
# Note that when the delta file is big enough, THEN the problem is handled properly
dd if=delt of=delt2 bs=1 count=4
rdiff patch input delt2 output
librsync: ERROR: (librsync) patch job failed: unexpected end of input
librsync: ERROR: unexpected end of input
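
The cause of the hang is not identified above, so the following is only a sketch of the general guard against this class of bug, not a description of librsync's code. It assumes a hypothetical in-memory source (`demo_source`) and a hypothetical helper (`demo_read_exact`): when the input ends before the requested number of bytes has arrived, the loop reports truncation instead of retrying with no progress.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is librsync API. */
typedef enum { DEMO_DONE = 0, DEMO_INPUT_ENDED = 1 } demo_status;

typedef struct {
    const unsigned char *data; /* whole input, e.g. a (truncated) delta */
    size_t size;               /* total bytes available */
    size_t pos;                /* bytes consumed so far */
} demo_source;

/* Return up to `max` bytes from the source; 0 means end of input. */
size_t demo_source_read(demo_source *src, unsigned char *out, size_t max)
{
    size_t left = src->size - src->pos;
    size_t n = left < max ? left : max;
    if (n > 0) {
        memcpy(out, src->data + src->pos, n);
        src->pos += n;
    }
    return n;
}

/* Read exactly `want` bytes, or fail cleanly if the input is too short. */
demo_status demo_read_exact(demo_source *src, unsigned char *out, size_t want)
{
    size_t got = 0;
    while (got < want) {
        size_t n = demo_source_read(src, out + got, want - got);
        if (n == 0)                  /* no progress is possible: stop */
            return DEMO_INPUT_ENDED;
        got += n;
    }
    return DEMO_DONE;
}
```

Asking this helper for a 4-byte header from a 1-byte input returns `DEMO_INPUT_ENDED` straight away, which corresponds to the "unexpected end of input" outcome seen in the 4-byte case.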

LIBTOOL eval'd to nothing in autogen.sh message

On line 77 of autogen.sh the var LIBTOOLIZE is eval'd whereas the message requires it to be not evaluated -- we're interested in the name not the value.

The obvious fix is to escape the $ (i.e. \$), but perhaps it would be better to change it to:

echo "You can set the LIBTOOLIZE environment variable."

BTW, homebrew on Mac installs libtoolize as glibtoolize.

Multithreaded operations(signature generation)

Hi,
I took a look at the whole-file API and it seems to be single-threaded, so on a system with more than one core the available processing power is not used for signature generation or the other operations.
In the documentation I found the pull and push modes, but I am not sure which one is useful, so I need some clarification.

What I want to achieve is to start some number of threads, let's say 4. Each thread will open the same file of, let's say, 40 bytes and read from a different position to calculate:
Thread 1: read bytes 0-9 from 'ab.txt' => calculate sig into a temp file
Thread 2: read bytes 10-19 from 'ab.txt' => calculate sig into a temp file
Thread 3: read bytes 20-29 from 'ab.txt' => calculate sig into a temp file
Thread 4: read bytes 30-39 from 'ab.txt' => calculate sig into a temp file

Finally, concatenate all the temp files to get the sig file.

I don't know if this scenario is even possible. Any help would be appreciated.

adler 32 rollsum, the s1 is set to 1 instead of zero

https://www.ietf.org/rfc/rfc1950.txt:

8.2. The Adler-32 algorithm

  The Adler-32 algorithm is much faster than the CRC32 algorithm yet
  still provides an extremely low probability of undetected errors.

  The modulo on unsigned long accumulators can be delayed for 5552
  bytes, so the modulo operation time is negligible.  If the bytes
  are a, b, c, the second sum is 3a + 2b + c + 3, and so is position
  and order sensitive, unlike the first sum, which is just a
  checksum.  That 65521 is prime is important to avoid a possible
  large class of two-byte errors that leave the check unchanged.
  (The Fletcher checksum uses 255, which is not prime and which also
  makes the Fletcher check insensitive to single byte changes 0 <->
  255.)

  The sum s1 is initialized to 1 instead of zero to make the length
  of the sequence part of s2, so that the length does not have to be
  checked separately. (Any sequence of zeroes has a Fletcher
  checksum of zero.)
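
The quoted passage can be checked with a few lines of C. This is a minimal, unoptimized sketch of Adler-32 as RFC 1950 describes it, with the modulo taken on every byte for clarity; the function name `adler32_simple` is made up for this example and it is not librsync's rollsum code.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* the prime modulus named in RFC 1950 */

/* Straightforward Adler-32: s1 starts at 1, s2 starts at 0. */
uint32_t adler32_simple(const unsigned char *data, size_t len)
{
    uint32_t s1 = 1; /* running byte sum, seeded with 1 */
    uint32_t s2 = 0; /* running sum of the s1 values */

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + data[i]) % ADLER_MOD;
        s2 = (s2 + s1) % ADLER_MOD;
    }
    return (s2 << 16) | s1;
}
```

For the three bytes "abc" (97, 98, 99), s1 = 1 + 97 + 98 + 99 = 295 and s2 = 3*97 + 2*98 + 99 + 3 = 589, matching the 3a + 2b + c + 3 formula. Because s1 starts at 1, one zero byte and two zero bytes give different s2 values (1 and 2), so the length is folded into the checksum.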

Crash with badly behaving rs_copy_cb callback

I wrote a test which can make librsync get confused and request out-of-bounds data,
and write past the end of the output buffer.

test_delta_read.test: test_delta_read.c:34: read_memory_callback: Assertion `pos < mem->size' failed.

Aborted

// in:
static rs_result rs_patch_s_copying(rs_job_t *job) {
...;
// the code uses the callback's returned 'len' value without checking whether it's larger than the output buffer,
// i.e. it can write past the end of the output buffer.
    result = (job->copy_cb)(job->copy_arg, job->basis_pos, &len, &ptr);
// nothing checked with len, just memcpy blindly...
    memcpy(buffs->next_out, ptr, len);

Two commits are coming in a pull request: one with the test, one with my fix.
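
For illustration only, and not the fix from the pull request: a minimal sketch of the kind of check that is missing before the memcpy. The helper `demo_checked_copy` and its status codes are hypothetical. It assumes the caller knows how many bytes it asked the callback for (`requested`) and how much room is left in the output buffer (`avail_out`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; not librsync's rs_result. */
typedef enum { DEMO_COPY_OK = 0, DEMO_COPY_BAD_CALLBACK = 1 } demo_copy_status;

/*
 * Copy the bytes a callback handed back, but only after validating them:
 * reject a NULL pointer, a length larger than was requested, or a length
 * that would overrun the output buffer.
 */
demo_copy_status demo_checked_copy(unsigned char *out, size_t avail_out,
                                   const unsigned char *src, size_t requested,
                                   size_t returned, size_t *copied)
{
    *copied = 0;
    if (src == NULL || returned > requested || returned > avail_out)
        return DEMO_COPY_BAD_CALLBACK;

    memcpy(out, src, returned);
    *copied = returned;
    return DEMO_COPY_OK;
}
```

A callback that claims 8 bytes when only 4 were requested is rejected before anything is written, so the output buffer cannot be overrun.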

Unit test rs__search_for_block

#50 flagged a bug in this function, but it really needs a specific unit test for building a search structure and retrieving various values.
