Comments (13)

sourcefrog commented on September 7, 2024

You're going to have to be more specific.

pavel-odintsov commented on September 7, 2024

Sure! Thank you for the answer!

I ran about 100 rdiff signature generations over different data sets:

Container: 59648 data size: 14.4 Gb signature size: 172 kilobytes calculated in: 69 seconds generation performance is: 213 megabytes/sec
Container: 60094 data size: 14.8 Gb signature size: 177 kilobytes calculated in: 80 seconds generation performance is: 200 megabytes/sec
Container: 60298 data size: 2.5 Gb signature size: 30 kilobytes calculated in: 14 seconds generation performance is: 199 megabytes/sec
Container: 60302 data size: 1.7 Gb signature size: 20 kilobytes calculated in: 9 seconds generation performance is: 198 megabytes/sec
Container: 60303 data size: 10.9 Gb signature size: 130 kilobytes calculated in: 59 seconds generation performance is: 196 megabytes/sec
Container: 60308 data size: 24.7 Gb signature size: 296 kilobytes calculated in: 134 seconds generation performance is: 193 megabytes/sec
Container: 60310 data size: 4.2 Gb signature size: 50 kilobytes calculated in: 22 seconds generation performance is: 193 megabytes/sec
Container: 60312 data size: 2.3 Gb signature size: 27 kilobytes calculated in: 13 seconds generation performance is: 193 megabytes/sec
Container: 60325 data size: 1.7 Gb signature size: 20 kilobytes calculated in: 9 seconds generation performance is: 193 megabytes/sec
Container: 60344 data size: 7.7 Gb signature size: 92 kilobytes calculated in: 42 seconds generation performance is: 192 megabytes/sec
Container: 60345 data size: 1.9 Gb signature size: 23 kilobytes calculated in: 10 seconds generation performance is: 192 megabytes/sec
Container: 60351 data size: 8.6 Gb signature size: 103 kilobytes calculated in: 47 seconds generation performance is: 192 megabytes/sec
Container: 60359 data size: 14.2 Gb signature size: 169 kilobytes calculated in: 78 seconds generation performance is: 191 megabytes/sec
Container: 60363 data size: 15.1 Gb signature size: 180 kilobytes calculated in: 82 seconds generation performance is: 191 megabytes/sec
Container: 60364 data size: 23.7 Gb signature size: 284 kilobytes calculated in: 127 seconds generation performance is: 191 megabytes/sec
Container: 60377 data size: 14.1 Gb signature size: 168 kilobytes calculated in: 76 seconds generation performance is: 190 megabytes/sec
Container: 60379 data size: 8.1 Gb signature size: 96 kilobytes calculated in: 43 seconds generation performance is: 191 megabytes/sec
Container: 60380 data size: 7.6 Gb signature size: 91 kilobytes calculated in: 41 seconds generation performance is: 190 megabytes/sec
Container: 60395 data size: 1.8 Gb signature size: 21 kilobytes calculated in: 10 seconds generation performance is: 190 megabytes/sec
Container: 60399 data size: 1.2 Gb signature size: 14 kilobytes calculated in: 7 seconds generation performance is: 190 megabytes/sec
Container: 60403 data size: 21.3 Gb signature size: 256 kilobytes calculated in: 115 seconds generation performance is: 190 megabytes/sec

I call rdiff with the following params (I use 1048576 because my data is stored in 1 MB blocks):

rdiff --block-size=1048576
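
As a rough cross-check on the signature sizes above (assuming, as I believe the 0.9.x defaults to be, a 4-byte weak sum plus an 8-byte truncated MD4 per block, plus a small header):

14.4 GB / 1 MiB per block ≈ 14,750 blocks, and 14,750 × 12 bytes ≈ 173 KB

which matches the ~172 KB reported for container 59648, so the signature size is essentially just block count × entry size.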

And rdiff consumes a whole CPU core for signature generation:

 195939 root      20   0 14564 1724  560 R 98.1  0.0   0:49.42 rdiff                                                                                  

It's not a disk subsystem issue because our disk system is extremely fast:

cat /vz/private/62767/root.hdd/root.hdd |pv >/dev/null
^C48GB 0:00:06 [ 628MB/s] [              <=>         

Hardware:
Intel Xeon E5-2670 v2 x 2
128GB DDR ECC
RAID-10 SSD Samsung EVO840

cat /proc/cpuinfo:

processor   : 39
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping    : 4
cpu MHz     : 2499.976
cache size  : 25600 KB
physical id : 1
siblings    : 20
core id     : 12
cpu cores   : 10
apicid      : 57
initial apicid  : 57
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips    : 4999.31
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Software:

 cat /etc/issue
CentOS release 6.5 (Final)

rdiff --version
rdiff (librsync 0.9.7) [x86_64-redhat-linux-gnu]
Copyright (C) 1997-2001 by Martin Pool, Andrew Tridgell and others.
http://rproxy.samba.org/
Capabilities: 64 bit files, gzip, bzip2

pavel-odintsov commented on September 7, 2024

And I got the final figures for the whole data storage:

Total data size: 1611.0 Gb signature size: 18.9 megabytes calculated in: 146 minutes signature generation performance is: 187.6 megabytes/sec

pavel-odintsov commented on September 7, 2024

I found "generate stopped: blocked waiting for input or output buffers" record in trace log and tried to play with buffers and I got two times speed up from 200 MB/sec to 512 MB/sec with 16 MB buffer for reading and writing:

rdiff --block-size 1048576 --input-size 16777216 --output-size 16777216

New results:

Container: 59648 data size: 14.4 Gb signature size: 173 kilobytes blocks count 14766 calculated in: 31 seconds generation performance is: 476 megabytes/sec
Container: 60094 data size: 14.9 Gb signature size: 178 kilobytes blocks count 15237 calculated in: 31 seconds generation performance is: 483 megabytes/sec
Container: 60298 data size: 2.5 Gb signature size: 30 kilobytes blocks count 2571 calculated in: 5 seconds generation performance is: 486 megabytes/sec
Container: 60302 data size: 1.7 Gb signature size: 20 kilobytes blocks count 1711 calculated in: 4 seconds generation performance is: 482 megabytes/sec

What about enlarging the buffers? The standard size of 16 KB is really small.
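
The same enlargement should also be reachable from C through the whole-file API, since the buffer sizes are plain globals in librsync.h. A minimal sketch, assuming the 0.9.x-era API where rs_sig_file() takes (basis, sig_file, block_len, strong_len, stats); newer librsync releases change this signature, so treat the details as an assumption:

#include <stdio.h>
#include <librsync.h>

int main(void)
{
    /* Library-level equivalent of --input-size/--output-size 16777216. */
    rs_inbuflen  = 16 * 1024 * 1024;
    rs_outbuflen = 16 * 1024 * 1024;

    FILE *basis = fopen("/vz/private/62767/root.hdd/root.hdd", "rb");
    FILE *sig   = fopen("/vz/tmp/root.hdd.sig", "wb");   /* output path is made up */
    if (!basis || !sig)
        return 1;

    rs_stats_t stats;
    /* 1 MiB blocks as above; 8 bytes is assumed to be the old default strong-sum length. */
    rs_result r = rs_sig_file(basis, sig, 1 << 20, 8, &stats);

    fclose(sig);
    fclose(basis);
    return r == RS_DONE ? 0 : 1;
}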

pavel-odintsov commented on September 7, 2024

But rdiff still eats my whole CPU core :(

    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                 
 963758 root      20   0 38968  16m  516 R 99.5  0.0   0:30.40 rdiff   

Is it possible to add multithreading support to speed rdiff up?

pavel-odintsov commented on September 7, 2024

I tested delta generation too and got the same speed issues; throughput mostly hovered around 200 MB/s:

Container 59648 source size: 14.4 Gb delta size: 518.0 MB generated in 63 seconds processing speed is 234.6 MB/sec
Container 60094 source size: 15.0 Gb delta size: 908.0 MB generated in 65 seconds processing speed is 236.9 MB/sec
Container 60298 source size: 2.5 Gb delta size: 121.0 MB generated in 10 seconds processing speed is 258.3 MB/sec
Container 60302 source size: 1.7 Gb delta size: 65.0 MB generated in 5 seconds processing speed is 342.2 MB/sec
Container 60303 source size: 11.2 Gb delta size: 627.0 MB generated in 44 seconds processing speed is 260.6 MB/sec
Container 60308 source size: 24.7 Gb delta size: 6212.0 MB generated in 297 seconds processing speed is 85.2 MB/sec
Container 60310 source size: 4.2 Gb delta size: 391.0 MB generated in 23 seconds processing speed is 186.0 MB/sec
Container 60312 source size: 2.3 Gb delta size: 271.0 MB generated in 13 seconds processing speed is 180.9 MB/sec
Container 60325 source size: 1.7 Gb delta size: 200.0 MB generated in 10 seconds processing speed is 172.5 MB/sec
Container 60344 source size: 7.8 Gb delta size: 882.0 MB generated in 48 seconds processing speed is 165.7 MB/sec
Container 60345 source size: 1.9 Gb delta size: 224.0 MB generated in 12 seconds processing speed is 165.8 MB/sec
Container 60351 source size: 8.8 Gb delta size: 520.0 MB generated in 47 seconds processing speed is 190.7 MB/sec
Container 60359 source size: 14.2 Gb delta size: 170.0 MB generated in 62 seconds processing speed is 233.9 MB/sec
Container 60363 source size: 15.1 Gb delta size: 353.0 MB generated in 71 seconds processing speed is 217.4 MB/sec
Container 60364 source size: 24.2 Gb delta size: 3426.0 MB generated in 209 seconds processing speed is 118.4 MB/sec
Container 60377 source size: 14.7 Gb delta size: 4232.0 MB generated in 172 seconds processing speed is 87.8 MB/sec
Container 60379 source size: 8.1 Gb delta size: 676.0 MB generated in 49 seconds processing speed is 170.2 MB/sec
Container 60380 source size: 7.7 Gb delta size: 513.0 MB generated in 40 seconds processing speed is 198.1 MB/sec

And CPU consumption is still just as high:

 885324 root      20   0  110m  66m  616 R 100.0  0.0   2:33.27 rdiff                                                                               

I used rdiff with the following params:

/usr/bin/rdiff --input-size 16777216 --output-size 16777216 --statistics --block-size=1048576 delta /root/rdiff_signatures_25_june/60403.signature /vz/private/60403/root.hdd/root.hdd /vz/tmp/delta.60403
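
For reference, the library-level equivalent of that command goes through the whole-file delta API. A minimal sketch (the rs_loadsig_file / rs_build_hash_table / rs_delta_file calls exist in librsync, but the exact signatures differ slightly between versions, so treat them as an assumption):

#include <stdio.h>
#include <librsync.h>

int main(void)
{
    FILE *sigf   = fopen("/root/rdiff_signatures_25_june/60403.signature", "rb");
    FILE *newf   = fopen("/vz/private/60403/root.hdd/root.hdd", "rb");
    FILE *deltaf = fopen("/vz/tmp/delta.60403", "wb");
    if (!sigf || !newf || !deltaf)
        return 1;

    rs_signature_t *sig = NULL;
    rs_stats_t stats;

    /* Load the signature, index it into the hashtable, then scan the new
     * file and emit the delta -- the same three phases rdiff delta runs. */
    if (rs_loadsig_file(sigf, &sig, &stats) != RS_DONE)
        return 1;
    if (rs_build_hash_table(sig) != RS_DONE)
        return 1;
    rs_result r = rs_delta_file(sig, newf, deltaf, &stats);

    rs_free_sumset(sig);
    fclose(deltaf);
    fclose(newf);
    fclose(sigf);
    return r == RS_DONE ? 0 : 1;
}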

pavel-odintsov commented on September 7, 2024

You will find detailed trace log for delta generation here: https://www.dropbox.com/s/lpa51v2ho3u2kkk/rdiff_delta_trace.log.zip

Thank you :)

pavel-odintsov commented on September 7, 2024

I did some profiling of rdiff on a 15 GB file.

time /usr/src/librsync/rdiff signature --block-size 1048576 /vz/tmp/extracted_backup_b62e3105-2900-4e08-bbf7-b70093fdc8a5.tar.gz /vz/tmp/rdiff.signature

I got the following timing results:

real    1m9.843s
user    1m5.046s
sys 0m4.715s

But the gprof report is more interesting:

gprof /usr/src/librsync/rdiff
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
 71.45     18.21    18.21                             rs_mdfour64
 26.89     25.06     6.85                             rs_calc_weak_sum
  0.82     25.27     0.21                             rs_mdfour_update
  0.16     25.31     0.04                             rs_log_va
  0.12     25.34     0.03                             rs_log0
  0.12     25.37     0.03                             rs_outfilebuf_drain
  0.12     25.40     0.03                             rs_scoop_input
  0.04     25.41     0.01                             rs_infilebuf_fill
  0.04     25.42     0.01                             rs_job_check
  0.04     25.43     0.01                             rs_job_input_is_ending
  0.04     25.44     0.01                             rs_job_iter
  0.04     25.45     0.01                             rs_scoop_read
  0.04     25.46     0.01                             rs_sig_s_generate
  0.04     25.47     0.01                             rs_supports_trace

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
       else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
       function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
       the function in the gprof listing. If the index is
       in parenthesis it shows where it would appear in
       the gprof listing if it were to be printed.
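
So essentially all of the time (about 98%) is spent in the two checksum routines: the per-block MD4 strong sum (rs_mdfour64) and the weak rolling sum (rs_calc_weak_sum). For a feel of the latter, here is a simplified sketch of the rsync-style weak sum over one block (illustrative only, not librsync's actual unrolled code):

#include <stddef.h>
#include <stdint.h>

#define CHAR_OFFSET 31   /* librsync adds a small constant per byte */

/* Rsync-style weak checksum of one block: two running sums folded into
 * 32 bits.  rs_calc_weak_sum does this kind of work for every 1 MiB
 * block, which together with MD4 keeps one core fully busy. */
static uint32_t weak_sum(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 += buf[i] + CHAR_OFFSET;   /* sum of bytes */
        s2 += s1;                     /* sum of partial sums */
    }
    return (s1 & 0xffff) | (s2 << 16);
}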

pavel-odintsov commented on September 7, 2024

I checked delta generation speed with and without 16 MB buffers and got very interesting results:

With buffers:
Container 62957 source size: 14.8 Gb delta size: 3025.0 MB compressed size: 1378.8 MB generated in 113 seconds processing speed is 133.8 MB/sec

Without buffers:
Container 62957 source size: 14.8 Gb delta size: 3025.6 MB compressed size: 1379.9 MB generated in 158 seconds processing speed is 95.7 MB/sec

Conclusion: for signature and delta generation, a 16 MB input-size/output-size buffer provides a 1.5-3x speedup.

paulharris commented on September 7, 2024

This speed issue may be resolved with the latest master, due to the merge of improvements related to large files.
See merge #14.

dbaarda commented on September 7, 2024

This bug was complaining about signature generation time, which is pretty simple file-read and checksum calculation work. It's unsurprising that setting the buffer size larger than the block size more than doubled his throughput.

That the CPU gets pegged doing md4sum calculations is unsurprising. I don't think much can be done to optimize the md4sum calculation itself. The best that can be done is to parallelize the md4sum calculation so we can peg multiple CPUs :-) Also maybe the new blake2 sums are faster... dunno.

At the end he mentions delta calculation, which has been significantly improved with a new hashtable for large files since this bug was filed, though using 1M blocksize would have avoided problems with the old hashtable.

The idea of parallelizing signature generation is being tracked in #68.

The only thing left that this bug could cover is having the buffer size default to something like 16x the block size, not just 16 KB, when unspecified.
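
To illustrate the parallelization idea (this is not librsync code, and not necessarily how #68 will do it): per-block strong sums are independent, so batches of blocks can be hashed on all cores. A sketch using OpenMP and OpenSSL's MD4 purely for brevity, since librsync has its own rs_mdfour implementation (and MD4 is deprecated in recent OpenSSL):

#include <stdio.h>
#include <stdlib.h>
#include <openssl/md4.h>

#define BLOCK_SIZE (1024 * 1024)   /* matches --block-size=1048576 */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f)
        return 1;

    enum { BATCH = 64 };                       /* blocks hashed per round */
    unsigned char *buf = malloc((size_t)BATCH * BLOCK_SIZE);
    unsigned char sums[BATCH][MD4_DIGEST_LENGTH];
    size_t got;
    if (!buf)
        return 1;

    while ((got = fread(buf, 1, (size_t)BATCH * BLOCK_SIZE, f)) > 0) {
        long nblocks = (long)((got + BLOCK_SIZE - 1) / BLOCK_SIZE);

        /* Each block's strong sum is independent of the others. */
        #pragma omp parallel for
        for (long i = 0; i < nblocks; i++) {
            size_t off = (size_t)i * BLOCK_SIZE;
            size_t len = (off + BLOCK_SIZE <= got) ? BLOCK_SIZE : got - off;
            MD4(buf + off, len, sums[i]);
        }
        /* A real signature generator would also compute the weak sums and
         * write the sums out in file order, which is preserved here. */
    }

    free(buf);
    fclose(f);
    return 0;
}

Build with something like gcc -O2 -fopenmp par_md4.c -lcrypto; without -fopenmp the pragma is simply ignored and the loop runs serially.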

dbaarda commented on September 7, 2024

I've been thinking a little about the bufferlen settings and am about to do something about this.

The rs_inbuflen and rs_outbuflen vars are publicly visible/settable according to librsync.h. They primarily control the rs_buffers_t size used by the whole-file API implemented in whole.c, but crucially rs_outbuflen is also used by delta.c as the maximum size buffered/output per output-literal command.

In scoop.c librsync keeps its own internal scoop_buf buffer that data is copied into for matching against (the double-copying this implies is another problem). In delta.c we always copy all the data in the input buffers to the scoop. This buffer grows by doubling in size until it is large enough to contain a full block for matching against + the largest amount of miss data buffered for a single output-literal command + the full size of the next input buffer. This means it grows as large as the next power of 2 larger than rs_outbuflen + block_size + rs_inbuflen (note we could just pre-allocate the max needed size).
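
The growth policy described above amounts to something like this (a sketch, not the actual scoop.c code); pre-allocating would just mean calling it once with the known maximum of rs_outbuflen + block_size + rs_inbuflen:

#include <stdlib.h>

/* Grow *buf by doubling until it can hold `needed` bytes, the way the
 * scoop is described as growing above.  Returns 0 on success. */
static int scoop_reserve(char **buf, size_t *alloc, size_t needed)
{
    size_t newsize = *alloc ? *alloc : 4096;   /* starting size is a guess */
    while (newsize < needed)
        newsize *= 2;                          /* next power-of-two step */
    if (newsize != *alloc) {
        char *p = realloc(*buf, newsize);
        if (!p)
            return -1;
        *buf = p;
        *alloc = newsize;
    }
    return 0;
}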

For rs_inbuflen it's only used for pre-fetching data to be copied into the scoop. After a match we always need at least a whole block of data to begin checking for matches again, so having rs_inbuflen < block_size is inefficient.

For rs_outbuflen, the maximum output the algorithm generates in a single iteration is the largest output-literal command (cmd bytes + literal data).

Should we use rs_outbuflen as the max size for buffering/outputting literal cmds? This makes it the only setting outside of the job_t struct that affects the delta output. Probably a better option would be to use a multiple of the block size. Using 4x blocksize and setting rs_inbuflen=blocksize means the scoop would grow large enough to fit 6x blocksize, which is probably OK.

dbaarda commented on September 7, 2024

After submitting #105 I believe everything mentioned in this bug is now addressed. I'm marking it closed.
