redundancy / go-sync

gosync is a library for Golang styled around zsync / rsync, written with the intent that it enables efficient differential file transfer in a number of ways. NB: I am unable to contribute to this at the moment

License: MIT License

Go 100.00%
go zsync file-transfer binary-data rsync

go-sync's People

Contributors

creedasaurus, nathanleclaire, redundancy, shawnps

go-sync's Issues

example for using as a library

How would I use go-sync as a library in my app to sync one directory to another? Can you provide some examples?

Patch <reference index> requires local file

>gosync h p
NAME:
   gosync patch - gosync patch <localfile> <reference index> <reference source> [<output>]

USAGE:
   gosync patch [command options] [arguments...]

DESCRIPTION:
   Recreate the reference source file, using an index and a local file that is believed to be similar.
The index should be produced by "gosync build".

<reference index> is a .gosync file and may be a local, unc network path or http/https url
<reference source> is corresponding target and may be a local, unc network path or http/https url
<output> is optional. If not specified, the local file will be overwritten when done.

Contrary to the documentation, <reference index> must currently be a local file.

>gosync.exe patch Test01.exr http://localhost/Test02.gosync http://localhost/Test02.exr
Starting patching process
open http://localhost/Test02.gosync: The filename, directory name, or volume label syntax is incorrect.

Efficient sparse file handling

Daniel,
what about support for variable block lengths? Perhaps this could be used to mark (very) large holes in files, so that they can be patched/transferred in one go.

P.S. By the way, do you plan to continue working on go-sync, or is it about to become an unmaintained project? Or a third alternative: no active development, but patches from others are accepted. Maybe you could update https://github.com/Redundancy/go-sync#current-state with additional information on how you see go-sync's further development. You could also clarify which TODOs you personally intend to finish, and which are nice-to-have, longer-term items that others are welcome to take on.
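
To illustrate the idea of marking large holes, here is a minimal sketch that scans data in fixed-size blocks and collapses consecutive all-zero blocks into runs, which could then be described once instead of block by block. This is not part of go-sync; the names `run`, `isZero` and `zeroRuns` are hypothetical.

```go
package main

import "fmt"

// run is an inclusive range of block indices that contain only zero bytes.
type run struct{ first, last int }

// isZero reports whether every byte in b is zero.
func isZero(b []byte) bool {
	for _, c := range b {
		if c != 0 {
			return false
		}
	}
	return true
}

// zeroRuns returns every maximal run of consecutive all-zero blocks.
func zeroRuns(data []byte, blockSize int) []run {
	var runs []run
	inRun := false
	for i, off := 0, 0; off < len(data); i, off = i+1, off+blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data) // last block may be partial
		}
		if !isZero(data[off:end]) {
			inRun = false
			continue
		}
		if inRun {
			runs[len(runs)-1].last = i
		} else {
			runs = append(runs, run{i, i})
			inRun = true
		}
	}
	return runs
}

func main() {
	data := make([]byte, 8*4) // 8 blocks of 4 bytes
	data[0], data[28] = 1, 1  // blocks 0 and 7 carry data
	fmt.Println(zeroRuns(data, 4)) // prints [{1 6}]
}
```

A real implementation would more likely query the file system for holes than scan for zeroes, but the run-length view of the block list is the same.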

Clarification on example

I assume I'm misunderstanding something, so please correct me.
I thought the example for the gosync application in the README is a basic example of how you would sync two files. Is that correct?

gosync build filenameToPatchTo
gosync patch filenameToPatchFrom filenameToPatchTo.gosync

Steps to reproduce:
$ mkdir -p /tmp/foo
$ mkdir -p /tmp/foo1
$ dd if=/dev/urandom of=/tmp/foo/blah.txt count=10240 bs=10240
$ cd /tmp/foo1
$ gosync build blah.txt
$ gosync patch /tmp/foo/blah.txt /tmp/foo1/blah.gosync

Results:

    am@seattle2:/tmp/foo1$ gosync patch /tmp/foo/blah.txt /tmp/foo1/blah.gosync 
    panic: runtime error: index out of range

    goroutine 1 [running]:
    main.Patch(0xc2080b20c0)
            /home/am/go/src/github.com/Redundancy/go-sync/gosync/patch.go:63 +0x1734
    github.com/codegangsta/cli.Command.Run(0x796f70, 0x5, 0x796cf0, 0x1, 0x8204b0, 0x48, 0x847390, 0x1a8, 0x0, 0x0, ...)
            /home/am/go/src/github.com/codegangsta/cli/command.go:113 +0x1038
    github.com/codegangsta/cli.(*App).Run(0xc2080368f0, 0xc20800a000, 0x4, 0x4, 0x0, 0x0)
            /home/am/go/src/github.com/codegangsta/cli/app.go:156 +0xcf7
    main.main()
            /home/am/go/src/github.com/Redundancy/go-sync/gosync/main.go:53 +0x278

Expected Results:

I wasn't sure exactly what it does, but I presumed that either the files would end up identical, or they would have different checksums, or gosync diff would produce different output.

I was hoping to use this as a library to compare two files and possibly create a patch that I could then apply to the older file to make them identical. Is this the sort of use case your library is meant for?

Would the functionality in gosync diff be adequate for this?

am@seattle2:/tmp/foo1$ gosync diff /tmp/foo/blah.txt blah.gosync
Blocksize: 8192
Weak hash count: 0
Using 4 cores

Matched:
Comparisons: 104865796
Weak hash hits: 0
Weak hit rate: 0
Strong hash hits: 0
Weak hash error rate: NaN
Total matched bytes: 0
Total matched blocks: 0
Index blocks: 0
Total missing bytes: 0
Time taken: 4.862401859s

  1. Can you explain what the weak hash count is?
  2. Why is the missing bytes count 0 if foo1/blah.txt is 0 bytes?
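
For context on the first question: in rsync/zsync-style tools the "weak hash" is typically a cheap rolling checksum used to find candidate block matches, which a strong hash then confirms. Below is a minimal sketch of an rsync-style rolling checksum. It is not necessarily the exact checksum go-sync uses, and the `weakSum` type and its methods are hypothetical names.

```go
package main

import "fmt"

// weakSum is an rsync-style weak checksum over a fixed-length window.
// a is the sum of the bytes; b weights each byte by its distance from
// the end of the window. Both wrap modulo 2^16.
type weakSum struct {
	a, b uint16
	n    int // window length
}

// newWeakSum computes the checksum of window from scratch.
func newWeakSum(window []byte) weakSum {
	s := weakSum{n: len(window)}
	for i, c := range window {
		s.a += uint16(c)
		s.b += uint16(len(window)-i) * uint16(c)
	}
	return s
}

// roll slides the window forward one byte in O(1):
// out is the byte leaving the window, in is the byte entering it.
func (s *weakSum) roll(out, in byte) {
	s.a = s.a - uint16(out) + uint16(in)
	s.b = s.b - uint16(s.n)*uint16(out) + s.a
}

// value packs both halves into a single 32-bit checksum.
func (s weakSum) value() uint32 { return uint32(s.b)<<16 | uint32(s.a) }

func main() {
	data := []byte("the quick brown fox")
	const n = 4
	s := newWeakSum(data[:n])
	ok := true
	for i := 1; i+n <= len(data); i++ {
		s.roll(data[i-1], data[i+n-1])
		if s.value() != newWeakSum(data[i:i+n]).value() {
			ok = false
		}
	}
	fmt.Println("rolling matches recompute:", ok) // prints true
}
```

Because the checksum can be rolled one byte at a time, the local file can be tested at every byte offset cheaply; counters such as "weak hash hits" count how often this cheap check found a candidate.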

System:

(Ubuntu 14.04 ) (go version go1.4.1 linux/amd64)

Assign severities to TODO items

This is a follow-up to the discussion in issue #14. Can you please elaborate on what go-sync is missing to be production ready in terms of stability, and what the priorities are? How would you compare it with the current version of zsync? We would like to know whether we should invest resources in this technology and how much would be needed. Based on the TODO list in the README, I would personally classify the items as follows:

Clean up naming consistency and clarity: Block / Chunk etc
Think about turning the filechecksum into an interface

API cleanliness, not related to the actual functionality, right?

Flesh out full directory build / sync

A nice-to-have RFE that an upper layer could handle instead (depending on the scope of this project).

Implement 'patch' payloads from a known start point to a desired end state
Provide bandwidth limiting / monitoring as part of http blocksource

Performance on client side. I would guess medium priority.

Validate full file checksum after patching
Sequential patcher to resume after error?

These would improve the stability. Are they weak points and high priorities?

Avoid marshalling / un-marshalling blocks during checksum generation

Performance on server side. I would guess low priority.

gzip source blocks (this involves writing out a version of the file that's compressed in block-increments)

Performance on client side?

Support exclude patterns

The original rsync CLI tool offers an --exclude flag to keep files matching given patterns out of a sync operation. Could go-sync support a similar configuration option?
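
As a rough sketch of what such an option could look like on the library side, the following filters paths against glob patterns using the standard library's `filepath.Match`. The `excluded` helper is hypothetical and not part of go-sync, and `filepath.Match` globs are simpler than rsync's own pattern rules.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// excluded reports whether the base name of path matches any pattern.
// Malformed patterns are ignored rather than treated as errors.
func excluded(path string, patterns []string) bool {
	base := filepath.Base(path)
	for _, p := range patterns {
		if ok, err := filepath.Match(p, base); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"*.tmp", ".git"}
	for _, f := range []string{"data/blob.bin", "data/cache.tmp", ".git"} {
		fmt.Println(f, excluded(f, patterns))
	}
}
```

A directory-level sync could call such a check while walking the tree and skip any file for which it returns true.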

Trailing zeroes in patched file

I'm trying to use go-sync as a library and am running into random corruption of patched files. Looking at a hex diff, the pattern is always the same: the patched file's size gets rounded up to the block size and the extra space is filled with zeroes:

--- /dev/fd/63  2021-12-23 15:03:49.189605458 +0000
+++ /dev/fd/62  2021-12-23 15:03:49.189605458 +0000
@@ -189527,4 +189527,4 @@
 002e6bd0  bc 35 c4 35 c8 35 cc 35  d0 35 00 00 00 00 00 00  |.5.5.5.5.5......|
 002e6be0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 *
-00303600
+00304000

The block size I use is 4096 (0x1000). The original file size is 3159552 bytes; after patching it ends up as 3162112 bytes, i.e. 771.375 blocks vs. 772.0 blocks.

I double-checked that I pass the correct FileSize and BlockSize. The checksum data is read separately and is 15440 bytes long, which looks sane: 15440/(4+16) = 772. BlockCount in the summary is set to the length of the checksums array returned by chunks.LoadChecksumsFromReader. Then I construct the block source like this:

resolver := blocksources.MakeFileSizedBlockResolver(uint64(summary.GetBlockSize()), summary.GetFileSize())
blockSource := blocksources.NewReadSeekerBlockSource(myReadSeeker, resolver)

Is there something that I'm missing? Glancing at the code, MakeFileSizedBlockResolver should do the right thing and truncate the end offset of the last partial block so that it is no larger than the file, but somehow those trailing zeroes creep in.
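
The arithmetic in this report can be checked with a small sketch. The helpers `blockCount` and `blockRange` are hypothetical and only illustrate the clamping that a file-sized block resolver is expected to perform; they are not go-sync's implementation.

```go
package main

import "fmt"

// blockCount returns how many blocks of blockSize cover fileSize bytes,
// counting a trailing partial block as one block.
func blockCount(fileSize, blockSize int64) int64 {
	return (fileSize + blockSize - 1) / blockSize
}

// blockRange returns the half-open byte range [start, end) of block i,
// clamping end to fileSize so the last block is not padded.
func blockRange(i, fileSize, blockSize int64) (start, end int64) {
	start = i * blockSize
	end = start + blockSize
	if end > fileSize {
		end = fileSize
	}
	return start, end
}

func main() {
	const fileSize, blockSize = 3159552, 4096
	n := blockCount(fileSize, blockSize)
	start, end := blockRange(n-1, fileSize, blockSize)
	// Block count, true length of the last block, and the padded size.
	fmt.Println(n, end-start, n*blockSize) // prints 772 1536 3162112
}
```

The last block should therefore contribute only 1536 bytes. The observed output size of 3162112 equals 772 full blocks, which is what results when the final block is written at its full 4096-byte length.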

go-sync v0.0.0-20200808161209-d9b3aeb508db

A clarification in document

Hello @Redundancy,

I recently stumbled upon go-sync and am quite impressed with the underlying implementation and its accompanying tests. Test cases easily double (or in some cases nearly triple) the workload, and it's a bit discouraging to see them go unnoticed by others. Plus, go-sync appears to have broader platform coverage, as it stays away from <netinet/in.h> and <arpa/inet.h>. 👍

A few questions came up while reading through the code base.

The README points out that

The ZSync mechanism has the weakness that HTTP1.1 ranged requests are not always well supported by CDN providers and ISP proxies. When issues happen, they're very difficult to respond to correctly in software (if possible at all). Using HTTP 1.0 and fully completed GET requests would be better, if possible.

This is a very appealing point, as it is not unusual to run into such issues with low-end hosting services for unknown reasons. However, looking into DoRequest of HTTPBlockSource, where the request header is composed, the operation specifically requests HTTP 1.1 with a Range header. I expected to find a fallback to HTTP 1.0 to handle a possible failure, but there was none.

I might not have a comprehensive understanding of how go-sync is built. If you could point me in the right direction, it would be very much appreciated.

Thank you very much for this charming work!
