
Can you sync a directory? about go-sync HOT 15 OPEN

redundancy commented on August 25, 2024
Can you sync a directory?

from go-sync.

Comments (15)

bwmarrin commented on August 25, 2024

I'm excited for that moment too.

Redundancy commented on August 25, 2024

So far, no.

The intent is more to provide a library for the algorithm for binary data than a final tool, though. I believe it would be relatively simple to create one if you added metadata for multiple files (names, relative locations, index data, file properties, and whatever permissions you want to try to replicate).

fire commented on August 25, 2024

Currently looking at how tar implements its metadata for files.

http://golang.org/src/archive/tar/common.go?s=1401:2114#L36

I don't know whether the tar format can be used directly as the manifest, with zero-length files standing in for the contents. It seems possible. Is that a good idea?

Redundancy commented on August 25, 2024

I'd personally avoid a binary format initially and go for JSON for this: it would make the manifest easier to extend and update while taking advantage of the marshalling support in Go (just using common-sense principles based on REST). It would also make debugging easier, and it could always be served gzip-encoded from almost any proper web server.

The properties of the block mechanisms in r/zsync (and therefore gosync) are such that you really aren't going to benefit from it if you're moving a lot of small files (compared to your block size), so in that sort of case you would probably be better off compressing them into an archive and sending that anyway. Given that, I'm less inclined to be initially too worried about the extra space that JSON would take, and you could always add a binary format later on.

Taking inspiration from an existing tried and tested source like archive formats seems like a good approach though, rather than reinventing the wheel. I'm not sure, for example, how the userid/groupid of a file is reconciled across two different systems.

fire commented on August 25, 2024

It appears that converting a Go struct to a JSON file is possible. All the elements of the tar Header are convertible to JSON, so we can just use a JSON file of header structs.

From reading the specification, the tar format is several headers and contents concatenated together.

My plan is to walk through each folder from root, then pass the file object to FileInfoHeader() and convert into json. https://stackoverflow.com/questions/6608873/file-system-scanning-in-golang

From documentation: Because os.FileInfo's Name method returns only the base name of the file it describes, it may be necessary to modify the Name field of the returned header to provide the full path name of the file.

I want to code an example that can take a tar archive and output a JSON document.


Referring to your question: tar stores the numeric user and group IDs of each file along with the user and group names. On extraction, files are normally created as the current user; when run as root, tar can restore the original owner.

fire commented on August 25, 2024

I've written an example program that walks through a directory and outputs a JSON document containing all the tar headers.

https://github.com/fire/gomanifest

fire commented on August 25, 2024

This is the JSON output for a real directory: https://gist.github.com/fire/574760be7bd153f0ed5d

fire commented on August 25, 2024

So I generate manifests for both the source and target directories, and then for each element in the target I check whether there's a matching element in the source. If the two differ, run go-sync on the differing file; otherwise copy the target file to the final location.

Does this algorithm look reasonable?

Another addition would be to use the xz format's integrated integrity checks, or to concatenate the gosync files into one binary.

Redundancy commented on August 25, 2024

Off the top of my head, MD5 or an equivalent hash of the whole file should be pretty fast for an initial comparison. File length and modification date would be other potential indicators.

There are three cases:
It's up to date in the place you want it: great!
It's not there at all: you need to copy the whole file (preferably compressed, probably with multiple TCP connections to reduce the effect of latency).
It's there, but doesn't match: (potentially) use gosync to update the contents.

At the moment, I would recommend either rsync or zsync for these things - they're tried and tested.
RSync is for pushing (source has access to target, maybe through SSH)
ZSync is for pulling (target has access to a source that is potentially just an http server)

In the longer term, if I spend a lot more time on it, the go sync command line tools could do these things too. At the moment, you're better off using tools that are thoroughly tested in production.

fire commented on August 25, 2024

Research suggests that CDNs such as MaxCDN use SFTP rather than rsync for pushing to their network. The patch uploader could use https://godoc.org/github.com/pkg/sftp.

The launcher could use zsync at first and gain a go-sync implementation later.

I would prefer a golang executable (with c++/c libraries) with the least amount of additional binaries.

Redundancy commented on August 25, 2024

You have to be careful there. I have some experience with using FTP to populate CDN origins with data files, but I think it would probably be a poor way for them to distribute the files around the world: FTP generally uses a single TCP connection per transfer, so on a long, high-latency link its throughput can be limited by the bandwidth-delay product. Doing multi-part uploads made an order-of-magnitude difference in time on one project that I worked on, and some companies use WAN optimizers to speed up traffic.

This was one of the major reasons that go-sync is written to be able to use multiple simultaneous connections.

fire commented on August 25, 2024

Note that I'm not using FTP; I'm using SFTP, the file transfer protocol that runs over SSH. However, I don't know what its performance characteristics are.

My experience with Go libraries and FTP has been quite horrible.

The plan is that clients use HTTPS to access the files.

Redundancy commented on August 25, 2024

It's worth having a look at the latest changes (particularly noticeable in patch.go). Most of the changes shouldn't be breaking (or should need only quick fixes), but they could make the library significantly easier to use at a high level.

wanghaisheng commented on August 25, 2024

Can you sync a directory now?

andreygursky commented on August 25, 2024

> RSync is for pushing (source has access to target, maybe through SSH)
> ZSync is for pulling (target has access to a source that is potentially just an http server)

rsync is universal: while the most common use is for backup (push), it's also used in production for synchronising local outdated iso images with updated remote ones (pull) (e.g., Debian CD/DVD weekly build ISOs).
