syncer's Introduction

syncer

WARNING: This is highly experimental and will probably eat your data. Make sure you have good backups before you test it.

This is a filesystem that lets you keep a seamless local view of a very large repository of files while storing only a much smaller cache locally. It's meant for situations where your disk is too small to hold the full collection and you would rather fetch data from a remote server on demand. It was built for the case of a very large media collection (e.g., a multi-terabyte photo collection) that should be seamlessly accessible at any time on a laptop with only a few GB of space.

syncer is built as a FUSE filesystem, so it presents a regular POSIX interface that any app should be able to use. Files are internally split into blocks and hashed. Those blocks get uploaded to any rsync endpoint you want (usually an SSH server). When local storage exceeds the configured limit, the least recently used blocks get evicted. They are brought back into local storage on demand by fetching them from the remote server again.
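As a rough illustration of the "split into blocks and hashed" step, here is a minimal sketch. It assumes fixed-size 1MB blocks and uses the standard library's `DefaultHasher` as a stand-in for Blake2 so that it needs no external crates; the names `BlockId`, `block_id`, and `split_into_blocks` are hypothetical and not taken from syncer's source.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Block size for illustration; the Performance section mentions 1MB data blocks.
const BLOCK_SIZE: usize = 1024 * 1024;

/// Hypothetical content identifier. syncer hashes blocks with Blake2; this
/// sketch substitutes DefaultHasher (64-bit, NOT cryptographic) purely to
/// stay dependency-free.
type BlockId = u64;

/// Hash one block's bytes into its identifier.
fn block_id(data: &[u8]) -> BlockId {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Split a file's bytes into fixed-size blocks and pair each with its id.
/// `block_size` must be non-zero. The last block may be shorter.
fn split_into_blocks(data: &[u8], block_size: usize) -> Vec<(BlockId, &[u8])> {
    data.chunks(block_size)
        .map(|chunk| (block_id(chunk), chunk))
        .collect()
}
```

Calling `split_into_blocks(&bytes, BLOCK_SIZE)` yields one `(id, slice)` pair per block. Because the id depends only on the block's contents, identical blocks map to the same id, which is what makes it possible to store and fetch blocks by hash.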

Current State

The basic program works and syncs to a remote rsync/ssh server. This should be enough for a photo collection which is mostly a set of fixed files that don't get changed a lot. But this is still highly experimental and might eat your data. The basic existing features are:

  • The standard POSIX filesystem works and persists to disk (tested on Linux and OSX)
  • Pushing to the remote server and pulling on demand works as well
  • Speed is close to writing directly to disk, at the cost of more CPU usage (see the Performance section)

Still on the TODO list:

  • Stress test and build a repeatable testing set for all POSIX operations
  • Tune for performance more thoroughly
  • Implement a better sync endpoint than just rsync/ssh as setting up those connections repeatedly is very time consuming. A simple daemon to send/receive blocks that maybe even allows multi-server failover and redundancy would be nice. Or maybe something like the S3 protocol would fit.
  • Allow marking certain files/directories as always available locally so you can set it on the thumbnail dir of a photo application and get fast browsing at all times
  • Expose a Time Machine like interface showing read-only snapshots of the filesystem (already present in the data but not exposed)
  • Figure out a good way to evict old data (currently all history is kept)

Performance

Proper benchmarking is still needed but the current state should be good enough for most uses:

  • A simple write benchmark (15GB rsync from a local folder) showed that syncer is reasonably competitive with normal disk writing. Syncer got 49MB/s and the equivalent rsync directly to disk got 54MB/s. CPU usage was higher but not worryingly so. That's to be expected as syncer is hashing all the blocks with Blake2 (which is very fast but not free).
  • Syncer has 16 parallel threads and fine-grained locks, which allows concurrent usage of multiple files/directories without issue.
  • Fetching/sending from/to the server is dependent on your specific network characteristics. But since blobs smaller than 64kB are never evicted from local cache, reading metadata (listing directories and accessing file properties) tends to be quite fast and small files will also all be local. Large files that are not local have bearable performance as long as your network is good since data blocks are 1MB.
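The eviction behaviour described above (least recently used blocks go first, blobs under 64kB always stay local) can be sketched as follows. This is a simplified in-memory model under stated assumptions, not syncer's actual implementation: `LocalCache`, `touch`, and `PIN_THRESHOLD` are hypothetical names, and a logical counter stands in for real access times.

```rust
use std::collections::HashMap;

/// Blobs smaller than this are never evicted (the text above cites 64kB).
const PIN_THRESHOLD: usize = 64 * 1024;

/// Hypothetical in-memory model of the local cache.
struct LocalCache {
    max_bytes: usize,
    used_bytes: usize,
    clock: u64,
    /// block id -> (size in bytes, logical time of last access)
    entries: HashMap<u64, (usize, u64)>,
}

impl LocalCache {
    fn new(max_bytes: usize) -> Self {
        LocalCache { max_bytes, used_bytes: 0, clock: 0, entries: HashMap::new() }
    }

    /// Record that a block was written or read locally, then evict if the
    /// cache is over budget. Returns the ids of the evicted blocks.
    fn touch(&mut self, id: u64, size: usize) -> Vec<u64> {
        self.clock += 1;
        match self.entries.insert(id, (size, self.clock)) {
            Some((old_size, _)) => self.used_bytes = self.used_bytes - old_size + size,
            None => self.used_bytes += size,
        }
        self.evict()
    }

    fn evict(&mut self) -> Vec<u64> {
        let mut evicted = Vec::new();
        while self.used_bytes > self.max_bytes {
            // Pick the least recently used block among those large enough
            // to be evictable.
            let victim = self
                .entries
                .iter()
                .filter(|(_, v)| v.0 >= PIN_THRESHOLD)
                .min_by_key(|(_, v)| v.1)
                .map(|(id, _)| *id);
            match victim {
                Some(id) => {
                    let (size, _) = self.entries.remove(&id).unwrap();
                    self.used_bytes -= size;
                    evicted.push(id);
                }
                // Only small, pinned blobs remain; nothing more to evict.
                None => break,
            }
        }
        evicted
    }
}
```

In this model an evicted block would simply be fetched from the server again on its next access. A linear scan for the victim keeps the sketch short; a real cache would more likely keep an ordered index of access times.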

Reports of specific use cases that are too slow are more than welcome.

Usage

To install or upgrade just do:

$ cargo install -f syncer

To start the filesystem do something like:

$ syncer init source someserver:~/blobs/ 1000
$ syncer mount source mnt

That will give you a filesystem at mnt that you can use normally. Its data comes from the local source folder and from the server. syncer will try to use at most 1GB locally (the 1000 passed to init) and fetch from the server when needed.

Contributing

Bug reports and pull requests welcome at https://github.com/pedrocr/syncer

Meet us at #chimper on irc.libera.chat if you need to discuss a feature or issue in detail or even just for general chat. To just start chatting go to https://web.libera.chat/#chimper

syncer's People

Contributors

pedrocr, runfalk

syncer's Issues

SOLVED: error: failed to run custom build command for `fuse v0.3.1`

I'm not a Rustc or Cargo native, so not sure how easy this is to fix but will look into it.

Building a new clone of master on Ubuntu 19.04 using Rustc v1.44.0, Cargo v1.44.0:

cargo update && cargo build

Fails at this point:

   [snip]
   Compiling fuse v0.3.1
   Compiling libsqlite3-sys v0.9.3
error: failed to run custom build command for `fuse v0.3.1`

Caused by:
  process didn't exit successfully: `/home/mrh/src/fuse/syncer/target/debug/build/fuse-e19dbd79cd2cb2b4/build-script-build` (exit code: 101)
--- stderr
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Failure { command: "\"pkg-config\" \"--libs\" \"--cflags\" \"fuse\" \"fuse >= 2.6.0\"", output: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "Package fuse was not found in the pkg-config search path.\nPerhaps you should add the directory containing `fuse.pc\'\nto the PKG_CONFIG_PATH environment variable\nNo package \'fuse\' found\nPackage fuse was not found in the pkg-config search path.\nPerhaps you should add the directory containing `fuse.pc\'\nto the PKG_CONFIG_PATH environment variable\nNo package \'fuse\' found\n" } }', /home/mrh/.cargo/registry/src/github.com-1ecc6299db9ec823/fuse-0.3.1/build.rs:10:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Morph into a network filesystem

I wonder how painful it would be to turn syncer into a network filesystem, ie to enable multiple clients to mount the same filesystem?

This would require coordination and locking, and a host of other details, but it sounds like syncer could be a nice basis for such a system.

'cargo install -f syncer' fails with 'conflicting implementations of trait `std::convert::From<&_>` ...'

Both

cargo install -f syncer

and

cargo install -f --debug syncer 

give the following error, although cargo update && cargo build and cargo build --release both succeed.

   Compiling blake2 v0.7.1
   Compiling rusqlite v0.13.0
error[E0119]: conflicting implementations of trait `std::convert::From<&_>` for type `types::to_sql::ToSqlOutput<'_>`:
  --> /home/mrh/.cargo/registry/src/github.com-1ecc6299db9ec823/rusqlite-0.13.0/src/types/to_sql.rs:26:1
   |
18 | / impl<'a, T: ?Sized> From<&'a T> for ToSqlOutput<'a>
19 | |     where &'a T: Into<ValueRef<'a>>
20 | | {
21 | |     fn from(t: &'a T) -> Self {
22 | |         ToSqlOutput::Borrowed(t.into())
23 | |     }
24 | | }
   | |_- first implementation here
25 | 
26 |   impl<'a, T: Into<Value>> From<T> for ToSqlOutput<'a> {
   |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ conflicting implementation for `types::to_sql::ToSqlOutput<'_>`
   |
   = note: downstream crates may implement trait `std::convert::From<&_>` for type `types::value::Value`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0119`.
error: could not compile `rusqlite`.

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: failed to compile `syncer v0.5.1`, intermediate artifacts can be found at `/tmp/cargo-installmEHbMA`

Caused by:
  build failed

Integration with a massive decentralised storage system also written in Rust

Hi Pedro,

I have to say hello because I wish I'd seen this project two years ago when I began creating a FUSE based fs for the early SAFE Network APIs. That was called SAFE Drive and is paused until the latest APIs are updated, which is right about now, and today I saw syncer in my github feed.

Wow. The fit looks perfect, so I'd like to know if you are still interested in this project and maybe taking it into new areas. My crude but mostly working SAFE Drive was written in JS, but the SAFE Network core is all written in Rust, and the API fits well with rsync so I'm interested in getting syncer to work with SAFE.

The prospect of making all your SAFE storage available locally at local disk speeds is mouth watering. Just saying hello at this point, and I'm busy with other projects, but may drop those to look into this as I've been waiting on the SAFE APIs.

You can also find me via this post about Syncer on the SAFE forum.

All the best,

Mark
