
rdedup - data deduplication with compression and public key encryption


Introduction

Warning: beta-quality software ahead

rdedup is a tool providing data deduplication with compression and public key encryption, written in the Rust programming language. The primary use case is storing deduplicated and encrypted backups.

My use case

I use rdup to create backup archives and Syncthing to replicate my backups across many systems. Some of them are more trusted (desktops with disk-level encryption, firewalls, machines stored in the vault, etc.), and some not so much (semi-personal laptops, phones, etc.).

As my backups tend to contain a lot of shared data (even backups taken on different systems), it makes perfect sense to deduplicate them.

However, I don't want a single host being physically or remotely compromised to give access to the data in all the backups from all my systems. Existing deduplication software like ddar or zbackup provides encryption, but only symmetric encryption (zbackup issue, ddar issue), which means you have to share the same key on all your hosts, and one compromised system gives access to all your backup data.

To fill in the missing piece of my master backup plan, I decided to write it myself in my beloved Rust programming language.

How it works

rdedup works very much like zbackup and other deduplication software, with a little twist:

  • Thanks to public key cryptography, a secure passphrase is required only when restoring data; adding and deduplicating new data does not require it.
  • Everything is synchronization-friendly: Dropbox, Syncthing and similar tools should work fine for data synchronization.

When storing data, rdedup splits it into smaller pieces (chunks) using a rolling sum and stores each chunk under a unique id (its sha256 digest) in a specially formatted directory: the repo. The whole backup is then described by an index: a list of digests.
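The storing step can be illustrated with a small content-addressed store. This is a minimal sketch, not rdedup's code: the chunks are taken as already split, the `Repo` type is hypothetical and lives in memory, and because Rust's standard library has no sha256, `DefaultHasher` stands in for the digest (it is not cryptographic). Identical chunks map to the same id, so they are stored once while the index still lists every occurrence.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for a chunk id. rdedup uses a sha256 digest; std has no sha256,
// so this sketch uses DefaultHasher (NOT cryptographic, illustration only).
type Digest = u64;

fn digest(data: &[u8]) -> Digest {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

// Hypothetical in-memory repo: chunk bytes keyed by their digest.
#[derive(Default)]
struct Repo {
    chunks: HashMap<Digest, Vec<u8>>,
}

impl Repo {
    // Store every chunk under its digest (identical chunks are kept once)
    // and return the index: the ordered list of digests.
    fn store(&mut self, chunks: &[&[u8]]) -> Vec<Digest> {
        chunks
            .iter()
            .map(|c| {
                let id = digest(c);
                self.chunks.entry(id).or_insert_with(|| c.to_vec());
                id
            })
            .collect()
    }
}

fn main() {
    let chunks: Vec<&[u8]> = vec!["hello ".as_bytes(), "world".as_bytes(), "hello ".as_bytes()];
    let mut repo = Repo::default();
    let index = repo.store(&chunks);
    // Three index entries, but the repeated chunk is stored only once.
    println!("index: {} entries, repo: {} chunks", index.len(), repo.chunks.len());
}
```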

The index is stored internally just like the data itself. Applied recursively, this reduces each backup to a single unique digest, which is saved under the given name.

When restoring data, rdedup reads the index and then restores the data by reading each chunk listed in it.
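The recursive index and the restore path can be sketched together as a round trip. The README does not describe the on-disk index format or how a reader knows where the recursion stops, so everything below is an assumption: digests are packed as 8-byte little-endian values, at most `DIGESTS_PER_INDEX_CHUNK` of them go into one index chunk (a hypothetical, deliberately tiny value so recursion happens), and `store` returns the index depth alongside the root digest. As in any std-only sketch here, `DefaultHasher` stands in for sha256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type Digest = u64; // stand-in for a sha256 digest (NOT cryptographic)

// Hypothetical fan-out of one index chunk; tiny on purpose to force recursion.
const DIGESTS_PER_INDEX_CHUNK: usize = 4;

fn digest(data: &[u8]) -> Digest {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

// Decode one 8-byte little-endian digest from an index chunk.
fn decode(b: &[u8]) -> Digest {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    u64::from_le_bytes(a)
}

#[derive(Default)]
struct Repo {
    chunks: HashMap<Digest, Vec<u8>>,
}

impl Repo {
    fn put(&mut self, data: &[u8]) -> Digest {
        let id = digest(data);
        self.chunks.entry(id).or_insert_with(|| data.to_vec());
        id
    }

    // Store data chunks, then store the index itself as chunks, level by
    // level, until a single root digest remains. Returns (root, depth).
    fn store(&mut self, data_chunks: &[&[u8]]) -> (Digest, usize) {
        let mut ids: Vec<Digest> = data_chunks.iter().map(|c| self.put(c)).collect();
        if ids.is_empty() {
            ids.push(self.put(&[])); // empty input becomes one empty chunk
        }
        let mut depth = 0;
        while ids.len() > 1 {
            let mut next = Vec::new();
            for group in ids.chunks(DIGESTS_PER_INDEX_CHUNK) {
                let bytes: Vec<u8> = group.iter().flat_map(|d| d.to_le_bytes()).collect();
                next.push(self.put(&bytes));
            }
            ids = next;
            depth += 1;
        }
        (ids[0], depth)
    }

    // Expand the index level by level, then concatenate the data chunks.
    fn load(&self, root: Digest, depth: usize) -> Vec<u8> {
        let mut ids = vec![root];
        for _ in 0..depth {
            let mut next = Vec::new();
            for id in &ids {
                for b in self.chunks[id].chunks_exact(8) {
                    next.push(decode(b));
                }
            }
            ids = next;
        }
        let mut out = Vec::new();
        for id in &ids {
            out.extend_from_slice(&self.chunks[id]);
        }
        out
    }
}

fn main() {
    let parts: Vec<Vec<u8>> = (0..10).map(|i| format!("chunk {} ", i).into_bytes()).collect();
    let chunks: Vec<&[u8]> = parts.iter().map(|p| p.as_slice()).collect();
    let mut repo = Repo::default();
    let (root, depth) = repo.store(&chunks);
    assert_eq!(repo.load(root, depth), parts.concat());
    println!("root {:016x}, index depth {}", root, depth);
}
```

With ten data chunks and four digests per index chunk, the first index level has three chunks and the second has one, so the backup collapses to a single root digest two levels up.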

Thanks to the rolling sum chunking scheme, when similar data is saved frequently, many common chunks are reused, saving space.

What makes rdedup unique is that every time a new repo directory is created, a pair of keys (public and secret) is generated. The public key is saved in the storage directory in plain text, while the secret key is encrypted with a key derived from a passphrase.

Every time rdedup saves a new chunk file, its data is encrypted using the public key, so it can only be decrypted using the corresponding secret key. This way new data can always be added with full deduplication, while only restoring data requires providing the passphrase to unlock the secret key.

A nice little detail: rdedup supports removing old names and no-longer-needed chunks (garbage collection) without the passphrase. Only the data chunks are encrypted, which makes operations like garbage collection safe even on untrusted machines.
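Because only data chunks are encrypted, garbage collection can be done as a plain mark-and-sweep over names and index chunks, never touching the encrypted bytes. The sketch below is a hypothetical in-memory model, not rdedup's implementation: the `Chunk` enum and the `u64` digests are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

type Digest = u64; // stand-in for a sha256 digest

// Hypothetical model: index chunks are plaintext lists of digests, while
// data chunks are opaque (encrypted) bytes that GC never needs to read.
#[allow(dead_code)]
enum Chunk {
    Index(Vec<Digest>),
    Data(Vec<u8>),
}

struct Repo {
    names: HashMap<String, Digest>,
    chunks: HashMap<Digest, Chunk>,
}

impl Repo {
    // Mark every chunk reachable from a name, then sweep the rest.
    // Returns how many chunks were removed. No key is needed.
    fn gc(&mut self) -> usize {
        let mut reachable = HashSet::new();
        let mut stack: Vec<Digest> = self.names.values().copied().collect();
        while let Some(id) = stack.pop() {
            if !reachable.insert(id) {
                continue; // already marked
            }
            if let Some(Chunk::Index(children)) = self.chunks.get(&id) {
                stack.extend(children.iter().copied());
            }
        }
        let before = self.chunks.len();
        self.chunks.retain(|id, _| reachable.contains(id));
        before - self.chunks.len()
    }
}

fn main() {
    let mut repo = Repo { names: HashMap::new(), chunks: HashMap::new() };
    for d in 1u64..=3 {
        repo.chunks.insert(d, Chunk::Data(vec![0xAA; 4]));
    }
    repo.chunks.insert(10, Chunk::Index(vec![1, 2]));
    repo.chunks.insert(20, Chunk::Index(vec![2, 3]));
    repo.names.insert("home".to_string(), 10);
    repo.names.insert("old".to_string(), 20);
    repo.names.remove("old"); // like `rdedup rm old`: chunks are still present
    let removed = repo.gc(); // like `rdedup gc`: unreachable chunks go away
    println!("removed {} chunks", removed);
}
```

Chunk 2 is shared by both backups, so it survives; only the old index chunk and the data chunk unique to it are swept.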

Technical Details

  • bup's method of splitting files into chunks is used
  • the sha256 sum of the chunk data is used as the digest id
  • libsodium's sealed boxes are used for encryption/decryption:
    • ephemeral keys are used for sealing
    • the chunk digest is used as the nonce
  • the private key is encrypted using libsodium's crypto_secretbox with a random nonce and a key derived from the passphrase using password hashing with a random salt
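The bup-style splitting can be sketched with a rolling checksum over a 64-byte window: a chunk boundary is declared wherever the low bits of the checksum are all ones, so boundaries depend on content rather than offsets, and an edit early in a file does not shift every later boundary. This is modeled on bup's rollsum as we understand it, not copied from rdedup; the constants are assumptions, and restarting the checksum at every boundary plus the `MAX_CHUNK` cap are simplifications added for this sketch.

```rust
const WINDOW: usize = 64; // rolling window size (bup uses 2^6 bytes)
const CHAR_OFFSET: u32 = 31; // offset added to every byte, as in bup's rollsum
const BLOB_BITS: u32 = 13; // split when the low 13 bits are all ones (~8 KiB average)
const MAX_CHUNK: usize = 1 << 16; // hypothetical hard cap on chunk size

struct Rollsum {
    s1: u32,
    s2: u32,
    window: [u8; WINDOW],
    wofs: usize,
}

impl Rollsum {
    fn new() -> Self {
        Rollsum {
            s1: WINDOW as u32 * CHAR_OFFSET,
            s2: WINDOW as u32 * (WINDOW as u32 - 1) * CHAR_OFFSET,
            window: [0; WINDOW],
            wofs: 0,
        }
    }

    // Slide the window one byte: add `add`, drop the oldest byte.
    fn roll(&mut self, add: u8) {
        let drop = self.window[self.wofs] as u32;
        self.s1 = self.s1.wrapping_add(add as u32).wrapping_sub(drop);
        self.s2 = self
            .s2
            .wrapping_add(self.s1)
            .wrapping_sub(WINDOW as u32 * (drop + CHAR_OFFSET));
        self.window[self.wofs] = add;
        self.wofs = (self.wofs + 1) % WINDOW;
    }

    fn digest(&self) -> u32 {
        (self.s1 << 16) | (self.s2 & 0xffff)
    }
}

// Split `data` into content-defined chunks.
fn split(data: &[u8]) -> Vec<&[u8]> {
    let mask = (1u32 << BLOB_BITS) - 1;
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut rs = Rollsum::new();
    for (i, &b) in data.iter().enumerate() {
        rs.roll(b);
        if (rs.digest() & mask) == mask || i + 1 - start >= MAX_CHUNK {
            chunks.push(&data[start..=i]);
            start = i + 1;
            rs = Rollsum::new(); // simplification: restart per chunk
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]); // trailing partial chunk
    }
    chunks
}

fn main() {
    // Deterministic pseudo-random input from a simple LCG.
    let mut x: u32 = 1;
    let data: Vec<u8> = (0..200_000)
        .map(|_| {
            x = x.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
            (x >> 24) as u8
        })
        .collect();
    let chunks = split(&data);
    println!("{} bytes -> {} chunks", data.len(), chunks.len());
}
```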

Installation

If you have cargo installed:

cargo install rdedup

If not, I highly recommend installing it via rustup; cargo is like pip or npm for Rust, only better.

Usage

See rdedup -h for help.

Supported commands:

  • rdedup init - create a repo directory with the keypair used for encryption.
  • rdedup ls - list all stored names.
  • rdedup store <name> - store data read from standard input under the given name.
  • rdedup load <name> - load data stored under the given name and write it to standard output.
  • rdedup rm <name> - remove the given name. This by itself does not remove the data.
  • rdedup gc - remove any data that is no longer reachable.

In combination with rdup, this can be used to store and restore your backups like this:

rdup -x /dev/null "$HOME" | rdedup store home
rdedup load home | rdup-up "$HOME.restored"

