lookbusy1344 / foldercompare

Rust and C# apps to compare files in 2 directory trees, by name, or hash

License: MIT License

C# 51.04% Rust 46.38% Batchfile 0.48% PowerShell 2.09%
cli comparison csharp rust

Folder Compare

A small CLI project to compare files in 2 directory trees, by name, or hash.

This project is implemented in two languages, Rust and C#. It was written to compare the performance of the two languages, and to explore their comparative ergonomics.

The two implementations live in /CSharp (C# on .NET 8) and /Rust (Rust 1.74.1, December 2023). Their behaviour is almost identical. See below for benchmarks.

Both were developed on Windows, but should work on Linux and Mac.

Building

Build the C# version with dotnet publish -c Release or the supplied Publish.cmd (or with Visual Studio 2022, whose Community edition is free).

Build the Rust version with cargo build -r

Details

The program walks the first folder tree (given by -a) and records all filenames, sizes and optionally hashes. It does the same with the second folder tree (given by -b). Files are compared (using -c) and differences listed. Comparison can be via:

Comparison               Description
--comparison Name        Filename only (default, fast)
--comparison NameSize    Filename and file size (fast)
--comparison Hash        SHA2 hash, disregarding filenames (slow)

Comparison by name checks only the filename itself, not the path. E.g. a/b/file.txt and d/e/file.txt are considered the same file.
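The name-only rule above can be sketched in a few lines of Rust. This is a minimal illustration, not the project's actual code: the function name missing_by_name and the in-memory path lists are assumptions made for the example (the real program gathers its lists by walking the two folder trees).

```rust
use std::collections::HashSet;
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: return the paths from `a` whose bare file name
/// does not occur anywhere in `b`. Directories in the path are ignored.
fn missing_by_name(a: &[PathBuf], b: &[PathBuf]) -> Vec<PathBuf> {
    // Keep only the final path component of every file in B.
    let names_in_b: HashSet<OsString> = b
        .iter()
        .filter_map(|p| p.file_name().map(|n| n.to_os_string()))
        .collect();

    // A file in A is "missing" when its name is not in that set.
    a.iter()
        .filter(|p| match p.file_name() {
            Some(name) => !names_in_b.contains(name),
            None => false,
        })
        .cloned()
        .collect()
}

fn main() {
    let a = vec![
        PathBuf::from("a/b/file.txt"),
        PathBuf::from("a/only_in_a.txt"),
    ];
    let b = vec![PathBuf::from("d/e/file.txt")];

    // a/b/file.txt matches d/e/file.txt by name, so only the other file is listed.
    for p in missing_by_name(&a, &b) {
        println!("{}", p.display());
    }
}
```

Running the same check with the arguments swapped gives the files in B missing from A, which is the part that -f leaves out.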

Usage

folder_compare.exe -a <folder> -b <folder> [-c <comparison>] [-r] [-f] [-o]

E.g.:

folder_compare.exe -a ./target/debug -b ./target/release -c hash

MANDATORY PARAMETERS:

    -a, --foldera                First folder to compare
    -b, --folderb                Second folder to compare

OPTIONS:

    -c, --comparison [value]     Comparison to use (Name, NameSize or Hash). Default is Name
    -r, --raw                    Raw output, for piping
    -f, --first-only             Only show files in folder A missing from folder B (default is both)
    -o, --one-thread             Only use one thread (don't scan the two folders in parallel)
    -h, --help                   Help

Hashing uses SHA-256 and is much slower than comparing by name and/or size, because the contents of every file must be read.

Implementation notes

Implementing pluggable comparers (name / name & size / hash) is more difficult in Rust than in C#. C# allows different implementations of IEqualityComparer<FileData>, any of which can be handed to a HashSet when it is constructed.

In Rust you have to use 'unit structs' to mark the different comparisons, and then implement the Eq, PartialEq and Hash traits on FileData<..marker struct..> for each comparison technique.

This does mean that FileData<a> isn't type-compatible with FileData<b>, which is an ugly side effect. An implementation of HashSet that took lambdas for hashing and comparison would be useful here!
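A minimal sketch of the marker-struct approach, to make the trade-off concrete. The names ByName, ByNameSize, the field layout and count_missing are assumptions chosen for illustration and are not taken from the project's source; only the overall technique (unit structs as type parameters, with Eq, PartialEq and Hash implemented per marker) comes from the notes above.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

// Unit structs used purely as compile-time markers (hypothetical names).
struct ByName;
struct ByNameSize;

// File record tagged with the comparison it should use.
struct FileData<C> {
    name: String,
    size: u64,
    _comparer: PhantomData<C>,
}

impl<C> FileData<C> {
    fn new(name: &str, size: u64) -> Self {
        FileData { name: name.to_string(), size, _comparer: PhantomData }
    }
}

// Name-only comparison: size is ignored for equality and hashing.
impl PartialEq for FileData<ByName> {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name
    }
}
impl Eq for FileData<ByName> {}
impl Hash for FileData<ByName> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
    }
}

// Name-and-size comparison: both fields take part.
impl PartialEq for FileData<ByNameSize> {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name && self.size == other.size
    }
}
impl Eq for FileData<ByNameSize> {}
impl Hash for FileData<ByNameSize> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
        self.size.hash(state);
    }
}

/// Count the entries of `a` that have no equal entry in `b` under comparison `C`.
fn count_missing<C>(a: &[(&str, u64)], b: &[(&str, u64)]) -> usize
where
    FileData<C>: Eq + Hash,
{
    let set_a: HashSet<FileData<C>> = a.iter().map(|&(n, s)| FileData::new(n, s)).collect();
    let set_b: HashSet<FileData<C>> = b.iter().map(|&(n, s)| FileData::new(n, s)).collect();
    set_a.difference(&set_b).count()
}

fn main() {
    let a: [(&str, u64); 2] = [("file.txt", 10), ("other.txt", 5)];
    let b: [(&str, u64); 1] = [("file.txt", 99)];

    // Same data, different marker type, different notion of "the same file".
    println!("missing by name:        {}", count_missing::<ByName>(&a, &b));
    println!("missing by name + size: {}", count_missing::<ByNameSize>(&a, &b));
}
```

The side effect described above shows up directly: a HashSet<FileData<ByName>> and a HashSet<FileData<ByNameSize>> are unrelated types, so the comparison has to be chosen through a type parameter rather than passed in as a value at runtime.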

Benchmarks

Benchmarks from Hyperfine, run on a wheezy old laptop. Code from ver 1.0.6 (8a7eb6c2b949615ec77). The C# version is compiled to a native binary, to improve startup speed. All times in milliseconds (lower is better). Test folders have 800-1200 file differences.

Benchmark            Rust single-thread   C# single-thread   Difference          Rust parallel   C# parallel   Difference
Comparing by name    61                   65                 x1.03 (dead-heat)   49              70            x1.4
Second run           65                   65                                     50              71
Comparing by hash    1808                 1994               x1.1                1253            1390          x1.12
Second run           1801                 1974                                   1246            1410

Hashing is obviously more expensive than comparison by filename. When hashing, the parallel code takes around 30% less time than the single-threaded code (a maximum of 2 threads is used, and only for the folder enumeration and hashing).
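The two-thread arrangement can be sketched with scoped threads from the Rust standard library. This is an illustration under assumptions, not the project's implementation: list_files and scan_both are hypothetical names, and the one_thread flag only mirrors the behaviour described for -o.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Recursively collect every regular file under `root`.
/// Symlinks are skipped, because `DirEntry::file_type` does not follow them.
fn list_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(entry.path());
            } else if file_type.is_file() {
                files.push(entry.path());
            }
        }
    }
    Ok(files)
}

/// Scan both trees, on two threads unless `one_thread` is set.
fn scan_both(a: &Path, b: &Path, one_thread: bool) -> io::Result<(Vec<PathBuf>, Vec<PathBuf>)> {
    if one_thread {
        return Ok((list_files(a)?, list_files(b)?));
    }
    let (files_a, files_b) = thread::scope(|s| {
        // Folder A is scanned on a spawned thread, folder B on the current one.
        let handle_a = s.spawn(|| list_files(a));
        let files_b = list_files(b);
        (handle_a.join().expect("scan thread panicked"), files_b)
    });
    Ok((files_a?, files_b?))
}

fn main() -> io::Result<()> {
    // Scans the current directory twice, purely as a demonstration.
    let (a, b) = scan_both(Path::new("."), Path::new("."), false)?;
    println!("{} files in A, {} files in B", a.len(), b.len());
    Ok(())
}
```

With only two units of work, the best case is the time of the slower of the two scans, which is consistent with the gain being well short of a 2x speed-up.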

The C# code performs surprisingly well, taking only about 112% of Rust's time for the heavier workload of hashing. This is impressive given Rust's higher cognitive load.

C# Publishing

Publish.cmd is provided to simplify publishing. It uses NativeAOT compilation to build a large but comparatively fast native binary. The script contains just:

dotnet publish FolderCompare.csproj -r win-x64 -c Release

Testing scripts

Tests are written in PowerShell, so they can be used for both implementations.

PS > cd .\Testing
PS > .\TestCSharp.ps1
PS > .\TestRust.ps1
