
repofactor

Finding the causes of repository bloat

This project contains a set of tools to help analyse the largest blobs (by "on disk" storage) in a repository.

Here is a sample sequence of commands showing typical usage:

  • Start with a clean clone of the repository that you want to analyse; a bare clone is fine. For reasonable performance, clone it onto "local" disk on a fast Linux machine.

  • Add these tools to your PATH or use a full path to each script or executable.

  • Run these tools from the repository undergoing analysis and cleaning.

  • Work out a suitable threshold size by running generate-larger-than with experimental parameters. 50000 might be a good starting point. The size is "average bytes after compression by Git".

  • Generate a sorted list of objects with file information

    generate-larger-than 50000 | sort -k3n | add-file-info >../largeobjs.txt

  • Make a report that shows, for each commit that introduces large objects, the commit summary together with the paths involved, the uncompressed size of each object and its file information

    report-on-large-objects ../largeobjs.txt
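
When experimenting with thresholds it can help to cross-check against plain Git. The following is a minimal sketch, not the method used by generate-larger-than: it lists blobs whose on-disk size (as reported by git cat-file) exceeds a threshold, smallest first. The function names filter_large_blobs and list_large_blobs are hypothetical, and the on-disk size of a single object is related to, but not necessarily the same as, the "average bytes after compression" figure used by the tools above.

```shell
# Hypothetical helper (not part of repofactor): keep only blobs whose on-disk
# size exceeds the threshold given as $1.
# Reads "<type> <object-id> <disk-size>" lines on stdin and prints
# "<object-id> <disk-size>" sorted by size, smallest first.
filter_large_blobs() {
    awk -v min="$1" '$1 == "blob" && $3 + 0 > min + 0 { print $2, $3 }' |
        sort -k2,2n
}

# Hypothetical wrapper: feed every object in the current repository
# through the filter. Run it from inside the repository being analysed.
list_large_blobs() {
    git cat-file --batch-all-objects \
        --batch-check='%(objecttype) %(objectname) %(objectsize:disk)' |
        filter_large_blobs "$1"
}
```

For example, list_large_blobs 50000 prints one blob id and size per line, which gives a rough idea of how many objects a given threshold would select.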

Filtering out large blobs

  • Create a temporary work directory and export RFWORK_DIR so that it points to this directory. If RFWORK_DIR is not set, the current directory is used.

  • Again, run all commands from the repository being analysed.

  • Using the report above, edit the list down to the blob ids that can be eliminated. Save this list as large-objects.txt.

  • Generate a remove script

    make-remove-blobs large-objects.txt >"$RFWORK_DIR"/remove-blobs.pl
    chmod +x "$RFWORK_DIR"/remove-blobs.pl
    
  • Optionally, edit the remove script so that it also filters out any paths that are no longer required

  • Run the filter branch

    run-filter-branch

  • Create a new "easy rebase" script for moving work-in-progress branches from the old history to the new history

    make-mtnh >"$RFWORK_DIR"/move-to-new-history

  • Push the rewritten refs and the rewrite-commit-map branch to all central repositories

  • Deploy move-to-new-history for users to use
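
Before pushing, it may be worth confirming that the blobs listed for removal are no longer reachable from the rewritten refs. The sketch below is an assumption-laden illustration and not part of repofactor: the helper name blobs_still_reachable is hypothetical, and it assumes large-objects.txt holds one blob id per line (the first whitespace-separated field).

```shell
# Hypothetical helper (not part of repofactor): print every id from the list
# file given as $1 that still appears in the "<object-id> [path]" lines read
# on stdin, such as the output of "git rev-list --objects".
blobs_still_reachable() {
    awk 'NR == FNR { want[$1] = 1; next } ($1 in want) { print $1 }' "$1" -
}

# Usage from the rewritten repository:
#   git rev-list --objects --branches --tags |
#       blobs_still_reachable large-objects.txt
# No output means none of the listed blobs is reachable from branches or tags.
```

The usage line restricts the walk to branches and tags rather than all refs because git filter-branch keeps backup refs under refs/original/ by default, and those still point at the old history.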

Contributors

hashpling
