Giter Site home page Giter Site logo

tubbz-alt / git_conv Goto Github PK

View Code? Open in Web Editor NEW

This project forked from freebsd/git_conv

0.0 1.0 0.0 1.13 MB

Scripts used for the svn to git conversion of the FreeBSD repositories

Dockerfile 0.06% QMake 0.23% C++ 36.74% Shell 15.54% Perl 25.18% Makefile 0.26% C 20.42% Yacc 0.75% Lex 0.83%

git_conv's Introduction

Subversion to Git conversion scripts for the FreeBSD project

These are the scripts and data used for the svn to git conversion of the FreeBSD repositories. Your help is requested in making sure the produced result is an acceptable representation of the current SVN repository. Fixing some few historical glitches are in scope if the effort is reasonable. Please let us know what you find.

Gimme the repo!

git clone https://cgit-beta.freebsd.org/src.git && cd src
git config --add remote.origin.fetch '+refs/notes/*:refs/notes/*' && git fetch

Same for the doc and ports repos. You can expect some things to be different. Most importantly you should check whether branch and mergepoints (especially for vendor branches, but also project branches) are there and make sense.

Neither vendor, user or projects branches are "visible" by default. backups refs are deleted branches and cvs2svn contains some of the detritus left over from the CVS days and the cvs2svn conversion. The internal namespace for now only has the access and mentors file, detailing when people got their various commit bits.

git config --add remote.origin.fetch '+refs/vendor/*:refs/vendor/*'
git config --add remote.origin.fetch '+refs/projects/*:refs/projects/*'
git config --add remote.origin.fetch '+refs/user/*:refs/user/*'
git config --add remote.origin.fetch '+refs/backups/*:refs/backups/*'
git config --add remote.origin.fetch '+refs/cvs2svn/*:refs/cvs2svn/*'
git config --add remote.origin.fetch '+refs/internal/*:refs/internal/*'
git fetch

Note that internal, projects and user branches also exist for the doc repo and ports has projects and releng as well. vendor, cvs2svn and backups are exclusive to the src repo though.

  • user/ branches are never merged back into master
  • MFHs into user/ or projects/ branches are just cherry-picks to keep git log --graph somewhat readable and as merges wouldn't convey any useful information, really.
  • vendor tags were never flattened post-creation, as that would advance them off of the mainline branch and make them invisible to a simple git log
  • release/1.0_cvs et al. are snapshots of the checked out CVS source code, including expanded $Id$ tags.
  • no other keywords are expanded
  • various vendor-foo suffixes have been collapsed into 1 vendor namespace, except for a few vendors where merging the userland and kernel bits is not straightforward due to how they interleave with the merge and branch history.
  • some branches have their history "extended", that is, commits under the cvs2svn area were properly attached.
  • ... and most of these commits have actually been inlined directly into the mainline tree to keep the history more "linear" and associate the commit with the original author and commit message.

How to analyze the results

Here are some tips that were deemed useful in making sense of resulting history. Please add a "graph notes log" alias to your .gitconfig:

[alias]
  gnlog = log --graph --pretty=format:'%Cred%h %C(green)%t %Creset %C(red)%ad %Creset-%C(yellow)%d%Creset %s %n      %N %-GG' --date=short

Show the full tree of a vendor area

git gnlog `git show-ref|grep vendor/sendmail|cut -d" " -f1`

Find a certain SVN revision on master

git show -p `git log --format=%h --notes --grep=revision=294706$`

Show how/where/when a vendor branch was merged into master over time

git gnlog vendor/zstd/dist master

(but you'll need to search in the massive output for where the vendor branch is being merged, if you know a better way to represent this, please let us know!)

Look for commits with more than 5 parents and log them

git log --format='%H %P' --all|awk '{if (NF > 5) { print NF " " $0}}'|sort -rn|cut -d" " -f2|xargs -n1 -I% git snlog -n 1 %

If you want to run the conversion yourself and play with the rules, read on.

Required Software

pkg install qt5-qmake qt5-core subversion git shlock

(for Debian: apt install qt5-qmake qtbase5-dev libapr1-dev libsvn-dev subversion git shlock)

A patched copy of svn2git aka. svn-all-fast-export has been added to this repo using: git subtree add --squash --prefix svn2git https://github.com/freebsd/svn2git master

A copy of parsecvs has been added using: git subtree add --squash --prefix parsecvs https://github.com/BartMassey/parsecvs master

cd svn2git && qmake && make && cd ..
cd parsecvs && make && cd ..

Setup

You'll need to download a seed of the SVN repository dump (it takes weeks to bootstrap otherwise). NOTE: Depending on the mirror nearest to you, the metadata of this mirrors is actually off by several seconds for several of the commits and is also sometimes missing the author. Furthermore, using the one-true SVN repo as the source will drop all forced commits. Don't ask me why this happens.

fetch https://download.freebsd.org/ftp/development/subversion/svnmirror-base-r358354.tar.xz
fetch https://download.freebsd.org/ftp/development/subversion/svnmirror-ports-r527184.tar.xz
fetch https://download.freebsd.org/ftp/development/subversion/svnmirror-doc-r53937.tar.xz
tar xf svnmirror-base-r358354.tar.xz
tar xf svnmirror-doc-r53937.tar.xz
tar xf svnmirror-ports-r527184.tar.xz

Conversion runs

  • (optional:) fetch the current SVN state using ./svnmir.sh -1
  • convert either base, doc, or ports using ./git_conv { base | doc | ports }

Additional runs of svnmir.sh or git_conv will run incrementally.

NOTE: This is not longer true, while it works fine for git version 2.24 and 2.25, at least version 2.27 will cause git-fast-import upon reading the marks file to eat up all RAM and eventually crash with out of memory.

On on moderately fast system with an SSD and/or enough RAM for the buffer cache, this should take about 2h to finish for src and will produce about 10GiB of intermediate data. The final src repo size should be around 1.7GiB.

Re-runs will re-use the previous packfiles and will likely be faster. Currently src finishes in about 1 hour, doc in 2 minutes and ports in 45 minutes.

What you get

  • src will have: master, stable/N, releng/N, release/N.M
  • doc will have: master, release/N.M
  • ports will have: master, branches/YYYYQx

In the future, project branches will be individual forks of the repos. The vendor area is not visible by default, as it's only relevant for maintainers of contrib software.

Further information and documentation can be found on the Wiki sites at https://github.com/freebsd/git_conv/wiki

For the curious

blaming the whole project takes about a day of wall clock time using this naive approach:

git ls-tree -r --name-only -z HEAD|xargs -0n1 git blame -e --line-porcelain | sed -n 's/^author-mail //p' | sort -f | uniq -ic | sort -nr

It only blames the current state of the repo, not all of history, mind you. After several hairy fixes, this could be reduced from more than 6000 lines down to just 182 lines that are now falsely attributed to author "cvs2svn". The files are gnu/usr.bin/grep/AUTHORS and a few under usr.sbin/ppp.

License

The included svn2git is GPLv3; see svn2git/LICENSE. The included parsecvs is GPLv2; see parsecvs/COPYING. The scripts and configuration files are licensed under CC0.

git_conv's People

Contributors

bdragon28 avatar bsdimp avatar dspinellis avatar emaste avatar glebius avatar uqs avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.