mikjo / bigitr

Git/CVS bidirectional synchronization tool

License: Apache License 2.0

Python 99.34% Makefile 0.39% Shell 0.27%

bigitr's Issues

unittest in Python 2.6 does not implement assertIsNone

Change the tests to use assertEquals so that the test suite can pass on Python 2.6.
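A minimal sketch of the change, assuming a hypothetical function under test (lookup is not part of bigitr). It uses assertEqual, the non-deprecated spelling of assertEquals, which is also available on Python 2.6:

```python
import unittest


def lookup(mapping, key):
    """Hypothetical helper under test: returns None for a missing key."""
    return mapping.get(key)


class LookupTest(unittest.TestCase):
    def test_missing_key(self):
        result = lookup({}, 'absent')
        # assertIsNone(result) needs Python 2.7 or later;
        # comparing against None also works on Python 2.6.
        self.assertEqual(result, None)
```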

test_other_commands fails

The test_other_commands test has been succeeding on development machines but not on other machines.

bigitrd auto-update configuration from Git

If configuration for bigitr and bigitrd is kept in Git, it would be possible for bigitrd to check at the beginning of each sync cycle for updates, and to pull and reload configuration if there have been any changes, in order to automate applying updates.

getting current branch uses fragile porcelain

In general, scripts aren't expected to use porcelain, but because the porcelain implements much of the workflow bigitr automates, I have chosen to use porcelain anyway. However, the way bigitr finds the current branch is particularly fragile, and there's a much better option.
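The issue does not say which option was meant. One plumbing possibility is git symbolic-ref, sketched below; the function names are hypothetical and this is not bigitr's actual code. subprocess.check_output needs Python 2.7 or later.

```python
import subprocess


def branch_from_ref(ref):
    """Strip the refs/heads/ prefix from a full ref name."""
    prefix = 'refs/heads/'
    ref = ref.strip()
    if not ref.startswith(prefix):
        raise ValueError('not a branch ref: %r' % ref)
    return ref[len(prefix):]


def current_branch(repo_dir):
    """Ask plumbing which branch HEAD points to.

    git symbolic-ref exits non-zero on a detached HEAD, which surfaces
    here as subprocess.CalledProcessError.
    """
    out = subprocess.check_output(
        ['git', 'symbolic-ref', 'HEAD'], cwd=repo_dir)
    return branch_from_ref(out.decode('utf-8'))
```

Parsing a single full ref name avoids depending on the human-oriented layout of git branch output.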

child processes can be interrupted by signals

The SIGHUP and SIGTERM signals are meant to wait until the current job is finished processing, and then stop cleanly. However, a signal that arrives while bigitr is reading from a child process interrupts the system call:

Traceback (most recent call last):
  File "/home/bigitr/bigitr/bigitr/__init__.py", line 70, in process
    self.do(repository, Git, requestedBranch=branch)
  File "/home/bigitr/bigitr/bigitr/__init__.py", line 99, in do
    if not self.newContent(Git):
  File "/home/bigitr/bigitr/bigitr/util.py", line 58, in wrapper
    ret = fn(self, *args, **kwargs)
  File "/home/bigitr/bigitr/bigitr/__init__.py", line 111, in newContent
    oldRefs = Git.refs()
  File "/home/bigitr/bigitr/bigitr/git.py", line 66, in refs
    'git', 'show-ref', '--head', error=False)
  File "/home/bigitr/bigitr/bigitr/shell.py", line 71, in read
    output = s.communicate()
  File "/usr/lib64/python2.6/subprocess.py", line 683, in communicate
    stdout = self.stdout.read()
IOError: [Errno 4] Interrupted system call
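One possible approach, sketched here under the assumption of a POSIX system, is to ask the kernel to restart interrupted system calls for these signals with signal.siginterrupt. The handler and function names are hypothetical, not bigitr's actual code.

```python
import signal

# Signal numbers received so far; the main loop would check this between jobs.
stop_requested = []


def request_stop(signum, frame):
    # Only record the request; do no real work inside the handler.
    stop_requested.append(signum)


def install_deferred_handler(signum):
    """Install request_stop so that it does not interrupt system calls."""
    signal.signal(signum, request_stop)
    # signal.signal() resets the signal to interrupting behaviour,
    # so siginterrupt() has to be called after it.
    signal.siginterrupt(signum, False)
```

With this in place, a read() on a child's pipe is restarted instead of failing with EINTR. Python 3.5 and later retry interrupted system calls automatically, so the problem is specific to older interpreters such as the Python 2.6 in the traceback.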

test_other_commands "poisoned" by previous test_lowlevel1 intentional error

test_lowlevel1 intentionally causes a RuntimeError: Not committing empty branch 'b1' from git branch 'master' to test that this case is caught. However, this error is preserved when caching TESTROOT data for subsequent tests, and then test_other_commands fails when it finds the old error in the logs.

test_lowlevel1 needs to clean up its logs before packing the cached root. Probably logs should not be propagated at all across tests.

Describe the testdata/TESTROOT files in CONTRIBUTING.md

Failing tests can cause cached TESTROOT files to perpetuate errors and cause false test failures. At least document this quirk in CONTRIBUTING.md...

Do not delete files ignored in other system

Currently, by design, if CVS is synced to Git and a file is ignored by .gitignore, then if Git is synced back to CVS, that file is deleted—with one exception: the .cvsignore file itself.

At least for simple cases, if Bigitr has honored a .*ignore file when syncing in one direction, then honoring it in the other direction should mean not deleting the ignored files from the original source. In the CVS→Git→CVS case, files that were left out of Git due to a .gitignore file would then remain in CVS, whereas right now they are deleted from CVS.

This would be a change in behavior, but since it would only mean failing to delete a file, it is most likely not a fatal change.

timestamps in logs use month number as minute number

I recently saw a bigitr log file whose timestamps had incorrect minutes. On a long-running process that failed, the timestamp for the failed process was off by 14 minutes (reported as 17:02 when the failure was actually seen at 17:16). I have validated that the system time is correct, so this is not a system time offset error.
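The format string bigitr uses is not shown in this issue. Assuming it contains %m (month) where %M (minute) was intended, the symptom can be reproduced as follows; the date is hypothetical, chosen in February to match the reported 02:

```python
from datetime import datetime

# Hypothetical moment matching the report: 17:16 on a day in February.
moment = datetime(2013, 2, 5, 17, 16, 0)

# %m is the zero-padded month, so the "minutes" field shows 02.
wrong = moment.strftime('%H:%m')

# %M is the zero-padded minute.
right = moment.strftime('%H:%M')

print(wrong)  # 17:02
print(right)  # 17:16
```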

file/directory conflict breaks CVS checkouts

If a perfectly reasonable Git commit happens to try to create a file where a directory exists, or a directory where a file exists, on any branch, the cvs commit command will fail and the CVS checkout will be left in an inconsistent state that will cause future CVS operations to fail.

bigitrd error threshold

When something goes wrong, bigitrd can send lots of error emails very quickly.

It would be convenient to be able to specify separate thresholds for repository errors (errors where repository owners get email) and bigitrd errors (where only the daemon administrator is mailed). If a threshold is exceeded, bigitrd would either stop running or pause for a configurable amount of time, in either case sending an email to the bigitrd administrator(s) announcing the fact.
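A rough sketch of how such a threshold could be tracked. The class, its parameters, and the idea of counting errors within a time window are assumptions for illustration; the issue does not specify how errors would be counted.

```python
import time


class ErrorThreshold(object):
    """Track recent errors and report when a limit is exceeded."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.stamps = []

    def record(self):
        """Record one error; return True if the threshold is now exceeded."""
        now = self.clock()
        self.stamps.append(now)
        # Forget errors that have fallen out of the window.
        self.stamps = [t for t in self.stamps if now - t <= self.window]
        return len(self.stamps) > self.limit
```

One instance would be kept for repository errors and another for bigitrd errors. When record() returns True, the daemon would stop or pause and mail the administrator.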

Remove usage of --set-upstream from git branch

Git now complains:
The --set-upstream flag is deprecated and will be removed. Consider using --track or --set-upstream-to

longstory_test.py uses branch --set-upstream

command line for kill/reload currently-running bigitrd

Currently, I am stopping or reloading bigitrd by running commands like:
kill -SIGTERM $(cat ~/.bigitrd-pid)
kill -SIGHUP $(cat ~/.bigitrd-pid)

It would be more obvious to run something like:
bigitrd --stop
bigitrd --reload
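A minimal sketch of what the proposed options could do internally, assuming the pid file location from the kill commands above. The function names and the ACTIONS table are hypothetical.

```python
import os
import signal


def read_pid(pidfile):
    """Return the process ID stored in the pid file."""
    with open(os.path.expanduser(pidfile)) as f:
        return int(f.read().strip())


def signal_daemon(pidfile, signum):
    """Send signum to the process named in pidfile."""
    os.kill(read_pid(pidfile), signum)


# Mapping from the proposed options to the signals used today.
ACTIONS = {
    '--stop': signal.SIGTERM,
    '--reload': signal.SIGHUP,
}
```

For example, signal_daemon('~/.bigitrd-pid', ACTIONS['--reload']) would have the same effect as the second kill command.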

Handle CVS directories checked into Git

Currently, if CVS directories are (incorrectly) checked into Git repositories, they overwrite the CVS directories in the CVS checkout used to export from Git out into CVS, which breaks the export process with occasionally obscure error messages from the CVS server (depending on the contents of the CVS files).

It would be possible to either exclude CVS directories from the export, or to raise a useful error with a message explaining the problem and stop the process.
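Either option starts with finding the offending directories. A sketch under the assumption that the check runs on the Git working tree before export; the function names are hypothetical.

```python
import os


def find_cvs_dirs(root):
    """Return relative paths of directories named CVS under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if 'CVS' in dirnames:
            found.append(os.path.relpath(os.path.join(dirpath, 'CVS'), root))
            # Do not descend into the metadata directory itself.
            dirnames.remove('CVS')
    return sorted(found)


def check_no_cvs_dirs(root):
    """Raise a descriptive error if the tree contains CVS directories."""
    found = find_cvs_dirs(root)
    if found:
        raise RuntimeError(
            'CVS metadata directories are checked into Git: %s'
            % ', '.join(found))
```

To exclude rather than fail, the list returned by find_cvs_dirs could be skipped during the copy.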

CVS export should use -D

When the -D option is not used with CVS export, each file is considered sequentially, and so commits during the export may be partially reflected, causing the export to be inconsistent. This is more likely with larger repositories. Currently, "-D now" is used only for CVS trunk, but it should instead be used for all CVS exports.

Note that "-D now" is expanded client-side, so it is a better option than an explicit date specification.

This works well only if the system running bigitr has its clock in sync with the CVS server.

newly-added root file failed to export from Git to CVS

Among multiple other changes, the root directory, which had previously contained only subdirectories, had a file added to it in Git. In exporting that to CVS, bigitr invoked cvs add with an empty argument, and then failed because cvs exited with an error code. The next synchronization run did not print a failure because there were no additional changes in Git to trigger an export.

SIGHUP leaves old lock files around

When restarting itself, bigitr should pass a command line argument with the current name of the lockfile so that after the new lockfile is created, the old lockfile can be removed.

crlf auto-normalization can break export process

JGit (and thus EGit) currently does not honor the eol specification in .gitattributes, so if users are using EGit to commit, they will sometimes disagree with git about line termination for some files.

With crlf normalization enabled for the bigitr user, sometimes a checkout can get into a state where it is hard to merge; even git reset --hard HEAD leaves the checkout in a state where it cannot merge.

Even if the source repository is later fixed to have consistent line termination, the checkout managed by bigitr is "stuck" because it cannot do a fast-forward merge due to the "changes" in the local directory due to auto-normalization.

file→directory conversion breaks Git export to CVS

When a file is removed and replaced by a directory in Git, and the file previously existed in CVS, then bigitr first calls "cvs remove name" and then calls "cvs add name" which causes the error:

cvs server: the directory `name' cannot be added because a file of the
cvs [server aborted]: same name already exists in the repository.

enable cvs mainline synchronization

Currently, bigitr can sync only with CVS branches, because it always uses the -r argument to the CVS command line to choose the branch. However, in CVS the mainline is distinguished only by the lack of a branch argument and there is no reserved word with the semantics of "not on a branch". (Some people assume that HEAD has this meaning, but HEAD in CVS is actually completely different.)

In order to enable sync with the CVS mainline, bigitr needs to reserve a name to use in bigitr configuration to indicate CVS mainline.

MAIN seems like a good choice. It seems unlikely that any serious user of CVS would use "MAIN" to indicate a non-mainline branch, and even less likely that they would then want to use bigitr to synchronize it with Git.
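A sketch of how the reserved name might translate into CVS arguments. The function and constant are hypothetical; only the MAIN name and the -r argument come from the text above.

```python
# Reserved configuration name proposed for the CVS mainline.
CVS_MAINLINE = 'MAIN'


def cvs_branch_args(branch):
    """Return the CVS arguments selecting branch, or none for mainline."""
    if branch == CVS_MAINLINE:
        # The mainline is selected by the absence of -r.
        return []
    return ['-r', branch]
```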

file modes not preserved in some copy operations

CVS→Git preserves file modes because util.copyTree preserves modes. Git→CVS does not preserve file modes because util.copyFiles does not preserve modes. Both should preserve the file modes.
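The implementation of util.copyFiles is not shown here. As an illustration only, a per-file copy that keeps the permission bits can be built from two standard shutil calls; the function name is hypothetical.

```python
import shutil


def copy_file_with_mode(src, dst):
    """Copy file contents, then copy the permission bits."""
    # copyfile() copies data only; the destination gets default permissions.
    shutil.copyfile(src, dst)
    # copymode() copies the permission bits from src to dst.
    shutil.copymode(src, dst)
```

shutil.copy performs the same two steps in one call.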