
src-d / borges


borges collects and stores Git repositories.

Home Page: https://docs.sourced.tech/borges/

License: GNU General Public License v3.0

Languages: Go 99.13%, Makefile 0.28%, Dockerfile 0.20%, Shell 0.39%
Topics: git, git-archive

borges's People

Contributors

ajnavarro, alcortesm, bzz, carlosms, erizocosmico, jfontan, kuba--, mcarmonaa, mcuadros, realdoug, smacker, smola, tsolakoua, vmarkovtsev

borges's Issues

Clarify how to build

As a developer of borges I want to know how to build the project.

Right now, the README.md says to run make packages, after which the binaries are at borges_linux_amd64/borges and borges_darwin_amd64/borges.

While that is true, a bin/borges directory/file is also created, so we should clarify that.

Also, it is important to run rm Makefile.main; rm -rf .ci before calling make; otherwise make will not use an updated version of the Makefile.main file.

mark one repository as main for each root group

Initial heuristic will be:

  • Check is_fork (when it is available) src-d/rovers#144
  • The one with more PRs.
  • The one with more commits.
  • If there is a tie on everything, pick the lowest ULID.
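
The heuristic above can be sketched as an ordered comparison. This is only an illustration: the Repo type, its fields, and PickMain are hypothetical names, not borges code, and it assumes that "check is_fork" means a non-fork is preferred over a fork and that ULIDs compare as plain strings.

```go
package main

import "fmt"

// Repo is a hypothetical, simplified view of one repository in a root group.
type Repo struct {
	ULID    string // assumed to sort lexicographically
	IsFork  bool
	PRs     int
	Commits int
}

// better reports whether a is a stronger "main" candidate than b,
// applying the criteria in the order listed in the issue.
func better(a, b Repo) bool {
	if a.IsFork != b.IsFork {
		return !a.IsFork // assumption: non-forks win over forks
	}
	if a.PRs != b.PRs {
		return a.PRs > b.PRs
	}
	if a.Commits != b.Commits {
		return a.Commits > b.Commits
	}
	return a.ULID < b.ULID // full tie: lowest ULID
}

// PickMain returns the best candidate; ok is false for an empty group.
func PickMain(group []Repo) (main Repo, ok bool) {
	if len(group) == 0 {
		return Repo{}, false
	}
	best := group[0]
	for _, r := range group[1:] {
		if better(r, best) {
			best = r
		}
	}
	return best, true
}

func main() {
	group := []Repo{
		{ULID: "03", IsFork: true, PRs: 50, Commits: 900},
		{ULID: "02", IsFork: false, PRs: 7, Commits: 100},
		{ULID: "01", IsFork: false, PRs: 7, Commits: 100},
	}
	if m, ok := PickMain(group); ok {
		fmt.Println("main repository:", m.ULID)
	}
}
```

When is_fork is not yet available, every IsFork value would be false and the first criterion simply never decides.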

This will be implemented in borges:

  • add is_fork to the database #122
  • mark is_fork in remotes in rooted repository

create producer that handles both new repositories and updates

A policy will be needed to schedule:

  • New repositories from rovers.
  • Update of repositories based on commit frequency
  • Update based on repository status (pending or failed)
  • Requeue buried jobs
  • Add better logging (to see what's being fetched on update)
  • Etc?

running borges on a repository twice fails: verify that update works

When borges consumer is run a second time with the same repositories, after they have already been fetched (so .siva files exist), all consumers fail on all the repositories in exactly the same way as in #52.

Steps to reproduce

# RabbitMQ and Postgres running
echo "git://github.com/bootstrapworld/wescheme-compiler2012.git" > 1-java.txt
borges producer --source=file --file ./1-java.txt
borges consumer --workers=1

# wait to finish

borges producer --source=file --file ./1-java.txt
borges consumer --workers=1

Actual result:

DBUG[06-16|12:00:51] job started                              module=borges WorkerID=0 RepositoryID=015cac74-7608-2e54-557f-22629fde8566
EROR[06-16|12:00:53] job errored                              module=borges WorkerID=0 RepositoryID=015cac74-7608-2e54-557f-22629fde8566 error="object not found"

Expected result:

Consumer: failing jobs immediately re-scheduled forever

As a developer running the borges consumer, I would like to prevent an infinite loop of rescheduling for failing jobs.

Right now, if the input queue only has 2 jobs, both of which are not processable, the consumer will immediately reschedule them forever, keeping the CPU busy.

It is also not possible to just remove those jobs from the queue: it seems that they are put back on the queue right after each failing attempt to process them, forever.

How to reproduce:

echo 'git://github.com:damoeb/kalipo.git
git://github.com/CyanogenMod/android_hardware_qcom_fm.git' > 2.txt

borges producer --file 2.txt
borges consumer --workers=2

Full log below

DBUG[06-29|22:45:57] job started                              module=borges WorkerID=1 RepositoryID=015cf599-df58-597a-3513-5c496ef77f94 caller=consumer.go:44
DBUG[06-29|22:45:57] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:57] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:57] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:57] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769157287452998 caller=archiver.go:109
DBUG[06-29|22:45:57] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
DBUG[06-29|22:45:57] repository model obtained                job=015cf599-df58-597a-3513-5c496ef77f94 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:57] endpoint selected                        job=015cf599-df58-597a-3513-5c496ef77f94 endpoint=git://github.com/CyanogenMod/android_hardware_qcom_fm.git caller=archiver.go:100
DBUG[06-29|22:45:57] local temporary directory created        job=015cf599-df58-597a-3513-5c496ef77f94 temp-path=local_repos/015cf599-df58-597a-3513-5c496ef77f94/1498769157312643440 caller=archiver.go:109
DBUG[06-29|22:45:57] local repository created                 job=015cf599-df58-597a-3513-5c496ef77f94 caller=archiver.go:116
EROR[06-29|22:45:57] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:57] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:57] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:57] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:57] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:57] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:57] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769157592851853 caller=archiver.go:109
DBUG[06-29|22:45:57] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:57] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:57] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:57] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:57] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:57] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:57] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:57] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769157853411811 caller=archiver.go:109
DBUG[06-29|22:45:57] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:58] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:58] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:58] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:58] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:58] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:58] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:58] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769158118079322 caller=archiver.go:109
DBUG[06-29|22:45:58] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:58] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:58] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:58] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:58] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:58] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:58] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:58] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769158405943242 caller=archiver.go:109
DBUG[06-29|22:45:58] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:58] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:58] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:58] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:58] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:58] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:58] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:58] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769158653526154 caller=archiver.go:109
DBUG[06-29|22:45:58] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:58] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:58] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:58] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:58] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:58] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:58] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:58] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769158971635041 caller=archiver.go:109
DBUG[06-29|22:45:58] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:59] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:59] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:59] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:59] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:59] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:59] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:59] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769159215042159 caller=archiver.go:109
DBUG[06-29|22:45:59] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:59] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:59] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:59] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:59] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:59] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:59] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:59] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769159463250685 caller=archiver.go:109
DBUG[06-29|22:45:59] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:59] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:59] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:59] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:59] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:59] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:59] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:59] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769159708196999 caller=archiver.go:109
DBUG[06-29|22:45:59] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:45:59] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:45:59] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:45:59] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:45:59] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:45:59] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:45:59] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:45:59] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769159968231309 caller=archiver.go:109
DBUG[06-29|22:45:59] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:46:00] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:46:00] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:46:00] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:46:00] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:46:00] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:46:00] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:46:00] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769160228889781 caller=archiver.go:109
DBUG[06-29|22:46:00] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:46:00] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:46:00] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:46:00] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:46:00] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:46:00] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:46:00] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:46:00] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769160483977602 caller=archiver.go:109
DBUG[06-29|22:46:00] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:46:00] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:46:00] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:46:00] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:46:00] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:46:00] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:46:00] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:46:00] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769160823397140 caller=archiver.go:109
DBUG[06-29|22:46:00] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
EROR[06-29|22:46:01] error fetching repository                job=015cf599-df42-721c-ecdc-bb521f41eb96 error="repository not found" caller=archiver.go:134
EROR[06-29|22:46:01] job errored                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 error="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=consumer.go:49
EROR[06-29|22:46:01] error on job                             module=worker id=0 err="fetching git://github.com:damoeb/kalipo.git failed: repository not found" caller=worker.go:46
DBUG[06-29|22:46:01] job started                              module=borges WorkerID=0 RepositoryID=015cf599-df42-721c-ecdc-bb521f41eb96 caller=consumer.go:44
DBUG[06-29|22:46:01] repository model obtained                job=015cf599-df42-721c-ecdc-bb521f41eb96 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|22:46:01] endpoint selected                        job=015cf599-df42-721c-ecdc-bb521f41eb96 endpoint=git://github.com:damoeb/kalipo.git caller=archiver.go:100
DBUG[06-29|22:46:01] local temporary directory created        job=015cf599-df42-721c-ecdc-bb521f41eb96 temp-path=local_repos/015cf599-df42-721c-ecdc-bb521f41eb96/1498769161145569590 caller=archiver.go:109
DBUG[06-29|22:46:01] local repository created                 job=015cf599-df42-721c-ecdc-bb521f41eb96 caller=archiver.go:116
DBUG[06-29|22:46:01] changes obtained                         job=015cf599-df58-597a-3513-5c496ef77f94 roots=1 caller=archiver.go:143
EROR[06-29|22:46:01] job errored                              module=borges WorkerID=1 RepositoryID=015cf599-df58-597a-3513-5c496ef77f94 error="reference not found" caller=consumer.go:49
EROR[06-29|22:46:01] error on job                             module=worker id=1 err="reference not found" caller=worker.go:46

Actual result:

  • 2 jobs fail and get rescheduled forever in an infinite loop, with workers using CPU
  • cannot remove them from the RabbitMQ queue
  • cannot get an empty queue

Expected result:

  • 2 jobs fail (1st due to a malformed URL, 2nd due to #72) and workers become idle (0% CPU)
  • go to http://localhost:8081 and take 2 items from the queue (without requeueing them)
  • get an empty queue
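
One way to get the expected behaviour is to cap the number of retries per job and park jobs that exceed the cap. The sketch below illustrates that idea only. Job, Outcome, Handle and maxRetries are hypothetical names, and how the retry counter would travel with a RabbitMQ message is left open.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound mimics the error seen in the log above.
var errNotFound = errors.New("repository not found")

// Job is a hypothetical queue item; Retries counts failed attempts so far.
type Job struct {
	RepositoryID string
	Retries      int
}

// Outcome tells the consumer what to do with a job after one attempt.
type Outcome int

const (
	Done    Outcome = iota // processed successfully, acknowledge it
	Requeue                // failed, but retries are left
	Bury                   // failed too often, move it to a separate queue
)

// Handle runs process once and decides the job's fate.
// maxRetries is an assumed setting, not an existing borges flag.
func Handle(job *Job, maxRetries int, process func(*Job) error) Outcome {
	if err := process(job); err == nil {
		return Done
	}
	job.Retries++
	if job.Retries > maxRetries {
		return Bury
	}
	return Requeue
}

func main() {
	failing := func(*Job) error { return errNotFound }
	job := &Job{RepositoryID: "015cf599-df42-721c-ecdc-bb521f41eb96"}
	for {
		out := Handle(job, 3, failing)
		fmt.Println("failed attempts:", job.Retries, "outcome:", out)
		if out != Requeue {
			break // the worker goes idle instead of spinning
		}
	}
}
```

A delay between retries could be layered on top of this, but the cap alone is enough to stop the loop and let the queue drain.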

Consumer: 2 workers doing same job

When there are 20 repos and 20 workers, borges consumer seems to schedule the same job (for a very slow repository, as it happens) on 2 different workers.

DBUG[06-30|09:13:56] job started                              WorkerID=16 RepositoryID=015cf846-fc76-7817-beaf-ca4784ad9443
DBUG[06-30|09:13:56] job started                              WorkerID=14 RepositoryID=015cf846-fc76-7817-beaf-ca4784ad9443
........
DBUG[06-30|09:56:53] repository processed                     job=015cf846-fc76-7817-beaf-ca4784ad9443 caller=archiver.go:148
INFO[06-30|09:56:53] job done                                 WorkerID=14 RepositoryID=015cf846-fc76-7817-beaf-ca4784ad9443
DBUG[06-30|09:57:09] repository processed                     job=015cf846-fc76-7817-beaf-ca4784ad9443 caller=archiver.go:148
INFO[06-30|09:57:09] job done                                 WorkerID=16 RepositoryID=015cf846-fc76-7817-beaf-ca4784ad9443 

Full log of these 2 jobs below

DBUG[06-30|09:13:56] job started                              module=borges WorkerID=16 RepositoryID=015cf846-fc76-7817-beaf-ca4784ad9443 caller=consumer.go:44
DBUG[06-30|09:13:56] job started                              module=borges WorkerID=14 RepositoryID=015cf846-fc76-7817-beaf-ca4784ad9443 caller=consumer.go:44

....
DBUG[06-30|09:13:56] repository model obtained                job=015cf846-fc76-7817-beaf-ca4784ad9443 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-30|09:13:56] endpoint selected                        job=015cf846-fc76-7817-beaf-ca4784ad9443 endpoint=git://github.com/upsilonproject/upsilon-common.git caller=archiver.go:100
DBUG[06-30|09:13:56] local temporary directory created        job=015cf846-fc76-7817-beaf-ca4784ad9443 temp-path=local_repos/015cf846-fc76-7817-beaf-ca4784ad9443/1498814036250934306 caller=archiver.go:109
...

DBUG[06-30|09:13:56] local repository created                 job=015cf846-fc76-7817-beaf-ca4784ad9443 caller=archiver.go:116

DBUG[06-30|09:13:56] repository model obtained                job=015cf846-fc76-7817-beaf-ca4784ad9443 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-30|09:13:56] endpoint selected                        job=015cf846-fc76-7817-beaf-ca4784ad9443 endpoint=git://github.com/upsilonproject/upsilon-common.git caller=archiver.go:100
DBUG[06-30|09:13:56] local temporary directory created        job=015cf846-fc76-7817-beaf-ca4784ad9443 temp-path=local_repos/015cf846-fc76-7817-beaf-ca4784ad9443/1498814036252872401 caller=archiver.go:109

DBUG[06-30|09:13:56] local repository created                 job=015cf846-fc76-7817-beaf-ca4784ad9443 caller=archiver.go:116
.....

DBUG[06-30|09:14:50] changes obtained                         job=015cf846-fc76-7817-beaf-ca4784ad9443 roots=1 caller=archiver.go:143
DBUG[06-30|09:15:06] changes obtained                         job=015cf846-fc76-7817-beaf-ca4784ad9443 roots=1 caller=archiver.go:143
..........

DBUG[06-30|09:56:53] repository processed                     job=015cf846-fc76-7817-beaf-ca4784ad9443 caller=archiver.go:148
INFO[06-30|09:56:53] job done                                 module=borges WorkerID=14 RepositoryID=015cf846-fc76-7817-beaf-ca4784ad9443 caller=consumer.go:51
DBUG[06-30|09:57:09] repository processed                     job=015cf846-fc76-7817-beaf-ca4784ad9443 caller=archiver.go:148
INFO[06-30|09:57:09] job done                                 module=borges WorkerID=16 RepositoryID=015cf846-fc76-7817-beaf-ca4784ad9443 caller=consumer.go:51

Steps to reproduce

# checkout https://github.com/src-d/borges/pull/69
glide install
make packages
./bin/borges_linux_amd64/borges producer --source=file --file /work/top20repos.txt
time ./bin/borges_linux_amd64/borges consumer --workers=20

Contents of top20repos.txt

git://github.com/stevschmid/track-o-bot.git
git://github.com/rr-/szurubooru.git
git://github.com/walterbender/physics.git
git://github.com/fedora-infra/kitchen.git
git://github.com/ictofnwi/steep.git
git://github.com/jimbrooke/InvisibleHiggs.git
git://github.com/kr2/Hugo.git
git://github.com/seanfisk/dotfiles.git
git://github.com/magnuskiro/master.git
git://github.com/upsilonproject/upsilon-common.git
git://github.com/damoeb/kalipo.git
git://github.com/erelsgl/limdu.git
git://github.com/upsilonproject/upsilon-common.git
git://github.com/haerfest/toy-programs.git
git://github.com/katherineng/terrace.git
git://github.com/bootstrapworld/wescheme-compiler2012.git
git://github.com/Citytechinc/cq-component-maven-plugin.git
git://github.com/LucidTechnics/Airlift.git
git://github.com/EventStore/EventStore.JVM.git
git://github.com/Zucka/girafAdmin.git
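
Note that the list above contains git://github.com/upsilonproject/upsilon-common.git twice, which is the repository both workers processed. Whether that duplicate is the actual cause is not confirmed here. If it is, filtering repeated lines before queueing would avoid it. The sketch below uses a hypothetical helper, UniqueEndpoints, which is not part of borges.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// UniqueEndpoints reads one repository URL per line and returns the URLs
// in their original order, skipping blank lines and exact repeats.
func UniqueEndpoints(input string) []string {
	seen := make(map[string]bool)
	var out []string
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		url := strings.TrimSpace(sc.Text())
		if url == "" || seen[url] {
			continue
		}
		seen[url] = true
		out = append(out, url)
	}
	return out
}

func main() {
	list := `git://github.com/upsilonproject/upsilon-common.git
git://github.com/damoeb/kalipo.git
git://github.com/upsilonproject/upsilon-common.git`
	for _, url := range UniqueEndpoints(list) {
		fmt.Println(url)
	}
}
```

This only compares the raw strings, so two spellings of the same repository (for example with and without the .git suffix) would still count as different entries.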

Calculated roots is always 0

DBUG[06-28|17:18:53] endpoint selected                        job=015cd435-fcc4-ce6c-e0c6-117250210f24 endpoint=git://github.com/Citytechinc/cq-component-maven-plugin.git caller=archiver.go:100
DBUG[06-28|17:18:53] local temporary directory created        job=015cd435-fcc4-ce6c-e0c6-117250210f24 temp-path=local_repos/015cd435-fcc4-ce6c-e0c6-117250210f24/1498663133963769638 caller=archiver.go:109
DBUG[06-28|17:18:53] local repository created                 job=015cd435-fcc4-ce6c-e0c6-117250210f24 caller=archiver.go:116
DBUG[06-28|17:19:00] changes obtained                         job=015cd435-fcc4-ce6c-e0c6-117250210f24 roots=0 caller=archiver.go:143

Roots are always 0, so no rooted repository is ever created (which means no siva files).

Connecting to the database but not finding the tables should throw an error

As a runner of borges,
I want to know if the database is missing the necessary tables,
so that I can add them myself and try running borges again.

Right now borges just sits there silently doing nothing when it does not find the required tables in the database.

Alternatively, borges could add the tables itself instead of complaining that they are missing.
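
A start-up check along these lines could report the problem explicitly. This is a sketch under assumptions: the table names in main are placeholders rather than the actual borges schema, TableExists and MissingTables are hypothetical helpers, and the query relies on PostgreSQL's information_schema.tables view with a $1 placeholder as used by common Postgres drivers.

```go
package main

import (
	"database/sql"
	"fmt"
)

// TableExists asks PostgreSQL whether a table with the given name exists.
// It needs an open *sql.DB backed by a Postgres driver.
func TableExists(db *sql.DB, name string) (bool, error) {
	var ok bool
	err := db.QueryRow(
		`SELECT EXISTS (SELECT 1 FROM information_schema.tables WHERE table_name = $1)`,
		name,
	).Scan(&ok)
	return ok, err
}

// MissingTables returns the required tables for which exists reports false.
// Taking the lookup as a function keeps the check testable without a database.
func MissingTables(required []string, exists func(string) (bool, error)) ([]string, error) {
	var missing []string
	for _, name := range required {
		ok, err := exists(name)
		if err != nil {
			return nil, fmt.Errorf("checking table %q: %w", name, err)
		}
		if !ok {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	// Placeholder names and a fake lookup, so the example runs without Postgres.
	required := []string{"repositories", "repository_references"}
	have := map[string]bool{"repositories": true}
	lookup := func(name string) (bool, error) { return have[name], nil }

	missing, err := MissingTables(required, lookup)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if len(missing) > 0 {
		fmt.Println("database is missing required tables:", missing)
	}
}
```

With a real connection, the lookup would be a closure over TableExists, for example func(n string) (bool, error) { return TableExists(db, n) }.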

Consumer: can not process the repository `error="reference not found"`

Was running locally with #70 and #69 merged.

It seems all repositories fail with a similar error:

EROR[06-29|19:37:51] job errored                              module=borges WorkerID=1 RepositoryID=015cf4de-12fb-544d-8256-d12250ab8ac4 error="reference not found" caller=consumer.go:49
EROR[06-29|19:37:51] error on job                             module=worker id=1 err="reference not found" caller=worker.go:46
DBUG[06-29|19:37:51] job started                              module=borges WorkerID=1 RepositoryID=015cf4de-12fb-544d-8256-d12250ab8ac4 caller=consumer.go:44
DBUG[06-29|19:37:51] repository model obtained                job=015cf4de-12fb-544d-8256-d12250ab8ac4 status=pending last-fetch=nil references=0 caller=archiver.go:93
DBUG[06-29|19:37:51] endpoint selected                        job=015cf4de-12fb-544d-8256-d12250ab8ac4 endpoint=git://github.com/CyanogenMod/android_hardware_qcom_fm.git caller=archiver.go:100
DBUG[06-29|19:37:51] local temporary directory created        job=015cf4de-12fb-544d-8256-d12250ab8ac4 temp-path=local_repos/015cf4de-12fb-544d-8256-d12250ab8ac4/1498757871629244230 caller=archiver.go:109
DBUG[06-29|19:37:51] local repository created                 job=015cf4de-12fb-544d-8256-d12250ab8ac4 caller=archiver.go:116
DBUG[06-29|19:38:02] changes obtained                         job=015cf4de-12fb-544d-8256-d12250ab8ac4 roots=1 caller=archiver.go:143

For debugging, it would be nice if the job were not re-scheduled indefinitely but kept somewhere for a while instead (a separate queue?). Update: this was moved to #73.

changes: a reference can point to a non-commit object.

We need to add the logic to resolve a reference hash if it points to an annotated tag. If it points to an object that is neither a commit nor a tag, an error must be logged and the process must continue with the next reference.
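The resolution rule described above can be sketched roughly as follows. This is a minimal, self-contained model and not borges or go-git code: `Object`, `ObjectType`, `resolveCommit` and the in-memory `objects` map are hypothetical stand-ins for whatever object storage is actually used.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectType is a simplified stand-in for the Git object types.
type ObjectType int

const (
	CommitObject ObjectType = iota
	TagObject
	TreeObject
	BlobObject
)

// Object is a hypothetical, minimal view of a stored Git object.
// Target is only meaningful for annotated tags.
type Object struct {
	Type   ObjectType
	Target string
}

// ErrNotACommit is returned when a reference cannot be resolved to a commit.
var ErrNotACommit = errors.New("reference does not resolve to a commit")

// resolveCommit follows annotated tags until it reaches a commit.
// The loop is bounded by the number of objects to guard against cycles.
func resolveCommit(objects map[string]Object, hash string) (string, error) {
	for i := 0; i <= len(objects); i++ {
		obj, ok := objects[hash]
		if !ok {
			return "", fmt.Errorf("object %s not found", hash)
		}
		switch obj.Type {
		case CommitObject:
			return hash, nil
		case TagObject:
			hash = obj.Target // peel the annotated tag
		default:
			return "", ErrNotACommit
		}
	}
	return "", ErrNotACommit
}

func main() {
	// Illustrative data only: hashes are shortened placeholders.
	objects := map[string]Object{
		"c1": {Type: CommitObject},
		"t1": {Type: TagObject, Target: "c1"},
		"b1": {Type: BlobObject},
	}
	refs := []struct{ name, hash string }{
		{"refs/heads/master", "c1"},
		{"refs/tags/v1.0", "t1"},
		{"refs/tags/blob", "b1"},
	}
	for _, ref := range refs {
		commit, err := resolveCommit(objects, ref.hash)
		if err != nil {
			// Log the error and continue with the next reference.
			fmt.Printf("skipping %s: %v\n", ref.name, err)
			continue
		}
		fmt.Printf("%s -> %s\n", ref.name, commit)
	}
}
```

The caller logs and skips a bad reference instead of aborting, which matches the behaviour requested in this issue.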

tests: if Postgres is not running, tests segfault

With no Postgres running, there is a test that segfaults.
Tests or suites should check whether Postgres is running and fail early with a proper error.

➜  borges git:(7862e25) ✗ go test -v .
=== RUN   TestArchiver
=== RUN   TestArchiver/TestFixtures
=== RUN   TestArchiver/TestFixtures/no_previous_references_and_no_updates
=== RUN   TestArchiver/TestFixtures/one_existing_reference_is_removed_(output_with_no_references)
=== RUN   TestArchiver/TestFixtures/one_existing_reference_is_removed_(output_with_references)
=== RUN   TestArchiver/TestFixtures/one_reference_changes_his_hash
=== RUN   TestArchiver/TestFixtures/all_references_are_new
=== RUN   TestArchiver/TestFixtures/all_references_are_deleted
=== RUN   TestArchiver/TestFixtures/all_references_are_up_to_date
=== RUN   TestArchiver/TestFixtures/all_reference_are_new_except_two_(up_to_date)
=== RUN   TestArchiver/TestFixtures/all_reference_are_new_except_one_(updated)
=== RUN   TestArchiver/TestFixtures/all_reference_are_new_except_one_(updated_with_new_init)
=== RUN   TestArchiver/TestFixtures/all_reference_are_new_except_one_(one_root_removed)
=== RUN   TestArchiver/TestNotExistingRepository
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xb4a7e2]

goroutine 58 [running]:
testing.tRunner.func1(0xc4203f31e0)
	/usr/local/go/src/testing/testing.go:622 +0x29d
panic(0xc4f8c0, 0x11f8980)
	/usr/local/go/src/runtime/panic.go:489 +0x2cf
github.com/src-d/borges.(*ArchiverSuite).TestNotExistingRepository(0xc4200202d0)
	/home/smola/dev/go/src/github.com/src-d/borges/archiver_test.go:163 +0x1e2
reflect.Value.call(0xc4204a4780, 0xc42049c1e0, 0x13, 0xd50ee6, 0x4, 0xc4203a3f80, 0x1, 0x1, 0xc4203a3ec0, 0xd4e180, ...)
	/usr/local/go/src/reflect/value.go:434 +0x91f
reflect.Value.Call(0xc4204a4780, 0xc42049c1e0, 0x13, 0xc4203a3f80, 0x1, 0x1, 0xbb50a3, 0x19, 0x0)
	/usr/local/go/src/reflect/value.go:302 +0xa4
github.com/src-d/borges/vendor/github.com/stretchr/testify/suite.Run.func2(0xc4203f31e0)
	/home/smola/dev/go/src/github.com/src-d/borges/vendor/github.com/stretchr/testify/suite/suite.go:102 +0x25f
testing.tRunner(0xc4203f31e0, 0xc4204a2500)
	/usr/local/go/src/testing/testing.go:657 +0x96
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:697 +0x2ca
exit status 2
FAIL	github.com/src-d/borges	0.063s
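One way to fail early, sketched under assumptions: before running database tests, try to open a TCP connection to the Postgres address and return a descriptive error if nothing answers. The helper name `postgresAvailable`, the address and the timeout are illustrative and not taken from borges. A successful TCP connection only shows that something is listening, not that the database is usable.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialTimeout is an assumed upper bound for the availability probe.
const dialTimeout = time.Second

// postgresAvailable returns an error if nothing accepts TCP connections
// on addr. It does not authenticate or run any query.
func postgresAvailable(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, dialTimeout)
	if err != nil {
		return fmt.Errorf("postgres is not reachable at %s: %v", addr, err)
	}
	return conn.Close()
}

func main() {
	// 5432 is the default PostgreSQL port; the host is an assumption.
	if err := postgresAvailable("127.0.0.1:5432"); err != nil {
		fmt.Println("cannot run database tests:", err)
		return
	}
	fmt.Println("postgres is reachable")
}
```

In a test suite the same check could run in the suite setup and abort with the returned error, so the failure is reported before any test dereferences a nil connection.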

store repository URL in rooted repositories

The URL that was used to fetch a repository should be present in rooted repositories. We can do this by leveraging the git remotes config.

The result would be:

  • For each remote repository, there is a remote with the repository ULID as its remote name.
  • We would not create special names for the references as we do now; the ULID would just be the remote name in reference names like refs/remotes/<ULID>/<branch>
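The naming scheme in the list above can be illustrated with a small sketch. The helper names `fetchRefSpec` and `remoteBranchRef` are hypothetical, and the ID and URL in `main` are example values only; the refspec follows the standard Git convention for remote-tracking branches.

```go
package main

import "fmt"

// fetchRefSpec maps all branches of a remote into refs/remotes/<ULID>/.
func fetchRefSpec(ulid string) string {
	return fmt.Sprintf("+refs/heads/*:refs/remotes/%s/*", ulid)
}

// remoteBranchRef returns the reference name for one branch of a remote.
func remoteBranchRef(ulid, branch string) string {
	return fmt.Sprintf("refs/remotes/%s/%s", ulid, branch)
}

func main() {
	// Example values: the pairing of this ID with this URL is illustrative.
	id := "015cf947-2bcb-f0ca-b015-9e8313156282"
	url := "git://github.com/erelsgl/limdu.git"

	// The resulting remote section of the rooted repository's git config.
	fmt.Printf("[remote %q]\n", id)
	fmt.Printf("\turl = %s\n", url)
	fmt.Printf("\tfetch = %s\n", fetchRefSpec(id))

	fmt.Println(remoteBranchRef(id, "master"))
}
```

With this layout the original URL can be read back from the remote's `url` entry, so no separate mapping is needed.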

Consumer: archiving 1 out of 2 roots failed

There were 3 non-fork repositories out of 178 that partially failed with archiving 1 out of 2 roots failed.

Full log:

t=2017-06-30T15:17:07+0000 lvl=warn msg="job warning" module=borges WorkerID=6 RepositoryID=015cf947-2bcb-f0ca-b015-9e8313156282 error="push to rooted repo 136ccaaf984fca7727ff568c727ed172477e2ed0 failed: failed to update ref" caller=consumer.go:56
t=2017-06-30T14:50:02+0000 lvl=eror msg="error on job" module=worker id=3 err="fetching git://github.com/nocl/calculate-3-core.git failed: repository not found" caller=worker.go:46
t=2017-06-30T15:17:07+0000 lvl=eror msg="job errored" module=borges WorkerID=6 RepositoryID=015cf947-2bcb-f0ca-b015-9e8313156282 error="archiving 1 out of 2 roots failed: 136ccaaf984fca7727ff568c727ed172477e2ed0" caller=consumer.go:49
t=2017-06-30T15:17:07+0000 lvl=eror msg="error on job" module=worker id=6 err="archiving 1 out of 2 roots failed: 136ccaaf984fca7727ff568c727ed172477e2ed0" caller=worker.go:46

t=2017-06-30T15:27:09+0000 lvl=eror msg="job errored" module=borges WorkerID=12 RepositoryID=015cf947-2bc3-1004-373f-b680e2d64cdf error="archiving 1 out of 2 roots failed: ccdc498d08dceffcb799288c2be5dd5e67de54e8" caller=consumer.go:49
t=2017-06-30T15:27:09+0000 lvl=eror msg="error on job" module=worker id=12 err="archiving 1 out of 2 roots failed: ccdc498d08dceffcb799288c2be5dd5e67de54e8" caller=worker.go:46

t=2017-06-30T15:30:01+0000 lvl=eror msg="job errored" module=borges WorkerID=4 RepositoryID=015cf947-2bf6-8d53-ced4-81f86b34cfe7 error="archiving 1 out of 2 roots failed: 8d2859b2727e8f04026b4904ae287387e37d9dcb" caller=consumer.go:49
t=2017-06-30T15:30:01+0000 lvl=eror msg="error on job" module=worker id=4 err="archiving 1 out of 2 roots failed: 8d2859b2727e8f04026b4904ae287387e37d9dcb" caller=worker.go:46

Error archiving some of the roots in multiple repositories

t=2017-08-10T23:23:35+0000 lvl=eror msg="repository processed with errors" module=borges worker=1 job=015dcc48-290e-57f5-7b9d-2bb5e57854f5 endpoint=git://github.com/Martin9527/FFmpeg error="archiving 1 out of 1 roots failed: 77bb6835ba752bb9335d208963a53227bbb1bc63" caller=archiver.go:162
t=2017-08-10T23:23:35+0000 lvl=eror msg="job finished with error" module=borges worker=1 job=015dcc48-290e-57f5-7b9d-2bb5e57854f5 error="archiving 1 out of 1 roots failed: 77bb6835ba752bb9335d208963a53227bbb1bc63" caller=archiver.go:76
t=2017-08-10T23:23:35+0000 lvl=eror msg="error on job" module=borges worker=1 error="archiving 1 out of 1 roots failed: 77bb6835ba752bb9335d208963a53227bbb1bc63" caller=worker.go:50
t=2017-08-11T00:31:24+0000 lvl=eror msg="repository processed with errors" module=borges worker=9 job=015dcc48-2a24-276b-83cd-8885126832e2 endpoint=https://git.openstack.org/openstack/deb-nova error="archiving 3 out of 3 roots failed: bf6e6e718cdc7488e2da87b21e258ccc065fe499, fd2a015f8b37e6edba8c20d62a67ee68bc48c7e6, a5d70a6d729cc62dec86f371f4b1c00ce13226a4" caller=archiver.go:162
t=2017-08-11T00:31:24+0000 lvl=eror msg="job finished with error" module=borges worker=9 job=015dcc48-2a24-276b-83cd-8885126832e2 error="archiving 3 out of 3 roots failed: bf6e6e718cdc7488e2da87b21e258ccc065fe499, fd2a015f8b37e6edba8c20d62a67ee68bc48c7e6, a5d70a6d729cc62dec86f371f4b1c00ce13226a4" caller=archiver.go:76
t=2017-08-11T00:31:24+0000 lvl=eror msg="error on job" module=borges worker=9 error="archiving 3 out of 3 roots failed: bf6e6e718cdc7488e2da87b21e258ccc065fe499, fd2a015f8b37e6edba8c20d62a67ee68bc48c7e6, a5d70a6d729cc62dec86f371f4b1c00ce13226a4" caller=worker.go:50
t=2017-08-11T01:43:08+0000 lvl=eror msg="repository processed with errors" module=borges worker=5 job=015dcc48-2bf2-1873-04f5-232f160d65fa endpoint=git://github.com/jakedouglas/rails error="archiving 1 out of 1 roots failed: db045dbbf60b53dbe013ef25554fd013baf88134" caller=archiver.go:162
t=2017-08-11T01:43:08+0000 lvl=eror msg="job finished with error" module=borges worker=5 job=015dcc48-2bf2-1873-04f5-232f160d65fa error="archiving 1 out of 1 roots failed: db045dbbf60b53dbe013ef25554fd013baf88134" caller=archiver.go:76
t=2017-08-11T01:43:08+0000 lvl=eror msg="error on job" module=borges worker=5 error="archiving 1 out of 1 roots failed: db045dbbf60b53dbe013ef25554fd013baf88134" caller=worker.go:50
t=2017-08-11T06:00:08+0000 lvl=eror msg="repository processed with errors" module=borges worker=6 job=015dcc48-33b7-8dc9-a0ed-0eae80e4d6f6 endpoint=https://git.openstack.org/openstack/glance error="archiving 2 out of 4 roots failed: 78b45ee909994eef9fbdeba59bb91d9fc955272b, 5b628a687302a6084aed7c293afc8cacfb8b5de9" caller=archiver.go:162
t=2017-08-11T06:00:09+0000 lvl=eror msg="job finished with error" module=borges worker=6 job=015dcc48-33b7-8dc9-a0ed-0eae80e4d6f6 error="archiving 2 out of 4 roots failed: 78b45ee909994eef9fbdeba59bb91d9fc955272b, 5b628a687302a6084aed7c293afc8cacfb8b5de9" caller=archiver.go:76
t=2017-08-11T06:00:09+0000 lvl=eror msg="error on job" module=borges worker=6 error="archiving 2 out of 4 roots failed: 78b45ee909994eef9fbdeba59bb91d9fc955272b, 5b628a687302a6084aed7c293afc8cacfb8b5de9" caller=worker.go:50
t=2017-08-11T07:21:41+0000 lvl=eror msg="repository processed with errors" module=borges worker=1 job=015dcc48-3299-3b44-deb7-fc113ff78ff5 endpoint=https://git.openstack.org/openstack/fuel-plugin-contrail error="archiving 2 out of 3 roots failed: e4e4a9ab3eba7b0ef8ccc2b0dca57ada3313c6e8, ac5fe3832f3e0544aaaab11685b9bd4c3ff75095" caller=archiver.go:162
t=2017-08-11T07:21:42+0000 lvl=eror msg="job finished with error" module=borges worker=1 job=015dcc48-3299-3b44-deb7-fc113ff78ff5 error="archiving 2 out of 3 roots failed: e4e4a9ab3eba7b0ef8ccc2b0dca57ada3313c6e8, ac5fe3832f3e0544aaaab11685b9bd4c3ff75095" caller=archiver.go:76
t=2017-08-11T07:21:42+0000 lvl=eror msg="error on job" module=borges worker=1 error="archiving 2 out of 3 roots failed: e4e4a9ab3eba7b0ef8ccc2b0dca57ada3313c6e8, ac5fe3832f3e0544aaaab11685b9bd4c3ff75095" caller=worker.go:50
t=2017-08-10T23:25:34+0000 lvl=eror msg="repository processed with errors" module=borges worker=9 job=015dcc48-2926-65c9-6f1a-71c5ddc9188a endpoint=https://git.openstack.org/openstack/ceilometer error="archiving 1 out of 4 roots failed: ca8dc0f250726f450de26d7dcdbed1cd3aacaea1" caller=archiver.go:162
t=2017-08-10T23:25:35+0000 lvl=eror msg="job finished with error" module=borges worker=9 job=015dcc48-2926-65c9-6f1a-71c5ddc9188a error="archiving 1 out of 4 roots failed: ca8dc0f250726f450de26d7dcdbed1cd3aacaea1" caller=archiver.go:76
t=2017-08-10T23:25:35+0000 lvl=eror msg="error on job" module=borges worker=9 error="archiving 1 out of 4 roots failed: ca8dc0f250726f450de26d7dcdbed1cd3aacaea1" caller=worker.go:50
t=2017-08-11T01:16:31+0000 lvl=eror msg="repository processed with errors" module=borges worker=0 job=015dcc48-2b40-ed37-0c6c-4f0b10e8f2d9 endpoint=git://github.com/annkun/cocos2d-x error="archiving 1 out of 1 roots failed: 50c22d725f7c2273a371b5af06854c177016839d" caller=archiver.go:162
t=2017-08-11T01:16:31+0000 lvl=eror msg="job finished with error" module=borges worker=0 job=015dcc48-2b40-ed37-0c6c-4f0b10e8f2d9 error="archiving 1 out of 1 roots failed: 50c22d725f7c2273a371b5af06854c177016839d" caller=archiver.go:76
t=2017-08-11T01:16:31+0000 lvl=eror msg="error on job" module=borges worker=0 error="archiving 1 out of 1 roots failed: 50c22d725f7c2273a371b5af06854c177016839d" caller=worker.go:50
t=2017-08-11T01:22:27+0000 lvl=eror msg="repository processed with errors" module=borges worker=6 job=015dcc48-2b61-dcc2-073d-d39229a15e1e endpoint=git://github.com/chenzhirong/pentaho-kettle error="archiving 2 out of 2 roots failed: d12411617d3df9341db4e1c0456a470136371efe, 90bfad2be3b15c9165be0804f96a07956bd495fe" caller=archiver.go:162
t=2017-08-11T01:22:27+0000 lvl=eror msg="job finished with error" module=borges worker=6 job=015dcc48-2b61-dcc2-073d-d39229a15e1e error="archiving 2 out of 2 roots failed: d12411617d3df9341db4e1c0456a470136371efe, 90bfad2be3b15c9165be0804f96a07956bd495fe" caller=archiver.go:76
t=2017-08-11T01:22:27+0000 lvl=eror msg="error on job" module=borges worker=6 error="archiving 2 out of 2 roots failed: d12411617d3df9341db4e1c0456a470136371efe, 90bfad2be3b15c9165be0804f96a07956bd495fe" caller=worker.go:50
t=2017-08-11T04:36:22+0000 lvl=eror msg="repository processed with errors" module=borges worker=1 job=015dcc48-3300-23e2-d992-932d2edf0e94 endpoint=git://github.com/ismaquias/redesComplexas error="archiving 1 out of 1 roots failed: ad3b6c0dc95b6b05ed294f3d6513e2649a622462" caller=archiver.go:162
t=2017-08-11T04:36:22+0000 lvl=eror msg="job finished with error" module=borges worker=1 job=015dcc48-3300-23e2-d992-932d2edf0e94 error="archiving 1 out of 1 roots failed: ad3b6c0dc95b6b05ed294f3d6513e2649a622462" caller=archiver.go:76
t=2017-08-11T04:36:22+0000 lvl=eror msg="error on job" module=borges worker=1 error="archiving 1 out of 1 roots failed: ad3b6c0dc95b6b05ed294f3d6513e2649a622462" caller=worker.go:50

We should improve the error reporting for that part because right now we don't know why archiving these roots failed.

borges consumer - `push to rooted repo xxx failed: object not found`

Even after merging #46 consumer fails on https://github.com/erelsgl/limdu

borges consumer --workers=1
DBUG[06-14|15:54:00] job started                              module=borges WorkerID=0 RepositoryID=015ca6ce-13ea-686e-66f2-3e4ce0548bc5
WARN[06-14|15:54:04] job warning                              module=borges WorkerID=0 RepositoryID=015ca6ce-13ea-686e-66f2-3e4ce0548bc5 error="push to rooted repo 98f1194fd33ba3fbe75869b31b7cb502c8454d59 failed: object not found"
EROR[06-14|15:54:04] job errored                              module=borges WorkerID=0 RepositoryID=015ca6ce-13ea-686e-66f2-3e4ce0548bc5 error="archiving 1 out of 1 roots failed: 98f1194fd33ba3fbe75869b31b7cb502c8454d59"

and same on https://github.com/Zucka/girafAdmin

DBUG[06-14|17:33:15] job started                              module=borges WorkerID=0 RepositoryID=015ca6ce-1471-74d4-c8a3-85f95b201e03
WARN[06-14|17:35:03] job warning                              module=borges WorkerID=0 RepositoryID=015ca6ce-1471-74d4-c8a3-85f95b201e03 error="push to rooted repo 4cf0e481540315d86d2b5eac7f0986e7fbc61e52 failed: object not found"
EROR[06-14|17:35:03] job errored                              module=borges WorkerID=0 RepositoryID=015ca6ce-1471-74d4-c8a3-85f95b201e03 error="archiving 1 out of 2 roots failed: 4cf0e481540315d86d2b5eac7f0986e7fbc61e52"

same on https://github.com/CyanogenMod/android_hardware_qcom_fm

DBUG[06-15|09:52:37] job started                              module=borges WorkerID=2 RepositoryID=015caab7-f2b6-7529-1260-179145f81e3d
WARN[06-15|09:53:53] job warning                              module=borges WorkerID=2 RepositoryID=015caab7-f2b6-7529-1260-179145f81e3d error="push to rooted repo 371d637753718494c6e4ae0bf72424d662766e77 failed: failed to update ref"
EROR[06-15|09:53:53] job errored                              module=borges WorkerID=2 RepositoryID=015caab7-f2b6-7529-1260-179145f81e3d error="archiving 1 out of 1 roots failed: 371d637753718494c6e4ae0bf72424d662766e77"

one more is https://github.com/Zucka/girafAdmin

borges consumer - repository records in DB are not updated

After fetching 10 repos through the file producer and borges consumer, the only place to get the hash of the first commit of the rooted repository, the repositories DB table, does not have a commit hash in it:

  • all repos have pending status

     select count(1) from repositories where status <> 'pending';
  • the _references field, where the URL should be, is empty

     select * from repositories where id = '015ca6ce-1405-cab8-c0ce-6d74a9b4b937'::uuid;

Add more logs

We need more logs to know what is happening while a repository is being processed.

README: document configuration

Command line arguments and environment variables required to run borges should be documented.

SQL schema is outdated, update or remove it.

AMQP best practices recommend one channel per thread

In the current version, the consumer is a single thread reading from an AMQP channel and distributing the jobs among several workers in a workerpool.

The number of workers in the workerpool can be adjusted dynamically, but in doing so, the flow control over the AMQP channel gets outdated, resulting in either too few jobs in-flight or too many, given the new number of workers in the workerpool.

Most best-practice articles and the AMQP specification itself hint that the correct way to do this is to have one channel per thread (worker), which has the extra benefit that we can start using Channel.Qos for its intended purpose: keeping the consumers busy while keeping most jobs safe in the broker.

This change implies a major refactor of the consumer. Instead of having:

a single thread consumer fetching from a single channel from the broker and distributing jobs among the several workers in a dynamic workerpool

we will need to change to:

a dynamic pool of consumers, each of them with a single associated worker and fetching from its own broker channel; that is, we will have as many consumers and channels as we have workers
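The target design could look roughly like the sketch below. It uses only the standard library: `Channel` is a hypothetical stand-in for one broker channel, `fakeChannel` replaces the broker for the demonstration, and `runConsumers` and `simulate` are illustrative names. With a real AMQP client, each consumer would also set its own prefetch limit on its channel.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical unit of work taken from the queue.
type Job struct{ RepositoryID string }

// Channel is a hypothetical stand-in for one broker channel.
type Channel interface {
	Deliveries() <-chan Job
	Close() error
}

// runConsumers opens n channels and starts one consumer per channel.
// Each consumer processes its own deliveries sequentially, so there are
// as many channels as workers. It returns when all deliveries are drained.
func runConsumers(n int, open func() (Channel, error), work func(Job)) error {
	channels := make([]Channel, 0, n)
	for i := 0; i < n; i++ {
		ch, err := open()
		if err != nil {
			for _, opened := range channels {
				opened.Close()
			}
			return err
		}
		channels = append(channels, ch)
	}

	var wg sync.WaitGroup
	for _, ch := range channels {
		wg.Add(1)
		go func(ch Channel) {
			defer wg.Done()
			defer ch.Close()
			for job := range ch.Deliveries() {
				work(job)
			}
		}(ch)
	}
	wg.Wait()
	return nil
}

// fakeChannel is an in-memory Channel used only for the demonstration.
type fakeChannel struct{ jobs chan Job }

func (f *fakeChannel) Deliveries() <-chan Job { return f.jobs }
func (f *fakeChannel) Close() error           { return nil }

// simulate runs the pool against fake channels and counts processed jobs.
func simulate(consumers, jobsPerChannel int) int {
	open := func() (Channel, error) {
		jobs := make(chan Job, jobsPerChannel)
		for i := 0; i < jobsPerChannel; i++ {
			jobs <- Job{RepositoryID: fmt.Sprintf("repo-%d", i)}
		}
		close(jobs)
		return &fakeChannel{jobs: jobs}, nil
	}

	var mu sync.Mutex
	processed := 0
	work := func(Job) {
		mu.Lock()
		processed++
		mu.Unlock()
	}
	if err := runConsumers(consumers, open, work); err != nil {
		return -1
	}
	return processed
}

func main() {
	fmt.Println("jobs processed:", simulate(3, 2)) // jobs processed: 6
}
```

Resizing the pool then means opening or closing a channel together with its consumer, so per-channel flow control never goes out of sync with the number of workers.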

A few links about this topic:

While none of these quotes address our problem directly, and some of them miss the point entirely because we are not accessing the same channel from different threads, they give an idea of the original purpose and design of channels and their expected usage.

Add `borges version`

As a developer responsible for deploying Borges to a k8s cluster,
I would like to be able to tell for sure which version is currently deployed.

This can be done by executing borges --version or some other similar command.
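A common way to do this in Go, shown here as a sketch, is to keep the version in package-level variables and override them at link time with the `-X` linker flag. The variable names `version` and `build` and the output format are assumptions, not the actual borges implementation.

```go
package main

import "fmt"

// version and build are placeholders meant to be overridden at link time:
//
//	go build -ldflags "-X main.version=v0.1.0 -X main.build=2017-07-01"
//
// The values in the command above are examples only.
var (
	version = "undefined"
	build   = "undefined"
)

// versionString formats the information a version command would print.
func versionString() string {
	return fmt.Sprintf("borges %s (build %s)", version, build)
}

func main() {
	fmt.Println(versionString())
}
```

Because the values are baked into the binary, the deployed version can be checked from inside the running container without any extra metadata.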

Producer: on adding 200 repos, only 178 are in DB

  • borges producer --source=file --file ./top200repos.txt
  • got 200 messages in Rabbit, borges.buriedQueue empty
     curl -s -u guest:guest "http://localhost:8081/api/queues/%2F/borges" | jq .messages
    
  • got 178 records in DB
     select count(1) from repositories;
    

top200repos.txt
The file has 200 lines: cat top200repos.txt | sort -u | uniq -d -c prints nothing, so there should be no duplicates, and wc -l reports 200.
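Note that `sort -u` already drops duplicate lines, so `uniq -d` placed after it prints nothing for any input. A direct duplicate count is sketched below; `duplicateLines` is an illustrative helper, and the sample input in `main` is made up for the demonstration rather than taken from top200repos.txt.

```go
package main

import (
	"fmt"
	"strings"
)

// duplicateLines returns every non-empty line that appears more than once
// in text, together with the number of times it appears.
func duplicateLines(text string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line != "" {
			counts[line]++
		}
	}
	// Keep only the lines seen at least twice.
	for line, n := range counts {
		if n < 2 {
			delete(counts, line)
		}
	}
	return counts
}

func main() {
	// Made-up sample input with one repeated URL.
	sample := `git://github.com/upsilonproject/upsilon-common.git
git://github.com/damoeb/kalipo.git
git://github.com/upsilonproject/upsilon-common.git`

	for line, n := range duplicateLines(sample) {
		fmt.Printf("%d %s\n", n, line)
	}
}
```

If the real file does contain repeated URLs, that would be one possible explanation for fewer rows than messages, but this is only a hypothesis to check.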

Consumer: panic: runtime error: index out of range

On https://github.com/snowballstem/snowball-website Borges panics, most probably due to go-git

t=2017-07-01T11:53:40+0000 lvl=dbug msg="job started" module=borges WorkerID=16 RepositoryID=015cfd9f-846b-dd0b-23c6-f2098939fa91 caller=consumer.go:44
t=2017-07-01T11:53:40+0000 lvl=dbug msg="repository model obtained" job=015cfd9f-846b-dd0b-23c6-f2098939fa91 status=pending last-fetch=nil references=0 caller=archiver.go:93
t=2017-07-01T11:53:40+0000 lvl=dbug msg="endpoint selected" job=015cfd9f-846b-dd0b-23c6-f2098939fa91 endpoint=git://github.com/snowballstem/snowball-website.git caller=archiver.go:100
t=2017-07-01T11:53:40+0000 lvl=dbug msg="local temporary directory created" job=015cfd9f-846b-dd0b-23c6-f2098939fa91 temp-path=local_repos/015cfd9f-846b-dd0b-23c6-f2098939fa91/1498910020147732856 caller=archiver.go:109
t=2017-07-01T11:53:40+0000 lvl=dbug msg="local repository created" job=015cfd9f-846b-dd0b-23c6-f2098939fa91 caller=archiver.go:116
t=2017-07-01T11:53:43+0000 lvl=dbug msg="changes obtained" job=015cfd9f-846b-dd0b-23c6-f2098939fa91 roots=1 caller=archiver.go:143
panic: runtime error: index out of range

goroutine 3053 [running]:
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile.PatchDelta(0xcbfda40000, 0x2d0000, 0x3ffe00, 0xca51047862, 0x3, 0x1e59e, 0x0, 0xd0c280, 0xc4728c3a00)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile/patch_delta.go:60 +0x5b3
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile.ApplyDelta(0xd0c280, 0xc4728c3900, 0xd0c280, 0xc4728c3a00, 0xca51026000, 0x21865, 0x3fe00, 0xd5e718, 0x9f9340)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile/patch_delta.go:33 +0x149
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile.(*Decoder).fillOFSDeltaObjectContent(0xc4202aff60, 0xd0c280, 0xc4728c3900, 0xc, 0xc5f3009160, 0xc4202afbd8, 0x1)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile/decoder.go:382 +0x202
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile.(*Decoder).decodeByHeader(0xc4202aff60, 0xc4728c38c0, 0x0, 0x0, 0x0, 0x0)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile/decoder.go:279 +0x3d6
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile.(*Decoder).DecodeObject(0xc4202aff60, 0xd0c280, 0xc6d5a61400, 0x0, 0x0)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile/decoder.go:214 +0x66
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile.(*Decoder).decodeObjects(0xc4202aff60, 0x1eff, 0x0, 0x0)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile/decoder.go:155 +0x40
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile.(*Decoder).doDecode(0xc4201def60, 0xa5caf0, 0xc4201def60)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile/decoder.go:145 +0x103
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile.(*Decoder).Decode(0xc4202aff60, 0x0, 0x0, 0xc400000000, 0x0, 0x0)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/plumbing/format/packfile/decoder.go:129 +0xa1
github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/storage/filesystem/internal/dotgit.(*PackWriter).buildIndex(0xc579c5ea00)
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/storage/filesystem/internal/dotgit/writers.go:64 +0xc5
created by github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/storage/filesystem/internal/dotgit.newPackWrite
        /root/go/src/github.com/src-d/borges/vendor/gopkg.in/src-d/go-git.v4/storage/filesystem/internal/dotgit/writers.go:52 +0x3b5

Some repositories exceeded the deadline

After less than one day of running, a few repositories exceeded the deadline (which is 10h). We should investigate the cause.

The repositories are the following on the staging cluster:

https://git.openstack.org/openstack/fuel-plugin-contrail

  • root=ac5fe3832f3e0544aaaab11685b9bd4c3ff75095
  • root=e4e4a9ab3eba7b0ef8ccc2b0dca57ada3313c6e8

https://git.openstack.org/openstack/glance

  • root=5b628a687302a6084aed7c293afc8cacfb8b5de9
  • root=78b45ee909994eef9fbdeba59bb91d9fc955272b

git://github.com/jakedouglas/rails

  • root=db045dbbf60b53dbe013ef25554fd013baf88134

https://git.openstack.org/openstack/deb-nova

  • root=a5d70a6d729cc62dec86f371f4b1c00ce13226a4
  • root=fd2a015f8b37e6edba8c20d62a67ee68bc48c7e6
  • root=bf6e6e718cdc7488e2da87b21e258ccc065fe499

git://github.com/Martin9527/FFmpeg

  • root=77bb6835ba752bb9335d208963a53227bbb1bc63

git://github.com/ismaquias/redesComplexas

  • root=ad3b6c0dc95b6b05ed294f3d6513e2649a622462

git://github.com/chenzhirong/pentaho-kettle

  • root=90bfad2be3b15c9165be0804f96a07956bd495fe
  • root=d12411617d3df9341db4e1c0456a470136371efe

git://github.com/annkun/cocos2d-x

  • root=50c22d725f7c2273a371b5af06854c177016839d

https://git.openstack.org/openstack/ceilometer

  • root=ca8dc0f250726f450de26d7dcdbed1cd3aacaea1

TestReferenceUpdate of the TestArchiver suite fails randomly

Here are two consecutive executions of the same test, with different results.

; go test -v -run='TestArchiver$' --testify.m='TestReferenceUpdate' github.com/src-d/borges
=== RUN   TestArchiver
=== RUN   TestReferenceUpdate
=== RUN   TestReferenceUpdate/all_reference_are_new_except_one_(updated)
=== RUN   TestReferenceUpdate/all_reference_are_new_except_one_(updated_with_new_init)
=== RUN   TestReferenceUpdate/all_reference_are_new_except_one_(one_root_removed)
--- PASS: TestReferenceUpdate (0.41s)
    --- PASS: TestReferenceUpdate/all_reference_are_new_except_one_(updated) (0.00s)
    --- PASS: TestReferenceUpdate/all_reference_are_new_except_one_(updated_with_new_init) (0.00s)
    --- PASS: TestReferenceUpdate/all_reference_are_new_except_one_(one_root_removed) (0.00s)
--- PASS: TestArchiver (0.41s)
PASS
ok  	github.com/src-d/borges	0.421s
;
;
;
;
; go test -v -run='TestArchiver$' --testify.m='TestReferenceUpdate' github.com/src-d/borges
=== RUN   TestArchiver
=== RUN   TestReferenceUpdate
=== RUN   TestReferenceUpdate/all_reference_are_new_except_one_(updated)
=== RUN   TestReferenceUpdate/all_reference_are_new_except_one_(updated_with_new_init)
=== RUN   TestReferenceUpdate/all_reference_are_new_except_one_(one_root_removed)
--- FAIL: TestReferenceUpdate (0.42s)
    --- PASS: TestReferenceUpdate/all_reference_are_new_except_one_(updated) (0.00s)
    --- FAIL: TestReferenceUpdate/all_reference_are_new_except_one_(updated_with_new_init) (0.00s)
	Error Trace:	archiver_test.go:54
    	Error:		Not equal: 9 (expected)
    			        != 8 (actual)
    		
    --- PASS: TestReferenceUpdate/all_reference_are_new_except_one_(one_root_removed) (0.00s)
--- FAIL: TestArchiver (0.42s)
FAIL
exit status 1
FAIL	github.com/src-d/borges	0.421s

Use Not_found status in Repository model.

Right now we are throwing an error and sending the repository to the dead letter queue instead of marking it as Not_found:

EROR[07-03|16:09:54] error fetching repository                job=015d0886-bab7-ce81-5e4a-fbc616d98684 error="repository not found" caller=archiver.go:134
EROR[07-03|16:09:54] job errored                              module=borges WorkerID=0 RepositoryID=015d0886-bab7-ce81-5e4a-fbc616d98684 error="fetching git://github.com/zentooo/rc.git failed: repository not found" caller=consumer.go:49
EROR[07-03|16:09:54] error on job                             module=worker id=0 err="fetching git://github.com/zentooo/rc.git failed: repository not found" caller=worker.go:46

We should update the Repository status to not_found, print a warning instead of an error, and not send the repository to the dead letter queue.
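The requested behaviour can be sketched as a small decision function. Everything here is hypothetical: the `Status` type, the sentinel errors and `handleFetchError` only model the rule and are not the borges model or the go-git error values.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical repository status.
type Status string

const (
	StatusPending  Status = "pending"
	StatusNotFound Status = "not_found"
)

// Hypothetical sentinel errors standing in for the real fetch errors.
var (
	ErrRepositoryNotFound = errors.New("repository not found")
	ErrConnectionReset    = errors.New("connection reset")
)

// handleFetchError returns the status to store and whether the job
// should be sent to the dead letter queue.
func handleFetchError(current Status, err error) (Status, bool) {
	switch {
	case err == nil:
		return current, false
	case errors.Is(err, ErrRepositoryNotFound):
		// Expected condition: record it and do not bury the job.
		return StatusNotFound, false
	default:
		return current, true
	}
}

func main() {
	wrapped := fmt.Errorf("fetching failed: %w", ErrRepositoryNotFound)
	status, bury := handleFetchError(StatusPending, wrapped)
	fmt.Printf("warning: %v (status=%s, bury=%t)\n", wrapped, status, bury)

	status, bury = handleFetchError(StatusPending, ErrConnectionReset)
	fmt.Printf("error: %v (status=%s, bury=%t)\n", ErrConnectionReset, status, bury)
}
```

Using `errors.Is` means the check still works when the fetch error is wrapped with extra context such as the endpoint URL.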

Borges GC process implementation

  • Add a process that applies garbage collection to siva files to reduce their size.
  • This process should remove siva files without references too.

Consumer: out of 200, 2 messages stuck in the queue

After 24h of processing the top 178 repositories (same file as in #78), there were 2 that seemed stuck (re-scheduled?) in the main queue forever, with CPU% != 0.

Both had the payload gaxSZXBvc2l0b3J5SUTEEAFc+Ucr70NX9CQFPcYv14s=, which is base64(msgpack) and quite hard to debug :/

According to http://sugendran.github.io/msgpack-visualizer/ and http://kawanet.github.io/msgpack-lite/ it seems to be

81 AC 52 65 70 6F 73 69 74 6F 72 79 49 44 82 A4 74 79 70 65 A6 42 75 66 66 65 72 A4 64 61 74 61 DC 00 10 01 5C CC F9 47 2B CC EF 43 57 CC F4 24 05 3D CC C6 2F CC D7 CC 8B

or

{
  "RepositoryID": {
    "type": "Buffer",
    "data": [
      1,
      92,
      249,
      71,
      43,
      239,
      67,
      87,
      244,
      36,
      5,
      61,
      198,
      47,
      215,
      139
    ]
  }
}
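To make such payloads easier to inspect, the decoding can be sketched in a few lines of Go. This is a deliberately minimal parser that handles only the shape seen here (a one-entry msgpack map with a short string key and a 16-byte binary value); `decodeJobID` is an illustrative name, and a real tool would use a full msgpack library.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
)

// decodeJobID decodes a base64 payload containing a msgpack map with one
// fixstr key and a 16-byte bin8 value. It returns the key and the value
// formatted like the IDs in the logs.
func decodeJobID(payload string) (string, string, error) {
	raw, err := base64.StdEncoding.DecodeString(payload)
	if err != nil {
		return "", "", err
	}
	// 0x81 is a fixmap with exactly one entry.
	if len(raw) < 2 || raw[0] != 0x81 {
		return "", "", errors.New("expected a one-entry msgpack map")
	}
	// 0xa0..0xbf is a fixstr; the low 5 bits hold its length.
	if raw[1]&0xe0 != 0xa0 {
		return "", "", errors.New("expected a fixstr key")
	}
	keyLen := int(raw[1] & 0x1f)
	rest := raw[2:]
	if len(rest) < keyLen+2 {
		return "", "", errors.New("truncated payload")
	}
	key := string(rest[:keyLen])
	rest = rest[keyLen:]
	// 0xc4 is bin8, followed by a one-byte length.
	if rest[0] != 0xc4 || rest[1] != 16 || len(rest) < 18 {
		return "", "", errors.New("expected a 16-byte bin8 value")
	}
	h := hex.EncodeToString(rest[2:18])
	id := fmt.Sprintf("%s-%s-%s-%s-%s", h[0:8], h[8:12], h[12:16], h[16:20], h[20:32])
	return key, id, nil
}

func main() {
	key, id, err := decodeJobID("gaxSZXBvc2l0b3J5SUTEEAFc+Ucr70NX9CQFPcYv14s=")
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s=%s\n", key, id) // RepositoryID=015cf947-2bef-4357-f424-053dc62fd78b
}
```

The resulting ID has the same format as the RepositoryID values in the logs, so it can be used to look up the repository directly.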
