
bradleyfalzon / gopherci


GopherCI was a project to help you maintain high-quality Go projects by checking each GitHub pull request for backward-incompatible changes and running a suite of third-party static analysis tools.

Home Page: https://gopherci.io

License: BSD 2-Clause "Simplified" License

Go 96.48% Shell 1.52% CSS 1.77% Makefile 0.24%
continuous-integration linter go static-analysis

gopherci's Introduction

gopherci's People

Contributors

bradleyfalzon, hydroflame, mike-trewartha


gopherci's Issues

Max build time

GopherCI should set a maximum build time; it appears some bugs in Docker can cause an execution to never return:

Jan 28 19:40:17 www1.c.gopherci-prod.internal dockerd[6742]: panic: close of nil channel
Jan 28 19:40:17 www1.c.gopherci-prod.internal dockerd[6742]: goroutine 618 [running]:
Jan 28 19:40:17 www1.c.gopherci-prod.internal dockerd[6742]: panic(0x8078a0, 0xc4203fcac0)
Jan 28 19:40:17 www1.c.gopherci-prod.internal dockerd[6742]: /usr/local/go/src/runtime/panic.go:500 +0x1a1
Jan 28 19:40:17 www1.c.gopherci-prod.internal dockerd[6742]: github.com/docker/containerd/supervisor.(*Supervisor).execExit.func1(0xc420450be0, 0xc4200f9a00, 0x0)
Jan 28 19:40:17 www1.c.gopherci-prod.internal dockerd[6742]: /go/src/github.com/docker/containerd/supervisor/exit.go:90 +0x10c
Jan 28 19:40:17 www1.c.gopherci-prod.internal dockerd[6742]: created by github.com/docker/containerd/supervisor.(*Supervisor).execExit
Jan 28 19:40:17 www1.c.gopherci-prod.internal dockerd[6742]: /go/src/github.com/docker/containerd/supervisor/exit.go:91 +0xee
Jan 28 19:40:18 www1.c.gopherci-prod.internal dockerd[6742]: time="2017-01-28T19:40:18.860760730+10:30" level=error msg="libcontainerd: failed to receive event from containerd: rpc error: code = 13 desc = transport is closing"
Jan 28 19:40:18 www1.c.gopherci-prod.internal dockerd[6742]: time="2017-01-28T19:40:18.955680383+10:30" level=info msg="libcontainerd: new containerd process, pid: 30204"
Jan 28 19:40:19 www1.c.gopherci-prod.internal dockerd[6742]: time="2017-01-28T19:40:19.501709668+10:30" level=info msg="libcontainerd: new containerd process, pid: 30211"

Support gofmt check

Support gofmt (and gofmt -s)

We should probably ignore the specific changes gofmt suggests, as it's not immediately clear which lines we should comment on. Perhaps just comment that gofmt has not been run.

Ideally it would only check the lines changed, but it might be simpler to just check the entire project (perhaps excluding vendor). revgrep can take a patch and gofmt can output a diff, but I think this may require more code to support; it may be easier (at least initially) to just comment if gofmt outputs any diff (or diff <(gofmt -d .) <("")).

Multiple issues for a line should be combined

In some cases one tool, or many tools report multiple issues for the same line. Currently, GopherCI posts multiple comments for the same line.

Should just be a matter of adjusting FilterIssues, or a similar preprocessing step, in or around:

func (i *Installation) FilterIssues(ctx context.Context, owner, repo string, prNumber int, issues []db.Issue) (suppressed int, filtered []db.Issue, err error) {

(screenshot: 2017-04-14 07:25:45)
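A combining step could group issues by file and line before commenting. A rough sketch; the Issue fields here are assumptions, the real struct lives in GopherCI's db package:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Issue mirrors the fields a db.Issue would plausibly have.
type Issue struct {
	Path    string
	Line    int
	Comment string
}

// combine merges issues that target the same file and line into a single
// issue whose comment joins the individual messages.
func combine(issues []Issue) []Issue {
	type key struct {
		path string
		line int
	}
	merged := map[key][]string{}
	var order []key
	for _, i := range issues {
		k := key{i.Path, i.Line}
		if _, ok := merged[k]; !ok {
			order = append(order, k)
		}
		merged[k] = append(merged[k], i.Comment)
	}
	sort.Slice(order, func(a, b int) bool {
		if order[a].path != order[b].path {
			return order[a].path < order[b].path
		}
		return order[a].line < order[b].line
	})
	out := make([]Issue, 0, len(order))
	for _, k := range order {
		out = append(out, Issue{k.path, k.line, strings.Join(merged[k], "\n\n")})
	}
	return out
}

func main() {
	issues := []Issue{
		{"main.go", 10, "vet: unreachable code"},
		{"main.go", 10, "golint: exported func missing comment"},
	}
	for _, i := range combine(issues) {
		fmt.Printf("%s:%d:\n%s\n", i.Path, i.Line, i.Comment)
	}
}
```

Running this kind of merge just before posting would leave FilterIssues itself untouched.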

Max comments per build

In some cases (less so once we ignore the vendor folder) there may be a very large number of issues, and we've previously been flagged for abuse.

2017/01/31 10:37:30 revgrep found 82 issues
2017/01/31 10:37:46 queueListen: processing error: could not write comment on https://github.com/bradleyfalzon/hydrocarbon/pull/1: POST https://api.github.com/repos/bradleyfalzon/hydrocarbon/pulls/1/comments: 403 You have triggered an abuse detection mechanism and have been temporarily blocked from content creation. Please retry your request again later.
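A simple guard could cap the number of inline comments and fold the remainder into a summary. A hypothetical sketch; the real fix would likely also need to pace API calls to stay under GitHub's abuse limits:

```go
package main

import "fmt"

// capComments returns at most max comments plus a summary line describing
// how many were suppressed. Hypothetical helper.
func capComments(comments []string, max int) (posted []string, summary string) {
	if len(comments) <= max {
		return comments, ""
	}
	summary = fmt.Sprintf("%d additional issue(s) were not commented on.", len(comments)-max)
	return comments[:max], summary
}

func main() {
	posted, summary := capComments([]string{"a", "b", "c"}, 2)
	fmt.Println(len(posted), summary)
}
```

The summary could be posted as a single regular PR comment rather than inline review comments.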

Until we support private repositories disable them

Currently we do not support private repositories (#22), but people are able to make GopherCI try to check them.

These clones should fail because they're unauthenticated, but that's not a friendly error. Additionally, #28 will soon be available and won't have authentication support (it will assume everything is public), so I want to ensure private repositories are skipped.

  • In internal/github/handlers.go
  • Have PushConfig and PullRequestConfig set repository.Private bool in AnalyseConfig
  • Have Analyse check to ensure repository is not private
  • If so, SetStatus to succeeded and text stating we're skipping private repos and perhaps link to #22 for more details.

Support running tests

As a user, I want GopherCI to replace all the basic CI tasks I might be using elsewhere, such as running tests.

All tests would run, not just those covering lines changed in the PR; limiting to changed code could be possible, but it's unlikely to have real-world benefits (tests should always pass).

Users may want to configure skipping tests in vendor directories.

This may be related to #7 in the generic sense.

I made a small note in #25 that there may be value re-running tests as the base branch changes, merge the branch (if possible/no conflicts), and then run the tests.

Support detecting apicompat's removed members from struct/interface type

https://github.com/bradleyfalzon/apicompat provides GopherCI's backwards compatibility testing. When detecting a removed member of a struct or interface, apicompat reports the type's position, because the removed member no longer has a line number. revgrep then doesn't detect this line as a changed line and excludes the output from the GopherCI build.

  1. Have apicompat detect the hunk position itself.
  2. Have apicompat encode the removed line number in the message, such as file:1 member removed (was file:2), then revgrep could detect this and determine the hunk position.
  3. Don't pass it through revgrep, but somehow determine the hunk position ourselves.
  4. Don't pass it through revgrep, but still report the issue to the user in some way.
  5. Ignore the fact it couldn't comment and just break the build, leaving a message somewhere about the removed member.

A variation of the last 2 or 3 is likely, but more thought and a plan is required.

Related to bradleyfalzon/apicompat#37.
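Option 2 implies a parsable message format. A parser for the hypothetical file:1 member removed (was file:2) encoding might look like this (format and helper names are entirely speculative):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// removedRe matches the hypothetical encoding
// "file.go:1: member removed (was file.go:2)".
var removedRe = regexp.MustCompile(`^(.+):(\d+): (.+) \(was (.+):(\d+)\)$`)

// parseRemoved extracts the original (pre-removal) position from a
// message, so revgrep could map it back to a diff hunk.
func parseRemoved(msg string) (file string, line int, ok bool) {
	m := removedRe.FindStringSubmatch(msg)
	if m == nil {
		return "", 0, false
	}
	n, err := strconv.Atoi(m[5])
	if err != nil {
		return "", 0, false
	}
	return m[4], n, true
}

func main() {
	file, line, ok := parseRemoved("a.go:1: member Foo removed (was a.go:42)")
	fmt.Println(file, line, ok)
}
```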

Support multiple versions of Go

As a user, I would like to know which versions of Go my project fails to compile on, so I know the minimum Go version required and whether a PR changes (increases or decreases) this.

This would likely work closely with running Go tests (#9), as my project may require specific behaviour from later versions (where older Go would still compile fine) as opposed to new features (where older Go versions wouldn't have the same exported types/functions etc.).

Users may also like to test against tip, but not have these failures break the build (requires #3).

Support checking repository on push to branch (not via PR)

Owners can push directly to master/other branches:

  • We need the permissions and workflow figured out before a release is public, as it's not clear exactly how to add more permissions to an integration at a later stage.
  • Note many people push to their repository and then make a PR; this workflow should ideally not cause GopherCI to run on the push and then again on PR creation.
  • Ensure we only initiate a single build, or have a maximum number of builds. If someone pushes multiple commits, we shouldn't test every commit (someone may push thousands; we can't build each one). Ideally, we'd run one build checking the diff between the first and last commit.

Multiple issues for the same line should be made with a single comment

Currently if multiple tools need to comment on the same line, multiple comments are made.

Instead a single comment should be made, though this does make replying to one particular problem harder.

Think more about this and either change the behaviour or leave it as is. I'm leaning towards being more compact.

Analyser's git diff can fail with insufficient history

The analyser obtains a diff of the branch/ref to pass to revgrep, which uses it to filter issues to lines which have recently changed.

In the case of a push, the diff is generated using the head's ref and a relative number of commits. For example, if the head ref is abcd and there are 5 commits being pushed, the diff is between abcd~5 and abcd. We use relative numbers because the "before" ref might have been overwritten.

But this doesn't work in some scenarios: I believe it fails for first-time pushes (there is no history to diff), but also when a user (@fortytw2, I'm looking at you) pushes a branch without shared history to an existing repo.

It's likely the analyser just needs to check whether the baseRef exists at the baseURL; if so, do the diff as normal, and if not, just do a git show on the headRef at the headURL.
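That fallback decision could live in a small helper that picks the git invocation. A sketch with hypothetical names (the existence check itself would still need to query the remote):

```go
package main

import "fmt"

// diffArgs returns the git arguments to generate the diff for a push.
// If the base ref exists in the remote, diff against it; otherwise fall
// back to showing only the head ref. Hypothetical helper.
func diffArgs(baseRefExists bool, baseRef, headRef string) []string {
	if baseRefExists {
		return []string{"diff", baseRef + ".." + headRef}
	}
	return []string{"show", headRef}
}

func main() {
	fmt.Println(diffArgs(true, "abcd~5", "abcd"))
	fmt.Println(diffArgs(false, "", "abcd"))
}
```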

Continuous checking of repositories as tools are updated

I'd like the ability for all my repositories to be rechecked continuously as new tools are added (or as I enable new tools), and also to catch existing issues as tools are updated.

  • Don't check forks.
  • Decide on the interval, perhaps every week or every 30 days.
  • Decide on a mechanism to exclude existing issues i.e. just notify me on new issues discovered.

It's the latter part that may make this difficult.

Analyser.FileSystem is not thread safe

It needs to create entire GOPATHs itself, not reuse a global that's shared between threads.

This was a big oversight on my behalf, but it should be straightforward to fix.

Never post duplicate comments on a PR

Since supporting synchronise events in #24, there's a condition where GopherCI will post the same comment on a PR multiple times if it receives additional pushes before the previous issue is resolved.

It's not immediately clear how to resolve this: a user could push additional commits which change the new issue's line number or hunk position, so it may or may not be possible to use the GitHub API (https://developer.github.com/v3/pulls/comments/#list-comments-on-a-pull-request) to know whether an issue is a duplicate. The position should always be adjusted, so just checking that body, file and position match should be enough to detect a duplicate.

Support multiple checks

Currently only go vet is hardcoded as a check; there are countless others to include, so provide a mechanism to do this.

Support whether tools break a build or just comment

Some users may choose that a particular tool:

  • comments on the affected line
  • comments on the issue itself
  • breaks a build (sets the status to failed)
  • a combination of the above

These should be configurable per tool (see #8), and there should always be an external link available in the status API showing a summary of issues per PR (see #28).

Optionally support leaktest

https://github.com/fortytw2/leaktest is useful for detecting leaked goroutines, but it requires instrumenting every test.

It would be good if GopherCI could optionally support this and automatically instrument the tests or the templates used by go test (I don't know how go test's templates work, so this may or may not be possible).

Note, tests would not be able to be run in parallel, and there may be a need to exclude certain tests.
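The core idea behind leaktest can be approximated with the standard library alone: snapshot the goroutine count before a test and re-check it after a grace period. A crude sketch only; counting goroutines is racy, and the real package compares goroutine stack traces, which is far more robust:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// checkLeaks returns a function that, when called at the end of a test,
// reports how many goroutines appear to have leaked since the snapshot.
func checkLeaks() func() (leaked int) {
	before := runtime.NumGoroutine()
	return func() int {
		// Give well-behaved goroutines a moment to exit.
		time.Sleep(50 * time.Millisecond)
		if after := runtime.NumGoroutine(); after > before {
			return after - before
		}
		return 0
	}
}

func main() {
	check := checkLeaks()
	block := make(chan struct{})
	go func() { <-block }() // deliberately leaked goroutine
	fmt.Println("leaked goroutines:", check())
}
```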

Support projects with dependencies

As a user, my project may have dependencies that need to be fetched; I may vendor my dependencies and choose not to commit them, so GopherCI needs to fetch these specific versions.

GopherCI would need to support many vendoring tools, or may choose to only support the tool chosen by the vendoring committee. These would need to be installed in the Docker image: https://github.com/bradleyfalzon/gopherci-env

https://github.com/heroku/heroku-buildpack-go may have some advice for this.

For the moment, though, we should at least attempt to go get them with go get -t -v ./... See:

// fetch dependencies, some static analysis tools require building a project

Cases:

  • No vendoring, just go get
  • Detect each vendoring tool and execute the necessary installs; we may just detect one tool and use it. What about sub-packages that vendor using different tools?
  • Once installed, what if the vendor files didn't specify all dependencies: go get after that, or let the build fail (probably)?

Support private repositories

Currently only public repositories are tested. We should support private repositories too.

This may require:

  • Ability to clone a private repository.
  • Mark the repository as private in the database when storing build metadata, to ensure the public cannot see the individual build status (use the following to verify: https://platform.github.community/t/check-user-access-for-a-given-repo/2661).
  • Remove public records of repositories when they are changed to private (may not be required, as we don't store public records that would need removing; we'll just require authentication).
  • API requests for private repositories need to be authenticated (such as those in the GitHub handler, as well as in the summary page for grabbing the diff).
  • Removal of ignoring private repositories done in #73.

Relates to #8 due to per repository public visibility.

Note, this may not support dependencies that are also private.

Summary page for all repos' statuses

As a user, I should be able to see an overview of all my repos and see what issues I might need to solve. This could be:

  • Repo Name
  • Protected Branches (if I want all branches protected, show me those that aren't)
  • License (alert if there's a missing or incorrect license)
  • Outstanding issues
  • Outstanding PRs
  • Results from tools (could just be a sum of issues from all tools or breakdown per tool)

This would need the ability to scan an entire repo without revgrep filters, and then continue to update it when there are new commits, providing a single-page overview of the health of all your repos based on the rules you've set.

Only check reverse dependencies of changed packages

As a user, as long as the results and costs are the same, I want CI to run as quickly as possible. One way to do this (as pointed out by @dominikh) is to only run static analysis checks on packages that have changed, and those that depend on them.

GopherCI would itself determine all the packages that have changed (how?), including code, dependencies in GOPATH and vendored dependencies, and using dominikh's rdeps package https://github.com/dominikh/go-tools/tree/master/cmd/rdeps, could determine the reverse dependencies of these.

This may affect #11: if we're testing against Go's tip/master, we likely should not try this optimisation, as there's a high chance everything will be affected most of the time, and determining the changed packages on tip would differ from the packages recently cloned.

Behave correctly on non-Go repositories

As a user, I should be able to enable GopherCI on all my repositories, and GopherCI should either not appear in the GitHub Status API at all, or it should never fail.

Currently it fails on an empty repository (a repository without any Go files containing a package declaration), because go get fails.

root@916823289afa:/go/src/github.com/bf-test/gopherci-itest# go get -t ./...
# cd /go/src/github.com/bf-test/gopherci-itest; git pull --ff-only
You are not currently on a branch. Please specify which
branch you want to merge with. See git-pull(1) for details.

    git pull <remote> <branch>

package github.com/bf-test/gopherci-itest/...: exit status 1

If there's a .go file with a package declaration anywhere, this isn't an issue.

Ideally, GopherCI would never be executed at all, but more realistically, we may need to clone the repository before we know whether there are any Go files. By the time we clone, the status API has already been set.

It may be possible to use the GitHub API to check the repository's file list or similar, but this is GitHub specific. It may have its own inconsistencies (we need to ensure it reports the file list at that specific commit, the head ref in this case). This would save bandwidth by not downloading repositories that can't be checked, but we'd still need to fix the underlying issue for VCSs that aren't GitHub.

EDIT: So I think the approach I'll take is to use the GitHub API (another API call for a PR, but a push already contains changed/added/removed files) to determine whether Go files were added/changed/removed. If none were, we won't touch the status API at all. This would be GitHub specific, so docs for Analyser should be updated to require repositories that have Go code, or the scenario should be handled more gracefully.

Clone to correct directory structure in GOPATH

Currently the test project just clones to $GOPATH/src/gopherci, but it really needs to go to $GOPATH/src/github.com/user/repository. At least for the moment this'll support github.com users, but it will need to be updated for repo paths outside of GitHub.

This'll need to be handled in each Analyser (FileSystem and Docker), and will likely need to be passed in from handling the PR.

Support synchronize action on PullRequestEvent

Currently GopherCI only checks the initial PR, not when additional commits are added.

I believe this is the synchronize action of the GitHub PullRequestEvent; it's currently ignored:

if e.Action == nil || *e.Action != "opened" {
	return fmt.Errorf("ignoring PR #%v action: %q", *e.Number, *e.Action)
}

Just need to understand what's in the event and, likely, add it as allowed in the if statement.
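Assuming the fix is just widening the allowed actions, the check quoted above could become a set lookup. A hedged sketch, with the helper name invented here:

```go
package main

import "fmt"

// allowedActions are the PullRequestEvent actions that should trigger a
// build. "synchronize" fires when additional commits are pushed to a PR.
var allowedActions = map[string]bool{
	"opened":      true,
	"synchronize": true,
}

// shouldBuild reports whether a PR event with the given action warrants a
// build; nil actions are ignored, matching the original nil check.
func shouldBuild(action *string) bool {
	return action != nil && allowedActions[*action]
}

func main() {
	opened, closed := "opened", "closed"
	fmt.Println(shouldBuild(&opened), shouldBuild(&closed), shouldBuild(nil))
}
```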

Go get not working

I am trying to simply go get the repo, but it doesn't appear to be working.

โœ” ~ 07:55 $ go get -u github.com/bradleyfalzon/gopherci
# cd /home/klauer/dev/go/src/github.com/bradleyfalzon/gopherci; git submodule update --init --recursive
fatal: No url found for submodule path 'vendor/github.com/dgrijalva/jwt-go' in .gitmodules
package github.com/bradleyfalzon/gopherci: exit status 128

Add comments to commits

When implementing checking pushes (not pull requests), I neglected to have GopherCI write comments on those commits.

I don't know why I skipped doing that; I believe it was a mistake, and GopherCI should post comments whether an issue was discovered in a PR or a push.

Support using Docker for running tests

As a user, I want all checks to run in a container, so any code executing inside the container would find it harder to access other resources. It would also allow us to more easily run checks under different Go versions.

This would replace the FileSystem analyser, which has race conditions and is shared between multiple resources; I don't want to keep it around if I don't need to.

GCPPubSubQueue got stuck creating a topic

After a recent deployment, GopherCI failed to restart completely, getting stuck creating the topic for GCP Pub/Sub.

It got stuck at:

q.topic, err = client.CreateTopic(ctx, topicName)

Logs:

Jan 27 14:23:15 www1.c.gopherci-prod.internal gopherci[3190]: 2017/01/27 14:23:15 Connecting to "mysql" db name: "gopherci", username: "gopherci", host: "104.154.53.253", port: "3306"
Jan 27 14:23:16 www1.c.gopherci-prod.internal gopherci[3190]: 2017/01/27 14:23:16 Applied 0 migrations to database
Jan 27 14:23:16 www1.c.gopherci-prod.internal gopherci[3190]: 2017/01/27 14:23:16 Using analyser "docker"
Jan 27 14:23:16 www1.c.gopherci-prod.internal gopherci[3190]: 2017/01/27 14:23:16 Docker server "www1.c.gopherci-prod.internal" version "1.13.0" on "CentOS Linux 7 (Core)"
Jan 27 14:23:16 www1.c.gopherci-prod.internal gopherci[3190]: 2017/01/27 14:23:16 Docker image "gopherci/gopherci-env:latest" (sha256:623042c04d6dabeb32ba2e2376818b964eac9ef464c5d73c73e6e71f7b810bbe) created 2017-01-26 22:54:29.495705226 +0000 UTC
Jan 27 14:23:17 www1.c.gopherci-prod.internal gopherci[3190]: 2017/01/27 14:23:17 NewGCPPubSubQueue: creating topic "gopherci-ci-v1"

The pubsub client library takes a context, so it should be simple enough to add a timeout.

Ignore non-Go projects

As a user, I might have non-Go projects which don't need to be checked. I don't want the status API to appear as pending at all if there are no Go files.

After a clone we could run find . -type f -name '*.go', and if there are no Go files, abort early. Right now this would still result in GopherCI appearing in the Status API for a short period, but it wouldn't run any tools, fetch dependencies, etc. For large projects this could be slow, but it would almost certainly be faster than other CI tools (assuming there's no queue).

Another option may be checking the GitHub API for which languages exist in the repository, but this would be GitHub specific, and I'd like to avoid that if we can.

Another benefit to this change is lowering the resources that GopherCI uses.

Support storing CI output in some separate storage

Currently all output from the CI job is sent to stdout/stderr; users that aren't running the GopherCI daemon themselves won't be able to see the results.

Should be able to log to either a file or a database.

This may necessitate some unique build identifier that remains unique across restarts and multiple CI instances (such as UUIDv4). Yes, let's use UUIDv4; it's ugly but it works.
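A UUIDv4 can be generated with the standard library alone; a minimal sketch with a made-up helper name:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random RFC 4122 version 4 UUID as a string.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println("build ID:", id)
}
```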

Analyser is the wrong abstraction

On each PR, GitHub (and eventually other VCSs) calls an Analyser interface, which could be Docker or FileSystem. But there's lots of logic inside FileSystem that is duplicated in Docker.

Instead, the GitHub PR handler should call some function which accepts an Executor interface, builds all the exec.Cmds, and passes them to an executor, which could be Docker or FileSystem.

package analyser

func Analyse(exec Executor, tools []db.Tool, config Config) ([]Issue, error) {
    exec.Execute([]string{"git", "clone"})
    // etc
}

type Executor interface {
    Execute(args []string) (combined []byte, err error)
}

This would also mean the executors are instantiated for each PR instead of once at start, so we might need a NewExecutor func being passed around instead of an interface.

No. Maybe everything still implements an Analyser, which has a method NewEnvironment() (Executor, error) that can be called for each PR and returns a new environment to work in.
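The NewEnvironment idea could be sketched as follows; all names beyond those quoted in the issue are speculative, and the fake executor only records commands for illustration:

```go
package main

import "fmt"

// Executor runs a single command inside some environment and returns its
// combined output.
type Executor interface {
	Execute(args []string) (combined []byte, err error)
}

// Analyser creates a fresh, isolated environment per pull request.
type Analyser interface {
	NewEnvironment() (Executor, error)
}

// recordingExecutor is a stand-in implementation that records the commands
// it's asked to run; useful for tests.
type recordingExecutor struct {
	commands [][]string
}

func (r *recordingExecutor) Execute(args []string) ([]byte, error) {
	r.commands = append(r.commands, args)
	return nil, nil
}

type fakeAnalyser struct{}

func (fakeAnalyser) NewEnvironment() (Executor, error) {
	return &recordingExecutor{}, nil
}

// analyse sketches the flow: obtain a fresh environment, then issue
// commands into it.
func analyse(a Analyser) error {
	env, err := a.NewEnvironment()
	if err != nil {
		return err
	}
	if _, err := env.Execute([]string{"git", "clone"}); err != nil {
		return err
	}
	// run tools, collect issues, etc.
	return nil
}

func main() {
	fmt.Println(analyse(fakeAnalyser{}))
}
```

Because each call to NewEnvironment returns a new Executor, the per-PR isolation falls out naturally, and a Docker-backed Analyser would simply start a container there.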

Support Queues for Jobs

Currently GopherCI accepts and processes a job immediately; there are a few flaws with this:

  • If a process dies, the job is not retried and the PR is stuck in a pending status.
  • If multiple jobs are received, they are run concurrently, regardless of free system resources.

The work should be pushed to some queue to be executed serially between workers. Ideally the chosen daemon would be simple, or at least optional, for anyone to install and run themselves.

Further tweak GCPPubSub to better implement queues

Recently, Google's Pub/Sub client for Google Cloud Platform has taken a more performance-focused direction. This changed their API, which was resolved in #70; however, from further discussions with the team it's apparent that we should now use the lower-level API client instead of the pubsub client.

The discussion is in googleapis/google-cloud-go#566 (comment), and that comment contains a snippet we can use for the basics. It's likely I'll write this as an external package so others can achieve the same blocking behaviour, and have the internal queue be a thin wrapper around it.

Support per user and per repo configurations

As a user, I should be able to configure which checks run across all my repositories as well as support a per repository configuration (perhaps via .gopherci.yml - so anyone with write access to the repo, or sending a PR could correctly configure the installation).

I may also want to ignore failures that occur within the vendor folder, but this may be a global setting for the installation.

Relates to #3.

GopherCI posting comments to the wrong line

There appears to be some strange edge case where GopherCI is posting comments on the wrong line (or another similar issue).

See: https://github.com/shurcooL/Go-Package-Store/pull/71/files/f0ae285da42fb7815473782830c155f55a95f374#diff-bf6382eea9161c56361844f18dfcbe14

Note, it may just be that https://github.com/bradleyfalzon/revgrep is detecting an issue when it shouldn't, and it may be related to the file being renamed.

GopherCI normally gets this right, so it's an interesting edge case.

(screenshot: 2017-04-14 07:34:52)

Summary for a single PR's result

Not everyone wants comments on the PRs themselves; some checks are just noisy. We should support a summary view, either on a dedicated page or just as a PR comment.

The summary page should also support re-running a build: for intermittent issues, to help development of GopherCI (not having to send another PR each time), and possibly to rerun tests when the base branch changes (but we should detect that, as discussed in #25).

See related: #3, #17, #27.

GCPPubSubQueue could not send a message

Just had the following occur:

Jan 28 14:06:35 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/28 14:06:35 github: parsed webhook event: *github.PullRequestEvent
Jan 28 14:06:35 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/28 14:06:35 transport: http2Client.notifyError got notified that the client transport was broken write tcp 10.138.0.2:44948->
Jan 28 14:06:35 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/28 14:06:35 github: event handler error: GCPPubSubQueue: could not publish job: rpc error: code = 13 desc = transport: write
Jan 28 14:06:35 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/28 14:06:35 GCPPubSubQueue: could not read next message: rpc error: code = 13 desc = transport is closing

The last message sent was:

Jan 27 20:24:16 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/27 20:24:16 GCPPubSubQueue: published a message with a message ID: 53288212754175
Jan 27 20:24:16 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/27 20:24:16 github: parsed webhook event: *github.PushEvent
Jan 27 20:24:16 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/27 20:24:16 GCPPubSubQueue: processing ID 53288212754175, published at 2017-01-27 09:54:16.217 +0000 UTC
Jan 27 20:24:16 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/27 20:24:16 GCPPubSubQueue: ack'd ID 53288212754175
Jan 27 20:24:16 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/27 20:24:16 GCPPubSubQueue: adding ID 53288212754175 to job channel
Jan 27 20:24:16 www1.c.gopherci-prod.internal gopherci[5876]: 2017/01/27 20:24:16 GCPPubSubQueue: successfully added ID 53288212754175 to job queue

No other significant events occurred in between that time (just received a few more webhooks which were correctly ignored).

The last line in the logs (GCPPubSubQueue: could not read next message: rpc error: code = 13 desc = transport is closing) was caused by the listener goroutine. When the pub/sub client tried to send a message, the underlying transport had issues, and the receiver waiting for new messages then received an error:

msg, err := itr.Next()
switch {
case err == iterator.Done:
	log.Println("GCPPubSubQueue: stopping listening")
	return
case err != nil:
	log.Println("GCPPubSubQueue: could not read next message:", err)
	time.Sleep(3 * time.Second) // back-off
	continue
}

For the client/receiving side, I might even consider panicking; I don't know what else to do by this point. Does the underlying transport causing an error mean I need to clean up and reconnect? Because I can't be processing jobs at the time this error occurs (I may have one acknowledged job in the queue, tbh), I could just panic and let the process restart itself. But I could lose a job, and I don't know whether I need to restart or not. It's likely the client SDK handled the error, or else I wouldn't expect to be able to call Next() again without an error.

So if this happens again, we should try sending the queue another message and see whether it's processed.

Perhaps the GitHub webhook handler should retry queuing the message with a couple-second delay, potentially before responding with 200 OK to GitHub (they won't like that).

Gracefully handle git clone errors

GopherCI can sometimes fail to clone the repo. This commonly happens when GopherCI runs after the branch has been removed or the ref has been overwritten (force push).

In these cases GopherCI should probably still fail the build (eventually there may be a UI option to force build success), but with a more helpful error message instead of a generic internal error.

Relates to #88.

Support vanity import paths

Vanity import paths, such as package main // import "example.com/pkg", are not currently supported, as the project is not cloned into the correct directory.

We'll need to either make this a configuration option or, after cloning, detect the vanity import path and move the project accordingly (if possible).

See notes in #49.
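Detection after cloning could be as simple as scanning package clauses for an import comment. A hedged, regexp-based sketch; go/build's ImportComment handling would be the robust way to do this:

```go
package main

import (
	"fmt"
	"regexp"
)

// importCommentRe matches a package clause with a vanity import comment,
// e.g. `package main // import "example.com/pkg"`.
var importCommentRe = regexp.MustCompile(`(?m)^package\s+\w+\s*//\s*import\s+"([^"]+)"`)

// vanityImportPath returns the declared canonical import path from Go
// source, if any. Regexp sketch only; helper name is invented here.
func vanityImportPath(src []byte) (string, bool) {
	m := importCommentRe.FindSubmatch(src)
	if m == nil {
		return "", false
	}
	return string(m[1]), true
}

func main() {
	src := []byte("package main // import \"example.com/pkg\"\n")
	fmt.Println(vanityImportPath(src))
}
```

Given the detected path, the analyser could then move (or re-clone) the project to $GOPATH/src/<path> before running tools.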

Found 1 issues

I can't believe I let this happen, but when there's 1 issue, the GitHub Status API text says Found 1 issues; it should be Found 1 issue. See #44.
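The fix is a one-line pluralisation helper; for example (helper name invented here):

```go
package main

import "fmt"

// plural returns the singular or plural form based on n.
func plural(n int, singular, pluralForm string) string {
	if n == 1 {
		return singular
	}
	return pluralForm
}

func main() {
	for _, n := range []int{1, 2} {
		fmt.Printf("Found %d %s\n", n, plural(n, "issue", "issues"))
	}
}
```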

GCPPubSubQueue stopped receiving messages

At 2017/01/26 04:09 ACDT, we received an alert that the GCPPubSubQueue wasn't being processed. It was resolved at 2017/01/26 06:33 ACDT by restarting the gopherci process.

GopherCI itself is responsible for this; it was running successfully and receiving other webhooks, but a background goroutine failed to receive and process messages from the GCP Pub/Sub queue.

  1. main creates unbuffered channel
  2. main passes a write only channel to GCPPubSubQueue, which is used to signal a job needs to run
  3. main also receives a queue.Queuer from NewGCPPubSubQueue, which is provided to github package to add messages to the Google PubSub queue
main then starts queueListen, which listens on the unbuffered channel for jobs and, depending on the type, calls different methods to process them (blocking to ensure only one event is processed at a time).

The last successful job to be processed happened like so:

Jan 25 08:34:33 github: parsed webhook event: *github.PullRequestEvent
Jan 25 08:34:33 event: &github.PullRequestEvent{Action:(*string)(0xc420226410), Number:(*int)(0xc420226500), PullRequest:(*github.PullRequest)(0xc4203d6360), Changes:(*github.EditChange)(nil), Repo:(*github.Repository)(0xc4201b4000), Sender:(*github.User)(0xc42052da20), Installation:(*github.Installation)(0xc4202743c0)}
Jan 25 08:34:33 prevent: &github.Installation{ID:(*int)(0xc42047e270), Account:(*github.User)(nil), AccessTokensURL:(*string)(nil), RepositoriesURL:(*string)(nil)}
Jan 25 08:34:33 GCPPubSubQueue: published a message with a message ID: 53103672313656
Jan 25 08:34:34 GCPPubSubQueue: processing ID 53103672313656, published at 2017-01-24 22:04:33.767 +0000 UTC
Jan 25 08:34:34 main: reading job type *github.PullRequestEvent
Jan 25 08:34:34 queue processing error: ignoring PR #16 action: "closed"

Note: some redundant columns have been removed (a duplicate date column, process IDs, and hostnames).

  1. Firstly we received the webhook from GitHub:
    log.Printf("github: parsed webhook event: %T", event)
    log.Printf("event: %#v", event)
  2. We then detected it was a *github.PullRequestEvent and therefore needed to be queued
    case *github.PullRequestEvent:
    log.Printf("prevent: %#v", e.Installation)
    err = g.queuer.Queue(e)
  3. Queuer (in this case GCPPubSubQueue) successfully queued the message
    log.Println("GCPPubSubQueue: published a message with a message ID:", msgIDs[0])
  4. Moments later, a GCPPubSubQueue background goroutine picked up the job from the queue and added it to the main queue channel
    log.Printf("GCPPubSubQueue: processing ID %v, published at %v", msg.ID, msg.PublishTime)
  5. main.queueListen received the message from the channel
    log.Printf("main: reading job type %T", job)
  6. Detected the type and passed it along to the relevant handler, which detected the PR event was a close event and therefore nothing to do:
    func (g *GitHub) PullRequestEvent(e *github.PullRequestEvent) error {
    if e.Action == nil || *e.Action != "opened" {
    return fmt.Errorf("ignoring PR #%v action: %q", *e.Number, *e.Action)
  7. main.queueListen received the error (note: this should arguably not be treated as an error)
    log.Println("queue processing error:", err)
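As step 7 notes, an ignored action shouldn't really surface as an error. One way to fix that would be to log and return nil for ignored actions, reserving errors for genuinely malformed events. This is a hedged sketch with trimmed stand-in types, not the actual GopherCI handler:

```go
package main

import (
	"fmt"
	"log"
)

// PullRequestEvent is a trimmed stand-in for the github.PullRequestEvent
// fields the handler inspects.
type PullRequestEvent struct {
	Action *string
	Number *int
}

// handlePullRequestEvent logs and returns nil for ignored actions, so they
// no longer appear as "queue processing error"; only malformed events
// return an error. Number is assumed non-nil for brevity.
func handlePullRequestEvent(e *PullRequestEvent) error {
	if e.Action == nil {
		return fmt.Errorf("pull request event has nil action")
	}
	if *e.Action != "opened" {
		log.Printf("ignoring PR #%v action: %q", *e.Number, *e.Action)
		return nil
	}
	// ... queue analysis of the opened pull request ...
	return nil
}

func main() {
	action, number := "closed", 16
	_ = handlePullRequestEvent(&PullRequestEvent{Action: &action, Number: &number})
}
```

This also removes the latent nil-pointer dereference in the original snippet, which would panic formatting *e.Action when Action is nil.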

That behaviour is expected, and at this point main.queueListen would have been blocked listening on the queueChan or waiting for the context to close.

Over time, we continued to receive hook events from GitHub, but they were PushEvents (which we don't currently support) or IntegrationInstallation events (which are processed immediately, without the queue).

Next, we received:

Jan 26 04:03:53 github: parsed webhook event: *github.PullRequestEvent
Jan 26 04:03:53 event: &github.PullRequestEvent{Action:(*string)(0xc42047e790), Number:(*int)(0xc42047e890), PullRequest:(*github.PullRequest)(0xc4203d6240), Changes:(*github.EditChange)(nil), Repo:(*github.Repository)(0xc4201b4580), Sender:(*github.User)(0xc42052c160), Installation:(*github.Installation)(0xc42026b520)}
Jan 26 04:03:53 prevent: &github.Installation{ID:(*int)(0xc420477b90), Account:(*github.User)(nil), AccessTokensURL:(*string)(nil), RepositoriesURL:(*string)(nil)}
Jan 26 04:03:53 GCPPubSubQueue: published a message with a message ID: 53153322486157

Just like before:

  1. We receive the hook from GitHub
  2. GCPPubSubQueue publishes the message and receives an ID
    log.Println("GCPPubSubQueue: published a message with a message ID:", msgIDs[0])

But, unlike before, the background GCPPubSubQueue.listen goroutine never picks up the message.

Here are the logs, grepped for GCPPubSubQueue:

Jan 25 08:34:33 GCPPubSubQueue: published a message with a message ID: 53103672313656
Jan 25 08:34:34 GCPPubSubQueue: processing ID 53103672313656, published at 2017-01-24 22:04:33.767 +0000 UTC
Jan 26 04:03:53 GCPPubSubQueue: published a message with a message ID: 53153322486157
Jan 26 04:16:35 GCPPubSubQueue: published a message with a message ID: 53153869748866
Jan 26 04:17:39 GCPPubSubQueue: published a message with a message ID: 53154023226121
Jan 26 04:26:56 GCPPubSubQueue: published a message with a message ID: 53154516087058
Jan 26 04:57:24 GCPPubSubQueue: published a message with a message ID: 53155951693825

We continued to receive hooks from GitHub, and continued to add them to the queue, but we never received them back. The GCPPubSubQueue.listen goroutine does not return without logging, and no other messages were received in the meantime.

The only other errors between 'Jan 25 08:34:33' and 'Jan 26 04:03:53' were:

Jan 25 08:34:34 www1.c.gopherci-prod.internal gopherci[13275]: 2017/01/25 08:34:34 queue processing error: ignoring PR #16 action: "closed"
Jan 25 21:58:57 www1.c.gopherci-prod.internal gopherci[13275]: 2017/01/25 21:58:57 transport: http2Client.notifyError got notified that the client transport was broken EOF.
Jan 26 02:07:17 www1.c.gopherci-prod.internal gopherci[13275]: [mysql] 2017/01/26 02:07:17 packets.go:130: write tcp 10.138.0.2:44654->XXX.XXX.XXX.253:3306: write: broken pipe
Jan 26 02:07:17 www1.c.gopherci-prod.internal gopherci[13275]: [mysql] 2017/01/26 02:07:17 packets.go:130: write tcp 10.138.0.2:44654->XXX.XXX.XXX.253:3306: write: broken pipe
Jan 26 04:03:53 www1.c.gopherci-prod.internal gopherci[13275]: 2017/01/26 04:03:53 prevent: &github.Installation{ID:(*int)(0xc420477b90), Account:(*github.User)(nil), AccessTokensURL:(*string)(ni
l), RepositoriesURL:(*string)(nil)}

The packets.go errors appear to be MySQL-driver related. The transport error may be relevant, as it appears to originate in https://github.com/grpc/grpc-go/blob/cb653e4b6150b81ba5157618b57c6f910a6a99f7/transport/http2_client.go#L1126

The grpc library is used by pubsub package: https://godoc.org/cloud.google.com/go/pubsub?imports

But we've received that transport error quite a few times without an issue (count per day):

      1 Jan 23
     25 Jan 24
      9 Jan 25

So I'm really not sure what caused this. Monitoring does detect the issue (queue length > n for x period of time), and restarting the GopherCI process also fixes it.

I'll add some more debugging to gopherci to determine whether the problem is internal. I'd like to continuously add some kind of broadcast message to the queue, so all workers could log that they received it; this would help us narrow down when the issue occurs.

Currently we're only using a single instance to process messages; if this was just a client error, another server could have processed the job so customers weren't affected. We should do this anyway, but we haven't needed it to handle the load just yet.

Note times are ACDT.

Support reading rules from a config file instead of DB

For self-hosted deployments, someone may prefer to store the gh_installations table, tool configurations, and per-repository configurations in a config file instead of the database.

This would be separate from .env, but there may be a case for having just a single configuration file, perhaps using github.com/spf13/viper
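A rough sketch of what such a file-backed config could look like, using encoding/json as a stand-in for whatever format viper ends up reading. The field names and structure are illustrative assumptions, not the actual gh_installations schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is an illustrative shape for moving installations and tool
// configuration out of the database and into a single file.
type Config struct {
	Installations []Installation `json:"installations"`
	Tools         []Tool         `json:"tools"`
}

type Installation struct {
	InstallationID int `json:"installation_id"`
	AccountID      int `json:"account_id"`
}

type Tool struct {
	Name string `json:"name"`
	Path string `json:"path"`
	Args string `json:"args"`
}

// parseConfig decodes a config file's contents into Config.
func parseConfig(data []byte) (*Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("could not parse config: %v", err)
	}
	return &c, nil
}

func main() {
	raw := []byte(`{"installations":[{"installation_id":1,"account_id":2}],` +
		`"tools":[{"name":"go vet","path":"go","args":"vet ./..."}]}`)
	c, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(c.Installations), "installation(s),", len(c.Tools), "tool(s)")
}
```

With viper the same struct could be filled via viper.Unmarshal, which would also let .env-style settings and this file merge into one configuration source.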
