
zfs-buildbot's Introduction

This repo is archived. Alternate processes for contributing ZFS changes include following the illumos procedures, or opening a PR against ZFSonLinux.

The reasoning behind this is outlined in an email to the [email protected] mailing list, which is reproduced below:

The OpenZFS code repository on github (http://github.com/openzfs/openzfs) is a clone of the illumos repo, with basically identical code. The OpenZFS repo made it easier to contribute ZFS code to illumos, by leveraging the github pull request and code review processes, and by automatically building illumos and running the ZFS Test Suite.

Unfortunately, the automated systems have atrophied and we lack the effort and interest to maintain it. Meanwhile, the illumos code review and contribution process has been working well for a lot of ZFS changes (notably including ports from Linux).

Since the utility of this repo has decreased, and the volunteer workforce isn't available to maintain it, we will be archiving http://github.com/openzfs/openzfs in the coming week. Thank you to everyone who helped maintain this infrastructure, and to those who leveraged it to contribute over 500 commits to ZFS on illumos! Alternate processes for contributing ZFS changes (including those in open PRs) include following the illumos procedures, or opening a PR against ZFSonLinux.

zfs-buildbot's Issues

Allow ZFS commit message to specify SPL dependency

As described in openzfs/zfs#3935, it would be useful if a ZFS commit message could specify either an SPL git hash or branch name as a dependency. In the short term this would allow us to automate testing of merging the SPL into the ZFS source tree. Perhaps something like this:

Example commit

Example commit message.

Requires-spl: refs/pull/PR/head
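
A sketch of how the build master might extract such a tag from a commit message; the regex and helper name are assumptions, not existing buildbot code:

import re

# Hypothetical helper: pull the SPL ref (git hash, branch name, or
# refs/pull/... ref) out of a commit message tagged as proposed above,
# defaulting back to master when no tag is present.
REQUIRES_SPL_RE = re.compile(r"^Requires-spl:\s*(\S+)\s*$", re.MULTILINE)

def spl_dependency(commit_message, default="master"):
    match = REQUIRES_SPL_RE.search(commit_message)
    return match.group(1) if match else default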

Buildbot could use a distro version update

(I'm happy to twiddle this in my still-for-a-bit-copious free time if anyone likes...)

Currently, we build against:

  • CentOS 7 {x86_64} (build+test)
  • CentOS 8 {x86_64} (build+test)
  • CentOS 8 Stream {x86_64} (build+test)
  • Debian 8 {armel, ppc64} (build)
  • Debian 10 {arm64,x86_64} (build)
  • Fedora 33 {x86_64} (build+test)
  • FBSD {12,13,14}-CURRENT snapshots {x86_64} (build+test)
  • Kernel.org git tip builtin {x86_64} (build)
  • Ubuntu {16.04} {i386} (build)
  • Ubuntu {18.04, 20.04} {x86_64} (build+test)

I would suggest at a minimum adding:

  • Some CentOS 8 alternative that will keep getting updates after 2021, be it Rocky or Alma or w/e {x86_64} (build+test) Whoops, missed #234. Nice.
  • Debian 11 {arm64,x86_64} (build+test)
  • Fedora 35 {x86_64} (build+test)
  • FreeBSD something {not little-endian} (build+test)

It might also be useful to add a couple weirder things like:

  • Some Linux (kernel and OpenZFS built with Clang) (build and test)
  • Some Linux (kernel + OpenZFS built with KASAN) (build and test)

Though unless you could source somewhere for regularly built Clang kernels, that could be a significant burden, and building the whole kernel and then running the test suite each time could suck...maybe a bot that runs once a day/week against master, or doesn't block merges (e.g. best effort) would be better for one or both of those?

I'd also like to know whether there's a reason not to just make at least the non-x86_64 buildbots testbots too? Does it just take far too long?

I don't know when people want to drop support for older things, so I don't know when it would make sense to drop the Debian 8 trees. (It might also make sense to try using ELTS for armel if we want to see what some users might likely be running and not just "old kernel", though it's not available for ppc64, and that should probably involve a recurring payment to Freexian if using it in a non-personal setting...)

The FBSD/something BE suggestion is because it'd be nice to make sure the FBSD codepaths don't somehow have BE-specific issues. I'd suggest sparc64 for the variety (and because of openzfs/zfs#12008 not being noticed for a long time), but sourcing a build slave for that that would finish runs before the sun burns out could prove tricky, and qemu is...not amazing at sparc64. (I also don't know how the ppc64 slaves are done from quickly looking at the config, so I don't know if that's just paying the qemu cost or actual machines hosted somewhere or if there's a source for ppc64 VMs somewhere...)

Just some thoughts. Not exactly the highest priority, but keeps coming to mind periodically.

Automate OpenZFS patch porting

We recently discovered that it is possible to cherry-pick patches directly from the OpenZFS tree. Git will automatically attempt to find the right file to apply a hunk to if it is given enough time to process a change.

To allow git the time to search, you need the following in your git config

[merge]
    renameLimit = 999999

Now, in order to cherry-pick, clone your fork of zfs, add both zfsonlinux/zfs and openzfs/openzfs as remotes, and perform a git fetch --all. You can then run git cherry-pick <OpenZFS commit> to port patches to ZoL.

It would be nice to have a script that would attempt to perform the above process automatically given an OpenZFS commit.
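
A rough sketch of that automation, assuming the zfsonlinux and openzfs remotes have already been added as described above; the function name and structure are illustrative:

import subprocess
import sys

def port_openzfs_commit(commit):
    # Raise git's rename-detection limit so cherry-pick can match files
    # that live at different paths in the two trees.
    subprocess.check_call(["git", "config", "merge.renameLimit", "999999"])
    # Make sure both remotes are up to date.
    subprocess.check_call(["git", "fetch", "--all"])
    # Apply the OpenZFS commit; a non-zero exit (e.g. conflicts) raises
    # here and leaves the conflict in place for manual resolution.
    subprocess.check_call(["git", "cherry-pick", commit])

if __name__ == "__main__":
    port_openzfs_commit(sys.argv[1])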

Add install-fedora-on-zfs

The project https://github.com/Rudd-O/install-fedora-on-zfs has the ability to build a bootable disk image that boots from ZFS (and LUKS beneath it, if so desired). I do not know how buildbot works, so I can't produce a pull request for this at this time, but here are the details:

  • script source: master branch from the above URL
  • script inputs: --with-prebuilt-rpms=path/to/splandzfs/rpms/ --releasever=fedora-releasever
  • script output: root.img (positional parameter)

The script will boot-cycle the root.img twice: once to test that the no-hostonly initrd works fine, and again to test that the hostonly initrd works fine. This happens automatically within QEMU and requires no intervention (but, in the event that the boot process fails, QEMU will be killed after 12 minutes by the script).

Everything the script does is logged in detail, so diagnosing and reproducing any issue is trivial after that.

Debian 8 x86_64 (TEST) reports bc not found

Build step 13, xfstests, failed. The stdio output shows that it is failing to find 'bc' (the math program) when checking its environment, before running any actual tests.

+ sudo -E ./check -zfs -x dio -x sendfile -x user
bc not found
+ RESULT=1
+ exit 1

This string appears in the xfstests-zfs code, in common.config:

export SED_PROG="`set_prog_path sed`"
[ "$SED_PROG" = "" ] && _fatal "sed not found"

export BC_PROG="`set_prog_path bc`"
[ "$BC_PROG" = "" ] && _fatal "bc not found"

export PS_ALL_FLAGS="-ef"

priority of building

I've noticed that the build system builds commits in the order they were pushed to the pull requests. It also seems that all commits in a pull request are rebuilt, even if only a new commit is pushed at the end.

If possible, it would be nice to (see the sketch after this list):

  • First build all 'final' commits, i.e. the ones where currently the TEST builds are performed.
    Then single-commit pull requests are not so easily stalled by multi-commit requests.
  • Prefer to build commits that have most other successfully built commits.
    Then code that fails style check and later fast builds will not hold the slower builders.
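
A rough sketch of the first suggestion, assuming buildbot's nextBuild hook and a hypothetical is_final_commit property set by the change source on the last commit of each pull request:

# Sketch only: prefer requests for PR-head ("final") commits; the
# "is_final_commit" property is an assumption, not existing buildbot state.
def prioritize_final_commits(builder, requests):
    for request in requests:
        if request.properties.getProperty("is_final_commit", False):
            return request
    # No final commits pending; fall back to the oldest request.
    return requests[0]

# Hooked up per builder, e.g.:
# BuilderConfig(..., nextBuild=prioritize_final_commits)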

keepalive_interval never used when making a test slave

mkEC2TestSlave passes keepalive_interval=60 to mkEC2UserDataSlave, but mkEC2UserDataSlave does not handle keepalive_interval. The solution is simple: add handling for keepalive_interval in mkEC2UserDataSlave. The only question is what an appropriate default value for keepalive_interval would be.
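
A minimal sketch of the fix, with hypothetical internals: accept keepalive_interval in mkEC2UserDataSlave and forward it to whatever constructs the latent slave (mkEC2Slave below is a stand-in name):

# 3600 seconds mirrors buildbot's usual BuildSlave default; whether that
# is appropriate here is the open question above.
def mkEC2UserDataSlave(name, password, keepalive_interval=3600, **kwargs):
    return mkEC2Slave(name, password,
                      keepalive_interval=keepalive_interval, **kwargs)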

Update the instance type to m5.large

You're still using the m3.large instance type, which is $0.14 an hour standard with 2 vCPU and 7.5 GB RAM. You may be able to get better and cheaper performance by migrating to the newer m5.large instances, which are $0.096 an hour with 2 vCPU and 8 GB RAM.

Fix Requires-spl syntax

After submitting a PR with the Requires-spl: keyword, all subsequent build requests will use the same SPL commit instead of defaulting back to master. Restarting the build master resets the SPL dependency and is the current workaround for the issue.

I don't think bb-build-linux.sh actually builds ZFS in

I had a long bug report written, but deleted it, so here's the short one.

I was wondering how openzfs/zfs#12056 could happen, so I tried to reproduce it and found another incompatibility, which I'll be filing shortly. Wondering how that could happen, I checked output from the kernel.org builtin bot and noticed that none of the CC/AS/LD/etc. steps mentioned zfs. So I ran bb-build-linux.sh locally, and the resulting .config after it completed said "CONFIG_ZFS is not set" and did not contain ZFS.

I'll be opening a PR once I debug this, I just thought you probably should know.

Update xfstests

The zfs branch in our fork of xfstests needs to be rebased on the latest version of xfstests. This should be done in a way which is acceptable to the upstream xfstests maintainers. This would allow us to use a stock version of xfstests.

Support multiple buildslaves

The buildbot configuration should be updated to intelligently handle multiple buildslaves per builder. Specifically, we would want the builder to prefer an on-demand t2.micro slave for builds unless the pending build queue is more than half a dozen deep, at which point it should be allowed to power up a larger spot instance to speed things up.

This can be done in buildbot by adding multiple slaves per builder and providing a custom nextSlave() function which implements the above logic.
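
A sketch of that policy under stated assumptions: slave names distinguish the t2.micro on-demand slaves from the larger spot slaves, and the pending-queue accessor is hypothetical:

# Sketch only; the "t2micro" naming convention and queue accessor are
# assumptions about this master.cfg, not buildbot guarantees.
def nextSlave(builder, slavebuilders):
    pending = builder.getPendingBuilds()  # hypothetical accessor
    small = [s for s in slavebuilders if "t2micro" in s.slave.slavename]
    large = [s for s in slavebuilders if s not in small]
    # Power up a larger spot instance only once the queue is deep enough.
    if len(pending) > 6 and large:
        return large[0]
    return small[0] if small else slavebuilders[0]

# BuilderConfig(..., nextSlave=nextSlave)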

Fix `buildbot sighup` password authentication

When starting a latent EC2 build slave, a random password is generated for that slave to use. This works fine until buildbot sighup is run to pick up configuration changes on the master. This somehow results in the master and the slave expecting different passwords, so the slave is no longer able to connect. We'd like to be able to use buildbot sighup to pick up configuration changes because it doesn't require us to stop any currently running builds.

List of public push webhooks

It would be nice to have a list of the public push webhooks (if any) in the ZoL repos.

I'm considering setting up a Jenkins auto builder at home, but I need (would like) the URLs to subscribe to.

Introduce a rebase check to the STYLE builder

As pull requests become older, it is highly likely that bug fixes have been performed. Some of these bug fixes/improvements are either fixes to issues found in the test suite or changes that may impact the test suite (i.e. disabled/enabled tests). To combat this, we can introduce a check in the STYLE builder which tries to determine how "stale" a pull request is.
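
One possible staleness measure, a sketch assuming the STYLE builder has both the PR head and origin/master fetched; the helper name and any threshold applied to it are assumptions:

import subprocess

def commits_behind_master(pr_head, master="origin/master"):
    """Count commits master has gained since the PR branch diverged."""
    out = subprocess.check_output(
        ["git", "rev-list", "--count", "%s..%s" % (pr_head, master)])
    return int(out.strip())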

SPL build not used during subsequent ZFS build

ZFS was built against the incorrect SPL, even though the SPL build showed the correct revision hash and reported success.

See build 51 on CentOS 7.1 x86_64: http://build.zfsonlinux.org/builders/CentOS%207.1%20x86_64%20%28TEST%29/builds/51

From the SPL build stdio:

+ test -f /etc/buildslave
+ . /etc/buildslave
++ BB_MASTER=build.zfsonlinux.org:9989
++ BB_NAME=CentOS-7.1-x86_64-testslave
++ BB_PASSWORD=PRMZQJZJV9NCI6GEFFRNW1C0
++ BB_ADMIN='Automated latent BuildBot slave <[email protected]>'
++ BB_DIR=/var/lib/buildbot/slaves/zfs
+ CONFIG_LOG=configure.log
+ case "$BB_NAME" in
+ CONFIG_OPTIONS='--enable-debug --with-spec=redhat'
+ MAKE_LOG=make.log
+ MAKE_OPTIONS=
+ INSTALL_LOG=install.log
+ ./autogen.sh
+ ./configure --enable-debug --with-spec=redhat
+ make pkg
+ case "$BB_NAME" in
+ sudo rm spl-0.6.5-8_ga3f5cc4.el7.centos.src.rpm spl-dkms-0.6.5-8_ga3f5cc4.el7.centos.src.rpm spl-kmod-0.6.5-8_ga3f5cc4.el7.centos.src.rpm spl-dkms-0.6.5-8_ga3f5cc4.el7.centos.noarch.rpm
+ sudo yum -y localinstall kmod-spl-0.6.5-8_ga3f5cc4.el7.centos.x86_64.rpm kmod-spl-devel-0.6.5-8_ga3f5cc4.el7.centos.x86_64.rpm spl-0.6.5-8_ga3f5cc4.el7.centos.x86_64.rpm spl-debuginfo-0.6.5-8_ga3f5cc4.el7.centos.x86_64.rpm
+ exit 0
program finished with exit code 0

From the ZFS build config.log:

configure:21277: checking spl source directory
configure:21326: result: /usr/src/spl-0.6.5
configure:21337: checking spl build directory
configure:21373: result: /usr/src/spl-0.6.5/3.10.0-229.20.1.el7.x86_64
configure:21385: checking spl source version
configure:21408: result: 0.6.5-11_ge7b75d9
configure:21419: checking spl file name for module symbols

Note that the ZFS config.log reports a different SPL source version (0.6.5-11_ge7b75d9) than the one just built and installed (0.6.5-8_ga3f5cc4).

Add builders for 13-RELEASE and without --enable-debug per platform?

Differences between FBSD 13-RELEASE and 13-STABLE, and between --enable-debug and non-debug builds, resulted in openzfs/zfs#13145, where building on 13-RELEASE was broken in two places and the CI hadn't noticed.

It'd be nice if non-debug builds didn't break without the CI noticing.

(While I'm asking for ponies, maybe a Debian 11 or sid builder so things like #13083 and #13103 are found sooner?)

Is there a known-issues.html with PRs?

I see that known-issues.sh knows how to generate the page with PR failures included, but that that option presumably isn't being used here.

Is there, or could there be, a PR-inclusive version of that page added to whatever cron job generates it, perhaps with a shorter timescale (e.g. 7/14 days)? I suspect it would make things like the issues behind openzfs/zfs#12663 or #238 flamingly obvious...

Avoid rebuilding previously built commits when a pull request is refreshed.

When a pull request is refreshed, the ZFS buildbot may rebuild commits that have already been built and haven't changed. It appears that the pull request event with the synchronize action points us to a full list of commits. So to avoid duplicates, we would need to add some mechanism to the buildbot to keep track of what has been recently built.

We began discussing this in #40. Creating an issue so we don't lose track of it.

Suggestions from @inkdot7:

Regarding duplicates: https://developer.github.com/v3/activity/events/types/#pushevent makes me suspect that there could be a distinct member of each commit that may be of use.

Otherwise, a dictionary might be clearer about what it does. One could keep a list of, e.g., the 100 last unique commit shas that have been propagated through CustomGitHubEventHandler; then it will not grow too large. With the current turn-around in the buildbots, that should be plenty. The main purpose is to prevent back-to-back rebuilds when additional fix-up commits are pushed. Each time handle_pull_request is invoked, a local dictionary for quick lookup could be made from the list.
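
A minimal sketch of that suggestion; the class name and the limit of 100 are illustrative:

from collections import OrderedDict

# Remember the last N commit shas seen by CustomGitHubEventHandler so
# unchanged commits are not resubmitted when a pull request is refreshed.
class RecentCommits(object):
    def __init__(self, limit=100):
        self.limit = limit
        self.seen = OrderedDict()

    def is_new(self, sha):
        if sha in self.seen:
            return False
        self.seen[sha] = True
        if len(self.seen) > self.limit:
            self.seen.popitem(last=False)  # drop the oldest entry
        return True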

Add EBS volumes for testing

All testing is currently performed against file vdevs since block storage wasn't easily available prior to migrating to EC2. However, the EC2 latent build slaves support adding volumes to instances. The master.cfg should be updated to attach at least one volume and the test scripts updated to use it. This will allow us to test the vdev_disk.c implementation.
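
A hedged sketch of the master.cfg change. It assumes the deployed buildbot's EC2 latent slave accepts a block_device_map argument (present in newer buildbot releases; older 0.8.x versions may need a patched slave class), and the AMI id, name, and password are placeholders:

from buildbot.buildslave.ec2 import EC2LatentBuildSlave

test_slave = EC2LatentBuildSlave(
    "CentOS-7-x86_64-testslave", "slavepass", "m3.large",
    ami="ami-xxxxxxxx",
    block_device_map=[{
        "DeviceName": "/dev/sdb",
        "Ebs": {"VolumeSize": 16, "DeleteOnTermination": True},
    }])
# The test scripts would then build pools on the attached volume (often
# exposed as /dev/xvdb inside the instance) to exercise vdev_disk.c.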

Upgrade buildbot to >= 2.7.x

We're still running buildbot 0.8.x, which requires python2 and is super old. We should upgrade to 2.7.x or higher. I've already started work on this.

not specifying "Requires-spl:" fails to clone spl-0.7-release branch

somehow stray" gets appended to git clone branch parameter

 watching logfiles {}
 argv: ['git', 'clone', '--branch', 'spl-0.7-release"', 'https://github.com/zfsonlinux/spl.git', '.']
 using PTY: False
Cloning into '.'...
fatal: Remote branch spl-0.7-release" not found in upstream origin

See the build logs from openzfs/zfs#8227, and a workaround by adding Requires-spl: spl-0.7-release in openzfs/zfs#8256.

Not sure how this stray " gets there or what leaks it.
