reproducible-containers / buildkit-cache-dance
Save `RUN --mount=type=cache` caches on GitHub Actions (forked from https://github.com/overmindtech/buildkit-cache-dance)
License: Apache License 2.0
How to reproduce:
test_ownership
FROM ubuntu
RUN groupadd -g 9999 app && useradd -m -g 9999 -u 9999 app
USER app
WORKDIR /home/app
COPY . /home/app
RUN --mount=type=cache,uid=9999,gid=9999,target="/home/app/tmp_data" \
echo "Listing BEFORE writing." &&\
(ls -l /home/app/tmp_data/test.txt || echo "File not yet created") &&\
echo "THIS IS A TEST" > /home/app/tmp_data/test.txt &&\
echo "Listing file AFTER writing." &&\
ls -l /home/app/tmp_data/test.txt
docker buildx build --progress plain -t test-ownership .
(As root, to preserve ownership when extracting) Extract the cache by running: node ./buildkit-cache-dance/dist/index.js --extract --cache-map '{"<path-to-cache-directory>/cache_dir": {"target": "/home/app/tmp_data", "uid": "9999", "gid": "9999"}}'
touch invalidate_cache
docker buildx prune
node ./buildkit-cache-dance/dist/index.js --cache-map '{"<path-to-cache-directory>/cache_dir": {"target": "/home/app/tmp_data", "uid": "9999", "gid": "9999"}}'
docker buildx build --progress plain -t test-ownership .
You should see an error like this:
> [stage-0 5/5] RUN --mount=type=cache,uid=9999,gid=9999,target="/home/app/tmp_data" echo "Listing BEFORE writing." && (ls -l /home/app/tmp_data/test.txt || echo "File not yet created") && echo "THIS IS A TEST" > /home/app/tmp_data/test.txt && echo "Listing file AFTER writing." && ls -l /home/app/tmp_data/test.txt:
0.379 Listing BEFORE writing.
0.380 -rw-r--r-- 1 root root 15 Jun 3 15:42 /home/app/tmp_data/test.txt
0.381 /bin/sh: 1: cannot create /home/app/tmp_data/test.txt: Permission denied
------
Dockerfile:11
--------------------
10 |
11 | >>> RUN --mount=type=cache,uid=9999,gid=9999,target="/home/app/tmp_data" \
12 | >>> echo "Listing BEFORE writing." &&\
13 | >>> (ls -l /home/app/tmp_data/test.txt || echo "File not yet created") &&\
14 | >>> echo "THIS IS A TEST" > /home/app/tmp_data/test.txt &&\
15 | >>> echo "Listing file AFTER writing." &&\
16 | >>> ls -l /home/app/tmp_data/test.txt
17 |
--------------------
ERROR: failed to solve: process "/bin/sh -c echo \"Listing BEFORE writing.\" && (ls -l /home/app/tmp_data/test.txt || echo \"File not yet created\") && echo \"THIS IS A TEST\" > /home/app/tmp_data/test.txt && echo \"Listing file AFTER writing.\" && ls -l /home/app/tmp_data/test.txt" did not complete successfully: exit code: 2
Also, the output of the ls command shows that the owner is root.
Hi everyone,
Having some trouble getting this working for both Go modules (go env GOMODCACHE) and the Go build cache (go env GOCACHE). The GitHub Actions setup-go action currently takes care of saving/restoring both of those caches on the runner machine, but does not do anything to help with the BuildKit mount cache. I'm not entirely sure how to use this solution to achieve BuildKit mount caching -- does anyone have a simple, working example? Thanks!
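Not an authoritative answer, but a minimal sketch of the pattern I'd try, assuming a Dockerfile that mounts the two standard Go cache locations (the host directory names and the cache key below are illustrative):

```yaml
- name: Cache Go caches for Docker
  uses: actions/cache@v4
  with:
    path: |
      go-build-cache
      go-mod-cache
    key: go-docker-${{ runner.os }}-${{ hashFiles('go.sum') }}

- name: Inject Go caches into Docker
  uses: reproducible-containers/buildkit-cache-dance@v3
  with:
    cache-map: |
      {
        "go-build-cache": "/root/.cache/go-build",
        "go-mod-cache": "/go/pkg/mod"
      }
```

The corresponding Dockerfile step would be RUN --mount=type=cache,target=/root/.cache/go-build --mount=type=cache,target=/go/pkg/mod go build ./... , with the mount targets matching the cache-map values exactly.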
Hi @AkihiroSuda thank you for picking up maintenance of this important action!
We have added two features on a fork over at https://github.com/dcginfra/buildkit-cache-dance and I wonder if you would be interested in PRs to add these features to v2 of the action, now that its use is recommended in the official Docker documentation. We have two main changes:
The changes require the user's Dockerfile to be modified with cache IDs like this:
FROM ubuntu:22.04
RUN \
--mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-cache \
--mount=type=cache,target=/var/lib/apt,sharing=locked,id=apt-lib \
apt-get update && apt-get install -y gcc
And the action is called something like this:
- name: inject cache mounts into docker
uses: reproducible-containers/buildkit-cache-dance@mount-id-example
with:
mounts: |
apt-cache
apt-lib
The main change is in the Dancefile, which is generated on the fly with as many mounts and copy operations as necessary. There is no need to pass cache-source and cache-target separately anymore, because the cache is identified by its unique ID instead, like this:
- name: Prepare list of cache mounts for Dancefile
uses: actions/github-script@v6
id: mounts
with:
script: |
const mountIds = `${{ inputs.mounts }}`.split(/[\r\n,]+/)
.map((mount) => mount.trim())
.filter((mount) => mount.length > 0);
const cacheMountArgs = mountIds.map((mount) => (
`--mount=type=cache,sharing=shared,id=${mount},target=/cache-mounts/${mount}`
)).join(' ');
const s3commands = mountIds.map((mount) => (
`aws s3 sync --no-follow-symlinks --quiet s3://${{inputs.bucket}}/cache-mounts/${mount} /cache-mounts/${mount}`
)).join('\n');
core.setOutput('cacheMountArgs', cacheMountArgs);
core.setOutput('s3commands', s3commands);
- name: Inject cache data into buildx context
shell: bash
run: |
docker build ${{ inputs.cache-source }} --file - <<EOF
FROM amazon/aws-cli:2.13.17
COPY buildstamp buildstamp
RUN ${{ steps.mounts.outputs.cacheMountArgs }} <<EOT
echo -e '${{ steps.mounts.outputs.s3commands }}' | sh && \
chmod 777 -R /cache-mounts || true
EOT
EOF
The code is currently still written in JS, and is quite tightly bound to S3 (since that is what we need) but I'd love to see features like this supported in the maintained version of the action, since there has been a lot of discussion about this (as I'm sure you're aware). Thoughts?
When using Docker's --mount=type=cache, there is an optional id argument. This can be used to cache the same directory under a different volume to prevent collisions. I'm not sure it makes a ton of sense to use in this context, but the guide I was following provided an id for the mount. However, this action does not accept an id. This was confusing because it led to the action not working (its injection/extraction steps don't set the id, so it won't match during the real run). I'm not sure how important it is to add, but it could be nice for consistency and to prevent confusion.
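For context on why the ids must match: BuildKit keys each cache mount by its id, and when no id is given it defaults to the target path, so an injection step that omits the id writes into a different cache than a RUN that sets one. A tiny sketch of that keying rule (my own illustration, not this action's code):

```shell
# Effective key of a --mount=type=cache: the explicit id if present,
# otherwise the target path (BuildKit's documented default).
effective_cache_id() {
  local id="$1" target="$2"
  printf '%s\n' "${id:-$target}"
}

effective_cache_id "" "/var/cache/apt"          # keyed by the target path
effective_cache_id "apt-cache" "/var/cache/apt" # explicit id wins
```

So a Dockerfile that sets id=apt-cache and an injection step that only sets a target end up under two different keys, which matches the confusion described above.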
I'm running into this in the post inject step, and I'm not sure how to resolve it:
Post job cleanup.
+ : 'Argv0: /home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4/post'
++ dirname /home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4/post
+ dir=/home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4
++ read_action_input skip-extraction
++ /home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4/read-action-input skip-extraction
+ '[' '' == true ']'
+ : 'Prepare Timestamp for Layer Cache Busting'
+ date --iso=ns
++ read_action_input scratch-dir
++ /home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4/read-action-input scratch-dir
+ tee scratch/buildstamp
tee: scratch/buildstamp: No such file or directory
2024-03-16T01:00:08,006374192+00:00
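In case others hit this: the post step pipes the timestamp into scratch/buildstamp (the scratch-dir input), so the directory apparently didn't exist at that point. A likely workaround, assuming the cause is simply the missing directory, is to create it in a step before the action's post step runs:

```shell
# Make sure the scratch dir exists so the post step's
# `tee scratch/buildstamp` has somewhere to write.
mkdir -p scratch
date --iso=ns | tee scratch/buildstamp
```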
I'm trying to update to v3 of this action (thanks again @aminya) on a small repo handling Go builds. The cache is successfully created on the first run, but fails on restore during the second run with the following error:
Run reproducible-containers/buildkit-cache-dance@v3
with:
cache-map: {
"cache-go-build": "/root/.cache/go-build",
"go-pkg-mod": "/go/pkg/mod"
}
skip-extraction: true
scratch-dir: scratch
FROM busybox:1
COPY buildstamp buildstamp
RUN --mount=type=cache,target=/root/.cache/go-build --mount=type=bind,source=.,target=/var/dance-cache cp -p -R /var/dance-cache/. /root/.cache/go-build || true
FROM busybox:1
COPY buildstamp buildstamp
RUN --mount=type=cache,target=/go/pkg/mod --mount=type=bind,source=.,target=/var/dance-cache cp -p -R /var/dance-cache/. /go/pkg/mod || true
[Error: EACCES: permission denied, rmdir 'go-pkg-mod/go.uber.org/[email protected]/internal'] {
errno: -13,
code: 'EACCES',
syscall: 'rmdir',
path: 'go-pkg-mod/go.uber.org/[email protected]/internal'
}
Error: EACCES: permission denied, rmdir 'go-pkg-mod/go.uber.org/[email protected]/internal'
Maybe we need to sync/flush the filesystem first before delete is possible? Why is it necessary to do this cleanup step, maybe we could just continue without deleting the source dir?
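One guess at the cause (an assumption, not verified against the action's code): Go writes the module cache read-only by default, so the cleanup's unlink/rmdir calls fail until the write bit is restored. A small demonstration with a hypothetical directory:

```shell
# Simulate a read-only module cache layout, then clean it up.
mkdir -p go-pkg-mod-demo/internal
touch go-pkg-mod-demo/internal/file.go
chmod -R a-w go-pkg-mod-demo                      # Go's module cache is read-only
rm -rf go-pkg-mod-demo 2>/dev/null || true        # fails for non-root users
chmod -R u+w go-pkg-mod-demo 2>/dev/null || true  # restore the write bit first
rm -rf go-pkg-mod-demo                            # now the cleanup succeeds
```

The same would explain the "EACCES: permission denied, unlink" errors reported elsewhere in this thread; setting GOFLAGS=-modcacherw when downloading modules is another way around it.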
Post inject var-lib-apt into docker is executed before Post Cache var-lib-apt, but Post inject var-cache-apt is executed after Post Cache var-cache-apt.
buildkit-cache-dance/.github/workflows/test.yml
Lines 17 to 36 in 0db49bb
I'm trying to set up a cache mount for the https://nextjs.org/ build cache, but for some reason it doesn't get extracted from the image.
I prepared repo with reproduction:
https://github.com/adam187/docker-mount-cache
The yarn cache works as expected, but the Next.js build cache doesn't for some reason:
https://github.com/adam187/docker-mount-cache/actions/runs/7278155682/job/19831741928
I am attempting to use buildkit-cache-dance to cache pip dependencies in a GitHub Actions workflow, but am encountering issues where the cache is not being used.
My example repo: mgaitan/pip-docker-cache-dance
Consider this commit, where I removed a dependency while supposedly the rest are still available in the cache. However, the logs indicate that despite the cache directive, pip dependencies are being downloaded again.
I'd appreciate any insights or assistance to resolve this issue.
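For reference, the usual shape of a pip cache mount looks like this (a sketch; the target is pip's default cache dir for root, and the requirements path is illustrative):

```dockerfile
# Persist pip's download/wheel cache across builds.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```

Two common reasons the cache still looks unused: pip is invoked with --no-cache-dir (or PIP_NO_CACHE_DIR is set in the image), or the id/target used here doesn't match what the action injects under.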
Mostly a discussion point, but I've noticed large caches take a very long time to inject/extract. In my case, I have a 467MB cache of npm modules. This is downloaded from GitHub's cache into the workflow runner in 8s, and thus is often worth caching for GH workflows. However, the injection step takes 53s. It seems like a lot of this time is spent transferring context, supposedly 1.8GB worth (taking 30s). I'm not sure if this is because the context is no longer zipped, as it is when GH downloads it, or if something else is causing the 500MB to triple in size. The extraction step takes 2m56s. Happy to provide any logs as needed.
In our case we use the local cache driver for multiple runners, so we have one folder for the layer cache and one for the mount cache (used with this action), but the action deletes the folder, so we have to work around it with cp/mv before/after the action. Maybe you could add an option to disable the sourceFolder rm while injecting?
I am probably missing something, but doing the same as the example results in an empty cache:
- name: Cache apk
uses: actions/cache@v4
id: cache-apk
with:
path: |
var-cache-apk
key: ${{ runner.os }}-apk-cache-${{ hashFiles('Dockerfile') }}
save-always: true
restore-keys: |
${{ runner.os }}-apk-cache-
- name: Inject apk cache into Docker
uses: reproducible-containers/[email protected]
with:
cache-map: |
{
"var-cache-apk": {
"target": "/var/cache/apk",
"id": "apk-cache"
}
}
save-always: true
skip-extraction: ${{ steps.cache-apk.outputs.cache-hit }}
Dockerfile snippet using this cache:
RUN --mount=type=cache,id=apk-cache,target=/var/cache/apk \
<<EOT
set -e
echo "@edge http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
apk update
apk upgrade
apk add ${PACKAGES}
EOT
I have the same issue with other caches I tried (pnpm store, build dist folders, etc), so that's not related specifically to this path but to the way I implement the action (according to the documentation).
Not sure if this is within the realm of what this action intends to support, but I think there's a valid use case for wanting an explicit step which extracts a folder from the build. Currently, it is only possible to have a "post" extraction step as part of the combined inject/extract action.
My use case is to extract files for upload to a CDN. In this case, I want to run an explicit upload action after the extraction step. However, since the extraction currently runs in the "post" stage after all "non-post" actions, this is hard to do cleanly (it's very much designed to be used with a caching action that also runs during the "post" stage).
I saw this was possible in v1, but removed in v2 in favor of the simpler combined model.
I'm attempting to load a cache I know exists into a cache mount
- name: Cache All node_modules folders
uses: actions/cache@v3
with:
path: ${{ github.workspace }}/**/node_modules
key: ${{ runner.os }}-node_modules-${{ env.cache-name }}-${{ hashFiles('**/pnpm-lock.yaml') }}
restore-keys: |
${{ runner.os }}-node_modules-${{ env.cache-name }}-
${{ runner.os }}-node_modules-
${{ runner.os }}-
I know the above works because the output of the pnpm install is as follows
Run pnpm install --frozen-lockfile --prefer-offline
Scope: all 11 workspace projects
Lockfile is up to date, resolution step is skipped
Packages: +3757
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
. postinstall$ rm -rf node_modules/@types/react-native
. postinstall: Done
Done in 3.4s
I've tried the following, but it doesn't seem to mount correctly:
- name: Load pnpm cache into Docker Container
uses: reproducible-containers/[email protected]
with:
cache-source: ${{ runner.os }}-pnpm-${{ env.cache-name }}-${{ hashFiles('**/pnpm-lock.yaml') }}
cache-target: /mono/**/node_modules
RUN --mount=type=cache,id=pnpm,target=/mono pnpm install --prod --frozen-lockfile
But alas no luck. Any pointers?
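If it's useful, my reading of the v2 inputs (hedged; the names below are illustrative): cache-source is a directory on the runner, i.e. the path you handed to actions/cache, not the cache key, and cache-target is a single absolute path inside the build with no globs. Something along these lines:

```yaml
- name: Cache pnpm store
  uses: actions/cache@v3
  with:
    path: pnpm-store
    key: ${{ runner.os }}-pnpm-${{ hashFiles('**/pnpm-lock.yaml') }}

- name: Load pnpm store into Docker
  uses: reproducible-containers/[email protected]
  with:
    cache-source: pnpm-store
    cache-target: /pnpm-store
```

The Dockerfile would then mount that same path (RUN --mount=type=cache,target=/pnpm-store pnpm install --prod --frozen-lockfile) and point pnpm's store-dir at it, rather than trying to mount every node_modules folder via a glob.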
Dockerfile
RUN --mount=type=cache,target=/root/.cache/go-build,sharing=locked,id=go-build-cache \
--mount=type=cache,target=/go/pkg/mod,sharing=locked,id=go-pkg-mod \
go mod download
COPY . .
RUN --mount=type=cache,target=/root/.cache/go-build \
--mount=type=cache,target=/go/pkg/mod \
go build -ldflags="-s -w" -o business main.go
git.yml
- name: Go Build Cache for Docker
uses: actions/cache@v4
with:
path: |
go-build-cache
go-pkg-mod
key: go-cache-multiarch-${{ hashFiles('go.mod') }}
restore-keys: |
go-cache-multiarch-
- name: inject go-build-cache into docker
# v1 was composed of two actions: "inject" and "extract".
# v2 is unified to a single action.
uses: reproducible-containers/[email protected]
with:
cache-map: |
{
"go-build-cache": {
"target": "/root/.cache/go-build",
"id": "go-build-cache"
},
"go-pkg-mod": {
"target": "/go/pkg/mod",
"id": "go-pkg-mod"
}
}
skip-extraction: true
Error while cleaning cache source directory: Error: EACCES: permission denied, unlink 'go-build-cache/[email protected]/.gitignore'. Ignoring...
And it takes about the same amount of time to build as it would if we didn't use the cache, yet my local builds with the cache are very quick.
I have two separate caches (one for dependencies, and one for next build caches within a monorepo) that I want to use in my docker build.
The problem is the cache-dance action seems to use the same source folder when extracting the caches. When I added my Next.js build cache to the workflow, it now overwrites the mounted directory from the first inject step, and the yarn dependencies no longer get detected by the docker build.
If I combine all my caches together, I'll lose the ability to skip extraction on the dependencies, which change much less frequently. I will waste time extracting my yarn cache despite it not changing.
- name: Set up Yarn build cache
id: yarn-cache
uses: actions/cache@v4
with:
path: yarn-build-cache
key: ${{ matrix.platform }}-yarn-${{ hashFiles('yarn.lock') }}
restore-keys: |
${{ matrix.platform }}-yarn-
- name: Set up Next build cache
id: next-cache
uses: actions/cache@v4
with:
path: |
next-build-cache
nx-build-cache
key: ${{ matrix.platform }}-next-${{ matrix.app.name }}-${{ hashFiles('yarn.lock') }}-${{ hashFiles(format('apps/{0}/**', matrix.app.name)) }}
restore-keys: |
${{ matrix.platform }}-next-${{ matrix.app.name }}-${{ hashFiles('yarn.lock') }}-
- name: Inject caches into Docker
uses: reproducible-containers/buildkit-cache-dance@v3
with:
cache-map: |
{
"yarn-build-cache": "/fe/.yarn/cache"
}
skip-extraction: ${{ steps.yarn-cache.outputs.cache-hit }}
- name: Inject next cache into Docker
uses: reproducible-containers/buildkit-cache-dance@v3
with:
cache-map: |
{
"next-build-cache": "/fe/apps/${{ matrix.app.name }}/.next/cache",
"nx-build-cache": "/fe/.nx"
}
skip-extraction: ${{ steps.next-cache.outputs.cache-hit }}
The extraction step takes several minutes for large caches, such as ~500MB of npm modules. If the actions/cache step had a cache hit, the extraction step is pointless (the cache will not even be rewritten in that case anyway). This is actually fairly common, since lockfiles don't change much. If this action could accept a "should-extract" input, it could be tied to the "cache-hit" output from actions/cache and save a few minutes on a lot of runs.
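For what it's worth, later versions of the action appear to expose exactly this as skip-extraction (it shows up in other reports in this thread); the wiring I have in mind, with illustrative step ids and paths:

```yaml
- name: Cache npm modules
  id: npm-cache
  uses: actions/cache@v4
  with:
    path: npm-cache
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}

- name: Inject npm cache into Docker
  uses: reproducible-containers/buildkit-cache-dance@v3
  with:
    cache-map: |
      { "npm-cache": "/root/.npm" }
    skip-extraction: ${{ steps.npm-cache.outputs.cache-hit }}
```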