Giter Site home page Giter Site logo

Cache dependencies in CI runner about alns HOT 16 CLOSED

n-wouda avatar n-wouda commented on July 23, 2024
Cache dependencies in CI runner

from alns.

Comments (16)

N-Wouda avatar N-Wouda commented on July 23, 2024 1

@markkvdb @fahaddd-git thanks for your inputs! I have added some caching in #119 that reduces CI runtimes around 10x to one minute or less. That is more than sufficient for now, so I closed this issue.

from alns.

N-Wouda avatar N-Wouda commented on July 23, 2024

It's mostly poetry that cannot resolve the dependencies very quickly. That seems to be a common issue: python-poetry/poetry#2094.

from alns.

N-Wouda avatar N-Wouda commented on July 23, 2024

Relevant pre-commit stuff: https://tobiasmcnulty.com/posts/caching-pre-commit/

from alns.

markkvdb avatar markkvdb commented on July 23, 2024

The setup-python action for GitHub Actions supports caching poetry dependencies out of the box. Minimal example to make things run:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9"]
      fail-fast: false

    steps:
      - uses: actions/checkout@v3
      - name: Install Poetry
        run: pipx install poetry==1.2.0
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: "poetry"
      - name: Install Dependencies
        run: poetry install
      - name: Test
        run: poetry run bash scripts/test.sh

I don't know why but pipx is installed by default for the GitHub runners.

from alns.

markkvdb avatar markkvdb commented on July 23, 2024

Btw, the code (structure) of this project is really nice. Once, every few months I have a peek at how it's evolving!

If you have to get some more inspiration for good coding practices (in Python) I suggest to have a look at the work of Tiangolo. He's the author of the excellent FastAPI and Typer packages. Next to the code, his packages also have very clear and extensive documentation from which you could get some inspiration.

from alns.

N-Wouda avatar N-Wouda commented on July 23, 2024

@markkvdb thanks! I tried this a few days ago in a commit in #107, but the issue is that we do not ship a lockfile (ALNS is a library, not an application, so we do not require fixed versions of our dependencies). I have been thinking of testing with fixed dependency versions, but haven't gotten around to anything yet.

from alns.

markkvdb avatar markkvdb commented on July 23, 2024

You don't actually need a lock file (poetry.lock). You can let the CI run the poetry install (with dependency resolving) run once and reuse this across runs.

from alns.

markkvdb avatar markkvdb commented on July 23, 2024

The caching will be done based on the hash of the python version and the pyproject.toml file, so as long as nothing changes there you don't need to resolve for each run.

from alns.

markkvdb avatar markkvdb commented on July 23, 2024

If I'm wrong about that, you can use some version of the CI file here:

https://github.com/tiangolo/fastapi/blob/4638b2c64e259b90bef6a44748e00e405825a111/.github/workflows/test.yml#L30

Let me know if none of my suggestions work because then I'll have a look as well to understand what I'm not understanding 😃

from alns.

N-Wouda avatar N-Wouda commented on July 23, 2024

The caching will be done based on the hash of the python version and the pyproject.toml file, so as long as nothing changes there you don't need to resolve for each run.

Are you sure? Based on my understanding of the action, it'll also hash in the lock file contents. And that will be resolved anew by poetry every time (which is very slow), and likely not match an existing cached set of dependencies (since our dependencies have their own dependencies, and if any one of those has recently been updated, our cache is invalidated).

But I could be misunderstanding this. I'll have a look around at how other projects solve this (soon-ish).

from alns.

N-Wouda avatar N-Wouda commented on July 23, 2024

If I'm wrong about that, you can use some version of the CI file here:

https://github.com/tiangolo/fastapi/blob/4638b2c64e259b90bef6a44748e00e405825a111/.github/workflows/test.yml#L30

This looks promising, thanks!

from alns.

markkvdb avatar markkvdb commented on July 23, 2024

I'm quite surprised that setup-python does not save the environment (with temporary lock file) across runs. I never tested it but if it's really like that, then adding a manual caching step should definitely work. In my opinion, this is cleaner than expecting developers to consistently run the pre-commit hooks.

If you do want to use the pre-commit workflow, then it might be wise to add a step in the CI to check for consistency between your lock file and the configuration file. I know it exists for pyproject.toml <-> poetry.lock but I suspect that it should also work for pyproject.toml <-> requirements.txt.

from alns.

N-Wouda avatar N-Wouda commented on July 23, 2024

@markkvdb I don't think I understand the link between the dependencies in the pyproject.toml file (and, by extension, in poetry.lock and the like), and the pre-commit workflow. As far as I know, pre-commit gets its steps/dependencies directly from the repositories linked in .pre-commit-config.yaml, at the revs fixed there. Can you explain?

from alns.

markkvdb avatar markkvdb commented on July 23, 2024

Never mind, I misunderstood how you are using pre-commit. I was thinking that pre-commit was responsible for making sure that the requirements.txt file was up-to-date but it's something completely different...

It seems that manually handling the caching might the best if setup-python doesn't handle this. It does require you to wait for the dependency resolver once but that's what it is in that case.

One note regarding long resolving times is that it might make sense to have a look which package poetry has difficulty resolving. You can get an idea by running poetry lock in verbose mode with poetry lock -vv. Whenever I had this issue it usually came down to one dependency struggling to find a match. You can then manually set the right version (for the time being) to not have these long resolve times.

from alns.

N-Wouda avatar N-Wouda commented on July 23, 2024

@markkvdb thanks! I'm not entirely sure how we should approach dependency pinning yet. We have some minimum versions we could pin, to at least make sure those work. At the same time, we also want to make sure recent(-ish) versions of our dependencies work well too. I'm not sure if we should pin those, or leave it to pypi to get us whatever's most recent.

I'll have a look at how scipy, statsmodels, and the like handle this soon!

from alns.

fahaddd-git avatar fahaddd-git commented on July 23, 2024

Just brainstorming ideas in this thread here to make things faster in the CI. Without the lockfile this seems to be a little more challenging.

  1. Cache the install of Poetry itself
  2. Run a cronjob at some interval to create cache using deps that are downloaded. Hash the pyproject.toml as the key.
  3. User triggers CI- check if cache hit on the pyproject.toml file. If so deps are gotten from cache. If not, deps are installed then cached with hashed pyproject.toml as key.

I'm not sure if the cache from the previous cron run in step 2 would need to be invalidated or if the generated poetry.lock needs to be stored as an artifact.

Per the first item I'm happy to open a PR for this if it's a useful addition.

from alns.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.