conda-incubator / conda-recipe-manager

License: BSD 3-Clause "New" or "Revised" License

conda-recipe-manager's Introduction

conda-recipe-manager

Overview

A project for libraries and automated tools that manage and manipulate conda recipe files.

This project started out as a recipe parser library in Anaconda's percy project.

Getting Started

General Installation

Install into your current environment

make install

Install into a custom environment

make environment
conda activate conda-recipe-manager

Developer Notes

make dev
conda activate conda-recipe-manager

The dev recipe will configure a conda environment named conda-recipe-manager with development tools installed.

pre-commit is automatically installed and configured for you to run a number of automated checks on each commit.

NOTE: As of writing, only a handful of files are checked by the linter and pre-commit. ANY NEW FILES should be added to these checks.

Running pre-commit checks

The provided Makefile also includes a handful of convenience recipes for running all or part of the pre-commit automations:

  1. make test: Runs all the unit tests.
  2. make test-cov: Reports the current test coverage percentage and indicates which lines are currently untested.
  3. make lint: Runs our pylint configuration, based on Google's Python standards.
  4. make format: Automatically formats the code.
  5. make analyze: Runs the static analyzer, mypy.
  6. make pre-commit: Runs all the pre-commit checks.

Release process

  1. Update CHANGELOG.md
  2. Update the version number in pyproject.toml
  3. Ensure environment.yaml is up to date with the latest dependencies
  4. Create a new release on GitHub with a version tag.
  5. Manage the conda-forge feedstock, as per this doc

conda-recipe-manager's People

Contributors

schuylermartin45

conda-recipe-manager's Issues

Multiline strings that don't use `>` or `|` characters get interpreted as lists of strings

A summary section like this (found in r-highr-feedstock):

  summary: Provides syntax highlighting for R source code. Currently it supports LaTeX and HTML
    output. Source code of other languages is supported via Andre Simon's highlight
    package (<http://www.andre-simon.de>).

Appears to be valid YAML. The conversion process (or more likely, the parser) turns the above string into:

  summary:
    - Provides syntax highlighting for R source code. Currently it supports LaTeX and HTML
    - "output. Source code of other languages is supported via Andre Simon's highlight"
    - package (<http://www.andre-simon.de>).

This causes rattler-build to fail, since the value of summary must be a scalar type.
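
For reference, a standards-compliant YAML loader folds that plain multiline scalar into a single space-joined string, which suggests the list-splitting happens in our line-based parsing rather than in the YAML itself. A quick sanity check with PyYAML (not a CRM API, just a demonstration):

import yaml

doc = """\
summary: Provides syntax highlighting for R source code. Currently it supports LaTeX and HTML
  output. Source code of other languages is supported via Andre Simon's highlight
  package (<http://www.andre-simon.de>).
"""

# A plain multiline scalar folds into one string; it never becomes a list.
parsed = yaml.safe_load(doc)
assert isinstance(parsed["summary"], str)
print(parsed["summary"])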

Consider using `rattler-build` as our packaging system

I had this idea while writing #2

At some point in the near future, we should consider "eating our own dog food" and using rattler-build to package this project.

For a period of time, we might want to consider having two build-recipe workflows on PRs/merges: one for conda-build and one for rattler-build.

`crm patch <JSON patch> <file>` Command

To enable experimentation and shell-scripting small recipe edits, I would like to propose a simple CLI command to edit recipe files.

Given a file and a JSON-patch blob, this command will apply an edit to a recipe file. This command should validate the patch blob and recipe file provided.

Examples:

$ crm patch {"op": "not_an_op", "foo": "/about/license", "value": "MIT"} /foo/bar/meta.yaml
ERROR: Illegal JSON patch provided

$ crm patch {"op": "add", "path": "/about/license", "value": "MIT"} /foo/bar/not_a_recipe.txt
ERROR: Failed to recognize `/foo/bar/not_a_recipe.txt` as a recipe file.

$ crm patch {"op": "add", "path": "/about/license", "value": "MIT"} /foo/bar/meta.yaml
Success! Changes have been saved to `/foo/bar/meta.yaml`

Bonus points: provide a diff of the changes made.

$ crm patch --diff {"op": "add", "path": "/about/license", "value": "MIT"} /foo/bar/meta.yaml
Success! Changes have been saved to `/foo/bar/meta.yaml`
about:
  summary: foo
------42-42
  license: Apache 2.0
+++++42-42
  license: MIT

There are likely a few other flags we can add to make this more useful for users. Maybe log the file to STDOUT instead of writing to it directly?
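
For discussion, here is a minimal sketch of what the command body might look like, assuming the RecipeParser.patch() interface shown in the edit-cli proposal elsewhere in this tracker. The import path, the render() serialization call, and the patch() return value are assumptions, not existing APIs:

import json
import sys

from conda_recipe_manager.parser.recipe_parser import RecipeParser  # import path is an assumption


def patch_command(patch_blob: str, recipe_path: str) -> int:
    """Applies a single JSON patch operation to a recipe file (sketch only)."""
    try:
        patch = json.loads(patch_blob)
    except json.JSONDecodeError:
        print("ERROR: Illegal JSON patch provided", file=sys.stderr)
        return 1

    try:
        parser = RecipeParser(recipe_path)
    except Exception:
        print(f"ERROR: Failed to recognize `{recipe_path}` as a recipe file.", file=sys.stderr)
        return 1

    if not parser.patch(patch):  # patch() returning a success flag is an assumption
        print("ERROR: Patch could not be applied", file=sys.stderr)
        return 1

    with open(recipe_path, "w", encoding="utf-8") as f:
        f.write(parser.render())  # render() is an assumed serialization method

    print(f"Success! Changes have been saved to `{recipe_path}`")
    return 0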

Handle `m2-` logic found in JINJA variables used in R recipes

55 recipes in the Anaconda Recipes test data (mostly R packages) have conditionalized JINJA variables that do not translate into something rattler-build supports.

Here is what is commonly used:

{% set posix = 'm2-' if win else '' %}
{% set native = 'm2w64-' if win else '' %}

NOTE: Sometimes p and n are used as variable names instead.

Here is what conda-recipe-manager converts that to:

context:
  posix: "'m2-' if win else ''"
  native: "'m2w64-' if win else ''"

Here's the full debug dump:

{
      "error": "Error:   \u00d7 Parsing: failed to parse match spec: ''m2-'' is not a valid package name. Package names can only contain 0-9, a-z, A-Z, -, _, or .",
      "recipe_count": 55,
      "recipes": [
        "ccache-feedstock/recipe/recipe.yaml",
        "cmake-no-system-feedstock/recipe/recipe.yaml",
        "double-conversion-feedstock/recipe/recipe.yaml",
        "r-ape-feedstock/recipe/recipe.yaml",
        "r-brglm-feedstock/recipe/recipe.yaml",
        "r-checkmate-feedstock/recipe/recipe.yaml",
        "r-coin-feedstock/recipe/recipe.yaml",
        "r-diffobj-feedstock/recipe/recipe.yaml",
        "r-fansi-feedstock/recipe/recipe.yaml",
        "r-fastmap-feedstock/recipe/recipe.yaml",
        "r-feather-feedstock/recipe/recipe.yaml",
        "r-fs-feedstock/recipe/recipe.yaml",
        "r-gert-feedstock/recipe/recipe.yaml",
        "r-git2r-feedstock/recipe/recipe.yaml",
        "r-glmnet-feedstock/recipe/recipe.yaml",
        "r-glue-feedstock/recipe/recipe.yaml",
        "r-gsl-feedstock/recipe/recipe.yaml",
        "r-hexbin-feedstock/recipe/recipe.yaml",
        "r-influencer-feedstock/recipe/recipe.yaml",
        "r-irkernel-feedstock/recipe/recipe.yaml",
        "r-irlba-feedstock/recipe/recipe.yaml",
        "r-isoband-feedstock/recipe/recipe.yaml",
        "r-kohonen-feedstock/recipe/recipe.yaml",
        "r-magrittr-feedstock/recipe/recipe.yaml",
        "r-mapproj-feedstock/recipe/recipe.yaml",
        "r-mime-feedstock/recipe/recipe.yaml",
        "r-mnormt-feedstock/recipe/recipe.yaml",
        "r-modelmetrics-feedstock/recipe/recipe.yaml",
        "r-pcapp-feedstock/recipe/recipe.yaml",
        "r-pki-feedstock/recipe/recipe.yaml",
        "r-ps-feedstock/recipe/recipe.yaml",
        "r-pspline-feedstock/recipe/recipe.yaml",
        "r-purrr-feedstock/recipe/recipe.yaml",
        "r-randomforest-feedstock/recipe/recipe.yaml",
        "r-readr-feedstock/recipe/recipe.yaml",
        "r-reticulate-feedstock/recipe/recipe.yaml",
        "r-rgl-feedstock/recipe/recipe.yaml",
        "r-rjsonio-feedstock/recipe/recipe.yaml",
        "r-rodbc-feedstock/recipe/recipe.yaml",
        "r-roxygen2-feedstock/recipe/recipe.yaml",
        "r-rserve-feedstock/recipe/recipe.yaml",
        "r-rsqlite-feedstock/recipe/recipe.yaml",
        "r-sf-feedstock/recipe/recipe.yaml",
        "r-sourcetools-feedstock/recipe/recipe.yaml",
        "r-survival-feedstock/recipe/recipe.yaml",
        "r-terra-feedstock/recipe/recipe.yaml",
        "r-tidygraph-feedstock/recipe/recipe.yaml",
        "r-urca-feedstock/recipe/recipe.yaml",
        "r-xfun-feedstock/recipe/recipe.yaml",
        "r-xml2-feedstock/recipe/recipe.yaml",
        "r-xts-feedstock/recipe/recipe.yaml",
        "r-yaml-feedstock/recipe/recipe.yaml",
        "r-zip-feedstock/recipe/recipe.yaml",
        "r-zoo-feedstock/recipe/recipe.yaml",
        "the_silver_searcher-feedstock/recipe/recipe.yaml"
      ]
}
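
One possible mitigation, sketched below, is to special-case the common `'<prefix>' if win else ''` pattern when converting JINJA set statements and emit an expression instead of a quoted literal. The helper name is hypothetical, and the exact V1 syntax to target still needs to be confirmed against the spec and rattler-build:

import re

# Matches e.g.: {% set posix = 'm2-' if win else '' %}
_SET_TERNARY_RE = re.compile(
    r"{%\s*set\s+(?P<name>\w+)\s*=\s*"
    r"'(?P<true_val>[^']*)'\s*if\s+(?P<cond>\w+)\s*else\s*'(?P<false_val>[^']*)'\s*%}"
)


def convert_ternary_set(line: str) -> tuple[str, str] | None:
    """Returns (variable name, V1 context value) for the common ternary pattern, if matched."""
    match = _SET_TERNARY_RE.match(line.strip())
    if match is None:
        return None
    name, true_val, cond, false_val = match.group("name", "true_val", "cond", "false_val")
    # Assumes context values may hold a Jinja expression; confirm against the V1 format.
    if false_val == "":
        return name, f'${{{{ "{true_val}" if {cond} }}}}'
    return name, f'${{{{ "{true_val}" if {cond} else "{false_val}" }}}}'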

Add `merge_build_host` transformation

We are missing a transformation of the merge_build_host field from CEP-14:

  # merge the build and host environments (used in many R packages on Windows)
  # was `merge_build_host`
  merge_build_and_host_envs: bool (defaults to false)

rattler-build error:

Error:   × Parsing: invalid field `merge_build_host`.
    ╭─[50:3]
 49 │   noarch: generic
 50 │   merge_build_host: $${{ true if win }}
    ·   ────────┬───────
    ·           ╰── here
 51 │   rpaths:
    ╰────
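
As a sketch of the intended transformation (the patch() call mirrors the JSON-patch interface discussed elsewhere in this tracker; the contains_value()/get_value() accessors are assumptions):

def upgrade_merge_build_host(parser) -> None:
    """Renames /build/merge_build_host to the CEP-14 field, preserving its value (sketch only)."""
    old_path = "/build/merge_build_host"
    if not parser.contains_value(old_path):  # assumed existence-check helper
        return
    value = parser.get_value(old_path)  # assumed accessor
    parser.patch({"op": "add", "path": "/build/merge_build_and_host_envs", "value": value})
    parser.patch({"op": "remove", "path": old_path})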

False positive integration test failures

rattler-build will fail a dry-run build if environment variables referenced by env.get() are not present.

For our purposes, this is effectively a false positive, and it potentially obscures other test failures since this check appears to happen before some other validation checks.

We should find a way to mitigate this issue. At the very least, this issue affects all R recipe files.

Multi-repo processing and JSON-parseable output with stats

A few things:

  • It would be beneficial if the conversion script could be run across multiple projects at once
  • As part of this, the script should be able to output to a JSON format for easy parsing by other scripts
  • The JSON output should include statistics about runtime and errors encountered so bulk operations can be easily understood (a rough sketch of such a report follows below).
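
As a starting point for discussion, the report could look something like this; every field name below is a placeholder, not a settled schema:

import json
from dataclasses import asdict, dataclass, field


@dataclass
class BulkConversionReport:
    """Illustrative shape of a multi-repo conversion report; all fields are placeholders."""
    recipes_processed: int = 0
    recipes_converted: int = 0
    recipes_failed: int = 0
    total_runtime_seconds: float = 0.0
    # Maps an error message to the list of recipe paths that produced it.
    errors: dict[str, list[str]] = field(default_factory=dict)


report = BulkConversionReport(recipes_processed=2, recipes_converted=1, recipes_failed=1)
print(json.dumps(asdict(report), indent=2))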

Use pixi instead of a Makefile

Pixi is a modern alternative to a Makefile and natively integrates with the conda ecosystem to install dependencies. I think it would work nicely for this project. Recent versions even support configuration through pyproject.toml.

I can give it another go if you are up for it :)

Warn/handle when a selector is on a JINJA variable

cctools-ld64-feedstock contains a recipe that has a selector on a variable. In this particular case, the selector is unnecessary, as the variable is already used selectively anyway.

{% set native_compiler_subdir = 'linux-64' %}  # [linux]

We should add a warning, as a human will need to determine the intent/danger of doing this. In (hopefully) most cases, we can ignore the selector and just keep the variable, which will at least allow a human to correct the issue.
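
A minimal detection sketch; how the warning gets surfaced (message table, return value, etc.) is an open design question, and the helper below is purely illustrative:

import re

# Matches a trailing selector comment (e.g. "# [linux]") on a JINJA set statement.
_VAR_WITH_SELECTOR_RE = re.compile(r"{%\s*set\s+.*%}\s*#\s*\[(?P<selector>[^\]]+)\]")


def selector_on_variable_warning(line: str, line_number: int) -> str | None:
    """Returns a warning message if a JINJA variable definition carries a selector."""
    match = _VAR_WITH_SELECTOR_RE.search(line)
    if match is None:
        return None
    return f"Line {line_number}: ignoring selector [{match.group('selector')}] on a JINJA variable"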

A `:` in a quoted string causes the conversion tool to incorrectly patch files

protobuf-feedstock contains a recipe file that has an edge case that is not being handled properly.

There are a few commands listed that contain selectors. When the selectors get patched to the new if/then blocks, the command string gets mangled. The : in the command statement appears to be parsed as YAML at some point, which causes the conversion script to emit the wrong output.

Problematic lines:

    - python -c "from google.protobuf.internal import api_implementation; assert api_implementation.Type() == 'cpp'"  # [unix and python_impl != "pypy"]
    - python -c "from google.protobuf.internal import api_implementation; assert api_implementation.Type() == 'python'"  # [win or (unix and python_impl == "pypy")]

Debug tree:

      |- <Collection Node>
        |- if
          |- unix and python_impl != "pypy"
        |- then
          |- python -c "from google.protobuf.internal import api_implementation; assert api_implementation.Type()
          |- == 'cpp'"
      |- <Collection Node>
        |- if
          |- win or (unix and python_impl == "pypy")
        |- then
          |- python -c "from google.protobuf.internal import api_implementation; assert api_implementation.Type()
          |- == 'python'"

One canonical recipe file to rule them all

When conda-recipe-manager-feedstock exists (see this PR), we should consider removing the recipe file in this project.

I don't want to have to maintain two sources of truth. This comes with some caveats:

  • The GHAs that call out to conda-build and rattler-build will need to be updated to pull from the feedstock?
  • This might have a bootstrapping issue when it comes time to release a new version(?). If a dependency is added, you would need to accept that the builds will fail until a new version is fully released.

What to do with `/build/rpaths`?

r-highr-feedstock contains a list of strings under /build/rpaths:

build:
  merge_build_host: True  # [win]
  # If this is a new build for the same version, increment the build number.
  number: 0
  # no skip
  noarch: generic

  # This is required to make R link correctly on Linux.
  rpaths:
    - lib/R/lib/
    - lib/

This does not appear in CEP-13 nor CEP-14 and rattler-build does not currently support this path. There is a new /build/dynamic_linking/rpaths but that is a different transformation which we recently added support for.

rattler-build error:

Error:   × Parsing: invalid field `rpaths`.
    ╭─[51:3]
 50 │   merge_build_host: $${{ true if win }}
 51 │   rpaths:
    ·   ───┬──
    ·      ╰── here
 52 │     - lib/R/lib/
    ╰────

`bioconda_recipes_03` and `bioconda_recipes_04` fail and don't complete

Kicking the can down the road on this one; I can't currently figure it out.

These two bioconda integration tests seem to cause the GitHub runner to lock up with a yellow spinning wheel. The tests are marked as failed without completing the post-run stages. The timeout checks for rattler-build don't seem to trigger.

Here is an example run:
https://github.com/conda-incubator/conda-recipe-manager/actions/runs/8482951360/job/23243473204?pr=15

Work on these tests is still ongoing in #15.

These tests appear to work fine on my Mac locally.

Selectors in a list are not being converted properly

From the conda-build element chat:

  build:
    - python                                 # [build_platform != target_platform]
    - cross-python_{{ target_platform }}     # [build_platform != target_platform]

is not quite correctly translated (only the first of the two elements makes it into the then part of the condition).

Use `conda-recipe-manager` as the entry point for all scripts

@wolfv brought this up in conversation, and it would be consistent with some of the other tools we have at Anaconda.

We should move to use conda-recipe-manager as the base for current and future tools. Although it is more to type, it would lend itself to a better user experience by aggregating all the commands into one namespace.

Example:

conda-recipe-manager convert <args>
conda-recipe-manager <tool> <args>

Output schema reference by default

It might be nice to output the schema reference so that editors provide completions by default:

# yaml-language-server: $schema=https://raw.githubusercontent.com/prefix-dev/recipe-format/main/schema.json
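
The change itself would just be prepending that comment when the converter writes out a V1 recipe; a rough sketch (the writer function is hypothetical, only the comment line comes from the schema project):

SCHEMA_COMMENT = (
    "# yaml-language-server: $schema="
    "https://raw.githubusercontent.com/prefix-dev/recipe-format/main/schema.json\n"
)


def write_v1_recipe(recipe_text: str, path: str) -> None:
    """Writes a converted V1 recipe with the schema reference comment prepended (sketch only)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(SCHEMA_COMMENT + recipe_text)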

SPDX correction

From r-highr-feedstock:

Error:   × Parsing: failed to parse SPDX license: unknown term
  │ See <https://spdx.org/licenses> for the list of valid licenses.
  │ Use 'LicenseRef-<MyLicense>' if you are using a custom license.
    ╭─[75:12]
 74 │ about:
 75 │   license: GPL-3
    ·            ──┬──
    ·              ╰── here
 76 │   summary:
    ╰────

Ideas:

  • We should throw a warning when a license is not recognized in the conversion phase
  • We could potentially auto-correct the issue(?)

Licenses that can't be matched should be flagged as custom

Licenses we can't match to a known SPDX-compliant license should be flagged as custom, per the SPDX spec. This should make rattler-build happy with any remaining licenses we can't automatically fix. Some may actually be custom licenses.
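
A sketch of how the two ideas could combine: correct a handful of common near-misses to their SPDX identifiers, and fall back to a LicenseRef- custom flag otherwise. The correction table below is illustrative and deliberately tiny; the real mapping would need review:

import re

# Illustrative, non-exhaustive corrections for common non-SPDX license strings.
_SPDX_CORRECTIONS = {
    "GPL-3": "GPL-3.0-only",
    "GPL-2": "GPL-2.0-only",
    "Apache 2.0": "Apache-2.0",
    "BSD 3-Clause": "BSD-3-Clause",
}


def normalize_license(license_str: str) -> str:
    """Corrects or flags a license string that has already failed SPDX validation."""
    if license_str in _SPDX_CORRECTIONS:
        return _SPDX_CORRECTIONS[license_str]
    # LicenseRef- identifiers may only contain letters, digits, '-', and '.'
    sanitized = re.sub(r"[^A-Za-z0-9.\-]", "-", license_str)
    return f"LicenseRef-{sanitized}"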

Lowercase all fields when initially parsed?

While investigating some integration testing logs, I found that, on occasion, recipe files use capitalized field names, like Skip instead of skip.

This obviously breaks schema validation in rattler-build but presumably conda-build doesn't care.

We could easily correct for this, but the issue does not appear to happen particularly often, so it is lower priority.

Fails to parse recipe files containing `environ[]` JINJA variable

The R series of packages on conda-forge (along with a few other packages) all use an environ["PREFIX"] JINJA variable in their /about/license_file field that causes the parser to fail.

A quick search shows that there are many recipes in our integration tests that fail from using environ["<STRING>"] in other places.

NOTE:

CEP-14 does seem to indicate the syntax might have changed in the new format:

  # the script environment. You can use Jinja to pass through environment variables
  # with the `env` key (`${{ env.get("MYVAR") }}`).
  env: {string: string}

Example

Here is an example from the integration test work (r-rann-feedstock):

$ convert conda_forge_recipes_01/r-rann-feedstock/recipe/meta.yaml 
EXCEPTION: An exception occurred while parsing the recipe file
while parsing a flow mapping
  in "<unicode string>", line 1, column 4:
    - {{ environ["PREFIX"] }}/lib/R/sha ... 
       ^
expected ',' or '}', but got '['
  in "<unicode string>", line 1, column 13:
    - {{ environ["PREFIX"] }}/lib/R/share/licens ... 
                ^
0 errors and 0 warnings were found.
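
One possible direction, sketched below, is a pre-processing pass that rewrites the legacy environ["..."] form into the env.get(...) form CEP-14 describes before the line reaches the YAML parser; whether a textual pre-pass is the right layer for this fix is still an open question:

import re

# Rewrites environ["VAR"] (or environ['VAR']) references to the CEP-14 env.get() form,
# e.g. '{{ environ["PREFIX"] }}/...' -> '{{ env.get("PREFIX") }}/...'
_ENVIRON_RE = re.compile(r'''environ\[\s*['"](?P<var>[^'"]+)['"]\s*\]''')


def rewrite_environ_refs(line: str) -> str:
    return _ENVIRON_RE.sub(lambda m: f'env.get("{m.group("var")}")', line)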

`build/` section audit

There are many changes to the build/ section that are not currently supported by the conversion script. We need to perform a thorough audit of this section and support all the conversions.

`crm edit-cli` Command

This is a more advanced form of the feature requested in #83. Both commands can exist simultaneously, as they serve different use cases.

To further enable our users and help onboard new developers to the project, I think we should develop an interactive CLI for making changes to recipe files. This tool will walk recipe maintainers through how to use the parsing libraries provided by conda-recipe-manager. This tool will also be incredibly useful for those who want to play around, prototype, and test out new recipe editing commands.

Here is a rough user story:

$ crm edit-cli /foo/baz/not_a_meta_file.txt
ERROR: Invalid recipe file.  # Comment: return an exit code

$ crm edit-cli /foo/bar/meta.yaml
> help
Provides a CLI for editing conda recipe files

USAGE: crm edit-cli <file>

Commands:
  help - Displays this help menu
  patch - Perform a JSON-patch-like patching operation on the recipe file
  diff - Shows the current diff between the original recipe and the modified recipe
  history - Renders the list of successful patch commands as conda-recipe-manager library code
  commit [file] - Saves changes to the current recipe file or to a new file (if provided)
  
> patch
  Operation? > foo
  Invalid operation. Try again.
  Operation? > add
  Path? > /about/license
  Value? > foobar
  Unrecognized license detected. Try Again or override.
  Value? > MIT
  Resulting Patch Blob: {"op": "add", "path": "/about/license", "value": "MIT"}
  Success: True 

> history
  parser = RecipeParser("/foo/bar/meta.yaml")
  parser.patch({"op": "add", "path": "/about/license", "value": "MIT"})

> diff
about:
  summary: foo
------42-42
  license: Apache 2.0
+++++42-42
  license: MIT

> commit
File changes saved to `/foo/bar/meta.yaml`

We will likely add support for more commands as we see fit (like editing selectors or JINJA variables). We will also need to ensure we support both V0 and V1 recipe formats.

Integration Testing with `rattler-build`

In the not too distant future, I would like to start developing some kind of integration testing with rattler-build.

Here are some starting goals:

  • On PRs/merges, take a series of recipes that we know we can successfully convert from the v0 -> v1 recipe format.
  • Once converted, run these recipes through the latest release of rattler-build
  • Ideally, we may also want the ability to trigger these integration tests independently of a PR/merge

This should give us some confidence that we work well with our target build platform.

Handle top-level requirements on multi-output recipes

boto3-stubs-feedstock in the Anaconda Recipes test data is a good example of this (so is curl-feedstock). When a multi-output recipe has a requirements section at the top level, rattler will throw an error. This is not supported in the V1 format.

I think in this scenario the top-level requirements section applies to all of the multi-output requirements sections, but I would have to double-check the expected behavior.
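
If that assumed behavior holds (top-level requirements apply to every output), the transformation could look roughly like the dict-based sketch below; the real implementation would operate on the parser's node tree instead:

def merge_top_level_requirements(recipe: dict) -> None:
    """Folds a top-level requirements table into each output's requirements (sketch only).

    Assumes the unverified semantics that top-level requirements apply to all outputs.
    """
    if "outputs" not in recipe or "requirements" not in recipe:
        return
    top_level = recipe.pop("requirements")
    for output in recipe["outputs"]:
        out_reqs = output.setdefault("requirements", {})
        for section, deps in top_level.items():  # e.g. build / host / run
            # Prepend the shared dependencies so output-specific ones keep their relative order.
            out_reqs[section] = list(deps) + out_reqs.get(section, [])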

Add Automatic Sphinx Documentation

As we onboard new users, we need to make the library more accessible. Some of the classes have gotten very big and complex.

This project enforces Sphinx-styled documentation, and we should use that to our advantage. If we autogenerate documentation, library users will have a guide where they can quickly look up library functions and features (without having to sift through the code).

As we don't have a clear place to host this documentation, I would like to start out by having a tool that scans the project, generates Markdown files, and commits them to docs/. Ideally this should run as part of our pre-commit process. Doing this should ensure that commits and PRs will fail if the docs have not been updated, keeping our docs automatically up to date.

I would also like to have a make docs command to trigger the documentation manually.

Open Questions:

  • Should we publish private class, function, module, etc. information? Can files/functions marked with _ be ignored?

`pip_check` improvement: check `imports`

From Wolf, over chat:

I would also say that if a package did not specify an imports in the tests, we don't need to deal with the pip check either
Although I did recently find out that the imports can also be used to test Perl imports :'(

`/build/script` needs to be upgraded

The /build/script section is currently not being upgraded. In simple scenarios, this works out fine. In complex situations, we need to use the new Script object.

This will be a pretty involved upgrade process that will need to be heavily conditionalized. From a brief search, there are already many variants AND we have to parse out the shell variable syntax in the existing script_env field.

Complete the previously started git/GitHub helper script

Introduced in the branches smartin_update_feedstock_script and smartin_update_feedstock_script_gitpython, this issue tracks the progress of finishing one version of the feedstock updater script.

Background

At some point in the future, we will need to start bulk-converting recipe files. To help with this, I introduced a new conda-recipe-manager command, update_feedstock.py, intended to streamline the conversion process for package builders. In essence, the script tries to:

  • Pull-down a given feedstock
  • Convert the meta.yaml file into a recipe.yaml file (V0 -> V1 recipe conversion)
  • If a successful rattler-build dry run build completes, commit and push these changes and create a PR.
    • Potentially run a full-build over a dry-run build as validation. Ideally full builds are deployed as GHA tests when the PR is created.

This script needs to handle the work streams for both conda-forge and AnacondaRecipe package builders. Eventually this script may be called by an automated bot that traverses all recipes owned by an organization. For now, we want a human-centric CLI for package builders to experiment with.

I personally found gitpython a little too clunky to work with and found the documentation frustrating, which is why that work was put into a separate branch. I have been meaning to try pygit2 instead but haven't had the time to get back to this.
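
For reference, the clone / branch / commit portion in pygit2 looks roughly like the sketch below (repository URL, branch name, file path, and author details are all placeholders; pushing and PR creation are left out):

import pygit2


def clone_and_commit_conversion(feedstock_url: str, dest_dir: str, new_recipe_text: str) -> None:
    """Clones a feedstock, writes the converted recipe, and commits it on a new branch (sketch only)."""
    repo = pygit2.clone_repository(feedstock_url, dest_dir)

    # Create and check out a working branch off the default HEAD.
    head_commit = repo[repo.head.target]
    repo.branches.local.create("crm-v1-conversion", head_commit)
    repo.checkout("refs/heads/crm-v1-conversion")

    # Write the converted recipe and stage it.
    with open(f"{dest_dir}/recipe/recipe.yaml", "w", encoding="utf-8") as f:
        f.write(new_recipe_text)
    repo.index.add("recipe/recipe.yaml")
    repo.index.write()

    # Commit on the new branch; pushing and opening the PR happen elsewhere.
    author = pygit2.Signature("CRM Bot", "crm-bot@example.com")  # placeholder identity
    tree = repo.index.write_tree()
    repo.create_commit(repo.head.name, author, author, "Convert recipe to V1 format", tree, [head_commit.id])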

Inclusive language conversions

After a brief discussion with @beeankha, we noticed that the new recipe format replaces outdated terminology with inclusive language in a handful of places, primarily the allow/deny lists in recipe files.

I don't think the current conversion work addresses these changes yet, but we will need to in order to pass the schema checks in rattler-build.

Address drop in rattler-build compatibility

A recent rattler-build release added a new error check that dropped our recipe compatibility enough to start failing some of our integration tests.

In the conda-forge dataset, we now get:

    "Error:   \u00d7 Parsing: failed to render Jinja expression: undefined value: No compiler": 11,

Figuring out this issue and finding a solution will fix our failing integration test(s).
