conda-incubator / conda-project Goto Github PK

View Code? Open in Web Editor NEW

27.0 27.0 10.0 316 KB

Tool for encapsulating, running, and reproducing projects with Conda environments

Home Page: https://conda-incubator.github.io/conda-project/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

conda-project's People

Contributors

Stargazers

Watchers

Forkers

mattkram andyglick albertdefusco skupr-anaconda jessewiles rxm7706 dbast philippjfr iammelvink travishathaway

conda-project's Issues

[ENH] Multiple conda environments

The environment.yml file specification currently only supports one env. We can use multiple yaml files combined with environment specification in conda-project.yml to be able to maintain and extend multiple environments in a single project.

commands:
  server:
    cmd: ...
    env_spec: default
  test:
    cmd: pytest
    env_spec: test

env_specs:  
  default:
    filename: environment.yml  # default behavior
  test:
    filename: environment-test.yml
  dev:
    filename: environment-test.yml
    dependencies:
    - notebook
    - ...

init vs create

Do we "create" a new project (directory) or "init" inside a directory?

Strict channel priority

Consider enforcing strict channel priority by default, but can be disabled by local .condarc.

[ENH] Run a configured command with the local environment

The need

A user has an environment.yml file and wants to specify a command that can be run with this project. The user would like to declare this command in a YAML file

name: my-project

dependencies:
  - python=3.8
  - tranquilizer
  - pip
  - pip:
    - requests

variables:
  FOO: bar

What to expect

The user can write a conda-project.yml file declaring one or more commands. Nice-to-have: Jinja2 templating for standard things like port number.

commands:
  default:
    cmd: tranquilizer api.py
    env_spec: default # this is the default value and not required in the project file

And execute the command

conda project run [default]
...

The environment will be prepared first if the user has not already done so.

Consider dropping extensive use of .yml/.yaml test parameters

Tests that use the project_directory_factory fixture become parameterized for .yaml and .yml suffixes. It may not be necessary that every use of project_directory_factory be parameterized in this way. Perhaps the logic that looks for YAML files (project and environment) can be refactored to speed up testing.

Standards for local environment paths

I see that VSCode Python extension enables creation of a local env with the prefix ./.conda/. I wonder if there is an accepted standard here or one that could be proposed.

[ENH] project initialization

Provide a method similar to conda create ... that also creates and environment.yml file (and lock). It should take the same arguments as conda create and potentially even allow setting local condarc parameters.

Pep517 and pep518

What if the contents of the project is a python package with a defined build system such that pip install . is meant to call poetry.

My intention is that conda-project can be the provider of Python, compilers, and other system-level tools for the package to compile. The environment.yml might look like

dependencies:
  - python=3.8
  - c++ # the appropriate name of this compiler to be determined
  - pip:
    - .

Improve test performance

The tests are very slow, which increases development time and uses more CI resources than necessary.

Investigate and implement ways to improve test performance while maintaining the same level of coverage.

List of ideas below:

Remove .yml/.yaml parametrized fixture. Add a single unit test to assure both extensions will work. This will cut time in half.
Consider relying on conda-lock's vendored conda`, removing the conda version parametrization.

[ENH] local condarc

Conda does not currently read configuration settings from a .condarc file in your local directory, but it can be implemented in conda-project using the $CONDARC and setting it to the project directory for all conda commands

https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#searching-for-condarc

What about pyproject.toml?

conda-lock supports pyproject.toml, how would conda-project operate against a pyproject.toml? Would an environment.yml or conda-project.yml file still be required. What might a conda-project extension look like or do?

Secondly, pep621 brings a [project] table into pyprojet.toml. Again, what could conda-project do with the table entries or extend them?

[ENH] Option to re-lock a project with the smallest number of changes

Say you’ve got a project that is locked and that you want to update the version of one of its direct dependencies. You then change its specification:

name: Project
packages:
  - python=3.8
  - notebook
  - panel
-  - pandas=1.2
+  - pandas=1.3
env_specs:
  default: {}

Now it's time to lock the project again. It would be really nice if there was an option to re-lock the project so that the new lock has as few updates as possible. In the example and with this option, the new lock may:

just have the updated version of pandas, and that's it
or, have the updated version of pandas and potentially some new dependencies added in 1.3 and/or newer versions of its dependencies if those that were locked prove not to longer satisfy the 1.3 requirements.

Fix `conda` capitalization in help text

I believe there was recently a decision and guidance on capitalization of conda. We should ensure we use that convention in the help text.

❯ conda project
usage: conda-project [-h] [-V] command ...

Tool for encapsulating, running, and reproducing projects with Conda environments
...

Subprocess or Conda API?

Tools like conda-lock and anaconda-project require Conda but do not depend on Conda in their dependencies. They also integrate with Conda through subprocess calls.

Is there a disadvantage to using the Conda Python API and adding Conda a dependency in the meta.yaml?

[ENH] Quickly validate if lock file needs to be updated

As a conda-project user, I would like to quickly check whether conda project update would require a re-lock.

I would like to ensure that my lock file is only forced to be re-generated if my environment specification file(s) have changed. As a result, I can have a pre-commit hook that runs to ensure that my environment.yml and conda-project.yml are never out-of-sync in git, if I choose to use that hook.

Separate run defined commands from ad-hoc execution

Currently #38 utilizes conda project run [command] [args ...] to run either a defined command or any executable in the path of the activated env in your shell.

The run command should instead only execute defined commands, with optional arguments. A separate command, perhaps exec will support ad-hoc, one-time execution for undefined commands

Standardize on pep585 and pep604

pep585: parameterized typing with standard collections
pep604: Use | instead of typing.Union

Ensure that yaml anchors work

I think it's possible that forcing pydantic to fail on extra unsupported keys breaks some features of yaml

explore support for authenticated channels

https://github.com/conda-incubator/conda-lock#--strip-auth---auth-and---auth-file

support .env for environment variables

perhaps we can use the dot-env package (if that's the right name)

Add .gitignore to the created environment

See python/cpython#83417.

There is a general need for something like .projectignore and .gitignore within conda-project. By automatically adding a .gitignore to the installed envs we can be really certain they won't get checked in to a repo even if the project-level .gitignore is removed

Migrate from `setup.py` to `pyproject.toml`, still using `setuptools`

We receive this deprecation warning:

Due to recent updates to setuptools, we can now use its pyproject.toml support for editable and non-editable installs from static package metadata, instead of dynamically running setup.py.

Tutorial

In addition to things like getting started and reference docs is it useful to have a dedicated tutorial?

Consider enforcing repo.anaconda.com in tests

#22 (comment)

Activate leads to multiple shell init; messy PATH env var

I'm running into a problem where conda-project activate launches me into a shell with a muddled (and incorrect) PATH variable for the conda environment. See session below:

(base) π conda-project create -n learn-cp --directory cp-tutorial -c conda-forge --prepare notebook r-shiny r-essentials r-plotly python=3.10
Locking dependencies for default: done
Project created at /Users/USER/me/cp-tutorial

Downloading and Extracting Packages


Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
environment created at /Users/USER/me/cp-tutorial/envs/default

(base) π cd cp-tutorial/

# HOORAY!!  👇 
(base) π conda-project run which R
/Users/USER/me/cp-tutorial/envs/default/bin/R

# BOO!! 👇 
(base) π conda-project activate
## Project environment default activated in a new shell.
## Exit this shell to de-activate.
conda activate /Users/USER/me/cp-tutorial/envs/default

(base) π conda activate /Users/USER/me/cp-tutorial/envs/default

(default) π which R
/usr/local/bin/R  # RStudio distribution

# Lots of PATH setting :(
(default) π echo $PATH
/Users/USER/.docker/bin /Users/USER/bin /opt/homebrew/bin /usr/local/bin /System/Cryptexes/App/usr/bin /usr/bin /bin /usr/sbin /sbin /Users/USER/me/cp-tutorial/envs/default/bin /Users/USER/miniconda3/condabin /Users/USER/.docker/bin /Users/USER/bin /opt/homebrew/bin

It looks like, because the environment is replicated and passed into the new shell, the PATH variable continues to grow as each stop along the init trail prefixes itself. This might be OK as long as the conda environment's bin directory (/Users/USER/me/cp-tutorial/envs/default/bin above ☝️ ) is at the front. However, that's not happening currently.

Decide version convention

We can do semver or a variant of calver.

context-aware cli help

It will be helpful to the user if conda project (no arguments) and conda project --help provided specific information for the project in the current directory (or in --directory location), for things like envs and commands

[BUG] Strict utilization of channels

It is possible to write an environment.yml file without the channels: key, but this is not the best practice that we want to promote for conda-project. Secondly, even with the channels: key the globally configured channels are still respected.

Anaconda-project solves this by using --override-channels when the env is created there by ignoring the condarc channels: and forcing strict adherence to what is defined for the project.

Conda lock does the right thing here and sets override-channels and fails when channels: is missing so it would be good to piggyback on that

Drop Python 3.7 testing

Given that there are some testing errors due to changes in MagicMock between Python 3.7 and 3.8 in #38 and that NumPy no longer provides Python 3.7 support it may be beneficial for conda-project to discontinue testing against it.

[ENH] Prepare the local environment from environment.yml

The Need

the user has an environment.yml defining Conda, and optionally pip dependencies and environment variables, in their project directory and needs to create the local environment.

name: my-project

dependencies:
  - python=3.8
  - tranquilizer
  - pip
  - pip:
    - requests

variables:
  FOO: bar

What to expect

> conda project prepare
Collecting package metadata (repodata.json): done
Solving environment: done
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: / Ran pip subprocess with arguments:
['/Users/adefusco/Desktop/p/envs/default/bin/python', '-m', 'pip', 'install', '-U', '-r', '/Users/adefusco/Desktop/p/condaenv.mcxrqet4.requirements.txt']
Pip subprocess output:
Collecting requests
  Using cached requests-2.27.1-py2.py3-none-any.whl (63 kB)
Collecting charset-normalizer~=2.0.0
  Using cached charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
Requirement already satisfied: certifi>=2017.4.17 in ./envs/default/lib/python3.8/site-packages (from requests->-r /Users/adefusco/Desktop/p/condaenv.mcxrqet4.requirements.txt (line 1)) (2021.10.8)
Collecting idna<4,>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.9-py2.py3-none-any.whl (138 kB)
Installing collected packages: urllib3, idna, charset-normalizer, requests
Successfully installed charset-normalizer-2.0.12 idna-3.3 requests-2.27.1 urllib3-1.26.9

done
#
# To activate this environment, use
#
#     $ conda activate /Users/adefusco/Desktop/p/envs/default
#
# To deactivate an active environment, use
#
#     $ conda deactivate

classmethod or member method for create function in Python API

For reference: #22 (comment)

right now to create a new project you can

from conda_project import CondaProject

new_project = CondaProject.create('project-directory', dependencies=['python=3.8'])

This was initially modeled after Pandas classmethods like from_dict.

An alternative proposed by @mattkram is

from conda_project import CondaProject

new_project = CondaProject('project-directory')
new_project.create(dependencies=['python=3.8'])

conda-project's scope and alternatives

Hey all,

As you may know in the HoloViz group we have a project called pyctdev whose goal was to make running packaging/project management tasks easier. As a user you'd run doit env_create to create an environment, doit develop_install to install the dependencies, doit test_unit to run the test suite, doit package_build to build a package, doit package_upload to publish the built package, etc. Each one of these commands having various options, and all being driven by a flag/env var ecosystem that would either be conda or pip, making pyctdev a unified interface for conda and pip users.

While pyctdev was a noble idea that would make managing multiple projects easier, in practice the devs don't find it particularly appealing (holoviz-dev/pyctdev#104). So I've started to have a look around to see whether we could improve pyctdev or find a replacement or simply define a better workflow.

What is not so surprising in hindsight, but is still striking, is the variety of tools and workflows that are offered to Python users/developers. These tools can take care of Python version management, virtual environment management, dependency management, command execution, packaging and publishing (and more). These tools are more or less specialized. I've made a quick list, spending more time I would certainly find many more. I've added some info about their potential handling of conda when I found it.

pyenv
tox and tox-conda
nox (has some conda support)
pipenv (has some conda support)
pip-tools
hatch and hatch-conda
pdm (has some conda support)
pyflow
poetry
doit
invoke
poetpoet
devpy/pydevtool: two projects related to the efforts of SciPy to develop a nice developer CLI (based on doit)

What retained my attention looking at these tools is that there's a few of them that look like conda-project, at least as far as I understand them after just having a brief look. That would include poetry (11M downloads/month), hatch (90K downloads/month), pdm (75K downloads/month) and pyflow (200 downloads/month, very few!); they usually can manage dependencies, execute commands, ..., and build and publish projects.

conda users are generally Python and Google users, sooner or later they will face all this mess :) As such I believe it will be important for conda-project to clearly define its motivation and scope:

Why and when should I use a tool like conda-project?
How does conda-project define a "project"? (the tools I mentioned about all refer to project management, there's now the pyproject.toml file, so it's a pretty overloaded term)
When should I not use conda-project?

Comparing the capabilities of conda-project to other similar tools could be useful too to guide users when they choose their tooling.

Now a question, is conda-project going to allow users to build and publish their project? I.e. can I use conda-project to maintain a Python package? Or a conda package? Or both :) ? If not, I believe that it should be clearly stated, as I think that's not how Python users understand what a Python project is.

Finally, I would just like to say that now that conda supports plugins, I hope for the sake of its users that it won't end up with what seems to be a pretty messy situation.

Note

Here's a small record of how the four tools mention project in their docs:

poetry says in its README: "Poetry helps you declare, manage and install dependencies of Python projects, ensuring you have the right stack everywhere."
hatch says on its Github repo: "Modern, extensible Python project management"
pyflow says in its README: "Pyflow streamlines working with Python projects and files."
pdm mentions projects many times in its README

Prepare vs. Install

anaconda-project uses the subcommand prepare to represent the action that installs all dependencies into an environment (or environments). Other environment management tools such as poetry and pipenv use the verb install.

Personally, I find install to be more intuitive. I.e.:

conda project lock
conda project install

I'm opening this issue to discuss renaming this subcommand with an optional alias and potential deprecation path.

I propose to:

rename prepare to install
optionally alias prepare to invoke install
optionally define a deprecation path to emit a warning and eventually deprecate the prepare terminology

Thoughts?

Explore potential for async api

With the API discussion in #36 I think there may be value in exploring the development of an async API. Are there use cases where an async API can be advantageous?

refactor lock() context managers

From #22 (comment)

Try profiling conda-project operations

This seems like a good opportunity to try out https://github.com/plasma-umass/scalene, available on conda-forge

docs

readthedocsd

Secret support

I will be useful for variables to be overridden by docker/k8s style secret storage. The path to the runtime secrets folder can be provided as an env-var config to conda-project or a global cli argument (--secrets-dir).

If a mathching secret value is found it will override a value provided in the conda-project.yml file, but .env and shell variables will still retain the ability to override values.

I have identified cases where docker/k8s secrets are used to store the contents of a secret file, but my project requires the path to the file rather than the contents. For example this happens in Google services auth routines that need the path to a Json file with service account credetials. For these cases I propose a parameter that declares that if a secret is found for the variable its path is returned rather than its contents.

variables:
  FOO: bar
  CREDENTIALS_PATH:
    secret_path: True

In this example FOO has a default value that may get overridden by the contents of a secret at <secrets_dir>/FOO. CREDENTIALS_PATH will be set to the path <secrets_dir>/CREDENTIALS_PATH if found.

Finally, much of the functionality already implemented for variables and the secrets described above are implemented in pydatnic. There may be value in using its models for project variables.

https://docs.pydantic.dev/usage/settings/

Change naming convention for lockfiles

Currently, we generate lockfiles with conda-lock with the following format: {env_name}.conda-lock.yml. However, this has the downside that the files are not sorted consecutively.

Instead, we should use the format conda-lock.{env_name}.yml so it is clear what these files are for, and they appear consecutively in the file browser.

Add project variables to [de]activate.d scripts

It would be interesting to see how we can add project variables <prefix>/etc/conda/[de]activate.d collection so that conda activave ./envs/ will work.

A particular challenge with this will be that conda-project.yml can change after the env has been installed. It would be great if the activate.d scripts can look in the project file for the latest vars rather than locking them

Consider adopting conda-plugin approach

Two questions arise here

is it appropriate to require a newer version of conda on our user base?
can we build the conda plugin while still maintaining compatibility with older conda versions?

[BUG] Warn/fail when environment deviates from its spec

The scenario is that someone can run prepare to install the conda environment and then do conda install -p ./envs/default.

Conda project should then detect a change to the live env and warn or fail activities like run or archive. The prepare and lock steps should also be able to correct the state of the env as well.

[BUG] Update env to match what is specified in the yaml file

The need

User creates environment.yml
User runs conda project prepare to make the env from scratch
User modifies environment.yml where dependencies are either added or removed
conda project prepare should always modify the previously installed env to match what the environment.yml says.

Under the hood I would like to use conda env update --prune ... for this, but there is a open issue: conda/conda#7279. What happens is that conda env update --prune will install new packages that have been added to the yaml file, but will not uninstall packages that have been removed.

conda/conda#9614 may be a workable solution.

Add examples

Let's have a few example project directories added to the repo. For things like

environment.yml without conda-project.yml (perhaps not locked)
multi-env (locked)
commands and variables

[ENH] enable and ensure pip support

Make sure that pip support in environment.yml and conda project create works and is properly tested.

Migrate CLI to `click`

Migrate the CLI to use click instead of the current approach.

[ENH] Lock project dependencies

The need

The user has an environment.yml file specifying the minimal dependencies and the supported platforms. The user would like to have a fully-locked specification of the dependencies to ensure reproducibility for each platform.

name: my-project

dependencies:
  - python=3.8
  - tranquilizer
  - pip
  - pip:
    - requests

variables:
  FOO: bar

platforms:
  - osx-64
  - osx-arm64
  - linux-64
  - win-64

What to expect

The user runs a command to create a conda-lock.yml file locking dependencies for all platforms listed in the environment.yml (or a default set if not specified).

conda project lock

If a hash of the environment.yml file changes against one captured in the conda-lock.yml file commands like prepare and run should issue a warning.

Open questions

When a user adds dependencies to the environment.yml file and runs lock a second time should the full environment be re-solved or use a procedure like Conda's --freeze-installed to keep the previously locked dependencies where they are?
When running lock (without changes to the environment.yml file) should it re-lock the dependencies from scratch or do nothing? (anaconda-project uses update to re-lock)

Build anaconda-project -> conda-project conversion capability

I may not want to have a backwards-compatible runtime, but instead offer a way to translate the anaconda-project.yml† to appropriate conda-project files. We would need to setup a minimum level of parity between the two. One of the topics not yet considered here is jinja2 templating in commands.

†It may be possible to attempt a direct translation of anaconda-project-lock.yml to conda-lock.yml, but it may require reaching out the repo to retrieve certain metadata (like hashes) that anaconda-project-lock.yml does not capture.

[ENH] Build a project archive

The need

The user has a project directory, containing at least environment.yml, but can have a structure like below. The user would like to quickly create a .zip or .tar.[bz2|gz] file containing the essentials of their project (not including the live Conda environment) and respecting .projectignore and/or .gitignore files.

- project-dir/
  - .projectignore (or .gitingore or both)
  - .condarc
  - environment.yml
  - conda-project.yml
  - source.py
  - data/
    - cool-data.csv
  - envs/                          <------- this file generated by the prepare command

What to expect

conda project archive /path/to/<filename>.<ext>

pre-load intake catalog as builtin catalog

The scenario

environments:
  default:
    - environment.yml
catalogs:
  - https://....yaml
  - ./catalog.yaml

What do we have to do get this to work where the catalog is automatically loaded?

import intake

intake.gui

conda-incubator / conda-project Goto Github PK

conda-project's People

Contributors

Stargazers

Watchers

Forkers

conda-project's Issues

The need

What to expect

The Need

What to expect

The need

The need

What to expect

Open questions

The need

What to expect

Recommend Projects

Recommend Topics

Recommend Org