
mamba-org / quetz


The Open-Source Server for Conda Packages

Home Page: https://quetz.readthedocs.io/en/latest/

License: BSD 3-Clause "New" or "Revised" License

Python 96.78% Shell 0.11% HTML 0.48% Mako 0.05% CMake 0.06% Dockerfile 0.20% C 1.83% Jinja 0.48%

quetz's People

Contributors

adriendelsalle, alanderex, andreasalbertqc, baszalmstra, beenje, benjamb, brichet, btel, davidbrochart, dependabot[bot], ericdill, ernstluring, fcollonval, github-actions[bot], hbcarlos, ivergara, janjagusch, kuepe-sl, madhur-tandon, maresb, mariobuikhuizen, martinrenou, nowster, riccardoporreca, robinholzingerqc, simonbohnen, stevenryoung, sylvaincorlay, tom--pollard, wolfv

quetz's Issues

channel names should be case insensitive

Currently, if I have a RoboStack channel, it's different from robostack. However, I think (also for security reasons) that those names should be case insensitive and both should point to the same channel.
The same applies to user names, etc.
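
A minimal sketch of the proposed normalization (a suggestion, not current quetz behaviour; the helper name and allowed character set are assumptions):

import re

def normalize_channel_name(name: str) -> str:
    """Fold channel (or user) names to a canonical lowercase form before
    storing or looking them up, so RoboStack and robostack resolve to the
    same channel."""
    normalized = name.strip().lower()
    # Reject anything outside a conservative charset to avoid lookalike names.
    if not re.fullmatch(r"[a-z0-9][a-z0-9._-]*", normalized):
        raise ValueError(f"invalid channel name: {name!r}")
    return normalized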

Server auth for on-premise setup and client authorization

For a sealed-off on-premise setup, GitHub auth is almost surely out of the question. Are there any plans to provide something like LDAP or another authentication method?

Another use case where conda is also lacking is having multiple on-premise channels with different permission configurations. E.g., there are teams 1, 2, and 3; all of them have access to channel A, only teams 1 and 2 have access to channel B, and only teams 2 and 3 have access to channel C. I believe this would be the most common enterprise setup if you're running a single server instance. You could run multiple instances and manage it at the network layer, but then it becomes harder to manage, upgrade, etc., as you end up with a bajillion servers. I don't have an exact suggestion, but maybe that's something you've considered?

Thanks for the great work 👍

Refactor conf system and cli

I would like to refactor both the configuration system and the CLI to propose something more modular and to leverage existing libs.

Motivations:

  • replace the hand-rolled implementation of type checking, default values, required/optional entries, and sections vs entries with an existing lib that does this better
  • have a modular and extensible config system that paves the way for pluggable authenticators, storage, etc.
  • benefit from a CLI that supports argument parsing by design, not only a config file

I'm thinking about traitlets, which provides all of those features.
It could also be very convenient to reuse/share some JupyterHub implementations.

There is also pydantic and the promising pydantic-cli. That way we would reuse the pydantic dependency already required by FastAPI.
It looks like some Jupyter projects are also assessing pydantic. See this issue.

I'm still surprised that traitlets never became as popular as pydantic. Are there reasons explaining this difference in popularity that could help us make a choice?

Do you have some thoughts about that @wolfv @SylvainCorlay ?
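
For the sake of discussion, a minimal sketch of what the pydantic option could look like (pydantic v1-style BaseSettings; the sections and field names are illustrative, not quetz's actual schema):

from typing import Optional

from pydantic import BaseSettings

class SqlalchemySettings(BaseSettings):
    database_url: str = "sqlite:///./quetz.sqlite"

class SessionSettings(BaseSettings):
    secret: str
    https_only: bool = True

class QuetzSettings(BaseSettings):
    # Each config section becomes a nested, typed model with defaults
    # and required fields declared in one place.
    sqlalchemy: SqlalchemySettings = SqlalchemySettings()
    session: Optional[SessionSettings] = None

    class Config:
        env_prefix = "QUETZ_"

settings = QuetzSettings()
print(settings.sqlalchemy.database_url)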

conda version ordering

We need to implement (or adopt) a function that orders versions according to the conda spec.
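
One option (a sketch, assuming conda is importable in the server environment) is to reuse conda's own VersionOrder rather than re-implementing the spec:

from conda.models.version import VersionOrder

def sort_versions(versions):
    """Sort version strings according to the conda version spec."""
    return sorted(versions, key=VersionOrder)

print(sort_versions(["1.0.1", "0.4.1rc", "0.4.1", "1.0"]))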

Bootstrap install

Hi!

It could be convenient to create a simple quetz CLI (using Typer?) to deploy a new Quetz server.

Is there any plan to containerize Quetz with Docker, docker-compose, Helm, etc.?
Is it related to leveraging the JupyterHub machinery for this project?

Thank you for this project!

Think about cache proxying packages

One interesting idea that came up was to configure quetz so that it proxies the (requested) packages from e.g. anaconda.org.

This could be easy to implement (see the sketch after this list):

  • We just need to serve the repodata from the original location
  • Download & forward the packages upon request
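
A rough sketch of that idea (not quetz's actual API; the upstream URL, cache directory, and route layout are assumptions):

import os
import shutil

import requests
from fastapi import FastAPI
from fastapi.responses import FileResponse, RedirectResponse

app = FastAPI()
UPSTREAM = "https://conda.anaconda.org/conda-forge"
CACHE_DIR = "proxy-cache"

@app.get("/proxy/{subdir}/repodata.json")
def repodata(subdir: str):
    # Always defer to the original location for the index itself.
    return RedirectResponse(f"{UPSTREAM}/{subdir}/repodata.json")

@app.get("/proxy/{subdir}/{filename}")
def package(subdir: str, filename: str):
    cached = os.path.join(CACHE_DIR, subdir, filename)
    if not os.path.exists(cached):
        os.makedirs(os.path.dirname(cached), exist_ok=True)
        with requests.get(f"{UPSTREAM}/{subdir}/{filename}", stream=True) as r:
            r.raise_for_status()
            with open(cached, "wb") as f:
                shutil.copyfileobj(r.raw, f)
    # Download & forward: subsequent requests are served from the local cache.
    return FileResponse(cached)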

Parse config from a file

Is there a preferred format (YAML, JSON, INI, TOML)?

While this could perhaps be a separate issue, it would also be nice to have config values exposed as command-line options (configargparse would be good for this).

uvicorn could then be run programmatically, with the port/listen address (and other settings) loaded from the config.
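
A sketch of how that could look, assuming a TOML config file (the [server] section and key names are illustrative, not quetz's actual schema): load settings from the file, let command-line flags override them, then start uvicorn programmatically.

import argparse

import toml
import uvicorn

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="config.toml")
parser.add_argument("--host")
parser.add_argument("--port", type=int)
args = parser.parse_args()

# Command-line options win over the config file, which wins over defaults.
cfg = toml.load(args.config).get("server", {})
host = args.host or cfg.get("host", "127.0.0.1")
port = args.port or cfg.get("port", 8000)

uvicorn.run("quetz.main:app", host=host, port=port)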

Staging mode for repodata patches

Currently applying repodata patches is hard to test. It would be cool to have a plugin to stage repodata patches so that one can run some tests against the patches, before rolling them out to everyone.

Implement a whatprovides backend

It would be awesome to have a "whatprovides" backend that has an index over

  • binaries (commands) that are in packages
  • shared libraries that are in packages
  • python modules that are in packages

So that we can have a mamba command that can do mamba repoquery whatprovides ncurses.6.so and the correct packages are returned.

Other queries:

mamba repoquery whatprovides 7zip

to find the packages that contain 7zip.exe etc.
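
A minimal sketch of how such a reverse index could be built (assuming old-style .tar.bz2 packages; the example package path is illustrative): record every file contained in each package so that a whatprovides query becomes a dictionary lookup.

import tarfile
from collections import defaultdict

def index_package(index: dict, package_path: str, package_name: str) -> None:
    """Add every file shipped in the package to the filename -> packages index."""
    with tarfile.open(package_path, "r:bz2") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # Key by the basename so queries like "7zip.exe" or a shared
                # library name work without knowing the full path.
                index[member.name.rsplit("/", 1)[-1]].add(package_name)

index: dict = defaultdict(set)
# index_package(index, "path/to/xtensor-0.21.5-hc9558a2_0.tar.bz2", "xtensor")
# index["xtensor.hpp"] -> {"xtensor"}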

Figure out how to handle private channels

While we have private channels, we need to implement proper handling for them. Right now, for example, I think every user can see the contents of private channels.

Errors when running the README.md test/dev example (proposed fix at the end of the issue)

Running the README.md installation process of quetz:

  • environment creation -> OK

  • getting the sources from github -> OK

  • installation (pip install -e quetz) -> OK

  • running the dev server -> OK

quetz run test_quetz --create-conf --dev --reload --port 8888
  • uploading xtensor packages -> OK

  • getting the xtensor packages from the server with mamba -> BROKEN

    • mamba trace:
  $ mamba install --strict-channel-priority -c http://localhost:8888/channels/channel0 -c conda-forge xtensor

        [mamba ASCII-art banner]

        mamba (0.5.1) supported by @QuantStack

        GitHub:  https://github.com/TheSnakePit/mamba
        Twitter: https://twitter.com/QuantStack


channels/channel0/linux- [====================] (00m:00s) Done
channels/channel0/noarch [====================] (00m:00s) 404 Failed

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/home/quetz/miniconda3/envs/quetz/lib/python3.8/site-packages/conda/exceptions.py", line 1079, in __call__
        return func(*args, **kwargs)
      File "/home/quetz/miniconda3/envs/quetz/lib/python3.8/site-packages/mamba/mamba.py", line 941, in exception_converter
        raise e
      File "/home/quetz/miniconda3/envs/quetz/lib/python3.8/site-packages/mamba/mamba.py", line 935, in exception_converter
        exit_code = _wrapped_main(*args, **kwargs)
      File "/home/quetz/miniconda3/envs/quetz/lib/python3.8/site-packages/mamba/mamba.py", line 894, in _wrapped_main
        result = do_call(args, p)
      File "/home/quetz/miniconda3/envs/quetz/lib/python3.8/site-packages/mamba/mamba.py", line 778, in do_call
        exit_code = install(args, parser, "install")
      File "/home/quetz/miniconda3/envs/quetz/lib/python3.8/site-packages/mamba/mamba.py", line 411, in install
        index = get_index(
      File "/home/quetz/miniconda3/envs/quetz/lib/python3.8/site-packages/mamba/utils.py", line 73, in get_index
        is_downloaded = dlist.download(True)
    RuntimeError: Multi-download failed.

 $ /home/quetz/miniconda3/envs/quetz/bin/mamba install --strict-channel-priority -c http://localhost:8888/channels/channel0 -c conda-forge xtensor

  environment variables:
                 CIO_TEST=<not set>
        CONDA_DEFAULT_ENV=quetz
                CONDA_EXE=/home/quetz/miniconda3/bin/conda
             CONDA_PREFIX=/home/quetz/miniconda3/envs/quetz
           CONDA_PREFIX_1=/home/quetz/miniconda3
    CONDA_PROMPT_MODIFIER=(quetz)
         CONDA_PYTHON_EXE=/home/quetz/miniconda3/bin/python
               CONDA_ROOT=/home/quetz/miniconda3/envs/quetz
              CONDA_SHLVL=2
           CURL_CA_BUNDLE=<not set>
                     PATH=/home/quetz/miniconda3/envs/quetz/bin:/home/quetz/miniconda3/condabin:
                          /home/quetz/bin:/usr/local/bin:/usr/local/sbin:/usr/local/sbin:/usr/lo
                          cal/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/sna
                          p/bin
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>

     active environment : base
    active env location : /home/quetz/miniconda3/envs/quetz
            shell level : 2
       user config file : /home/quetz/.condarc
 populated config files :
          conda version : 4.8.4
    conda-build version : 3.19.2
         python version : 3.8.5.final.0
       virtual packages : __glibc=2.23
       base environment : /home/quetz/miniconda3/envs/quetz  (writable)
           channel URLs : http://localhost:8888/channels/channel0/linux-64
                          http://localhost:8888/channels/channel0/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/quetz/miniconda3/envs/quetz/pkgs
                          /home/quetz/.conda/pkgs
       envs directories : /home/quetz/miniconda3/envs/quetz/envs
                          /home/quetz/.conda/envs
               platform : linux-64
             user-agent : conda/4.8.4 requests/2.24.0 CPython/3.8.5 Linux/4.4.0-187-generic ubuntu/16.04.7 glibc/2.23
                UID:GID : 1004:1004
             netrc file : None
           offline mode : False


An unexpected error has occurred. Conda has prepared the above report.
  • trace on quetz server side:
INFO:     127.0.0.1:38086 - "GET /channels/channel0/noarch/repodata.json HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:38082 - "GET /channels/channel0/linux-64/repodata.json HTTP/1.1" 200 OK
  • getting the xtensor package with conda -> BROKEN

    • conda trace:
$ conda  install --strict-channel-priority -c http://localhost:8888/channels/channel0 -c conda-forge xtensor
Collecting package metadata (current_repodata.json): failed

UnavailableInvalidChannel: The channel is not accessible or is invalid.
  channel name: channels/channel0
  channel url: http://localhost:8888/channels/channel0
  error code: 404

You will need to adjust your conda configuration to proceed.
Use  conda config --show channels  to view your configuration's current state,
and use  conda config --show-sources  to view config file locations.
  • server logs
INFO:     127.0.0.1:38064 - "GET /channels/channel0/noarch/current_repodata.json HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:38060 - "GET /channels/channel0/linux-64/current_repodata.json HTTP/1.1" 200 OK
INFO:     127.0.0.1:38064 - "GET /channels/channel0/noarch/repodata.json HTTP/1.1" 404 Not Found
$ conda  install -d --strict-channel-priority -c conda-forge xtensor
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: done
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: done

## Package Plan ##

  environment location: /home/quetz/miniconda3

  added / updated specs:
    - xtensor


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    xtensor-0.21.5             |       hc9558a2_0         172 KB  conda-forge
    xtl-0.6.17                 |       hc9558a2_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         234 KB

The following NEW packages will be INSTALLED:

  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
  libgomp            conda-forge/linux-64::libgomp-9.3.0-h24d8f2e_16
  python_abi         conda-forge/linux-64::python_abi-3.8-1_cp38
  xtensor            conda-forge/linux-64::xtensor-0.21.5-hc9558a2_0
  xtl                conda-forge/linux-64::xtl-0.6.17-hc9558a2_0

The following packages will be UPDATED: etc....
  • content of the test repository (after xtensor upload)
$ tree ../test_quetz/channels/
../test_quetz/channels/
`-- channel0/
    |-- channeldata.json
    |-- channeldata.json.bz2
    |-- index.html
    |-- linux-64/
    |   |-- current_repodata.json
    |   |-- current_repodata.json.bz2
    |   |-- index.html
    |   |-- repodata.json
    |   |-- repodata.json.bz2
    |   `-- xtensor-0.16.1-0.tar.bz2
    `-- osx-64/
        |-- current_repodata.json
        |-- current_repodata.json.bz2
        |-- index.html
        |-- repodata.json
        |-- repodata.json.bz2
        `-- xtensor-0.16.1-0.tar.bz2

3 directories, 15 files

local hack

  • faking a noarch repo
mkdir ../test_quetz/channels/channel0/noarch
echo '{}' > ../test_quetz/channels/channel0/noarch/repodata.json
echo '{}' > ../test_quetz/channels/channel0/noarch/current_repodata.json
  • retry the xtensor install
$ conda  install -d  --strict-channel-priority -c http://localhost:8888/channels/channel0 -c conda-forge xtensor
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: done

## Package Plan ##

  environment location: /home/quetz/miniconda3

  added / updated specs:
    - xtensor


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    xtensor-0.16.1             |                0         109 KB  http://localhost:8888/channels/channel0
    xtl-0.4.16                 |    h6bb024c_1000          47 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         155 KB

The following NEW packages will be INSTALLED: etc ...

Pull-mirror functionality to experiment with bot ideas

It would be good to have a way to fill the quetz package server with an initial set of packages that can be selected through selectors and downloaded e.g. from the anaconda.org repo.

The quetz server should probably have facilities to periodically scrape the endpoint for new package versions.

cc @beckermr

Rollback database on handle_package_files() errors

quetz/quetz/main.py

Lines 627 to 631 in 5ec70ae

pkgstore.create_channel(channel_name)
dest = os.path.join(condainfo.info["subdir"], file.filename)
file.file._file.seek(0)
pkgstore.add_package(file.file, channel_name, dest)

If any of the pkgstore operations fail, we should revert the changes made to the database. A 'force' option would likely also fix this issue should it ever occur.
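
A hedged sketch of that behaviour (the session handle is assumed to be a SQLAlchemy Session; the pkgstore calls inside the block are the existing ones from main.py):

from contextlib import contextmanager

@contextmanager
def rollback_on_error(db_session):
    """Commit the session if the block succeeds, roll it back otherwise,
    so no orphaned metadata is left behind when a pkgstore operation fails."""
    try:
        yield
        db_session.commit()
    except Exception:
        db_session.rollback()
        raise

# Usage (sketch):
# with rollback_on_error(db):
#     pkgstore.create_channel(channel_name)
#     pkgstore.add_package(file.file, channel_name, dest)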

API permission errors

Logged in without a role, I can:

  • See members of channels

listUsers

  • Add packages to a channel

addPackage

When trying to upload a file, there is an Internal Server Error message on both endpoints.
I tried to upload a PDF as a package to a new channel to test, and my user doesn't have a role.
(screenshot: 2020-11-17 at 16:24:57)

Package variants, backports

Currently, one pain point in conda-forge is variants and backports.

It would be good to have a package-view that clearly shows what packages are available, and what package variants exist.
The variants can be read from the package "hash".

Common variants are

  • the compiler (version)
  • the python version
  • the boost-cpp version
  • many other pinned packages

However, during migrations these things sometimes go out of sync. E.g. when a migration hasn't finished for a dependency but a new version of the package was released, it might only be built against the migrated version of the dependency.

Therefore, being able to backport versions is sometimes quite nice. With conda-forge, this can be done by creating a branch in the repository and merging a PR to that branch. But the GitHub UI doesn't make it easy to discover and understand those branches.
Having a clear visualization of all the package versions and available variants, and a straightforward way to create backports from a user interface, could be a nice improvement.

For other packages (e.g. those derived from PyPI) we know the older versions that exist. It could also be cool to be able to quickly select a bunch of old versions that one would want to have available, and a bot comes around and makes the PRs.

Integrate with "feedstocks", build farm & grayskull

It could be cool to integrate quetz with grayskull and a "build farm" (the build farm could e.g. be GitHub repositories, as in the case of conda-forge).
The grayskull project has already explored a webservice to generate recipes from PyPI packages.

If we had that as a plugin we could automatically generate new recipes as needed from the frontend of quetz and add them to the build queue, and then they would appear on the package server.

Multiple copyrights?

Hi!

I'm not used to copyrights and licenses. I noticed that multiple files were added to the project with a different copyright than the one in the LICENSE file. See the indexing file.

As a contributor, I authored the cli file and modified many others, but used or kept the QuantStack copyright as is.
The GitHub license agreement talks about licensing but not about copyrights, though.

Some arguments:

  • my contributions are already acknowledged by the commit and file history (the commit log already records the copyright of my contributions without an explicit notice)
  • if each contributor added their own explicit copyright header when creating or modifying a file, it would be a mess
  • during the project's lifetime, someone's contributions to a file may disappear due to modifications/patches/refactoring/etc.; maintaining a consistent copyright header would be a nightmare
  • AUTHORS and CONTRIBUTORS files at the project level are maybe more appropriate for giving special public recognition

What are the rules on that @wolfv?

Thank you!

Changelog / as-of request

Having a changelog for conda channels may have many benefits

  • enable simpler (full) mirroring
  • allowing requests for packages "as of" a certain date, when only certain package versions were available, for better reproducibility. This may be useful for use cases such as Binder (cc @minrk).
  • channels that don't have a changelog could simply ignore the command line argument.

Logging in dummy users won't work

In the last update (commit 5409fc), logging in dummy users via /api/dummyuser/alice etc. won't work (it gets redirected to the home page without an active user session).

By bisecting, I found that the regression was introduced in this commit:
d33ac74

Handle repodata patches

We need to figure out how repodata patches work, and how we can apply them in quetz.
This is important work, also with regards to the mirroring feature.

For example: should only the "main" server generate the canonical repodata.json (with patches applied) and the mirror servers use the repodata.json from that server, or should each mirror server generate their own repodata with the patches?
And how will that affect the changelog feature? #87

cc @btel

Failed package uploads still appear in repodata.json

It took me a while to get the S3 backend working, which resulted in lots of failed upload attempts. In spite of the package failing to upload due to S3 permission errors, the package still appears in the repodata.json. Quetz should probably be smart enough to not update the repodata.json if the package fails to upload to S3.

fsspec filesystem handling link error

With the recent changes, the following error seems to happen on Fedora 32 and Ubuntu 16.04:

INFO:     127.0.0.1:38530 - "POST /api/channels/channel0/files/ HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/fastapi/applications.py", line 181, in __call__
    await super().__call__(scope, receive, send)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/applications.py", line 102, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/middleware/sessions.py", line 75, in __call__
    await self.app(scope, receive, send_wrapper)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/routing.py", line 550, in __call__
    await route.handle(scope, receive, send)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/fastapi/routing.py", line 196, in app
    raw_response = await run_endpoint_function(
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/fastapi/routing.py", line 149, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/wolfv/Programs/quetz/quetz/main.py", line 400, in post_file
    handle_package_files(channel.name, files, dao, auth, force,
  File "/home/wolfv/Programs/quetz/quetz/main.py", line 458, in handle_package_files
    pkgstore.add_package(channel_name, file.file, dest)
  File "/home/wolfv/Programs/quetz/quetz/pkgstores.py", line 55, in add_package
    pkg.write(src.read())
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/fsspec/transaction.py", line 24, in __exit__
    self.complete(commit=exc_type is None)
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/fsspec/transaction.py", line 37, in complete
    f.commit()
  File "/home/wolfv/miniconda3/envs/quetz/lib/python3.8/site-packages/fsspec/implementations/local.py", line 244, in commit
    os.replace(self.temp, self.path)
OSError: [Errno 18] Invalid cross-device link: '/tmp/tmpfdh48fat' -> '/home/wolfv/Programs/quetz/test_quetz/channels/channel0/linux-64/coin-or-cgl-0.60.3-hf484d3e_0.tar.bz2'

I don't know yet how to fix it.

Any ideas @nowster ?

I am not sure why copyfileobj should attempt to link here...
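
The error seems to come from os.replace() being asked to move fsspec's temporary file across filesystems (/tmp and the home directory apparently live on different devices). One possible workaround, sketched under that assumption and with an illustrative function name: skip the temp-file-and-rename dance for the local store and stream the upload straight into the destination path.

import os
import shutil

def add_package_local(src, channels_dir: str, channel: str, dest: str) -> None:
    """Write the uploaded file object `src` directly into the channel tree.

    shutil.copyfileobj works regardless of which device the target lives on,
    so no cross-device os.replace() is needed.
    """
    full_path = os.path.join(channels_dir, channel, dest)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as f:
        shutil.copyfileobj(src, f)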

Fine-grained permissions for pulling?

Hi!

This issue aims to start a discussion about how to implement this feature.

What

Fine-grained perms for pulling is a major feature for individuals/groups/companies who want:

  • access control/segregation
  • activity logging (for security purposes, metrics aggregation)

How

Implement token-based authentication for pulling, to stay 100% compatible with the conda CLI and anaconda package repos.
Tokens could be generated for:

  • users
  • groups (contain users)
  • organizations (contain groups and/or users)

Discussion

repodata.json

The Quetz server should provide a repodata.json view depending on who is asking for it:

  • we should not expose information about which packages are hosted by the server if they are private and the user does not have permission to download them
  • getting a correct package resolution and then a failure when trying to download some artifacts is not logical

The machinery to provide this per-user view has to be very efficient, maybe using caching; large channels could generate heavy workloads.
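
A rough sketch of such a per-user repodata view (all names here are illustrative): strip out packages the requesting user may not download before returning repodata.json.

def filter_repodata(repodata: dict, allowed: set) -> dict:
    """Return a copy of repodata with only the packages the user may pull."""
    filtered = dict(repodata)
    for key in ("packages", "packages.conda"):
        filtered[key] = {
            fn: meta
            for fn, meta in repodata.get(key, {}).items()
            if meta.get("name") in allowed
        }
    return filtered

repodata = {"packages": {"secret-1.0-0.tar.bz2": {"name": "secret"}}}
print(filter_repodata(repodata, allowed={"xtensor"}))  # "packages" comes back empty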

[feature-request] auto-transmute

The new .conda package format is more efficient than the old .tar.bz2 format.

The conda-package-handling library can transmute .tar.bz2 packages into the new .conda format. To make the quetz server as fast and efficient as possible, it would be great to be able to cache transmuted packages when the source was a .tar.bz2 package, so that quetz only serves .conda packages.
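
A sketch of lazy transmuting with conda-package-handling (the api.transmute call and its arguments are assumed from cph's documented interface; adjust as needed):

import os

from conda_package_handling import api

def ensure_conda_format(tarbz2_path: str, cache_dir: str) -> str:
    """Return a cached .conda equivalent of a .tar.bz2 package,
    transmuting it on first request."""
    name = os.path.basename(tarbz2_path).replace(".tar.bz2", ".conda")
    out_path = os.path.join(cache_dir, name)
    if not os.path.exists(out_path):
        os.makedirs(cache_dir, exist_ok=True)
        api.transmute(tarbz2_path, ".conda", out_folder=cache_dir)
    return out_path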

new ci builds fail on unit tests

with the following error:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    """
    h2/frame_buffer
    ~~~~~~~~~~~~~~~
    
    A data structure that provides a way to iterate over a byte buffer in terms of
    frames.
    """
>   from hyperframe.exceptions import InvalidFrameError, InvalidDataError
E   ImportError: cannot import name 'InvalidDataError' from 'hyperframe.exceptions' (/home/runner/test_env/lib/python3.8/site-packages/hyperframe/exceptions.py)

/home/runner/test_env/lib/python3.8/site-packages/h2/frame_buffer.py:9: ImportError

https://github.com/mamba-org/quetz/pull/108/checks?check_run_id=1153496549

It's likely due to the recent release of h2 4.0.0 (the tests are known to work with 3.2.0): https://github.com/python-hyper/hyper-h2/releases/tag/v4.0.0

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedObject) type "blob" does not exist

With the following config, I tried to create a new quetz instance:

config (internal info masked with {var})

[github]
# TODO: Figure out if there are any other auth schemes available?
# Register the app here: https://github.com/settings/applications/new
client_id = "{id}"
client_secret = "{secret}"

[sqlalchemy]
# TODO: See if we can use an aurora postgres backend here
database_url = "postgres+psycopg2://{username}:{password}@{host}:5432/quetz"
#database_url = "sqlite:///./quetz.sqlite"

[session]
# openssl rand -hex 32
secret = "{secret}"
https_only = false

[s3]
bucket_prefix="s3://{bucket}"

And got the following stack trace:

$ quetz create quetz_run --copy-conf ./dev_config.toml
Copying config file from ./dev_config.toml to quetz_run/config.toml
Traceback (most recent call last):
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedObject: type "blob" does not exist
LINE 3:  id BLOB NOT NULL,
            ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/userenvs/quetz/quetz/bin/quetz", line 33, in <module>
    sys.exit(load_entry_point('quetz', 'console_scripts', 'quetz')())
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/quetz/src/quetz/cli.py", line 286, in create
    db = get_session(config.sqlalchemy_database_url)
  File "/opt/quetz/src/quetz/database.py", line 26, in get_session
    Base.metadata.create_all(engine)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 4555, in create_all
    bind._run_visitor(
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2097, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1656, in _run_visitor
    visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/sql/visitors.py", line 145, in traverse_single
    return meth(obj, **kw)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/sql/ddl.py", line 783, in visit_metadata
    self.traverse_single(
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/sql/visitors.py", line 145, in traverse_single
    return meth(obj, **kw)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/sql/ddl.py", line 827, in visit_table
    self.connection.execute(
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/sql/ddl.py", line 72, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1068, in _execute_ddl
    ret = self._execute_context(
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/opt/userenvs/quetz/quetz/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedObject) type "blob" does not exist
LINE 3:  id BLOB NOT NULL,
            ^

[SQL:
CREATE TABLE users (
	id BLOB NOT NULL,
	username VARCHAR,
	PRIMARY KEY (id)
)

]
(Background on this error at: http://sqlalche.me/e/13/f405)

Looks like this might need to be replaced with the LargeBinary type, according to this somewhat random GitHub issue and the SQLAlchemy docs. I'll give LargeBinary a shot and report back.
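
A sketch of that fix: SQLAlchemy's generic LargeBinary type renders as BLOB on SQLite and BYTEA on PostgreSQL, so declaring the id column with it (instead of a dialect-specific BLOB) should keep both backends happy. The model below is a stripped-down illustration, not quetz's actual users table.

import uuid

from sqlalchemy import Column, LargeBinary, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    # LargeBinary is portable; the raw UUID bytes are stored as the key.
    id = Column(LargeBinary, primary_key=True, default=lambda: uuid.uuid4().bytes)
    username = Column(String)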

Support for S3

It's great to see an open-source Conda/Mamba package repository, definitely a step in the right direction.

It would be great if Quetz were to use the S3 API to store objects. Aside from the ease of scaling horizontally it would provide, we could:

  • Use object locking to avoid unwanted side-effects from simultaneous uploads (e.g. race conditions during indexing);
  • Defer writes to disk (indexing aside), relying on the durability and reliability of S3;

Would you be open to a contribution implementing support for this?

GitHub-style search syntax

Where people can filter with author:wolfv is:pr...

We could do labels like:

channel:conda-forge version:>0.2.5

or stuff like that
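
A toy parser for the proposed syntax (purely illustrative): split a query into field:value filters plus free-text terms, and let the API translate the filters into database predicates.

def parse_query(query: str):
    """Split "channel:conda-forge version:>0.2.5 xtensor" into filters and terms."""
    filters, terms = {}, []
    for token in query.split():
        if ":" in token:
            field, value = token.split(":", 1)
            filters[field] = value
        else:
            terms.append(token)
    return filters, terms

print(parse_query("channel:conda-forge version:>0.2.5 xtensor"))
# -> ({'channel': 'conda-forge', 'version': '>0.2.5'}, ['xtensor'])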

Bake-in the notion of mirror

In order to reduce the cost of community hosting of quetz, and to enable organizations to have a local mirror, we could bake in the notion of official mirrors of quetz servers, with a push mechanism.

A user of the main Quetz server could register a mirror for a channel in the UI, and get

  • an initial "backfill" of the available packages.
  • pushes of uploaded packages to their mirror.

Virtual channels/repo

Hi!

I was thinking about creating virtual channels/repos as an aggregate of multiple channels/repos with a search order. This is a capability offered by repository managers.

A virtual channel/repo allows access to multiple channels or repositories (local, remote/proxy, or other virtual ones) through a single request URL. The virtual channel configuration sets the priorities.

Why:

  • simplify access to and configuration of multiple channels (an alternative to a more complex .condarc file)
  • provide a single merged repodata file, processed by the server?
    • minimize api requests/network load
    • impact on package resolution efficiency?
  • no risk of a bad configuration file
  • enforce search order by exposing only the virtual channel and not the others separately
  • suggestions?

Does it make sense?
It's a starting point, to be discussed!

noarch channel is not created until a noarch package is uploaded

Running quetz with the S3 backend. I uploaded one osx-64 package and it successfully made it into the S3 bucket, but the noarch directory in that bucket wasn't created (or maybe it's a database-level issue, not totally sure). In any event, I saw this in the logs:

INFO:     10.100.180.208:45382 - "GET /channels/dev/osx-64/current_repodata.json HTTP/1.1" 200 OK
INFO:     10.100.180.208:45380 - "GET /channels/dev/noarch/current_repodata.json HTTP/1.1" 404 Not Found
INFO:     10.100.180.208:45382 - "GET /channels/dev/noarch/repodata.json HTTP/1.1" 404 Not Found

and got this message on the client side:

Collecting package metadata (current_repodata.json): failed

UnavailableInvalidChannel: The channel is not accessible or is invalid.
  channel name: channels/dev
  channel url: http://quetz.dev.dsci.zones.dtn.services/channels/dev
  error code: 404

You will need to adjust your conda configuration to proceed.
Use `conda config --show channels` to view your configuration's current state,
and use `conda config --show-sources` to view config file locations.

After uploading a noarch package, things work fine. Probably need to make sure that a noarch subdir exists at some point.
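
A hedged sketch of that fix (the pkgstore.add_file call is an assumption about the store interface): when a channel is created, also write an empty repodata.json into the noarch subdir so clients never hit a 404 before the first noarch upload.

import json

EMPTY_REPODATA = json.dumps(
    {"info": {"subdir": "noarch"}, "packages": {}, "packages.conda": {}}
)

def ensure_noarch(pkgstore, channel_name: str) -> None:
    """Seed the noarch subdir with empty index files at channel creation time."""
    for fn in ("repodata.json", "current_repodata.json"):
        pkgstore.add_file(EMPTY_REPODATA, channel_name, f"noarch/{fn}")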

Bot integrations

This is an issue to discuss bot-related ideas to support the conda-forge usecases.

One idea is to have actions that run on or after package upload (or maybe when the package is freshly indexed).
These actions could be registered as Python plugins (similar to Jupyter server plugins), and store data on disk or in the database.

Ideas for actions:

  • Push package to quetz mirrors
  • Extract metadata from packages:
    • collect all file names that are in a package to create a reverse index from filename to package cc @croth1
    • extract symbol names or Python AST function signatures to enrich the metadata for bot consumption cc @CJ-Wright
    • figure out what programming language a package belongs to to potentially namespace it later on
  • Send out email notifications to feedstock maintainers

Env var override suggestion

Having spent some time recently with airflow, I wonder what your thoughts are on handling env var overrides in Quetz similar to how they do it in Airflow:

AIRFLOW__{CONFIG_SECTION}__{CONFIG_KEY}

Basically, separate QUETZ, CONFIG_SECTION and CONFIG_KEY with double underscores __ instead of single underscores _. This has a couple of benefits: it's easier to parse visually and it also enforces uniqueness between sections. In theory you could have config like:

[config_test]
url = 'some url'
[config]
test_url = 'some other url'

and so, with the current scheme in Quetz, these two would have the identical environment variable QUETZ_CONFIG_TEST_URL, whereas with the double underscore you'd get QUETZ__CONFIG_TEST__URL and QUETZ__CONFIG__TEST_URL.
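
A small sketch of the proposed parsing (the prefix and separator reflect the suggestion, not current quetz behaviour):

import os

def env_overrides(prefix: str = "QUETZ__"):
    """Collect QUETZ__SECTION__KEY environment variables as (section, key) overrides."""
    overrides = {}
    for name, value in os.environ.items():
        if name.startswith(prefix):
            section, _, key = name[len(prefix):].partition("__")
            overrides[(section.lower(), key.lower())] = value
    return overrides

# QUETZ__CONFIG_TEST__URL=... -> {('config_test', 'url'): '...'}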

Thoughts?

Delete channel endpoint

Currently, there is no way to delete a channel.
Do we have endpoints to delete other things?

adding two versions of same package with sqlite backend causes segfaults

steps to reproduce:

# start quetz without  reload
quetz run test_quetz_test12  --dev --copy-conf dev_config.toml 

export QUETZ_API_KEY=...

curl -X POST "http://localhost:8000/api/channels" \
   -H  "accept: application/json" \
   -H  "Content-Type: application/json" \
   -H  "X-API-Key: ${QUETZ_API_KEY}" \
   -d '{"name":"channel12", "private":false }'
   
 quetz-client http://localhost:8000/api/channels/channel12 quetz/tests/data/test-package-0.1-0.tar.bz2 
 
 quetz-client http://localhost:8000/api/channels/channel12 quetz/tests/data/test-package-0.2-0.tar.bz2 

Result: segmentation fault

Triaging

This might be related to the fact that background tasks in FastAPI run in separate threads. When we add files, the indexes are updated in a background task using the function quetz.indexing.update_indexes:

quetz/quetz/main.py

Lines 728 to 729 in d0b7b2d

if update_indexes:
background_tasks.add_task(indexing.update_indexes, dao, pkgstore, channel_name)

Running the function directly (in the foreground) does not cause the segfault. The problem might be that we are reusing the same database session (attached to the dao object) in several threads, which is known to cause problems with SQLite. Postgres does not seem to have this problem (although reusing a db session in multiple threads still seems evil).
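
A sketch of one possible remedy, under the assumption that the fix is to stop sharing sessions across threads: give each background task its own short-lived session created from a sessionmaker, and close it when the task finishes.

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///./quetz.sqlite")
SessionLocal = sessionmaker(bind=engine)

def update_indexes_task(channel_name: str) -> None:
    """Runs in the background thread with a session it owns exclusively."""
    db = SessionLocal()
    try:
        # rebuild the dao on top of `db` here and call
        # indexing.update_indexes(dao, pkgstore, channel_name)
        pass
    finally:
        db.close()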

UI for repoquery depends

mamba repoquery depends is a great tool for understanding a package's dependencies. It could be made even more useful if there were a nice UI that allowed collapsing/filtering the dependencies of a package (or group of packages).
