datalad / datalad-container

DataLad extension for containerized environments

Home Page: http://datalad.org

License: Other

Languages: Makefile 0.21%, Python 98.86%, Shell 0.71%, Batchfile 0.04%, Jinja 0.18%

Topics: container, datalad

datalad-container's Introduction

 ____          _           _                 _
|  _ \   __ _ | |_   __ _ | |      __ _   __| |
| | | | / _` || __| / _` || |     / _` | / _` |
| |_| || (_| || |_ | (_| || |___ | (_| || (_| |
|____/  \__,_| \__| \__,_||_____| \__,_| \__,_|
                                   Container


This extension enhances DataLad (http://datalad.org) for working with computational containers. Please see the extension documentation for a description of the additional commands and functionality.

For general information on how to use or contribute to DataLad (and this extension), please see the DataLad website or the main GitHub project page.

Installation

Before you install this package, please make sure that you install a recent version of git-annex. Afterwards, install the latest version of datalad-container from PyPI. It is recommended to use a dedicated virtualenv:

# create and enter a new virtual environment (optional)
virtualenv --system-site-packages --python=python3 ~/env/datalad
. ~/env/datalad/bin/activate

# install from PyPI
pip install datalad_container

It is also available via the conda package manager from conda-forge:

conda install -c conda-forge datalad-container

Support

The documentation for this project can be found here: http://docs.datalad.org/projects/container

All bugs, concerns and enhancement requests for this software can be submitted here: https://github.com/datalad/datalad-container/issues

If you have a problem or would like to ask a question about how to use DataLad, please submit a question to NeuroStars.org with a datalad tag. NeuroStars.org is a platform similar to StackOverflow but dedicated to neuroinformatics.

All previous DataLad questions are available here: http://neurostars.org/tags/datalad/

Acknowledgements

DataLad development is supported by a US-German collaboration in computational neuroscience (CRCNS) project "DataGit: converging catalogues, warehouses, and deployment logistics into a federated 'data distribution'" (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411). Additional support is provided by the German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform. This work is further facilitated by the ReproNim project (NIH 1P41EB019936-01A1).

datalad-container's People

Contributors

adswa, asmacdo, bpoldrack, christian-monch, dependabot[bot], jsheunis, jwodder, kyleam, loj, mhaid, mih, nobodyinperson, trellixvulnteam, yarikoptic


datalad-container's Issues

Consider skopeo-based adapter

skopeo "is a command line utility that performs various operations on container images and image repositories". It has a copy command to "copy container images between various storage mechanisms".

Here's an example that copies a container from DockerHub to a local OCI-layout directory.

$ skopeo copy docker://busybox:latest oci:bb:latest
Getting image source signatures
Copying blob sha256:7c9d20b9b6cda1c58bc4f9d6c401386786f584437abbe87e58910f8a9a15386b
 742.94 KB / 742.94 KB [====================================================] 0s
Copying config sha256:8cf90cc9e23fce3bb22a95933b0f1008115828369857f09825dfb376b175f897
 575 B / 575 B [============================================================] 0s
Writing manifest to image destination
Storing signatures

$ tree --charset=ascii bb
bb
|-- blobs
|   `-- sha256
|       |-- 7c9d20b9b6cda1c58bc4f9d6c401386786f584437abbe87e58910f8a9a15386b
|       |-- 8cf90cc9e23fce3bb22a95933b0f1008115828369857f09825dfb376b175f897
|       `-- 96fed174fbe8d6aeab995eb9e7fc03a6326abbc25adb5aa598e970dfe8b32c6d
|-- index.json
`-- oci-layout

2 directories, 5 files

I need to look into skopeo more, but here are some things to consider:

  • The above command works without docker installed or the daemon running.

  • Like the docker save output, the OCI layout can be nicely stored and de-duplicated as annex content in the dataset.

  • The above layout should still work with a solution to attach URLs to individual layers (gh-98).

    Edit: Taking a slightly closer look I'm not sure, so s/should/might/.

  • This may help with podman integration (gh-89) since IIUC podman still works from a local store of containers and we'd need a similar save/load adapter.

  • Similar to the command above, we should be able to copy from the dataset directory to the local daemon storage (see the sketch after this list).

  • One of the motivations for the docker adapter is the lack of Singularity support on Windows, so we probably couldn't completely replace the current approach with skopeo, which I don't think has any Windows support. And docker load doesn't support the OCI layout (there's a draft PR from February).

  • skopeo would be an optional dependency.

  • skopeo isn't yet packaged for Debian.
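
As a sketch of that round trip (assuming a running Docker daemon and the bb OCI directory from the example above), skopeo's docker-daemon: transport should be able to load the layout back into the local daemon storage:

$ skopeo copy oci:bb:latest docker-daemon:busybox:latest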

provide updateurl for docker:// containers?

There is no updateurl entry in .datalad/config for docker:// (Singularity) containers, even though it should be possible to update (replace) them in the future:

$  datalad -l debug containers-add -u docker://poldracklab/mriqc mriqc
save(ok): /home/chrispycheng/proj/nuisance (dataset)
containers_add(ok): /home/chrispycheng/proj/nuisance/.datalad/environments/mriqc/image (file)
action summary:
  containers_add (ok: 1)
  save (ok: 1)
$ cat .datalad/config 
...
[datalad "containers.mriqc"]
        image = .datalad/environments/mriqc/image
        cmdexec = singularity exec {img} {cmd}
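
For comparison, a registration that recorded the source could look like this (hypothetical; mirroring the updateurl entries that shub:// registrations already get):

[datalad "containers.mriqc"]
        image = .datalad/environments/mriqc/image
        cmdexec = singularity exec {img} {cmd}
        updateurl = docker://poldracklab/mriqc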

Support 'podman'

https://podman.io

What is Podman? Podman is a daemonless container engine for developing, managing, and running OCI Containers on your Linux System. Containers can either be run as root or in rootless mode. Simply put: alias docker=podman.

singularity build fails to create correct image from docker://poldracklab/mriqc on smaug

I initially ran "containers-add", which completed OK, but the image didn't "feel right". So I did it again; same result. Then I did it "manually":

$ singularity pull docker://poldracklab/mriqc
...
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/share/jupyter/kernels/python3/logo-64x64.png'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/share/jupyter/nbextensions'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/share/jupyter/nbextensions/jupyter-js-widgets'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/share/jupyter/nbextensions/jupyter-js-widgets/extension.js'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/share/jupyter/nbextensions/jupyter-js-widgets/extension.js.map'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/share/man'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/share/man/man1'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/share/man/man1/ipython.1.gz'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/xgboost'
WARNING: Warning handling tar header: Can't create 'usr/local/miniconda/xgboost/libxgboost.so'
WARNING: Warning handling tar header: Can't create 'usr/local/src/mriqc'
WARNING: Warning handling tar header: Can't create 'usr/local/src/mriqc/mriqc'
WARNING: Warning handling tar header: Can't create 'usr/local/src/mriqc/mriqc/VERSION'
Exploding layer: sha256:a4cca34e324bf4caf8ec6dce3881783e8fe1e3a6bb554234b1bd8a9110e064b7.tar.gz
Exploding layer: sha256:b21ed841574bd5ca0816169ed39d4fb446e37ee8446e7ab3a8d0802185a17df8.tar.gz
ERROR  : tar extraction error: Write failed
WARNING: Warning handling tar header: Write failed
ERROR  : tar extraction error: Write failed
WARNING: Warning handling tar header: Write failed
WARNING: Building container as an unprivileged user. If you run this container as root
WARNING: it may be missing some functionality.
Building Singularity image...
Singularity container built: ./mriqc.simg
Cleaning up...
Done. Container is at: ./mriqc.simg

$ echo $?
0

and there is now an image, but it is incomplete. Not sure yet what the reason is, etc. Singularity is 2.6.1-2~nd90+1.
But I wonder if we should parse the output and warn if there was any WARNING from singularity.
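
A minimal sketch of such a check, assuming we capture singularity's output (the log file name is illustrative):

$ singularity pull docker://poldracklab/mriqc 2>&1 | tee pull.log
$ grep -q '^WARNING' pull.log && echo 'W: singularity emitted warnings; image may be incomplete' >&2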

More meaningful error message(s) whenever cmdexec contains "unknown" placeholders

$> datalad containers-list
containers(ok): /home/yoh/proj/repronim/containers/singularity/bids/bids-validator--1.2.3 (file)

$> datalad containers-run 
[ERROR  ] u'dspath' [containers_run.py:__call__:87] (KeyError) 

since I thought (from datalad run --help) that dspath was available. But I think it would be better to provide a custom error message like

 `cmdexec` references an unknown placeholder `dspath`

or the like.
Also, it might be nice to verify (if possible) all those placeholders in containers-add whenever a custom --call-fmt is provided.

Provide URLs to individual docker image layers

This was postponed until there was some interest in storing/running docker images, but it seems we didn't even create an issue for it.
Some interest from users has been expressed, so here is the issue: added docker image layers do not have URLs pointing to Docker Hub, so they cannot later be fetched on another box:

(git-annex)hopa:/tmp/test-docker[master]
$> datalad containers-add -i kwyk-img -u dhub://neuronets/kwyk:version-0.2-cpu kwyk
[INFO   ] Saved neuronets/kwyk:version-0.2-cpu to /tmp/test-docker/kwyk-img 
save(ok): /tmp/test-docker (dataset)
containers_add(ok): /tmp/test-docker/kwyk-img (file)
action summary:
  containers_add (ok: 1)
...
$> git annex whereis kwyk-img | head
whereis kwyk-img/1374c101c6f7038762c71038589946d60dcf6ea66dd9b89d511474b727aa7f0e/VERSION (1 copy) 
  	0aac68f2-5a96-4826-b5ff-69ec5d31863e -- yoh@hopa:/tmp/test-docker [here]
ok
whereis kwyk-img/1374c101c6f7038762c71038589946d60dcf6ea66dd9b89d511474b727aa7f0e/json (1 copy) 
  	0aac68f2-5a96-4826-b5ff-69ec5d31863e -- yoh@hopa:/tmp/test-docker [here]

Also, maybe a .gitattributes should be created in the image directory to instruct that .json files be committed directly to git, not the annex; e.g.:
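
Such a .gitattributes could be as simple as this sketch, reusing the annex.largefiles convention from datalad's default .gitattributes:

# proposed <image-dir>/.gitattributes: keep small JSON metadata in git
*.json annex.largefiles=nothing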

Provide some progress reporting or at least INFO msg upon long running containers-add

ATM running something like

datalad containers-add -u docker://poldracklab/mriqc mriqc

seems to just do something for minutes without providing any feedback to the user on what is going on; it might as well be stuck. Providing at least an INFO message such as

[INFO] Fetching and adding container image from docker://poldracklab/mriqc to singularity

would demystify the situation.

Link containers provided by subdatasets

It is beneficial to be able to maintain "toolchain" datasets that provide a range of containers (of the right versions). With those, it is sufficient to link one such dataset as a subdataset to provide a wide range of functionality, instead of duplicating container configuration effort over and over.

There is an issue, however: how to do this such that:

  • no (or very little) duplicate configuration settings are created that all have the potential to get outdated (image moved, URL changed, ...)
  • functionality is not impaired wrt to a straight inclusion of a container image into a dataset

Proposal

  • containers-add gets enhanced with the ability to add a toolchain subdataset
  • for each container in a toolchain dataset, the parent dataset gets the variable datalad.container.<name>.providedby = <subdataset path> set (see the sketch after this list).
    • This will make sure that there are no future namespace conflicts within the scope of the parent dataset
    • It will make list work (needs the ability to mark individual containers as known, but absent when the subdataset is not installed)
    • It will make remove work (the variable gets removed and subsequently the container is unknown to the parent, independent of the subdataset state)
    • It minimizes the linkage (and potential for breakage) between parent dataset and subdataset configuration
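
A hypothetical parent-dataset config entry under this proposal (container name and subdataset path are made up):

[datalad "containers.fsl"]
        providedby = code/containers-toolchain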

Pretty much all container commands would need to be adjusted to resolve container properties from the identified subdataset, but that should be a minor effort.

I do not foresee problems regarding a future containers-update command. It could simply update --merge any toolchain subdataset known, and update the config.

containers-remove -i issues an undue warning

adina@head1:/data/movieloc/backup_store/saccs$ datalad containers-remove -i  BIDSSacc
remove(ok): ../../../../backup/data/movieloc/saccs/.datalad/environments/BIDSSacc/image
[WARNING] path does not exist: /data/movieloc/backup_store/saccs/.datalad/environments/BIDSSacc/image [save(/data/movieloc/backup_store/saccs/.datalad/environments/BIDSSacc/image)] 
save(ok): /data/movieloc/backup_store/saccs (dataset)
containers_remove(ok): /data/movieloc/backup_store/saccs (dataset)
action summary:
  containers_remove (ok: 1)
  remove (ok: 1)
  save (ok: 1)
adina@head1:/data/movieloc/backup_store/saccs$ git show --stat
commit 62def3acd3847326363491964452134ea3000de9 (HEAD -> master)
Author: Adina Wagner <[email protected]>
Date:   Fri Nov 30 13:03:34 2018 -0500

    [DATALAD] Remove container BIDSSacc

 .datalad/config                      | 4 ----
 .datalad/environments/BIDSSacc/image | 1 -
 2 files changed, 5 deletions(-)

Environments are added to git by default

By default, datalad create currently creates the following .gitattributes file:

% cat .datalad/.gitattributes
# Text files (according to file --mime-type) are added directly to git.
# See http://git-annex.branchable.com/tips/largefiles/ for more info.
** annex.largefiles=nothing
metadata/objects/** annex.largefiles=(largerthan=20kb)

By these rules, images in .datalad/environments will be added to git rather than git-annex. I think datalad-container should add something like environments/** annex.largefiles=anything to .datalad/.gitattributes to prevent this, i.e.:
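
A minimal sketch of the proposed addition:

# proposed addition to .datalad/.gitattributes
environments/** annex.largefiles=anything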

{pwd} placeholder for container's execmd

There was #34, with the closed-without-merge #35, which added {pwd} to be provided at the datalad run level. It is frequently (if not most of the time) necessary to run within the current directory inside the container.
Singularity doesn't chdir to the original pwd unless it is bind mounted. Since by default it bind mounts HOME and /tmp/, things work as expected in those directories. But if your current directory is somewhere else, you need an explicit --bind. With singularity 2.6.1-2~nd100+1 (on smaug):

$ cmd='singularity run /mnt/btrfs/datasets/datalad/crawl-misc/nwb/najafi-2018-nwb/environments/dandi-pynwb.simg bash -c "pwd"'

$ for d in ~/proj /tmp/subdir /data; do echo "D: $d"; ( builtin cd $d; eval $cmd; ); done
D: /home/yoh/proj
/home/yoh/proj
D: /tmp/subdir
/tmp/subdir
D: /data
/home/yoh

If PWD is bind mounted, then things seem to work as expected. So the default cmdexec for singularity containers is probably not usable when execution happens outside of HOME or /tmp. We would probably need to add an explicit --bind of PWD, and for that purpose we would need to expose a {pwd} placeholder. I also thought that it might be beneficial to be able to mount the entire dataset (when running within a subdirectory), so ideally we also add {dspath} (which could differ from {img_dspath}, e.g. when the image is in a subdataset).
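
With such a placeholder exposed, the default Singularity call format could become something like this sketch ({pwd} is the proposed placeholder, not an existing one):

cmdexec = singularity exec --bind {pwd} {img} {cmd}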

Related:

container exec vs run and our defaults - should we support configuration for both modes?

ATM we rely on singularity exec as the default way to interact with a container. It makes sense, in particular, to align the datalad run and datalad containers-run invocations so that {cmd} is present in both. BUT

  • many (if not the majority of) containers are prepared to have their target application run upon run. So it makes sense to "run" those containers instead of "exec"-ing and manually providing the {cmd}
  • there might be an entry-point script which adjusts the environment to make the target application usable. E.g. neurodocker-generated containers for FSL would do source /etc/fsl/fsl.sh for us, and without going through run the container environment would not be fully prepared.

Since this is common to both Singularity and Docker, I wonder if we should add an option to containers-add to specify the type of invocation (run vs exec), which would then provide the appropriate default config within .datalad/config depending on that specification (see the sketch below). That would mean, though, that for those containers which are run'ed, there might (or might not) be a {cmd} specified on the command line. E.g. for reproin ATM we specify --entrypoint "/neurodocker/heudiconv.sh" so we could still run the container, but with the {cmd}, and so that the environment is fully prepared by that script.
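
A sketch of what containers-add could write for the two invocation types (container names are illustrative):

[datalad "containers.app-exec"]
        cmdexec = singularity exec {img} {cmd}
[datalad "containers.app-run"]
        cmdexec = singularity run {img} {cmd}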

petition: for the sanity of humanity rename to datalad-containers

OR rename all commands to the singular: container-run, container-add, container-list.
That way it would stay consistent between the name of the extension and the added commands' namespace.

IMHO the easiest is to rename the whole module; it would also sound a bit more kosher and in line with the siblings, subdatasets, etc. commands of datalad. With the plural version, only containers-run would be a bit awkward (since we run only one container, right, or not?) but still IMHO OK.
We could provide a compatibility version of the datalad-container pkg on PyPI to ease upgrades.

Outline scope of extension (in docs)

Many things are moving fast ATM. It would be good to have a short statement on how this extension fits into the larger datalad ecosystem. What will it do, what commands are envisioned, what will it not do?

Beyond what is already in #3 I see:

  • a run command wrapper that calls run with the right args to execute stuff in a chosen container (will need datalad/datalad#2464)
  • a remove command as a counterpart to add
  • #1

and that is probably it for a start.

provide proper error message when image is not found on singularity hub

ATM

$> datalad containers-add blah -u shub://ReproNim/containers:bids-validator--non-existent 
[ERROR  ] 'image' [containers_add.py:_resolve_img_url:42] (KeyError)
traceback/details:
$> datalad --dbg containers-add bids-validator -i images/bids/bids-validator--1.3.1.sing --update --call-fmt '{img_dspath}/scripts/singularity_cmd run {img} {cmd}' -u shub://ReproNim/containers:bids-validator--1.3.1 
Traceback (most recent call last):
  File "/usr/bin/datalad", line 8, in <module>
    main()
  File "/usr/lib/python3/dist-packages/datalad/cmdline/main.py", line 494, in main
    ret = cmdlineargs.func(cmdlineargs)
  File "/usr/lib/python3/dist-packages/datalad/interface/base.py", line 628, in call_from_parser
    ret = list(ret)
  File "/usr/lib/python3/dist-packages/datalad/interface/utils.py", line 435, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/usr/lib/python3/dist-packages/datalad/interface/utils.py", line 529, in _process_results
    for res in results:
  File "/usr/lib/python3/dist-packages/datalad_container/containers_add.py", line 222, in __call__
    imgurl = _resolve_img_url(url)
  File "/usr/lib/python3/dist-packages/datalad_container/containers_add.py", line 42, in _resolve_img_url
    url = shub_info['image']
KeyError: 'image'

> /usr/lib/python3/dist-packages/datalad_container/containers_add.py(42)_resolve_img_url()
-> url = shub_info['image']
(Pdb) p shub_info
{'detail': 'Not found.'}

I think we might want to define proper exceptions (an ImageNotFoundError base class, and a SingularityHubImageNotFoundError) to be raised in the corresponding spot(s), and then caught/reported consistently across backends with a more informative error message, e.g.
"Failed to resolve image url for shub://ReproNim/containers:bids-validator--non-existent . Response was: 'Not found'" or the like.

singularity hub can return HTML instead of expected json

I saw a cryptic error from datalad:

Expecting value: line 2 column 1 (char 1) [decoder.py:raw_decode:400] (JSONDecodeError)

upon containers-add. The reason is that singularity hub might return HTML (not JSON) whenever it gets "pissed" due to a high volume of requests from an IP:

(git)smaug:~/proj/repronim/containers[master]git
$> datalad --dbg containers-add bids-validator -i images/bids/bids-validator--1.3.1.sing --update --call-fmt '{img_dspath}/scripts/singularity_cmd run {img} {cmd}' -u shub://ReproNim/containers:bids-validator--1.3.1
Traceback (most recent call last):
  File "/usr/bin/datalad", line 8, in <module>
    main()
  File "/usr/lib/python3/dist-packages/datalad/cmdline/main.py", line 494, in main
    ret = cmdlineargs.func(cmdlineargs)
  File "/usr/lib/python3/dist-packages/datalad/interface/base.py", line 628, in call_from_parser
    ret = list(ret)
  File "/usr/lib/python3/dist-packages/datalad/interface/utils.py", line 435, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/usr/lib/python3/dist-packages/datalad/interface/utils.py", line 529, in _process_results
    for res in results:
  File "/usr/lib/python3/dist-packages/datalad_container/containers_add.py", line 222, in __call__
    imgurl = _resolve_img_url(url)
  File "/usr/lib/python3/dist-packages/datalad_container/containers_add.py", line 41, in _resolve_img_url
    shub_info = loads(req.text)
  File "/usr/lib/python3/dist-packages/simplejson/__init__.py", line 518, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3/dist-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

> /usr/lib/python3/dist-packages/simplejson/decoder.py(400)raw_decode()
-> return self.scan_once(s, idx=_w(s, idx).end())
(Pdb) up
> /usr/lib/python3/dist-packages/simplejson/decoder.py(370)decode()
-> obj, end = self.raw_decode(s)
(Pdb) p s
'\n<!DOCTYPE html>\n<html>\n  <head>\n    <meta charset="UTF-8">\n    <title>Beep Boop, Not Found!</title>\n    <style>\n    #robot {\n        position: absolute;\n        top:25%;\n        left:25%;\n        text-align: center;\n        font-family: Roboto, sans-serif;\n    }\n    </style>\n    <meta http-equiv="X-UA-Compatible" content="IE=edge">\n    <meta charset="utf-8" />\n    <link href=\'https://fonts.googleapis.com/css?family=Roboto\' rel=\'stylesheet\' type=\'text/css\'>\n  </head>\n  <body>\n    <div id="robot">\n        <h1 style="color:#999">You\'ve done that too many times, try again tomorrow.</h1>\n        <img width="300px" src="/static/img/hang10_robot.png">\n    </div>\n  </body>\n</html>\n'

I guess we should at least fail with some descriptive message when we detect such a response, or (ideally for the user) implement "wait for it to calm down" behavior (sketched below). I bet there is some preset delay we could wait out (I think ATM it is just some seconds or a few minutes).
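
A crude command-line sketch of such retry-with-delay behavior (the delays are made up; a proper fix would live inside the shub resolver):

for delay in 60 300 900; do
    datalad containers-add bids-validator \
        -u shub://ReproNim/containers:bids-validator--1.3.1 && break
    echo "shub request failed; retrying in ${delay}s" >&2
    sleep "$delay"
done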

Provide INFO log message upon fetching docker image

#69 was similar, but Singularity-oriented. I was adding a docker image and there was also no sign of life for a while:

$> datalad containers-add -i kwyk-img -u dhub://neuronets/kwyk:version-0.2-cpu kwyk
[INFO   ] Saved neuronets/kwyk:version-0.2-cpu to /tmp/test-docker/kwyk-img 
save(ok): /tmp/test-docker (dataset)
containers_add(ok): /tmp/test-docker/kwyk-img (file)
action summary:
  containers_add (ok: 1)
  save (ok: 1)

Needs Conda recipe/package

I am not sure if singularity itself would ever be natively available there, but IMHO that isn't needed in order to have this package there as well.

Undesired interactivity

I just wanted to know whether there are containers present....

(datalad3-dev) mih@meiner /tmp/func (git)-[master] % datalad containers-list
Container location
path within the dataset where to store containers [.datalad/environments]: 

After I hit return, I am left with a dirty dataset:

--- a/.datalad/config
+++ b/.datalad/config
@@ -2,3 +2,5 @@
        id = e1b670f4-373b-11e8-9540-f0d5bf7b5561
 [datalad "metadata"]
        nativetype = dicom
+[datalad "containers"]
+       location = .datalad/environments

add: local path for --url fails with git-annex 6.20180626

As expected, following git-annex 6.20180626, running containers-add with a local url and no additional configuration will fail:

%> datalad containers-add foo --url /tmp/foo.simg
Configuration does not allow accessing file:///tmp/foo.simg
Configuration does not allow accessing file:///tmp/foo.simg
git-annex: addurl: 1 failed
[ERROR  ] CommandError: command 'addurl'
| Error, annex reported failure for addurl (url='file:///tmp/foo.simg'): {'command': 'addurl', 'success': False, 'file': None} [containers_add(/tmp/dl/another/.datalad/environments/foo/image)]
containers_add(error): /tmp/dl/another/.datalad/environments/foo/image (file) [CommandError: command 'addurl'
Error, annex reported failure for addurl (url='file:///tmp/foo.simg'): {'command': 'addurl', 'success': False, 'file': None}]
containers_add(error): /tmp/dl/another/.datalad/environments/foo/image (file) [no image at /tmp/dl/another/.datalad/environments/foo/image]
action summary:
  containers_add (error: 2)

git-annex's output to stderr mentions that it's a configuration issue [*], but from that information alone I doubt many DataLad users would know what to do next. So I think we should do something here. The options I can think of:

  • Give a more specific error message (at the level of containers_add or annexrepo.add_url_to_file)

We could tell the user to do something like git config annex.security.allowed-url-schemes file, but we shouldn't be recommending that setting to users. It'd be nicer if we could recommend a one-time override with datalad -c annex.security.allowed-url-schemes=file ..., but that setting isn't propagated to the git-annex call (and would fail on subsequent git annex get calls).

  • Stop supporting local files as an argument to --url

  • Add local files via direct copying rather than an add_url_to_file call

    I think the main downside would be that the local source is no longer recorded by git-annex and we can't run git annex get on the file.

[*] I'm not sure why it does so twice; I guess add_url_to_file makes two addurl calls at some point, but I haven't checked.

Needs `--cmd` equivalent to support container runscripts

Suppose we have a singularity image with a runscript. It would need no actual command to execute in the container; the container is already the command itself. ATM the containers-run API cannot handle the situation where one needs to pass an option to the container's runscript: argparse would consider it an option to containers-run, and fail.

Solution: replicate the --cmd option that datalad already has for the main parser, to syntactically disambiguate this case. For example:
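
Hypothetical usage under that proposal (the container name and runscript option are illustrative; --cmd does not exist for containers-run yet, and its semantics are guessed from the proposal):

# today: argparse eats the runscript's option and fails
$ datalad containers-run -n validator --verbose
# proposed: --cmd syntactically separates containers-run's own options
$ datalad containers-run -n validator --cmd --verbose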

2 tests fail locally: test_add_noop and test_docker

$ python -m nose -s -v datalad_container
...
======================================================================
ERROR: datalad_container.tests.test_containers.test_add_noop
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 607, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "/home/yoh/proj/datalad/datalad-container/datalad_container/tests/test_containers.py", line 42, in test_add_noop
    on_failure='ignore')
  File "/home/yoh/proj/datalad/datalad-master/datalad/distribution/dataset.py", line 527, in apply_func
    return f(**kwargs)
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/utils.py", line 490, in eval_func
    return return_func(generator_func)(*args, **kwargs)
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/utils.py", line 478, in return_func
    results = list(results)
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/utils.py", line 427, in generator_func
    result_renderer, result_xfm, _result_filter, **_kwargs):
  File "/home/yoh/proj/datalad/datalad-master/datalad/interface/utils.py", line 520, in _process_results
    for res in results:
  File "/home/yoh/proj/datalad/datalad-container/datalad_container/containers_add.py", line 249, in __call__
    copyfile(url, image)
  File "/usr/lib/python2.7/shutil.py", line 97, in copyfile
    with open(dst, 'wb') as fdst:
IOError: [Errno 13] Permission denied: '/home/yoh/.tmp/datalad_temp_test_add_noopcjESpu/dummy'

======================================================================
FAIL: datalad_container.tests.test_schemes.test_docker
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 841, in newfunc
    return func(*args, **kwargs)
  File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 607, in newfunc
    return t(*(arg + (filename,)), **kw)
  File "/home/yoh/proj/datalad/datalad-container/datalad_container/tests/test_schemes.py", line 33, in test_docker
    re_=True, match=False)
  File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 415, in ok_file_has_content
    assert_re_in(content, file_content, **kwargs)
  File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 1158, in assert_re_in
    msg or "Not a single entry matched %r in %r" % (regex, c)
AssertionError: Not a single entry matched 'There is no runscript defined for this container' in ['#!/bin/sh\nOCI_ENTRYPOINT=\'\'\nOCI_CMD=\'"sh"\'\n# ENTRYPOINT only - run entrypoint plus args\nif [ -z "$OCI_CMD" ] && [ -n "$OCI_ENTRYPOINT" ]; then\n    SINGULARITY_OCI_RUN="${OCI_ENTRYPOINT} $@"\nfi\n\n# CMD only - run CMD or override with args\nif [ -n "$OCI_CMD" ] && [ -z "$OCI_ENTRYPOINT" ]; then\n    if [ $# -gt 0 ]; then\n        SINGULARITY_OCI_RUN="$@"\n    else\n        SINGULARITY_OCI_RUN="${OCI_CMD}"\n    fi\nfi\n\n# ENTRYPOINT and CMD - run ENTRYPOINT with CMD as default args\n# override with user provided args\nif [ $# -gt 0 ]; then\n    SINGULARITY_OCI_RUN="${OCI_ENTRYPOINT} $@"\nelse\n    SINGULARITY_OCI_RUN="${OCI_ENTRYPOINT} ${OCI_CMD}"\nfi\n\neval ${SINGULARITY_OCI_RUN}\n\n']

Is that happening just to me?

`add` does not fail on conflicting image location

% datalad containers-remove conversion
save(ok): /home/data/psyinf/scratch/multires3t/bids (dataset)                                                                                                                        
containers_remove(ok): /home/data/psyinf/scratch/multires3t/bids (dataset)
action summary:
  containers_remove (ok: 1)
  save (ok: 1)
(datalad3-dev) mih@zing ...e/data/psyinf/scratch/multires3t/bids (git)-[master] % datalad containers-add conversion -u shub://mih/ohbm2018-training:heudiconv --call-fmt 'singularity exec --bind {{pwd}} {img} {cmd}'
  .datalad/environments/conversion/image already exists; not overwriting
git-annex: addurl: 1 failed                                                                                                                                                          
save(ok): /home/data/psyinf/scratch/multires3t/bids (dataset)                                                                                                                        
containers_add(ok): /home/data/psyinf/scratch/multires3t/bids/.datalad/environments/conversion/image (file)                                                                          
action summary:                                                                                                                                                                      
  containers_add (ok: 1)                                                                                                                                                             
  save (ok: 1)

result_renderer for containers-list

containers-list needs a result renderer so that it reports not the absolute paths to the images but the containers' names, as used by other commands to reference them (or, alternatively, name + path).

Adding image to inputs may surprise users that use "{inputs}" placeholder

Taking containers-run as a drop-in replacement for run, a user wouldn't expect to see the image at the end of the input list:

%> datalad containers-run "echo foo >foo"
%> datalad containers-run --input=foo "echo {inputs} | sed 's/ /\n/g' >input-list"
%> cat input-list
foo
.datalad/environments/bbox/image

We could add yet another argument (something like inputs_extra) to run.run_command and have containers-run use that directly. I don't think that's pretty, but I can't think of anything else at the moment.

incorrect operation of containers-run from within subdirectories

smaug:~/proj/nuisance$ cat .datalad/config
[datalad "dataset"]
        id = 1a495a66-b20d-11e8-9335-002590f97d84
[datalad "containers.reproin"]
        updateurl = shub://ReproNim/reproin:0.5.3
        image = .datalad/environments/reproin/image
        cmdexec = singularity exec {img} {cmd}
[datalad "containers.mriqc"]
        image = .datalad/environments/mriqc/image
        cmdexec = .datalad/environments/_bin/singularity-exec {img} {cmd}
[datalad "containers.simple-workflow"]
        image = .datalad/environments/simple-workflow/image
        cmdexec = singularity exec {img} {cmd}

and the shim is

smaug:~/proj/nuisance$ cat .datalad/environments/_bin/singularity-exec
#!/bin/bash

# A shim to take care about running sanitized home etc

bindir=$(dirname $0 | xargs readlink -f)
#echo "I: bindir=$bindir"
homedir=$(echo $bindir | xargs dirname )/_home
topdir=$(echo $bindir | xargs dirname | xargs dirname | xargs dirname )

echo "I: exec'ing with sanitized singularity with"
echo "I:   HOME=$homedir"
echo "I:   $topdir bind mounted"
singularity exec \
        -H "$homedir" \
        -B "$topdir:$topdir" \
        -e \
        "$@"

If I cd into the code/ subdirectory, containers-run fails in a number of ways:

smaug:~/proj/nuisance/code$ datalad containers-run  -n mriqc --explicit pwd
[INFO   ] Making sure inputs are available (this may take some time) 
[WARNING] path not associated with any dataset [get(/home/chrispycheng/proj/.datalad/environments/mriqc/image)] 
[WARNING] Input does not exist: /home/chrispycheng/proj/.datalad/environments/mriqc/image 
[INFO   ] == Command start (output follows) ===== 
I: exec'ing with sanitized singularity with
I:   HOME=/home/chrispycheng/proj/nuisance/.datalad/environments/_home
I:   /home/chrispycheng/proj/nuisance bind mounted
ERROR  : Image path ../.datalad/environments/mriqc/image doesn't exist: No such file or directory
ABORT  : Retval = 255
[INFO   ] == Command exit (modification check follows) ===== 
CommandError: command '.datalad/environments/_bin/singularity-exec ../.datalad/environments/mriqc/image pwd' failed with exitcode 255
Failed to run '.datalad/environments/_bin/singularity-exec ../.datalad/environments/mriqc/image pwd' under '/home/chrispycheng/proj/nuisance'. Exit code=255.

for a shimmed one, and here is for the regular one:

smaug:~/proj/nuisance/code$ datalad containers-run  -n reproin --explicit pwd
[INFO   ] Making sure inputs are available (this may take some time) 
[WARNING] path not associated with any dataset [get(/home/chrispycheng/proj/.datalad/environments/reproin/image)] 
[WARNING] Input does not exist: /home/chrispycheng/proj/.datalad/environments/reproin/image 
[INFO   ] == Command start (output follows) ===== 
ERROR  : Image path ../.datalad/environments/reproin/image doesn't exist: No such file or directory
ABORT  : Retval = 255
[INFO   ] == Command exit (modification check follows) ===== 
CommandError: command 'singularity exec ../.datalad/environments/reproin/image pwd' failed with exitcode 255
Failed to run 'singularity exec ../.datalad/environments/reproin/image pwd' under '/home/chrispycheng/proj/nuisance'. Exit code=255

So it first seems to account for the subdirectory while deducing the (relative) image path, but then performs the get and the execution from the repo's top directory, thus failing on both ends.

  • FWIW containers-list seems to be ok
smaug:~/proj/nuisance/code$ datalad containers-list
containers(ok): /home/chrispycheng/proj/nuisance/.datalad/environments/mriqc/image (file)
containers(ok): /home/chrispycheng/proj/nuisance/.datalad/environments/reproin/image (file)
containers(ok): /home/chrispycheng/proj/nuisance/.datalad/environments/simple-workflow/image (file)
action summary:
  containers (ok: 3)

Support shub:// urls

I think we discussed this, but I see no issue for it. Since singularity hub currently furnishes only short-lived URLs, all the URLs added by datalad-container that point to singularity hub are useless. So I guess we need to support shub://org/repo[@digest] URIs. If you, @kyleam, provide me a recipe, I will look into adding support for it to datalad downloads.

Ideally we would also need a helper to rewrite already existing (non-working) URLs, possibly making use of updateurl, and figure out the correct digest(s). In some cases, those URLs would point to immutable images, so that image would be the only one.

git annex fails to download from singularity-hub

Here is the debug output for an example URL. Downloading this URL with wget works just fine. @yarikoptic @kyleam any idea if there is maybe a user-agent issue? Also, just registering the URL (with --fast and/or --relaxed) merely postpones the failure to an eventual annex get.

% git annex -d addurl 'https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media' --file dummy   
[2018-05-19 09:45:02.290434316] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2018-05-19 09:45:02.295979398] process done ExitSuccess
[2018-05-19 09:45:02.296058451] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2018-05-19 09:45:02.297488175] process done ExitSuccess
[2018-05-19 09:45:02.297736902] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..ffb28d91b96596e76e2b64911d545b22eecd4f7f","--pretty=%H","-n1"]
[2018-05-19 09:45:02.299333376] process done ExitSuccess
[2018-05-19 09:45:02.299815717] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2018-05-19 09:45:02.300670291] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2018-05-19 09:45:02.302931015] read: git ["config","--null","--list"]
[2018-05-19 09:45:02.304304726] process done ExitSuccess
[2018-05-19 09:45:02.304524805] read: git ["--git-dir=../../home/mih/dicom_demo/functional/.git","--work-tree=../../home/mih/dicom_demo/functional","--literal-pathspecs","show-ref","git-annex"]
[2018-05-19 09:45:02.305947331] process done ExitFailure 1
addurl https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2Fdatalad%2Fdatalad-container%2F208665e1d1f286e453eebc9fff2849b57d9d96f1%2F4a24a5a578b6ea9cb93abed9699b4e93%2F4a24a5a578b6ea9cb93abed9699b4e93.simg?generation=1526713626447377&alt=media 

failed
[2018-05-19 09:45:02.492113498] process done ExitSuccess
[2018-05-19 09:45:02.492673519] process done ExitSuccess
git-annex: addurl: 1 failed

from annex get:

(from web...) 

(from web...) 


  Unable to access these remotes: web

  Try making some of these repositories available:
        00000000-0000-0000-0000-000000000001 -- web
failed
[2018-05-19 09:44:32.893129352] process done ExitSuccess
[2018-05-19 09:44:32.893735716] process done ExitSuccess
git-annex: get: 1 failed

`add` crashed on no-arg invocation

% datalad containers-add
[ERROR  ] join() argument must be str or bytes, not 'NoneType' [genericpath.py:_check_arg_types:149] (TypeError) 

docker adapter: Provide better handling of non-unique references

As I was working on gh-98, I hit a case where save resulted in a multi-image directory that load can't handle.

To trigger this, set up at least two images so that a reference without a tag will match both.

$ docker pull busybox:musl
$ docker pull busybox:latest
$ docker images busybox
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
busybox             musl                8cd3c91eb512        13 days ago         1.44MB
busybox             latest              19485c79a9bb        13 days ago         1.22MB

If we call save with the image ID or a tag that narrows things to a single image, things work fine:

$ cd $(mktemp -d --tmpdir dc-XXXXXXX)
$ python -m datalad_container.adapters.docker save busybox:latest latest
[INFO   ] Saved busybox:latest to latest 
Saved busybox:latest to latest
$ python -m datalad_container.adapters.docker run latest ls
latest  notag

But if we leave off the tag, save will dump everything matched, and load will rightly complain that the directory doesn't have a unique image ID.

$ python -m datalad_container.adapters.docker save busybox notag
[INFO   ] Saved busybox to notag
Saved busybox to notag
$ python -m datalad_container.adapters.docker run notag ls
[...]
  File "/home/kyle/src/python/datalad-container/datalad_container/adapters/docker.py", line 95, in load
    image_id = "sha256:" + get_image(path)
[...]
ValueError: Could not find a unique JSON configuration object in notag

The tree output for both these directories is included below. Note the multiple top-level json files.

save should check that the directory it produces has only one image and abort if it doesn't (see the check sketched after the tree output below).

tree output
notag
|-- 0520e8d0c59f650aa60000987f6ac9ce99121c38c4283ef6eb92689e28c45144
|   |-- json
|   |-- layer.tar
|   `-- VERSION
|-- 19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d.json
|-- 65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542
|   |-- json
|   |-- layer.tar
|   `-- VERSION
|-- 8cd3c91eb5121065cdbae44e77b70a0b2848904d0d75fd8a9ecffb2747b3d741.json
|-- manifest.json
`-- repositories
latest
|-- 19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d.json
|-- 65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542
|   |-- json
|   |-- layer.tar
|   `-- VERSION
|-- manifest.json
`-- repositories
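
For reference, the non-uniqueness is easy to detect from the layout alone: each image contributes one top-level <64-hex-digits>.json configuration object. A sketch of such a check:

$ ls notag | grep -cE '^[0-9a-f]{64}\.json$'
2
$ ls latest | grep -cE '^[0-9a-f]{64}\.json$'
1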

Support systemd-nspawn

Here is the info: https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html

Advantages:

  • pretty much comes with any modern distro.
  • works with chroot directories and filesystem images
  • allows for filesystem overlays

Disadvantages:

  • needs the executing user to have permission to execute systemd-nspawn, i.e. become root (typically via sudo)

What needs to be implemented:

  • pretty much nothing
  • a demo in the docs, with pointers on how to create images/chroots and how to add such a container, is sufficient (e.g. the call format sketched below)
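
For such a demo, the call format could be as simple as this sketch (assuming a chroot directory tree registered as the image, and sudo rights as per the disadvantage above):

cmdexec = sudo systemd-nspawn -q -D {img} {cmd}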

Alternatives:

Store 'updateurl' in config

With #14 we are getting into the situation where the URL given to add is resolved to an actual download URL. In this case (and if the resolved one is different) we should store the URL provided as well, in order to enable later updates from that source -- probably via a dedicated command.

Possibility to overwrite/update an existing container?

I already had a container for version 1.0 and decided to upgrade to 1.1:

adina@head1:/data/movieloc/backup_store/saccs$ datalad containers-list
containers(ok): /data/movieloc/backup_store/saccs/.datalad/environments/BIDSSacc/image (file)
containers(ok): /data/movieloc/backup_store/saccs/.datalad/environments/fsl/image (file)
action summary:
  containers (ok: 2)
adina@head1:/data/movieloc/backup_store/saccs$ datalad containers-add --url shub://AdinaWagner/BIDSsacc:1.1 BIDSSacc
[WARNING] Running addurl resulted in stderr output:   while adding a new url to an already annexed file, url does not have expected file size (use --relaxed to bypass this check) https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media
git-annex: addurl: 1 failed
 
[ERROR  ] CommandError: command 'addurl'
| Error, annex reported failure for addurl (url='https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media'): {'command': 'addurl', 'file': '.datalad/environments/BIDSSacc/image', 'success': False} [containers_add(/data/movieloc/backup_store/saccs/.datalad/environments/BIDSSacc/image)] 
containers_add(error): /data/movieloc/backup_store/saccs/.datalad/environments/BIDSSacc/image (file) [CommandError: command 'addurl'
Error, annex reported failure for addurl (url='https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media'): {'command': 'addurl', 'file': '.datalad/environments/BIDSSacc/image', 'success': False}]
containers_add(ok): /data/movieloc/backup_store/saccs/.datalad/environments/BIDSSacc/image (file) [CommandError: command 'addurl'
Error, annex reported failure for addurl (url='https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media'): {'command': 'addurl', 'file': '.datalad/environments/BIDSSacc/image', 'success': False}]
action summary:
  containers_add (error: 1, ok: 1)
  save (notneeded: 1)

I was initially "pleased" that containers-add didn't just say "cannot do it since you already have a container with that name", but the reason for the failure was not obvious. Apparently an attempt to provide a new image for an already registered container fails if the remote file changed its size. The reason becomes obvious if we look at the non---json invocation of addurl, where more information is provided (@joeyh - it would be nice to have it also reported in the msg of the json record):

adina@head1:/data/movieloc/backup_store/saccs$ 'git' '-c' 'receive.autogc=0' '-c' 'gc.auto=0' 'annex' 'addurl' '--json' '--json-progress' '--file=.datalad/environments/BIDSSacc/image' 'https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media' '--'
  while adding a new url to an already annexed file, url does not have expected file size (use --relaxed to bypass this check) https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media
{"command":"addurl","success":false,"file":".datalad/environments/BIDSSacc/image"}
git-annex: addurl: 1 failed
adina@head1:/data/movieloc/backup_store/saccs$ 'git' '-c' 'receive.autogc=0' '-c' 'gc.auto=0' 'annex' 'addurl' '--file=.datalad/environments/BIDSSacc/image' 'https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media' '--'
addurl https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media 
  while adding a new url to an already annexed file, url does not have expected file size (use --relaxed to bypass this check) https://www.googleapis.com/download/storage/v1/b/singularityhub/o/singularityhub%2Fgithub.com%2FAdinaWagner%2FBIDSsacc%2F0d4acf64f18fb59b91e99a7eba785b986f5d60b9%2F2a6fd9f0256615a8151b55ea00e8524d%2F2a6fd9f0256615a8151b55ea00e8524d.simg?generation=1543600024406021&alt=media
failed

datalad-containers version 0.2.1

Release please

With Zenodo enabled, we would get an entry for the new release. Currently at 0.2.1-15-g2b9462e.

man pages (such as datalad-containers-run) are not provided by debian package

We do ship man pages for core commands in the datalad package, but apparently not for the datalad-container package; there are no man pages:

smaug:~
$> dpkg -L datalad-container | grep man
$> apt-cache policy datalad-container 
datalad-container:
  Installed: 0.5.0-1~nd100+1
  Candidate: 0.5.0-1~nd100+1
  Version table:
 *** 0.5.0-1~nd100+1 450
        450 http://neuro.debian.net/debian buster/main amd64 Packages
        450 http://neurodebian.ovgu.de/debian buster/main amd64 Packages
        100 /var/lib/dpkg/status
     0.2.2-2 500
        500 http://debian.osuosl.org/debian buster/main amd64 Packages

some action/progress indication while doing heavy download

ATM containers-add shows no sign of any activity and just hangs there, possibly for hours (depending on bandwidth), only to spit out the final feedback at the end:

(datalad) [bids@rolando QA] > datalad containers-add reproin --url shub://ReproNim/reproin
save(ok): /inbox/BIDS/dbic/QA (dataset)
containers_add(ok): /inbox/BIDS/dbic/QA/.datalad/environments/reproin/image (file)
action summary:
  containers_add (ok: 1)
  save (ok: 1)

At least some clue for the user ("Running singularity pull ... which might take a while"), or maybe even channeling the command's output back to the screen (by not swallowing it), would be a better course of action here.

singularity home directory

On some systems $PWD is not automatically bind mounted; see apptainer/singularity#150.

I had to do this to make it work on our local system (CentOS):

$ singularity exec -C -H `pwd` .datalad/environments/heudiconv/image ls -a
.  ..  .datalad  .git  .gitattributes  .gitmodules  inputs

It would be nice to be able to adjust the execution of datalad containers-run accordingly; one possible route is sketched below.
Also, is datalad containers-run equivalent to datalad run singularity exec ... from a metadata-capture standpoint?
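
One possible adjustment via the --call-fmt mechanism used elsewhere in this tracker (a sketch; the image URL is elided, and it assumes a {pwd} placeholder is available, mirroring the doubled-brace form used in other examples here):

$ datalad containers-add heudiconv -u <url> \
      --call-fmt 'singularity exec -C -H {{pwd}} {img} {cmd}'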

{img_dspath} -- full or relative?

As a workaround for #94 I tried to use --bind {img_dspath}. The results surprised me twice:

  1. img_dspath was provided as a relative path (.), not an absolute path
  2. singularity (2.6.1-2~nd100+1) bugs out and takes '.' as a path relative to some directory other than the current one:
$> pwd; singularity exec --bind . environments/dandi-pynwb.simg pwd
/mnt/btrfs/datasets/datalad/crawl-misc/nwb/najafi-2018-nwb

$> cat .datalad/config 
[datalad "dataset"]
        id = a0f8dc24-ad72-11e9-8c86-002590f97d84
[datalad "containers.dandi-pynwb"]
        updateurl = shub://dandi/najafi-2018-nwb
        image = environments/dandi-pynwb.simg
        cmdexec = singularity exec --bind {img_dspath} {img} {cmd}

I see the value in being able to refer to img_dspath as a relative path... maybe I was even the one who suggested that?! ;) But now I wonder what we could/should do to be able to give singularity a proper full path.

FTR (edit 1):

singularity 3.1.1 seems to bind any PWD, and works out `--bind .` correctly
$> singularity exec -c --bind . /home/yoh/proj/dandi/nwb-datasets/najafi-2018-nwb/environments/dandi-pynwb.simg ls $PWD
abide  abide2  adhd200	corr  crcns  datapackage.json  dbic  devel  dicoms  hbnssi  indi  kaggle  labs	neurovault  nidm  openfmri  workshops

"Trick the system" via bind mounts to avoid --reckless or CoW to fulfill YODA?

To fulfill YODA, a (sub)dataset should have its original data (and containers) defined/accessible within it as submodules. Some other workflows (e.g. ReproNim/containers#7) which operate from the level of the super-dataset would break that "promise". The most kosher way is to do something like

  • datalad create -d . derivative && cd derivative
  • datalad install -d . https://github.com/ReproNim/containers
  • datalad install -d . --reckless -s ../somesourcedata -g sourcedata
    which could be very HEAVY (inodes, etc.). Also not entirely kosher, since we need a public URL to be recorded in .gitmodules
  • datalad containers-run -i sourcedata -o . -n containers/bids-app ...
  • datalad uninstall sourcedata

BUT the bind-mount facility of the containers could assist us! We could bind mount that original dataset location if it was installed from "upstairs":

  • datalad install -d . -s ../somesourcedata sourcedata # or ideally even not that: just pick up the commit hexsha and manage git to commit the addition of the submodule
  • datalad uninstall sourcedata
  • datalad containers-run -i sourcedata -o . -n containers/bids-app --bind ../somesourcedata:sourcedata ...

So, if our datalad install, when operating on local paths, could

  • figure out which proper url to add (not ../sourcedata)
  • but also record a "localurl" or the like within .gitmodules
  • that localurl/ could be used as a bind mount (after possibly verifying that the path is all kosher in terms of changes/version)

This way we could fulfill YODA without blowing up (even if only temporarily) disk space/inode usage.

just an idea
