Comments (9)

stain commented on September 4, 2024

Example descriptions generated by extract-dockerfile

From a Dockerfile we describe a ContainerRecipe (specializes SoftwareSourceCode):

{
    "@context": "http://www.schema.org",
    "@type": "ContainerRecipe",
    "name": "vsoch/salad",
    "description": "A Dockerfile build recipe",
    "containerImage": "gliderlabs/alpine:3.4",

    "labels": [
        [
            "MAINTAINER toasterlint \"[email protected]\""
        ]
    ],
    "environment": [
        "RPCPORT=4000"
    ],
    "entrypoint": [
        "/entrypoint"
    ]
}

(see openschemas/specifications#10)

From a Docker image we describe a ContainerImage:

{
    "environment": [
        "SRC_DIR=/go/src/github.com/vsoch/salad/"
    ],
    "entrypoint": [
        "/code/salad"
    ],
    "description": "A Dockerfile build recipe",
    "name": "vanessa/sregistry",
    "ContainerImage": "iron/go:dev",
    "operatingSystem": "linux",
    "softwareVersion": "sha256:8d1e7f244db9e7cb85d5867bb3230f756460900e5801ff2303e44a79369640f4",
    "identifier": [
        "vanessa/sregistry:latest"
    ],
    "url": "https://hub.docker.com/r/vanessa/sregistry",
    "alternateName": "Singularity Registry",
    "softwareHelp": "https://singularityhub.github.io/sregistry",
    "citation": "http://joss.theoj.org/papers/050362b7e7691d2a5d0ebed8251bc01e",
    "license": "https://github.com/singularityhub/sregistry/blob/master/LICENSE",
    "keywords": "container, containers, singularity, singularity registry",
    "softwareRequirements": [
        "Pip > xmlsec==1.3.3"
    ],
    "@context": "http://www.schema.org",
    "@type": "ImageDefinition"
}

Above, extract-dockerfile has actually extracted the softwareRequirements from the pip installs inside the container.

(However, the specification calls this type ContainerImage rather than ImageDefinition, so some alignment with the upstream specs would be needed; see openbases/extract-dockerfile#6.)
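
Until the names are aligned, a consumer could map the emitted type onto the specification's name. A minimal sketch in Python, assuming an alias table of my own (TYPE_ALIASES and normalise_type are hypothetical names, not part of extract-dockerfile or the openschemas specifications):

```python
import json

# Hypothetical alias table: type names seen in tool output, mapped to the
# name used in the specification. Not part of extract-dockerfile.
TYPE_ALIASES = {"ImageDefinition": "ContainerImage"}


def normalise_type(description: dict) -> dict:
    """Return a copy of a JSON-LD description with a known @type alias replaced."""
    result = dict(description)
    current = result.get("@type")
    if current in TYPE_ALIASES:
        result["@type"] = TYPE_ALIASES[current]
    return result


raw = '{"@context": "http://www.schema.org", "@type": "ImageDefinition", "name": "vanessa/sregistry"}'
print(normalise_type(json.loads(raw))["@type"])
```

Descriptions with any other type, or with no @type at all, pass through unchanged.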

from ro-crate.

vsoch commented on September 4, 2024

See the discussion in openbases/extract-dockerfile#6 about the name. My preference is for what is represented in https://openschemas.github.io/specifications/ because (as you correctly bring up) an ImageDefinition could refer to other kinds of images, whereas ContainerImage is clearer.

dgarijo commented on September 4, 2024

This is interesting! Would this need to be related to cwl as well? (which defines how to invoke the image as opposed to the definition of the image itself)

In Dockerpedia they have done a thorough extraction of images, although it's not aligned with schema. Maybe we can use their service for extraction too. An example: https://dockerpedia.inf.utfsm.cl/resource/SoftwareImage/dockerpedia-pegasus_workflow_images_latest

vsoch commented on September 4, 2024

I don't think it would be wise to "hard code" (so to speak) any particular workflow manager or description (e.g., cwl, snakemake, nextflow) directly into the specification. On the other hand, if there is an appropriate field to describe this same entity, it would be logical to include (e.g., if I find that it's snakemake, I should look for a Snakefile somewhere...)

For CWL, is there a definitive specification for interaction? For example, for a SCIF container, you can be absolutely sure how to discover applications inside (singularity run container.sif apps) and then how to run / inspect / shell / otherwise interact with an application you just found (e.g., singularity run container.sif run <app>).
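
To make the SCIF case concrete, the two calls could be wrapped roughly as below. This is only a sketch: it assumes the container's runscript hands its arguments to scif, and both the helper scif_command and the app name "hello" are made up for illustration.

```python
import shlex
from typing import List


def scif_command(container: str, *scif_args: str) -> List[str]:
    """Build the argv for `singularity run <container> <scif args...>`."""
    return ["singularity", "run", container, *scif_args]


# Discover the apps, then run one of them ("hello" is a hypothetical app name).
list_apps = scif_command("container.sif", "apps")
run_app = scif_command("container.sif", "run", "hello")

print(shlex.join(list_apps))
print(shlex.join(run_app))
# To actually execute: subprocess.run(list_apps, check=True)
```

The point is that the discovery step and the run step have the same shape for every SCIF container, so a tool consuming the metadata would not need per-container knowledge.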

dgarijo commented on September 4, 2024

CWL has a field for pulling from a docker container. Maybe that could be the hook.
My point is not necessarily to use a particular workflow spec. What I want to record is how the app in the container is supposed to be invoked and how to pass on the files. Since CWL describes this, it could be a starting point.

vsoch commented on September 4, 2024

Yes, understood! To be more clear, there are many different tools that describe in a structured way how a container (or app inside) is supposed to be invoked. Actually, those two things are different - cwl could describe an app in a container (and it would have to be provided via the entrypoint so the user could run it to find it) while SCIF describes how to invoke the container itself (of which cwl could be one or more entrypoints).

But from how you describe it - that there is a field for pulling the container, this sounds like it would need to be stored outside of the container, which is another point to discuss. SCIF is a specification that describes standard interaction with a container, and is installed inside the container, along with the SCIF filesystem and other metadata files that are defined for each app.

craig-willis commented on September 4, 2024

This is a necessary use case for Whole Tale. A few questions:

  • What about RO-Crates with repo2docker compatible configurations?
  • In the case of a Docker image, is the idea that the RO-Crate would contain a tar archive of the image or a reference to the image in a registry (or either)?
  • While not containers in the same sense, sciunit and reprozip also produce re-executable packages that could be parts of RO-Crates. Are these in scope?

vsoch commented on September 4, 2024

Having a repo2docker configuration is an interesting and useful idea, but I think it would be done in addition to a container recipe - repo2docker in and of itself doesn't translate to reproducibility - it just means that (assuming a version of repo2docker is available) you could build a container for it. You can think of it like an extra layer to essentially create a Dockerfile (that could be built). It also assumes a user "jovyan", which makes things a bit challenging when the image is converted to Singularity (e.g., for use on HPC) because of the cardinal rule "the user inside the container is the user outside the container."

Re-reading what @stain mentioned - it sounds like he wants the full container, in which case Docker wouldn't be as feasible as it means layers that need to be assembled and require the Docker daemon. A Singularity (sif) binary would be more reasonable, albeit large, and still require Singularity to run. It's really the case that any level of recipe without the container runs the risk of not being able to be built, so probably providing the container somewhere is needed. In the case of Singularity, the recipe file is kept inside the container as well. In the case of Docker, the recipe (and other metadata) would serve as an external way to peep inside without invoking the container.

I'm not super familiar with RO-Crates, but reading the description:

RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata.

it does sound like a wrapper (with metadata) to a container is wanted? The container, considered as some kind of data, could also fit into the specification, and as @stain showed, metadata could be extracted for the jsonld.

jmfernandez commented on September 4, 2024

Re-reading what @stain mentioned - it sounds like he wants the full container, in which case Docker wouldn't be as feasible as it means layers that need to be assembled and require the Docker daemon.

Indeed, with docker save you can generate a tar file containing the layers of one or more tagged Docker images, which can later be used to generate a Singularity image with singularity import.
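
As a rough illustration of what such an archive holds, the following Python sketch lists the tags and layers recorded in it without needing the Docker daemon. It assumes the archive carries a top-level manifest.json with RepoTags and Layers entries, as docker save writes; the helper name list_saved_images and the file name images.tar are my own.

```python
import json
import tarfile


def list_saved_images(archive) -> list:
    """Return (repo tags, layer paths) for each image in a `docker save` tar.

    `archive` is a seekable binary file object. Assumes a top-level
    manifest.json, as written by `docker save`.
    """
    with tarfile.open(fileobj=archive, mode="r") as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
    return [(entry.get("RepoTags") or [], entry["Layers"]) for entry in manifest]


# Usage, after e.g. `docker save -o images.tar vanessa/sregistry:latest`:
# with open("images.tar", "rb") as handle:
#     for tags, layers in list_saved_images(handle):
#         print(tags, len(layers))
```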

I also agree that the container recipe is worth saving (or referencing, plus a fingerprint), as the base image of the recipe could contain a bug and you would want to rebuild the image.
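
By fingerprint I mean something like a content digest of the recipe file. A minimal sketch, assuming SHA-256 and a "sha256:" prefix like the one in the softwareVersion value above (the function name fingerprint is hypothetical):

```python
import hashlib


def fingerprint(path: str, chunk_size: int = 65536) -> str:
    """Return a 'sha256:<hex>' digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files (e.g. image archives) are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Usage: fingerprint("Dockerfile")
```

The same function works for a saved image archive, so a referenced recipe and a referenced container could carry the same kind of digest.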
