
wfexs-backend's Introduction

WfExS-backend: Workflow Execution Service backend

WfExS (which could be pronounced like "why-fex", "why-fix" or "why-fixes") is a project that aims to automate the following steps:

  • Fetch and cache a workflow from either:
    • A TRSv2-enabled WorkflowHub instance (which provides RO-Crates).
    • A TRSv2 (2.0.0-beta2 or 2.0.0) enabled service. Currently tested with Dockstore.
    • A straight URL to an existing RO-Crate in ZIP archive describing a workflow.
    • A git repository (using this syntax for the URI)
    • A public GitHub URL (like this example).
  • Identify the kind of workflow.
  • Fetch and set up the workflow execution engine (currently Nextflow and cwltool are supported).
  • Identify the containers needed by the workflow, and fetch/cache them. Depending on the local setup, singularity, apptainer, docker, podman or none of them will be used.
  • Fetch and cache the inputs, represented either through a URL or a CURIE-represented PID (public persistent identifier).
  • Execute the workflow in a secure way, if requested.
  • Optionally describe the results through an RO-Crate, and upload both the RO-Crate and the results elsewhere in a secure way.

Relevant docs:

  • INSTALL.md: In order to use WfExS-backend you first have to install at least the core dependencies described there.

  • TODO.md: This development is relevant for projects like EOSC-Life or EJP-RD. The list of high-level scheduled and pending developments can be seen there.

  • README_LIFECYCLE.md: WfExS-backend analysis lifecycle and usage scenarios are briefly described with flowcharts there.

  • README_REPLICATOR.md: It briefly describes WfExS-config-replicator.py usage.

Additional present and future documentation is hosted in the development-docs subfolder, until it is migrated to a proper documentation service.

Presentations and outreach

Fernández JM, Rodríguez-Navas L and Capella-Gutiérrez S. Secured and annotated execution of workflows with WfExS-backend [version 1; not peer reviewed]. F1000Research 2022, 11:1318 (poster) (https://doi.org/10.7490/f1000research.1119198.1)

Laura Rodríguez-Navas (2021): WfExS: a software component to enable the use of RO-Crate in the EOSC-Life collaboratory.
FAIR Digital Object Forum, CWFR & FDO SEM meeting, 2021-07-02 [video recording], [slides]

Laura Rodríguez-Navas (2021):
WfExS: a software component to enable the use of RO-Crate in the EOSC-Life tools collaboratory.
EOSC Symposium 2021, 2021-06-17 [video recording] [slides]

Salvador Capella-Gutierrez (2021):
Demonstrator 7: Accessing human sensitive data from analytical workflows available to everyone in EOSC-Life
Populating EOSC-Life: Success stories from the demonstrators, 2021-01-19. https://www.eosc-life.eu/d7/ [video] [slides]

Bietrix, Florence; Carazo, José Maria; Capella-Gutierrez, Salvador; Coppens, Frederik; Chiusano, Maria Luisa; David, Romain; Fernandez, Jose Maria; Fratelli, Maddalena; Heriche, Jean-Karim; Goble, Carole; Gribbon, Philip; Holub, Petr; P. Joosten, Robbie; Leo, Simone; Owen, Stuart; Parkinson, Helen; Pieruschka, Roland; Pireddu, Luca; Porcu, Luca; Raess, Michael; Rodriguez-Navas, Laura; Scherer, Andreas; Soiland-Reyes, Stian; Tang, Jing (2021):
EOSC-Life Methodology framework to enhance reproducibility within EOSC-Life.
EOSC-Life deliverable D8.1, Zenodo https://doi.org/10.5281/zenodo.4705078

WfExS-backend Usage

An automatically generated description of the command-line directives is available in the CLI section of the documentation.

A description of the different WfExS commands is also available in the command line section of the documentation.

Configuration files

The program uses three different types of configuration files:

  • Local configuration file: a YAML-formatted file which describes the local setup of the backend (example at workflow_examples/local_config.yaml). The JSON Schema describing the format (and used for validation) is available at wfexs_backend/schemas/config.json, and there is also automatically generated documentation (see config_schema.md). Relative paths in this configuration file are resolved against the directory where the local configuration file lives.

    • cacheDir: The path in this key sets up the place where all the cacheable contents are held: downloaded RO-Crates, downloaded workflow git repositories and downloaded workflow engines. It is recommended to keep it outside the /tmp directory when Singularity is being used, due to undesirable interactions with the way workflow engines use Singularity.

    • workDir: The path in this key sets up the place where all the executions store both intermediate and final results, with a separate directory for each execution. It is recommended to keep it outside the /tmp directory when Singularity is being used, due to undesirable interactions with the way workflow engines use Singularity.

    • crypt4gh.key: The path to the secret key used in this installation. It is paired to crypt4gh.pub.

    • crypt4gh.pub: The path to the public key used in this installation. It is paired to crypt4gh.key.

    • crypt4gh.passphrase: The passphrase needed to decrypt the contents of crypt4gh.key.

    • tools.engineMode: Currently, local mode only.

    • tools.containerType: Currently, singularity, docker or podman.

    • tools.gitCommand: Path to git command (only used when needed)

    • tools.dockerCommand: Path to docker command (only used when needed)

    • tools.singularityCommand: Path to singularity command (only used when needed)

    • tools.podmanCommand: Path to podman command (only used when needed)

    • tools.javaCommand: Path to java command (only used when needed)

    • tools.encrypted_fs.type: Kind of FUSE encryption filesystem to use for secure working directories. Currently, both gocryptfs and encfs are supported.

    • tools.encrypted_fs.command: Command path to be used to mount the secure working directory. The default depends on the value of tools.encrypted_fs.type.

    • tools.encrypted_fs.fusermount_command: Command to be used to unmount the secure working directory. Defaults to fusermount.

    • tools.encrypted_fs.idle: Number of minutes of inactivity before the encrypted FUSE filesystem is automatically unmounted. The default is 5 minutes.

  • Workflow configuration file: a YAML-formatted file which describes the workflow staging before execution: where the inputs are located and can be fetched, the security contexts to be used on specific inputs to obtain controlled-access resources, the parameters, the outputs to capture, etc. (Nextflow example, CWL example). The JSON Schema describing the format and valid keys (and used for validation) is available at wfexs_backend/schemas/stage-definition.json, and there is also automatically generated documentation (see stage-definition_schema.md).

  • Security contexts file: a YAML-formatted file which holds the user/password pairs, security tokens or keys needed in different steps, like input fetching (Nextflow example, CWL example). The JSON Schema describing the format and valid keys (and used for validation) is available at wfexs_backend/schemas/security-context.json, and there is also automatically generated documentation (see security-context_schema.md).

License

  • © 2020-2024 Barcelona Supercomputing Center (BSC), ES

Licensed under the Apache License, version 2.0 https://www.apache.org/licenses/LICENSE-2.0, see the file LICENSE for details.


wfexs-backend's Issues

Add support to `ga4ghdos` CURIE

The Data Object Service standard allows using a common identifier to locate resources which are replicated among several cloud services, as described at https://registry.identifiers.org/registry/ga4ghdos . For instance, ga4ghdos:dg.4503/01b048d0-e128-4cb0-94e9-b2d2cab7563d can be queried as

https://dataguids.org/ga4gh/dos/v1/dataobjects/dg.4503/01b048d0-e128-4cb0-94e9-b2d2cab7563d

In the obtained JSON, the urls section contains the links to the different replicas of the dataset, which could be FTP, HTTP(S), S3 or Google Cloud URIs.
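A minimal sketch of the resolution step, assuming the dataguids.org endpoint shown above is the resolver (other DOS servers exist, and the function name is illustrative):

```python
def ga4ghdos_to_url(
    curie: str,
    base: str = "https://dataguids.org/ga4gh/dos/v1/dataobjects/",
) -> str:
    """Map a ga4ghdos CURIE to a Data Object Service query URL.

    The base endpoint is taken from the example above; it is an
    assumption that it serves every ga4ghdos identifier.
    """
    scheme, _, identifier = curie.partition(":")
    if scheme != "ga4ghdos" or not identifier:
        raise ValueError(f"Not a ga4ghdos CURIE: {curie!r}")
    return base + identifier
```

Fetching that URL and reading the urls section of the returned JSON would then yield the replica links.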

Add support for `insdc.sra` CURIE

Many public projects, like 1000genomes, publish their genomes in the SRA repository, which is mirrored at NCBI, EBI and DDBJ. The idea is to add support for the insdc.sra compact URI scheme, providing all the download links based on the different mirrors.

Add support for swh permanent identifiers

Software Heritage swh permanent identifiers, described at https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#interoperability , should be supported by WfExS-backend, as they can be used in two different ways.

First, there are repositories there which could contain workflows, so a method to fetch those workflows should be implemented.

Second, they provide a standardized way to compute a stable identifier for directories. Although there is an available implementation at https://pypi.org/project/swh.model/ , due to a licence conflict (it is GPLv3) a reimplementation of the algorithm is needed.
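The content (file) part of the scheme is small enough to sketch: a swh:1:cnt identifier is the sha1 of a git-style blob header followed by the raw bytes. This is a clean-room illustration of that rule only; directory identifiers (swh:1:dir) need the full manifest algorithm described in the linked docs.

```python
import hashlib

def swhid_for_content(data: bytes) -> str:
    """Compute the SWHID for a file's content (swh:1:cnt:<sha1>).

    Content identifiers follow the git blob convention: the sha1 of
    b"blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()
```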

Error while running WfExS using a local workflow file/directory

Description

I am running WfExS with the config files shown below. When I run WfExS-backend.py -L local-config.yml stage -W test-stage.yml I get the following error: NotADirectoryError: [Errno 20] Not a directory: '/root/wfexs-backend-test_WorkDir/47761fdd-f06f-4260-a1f3-7351265805b3/workflow'.

Looking at the path in the error message, it seems workflow is the file given in workflow_id in the stage file; however, WfExS expects a directory there. I also tried putting a path to a directory in the workflow_id field, but that failed, saying it couldn't work out which runner to use.

Traceback

Traceback (most recent call last):
  File "/root/WfExS-backend/WfExS-backend.py", line 21, in <module>
    main()
  File "/root/WfExS-backend/wfexs_backend/main.py", line 1122, in main
    stagedSetup = wfInstance.stageWorkDir()
  File "/root/WfExS-backend/wfexs_backend/workflow.py", line 1985, in stageWorkDir
    self.materializeWorkflowAndContainers(offline=offline, ignoreCache=ignoreCache)
  File "/root/WfExS-backend/wfexs_backend/workflow.py", line 1233, in materializeWorkflowAndContainers
    self.setupEngine(offline=offline, ignoreCache=ignoreCache)
  File "/root/WfExS-backend/wfexs_backend/workflow.py", line 1191, in setupEngine
    self.fetchWorkflow(
  File "/root/WfExS-backend/wfexs_backend/workflow.py", line 1152, in fetchWorkflow
    engineVer, candidateLocalWorkflow = engine.identifyWorkflow(
  File "/root/WfExS-backend/wfexs_backend/cwl_engine.py", line 316, in identifyWorkflow
    newLocalWf = self._enrichWorkflowDeps(newLocalWf, engineVer)
  File "/root/WfExS-backend/wfexs_backend/cwl_engine.py", line 542, in _enrichWorkflowDeps
    with subprocess.Popen(
  File "/usr/lib/python3.10/subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1845, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
NotADirectoryError: [Errno 20] Not a directory: '/root/wfexs-backend-test_WorkDir/47761fdd-f06f-4260-a1f3-7351265805b3/workflow'

Settings

Stage file

# test-stage.yml
workflow_id: file:///root/hutch/workflows/sec-hutchx86.cwl
workflow_config:
  container: 'docker'
  secure: false
nickname: 'vas-workflow'
cacheDir: /tmp/wfexszn6siq2jtmpcache
crypt4gh:
  key: cosifer_test1_cwl.wfex.stage.key
  passphrase: mpel nite ified g
  pub: cosifer_test1_cwl.wfex.stage.pub
outputs:
  output_file:
    c-l-a-s-s: File
    glob: "output.json"
params:
  body:
    c-l-a-s-s: File
    url:
      - https://raw.githubusercontent.com/HDRUK/hutch/main/workflows/inputs/rquest-query.json
  is_availability: true
  db_host: "localhost"
  db_name: "hutch"
  db_user: "postgres"
  db_password: "example"

Local config

# local-config.yml
cacheDir: ./wfexs-backend-test
crypt4gh:
  key: local_config.yaml.key
  passphrase: strive backyard dividing gumball
  pub: local_config.yaml.pub
tools:
  containerType: docker
  dockerCommand: docker
  encrypted_fs:
    command: encfs
    type: encfs
  engineMode: local
  gitCommand: git
  javaCommand: java
  singularityCommand: singularity
  staticBashCommand: bash-linux-x86_64
workDir: ./wfexs-backend-test_WorkDir

Warn about `scrypt` crypt4gh keys

The crypt4gh library can generate and use keys based on different algorithms. One of them is scrypt, which depends on very specific features of the OpenSSL used to compile the Python interpreter.

https://github.com/EGA-archive/crypt4gh/blob/2ba98a7cea96e8fb337b17310cc1a226ad3b3e65/crypt4gh/keys/kdf.py#L29-L43

As the availability of this algorithm depends heavily on the OpenSSL version, WfExS-backend should:

  1. Emit a warning whenever the failure conditions are met: OpenSSL < 1.1.0 and a key generated with scrypt.
  2. Always generate new keys with a different algorithm, like bcrypt, which is less sensitive to the OpenSSL version the Python interpreter was compiled against.
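A sketch of the warning in point 1, assuming the relevant OpenSSL version is the one the interpreter's ssl module reports (the function name is hypothetical):

```python
import ssl
import warnings

def warn_if_fragile_scrypt(kdf_name: str) -> bool:
    """Warn when a crypt4gh key uses scrypt and the interpreter was
    compiled against OpenSSL older than 1.1.0; returns True when the
    warning fired."""
    if kdf_name == "scrypt" and ssl.OPENSSL_VERSION_INFO[:3] < (1, 1, 0):
        warnings.warn(
            "Key uses the scrypt KDF but this interpreter's OpenSSL "
            f"({ssl.OPENSSL_VERSION}) predates 1.1.0; decryption may "
            "fail. Consider regenerating the key with bcrypt."
        )
        return True
    return False
```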

WfExS-backend init issues

WfExS-backend init should create valid YAML configuration files when the --cache-dir parameter is provided. It should also validate already existing configuration files against the corresponding JSON Schema.

An example of the bad behaviour:

(.pyWEenv) jmfernandez@pavonis[14]:~/projects/WfExS-backend> python WfExS-backend.py --cache-dir /tmp/gorrito -L prueba2.yaml init
[WARNING] Configuration file prueba2.yaml does not exist
[WARNING] Cache directory not defined. Created a temporary one at /tmp/wfexsrkoltayctmpcache
2024-01-31 10:54:02,182 - [WARNING] [WARNING] Installation key file /home/jmfernandez/projects/WfExS-backend/prueba2.yaml.key does not exist
2024-01-31 10:54:02,182 - [WARNING] [WARNING] Installation pub file /home/jmfernandez/projects/WfExS-backend/prueba2.yaml.pub does not exist
* Storing updated configuration at prueba2.yaml
(.pyWEenv) jmfernandez@pavonis[15]:~/projects/WfExS-backend> cat prueba2.yaml
cache-directory: /tmp/gorrito
cacheDir: /tmp/wfexsrkoltayctmpcache
crypt4gh:
  key: prueba2.yaml.key
  passphrase: ndcart ndredth ndline elling
  pub: prueba2.yaml.pub

Cannot download content from ftp

Dear WfExS-Team,
I was testing WfExS on my local WSL2/Ubuntu.
Setting up the core and further dependencies in a conda environment worked without any trouble.
However, while running the test workflow
python3 WfExS-backend.py execute -W tests/wetlab2variations_execution_nxf_secure.wfex.stage
I got the following error:


[ERROR] Cannot download content from ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz to 42be63ef9b0fc7d80d09513bfd3fa42b2288fd9b (while processing LicensedURI(uri='ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz', licences=('https://choosealicense.com/no-permission/',), attributions=[], secContext=None)) (temp file /tmp/wfexsivum2b3rtmpcache/wf-inputs/caching-5f6ef9b7-b9b8-4f40-b38e-9ac854ef5ec3): can only concatenate str (not "NoneType") to str
Traceback (most recent call last):
  File "WfExS-backend.py", line 445, in <module>
    main()
  File "WfExS-backend.py", line 429, in main
    wfInstance.stageWorkDir()
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1027, in stageWorkDir
    self.materializeInputs()
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 809, in materializeInputs
    theParams, numInputs = self.fetchInputs(
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1008, in fetchInputs
    newInputsAndParams, lastInput = self.fetchInputs(inputs,
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 932, in fetchInputs
    matContent = self.wfexs.downloadContent(
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/wfexs_backend.py", line 980, in downloadContent
    inputKind, cachedFilename, metadata_array, cachedLicences = self.cacheHandler.fetch(remote_file, workflowInputs_destdir, offline, ignoreCache, registerInCache, secContext)
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/cache_handler.py", line 549, in fetch
    raise CacheHandlerException(errmsg) from nested_exception
wfexs_backend.cache_handler.CacheHandlerException: Cannot download content from ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz to 42be63ef9b0fc7d80d09513bfd3fa42b2288fd9b (while processing LicensedURI(uri='ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz', licences=('https://choosealicense.com/no-permission/',), attributions=[], secContext=None)) (temp file /tmp/wfexsivum2b3rtmpcache/wf-inputs/caching-5f6ef9b7-b9b8-4f40-b38e-9ac854ef5ec3): can only concatenate str (not "NoneType") to str

No VPN was active, nor anything else that could have prevented the FASTQ from downloading.

wget ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz
was working, though.
Do you have any ideas on how to solve it?

Allow using as a staging source a Workflow Run RO-Crate

The target here is that WfExS-backend should be able to consume its own RO-Crates, demonstrating true reproducibility.

This feature is divided into two milestones:

  • Being able to reuse as much metadata as possible, so inputs, commits and containers are reused.
  • Being able to reuse RO-Crate bundled copies of workflow, inputs and containers in the instantiation.

This last one can bring issues related to Docker containers, as it might imply reassigning local container tags.

Bug in path resolution in local config file

Description

When running the following command: WfExS-backend/WfExS-backend.py -L local-config.yml execute -W test-stage.yml I got the following error message:

schema_salad.exceptions.ValidationException: Not found: '/root//root/wfexs-backend-test_WorkDir/efb98299-cb1f-48f8-862e-7a8746bba1a4/workflow/workflows/sec-hutchx86.cwl'

The path resolution appears to have added an additional /root/ to the front of the path in local-config.yml (see below). When I changed the workDir to ./wfexs-backend-test_WorkDir, the execution appeared to proceed as expected and I saw this in the logging output:

materialized workflow repository (checkout 6d500ca1396283faae2ce5eebf778500dd8be2da): /root/wfexs-backend-test_WorkDir/f51c9984-8e43-49fa-a03b-8e683e884980/workflow

The path resolves as would be expected if I ran WfExS from /root.
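The documented rule is that relative paths in the local configuration are resolved against the directory where the configuration file lives; a sketch of that resolution, with environment variables expanded first (the helper name is hypothetical, not WfExS code):

```python
import os

def resolve_config_path(path: str, config_file: str) -> str:
    """Resolve a path taken from a local configuration file.

    $HOME and friends are not expanded by YAML itself, so expand them
    here; a still-relative path is then anchored at the directory of
    the configuration file, never at the current working directory.
    """
    expanded = os.path.expandvars(os.path.expanduser(path))
    if os.path.isabs(expanded):
        return os.path.normpath(expanded)
    config_dir = os.path.dirname(os.path.abspath(config_file))
    return os.path.normpath(os.path.join(config_dir, expanded))
```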

Local config file

cacheDir: $HOME/wfexs-backend-test
crypt4gh:
  key: local_config.yaml.key
  passphrase: strive backyard dividing gumball
  pub: local_config.yaml.pub
tools:
  containerType: podman
  dockerCommand: docker
  podmanCommand: podman
  encrypted_fs:
    command: encfs
    type: encfs
  engineMode: local
  gitCommand: git
  javaCommand: java
  singularityCommand: singularity
  staticBashCommand: bash-linux-x86_64
workDir: $HOME/wfexs-backend-test_WorkDir

Stage file

workflow_id: https://raw.githubusercontent.com/HDRUK/hutch/main/workflows/sec-hutchx86.cwl
workflow_config:
  container: 'podman'
  secure: false
nickname: 'vas-workflow'
cacheDir: /tmp/wfexszn6siq2jtmpcache
crypt4gh:
  key: cosifer_test1_cwl.wfex.stage.key
  passphrase: mpel nite ified g
  pub: cosifer_test1_cwl.wfex.stage.pub
outputs:
  output_file:
    c-l-a-s-s: File
    glob: "output.json"
params:
  body:
    c-l-a-s-s: File
    url:
      - https://raw.githubusercontent.com/HDRUK/hutch/main/workflows/inputs/rquest-query.json
  is_availability: true
  db_host: "localhost"
  db_name: "hutch"
  db_user: "postgres"
  db_password: "example"

Add several checks in the code to detect containers unavailable for the current hardware architecture

Thanks to the tests from @dcl10, some issues have been uncovered related to workflows which depend on container images that are not available for the current processor architecture.

A way to reproduce the chain of issues is trying to execute the cosifer workflow, which depends on a single container prepared for the x86_64 / amd64 architecture, on a different architecture like linux arm64.

The cosifer "toy" workflow uses a single custom container which is only available for x86_64. WfExS-backend tries to materialize the container by itself, most probably doing it wrongly despite the architecture mismatch, but it should have complained before even trying to run cwltool. So, when cwltool tries running it, it surely fails, either because the previously materialized container is for the wrong architecture or because cwltool is not able to fetch any container suitable for the task. cwltool then returns an empty description of its outputs, which is deserialized to None instead of a dictionary, and the code fails trying to access the key "class" because None is not a dictionary.

Also, the caching directory should have a container images directory per supported architecture, so it can hold cached versions for both x86_64 and arm64, in case the caching directory is used in a heterogeneous HPC environment.
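The pre-flight complaint suggested above could start from something as simple as comparing the host architecture against what the image manifest advertises (a sketch only; a real check has to query the registry for the manifest's platform list):

```python
import platform

# uname machine names mapped to container platform names
_ARCH_ALIASES = {"x86_64": "amd64", "aarch64": "arm64"}

def host_container_arch() -> str:
    """Host architecture expressed in container-platform terms."""
    machine = platform.machine()
    return _ARCH_ALIASES.get(machine, machine)

def image_matches_host(image_archs) -> bool:
    """True when any architecture advertised for the image matches the
    host, so a mismatch can be reported before the engine tries (and
    cryptically fails) to run the container."""
    return host_container_arch() in set(image_archs)
```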

TypeError: Multiple inheritance with NamedTuple is not supported

Hi!

On Ubuntu LTS, with miniconda and Python 3.9.13, I cannot run WfExS. It fails with the following traceback:

(venv) kinow@ranma:~/Development/python/workspace/WfExS-backend$ python WfExS-backend.py --full-help
Traceback (most recent call last):
  File "/home/kinow/Development/python/workspace/WfExS-backend/WfExS-backend.py", line 39, in <module>
    from wfexs_backend.wfexs_backend import WfExSBackend
  File "/home/kinow/Development/python/workspace/WfExS-backend/wfexs_backend/wfexs_backend.py", line 57, in <module>
    from .common import AbstractWfExSException
  File "/home/kinow/Development/python/workspace/WfExS-backend/wfexs_backend/common.py", line 288, in <module>
    class GeneratedContent(AbstractGeneratedContent, NamedTuple):
  File "/home/kinow/Development/python/miniconda3/lib/python3.9/typing.py", line 1929, in _namedtuple_mro_entries
    raise TypeError("Multiple inheritance with NamedTuple is not supported")
TypeError: Multiple inheritance with NamedTuple is not supported

It looks like this could be related to the following issue:

I think it was first released with 3.9.0-alpha6. Given this is a change in Python, I guess WfExS will have to update the code eventually to support Py 3.9+. This patch fixes the initial command, but I am not sure whether it breaks something else 👍

diff --git a/wfexs_backend/common.py b/wfexs_backend/common.py
index 56878fd..a51dd7f 100644
--- a/wfexs_backend/common.py
+++ b/wfexs_backend/common.py
@@ -285,7 +285,7 @@ class ExpectedOutput(NamedTuple):
 class AbstractGeneratedContent(object):
     pass
 
-class GeneratedContent(AbstractGeneratedContent, NamedTuple):
+class GeneratedContent(AbstractGeneratedContent):
     """
     local: Local absolute path of the content which was generated. It
       is an absolute path in the outputs directory of the execution.
@@ -302,7 +302,7 @@ class GeneratedContent(AbstractGeneratedContent, NamedTuple):
     secondaryFiles: Optional[Sequence[AbstractGeneratedContent]] = None
 
 
-class GeneratedDirectoryContent(AbstractGeneratedContent, NamedTuple):
+class GeneratedDirectoryContent(AbstractGeneratedContent):
     """
     local: Local absolute path of the content which was generated. It
       is an absolute path in the outputs directory of the execution.

cwl_engine management of arrays of inputs

There is an issue with the workflow https://raw.githubusercontent.com/kids-first/kf-alignment-workflow/v2.7.3/workflows/kfdrc_alignment_wf.cwl , leading to the error message

inputdeclarations.yaml:2:1:  * the `input_bam_list` field is not valid because value is a CommentedMap, expected null or array of <File>

due to input_bam_list not being properly represented.

cwl_engine.CWLWorkflowEngine generates the file inputdeclarations.yaml before calling cwltool, in order to tell it the input parameters and where to find the files.

That YAML is created by createYAMLFile:

def createYAMLFile(self, matInputs, cwlInputs, filename):
    """
    Method to create a YAML file that describes the execution inputs of the workflow
    needed for their execution. Return parsed inputs.
    """
    try:
        execInputs = self.executionInputs(matInputs, cwlInputs)
        if len(execInputs) != 0:
            with open(filename, mode="w+", encoding="utf-8") as yaml_file:
                yaml.dump(execInputs, yaml_file, allow_unicode=True, default_flow_style=False, sort_keys=False)
            return execInputs
        else:
            raise WorkflowEngineException(
                "Dict of execution inputs is empty")
    except IOError as error:
        raise WorkflowEngineException(
            "ERROR: cannot create YAML file {}, {}".format(filename, error))

which depends on the output of executionInputs:

def executionInputs(self, matInputs: List[MaterializedInput], cwlInputs):
    """
    Setting execution inputs needed to execute the workflow
    """
    if len(matInputs) == 0:  # Is list of materialized inputs empty?
        raise WorkflowEngineException("FATAL ERROR: Execution with no inputs")
    if len(cwlInputs) == 0:  # Is list of declared inputs empty?
        raise WorkflowEngineException("FATAL ERROR: Workflow with no declared inputs")
    execInputs = dict()
    for matInput in matInputs:
        if isinstance(matInput, MaterializedInput):  # input is a MaterializedInput
            # numberOfInputs = len(matInput.values)  # number of inputs inside a MaterializedInput
            for input_value in matInput.values:
                name = matInput.name
                value_type = cwlInputs.get(name, {}).get('type')
                if value_type is None:
                    raise WorkflowEngineException("ERROR: input {} not available in workflow".format(name))
                value = input_value
                if isinstance(value, MaterializedContent):  # value of an input contains MaterializedContent
                    if value.kind in (ContentKind.Directory, ContentKind.File):
                        if not os.path.exists(value.local):
                            self.logger.warning("Input {} is not materialized".format(name))
                        value_local = value.local
                        if isinstance(value_type, dict):  # MaterializedContent is a List of File
                            classType = value_type['items']
                            execInputs.setdefault(name, []).append({"class": classType, "location": value_local})
                        else:  # MaterializedContent is a File
                            classType = value_type
                            execInputs[name] = {"class": classType, "location": value_local}
                    else:
                        raise WorkflowEngineException(
                            "ERROR: Input {} has values of type {} this code does not know how to handle".format(name, value.kind))
                else:
                    execInputs[name] = value
    return execInputs
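For reference, cwltool expects an array-of-File parameter such as input_bam_list to be serialized as a YAML sequence of File entries; the error above indicates a single mapping (a CommentedMap) was emitted instead. An illustrative shape (the paths are placeholders):

```yaml
# Expected in inputdeclarations.yaml for an array-of-File input:
input_bam_list:
  - class: File
    location: /path/to/sample1.bam
  - class: File
    location: /path/to/sample2.bam

# The failing shape, a single mapping instead of a sequence:
# input_bam_list:
#   class: File
#   location: /path/to/sample1.bam
```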

Use pyinvoke and Fabric

The following example shows how to issue commands which can be run either remotely or locally, https://stackoverflow.com/a/55704170 , based on both the pyinvoke and Fabric libraries.

Past the 1.0 milestone, WfExS-backend is going to gain different non-raw execution scenarios, like in-container runs, runs as different users, remote runs through ssh, remote runs through a queue system (first monolithic, later spread) and remote runs through GA4GH TES and WES.

A way to integrate this seamlessly is to first transition to both pyinvoke and Fabric, so local and ssh executions are handled uniformly, and then try to extend the approach to the other execution environments.

Can't execute workflows using podman

Description

Using stage I can stage a workflow with podman. However, running the workflow with staged-workdir offline-exec I get the following error:

ERROR Workflow error:
Docker is not available for this tool, try --no-container to disable Docker, or install a user space Docker replacement like uDocker with --user-space-docker-cmd.: Docker image hutchstack/rquest-omop-worker:next not found

Fiddling with the code on a fork, I found adding --no-container or --user-space-docker-cmd isn't compatible with --podman.

In cwl_engine.py I found that commenting out the --disable-pull line seemed to fix the problem, and the workflow runs as expected. However, I guess --disable-pull is there for a good reason. Could something be preventing WfExS from looking where the podman image is saved for the staged image?

Parsing of Nextflow DSL2 workflows

Right now, the Nextflow workflow source is parsed in order to learn the needed containers. The approach is not foolproof, as the container declaration can depend on variables, and in the case of DSL2 workflows the declarations can be spread over several files.

So, at a minimum, all the (sub)workflow files involved need to be parsed.
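A sketch of such a multi-file scan for static container directives (the regex and helper are illustrative; declarations built from variables would still escape it, which is exactly the weak spot described above):

```python
import re
from pathlib import Path

# Matches statically declared containers, e.g.
#   container 'quay.io/biocontainers/samtools:1.17'
_CONTAINER_RE = re.compile(r"^\s*container\s+['\"]([^'\"]+)['\"]", re.MULTILINE)

def scan_containers(workflow_files):
    """Collect container images declared across a DSL2 workflow and
    every included (sub)workflow/module file."""
    found = set()
    for wf in workflow_files:
        found.update(_CONTAINER_RE.findall(Path(wf).read_text(encoding="utf-8")))
    return found
```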

Add support for compact `drs` identifiers

As of https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/docs/#_appendix_compact_identifier_based_uris , drs URIs can be in compact form, which adds an additional level of indirection, resolving where the DRS server lives against either n2t.net or identifiers.org . The implementation added at 11d6873 does not consider this level of indirection, and is not able to tell whether it is dealing with a compact DRS URI or a hostname-based one.
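A sketch of the missing discrimination, based on the appendix's convention that the compact form carries a colon (namespace:accession) where the hostname form has none; it assumes no port numbers in hostname-based URIs, which this heuristic does not handle:

```python
def is_compact_drs(uri: str) -> bool:
    """Heuristically classify a drs:// URI.

    Compact identifier-based URIs look like
    drs://[provider_code/]namespace:accession and must be resolved
    through identifiers.org or n2t.net, while hostname-based ones
    look like drs://hostname/object_id.
    """
    if not uri.startswith("drs://"):
        raise ValueError(f"Not a drs URI: {uri!r}")
    return ":" in uri[len("drs://"):]
```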

Secrets/secret inputs

Background

My team would like to use WfExS in a Trusted Research Environment (TRE) which has data sources that can't be exposed to the outside world. We anticipate that the environment will contain variables which must be kept secret (i.e. not in the output RO-Crate). In some cases, some inputs may also be sensitive, and we would like them not to be included in the output RO-Crate either.

Proposed Feature

For secret environment variables, would it be possible to add a section in the local config yaml file where we could put the variables as key-value pairs and then have WfExS load these into the local environment at runtime? Then during the creation of the RO-Crate, check for the secrets and exclude them from the crate and its metadata?

For secret inputs, would it be possible to add to the definition of an input a boolean flag telling WfExS whether that input is secret? Then, similarly to the above, have it excluded from the crate and its metadata.
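A sketch of the redaction step on the RO-Crate side, assuming secret entries are flagged by name (the marker string and function name are hypothetical):

```python
def redact_secrets(params: dict, secret_names: set) -> dict:
    """Return a copy of a params/environment mapping with flagged
    entries replaced by a marker, so their values never reach the
    output RO-Crate or its metadata."""
    return {
        name: "<redacted>" if name in secret_names else value
        for name, value in params.items()
    }
```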

PermissionError: [Errno 13] Permission denied: '/home/ansible/wfexs-backend-test/wf-cache'

Description

When staging a workflow for the first time as a non-sudo/root user I'm getting the error in the title. Oddly, deleting the cache dir and re-running the stage command seems to fix the issue, as does adding write permissions with chown. I'm not sure if it has anything to do with the calls to os.makedirs or a umask issue.

Here's something I was reading about the problem. Not sure if it will be helpful. https://stackoverflow.com/questions/5231901/permission-problems-when-creating-a-dir-with-os-makedirs-in-python/67723702#67723702

Record the licence of the workflow in RO-Crate

When a workflow is fetched from a git repository or an RO-Crate pointing to a repository, the licence file of the workflow repository should be included in generated RO-Crates, in case it exists.

Add metadata related to fetched URIs

Right now WfExS does not keep a correspondence between URLs and downloaded files, as the filenames are hashes generated from the URL. But there are several scenarios where additional upstream metadata is available, and future cases where a single URL corresponds to a collection of files. As an example of the latter, an ENCODE experiment id or an EGA dataset id corresponds to more than one file, maybe each with its own independent download URL.

So, there should be an intermediate metadata layer where these correspondences and the upstream metadata are kept. After this change, cached files should be named after the sha256 of their content, and URIs should translate to JSON files named after the hash of the URI, containing the correspondences to the cached files and their origins.

Last, but not least, upstream metadata should be gathered and preserved in the execution provenance.
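The proposed layout could be sketched like this (every name and the JSON shape are illustrative, not the eventual WfExS format):

```python
import hashlib
import json
from pathlib import Path

def store_fetched(cache_dir: str, uri: str, content: bytes) -> Path:
    """Store content under the sha256 of its bytes, and record the
    URI-to-file correspondence in a JSON file named after the sha256
    of the URI; that JSON is where upstream metadata would accumulate."""
    cache = Path(cache_dir)
    content_hash = hashlib.sha256(content).hexdigest()
    (cache / content_hash).write_bytes(content)
    meta = cache / (hashlib.sha256(uri.encode("utf-8")).hexdigest() + ".json")
    meta.write_text(json.dumps({"uri": uri, "files": [content_hash]}))
    return meta
```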

`dot` dependency should be optional

Right now, when a prospective RO-Crate is generated, dot is used to translate the workflow representation generated by the workflow engine into a PNG. When the command is not available or not properly installed, the generation of the RO-Crate fails.

Publish a major release with DOI

Now that we have a CITATION.cff (as of #13), we have to publish a major release with a DOI generated by Zenodo, in order to add that DOI to CITATION.cff.

That major release should be triggered by a major event.

Add validation capabilities over fetched contents

Today I found a scenario where some content fetched from FTP was corrupted during the download process. There are several validation mechanisms which could be integrated into WfExS-backend:

  • When a file is a known compressed archive (tar, gz, bzip2, xz, zip), its integrity should be checked.
  • When a file is signed and a public signing key is available, check that the file was not tampered with.
  • Declaring a file to be fetched which contains MD5 or SHA1 sums or signatures of the fetched contents.
  • Declaring inline fields containing the MD5 or SHA1 sums of the fetched contents.
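The first bullet can be sketched with the checksums those formats already embed: zip members and gzip streams both carry CRC32, so fully reading them detects truncation or corruption (a sketch covering just those two formats):

```python
import gzip
import zipfile

def archive_intact(path: str) -> bool:
    """Verify the internal integrity of a zip or gzip download."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None  # None means every member's CRC matched
    try:
        with gzip.open(path, "rb") as gf:
            while gf.read(1 << 20):  # decompress everything, checking the CRC
                pass
        return True
    except (OSError, EOFError):
        return False
```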
