Comments (8)
After some cross-talk with @ewels, let's try simple YAML. It takes no effort to parse in most languages, and with a regex like \/\*(\*(?!\/)|[^*])*\*\/ everything within a comment block can be fetched:
/*
My process description.
*/
Everything that does not look like YAML can easily be ignored (it is probably a normal code comment).
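As a rough sketch of the extraction step, here is the regex from above applied with Python's `re` module. The function name `extract_comment_blocks` is made up for illustration, and the inner group is written as non-capturing so that the whole match is returned; otherwise the pattern is unchanged.

```python
import re
from typing import List

# The regex from the discussion as a Python raw string. The inner group is
# non-capturing here; the matched text is the same as with the original.
COMMENT_BLOCK = re.compile(r"/\*(?:\*(?!/)|[^*])*\*/")


def extract_comment_blocks(source: str) -> List[str]:
    """Return the text inside every /* ... */ block, without the delimiters."""
    return [match.group(0)[2:-2].strip() for match in COMMENT_BLOCK.finditer(source)]


example = """/*
My process description.
*/
process foo {
    script:
    "echo hi"
}
"""
print(extract_comment_blocks(example))
```

Each returned string could then be handed to a YAML parser.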
What information do we want to display? I will start with a list:
- Description
- Keywords
- Tools
- Input
- Output
- Authors
Description
Just a general description about the purpose of the process / function.
Keywords
One or more keywords to be able to group processes by keyword.
Tools
A list of tool objects used in a process. A tool object can contain fields like
- description
- url
- doi
Input
Input is a list of Nextflow input definitions, which follow the format
<input qualifier> <input name> [from <source channel>] [attributes]
Maybe two fields here: the definition and a description?
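To make the format above concrete, here is a minimal sketch that splits a simple one-line definition into its parts. It only handles the plain `<qualifier> <name> [from <channel>] [attributes]` shape, not compound forms such as `set val(x), file(y)`. The names `InputDefinition` and `parse_input_definition` are hypothetical, not part of Nextflow.

```python
import re
from typing import NamedTuple, Optional


class InputDefinition(NamedTuple):
    qualifier: str
    name: str
    source: Optional[str]      # channel after "from", if given
    attributes: Optional[str]  # anything left over, kept as raw text


# <input qualifier> <input name> [from <source channel>] [attributes]
INPUT_DEF = re.compile(
    r"^\s*(?P<qualifier>\w+)\s+(?P<name>\S+)"
    r"(?:\s+from\s+(?P<source>\S+))?"
    r"(?:\s+(?P<attributes>\S.*?))?\s*$"
)


def parse_input_definition(line: str) -> InputDefinition:
    """Split one simple input definition; raise ValueError if it does not fit."""
    match = INPUT_DEF.match(line)
    if match is None:
        raise ValueError(f"not a simple input definition: {line!r}")
    return InputDefinition(**match.groupdict())


print(parse_input_definition("file reads from read_ch"))
```

Storing the raw definition next to a free-text description, as suggested above, would avoid having to parse anything more complicated than this.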
Output
Same as input.
Authors
A list of GitHub users who contributed to the process.
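One way to picture the fields described above is as a small data model. This is only a sketch: the class names, the `definition`/`description` split for inputs and outputs, and the `processes_by_keyword` helper are assumptions for illustration, not an agreed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tool:
    name: str
    description: str = ""
    url: Optional[str] = None
    doi: Optional[str] = None


@dataclass
class IODefinition:
    definition: str        # the Nextflow input/output definition
    description: str = ""  # free-text explanation


@dataclass
class ProcessMeta:
    description: str
    keywords: List[str] = field(default_factory=list)
    tools: List[Tool] = field(default_factory=list)
    input: List[IODefinition] = field(default_factory=list)
    output: List[IODefinition] = field(default_factory=list)
    authors: List[str] = field(default_factory=list)


def processes_by_keyword(metas: Dict[str, ProcessMeta]) -> Dict[str, List[str]]:
    """Group process names by keyword, which is what the keywords are for."""
    groups: Dict[str, List[str]] = {}
    for process_name, meta in metas.items():
        for keyword in meta.keywords:
            groups.setdefault(keyword, []).append(process_name)
    return groups


metas = {
    "fastqc": ProcessMeta("Simply FASTQC", keywords=["Quality Control", "QC"]),
    "multiqc": ProcessMeta("Aggregate reports", keywords=["QC"]),
}
print(processes_by_keyword(metas))
```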
Example
What would this look like?
/*
description: Simply FASTQC
keywords:
  - Quality Control
  - QC
tools:
  - fastqc:
      description: <description here>
      homepage: https://superhomepage.edu
      doi: <doi here>
input:
  - reads:
      type: file
      description: <description here>
  - sample_id:
      type: string
      description: <description here>
output:
  - report:
      type: file
      schema: "*_fastqc.{zip,html}"
authors:
  - "@sven1103"
  - "@drharshil"
*/
process fastqc {
    tag "$sample_id"
    publishDir "${params.outdir}/fastqc", mode: 'copy',
        saveAs: {filename -> filename.indexOf(".zip") > 0 ? "zips/$filename" : "$filename"}

    input:
    set val(sample_id), file(reads)

    output:
    file "*_fastqc.{zip,html}"

    script:
    """
    fastqc -q $reads
    fastqc --version &> fastqc.version.txt
    """
}
This is just an example, we can work out the details. But seeing the code makes it easier to communicate what we are talking about :D
from modules.
> Everything that does not look like a YAML can be easily ignored (probably a usual code comment).
I think we should try to parse everything inside the comment block as YAML. Guessing which bits are YAML and which bits are comment is a bit of a faff (there can always be yaml comments!).
Otherwise, I think this all looks great! The only thing I notice is that the inputs should be a list of lists, as there can be multiple input channels, each of which can have multiple definitions. So more like:
input:
  - - reads:
        type: file
        description: <description here>
    - sample_id:
        type: string
        description: <description here>
Then you can have, for example:
input:
  -
    - reads:
        type: file
        description: <description here>
    - sample_id:
        type: string
        description: <description here>
  -
    - index:
        type: file
        description: Second input channel for a reference or whatever
This YAML syntax is a bit confusing to look at, so it will definitely need some linting with nice, helpful error messages.
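A minimal sketch of what such a lint check could look like, working on the already-parsed structure (plain Python lists and dicts, as a YAML parser would return them). The function name `lint_inputs` and the exact messages are made up. It assumes every definition needs both `type` and `description`, which has not been decided.

```python
from typing import Any, List


def lint_inputs(inputs: Any) -> List[str]:
    """Check the list-of-lists layout and return human-readable problems."""
    if not isinstance(inputs, list):
        return [f"input: expected a list of channels, got {type(inputs).__name__}"]
    errors = []
    for c, channel in enumerate(inputs):
        if not isinstance(channel, list):
            errors.append(
                f"input[{c}]: expected a list of definitions, got "
                f"{type(channel).__name__} (is the extra '-' for the channel missing?)"
            )
            continue
        for d, definition in enumerate(channel):
            where = f"input[{c}][{d}]"
            if not isinstance(definition, dict) or len(definition) != 1:
                errors.append(f"{where}: expected a single 'name:' entry (is the ':' missing?)")
                continue
            (name, fields), = definition.items()
            if not isinstance(fields, dict):
                errors.append(f"{where} ({name}): expected 'type' and 'description' fields")
                continue
            for key in ("type", "description"):  # assumed to be required
                if key not in fields:
                    errors.append(f"{where} ({name}): missing '{key}'")
    return errors


flat = [{"reads": {"type": "file", "description": "FASTQ reads"}}]
for problem in lint_inputs(flat):
    print(problem)
```

The flat layout from the first example is reported with a hint about the missing extra `-`, which is the mistake most people are likely to make.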
Discussing at the hackathon: the suggestion is that we should have this meta information in a separate file so that it is easier for other tools (including Nextflow itself) to parse. If it's in a comment, it will be very difficult to get it into Nextflow.
We could copy bioconda and have a meta.yml for each module.
Note that we need things to be organised in directories for this. But we should probably have that anyway.
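With one directory per module, other tools could discover the metadata by walking the tree. A small sketch using only `pathlib`, where the function name `find_module_meta` and the directory layout in the usage comment are assumptions:

```python
from pathlib import Path
from typing import Dict


def find_module_meta(modules_root: Path) -> Dict[str, Path]:
    """Map each module name (its directory relative to the root) to its meta.yml."""
    found = {}
    for meta_file in sorted(modules_root.rglob("meta.yml")):
        module_name = meta_file.parent.relative_to(modules_root).as_posix()
        found[module_name] = meta_file
    return found


# Usage, assuming a layout such as modules/fastqc/meta.yml:
# for name, path in find_module_meta(Path("modules")).items():
#     print(name, path)
```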
Suggestions: don't prefix each line with * (there is no need for it in a comment, and it makes the block harder to write and parse); use valid YAML: keywords should be prefixed with - to make them an array, and description should start with : > to make it multi-line; maybe don't use capitalisation in keys?
OK, I agree. All-or-nothing parsing :) But people could still have normal comment blocks, and we should not restrict them from doing so.
So I suggest letting the linter throw a warning if a comment block cannot be parsed as YAML.
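A sketch of that warn-instead-of-fail behaviour. The parser is passed in as a function so the example runs with the standard library alone; `json.loads` is only a stand-in here, and in practice something like PyYAML's `yaml.safe_load` would be passed instead (assuming PyYAML is installed). Note that a plain sentence is itself valid YAML (it parses as a string), so the sketch also warns when the result is not a mapping. The function name `parse_blocks` is hypothetical.

```python
import json
import warnings
from typing import Any, Callable, Dict, List


def parse_blocks(blocks: List[str], parse: Callable[[str], Any]) -> List[Dict[str, Any]]:
    """Parse each comment block as a whole; warn about and skip the ones that fail."""
    parsed = []
    for index, block in enumerate(blocks):
        try:
            value = parse(block)
        except Exception as error:  # parser-specific errors vary, so catch broadly
            warnings.warn(f"comment block {index} could not be parsed: {error}")
            continue
        if not isinstance(value, dict):
            warnings.warn(f"comment block {index} is not a key/value mapping; treating it as a normal comment")
            continue
        parsed.append(value)
    return parsed


blocks = ['{"description": "Simply FASTQC"}', "Just a normal comment."]
print(parse_blocks(blocks, json.loads))  # json.loads is a stand-in parser
```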
Addressed in #9
In the context of the discussion in #8, I was wondering if the meta.yml could become a valid conda build recipe. Name, description etc. are standard fields in a recipe already, and the rest could go into the extra section. (https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#extra-section)
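To illustrate the idea, here is a rough sketch that rearranges a module's metadata into the section layout conda-build uses (`package`, `about`, `extra`). The function name `to_recipe_layout` is made up, and the result is only the shape of a recipe: a real one would likely need more (for example a version) before conda-build accepts it.

```python
from typing import Any, Dict


def to_recipe_layout(name: str, meta: Dict[str, Any]) -> Dict[str, Any]:
    """Put name and description in their standard sections, everything else in 'extra'."""
    extra = {key: value for key, value in meta.items() if key != "description"}
    return {
        "package": {"name": name},
        "about": {"description": meta.get("description", "")},
        "extra": extra,
    }


meta = {
    "description": "Simply FASTQC",
    "keywords": ["Quality Control", "QC"],
    "authors": ["@sven1103", "@drharshil"],
}
print(to_recipe_layout("fastqc", meta))
```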
Discussion at another hackathon - general consensus was that the current system of using separate YAML files is probably best. I think that we can close this issue now.