bio-tools / biotoolsschema

biotoolsSchema : Tool description data model for computational tools in life sciences

License: Creative Commons Attribution Share Alike 4.0 International

HTML 99.62% Java 0.38%

biotoolsschema's Introduction

What is biotoolsSchema?

biotoolsSchema is a formalised XML schema (XSD) which defines a description model for bioinformatics software. It can be used to describe bioinformatics tools - application software with well-defined data processing functions (inputs, outputs and operations). This includes simple tools with one or a few closely related functions, and complex, multimodal tools with many functions. A broad range of software types (see below) is covered, including tools available for immediate use as online services, or in a form which you can download, install, configure and run yourself.

biotoolsSchema defines over 50 important scientific, technical and administrative attributes that support cataloguing, discovery, use and interoperability of software. It concentrates upon the salient common features, with a minimal core of 3 attributes only (name, short description and homepage), to provide maximum flexibility for applications. To enable concise information, standard identifiers are used where possible, including EDAM ontology concept IDs for specialised scientific aspects. biotoolsSchema defines 18 controlled vocabularies for technical tool aspects. Verbose information is referred to by URL.

biotoolsSchema is applicable to a broad range of software types and is used by the ELIXIR Tools & Data Services Registry (bio.tools).

Documentation (for stable version 3.3.0):

Comprehensive documentation is available:

Information standard

biotoolsSchema, together with the EDAM ontology, provides the foundation for an information standard for the description of tools.

Files

| File | Description |
| --- | --- |
| biotools_dev.xsd | biotoolsSchema - dev version (XML schema) |
| stable | Current stable version of the schema + docs |
| docs | Technical docs formatted for website (latest stable version). Hosted here (uses files copied from "stable" folder) |
| versions | Older stable versions of the schema + docs |
| LICENSE | biotoolsSchema license information |
| README.md | This file |

biotoolsschema's People

Contributors

bug1303, dependabot[bot], dfornika, hansioan, hmenager, joncison, kigaard, matuskalas, redmitry, smoe

biotoolsschema's Issues

Reassignation of tools

Can you reassign the rights of "Orphanet portal for rare diseases and orphan drugs" and "Orphanet Rare Disease Ontology" to the user "Inserm US14" (the common user of Orphanet)?

Thank you

Version the standard in a semver.org manner

Summary

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

http://semver.org/spec/v2.0.0.html
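
As a rough illustration of the rules quoted above, here is a minimal Python sketch of parsing and incrementing a MAJOR.MINOR.PATCH version. The names `Version`, `parse_version` and `bump` are hypothetical and are not part of biotoolsSchema; pre-release and build metadata labels are deliberately left out.

```python
import re
from typing import NamedTuple

# Simplified pattern: MAJOR.MINOR.PATCH only, no leading zeros.
# Pre-release and build metadata labels are not handled in this sketch.
_VERSION_RE = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a string such as '3.3.0' into a Version tuple."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def bump(version: Version, change: str) -> Version:
    """Return the next version for a 'major', 'minor' or 'patch' change."""
    if change == "major":
        return Version(version.major + 1, 0, 0)
    if change == "minor":
        return Version(version.major, version.minor + 1, 0)
    if change == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change!r}")
```

Because `Version` is a tuple, two parsed versions compare in precedence order with the ordinary `<` and `>` operators.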

Thanks!

Bio.Tools Collections: IDs and other attributes ...

Other than numerous previous discussions in various groups, this issue is also supported by a request from Alfred Pühler, the coordinator of de.NBI, from 30th June 2016. (See the next comment where the content of the request is pasted.)

It also relates to some of the changes towards version 2.0, sketched at the TWW Hackathon in May 2016 in Paris: https://docs.google.com/spreadsheets/d/1_KGr2DkulwtAjFJzNjTm08zXVphFlVZ8p29Id6XFlxc (sheets to the right of the first sheet).

My notes and suggestions to the de.NBI request, and our previous discussions, are the following:

We should include also a good possibility to identify 'Collections' within Bio.Tools. That would mean allowing at least 2 attributes for each collection: a display name, and an "ID name". These two could be the same, but could also be e.g. "de.NBI" and "denbi", respectively. That should then allow dereferencing a collection at e.g. https://bio.tools/denbi instead of just an unspecific full-text search of https://bio.tools/?q=de.NBI.

In addition, we should consider other optional attributes of collections, such as description(s), super-collections (collectionA isIncludedIn collectionB), institutions, funding, credits, etc.
(I'm not sure about the collectionA isNewVersionOf collection, though. Although in a very special case it may make sense, e.g. if Bioconductor would change its name to BioCRAN 😆)

Registration of services

Is it possible to register services, e.g. "conversion and upload service" or "biostatistics consultation service" in bio.tools?

Resource types: Docker images vs VMs

In the 'resource types' list, there is a type 'container' under which both docker images and VMs are categorized.
This is conceptually 'wrong'. Linux containerization is a totally different concept from VM. While each VM has its own OS, containers use the underlying kernel of the host OS. For containers, the underlying kernel must be Linux.

Also categorizing VMs under 'container' is confusing and misleading.

Also, in Docker the resource is called an 'image', not a 'container'. It is a container when it runs, but the resource is an image.

What I suggest is having two categories instead of one:

  • Virtual machine
  • Docker image

Guideline for tool short description

  1. Provide only a terse statement of the tool function: what is done not how
  2. Use a single declarative sentence in the present tense
  3. Do not include tool name

Bake this into the comment?
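
Two of the three guidelines above can be checked mechanically, at least approximately. The following Python sketch is a heuristic only: `check_short_description` is a hypothetical helper, not part of the schema or of bio.tools, and it does not attempt to check tense or the "what, not how" rule.

```python
from typing import List


def check_short_description(description: str, tool_name: str) -> List[str]:
    """Return a list of guideline problems found in a short description."""
    problems = []
    text = description.strip()

    # Guideline 3: the tool name should not appear in the description.
    if tool_name and tool_name.lower() in text.lower():
        problems.append("contains the tool name")

    # Guideline 2 (heuristic): a sentence terminator followed by a space,
    # before the end of the text, suggests more than one sentence.
    body = text.rstrip(".!?")
    if any(mark in body for mark in (". ", "! ", "? ")):
        problems.append("appears to contain more than one sentence")

    return problems
```

For example, `check_short_description("BWA aligns reads. It is fast.", "BWA")` reports both problems, while a single sentence that omits the name reports none.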

Provide regex restricting syntax of links to Debian packages

Thread from Andreas...

"> Forgive the very naive question, but do you maintain a list of links to packages (source, binary) currently available in the Debian distro? I want to support linking to Debian packages from named tools in bio.tools.

Either packages.debian.org or tracker.debian.org may be what you are looking for, depending on the amount of information you want to present. For instance:

https://packages.debian.org/bwa&exact=1
https://tracker.debian.org/bwa

> If not a link, I guess I could just support package names; in this case, is there a valid syntax for package names (so I can constrain this in our schema)?

While there are syntactical constraints (lower case letters, numbers, '-', '.', '+'; no upper case letters, no '_') you probably want to link to existing packages which per definition will have a valid name. Or am I missing something?"
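
The syntactic constraints mentioned in the thread can be written as a regular expression. The sketch below uses Python; the two extra conditions (at least two characters, first character alphanumeric) come from the Debian Policy Manual rather than from the thread, and the function name is hypothetical. It checks syntax only and says nothing about whether a package actually exists.

```python
import re

# Lower-case letters, digits, '+', '-' and '.'; at least two characters;
# the first character must be a letter or digit (per Debian Policy).
DEBIAN_PACKAGE_NAME = re.compile(r"[a-z0-9][a-z0-9+.\-]+")


def is_valid_debian_package_name(name: str) -> bool:
    """Return True if `name` is syntactically valid as a Debian package name."""
    return DEBIAN_PACKAGE_NAME.fullmatch(name) is not None
```

The same character class could be carried over to an `xs:pattern` restriction in the schema, bearing in mind that XSD patterns are implicitly anchored at both ends.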

image metadata

Can we infer the image or container format from the file that is linked to?

Also speak to Christophe Blanchet re what is the useful information to expose about images/containers - is format enough, or is more needed?

Rename Developer(s) to Main author(s) or so

The distinction between Developers and Contributors is very vague -- what is a developer?

I suggest either renaming Developers to Main authors, or adding even slightly more granularity via generic Persons with Roles (then should perhaps merge with Contacts and their Roles)

xs:documentation source attribute must contain an URI

In beta1:

line 91:
<xs:element name="biotoolsId" type="biotoolsIdType">
<xs:annotation>
<xs:documentation source="The ID is a URL in the bio.tools namespace and reflects (normally exactly) the tool name and version: see http://biotools.readthedocs.io/.">Unique ID that is assigned upon registration of the software in bio.tools.</xs:documentation>

line 112:
<xs:element name="shortDescription">
<xs:annotation>
<xs:documentation source="A single declarative sentence in the present tense, providing a terse statement of the tool function. State what is done, i.e.operation, and primary inputs and outputs, but not how. Do not include tool name. See http://biotools.readthedocs.io/.">Short and concise textual description of the software function.</xs:documentation>
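
Occurrences like the two above could be found automatically. The Python sketch below lists every `xs:documentation` element whose `source` attribute does not look like an absolute URI. The function name is hypothetical, and the check is an assumption that is stricter than `xs:anyURI` itself: it rejects any value containing whitespace and requires both a scheme and a host.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

XS_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def non_uri_sources(xsd_text: str) -> List[str]:
    """Return the `source` values of xs:documentation that are not URIs."""
    root = ET.fromstring(xsd_text)
    offenders = []
    for documentation in root.iter(f"{{{XS_NAMESPACE}}}documentation"):
        source = documentation.get("source")
        if source is None:
            continue  # the attribute is optional
        if any(character.isspace() for character in source):
            offenders.append(source)  # free text, not a URI
            continue
        parsed = urlparse(source)
        if not (parsed.scheme and parsed.netloc):
            offenders.append(source)
    return offenders
```

Running this over the schema file would flag the two `source` values quoted in this issue, since both are sentences rather than URIs.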

Input/Output duplicate attributes from dataType

Input and Output elements are defined as a restriction of dataType

dataType type already has "data" and "format" elements defined.
What is the reason for duplicating them in Input/Output (with the same definitions)?

Multiple EDAM concepts needed for a single output + operation|data|format HANDLES

Yet another example where multiple concepts are needed for 1 output is Meta-pipe, generating annotation of (meta)genome assembly (contigs) with found protein-coding genes, protein domains, and information about those, such as taxa, DB hits scores, etc.
The 1-only chosen type of data "Protein features" is very far from this in its generalisation, isn't it?

Rename to "bio.tools.Schema" or so, to avoid tech choice lock

Should be acted upon asap (before 2.0), not to repeat the mistake of BioXSD, not to get stuck with XSD in the name forever.

E.g. although XSD 1.1 is better and more expressive than 1.0, JSON Schema may be even more expressive. And even better schema languages may appear whenever, without warning ;-)

Q: Endpoint.Output vs Function.Output

There are two local "Output" elements that look the same.
Are they conceptually the same, or should two different classes be used for the implementation?

"Machine-understandable" but application-specific annotation inside XSD

xs:appinfo is a standard mechanism for defining business logic beyond the expressive power of the XSD language.

It avoids the need for hard-coding such logic into an application that uses the given XSD-based data format.

Example

. . .
<xs:schema ... xmlns:biotoolsai="http://biotoolsregistry.org/appinfo" ... xmlns:xs="http://www.w3.org/2001/XMLSchema" ... >
. . .
    <xs:element name="license" minOccurs="0"> 
        <xs:annotation> 
            <xs:documentation>Software or data usage license</xs:documentation> 
            <xs:appinfo> 
                <biotoolsai:usage recommended="true"/> 
                <biotoolsai:longDescription>
                    #Blah blah

                    `biotools:license` is blah blaaaah

                    ## GRRRRRRR

                    **WOOBAR**, isn't it?
                </biotoolsai:longDescription>
                <altova:exampleValues> 
                    <altova:example value="GNU General Public License v3"/> 
                </altova:exampleValues> 
            </xs:appinfo> 
        </xs:annotation> 
. . .
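
To show how an application might consume such annotations, here is a minimal Python sketch that reads the `recommended` flag from `xs:appinfo`. The `biotoolsai` namespace URI is taken from the example above; `SAMPLE_XSD` is a cut-down, self-contained variant of that example, and `recommended_elements` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

NAMESPACES = {
    "xs": "http://www.w3.org/2001/XMLSchema",
    "biotoolsai": "http://biotoolsregistry.org/appinfo",
}

# Cut-down version of the example above, with all prefixes declared.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:biotoolsai="http://biotoolsregistry.org/appinfo">
  <xs:element name="license">
    <xs:annotation>
      <xs:documentation>Software or data usage license</xs:documentation>
      <xs:appinfo>
        <biotoolsai:usage recommended="true"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>"""


def recommended_elements(xsd_text: str) -> List[str]:
    """Return names of xs:element declarations flagged as recommended."""
    root = ET.fromstring(xsd_text)
    names = []
    for element in root.iter("{http://www.w3.org/2001/XMLSchema}element"):
        usage = element.find("xs:annotation/xs:appinfo/biotoolsai:usage", NAMESPACES)
        if usage is not None and usage.get("recommended") == "true":
            names.append(element.get("name"))
    return names
```

The application logic stays generic: adding or removing a `recommended` flag in the schema changes the result without any change to the code.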

Multiple fixes to 2.0 alpha-01 + docs

  1. Pattern for <name> element: check it does not allow spaces.
  2. Pattern for <description> element: what characters are not allowed? Maybe change the basic type from xs:string if appropriate, to make this more restrictive.
  3. Change the pattern for <id> attribute and id> element (once settled in bio.tools URL scheme)
  4. Settle the enum for resourceType
  5. Pattern for element (once settled in bio.tools URL scheme)
  6. ORCID simple type: specify type, pattern and sample (what is the valid ORCID syntax?)
  7. Is nesting 'choice' within 'sequence' (in contactDetails and creditDetails) really necessary? Can we just use 'choice'?
  8. Add 'sample' value to PMID and PMCID
  9. Document (on GitHub WIKI?) "roles" used by <credit> element: Developer, Maintainer, Other.
  10. Document (on GitHub WIKI?) "roles" used by <contact> elements: General, Developer, Technical, Scientific, Maintainer, Helpdesk.
  11. Document (on GitHub WIKI?) meaning of <relationshipType> enum
  12. Better pattern for <description> element, e.g. that the sentence begins in upper case and ends with a full stop, only ASCII characters etc.
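
On item 6: an ORCID iD is written as four hyphen-separated groups of four characters, where the first fifteen are digits and the last is a check character (a digit or `X`) computed with ISO 7064 MOD 11-2. A Python sketch of both the pattern and the checksum follows; the function names are hypothetical, and only the bare identifier is covered, not the `https://orcid.org/` URL form.

```python
import re

# Bare identifier form, e.g. 0000-0002-1825-0097
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if `orcid` has valid syntax and a correct check character."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]
```

Only the pattern can be expressed as an `xs:pattern` restriction in XSD 1.0; the checksum would have to be verified by the application.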

Add a Docker registry link

Add the possibility to specify a Docker registry URL for the tool, for example:

docker-registry.genouest.org/bioinfo/blast (meaning version latest)

or with a version tag

docker-registry.genouest.org/bioinfo/blast:1.0

With this, the user only needs to run

docker pull *docker_url*

The schema could support multiple Docker registry URLs.
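
A consumer of such a field would need to split the reference into its parts. The Python sketch below handles only the shape used in the examples above (`registry/repository[:tag]`), which is an assumption: short Docker Hub names without a registry host and `@sha256:` digests are not covered, and the names used are hypothetical.

```python
from typing import NamedTuple


class ImageReference(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_reference(reference: str) -> ImageReference:
    """Split 'registry/repository[:tag]'; a missing tag means 'latest'."""
    registry, _, remainder = reference.partition("/")
    if not registry or not remainder:
        raise ValueError(f"expected registry/repository[:tag]: {reference!r}")
    # Look for the tag separator only after the last '/', so that a port
    # number in the registry host is never mistaken for a tag.
    last_slash = remainder.rfind("/")
    colon = remainder.find(":", last_slash + 1)
    if colon == -1:
        return ImageReference(registry, remainder, "latest")
    return ImageReference(registry, remainder[:colon], remainder[colon + 1:])
```

Several such references could then be stored per tool, one for each registry.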

Add Visual C++ as language?

It is not a language by itself (C++ is), but it can be very helpful to know that a program was not implemented in pure C++.

urlftpType could hold http(s) ?

There is a urlftpType that restricts anyURI to either http(s) or (s)ftp (???).
On the other hand, urlType is restricted to http(s).
Something is wrong here...
Remove http(s) from urlftpType and rename urlType to urlhttpType. (???)
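
The proposed split amounts to two disjoint scheme checks. The Python sketch below illustrates the idea only: the patterns actually used in the schema are not shown in this issue, so `HTTP_URL`, `FTP_URL` and `classify_url` are hypothetical.

```python
import re
from typing import Optional

# Two disjoint types: one for http(s), one for (s)ftp.
HTTP_URL = re.compile(r"https?://\S+")
FTP_URL = re.compile(r"s?ftp://\S+")


def classify_url(url: str) -> Optional[str]:
    """Return 'http', 'ftp', or None if the URL matches neither type."""
    if HTTP_URL.fullmatch(url):
        return "http"
    if FTP_URL.fullmatch(url):
        return "ftp"
    return None
```

With disjoint types, a field that should accept either kind of link can be declared as a union of the two instead of overlapping with one of them.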

Check compatibility with relevant schemes and standards

Including:

Add "R" as an interfaceType

I could see that most of the bioconductor packages in bio.tools are annotated with
resourceType: Tool
interfaceType: Command line

Some other R packages however have:
resourceType: Library
interfaceType: API

I understand that these terms should be rather broad, with only a few of them in the enumeration; however, I think it would make sense to add something more specific here. What is an R package really? I could see the definition of "Command line : Text-based interface to a tool or service", which is of course also true for R, but many researchers who use R do so in a semi-graphical interface, and some of them are scared off by 'actual' command lines. I think being able to explicitly search for R packages would give those users a better experience.

Attribute for API-compliance

Requested by ELIXIR EXCELERATE WP 7 - new attribute to capture whether an API is WP7-compliant

Of course, something generic is needed.
