esteinig / cerebro

Metagenomic diagnostics stack for low abundance sample types and clinical reporting

License: GNU General Public License v3.0
Dev

The dev deployment mode should be able to access the templates from the repository (`cerebro_api` service in `docker-compose.yml`):
```yaml
volumes:
  {{#if dev }}
  # Path mount for application in development
  - {{{ dev }}}:/usr/src/cerebro
  - {{{ dev }}}/templates/email:/data/templates/email:ro
  - {{{ dev }}}/templates/report:/data/templates/report:ro
  {{else}}
  # Volume mount for application in production deployment
  - cerebro_api:/data
  - {{{ outdir }}}/templates/email:/data/templates/email:ro
  - {{{ outdir }}}/templates/report:/data/templates/report:ro
  {{/if}}
```
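The branch logic of the compose template can be sketched in plain Rust for clarity. The helper below is hypothetical (it is not part of the codebase); only the mount paths mirror the template:

```rust
/// Hypothetical helper mirroring the {{#if dev}} branch of the compose
/// template: a dev checkout mounts the repository templates directly,
/// while a production deployment mounts rendered templates from the
/// output directory alongside the cerebro_api volume.
fn template_mounts(dev: Option<&str>, outdir: &str) -> Vec<String> {
    match dev {
        Some(dev) => vec![
            format!("{dev}:/usr/src/cerebro"),
            format!("{dev}/templates/email:/data/templates/email:ro"),
            format!("{dev}/templates/report:/data/templates/report:ro"),
        ],
        None => vec![
            "cerebro_api:/data".to_string(),
            format!("{outdir}/templates/email:/data/templates/email:ro"),
            format!("{outdir}/templates/report:/data/templates/report:ro"),
        ],
    }
}

fn main() {
    for mount in template_mounts(Some("/home/user/cerebro"), "/opt/cerebro") {
        println!("- {mount}");
    }
}
```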
Dockerfile.server

The currently deployed Dockerfile.server does not include a PDF compiler test - this is necessary to pre-load the dependent LaTeX packages so that the first report generation takes little time. However, the Docker container user currently needs to be root to access the root-installed system dependencies for tectonic.
```dockerfile
ENV PATH="${PATH}:/opt/cerebro/bin"
RUN cargo build --release --features pdf && cp target/release/cerebro /opt/cerebro/bin
WORKDIR /opt/cerebro
RUN cerebro report compile --base-config /usr/src/cerebro/templates/report/report.toml --output test.pdf --pdf && rm test.pdf
```
Perhaps we can install the dependencies with sudo as the container user and remove sudo access once the image is built?
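One possible shape for this, sketched as an assumption (the package names and the `cerebro` user are illustrative, not the current setup): install the system dependencies as root, warm the LaTeX package cache with the test compile, then drop to an unprivileged user in a final layer:

```dockerfile
# Sketch only: install system dependencies as root, warm the LaTeX package
# cache with a test compile, then drop privileges for the runtime user.
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
        libssl-dev libfontconfig1-dev \
    && rm -rf /var/lib/apt/lists/*
RUN cerebro report compile --base-config /usr/src/cerebro/templates/report/report.toml --output test.pdf --pdf && rm test.pdf
# Hypothetical unprivileged user for the final image
RUN useradd --create-home cerebro
USER cerebro
```

This avoids granting sudo inside the image at all: root is only used during build layers, and the container runs unprivileged afterwards.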
Optimise resource parameters
Running cerebro as an accredited service requires us to monitor assay performance over time. This issue tracks the implementation. It should operate via requests to the API.
User action logs are interwoven with security logs (stored in both admin and team databases). We need to simplify the action logs so they can be used directly in the front-page activity log.
Kirsty said the troubleshooting section is important for accreditation; add:
No indication that the page is changing is currently shown to the user - a top shell-aligned progress bar would help with this.
Cocogitto conventional commits and semantic versioning script for auto-bumping versions across the project:
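A minimal shape for such a script, assuming the cocogitto CLI (`cog`) is the chosen tool; the fallback branch and version placeholder are illustrative for machines without cog installed:

```shell
# Sketch of a version-bump helper using cocogitto's `cog` CLI.
if command -v cog >/dev/null 2>&1; then
    cog check                       # validate conventional commit history
    cog bump --auto                 # bump the semantic version from commit types
    version=$(cog get-version 2>/dev/null || echo "unknown")
else
    version="0.0.0-dev"             # placeholder when cog is unavailable
fi
echo "project version: ${version}"
```

`cog bump --auto` derives the next semantic version from the conventional commit types since the last tag, which is what auto-bumping across the project would rely on.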
mkdocs documentation templates

Taxonomy table does not scale with lower screen resolutions, especially when expanded.
Explicit sequencing control handling and exposure in QualityControl summary modules for the frontend.
Need an invalidate call to refresh page data when submitting a decision comment (so it shows immediately)
Semi-automated production runtime for accreditation; overview of aims and progress to complete the production feature on feat/production.

Deployment configurations (http-local, http-local-secure, https-web, https-web-secure); https-web and https-web-secure configs.

Full-stack setup of the Cerebro production environment for continuous operations. For setting up parallel test or development environments, see details in the documentation.
- Linux system
- Mamba installation
- Cerebro client installation
- Cerebro stack setup and operation
The production directory and sub-directories are set up on the system - you can read more about the types of production environments that are currently supported in the documentation. Here we set up the RUNTIME directory where all workflows are executed, and the INPUT directory where wet-lab staff or laboratory data transfer can deposit the reads and sample sheet to trigger a workflow execution.
```bash
# Local paths for runtime and data input
export CEREBRO_BASE_PROD=/data/cerebro/prod
export CEREBRO_INPUT_PROD=/samba/project/cerebro/prod

# Setup the runtime directory where workflows are executed
cerebro production setup-base --directory $CEREBRO_BASE_PROD

# Setup the input directory with a specific team and database upload configuration
cerebro production setup-input --directory $CEREBRO_INPUT_PROD --configuration production --team-name VIDRL --database-name "META-GP Production"
```
Multiple runtime and input folders can be set up for testing, development or validation configurations. Workflow execution and outputs are configured with specific production variables that ensure the workflow is set up for production and that integration tests are run for production.
```bash
# Check workflow help menu as sanity check
nextflow run esteinig/cerebro -r 1.0.0-nata.1 --help

# Provision the accreditation database with Cipher
nextflow run esteinig/cerebro -r 1.0.0-nata.1 -profile mamba -entry cipher --revision 1.0.0-nata.1 --outdir cipher/

# Obtain the access token for the API
export CEREBRO_API_URL="http://api.cerebro.localhost"
export CEREBRO_API_TOKEN=$(cerebro api login -u $CEREBRO_USERNAME -p $CEREBRO_PASSWORD)

# Run workflow integration tests for setup and central nervous system infections
nextflow run esteinig/cerebro -r 1.0.0-nata.1 -profile mamba,ciqa-setup@v1,ciqa-cns@v1
```
The current sample sheet is focused on dry-lab operation. We need a user-safe sample sheet template that registers the library identifiers, minimal sample metadata, wet-lab comments, aneuploidy consent, and links to the files in the same input directory.
Initial template: https://github.com/esteinig/cerebro/blob/feat/production/templates/production/SampleSheet.xlsx
The sample sheet and fastq files (demultiplexed, de-umified) are watched and validated in the input folder. Depending on the input configuration file, the watcher runs the production stream and uploads to the specified team-database-collection at the conclusion of the run - different input configuration files (folders) can be watched by different production, test or validation watchers, with outputs deposited into the appropriate database section. This triggers a run of the Nextflow pipeline and notifications to Slack.

When the pipeline starts, sample identifiers are checked against the team-database-collection to ensure they are unique - the run is registered with the database and samples await confirmation of completion. If a sample identifier already exists in the database, the run fails.

When the pipeline completes, sample identifiers are collected and validated against the registered sample identifiers for this run. Each module (quality control, classification) is checked for completion in each sample. If a sample for some reason did not complete a module, it is marked in the database.

After completion, outputs are aggregated into the database models and uploaded into the specified collection via the API.
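The start-of-run uniqueness check can be sketched as follows. The function and error messages are hypothetical; only the behaviour (fail the run on any already-registered or duplicated identifier) follows the description above:

```rust
use std::collections::HashSet;

/// Hypothetical pre-flight check: the run fails if any sample identifier
/// from the sample sheet already exists in the team-database-collection,
/// or is duplicated within the sheet itself.
fn check_sample_identifiers(
    sheet_ids: &[String],
    registered_ids: &HashSet<String>,
) -> Result<(), String> {
    let mut seen = HashSet::new();
    for id in sheet_ids {
        if registered_ids.contains(id) {
            return Err(format!("sample identifier already registered: {id}"));
        }
        if !seen.insert(id.as_str()) {
            return Err(format!("duplicate sample identifier in sheet: {id}"));
        }
    }
    Ok(())
}

fn main() {
    let registered: HashSet<String> = ["S1".to_string()].into_iter().collect();
    // New identifiers pass the check, a clash with the database fails the run
    assert!(check_sample_identifiers(&["S2".to_string(), "S3".to_string()], &registered).is_ok());
    assert!(check_sample_identifiers(&["S1".to_string()], &registered).is_err());
}
```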
Library and controls are deselected when changing to the Report view in the cerebro/data/samples/[sample] route - however, when returning to other views, they are sometimes not re-selected.
Format string not written correctly
Report template updates based on QA feedback:

Module modifications to the ClinicalReport and .toml templates.

Currently the cerebro.taxa section of the Cerebro data model is a HashMap<taxid, Taxon> where type taxid = String. This is a result of the aggregation function, which uses sequential HashMaps to group taxa by their taxid.
A HashMap cannot be queried efficiently using MongoDB aggregation pipelines. Downstream applications - particularly endpoints on the API - eventually use a Vec<Taxon>. Refactoring to Vec is necessary, but at this stage may affect a number of dependent subsystems.
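As a transitional step, the map could be flattened at the model boundary rather than refactoring every subsystem at once. A minimal sketch, with an illustrative stand-in for the Taxon model:

```rust
use std::collections::HashMap;

/// Illustrative stand-in for the Taxon model; only `taxid` matters here.
#[derive(Debug, Clone, PartialEq)]
struct Taxon {
    taxid: String,
    name: String,
}

/// Flatten the aggregation HashMap<taxid, Taxon> into the Vec<Taxon>
/// shape used downstream, sorted by taxid for deterministic output.
fn taxa_to_vec(taxa: HashMap<String, Taxon>) -> Vec<Taxon> {
    let mut v: Vec<Taxon> = taxa.into_values().collect();
    v.sort_by(|a, b| a.taxid.cmp(&b.taxid));
    v
}

fn main() {
    let mut taxa = HashMap::new();
    taxa.insert("562".to_string(), Taxon { taxid: "562".to_string(), name: "Escherichia coli".to_string() });
    taxa.insert("1280".to_string(), Taxon { taxid: "1280".to_string(), name: "Staphylococcus aureus".to_string() });
    let v = taxa_to_vec(taxa);
    assert_eq!(v.len(), 2);
    println!("{:?}", v);
}
```

Converting once at the API boundary keeps the aggregation function unchanged while letting MongoDB pipelines and endpoints work against the array form.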
A bug occurs when generating a PDF report: an opaque error is thrown in the UI and prevents the user from navigating back to the sample table correctly.
Server-side data requests on the cerebro/data/samples/[sample] route require a negative-control-tagged load or they throw an error.