esteinig / cerebro

Metagenomic diagnostics stack for low abundance sample types and clinical reporting

License: GNU General Public License v3.0
Dev

The dev deployment mode should be able to access the templates from the repository (`cerebro_api` service in `docker-compose.yml`):
```yaml
volumes:
  {{#if dev }}
  # Path mount for application in development
  - {{{ dev }}}:/usr/src/cerebro
  - {{{ dev }}}/templates/email:/data/templates/email:ro
  - {{{ dev }}}/templates/report:/data/templates/report:ro
  {{else}}
  # Volume mount for application in production deployment
  - cerebro_api:/data
  - {{{ outdir }}}/templates/email:/data/templates/email:ro
  - {{{ outdir }}}/templates/report:/data/templates/report:ro
  {{/if}}
```
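The branch logic of the compose template can be sketched in plain Rust for clarity. The helper below is hypothetical (it is not part of the codebase); only the mount paths mirror the template:

```rust
/// Hypothetical helper mirroring the {{#if dev}} branch of the compose
/// template: a dev checkout mounts the repository templates directly,
/// while a production deployment mounts rendered templates from the
/// output directory alongside the cerebro_api volume.
fn template_mounts(dev: Option<&str>, outdir: &str) -> Vec<String> {
    match dev {
        Some(dev) => vec![
            format!("{dev}:/usr/src/cerebro"),
            format!("{dev}/templates/email:/data/templates/email:ro"),
            format!("{dev}/templates/report:/data/templates/report:ro"),
        ],
        None => vec![
            "cerebro_api:/data".to_string(),
            format!("{outdir}/templates/email:/data/templates/email:ro"),
            format!("{outdir}/templates/report:/data/templates/report:ro"),
        ],
    }
}

fn main() {
    for mount in template_mounts(Some("/home/user/cerebro"), "/opt/cerebro") {
        println!("- {mount}");
    }
}
```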
Dockerfile.server

The currently deployed Dockerfile.server does not include a PDF compiler test - this is necessary to pre-load the dependent LaTeX packages so that the first report generation takes little time. However, the Docker container user currently needs to be root to access the root-installed system dependencies for tectonic.
```dockerfile
ENV PATH="${PATH}:/opt/cerebro/bin"
RUN cargo build --release --features pdf && cp target/release/cerebro /opt/cerebro/bin
WORKDIR /opt/cerebro
RUN cerebro report compile --base-config /usr/src/cerebro/templates/report/report.toml --output test.pdf --pdf && rm test.pdf
```
Perhaps we can install the dependencies with sudo as the container user and remove sudo access once the image is built?
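One possible shape for this, sketched as an assumption (the package names and the `cerebro` user are illustrative, not the current setup): install the system dependencies as root, warm the LaTeX package cache with the test compile, then drop to an unprivileged user in a final layer:

```dockerfile
# Sketch only: install system dependencies as root, warm the LaTeX package
# cache with a test compile, then drop privileges for the runtime user.
USER root
RUN apt-get update && apt-get install -y --no-install-recommends \
        libssl-dev libfontconfig1-dev \
    && rm -rf /var/lib/apt/lists/*
RUN cerebro report compile --base-config /usr/src/cerebro/templates/report/report.toml --output test.pdf --pdf && rm test.pdf
# Hypothetical unprivileged user for the final image
RUN useradd --create-home cerebro
USER cerebro
```

This avoids granting sudo inside the image at all: root is only used during build layers, and the container runs unprivileged afterwards.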
Optimise resource parameters
Running cerebro as an accredited service requires us to monitor assay performance over time. This issue tracks the implementation. It should operate via requests to the API.
User action logs are interwoven with security logs (stored in both admin and team databases). We need to simplify the action logs so they can be used directly in the front-page activity log.
Kirsty said the troubleshooting section is important for accreditation; add:
No indication that the page is changing is currently shown to the user - a top shell-aligned progress bar would help with this.
Cocogitto conventional commits and semantic versioning script for auto-bumping versions across the project:
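A minimal shape for such a script, assuming the cocogitto CLI (`cog`) is the chosen tool; the fallback branch and version placeholder are illustrative for machines without cog installed:

```shell
# Sketch of a version-bump helper using cocogitto's `cog` CLI.
if command -v cog >/dev/null 2>&1; then
    cog check                       # validate conventional commit history
    cog bump --auto                 # bump the semantic version from commit types
    version=$(cog get-version 2>/dev/null || echo "unknown")
else
    version="0.0.0-dev"             # placeholder when cog is unavailable
fi
echo "project version: ${version}"
```

`cog bump --auto` derives the next semantic version from the conventional commit types since the last tag, which is what auto-bumping across the project would rely on.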
mkdocs documentation templates

Taxonomy table does not scale with lower screen resolutions, especially when expanded.
Explicit sequencing control handling and exposure in QualityControl summary modules for the frontend.
Need an invalidate call to refresh page data when submitting a decision comment (so it shows immediately)
Semi-automated production runtime for accreditation; overview of aims and progress to complete the production feature on feat/production.

Deployment configurations (http-local, http-local-secure, https-web, https-web-secure); https-web and https-web-secure configs.

Full-stack setup of the Cerebro production environment for continuous operations. For setting up parallel test or development environments, see details in the documentation.
- Linux system
- Mamba installation
- Cerebro client installation
- Cerebro stack setup and operation
The production directory and sub-directories are set up on the system - you can read more about the types of production environments that are currently supported in the documentation. Here we set up the RUNTIME directory where all workflows are executed, and the INPUT directory where wet-lab staff or laboratory data transfer can deposit the reads and sample sheet to trigger a workflow execution.
```bash
# Local paths for runtime and data input
export CEREBRO_BASE_PROD=/data/cerebro/prod
export CEREBRO_INPUT_PROD=/samba/project/cerebro/prod

# Setup the runtime directory where workflows are executed
cerebro production setup-base --directory $CEREBRO_BASE_PROD

# Setup the input directory with a specific team and database upload configuration
cerebro production setup-input --directory $CEREBRO_INPUT_PROD --configuration production --team-name VIDRL --database-name "META-GP Production"
```
Multiple runtime and input folders can be set up for testing, development or validation configurations. Workflow execution and outputs are configured with specific production variables that ensure the workflow is set up for production and that integration tests are run for production.
```bash
# Check workflow help menu as sanity check
nextflow run esteinig/cerebro -r 1.0.0-nata.1 --help

# Provision the accreditation database with Cipher
nextflow run esteinig/cerebro -r 1.0.0-nata.1 -profile mamba -entry cipher --revision 1.0.0-nata.1 --outdir cipher/

# Obtain the access token for the API
export CEREBRO_API_URL="http://api.cerebro.localhost"
export CEREBRO_API_TOKEN=$(cerebro api login -u $CEREBRO_USERNAME -p $CEREBRO_PASSWORD)

# Run workflow integration tests for setup and central nervous system infections
nextflow run esteinig/cerebro -r 1.0.0-nata.1 -profile mamba,ciqa-setup@v1,ciqa-cns@v1
```
The current sample sheet is focused on dry-lab operation. We need a user-safe sample sheet template that registers the library identifiers, minimal sample metadata, wet-lab comments, aneuploidy consent, and links to the files in the same input directory.
Initial template: https://github.com/esteinig/cerebro/blob/feat/production/templates/production/SampleSheet.xlsx
The sample sheet and fastq files (demultiplexed, de-umified) are watched and validated in the input folder. Depending on the input configuration file, the watcher runs the production stream and uploads to the specified team-database-collection at the conclusion of the run - different input configuration files (folders) can be watched by different production, test or validation watchers, with outputs deposited into the appropriate database section. This triggers a run of the Nextflow pipeline and notifications to Slack.

When the pipeline starts, sample identifiers are checked against the team-database-collection to ensure they are unique - the run is registered with the database and samples await confirmation of completion. If a sample identifier already exists in the database, the run fails.

When the pipeline completes, sample identifiers are collected and validated against the registered sample identifiers for this run. Each module (quality control, classification) is checked for completion in each sample. If a sample for some reason did not complete a module, it is marked in the database.

After completion, outputs are aggregated into the database models and uploaded into the specified collection via the API.
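The start-of-run uniqueness check can be sketched as follows. The function and error messages are hypothetical; only the behaviour (fail the run on any already-registered or duplicated identifier) follows the description above:

```rust
use std::collections::HashSet;

/// Hypothetical pre-flight check: the run fails if any sample identifier
/// from the sample sheet already exists in the team-database-collection,
/// or is duplicated within the sheet itself.
fn check_sample_identifiers(
    sheet_ids: &[String],
    registered_ids: &HashSet<String>,
) -> Result<(), String> {
    let mut seen = HashSet::new();
    for id in sheet_ids {
        if registered_ids.contains(id) {
            return Err(format!("sample identifier already registered: {id}"));
        }
        if !seen.insert(id.as_str()) {
            return Err(format!("duplicate sample identifier in sheet: {id}"));
        }
    }
    Ok(())
}

fn main() {
    let registered: HashSet<String> = ["S1".to_string()].into_iter().collect();
    // New identifiers pass the check, a clash with the database fails the run
    assert!(check_sample_identifiers(&["S2".to_string(), "S3".to_string()], &registered).is_ok());
    assert!(check_sample_identifiers(&["S1".to_string()], &registered).is_err());
}
```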
Library and controls are deselected when changing to the Report view in the cerebro/data/samples/[sample] route - however, when returning to other views, they are sometimes not re-selected.
Format string not written correctly
Report template updates based on QA feedback:

Module modifications to the ClinicalReport and .toml templates.

Currently the cerebro.taxa section of the Cerebro data model is a HashMap<taxid, Taxon> where type taxid = String. This is a result of the aggregation function, which uses sequential HashMaps to group taxa by their taxid.
A HashMap cannot be queried efficiently using MongoDB aggregation pipelines. Downstream applications - particularly endpoints on the API - eventually use a Vec<Taxon>. Refactoring to Vec is necessary, but at this stage may affect a number of dependent subsystems.
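As a transitional step, the map could be flattened at the model boundary rather than refactoring every subsystem at once. A minimal sketch, with an illustrative stand-in for the Taxon model:

```rust
use std::collections::HashMap;

/// Illustrative stand-in for the Taxon model; only `taxid` matters here.
#[derive(Debug, Clone, PartialEq)]
struct Taxon {
    taxid: String,
    name: String,
}

/// Flatten the aggregation HashMap<taxid, Taxon> into the Vec<Taxon>
/// shape used downstream, sorted by taxid for deterministic output.
fn taxa_to_vec(taxa: HashMap<String, Taxon>) -> Vec<Taxon> {
    let mut v: Vec<Taxon> = taxa.into_values().collect();
    v.sort_by(|a, b| a.taxid.cmp(&b.taxid));
    v
}

fn main() {
    let mut taxa = HashMap::new();
    taxa.insert("562".to_string(), Taxon { taxid: "562".to_string(), name: "Escherichia coli".to_string() });
    taxa.insert("1280".to_string(), Taxon { taxid: "1280".to_string(), name: "Staphylococcus aureus".to_string() });
    let v = taxa_to_vec(taxa);
    assert_eq!(v.len(), 2);
    println!("{:?}", v);
}
```

Converting once at the API boundary keeps the aggregation function unchanged while letting MongoDB pipelines and endpoints work against the array form.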
A bug occurs when generating a PDF report: an opaque error is thrown in the UI and prevents the user from navigating back to the sample table correctly.
Server-side data requests on the cerebro/data/samples/[sample] route require a negative-control-tagged load or they throw an error.