ssi-dk / bifrost
License: MIT License
Add the functionality of a button that you can click which would clear the existing job list
We want to be able to add a new field on samples called 'tags', which can act as a multipurpose filtering tool. Tags would be saved as a list of strings. Membership in a tag could be used both by pipelines (components) in their requirement checks and for filtering on the report side (i.e. only show samples belonging to the tag for this project name). Tags could also be generated by components; for example, the QC stamper could assign a tag to a sample stating its QC is good.
This item is expected to generate multiple work items.
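A minimal sketch of how tag-based filtering could look, assuming each sample is a dict mimicking a Mongo document with an optional "tags" list (field and function names are illustrative, not the actual bifrost schema):

```python
def filter_samples_by_tag(samples, tag):
    """Return samples whose 'tags' list contains the given tag.

    Samples without a 'tags' field are treated as having no tags.
    """
    return [s for s in samples if tag in s.get("tags", [])]
```

The same membership test could back both the report-side filter and a component's requirement check.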
User feedback should be made into a component for tracking purposes, even though it doesn't require anything to run on the server.
Added for all repos
Create github actions to automate testing including generation of docker files, python library creation, etc. This is expected to become multiple tasks.
The remove_run.py script mentioned in the documentation is unavailable in the public repository: https://ssi-dk.github.io/bifrost/#/user_guide?id=removing-a-run-update
The scripts/ directory is in the .gitignore file.
Data plots are currently pulling information on many more samples than necessary. To improve performance we can generate a density plot for each value from our existing data and load those in; then we only need to pull samples from the run itself. I believe we're currently pulling a set number of samples.
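One way to precompute such a density, sketched here with a normalized histogram as a cheap density estimate (bin count and storage format are assumptions, not the existing bifrost implementation):

```python
import numpy as np

def precompute_density(values, bins=50):
    """Precompute a normalized histogram for one QC metric.

    The edges/density pair is small enough to store alongside the
    metric, so the report can plot a background distribution without
    re-querying every sample in the DB.
    """
    counts, edges = np.histogram(values, bins=bins, density=True)
    return {"edges": edges.tolist(), "density": counts.tolist()}
```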
Make the setup folder (being renamed to mongoDB_setup) install the DB indexes.
Right now the species DB entry is also located there. Thinking that this should be installed into the DB via the component and not by the setup.
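A sketch of how the mongoDB_setup step could install indexes; the collection and field names here are illustrative guesses, not the real bifrost schema, and `collection` is anything exposing pymongo's `create_index`:

```python
# Hypothetical single-field index definitions for the samples collection.
SAMPLE_INDEXES = [
    ("name", 1),
    ("properties.species", 1),
]

def install_indexes(collection, indexes=SAMPLE_INDEXES):
    """Create one index per entry; returns the created index names."""
    return [collection.create_index([spec]) for spec in indexes]
```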
Be able to check the status of a run via the command line
Manage species via the GUI instead of direct DB adjustments. After that, the only things in the species DB should be an internal species name and a series of names that map to it (e.g. S. aureus -> Staphylococcus aureus).
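The lookup side of that mapping could be as simple as the following sketch (the alias table and field layout are assumptions for illustration):

```python
# Illustrative alias table: lookup names -> internal species term.
SPECIES_ALIASES = {
    "S. aureus": "Staphylococcus aureus",
    "staph aureus": "Staphylococcus aureus",
}

def resolve_species(name, aliases=SPECIES_ALIASES):
    """Return the internal species term for a lookup name,
    falling back to the name itself when no alias is registered."""
    return aliases.get(name, name)
```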
With the change to schema 2.0 we only need a component id and sample id to run a sample_component. It'd be nice to develop a page on the dashboard to allow submitting jobs to the server.
The following components have to be added to the new set up
reslab_stamper
kma_pointmutations
species specific FBI components
Right now I'm working on dockerizing and automated testing for min_read_check, and I'm trying to figure out how to prepopulate the system for analysis. In my mind I should be using the run_launcher component, but you can't easily run Docker inside Docker (perhaps a docker-compose solution is the right way?). Down the road I want bifrostlib updated so that the requests going back and forth during testing go through the API calls, or through the library that processes the API.
Fix the docker container for each component and the base so they point to the appropriate container; also create one for latest and one for dev.
Fixed for all. Really need to do this in their own branches.
Once components store the installation path information required to run, everything is limited to strictly IDs, which should also help the web interface for launching jobs.
Right now we have data handling in both the reporter and bifrostlib. Ideally bifrostlib holds all the classes, mongo_interface handles the DB, and the API works on top of bifrostlib. Everything should use one unified library, not different ones.
Have some way to order the run check results by default. Perhaps it makes sense to store the preferred sort order as a variable somewhere.
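A minimal sketch of sorting results by a stored preferred order; the result shape and "check" key are assumptions for illustration. Checks not in the stored order sort after known ones, keeping their original relative order:

```python
def order_run_results(results, preferred_order):
    """Sort run-check results by a stored preferred order.

    Unknown check names get a rank past the end of the preferred
    list; Python's stable sort preserves their original order.
    """
    rank = {name: i for i, name in enumerate(preferred_order)}
    return sorted(results, key=lambda r: rank.get(r["check"], len(rank)))
```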
Split tests into fast and slow (or unit/integration) so that the watchdog runs quick ones first and then bigger ones after for faster feedback loops
Documentation was out of date, so update it all, and create a PowerPoint while I'm at it to have a master set of slides for presentations. Potentially write up a paper for bioRxiv to push as well.
I suspect the bug cited in the install script is due to the conda channel order (see bifrost/envs/bifrost_for_install_full.yaml, lines 3 to 6 in ea4ce48).
I encountered a library issue before and it was because bioconda defers non-bio dependencies to the conda-forge channel. Bioconda depends on the following channel order:
channels:
  - bioconda
  - conda-forge
  - defaults
https://bioconda.github.io/user/install.html#set-up-channels
The view is a set size right now; it would be nice if it could scale up for larger screen real estate.
Bifrostlib is partway through updating. I was thinking that each main object needs a class and should be updated accordingly. Also, when schema validation goes in, ideally an entry can be checked against multiple schemas, which may mean two versions of the same function. Sample, Category, and Run are mostly done converting (to the current form, but they still need JSON validation), while things like Component and SampleComponent need to be redone.
Adjust docker images to utilize a scratch folder; this can be done by mounting the scratch folder to a matching location in the docker image.
Fix a bug in datahandling.py in the Run class: the check for no runs in the DB compares against None, but the query returns a list, so it should check for an empty list (i.e. not []).
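A minimal sketch of the corrected check, with hypothetical names standing in for the actual datahandling.py code (`db_runs` mimics the list a Mongo query yields):

```python
def find_run(db_runs, name):
    """Return the run dict with the given name, or None.

    The buggy version checked `if db_runs is None`, which never fires
    because the query returns a list, possibly empty. The correct
    check is for emptiness.
    """
    matches = [r for r in db_runs if r.get("name") == name]
    if not matches:  # correct: tests for [] rather than None
        return None
    return matches[0]
```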
Through the UI we want to be able to create a list of samples then group them into a collection that can be worked with. This will create a "run" object for them and can be loaded for the user through the GUI interface.
Adding loading bars/spinners to the data tables would make the user experience nicer, so users know data is loading instead of seeing no change and then all the data at once.
Want to add shields to monitor statuses and have them handled by GitHub Actions. Info on shields can be found here: https://github.com/badges/shields
This should apply specifically to data sharing and can be done via our data model and/or individual field encryption
Want to create a validation set which can be tested against for new components or lab changes. Ideally the samples are representative of what we do at SSI, and we can run them periodically on sequencing.
Right now species are required to have a true term which is stored in the database. A table can also be provided for lookup names to match to these terms. Ensure there's an interface for managing this, but keep in mind that components are bound to the true terms.
Duplicate of #45
Did a quick change for ssi_stamper, didn't impact others so haven't pushed to others yet
Looking to set up easier testing with a small data set, including localized development with a sharable DB. I figured the best bet for this is MongoDB Atlas, so I'm trying that out and getting it working. I also made a dataset available on ENA (PRJEB39131) to run this with, containing randomized S. aureus and E. coli.
Right now all repos should be uploading to Docker Hub automatically on a version number, but this isn't working as intended. I need access to variables in either setup.py or the Dockerfile in order to pass the value accordingly in GitHub Actions.
Fixing up submodules in repo
Add bifrost_test_data as a submodule
Making bifrostlib a submodule
new desired code should be:

elif category == "component":
    component_to_check = requirement.split(".")[1]
    field = requirement.split(".")[2:]
    expected_value = requirements[requirement]
    s_c_db = get_sample_component(
        sample_id=self.sample_id,
        component_name=component_to_check)
The pipeline will fail if run with bifrostlib installed from the PyPI index instead of the repository, because the datahandler.log method was renamed.
The contigs in the QC report are currently represented by numbers. I think there's a smarter way to show this with images: contig lengths sorted by size, with height showing coverage (on a log scale), and coloring to show the species for each contig. This could visually show contamination more clearly, as well as plasmids or PCR products.
The idea here is to query our local server to see how big the queue is (we might not even need to do this) and submit jobs that can fill out the queue. These jobs should be generated automatically by the system, for example via an API request (or query) for samples that have not run the latest components. That list is then submitted to the server when it's not busy, automatically updating runs.
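A sketch of the "samples that have not run the latest components" query as a pure function; the sample shape and version bookkeeping are assumptions standing in for whatever the API would return:

```python
def samples_needing_rerun(samples, latest_versions):
    """Return ids of samples missing the latest version of any component.

    samples: list of {"_id": ..., "components": {name: version}}
    latest_versions: {component_name: latest_version}
    """
    stale = []
    for s in samples:
        run = s.get("components", {})
        if any(run.get(name) != ver for name, ver in latest_versions.items()):
            stale.append(s["_id"])
    return stale
```

The scheduler could submit this list to the server whenever the queue has room.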
This will occur in all submodules
The sample_component DB variable 'path' points to the sample directory, i.e.
bifrost_dir/Sample1/
instead of
bifrost_dir/Sample1/ComponentName
Change it to include the component name.
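A minimal sketch of the corrected path construction (the function name is hypothetical; only the directory layout comes from the issue above):

```python
import os

def sample_component_path(bifrost_dir, sample_name, component_name):
    """Build the sample_component path including the component name,
    rather than stopping at the sample directory."""
    return os.path.join(bifrost_dir, sample_name, component_name)
```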
Adjust docker images so that output goes to a set folder which can then be mounted against
In the QC report's per-sample view, when someone is going through the approval process for a sample, include an optional textbox to comment on why they made their change.
Adjust components to have a unique name based on name, version, and db_date, which would replace any references to the _id in the class section of objects. Part of this is so that if the component is installed at two different institutes, the copies are treated as the same rather than as unique due to different _ids.
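One possible shape for such a key, sketched below; the separator and the assumption that db_date is an ISO date string are illustrative choices, not the decided format:

```python
def component_key(name, version, db_date):
    """Compose a deterministic component identifier from name,
    version, and db_date, so two installs of the same component
    at different institutes produce the same key."""
    return f"{name}__{version}__{db_date}"
```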