pharmbio / cpsign-service-management
Repo with tools and info for managing data ingestion from new ChEMBL data for LogD training
Remove the model-out and logfile flags from the configuration.json example and the params.j2 template, since they will be handled in data_inception.py.
Related to #7: we need to parametrize line 10 and line 11 so that the stdout.txt file follows a more readable naming convention.
IDEA: use the workflow_name field in the configuration.json file, e.g.
touch /pfs/out/logs/{{ workflow_name }}-stdout.txt
java -jar ... @/pfs/{{ workflow_name }}-ingestion/input/params.txt > /pfs/out/logs/{{ workflow_name }}-stdout.txt
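The same naming convention could also be computed on the Python side before the template is rendered. The following is only a sketch: the helper name log_paths and the log_dir default are assumptions for illustration, while the paths themselves follow the two lines above.

```python
from pathlib import PurePosixPath


def log_paths(configuration, log_dir="/pfs/out/logs"):
    """Return (stdout log path, params file path) for one workflow.

    `log_paths` is a hypothetical helper, not part of the repo; it only
    reproduces the naming convention proposed above.
    """
    name = configuration["workflow_name"]
    stdout_path = PurePosixPath(log_dir) / f"{name}-stdout.txt"
    params_path = PurePosixPath("/pfs") / f"{name}-ingestion" / "input" / "params.txt"
    return str(stdout_path), str(params_path)


stdout_path, params_path = log_paths({"workflow_name": "logd"})
print(stdout_path)   # /pfs/out/logs/logd-stdout.txt
print(params_path)   # /pfs/logd-ingestion/input/params.txt
```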
We should make a release so we have a stable version now that the LogD workflow seems finished. We should then also set up Docker tags based on releases.
Make sure new Docker images are built from the Dockerfiles whenever changes are pushed to master using Travis; see https://github.com/pharmbio/ml-container for an example.
In the file manager-pod.yaml we are still pointing to Oliver's Docker Hub, whereas the image should now be available in the pharmbio Docker Hub here: https://cloud.docker.com/u/pharmbio/repository/docker/pharmbio/logd-manager/general.
Dockerfile
manager-pod.yaml (line 10)
Grid search currently supports only the gamma, epsilon and cost parameters. The epsilon SVR and beta parameters need to be added as well and read properly from the config file.
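One way to make the grid independent of a fixed parameter list is to expand whatever ranges the config file provides. This is a minimal sketch, not the repo's implementation: the function name expand_grid, the config keys and the example values are all assumptions.

```python
from itertools import product


def expand_grid(grid):
    """Yield one dict per combination of the parameter values in `grid`.

    `grid` maps a parameter name to the list of values to try. Any
    parameter present in the config is included, so adding a new one
    does not require touching this function.
    """
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical config section; key names and values are illustrative only.
grid = {
    "cost": [1, 10],
    "gamma": [0.01, 0.1],
    "epsilon": [0.001],
    "epsilon_svr": [0.1, 0.2],
    "beta": [0.0],
}
combos = list(expand_grid(grid))
print(len(combos))  # 2 * 2 * 1 * 2 * 1 = 8
```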
See the TODO in score_selector.py: when multiple parameter configurations have the same efficiency, we need to select the one with the lowest cost.
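The tie-break can be expressed as a compound sort key. The sketch below assumes each result is a dict with "efficiency" and "cost" entries and that a lower efficiency value is better; both are assumptions, and the maximize_efficiency flag covers the opposite convention.

```python
def select_best(results, maximize_efficiency=False):
    """Pick the best result; ties on efficiency go to the lowest cost.

    Hypothetical helper: the dict layout of `results` is assumed, not
    taken from score_selector.py.
    """
    sign = -1 if maximize_efficiency else 1
    # Tuples compare element by element, so cost only matters on a tie.
    return min(results, key=lambda r: (sign * r["efficiency"], r["cost"]))


results = [
    {"cost": 100, "gamma": 0.01, "efficiency": 1.5},
    {"cost": 10, "gamma": 0.1, "efficiency": 1.5},
    {"cost": 1, "gamma": 0.1, "efficiency": 2.0},
]
best = select_best(results)
print(best["cost"])  # 10
```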
Fix the overhead in the math when --splits and --nr-models are not evenly spaced in the training parameter generator; it currently generates too many files.
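One reading of this issue is that rounding up per split overshoots the requested total. Under that assumption, and assuming the generator should distribute --nr-models across --splits, the remainder can be spread over the first few splits so the counts always sum to the requested number. The function name is hypothetical.

```python
def models_per_split(nr_models, splits):
    """Distribute `nr_models` over `splits` without overshooting.

    The first `nr_models % splits` splits get one extra model, so the
    counts always sum to exactly `nr_models`.
    """
    if splits <= 0:
        raise ValueError("splits must be positive")
    base, remainder = divmod(nr_models, splits)
    return [base + 1 if i < remainder else base for i in range(splits)]


print(models_per_split(10, 3))  # [4, 3, 3]
```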
Currently, as a design decision, we pass some hard-coded flags during data ingestion (file: data_ingestion.py), specifically at line 65:
param_additional_lines = ["\n", "--trainfile\n/pfs/{}-ingestion/data/{}\n".format(configuration["workflow_name"], smi_file_name)]
Given the enforced discrepancy in flag names between cpsign v0.6.16 and cpsign v0.7.7, we need a simple if structure to handle this. It would also be a good idea to remove those flags from the Jinja template (to avoid duplicated flags) and to add a comment in the code explaining why we pass those flags directly (good for documenting the code).
params.j2 template
Cheers!
Create an example folder for model configuration and params files, and leave only configurations.json as a general template for reference.
The CI process is outdated and needs to be updated for the new format with the new workflows.
Soon my Docker Hub credentials will no longer be used for this repo's CI. They will need to be replaced with someone else's, or we could create a Travis account for Docker Hub (only if that makes sense, though).
For all the cpsign-based services, this would allow users to reuse the same files and call each service by its name, e.g. "Logd", "hERG", etc., which would be defined in configuration.json.
Files that would need to be revised:
manager_pod/manager-pod.yaml
pipeline_setup/spec-ingestion.json
pipeline_setup/spec-train.json
pipeline_setup/spec-upload.json
Ranges of parameters need to support floats as well, need to swap to numpy ranges.
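For illustration, the following dependency-free sketch shows the kind of float range meant here; numpy.linspace(start, stop, num) produces the equivalent values and would be the natural replacement once numpy is adopted. The example bounds are arbitrary.

```python
def linspace(start, stop, num):
    """Return `num` evenly spaced floats from `start` to `stop`, inclusive.

    Each value is computed from the index rather than by repeated
    addition, which avoids accumulating rounding error.
    """
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


print(linspace(0.0, 1.0, 5))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```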
Pachyderm needs an object store and a k8s PV for etcd in order to work as a distributed cluster. One of the most recommended object store alternatives for Pachyderm seems to be Minio.
Also mentioned in Pachyderm's documentation:
Currently, parallel jobs write stdout to the shared file as it arrives; each job should write to its own file first and then append the result to the shared file.
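A rough sketch of the buffer-then-append idea follows. It assumes a POSIX system and a shared filesystem that honours flock; whether that holds for the storage behind /pfs has not been checked. The function name run_and_append is hypothetical.

```python
import fcntl
import subprocess
import tempfile


def run_and_append(cmd, shared_log):
    """Run `cmd`, buffer its stdout privately, then append it in one go.

    Output from one job lands in the shared log as a single contiguous
    block instead of being interleaved with other jobs' output.
    """
    with tempfile.TemporaryFile() as buffer:
        result = subprocess.run(cmd, stdout=buffer, check=False)
        buffer.seek(0)
        with open(shared_log, "ab") as log:
            # Exclusive lock so two jobs never append at the same time.
            fcntl.flock(log, fcntl.LOCK_EX)
            try:
                log.write(buffer.read())
                log.flush()
            finally:
                fcntl.flock(log, fcntl.LOCK_UN)
    return result.returncode
```

Each parallel job would call run_and_append with its own command and the same shared log path; the return value is the exit code of the command.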