icgc-argo / argo-clinical

Clinical data submission for ARGO programs.

License: GNU Affero General Public License v3.0


argo-clinical's Introduction

Argo clinical

Build Status

Requirements:

  • node 12+
  • Mongo 4.0

Design

Clinical architecture diagram (image in the original README)

How to:

Make scripts are provided to run this application and the required MongoDB using Docker. For these scripts to start the dev server, you must have a debugger waiting to attach on port 9229. The easiest way is to run the commands in the VSCode terminal and set the Debugger Auto Attach setting in VSCode to yes.

  • run: make. This bootstraps everything: docker compose and the service
  • make debug will only restart the clinical service, without docker compose
  • tests: make verify

To run locally without attaching the debugger, run npm run local. Since this does not start the docker-compose setup, MongoDB must already be running locally (connection configured in the .env file). See the Makefile for more details and options.

How to add a new clinical entity:

Add the new entity in the following files:

  • src/common-model/entities.ts:

    • add to enum ClinicalEntitySchemaNames
    • add to type TypeEntitySchemaNameToIndenfiterType
    • add to ClinicalUniqueIdentifier: TypeEntitySchemaNameToIndenfiterType
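    • for example, a minimal sketch of the additions above (NEW_ENTITY / 'new_entity' / 'submitter_new_entity_id' are hypothetical names; use whatever the dictionary defines):
      export enum ClinicalEntitySchemaNames {
        // ...existing entries...
        NEW_ENTITY = 'new_entity', // hypothetical schema name
      }

      export const ClinicalUniqueIdentifier: TypeEntitySchemaNameToIndenfiterType = {
        // ...existing entries...
        [ClinicalEntitySchemaNames.NEW_ENTITY]: 'submitter_new_entity_id', // hypothetical unique-id field
      };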
  • src/common-model/functions.ts:

    • update function getClinicalObjectsFromDonor to return the proper entity from the donor, similar to primary diagnosis
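    • a hedged sketch of the added branch, assuming the function receives the schema name and the donor keeps the new records in a newEntities array (hypothetical field name):
      if (clinicalEntitySchemaName === ClinicalEntitySchemaNames.NEW_ENTITY) {
        // newEntities is a hypothetical field on Donor; mirror how primary diagnoses are returned
        return donor.newEntities || [];
      }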
  • src/clinical/clinical-entities.ts:

    • add a new entity that extends ClinicalEntity:
      export interface NewEntity extends ClinicalEntity {
         entityId: number | undefined;
      }
      
    • update interface Donor to include the new entity
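    • for example (newEntities is a hypothetical field name):
      export interface Donor {
        // ...existing fields...
        newEntities?: NewEntity[]; // hypothetical field holding the new entity records
      }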
  • src/clinical/donor-repo.ts:

    • define the new schema for the new entity:
      const newSchema = new mongoose.Schema(
        {
          id: { type: Number },
          clinicalInfo: {},
        },
        { _id: false },
      );
      
    • add newSchema to const DonorSchema
    • define an id field for the new schema:
        newSchema.plugin(AutoIncrement, {
          inc_field: 'submitter_entity_id',
          start_seq: 1,
        });
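    • a sketch of the "add newSchema to const DonorSchema" step above (newEntities is a hypothetical field name; follow the pattern of the existing entity arrays):
      // inside the DonorSchema field definitions, alongside the other entity arrays:
      newEntities: [newSchema],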
      
  • src/submission/submission-entities.ts:

    • add to const BatchNameRegex: Record<ClinicalEntitySchemaNames, RegExp[]>
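    • for example (the file-name pattern below is an assumption; mirror the regex style used for the existing entities):
      export const BatchNameRegex: Record<ClinicalEntitySchemaNames, RegExp[]> = {
        // ...existing entries...
        [ClinicalEntitySchemaNames.NEW_ENTITY]: [new RegExp('^new_entity.*\\.tsv$')], // hypothetical pattern
      };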
  • update src/submission/validation-clinical/utils.ts - function getRelatedEntityByFK

  • update test/integration/stub-schema.json if a new schema is added

  • src/submission/submission-to-clinical/stat-calculator.ts

    • update getEmptyCoreStats function to include new entity
    • update schemaNameToCoreCompletenessStat to include new entity
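    • a rough sketch of both updates above (the newEntity stat key is a placeholder; use whatever key the core-completeness stats actually expect):
      const getEmptyCoreStats = () => ({
        // ...existing stat fields...
        newEntity: 0, // placeholder key
      });

      const schemaNameToCoreCompletenessStat = {
        // ...existing entries...
        [ClinicalEntitySchemaNames.NEW_ENTITY]: 'newEntity', // placeholder key
      };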
  • src/submission/submission-to-clinical/merge-submission.ts

    • add a new entity update function into function mergeRecordsMapIntoDonor, similar to updatePrimaryDiagnosisInfo
    • update function mergeActiveSubmissionWithDonors switch-case to make sure the new entity is updated when committing submission, similar to primary_diagnosis
  • update sampleFiles/sample-schema.json to include the new schema if you are using a local schema for development

  • add a new sample tsv to sampleFiles/clinical

  • If cross-file validation is needed for the new entity, add submission validation for the new entity in the following file:

    • src/submission/validation-clinical/index.ts:
const availableValidators: { [k: string]: any } = {
  [ClinicalEntitySchemaNames.DONOR]: donor,
  [ClinicalEntitySchemaNames.SPECIMEN]: specimen,
  [ClinicalEntitySchemaNames.PRIMARY_DIAGNOSIS]: primaryDiagnosis,
  [ClinicalEntitySchemaNames.FOLLOW_UP]: follow_up,
  [ClinicalEntitySchemaNames.NEW_ENTITY]: new_entity, // <-- add here to trigger validation
};
  • Run schema_builder to generate a new migration-stub-schema.json when adding a new schema.

Debugging Notes:

If file upload fails with the error TypeError: Cannot read property 'readFile' of undefined, make sure you are running Node 12+

DB migration

We use a tool called migrate-mongo: https://www.npmjs.com/package/migrate-mongo

  • create script: npx migrate-mongo create <issue #>-<my-script-name>
  • run migration: npx migrate-mongo up
  • rollback: npx migrate-mongo down
    • With this command, migrate-mongo will revert (only) the last applied migration
  • status: npx migrate-mongo status
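
A minimal migration script sketch (the collection name and field used here are hypothetical; migrate-mongo only requires the exported up and down functions):

  module.exports = {
    async up(db) {
      // forward migration: tag every donor document (hypothetical example change)
      await db.collection('donors').updateMany({}, { $set: { migrated: true } });
    },
    async down(db) {
      // rollback: remove the tag again
      await db.collection('donors').updateMany({}, { $unset: { migrated: '' } });
    },
  };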

Notes:

  • make scripts idempotent in case they fail in the middle
  • if a script fails it will stay in pending state
  • you can't change the contents after the script has run; it won't run again automatically
    • if you need to change the script you have to write another script that does what you want
  • the scripts are sorted by date & time
  • a collection in db called changelog will keep track of executed scripts.
  • the docker image will execute the scripts automatically before starting the server; if a script fails it runs the rollback script and exits
  • The Recalculate Core Completion migration imports from the compiled TS ./dist folder, so a build is required before executing the migration

Extending stub-schema for running DB migration Tests

  • extend upon the schema builder with the new schema
  • from argo-clinical/test/integration/submission, run npx ts-node ./migration_utils/schema_builder.ts to generate the migration-stub-schema.json
  • grab the newer schema and the one you want to compare against (usually v1.0) and run them through Lectern's diff calculator
  • paste the diff as a new entry into stub-diffs.ts
  • note: this process could eventually be improved by running lectern in a container so it can automatically serve the schema and compute the diffs

Importing RxNorm

See the compose directory for the simplified process. It's recommended to use MySQL 5.7, 5.6, or 5.5; MySQL 8 has issues, but works nonetheless.

  • download the full zip file
  • mount it in the mysql container
  • move all MySQL scripts from the scripts folder to the rrf folder.

the job to import this: https://jenkins.qa.cancercollaboratory.org/job/ARGO/job/devops/job/rxnorm-import/

Lectern client

Work live with the Overture lectern client without publishing new versions:

In clinical's package.json:

  1. "@overturebio-stack/lectern-client": "file:/home/ballabadi/dev/repos/overture/js-lectern-client",
  2. go to lectern client, update code and npm run build
  3. install the updated version npm i

How to debug the schema client:

  1. put a breakpoint in the relevant manager.ts function
  2. when the breakpoint hits, step into the lectern client function
  3. place a breakpoint in the lectern client file

argo-clinical's People

Contributors

alekspejovic, andricdu, blabadi, buwujiu, ciaranschutte, daniel-cy-lu, demariadaniel, dependabot[bot], hlminh2000, joneubank, justincorrigible, kevinfhartmann, mistryrn, rosibaj, rtisma, samrichca, ummulkiramr, wajiha-oicr, yalturmes

argo-clinical's Issues

Submitted IDs should not be case sensitive

Actual Behaviour

Searching for a donor_id or specimen_id or sample_id is case sensitive.

Steps to Reproduce

  1. Search for a donor id with https://clinical.qa.argo.cancercollaboratory.org/clinical/specimens/id?programId=PACA-AU&submitterId=RBSP1
  2. see that it returns not found
  3. Search for a donor id with
    https://clinical.qa.argo.cancercollaboratory.org/clinical/specimens/id?programId=PACA-AU&submitterId=rbsp1
  4. see that it returns SP40

Expected Behaviour

  • IDs SHOULD NOT be case sensitive
    No matter the casing I submit, the id should not be case sensitive. Both of these requests should return SP40.
  • If I try to register RBSP1 and rbsp1, they should be treated as the same specimen
  • Special error message
This donor was not found.
This donor was not found, but a donor exists with an id that matches this string: <>.  Did you mean this?
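
A sketch of one way to make the lookup case-insensitive (the model and field names are assumptions, not the actual repository code):

  // hypothetical helper: a case-insensitive collation lets 'RBSP1' and 'rbsp1' resolve to the same document
  const findSpecimenBySubmitterId = (programId: string, submitterId: string) =>
    DonorModel.findOne({ programId, submitterId }).collation({ locale: 'en', strength: 2 });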

Approval endpoint

  • approved: change state to approved; commits data to the prod DB.
  • not approved: Hardeep would work with the submitter to correct the data. There is no "rejected" state; Hardeep would "Reopen" (another ticket to be created).

Implement the Endpoint to Upload a Specimen File

PUT /submission/programs/{programId}/specimen

  • Upload the TSV
  • Get the Specimen schema
  • Call schema validation and collect errors, if any
  • If validation passes, save to the staging collection
  • Write the endpoint to get donors from the staging collection (does this belong here?)
  • Write the data validation logic for donors (does this belong here?)
  • Add swagger
  • Add tests

The sequence diagrams for clinical submissions can be found here : https://wiki.oicr.on.ca/display/icgcargotech/Technical+Architecture.

In cases where IDs have previously been registered under other entities, the MUTATED_EXISTING_ERROR types are not thrown

Steps to reproduce:

  1. Commit a successful registration in QA.
  2. Select 1 of the entries that have been committed, and change the data in the file so that the specimen id points to a different donor, and the specimen type is mutated.

Expected result:
Both SPECIMEN_BELONGS_TO_OTHER_DONOR & MUTATING_EXISTING_DATA for the SpecimenType field should be thrown.

Actual Result:
Only the SPECIMEN_BELONGS_TO_OTHER_DONOR error is thrown.

Investigate Re-use of submission components

The current system is legacy. It allows file upload and validation, and stores and updates the data dictionary encoded in JSON. It also includes a dictionary viewer.

What pieces can we extract and reuse?

User Auth on clinical submission

Check the user token for the correct scope to know that they are allowed to submit data:

  • look for PROGRAMDATA..WRITE
  • receive the JWT token and check that the user has the policy to write the program's data.

Implement the validation rules for specimen

  • Get the schema and validations from BAs
  • This is blocked by the validation tickets; it depends on how we want to put in the validation rules

There will be some special cross-file validations that need to occur that are outside of schema validation provided in the dictionary.

Wiki Reference of validations: https://wiki.oicr.on.ca/display/icgcargotech/Clinical+Submission+Business+Requirements#ClinicalSubmissionBusinessRequirements-Cross-EntityValidations

The scope of this ticket includes implementing these validations related to specimens:

  • All records should belong to the same Program; if an id is found but does not belong to the program, an error should be returned.
  • All donor, specimen, and sample submitter IDs across all files must be registered. If an id is submitted in a valid format but is not found in the registry, then an error should be returned.

Data Dictionary Manager

When a dictionary version is updated, data needs to be marked as valid or invalid.

NOTE: this should be rolled out when we add new files.

  • Management API
    -- endpoints to migrate, test, and check
    -- endpoint to freeze uploads ?
  • Detecting all breaking changes
    -- listed in https://wiki.oicr.on.ca/display/icgcargotech/Submission+System+Technical+Architecture
    -- stabilize the migration
    -- test plan on migration cases
  • Reporting & Statistics on donor invalidation
    -- record reason of invalidation
    -- notification of invalidation
  • Managing active submissions
    -- add invalid_migration states somewhere
    -- record which entities have schema errors, until they are replaced with valid data
    -- check whether upload and clear need to account for this

Fix the authorization on MongoDB

  • Enable authentication on the MongoDB replica
  • enabled auth in the helm chart and deployed
  • Set clinical to connect to Mongo with auth
  • clinical uses creds in the connection string

Next steps:

  • enable TLS
  • test clinical deploy with Mongo cluster as a dependency

Throw specific error when submission update is unsuccessful

throw a more specific error (e.g. ConflictingUpdateError) that translates to 409 http status

  • this is for when the service is unable to update a submission that has changed between the time it was read and the time it is updated

Collect and parse errors for registration

This will lay out the foundation of how error generation will be applied.

  • Decide on error types
  • Capture column name, type of error, if it is column based
  • Cross validation errors

Check Program Existence

Integrate with the program service to verify that the program submitted to these APIs exists

  • defensive mechanism
  • prevents submission of data if program service is down
  • this should result in a PROGRAM_DOES_NOT_MATCH error, but instead it just looks successful

If you don't do this, it results in bugs like the one shown in the attached screenshot.

Need to add 2 new error codes for registration

Currently the NEW_SAMPLE_CONFLICT & NEW_SPECIMEN_CONFLICT error codes are returned in two different scenarios, which need to be mapped to 2 different display messages.

For example, for NEW_SAMPLE_CONFLICT, there are 2 scenarios:

  1. When the sample is linked to 2 different specimens in the file then the display message would look like -> "Sample is attached to two different specimens in your file. Samples can only be linked to a single specimen."
  2. When the same sample has 2 different sample types in the same file, then the display message should look like "The value for this field conflicts with another value for the same Specimen"

We should split this up into 2 separate error codes in the backend (and same with NEW_SPECIMEN_CONFLICT), so that the gateway does not own this logic.

Remove case sensitive validation in schemas

We want to ease the burden on the submitter by removing case-sensitive validation during submission.

For example, if the dictionary enum is female, a data submitter should be able to submit Female, female, FeMaLe etc. As long as the characters match, ignoring case, an error should not be thrown.

The dictionary casing of the value should always be the stored value; when data is returned in any response, the dictionary approved casing from the enum list should be returned in the response.
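
A sketch of the intended normalization (hypothetical helper, not the actual implementation):

  // return the dictionary-approved casing for a submitted value, or undefined if it is genuinely invalid
  const matchToDictionaryCasing = (submitted: string, dictionaryValues: string[]): string | undefined =>
    dictionaryValues.find(v => v.toLowerCase() === submitted.toLowerCase());

  // matchToDictionaryCasing('FeMaLe', ['Male', 'Female']) returns 'Female'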

Validation endpoint

  • Executes any validations that exist
  • Sets the workspace into a valid or invalid state
  • Validation for donor (rules will be implemented)

Integrate with dictionary service

We need to integrate Lectern with the clinical submission instead of using the mock schema that is currently hardcoded.

  • Call lectern on start of submission service
  • Check the saved schema
  • Need to fetch saved schema in case of failure
  • Write mock tests

NOT IN SCOPE:

  • diffs

finalize commit registration endpoint

  • generate ids for specimen and sample
  • make id generation stable
  • run in transaction
  • delete the registration when that is done
  • On success, provide the number of registrations
  • write tests

Clinical submission system should be extended to accept read group information for samples

We need to be able to identify the number of lanes and read group IDs per sample (one read group for each lane).

This should be submitted in a JSON file. An example of the JSON can be found here :

https://github.com/icgc-argo/argo-metadata-schemas/blob/master/schemas/_example_docs/30.sequencing_experiment.01.ok.json

The only data which needs to be provided in this new file is :

  • Donor submitter ID
  • Specimen submitter ID
  • Sample submitter ID
  • Number of lanes/read groups
  • Read group submitter IDs (should add up to the number above)

Option 2 :

  • Create a new Analysis Type within SONG.

  • Want to see this in the UI. Want to see how many lanes there are per sample, and what the read group IDs are (have this in the Genomic Submissions area).

  • Requirement for people to take a pause and sign off before starting harmonization.

Check file names on upload

All submitted files should be formatted in the following manner :

entityname<optional_extension>.tsv

Examples of valid names:
donor_v1.tsv
registration.tsv
specimen1.tsv

Examples of invalid names:
thisisadonorfile.tsv

First, the file name should be checked and an appropriate message should be returned if it is invalid.
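
A sketch of a possible check (hypothetical helper; the real per-entity patterns are defined in BatchNameRegex in src/submission/submission-entities.ts):

  // the file must start with the entity name, may carry an optional suffix, and must end in .tsv
  const isValidFileName = (fileName: string, entityName: string): boolean =>
    new RegExp(`^${entityName}[\\w-]*\\.tsv$`).test(fileName);

  // isValidFileName('donor_v1.tsv', 'donor') returns true
  // isValidFileName('thisisadonorfile.tsv', 'donor') returns false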

Sample Registration

User Flow for Registration:

  1. User goes to registration page and downloads a template file.
    https://projects.invisionapp.com/d/main#/console/17466078/376318010/preview
  2. User goes to registration page and uploads their registration file.
  3. If the file is misformatted, then errors will be returned.
    https://projects.invisionapp.com/d/main#/console/17466078/378658982/preview
    https://projects.invisionapp.com/d/main#/console/17466078/376318007/preview
  4. If validation passes, then the submitted file is shown in the UI.
    https://projects.invisionapp.com/d/main#/console/17466078/376318009/preview
  5. The user can submit the registration file. If there are no changes, then they are directed to the dashboard showing registered samples.
  6. If there are changes, then the registration is set to pending approval.
  7. The DCC can approve the registration when ready.

Need a service that provides the registration template

On the UI, we provide the option for users to "download templates".

We need the ability to generate a file template from the latest version of the dictionary schema.

  • Need a full list of the entities from Lectern
  • Take the entity as a parameter, and generate content based on the schema for that entity
  • Return as a TSV file

Fix the error messages for the registration file

We need to make a few changes to the error messages that are provided so that we can present them appropriately on the UI.

The UI error message mockups can be found here : https://wiki.oicr.on.ca/display/icgcargotech/Sample+Registration

  • Along with the row number, provide the Submitter Donor ID, Submitter Specimen ID & Submitter Sample ID for each row
  • Along with the error type (e.g. INVALID_ENUM_VALUE) and the fieldName for the field with an error, provide the value of the field which is wrong
  • Make sure that we are displaying all the error messages in one go. When testing, I first got a message that there was something wrong with an enum value, and then later that the program_id failed.
  • When there is an issue with the name of the header, or a header is missing, it should throw an appropriate error.

Updates to support new jwt structure

The platform UI has a token utility to understand a user's permissions. Since the Ego JWT format is being updated, the clinical system needs to respond to that JWT change. It would be nice if all applications could use a centralized token utility.

Detailed Description

Update clinical to use ego-token-utils (https://github.com/icgc-argo/ego-token-utils) to support the new Ego JWT structure.
