icgc-argo / argo-clinical

Clinical data submission for ARGO programs.

License: GNU Affero General Public License v3.0


argo-clinical's Introduction

Argo clinical

Build Status

Requirements:

  • node 12+
  • Mongo 4.0

Design

Clinical architecture diagram (image in the original README)

How to:

Make scripts are provided to run this application and the required MongoDB using Docker. For these scripts to start the dev server, you must have a debugger waiting to attach on port 9229. The easiest way is to run the commands in the VSCode terminal and set the Debugger Auto Attach setting in VSCode to yes.

  • run: make. This bootstraps everything: docker compose and the service
  • make debug will only restart the clinical service, without docker compose
  • tests: make verify

To run locally without attaching the debugger, run npm run local. Since this does not start the docker-compose setup, MongoDB must already be running locally (connection configured in the .env file). See the Makefile for more details and options.

How to add a new clinical entity:

Add the new entity in the following files:

  • src/common-model/entities.ts:

    • add to enum ClinicalEntitySchemaNames
    • add to type TypeEntitySchemaNameToIndenfiterType
    • add to ClinicalUniqueIdentifier: TypeEntitySchemaNameToIndenfiterType
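    • for example, a minimal sketch of the additions above (NEW_ENTITY / 'new_entity' / 'submitter_new_entity_id' are hypothetical names; use whatever the dictionary defines):
      export enum ClinicalEntitySchemaNames {
        // ...existing entries...
        NEW_ENTITY = 'new_entity', // hypothetical schema name
      }

      export const ClinicalUniqueIdentifier: TypeEntitySchemaNameToIndenfiterType = {
        // ...existing entries...
        [ClinicalEntitySchemaNames.NEW_ENTITY]: 'submitter_new_entity_id', // hypothetical unique-id field
      };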
  • src/common-model/functions.ts:

    • update function getClinicalObjectsFromDonor to return the proper entity from the donor, similar to primary diagnosis
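    • a hedged sketch of the added branch, assuming the function receives the schema name and the donor keeps the new records in a newEntities array (hypothetical field name):
      if (clinicalEntitySchemaName === ClinicalEntitySchemaNames.NEW_ENTITY) {
        // newEntities is a hypothetical field on Donor; mirror how primary diagnoses are returned
        return donor.newEntities || [];
      }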
  • src/clinical/clinical-entities.ts:

    • add a new entity that extends ClinicalEntity:
      export interface NewEntity extends ClinicalEntity {
         entityId: number | undefined;
      }
      
    • update interface Donor to include the new entity
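    • for example (newEntities is a hypothetical field name):
      export interface Donor {
        // ...existing fields...
        newEntities?: NewEntity[]; // hypothetical field holding the new entity records
      }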
  • src/clinical/donor-repo.ts:

    • define the new schema for the new entity:
      const newSchema = new mongoose.Schema(
        {
          id: { type: Number },
          clinicalInfo: {},
        },
        { _id: false },
      );
      
    • add newSchema to const DonorSchema
    • define an id field for the new schema:
        newSchema.plugin(AutoIncrement, {
          inc_field: 'submitter_entity_id',
          start_seq: 1,
        });
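    • a sketch of the "add newSchema to const DonorSchema" step above (newEntities is a hypothetical field name; follow the pattern of the existing entity arrays):
      // inside the DonorSchema field definitions, alongside the other entity arrays:
      newEntities: [newSchema],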
      
  • src/submission/submission-entities.ts:

    • add to const BatchNameRegex: Record<ClinicalEntitySchemaNames, RegExp[]>
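    • for example (the file-name pattern below is an assumption; mirror the regex style used for the existing entities):
      export const BatchNameRegex: Record<ClinicalEntitySchemaNames, RegExp[]> = {
        // ...existing entries...
        [ClinicalEntitySchemaNames.NEW_ENTITY]: [new RegExp('^new_entity.*\\.tsv$')], // hypothetical pattern
      };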
  • update src/submission/validation-clinical/utils.ts - function getRelatedEntityByFK

  • update test/integration/stub-schema.json if a new schema is added

  • src/submission/submission-to-clinical/stat-calculator.ts

    • update getEmptyCoreStats function to include new entity
    • update schemaNameToCoreCompletenessStat to include new entity
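    • a rough sketch of both updates above (the newEntity stat key is a placeholder; use whatever key the core-completeness stats actually expect):
      const getEmptyCoreStats = () => ({
        // ...existing stat fields...
        newEntity: 0, // placeholder key
      });

      const schemaNameToCoreCompletenessStat = {
        // ...existing entries...
        [ClinicalEntitySchemaNames.NEW_ENTITY]: 'newEntity', // placeholder key
      };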
  • src/submission/submission-to-clinical/merge-submission.ts

    • add a new entity update function into function mergeRecordsMapIntoDonor, similar to updatePrimaryDiagnosisInfo
    • update function mergeActiveSubmissionWithDonors switch-case to make sure the new entity is updated when committing submission, similar to primary_diagnosis
  • update sampleFiles/sample-schema.json to include the new schema if you are using a local schema for development

  • add a new sample tsv to sampleFiles/clinical

  • If cross-file validation is needed for the new entity, add submission validation for the new entity in the following file:

    • src/submission/validation-clinical/index.ts:
const availableValidators: { [k: string]: any } = {
  [ClinicalEntitySchemaNames.DONOR]: donor,
  [ClinicalEntitySchemaNames.SPECIMEN]: specimen,
  [ClinicalEntitySchemaNames.PRIMARY_DIAGNOSIS]: primaryDiagnosis,
  [ClinicalEntitySchemaNames.FOLLOW_UP]: follow_up,
  [ClinicalEntitySchemaNames.NEW_ENTITY]: new_entity, // <-- add here to trigger validation
};
  • Run schema_builder to generate a new migration-stub-schema.json when adding a new schema.

Debugging Notes:

If file upload fails with the error TypeError: Cannot read property 'readFile' of undefined, make sure you are running Node 12+

DB migration

We use a tool called migrate-mongo: https://www.npmjs.com/package/migrate-mongo

  • create script: npx migrate-mongo create <issue #>-<my-script-name>
  • run migration: npx migrate-mongo up
  • rollback: npx migrate-mongo down
    • With this command, migrate-mongo will revert (only) the last applied migration
  • status: npx migrate-mongo status
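
A minimal migration script sketch (the collection name and field used here are hypothetical; migrate-mongo only requires the exported up and down functions):

  module.exports = {
    async up(db) {
      // forward migration: tag every donor document (hypothetical example change)
      await db.collection('donors').updateMany({}, { $set: { migrated: true } });
    },
    async down(db) {
      // rollback: remove the tag again
      await db.collection('donors').updateMany({}, { $unset: { migrated: '' } });
    },
  };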

Notes:

  • make scripts idempotent in case they fail in the middle
  • if a script fails it will stay in pending state
  • you can't change the contents after the script has run; it won't run again automatically
    • if you need to change the script you have to write another script that does what you want
  • the scripts are sorted by date & time
  • a collection in db called changelog will keep track of executed scripts.
  • the docker image will execute the scripts automatically before starting the server; if a script fails it runs the rollback script and exits
  • The Recalculate Core Completion migration imports from the compiled TS ./dist folder, so a build is required before executing the migration

Extending stub-schema for running DB migration Tests

  • extend upon the schema builder with the new schema
  • from argo-clinical/test/integration/submission, run npx ts-node ./migration_utils/schema_builder.ts to generate the migration-stub-schema.json
  • grab the newer schema and the one you want to compare against (usually v1.0) and run them through Lectern's diff calculator
  • paste the diff as a new entry into stub-diffs.ts
  • note: this process could eventually be improved by running lectern in a container so it can automatically serve the schema and compute the diffs

Importing RxNorm

See the compose directory for the simplified process. It's recommended to use MySQL 5.7, 5.6, or 5.5; MySQL 8 has issues, but works nonetheless.

  • download the full zip file
  • mount it in the mysql container
  • move all MySQL scripts from the scripts folder to the rrf folder.

the job to import this: https://jenkins.qa.cancercollaboratory.org/job/ARGO/job/devops/job/rxnorm-import/

Lectern client

Work live with the Overture lectern client without publishing new versions:

In clinical's package.json:

  1. "@overturebio-stack/lectern-client": "file:/home/ballabadi/dev/repos/overture/js-lectern-client",
  2. go to lectern client, update code and npm run build
  3. install the updated version npm i

How to debug the schema client:

  1. put a breakpoint in the relevant manager.ts function
  2. when the breakpoint hits, step into the lectern client function
  3. place a breakpoint in the lectern client file

argo-clinical's People

Contributors

alekspejovic, andricdu, blabadi, buwujiu, ciaranschutte, daniel-cy-lu, demariadaniel, dependabot[bot], hlminh2000, joneubank, justincorrigible, kevinfhartmann, mistryrn, rosibaj, rtisma, samrichca, ummulkiramr, wajiha-oicr, yalturmes

argo-clinical's Issues

Submitted IDs should not be case sensitive

Actual Behaviour

Searching for a donor_id or specimen_id or sample_id is case sensitive.

Steps to Reproduce

  1. Search for a donor id with https://clinical.qa.argo.cancercollaboratory.org/clinical/specimens/id?programId=PACA-AU&submitterId=RBSP1
  2. see that it returns not found
  3. Search for a donor id with
    https://clinical.qa.argo.cancercollaboratory.org/clinical/specimens/id?programId=PACA-AU&submitterId=rbsp1
  4. see that it returns SP40

Expected Behaviour

  • IDs SHOULD NOT be case sensitive
    No matter the casing I submit, the id should not be case sensitive. Both of these requests should return SP40.
  • If I try to register RBSP1 and rbsp1, they should be treated as the same specimen
  • Special error message
This donor was not found.
This donor was not found, but a donor exists with an id that matches this string: <>.  Did you mean this?
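
A sketch of one way to make the lookup case-insensitive (the model and field names are assumptions, not the actual repository code):

  // hypothetical helper: a case-insensitive collation lets 'RBSP1' and 'rbsp1' resolve to the same document
  const findSpecimenBySubmitterId = (programId: string, submitterId: string) =>
    DonorModel.findOne({ programId, submitterId }).collation({ locale: 'en', strength: 2 });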

Approval endpoint

  • approved: change state to approved; commits data to the prod DB.
  • not approved: Hardeep would work with the submitter to correct the data. There is no "rejected" state; Hardeep would "Reopen" (another ticket to be created).

Implement the Endpoint to Upload a Specimen File

PUT /submission/programs/{programId}/specimen

  • Upload the TSV
  • Get the Specimen schema
  • Call schema validation and collect errors, if any
  • If validation passes, save to the staging collection
  • Write the endpoint to get donors from the staging collection (does this belong here?)
  • Write the data validation logic for donors (does this belong here?)
  • Add swagger
  • Add tests

The sequence diagrams for clinical submissions can be found here : https://wiki.oicr.on.ca/display/icgcargotech/Technical+Architecture.

In cases where IDs have previously been registered under other entities, the MUTATED_EXISTING_ERROR types are not thrown

Steps to reproduce:

  1. Commit a successful registration in QA.
  2. Select 1 of the entries that have been committed, and change the data in the file so that the specimen id points to a different donor, and the specimen type is mutated.

Expected result:
Both SPECIMEN_BELONGS_TO_OTHER_DONOR & MUTATING_EXISTING_DATA for the SpecimenType field should be thrown.

Actual Result:
Only the SPECIMEN_BELONGS_TO_OTHER_DONOR error is thrown.

Investigate Re-use of submission components

The current system is legacy. It allows file upload and validation, and stores and updates the data dictionary encoded in JSON. It also includes a dictionary viewer.

What pieces can we extract and reuse?

User Auth on clinical submission

Check the user token for the correct scope to know that they are allowed to submit data:

  • look for PROGRAMDATA..WRITE
  • receive the JWT token and check that the user has the policy to write the program's data.

Implement the validation rules for specimen

  • Get the schema and validations from BAs
  • This is blocked by the validation tickets; it depends on how we want to put in the validation rules

There will be some special cross-file validations that need to occur that are outside of schema validation provided in the dictionary.

Wiki Reference of validations: https://wiki.oicr.on.ca/display/icgcargotech/Clinical+Submission+Business+Requirements#ClinicalSubmissionBusinessRequirements-Cross-EntityValidations

The scope of this ticket includes implementing these validations related to specimens:

  • All records should belong to the same Program; if an id is found but does not belong to the program, an error should be returned.
  • All donor, specimen, and sample submitter IDs across all files must be registered. If an id is submitted in a valid format but is not found in the registry, then an error should be returned.

Data Dictionary Manager

When a dictionary version is updated, data needs to be marked as valid or invalid.

NOTE: this should be rolled out when we add new files.

  • Management API
    -- endpoints to migrate, test, and check
    -- endpoint to freeze uploads ?
  • Detecting all breaking changes
    -- listed in https://wiki.oicr.on.ca/display/icgcargotech/Submission+System+Technical+Architecture
    -- stabilize the migration
    -- test plan on migration cases
  • Reporting & Statistics on donor invalidation
    -- record reason of invalidation
    -- notification of invalidation
  • Managing active submissions
    -- add invalid_migration states somewhere
    -- record which entities have schema errors, until they are replaced with valid data
    -- check whether upload and clear need to account for this

Fix the authorization on MongoDB

  • Enable authentication on the MongoDB replica
  • enabled auth in the helm chart and deployed
  • Set clinical to connect to Mongo with auth
  • clinical uses creds in the connection string

Next steps:

  • enable TLS
  • test clinical deploy with Mongo cluster as a dependency

Throw specific error when submission update is unsuccessful

throw a more specific error (e.g. ConflictingUpdateError) that translates to 409 http status

  • this is for when the service is unable to update a submission that has changed between the time it was read and the time it is updated

Collect and parse errors for registration

This will lay out the foundation of how error generation will be applied.

  • Decide on error types
  • Capture column name, type of error, if it is column based
  • Cross validation errors

Check Program Existence

Integrate with the program service to verify that the program submitted to these APIs exists

  • defensive mechanism
  • prevents submission of data if program service is down
  • this should result in a PROGRAM_DOES_NOT_MATCH error, but instead it just looks successful

If you don't do this, it results in bugs like the one shown in the attached screenshot.

Need to add 2 new error codes for registration

Currently the NEW_SAMPLE_CONFLICT & NEW_SPECIMEN_CONFLICT error codes are returned in two different scenarios, which need to be mapped to 2 different display messages.

For example, for NEW_SAMPLE_CONFLICT, there are 2 scenarios:

  1. When the sample is linked to 2 different specimens in the file then the display message would look like -> "Sample is attached to two different specimens in your file. Samples can only be linked to a single specimen."
  2. When the same sample has 2 different sample types in the same file, then the display message should look like "The value for this field conflicts with another value for the same Specimen"

We should split this up into 2 separate error codes in the backend (and same with NEW_SPECIMEN_CONFLICT), so that the gateway does not own this logic.

Remove case sensitive validation in schemas

We want to ease the burden on the submitter by removing case-sensitive validation during submission.

For example, if the dictionary enum is female, a data submitter should be able to submit Female, female, FeMaLe etc. As long as the characters match, ignoring case, an error should not be thrown.

The dictionary casing of the value should always be the stored value; when data is returned in any response, the dictionary approved casing from the enum list should be returned in the response.
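
A sketch of the intended normalization (hypothetical helper, not the actual implementation):

  // return the dictionary-approved casing for a submitted value, or undefined if it is genuinely invalid
  const matchToDictionaryCasing = (submitted: string, dictionaryValues: string[]): string | undefined =>
    dictionaryValues.find(v => v.toLowerCase() === submitted.toLowerCase());

  // matchToDictionaryCasing('FeMaLe', ['Male', 'Female']) returns 'Female'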

Validation endpoint

  • Executes any validations that exist
  • Sets the workspace into a valid or invalid state
  • Validation for donor (rules will be implemented)

Integrate with dictionary service

We need to integrate Lectern with the clinical submission instead of using the mock schema that is currently hardcoded.

  • Call lectern on start of submission service
  • Check the saved schema
  • Need to fetch saved schema in case of failure
  • Write mock tests

NOT IN SCOPE:

  • diffs

finalize commit registration endpoint

  • generate ids for specimen and sample
  • make id generation stable
  • run in transaction
  • delete the registration when that is done
  • On success, provide the number of registrations
  • write tests

Clinical submission system should be extended to accept read group information for samples

We need to be able to identify the number of lanes and read group IDs per sample (one read group for each lane).

This should be submitted in a JSON file. An example of the JSON can be found here :

https://github.com/icgc-argo/argo-metadata-schemas/blob/master/schemas/_example_docs/30.sequencing_experiment.01.ok.json

The only data which needs to be provided in this new file is :

  • Donor submitter ID
  • Specimen submitter ID
  • Sample submitter ID
  • Number of lanes/read groups
  • Read group submitter IDs (should add up to the number above)

Option 2 :

  • Create a new Analysis Type within SONG.

  • Want to see this in the UI. Want to see how many lanes there are per sample, and what the read group IDs are (have this in the Genomic Submissions area).

  • Requirement for people to take a pause and sign off before starting harmonization.

Check file names on upload

All submitted files should be formatted in the following manner :

entityname<optional_extension>.tsv

Examples of valid names:
donor_v1.tsv
registration.tsv
specimen1.tsv

Examples of invalid names:
thisisadonorfile.tsv

First, the file name should be checked and an appropriate message should be returned if it is invalid.
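
A sketch of a possible check (hypothetical helper; the real per-entity patterns are defined in BatchNameRegex in src/submission/submission-entities.ts):

  // the file must start with the entity name, may carry an optional suffix, and must end in .tsv
  const isValidFileName = (fileName: string, entityName: string): boolean =>
    new RegExp(`^${entityName}[\\w-]*\\.tsv$`).test(fileName);

  // isValidFileName('donor_v1.tsv', 'donor') returns true
  // isValidFileName('thisisadonorfile.tsv', 'donor') returns false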

Sample Registration

User Flow for Registration:

  1. User goes to registration page and downloads a template file.
    https://projects.invisionapp.com/d/main#/console/17466078/376318010/preview
  2. User goes to registration page and uploads their registration file.
  3. If the file is misformatted, then errors will be returned.
    https://projects.invisionapp.com/d/main#/console/17466078/378658982/preview
    https://projects.invisionapp.com/d/main#/console/17466078/376318007/preview
  4. If validation passes, then the submitted file is shown in the UI.
    https://projects.invisionapp.com/d/main#/console/17466078/376318009/preview
  5. The user can submit the registration file. If there are no changes, then they are directed to the dashboard showing registered samples.
  6. If there are changes, then the registration is set to pending approval.
  7. The DCC can approve the registration when ready.

Need a service that provides the registration template

On the UI, we provide the option for users to "download templates".

We need the ability to generate a file template from the latest version of the dictionary schema.

  • Need a full list of the entities from Lectern
  • Take the entity as a parameter, and generate content based on the schema for that entity
  • Return as a TSV file

Fix the error messages for the registration file

We need to make a few changes to the error messages that are provided so that we can present them appropriately on the UI.

The UI error message mockups can be found here : https://wiki.oicr.on.ca/display/icgcargotech/Sample+Registration

  • Along with the row number, provide the Submitter Donor ID, Submitter Specimen ID & Submitter Sample ID for each row
  • Along with the error type (e.g. INVALID_ENUM_VALUE) and the fieldName for the field with an error, provide the value of the field which is wrong
  • Make sure that we are displaying all the error messages in one go. When testing, I first got a message that there was something wrong with an enum value, and then later that the program_id failed.
  • When there is an issue with the name of the header, or a header is missing, it should throw an appropriate error.

Updates to support new jwt structure

The platform UI has a token utility to understand a user's permissions. Since the Ego JWT format is being updated, the clinical system needs to respond to that JWT change. It would be nice if all applications could use a centralized token utility.

Detailed Description

Update clinical to use ego-token-utils (https://github.com/icgc-argo/ego-token-utils) to support the new Ego JWT structure.
