d3b-center / clinical-data-flow

📓Project management and design artifacts for Clinical Data Flow

Home Page: https://handbook.d3b.io/docs/products/clinical-data-flow

License: Apache License 2.0

Shell 100.00%
architecture fhir project-management

clinical-data-flow's People

Contributors

fiendish, liberaliscomputing, znatty22

clinical-data-flow's Issues

Load CHOP RIMGC GIC dataset into FHIR server

Questions

  • Can we create a study for this on data tracker so that data files may be fetched from there?
  • What kind of data is this - any PHI?
  • How will we organize the patients in our FHIR server into studies/datasets? Which resource should we use to represent this? The Phenopackets FHIR model may not represent studies, so we should check whether we will have to modify the model to accommodate them.

Tasks

  • Create ingest package for this dataset
  • Ingest dataset to warehouse
  • Ingest from warehouse into FHIR server

Make it easier for developers to set up CDF tools + env

Create a 1-click script with a docker-compose stack to bootstrap the CDF pipeline for developers.

We need to make it easier for other developers to spin up our CDF pipeline so that they can collaborate with us, test out our pipeline for demo purposes, or use it for their own development needs.

Ingest Kids First studies into D3b Warehouse

The DataOps team wants to copy studies from the Kids First Postgres database to the D3b Postgres database.

Meen created a CLI tool to do this, but still needs to provide support to DataOps users, fix any bugs, etc.

Meen will also help DataOps deploy this as an automated (nightly?) process.

Write an Extension generation script

We need a script that can read a TSV or spreadsheet file containing the inputs needed to define an Extension resource and generate the JSON file with the Extension payload.

This script should live in kf-model-fhir/scripts
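
A minimal sketch of what such a script could look like, assuming FHIR R4 and a TSV with the hypothetical columns `id`, `url`, `context`, `value_type`, and `description` (none of these column names come from the issue). It only handles simple, single-value Extensions and is not the actual kf-model-fhir script.

```python
import csv
import io
import json


def extension_from_row(row):
    """Build a simple (single-value) Extension StructureDefinition from one TSV row."""
    ext_id = row["id"]
    # StructureDefinition.name should be a computer-friendly identifier,
    # so "is-proband" becomes "IsProband".
    name = "".join(part.capitalize() for part in ext_id.split("-"))
    return {
        "resourceType": "StructureDefinition",
        "id": ext_id,
        "url": row["url"],
        "name": name,
        "status": "draft",
        "fhirVersion": "4.0.1",  # assumption: the server speaks R4
        "kind": "complex-type",
        "abstract": False,
        "context": [{"type": "element", "expression": row["context"]}],
        "type": "Extension",
        "baseDefinition": "http://hl7.org/fhir/StructureDefinition/Extension",
        "derivation": "constraint",
        "differential": {
            "element": [
                {"id": "Extension", "path": "Extension",
                 "short": row["description"], "min": 0, "max": "1"},
                {"id": "Extension.url", "path": "Extension.url",
                 "fixedUri": row["url"]},
                {"id": "Extension.value[x]", "path": "Extension.value[x]",
                 "type": [{"code": row["value_type"]}]},
            ]
        },
    }


def extensions_from_tsv(tsv_text):
    """Parse TSV text (with a header row) into a list of Extension definitions."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [extension_from_row(row) for row in reader]


# Hypothetical input; the URL is a placeholder, not a real canonical URL.
SAMPLE_TSV = (
    "id\turl\tcontext\tvalue_type\tdescription\n"
    "is-proband\thttp://example.org/fhir/StructureDefinition/is-proband"
    "\tPatient\tboolean\tWhether the participant is the proband\n"
)

if __name__ == "__main__":
    for extension in extensions_from_tsv(SAMPLE_TSV):
        print(json.dumps(extension, indent=2))
```

Each generated dictionary could then be written to its own JSON file; the file naming is left open here because the naming conventions are tracked in a separate issue.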

Secure FHIR server with TLS

We need to start securing the FHIR server in preparation for production deployment. We also need to do this because Netlify deploy preview links don't work without it, which is bad for PR reviews and in-progress demos.

Short term:

  • Secure only the FHIR API endpoint (port 8000) with TLS. Smile CDR supports TLS 1.3

Longer term (later sprints):

  • Don't expose all ports in the firewall, only 80 and 443
  • Set up a reverse proxy for all endpoints so that we only need to expose ports 80 and 443
  • Set up TLS 1.3 for the reverse proxy/webserver
  • Set up Auth0/SSO
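
Once TLS is enabled, one way to confirm from a client that the endpoint really negotiates TLS 1.3 is a small check like the sketch below. It uses only the Python standard library; the host name in the usage note is a placeholder, and the port follows the short-term item above.

```python
import socket
import ssl


def tls13_only_context():
    """Client context that verifies certificates and refuses anything older than TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def negotiated_tls_version(host, port, timeout=5.0):
    """Connect to host:port and return the negotiated protocol version string."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with tls13_only_context().wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

For example, `negotiated_tls_version("fhir.example.org", 8000)` would raise an `ssl.SSLError` if the server cannot speak TLS 1.3 or presents an untrusted certificate.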

Document feedback on FHIR Phenopackets and lessons learned on FHIR Ingest

Create a file called lessons_learned.md in https://github.com/d3b-center/clinical-data-flow/tree/master/docs. Make a PR against the clinical-data-flow repo. We can move this somewhere else if we find a better place for it.

The document should include issues with or modifications needed on the FHIR Phenopackets model. One goal of the document is to communicate the FHIR Phenopackets issues in the GA4GH phenopackets slack channel and create Github issues on the Phenopackets on FHIR repository.

Begin prototyping a FHIR data profiler/browser

Start prototyping a web app that gives someone a good visual overview of the data they have in a FHIR server. We still need to flesh out the scope of this, but we are putting this story here for tracking purposes.

  • Use the public Synthea server (https://synthea.mitre.org/), which has a lot of realistic test data loaded into it
  • Once we complete #15, you can start interfacing with the Kids First FHIR server

Experimentally Load PCGC data into FHIR data service

The purpose of this story is for the responsible developer(s) to get an understanding of how easy it is to map data to FHIR and load data into a FHIR server. Whatever is produced from this story will inform the longer-term ETL design from the warehouse to FHIR servers.

  • Map PCGC data in the warehouse to the Phenopackets FHIR model
  • Write a script (in whatever form the responsible developer prefers) to ETL PCGC data into the final FHIR data service
  • Record what changes are needed to the Phenopackets FHIR model and submit issues against the Kids First FHIR model

Maintenance/Support of Ingest Library

This story is here to capture any work we're doing on the ingest library to support data wranglers and ingestion of studies. It will likely keep moving into future milestones.

Design Warehouse to FHIR service ETL

  • Refactor Ingest library target API config, message packer, and unique key generation
  • FHIR server message packer and potentially new loader too

Automate warehouse loading on kf-ingest-packages

Every time a commit is pushed to the kf-ingest-packages repository, run the extract and transform (ET) stages with warehouse loading for that package. This will tightly couple the version of the code with the version of the produced data and encourage developers to keep code updated in the repo.

This trigger could also be used to kick off other automated processes such as accounting, data QA tests, etc. Alternatively, these could be triggered by updates to the data in the warehouse.

User and permissions management for warehouse

  • Create a user per study
  • The user has the createdb permission and usage permissions on the ExtractStage and TransformStage schemas
  • Anyone who is working on ingest for that study needs those credentials
  • We could manage all of this through a users.yaml file in this repo at warehouse/users.yaml
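
As a rough sketch of how a users file could drive the setup, the function below turns a list of per-study users into PostgreSQL statements. The input layout (a list of entries with a `name` key) is a hypothetical stand-in for whatever warehouse/users.yaml ends up containing, the role name in the usage note is made up, and password handling is deliberately left out.

```python
STAGE_SCHEMAS = ("ExtractStage", "TransformStage")


def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def study_user_sql(study_users):
    """Return the SQL statements that create each study user and grant schema usage."""
    statements = []
    for user in study_users:
        role = quote_ident(user["name"])
        # One login role per study, allowed to create databases.
        statements.append(f"CREATE ROLE {role} LOGIN CREATEDB;")
        for schema in STAGE_SCHEMAS:
            statements.append(
                f"GRANT USAGE ON SCHEMA {quote_ident(schema)} TO {role};"
            )
    return statements
```

Calling `study_user_sql([{"name": "study_pcgc"}])` yields one `CREATE ROLE` statement followed by one `GRANT USAGE` statement per stage schema, which could be reviewed before being run against the warehouse.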

Create Participant StructureDefinition

Create a branch and PR on kf-model-fhir. The SD JSON file should go in kf-model-fhir/site_root/source/resources.

Make sure that any Extensions for Participant are included in the SD.

Create Biospecimen StructureDefinition

Create a branch and PR on kf-model-fhir. The SD JSON file should go in kf-model-fhir/site_root/source/resources.

Make sure that any Extensions generated for Biospecimen are included in the SD.

Research solution for FHIR staging service

Aidbox has the best search capabilities, so it would make a great staging service, provided that:

  • You can turn validation off (this was a TBD feature a couple of months ago)
  • You can still do arbitrary attribute searches on regular FHIR resources. We don't want to have to create Aidbox versions of FHIR resources just to get access to this feature

If either of the above turns out to be false, then we will use Smile CDR with validation turned off as the staging service.

Load PCGC data into FHIR server

By the end of this sprint we should at least have Study, Participant, Biospecimen, Phenotype, Outcome, Diagnosis resources from the PCGC data loaded into our FHIR server.
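
One possible way to load a batch of resources is a FHIR transaction Bundle, sketched below. It assumes every resource already carries a stable `id`, and it uses Patient and Specimen as stand-ins for Participant and Biospecimen; the Specimen mapping and the example ids are assumptions, not decisions recorded in this issue.

```python
def transaction_bundle(resources):
    """Wrap resources in a transaction Bundle that upserts each one by id."""
    entries = []
    for resource in resources:
        # PUT to ResourceType/id creates the resource or updates it if it exists.
        url = f"{resource['resourceType']}/{resource['id']}"
        entries.append({
            "resource": resource,
            "request": {"method": "PUT", "url": url},
        })
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


# Hypothetical example resources.
EXAMPLE_RESOURCES = [
    {"resourceType": "Patient", "id": "pt-001", "gender": "female"},
    {"resourceType": "Specimen", "id": "bs-001",
     "subject": {"reference": "Patient/pt-001"}},
]
```

The resulting Bundle would be sent as a single POST to the server's base URL, so related resources are created or updated together.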

Generate PFB file for PCGC study

  • Finish implementation of PFB exporter
  • Download PCGC from Kids First production DB
  • Run pfb exporter CLI on PCGC data and send to Broad Institute
  • Update Jupyter Notebook with examples

Load actual HPO codes for PCGC phenotype observations

Now:

  • Create a CodeSystem for the entire HPO ontology (https://github.com/aehrc/fhir-owl)
  • Load into FHIR server
  • Update ValueSet-phenotypic-feature-type.json to include all codes from the HPO CodeSystem
  • Reload the PCGC phenotype observations with their actual HPO codes and not the general one HPO:000118.
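
For the ValueSet step, a ValueSet whose `compose.include` names only a code system (with no concept list or filter) includes every code in that system. A minimal sketch, assuming the HPO CodeSystem is published under the URL shown (a common choice, but an assumption here) and using a placeholder canonical URL for the ValueSet:

```python
# Assumption: the URL under which the generated HPO CodeSystem is registered.
HPO_SYSTEM = "http://purl.obolibrary.org/obo/hp.owl"


def value_set_for_code_system(value_set_id, url, system):
    """Build a ValueSet that includes all codes from a single CodeSystem."""
    return {
        "resourceType": "ValueSet",
        "id": value_set_id,
        "url": url,
        "status": "draft",
        # An include with only `system` pulls in the whole code system.
        "compose": {"include": [{"system": system}]},
    }


PHENOTYPIC_FEATURE_TYPE = value_set_for_code_system(
    "phenotypic-feature-type",
    "http://example.org/fhir/ValueSet/phenotypic-feature-type",  # placeholder
    HPO_SYSTEM,
)
```

The server still needs the CodeSystem itself loaded (or an external terminology service) to expand and validate against this ValueSet.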

Near Term:

  • Figure out how to use/register external ontology for use in FHIR server
  • Does FHIR spec include a specification for a FHIR terminology server? Do we need to implement a terminology server for ontologies that are not supported by vanilla FHIR?

Kids First FHIR Extension spreadsheet: Participant, Biospecimen

Add a tab to the Kids First Data Service to FHIR Mapping spreadsheet for the new Extensions that need to be created.

The first pass will consist of Extensions needed for Participant and Biospecimen.

The Extension spreadsheet will be fed into the script here #49 to generate the Extension StructureDefinitions.

Load actual NCIT and MONDO codes for disease instances

Branch off of the kf-model-fhir/add-phenopackets-model branch and create a Pull Request against that branch.

All of your conformance resources should go in:
https://github.com/kids-first/kf-model-fhir/tree/add-phenopackets-model/site_root/source/resources

  • Modify Disease StructureDefinition - remove Condition.code (max cardinality to 0)
  • Create CodeSystem for all terms in MONDO -> CodeSystem-disease-term-mondo.json
  • Create CodeSystem for all terms in NCIT -> CodeSystem-disease-term-ncit.json
  • Create new Valueset to capture values from MONDO and NCIT CodeSystems -> ValueSet-disease-term.json
  • Create new extension for Disease called term -> StructureDefinition-disease-term.json
  • Bind ValueSet to Disease term extension
  • Bind new Disease term extension to Disease StructureDefinition
  • Create SearchParameter for Disease term extension -> SearchParameter-disease-term.json

Create and load search parameters for Phenopackets FHIR model

Phenopackets FHIR model is loaded into our FHIR server, but it doesn't include any SearchParameter resources at the moment. This means we won't be able to search for resources using any of the Phenopacket attributes (FHIR extensions e.g. karyotypic-sex).

It would be great if we could autogenerate the SearchParameters from the extension StructureDefinition files.

Deploy ingest warehouse to production

  • Discuss user+permission management for warehouse with devops team
    • Create group per study, group has CREATEDB permissions, add user for each ingest developer working on that study
  • Probably don't need all 3 environments, just need prd
  • Submit tickets to kf-devops for RDS infra
    • Can we use the same RDS as the kf data warehouse to manage ingest study dbs, or is it best to have our own RDS instance?
    • Can we use the current kf Metabase deployment for our Metabase needs?

Prepare for Kids First FHIR Model Development

  • Clean up kf-model-fhir repo branches
  • Merge the PR for SearchParam generation from extension
  • Merge the PR for CodeSystem generation from owl
  • Publish naming conventions markdown doc on kf-model-fhir
    • For conformance and example resource files and resource identifiers

Enable searching of resources by added attributes (extensions)

Autogenerate SearchParameters for a given set of extensions
A user cannot search for resources by added attributes (extensions) unless SearchParameter resources have been created and loaded into the server for those extensions.

For example, the Phenopackets FHIR model has an extension called karyotypic-sex. The karyotypic-sex extension defines a new attribute on the Patient resource. Users cannot run searches like /Patient?karyotypic-sex=XY right now because the FHIR server does not have a SearchParameter for it.

Add a new CLI command to the kf-model-fhir CLI to generate SearchParameters for a file or directory containing extensions.
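
A rough sketch of the generation step, assuming FHIR R4 extension StructureDefinitions whose `context` entries are element paths such as `Patient`. The function name, the `canonical_base` parameter, and the example URLs are hypothetical; the real CLI command may derive these differently.

```python
def search_parameter_for_extension(extension_sd, search_type, canonical_base):
    """Derive a SearchParameter resource from an extension StructureDefinition."""
    ext_url = extension_sd["url"]
    code = extension_sd["id"]
    contexts = [ctx["expression"] for ctx in extension_sd["context"]]
    # SearchParameter.base lists resource types, so keep only the first path segment.
    bases = list(dict.fromkeys(path.split(".")[0] for path in contexts))
    # FHIRPath: select the extension by URL on each context, then take its value.
    expression = " | ".join(
        f"{path}.extension('{ext_url}').value" for path in contexts
    )
    return {
        "resourceType": "SearchParameter",
        "id": code,
        "url": f"{canonical_base}/SearchParameter/{code}",
        "name": code,
        "status": "draft",
        "description": f"Search by the {code} extension",
        "code": code,
        "base": bases,
        "type": search_type,
        "expression": expression,
    }


# Hypothetical, trimmed-down extension definition for illustration.
KARYOTYPIC_SEX = {
    "resourceType": "StructureDefinition",
    "id": "karyotypic-sex",
    "url": "http://example.org/fhir/StructureDefinition/karyotypic-sex",
    "context": [{"type": "element", "expression": "Patient"}],
}
```

With `search_type="token"`, the parameter built from `KARYOTYPIC_SEX` has the code `karyotypic-sex` and base `Patient`, which is what a query of the form /Patient?karyotypic-sex=XY relies on once the server has loaded and indexed it.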

Make PR to add SearchParameter files to kf-model-fhir

Load SearchParameters into FHIR server
Load the generated SearchParameters into the FHIR server

Spec out naming conventions for FHIR model files

Create a markdown document describing:

  • How all different types of conformance resource files will be named (e.g. StructureDefinition-Participant.json vs Participant.json vs SD_Participant.json)
  • How example resource files will be named

Create a branch and PR for this on kf-model-fhir

Create FHIR 101 Demo + Guide

Every team member should have a practical understanding of basic FHIR concepts:

  • How entities and their attributes are represented in FHIR
  • How to model things: add/remove entity attributes, change entity relationships, and change constraints on relationships and attributes
  • How to make entity attributes searchable
  • How to represent entity attributes in various ontologies

Load initial Kids First FHIR Model into server

  • Remove all of Phenopackets FHIR model
  • Load all of the following:
    • Participant StructureDefinition
    • Participant Extension StructureDefinition(s)
    • Participant SearchParameters for its Extension(s)
    • Biospecimen StructureDefinition
    • Biospecimen Extension StructureDefinition(s)
    • Biospecimen SearchParameters for its Extension(s)
    • Phenotype StructureDefinition
    • Phenotype Extension StructureDefinition(s)
    • Phenotype SearchParameters for its Extension(s)

We may also have to load the following into the server as well:

  • HPO CodeSystem
  • NCIT CodeSystem
  • MONDO CodeSystem
  • UBERON CodeSystem

Investigate search optimization for FHIR Data Dashboard

The FHIR data dashboard takes a long time to load results because it requires many calls to the FHIR server to aggregate data.

First, investigate the use of server-side caching.
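
To make the caching idea concrete, here is a minimal time-to-live cache sketch. It is a generic illustration rather than the dashboard's design: `fetch` stands for whatever function performs the expensive FHIR aggregation, and the class name and five-minute default are arbitrary.

```python
import time


class TTLCache:
    """Cache fetch(key) results and reuse them until they are ttl_seconds old."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the expensive call
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

The trade-off is staleness: results can lag the server by up to the TTL, which may be acceptable for an overview dashboard but should be confirmed with its users.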

If that is not enough then also investigate the following options:

Kids First Data Service to FHIR mapping spreadsheet: PT, BS

Create a spreadsheet that captures which Kids First entities we are modeling in the MVP, which FHIR resources they map to, the names and types of the Kids First attributes, and which FHIR attributes or extensions they map to.

Something like this:

Participant tables:

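| Kids First Entity | Kids First Attribute | Kids First Attr Type | FHIR Base Type | FHIR Attr           | FHIR Type                   |
|-------------------|----------------------|----------------------|----------------|---------------------|-----------------------------|
| Participant       | is_proband           | boolean              | Patient        | None                | N/A                         |
| Participant       | gender               | enum                 | Patient        | gender              | code                        |
| Phenotype         | hpo_id               | string               | Observation    | code.codings.0.code | CodeableConcept.Coding.code |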

Need extensions for these attributes because they are not part of :

We need to know if extensions exist for these and where they are or if we need to create new extensions.

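| Kids First Entity | Kids First Attribute | Kids First Attr Type | FHIR Base Type | FHIR Extension | FHIR Type |
|-------------------|----------------------|----------------------|----------------|----------------|-----------|
| Participant       | ethnicity            | boolean              | Patient        | None           | N/A       |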

Planning + Design Automated QC/Accounting

  • Set up a meeting with the DataOps team to begin planning the design for automated quality control checks and reporting.
  • Produce an initial architecture
  • Maybe begin implementation
