d3b-center / clinical-data-flow


📓Project management and design artifacts for Clinical Data Flow

Home Page: https://handbook.d3b.io/docs/products/clinical-data-flow

License: Apache License 2.0

Shell 100.00%

architecture fhir project-management

clinical-data-flow's Issues

Load CHOP RIMGC GIC dataset into FHIR server

Questions

Can we create a study for this on data tracker so that data files may be fetched from there?
What kind of data is this - any PHI?
How will we organize the patients in our FHIR server into studies/datasets? Which resource should we use to represent this? The Phenopackets FHIR model may not represent a study, so we should check whether we will have to modify the model to accommodate it.

Tasks

Create ingest package for this dataset
Ingest dataset to warehouse
Ingest from warehouse into FHIR server

Make it easier for developers to set up CDF tools + env

Create a 1-click script with a docker-compose stack to bootstrap the CDF pipeline for developers.

We need to make it easier for other developers to spin up our CDF pipeline so that they can collaborate with us, test out our pipeline for demo purposes, or use it for their own development needs.

Ingest Kids First studies into D3b Warehouse

DataOps team wants to copy over studies from Kids First postgres db to D3b postgres db.

Meen created a CLI tool to do this, but still needs to provide support to DataOps users, fix any bugs, etc.

Meen should help DataOps deploy this as an automated (nightly?) process.

Write an Extension generation script

We need a script that can read in a TSV or spreadsheet file with the necessary inputs to define an Extension resource and generate the JSON file with the Extension payload.

This script should live in kf-model-fhir/scripts
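
A minimal sketch of what such a script could look like, assuming FHIR R4 and a hypothetical TSV layout with the columns `id`, `context`, `value_type`, and `description`. The column names, the base URL, and the `is-proband` sample row are illustrative assumptions, not the agreed input format.

```python
import csv
import io
import json

# Hypothetical canonical base URL (an assumption, not the real model URL).
BASE_URL = "http://example.org/fhir/StructureDefinition"


def extension_from_row(row):
    """Build a single-value Extension StructureDefinition (R4 shape) from one TSV row."""
    url = f"{BASE_URL}/{row['id']}"
    return {
        "resourceType": "StructureDefinition",
        "id": row["id"],
        "url": url,
        # e.g. "is-proband" -> "IsProband"
        "name": "".join(part.capitalize() for part in row["id"].split("-")),
        "status": "draft",
        "kind": "complex-type",
        "abstract": False,
        # The resource or element the extension may be attached to.
        "context": [{"type": "element", "expression": row["context"]}],
        "type": "Extension",
        "baseDefinition": "http://hl7.org/fhir/StructureDefinition/Extension",
        "derivation": "constraint",
        "differential": {
            "element": [
                {"id": "Extension", "path": "Extension", "short": row["description"]},
                {"id": "Extension.url", "path": "Extension.url", "fixedUri": url},
                {
                    "id": "Extension.value[x]",
                    "path": "Extension.value[x]",
                    "min": 1,
                    "type": [{"code": row["value_type"]}],
                },
            ]
        },
    }


def extensions_from_tsv(tsv_text):
    """Parse TSV text (with a header row) into a list of Extension payloads."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [extension_from_row(row) for row in reader]


if __name__ == "__main__":
    sample = (
        "id\tcontext\tvalue_type\tdescription\n"
        "is-proband\tPatient\tboolean\tWhether the participant is the proband\n"
    )
    for extension in extensions_from_tsv(sample):
        print(json.dumps(extension, indent=2))
```

Each payload could then be written to its own JSON file in the model repository.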

Secure FHIR server with TLS

We need to start securing the FHIR server in preparation for production deployment. We also need to do this because netlify deploy preview links don't work without it - bad for PR reviews and in progress demos.

Short term:

Secure only the FHIR API endpoint (port 8000) with TLS. Smile CDR supports TLS 1.3

Longer term (later sprints):

Don't expose all ports in the firewall, only 80 and 443
Set up a reverse proxy for all endpoints so that we only need to expose ports 80 and 443
Set up TLS 1.3 for the reverse proxy/webserver
Set up Auth0/SSO
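
Once the API endpoint is secured, a client-side check along these lines could confirm that it really negotiates TLS 1.3. This is only a sketch using the Python standard library; the URL in the usage note is a placeholder, not the real server address.

```python
import ssl


def tls13_client_context():
    """Return a client SSL context that verifies certificates and refuses anything below TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


if __name__ == "__main__":
    ctx = tls13_client_context()
    print(ctx.minimum_version, ctx.verify_mode)
```

Passing the context to `urllib.request.urlopen("https://fhir.example.org:8000/metadata", context=tls13_client_context())` would then fail with an SSL error if the server only offers older protocol versions or presents an untrusted certificate.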

Document feedback on FHIR Phenopackets and lessons learned on FHIR Ingest

Create a file called lessons_learned.md in https://github.com/d3b-center/clinical-data-flow/tree/master/docs. Make a PR against the clinical-data-flow repo. We can move this somewhere else if we find a better place for it.

The document should include issues with or modifications needed on the FHIR Phenopackets model. One goal of the document is to communicate the FHIR Phenopackets issues in the GA4GH phenopackets slack channel and create Github issues on the Phenopackets on FHIR repository.

FHIR Data Dashboard Wrap-up

Wrap up feature development so that we can begin using dashboard internally

Close out remaining PRs
Implement kids-first/kf-ui-fhir-data-dashboard#33 and kids-first/kf-ui-fhir-data-dashboard#33

Ingest PCGC data into D3b Warehouse

Generate search parameters for Participant, Biospecimen, Phenotype

Use the search parameter generation script in kf-model-fhir/scripts to generate search parameters for all Participant, Biospecimen, and Phenotype Extensions.

Modify ingest lib to persist intermediate ingest output in D3b Warehouse

Include documentation on the Warehouse schema so that users know how data is organized per study

Learn FHIR Basics - Step through FHIR Guide

Everyone should step through the guide on their own time. The earlier we all learn FHIR basics, the better prepared we will be to design tools to work with FHIR.

FHIR Data Dashboard Features + Refinements

Diagram and doc for how data is being queried and aggregated
Make changes for querying resources by profile
Meet with Allison to discuss next steps
- How do we want to display ontology stuff?

Issues to be implemented are:
kids-first/kf-ui-fhir-data-dashboard#4
kids-first/kf-ui-fhir-data-dashboard#5
kids-first/kf-ui-fhir-data-dashboard#6
kids-first/kf-ui-fhir-data-dashboard#7

Explore the utility of PFB (Portable Bioinformatics Format)

Convert a Kids First study (PCGC) to PFB to get a better understanding of its pros and cons.

See the PFB Exporter repo: https://github.com/d3b-center/d3b-lib-pfb-exporter

See notebook for PFB exploration

Begin prototyping a FHIR data profiler/browser

Start prototyping a web app that gives someone a good visual overview of the data they have in a FHIR server. We still need to flesh out the scope of this, but putting this story here for tracking purposes

Use the public Synthea server (https://synthea.mitre.org/), which has a lot of realistic but synthetic test data loaded into it
Once we complete #15 then you can start interfacing with the Kids First FHIR server

Optimize accounting process in kf-lib-data-ingest

Per kids-first/kf-lib-data-ingest#342

Accounting implementation is memory intensive and takes too long. CDF team should consider doing this in one of the upcoming sprints.

Create Phenotype StructureDefinition

Kids First MVP FHIR Model

Study
Family
Participant
Biospecimens
Phenotype
Diagnosis

Experimentally Load PCGC data into FHIR data service

The purpose of this story is for the responsible developer(s) to get an understanding of how easy it is to map data to FHIR and load data into a FHIR server. Whatever is produced from this story will inform the longer-term ETL design from the warehouse to FHIR servers.

Map PCGC data in the warehouse to the Phenopackets FHIR model
Write script (whatever responsible developer wants) to ETL PCGC data into final FHIR data service
Record what changes are needed to the Phenopackets FHIR model and submit issues against the Kids First FHIR model

Maintenance/Support of Ingest Library

This story is here to capture any work we're doing on the ingest library to support data wranglers and ingestion of studies. It will likely keep moving into future milestones.

Design Warehouse to FHIR service ETL

Refactor Ingest library target API config, message packer, and unique key generation
FHIR server message packer and potentially new loader too

Automate warehouse loading on kf-ingest-packages

Every time a commit is pushed to the kf-ingest-packages repository, run the extract and transform stages (et) with warehouse loading for that package. This will tightly couple the version of the code with the version of the produced data and encourage developers to keep code updated in the repo.

This trigger could also be used to generate other automated processes such as accounting, data QA tests, etc. Or these things could also be triggered off of updates to the data in the warehouse.

Introduce Warehouse to DataOps Team

Quick walkthrough of the warehouse, schema, etc.

Python mappers for Participant FHIR resource

Stub out the Python dict representing JSON payload for the Participant FHIR resource
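
A possible starting point for the stub, assuming the participant arrives as a plain dict. The `kf_id` field name and the gender mapping are assumptions made for illustration. Only `gender` maps to a base Patient attribute here; attributes such as `is_proband` would need extensions and are left out.

```python
# Assumed mapping from Kids First gender values to FHIR administrative-gender codes.
GENDER_CODES = {"male": "male", "female": "female", "other": "other"}


def participant_to_patient(participant):
    """Map a Kids First participant record (dict) to a FHIR Patient payload (dict)."""
    raw_gender = str(participant.get("gender", "")).lower()
    return {
        "resourceType": "Patient",
        # "kf_id" is a hypothetical field name for the participant identifier.
        "identifier": [{"value": participant["kf_id"]}],
        # Anything not in the mapping falls back to "unknown".
        "gender": GENDER_CODES.get(raw_gender, "unknown"),
    }


if __name__ == "__main__":
    print(participant_to_patient({"kf_id": "PT_EXAMPLE", "gender": "Female"}))
```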

User and permissions management for warehouse

Create a user per study
User has CREATEDB permission and USAGE permission on the ExtractStage and TransformStage schemas
Anyone who is working on ingest for that study needs those creds
Could manage all this through a users.yaml file in this repo at warehouse/users.yaml
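
As a rough sketch of how such a file could drive provisioning, the function below turns a study-to-user mapping into PostgreSQL statements. The mapping shape (what `users.yaml` might parse to) and the names in the example are assumptions. Password handling is deliberately left out.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def study_user_sql(study_users, schemas=("ExtractStage", "TransformStage")):
    """Return SQL statements that create one CREATEDB login role per study.

    study_users: dict of study name -> user name (hypothetical users.yaml content).
    """
    statements = []
    for _study, user in sorted(study_users.items()):
        role = quote_ident(user)
        statements.append(f"CREATE ROLE {role} LOGIN CREATEDB;")
        for schema in schemas:
            statements.append(f"GRANT USAGE ON SCHEMA {quote_ident(schema)} TO {role};")
    return statements


if __name__ == "__main__":
    for statement in study_user_sql({"pcgc": "pcgc_ingest"}):
        print(statement)
```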

Create Participant StructureDefinition

Create a branch and PR on kf-model-fhir. The SD JSON file should go in kf-model-fhir/site_root/source/resources.

Make sure that any Extensions for Participant are included in the SD.

Create Biospecimen StructureDefinition

Create a branch and PR on kf-model-fhir. The SD JSON file should go in kf-model-fhir/site_root/source/resources.

Make sure that any Extensions generated for Biospecimen are included in the SD.

Research solution for FHIR staging service

Aidbox has the best search capabilities, so it would make a great staging service, provided that:

You can turn validation off (this was a TBD feature a couple of months ago)
You can still do arbitrary attribute searches on regular FHIR resources. We don't want to have to create Aidbox versions of FHIR resources just to get access to this feature

If any of the above turns out to be false then we will use Smile CDR with validation turned off as the staging service

Load PCGC data into FHIR server

By the end of this sprint we should at least have Study, Participant, Biospecimen, Phenotype, Outcome, Diagnosis resources from the PCGC data loaded into our FHIR server.

Generate PFB file for PCGC study

Finish implementation of PFB exporter
Download PCGC from Kids First production DB
Run pfb exporter CLI on PCGC data and send to Broad Institute
Update Jupyter Notebook with examples

Load actual HPO codes for PCGC phenotype observations

Now:

Create a CodeSystem for the entire HPO ontology (https://github.com/aehrc/fhir-owl)
Load into FHIR server
Update ValueSet-phenotypic-feature-type.json to include all codes from the HPO CodeSystem
Reload the PCGC phenotype observations with their actual HPO codes and not the general one, HP:0000118.

Near Term:

Figure out how to use/register external ontology for use in FHIR server
Does FHIR spec include a specification for a FHIR terminology server? Do we need to implement a terminology server for ontologies that are not supported by vanilla FHIR?

Add warehouse data diagram

Add a warehouse diagram to the warehouse README page to show data organization
Point link in kf-lib-data-ingest docs to warehouse diagram

Kids First FHIR Extension spreadsheet: Participant, Biospecimen

Add a tab to the Kids First Data Service to FHIR Mapping spreadsheet for the new Extensions that need to be created.

The first pass will consist of Extensions needed for Participant and Biospecimen.

The Extension spreadsheet will be fed into the script from #49 to generate the Extension StructureDefinitions.

Load actual NCIT and MONDO codes for disease instances

Branch off of the kf-model-fhir/add-phenopackets-model branch and create Pull Request against that branch

All of your conformance resources should go in:
https://github.com/kids-first/kf-model-fhir/tree/add-phenopackets-model/site_root/source/resources

Modify Disease StructureDefinition - remove Condition.code (max cardinality to 0)
Create CodeSystem for all terms in MONDO -> CodeSystem-disease-term-mondo.json
Create CodeSystem for all terms in NCIT -> CodeSystem-disease-term-ncit.json
Create new Valueset to capture values from MONDO and NCIT CodeSystems -> ValueSet-disease-term.json
Create new extension for Disease called term -> StructureDefinition-disease-term.json
Bind ValueSet to Disease term extension
Bind new Disease term extension to Disease StructureDefinition
Create SearchParameter for Disease term extension -> SearchParameter-disease-term.json
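
The ValueSet step above could be sketched as follows. An `include` entry that names only a `system` pulls in every code from that CodeSystem. The URLs are placeholders, since the canonical URLs of the MONDO and NCIT CodeSystems are not given here.

```python
import json


def value_set_from_code_systems(value_set_id, url, code_system_urls):
    """Build a ValueSet that includes all codes from each listed CodeSystem."""
    return {
        "resourceType": "ValueSet",
        "id": value_set_id,
        "url": url,
        # e.g. "disease-term" -> "DiseaseTerm"
        "name": "".join(part.capitalize() for part in value_set_id.split("-")),
        "status": "draft",
        "compose": {"include": [{"system": system} for system in code_system_urls]},
    }


if __name__ == "__main__":
    value_set = value_set_from_code_systems(
        "disease-term",
        "http://example.org/fhir/ValueSet/disease-term",  # placeholder URL
        [
            "http://example.org/fhir/CodeSystem/disease-term-mondo",  # placeholder URL
            "http://example.org/fhir/CodeSystem/disease-term-ncit",  # placeholder URL
        ],
    )
    print(json.dumps(value_set, indent=2))
```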

Map Kids First Study and Diagnosis to FHIR

Add rows to the KF FHIR Model Mappings spreadsheet for Study and Diagnosis

Create and load search parameters for Phenopackets FHIR model

Phenopackets FHIR model is loaded into our FHIR server, but it doesn't include any SearchParameter resources at the moment. This means we won't be able to search for resources using any of the Phenopacket attributes (FHIR extensions e.g. karyotypic-sex).

Would be great if we can autogenerate the SearchParameters given the extension StructureDefinition files.

Deploy ingest warehouse to production

Discuss user+permission management for warehouse with devops team
- Create group per study, group has CREATEDB permissions, add user for each ingest developer working on that study
Probably don't need all 3 environments, just need prd
Submit tickets to kf-devops for RDS infra
- Can we use the same RDS as the kf data warehouse to manage ingest study dbs, or is it best to have our own RDS instance?
- Can we use the current kf metabase deployment for our Metabase needs?

Support ingestion of Kids First studies

An on-going story that may move from milestone to milestone.

First draft architecture diagram for FHIR based clinical data pipeline

Prepare for Kids First FHIR Model Development

Clean up kf-model-fhir repo branches
Merge the PR for SearchParam generation from extension
Merge the PR for CodeSystem generation from owl
Publish naming conventions markdown doc on kf-model-fhir
- For conformance and example resource files and resource identifiers

Enable searching of resources by added attributes (extensions)

Autogenerate SearchParameters for a given set of extensions
A user cannot search for resources by added attributes (extensions) unless SearchParameter resources have been created and loaded into the server for those extensions.

For example the Phenopackets FHIR model has an extension called karyotypic-sex. The karyotypic-sex extension defines a new attribute on the Patient resource. Users cannot do searches like this: /Patient?karyotypic-sex=XY right now because the FHIR server does not have a SearchParameter for it.

Add a new CLI command to the kf-model-fhir CLI to generate SearchParameters for a file or directory containing extensions.
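
One way the generation step might work, sketched under several assumptions: the extension StructureDefinitions follow the FHIR R4 shape, the value-type-to-search-type table below is adequate, and the server accepts the FHIRPath form used in `expression`. Some servers expect a slightly different expression. The base URL and the sample extension URL are placeholders.

```python
# Assumed mapping from an extension's value type to a FHIR search parameter type.
SEARCH_TYPES = {
    "boolean": "token",
    "code": "token",
    "Coding": "token",
    "CodeableConcept": "token",
    "string": "string",
    "date": "date",
    "dateTime": "date",
    "integer": "number",
    "decimal": "number",
    "Quantity": "quantity",
    "Reference": "reference",
    "uri": "uri",
}

# Hypothetical base URL for the generated SearchParameters.
SEARCH_PARAM_BASE_URL = "http://example.org/fhir/SearchParameter"


def extension_value_type(extension_sd):
    """Return the first declared type code of Extension.value[x], or None."""
    for element in extension_sd.get("differential", {}).get("element", []):
        if element.get("path") == "Extension.value[x]" and element.get("type"):
            return element["type"][0]["code"]
    return None


def search_parameter_for(extension_sd):
    """Build a SearchParameter payload for one extension StructureDefinition."""
    ext_id = extension_sd["id"]
    url = extension_sd["url"]
    contexts = [
        c["expression"]
        for c in extension_sd.get("context", [])
        if c.get("type") == "element"
    ]
    # The resource type is the first path segment of each context expression.
    bases = sorted({c.split(".")[0] for c in contexts})
    return {
        "resourceType": "SearchParameter",
        "id": ext_id,
        "url": f"{SEARCH_PARAM_BASE_URL}/{ext_id}",
        "name": ext_id,
        "status": "active",
        "description": f"Search by the {ext_id} extension",
        "code": ext_id,
        "base": bases,
        # Unmapped or missing value types fall back to "string" (an assumption).
        "type": SEARCH_TYPES.get(extension_value_type(extension_sd), "string"),
        "expression": " | ".join(f"{c}.extension('{url}').value" for c in contexts),
    }


if __name__ == "__main__":
    karyotypic_sex = {
        "id": "karyotypic-sex",
        "url": "http://example.org/fhir/StructureDefinition/karyotypic-sex",  # placeholder
        "context": [{"type": "element", "expression": "Patient"}],
        "differential": {
            "element": [
                {"path": "Extension.value[x]", "type": [{"code": "CodeableConcept"}]}
            ]
        },
    }
    print(search_parameter_for(karyotypic_sex))
```

With the resulting SearchParameter loaded, a query such as /Patient?karyotypic-sex=XY has something to resolve against.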

Make PR to add SearchParameter files to kf-model-fhir

Create a branch off of kf-model-fhir/add-phenopackets-model and make your PR against that branch
All SearchParameters should go in https://github.com/kids-first/kf-model-fhir/tree/add-phenopackets-model/site_root/source/resources

Load SearchParameters into FHIR server
Load the generated SearchParameters into the FHIR server

Simplify FHIR model development workflow via upgrade to new IG structure

See kids-first/kf-model-fhir#71 for motivation and details

Spec out naming conventions for FHIR model files

Create a markdown document describing:

How all different types of conformance resource files will be named (e.g. StructureDefinition-Participant.json vs Participant.json vs SD_Participant.json)
How example resource files will be named
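
If the convention ends up being `<ResourceType>-<id>.json` (the form used by names such as ValueSet-disease-term.json elsewhere in these issues), a small checker like the one below could enforce it. The pattern and the list of resource types are assumptions to be replaced by whatever the document specifies.

```python
import re

# Assumed convention: <ResourceType>-<id>.json, with the id limited to FHIR id characters.
CONFORMANCE_FILENAME = re.compile(
    r"^(StructureDefinition|SearchParameter|CodeSystem|ValueSet)-([A-Za-z0-9.-]+)\.json$"
)


def parse_conformance_filename(filename):
    """Return (resource_type, resource_id) if the name follows the convention, else None."""
    match = CONFORMANCE_FILENAME.match(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for name in ("StructureDefinition-Participant.json", "SD_Participant.json"):
        print(name, parse_conformance_filename(name))
```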

Create a branch and PR for this on kf-model-fhir

Generate Extensions for Participant, Biospecimen, Phenotype

Run the #49 script to generate extensions for Participant and Biospecimen.

Create a branch and PR on kf-model-fhir. The Extension JSON file should go in kf-model-fhir/site_root/source/resources.

Python mapper for Biospecimen Resource

Stub out the Python dict representing JSON payload for the Biospecimen FHIR resource

Create FHIR 101 Demo + Guide

Every team member should have a practical understanding of basic FHIR concepts:

How are entities and their attributes represented in FHIR
How to model things - add/remove entity attributes, change entity relationships, and change constraints on relationships and attributes
How to make entity attributes searchable
How to represent entity attributes in various ontologies

Load initial Kids First FHIR Model into server

Remove all of Phenopackets FHIR model
Load all of the following:
Participant StructureDefinition
Participant Extension StructureDefinition(s)
Participant SearchParameters for its Extension(s)
Biospecimen StructureDefinition
Biospecimen Extension StructureDefinition(s)
Biospecimen SearchParameters for its Extension(s)
Phenotype StructureDefinition
Phenotype Extension StructureDefinition(s)
Phenotype SearchParameters for its Extension(s)

We may also have to load the following into the server as well:

HPO CodeSystem
NCIT CodeSystem
MONDO CodeSystem
UBERON CodeSystem

Investigate search optimization for FHIR Data Dashboard

The FHIR data dashboard takes a long time to load results because it requires many calls to the FHIR server to aggregate data.

First investigate use of server side caching

Smile CDR - https://smilecdr.com/docs/fhir_repository/performance_and_caching.html

If that is not enough then also investigate the following options:

Someone built a FHIR search server specifically for analytics. We should check this out!
https://pathling.app/docs/
Apache Parquet (https://chat.fhir.org/#narrow/stream/179219-analytics-on.20FHIR/topic/Parquet/near/186187743)
ElasticSearch

Kids First Data Service to FHIR mapping spreadsheet: PT, BS

Create a spreadsheet that captures which Kids First entities we are modeling in the MVP, which FHIR resources they map to, the names and types of the Kids First attributes, and which FHIR attributes or extensions they map to.

Something like this:

Participant tables:

Kids First Entity	Kids First Attribute	Kids First Attr Type	FHIR Base Type	FHIR Attr	FHIR Type
Participant	is_proband	boolean	Patient	None	N/A
Participant	gender	enum	Patient	gender	code
Phenotype	hpo_id	string	Observation	code.codings.0.code	CodeableConcept.Coding.code

Need extensions for these attributes because they are not part of :

We need to know if extensions exist for these and where they are or if we need to create new extensions.

Kids First Entity	Kids First Attribute	Kids First Attr Type	FHIR Base Type	FHIR Extension	FHIR Type
Participant	ethnicity	boolean	Patient	None	N/A

Python Mapper for KF Phenotype FHIR resource

Stub out the Python dict representing JSON payload for the KF Phenotype FHIR resource
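
A possible shape for the stub, assuming the phenotype arrives as a dict with an `hpo_id` field and that the HPO code belongs in the Observation's first coding. The `system` URI and the `final` status are assumptions. The system should match the URL of whichever HPO CodeSystem is loaded into the server.

```python
# Assumed CodeSystem URI for HPO; replace with the URL of the loaded HPO CodeSystem.
HPO_SYSTEM = "http://purl.obolibrary.org/obo/hp.owl"


def phenotype_to_observation(phenotype, patient_id, system=HPO_SYSTEM):
    """Map a Kids First phenotype record (dict) to a FHIR Observation payload (dict)."""
    return {
        "resourceType": "Observation",
        "status": "final",  # assumed status for loaded observations
        "code": {"coding": [{"system": system, "code": phenotype["hpo_id"]}]},
        "subject": {"reference": f"Patient/{patient_id}"},
    }


if __name__ == "__main__":
    print(phenotype_to_observation({"hpo_id": "HP:0000118"}, "example-patient"))
```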

Planning + Design Automated QC/Accounting

Set up a meeting with the DataOps team to begin planning the design for automated quality control checks and reporting.
Produce an initial architecture
Maybe begin implementation
