License: Apache License 2.0

MarkLogic Healthcare Starter Kit

Description & Purpose

This README gives a short description of the project and instructions for getting set up and running. For more information on the project as a whole, please refer to the Cookbook.

The MarkLogic Healthcare Starter Kit (HSK) is a working project for a healthcare payer data hub, particularly geared toward service to Medicaid customers. Also called an Operational Data Store (ODS), the HSK supports a mandate by the U.S. Centers for Medicare and Medicaid Services (CMS) to comply with the Fast Healthcare Interoperability Resources (FHIR) specification for the electronic exchange of healthcare information.

MarkLogic HSK is intended as a starting point for a healthcare data hub with working code, as well as sample data and configurations. It is also a good foundation for implementing FHIR-compliant data services when used in combination with the MarkLogic FHIR Mapper.

Users can upload raw, heterogeneous health records and use the harmonization features inherited by the HSK from the MarkLogic Data Hub to canonicalize and master their data. MarkLogic’s powerful default indexing and other Data Hub features make it easy to explore data and models to gain additional insight for future development and operations.

Documentation for external projects, tools, and specifications referenced by this README is available as follows:

Get the Healthcare Starter Kit (HSK)

Clone the source or download a tagged release zip file from the MarkLogic HSK repository.

Deploy the HSK

The HSK was built and tested with the following prerequisites:

Installation Steps:

Note: Installation steps assume a MarkLogic Server user/role with sufficient privileges is specified. Refer to the MarkLogic Data Hub documentation if needed.

  • Download MarkLogic Data Hub Central using the link above
  • Unzip the tagged release or clone the source into a directory of your choosing.
  • At the top level of your project directory, change the mlUsername and mlPassword properties in gradle-local.properties to set your default user's username and password, based on the MarkLogic user you intend to use (admin, DrSmith, etc.).
    • The project includes several sample demo users, such as DrSmith (password demo), who is capable of running all operations.
  • Deploy the Healthcare Starter Kit data hub:
    • ./gradlew mlDeploy
    • ./gradlew mlLoadData
      • Loads reference data input to user-defined steps and functions included with this project
    • ./gradlew loadOntologies
      • Loads ontologies for ICD10CM & ICD10PCS, and SNOMED-CT if it exists.
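
The three Gradle tasks above can be wrapped in a small helper that stops at the first failure, so a failed deploy is not followed by a data load. This is only a sketch: the function name `deploy_hsk` and the `GRADLEW` override variable are assumptions introduced for illustration, not part of the project.

```shell
# Hypothetical helper, not shipped with the HSK.
# GRADLEW is an assumed override hook; it defaults to the project's wrapper.
GRADLEW="${GRADLEW:-./gradlew}"

# Run the deployment tasks from the steps above in order,
# returning non-zero as soon as one of them fails.
deploy_hsk() {
  for task in mlDeploy mlLoadData loadOntologies; do
    "$GRADLEW" "$task" || {
      echo "Task $task failed; stopping." >&2
      return 1
    }
  done
}
```

After sourcing the snippet from the top level of the project directory, calling `deploy_hsk` runs the full sequence.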

Using the HSK

There are two primary ways to access and use the deployed HSK.

  1. For GUI access, use MarkLogic Data Hub Central.
  2. For command line access, use Gradle.

A mix of these methods can be used as needed by your development requirements. See Maintaining and Modifying the HSK below for more information.

Using Data Hub Central

In the top level of your project directory, run java -jar marklogic-data-hub-central-5.5.1.war
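
If the command fails, the usual causes are a missing WAR file or a missing Java runtime. The sketch below checks for both before launching. The function name `start_hub_central` and the `DHC_WAR` variable are assumptions made for this example; the default file name is the one used in the command above.

```shell
# Hypothetical launcher, not shipped with the HSK.
# DHC_WAR is an assumed variable; adjust it to the file you downloaded.
DHC_WAR="${DHC_WAR:-marklogic-data-hub-central-5.5.1.war}"

start_hub_central() {
  # Fail early with a readable message if the WAR file is not here.
  if [ ! -f "$DHC_WAR" ]; then
    echo "Cannot find $DHC_WAR in $(pwd); download it first." >&2
    return 1
  fi
  # Fail early if no Java runtime is on the PATH.
  if ! command -v java >/dev/null 2>&1; then
    echo "java is not on the PATH." >&2
    return 1
  fi
  java -jar "$DHC_WAR"
}
```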

At this point, you can use Data Hub Central to run the processing flows to ingest, curate, and explore the sample data and models provided.

Using Gradle

If you prefer to run and test flows from the CLI, you can use the premade tasks, run through the provided gradlew utility, to ingest and harmonize the data instead.

Ingesting the Data

To ingest all data you can run ./gradlew ingest, or to ingest a smaller set of claims (for faster setup) you can run ./gradlew ingestSmaller.

If you would like to load sets of data individually you can run the tasks that the above depend on instead:

./gradlew ingestClaimLines # or ingestClaimLinesSmaller
./gradlew ingestClaims # or ingestClaimsSmaller
./gradlew ingestOrganizations
./gradlew ingestPatients
./gradlew ingestPayers
./gradlew ingestProviders
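
`./gradlew ingest` and `./gradlew ingestSmaller` already run everything in one go. When per-set control is useful (for example, to stop as soon as one set fails), the individual tasks listed above can be driven from a loop. The following is a sketch under assumptions: `ingest_sets` and the `GRADLEW` override are hypothetical names, and only the two claims tasks are given a `Smaller` variant because those are the only ones listed above.

```shell
# Hypothetical helper, not shipped with the HSK.
GRADLEW="${GRADLEW:-./gradlew}"

# Usage: ingest_sets [full|smaller]
ingest_sets() {
  suffix=""
  case "${1:-full}" in
    full) ;;
    smaller) suffix="Smaller" ;;
    *) echo "Usage: ingest_sets [full|smaller]" >&2; return 2 ;;
  esac
  # Only the claims tasks have a Smaller variant in the list above.
  for task in "ingestClaimLines$suffix" "ingestClaims$suffix" \
              ingestOrganizations ingestPatients ingestPayers ingestProviders; do
    "$GRADLEW" "$task" || return 1
  done
}
```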

Curating the Data

To curate all previously ingested data you can run ./gradlew harmonizeAll.

If you would like to curate sets of data individually you can run the tasks that the above depend on instead:

./gradlew harmonizeClaims
./gradlew harmonizeOrganizations
./gradlew harmonizePatients
./gradlew harmonizeProviders
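
To curate only a few sets, a small wrapper can validate the names before running anything, so a typo does not leave the data half-curated. This is a minimal sketch: `harmonize_sets` and `GRADLEW` are hypothetical names, and the accepted set names are taken from the four tasks listed above.

```shell
# Hypothetical helper, not shipped with the HSK.
GRADLEW="${GRADLEW:-./gradlew}"

# Usage: harmonize_sets Claims Patients ...
harmonize_sets() {
  if [ "$#" -eq 0 ]; then
    echo "Usage: harmonize_sets <Claims|Organizations|Patients|Providers>..." >&2
    return 2
  fi
  # First pass: reject unknown names before any task runs.
  for name in "$@"; do
    case "$name" in
      Claims|Organizations|Patients|Providers) ;;
      *) echo "Unknown data set: $name" >&2; return 2 ;;
    esac
  done
  # Second pass: run the matching harmonize task for each set, in order.
  for name in "$@"; do
    "$GRADLEW" "harmonize$name" || return 1
  done
}
```

For example, `harmonize_sets Patients Claims` runs `harmonizePatients` followed by `harmonizeClaims`.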

Running Unit and Integration Tests

To verify the deployment, two test suites are provided.

  • To run the JUnit integration test of the complete flow from ingest to curation, use ./gradlew test
  • To run the MarkLogic unit tests (developed in server-side JavaScript), use ./gradlew mlUnitTest

The test suites can be found in the following project directories:

  • JUnit integration: src/test/java/com/marklogic/hsk
  • MarkLogic unit tests: src/test/ml-modules/root/test/suites
    • The ClaimSuite is an example of a fully self-contained, independent test suite that can be run just after setup is done, without needing to load data. The other unit test suites are not necessarily configured to run independently of data load.
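
Both suites can be run back to back with a short summary at the end, so a failure in the first suite does not hide the result of the second. The sketch below assumes nothing beyond the two tasks named above; `verify_deployment` and `GRADLEW` are hypothetical names introduced for the example.

```shell
# Hypothetical helper, not shipped with the HSK.
GRADLEW="${GRADLEW:-./gradlew}"

# Run both test tasks, print PASS/FAIL per task,
# and return non-zero if any of them failed.
verify_deployment() {
  failed=""
  for task in test mlUnitTest; do
    if "$GRADLEW" "$task"; then
      echo "PASS $task"
    else
      echo "FAIL $task"
      failed="$failed $task"
    fi
  done
  [ -z "$failed" ]
}
```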

Maintaining and Modifying the HSK

Extending the HSK

See the Cookbook for more information on how to extend the HSK.

As mentioned previously, this project is intended as a starting point for a healthcare data hub and provides many reusable functions & code modules. While most of the code is reusable, the sample data and ingestion/mapping steps will have to be replaced to work with your own data.

About the sample source data

The sample health population data provided in this project was generated using the Synthea synthetic health records project. It is included for illustration purposes only and should be replaced with your raw data files.

The HSK project provides sample records for 755 patients and associated healthcare providers, organizations, claims, claims transactions, and payers.

ml-gradle

The MarkLogic Gradle plugin (ml-gradle) provides the commands needed to deploy, maintain, test, and modify the HSK. Full documentation can be found on the ml-gradle Wiki.

Data Hub Central and ml-gradle

Data Hub Central (DHC) can be used to modify entities, run ingest and curation steps, explore content, and monitor jobs. Please note that changes made using DHC are not propagated to the local project directory. You can run ./gradlew hubPullChanges to download the changes made in DHC and write them to your local project directory.

./gradlew hubPullChanges will overwrite any local changes you have made to Data Hub artifacts that were not pushed to the database using ./gradlew hubDeployUserArtifacts. Code modules and configuration will not be overwritten.
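
Because the pull can overwrite local artifact edits, one possible safeguard is to refuse to pull while the working tree has uncommitted changes. The sketch below assumes the project directory is tracked in git, which this README does not require; `has_local_changes`, `pull_dhc_changes`, and `GRADLEW` are hypothetical names.

```shell
# Hypothetical guard, not shipped with the HSK.
GRADLEW="${GRADLEW:-./gradlew}"

# True when git reports uncommitted changes in the working tree.
# Assumption: the project is a git checkout. Outside a git repository
# the command prints nothing to stdout, so this check reports "no changes".
has_local_changes() {
  [ -n "$(git status --porcelain 2>/dev/null)" ]
}

# Pull DHC changes only when nothing local can be overwritten unnoticed.
pull_dhc_changes() {
  if has_local_changes; then
    echo "Uncommitted local changes found; commit or stash them before pulling." >&2
    return 1
  fi
  "$GRADLEW" hubPullChanges
}
```

With the local state committed first, anything the pull overwrites can still be recovered from version control.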

Deployment best practices and caveats

If you clear or delete all of your user data from the staging database, data-hub-STAGING, you will need to re-ingest the reference data by running ./gradlew mlLoadData.

This will restore the reference document contents found in the referenceData/ directory into the collection required to run user-defined steps included with the project.

Loading the SNOMED-CT Ontology

If your data does not use SNOMED-CT codes, this section can be skipped.

If you need to load a SNOMED-CT Ontology into your HSK instance, you will need to download the ontology yourself as it requires a license for use and distribution.

Once downloaded, run the ZIP file through the snomed-owl-toolkit, then run the resulting ontology-<time-run>.owl through ROBOT to transform the data into a format that MLCP can ingest. Place the transformed file at ./src/main/ml-data/ontologies/SNOMED-CT.ttl, and you will be able to run ./gradlew loadSnomedCTOntology (or ./gradlew loadOntologies if you want to load all three ontologies).

# Example commands for transforming SNOMED-CT ontology into an ingestable format
# from the downloaded ZIP file, as run from the project directory
#
# Convert the RF2 files contained in the ZIP file to Functional OWL syntax.
# Example output filename: ~/downloads/ontology-2022-01-19_11-05-40.owl
java -Xms4g -jar snomed-owl-toolkit-3.0.3-executable.jar \
     -rf2-to-owl ~/downloads/SnomedCT_PRODUCTION.zip
# Convert from Functional OWL to TTL
java -jar robot.jar convert \
     --input ~/downloads/ontology-2022-01-19_11-05-40.owl \
     --output ./src/main/ml-data/ontologies/SNOMED-CT.ttl
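
Since the toolkit puts a timestamp in the output file name, the name changes on every run. A small helper can pick the most recent file so the ROBOT step does not need to be edited by hand. This is a sketch: `latest_owl` is a hypothetical function, and it assumes the `ontology-<time-run>.owl` naming shown in the example above, whose timestamps sort chronologically when sorted as text.

```shell
# Hypothetical helper, not shipped with the HSK.
# Usage: latest_owl DIR
# Prints the newest ontology-*.owl file in DIR, or returns non-zero if none exists.
latest_owl() {
  dir="${1:-.}"
  newest=""
  # Globs expand in sorted order, so the last existing match
  # carries the latest timestamp.
  for f in "$dir"/ontology-*.owl; do
    [ -f "$f" ] && newest="$f"
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

The ROBOT call can then take `--input "$(latest_owl ~/downloads)"` in place of the hard-coded file name, assuming the .owl files are in ~/downloads as in the example.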

Contributors

damonfeldman, theflyinggriffin, wattsferry, wgehring-ml
marklogic-healthcare-starter-kit's Issues

Doesn't work with the latest MarkLogic Server (11)?

When following the installation steps I get the following error:

PreInstall Check: [FAILED]

{stagingPortInUseBy=null, finalPortInUseBy=null, stagingPortInUse=false, finalPortInUse=false, safeToInstall=false, serverVersion=11.0.2, jobPortInUseBy=null, jobPortInUse=false, serverVersionOk=false}
- PROBLEM: Unsupported MarkLogic Server Version: 11.0.2
  FIX: Update to a supported version of MarkLogic.

I'm running the latest version of MarkLogic Server (11.0.2) in a Docker container.
