MarkLogic Healthcare Starter Kit

Description & Purpose

This README gives a short description of the project and instructions for setting it up and running it. For more information on the project as a whole, refer to the Cookbook.

The MarkLogic Healthcare Starter Kit (HSK) is a working project for a healthcare payer data hub, particularly geared toward serving Medicaid customers. Such a hub is also called an Operational Data Store (ODS). The HSK helps payers meet a mandate by the U.S. Centers for Medicare & Medicaid Services (CMS) to comply with the Fast Healthcare Interoperability Resources (FHIR) specification for the electronic exchange of healthcare information.

MarkLogic HSK is intended as a starting point for a healthcare data hub, with working code as well as sample data and configurations. It is also a good foundation for implementing FHIR-compliant data services when used in combination with the MarkLogic FHIR Mapper.

Users can upload raw, heterogeneous health records and use the harmonization features inherited by the HSK from the MarkLogic Data Hub to canonicalize and master their data. MarkLogic’s powerful default indexing and other Data Hub features make it easy to explore data and models to gain additional insight for future development and operations.

Documentation for the external projects, tools, and specifications referenced by this README is available as follows:

Get the Healthcare Starter Kit (HSK)

Clone the source or download a tagged release zip file from the MarkLogic HSK repository.

Deploy the HSK

The HSK was built and tested with the following prerequisites:

Java 8 or 11
MarkLogic Data Hub Central v5.5.1
MarkLogic Server >= v10.0-7

Installation Steps:

Note: Installation steps assume a MarkLogic Server user/role with sufficient privileges is specified. Refer to the MarkLogic Data Hub documentation if needed.

Download MarkLogic Data Hub Central using the link above
Unzip the tagged release or clone the source into a directory of your choosing.
At the top level of your project directory, change the mlUsername and mlPassword properties in gradle-local.properties to set your default user's username and password, based on the MarkLogic user you intend to use (admin, DrSmith, etc.).
- The project includes several sample demo users, such as DrSmith (password demo), who is capable of running all operations.
Deploy the Healthcare Starter Kit data hub:
- ./gradlew mlDeploy
  - See Maintaining and Modifying the HSK below.
- ./gradlew mlLoadData
  - Loads the reference data used as input by the user-defined steps and functions included with this project.
- ./gradlew loadOntologies
  - Loads the ontologies for ICD10CM and ICD10PCS, and for SNOMED-CT if its ontology file is present.
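
The deployment tasks above can be chained so that a failure stops the sequence before later tasks run. The following is a minimal sketch, not part of the project: the deploy_hsk function name and the GRADLE override variable are hypothetical, and only the three task names come from this README.

```shell
#!/bin/sh
# Sketch: run the HSK deployment tasks in order, stopping at the first failure.
# GRADLE is a hypothetical override; by default the project's ./gradlew wrapper is used.
deploy_hsk() {
    gradle_cmd="${GRADLE:-./gradlew}"
    for task in mlDeploy mlLoadData loadOntologies; do
        echo "Running $task"
        if ! "$gradle_cmd" "$task"; then
            echo "Task $task failed; skipping remaining tasks" >&2
            return 1
        fi
    done
}
```

Calling deploy_hsk from the top level of the project directory runs the three tasks with the ./gradlew wrapper.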

Using the HSK

There are two primary ways to access and use the deployed HSK.

For GUI access, use MarkLogic Data Hub Central.
For command-line access, use Gradle.

A mix of these methods can be used as needed by your development requirements. See Maintaining and Modifying the HSK below for more information.

Using Data Hub Central

In the top level of your project directory, run java -jar marklogic-data-hub-central-5.5.1.war

At this point, you can use Data Hub Central to run the processing flows to ingest, curate, and explore the sample data and models provided.

Using Gradle

If you prefer to run and test flows from the command line, you can use the premade tasks provided with this project to ingest and harmonize data through the gradlew wrapper instead.

Ingesting the Data

To ingest all data you can run ./gradlew ingest, or to ingest a smaller set of claims (for faster setup) you can run ./gradlew ingestSmaller.

If you would like to load sets of data individually you can run the tasks that the above depend on instead:

./gradlew ingestClaimLines # or ingestClaimLinesSmaller
./gradlew ingestClaims # or ingestClaimsSmaller
./gradlew ingestOrganizations
./gradlew ingestPatients
./gradlew ingestPayers
./gradlew ingestProviders
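
When scripting the individual ingest tasks, the choice between the full and the smaller claim sets comes down to a task-name suffix. The following is a small sketch, assuming a hypothetical helper named ingest_tasks; the task names are the ones listed above, and this README does not state whether the tasks must run in a particular order.

```shell
#!/bin/sh
# Sketch: print the individual ingest task names.
# Passing "smaller" selects the reduced claim and claim-line tasks.
ingest_tasks() {
    suffix=""
    if [ "${1:-}" = "smaller" ]; then
        suffix="Smaller"
    fi
    echo "ingestClaimLines$suffix ingestClaims$suffix ingestOrganizations ingestPatients ingestPayers ingestProviders"
}

# Example use from the project directory (Gradle accepts several task names at once):
#   ./gradlew $(ingest_tasks smaller)
```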

Curating the Data

To curate all previously ingested data you can run ./gradlew harmonizeAll.

If you would like to curate sets of data individually, you can run the tasks that the above depends on instead:

./gradlew harmonizeClaims
./gradlew harmonizeOrganizations
./gradlew harmonizePatients
./gradlew harmonizeProviders

Running Unit and Integration Tests

To verify the deployment, two test suites are provided.

To run the JUnit integration tests of the complete flow from ingest to curation, use ./gradlew test
To run the MarkLogic unit tests (developed in server-side JavaScript), use ./gradlew mlUnitTest

The test suites can be found in the following project directories:

JUnit integration: src/test/java/com/marklogic/hsk
MarkLogic unit tests: src/test/ml-modules/root/test/suites
- The ClaimSuite is an example of a fully self-contained, independent test suite that can be run just after setup is done, without needing to load data. The other unit test suites are not necessarily configured to run independently of data load.
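
To check a deployment in one pass, both suites can be run back to back, with a summary that reports each result even when the first suite fails. This is a sketch only: the run_hsk_tests function and the GRADLE override variable are hypothetical, and, as noted above, suites other than ClaimSuite may fail if the sample data has not been loaded.

```shell
#!/bin/sh
# Sketch: run both test suites and report each result, without stopping at the first failure.
# GRADLE is a hypothetical override; by default the project's ./gradlew wrapper is used.
run_hsk_tests() {
    gradle_cmd="${GRADLE:-./gradlew}"
    failed=""
    for task in test mlUnitTest; do
        if "$gradle_cmd" "$task"; then
            echo "PASS $task"
        else
            echo "FAIL $task"
            failed="$failed $task"
        fi
    done
    # Succeed only if no task failed.
    [ -z "$failed" ]
}
```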

Maintaining and Modifying the HSK

Extending the HSK

See the Cookbook for more information on how to extend the HSK.

As mentioned previously, this project is intended as a starting point for a healthcare data hub and provides many reusable functions & code modules. While most of the code is reusable, the sample data and ingestion/mapping steps will have to be replaced to work with your own data.

About the sample source data

The sample health population data provided in this project was generated using the Synthea synthetic health records project. It is included for illustration purposes only and should be replaced with your raw data files.

The HSK project provides sample records for 755 patients and the associated healthcare providers, organizations, claims, claims transactions, and payers.

ml-gradle

The MarkLogic Gradle plugin (ml-gradle) provides the commands needed to deploy, maintain, test, and modify the HSK. Full documentation can be found on the ml-gradle Wiki.

Data Hub Central and ml-gradle

Data Hub Central (DHC) can be used to modify entities, run ingest and curation steps, explore content, and monitor jobs. Note that changes made in DHC are not propagated to the local project directory. You can run ./gradlew hubPullChanges to download the changes made in DHC and write them to your local project directory.

./gradlew hubPullChanges will overwrite any local changes you have made to Data Hub artifacts that were not pushed to the database using ./gradlew hubDeployUserArtifacts. Code modules and configuration will not be overwritten.
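
Because a pull can overwrite local artifact changes, one cautious option is to copy the artifact directories aside first. The sketch below makes two assumptions that are not stated in this README: that Data Hub artifacts live in the entities, flows, and steps directories at the top level of the project, and that a plain directory copy is an acceptable backup. The backup_artifacts function name is hypothetical.

```shell
#!/bin/sh
# Sketch: copy local Data Hub artifact directories to a backup location.
# Assumption: artifacts are kept in ./entities, ./flows and ./steps; adjust for your project.
backup_artifacts() {
    dest="$1"
    if [ -z "$dest" ]; then
        echo "usage: backup_artifacts <backup-dir>" >&2
        return 2
    fi
    mkdir -p "$dest" || return 1
    for dir in entities flows steps; do
        # Skip directories that do not exist in this project.
        if [ -d "$dir" ]; then
            cp -R "$dir" "$dest/" || return 1
        fi
    done
}

# Example use from the project directory:
#   backup_artifacts "../hsk-artifacts-backup" && ./gradlew hubPullChanges
```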

Deployment best practices and caveats

If you clear or delete all of your user data from the staging database, data-hub-STAGING, you will need to re-ingest the reference data by running ./gradlew mlLoadData

This reloads the reference documents found in the referenceData/ directory into the collection required to run the user-defined steps included with the project.

Loading the SNOMED-CT Ontology

If your data does not use SNOMED-CT codes, this section can be skipped.

If you need to load a SNOMED-CT Ontology into your HSK instance, you will need to download the ontology yourself as it requires a license for use and distribution.

Once you have downloaded it, run the ZIP file through the snomed-owl-toolkit, then run the resulting ontology-<time-run>.owl through ROBOT to transform the data into a format that MLCP can ingest. Place the transformed file at ./src/main/ml-data/ontologies/SNOMED-CT.ttl, and you will be able to run ./gradlew loadSnomedCTOntology (or, if you want to load all three ontologies, ./gradlew loadOntologies).

# Example commands for transforming the SNOMED-CT ontology into an ingestible format
# from the downloaded ZIP file, as run from the project directory
#
# Convert the RF2 files contained in the ZIP file to Functional OWL syntax.
# Example output filename: ~/downloads/ontology-2022-01-19_11-05-40.owl
# Note: JVM options are case-sensitive (-Xms4g) and conventionally precede -jar.
java -Xms4g -jar snomed-owl-toolkit-3.0.3-executable.jar \
     -rf2-to-owl ~/downloads/SnomedCT_PRODUCTION.zip
# Convert from Functional OWL to TTL
java -jar robot.jar convert \
     --input ~/downloads/ontology-2022-01-19_11-05-40.owl \
     --output ./src/main/ml-data/ontologies/SNOMED-CT.ttl
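
Because the toolkit's output filename includes the time of the run, a script cannot hard-code it. One possible approach, sketched here under the assumption that output files follow the ontology-<timestamp>.owl pattern shown in the example above, is to pick the name that sorts last. The latest_owl helper is hypothetical.

```shell
#!/bin/sh
# Sketch: print the most recent ontology-*.owl file in a directory.
# Relies on the timestamp in the filename sorting chronologically.
latest_owl() {
    dir="${1:-.}"
    newest=""
    # Glob results are sorted, so the last existing match has the latest timestamp.
    for f in "$dir"/ontology-*.owl; do
        if [ -e "$f" ]; then
            newest="$f"
        fi
    done
    if [ -z "$newest" ]; then
        return 1
    fi
    printf '%s\n' "$newest"
}

# Example use from the project directory:
#   java -jar robot.jar convert \
#        --input "$(latest_owl ~/downloads)" \
#        --output ./src/main/ml-data/ontologies/SNOMED-CT.ttl
```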
