Giter Site home page Giter Site logo

iris2bq's Introduction

iris2bq - A utility to move InterSystems IRIS Data to Google Cloud Platform's Big Query

InterSystems IRIS For The Badge

What

Let's say IRIS is contributing to workload for a Hospital system, marshalling DICOM, HL7, FHIR, or CCDA. Natively, IRIS persists these objects in various stages of the pipeline via the nature of the business processes and anything you included along the way.

Lets send that up to Google Big Query to augment and compliment the rest of our Data Warehouse data and ETL (Extract Transform Load) or ELT (Extract Load Transform) to our hearts desire.

Why

  • Were capturing a wealth of transactional Healthcare data in our IRIS for Health clusters.
  • Our Data Warehouse is in Google Cloud Platform Big Query (GCP).
  • Our Data Science team asked us to put in a process to automatically send transactional data to the Data Warehouse.

Who

Technical Actors

How

This iris2bq utility has got your back.

Exactly how again?

  • It exports the data from IRIS into DataFrames
  • It saves them into GCS as .avro to keep the schema along the data: this will avoid to specify/create the BigQuery table schema beforehands.
  • It starts BigQuery jobs to import those .avro into the respective BigQuery tables you specify.

Note: internally, it is using the Spark framework for the sake of simplicity, but no Hadoop cluster is needed. It is configured as a "local" cluster by default, meaning the application and is running standalone.

How Its Implemented

  1. IRIS for Health is implemented as a Docker Container and deployed locally running Docker.
  2. Google Cloud Platform needs a project, with a service account and the account able to use Big Query and Google Cloud Storage.
  3. iris2bq synching those two places above ^^.

Let's Go

How to run it

  • ☕ Ensure you have a Java SDK on your system OpenJDK1.8 worked for us.
  • ⬇️ Download the latest release iris2bq
  • ☁️ Create a Service Account to Google Cloud which has access to GCS and BigQuery, and create a json key
  • 📄 Create a configuration file configuration.conf for iris2bq to know where to grab and put the data:
jdbc {
  url = "jdbc:IRIS://127.0.0.1:51773/USER"
  user = "_SYSTEM"
  password = "flounder"
  tables = [ "people" ]
}

gcloud {
  project = "iris2bq-demo"
  service-account-key-path = "service.key.json"
  bq.dataset = "iris2bqdemods"
  gcs.tmp-bucket = "iris2bqdemobucket"
}

Up there in the jdbc connection block is where the magic happens for us. Just specify a list of applicable tables in IRIS for the Namespace you are connecting to (USER) and watch them appear in Big Query in our Dataset.

  • Run the application specifying the config file:
GOOGLE_CLOUD_PROJECT=iris2bq-demo GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json ./bin/iris2bq -Dconfig.file=configuration.conf
  • Done!

Add this to a scheduler (Airflow, crontab, etc) every 10min and enjoy your JOINs in BigQuery of your IRIS Data.

Give this a shot yourself

If you want to give this a shot on your own, here is a quick way to get up and running with InterSystems IRIS for Health. You will need Docker installed on your system.

Show IRIS Deployment Instructions
  1. Build Container:

    $ cd demo
    $ bash iris-docker.sh
    
  2. Create a GCP project with Big Query and Google Cloud Storage API's enabled:

    gcloud projects create iris2bq-demo--enable-cloud-apis
    
  3. -or- try your hand as a Terraformer

    $ cd demo
    $ terraform init
    $ terraform plan
    $ terraform apply

Development

Got a better idea? See a horrific bug? Grab sbt for your system.

Show iris2bq Development Instructions
  1. Build:

    $ git clone https://github.com/basenube/iris2bq.git
    $ cd iris2bq
    // Develop, Develop, Develop
    $ sbt stage

Thank You

Thanks to our valued partners InterSystems and Google Cloud Platform for the feedback and support.

Credits

We dont pretend to be very good at Scala. Most of the original guts of this thing come out of the great work of these hackers at Powerspace. We have used a similar version of this they developed for Postgres to lift transactional data to a datawarehouse for a customer without issue for over a year now.

iris2bq's People

Contributors

sween avatar ch-ronsweeney avatar

Stargazers

Scientific Cloud avatar  avatar  avatar

Watchers

 avatar James Cloos avatar  avatar Anton Umnikov avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.