CDE in a box is a collection of software applications that enables the creation, storage, and publication of "Common Data Elements" (CDEs) according to the CDE semantic model.
To use the CDE in a box solution, you must meet the following requirements.
User requirements (Person who is deploying this solution)
- Basic knowledge of Docker
- Basic GitHub knowledge
- Awareness of CDE semantic model
System requirements (Machine where this solution is being deployed)
- Docker engine
- Docker-compose application
The image below gives an overview of the software used in the CDE in a box solution.
Triple store:
To store the RDF documents generated by the CDE in a box solution, we need a triple store. The CDE in a box solution uses GraphDB as its triple store. To learn more about the GraphDB triple store, please visit this link
FAIR Data Point:
To describe the content of your resource, we need a metadata provider component. The CDE in a box solution uses the FAIR Data Point software, which provides a description (metadata) of your resource. To learn more about the FAIR Data Point, please visit this link
In this section we list other known CDE in a box solutions.
MOLGENIS CDE in a box
The MOLGENIS EDC provider also offers a complete CDE in a box setup with an EDC system. To learn more about the MOLGENIS implementation of the CDE in a box solution, please visit this link
To use the CDE in a box solution, clone this repository to your machine.
git clone https://github.com/ejp-rd-vp/cde-in-box
Once you have cloned this repository, follow the instructions below to configure the CDE in a box solution.
The docker-compose.yml file in the cde-in-box/bootstrap directory sets up the GraphDB triple store and creates the fdp and cde repositories in GraphDB. These two repositories are used by the other services in CDE in a box, so make sure that the bootstrap services are properly set up before you proceed.
To run the docker-compose.yml file in cde-in-box/bootstrap, you need the free edition of the GraphDB triple store. Follow the steps below to get the free edition of GraphDB.
Step 1: Go to this url and register to download the GraphDB free edition.
Step 2: The download link will be sent to your email. From the email, follow the link to the download page and click on "Download as a stand-alone server". This step downloads the "graphdb-free-{version}-dist.zip" file to your machine.
Step 3: Move the "graphdb-free-{version}-dist.zip" file to the following location:
mv graphdb-free-{version}-dist.zip cde-in-box/bootstrap/graph-db
Step 4: If your GraphDB version is different from 9.7.0, change the version number of GraphDB in the docker-compose file.
graph_db:
  build:
    context: ./graph-db
    dockerfile: Dockerfile
    args:
      version: 9.7.0
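To find the version number to put in the docker-compose file, you can read it from the name of the downloaded zip file. The function below is a minimal sketch of that idea; the name graphdb_zip_version is hypothetical, and it assumes the file follows the "graphdb-free-{version}-dist.zip" naming described above.

```shell
# Hypothetical helper: print the version embedded in a GraphDB zip file name.
# Assumes the naming pattern graphdb-free-{version}-dist.zip.
graphdb_zip_version() {
    name=${1##*/}                          # drop any leading directory
    case "$name" in
        graphdb-free-*-dist.zip)
            version=${name#graphdb-free-}  # strip the fixed prefix
            version=${version%-dist.zip}   # strip the fixed suffix
            printf '%s\n' "$version"
            ;;
        *)
            echo "unexpected file name: $name" >&2
            return 1
            ;;
    esac
}
```

For example, running graphdb_zip_version on cde-in-box/bootstrap/graph-db/graphdb-free-9.7.0-dist.zip prints 9.7.0, which is the value expected under args: version.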
Once you have completed the configuration above, you can run the bootstrap services by running the docker-compose.yml file in the cde-in-box/bootstrap directory.
docker-compose up -d
If the deployment is successful, you can access GraphDB at the following URL.
Note: If you deploy the CDE in a box solution on your laptop, check only the local deployment URL.
Service name | Local deployment | Production deployment |
---|---|---|
GraphDB | http://localhost:7200 | http://SERVER-IP:7200 |
By default, the GraphDB service is secured, so you need credentials to log in to GraphDB. The default GraphDB credentials are listed in the table below.
Username | Password |
---|---|
admin | root |
The docker-compose.yml file in the cde-in-box/metadata directory sets up the FAIR Data Point and connects it to the triple store created in the bootstrapping step.
Step 1: Before you run the metadata services, make sure that the GraphDB triple store is up and running. You can check by visiting the following URL: for a local deployment (on your laptop), http://localhost:7200; for a production deployment (on your server), http://SERVER-IP:7200
Step 2: Check that the fdp repository is available in the GraphDB triple store.
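Both checks can also be done from the command line. The sketch below rests on assumptions that are not stated in this document: that GraphDB exposes the RDF4J-style endpoint /repositories/{id}/size, and that it accepts HTTP basic authentication. The function names repo_size_url and repo_available are hypothetical.

```shell
# Build the RDF4J-style "size" URL for a repository (assumed endpoint layout).
# $1 = GraphDB base URL, $2 = repository id
repo_size_url() {
    printf '%s/repositories/%s/size\n' "${1%/}" "$2"
}

# Succeed only if the repository answers the request.
# curl --fail turns HTTP errors (for example an unknown repository or
# rejected credentials) into a non-zero exit status.
# $1 = base URL, $2 = repository id, $3 = "user:password"
repo_available() {
    curl --fail --silent --show-error --user "$3" \
        "$(repo_size_url "$1" "$2")" > /dev/null
}
```

With the default credentials, a local check would look like: repo_available http://localhost:7200 fdp admin:root && echo "fdp repository is reachable"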
Once you have completed the checks above, you can run the metadata services by running the docker-compose.yml file in the cde-in-box/metadata directory.
docker-compose up -d
If the deployment is successful, you can access the FAIR Data Point at the following URL.
Service name | Local deployment | Production deployment |
---|---|---|
FAIR Data Point | http://localhost:8080 | http://SERVER-IP:8080 |
Note: If you deploy the CDE in a box solution on your laptop, check only the local deployment URL.
To add content to the FAIR Data Point, you need credentials with write access. The default FAIR Data Point credentials are listed in the table below.
Username | Password |
---|---|
[email protected] | password |
The transformation services take CSV files as input. We provide CSVs with example data and YARRRML templates for each CDE module here. (Note that this leads you to the CDE Version 2 models. If you still depend on the Version 1 models, navigate up one folder to the Version 1 models.)
The YARRRML templates are always loaded from GitHub automatically, so they stay up to date as we change the models in EJP-RD, but the CSV files must be added by the user.
Step 1: Folder structure
Make sure the following folder structure is available, relative to where you plan to keep your pre- and post-transformation data:
.
./cde-ready-to-go/data/
./cde-ready-to-go/data/mydataX.csv (input CSV files, e.g. "height.csv")
./cde-ready-to-go/data/mydataY.csv...
./cde-ready-to-go/config/ (this is the folder where YARRRML templates will be automatically loaded from the EJP repository)
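The folder structure can be created in one step. This is a minimal sketch; the function name prepare_cde_folders is hypothetical, and it assumes the folder is named cde-ready-to-go with data and config subfolders, as in the rest of this document.

```shell
# Hypothetical helper: create the data and config folders under a base directory.
# $1 = base directory (defaults to the current directory)
prepare_cde_folders() {
    base=${1:-.}
    # mkdir -p creates missing parents and does not fail if the folders exist.
    mkdir -p "$base/cde-ready-to-go/data" "$base/cde-ready-to-go/config"
}
```

Calling prepare_cde_folders with no argument creates the folders below the current directory; existing folders and files are left untouched.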
Step 2: Edit the .env file
The .env file sets the values of the environment variables used in the docker-compose file. The first of these, baseURI, is the base for all URLs that represent your transformed data. It should be set to something like:
http://my.database.org/my_rd_data/
This will result in triples that look like this:
<http://my.database.org/my_rd_data/person_123345_asdssaewe#ID> <sio:has-value> <"123345">
Optimally, these URLs will resolve...
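In the example triple above, the identifier is appended directly to the base, so the trailing slash in baseURI matters. The check below is a sketch of one way to catch a missing slash before starting the services. The function name valid_base_uri and the rule it enforces (an http or https URL ending in "/") are assumptions inferred from the example, not requirements stated by the transformation service.

```shell
# Hypothetical check: accept only http(s) URLs that end with "/".
# $1 = candidate value for baseURI
valid_base_uri() {
    case "$1" in
        http://*/ | https://*/) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, valid_base_uri http://my.database.org/my_rd_data/ succeeds, while the same value without the final slash is rejected.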
Step 3: Running data transformation services
You can now run the data transformation services by running the docker-compose.yml file in the cde-in-box/cde-ready-to-go directory. Be sure to move this file to the appropriate location: docker-compose MUST BE RUN IN THE SAME FOLDER THAT CONTAINS THE ./data AND ./config SUBFOLDERS.
You should first refresh your local copies of the Docker images to ensure they are up to date with what EJP is providing:
docker-compose pull
followed by:
docker-compose up -d
Step 4: Input CSV files
Put an appropriately columned XXXX.csv into the cde-in-box/cde-ready-to-go/data folder. Please see this GitHub repository for examples of CDE CSV files.
Step 5: Input YARRRML templates
The YARRRML templates are always loaded from GitHub automatically, so they stay up to date as we change the models in EJP-RD.
Make sure the YARRRML template file names match your CSV file names (XXXX_yarrrml_template.yaml for XXXX.csv) and that the templates are in the cde-in-box/cde-ready-to-go/config folder. Please see this GitHub repository for the CDE YARRRML templates.
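The naming rule can be checked with a short script once the templates are present in the config folder. This is a sketch only; the function name missing_templates is hypothetical, and it assumes the XXXX.csv to XXXX_yarrrml_template.yaml convention described in this step.

```shell
# Hypothetical helper: list the templates that are missing for the CSV files.
# $1 = data folder, $2 = config folder
# Prints one expected template name per line; returns 1 if any is missing.
missing_templates() {
    data_dir=$1
    config_dir=$2
    status=0
    for csv in "$data_dir"/*.csv; do
        [ -e "$csv" ] || continue            # the glob matched nothing
        base=$(basename "$csv" .csv)
        if [ ! -f "$config_dir/${base}_yarrrml_template.yaml" ]; then
            printf '%s\n' "${base}_yarrrml_template.yaml"
            status=1
        fi
    done
    return $status
}
```

Run from the cde-ready-to-go folder as missing_templates ./data ./config; no output and a zero exit status mean that every CSV file has a matching template.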
Step 6: Executing transformations
Call the URL http://localhost:4567 or http://SERVER-IP:4567 to trigger the transformation of each CSV file and auto-load the result into GraphDB. (This will overwrite what is currently loaded! We will make this behaviour more flexible later.)
Note: If you deploy the CDE in a box solution on your laptop, check only the localhost URL.
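The transformation can also be triggered from a terminal instead of a browser. The sketch below assumes that a plain HTTP GET on port 4567 is enough to start the transformation; the function names transform_url and trigger_transformation are hypothetical.

```shell
# Build the trigger URL.
# $1 = host name or IP address (defaults to localhost)
transform_url() {
    printf 'http://%s:4567\n' "${1:-localhost}"
}

# Send a GET request to the trigger URL (assumed to start the transformation).
# Remember that this overwrites what is currently loaded in GraphDB.
trigger_transformation() {
    curl --fail --silent --show-error "$(transform_url "${1:-}")"
}
```

Use trigger_transformation for a local deployment, or trigger_transformation SERVER-IP with your server's address for a production deployment.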
There is sample data (height.csv) in the "cde-ready-to-go/data" folder that can be used to test your installation.
YARRRML is one of the core technologies used in our data transformation service. If you would like to extend the CDE semantic model or add another semantic model to describe your data, you have to provide custom YARRRML templates to the data transformation service. To learn more about building custom YARRRML templates, please try the Matey web app.