ohdsi / commondatamodel Goto Github PK

View Code? Open in Web Editor NEW

868.0 123.0 446.0 22.06 MB

Definition and DDLs for the OMOP Common Data Model (CDM)

Home Page: https://ohdsi.github.io/CommonDataModel

R 0.79% HTML 99.21% CSS 0.01%

commondatamodel's Introduction

title

output

Readme

pdf_document

html_document

toc
true

toc	toc_float
true	true

How to Use this Repository

If you are looking for the SQL DDLs and don't wish to generate them through R, they can be accessed here.

If you are looking for information on how to submit a bugfix, skip to the next section

Generating the DDLs

This module will demonstrate two different ways the CDM R package can be used to create the CDM tables in your environment. First, it uses the buildRelease function to create the DDL files on your machine, intended for end users that wish to generate these scripts from R without the need to clone or download the source code from github. The SQL scripts that are created through this process are available as zip files as part of the latest release. They are also available on the master branch here.

Second, the script shows the executeDdl function that will connect up to your SQL client directly (assuming your dbms is one of the supported dialects) and instantiate the tables through R.

Dependencies and prerequisites

This process required R-Studio to be installed as well as DatabaseConnector and SqlRender.

Create DDL, Foreign Keys, Primary Keys, and Indexes from R

First, install the package from GitHub

install.packages("devtools")
devtools::install_github("OHDSI/CommonDataModel")

List the currently supported SQL dialects

CommonDataModel::listSupportedDialects()

List the currently supported CDM versions

CommonDataModel::listSupportedVersions()

1. Use the `buildRelease` function

This function will generate the text files in the dialect you choose, putting the output files in the folder you specify.

CommonDataModel::buildRelease(cdmVersions = "5.4",
                              targetDialects = "postgresql",
                              outputfolder = "/pathToOutput")

2. Use the `executeDdl` function

If you have an empty schema ready to go, the package will connect and instantiate the tables for you. To start, you need to download DatabaseConnector in order to connect to your database.

devtools::install_github("ohdsi/DatabaseConnector")

cd <- DatabaseConnector::createConnectionDetails(dbms = "postgresql",
                                                 server = "localhost/ohdsi",
                                                 user = "postgres",
                                                 password = "postgres",
                                                 pathToDriver = "/pathToDriver"
                                                 )

CommonDataModel::executeDdl(connectionDetails = cd,
                            cdmVersion = "5.4",
                            cdmDatabaseSchema = "ohdsi_demo"
                            )

Bug Fixes/Model Updates

NOTE This information is for the maintainers of the CDM as well as anyone looking to submit a pull request. If you want to suggest an update or addition to the OMOP Common Data Model itself please open an issue using the proposal template. The instructions contained herein are meant to describe the process by which bugs in the DDL code should be addressed and/or new versions of the CDM are produced.

Just looking for the latest version of the CDM and you don't care about the R package? Please visit the releases tab and download the latest. It will include the DDLs for all currently supported versions of the CDM for all supported SQL dialects.

Typically, new CDM versions and updates are decided by the CDM working group (details to join meetings on homepage). These changes are tracked as issues in the github repo. Once the working group decides which changes make up a version, all the corresponding issues should be tagged with a version number, e.g. v5.4, and added to a project board.

Step 0

Changes to the model structure should be made in the representative csv files by adding, subtracting, or renaming fields or tables. ETL conventions are not currently tracked by CDM version unless they are conventions specific to new fields (for example CONDITION_STATUS was added in v5.3 which specifies the way in which primary condition designations should be captured).

Bug fixes are made much the same way using the csv files, but they should be limited to typos, primary/foreign key relationships, and formatting (like datetime vs datetime2).

Step 1

If you are making changes to the model structure request a new branch in the CommonDataModel repository for the new version of the CDM you are creating. Then, fork the repository and clone the newly made branch. If you are squashing bugs fork the repository and clone the master branch.

Step 1.1

For changes to the model structure, rename the table level and field level inst/csv files from the current released version to the new version. For example, if the new version you are creating is v5.4 and the most recent released version is v5.3, rename the csv files named "OMOP_CDMv5.3_Field_Level.csv" and "OMOP_CDMv5.3_Table_Level.csv" to "OMOP_CDMv5.4_Field_Level.csv" and "OMOP_CDMv5.4_Table_Level.csv". These files serve multiple functions; they serve as the basis for the CDM DDL, CDM documentation, and Data Quality Dashboard (DQD). You can read more about the DQD here.

For squashing bugs make the necessary changes in the csv file corresponding to the major.minor version you are fixing. For example, if you are working on fixes to v5.3.3 you would make changes in the v5.3 files. (skip to step 2)

Step 1.2

The csv files can now be updated with the changes and additions for the new CDM version. If a new table should be added, add a line to the Table_Level.csv with the table name and description and list it as part of the CDM schema. The remaining columns are quality checks that can be run. Details here on what those are. After adding any tables, make any changes or additions to CDM fields in the Field_Level.csv. The columns are meant to mimic how a DDL is structured, which is how it will eventually be generated. A yes in the field isRequired indicates a NOT NULL constraint and the datatype field should be filled in exactly how it would look in the DDL. Any additions or changes should also be reflected in the userGuidance and etlConventions fields, which are the basis for the documentation. DO NOT MAKE ANY CHANGES IN THE DDL ITSELF. The structure is set up in such a way that the csv files are the ground truth. If changes are made in the DDL instead of the csv files then the DDL will be out of sync with the documentation and the DQD.

Step 1.3

Within R/listSupportedVersions.R, modify the listSupportedVersions function to have the value associated with your newly created CSVs. For example, if you created CSVs named "OMOP_CDMv5.9_Field_Level.csv" and "OMOP_CDMv5.9_Table_Level.csv", you would need to add the value "5.9" to the supportedVersions variable within the function. Save this function after you have updated it.

Step 2

Once all changes are made to the csvs and package as needed, rebuild the package and then open extras/codeToRun.R. To make sure that your new version is recognized by the package run the function listSupportedVersions(). If you do not see it, make sure your new csv files are in inst/csv and that you have rebuilt the package. Once you have confirmed that the package recognizes your new version, run the function buildRelease(). You should now see a file in inst/ddl for your new version.

NOTE ABOUT CDM v6.0

Please be aware that v6.0 of the OMOP CDM is not fully supported by the OHDSI suite of tools and methods. The major difference in CDM v5.3 and CDM v6.0 involves switching the *_datetime fields to mandatory rather than optional. This switch radically changes the assumptions related to exposure and outcome timing. Rather than move forward with v6.0, please transform your data to CDM v5.4 until such time that we as a community have fully defined the role of dates vs datetimes both when it comes to the model and the evidence we generate.

commondatamodel's People

Contributors

Stargazers

Watchers

Forkers

chrisdavidson testdev123 pabulson jlaughery cerdman leeevans arestenko rmalarc jalvarado-guzman wistephens taylordelehanty e-plus-healthcare-alliance tourismlivellc fhir-cn rahikalantari listoflist rchwang lholt2452 janicepan daanelson carolchen89 twankim marzigolbaz joowoncho gcakmakcigit cmhloveukulele blindglobe jsanzucla sibusiso16 sneakythr0ws weipuz rstarr7 technoslerphile digvijay0215 hojunpark kpolimis pgwilliams tomwhite krishna-medikeri baberryyoung rwpark99 soediro whitefangbuck mi-erasmusmc proskriptive ucdavis ericavoss cephalization11 a-stri ickevinwilkie yoosoo0 taylordeatri sungine stillbrave parisni murquizo swamyl abedtashh sigfried tokunbo-hiamang vojtechhuser maximmoinat thehyve dotnetpart anju-jsl yrpark gowthamrao kormilitzin dantohe saadjanjua clint2h dstdevops estone96 yunseobshin maroun survitup qkerby inform-health-informatics ych512 csindoni charhart brianquinnom1 janmichaelyu djb2188 hsharm3 zhu-c lucianobenetti mohammadjouni vangalamaheshh kevinjyee littlehorsea bjmckjr shrikantkunte davidcarnahan mkim0710 sujith58 jpsrao adolfoeliazat gouyou cfigueroacastro

commondatamodel's Issues

Expand Drug Strength tables to all Standard Concepts in the Drug Domain

Requester: Christian Reich

Proposal

Currently, only Clinical and Branded Drugs have a record in the DRUG_STRENGTH table. However, since all Standard Concepts are allowed to be used in the data and therefore dosing information is required for them, the algorithm should parse all Concept Classes:

Clinical Drug, Branded Drug
Clinical Drug Components, Branded Drug Component
Quantified Clinical Drug and Quantified Branded Drug

Not sure yet what to do with the Packs. Also, not sure what to do with Ingredients and Drug Forms, which have no dose information, but a standard unit for measuring is needed for the use case of calculating dose information from quantity field in DRUG_EXPOSURE. This will be handled in a different request.

To add the components is very straightforward. However, to add the Quantified products, a new field is necessary: denominator_value. The reason is the dose of compound of divisible products such as solutions and gels in Clinical or Branded Drugs is given as as normalized concentration, where the denominator is assumed as a unit of 1. E.g., the product "Thiopental 20 MG/ML Prefilled Syringe" comes in 20 mL syringes. The equivalent DRUG_STRENGTH table entry should specify 400 mg Thiopental in 20 mL of solution.

Field	Required	Type	Description
denominator_value	No	float	The amount of total liquid (or other divisible product, such as ointment, gel, spray, etc.).

Use Cases

The DRUG_STRENGTH table is used to calculate the dose of a certain compound and fill the dose table.

v4 to v5: RedShift

for conversion scripts - SqlRender does support RedShift,

can RedSHift SQL be also posted.

Also, for users new to R, it may be nice to show R code that 'generates' the final SQL from parametized-SQL.

Datetime field in OBSERVATION_PERIOD creates Oracle error

Datetime needs to be removed from the OBSERVATION_PERIOD table

v5.1 SQL File Comments Can Use and Update

Hello,

I tried to adjust myself.. The DDL file for SQL Server for v5.1 has stale comments in the header. It represents 5.0. It would be helpful to the community if the header comments reflected the current version - 5.1

Missing design principle: always storing the source value

I know I reviewed the specs myself and have also missed this, but I would think that the list of design principles (page 6 of the specs) is missing this one:

Maintaining source codes. Even though all codes are mapped to the Standardized Vocabularies, the model also stores the original source code to ensure no information is lost.

Add Diagnosis-Related Group (DRG) to COST

Requester: OMOP-PCORnet Interoperability Collaborative (c/o Rimma Belenkaya)

Proposal

Relevant table: COST

Background

This proposal applies only to US Diagnosis-Related Groups (DRGs). DRG is a system to
classify hospital cases into one of approximately 500 groups. They have been used in the
United States since 1983. There is more than one DRG system being used in the United States,
but only the MS-DRG (Medicare Severity Diagnosis Related Grouper) system is used by
Medicare. OMOP vocabulary includes MS-DRG version 33.0 released in October 2015, the
vocabulary_id is ‘DRG’.
Presently, US DRG code is stored in the Observation table and linked to Visit_Occurrence table
via visit_occurrence_id.

Proposed Change

Move DRG from Observation to the new Cost table as a first class attribute. Link it to
Visit_Occurrence table via visit_occurrence_id.
Add the following DRG fields to Cost table:

Field	Required	Type	Description
DRG_CONCEPT_ID	Integer	No	A foreign key to the predefined concept in the DRG Vocabulary reflecting the DRG for a visit.
DRG_SOURCE_VALUE	Varchar(3)	No	The 3-digit DRG source code as it appears in the source data.

Conventions

Multiple DRG Codes

Occasionally, one visit may have multiple DRG codes. In this case, there will be multiple
costs associated with these DRG codes. In OMOP CDM, they will be represented by
multiple records containing corresponding costs and DRG codes in the Cost table.

Vocabulary Mapping

Source codes coming from DRG systems other than MS-DRG should be mapped to
concept_id = 0.

Use Cases

DRG codes are used in the analysis of cost-efficiency and designs of a hospital payment system. The Affordable Care Act and other recent legislation affect medical reimbursement by altering the DRG system. This legislation will give DRGs an even larger role in determining reimbursements in the coming years.

Consequences

This change is an important enhancement to the model and can be implemented immediately
without impacting other projects.
It will have an immediate effect on the three OMOP-based CDRNs: NYC-CDRN representing
over 2.5 million patients; pSCANNER covering over 21 million patients; and PEDSnet which
includes eight of the nation's largest children's hospitals and provides service to 4.6 million
children per year.
This change will enhance representation and analysis of DRG codes in OMOP CDM and
simplify ETL for the sites that are presently using Observation table.
It will have no implications on vocabulary or existing software.

References

Weiland DE, ea. Using Delta/DRG Diagrams and Decision Tree Analysis to Select a
Cost-Effective Surgery for Cholecystitis. JSLS. 1997 Apr-Jun.
Vertrees JC , ea. Bundling post-acute care services into MS-DRG payments.Medicare
Medicaid Res Rev. 2013 Aug 2.
Rimler SB , ea. Radiographics. Diagnosis-related Groups and Hospital Inpatient Federal
Reimbursement. 2015 Oct.

new cost table breaks Achilles.... and probably other software

Is there any particular reason why the CDMv5.0 just drops the other cost tables and merges it into one in the same CDM version?

I believe a change like this this should be reflected on a new version of the CDM (5.1?), because now we will have inconsistent instances of the CDM depending on when they got the DDL.

On a side note, the indexes and constraints should probably be updated also.

contribute R code for this repository (era tables generation)

To generate drug_era table - the SQL for this is in "parametized SQL" and can be rendered into dialects using R.

My PR will introduce a folder for code that is flavor-less (R code that takes MS SQL with params) and renders it . I plan to put it in folder R. If you have better idea, please suggest it. ).

Also, the code will do it for version 5.0.1. Or how close are we to official release of 5.1

Inconsistencies between ddl and constraints for PostgreSQL

There are inconsistencies between thee ddl and constrains for PostgreSQL.

In master, in the DDL the cost tables are commented out but there are still constraints applied on these non-existing tables (e.g. visit_costs).

new release for 5.0.1

we have only one release here but there are technicaly 5.0.0 and 5.0.1

Should person be of type bigint instead of int in the standard CDM DDL?

person_id in dbo.person is being declared as integer, but is referenced elsewhere as 'intger'.

EDIT

The CDM ddl is consistent: person_id is used as integer everywhere, but integer has been too small to hold certain person_ids from some vendor's raw source data, and we need to alter our DDL internally to have all person_id's set to 'bigint'. Is this something that should be applied to the standard, or should we update the standard DDL to have person_id be bigint?

CSV or spreadsheet formats of the v4 and v5?

The repository has DDL scripts and PDFs, but no simple CSV or spreadsheet-based formats of the data models. Is this available and if so can this be added to the repository?

v5.2 release question

Hi on this page:

http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm

The new notes and notes_nlp are introduced as being part of v5.2.
Currently, we are on version v5.0.1 and I see ongoing work on v5.1.

On the Readme it says there will be bi- yearly updates (so July 1 is v5.1?). Does this mean the official
release of the notes tables will be January next year? Is there a specific reason why these are not
also added in V5.1? Needed vocab updates..?

Will the CDM team start a v5.2 branch soon?

In the meantime we can create our own ddl for the new tables to start testing some NLP work done at Erasmus but i like to understand the CDM update process a bit better. If I can help anywhere let me know..

Thanks.

Peter

Separate this repository into version specific repos

To keep v 5.0.1 - I suggest we move v 5.0.1 into a separate repository CommonDataMode-v5.0.1 and the lastest released version will be this repository.

Add Package or Box size and Supplier to Drugs

Add Package of Box size and Supplier to Drugs

Requester: Christian Reich
Discussion: in WG

Proposal

Relevant table: DRUG_STRENGTH

In many other countries, drugs are not dispensed in the pharmacies, with small exceptions (usually ointments that have to be created fresh). Instead, all prescription drugs are packaged just like OTC drugs in the US, with the number of content usually lasting for one, two or three months. These products are directly prescribed.

In order to be capture these products, "Clinical Drug" or "Branded Drug" is not sufficient. "Clinical Pack" or "Branded Pack" is also not the same, as those are products combined in fixed numbers, like contraceptives. Therefore, a new RxNorm like Concept Class "Drug Box" is needed. The DRUG_STRENGTH table would require an additional field box_size:

Field	Required	Type	Description
box_size	No	integer	The number of units of Clinical or Branded Drug, or Quantified Clinical or Branded Drug contained in a box as dispensed to the patient.

In addition, Branded Drugs or Packs are provided by a specific supplier. They would be added as new concepts:

Field	Required	Type	Description
supplier	No	string(50)	The name of the supplier (manufacturer or wholeseller) who provides a Branded Drug/Pack to a certain market.

In the CONCEPT_RELATIONSHIP table new relationships would be recorded to link those Drug products to their suppliers with the relationship_id "Has supplier" and "Supplier of".

Use Cases

This information is necessary to infer from European or other non-US prescription data the quantity of product. The ETL process would transfer that information into the quantity field in the drug_exposure table.

V4/4.5 to V5

Before we start programming - has anyone already done this? Any best practices to ensure we generate something portable?

New Non-Clustered Indexes Script Issue

The new indexes created in the latest commit adds references to local databases that prevents it from running without modifications. Also, the index names should be consistent to the naming conventions of other indexes. @clairblacketer would you like me to make this change?

Derived tables SQL (e.g., drug_era tables)

The v4 to v5 parametized SQL includes the code for deriving _era tables.

I would like to volunteer to split this and make that separate item (derived table SQL code).

Who is the github admin person for this repo?

Redshift Version

Are there any plans to publish a ddl and associated files for a Redshift implementation? If not, I can fork the PostgreSQL version. How close is this to the final specification?

Question about latest release

With the box_size change to the drug_strength table, will there be a 5.0.2 release?

Add Time Field to various Occurrence and Exposure Tables

While v5 visit_occurrence table supports both date and time of event, other occurrence tables and exposure tables such as drug, condition, and procedure support only the date level. It is desirable to have the option to include specific time data for such occurrences. Our goal is to allow temporal operations finer than day without disrupting OHDSI by requiring major recoding.

Here is the current use of date and time fields in CDM v5.01:

PERSON

year_of_birth as integer, required
month_of_birth as integer, not required
day_of_birth as integer, not required
time_of_birth as time, not required

SPECIMEN

specimen_date as date, required
specimen_time as time, not required

DEATH

death_date as date, required

VISIT_OCCURRENCE

visit_start_date as date, required
visit_start_time as time, not required
visit_end_date as date, required
visit_end_time as time, not required

PROCEDURE_OCCURRENCE

procedure_date as date, required

DRUG_EXPOSURE

drug_exposure_start_date as date, required
drug_exposure _end_date as date, not required

DEVICE_EXPOSURE

device_exposure_start_date as date, required
device _exposure _end_date as date, not required

CONDITION_OCCURRENCE

condition_start_date as date, required
condition _end_date as date, not required

MEASUREMENT

measurement_date as date, required
measurement _time as time, not required

NOTE

note_date as date, required
note_time as time, not required

OBSERVATION

observation_date as date, required
observation _time as time, not required

And the ERA tables, COHORT tables, and PAYER_PLAN_PERIOD table.

DECISION 1

We have a choice between adding fields of type time, which require significant processing to determine durations (join the date and time, and then operate on that) versus defining a datetime field as timestamp, which allows fast operations but produces redundancy.

[Suggest datetime = timestamp.]

DECISION 2

We propose adding a datetime field or fields to the following tables

Condition_Occurrence
Procedure_Occurrence
Drug_exposure
Device_exposure
Death

We propose replacing the current time fields with datetime fields in the following tables

Specimen
Visit_Occurrence
Measurement
Note
Observation

A datetime field (also known as timestamp field) would be added to each date field. Date fields would not be changed.

This would support handling data from ICU, Emergency Department, infusions, post-procedure care, etc. where multiple events occur on the same day and sequence matters. This granularity would also support the incorporation of data generated from tracking devices.

In addition, current *_TIME fields would be removed. At this point, all current software will continue working with the *_DATE fields, and over time we will develop extensions to the software to accommodate DATETIME in different database management systems.

[Suggest moving forward with fields called *_DATETIME, removing the *_TIME fields. The *_DATE fields will remain required.]

DECISION 3

Are the datetime fields required. Making them required allows developers to begin to use them with a potential migration from date+time or date+timestamp to timesteamp in the future, but it forces CDM builders to enter unknown times or timestamps. If required, the default time will be the first instant in the allowable period. E.g., 1990-12-01-00:00:00.000000 is the correct entry for December, 1990.

[Suggest optional for now.]

DECISION 4

Should a date time (timestamp) field be added to BIRTH (or alternatively should PERSON.time_of_birth be changed to timestamp). This will allow more rapid calculation of age but will be redundant with the current information.

[Suggest add BIRTH_DATETIME. Remove TIME_OF_BIRTH.]

DECISION 5

Should we add a *_TIME_GRAN granularity field for each new timestamp field, which would indicate year, month, day, hour, minute, or second as the timestamp's granularity. When a timestamp is filled in, it should be set to the first valid time in that indicated interval.

[Suggest defer decision.]

Denominator Units Conventions and Quantity Units in DRUG_EXPOSURE

Denominator Units for liquid drugs and quantity field in DRUG_EXPOSURE

Requester: Klaus Bonadt
Discussion: here.

Proposal

Proposal for conventions to achieve consistent entries in the attribute quantity of the DRUG_EXPOSURE table and entries in the DRUG_STRENGTH table.
Eventually, we will be able to calculate the total amount of the active ingredient by

DRUG_EXPOSURE.quantity x DRUG_STRENGTH.numerator_value [numerator_unit_concept_id] (for concentrations) or
DRUG_EXPOSURE.quantity x DRUG_STRENGTH.amount_value [amount_unit_concept_id] (for pieces, e.g. tablets)
For drugs with a dose release over time (e.g. Patches) we will be able to calculate the hourly rate by
DRUG_STRENGTH.numerator_value [numerator_unit_concept_id]

Conventions for the DRUG_EXPOSURE table:

Roughly speaking, the value for quantity represents the quantity of the drug in the denominator unit of the corresponding entry (or entries) in the DRUG_STRENGTH table. More precisely,
- For drugs dispensed in “pieces” (e.g. tablets or puffs of an inhaler) the quantity refers to this number of pieces. (Thus, quantity has no unit and the denominator unit in the DRUG_STRENGTH table is either empty or {actuat}).
- For drugs with a concept_id denoting a quantified drug the quantity refers to a fraction/multiple of the whole pack. (The quantity has no unit, the corresponding entry/entries in the DRUG_STRENGTH table refer to the concentration with the total amount in the denominator).
- For all other drugs quantity refers to the total amount of the product measured in mL or g. (Thus, the unit of the quantity is either g or mL – both units can be used interchangeably.
For products with a dosage release rate (for example patches), the quantity refers to the number of pieces. (The denominator unit of the corresponding entry in the DRUG_STRENGTH table is hour).
Compounding: For each compound one entry will be added to the DRUG_EXPOSURE table. This enables us all the above discussed options to provide quantity and strength. Even a compounding of multi-ingredient products would be possible.

Conventions for the DRUG_STRENGTH table:

Probably, nothing has to be changed. But we should cross-check the following assumed properties of the DRUG_STRENGTH table coming from RxNorm and apply these for future enhancements:

For drugs dispensed in “pieces”, all corresponding entries must have either an empty denominator unit or {actuat} in the denominator unit.
For drugs with a dose release over time, we use hour in the denominator.
For all other drugs, the denominator unit refers to mass or volumes, measured in mL or mg.

Changes in the DRUG_EXPOSURE table:

With the above mentioned conventions the fields effective_drug_dose and dose_unit_concept_id can be removed. They don’t support multiple ingredients anyway.

Change/To do in the DRUG_STRENGTH table:

In order to support quantified drugs we need an additional field denominator_value.
For compounding we would need to add records to DRUG_STRENGTH for all compounds (drug_concept_id=ingredient_concept_id) with the mg concept in the denominator_unit_concept_id for all "dry" compounds and mL for the "wet" ones. We could steal the knowledge what is what from RxNorm.

Options:

We could agree on using mg instead of g for the quantity in case 1c. However, this would require additional conventions and steps for the ETL process: We would use mL for liquids and products with different denominator units (for example 42799258 "Benzyl Alcohol 0.1 ML/ML / Pramoxine hydrochloride 0.01 MG/MG Topical Gel"). However, we would use mg for products which exclusively use a mass unit. Thus, we would need to check all corresponding entries in the DRUG_STRENGTH table beforehand in order to know whether we can apply mg or mL. I don't recommend this.
We could use an additional field quantity_unit_concept_id to denote the unit. However, this would also require additional sets of conventions to avoid incompatible entries between quantity_unit_concept_id and denominator_unit_concept_id. I don't recommend this.
In the DRUG_STRENGTH table we could allow additional units for masses (besides mg), or volumes (besides mL) or time (besides h). However, this would add not only additional complexity for the conversions but also for the checks to decide whether a DRUG_STRENGTH entry belongs to a dose release over time, to a mass or a volume unit. I don't recommend this.
We might want to distinguish between drugs used in compounding and normal drugs dispensed on the same day. In the DRUG_EXPOSURE table we could use a compounding_sequence (holding a sequence number for each compounding on the same day for the same patient) or a compound_sequence (holding a sequence number for each compound. The sequence will restart from 1 for all subsequent compounding). I do recommend this, preferably the latter one.

Use Cases

1	Solid preparations, preparations where dose cannot be split
	Example: 20 tablets of 19020053 "Acetaminophen 500 MG Oral Tablet"
DRUG_STRENGTH	The denominator_unit is empty. Amount_value and amount_unit_concept_id contain the effective dose. Note that in non-US databases the quantity could be retrieved from the box_size field, however, the ETL should copy that to the DRUG_EXPOSURE.quantity field.
DRUG_EXPOSURE	Quantity refers to number of pieces, e.g. 20 tablets
`Total dose=`	`quantity x amount_value [amount_unit_concept_id]`
	`Acetaminophen dose = 20 x 500 mg = 10,000 mg`

2	Puffs of an inhaler
	Example: 150 puffs of 21174430 "Nicotine 0.5 MG/ACTUAT [Nicorette]". Note: There is no difference to use case 1 besides that the DRUG_STRENGTH table has {actuat} in the denominator_unit. In this case the strength is provided in the numerator.
DRUG_STRENGTH	The denominator_unit is {actuat}. The numerator value and numerator unit contain the dose per actuation (puff)
DRUG_EXPOSURE	Quantity refers to the number puffs, e.g. 150
`Total dose=`	`quantity x numerator_value [numerator_unit_concept_id]`
	`Nicotin dose = 150 x 0.5 mg = 75 mg`

3	Quantified Drugs
	Example: 43275770 "60 ML Acetaminophen 50 MG/ML Oral Solution", meaning, there are 60 mL of Acetaminophe (Paracetamol) of a concentration of 50 mg/mL. Two such bottles are prescribed.
DRUG_STRENGTH	The concentration is given in the numerator/denominator fields, with the denominator unit either mg or mL, and the denominator value typically not 1. E.g. 3000 mg in 60 mL.
DRUG_EXPOSURE	Quantity refers to a fraction or multiple of the container. E.g. 2
`Total dose=`	`quantity x numerator_value [numerator_unit_concept_id]`
	`* Acetaminophen dose = 2 x 3000 mg = 6000 mg*`

4	Liquid drugs, or drugs where the amount can be split randomly. The total amount in given in quantity
	Example: 42799258 "Benzyl Alcohol 0.1 ML/ML / Pramoxine hydrochloride 0.01 MG/MG Topical Gel" dispensed in a 1.25 oz pack.
DRUG_STRENGTH	The concentration is given in the numerator/denominator fields, with the denominator unit either mg or mL, and the denominator value as null. E.g. Benzyl Alcohol in mL and Pramoxine hydrochloride in mg
DRUG_EXPOSURE	Quantity refers to the unit provided in DRUG_STRENGTH. Since both mL and mg are found mg is preferred. E.g. 1.25 x 30 (conversion factor oz -> mL) x 1000 (conversion mL -> mg) = 37,000
`Total dose=`	`quantity x numerator_value [numerator_unit_concept_id]`
	`Benzyl Alcohol dose = 37,000 x 0.1 mL x 0.001 = 3.7 mL`
	`Pramoxine hydrochloride dose = 37,000 x 0.01 mg = 370 mg`
	Note: At analysis time, the denominator should be checked in the DRUG_STRENGTH table. As mg is used for the second ingredient the factor 1000 should be applied to convert between g and mg.

5	Compounded drugs
	Example: 30 mL Ibuprofen 20% / Piroxicam 1% Cream.
DRUG_STRENGTH	Two separate entries for the ingredients of Ibuprofen and Piroxicam are required, with an amount value of null and a amount unit of mg.
DRUG_EXPOSURE	Quantity refers to the total weight amount of the compounded ingredient. Use one record in the DRUG_EXPOSURE table for each compound. Eg.g 20% Ibuprofen in 30 mL = 6 mL, 1% Piroxicam in 30 mL = 0.3 mL
`Total dose=`	Depends on the drugs involved: One of the use cases above.
	`Ibuprofen dose = 6 x 1000 = 6000 mg`
	`Piroxicam dose = 0.3 x 1000 = 300 mg`
	Note: At analysis time, the denominator should be checked for both ingredients in the DRUG_STRENGTH table. If it is not mg a conversion factor of 1000 should be applied to convert between mL/g and mg.

6	Drugs with delayed release over time, e.g. Patches
	Example: 1518199 "Ethinyl Estradiol 0.000833 MG/HR / norelgestromin 0.00625 MG/HR Weekly Transdermal Patch
DRUG_STRENGTH	The release is provided as numerator/denominator pair, with the denominator value as null and the denominator unit hour. E.g. Ethinyl Estradiol 0.000833 mg/h / norelgestromin 0.00625 mg/h
DRUG_EXPOSURE	Quantity refers to the number of pieces. E.g. 1 patch.
`Release rate=`	`numerator_value [numerator_unit_concept_id]`
	`Ethinyl Estradiol rate = 0.000833 mg/h`
	`norelgestromin rate 0.00625 mg/h`
	Note: This can be converted to a daily dosage by multiplying it with 24. (Assuming 1 patch at a time for at least 24 hours)

Please add comments to tables and columns

When I look at v4.5 DDL I noticed there are comments on tables and columns. Were these dropped in V5 on purpose?
Ref:
https://raw.githubusercontent.com/OHDSI/CommonDataModel/master/Version4/Oracle/CDM%20V4%20DDL.sql

It would be nice to have
a) the comments on each column added back and
b) include comments if new columns are added

Thanks,
Venkat

Inefficient SQL Code in Conversion Script

I've noticed that the conversion script includes parts that potentially (and for me did) can take forever to run and not finish. I'm using the PostgreSQL script. The problem is in the sections in which warnings are added to etl_warnings:

INSERT INTO [TARGET_CDMV5].[SCHEMA].ETL_WARNINGS (WARNING_MESSAGE)
SELECT 'PROCEDURE_OCCURRENCE: ' || CAST(NUM_INVALID_RECORDS AS VARCHAR) || ' records in the source CDMv4 database have invalid PROCOEDURE_CONCEPT_ID'
FROM (
    SELECT COUNT(PERSON_ID) AS NUM_INVALID_RECORDS
    FROM [SOURCE_CDMV4].[SCHEMA].PROCEDURE_OCCURRENCE
    WHERE PROCEDURE_CONCEPT_ID NOT IN (
            SELECT CONCEPT_ID
            FROM [TARGET_CDMV5].[SCHEMA].CONCEPT
            WHERE CONCEPT_ID = 0
                OR STANDARD_CONCEPT = 'S'
            )
    HAVING COUNT(PERSON_ID) > 0
    ) warn;

Because of the "NOT IN" part, this code checks every entry in the cdm4 procedure_occurrence to see if its procedure_concept_id is in the list of all concepts in cdm5. Assuming 100m entries in procedure_occurrence and 1m concepts, we're asking the database to do 100 trillion things. Has anyone run the PostgreSQL script and run into this issue?

Instead we should be calling distinct on the procedure_concept_id in procedure_occurrence because there may only be about 10k unique procedure_concept_ids, then calling the NOT IN on the 1m concepts, generating a much smaller list of concepts, then returning to compare each individual entry of procedure_occurrence. See below:

INSERT INTO [TARGET_CDMV5].[SCHEMA].ETL_WARNINGS (WARNING_MESSAGE)
SELECT 'PROCEDURE_OCCURRENCE: ' || CAST(NUM_INVALID_RECORDS AS VARCHAR) || ' records in the source CDMv4 database have invalid PROCOEDURE_CONCEPT_ID'
FROM (
    SELECT COUNT(PERSON_ID) AS NUM_INVALID_RECORDS
    FROM [SOURCE_CDMV4].[SCHEMA].PROCEDURE_OCCURRENCE
    WHERE PROCEDURE_CONCEPT_ID IN (
      SELECT PROCEDURE_CONCEPT_ID
      FROM (
        SELECT DISTINCT(PROCEDURE_CONCEPT_ID)
        FROM [SOURCE_CDMV4].[SCHEMA].PROCEDURE_OCCURRENCE) ids
        WHERE ids.PROCEDURE_CONCEPT_ID NOT IN (
          SELECT CONCEPT_ID
            FROM [TARGET_CDMV5].[SCHEMA].CONCEPT
              WHERE CONCEPT_ID = 0
                  OR STANDARD_CONCEPT = 'S'
            )
        )
    HAVING COUNT(PERSON_ID) > 0
    ) warn;

This second script I was able to run overnight and finish (compared to stalling for days). There are multiple sections of the script that add to etl_warnings. Maybe they should be changed?

End of a Drug Exposure

End of a drug exposure

Requester: Klaus Bonadt, Christian Reich, Patrick Ryan
Discussion: here

Proposal

Relevant table: DRUG_EXPOSURE

Field	Required	Type	Description
drug_exposure_end_date	Yes	date	The end date for the current instance of Drug utilization. Depending on different sources, it could be a known or an inferred date and denotes the last day at which the patient was still exposed to drug.
days_supply	No	integer	The number of days of supply of the medication as prescribed. This reflects the intention of the provider for the length of exposure.
verbatim_end_date	No	date	The known end date of a drug exposure as provided by the source.

The verbatim_end_dat field is only necessary as the drug_exposure_end_date becomes mandatory, and might be (wrongly) inferred. In cases where no good data are available to fill that end_date, the ETL will have to take a guess, and that may not satisfy some algorithms for dose calculation.

Conventions

The drug_exposure_end_date denotes the day the drug exposure ended for the patient. This could be that the duration of drug_supply was reached (in which case drug_exposure_end_date = drug_exposure_start_date + days_supply -1), or because the exposure was stopped (medication changed, medication discontinued, etc.)

Use Cases

All standardized analytics and the drug era constructor will only use the drug_exposure_end_date. The days supply could be used to understand the intent of prescribing versus the reality, and it could be used to calculate MPR.

Can you please provide a .docx and .html version of the CDMv5.pdf file

PDF is great for printing but not for other purposes

Represent granular encounters or microvisits in a new VISIT_DETAIL table

Proposal for Visit_detail: Represent granular encounters or microvisits

Requester: Gowtham Rao
Discussions: forum post 1, forum post 2

Proposal

Current VISIT_OCCURRENCE table does not meet use-cases that need the granular information, such as movement of person while in a hospital.
The intent of this proposal is to capture detail information about a record in visit_occurrence. Examples of detail information may be encounters, micro-visits etc., and will be collected as is from the source data.
We propose a new VISIT_DETAIL table with a structure that is similar to current VISIT_OCCURRENCE table. For every record in visit_occurrence there maybe 0 or more records in visit_detail.
Records in visit_detail will be related to each other sequentially or hierarchically, AND will be related to visit_occurrence table (using chaining/sequential method or parent-child/part-of).
All information will belong to the domain visit.
Example: an entire inpatient stay maybe one record in visit_occurrence table. This may have one or more detail information such as ER, ICU, medical floor, rehabilitation floor etc. Each of these visit_details may have different start/end date-times, different concept_id's and fact_id's - that would be separate record in visit_detail with a FK link to visit_occurrence. Each record within visit_detail maybe related to each other, sequentially –> ER leading to ICU leading to medical floor, leading to rehabilitation, or in hierarchical parent-child visit –> a visit for dialysis while in ICU.

New VISIT_DETAIL table

This table will have the same structure as current VISIT_OCCURRENCE table, except for two changes:

Two new foreign keys pointing to itself (visit_detail_parent_id) and to visit_occurrence table (visit_occurrence_id)
Removal of _date fields.

Field	Required	Type	Description
visit_detail_id	Yes	integer	A unique identifier for each Person's visit-detail at a healthcare provider.
person_id	Yes	integer	A foreign key identifier to the Person for whom the visit is recorded. The demographic details of that Person are stored in the PERSON table.
visit_detail_concept_id	Yes	integer	A foreign key that refers to a visit Concept identifier in the Standardized Vocabularies.
visit_start_date	Yes	date	The start date of the visit.
visit_start_datetime	Yes	datetime	The date and time of the visit-detail started.
visit_end_date	Yes	date	The end date of the visit.
visit_end_datetime	No	datetime	The date and time of the visit end.
visit_type_concept_id	Yes	integer	A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of source data from which the visit record is derived.
provider_id	No	integer	A foreign key to the provider in the provider table who was associated with the visit.
care_site_id	No	integer	A foreign key to the care site in the care site table where visit occurred
admitting_source_concept_id	No	integer	A foreign key to the predefined concept in the Place of Service Vocabulary reflecting the admitting source for a visit.
discharge_to_concept_id	No	integer	A foreign key to the predefined concept in the Place of Service Vocabulary reflecting the discharge disposition (destination) for a visit.
preceding_visit_detail_id	No	integer	A foreign key to the visit_occurrence table record of the visit immediately preceding this visit.
visit_source_value	No	string	The source code for the visit as it appears in the source data.
visit_source_concept_id	No	Integer	A foreign key to a Concept that refers to the code used in the source.
admitting_source_value	No	string	The source code for the admitting source as it appears in the source data.
discharge_to_source_value	No	string	The source code for the discharge disposition as it appears in the source data.
visit_detail_parent_id	No	integer	A foreign key to the visit_detail table record to represent the immediate parent visit-detail record.
visit_occurrence_id	Yes	integer	A foreign key that refers to the record in the visit_occurrence table

Relationship between VISIT_DETAIL and clinical events tables
We will add a new optional foreign key pointing from a clinical-event table to visit_detail table.

PROCEDURE_OCCURRENCE

Field	Required	Type	Description
procedure_occurrence_id	Yes	integer	A system-generated unique identifier for each Procedure Occurrence.
person_id	Yes	integer	A foreign key identifier to the Person who is subjected to the Procedure. The demographic details of that Person are stored in the PERSON table.
procedure_concept_id	Yes	integer	A foreign key that refers to a standard procedure Concept identifier in the Standardized Vocabularies.
procedure_date	Yes	date	The date on which the Procedure was performed.
procedure_datetime	No	datetime	The date and time on which the Procedure was performed.
procedure_type_concept_id	Yes	integer	A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of source data from which the procedure record is derived.
modifier_concept_id	No	integer	A foreign key to a Standard Concept identifier for a modifier to the Procedure (e.g. bilateral)
quantity	No	integer	The quantity of procedures ordered or administered.
provider_id	No	integer	A foreign key to the provider in the provider table who was responsible for carrying out the procedure.
visit_occurrence_id	No	integer	A foreign key to the visit in the visit table during which the Procedure was carried out.
visit_detail_id	No	integer	A foreign key to the visit in the visit-detail table during which the Procedure was carried out.
procedure_source_value	No	varchar(50)	The source code for the Procedure as it appears in the source data. This code is mapped to a standard procedure Concept in the Standardized Vocabularies and the original code is, stored here for reference. Procedure source codes are typically ICD-9-Proc, CPT-4, HCPCS or OPCS-4 codes.
procedure_source_concept_id	No	integer	A foreign key to a Procedure Concept that refers to the code used in the source.
qualifier_source_value	No	varchar(50)	The source code for the qualifier as it appears in the source data.

DRUG_EXPOSURE

Field	Required	Type	Description
drug_exposure_id	Yes	integer	A system-generated unique identifier for each Drug utilization event.
person_id	Yes	integer	A foreign key identifier to the person who is subjected to the Drug. The demographic details of that person are stored in the person table.
drug_concept_id	Yes	integer	A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies for the Drug concept.
drug_exposure_start_date	Yes	date	The start date for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.
drug_exposure_start_datetime	No	datetime	The start date and time for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.
drug_exposure_end_date	No	date	The end date for the current instance of Drug utilization. It is not available from all sources.
drug_exposure_end_datetime	No	datetime	The end date and time for the current instance of Drug utilization. It is not available from all sources.
drug_type_concept_id	Yes	integer	A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Drug Exposure recorded. It indicates how the Drug Exposure was represented in the source data.
stop_reason	No	varchar(20)	The reason the Drug was stopped. Reasons include regimen completed, changed, removed, etc.
refills	No	integer	The number of refills after the initial prescription. The initial prescription is not counted, values start with 0.
quantity	No	float	The quantity of drug as recorded in the original prescription or dispensing record.
days_supply	No	integer	The number of days of supply of the medication as recorded in the original prescription or dispensing record.
sig	No	clob	The directions ("signetur") on the Drug prescription as recorded in the original prescription (and printed on the container) or dispensing record.
route_concept_id	No	integer	A foreign key to a predefined concept in the Standardized Vocabularies reflecting the route of administration.
effective_drug_dose	No	float	Numerical value of Drug dose for this Drug Exposure record.
dose_unit_concept_ id	No	integer	A foreign key to a predefined concept in the Standardized Vocabularies reflecting the unit the effective_drug_dose value is expressed.
lot_number	No	varchar(50)	An identifier assigned to a particular quantity or lot of Drug product from the manufacturer.
provider_id	No	integer	A foreign key to the provider in the provider table who initiated (prescribed or administered) the Drug Exposure.
visit_occurrence_id	No	integer	A foreign key to the visit in the visit table during which the Drug Exposure was initiated.
visit_detail_id	No	integer	A foreign key to the visit in the visit-detail table during which the Drug Exposure was initiated.
drug_source_value	No	varchar(50)	The source code for the Drug as it appears in the source data. This code is mapped to a Standard Drug concept in the Standardized Vocabularies and the original code is, stored here for reference.
drug_source_concept_id	No	integer	A foreign key to a Drug Concept that refers to the code used in the source.
route_source_value	No	varchar(50)	The information about the route of administration as detailed in the source.
dose_unit_source_value	No	varchar(50)	The information about the dose unit as detailed in the source.

DEVICE_EXPOSURE

Field	Required	Type	Description
device_exposure_id	Yes	integer	A system-generated unique identifier for each Device Exposure.
person_id	Yes	integer	A foreign key identifier to the Person who is subjected to the Device. The demographic details of that person are stored in the Person table.
device_concept_id	Yes	integer	A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies for the Device concept.
device_exposure_start_date	Yes	date	The date the Device or supply was applied or used.
device_exposure_start_datetime	No	datetime	The date and time the Device or supply was applied or used.
device_exposure_end_date	No	date	The date the Device or supply was removed from use.
device_exposure_end_datetime	No	datetime	The date and time the Device or supply was removed from use.
device_type_concept_id	Yes	integer	A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Device Exposure recorded. It indicates how the Device Exposure was represented in the source data.
unique_device_id	No	varchar(50)	A UDI or equivalent identifying the instance of the Device used in the Person.
quantity	No	integer	The number of individual Devices used for the exposure.
provider_id	No	integer	A foreign key to the provider in the PROVIDER table who initiated of administered the Device.
visit_occurrence_id	No	integer	A foreign key to the visit in the VISIT table during which the device was used.
visit_detail_id	No	integer	A foreign key to the visit in the VISIT_DETAIL table during which the device was used.
device_source_value	No	varchar(50)	The source code for the Device as it appears in the source data. This code is mapped to a standard Device Concept in the Standardized Vocabularies and the original code is stored here for reference.
device_source_ concept_id	No	integer	A foreign key to a Device Concept that refers to the code used in the source.

CONDITION_OCCURRENCE

Field	Required	Type	Description
condition_occurrence_id	Yes	integer	A unique identifier for each Condition Occurrence event.
person_id	Yes	integer	A foreign key identifier to the Person who is experiencing the condition. The demographic details of that Person are stored in the PERSON table.
condition_concept_id	Yes	integer	A foreign key that refers to a Standard Condition Concept identifier in the Standardized Vocabularies.
condition_start_date	Yes	date	The date when the instance of the Condition is recorded.
condition_start_datetime	No	datetime	The date and time when the instance of the Condition is recorded.
condition_end_date	No	date	The date when the instance of the Condition is considered to have ended.
condition_end_datetime	No	date	The date when the instance of the Condition is considered to have ended.
condition_type_concept_id	Yes	integer	A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the source data from which the condition was recorded, the level of standardization, and the type of occurrence.
stop_reason	No	varchar(20)	The reason that the condition was no longer present, as indicated in the source data.
provider_id	No	integer	A foreign key to the Provider in the PROVIDER table who was responsible for capturing (diagnosing) the Condition.
visit_occurrence_id	No	integer	A foreign key to the visit in the VISIT table during which the Condition was determined (diagnosed).
visit_detail_id	No	integer	A foreign key to the visit in the VISIT_DETAIL table during which the Condition was determined (diagnosed).
condition_status_concept_id	No	integer	A foreign key to the predefined concept in the standard vocabulary reflecting the condition status.
condition_source_concept_id	No	integer	A foreign key to a Condition Concept that refers to the code used in the source.
condition_source_value	No	varchar(50)	The source code for the condition as it appears in the source data. This code is mapped to a standard condition concept in the Standardized Vocabularies and the original code is stored here for reference.
condition_status_source_value	No	varchar(50)

MEASUREMENT

Field	Required	Type	Description
measurement_id	Yes	integer	A unique identifier for each Measurement.
person_id	Yes	integer	A foreign key identifier to the Person about whom the measurement was recorded. The demographic details of that Person are stored in the PERSON table.
measurement_concept_id	Yes	integer	A foreign key to the standard measurement concept identifier in the Standardized Vocabularies.
measurement_date	Yes	date	The date of the Measurement.
measurement_datetime	No	datetime	The date and time of the Measurement. (Some database systems don't have a datatype of time. To accomodate all temporal analyses, datatype datetime can be used (combining measurement_date and measurement_time)[[http://forums.ohdsi.org/t/date-time-and-datetime-problem-and-the-world-of-hours-and-1day/314
measurement_type_concept_id	Yes	integer	A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the provenance from where the Measurement record was recorded.
operator_concept_id	No	integer	A foreign key identifier to the predefined Concept in the Standardized Vocabularies reflecting the mathematical operator that is applied to the value_as_number. Operators are <, ≤, =, ≥, >.
value_as_number	No	float	A Measurement result where the result is expressed as a numeric value.
value_as_concept_id	No	integer	A foreign key to a Measurement result represented as a Concept from the Standardized Vocabularies (e.g., positive/negative, present/absent, low/high, etc.).
unit_concept_id	No	integer	A foreign key to a Standard Concept ID of Measurement Units in the Standardized Vocabularies.
range_low	No	float	The lower limit of the normal range of the Measurement result. The lower range is assumed to be of the same unit of measure as the Measurement value.
range_high	No	float	The upper limit of the normal range of the Measurement. The upper range is assumed to be of the same unit of measure as the Measurement value.
provider_id	No	integer	A foreign key to the provider in the PROVIDER table who was responsible for initiating or obtaining the measurement.
visit_occurrence_id	No	integer	A foreign key to the Visit in the VISIT_OCCURRENCE table during which the Measurement was recorded.
visit_detail_id	No	integer	A foreign key to the Visit in the VISIT_DETAIL table during which the Measurement was recorded.
measurement_source_value	No	varchar(50)	The Measurement name as it appears in the source data. This code is mapped to a Standard Concept in the Standardized Vocabularies and the original code is stored here for reference.
measurement_source_concept_id	No	integer	A foreign key to a Concept in the Standard Vocabularies that refers to the code used in the source.
unit_source_value	No	varchar(50)	The source code for the unit as it appears in the source data. This code is mapped to a standard unit concept in the Standardized Vocabularies and the original code is stored here for reference.
value_source_value	No	varchar(50)	The source value associated with the content of the value_as_number or value_as_concept_id as stored in the source data.

NOTE

Field	Required	Type	Description
note_id	Yes	integer	A unique identifier for each note.
person_id	Yes	integer	A foreign key identifier to the Person about whom the Note was recorded. The demographic details of that Person are stored in the PERSON table.
note_date	Yes	date	The date the note was recorded.
note_datetime	No	datetime	The date and time the note was recorded.
note_type_concept_id	Yes	integer	A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the type, origin or provenance of the Note.
note_class_concept_id	Yes	integer	A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the HL7 LOINC Document Type Vocabulary classification of the note.
note_title	No	string(250)	The title of the Note as it appears in the source.
note_text	Yes	RBDMS dependent text	The content of the Note.
encoding_concept_id	Yes	integer	A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the note character encoding type.
language_concept_id	Yes	integer	A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the language of the note.
provider_id	No	integer	A foreign key to the Provider in the PROVIDER table who took the Note.
visit_occurrence_id	No	integer	Foreign key to the Visit in the VISIT_OCCURRENCE table when the Note was taken.
visit_detail_id	No	integer	Foreign key to the Visit in the VISIT_DETAIL table when the Note was taken.

OBSERVATION

Field	Required	Type	Description
observation_id	Yes	integer	A unique identifier for each observation.
person_id	Yes	integer	A foreign key identifier to the Person about whom the observation was recorded. The demographic details of that Person are stored in the PERSON table.
observation_concept_id	Yes	integer	A foreign key to the standard observation concept identifier in the Standardized Vocabularies.
observation_date	Yes	date	The date of the observation.
observation_datetime	No	datetime	The date and time of the observation.
observation_type_concept_id	Yes	integer	A foreign key to the predefined concept identifier in the Standardized Vocabularies reflecting the type of the observation.
value_as_number	No	float	The observation result stored as a number. This is applicable to observations where the result is expressed as a numeric value.
value_as_string	No	varchar(60)	The observation result stored as a string. This is applicable to observations where the result is expressed as verbatim text.
value_as_concept_id	No	Integer	A foreign key to an observation result stored as a Concept ID. This is applicable to observations where the result can be expressed as a Standard Concept from the Standardized Vocabularies (e.g., positive/negative, present/absent, low/high, etc.).
qualifier_concept_id	No	integer	A foreign key to a Standard Concept ID for a qualifier (e.g., severity of drug-drug interaction alert)
unit_concept_id	No	integer	A foreign key to a Standard Concept ID of measurement units in the Standardized Vocabularies.
provider_id	No	integer	A foreign key to the provider in the PROVIDER table who was responsible for making the observation.
visit_occurrence_id	No	integer	A foreign key to the visit in the VISIT_OCCURRENCE table during which the observation was recorded.
visit_detail_id	No	integer	A foreign key to the visit in the VISIT_DETAIL table during which the observation was recorded.
observation_source_value	No	varchar(50)	The observation code as it appears in the source data. This code is mapped to a Standard Concept in the Standardized Vocabularies and the original code is, stored here for reference.
observation_source_concept_id	No	integer	A foreign key to a Concept that refers to the code used in the source.
unit_source_value	No	varchar(50)	The source code for the unit as it appears in the source data. This code is mapped to a standard unit concept in the Standardized Vocabularies and the original code is, stored here for reference.
qualifier_source_value	No	varchar(50)	The source value associated with a qualifier to characterize the observation

Conventions

Representation of US claim data

US claims data generally has two-levels

Header/summary data that summarizes the entire claim
Line/detail that details a claim.

Detail is thus a child of the summary, and for every record in summary there is one or more records in detail. i.e. there will be atleast one FK link from visit_detail to visit_occurrence.

Use Cases

What happened to DRGs?

In CDM v4 I used to record DRG codes in the disease_class_concept_id field in the procedure_cost table, but those fields are gone in v5. DRG codes do not map to any standard concepts in the vocab either. Does that mean we do not record DRGs in the CDM anymore?

V4-V5 conversion script - "Type Concept" concepts not converted

The solution is to change the case of "Type Concept" to "type concept" for all occurrences in all DBMS versions of the conversion SQL script.

Here is one example of the issue in the code:

AND LOWER(DOMAIN_ID) IN ('death type', 'Type Concept');

v5.1 vs master

for PR that fixes a bug, what branch should we target?

Time storage in CDM v5

I'm ETLing data from a MSSQL data warehouse into CDM v5 on Postgres. When I cast the time portion of a timestamp into a TIME, I'm getting strings that are greater than the varchar(10) limit on several CDM v5 tables. The max string length for time appears to be 11 characters.

CAST('11/13/2013 12:00:00 AM' AS TIME) => '12:00:00 AM' (11 characters)

This caused errors when attempting to load time data.

If we're using the ANSI SQL time formats, as specified in the documentation, the max length of the time fields should be altered to account for the following specified ANSI format: "HH:MM.SS.MMMM", which assumes that we're all using 24 hour time.

Perhaps the time fields should use the "time" data type rather than strings?

Remove cost event tables from CDM constraints in v5.1

In the CDM constraints sql file on v5.1 branch:

https://github.com/OHDSI/CommonDataModel/blob/v5.1/Sql%20Server/OMOP%20CDM%20constraints%20-%20SQL%20Server.sql

there are still references for the cost event tables. These should be commented out with the same verbage that is in the DDL

/* The individual cost tables are being phased out and will disappear */

These should then be completely removed in version 6.0

5.1 release

CDM documentation on wiki indicates v5.1

there is branch 5.1.

What steps need to be satisfied for a release 5.1 to occurr?

(e.g., testing on db platforms like Achilles? - which requires significant expertise) (which I am not perhaps advocating here for) .

Looks like last release was done by @schuemie

Is @cgreich the new admin?

(I am happy to volunteer help with those steps)

Issues with FK Constraints

Currently the WebAPI does not write records into the CDM cohort_definition table. Because of this, scripts that run during cohort generation are failing because inserting records into the cohort table with a cohort_definition_id that doesn't exist in the cohort_definition table is violating the referential integrity constraints.

How to contribute partial ETL code?

I would like to contribute to community our code that starts with
condition_occurrence table that has data in ICD9 in *_source_value and no entries in *_concept_id and *_source_concept_id

That code uses the vocab knowledge and populates those columns based on the data above.

Is this wanted and if yes, can you give me guidance on where to place such code (claire)

Add references to markdown urls in R pdf code

The current R code to create the CDM pdf file references a local copy of the git repository. It would be nice if it could reference the urls for the markdown files instead. This would allow anyone to run it regardless if they have a local copy or not.

Missing foreign key constraints

The following foreign key constraints are not in the DDL files:

concept_synonym.language_concept_id
source_to_concept_map.source_concept_id
cohort_definition.subject_concept_id
attribute_definition.attribute_type_concept_id

Replace Date fields with DateTime fields

Here is the current use of date and time fields in CDM v5.01:

PERSON

year_of_birth as integer, required
month_of_birth as integer, not required
day_of_birth as integer, not required
time_of_birth as time, not required

SPECIMEN

specimen_date as date, required
specimen_time as time, not required

DEATH

death_date as date, required

VISIT_OCCURRENCE

visit_start_date as date, required
visit_start_time as time, not required
visit_end_date as date, required
visit_end_time as time, not required

PROCEDURE_OCCURRENCE

procedure_date as date, required

DRUG_EXPOSURE

drug_exposure_start_date as date, required
drug_exposure _end_date as date, not required

DEVICE_EXPOSURE

device_exposure_start_date as date, required
device _exposure _end_date as date, not required

CONDITION_OCCURRENCE

condition_start_date as date, required
condition _end_date as date, not required

MEASUREMENT

measurement_date as date, required
measurement _time as time, not required

NOTE

note_date as date, required
note_time as time, not required

OBSERVATION

observation_date as date, required
observation _time as time, not required

And the ERA tables, COHORT tables, and PAYER_PLAN_PERIOD table.

DECISION 1

[Suggest datetime = timestamp.]

DECISION 2

We propose adding a datetime field or fields to the following tables

Condition_Occurrence
Procedure_Occurrence
Drug_exposure
Device_exposure
Death

We propose replacing the current time fields with datetime fields in the following tables

Specimen
Visit_Occurrence
Measurement
Note
Observation

A datetime field (also known as timestamp field) would be added to each date field. Date fields would not be changed.

[Suggest moving forward with fields called *_DATETIME, removing the *_TIME fields. The *_DATE fields will remain required.]

DECISION 3

[Suggest optional for now.]

DECISION 4

[Suggest add BIRTH_DATETIME. Remove TIME_OF_BIRTH.]

DECISION 5

[Suggest defer decision.]

Remove datetime from OBSERVATION_PERIOD

Remove datetime from OBSERVATION_PERIOD as it is not necessary in that table.

V4 -> V5 Conversion Script Error

The conversion script fails at the very end on #classification_map table creation with:

The ORDER BY clause is invalid in views, CREATE TABLE AS SELECT, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

Remove the ORDER BY clause and this will fix the problem.

Vocabulary data type Mismatch between CDM_SOURCE and VOCABULARY table

In reviewing and trying to load "fake data" into the OMOP CDM v5 Data Model it has come to my attention that vocabulary_version column entries in two tables have differing lengths, which has a potential to cause loading issues and it is my recommendation to update this to be consistent between tables.

VOCABULARY.VOCABULARY_VERSION VARCHAR(255)
CDM_SOURCE.VOCABULARY_VERSION VARCHAR(20)

it is my understanding the Vocabulary table version is for ANY vocabulary that may be added to the OHDSI Vocabulary and having that large size of 255 makes sense, to accommodate multiple naming conventions.

There is the CDM Source table version is meant to capture the OHDSI Vocabulary used, which is my current understanding, which is based off the old VOCABULARY table from OMOP Vocabulary V4.4 data model.

As per the last April 11 2014 load, the Vocabulary Name column contained data that surpassed the 20 character column length, for a total of 32 characters 'IMEDS Vocabulary v4.4 2014-04-11'

Please do let me know if there is any mistake from my perspective in this analysis.

Drug Quantity for Solid and Liquid Preparations

Drug Quantity

Requester: Christian Reich, Klaus Bonadt
Discussion: No forum posts

Proposal

Relevant table: DRUG_EXPOSURE

We propose to abolish the effective_drug_dose field from the DRUG_EXPOSURE table entirely. The content can be handled by the quantity table.

DRUG_EXPOSURE

Field	Required	Type	Description
drug_exposure_id	Yes	integer	A system-generated unique identifier for each Drug utilization event.
person_id	Yes	integer	A foreign key identifier to the person who is subjected to the Drug. The demographic details of that person are stored in the person table.
drug_concept_id	Yes	integer	A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies for the Drug concept.
drug_exposure_start_date	Yes	date	The start date for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.
drug_exposure_start_datetime	No	datetime	The start date and time for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.
drug_exposure_end_date	No	date	The end date for the current instance of Drug utilization. It is not available from all sources.
drug_exposure_end_datetime	No	datetime	The end date and time for the current instance of Drug utilization. It is not available from all sources.
drug_type_concept_id	Yes	integer	A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Drug Exposure recorded. It indicates how the Drug Exposure was represented in the source data.
stop_reason	No	varchar(20)	The reason the Drug was stopped. Reasons include regimen completed, changed, removed, etc.
refills	No	integer	The number of refills after the initial prescription. The initial prescription is not counted, values start with 0.
quantity	No	float	The quantity of drug as recorded in the original prescription or dispensing record.
days_supply	No	integer	The number of days of supply of the medication as recorded in the original prescription or dispensing record.
sig	No	clob	The directions ("signetur") on the Drug prescription as recorded in the original prescription (and printed on the container) or dispensing record.
route_concept_id	No	integer	A foreign key to a predefined concept in the Standardized Vocabularies reflecting the route of administration.
~~effective_drug_dose~~	No	~~float~~	~~Numerical value of Drug dose for this Drug Exposure record.~~
dose_unit_concept_ id	No	integer	A foreign key to a predefined concept in the Standardized Vocabularies reflecting the unit the effective_drug_dose value is expressed.
lot_number	No	varchar(50)	An identifier assigned to a particular quantity or lot of Drug product from the manufacturer.
provider_id	No	integer	A foreign key to the provider in the provider table who initiated (prescribed or administered) the Drug Exposure.
visit_occurrence_id	No	integer	A foreign key to the visit in the visit table during which the Drug Exposure was initiated.
drug_source_value	No	varchar(50)	The source code for the Drug as it appears in the source data. This code is mapped to a Standard Drug concept in the Standardized Vocabularies and the original code is, stored here for reference.
drug_source_concept_id	No	integer	A foreign key to a Drug Concept that refers to the code used in the source.
route_source_value	No	varchar(50)	The information about the route of administration as detailed in the source.
dose_unit_source_value	No	varchar(50)	The information about the dose unit as detailed in the source.

We need to have a way to define drug_strength if we have no product information, but only ingredients or drug forms, or if like in chemotherapies or pediatric preparations the drug strength is not defined by the product.

Currently, this information is stored as:

drug_concept_id - contains the ingredient
quantity or effective drug_dose - value
nowhere or dose_unit_concept_id - unit

This works well for ingredients and uni-ingredient drug forms. We should however define a single way of doing it.

Also, we have a problem with multi-ingredient drug forms, because we only have one field.

Conventions

All dosing information is handled through single ingredients stored in the drug_concept_id, and the dosing in quantity (for the amount) and drug_strength.amount_unit_concept_id. See approved proposals for extension of DRUG_STRENGTH and for dosing.

Use Cases

Calculation of daily or total dose of a DRUG_EXPOSURE record.

trivial doc issue - load script for v4.5 states its for v5

change "script to create OMOP common data model, version 5.0 for Oracle database" --> "script to create OMOP common data model, version 4.0 for Oracle database" in CDM V4 DDL.sql

Updated changes document for V4 to V5

Is there final version of the v4 to v5 changes? I was able to find the changes for the proposed version 5 spec:
http://omop.org/sites/default/files/OMOP%20CDM%20v5%20draft%20changes%20CDM4to5%2015may2014.pdf

visit_occurrence nested

Hello

The table visit_occurrence does not allows auto references (nested). This would be useful in order to modelise multpile level of admissions such:
admissions > stays > beds ...

DDL for drug_strength is missing DENOMINATOR_VALUE

I believe this should be the new DDL for drug_strength based on the drug_strength.csv file:

CREATE TABLE [dbo].[drug_strength]([drug_concept_id] [int] NOT NULL,
[ingredient_concept_id] [int] NOT NULL,
[amount_value] [float] NULL,
[amount_unit_concept_id] [int] NULL,
[numerator_value] [float] NULL,
[numerator_unit_concept_id] [int] NULL,
[denominator_value] [float] NULL,
[denominator_unit_concept_id] [int] NULL,
[valid_start_date] [date] NOT NULL,
[valid_end_date] [date] NOT NULL,
[invalid_reason] [varchar]%281%29 NULL)

Moving V4 and V4 to V5 conversion scripts

I propose we move the V4 and V4 to V5 conversion scripts to a separate branch and keep the master branch only for the latest release.

Table and column comments

The comments supplied in version 4.5 of the DDL were incredibly useful. These seem to have disappeared- can these be added back?

BOX_SIZE column in DRUG_STRENGTH.csv

I'm trying to follow the instructions for converting CDM4 to CDM5. Under step 2 of "Usage," it says to download the Vocabularies using ATHENA. I'm downloading the default checked boxes and the DRUG_STRENGTH.csv file I get has a column called BOX_SIZE that doesn't exist in the tables I created in step 1 "Create your V5 Target Database." Should this extra column be in the database, and if so what data type is it? Or did I download the wrong Vocabularies?

postgres vocab import fails

problem is fixed when drug_strength import is commented out. (column count mismatch)

also useful is to prefix the code with schema settings. I wish vocab and results schema split was part of CDM specs.

SET search_path TO vocab;

--COPY DRUG_STRENGTH FROM 'C:\CDMV5VOCAB\DRUG_STRENGTH.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT FROM 'C:\CDMV5VOCAB\CONCEPT.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT_RELATIONSHIP FROM 'C:\CDMV5VOCAB\CONCEPT_RELATIONSHIP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT_ANCESTOR FROM 'C:\CDMV5VOCAB\CONCEPT_ANCESTOR.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT_SYNONYM FROM 'C:\CDMV5VOCAB\CONCEPT_SYNONYM.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY VOCABULARY FROM 'C:\CDMV5VOCAB\VOCABULARY.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY RELATIONSHIP FROM 'C:\CDMV5VOCAB\RELATIONSHIP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT_CLASS FROM 'C:\CDMV5VOCAB\CONCEPT_CLASS.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY DOMAIN FROM 'C:\CDMV5VOCAB\DOMAIN.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

ohdsi / commondatamodel Goto Github PK

commondatamodel's Introduction

How to Use this Repository

Generating the DDLs

Dependencies and prerequisites

Create DDL, Foreign Keys, Primary Keys, and Indexes from R

First, install the package from GitHub

List the currently supported SQL dialects

List the currently supported CDM versions

1. Use the buildRelease function

2. Use the executeDdl function

Bug Fixes/Model Updates

Step 0

Step 1

Step 1.1

Step 1.2

Step 1.3

Step 2

NOTE ABOUT CDM v6.0

commondatamodel's People

Contributors

Stargazers

Watchers

Forkers

commondatamodel's Issues

Proposal

Use Cases

Add Diagnosis-Related Group (DRG) to COST

Proposal

Background

Proposed Change

Conventions

Multiple DRG Codes

Vocabulary Mapping

Use Cases

Consequences

References

Add Package of Box size and Supplier to Drugs

Proposal

Use Cases

DECISION 1

DECISION 2

DECISION 3

DECISION 4

DECISION 5

Denominator Units for liquid drugs and quantity field in DRUG_EXPOSURE

Proposal

Conventions for the DRUG_EXPOSURE table:

Conventions for the DRUG_STRENGTH table:

Changes in the DRUG_EXPOSURE table:

Change/To do in the DRUG_STRENGTH table:

Options:

Use Cases

End of a drug exposure

Proposal

Conventions

Use Cases

Proposal for Visit_detail: Represent granular encounters or microvisits

Proposal

New VISIT_DETAIL table

PROCEDURE_OCCURRENCE

DRUG_EXPOSURE

DEVICE_EXPOSURE

CONDITION_OCCURRENCE

MEASUREMENT

NOTE

OBSERVATION

Conventions

Use Cases

DECISION 1

DECISION 2

DECISION 3

DECISION 4

DECISION 5

Drug Quantity

Proposal

DRUG_EXPOSURE

Conventions

Use Cases

Recommend Projects

1. Use the `buildRelease` function

2. Use the `executeDdl` function