Giter Site home page Giter Site logo

commondatamodel's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

commondatamodel's Issues

Missing design principle: always storing the source value

I know I reviewed the specs myself and have also missed this, but I would think that the list of design principles (page 6 of the specs) is missing this one:

  • Maintaining source codes. Even though all codes are mapped to the Standardized Vocabularies, the model also stores the original source code to ensure no information is lost.

Missing foreign key constraints

The following foreign key constraints are not in the DDL files:

  • concept_synonym.language_concept_id
  • source_to_concept_map.source_concept_id
  • cohort_definition.subject_concept_id
  • attribute_definition.attribute_type_concept_id

Inefficient SQL Code in Conversion Script

I've noticed that the conversion script includes parts that potentially (and for me did) can take forever to run and not finish. I'm using the PostgreSQL script. The problem is in the sections in which warnings are added to etl_warnings:

INSERT INTO [TARGET_CDMV5].[SCHEMA].ETL_WARNINGS (WARNING_MESSAGE)
SELECT 'PROCEDURE_OCCURRENCE: ' || CAST(NUM_INVALID_RECORDS AS VARCHAR) || ' records in the source CDMv4 database have invalid PROCOEDURE_CONCEPT_ID'
FROM (
    SELECT COUNT(PERSON_ID) AS NUM_INVALID_RECORDS
    FROM [SOURCE_CDMV4].[SCHEMA].PROCEDURE_OCCURRENCE
    WHERE PROCEDURE_CONCEPT_ID NOT IN (
            SELECT CONCEPT_ID
            FROM [TARGET_CDMV5].[SCHEMA].CONCEPT
            WHERE CONCEPT_ID = 0
                OR STANDARD_CONCEPT = 'S'
            )
    HAVING COUNT(PERSON_ID) > 0
    ) warn;

Because of the "NOT IN" part, this code checks every entry in the cdm4 procedure_occurrence to see if its procedure_concept_id is in the list of all concepts in cdm5. Assuming 100m entries in procedure_occurrence and 1m concepts, we're asking the database to do 100 trillion things. Has anyone run the PostgreSQL script and run into this issue?

Instead we should be calling distinct on the procedure_concept_id in procedure_occurrence because there may only be about 10k unique procedure_concept_ids, then calling the NOT IN on the 1m concepts, generating a much smaller list of concepts, then returning to compare each individual entry of procedure_occurrence. See below:

INSERT INTO [TARGET_CDMV5].[SCHEMA].ETL_WARNINGS (WARNING_MESSAGE)
SELECT 'PROCEDURE_OCCURRENCE: ' || CAST(NUM_INVALID_RECORDS AS VARCHAR) || ' records in the source CDMv4 database have invalid PROCOEDURE_CONCEPT_ID'
FROM (
    SELECT COUNT(PERSON_ID) AS NUM_INVALID_RECORDS
    FROM [SOURCE_CDMV4].[SCHEMA].PROCEDURE_OCCURRENCE
    WHERE PROCEDURE_CONCEPT_ID IN (
      SELECT PROCEDURE_CONCEPT_ID
      FROM (
        SELECT DISTINCT(PROCEDURE_CONCEPT_ID)
        FROM [SOURCE_CDMV4].[SCHEMA].PROCEDURE_OCCURRENCE) ids
        WHERE ids.PROCEDURE_CONCEPT_ID NOT IN (
          SELECT CONCEPT_ID
            FROM [TARGET_CDMV5].[SCHEMA].CONCEPT
              WHERE CONCEPT_ID = 0
                  OR STANDARD_CONCEPT = 'S'
            )
        )
    HAVING COUNT(PERSON_ID) > 0
    ) warn;

This second script I was able to run overnight and finish (compared to stalling for days). There are multiple sections of the script that add to etl_warnings. Maybe they should be changed?

Add Time Field to various Occurrence and Exposure Tables

While v5 visit_occurrence table supports both date and time of event, other occurrence tables and exposure tables such as drug, condition, and procedure support only the date level. It is desirable to have the option to include specific time data for such occurrences. Our goal is to allow temporal operations finer than day without disrupting OHDSI by requiring major recoding.

Here is the current use of date and time fields in CDM v5.01:

PERSON

  • year_of_birth as integer, required
  • month_of_birth as integer, not required
  • day_of_birth as integer, not required
  • time_of_birth as time, not required

SPECIMEN

  • specimen_date as date, required
  • specimen_time as time, not required

DEATH

  • death_date as date, required

VISIT_OCCURRENCE

  • visit_start_date as date, required
  • visit_start_time as time, not required
  • visit_end_date as date, required
  • visit_end_time as time, not required

PROCEDURE_OCCURRENCE

  • procedure_date as date, required

DRUG_EXPOSURE

  • drug_exposure_start_date as date, required
  • drug_exposure _end_date as date, not required

DEVICE_EXPOSURE

  • device_exposure_start_date as date, required
  • device _exposure _end_date as date, not required

CONDITION_OCCURRENCE

  • condition_start_date as date, required
  • condition _end_date as date, not required

MEASUREMENT

  • measurement_date as date, required
  • measurement _time as time, not required

NOTE

  • note_date as date, required
  • note_time as time, not required

OBSERVATION

  • observation_date as date, required
  • observation _time as time, not required

And the ERA tables, COHORT tables, and PAYER_PLAN_PERIOD table.

DECISION 1

We have a choice between adding fields of type time, which require significant processing to determine durations (join the date and time, and then operate on that) versus defining a datetime field as timestamp, which allows fast operations but produces redundancy.

[Suggest datetime = timestamp.]

DECISION 2

We propose adding a datetime field or fields to the following tables

  • Condition_Occurrence
  • Procedure_Occurrence
  • Drug_exposure
  • Device_exposure
  • Death

We propose replacing the current time fields with datetime fields in the following tables

  • Specimen
  • Visit_Occurrence
  • Measurement
  • Note
  • Observation

A datetime field (also known as timestamp field) would be added to each date field. Date fields would not be changed.

This would support handling data from ICU, Emergency Department, infusions, post-procedure care, etc. where multiple events occur on the same day and sequence matters. This granularity would also support the incorporation of data generated from tracking devices.

In addition, current *_TIME fields would be removed. At this point, all current software will continue working with the *_DATE fields, and over time we will develop extensions to the software to accommodate DATETIME in different database management systems.

[Suggest moving forward with fields called *_DATETIME, removing the *_TIME fields. The *_DATE fields will remain required.]

DECISION 3

Are the datetime fields required. Making them required allows developers to begin to use them with a potential migration from date+time or date+timestamp to timesteamp in the future, but it forces CDM builders to enter unknown times or timestamps. If required, the default time will be the first instant in the allowable period. E.g., 1990-12-01-00:00:00.000000 is the correct entry for December, 1990.

[Suggest optional for now.]

DECISION 4

Should a date time (timestamp) field be added to BIRTH (or alternatively should PERSON.time_of_birth be changed to timestamp). This will allow more rapid calculation of age but will be redundant with the current information.

[Suggest add BIRTH_DATETIME. Remove TIME_OF_BIRTH.]

DECISION 5

Should we add a *_TIME_GRAN granularity field for each new timestamp field, which would indicate year, month, day, hour, minute, or second as the timestamp's granularity. When a timestamp is filled in, it should be set to the first valid time in that indicated interval.

[Suggest defer decision.]

How to contribute partial ETL code?

I would like to contribute to community our code that starts with
condition_occurrence table that has data in ICD9 in *_source_value and no entries in *_concept_id and *_source_concept_id

That code uses the vocab knowledge and populates those columns based on the data above.

Is this wanted and if yes, can you give me guidance on where to place such code (claire)

new cost table breaks Achilles.... and probably other software

Is there any particular reason why the CDMv5.0 just drops the other cost tables and merges it into one in the same CDM version?

I believe a change like this this should be reflected on a new version of the CDM (5.1?), because now we will have inconsistent instances of the CDM depending on when they got the DDL.

On a side note, the indexes and constraints should probably be updated also.

contribute R code for this repository (era tables generation)

To generate drug_era table - the SQL for this is in "parametized SQL" and can be rendered into dialects using R.

My PR will introduce a folder for code that is flavor-less (R code that takes MS SQL with params) and renders it . I plan to put it in folder R. If you have better idea, please suggest it. ).

Also, the code will do it for version 5.0.1. Or how close are we to official release of 5.1

Issues with FK Constraints

Currently the WebAPI does not write records into the CDM cohort_definition table. Because of this, scripts that run during cohort generation are failing because inserting records into the cohort table with a cohort_definition_id that doesn't exist in the cohort_definition table is violating the referential integrity constraints.

v5.2 release question

Hi on this page:

http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm

The new notes and notes_nlp are introduced as being part of v5.2.
Currently, we are on version v5.0.1 and I see ongoing work on v5.1.

On the Readme it says there will be bi- yearly updates (so July 1 is v5.1?). Does this mean the official
release of the notes tables will be January next year? Is there a specific reason why these are not
also added in V5.1? Needed vocab updates..?

Will the CDM team start a v5.2 branch soon?

In the meantime we can create our own ddl for the new tables to start testing some NLP work done at Erasmus but i like to understand the CDM update process a bit better. If I can help anywhere let me know..

Thanks.

Peter

v5.1 SQL File Comments Can Use and Update

Hello,

I tried to adjust myself.. The DDL file for SQL Server for v5.1 has stale comments in the header. It represents 5.0. It would be helpful to the community if the header comments reflected the current version - 5.1

Expand Drug Strength tables to all Standard Concepts in the Drug Domain

  • Requester: Christian Reich

Proposal

Currently, only Clinical and Branded Drugs have a record in the DRUG_STRENGTH table. However, since all Standard Concepts are allowed to be used in the data and therefore dosing information is required for them, the algorithm should parse all Concept Classes:

  • Clinical Drug, Branded Drug
  • Clinical Drug Components, Branded Drug Component
  • Quantified Clinical Drug and Quantified Branded Drug

Not sure yet what to do with the Packs. Also, not sure what to do with Ingredients and Drug Forms, which have no dose information, but a standard unit for measuring is needed for the use case of calculating dose information from quantity field in DRUG_EXPOSURE. This will be handled in a different request.

To add the components is very straightforward. However, to add the Quantified products, a new field is necessary: denominator_value. The reason is the dose of compound of divisible products such as solutions and gels in Clinical or Branded Drugs is given as as normalized concentration, where the denominator is assumed as a unit of 1. E.g., the product "Thiopental 20 MG/ML Prefilled Syringe" comes in 20 mL syringes. The equivalent DRUG_STRENGTH table entry should specify 400 mg Thiopental in 20 mL of solution.

Field Required Type Description
denominator_value No float The amount of total liquid (or other divisible product, such as ointment, gel, spray, etc.).

Use Cases

The DRUG_STRENGTH table is used to calculate the dose of a certain compound and fill the dose table.

What happened to DRGs?

In CDM v4 I used to record DRG codes in the disease_class_concept_id field in the procedure_cost table, but those fields are gone in v5. DRG codes do not map to any standard concepts in the vocab either. Does that mean we do not record DRGs in the CDM anymore?

Add Diagnosis-Related Group (DRG) to COST

Add Diagnosis-Related Group (DRG) to COST

  • Requester: OMOP-PCORnet Interoperability Collaborative (c/o Rimma Belenkaya)

Proposal

Relevant table: COST

Background

This proposal applies only to US Diagnosis-Related Groups (DRGs). DRG is a system to
classify hospital cases into one of approximately 500 groups. They have been used in the
United States since 1983. There is more than one DRG system being used in the United States,
but only the MS-DRG (Medicare Severity Diagnosis Related Grouper) system is used by
Medicare. OMOP vocabulary includes MS-DRG version 33.0 released in October 2015, the
vocabulary_id is ‘DRG’.
Presently, US DRG code is stored in the Observation table and linked to Visit_Occurrence table
via visit_occurrence_id.

Proposed Change

Move DRG from Observation to the new Cost table as a first class attribute. Link it to
Visit_Occurrence table via visit_occurrence_id.
Add the following DRG fields to Cost table:

Field Required Type Description
DRG_CONCEPT_ID Integer No A foreign key to the predefined concept in the DRG Vocabulary reflecting the DRG for a visit.
DRG_SOURCE_VALUE Varchar(3) No The 3-digit DRG source code as it appears in the source data.

Conventions

Multiple DRG Codes

Occasionally, one visit may have multiple DRG codes. In this case, there will be multiple
costs associated with these DRG codes. In OMOP CDM, they will be represented by
multiple records containing corresponding costs and DRG codes in the Cost table.

Vocabulary Mapping

Source codes coming from DRG systems other than MS-DRG should be mapped to
concept_id = 0.

Use Cases

DRG codes are used in the analysis of cost-efficiency and designs of a hospital payment system. The Affordable Care Act and other recent legislation affect medical reimbursement by altering the DRG system. This legislation will give DRGs an even larger role in determining reimbursements in the coming years.

Consequences

  • This change is an important enhancement to the model and can be implemented immediately
    without impacting other projects.
  • It will have an immediate effect on the three OMOP-based CDRNs: NYC-CDRN representing
    over 2.5 million patients; pSCANNER covering over 21 million patients; and PEDSnet which
    includes eight of the nation's largest children's hospitals and provides service to 4.6 million
    children per year.
  • This change will enhance representation and analysis of DRG codes in OMOP CDM and
    simplify ETL for the sites that are presently using Observation table.
  • It will have no implications on vocabulary or existing software.

References

  1. Weiland DE, ea. Using Delta/DRG Diagrams and Decision Tree Analysis to Select a
    Cost-Effective Surgery for Cholecystitis. JSLS. 1997 Apr-Jun.
  2. Vertrees JC , ea. Bundling post-acute care services into MS-DRG payments.Medicare
    Medicaid Res Rev. 2013 Aug 2.
  3. Rimler SB , ea. Radiographics. Diagnosis-related Groups and Hospital Inpatient Federal
    Reimbursement. 2015 Oct.

Denominator Units Conventions and Quantity Units in DRUG_EXPOSURE

Denominator Units for liquid drugs and quantity field in DRUG_EXPOSURE

  • Requester: Klaus Bonadt
  • Discussion: here.

Proposal

Proposal for conventions to achieve consistent entries in the attribute quantity of the DRUG_EXPOSURE table and entries in the DRUG_STRENGTH table.
Eventually, we will be able to calculate the total amount of the active ingredient by

  • DRUG_EXPOSURE.quantity x DRUG_STRENGTH.numerator_value [numerator_unit_concept_id] (for concentrations) or
  • DRUG_EXPOSURE.quantity x DRUG_STRENGTH.amount_value [amount_unit_concept_id] (for pieces, e.g. tablets)
    For drugs with a dose release over time (e.g. Patches) we will be able to calculate the hourly rate by
  • DRUG_STRENGTH.numerator_value [numerator_unit_concept_id]

Conventions for the DRUG_EXPOSURE table:

  • Roughly speaking, the value for quantity represents the quantity of the drug in the denominator unit of the corresponding entry (or entries) in the DRUG_STRENGTH table. More precisely,
    • For drugs dispensed in “pieces” (e.g. tablets or puffs of an inhaler) the quantity refers to this number of pieces. (Thus, quantity has no unit and the denominator unit in the DRUG_STRENGTH table is either empty or {actuat}).
    • For drugs with a concept_id denoting a quantified drug the quantity refers to a fraction/multiple of the whole pack. (The quantity has no unit, the corresponding entry/entries in the DRUG_STRENGTH table refer to the concentration with the total amount in the denominator).
    • For all other drugs quantity refers to the total amount of the product measured in mL or g. (Thus, the unit of the quantity is either g or mL – both units can be used interchangeably.
  • For products with a dosage release rate (for example patches), the quantity refers to the number of pieces. (The denominator unit of the corresponding entry in the DRUG_STRENGTH table is hour).
  • Compounding: For each compound one entry will be added to the DRUG_EXPOSURE table. This enables us all the above discussed options to provide quantity and strength. Even a compounding of multi-ingredient products would be possible.

Conventions for the DRUG_STRENGTH table:

Probably, nothing has to be changed. But we should cross-check the following assumed properties of the DRUG_STRENGTH table coming from RxNorm and apply these for future enhancements:

  • For drugs dispensed in “pieces”, all corresponding entries must have either an empty denominator unit or {actuat} in the denominator unit.
  • For drugs with a dose release over time, we use hour in the denominator.
  • For all other drugs, the denominator unit refers to mass or volumes, measured in mL or mg.

Changes in the DRUG_EXPOSURE table:

With the above mentioned conventions the fields effective_drug_dose and dose_unit_concept_id can be removed. They don’t support multiple ingredients anyway.

Change/To do in the DRUG_STRENGTH table:

In order to support quantified drugs we need an additional field denominator_value.
For compounding we would need to add records to DRUG_STRENGTH for all compounds (drug_concept_id=ingredient_concept_id) with the mg concept in the denominator_unit_concept_id for all "dry" compounds and mL for the "wet" ones. We could steal the knowledge what is what from RxNorm.

Options:

  • We could agree on using mg instead of g for the quantity in case 1c. However, this would require additional conventions and steps for the ETL process: We would use mL for liquids and products with different denominator units (for example 42799258 "Benzyl Alcohol 0.1 ML/ML / Pramoxine hydrochloride 0.01 MG/MG Topical Gel"). However, we would use mg for products which exclusively use a mass unit. Thus, we would need to check all corresponding entries in the DRUG_STRENGTH table beforehand in order to know whether we can apply mg or mL. I don't recommend this.
  • We could use an additional field quantity_unit_concept_id to denote the unit. However, this would also require additional sets of conventions to avoid incompatible entries between quantity_unit_concept_id and denominator_unit_concept_id. I don't recommend this.
  • In the DRUG_STRENGTH table we could allow additional units for masses (besides mg), or volumes (besides mL) or time (besides h). However, this would add not only additional complexity for the conversions but also for the checks to decide whether a DRUG_STRENGTH entry belongs to a dose release over time, to a mass or a volume unit. I don't recommend this.
  • We might want to distinguish between drugs used in compounding and normal drugs dispensed on the same day. In the DRUG_EXPOSURE table we could use a compounding_sequence (holding a sequence number for each compounding on the same day for the same patient) or a compound_sequence (holding a sequence number for each compound. The sequence will restart from 1 for all subsequent compounding). I do recommend this, preferably the latter one.

Use Cases

1 Solid preparations, preparations where dose cannot be split
Example: 20 tablets of 19020053 "Acetaminophen 500 MG Oral Tablet"
DRUG_STRENGTH The denominator_unit is empty. Amount_value and amount_unit_concept_id contain the effective dose. Note that in non-US databases the quantity could be retrieved from the box_size field, however, the ETL should copy that to the DRUG_EXPOSURE.quantity field.
DRUG_EXPOSURE Quantity refers to number of pieces, e.g. 20 tablets
Total dose= quantity x amount_value [amount_unit_concept_id]
Acetaminophen dose = 20 x 500 mg = 10,000 mg
2 Puffs of an inhaler
Example: 150 puffs of 21174430 "Nicotine 0.5 MG/ACTUAT [Nicorette]". Note: There is no difference to use case 1 besides that the DRUG_STRENGTH table has {actuat} in the denominator_unit. In this case the strength is provided in the numerator.
DRUG_STRENGTH The denominator_unit is {actuat}. The numerator value and numerator unit contain the dose per actuation (puff)
DRUG_EXPOSURE Quantity refers to the number puffs, e.g. 150
Total dose= quantity x numerator_value [numerator_unit_concept_id]
Nicotin dose = 150 x 0.5 mg = 75 mg
3 Quantified Drugs
Example: 43275770 "60 ML Acetaminophen 50 MG/ML Oral Solution", meaning, there are 60 mL of Acetaminophe (Paracetamol) of a concentration of 50 mg/mL. Two such bottles are prescribed.
DRUG_STRENGTH The concentration is given in the numerator/denominator fields, with the denominator unit either mg or mL, and the denominator value typically not 1. E.g. 3000 mg in 60 mL.
DRUG_EXPOSURE Quantity refers to a fraction or multiple of the container. E.g. 2
Total dose= quantity x numerator_value [numerator_unit_concept_id]
* Acetaminophen dose = 2 x 3000 mg = 6000 mg*
4 Liquid drugs, or drugs where the amount can be split randomly. The total amount in given in quantity
Example: 42799258 "Benzyl Alcohol 0.1 ML/ML / Pramoxine hydrochloride 0.01 MG/MG Topical Gel" dispensed in a 1.25 oz pack.
DRUG_STRENGTH The concentration is given in the numerator/denominator fields, with the denominator unit either mg or mL, and the denominator value as null. E.g. Benzyl Alcohol in mL and Pramoxine hydrochloride in mg
DRUG_EXPOSURE Quantity refers to the unit provided in DRUG_STRENGTH. Since both mL and mg are found mg is preferred. E.g. 1.25 x 30 (conversion factor oz -> mL) x 1000 (conversion mL -> mg) = 37,000
Total dose= quantity x numerator_value [numerator_unit_concept_id]
Benzyl Alcohol dose = 37,000 x 0.1 mL x 0.001 = 3.7 mL
Pramoxine hydrochloride dose = 37,000 x 0.01 mg = 370 mg
Note: At analysis time, the denominator should be checked in the DRUG_STRENGTH table. As mg is used for the second ingredient the factor 1000 should be applied to convert between g and mg.
5 Compounded drugs
Example: 30 mL Ibuprofen 20% / Piroxicam 1% Cream.
DRUG_STRENGTH Two separate entries for the ingredients of Ibuprofen and Piroxicam are required, with an amount value of null and a amount unit of mg.
DRUG_EXPOSURE Quantity refers to the total weight amount of the compounded ingredient. Use one record in the DRUG_EXPOSURE table for each compound. Eg.g 20% Ibuprofen in 30 mL = 6 mL, 1% Piroxicam in 30 mL = 0.3 mL
Total dose= Depends on the drugs involved: One of the use cases above.
Ibuprofen dose = 6 x 1000 = 6000 mg
Piroxicam dose = 0.3 x 1000 = 300 mg
Note: At analysis time, the denominator should be checked for both ingredients in the DRUG_STRENGTH table. If it is not mg a conversion factor of 1000 should be applied to convert between mL/g and mg.
6 Drugs with delayed release over time, e.g. Patches
Example: 1518199 "Ethinyl Estradiol 0.000833 MG/HR / norelgestromin 0.00625 MG/HR Weekly Transdermal Patch
DRUG_STRENGTH The release is provided as numerator/denominator pair, with the denominator value as null and the denominator unit hour. E.g. Ethinyl Estradiol 0.000833 mg/h / norelgestromin 0.00625 mg/h
DRUG_EXPOSURE Quantity refers to the number of pieces. E.g. 1 patch.
Release rate= numerator_value [numerator_unit_concept_id]
Ethinyl Estradiol rate = 0.000833 mg/h
norelgestromin rate 0.00625 mg/h
Note: This can be converted to a daily dosage by multiplying it with 24. (Assuming 1 patch at a time for at least 24 hours)

End of a Drug Exposure

End of a drug exposure

  • Requester: Klaus Bonadt, Christian Reich, Patrick Ryan
  • Discussion: here

Proposal

Relevant table: DRUG_EXPOSURE

Field Required Type Description
drug_exposure_end_date Yes date The end date for the current instance of Drug utilization. Depending on different sources, it could be a known or an inferred date and denotes the last day at which the patient was still exposed to drug.
days_supply No integer The number of days of supply of the medication as prescribed. This reflects the intention of the provider for the length of exposure.
verbatim_end_date No date The known end date of a drug exposure as provided by the source.

The verbatim_end_dat field is only necessary as the drug_exposure_end_date becomes mandatory, and might be (wrongly) inferred. In cases where no good data are available to fill that end_date, the ETL will have to take a guess, and that may not satisfy some algorithms for dose calculation.

Conventions

  • The drug_exposure_end_date denotes the day the drug exposure ended for the patient. This could be that the duration of drug_supply was reached (in which case drug_exposure_end_date = drug_exposure_start_date + days_supply -1), or because the exposure was stopped (medication changed, medication discontinued, etc.)

Use Cases

All standardized analytics and the drug era constructor will only use the drug_exposure_end_date. The days supply could be used to understand the intent of prescribing versus the reality, and it could be used to calculate MPR.

Redshift Version

Are there any plans to publish a ddl and associated files for a Redshift implementation? If not, I can fork the PostgreSQL version. How close is this to the final specification?

V4/4.5 to V5

Before we start programming - has anyone already done this? Any best practices to ensure we generate something portable?

Represent granular encounters or microvisits in a new VISIT_DETAIL table

Proposal for Visit_detail: Represent granular encounters or microvisits


Proposal

  • Current VISIT_OCCURRENCE table does not meet use-cases that need the granular information, such as movement of person while in a hospital.
  • The intent of this proposal is to capture detail information about a record in visit_occurrence. Examples of detail information may be encounters, micro-visits etc., and will be collected as is from the source data.
  • We propose a new VISIT_DETAIL table with a structure that is similar to current VISIT_OCCURRENCE table. For every record in visit_occurrence there maybe 0 or more records in visit_detail.
  • Records in visit_detail will be related to each other sequentially or hierarchically, AND will be related to visit_occurrence table (using chaining/sequential method or parent-child/part-of).
  • All information will belong to the domain visit.
  • Example: an entire inpatient stay maybe one record in visit_occurrence table. This may have one or more detail information such as ER, ICU, medical floor, rehabilitation floor etc. Each of these visit_details may have different start/end date-times, different concept_id's and fact_id's - that would be separate record in visit_detail with a FK link to visit_occurrence. Each record within visit_detail maybe related to each other, sequentially –> ER leading to ICU leading to medical floor, leading to rehabilitation, or in hierarchical parent-child visit –> a visit for dialysis while in ICU.

New VISIT_DETAIL table

This table will have the same structure as current VISIT_OCCURRENCE table, except for two changes:

  • Two new foreign keys pointing to itself (visit_detail_parent_id) and to visit_occurrence table (visit_occurrence_id)
  • Removal of _date fields.
Field Required Type Description
visit_detail_id Yes integer A unique identifier for each Person's visit-detail at a healthcare provider.
person_id Yes integer A foreign key identifier to the Person for whom the visit is recorded. The demographic details of that Person are stored in the PERSON table.
visit_detail_concept_id Yes integer A foreign key that refers to a visit Concept identifier in the Standardized Vocabularies.
visit_start_date Yes date The start date of the visit.
visit_start_datetime Yes datetime The date and time of the visit-detail started.
visit_end_date Yes date The end date of the visit.
visit_end_datetime No datetime The date and time of the visit end.
visit_type_concept_id Yes integer A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of source data from which the visit record is derived.
provider_id No integer A foreign key to the provider in the provider table who was associated with the visit.
care_site_id No integer A foreign key to the care site in the care site table where visit occurred
admitting_source_concept_id No integer A foreign key to the predefined concept in the Place of Service Vocabulary reflecting the admitting source for a visit.
discharge_to_concept_id No integer A foreign key to the predefined concept in the Place of Service Vocabulary reflecting the discharge disposition (destination) for a visit.
preceding_visit_detail_id No integer A foreign key to the visit_occurrence table record of the visit immediately preceding this visit.
visit_source_value No string The source code for the visit as it appears in the source data.
visit_source_concept_id No Integer A foreign key to a Concept that refers to the code used in the source.
admitting_source_value No string The source code for the admitting source as it appears in the source data.
discharge_to_source_value No string The source code for the discharge disposition as it appears in the source data.
visit_detail_parent_id No integer A foreign key to the visit_detail table record to represent the immediate parent visit-detail record.
visit_occurrence_id Yes integer A foreign key that refers to the record in the visit_occurrence table

Relationship between VISIT_DETAIL and clinical events tables
We will add a new optional foreign key pointing from a clinical-event table to visit_detail table.

PROCEDURE_OCCURRENCE

Field Required Type Description
procedure_occurrence_id Yes integer A system-generated unique identifier for each Procedure Occurrence.
person_id Yes integer A foreign key identifier to the Person who is subjected to the Procedure. The demographic details of that Person are stored in the PERSON table.
procedure_concept_id Yes integer A foreign key that refers to a standard procedure Concept identifier in the Standardized Vocabularies.
procedure_date Yes date The date on which the Procedure was performed.
procedure_datetime No datetime The date and time on which the Procedure was performed.
procedure_type_concept_id Yes integer A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of source data from which the procedure record is derived.
modifier_concept_id No integer A foreign key to a Standard Concept identifier for a modifier to the Procedure (e.g. bilateral)
quantity No integer The quantity of procedures ordered or administered.
provider_id No integer A foreign key to the provider in the provider table who was responsible for carrying out the procedure.
visit_occurrence_id No integer A foreign key to the visit in the visit table during which the Procedure was carried out.
visit_detail_id No integer A foreign key to the visit in the visit-detail table during which the Procedure was carried out.
procedure_source_value No varchar(50) The source code for the Procedure as it appears in the source data. This code is mapped to a standard procedure Concept in the Standardized Vocabularies and the original code is, stored here for reference. Procedure source codes are typically ICD-9-Proc, CPT-4, HCPCS or OPCS-4 codes.
procedure_source_concept_id No integer A foreign key to a Procedure Concept that refers to the code used in the source.
qualifier_source_value No varchar(50) The source code for the qualifier as it appears in the source data.

DRUG_EXPOSURE

Field Required Type Description
drug_exposure_id Yes integer A system-generated unique identifier for each Drug utilization event.
person_id Yes integer A foreign key identifier to the person who is subjected to the Drug. The demographic details of that person are stored in the person table.
drug_concept_id Yes integer A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies for the Drug concept.
drug_exposure_start_date Yes date The start date for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.
drug_exposure_start_datetime No datetime The start date and time for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.
drug_exposure_end_date No date The end date for the current instance of Drug utilization. It is not available from all sources.
drug_exposure_end_datetime No datetime The end date and time for the current instance of Drug utilization. It is not available from all sources.
drug_type_concept_id Yes integer A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Drug Exposure recorded. It indicates how the Drug Exposure was represented in the source data.
stop_reason No varchar(20) The reason the Drug was stopped. Reasons include regimen completed, changed, removed, etc.
refills No integer The number of refills after the initial prescription. The initial prescription is not counted, values start with 0.
quantity No float The quantity of drug as recorded in the original prescription or dispensing record.
days_supply No integer The number of days of supply of the medication as recorded in the original prescription or dispensing record.
sig No clob The directions ("signetur") on the Drug prescription as recorded in the original prescription (and printed on the container) or dispensing record.
route_concept_id No integer A foreign key to a predefined concept in the Standardized Vocabularies reflecting the route of administration.
effective_drug_dose No float Numerical value of Drug dose for this Drug Exposure record.
dose_unit_concept_ id No integer A foreign key to a predefined concept in the Standardized Vocabularies reflecting the unit the effective_drug_dose value is expressed.
lot_number No varchar(50) An identifier assigned to a particular quantity or lot of Drug product from the manufacturer.
provider_id No integer A foreign key to the provider in the provider table who initiated (prescribed or administered) the Drug Exposure.
visit_occurrence_id No integer A foreign key to the visit in the visit table during which the Drug Exposure was initiated.
visit_detail_id No integer A foreign key to the visit in the visit-detail table during which the Drug Exposure was initiated.
drug_source_value No varchar(50) The source code for the Drug as it appears in the source data. This code is mapped to a Standard Drug concept in the Standardized Vocabularies and the original code is, stored here for reference.
drug_source_concept_id No integer A foreign key to a Drug Concept that refers to the code used in the source.
route_source_value No varchar(50) The information about the route of administration as detailed in the source.
dose_unit_source_value No varchar(50) The information about the dose unit as detailed in the source.

DEVICE_EXPOSURE

Field Required Type Description
device_exposure_id Yes integer A system-generated unique identifier for each Device Exposure.
person_id Yes integer A foreign key identifier to the Person who is subjected to the Device. The demographic details of that person are stored in the Person table.
device_concept_id Yes integer A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies for the Device concept.
device_exposure_start_date Yes date The date the Device or supply was applied or used.
device_exposure_start_datetime No datetime The date and time the Device or supply was applied or used.
device_exposure_end_date No date The date the Device or supply was removed from use.
device_exposure_end_datetime No datetime The date and time the Device or supply was removed from use.
device_type_concept_id Yes integer A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Device Exposure recorded. It indicates how the Device Exposure was represented in the source data.
unique_device_id No varchar(50) A UDI or equivalent identifying the instance of the Device used in the Person.
quantity No integer The number of individual Devices used for the exposure.
provider_id No integer A foreign key to the provider in the PROVIDER table who initiated of administered the Device.
visit_occurrence_id No integer A foreign key to the visit in the VISIT table during which the device was used.
visit_detail_id No integer A foreign key to the visit in the VISIT_DETAIL table during which the device was used.
device_source_value No varchar(50) The source code for the Device as it appears in the source data. This code is mapped to a standard Device Concept in the Standardized Vocabularies and the original code is stored here for reference.
device_source_ concept_id No integer A foreign key to a Device Concept that refers to the code used in the source.

CONDITION_OCCURRENCE

Field Required Type Description
condition_occurrence_id Yes integer A unique identifier for each Condition Occurrence event.
person_id Yes integer A foreign key identifier to the Person who is experiencing the condition. The demographic details of that Person are stored in the PERSON table.
condition_concept_id Yes integer A foreign key that refers to a Standard Condition Concept identifier in the Standardized Vocabularies.
condition_start_date Yes date The date when the instance of the Condition is recorded.
condition_start_datetime No datetime The date and time when the instance of the Condition is recorded.
condition_end_date No date The date when the instance of the Condition is considered to have ended.
condition_end_datetime No date The date when the instance of the Condition is considered to have ended.
condition_type_concept_id Yes integer A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the source data from which the condition was recorded, the level of standardization, and the type of occurrence.
stop_reason No varchar(20) The reason that the condition was no longer present, as indicated in the source data.
provider_id No integer A foreign key to the Provider in the PROVIDER table who was responsible for capturing (diagnosing) the Condition.
visit_occurrence_id No integer A foreign key to the visit in the VISIT table during which the Condition was determined (diagnosed).
visit_detail_id No integer A foreign key to the visit in the VISIT_DETAIL table during which the Condition was determined (diagnosed).
condition_status_concept_id No integer A foreign key to the predefined concept in the standard vocabulary reflecting the condition status.
condition_source_concept_id No integer A foreign key to a Condition Concept that refers to the code used in the source.
condition_source_value No varchar(50) The source code for the condition as it appears in the source data. This code is mapped to a standard condition concept in the Standardized Vocabularies and the original code is stored here for reference.
condition_status_source_value No varchar(50)

MEASUREMENT

Field Required Type Description
measurement_id Yes integer A unique identifier for each Measurement.
person_id Yes integer A foreign key identifier to the Person about whom the measurement was recorded. The demographic details of that Person are stored in the PERSON table.
measurement_concept_id Yes integer A foreign key to the standard measurement concept identifier in the Standardized Vocabularies.
measurement_date Yes date The date of the Measurement.
measurement_datetime No datetime The date and time of the Measurement. (Some database systems don't have a datatype of time. To accomodate all temporal analyses, datatype datetime can be used (combining measurement_date and measurement_time)[[http://forums.ohdsi.org/t/date-time-and-datetime-problem-and-the-world-of-hours-and-1day/314
measurement_type_concept_id Yes integer A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the provenance from where the Measurement record was recorded.
operator_concept_id No integer A foreign key identifier to the predefined Concept in the Standardized Vocabularies reflecting the mathematical operator that is applied to the value_as_number. Operators are <, ≤, =, ≥, >.
value_as_number No float A Measurement result where the result is expressed as a numeric value.
value_as_concept_id No integer A foreign key to a Measurement result represented as a Concept from the Standardized Vocabularies (e.g., positive/negative, present/absent, low/high, etc.).
unit_concept_id No integer A foreign key to a Standard Concept ID of Measurement Units in the Standardized Vocabularies.
range_low No float The lower limit of the normal range of the Measurement result. The lower range is assumed to be of the same unit of measure as the Measurement value.
range_high No float The upper limit of the normal range of the Measurement. The upper range is assumed to be of the same unit of measure as the Measurement value.
provider_id No integer A foreign key to the provider in the PROVIDER table who was responsible for initiating or obtaining the measurement.
visit_occurrence_id No integer A foreign key to the Visit in the VISIT_OCCURRENCE table during which the Measurement was recorded.
visit_detail_id No integer A foreign key to the Visit in the VISIT_DETAIL table during which the Measurement was recorded.
measurement_source_value No varchar(50) The Measurement name as it appears in the source data. This code is mapped to a Standard Concept in the Standardized Vocabularies and the original code is stored here for reference.
measurement_source_concept_id No integer A foreign key to a Concept in the Standard Vocabularies that refers to the code used in the source.
unit_source_value No varchar(50) The source code for the unit as it appears in the source data. This code is mapped to a standard unit concept in the Standardized Vocabularies and the original code is stored here for reference.
value_source_value No varchar(50) The source value associated with the content of the value_as_number or value_as_concept_id as stored in the source data.

NOTE

Field Required Type Description
note_id Yes integer A unique identifier for each note.
person_id Yes integer A foreign key identifier to the Person about whom the Note was recorded. The demographic details of that Person are stored in the PERSON table.
note_date Yes date The date the note was recorded.
note_datetime No datetime The date and time the note was recorded.
note_type_concept_id Yes integer A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the type, origin or provenance of the Note.
note_class_concept_id Yes integer A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the HL7 LOINC Document Type Vocabulary classification of the note.
note_title No string(250) The title of the Note as it appears in the source.
note_text Yes RBDMS dependent text The content of the Note.
encoding_concept_id Yes integer A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the note character encoding type.
language_concept_id Yes integer A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the language of the note.
provider_id No integer A foreign key to the Provider in the PROVIDER table who took the Note.
visit_occurrence_id No integer Foreign key to the Visit in the VISIT_OCCURRENCE table when the Note was taken.
visit_detail_id No integer Foreign key to the Visit in the VISIT_DETAIL table when the Note was taken.

OBSERVATION

Field Required Type Description
observation_id Yes integer A unique identifier for each observation.
person_id Yes integer A foreign key identifier to the Person about whom the observation was recorded. The demographic details of that Person are stored in the PERSON table.
observation_concept_id Yes integer A foreign key to the standard observation concept identifier in the Standardized Vocabularies.
observation_date Yes date The date of the observation.
observation_datetime No datetime The date and time of the observation.
observation_type_concept_id Yes integer A foreign key to the predefined concept identifier in the Standardized Vocabularies reflecting the type of the observation.
value_as_number No float The observation result stored as a number. This is applicable to observations where the result is expressed as a numeric value.
value_as_string No varchar(60) The observation result stored as a string. This is applicable to observations where the result is expressed as verbatim text.
value_as_concept_id No Integer A foreign key to an observation result stored as a Concept ID. This is applicable to observations where the result can be expressed as a Standard Concept from the Standardized Vocabularies (e.g., positive/negative, present/absent, low/high, etc.).
qualifier_concept_id No integer A foreign key to a Standard Concept ID for a qualifier (e.g., severity of drug-drug interaction alert)
unit_concept_id No integer A foreign key to a Standard Concept ID of measurement units in the Standardized Vocabularies.
provider_id No integer A foreign key to the provider in the PROVIDER table who was responsible for making the observation.
visit_occurrence_id No integer A foreign key to the visit in the VISIT_OCCURRENCE table during which the observation was recorded.
visit_detail_id No integer A foreign key to the visit in the VISIT_DETAIL table during which the observation was recorded.
observation_source_value No varchar(50) The observation code as it appears in the source data. This code is mapped to a Standard Concept in the Standardized Vocabularies and the original code is, stored here for reference.
observation_source_concept_id No integer A foreign key to a Concept that refers to the code used in the source.
unit_source_value No varchar(50) The source code for the unit as it appears in the source data. This code is mapped to a standard unit concept in the Standardized Vocabularies and the original code is, stored here for reference.
qualifier_source_value No varchar(50) The source value associated with a qualifier to characterize the observation

Conventions

Representation of US claim data

US claims data generally has two-levels

  • Header/summary data that summarizes the entire claim
  • Line/detail that details a claim.

Detail is thus a child of the summary, and for every record in summary there is one or more records in detail. i.e. there will be atleast one FK link from visit_detail to visit_occurrence.

Use Cases

visit_occurrence nested

Hello

The table visit_occurrence does not allows auto references (nested). This would be useful in order to modelise multpile level of admissions such:
admissions > stays > beds ...

Add references to markdown urls in R pdf code

The current R code to create the CDM pdf file references a local copy of the git repository. It would be nice if it could reference the urls for the markdown files instead. This would allow anyone to run it regardless if they have a local copy or not.

Drug Quantity for Solid and Liquid Preparations

Drug Quantity

  • Requester: Christian Reich, Klaus Bonadt
  • Discussion: No forum posts

Proposal

Relevant table: DRUG_EXPOSURE

  • We propose to abolish the effective_drug_dose field from the DRUG_EXPOSURE table entirely. The content can be handled by the quantity table.

DRUG_EXPOSURE

Field Required Type Description
drug_exposure_id Yes integer A system-generated unique identifier for each Drug utilization event.
person_id Yes integer A foreign key identifier to the person who is subjected to the Drug. The demographic details of that person are stored in the person table.
drug_concept_id Yes integer A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies for the Drug concept.
drug_exposure_start_date Yes date The start date for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.
drug_exposure_start_datetime No datetime The start date and time for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.
drug_exposure_end_date No date The end date for the current instance of Drug utilization. It is not available from all sources.
drug_exposure_end_datetime No datetime The end date and time for the current instance of Drug utilization. It is not available from all sources.
drug_type_concept_id Yes integer A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Drug Exposure recorded. It indicates how the Drug Exposure was represented in the source data.
stop_reason No varchar(20) The reason the Drug was stopped. Reasons include regimen completed, changed, removed, etc.
refills No integer The number of refills after the initial prescription. The initial prescription is not counted, values start with 0.
quantity No float The quantity of drug as recorded in the original prescription or dispensing record.
days_supply No integer The number of days of supply of the medication as recorded in the original prescription or dispensing record.
sig No clob The directions ("signetur") on the Drug prescription as recorded in the original prescription (and printed on the container) or dispensing record.
route_concept_id No integer A foreign key to a predefined concept in the Standardized Vocabularies reflecting the route of administration.
effective_drug_dose No float Numerical value of Drug dose for this Drug Exposure record.
dose_unit_concept_ id No integer A foreign key to a predefined concept in the Standardized Vocabularies reflecting the unit the effective_drug_dose value is expressed.
lot_number No varchar(50) An identifier assigned to a particular quantity or lot of Drug product from the manufacturer.
provider_id No integer A foreign key to the provider in the provider table who initiated (prescribed or administered) the Drug Exposure.
visit_occurrence_id No integer A foreign key to the visit in the visit table during which the Drug Exposure was initiated.
drug_source_value No varchar(50) The source code for the Drug as it appears in the source data. This code is mapped to a Standard Drug concept in the Standardized Vocabularies and the original code is, stored here for reference.
drug_source_concept_id No integer A foreign key to a Drug Concept that refers to the code used in the source.
route_source_value No varchar(50) The information about the route of administration as detailed in the source.
dose_unit_source_value No varchar(50) The information about the dose unit as detailed in the source.
  • We need to have a way to define drug_strength if we have no product information, but only ingredients or drug forms, or if like in chemotherapies or pediatric preparations the drug strength is not defined by the product.

Currently, this information is stored as:

  • drug_concept_id - contains the ingredient
  • quantity or effective drug_dose - value
  • nowhere or dose_unit_concept_id - unit

This works well for ingredients and uni-ingredient drug forms. We should however define a single way of doing it.

Also, we have a problem with multi-ingredient drug forms, because we only have one field.

Conventions

  • All dosing information is handled through single ingredients stored in the drug_concept_id, and the dosing in quantity (for the amount) and drug_strength.amount_unit_concept_id. See approved proposals for extension of DRUG_STRENGTH and for dosing.

Use Cases

  • Calculation of daily or total dose of a DRUG_EXPOSURE record.

V4 -> V5 Conversion Script Error

The conversion script fails at the very end on #classification_map table creation with:

The ORDER BY clause is invalid in views, CREATE TABLE AS SELECT, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

Remove the ORDER BY clause and this will fix the problem.

Should person be of type bigint instead of int in the standard CDM DDL?

person_id in dbo.person is being declared as integer, but is referenced elsewhere as 'intger'.

EDIT

The CDM ddl is consistent: person_id is used as integer everywhere, but integer has been too small to hold certain person_ids from some vendor's raw source data, and we need to alter our DDL internally to have all person_id's set to 'bigint'. Is this something that should be applied to the standard, or should we update the standard DDL to have person_id be bigint?

DDL for drug_strength is missing DENOMINATOR_VALUE

I believe this should be the new DDL for drug_strength based on the drug_strength.csv file:

CREATE TABLE [dbo].[drug_strength]([drug_concept_id] [int] NOT NULL,
[ingredient_concept_id] [int] NOT NULL,
[amount_value] [float] NULL,
[amount_unit_concept_id] [int] NULL,
[numerator_value] [float] NULL,
[numerator_unit_concept_id] [int] NULL,
[denominator_value] [float] NULL,
[denominator_unit_concept_id] [int] NULL,
[valid_start_date] [date] NOT NULL,
[valid_end_date] [date] NOT NULL,
[invalid_reason] [varchar]%281%29 NULL)

Derived tables SQL (e.g., drug_era tables)

The v4 to v5 parametized SQL includes the code for deriving _era tables.

I would like to volunteer to split this and make that separate item (derived table SQL code).

Who is the github admin person for this repo?

postgres vocab import fails

problem is fixed when drug_strength import is commented out. (column count mismatch)

also useful is to prefix the code with schema settings. I wish vocab and results schema split was part of CDM specs.

SET search_path TO vocab;

--COPY DRUG_STRENGTH FROM 'C:\CDMV5VOCAB\DRUG_STRENGTH.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT FROM 'C:\CDMV5VOCAB\CONCEPT.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT_RELATIONSHIP FROM 'C:\CDMV5VOCAB\CONCEPT_RELATIONSHIP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT_ANCESTOR FROM 'C:\CDMV5VOCAB\CONCEPT_ANCESTOR.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT_SYNONYM FROM 'C:\CDMV5VOCAB\CONCEPT_SYNONYM.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY VOCABULARY FROM 'C:\CDMV5VOCAB\VOCABULARY.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY RELATIONSHIP FROM 'C:\CDMV5VOCAB\RELATIONSHIP.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY CONCEPT_CLASS FROM 'C:\CDMV5VOCAB\CONCEPT_CLASS.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

COPY DOMAIN FROM 'C:\CDMV5VOCAB\DOMAIN.csv' WITH DELIMITER E'\t' CSV HEADER QUOTE E'\b' ;

Add Package or Box size and Supplier to Drugs

Add Package of Box size and Supplier to Drugs

  • Requester: Christian Reich
  • Discussion: in WG

Proposal

Relevant table: DRUG_STRENGTH

In many other countries, drugs are not dispensed in the pharmacies, with small exceptions (usually ointments that have to be created fresh). Instead, all prescription drugs are packaged just like OTC drugs in the US, with the number of content usually lasting for one, two or three months. These products are directly prescribed.

In order to be capture these products, "Clinical Drug" or "Branded Drug" is not sufficient. "Clinical Pack" or "Branded Pack" is also not the same, as those are products combined in fixed numbers, like contraceptives. Therefore, a new RxNorm like Concept Class "Drug Box" is needed. The DRUG_STRENGTH table would require an additional field box_size:

Field Required Type Description
box_size No integer The number of units of Clinical or Branded Drug, or Quantified Clinical or Branded Drug contained in a box as dispensed to the patient.

In addition, Branded Drugs or Packs are provided by a specific supplier. They would be added as new concepts:

Field Required Type Description
supplier No string(50) The name of the supplier (manufacturer or wholeseller) who provides a Branded Drug/Pack to a certain market.

In the CONCEPT_RELATIONSHIP table new relationships would be recorded to link those Drug products to their suppliers with the relationship_id "Has supplier" and "Supplier of".

Use Cases

This information is necessary to infer from European or other non-US prescription data the quantity of product. The ETL process would transfer that information into the quantity field in the drug_exposure table.

v4 to v5: RedShift

for conversion scripts - SqlRender does support RedShift,

can RedSHift SQL be also posted.

Also, for users new to R, it may be nice to show R code that 'generates' the final SQL from parametized-SQL.

Replace Date fields with DateTime fields

While v5 visit_occurrence table supports both date and time of event, other occurrence tables and exposure tables such as drug, condition, and procedure support only the date level. It is desirable to have the option to include specific time data for such occurrences. Our goal is to allow temporal operations finer than day without disrupting OHDSI by requiring major recoding.

Here is the current use of date and time fields in CDM v5.01:

PERSON

  • year_of_birth as integer, required
  • month_of_birth as integer, not required
  • day_of_birth as integer, not required
  • time_of_birth as time, not required

SPECIMEN

  • specimen_date as date, required
  • specimen_time as time, not required

DEATH

  • death_date as date, required

VISIT_OCCURRENCE

  • visit_start_date as date, required
  • visit_start_time as time, not required
  • visit_end_date as date, required
  • visit_end_time as time, not required

PROCEDURE_OCCURRENCE

  • procedure_date as date, required

DRUG_EXPOSURE

  • drug_exposure_start_date as date, required
  • drug_exposure _end_date as date, not required

DEVICE_EXPOSURE

  • device_exposure_start_date as date, required
  • device _exposure _end_date as date, not required

CONDITION_OCCURRENCE

  • condition_start_date as date, required
  • condition _end_date as date, not required

MEASUREMENT

  • measurement_date as date, required
  • measurement _time as time, not required

NOTE

  • note_date as date, required
  • note_time as time, not required

OBSERVATION

  • observation_date as date, required
  • observation _time as time, not required

And the ERA tables, COHORT tables, and PAYER_PLAN_PERIOD table.

DECISION 1

We have a choice between adding fields of type time, which require significant processing to determine durations (join the date and time, and then operate on that) versus defining a datetime field as timestamp, which allows fast operations but produces redundancy.

[Suggest datetime = timestamp.]

DECISION 2

We propose adding a datetime field or fields to the following tables

  • Condition_Occurrence
  • Procedure_Occurrence
  • Drug_exposure
  • Device_exposure
  • Death

We propose replacing the current time fields with datetime fields in the following tables

  • Specimen
  • Visit_Occurrence
  • Measurement
  • Note
  • Observation

A datetime field (also known as timestamp field) would be added to each date field. Date fields would not be changed.

This would support handling data from ICU, Emergency Department, infusions, post-procedure care, etc. where multiple events occur on the same day and sequence matters. This granularity would also support the incorporation of data generated from tracking devices.

In addition, current *_TIME fields would be removed. At this point, all current software will continue working with the *_DATE fields, and over time we will develop extensions to the software to accommodate DATETIME in different database management systems.

[Suggest moving forward with fields called *_DATETIME, removing the *_TIME fields. The *_DATE fields will remain required.]

DECISION 3

Are the datetime fields required. Making them required allows developers to begin to use them with a potential migration from date+time or date+timestamp to timesteamp in the future, but it forces CDM builders to enter unknown times or timestamps. If required, the default time will be the first instant in the allowable period. E.g., 1990-12-01-00:00:00.000000 is the correct entry for December, 1990.

[Suggest optional for now.]

DECISION 4

Should a date time (timestamp) field be added to BIRTH (or alternatively should PERSON.time_of_birth be changed to timestamp). This will allow more rapid calculation of age but will be redundant with the current information.

[Suggest add BIRTH_DATETIME. Remove TIME_OF_BIRTH.]

DECISION 5

Should we add a *_TIME_GRAN granularity field for each new timestamp field, which would indicate year, month, day, hour, minute, or second as the timestamp's granularity. When a timestamp is filled in, it should be set to the first valid time in that indicated interval.

[Suggest defer decision.]

Time storage in CDM v5

I'm ETLing data from a MSSQL data warehouse into CDM v5 on Postgres. When I cast the time portion of a timestamp into a TIME, I'm getting strings that are greater than the varchar(10) limit on several CDM v5 tables. The max string length for time appears to be 11 characters.

CAST('11/13/2013 12:00:00 AM' AS TIME) => '12:00:00 AM' (11 characters)

This caused errors when attempting to load time data.

If we're using the ANSI SQL time formats, as specified in the documentation, the max length of the time fields should be altered to account for the following specified ANSI format: "HH:MM.SS.MMMM", which assumes that we're all using 24 hour time.

Perhaps the time fields should use the "time" data type rather than strings?

v5.1 vs master

for PR that fixes a bug, what branch should we target?

BOX_SIZE column in DRUG_STRENGTH.csv

I'm trying to follow the instructions for converting CDM4 to CDM5. Under step 2 of "Usage," it says to download the Vocabularies using ATHENA. I'm downloading the default checked boxes and the DRUG_STRENGTH.csv file I get has a column called BOX_SIZE that doesn't exist in the tables I created in step 1 "Create your V5 Target Database." Should this extra column be in the database, and if so what data type is it? Or did I download the wrong Vocabularies?

Vocabulary data type Mismatch between CDM_SOURCE and VOCABULARY table

In reviewing and trying to load "fake data" into the OMOP CDM v5 Data Model it has come to my attention that vocabulary_version column entries in two tables have differing lengths, which has a potential to cause loading issues and it is my recommendation to update this to be consistent between tables.

VOCABULARY.VOCABULARY_VERSION VARCHAR(255)
CDM_SOURCE.VOCABULARY_VERSION VARCHAR(20)

it is my understanding the Vocabulary table version is for ANY vocabulary that may be added to the OHDSI Vocabulary and having that large size of 255 makes sense, to accommodate multiple naming conventions.

There is the CDM Source table version is meant to capture the OHDSI Vocabulary used, which is my current understanding, which is based off the old VOCABULARY table from OMOP Vocabulary V4.4 data model.

As per the last April 11 2014 load, the Vocabulary Name column contained data that surpassed the 20 character column length, for a total of 32 characters 'IMEDS Vocabulary v4.4 2014-04-11'

Please do let me know if there is any mistake from my perspective in this analysis.

5.1 release

CDM documentation on wiki indicates v5.1

there is branch 5.1.

What steps need to be satisfied for a release 5.1 to occurr?

(e.g., testing on db platforms like Achilles? - which requires significant expertise) (which I am not perhaps advocating here for) .

Looks like last release was done by @schuemie

Is @cgreich the new admin?

(I am happy to volunteer help with those steps)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.