
finregistry_detailed_longitudinal_and_endpoint_generation's Introduction

This repository contains the processing scripts used in Finregistry to create the detailed longitudinal and endpointer files.

The detailed longitudinal script was written by Matteo Ferro with the help of Essi Viippola.
The endpointer script was written by Andrius Vabalas.

finregistry_detailed_longitudinal_and_endpoint_generation's People

Contributors

ferroteo, andrius6, essiviippola

finregistry_detailed_longitudinal_and_endpoint_generation's Issues

duplication in CODE4 (length of stay)

OK, this is a bit weird. First of all, how can there be a length of stay for outpatients? Outpatient clinics generally don't have beds, so the only reasonable explanation is that these people were admitted as inpatients. Second, the same person has several different CODE4 values for the same source and date:

MOCKID  SOURCE  PVM        CODE4  NUM
A       INPAT   same date  1      1
A       INPAT   same date  5      2
A       OUTPAT  same date  0      3
A       OUTPAT  same date  1      4

To be resolved for the next version of DL.
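One way to list the affected records is to group rows by person, source and date and keep the groups that carry more than one CODE4 value. The following is a minimal sketch, not the project's own check: the function name `conflicting_code4` is hypothetical, the rows are the mock rows from the table above, and in the real file the ID column would presumably be FINREGISTRYID rather than MOCKID.

```python
from collections import defaultdict


def conflicting_code4(rows, id_column="MOCKID"):
    """Return {(id, source, date): [CODE4 values]} for groups with >1 distinct CODE4.

    `rows` is any iterable of dicts, e.g. from csv.DictReader.
    The column names are assumptions taken from the mock table in this issue.
    """
    seen = defaultdict(set)
    for row in rows:
        key = (row[id_column], row["SOURCE"], row["PVM"])
        seen[key].add(row["CODE4"])
    # Keep only the groups where the length of stay is ambiguous.
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Mock rows mirroring the table in this issue; "same date" is a placeholder, not a real date.
rows = [
    {"MOCKID": "A", "SOURCE": "INPAT", "PVM": "same date", "CODE4": "1"},
    {"MOCKID": "A", "SOURCE": "INPAT", "PVM": "same date", "CODE4": "5"},
    {"MOCKID": "A", "SOURCE": "OUTPAT", "PVM": "same date", "CODE4": "0"},
    {"MOCKID": "A", "SOURCE": "OUTPAT", "PVM": "same date", "CODE4": "1"},
]

for key, values in conflicting_code4(rows).items():
    print(key, values)
```

Both the INPAT and the OUTPAT group are reported for the mock data, since each has two different CODE4 values on the same date.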

registry end dates

The data used in the detailed longitudinal file is not up to date for the following registries:

  • death
  • kela purchases

Both are missing the January 2020 to December 2021 period.
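A quick way to confirm where each registry ends is to look at the latest event date per SOURCE. This is only a sketch under assumptions: it assumes PVM holds ISO-formatted dates (YYYY-MM-DD), so that string comparison matches date order, and the source labels in the sample rows are hypothetical placeholders rather than the labels used in the DL file.

```python
def latest_date_per_source(rows):
    """Return {SOURCE: latest PVM} for an iterable of dict rows.

    Assumes PVM is an ISO date string (YYYY-MM-DD); empty dates are skipped.
    """
    latest = {}
    for row in rows:
        source, date = row["SOURCE"], row["PVM"]
        if date and date > latest.get(source, ""):
            latest[source] = date
    return latest


# Hypothetical sample rows; the source labels and dates are made up for illustration.
sample = [
    {"SOURCE": "SOURCE_A", "PVM": "2019-12-30"},
    {"SOURCE": "SOURCE_A", "PVM": "2018-05-02"},
    {"SOURCE": "SOURCE_B", "PVM": "2021-11-15"},
    {"SOURCE": "SOURCE_B", "PVM": ""},
]

print(latest_date_per_source(sample))
```

A registry whose latest date stops at the end of 2019 while others run through 2021 would show up directly in this summary.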

CODE8/9

The variables CODE8 and CODE9 are missing and need to be added.

nan values in DL

I think the nan values need to be replaced with blanks again in the next version.
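As a rough sketch of such a clean-up, the snippet below rewrites a CSV stream and blanks every cell that is the literal text nan. It assumes the nan values are written into the file as that text (in any letter case), which this issue does not state explicitly; the function name `blank_nan` is hypothetical.

```python
import csv
import io


def blank_nan(in_handle, out_handle):
    """Copy CSV rows from in_handle to out_handle, replacing literal 'nan' cells with ''."""
    reader = csv.reader(in_handle)
    writer = csv.writer(out_handle, lineterminator="\n")
    for row in reader:
        writer.writerow(
            ["" if cell.strip().lower() == "nan" else cell for cell in row]
        )


# Small in-memory example; real use would pass open file handles instead.
source = io.StringIO("FINREGISTRYID,CODE1,CODE2\nA,nan,X\nB,Y,NaN\n")
cleaned = io.StringIO()
blank_nan(source, cleaned)
print(cleaned.getvalue())
```

Only cells that consist entirely of nan are blanked, so a code that merely contains those letters is left untouched.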

inpatient operations inconsistencies Nov 2022 -> Sept 2023

Our group found that the inpatient operations data in the DL file from September 11, 2023 appears suspiciously truncated. For example, there are over a million rows for inpatient operations in 1997, but the yearly count then fluctuates and drops sharply in a way that doesn’t make medical sense: 2009 has only 3 inpatient operation rows in total. The previous DL version from November 2022 was more consistent, with hundreds of thousands of operations per year, including in 2009.

Is this data missing from the new version, or were the operations reclassified somehow?

You can see this issue if you extract the inpatient operations from the detailed_longitudinal_2023-09-11.csv as below.

awk -F "," 'BEGIN{print "FINREGISTRYID,PVM,EVENT_YRMTH,EVENT_AGE,INDEX,SOURCE,ICDVER,CATEGORY,CODE1,CODE2,CODE3,CODE4,CODE5,CODE6,CODE7"} {if ($6 == "OPER_IN") {print}}' "$DL_FILE_PATH" > "$OUT_FILE_PATH"
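To compare the two DL versions, the extracted rows can be tallied per year. The sketch below is meant to run on the file written by the awk command above, which starts with that header line. It assumes PVM begins with a four-digit year (for example YYYY-MM-DD); the function name `rows_per_year` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def rows_per_year(handle, source="OPER_IN"):
    """Count rows per calendar year for one SOURCE value.

    Expects a CSV with a header containing SOURCE and PVM columns.
    Assumes the first four characters of PVM are the year.
    """
    counts = Counter()
    for row in csv.DictReader(handle):
        if row["SOURCE"] == source:
            counts[row["PVM"][:4]] += 1
    return dict(sorted(counts.items()))


# Made-up rows for illustration only; real use: open the extracted file and pass the handle.
demo = io.StringIO(
    "FINREGISTRYID,PVM,SOURCE\n"
    "A,1997-03-01,OPER_IN\n"
    "B,1997-07-12,OPER_IN\n"
    "C,2009-01-20,OPER_IN\n"
    "D,2009-02-02,INPAT\n"
)
print(rows_per_year(demo))
```

Running the same tally on the November 2022 and September 2023 files and comparing the per-year counts would show in which years rows went missing.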
