CFDE Biomarkers partnership project
For the Biomarkers project, the UNM-IDG Team is developing a dataset of clinically relevant molecular biomarkers from the Cerner HealthFacts and RealWorldData databases, which contain deidentified EHR data, including LOINC codes for laboratory tests. Named entity recognition (NER) associates LOINC terms with biomolecules, particularly genes and proteins, which are the initial focus of this study.
- Download LOINC db from loinc.org
- relatednames_table.py - Split the Loinc.csv relatednames2 column into a separate table.
- Go_loinc_DbCreate.sh - Build PostgreSQL db from Loinc.csv and relatename.tsv.
- Go_loinc_GetData.sh - Query db for chemicals with names, relatednames.
- Go_loinc_NER_tagger_gene.sh - NER for genes using JensenLab Tagger.
- Go_loinc_NER_leadmine_gene.sh - NER for genes using NextMove Leadmine.
- Go_hf_labs.sh, hf_lab_loinc_counts.sql - Query Cerner db for labs.
- biomarkers_loinc_hf.Rmd
  - Generate list of clinically relevant molecular biomarker candidates.
  - Count encounters and patients for all LOINC codes (chemical).
  - Group lab procedures into list; aggregate on LOINC codes.
  - Group protein synonyms; aggregate on LOINC codes.
  - Sort LOINC codes by occurrence, as a proxy for clinical relevance.
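
Two steps of the workflow above, splitting the related-names column and ranking LOINC codes by occurrence, can be sketched in Python as follows. This is an illustration only, not the project's code: the function names `split_related_names` and `rank_by_occurrence`, the semicolon delimiter, the `LOINC_NUM` and `RELATEDNAMES2` column headers, and the shape of the lab rows are all assumptions.

```python
import csv
import io
from collections import defaultdict


def split_related_names(loinc_csv, column="RELATEDNAMES2", sep=";"):
    """Yield (loinc_num, related_name) pairs, one per distinct name.

    Assumes `loinc_csv` is an open text file with a header row that
    includes LOINC_NUM and the delimited related-names column.
    """
    for row in csv.DictReader(loinc_csv):
        seen = set()
        for name in (row.get(column) or "").split(sep):
            name = name.strip()
            if name and name not in seen:  # skip blanks and repeats
                seen.add(name)
                yield row["LOINC_NUM"], name


def rank_by_occurrence(lab_rows):
    """Count distinct encounters and patients per LOINC code.

    `lab_rows` is an iterable of (loinc_num, patient_id, encounter_id)
    tuples (a hypothetical layout). Returns (loinc_num, n_encounters,
    n_patients) tuples sorted by descending encounter count.
    """
    encounters = defaultdict(set)
    patients = defaultdict(set)
    for loinc_num, patient_id, encounter_id in lab_rows:
        encounters[loinc_num].add(encounter_id)
        patients[loinc_num].add(patient_id)
    ranked = [
        (code, len(encounters[code]), len(patients[code]))
        for code in encounters
    ]
    # Most frequent first; ties broken by patient count, then by code.
    ranked.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return ranked


if __name__ == "__main__":
    # Toy rows for illustration only; not real LOINC or EHR content.
    toy_csv = io.StringIO("LOINC_NUM,RELATEDNAMES2\nX1,alpha; beta\nX2,\n")
    for pair in split_related_names(toy_csv):
        print(pair)
    toy_labs = [("X1", "p1", "e1"), ("X1", "p2", "e2"), ("X2", "p1", "e3")]
    for row in rank_by_occurrence(toy_labs):
        print(row)
```

Counting distinct encounters and patients with sets, rather than raw rows, keeps repeated lab results within one encounter from inflating the ranking. Whether that matches the counts produced by hf_lab_loinc_counts.sql is not stated here.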