big-life-lab / phes-odm

The Public Health Environmental Surveillance Open Data Model (PHES-ODM, or ODM). A data model, dictionary and support tools for environmental surveillance.

License: Creative Commons Attribution Share Alike 4.0 International

R 24.96% Python 75.04%
covid-19 covid19 data-dictionary metadata naming-conventions odm sars-cov-2 vocabulary wastewater

phes-odm's People

Contributors

andreia3aral, dougmanuel, emerc079, gaurisaran, himeshis, hswerdfe, jeandavidt, martinwellman, mathew-thomson, nielsnicolai, rvyuha, vipileggi, wyusuf068, yisheikh, yulric

phes-odm's Issues

pooled samples

Pooled samples can be handled in the current model by adding a field for the parent sample. All samples that went into the one pooled sample would be labelled with the same ID.

Site data should have optional start date and end date fields

Site metadata should have optional start date and end date fields (which would allow multiple lines for the same site) in case sampling characteristics change over time (e.g., sampleType or assayMethod). This allows linkage between site data and sample data to indicate that different observations correspond to different sampling methods.
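
A minimal sketch of how optional start and end dates could link a sample to the right site record. The `SiteRecord` class, the `startDate` and `endDate` field names, and the `site_record_for` helper are hypothetical and not part of the current model; an empty date is assumed to mean "open-ended".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SiteRecord:
    siteID: str
    sampleType: str
    startDate: Optional[date] = None  # None = no lower bound (assumption)
    endDate: Optional[date] = None    # None = still current (assumption)


def site_record_for(records, site_id, sample_date):
    """Return the site record whose date range covers sample_date, else None."""
    for rec in records:
        if rec.siteID != site_id:
            continue
        starts_ok = rec.startDate is None or rec.startDate <= sample_date
        ends_ok = rec.endDate is None or sample_date <= rec.endDate
        if starts_ok and ends_ok:
            return rec
    return None


# Two lines for the same site: the sample type changed on 2020-10-01.
records = [
    SiteRecord("site1", "rawWastewater", endDate=date(2020, 9, 30)),
    SiteRecord("site1", "primarySludge", startDate=date(2020, 10, 1)),
]
```

With these records, a sample dated 2020-08-15 resolves to the `rawWastewater` line and a sample dated 2020-11-01 resolves to the `primarySludge` line.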

Sample collection terminology

The terminology currently used for the sample collection method is not very clear to our research group. There seems to be a distinction between grab, continuous, discrete and integrated. Are the latter three all different kinds of composite sample? If so, what exactly is the difference between them (is there a reference)? We are used to the terms grab sample, composite sample - time proportional, and composite sample - flow proportional.

  • grab sample: Sample was a simple grab sample
  • contFlowProp: Continuous flow proportional
  • contConstant: Continuous constant time proportional?
  • contOther: Continuous other
  • discTimeProp: Discrete time proportional
  • discTimeProp24hq1h: Discrete time proportional 24-hour composite, every 1 hr (make this more general)
  • discTimeProp24hq4h: Discrete time proportional 24-hour composite, every 4 hr
  • discTimeProp24hq6h: Discrete time proportional 24-hour composite, every 6 hr
  • discFlowProp: Discrete flow proportional
  • discVolumeProp: Discrete volume proportional
  • discOther: Discrete other
  • integratedOther: Integrated other

measurements taken on the same sample on different dates

Currently, measurements that are taken on different dates do not fit into the suggested "wide" format of the table.
They do fit into the long format, but that is less intuitive.
As an example, NML records some information about the state of the sample when it arrives, then typically processes the sample the next day.

Add ability to record a binary measurement

Add ability to record a binary measurement
Like covid detected vs not detected, like in the dipstick or pregnancy style tests that are under development.

Suggest adding a field isDetected to the Measurement table. LOQ and LOD from the assayMethod table would help quantify its level for a given test.
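
A rough sketch of how a binary `isDetected` field could sit next to LOD and LOQ. The class layout, the optional `value` field, and the `interpret` helper are illustrative assumptions, not the model's actual tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssayMethod:
    assayID: str
    LOD: Optional[float] = None  # limit of detection, in the assay's units
    LOQ: Optional[float] = None  # limit of quantification, same units


@dataclass
class Measurement:
    sampleID: str
    assayID: str
    isDetected: bool               # the proposed binary field
    value: Optional[float] = None  # absent for purely binary tests


def interpret(m: Measurement, assay: AssayMethod) -> str:
    """Describe a result using the binary flag plus the assay's LOD and LOQ."""
    if not m.isDetected:
        if assay.LOD is not None:
            return f"not detected (below LOD of {assay.LOD})"
        return "not detected"
    if m.value is not None and assay.LOQ is not None and m.value < assay.LOQ:
        return "detected, below LOQ"
    return "detected"
```

A dipstick-style test would only fill in `isDetected`; a qPCR result could fill in both the flag and a value.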

Create a working group

Create a working group to help development.

Include people from:

  • different jurisdictions: provinces
  • International representation
  • lab settings: private, university/not-for-profit, national
  • disciplines representing lab, public health, epidemiology/modelling, etc.

pH COD Temp

Add some WWTP parameters: pH, COD, and temperature of the wastewater.

how to record standard WWTP params

How do we record standard WWTP parameters, like:

  • flow rate
  • TSS
  • BOD

Two options:
1. In the measurements table, change genregion to measurementCategory and add some options to the possibilities, like this:

  • measurementCategory:
    • covid-unspecified (default)
    • covid-N1
    • covid-N2
    • covid-N3
    • covid-E
    • covid-RdRp
    • covid-N1N2avg
    • ww-param-flow
    • ww-param-tss
    • ww-param-Bod

LOD

LOD should be added to the data model.

This goes in AssayMethod.

assayMethod.version

Note that if you update the assayMethod.version field (because an improvement was made to the assay, leading to a new version), all measurements that were performed in the past will also be linked to the updated version of the assay method, and thus they will lose their history. The same applies to AssayMethod.date and Lab.updateDate.

Moreover, because the AssayMethod table does not have a name field, it is currently impossible to see that two versions of the same assay method exist.

Inhibition

Inhibition should be added to the data model.

This goes in either AssayMethod or Measurement.

LOQ

LOQ should be added to the data model.

This should be added to the AssayMethod table.

methodCollection discrete time prop

methodCollection: method used to collect the data.
discTimeProp: Discrete time proportional
discTimeProp24hq1h: Discrete time proportional 24-hour composite, every 1 hr
discTimeProp24hq4h: Discrete time proportional 24-hour composite, every 4 hr
discTimeProp24hq6h: Discrete time proportional 24-hour composite, every 6 hr

The trigger time will vary. For our hospital work it was every 15 minutes. I'm not sure what is going on with NWT, as they are handling that. Also, the total integration time varies; in NWT it can be 48 to 72 hours. So there needs to be some plasticity here.
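
One possible way to get that flexibility is to record the total integration time and the trigger interval as separate values rather than as a fixed set of codes. The sketch below is only an illustration: the `DiscreteTimeProp` class, its field names, and the `15min` suffix for sub-hourly intervals are assumptions, not part of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteTimeProp:
    totalHours: float       # total integration time, e.g. 24, 48 or 72
    intervalMinutes: float  # time between sub-samples, e.g. 15 or 60

    def code(self) -> str:
        """Build a code in the style of the existing discTimeProp24hq1h values."""
        if self.intervalMinutes % 60 == 0:
            step = f"{int(self.intervalMinutes // 60)}h"
        else:
            # Hypothetical suffix for intervals that are not whole hours.
            step = f"{int(self.intervalMinutes)}min"
        return f"discTimeProp{int(self.totalHours)}hq{step}"

    def subsample_count(self) -> int:
        """Number of sub-samples pooled into the composite."""
        return int(self.totalHours * 60 // self.intervalMinutes)
```

For example, `DiscreteTimeProp(24, 60).code()` reproduces the existing `discTimeProp24hq1h`, while a 72-hour composite triggered every 15 minutes would be `DiscreteTimeProp(72, 15)`.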

Create an online form that generates a template for the lab

Create a form for data provider intake.

This form should:

  1. Ask a few quick questions about site location, default sampling, and measurements that are taken.
  2. Save this tombstone information directly to the DB, or other ()
  3. Output an Excel file with a single sheet that is a template for that lab.
  4. Tell them we will be in touch to arrange file drop.

Site and Sample type

I noticed that the options for Site.type (where the sample was taken) and Sample.type (what kind of sample is taken) are rather limited in the current version. Also, both fields are currently very dependent on each other.

Site.type

Following an internal discussion, I would suggest breaking down Site.type by how far the site is removed from the source, as also proposed in this report that was recently published by the UK government.

1. Near-source

  • airplane
  • correctionalFacility
  • elementrarySchool
  • hospital
  • longTermCareFacility
  • sewageTruck
  • universityCampus
  • other

2. Major sewer pipeline

3. Intermediate sewer infrastructure

  • pumpingStation
  • holdTank
  • retentionPond
  • other

4. Wastewater treatment plant

  • sanitaryMunicipal
  • combinedMunicipal
  • industrial
  • lagoon
  • other

5. Environment

  • river
  • lake
  • estuary
  • sea
  • ocean
  • other

Sample.type

A more complete classification for Sample.type could look somewhat like this:

  • sewerSediment
  • rawWastewater
  • wwPostGrit
  • primarySludge
  • primaryEffluent
  • secondarySludge
  • secondaryEffluent
  • water
  • faeces
  • other

concentration measurements replicates vs mean/SD

From a data analysis and statistical point of view, we would greatly benefit from having the concentration measurements of all the replicates rather than a summary statistic (e.g. mean, std dev). This "issue" is a formal request to have replicate measurements as the minimal data set. Thanks!

add a lot of notes fields.

The ‘notes’ field in the ‘Site’ data group can also be added to each of the other data groups to capture any additional ‘site-sample specific’ information.
Vince

DefaultAssayID

While working on the transformation script, I realized that the same lab that measures viral copies per ml may also measure the temperature of the sample. The defaultAssay can't apply to the temperature of the sample.

So we would need another table linking lab with measurement type. It might look like this:

LabDefaults

  • labID
  • measureCat
  • measureUnit
  • measureType
  • defaultAssay
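
A small sketch of how such a LabDefaults table could be queried. The column names follow the list above; the `default_assay` helper and all IDs and category values in the example rows are made up for illustration.

```python
from typing import NamedTuple, Optional


class LabDefault(NamedTuple):
    labID: str
    measureCat: str
    measureUnit: str
    measureType: str
    defaultAssay: str


def default_assay(defaults, labID, measureCat, measureUnit, measureType) -> Optional[str]:
    """Look up the default assay for one lab and one kind of measurement."""
    key = (labID, measureCat, measureUnit, measureType)
    for row in defaults:
        if (row.labID, row.measureCat, row.measureUnit, row.measureType) == key:
            return row.defaultAssay
    return None


# Illustrative rows only: the same lab has different defaults
# for a viral measurement and for sample temperature.
lab_defaults = [
    LabDefault("lab1", "covidN1", "gcMl", "mean", "assayPCR1"),
    LabDefault("lab1", "wwTemp", "degC", "single", "assayThermometer1"),
]
```

Keying the default on the measurement kind as well as the lab means the temperature row no longer inherits the PCR assay.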

suggest reporting both the PMMoV Ct value and PMMoV copies/X amount of sludge

Received this comment:

The amount of sample loaded in qPCR plates might vary from lab to lab and so reporting only PMMoV Ct value when working with sludge will not allow quick comparison between studies (one would need to look into individual methods and adjust all data). If comparison of pepper level in sludge is something of interest, I would suggest reporting both the PMMoV Ct value and PMMoV copies/X amount of sludge to allow for direct comparison.

I need to think about whether we are currently able to do this.

Line level case data, could be added

Line level case data could be added to the data model with an additional table:

  • caseID: case-id of the reporter.
  • reporterID
  • postalCode_3_6
  • age
  • sex
  • onsetDate
  • reportedDate
  • recoveryDate
  • DeathDate

Propose rotating the public health Information Details table

I wonder if we should rotate the CovidPublicHealthData table. This would allow different reporters to provide various kinds of information, such as hospital admission numbers and case counts.
The new proposed table would be:

CovidPublicHealthData

  • publicHealthID: (Primary key) Unique identifier for the table.
  • ReporterID: ID of the reporter who gave this data.
  • PolygonID: Links with the Polygon table (foreign key).
  • date: Date of covid-19 measure.
  • dateType: Type of date used.
    • episodeDate: Episode date is usually just the earliest of a list of dates available, as not every case has every date.
    • onsetDate: Earliest that symptoms were reported for this case.
    • reportedDate: Date that the numbers were reported publicly.
  • publicHealthMeasuretype: Type of measure reported.
    • confirmed: Number of confirmed cases.
    • active: Number of active cases.
    • tests: Number of tests.
    • positiveTests: Number of positive tests.
    • percentPositivityRate: Percent positivity rate.
    • hospitalCensus: Hospital census or the number of people admitted with covid-19.
    • hospitalAdmit: Hospital admissions or patients newly admitted to hospital.
  • publicHealthMeasureValue: numeric
  • notes: notes
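
To illustrate the rotation, here is a sketch that turns one row with a column per measure into one row per measure. It assumes the current table has one column for each of the measure types listed above; the `rotate` function is hypothetical, and `publicHealthID` is left out on the assumption that the database assigns it.

```python
MEASURE_TYPES = [
    "confirmed", "active", "tests", "positiveTests",
    "percentPositivityRate", "hospitalCensus", "hospitalAdmit",
]


def rotate(wide_row: dict) -> list:
    """Turn one wide row into one long row per non-missing measure."""
    long_rows = []
    for measure in MEASURE_TYPES:
        value = wide_row.get(measure)
        if value is None:
            continue  # this reporter did not supply this measure
        long_rows.append({
            "ReporterID": wide_row["ReporterID"],
            "PolygonID": wide_row["PolygonID"],
            "date": wide_row["date"],
            "dateType": wide_row["dateType"],
            "publicHealthMeasuretype": measure,
            "publicHealthMeasureValue": value,
        })
    return long_rows


# Example values are made up.
wide = {
    "ReporterID": "rep1", "PolygonID": "poly1",
    "date": "2020-11-01", "dateType": "reportedDate",
    "confirmed": 120, "hospitalAdmit": 7,
}
long_rows = rotate(wide)
```

A reporter that only supplies hospital admissions then contributes only `hospitalAdmit` rows, without leaving the other columns empty.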

Allow access to data

Currently, all data access permissions are part of the reporter. This means that whenever a single reporter wants to share both (i) data that is visible to all and (ii) data that has access limitations, multiple instances of the same reporter need to be created in the database, which does not seem very efficient. Would it therefore not be better to have all the allowAccess fields at the level of the measurement (or sample)?

Add UUID for long table

measureID is used for multiple measures (i.e. gene regions) performed at the same time. However, we also need a unique ID for the long table. Add a UUID as the long table's unique ID.

Use cases:
Use a new measureID when a lab repeats analyses to confirm or perform QA.
Use UUID only for long tables. UUID is not needed when using a wide table format.

Create the UUID as a sequential number derived from measureID, e.g. if measureID = 100,
then UUID = 100.1, 100.2, ...
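
A minimal sketch of that numbering scheme, assuming long-table rows are plain dictionaries with a `measureID` key. The `assign_uuids` helper and the `uuid` key are hypothetical names.

```python
def assign_uuids(rows):
    """Add a 'uuid' of the form '<measureID>.<n>' to each long-table row.

    n counts rows sharing a measureID, in input order, starting at 1.
    """
    counters = {}
    out = []
    for row in rows:
        mid = row["measureID"]
        counters[mid] = counters.get(mid, 0) + 1
        out.append({**row, "uuid": f"{mid}.{counters[mid]}"})
    return out


rows = [
    {"measureID": 100, "measureCat": "covidN1"},
    {"measureID": 100, "measureCat": "covidN2"},
    {"measureID": 101, "measureCat": "covidN1"},
]
with_ids = assign_uuids(rows)
```

The ID is built as a string on purpose: stored as a number, `100.1` and `100.10` would be indistinguishable.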

what is and is not a unit?

I was creating a template for Chand at NML.
He was telling me that PMMV is not a unit, so we might need to change that. These are the headers we found in his header row, where headers are of the form

measureCat_measureUnit_measureType

  • Measurement.covidN1_cycles_singleton_1
  • Measurement.covidN1_cycles_singleton_2
  • Measurement.covidN2_cycles_singleton_1
  • Measurement.covidN2_cycles_singleton_2
  • Measurement.mhv_cycles_singleton_1
  • Measurement.mhv_cycles_singleton_2
  • Measurement.PMMV_cycles_singlton_1
  • Measurement.PMMV_cycles_singlton_2
  • Measurement._notes
  • Measurement.covidN1_ml_singleton_1
  • Measurement.covidN1_ml_singleton_2
  • Measurement.PPMV_ml_singleton_1
  • Measurement.PPMV_ml_singleton_2

The values that come off the machine are in units of cycles; once he normalizes them by a standard curve, they are in copies per ml.

So in the end, the issue is to move PMMV out of measureUnit and into measureCat.
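
For reference, a sketch of how headers of this form could be split into their parts. The regular expression and the `parse_header` helper are assumptions about the naming convention described above (with a trailing replicate index), not an existing tool.

```python
import re

# <table>.<measureCat>_<measureUnit>_<measureType>[_<index>]
HEADER_RE = re.compile(
    r"^(?P<table>\w+?)\.(?P<measureCat>[^_]+)_(?P<measureUnit>[^_]+)"
    r"_(?P<measureType>[^_]+)(?:_(?P<index>\d+))?$"
)


def parse_header(header: str):
    """Split a wide-template header into its parts, or return None if it does not match."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["index"] is not None:
        parts["index"] = int(parts["index"])
    return parts
```

Parsed this way, `Measurement.PMMV_cycles_singlton_1` puts PMMV in the measureCat position and cycles in the measureUnit position, while `Measurement._notes` does not match and is returned as `None`.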

Surrogate Recovery

Surrogate Recovery should be added to the data model.

This goes in either AssayMethod or Measurement.

ETL - script wide to long

Create a generic ETL tool that transforms any data from the generic wide structure to the generic long format.
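
A rough sketch of what the core of such a transformation might look like, using only the standard library. It assumes wide measurement columns are named `Measurement.<measureCat>_<measureUnit>_<measureType>_<index>` and that `sampleID` identifies the row; the `wide_to_long` function and the output keys are hypothetical.

```python
def wide_to_long(wide_row: dict, id_columns=("sampleID",)) -> list:
    """Melt one wide row into long rows, one per measurement column.

    Columns that do not fit the assumed naming pattern (e.g. notes)
    and empty cells are skipped.
    """
    ids = {col: wide_row[col] for col in id_columns}
    long_rows = []
    for column, value in wide_row.items():
        if column in id_columns or value in (None, ""):
            continue
        table, _, rest = column.partition(".")
        parts = rest.split("_")
        if (table != "Measurement" or len(parts) != 4
                or "" in parts or not parts[3].isdigit()):
            continue
        cat, unit, mtype, index = parts
        long_rows.append({
            **ids,
            "measureCat": cat,
            "measureUnit": unit,
            "measureType": mtype,
            "index": int(index),
            "value": value,
        })
    return long_rows


# Example values are made up.
wide = {
    "sampleID": "s1",
    "Measurement.covidN1_cycles_singleton_1": 31.2,
    "Measurement.covidN1_cycles_singleton_2": 30.8,
    "Measurement._notes": "cloudy sample",
}
long_rows = wide_to_long(wide)
```

Each replicate becomes its own long row, so no summary statistic is needed to fit the data into the long format.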
