
nerc-ceh / observation-management-system


Management system for processing observation data from raw state to differing levels of data products, then making these products available to users.

License: GNU General Public License v3.0

R 0.08% Scala 7.27% HTML 90.70% Web Ontology Language 1.77% Python 0.18%


Observation Management System

Introduction

The purpose of this project is to provide a management system that can work with observations generated by any of the CEH internal sensor networks, or by other observation-generating processes such as sampling, chemical and biological analysis, or model output. The goals that fall under this main purpose include:

  • storing observations with the semantic data necessary to support the OGC O&M standard
  • real-time quality-control (QC) checks to generate qualitative and quantitative meta-data regarding the quality and uncertainty for every observation
  • real-time model execution to create derived data and forecasts as the observations necessary for input arrive
  • real-time alerts and warnings on observations, model output, and forecasts, triggered by pre-defined and adaptive criteria

This system, while designed with the O&M standard in mind, will not provide the functionality necessary to support SOS calls for observation data. A catalogue and higher-level software will handle those calls and wrap access to this system.

More information on the different areas of this project can be found in their respective documentation, listed here:

Technologies

There are three main technologies this project builds upon:

  • Apache Kafka
  • Apache Flink
  • Apache Cassandra

Kafka is the message-queue software used to logically store data between processing bolts and before entry into the database. Cassandra is the persistent storage for the observation data and the processed data. Flink is the processing framework, chosen over Apache Storm and Apache Spark because of the capabilities this project needs (best summed up here). While at present Spark appears to have better distributed ML libraries, there are many third-party Scala libraries that can make up this deficit. For the aggregation of data from multiple networks and potential two-way communication, Apache NiFi appears to be the best choice.

Related Software Not Used

  • Prometheus
  • Graphite (and the more relevant Cyanite) + Grafana
  • InfluxDB
  • openTSDB
  • KairosDB
  • ElasticSearch + Kibana
  • OpenMCT

Types of Data

Raw Observation Data

Raw observation data, in the context of this system, includes sensor data, data generated by abstract procedures such as the chemical analysis of a sample, manual measurements and samples, and data of a similar nature. It also includes derived observations generated outside of this system. For example, the HOBO temperature and relative humidity sensor on the Lake Observation Platforms generates observations for dew point temperature, which is derived from the sensed temperature and relative humidity observations. As this is not generated within the management system, it is classed as raw observation data and not derived data.
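To make the dew point example concrete, the following is a minimal sketch of how such a value can be derived from temperature and relative humidity. It uses the Magnus approximation with one commonly used set of coefficients; this is an assumption for illustration, as the formula the HOBO sensor actually applies is not stated here, and the function name is hypothetical.

```python
import math

# One commonly used set of Magnus coefficients (over water).
# Assumption: the sensor's own derivation may use a different formula.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_celsius(temperature_c: float, relative_humidity_pct: float) -> float:
    """Approximate dew point (degC) from air temperature (degC) and RH (%)."""
    if not 0.0 < relative_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in the range (0, 100]")
    gamma = math.log(relative_humidity_pct / 100.0) + (
        MAGNUS_A * temperature_c
    ) / (MAGNUS_B + temperature_c)
    return (MAGNUS_B * gamma) / (MAGNUS_A - gamma)
```

Whatever formula is used, the classification rule stays the same: because the value arrives already computed by the sensor, the management system stores it as raw observation data.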

Derived Data

Derived data in this context is any observation or data generated by the management system. This can take the form of derived observation data, such as the thermocline depth observation, which is generated from observations sensed by the stratified PRT chain. It can also take the form of process output such as QC checks, forecasts, and the aggregation of observation data to hourly and daily mean observations.
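As an illustration of the aggregation case, here is a minimal batch sketch of turning raw observations into hourly mean observations. It is not the project's Flink job; the function name and the `(timestamp, value)` input shape are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def hourly_means(observations):
    """Group (timestamp, value) pairs by hour and return the mean per hour.

    Returns a dict mapping the start of each hour to the mean of the
    values observed within that hour, ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in observations:
        # Truncate the timestamp to the start of its hour.
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: fmean(values) for hour, values in sorted(buckets.items())}
```

In a streaming setting the same grouping would typically be expressed as a time window over the observation stream rather than a pass over a finished list.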

The distinction between observation data and derived data is important in the rationale behind the persistence and backup choices on different Kafka queues. A distinction is also made between short-lived and long-lived derived data, where short-lived data has a TTL value set and long-lived data is held indefinitely.

Long-Lived

Derived data products such as the hourly and daily observation aggregates, and their extended interpolated representations, are examples of long-lived derived data. They are of use to users who wish to work at a higher temporal aggregation than the raw observations allow, or who need a full (interpolated) series of observations rather than the original, which may have missing values. Another example of long-lived derived data is the QC check observations. These are of interest when analysing potential issues with a sensor, and they allow users of the data to better understand the context of an observation.

Short-Lived

Short-lived data refers to derived data which has a short time-frame of interest, such as forecasts or certain model outputs. For example, a forecast generated on a Monday for the following Tuesday to Friday becomes less interesting by the Saturday, and the need to keep the output past the period of interest becomes questionable when it can be reconstructed at will. If there are any criteria or checks on the forecast, it is conceivable that these may be worth keeping. For short-lived data a TTL value is set within Cassandra.
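The forecast example can be sketched as follows: compute a TTL that expires shortly after the period of interest ends, then attach it to the write with the CQL `USING TTL` clause. The table name, column names, one-day grace period, and function names are all hypothetical; only the `USING TTL` clause itself is standard CQL.

```python
from datetime import datetime, timedelta, timezone


def ttl_seconds(period_end: datetime, now: datetime,
                grace: timedelta = timedelta(days=1)) -> int:
    """Seconds from `now` until `period_end` plus a grace period.

    Never returns less than 1, because a TTL of 0 in Cassandra means
    "no expiry" rather than "expire immediately".
    """
    remaining = (period_end + grace) - now
    return max(int(remaining.total_seconds()), 1)


def forecast_insert_cql(ttl: int) -> str:
    """Build a parameterised CQL insert for a hypothetical forecast table."""
    return (
        "INSERT INTO forecast_output (feature, forecast_time, value) "
        "VALUES (?, ?, ?) "
        f"USING TTL {ttl}"
    )


# A forecast generated on a Monday that covers up to the Friday.
generated = datetime(2017, 1, 2, tzinfo=timezone.utc)   # Monday
period_end = datetime(2017, 1, 6, tzinfo=timezone.utc)  # Friday
statement = forecast_insert_cql(ttl_seconds(period_end, generated))
```

With these values the row would expire on the Saturday, matching the point at which the forecast stops being of interest. Long-lived derived data would simply be written without a TTL.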

Data Flow

TBC.

Semantic Annotation, Data Persistence

TBC.

QC

TBC.

Aggregation

TBC.

