wis2-guide's Issues

Volume C1

Suspend maintenance of Volume C1.

link discrepancy between GDC and GC

As per https://wmo-im.github.io/wis2-guide/guide/wis2-guide-DRAFT.html#_global_discovery_catalogue:

A Global Discovery Catalogue will only update discovery metadata records to replace links for dataset subscription and notification (origin) with their equivalent links for subscription at Global Broker instances (cache).

...given the lifecycle of a WCMP2 record:

  • WIS2 Node publishes WNM pointing to WCMP2 record
  • GB receives message
  • GC (subscribed to GB) stores WCMP2 record to cache
  • GDC (subscribed to GB) detects (via origin/a/wis2/+/metadata/#)
  • GDC downloads from GC
  • GDC updates links of GB and GC
  • GDC publishes updated record to its OGC API - Records API

The GC still has the original WCMP2 document with the original subscription/notification links, because the changes the GDC made to the WCMP2 document were not propagated back to the GC.
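To make the discrepancy concrete, the GDC's rewrite step can be sketched as follows. This is a minimal illustration of the origin -> cache transposition described above; the link structure follows the WCMP2 "links" array, but the channel naming and broker URLs here are assumptions for illustration only:

```python
# Hypothetical sketch of the GDC link-replacement step (origin -> cache).
# Broker URLs and channel values are invented for illustration.

def replace_origin_links(record: dict, global_brokers: list[str]) -> dict:
    """Replace MQTT subscription links pointing at the origin WIS2 Node
    with equivalent links for each Global Broker."""
    new_links = []
    for link in record.get("links", []):
        channel = link.get("channel", "")
        if link.get("href", "").startswith("mqtt") and channel.startswith("origin/"):
            # Drop the origin link; add one cache link per Global Broker.
            cache_channel = "cache/" + channel[len("origin/"):]
            for broker in global_brokers:
                new_links.append({**link, "href": broker, "channel": cache_channel})
        else:
            new_links.append(link)
    record["links"] = new_links
    return record
```

The GC, by contrast, stores the record as received, so its copy never goes through this function; that asymmetry is the discrepancy raised here.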

Options:

  1. have GDC notify on the updated WCMP2 record (topic TBD) so GC can update
  2. other options?

cc @antje-s @6a6d74 @golfvert

Indicate to Global Caches how long a resource should be cached for

As discussed at ET-W2AT meeting, 16-Oct-2023.

WIS2 Manual (Tech Regs) states:

"""
4.5.4 Based on received notifications, a Global Cache shall download core data from WIS nodes or other Global Caches and store for a duration compatible with the real-time or near real-time schedule of the data and not less than 24-hours.
"""

The default period for caching resources is 24 hours, but some (near) real-time data might need to be kept longer, based on the "real-time or near real-time schedule". How long data should be cached is a WMO Programme decision, agreed with the WIS2 operators in a process that is yet to be defined.

A mechanism is required to inform Global Caches to cache content for a duration other than 24-hours - usually longer, but perhaps shorter too.

It seems likely that this would be a per-object policy, i.e., each object that is cached should indicate the length of time it should be cached. The simplest way to do this would be to add a cache-duration attribute to the notification message.

A Global Cache may ignore the cache-duration statement, for example, if the duration was set to an excessively long period that compromised the ability of the Global Cache to perform. In "tech regs speak": the Global Cache should respect the cache-duration value.
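As a sketch of what such a per-object policy might look like: assuming a hypothetical cache_duration attribute on the notification message (the name, its unit in hours, and the cap below are all assumptions, not agreed specification), a Global Cache could honour the value while protecting its own capacity:

```python
# Sketch: interpreting a hypothetical "cache_duration" property on a WNM.
# Attribute name, unit (hours) and cap are assumptions, not agreed spec.

DEFAULT_HOURS = 24      # baseline from the Manual on WIS ("not less than 24-hours")
MAX_HOURS = 24 * 30     # operator-chosen cap; a Cache may ignore excessive values

def effective_cache_hours(message: dict) -> int:
    requested = message.get("properties", {}).get("cache_duration", DEFAULT_HOURS)
    if not isinstance(requested, int) or requested <= 0:
        return DEFAULT_HOURS
    # "should respect": honour the request, but bound it to protect the Cache.
    return min(requested, MAX_HOURS)
```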

Tom's list

Guidance for data consumers

Guidance for data consumers subscribing to notifications from Global Broker; managing duplicates vs. managing failover; use of replay service; etc.
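Such guidance could include a sketch like the following: a consumer subscribed to two or more Global Brokers for failover deduplicates notifications by the WNM "id" field. The field name follows the WNM; the bounded eviction policy is an illustrative choice, and a real client would also expire entries by time:

```python
# Sketch: deduplicating notifications when subscribed to several Global
# Brokers for failover. Keys on the WNM "id" field (a UUID per message).

from collections import OrderedDict

class Deduplicator:
    def __init__(self, max_entries: int = 100_000):
        self.seen = OrderedDict()
        self.max_entries = max_entries

    def is_new(self, message: dict) -> bool:
        msg_id = message["id"]
        if msg_id in self.seen:
            return False                   # already handled via another GB
        self.seen[msg_id] = None
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # evict the oldest entry
        return True
```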

Assignment of topics for multidisciplinary datasets

Posting a question from @masato-f29 from the NWPMetadata team. 

"Sea surface temperature can be included in weather, climate, and oceans. Should sea surface temperature data be tagged with three controlled vocabularies (CV): weather, climate, and ocean? Or should we make it exclusive and propose the additional CV at level 8?

DECISION

The topic hierarchy will only be used to identify a channel for pub/sub. When a dataset is applicable under multiple domains, one should choose the domain that is the best fit and use the WCMP2 metadata record for further description.

clarify GDC population at startup

The WIS2 Guide states in section 8.4.1:

A Global Cache will store a full set of discovery metadata records. This is not an additional metadata
catalogue that Data Consumers can search and browse – it provides a complete set of discovery
metadata records to support populating a Global Discovery Catalogue instance.

This ensures that a GDC can initialise itself from already-published discovery metadata in the event of a catastrophe or re-deployment.

Options:

  1. have Global Caches store all discovery metadata at a known endpoint for a GDC to bootstrap itself on init
  2. re-define the above and specify that WIS2 Nodes are required to re-publish all discovery metadata on a periodic basis (weekly?)
  3. other options?
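As an illustration of option 1, a GDC could bootstrap from a metadata archive exposed by a Global Cache at a known endpoint. The archive layout assumed here (a ZIP of WCMP2 JSON records) is an assumption for illustration only; the actual endpoint and format would need to be specified:

```python
# Sketch of option 1: ingesting a Global Cache's metadata archive on
# GDC start-up. The ZIP-of-JSON layout is an illustrative assumption.

import io, json, zipfile

def load_metadata_archive(zip_bytes: bytes) -> list[dict]:
    """Extract all WCMP2 records from a metadata archive ZIP."""
    records = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                records.append(json.loads(zf.read(name)))
    return records
```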

cc @golfvert @6a6d74 @efucile

Simple NC/DCPC data sharing in data-and-metadata-flows.adoc

In my understanding, the simple NC/DCPC data sharing described in data-and-metadata-flows.adoc is not part of WIS2 and should be removed. Also, WIS2 Nodes are not required to operate a local metadata catalogue.

If this section is intended to illustrate the concept, this should be made explicit.

remove section 1.2.5

ET-W2AT 2024-01-08: remove 1.2.5, which is superseded by:

1.2 What is WIS2?

1.3 Why are Datasets so important?

Make metadata an obligatory requirement

I'm not sure if this is the correct place for this issue.

As far as I recall, we decided that metadata should be a prerequisite for transmitting data over WIS2. Somehow this requirement should be enforced. Following the discussions here
wmo-im/wis2-notification-message#25
and here
wmo-im/wis2-notification-message#31

I currently don't see a place in the protocol / message flow where we can check if metadata is available.

We either need to decide where such a check can be placed, or we need to accept that, despite a strong requirement for metadata, data without metadata may still be published.
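To make the discussion concrete, one candidate check is sketched below: on receipt of a data notification, a Global Service verifies that the referenced discovery metadata (properties.metadata_id in the WNM) is already known to the catalogue. The catalogue lookup is stubbed with a plain set; where this check would sit in the flow is exactly the open question:

```python
# Sketch of one possible enforcement point: reject (or flag) data
# notifications whose discovery metadata is not yet in the catalogue.
# The set-based lookup is a stand-in for a real GDC query.

def metadata_available(message: dict, known_metadata_ids: set[str]) -> bool:
    """True if the WNM references discovery metadata the catalogue holds."""
    metadata_id = message.get("properties", {}).get("metadata_id")
    return metadata_id is not None and metadata_id in known_metadata_ids
```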

cc: @golfvert @antje-s @josusky @tomkralidis

Portal for WIS2

Need for a shiny portal for WIS2. The purpose is to have a platform offering:

  • a comprehensive overview of WIS 2.0 and its architecture
  • guidance on how to discover and access data
  • multi-language support

ICAO and GTS headers

  • Need to decide on how to maintain and manage GTS headers during the transition

  • something for INFCOM-3

Update Competencies to be submitted to INFCOM-4

Competencies are described in the Manual only. We need to review them and decide whether they stay in the Manual or move to the Guide. We don't want competencies in both the Manual and the Guide.

problems rendering metrics hierarchy CSV tables

@kaiwirt the metrics hierarchy CSV inclusions in sections/part2/global-services.adoc cause generation of docx to fail:

To reproduce:

asciidoctor --trace --backend docbook --out-file - index.adoc | pandoc --from docbook --to docx --output wis2-guide-DRAFT.docx
asciidoctor: WARNING: tables must have at least one body row
asciidoctor: WARNING: tables must have at least one body row
asciidoctor: WARNING: tables must have at least one body row
Invalid XML:
1010:112 (81273)-1010:122 (81283): Expected end element for: Name {nameLocalName = "emphasis", nameNamespace = Just "http://docbook.org/ns/docbook", namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = "literal", nameNamespace = Just "http://docbook.org/ns/docbook", namePrefix = Nothing})
make: *** [docx] Error 44

HTML generation does not fail; however, the tables are not rendered correctly. Example: https://wmo-im.github.io/wis2-guide/guide/wis2-guide-DRAFT.html#_metrics_for_global_brokers

Global Cache behaviour for recommended data

ET-W2AT 08.01.2024: Global Caches only handle core data and do not subscribe to or republish WNM for recommended data. Two issues may arise from this when data users subscribe to origin/a/wis2/#:

  • Download requests go directly to WIS2 Nodes, which may not offer enough bandwidth. This might also impact the performance of Global Caches when accessing those WIS2 Nodes.
  • Bad user experience if WIS2 Nodes limit access to their HTTP endpoints for core data to Global Services (failed downloads).

In the pre-operational phase we need to check if this behaviour is an issue.

normalize terms

  • WIS2 Register
  • Centre (not Center)
  • centre-id
  • WIS2 Node
  • Global Services (Global Broker, Global Discovery Catalogue, Global Cache)
    • Global Monitor (not Global Monitoring)
  • WIS2 or WIS 2.0
  • WTH, WNM, WCMP2
  • WIS2 Notification Message
  • WIS2 Topic Hierarchy
  • WMO Core Metadata Profile 2
  • WMO vs. WMO Secretariat
    • WMO: the entire membership + Secretariat
    • WMO Secretariat: support for Members and constituent bodies

specify guide for report format

WIS2 users are able to publish notifications for data, metadata, and reports. An example of a report notification topic:

  • origin/a/wis2/can/eccc-msc/report

Examples:

  • GDC publishing the availability of the metadata archive (per #9 (comment))
  • Global Services providing notifications about metrics in support of Global Monitoring
  • other examples?

Areas for clarification:

  • when a message is published to this topic, should the payload be a WNM, or should other/extended payloads be allowed?
  • if WNM, guidance is determined in the WNM issue tracker proper
  • if not WNM, provide direct guidance in the Guide

Access control of recommended data

A data provider can implement access control for recommended data. Do we have a requirement to allow only a limited number of mechanisms to simplify the task of a user who needs to subscribe to different streams of recommended data?

approval of centre-id between fast-track cycles

Once WIS2 enters the operational phase in 2025, the WIS2 Topic Hierarchy will be managed via the WMO Fast Track process.

At the November W2AT face-to-face meetings, an experimental level was added for all domains to use for provisional domain topics in support of publishing data between FT cycles.

While this provides utility for new domain topics between FT cycles, we need to consider how to register new centre-ids as well (which are managed at a higher level, without an experimental option).

cc @golfvert @6a6d74 @efucile

WIS2 Node broker links exposure in WCMP2 via GB subscriptions

cc @antje-s

Currently, when WCMP2 metadata is published, the workflow is that the GDC transposes/injects GB URLs for WCMP2 links (origin -> cache) which are bound to WIS2/MQTT/WTH subscription/download. A link is added for each GB that is part of the WIS2 Global Services ecosystem. The "origin" link is never part of the resulting WCMP2 record in the GDC.

There is an edge case: a consumer subscribing to the GB with topic .../metadata/... will receive the "raw" WCMP2 document, which has the "origin" links.

A WIS2 Node may or may not want their MQTT endpoint made available.

In summary, the GDC updates the WCMP2 record links per above, but the GB does not. Should it?

@6a6d74 @golfvert @kaiwirt

Additional work needed on "Registration and removal of a dataset" and "Connecting with global services" sections

Further review is needed on these sections to confirm how the process will work.

@klsheets comments:

The things you point out regarding the GDC, authenticating metadata, and the role of GISCs are the items I struggled most to document when drafting these edits. I was largely following the workflow diagrams we reviewed in November, provided by Remy, and Annexes to the meeting report. This brings up an additional question: should those diagrams be linked or included as an Annex in the Guide?

(original updates were in PR #62)

add faceting specification to Global Discovery Catalogue

As mentioned in TT-WISMD 2023-05-05 by Tom, the GDC should use facets.

Faceting offers filter options for search result lists. As the volume of metadata grows, it is important to give users additional ways to narrow down a result list. Faceting is based on indexed fields used as categories; all distinct values contained in the search result items are offered as filter options.

Example: a full-text search for "temperature" in our WIS Portal at https://gisc.dwd.de, with the facet filter "Originator" (under "Filter by:") set to "NC Mali", reduces the result list from 63667 matches to 26.

All indexed fields could be used as facets. Different implementations can reveal advantages and disadvantages in practice over time, so in my view we should minimize requirements. For the same reason, WCMP2 should use fixed word lists where possible, to avoid spelling variants, typos, and different words for the same content, as these lead to different filter options and unnecessarily lengthen the filter option lists.

Data Packages

WIS2 employs a one-message-one-file policy. However, we want to open the discussion of whether WIS2 should also allow the creation of packages.

One thing to consider is that one-message-one-file leads to many connections from a single source, each downloading one observation. This might trigger denial-of-service prevention systems or other cyber-security technologies.

What I would like to propose is to allow observations (or other items) to be stored in archives, and these archives could then be ingested into WIS2 using, for example, ZIP or gzip as the data format.

The WIS2 Guide would then need to be updated to specify the allowed packaging methods (given that ZIP and gzip are widely accepted standards).
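A minimal sketch of the proposed packaging step, assuming ZIP as the container (file naming conventions and the corresponding notification semantics would still need to be specified in the Guide):

```python
# Sketch: bundling several observations into one ZIP archive, so a single
# notification can reference one package instead of one message per file.
# Naming conventions here are illustrative.

import io, zipfile

def package_observations(observations: dict[str, bytes]) -> bytes:
    """Bundle {filename: payload} into one compressed ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in observations.items():
            zf.writestr(name, payload)
    return buf.getvalue()
```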

Approval procedures of Global Services and nodes

Need to determine the procedures for approving the implementation of WIS2 Global Services and WIS2 Nodes, to ensure that they are compliant with the WIS2 regulations.
Note: this is separate from the approval of NCs, DCPCs and GISCs.

Decision on publication_datetime on cached data

What is the expected behaviour of publication_datetime on messages from the cache?

  • Keep original value (would allow determining the round-trip time)
  • Replace with the time when the message was (re-)published by the cache
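To illustrate what the first option enables: if the cache keeps the original value, a consumer can estimate the origin-to-cache round-trip time by comparing it with the arrival time. The sketch below assumes the WNM properties.pubtime field in RFC 3339 form:

```python
# Sketch: estimating round-trip time from a cached notification that
# retained the original publication datetime (properties.pubtime).

from datetime import datetime, timezone

def round_trip_seconds(message: dict, received_at: datetime) -> float:
    pubtime = datetime.fromisoformat(
        message["properties"]["pubtime"].replace("Z", "+00:00"))
    return (received_at - pubtime).total_seconds()
```

If the second option is chosen instead, this measurement is lost, which is the trade-off the issue asks to decide.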

Add examples to WIS2 Guide indicating how attribution / other license should be expressed in the WIS2 Discovery Metadata

In support of Data Policy implementation, SC-IMT has tasks relating to attribution of data:

  1. Encourage honouring of attribution [for Core data] (data policy implementation task 5)
  2. Urge transparent conditions for use of recommended data (data policy implementation task 7)
  3. Urge respect of conditions for use of recommended data (data policy implementation task 8)

The WIS2 Guide seems like the right place to include good practice on these tasks - both in terms of how the attribution clause / license are expressed in discovery metadata, and reminding (?) users of their obligations wrt conditions of use (e.g., showing how to cite WMO data).

Prevent caching of large datasets in Global Caches

Centres like ECMWF or EUMETSAT will provide very large amounts of core data that nevertheless should not (or may not) be stored in the Global Cache.
In the current approach, there is no mechanism to prevent such data from ending up in the cache.
Right now, all data published using messages on the topics origin/a/wis2/country/centre-id/core/# will end up being caught by a Global Cache and copied.
We should define a way to override that default behaviour.

updated: 31 May 2023

DECISIONS

  1. for core data, add properties.cache (true|false, default=true) to the notification message, as decided by the data producer
  2. if properties.cache: false, the Global Cache SHALL:
    • not download the data made available via this message
    • publish the notification on cache/a/wis2/... (as for cached data) with properties.links not modified (the links will still point to the data producer's endpoint)
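A sketch of how a Global Cache might apply these decisions to an incoming core-data notification; the topic string is illustrative:

```python
# Sketch of the decision above: inspect properties.cache (default true)
# and decide whether to download, while always republishing under cache/...
# With cache=false the message is republished with links untouched.

def handle_core_notification(topic: str, message: dict) -> tuple[bool, str]:
    """Return (should_download, republish_topic) for a core-data WNM."""
    republish_topic = "cache/" + topic[len("origin/"):]
    should_download = message.get("properties", {}).get("cache", True)
    return should_download, republish_topic
```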
