WIS2 Guide
Working draft is made available at https://wmo-im.github.io/wis2-guide.
WIS2 Guide
Home Page: https://wmo-im.github.io/wis2-guide
License: Apache License 2.0
Images in the section 2.8.1.1 on WIS2-SWIM interoperability are currently non-editable (see PR #86) ... we need to swap these for editable versions.
Suspend maintenance of Volume C1.
As per https://wmo-im.github.io/wis2-guide/guide/wis2-guide-DRAFT.html#_global_discovery_catalogue:
A Global Discovery Catalogue will only update discovery metadata records to replace links for dataset subscription and notification (origin) with their equivalent links for subscription at Global Broker instances (cache).
...given the lifecycle of a WCMP2 record:
origin/a/wis2/+/metadata/#
The GC still has the original WCMP2 document with the original subscription/notification links, because the changes the GDC made to the WCMP2 document were not propagated back to the GC.
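For discussion, the link substitution described above can be sketched as follows. This is a minimal illustration, not the GDC implementation: the record structure is simplified, and the broker endpoints and use of a "channel" property on MQTT links are assumptions.

```python
# Hypothetical sketch: a Global Discovery Catalogue replacing "origin"
# subscription links in a WCMP2 record with one "cache" link per Global Broker.
# Broker URLs below are illustrative placeholders.

GLOBAL_BROKERS = [
    "mqtts://globalbroker.example.org:8883",
    "mqtts://globalbroker.example.net:8883",
]

def rewrite_subscription_links(record: dict) -> dict:
    """Replace origin/... MQTT links with cache/... links at each Global Broker."""
    new_links = []
    for link in record.get("links", []):
        channel = link.get("channel", "")
        if channel.startswith("origin/a/wis2/"):
            cache_channel = "cache/" + channel[len("origin/"):]
            for broker in GLOBAL_BROKERS:
                new_links.append({**link, "href": broker, "channel": cache_channel})
        else:
            new_links.append(link)
    record["links"] = new_links
    return record

record = {
    "id": "urn:wmo:md:eccc-msc:observations",  # illustrative identifier
    "links": [
        {"href": "mqtts://wisnode.example.com:8883",
         "channel": "origin/a/wis2/can/eccc-msc/data/core/weather"},
    ],
}
record = rewrite_subscription_links(record)
```

Note that under this sketch the "origin" link is dropped entirely, matching the behaviour discussed in this issue, which is exactly why the changes never propagate back to the GC.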
Options:
Include information about one-to-one relationship between a TH and a metadata record.
#38
Transcribe the WIS2 Guide into AsciiDoc with reproducible workflow.
As discussed at ET-W2AT meeting, 16-Oct-2023.
WIS2 Manual (Tech Regs) states:
"""
4.5.4 Based on received notifications, a Global Cache shall download core data from WIS nodes or other Global Caches and store for a duration compatible with the real-time or near real-time schedule of the data and not less than 24-hours.
"""
The default period for caching resources is 24 hours, but some (near) real-time data might need to be kept for longer based on the "real-time or near real-time schedule". It is a WMO Programme decision how long data should be cached for, agreed with the WIS2 operators in a process that is yet to be defined.
A mechanism is required to inform Global Caches to cache content for a duration other than 24 hours - usually longer, but perhaps shorter too.
It seems likely that this would be a per-object policy, i.e., each cached object should indicate how long it should be cached. The simplest way to do this would be to add a cache-duration attribute to the notification message.
A Global Cache may ignore the cache-duration value, for example, if the duration was set to an excessively long period that compromised the ability of the Global Cache to perform. In "tech regs speak": the Global Cache should respect the cache-duration value.
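A minimal sketch of how a Global Cache might apply such a per-object policy. The "cache_duration" property name (hours) and the clamping bounds are assumptions for discussion; they are not part of the current WIS2 Notification Message specification.

```python
# Hypothetical per-object cache policy: the notification message proposes a
# duration, and the Global Cache respects it but caps excessive values to
# protect its own capacity. Property name and limits are assumptions.

DEFAULT_HOURS = 24          # default retention per the WIS2 Manual
MAX_HOURS = 30 * 24         # illustrative operational ceiling

def effective_cache_hours(message: dict) -> int:
    requested = message.get("properties", {}).get("cache_duration", DEFAULT_HOURS)
    # Respect the requested value, but clamp to an acceptable range.
    return max(1, min(int(requested), MAX_HOURS))

msg = {"properties": {"cache_duration": 72}}
```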
Guidance for data consumers subscribing to notifications from Global Broker; managing duplicates vs. managing failover; use of replay service; etc.
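One ingredient of such guidance could be duplicate handling: a consumer subscribed to the same topic at several Global Brokers (for failover) will receive each notification more than once. A common approach, sketched here as an assumption rather than prescribed practice, is to deduplicate on the notification message identifier with a bounded seen-set.

```python
# Sketch: deduplicating notification messages received from multiple
# Global Brokers, keyed on the message "id". The eviction policy
# (drop oldest beyond a fixed size) is an illustrative choice.

from collections import OrderedDict

class Deduplicator:
    def __init__(self, max_entries: int = 100_000):
        self.seen = OrderedDict()
        self.max_entries = max_entries

    def is_new(self, message_id: str) -> bool:
        if message_id in self.seen:
            return False
        self.seen[message_id] = True
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # evict the oldest entry
        return True

dedup = Deduplicator()
```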
In the GTS, retention is at least 3 months. Since the WIS2 Node is expected to replace the AMSS, we need to clarify the data retention policy for a WIS2 Node.
Posting a question from @masato-f29 from the NWPMetadata team.
"Sea surface temperature can be included in weather, climate, and oceans. Should sea surface temperature data be tagged with three controlled vocabularies (CV): weather, climate, and ocean? Or should we make it exclusive and propose the additional CV at level 8?"
The topic_hierarchy will only be used to identify a channel for pub/sub. When a dataset is applicable under multiple domains, one should choose one domain (that is a best fit) and use the WCMP2 metadata record for further descriptions.
The WIS2 Guide states in section 8.4.1:
A Global Cache will store a full set of discovery metadata records. This is not an additional metadata
catalogue that Data Consumers can search and browse – it provides a complete set of discovery
metadata records to support populating a Global Discovery Catalogue instance.
This ensures that a GDC can initiate itself from already published discovery metadata in the event of a catastrophe/re-deploy.
Options:
In my understanding the simple NC/DCPC data sharing described in data-and-metadata-flows.adoc is not part of WIS2 and should be removed. Also WIS2 Nodes are not required to operate a local metadata catalogue.
If this section is only intended to illustrate the concept, this should be stated explicitly.
ET-W2AT 2024-01-08: remove 1.2.5 which is superseded with:
1.2 What is WIS2?
1.3 Why are Datasets so important?
I'm not sure if this is the correct place for this issue.
As far as I recall, we decided that metadata should be a prerequisite before transmitting data over WIS2. Somehow this requirement should be enforced. Following the discussions here
wmo-im/wis2-notification-message#25
and here
wmo-im/wis2-notification-message#31
I currently don't see a place in the protocol / message flow where we can check if metadata is available.
We either need to decide where such a check can be placed, or we need to accept that, despite a strong requirement for metadata, data may appear without metadata.
In relation to wmo-im/wis2-notification-message#6 (comment), assess notification message inline content size (proposed 4096 bytes) over course of pilot phase.
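For the assessment, the decision point is easy to state in code. The sketch below shows the publisher-side choice between embedding content inline and publishing by reference; the 4096-byte figure is the proposed limit under discussion, and the link target is a placeholder.

```python
# Sketch: publish small payloads inline (base64-encoded) and larger ones
# by reference. The 4096-byte threshold is the proposed limit being
# assessed; the href below is an illustrative placeholder.

import base64

MAX_INLINE_BYTES = 4096

def inline_or_link(payload: bytes) -> dict:
    if len(payload) <= MAX_INLINE_BYTES:
        return {"content": {"encoding": "base64",
                            "value": base64.b64encode(payload).decode(),
                            "size": len(payload)}}
    return {"links": [{"rel": "canonical", "href": "https://example.org/data"}]}
```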
We need to clarify QoS parameters as well as retention for brokers (WIS2 Node/Broker, Global Broker).
cc @golfvert @josusky
Decision 26/06/2023
Need for a shiny portal for WIS2. The purpose is to have a platform offering:
It would be valuable to have a clear set of requirements for a WIS2 Node in the Guide.
The section in the Guide for data publishers should be updated to include a request for "big" data providers to prioritize which of their core data should be distributed via the Global Cache and which they will serve directly themselves, to limit the Global Cache volume to the agreed daily amount (e.g., 100 GB).
Need to decide on how to maintain and manage GTS headers during the transition
something for INFCOM-3
As WIS2 specifications evolve over time, we need to consider change management (how long to support previous versions, timing, coordination).
Competencies are described in the Manual only. We need to review them and to decide if they are still staying in the manual or moving to the guide. We don't want to have competencies both in the manual and the guide.
@kaiwirt the metrics hierarchy CSV inclusions in sections/part2/global-services.adoc cause generation of docx to fail:
To reproduce:
asciidoctor --trace --backend docbook --out-file - index.adoc | pandoc --from docbook --to docx --output wis2-guide-DRAFT.docx
asciidoctor: WARNING: tables must have at least one body row
asciidoctor: WARNING: tables must have at least one body row
asciidoctor: WARNING: tables must have at least one body row
Invalid XML:
1010:112 (81273)-1010:122 (81283): Expected end element for: Name {nameLocalName = "emphasis", nameNamespace = Just "http://docbook.org/ns/docbook", namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = "literal", nameNamespace = Just "http://docbook.org/ns/docbook", namePrefix = Nothing})
make: *** [docx] Error 44
HTML generation does not fail; however, the output is not rendered correctly. Example: https://wmo-im.github.io/wis2-guide/guide/wis2-guide-DRAFT.html#_metrics_for_global_brokers
ET-W2AT 08.01.2024: Global Caches only care about core data and do not subscribe to or republish WNMs for recommended data. There might be two issues associated with this when data users subscribe to origin/a/wis2/#
In the pre-operational phase we need to check if this behaviour is an issue.
to update the guide to WIS
ET-W2AT 2024-01-08:
Continuing from: #7 (comment):
BTW - I'm assuming that the Global Discovery Catalogue adds an actionable link for the associated cache/a/wis2/... topic at every Global Broker? This is the way that data consumers find out where they can subscribe - I'm expecting this to be a list of places, one or more of which is relevant for them.
centre-id
WIS2 users are able to publish notifications for data, metadata and reports. An example of a report notification topic:
origin/a/wis2/can/eccc-msc/report
Examples:
Areas for clarification:
A data provider can implement access control for recommended data. Do we have a requirement to allow only a limited number of mechanisms to simplify the task of a user who needs to subscribe to different streams of recommended data?
Once WIS2 enters the operational phase in 2025, the WIS2 Topic Hierarchy will be managed via the WMO Fast Track process.
At the November W2AT face-to-face meetings, an experimental level was added for all domains to use for provisional domain topics in support of publishing data between FT cycles.
While this provides utility for new domain topics between FT cycles, we need to consider how to register new centre-ids as well (which are managed at a higher level [without an experimental option]).
What metrics will the Global Monitor collect/report on w.r.t. the GDC? At the moment we can provide KPI reporting based on pywcmp, which provides a (JSON) report of how a WCMP2 record scores against the KPI matrix. As well, the result of #10 may also provide valuable GDC metrics.
To ensure that WIS2 is operating "as expected" we need some objective performance measures (KPIs); e.g., up-time, response time, data availability etc.
cc @antje-s
Currently, when WCMP2 metadata is published, the workflow is that the GDC transposes/injects GB URLs for WCMP2 links (origin -> cache) which are bound to WIS2/MQTT/WTH subscription/download. A link is added for each GB that is part of the WIS2 global services ecosystem. The "origin" link is never part of the resulting WCMP2 record in the GDC.
There is an edge case whereby one subscribing to the GB with topic .../metadata/... will receive the "raw" WCMP2 document, which has the "origin" links.
A WIS2 Node may or may not want their MQTT endpoint made available.
In summary, the GDC updates the WCMP2 record links per above, but the GB does not. Should it?
Given data providers publish discovery metadata to the .../metadata topic, a GDC SHALL validate the WCMP2 against the topic to which it was published.
cc @Amienshxq
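One plausibility check the GDC could apply is that the centre-id embedded in the WCMP2 identifier matches the topic the record was published on. The sketch below uses the urn:wmo:md:&lt;centre-id&gt;:&lt;local-id&gt; identifier layout and the origin/a/wis2/&lt;country&gt;/&lt;centre-id&gt;/metadata topic form as they appear in this document; the check itself is an illustrative assumption, not the specified validation procedure.

```python
# Sketch: validate that a WCMP2 record's centre-id agrees with the
# publication topic. Identifier and topic layouts are simplified to the
# forms used in this document.

def centre_id_matches_topic(wcmp2_id: str, topic: str) -> bool:
    parts = wcmp2_id.split(":")
    if len(parts) < 5 or parts[:3] != ["urn", "wmo", "md"]:
        return False  # not a WCMP2-style identifier
    centre_id = parts[3]
    levels = topic.split("/")
    # origin/a/wis2/<country>/<centre-id>/metadata
    return len(levels) >= 6 and levels[5] == "metadata" and levels[4] == centre_id
```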
Further review is needed on these sections to confirm how the process will work.
@klsheets comments:
The things you point out regarding the GDC and authenticating metadata and the role of GISCs are the items I struggled most to document when drafting these edits. I was largely following the workflow diagrams we reviewed in November, provided by Remy, and the Annexes to the meeting report. Which brings up an additional question - should those diagrams be linked or included as an Annex in the Guide?
(original updates were in PR #62)
As mentioned in TT-WISMD 2023-05-05 by Tom GDC should use Facets.
Faceting offers filter options for search result lists. As the amount of metadata grows, it is important to have additional ways to narrow down the search results a user has to review. Faceting is based on indexed fields used as categories; all distinct values contained in the search result list items are offered as filter options.
Example:
Starting a full-text search for "temperature" in our WIS Portal at https://gisc.dwd.de and setting the facet filter "Originator" (under "Filter by:") to "NC Mali" reduces the result list from 63667 matches to 26.
All indexed fields could be used as facets.
Different implementations can point out advantages and disadvantages in practice over time, so in my view we should minimize requirements.
In WCMP2, fixed word lists should also be used where possible for this reason, to avoid spelling variants, typos, and different words for the same content, as these lead to different filter options and unnecessarily lengthen the filter option lists.
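The mechanics of faceting are simple to sketch. The field name "originator" and the sample records below are illustrative, not the GDC's actual index schema.

```python
# Sketch: computing facet counts over a list of metadata search results,
# then applying one facet value as a filter. Field names are illustrative.

from collections import Counter

def facet_counts(results: list, field: str) -> Counter:
    """Count distinct values of an indexed field across the result list."""
    return Counter(r[field] for r in results if field in r)

results = [
    {"title": "Surface temperature",     "originator": "NC Mali"},
    {"title": "Upper-air temperature",   "originator": "NC Mali"},
    {"title": "Sea surface temperature", "originator": "DWD"},
]

counts = facet_counts(results, "originator")
# A UI would render each (value, count) pair as a clickable filter option;
# selecting one narrows the result list:
filtered = [r for r in results if r["originator"] == "NC Mali"]
```

This also illustrates why fixed word lists matter: "NC Mali" and "NC MALI" would count as two separate facet values.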
WIS2 employs a one-message-one-file policy. However, we want to open the discussion of whether WIS2 should also allow the creation of packages.
One thing to consider is that one message per file leads to a lot of connections from a single source, each downloading one observation. This might trigger denial-of-service prevention systems or other cyber-security technologies.
What I would like to propose is to allow observations (or other items) to be stored in archives, and these archives could then be ingested into WIS2 using, for example, ZIP or gzip as the data format.
The WIS2 Guide would then need to be updated to specify the allowed packaging methods (given that ZIP and gzip are also widely accepted standards).
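To make the proposal concrete, the sketch below packages several small observation files into a single ZIP archive so a consumer opens one connection instead of many, and then unpacks it on the consumer side. File names and payloads are illustrative.

```python
# Sketch: publisher side packs observations into one ZIP object; consumer
# side unpacks it back into individual observations. Names and contents
# are illustrative placeholders.

import io
import zipfile

observations = {
    "obs-0001.bufr4": b"...binary observation 1...",
    "obs-0002.bufr4": b"...binary observation 2...",
}

# Publisher: build a single compressed package in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, payload in observations.items():
        zf.writestr(name, payload)
package = buf.getvalue()  # one object to notify about and download

# Consumer: unpack the single download back into individual observations.
with zipfile.ZipFile(io.BytesIO(package)) as zf:
    names = zf.namelist()
```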
Need to determine what the procedures are for approving the implementation of WIS2 Global Services and WIS2 nodes to ensure that they are compliant to the WIS2 regulations.
Notes, this is separate from the approval of NCs, DCPCs and GISCs.
What is the expected behaviour of publication_datetime on messages from the cache?
clarify QoS parameters as well as retention for brokers (WIS2 Node/Broker, Global Broker).
In support of Data Policy implementation, SC-IMT has tasks relating to attribution of data:
The WIS2 Guide seems like the right place to include good practice on these tasks - both in terms of how the attribution clause / license are expressed in discovery metadata, and reminding (?) users of their obligations wrt conditions of use (e.g., showing how to cite WMO data).
Centres like ECMWF or EUMETSAT will provide very large amounts of core data that nevertheless should not (or may not) be stored in the Global Cache.
In the current approach, there is no mechanism to prevent this kind of data from ending up in the cache.
Right now, all data published using messages on the topics origin/a/wis2/country/centre-id/core/# will end up being caught by a Global Cache and copied.
We should define a way to prevent that default behaviour.
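One option for discussion is an explicit opt-out flag in the notification message that a Global Cache honours before copying core data. The "cache" boolean property below is a proposal sketch, not an assertion about the current WNM specification.

```python
# Sketch: a Global Cache deciding whether to copy an object. It only
# considers core data on origin topics, and honours a hypothetical
# per-message "cache" opt-out flag (default: cache it).

def should_cache(topic: str, message: dict) -> bool:
    levels = topic.split("/")
    if not topic.startswith("origin/a/wis2/") or "core" not in levels:
        return False  # Global Caches only copy core data from origin topics
    return bool(message.get("properties", {}).get("cache", True))
```

With such a flag, a centre publishing very large core datasets could keep the default pub/sub flow while marking individual objects as "do not cache".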
updated: 31 May 2023