Comments (32)
I am afraid this could start very complex discussions. Example: precipitation is both hydrology and weather. Does this mean that we publish precipitation observations on two topics? We should not build too much around the topic, and should instead use the discovery metadata to inform different communities. I think this needs to be addressed at the WCMP2 level, not in the topic.
Publishing the same data on different topics could increase complexity in an unsustainable way.
from wis2-guide.
DECISION
The topic_hierarchy will only be used to identify a channel for pub/sub. When a dataset is applicable to multiple domains, one should choose one domain (the best fit) and use the WCMP2 metadata record for further descriptions.
I updated the decision in the first issue comment to reflect the general consensus of this group.
Could a multidisciplinary definition lead providers to dump any data in question into this topic?
The lesser evil here could be publishing multiple messages with the same properties.data_id.
So if a data granule is published that applies to 3 topics, then:
- 3 messages are sent
- each message has an identical properties.data_id
In this manner, we are able to ensure deduplication. This would, however, require WCMP2 properties.wmo:topicHierarchy to support multiple topics (which would need an update, which is fine).
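A minimal sketch of the proposal above, assuming a heavily simplified message layout (only "id", "properties.data_id" and one link; this is not the normative WIS2 notification message schema). The topic strings, data_id and URL are hypothetical. Each topic gets its own message with a unique id, while properties.data_id is identical across all copies, which is what would allow deduplication downstream.

```python
import uuid


def build_notifications(data_id, href, topics):
    """Return one (topic, message) pair per topic, all sharing the same data_id.

    The message layout is a simplified, hypothetical subset of a WIS2
    notification message, not the normative schema.
    """
    pairs = []
    for topic in topics:
        message = {
            "id": str(uuid.uuid4()),             # unique per message
            "properties": {"data_id": data_id},  # identical across all copies
            "links": [{"rel": "canonical", "href": href}],
        }
        pairs.append((topic, message))
    return pairs


# Hypothetical topics and identifiers, for illustration only.
TOPICS = [
    "origin/a/wis2/example-centre/data/core/weather/precipitation",
    "origin/a/wis2/example-centre/data/core/hydrology/precipitation",
    "origin/a/wis2/example-centre/data/core/climate/precipitation",
]
pairs = build_notifications(
    "example-centre/precipitation/2023-04-12T00Z",
    "https://example.org/data/precipitation.bufr4",
    TOPICS,
)
```

Note that the message count grows linearly with the number of topics a granule is published on, which is the cost raised later in this thread.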
I suggest waiting before implementing anything! This is still under discussion. As explained in my comment above, I have the feeling (I might be wrong though) that we are taking the topic hierarchy discussion down the wrong path. It may eventually look like a second-class metadata record. We should focus on the official metadata for this kind of information.
@wmo-im/tt-nwpmd This will probably be the guidance:
If a dataset is multidisciplinary in nature, then choose the best fit for the TH. Think of the TH as a key or identifier for notifications on the cache with some basic meaning, but not a full description. Further descriptions of other relevant disciplines will go into the WCMP2 metadata record for that dataset. The TT-WISMD is currently considering the best approach for this. Please see this comment in wmo-im/wcmp2#101.
TT-WISMD 2023-04-12:
- add a multidisciplinary token/value?
- can be put forth in WCMP2 as themes/concepts
- should have a single topic for subscription
- needs more discussion
Can a blob of data be registered under several hierarchies, i.e. multiple times?
The TT-NWPMD meeting (17.04.2023) suggested that it would be more helpful to allow datasets to be made available, subscribed to and notified under several (multiple) hierarchy topics.
- add a multidisciplinary token/value?
TT-NWPMD would also like further clarification on how the idea of a multidisciplinary token/value would work.
I had a quick chat with @amilan17. If I understood correctly, the idea was to add a new controlled-vocabulary value, "multidisciplinary", to Level 8 (now the 7 Earth system domains).
In the current design, the data_id uniquely identifies the data granule. If a cache receives three messages on three topics with the same data_id, it downloads the data once, republishes the corresponding message and drops the other two messages as duplicates.
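The behaviour just described can be sketched as follows. This is a hypothetical illustration, not Global Cache code: the class name, the injected download callable and the simplified message layout are all assumptions. The key point is that deduplication is keyed on data_id only, so the topic a message arrived on plays no part.

```python
class DedupCacheSketch:
    """Hypothetical Global Cache behaviour: one download per data_id;
    later notifications with the same data_id are dropped as duplicates."""

    def __init__(self, download):
        self.download = download  # callable href -> bytes (assumed)
        self.store = {}           # data_id -> cached bytes
        self.republished = []     # (topic, message) pairs re-published

    def on_notification(self, topic, message):
        data_id = message["properties"]["data_id"]
        if data_id in self.store:
            return False          # duplicate, whatever topic it arrived on
        self.store[data_id] = self.download(message["links"][0]["href"])
        self.republished.append((topic, message))
        return True


downloads = []


def fake_download(href):
    downloads.append(href)        # record each real download
    return b"granule-bytes"


cache = DedupCacheSketch(fake_download)
for topic in ("a/weather", "a/hydrology", "a/climate"):  # hypothetical topics
    msg = {"properties": {"data_id": "granule-1"},
           "links": [{"href": "https://example.org/g1"}]}
    cache.on_notification(topic, msg)
```

After the loop there is one download and one republished message, on the first topic only; the hydrology and climate notifications are silently dropped.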
If we go for this option (multiple messages with the same data_id in the currently existing domains), then the Global Cache should republish the message in each topic hierarchy but download only once. Is that doable?
How can we have this behaviour while keeping the current deduplication of downloads?
We also have to remember the purpose of the topic hierarchy. It is not a way to describe the data (that is the job of the metadata) but to allow filtering by subscribers. It is not meant either to show that this data is useful for X and not Y. Again, this is the purpose of the metadata record.
It (sort of) reminds me of the other discussion (wmo-im/wis2-topic-hierarchy#30); it looks to me as if we are overloading the meaning and purpose of the topic hierarchy.
I therefore wonder if both "requirements" are consistent with the currently agreed purpose of the topic hierarchy.
We don't want the topic hierarchy to become the new TTAAii of WIS2 ;)
We can implement this change if it is agreed on. However, I am not sure this is a good solution. In my opinion, having different messages in different topics for the same data only increases the number of messages with no additional benefit.
If data "fits" into several topics, then I would prefer a decision on which topic is the correct one for that data instead of just sending out multiple messages.
We also have to remember the purpose of the topic hierarchy. It is not a way to describe the data (that is the job of the metadata) but to allow filtering by subscribers. It is not meant either to show that this data is useful for X and not Y. Again, this is the purpose of the metadata record.
It (sort of) reminds me of the other discussion (#30); it looks to me as if we are overloading the meaning and purpose of the topic hierarchy.
@golfvert the readme of this repository states that:
"The WIS2 topic hierarchy provides a central classification and categorization scheme used by data providers and WIS2 Global Services in support of core WIS2 workflows: publish, discover, subscribe and download."
I don't think that what I wrote contradicts that statement. The data is classified and categorized; we don't have a single central topic containing everything. The question, then, is to what extent the topic hierarchy alone should be sufficient to identify the data.
I have the feeling that we might push this too far...
Proposed decision for the @wmo-im/tt-wismd:
- Only one Topic Hierarchy (w/ one discipline) is allowed in the metadata record.
- One should not publish the same message under multiple disciplines.
Noting that the primary purpose of the TH is to identify the channel for publishing notifications and for subscribing to notifications from that channel. For broader discovery, other relevant disciplines should be listed as themes in the metadata record.
Also noting that the TT-WISMD decided to remove the topic hierarchy from the data_id in the notification message.
@sebvi would like a real-life use case in order to understand this.
related to: wmo-im/wcmp2#94
Following the discussion above, I struggle to see what I could use the topicHierarchy for. I thought it was for filtering, but if multiple hierarchies are not permitted, it won't serve that purpose for many datasets. Furthermore, I don't understand the comment above:
We also have to remember the purpose of the topic hierarchy. It is not a way to describe the data (that is the job of the metadata) but to allow filtering by subscribers. It is not meant either to show that this data is useful for X and not Y. Again, this is the purpose of the metadata record.
I thought it was part of the metadata (https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html#_topic_hierarchy) and that its elements were what we use for filtering relevant information. For the cryosphere, many datasets could also be published under weather or climate, etc., and then you have to resort to Properties/Themes to sort out what is relevant and simply ignore TopicHierarchy. Given this ambiguity for many datasets, I am struggling to see the use case for TopicHierarchy.
Hi @steingod,
this might be a terminology issue. The word "topic" has a specific meaning in publish-subscribe protocols: it is an identifier needed to create a subscription. Having multiple topics for one dataset would be confusing: should I subscribe to all of them, the first one, or the last one?
I thought it was part of the metadata (https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html#_topic_hierarchy) and the elements in this is what we use for filtering relevant information.
@steingod the TT-WISMD recently decided to scale down the multiple uses of topic_hierarchy, so we will remove it as a property in WCMP2 and as a requirement for the data_id in the notification message. See: wmo-im/wcmp2#95.
So now, the topic_hierarchy will only be used to identify a channel for pub/sub.
Thanks for the update, makes sense to me. Concerning pub/sub, I do understand it has a specific meaning, but for this to be useful at the practical level, the implementation requires that each dataset can be connected to only one channel; otherwise you would have to subscribe to everything anyway and filter afterwards. Removing it as a requirement from WCMP2 makes sense, but how is the relationship between datasets and channels addressed so that it is consistent across the community(ies)?
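To make the "subscribe to everything and filter afterwards" point concrete, here is a sketch of standard MQTT topic-filter matching, where "+" matches exactly one level and "#" matches the current level and everything below it (and must be the last level). The function name and the topic strings are hypothetical illustrations, and the sketch ignores the special rule for topics beginning with "$". If each dataset lives on exactly one channel, a subscriber can narrow a subscription to one discipline with a wildcard filter; a filter like origin/a/wis2/# instead receives everything.

```python
def topic_matches(topic_filter, topic):
    """Match an MQTT topic name against a filter using '+' (exactly one level)
    and '#' (this level and all below; must be the last level).
    Simplification: ignores the special handling of topics starting with '$'."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True                  # multi-level wildcard matches the rest
        if i >= len(topic_levels):
            return False                 # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False                 # literal level mismatch
    return len(filter_levels) == len(topic_levels)


# Hypothetical topic names, for illustration only.
weather = "origin/a/wis2/example-centre/data/core/weather/precipitation"
hydrology = "origin/a/wis2/example-centre/data/core/hydrology/precipitation"
hydrology_only = "origin/a/wis2/+/data/core/hydrology/#"
everything = "origin/a/wis2/#"
```

A subscriber using hydrology_only receives the hydrology channel but not the weather one; a subscriber using everything receives both and has to filter client-side.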
TT-NWPMD meeting (2023.06.13) noted the decision on scaling down the multiple uses of the topic hierarchy. TT-NWPMD asks for further clarification on how to solve the original issue: for a dataset that is multidisciplinary in nature, which topic should it be associated with? Clear and well-documented guidance would be needed to ensure consistency.
Adding my thoughts ...
We need to treat each domain separately, so "similar data" from, say, 2 Earth system domains would need to be published in two places on the topic hierarchy. We shouldn't try to conflate them. This solution might not be super elegant, but at least it's predictable for data publishers and data consumers.
Note: currently, notifications for the same data (same data-id) would be considered duplicates by the Global Cache, even if they were published on different topics. Of course, code is patient and it could be extended, but we would increase complexity...
The code would have to ensure that the download is executed only once (so that the data volume does not grow as well) and then re-publish the further notifications, with their download link replaced by the link from the first data download (so this value would have to be saved).
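The extension described above could look roughly like this. It is a hypothetical sketch, not an agreed design: the class name, the cache_base_url parameter, the way the cached link is formed, and the simplified message layout are all assumptions. It downloads once per data_id, saves the resulting link, and re-publishes every notification on its own topic with the saved link substituted.

```python
import copy


class MultiTopicCacheSketch:
    """Hypothetical extension: download once per data_id, but re-publish
    every notification on its own topic, pointing at the cached copy."""

    def __init__(self, download, cache_base_url):
        self.download = download              # callable href -> bytes (assumed)
        self.cache_base_url = cache_base_url  # hypothetical cache endpoint
        self.cached_href = {}                 # data_id -> saved link of first download
        self.republished = []                 # (topic, message) pairs

    def on_notification(self, topic, message):
        data_id = message["properties"]["data_id"]
        if data_id not in self.cached_href:
            # First sighting: download once and save where the copy lives.
            self.download(message["links"][0]["href"])
            self.cached_href[data_id] = f"{self.cache_base_url}/{data_id}"
        # Every notification, first or later, is re-published with its
        # download link replaced by the saved value.
        out = copy.deepcopy(message)
        out["links"][0]["href"] = self.cached_href[data_id]
        self.republished.append((topic, out))


downloads = []


def fake_download(href):
    downloads.append(href)  # record each real download
    return b"granule-bytes"


cache = MultiTopicCacheSketch(fake_download, "https://cache.example.org/data")
for topic in ("a/weather", "a/hydrology", "a/climate"):  # hypothetical topics
    msg = {"properties": {"data_id": "granule-1"},
           "links": [{"href": "https://example.org/g1"}]}
    cache.on_notification(topic, msg)
```

The download count stays at one, but the cache now keeps extra state (the saved link per data_id) and emits one outgoing message per topic, which is the added complexity and message volume discussed here.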
The disadvantage also remains that we would significantly increase the overall message volume; for example, most observation data are relevant to several domains and would trigger many notifications.
Automatic forwarding at the receiver's end would also be difficult, because the first notification received triggers the data download and the subsequent ones do not (they are recognized as "already downloaded" via the data-id check), so the data would be missed by the other client targets. The client code would also have to implement more differentiation.
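The receiver-side problem can be reproduced with a naive client. This sketch is hypothetical (the function name, the topic-to-target routing table and the message layout are assumptions): it deduplicates by data_id first and routes by topic second, so only the target for the first topic ever receives the data.

```python
def naive_forward(notifications, targets, fetch):
    """Hypothetical client: skip any data_id already downloaded, then route
    the fetched data to the target registered for the notification's topic."""
    downloaded = set()
    for topic, message in notifications:
        data_id = message["properties"]["data_id"]
        if data_id in downloaded:
            continue  # "already downloaded": this topic's target gets nothing
        downloaded.add(data_id)
        targets[topic].append(fetch(message["links"][0]["href"]))


msg = {"properties": {"data_id": "granule-1"},
       "links": [{"href": "https://example.org/g1"}]}
targets = {"a/weather": [], "a/hydrology": [], "a/climate": []}  # hypothetical
naive_forward([(t, msg) for t in targets], targets, lambda href: b"granule-bytes")
```

Only the "a/weather" target is filled; fixing this would require the client to track (data_id, target) pairs instead of data_id alone, which is the extra differentiation mentioned above.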
My first feeling, at least, is that publishing the same data multiple times (with automatic download of the linked data) is not a good idea. But maybe I am just overlooking a simple solution...
I VERY strongly support Enrico's comment... the topic hierarchy and messages are about knowing that a new dataset is available, while providing some filtering capabilities. They are not meant to describe the data nor to limit its usage.
I also strongly oppose publishing messages for the same data on multiple topics. This only adds complexity without providing much advantage.
TT-WISMD 2023-06-22:
- TT agrees / endorses decision
Data should be published under one discipline. If the data are relevant to other disciplines, this information should be described in the metadata of the data (see TT-NWPMD meeting on 05.07.2023).
Should we close the issue as decided?
We need the decision reflected in the documentation of the resulting specification (once wmo-im/wis2-topic-hierarchy#47 is reviewed/merged).
TT-WISMD 2023-09-12:
- add to specification
- if a dataset can be made available under > 1 topic, the centre SHALL choose one topic for publication purposes
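A publishing centre could enforce the "choose one topic" rule with a simple check before publishing. The sketch below is hypothetical (class name, dataset identifiers and topics are all illustrative, and nothing here is specified by the TT): the first topic registered for a dataset becomes its only topic, re-registering the same topic is accepted, and any different topic is rejected, leaving other disciplines to be described as themes in the metadata record.

```python
class SingleTopicRegistrySketch:
    """Hypothetical check that each dataset is published on exactly one topic."""

    def __init__(self):
        self.topic_for = {}  # dataset (metadata) id -> the one chosen topic

    def register(self, dataset_id, topic):
        # setdefault stores the topic on first registration and returns
        # whatever topic is already recorded on later calls.
        chosen = self.topic_for.setdefault(dataset_id, topic)
        if chosen != topic:
            raise ValueError(
                f"{dataset_id!r} is already published on {chosen!r}; "
                "choose one topic and describe other disciplines as themes"
            )
        return chosen


registry = SingleTopicRegistrySketch()
registry.register("ds-1", "a/weather")  # hypothetical dataset id and topic
```

A later call such as registry.register("ds-1", "a/hydrology") raises ValueError, while repeating "a/weather" is accepted.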
TT-WISMD 2023-09-25
- WCMP2 (distribution/MQTT) and WTH (?): add text to see the WIS2 Guide for further guidance on "choosing a topic for your dataset"
- update WIS2 Guide with work clarification
PR in #39.