
Comments (32)

efucile avatar efucile commented on June 12, 2024 2

I am afraid that this can start very complex discussions. Example: precipitation is both hydrology and weather. Does this mean that we publish precipitation observations on two topics? We should not build too much around the topic, and should use the discovery metadata to inform different communities. I think that this needs to be addressed at the WCMP2 level, not in the topic.
Publishing the same data on different topics can increase the complexity in an unsustainable way.

from wis2-guide.

amilan17 avatar amilan17 commented on June 12, 2024 2

DECISION

The topic_hierarchy will only be used to identify a channel for pub/sub. When a dataset is applicable under multiple domains, one should choose one domain (that is a best fit) and use the WCMP2 metadata record for further descriptions.

I updated the decision in the first issue comment to reflect the general consensus of this group.


tomkralidis avatar tomkralidis commented on June 12, 2024 1

Could having a multidisciplinary definition lead providers to dump any data in question into this topic?

The lesser evil here could be publishing multiple messages with the same properties.data_id.

So if a data granule is published that applies to 3 topics, then:

  • 3 messages sent
  • each message has the identical properties.data_id

In this manner, we are able to ensure deduplication. This would, however, require WCMP2 properties.wmo:topicHierarchy to support multiple topics (which would need an update, which is fine).
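
As a minimal sketch of this proposal (not an implementation from the thread), the snippet below builds one message per topic for a single granule and checks that all of them carry the same properties.data_id. The topic strings, the `url` field and the `build_notifications` helper are illustrative assumptions; only `properties.data_id` comes from the discussion.

```python
def build_notifications(data_id, url, topics):
    """Return one (topic, message) pair per topic, all sharing one data_id."""
    return [
        (topic, {"properties": {"data_id": data_id}, "url": url})
        for topic in topics
    ]


# Hypothetical topic names, for illustration only.
topics = ["example/weather", "example/hydrology", "example/climate"]
notifications = build_notifications(
    "granule-001", "https://example.org/granule-001", topics
)

# Every message carries the same data_id, which is what a receiver
# would rely on to recognise the three messages as one granule.
distinct_ids = {message["properties"]["data_id"] for _, message in notifications}
print(len(notifications), len(distinct_ids))  # prints "3 1"
```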


golfvert avatar golfvert commented on June 12, 2024 1

I suggest waiting before implementing anything! This is still under discussion. As explained in my comment above, I have the feeling (I might be wrong though) that we are taking the topic hierarchy discussion down the wrong path. It may eventually look like a second-class metadata record. We should focus on the official metadata for this kind of information.


amilan17 avatar amilan17 commented on June 12, 2024 1

@wmo-im/tt-nwpmd This will probably be the guidance:

If a dataset is multidisciplinary in nature, then choose the best fit for the TH. Think of the TH as a key or identifier for notifications on the cache with some basic meaning, but not a full description. More descriptions about other relevant disciplines will go into the WCMP2 metadata record for that dataset. Currently, the TT-WISMD is considering the best approach for this. Please see this comment in issue # 101 wmo-im/wcmp2#101 (comment).


tomkralidis avatar tomkralidis commented on June 12, 2024

TT-WISMD 2023-04-12:

  • add a multidisciplinary token/value?
    • can be put forth in WCMP2 as themes/concepts
  • should have a single topic for subscription
  • needs more discussion


sebvi avatar sebvi commented on June 12, 2024

Can a blob of data be registered under several hierarchies, i.e. multiple times?


yhe-wmo avatar yhe-wmo commented on June 12, 2024

The TT-NWPMD meeting (17.04.2023) suggested that it would be more helpful to allow datasets to be made available, subscribed to and notified under several (multiple) topic hierarchies.

  • add a multidisciplinary token/value?

TT-NWPMD would also like to seek further clarification on how the idea of a multidisciplinary token/value will work.
I had a quick chat with @amilan17; if I understood correctly, the idea was to add to Level 8 (now the 7 Earth system domains) a new controlled vocabulary for "multidisciplinary".


kaiwirt avatar kaiwirt commented on June 12, 2024

The current design is that the data_id uniquely identifies the data granule. If a cache receives three messages in three topics with the same data_id, it downloads the data once, republishes the corresponding message, and drops the other two messages as duplicates.
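
The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Global Cache code: the `CacheSketch` class, the `url` field, the injected `download` callable and the topic names are all hypothetical; only the use of data_id as the deduplication key is taken from the comment.

```python
class CacheSketch:
    """Toy cache: download once per data_id, drop later duplicates."""

    def __init__(self, download):
        self._download = download  # callable taking a URL (assumed interface)
        self._seen = set()         # data_ids already processed
        self.republished = []      # (topic, message) pairs passed on
        self.dropped = []          # (topic, message) pairs treated as duplicates

    def on_message(self, topic, message):
        data_id = message["properties"]["data_id"]
        if data_id in self._seen:
            # Same granule announced again, possibly on another topic.
            self.dropped.append((topic, message))
            return False
        self._seen.add(data_id)
        self._download(message["url"])
        self.republished.append((topic, message))
        return True


downloads = []
cache = CacheSketch(download=downloads.append)
message = {
    "properties": {"data_id": "granule-001"},
    "url": "https://example.org/granule-001",
}
for topic in ("example/weather", "example/hydrology", "example/climate"):
    cache.on_message(topic, message)

print(len(downloads), len(cache.republished), len(cache.dropped))  # prints "1 1 2"
```

In this sketch only the topic of the first message to arrive is ever republished, so subscribers to the other two topics would not be notified.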


golfvert avatar golfvert commented on June 12, 2024

If we go for this option (multiple messages in the currently existing domains, with the same data_id), then the Global Cache should republish the message in each topic hierarchy but only download once. Is that doable?

How do we get this behaviour while keeping the current anti-duplication of downloads?

We also have to remember the purpose of the topic hierarchy. It is not a way to describe the data (that is the job of the metadata) but to allow filtering by subscribers. It is not meant either to show that this data is useful for X and not Y. Again, this is the purpose of the metadata record.
It (sort of) reminds me of the other discussion (wmo-im/wis2-topic-hierarchy#30); it looks to me like we are overloading the meaning and purpose of the topic hierarchy.

I therefore wonder if both "requirements" are consistent with the currently agreed purpose of the topic hierarchy.
We don't want the topic hierarchy to become the new TTAAii of WIS2 ;)


kaiwirt avatar kaiwirt commented on June 12, 2024

We can implement this change if this is agreed on. However, I am not sure this is a good solution. Having different messages in different topics for the same data only increases the number of messages, with no additional benefit in my opinion.

If data "fits" into several topics, then I would prefer having a decision on which is the correct topic for that data instead of just sending out multiple messages.


amilan17 avatar amilan17 commented on June 12, 2024

"We also have to remember the purpose of the topic hierarchy. It is not a way to describe the data (that is the job of the metadata) but to allow filtering by subscribers. It is not meant either to show that this data is useful for X and not Y. Again, this is the purpose of the metadata record.
It (sort of) reminds me of the other discussion (#30); it looks to me like we are overloading the meaning and purpose of the topic hierarchy."

@golfvert the readme of this repository states that:
"The WIS2 topic hierarchy provides a central classification and categorization scheme used by data providers and WIS2 Global Services in support of core WIS2 workflows: publish, discover, subscribe and download."


golfvert avatar golfvert commented on June 12, 2024

I don't think that what I wrote contradicts that statement. The data is classified and categorized. We don't have a single central topic with everything. The question, then, is to what extent the topic hierarchy alone should be sufficient to identify the data.
I have the feeling that we might push this too far...


amilan17 avatar amilan17 commented on June 12, 2024

Proposed decision for the @wmo-im/tt-wismd:

  • Only one Topic Hierarchy (w/ one discipline) is allowed in the metadata record.
  • One should not publish the same message under multiple disciplines.
    Noting that the primary purpose of the TH is to identify the channel for the publication of notifications and to subscribe to notifications from that channel. For broader discovery, other relevant disciplines should be listed as themes in the metadata record.
  • Also noting that the TT-WISMD decided to remove the topic hierarchy from the data_id in the notification message.


amilan17 avatar amilan17 commented on June 12, 2024

@sebvi wants a real-life use case to understand


amilan17 avatar amilan17 commented on June 12, 2024

related to: wmo-im/wcmp2#94


steingod avatar steingod commented on June 12, 2024

Following the discussion above I struggle to see what I could use the topicHierarchy for. I thought it was for filtering, but if multiple hierarchies are not permitted it won't serve the purpose for many datasets. Furthermore I don't understand the comment above:

"We also have to remember the purpose of the topic hierarchy. It is not a way to describe the data (that is the job of the metadata) but to allow filtering by subscribers. It is not meant either to show that this data is useful for X and not Y. Again, this is the purpose of the metadata record."

I thought it was part of the metadata (https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html#_topic_hierarchy) and that the elements in it are what we use for filtering relevant information. For cryosphere, many of the datasets could also be published using weather or climate etc., and then you would have to resort to Properties/Themes to really sort out what is relevant and just ignore TopicHierarchy. So given the ambiguity for many datasets, I am struggling to see the use case for TopicHierarchy.


josusky avatar josusky commented on June 12, 2024

Hi @steingod,
this might be a terminology issue. The word "topic" has a specific meaning in publish-subscribe protocols. It is an identifier needed to create a subscription. Having multiple topics for one dataset would be confusing - should I subscribe to all of them, or the first one, the last one?
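
To make the pub/sub meaning of "topic" concrete, here is a small sketch of how a subscription filter selects messages by topic, assuming MQTT-style wildcards (`+` for one level, `#` for all remaining levels). The `topic_matches` helper and the topic names are illustrative assumptions, and special cases such as topics beginning with `$` are not handled.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` is selected by the MQTT-style `topic_filter`."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level of a filter,
            # and it also matches the parent level itself.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical topics: a subscriber to the hydrology branch is only
# notified about messages published somewhere under that branch.
subscription = "example/hydrology/#"
for published_on in ("example/hydrology/obs", "example/weather/obs"):
    print(published_on, topic_matches(subscription, published_on))
```

Because a notification is delivered only to subscriptions that match the topic it was published on, each dataset needs one predictable topic for subscribers to find it.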


amilan17 avatar amilan17 commented on June 12, 2024

"I thought it was part of the metadata (https://wmo-im.github.io/wcmp2/standard/wcmp2-DRAFT.html#_topic_hierarchy) and the elements in this is what we use for filtering relevant information."

@steingod the TT-WISMD recently decided to scale down the multiple uses of topic_hierarchy, and thus we will remove it as a property in WCMP2 and as a requirement for the data_id in the notification message. See: wmo-im/wcmp2#95

So now, the topic_hierarchy will only be used to identify a channel for pub/sub.


steingod avatar steingod commented on June 12, 2024

Thanks for the update, makes sense to me. Concerning pub/sub I do understand it has a specific meaning, but for this to be useful at the practical level, the implementation requires that it is possible to connect datasets to only one channel, else you would anyway have to subscribe to everything and filter afterwards. Removing it as a requirement from WCMP2 makes sense, but how is the relation of datasets and channels addressed to make it consistent across the community(ies)?


yhe-wmo avatar yhe-wmo commented on June 12, 2024

TT-NWPMD meeting (2023.06.13) noted the decision on scaling down the multiple uses of the topic hierarchy. TT-NWPMD asks for further clarification on how to solve the original issue. For a dataset that is multidisciplinary in nature, which topic should it be associated with? Clear and well-documented guidance would be needed to ensure consistency.


6a6d74 avatar 6a6d74 commented on June 12, 2024

Adding my thoughts ...

We need to treat each domain separately, so "similar data" from, say, 2 earth system domains would need to be published in places on the topic hierarchy. We shouldn't try to conflate. This solution might not be super elegant, but at least it's predictable for data publishers and data consumers.


antje-s avatar antje-s commented on June 12, 2024

Note: currently, notifications for the same data (same data_id) would be considered duplicates by the Global Cache, even if they were published in different topics. Of course code is patient and it could be extended, but we would increase complexity...
The code would have to ensure that the download is executed only once (so that the data volume does not grow as well) and then republish the further notifications, with the download link adjusted to the value from the first data download (this value has to be saved).
The disadvantage also remains that we would increase the overall message volume significantly; e.g. most observation data are relevant for several domains and would trigger many notifications.
And automatic forwarding at the receiver's end would be difficult, because the first received notification will execute the data download and the next ones will not (recognized as "already downloaded" via the data_id check), so the data will be missing in the other client targets. The client code would also have to implement more differentiation.

At least my first feeling would be that multiple publish for the same data (with automatic download of the linked data) is not a good idea. But maybe I am just overlooking a simple solution...
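
For illustration only, the extension described above might look roughly like the sketch below: download once per data_id, save the link to the cached copy, and republish every later notification with that saved link. The `MultiTopicCacheSketch` class, the `url` field, the `fake_download` helper and all URLs and topic names are hypothetical.

```python
import copy


class MultiTopicCacheSketch:
    """Toy cache: one download per data_id, one republish per notification."""

    def __init__(self, download):
        self._download = download  # callable(url) -> URL of the cached copy
        self._cached_url = {}      # data_id -> saved link from the first download
        self.republished = []      # (topic, message) pairs passed on

    def on_message(self, topic, message):
        data_id = message["properties"]["data_id"]
        if data_id not in self._cached_url:
            # First notification for this granule: download and remember the link.
            self._cached_url[data_id] = self._download(message["url"])
        outgoing = copy.deepcopy(message)
        outgoing["url"] = self._cached_url[data_id]  # adjusted download link
        self.republished.append((topic, outgoing))
        return outgoing


downloads = []


def fake_download(url):
    """Stand-in for a real download; records the call and returns a cache URL."""
    downloads.append(url)
    return "https://cache.example.org/" + url.rsplit("/", 1)[-1]


cache = MultiTopicCacheSketch(fake_download)
message = {
    "properties": {"data_id": "granule-001"},
    "url": "https://example.org/granule-001",
}
for topic in ("example/weather", "example/hydrology", "example/climate"):
    cache.on_message(topic, message)

print(len(downloads), len(cache.republished))  # prints "1 3"
```

Even in this toy version the cache has to keep extra state per data_id, and three notifications go out for one granule, which is the message-volume concern raised in the comment.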


golfvert avatar golfvert commented on June 12, 2024

I VERY strongly support Enrico's comment... The topic hierarchy and its messages are about knowing that a new dataset is available while providing some filtering capabilities. They are not there to describe the data or to limit its usage.


kaiwirt avatar kaiwirt commented on June 12, 2024

I also strongly oppose that we publish messages for the same data in multiple topics. This only adds complexity without providing much advantage.


tomkralidis avatar tomkralidis commented on June 12, 2024

TT-WISMD 2023-06-22:

  • TT agrees / endorses decision


yhonda21 avatar yhonda21 commented on June 12, 2024

A given piece of data should be published under one discipline. If the data is relevant to other disciplines, this information should be described in the metadata of the data. (see TT-NWPMD meeting on 05.07.2023)


antje-s avatar antje-s commented on June 12, 2024

Should we close the issue as decided?


tomkralidis avatar tomkralidis commented on June 12, 2024

We need the decision reflected in documentation in the resulting specification (once wmo-im/wis2-topic-hierarchy#47 is reviewed/merged).


tomkralidis avatar tomkralidis commented on June 12, 2024

TT-WISMD 2023-09-12:

  • add to specification
  • if a dataset can be made available under > 1 topic, the centre SHALL choose one topic for publication purposes
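
A rough sketch of what this rule means for a data provider, using the precipitation example from earlier in the thread: pick one best-fit discipline for publication and keep the others as themes for discovery. The `plan_publication` helper and the dictionary layout are purely illustrative assumptions and do not follow the WCMP2 schema.

```python
def plan_publication(candidate_disciplines, best_fit):
    """Split candidate disciplines into one publication choice plus themes."""
    if best_fit not in candidate_disciplines:
        raise ValueError("best_fit must be one of the candidate disciplines")
    themes = [d for d in candidate_disciplines if d != best_fit]
    return {
        # The single discipline used for the pub/sub topic.
        "publication_discipline": best_fit,
        # Other relevant disciplines, to be described in the metadata record.
        "themes": themes,
    }


# Precipitation observations are relevant to both weather and hydrology;
# the choice of "hydrology" here is arbitrary and only for illustration.
plan = plan_publication(["weather", "hydrology"], best_fit="hydrology")
print(plan)
```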


tomkralidis avatar tomkralidis commented on June 12, 2024

TT-WISMD 2023-09-25

  • WCMP2 (distribution/MQTT) and WTH (?): add text to see the WIS2 Guide for further guidance on "choosing a topic for your dataset"
  • update WIS2 Guide with work clarification


tomkralidis avatar tomkralidis commented on June 12, 2024

PR in #39.

