
Comments (20)

golfvert avatar golfvert commented on June 12, 2024 1

I spoke with the expert on MQTT parameters.

Having QOS=1 to cope with "short" glitches is a good idea.
For "long" outages (which are no longer glitches), MQTT tooling alone will not be sufficient and a mechanism outside MQTT would be required.

So, QOS=1 and persistent queues is a good solution for a "reasonable" number of messages (or queue size in bytes).

Shall we give it a go with

  • QOS=1 and a persistent queue of a minimum of 2000 messages is highly preferred
  • For this to work a connected user must keep the same clientid value.

A Global Broker will have quite a large number of clients. Depending on its resource usage, it may choose to go above the recommended limit of 2000 messages.
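Why the clientid has to stay the same can be shown with a toy model. This is only a sketch, not a real MQTT broker: the class `SessionBroker`, its methods and the client ids are hypothetical names, and delivery to online clients is left out because only the offline backlog matters here.

```python
from collections import deque


class SessionBroker:
    """Toy model of persistent sessions; not a real MQTT broker."""

    def __init__(self, max_queued=2000):
        self.max_queued = max_queued
        self.queues = {}      # client id -> deque of messages waiting for it
        self.online = set()   # client ids currently connected

    def connect(self, client_id):
        """Connect and return whatever was queued for this client id."""
        self.online.add(client_id)
        queue = self.queues.setdefault(client_id, deque())
        backlog = list(queue)
        queue.clear()
        return backlog

    def disconnect(self, client_id):
        self.online.discard(client_id)

    def publish(self, message):
        """Queue the message for every known session that is offline."""
        for client_id, queue in self.queues.items():
            if client_id not in self.online and len(queue) < self.max_queued:
                queue.append(message)


broker = SessionBroker()
broker.connect("nwp-centre-1")
broker.disconnect("nwp-centre-1")
broker.publish("obs-1")
broker.publish("obs-2")

same_id = broker.connect("nwp-centre-1")   # resumes the stored session
new_id = broker.connect("nwp-centre-2")    # unknown id: nothing was queued
```

Reconnecting with the original id returns the two queued messages, while the new id starts with an empty backlog.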

from wis2-guide.

golfvert avatar golfvert commented on June 12, 2024

For information, http://www.steves-internet-guide.com/mqtt-retained-messages-example/ describes the retain flag and some QoS aspects fairly well.


golfvert avatar golfvert commented on June 12, 2024

IMHO, we need to agree on:

  1. Retain true/false. If true, we should agree on the retain duration. Topics with no new messages for some time (12h, 24h, more...) should be cleared so that stale messages are not retained. At the moment GB France is clearing retained messages older than 12h.
  2. QOS:
     • at the wis2 node level
     • at the GB level
  3. On the GB, most (all?) implementations have some mechanism to keep messages in queues. If a known client is disconnected for X sec./min./..., it should be able (or not) to get the messages published while it was "away".


6a6d74 avatar 6a6d74 commented on June 12, 2024

We know that some data are higher priority than others; e.g., tsunami warnings ...

But, the Global Brokers should easily be able to scale to support the load from data consumers. So I'd rather not put in place any extra complexity to deal with a problem that may not arise.

I suggest that we run stress-test on Global Broker(s) in the final quarter of the pilot project to determine if special message prioritisation is needed.

Other than that, we do (urgently) need to agree the broker protocol elements.


golfvert avatar golfvert commented on June 12, 2024

A meeting is organized on Sept. 4th with an MQTT Expert (Dev from VerneMQ) to propose parameters to be used as part of MQTT specifications.


golfvert avatar golfvert commented on June 12, 2024

Following yesterday's meeting, and to make a long story short, the advice from the VerneMQ specialist is to keep the system as simple as possible.
Our design is providing a redundant, resilient solution. There is no obvious need to use QOS, retain, persistent queues with MQTT to get the resilient system we need.
The proposal can be summarized like this:
QOS0, no retain, no persistent queues. MQTT 5.0 is preferred. The "S" version of the protocol, almost a must. Authentication and authorisation (read only) for preventing flooding the broker must be implemented.
The chapter of the guide on MQTT should reflect this.


josusky avatar josusky commented on June 12, 2024

Will you provide more details during the F2F meeting?


golfvert avatar golfvert commented on June 12, 2024

Have you received the link to the recording of last week's presentation on MQTT? It should address most of the questions, and explain the reasoning behind this three-line conclusion.
I don't want to share the link publicly here...


josusky avatar josusky commented on June 12, 2024

OK, I have watched the recording. The part about QoS starts at 1:12:50 and continues until about 1:28. At two moments, one at the beginning and the second at 1:21:25, you made a wrong assumption. Both of you, Kai and @golfvert, said that when a client is connected to two brokers, delivery is practically guaranteed, i.e. almost like QoS = 2. That is not true at all. It is not even QoS = 1, it is more like QoS = 0.01. Let me explain.
If the client has a short network outage or a service restart, it stops communicating with all brokers, maybe for a second, maybe for a few minutes. If all brokers use QoS = 0, it is never going to get the messages that it missed, whereas QoS = 1 would solve this nicely and without the use of a bespoke replay service.
As you, @golfvert said, 99% of the time all clients will be online. In such case:

  • the persistent queues are practically empty
  • the QoS = 1 generates a small network overhead
  • the 1% of clients that are temporary off-line will be very happy when they come on-line
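The argument above can be illustrated with a small simulation. It is a sketch under simplifying assumptions: the function `received` and the message ids are hypothetical, and the QoS 1 case assumes the persistent queue is large enough to hold the whole backlog.

```python
def received(messages, outages, qos):
    """Return the ids a client receives from one broker.

    messages: list of (publish_time, message_id)
    outages:  list of (start, end) intervals when the client is offline
    qos:      0 = delivered only if online at publish time,
              1 = queued in a persistent session and delivered on reconnect
    """
    def offline(t):
        return any(start <= t < end for start, end in outages)

    if qos >= 1:
        # Assumes the queue is large enough to hold the whole backlog.
        return {mid for _, mid in messages}
    return {mid for t, mid in messages if not offline(t)}


messages = [(t, f"msg-{t}") for t in range(10)]   # one message per second
outages = [(3, 6)]                                # client offline for t = 3, 4, 5

# The outage is on the client side, so it affects both brokers at once.
two_qos0 = received(messages, outages, 0) | received(messages, outages, 0)
one_qos1 = received(messages, outages, 1)

missed_qos0 = sorted({mid for _, mid in messages} - two_qos0)
missed_qos1 = sorted({mid for _, mid in messages} - one_qos1)
```

With two QoS 0 brokers the client still misses the three messages published during its own outage. A single QoS 1 broker with a persistent session leaves nothing missed.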

Sure, a Global Broker needs to have some safety measures in place. If it loses all outgoing connections (but not the incoming ones) for an hour, it might need extra memory. If it happens for a day, well, first of all a Global Broker should not be down for so long, and secondly, in this case it is acceptable to drop messages - that's why we have the redundancy :-)


josusky avatar josusky commented on June 12, 2024

To summarize my previous comment - instead of:

QOS0, no retain, no persistent queues. MQTT 5.0 is preferred. The "S" version of the protocol, almost a must. Authentication and authorisation (read only) for preventing flooding the broker must be implemented.

I suggest:

  • QoS = 1
  • retain = false
  • MQTT 5.0 is highly preferred
  • The "S" version of the protocol (that is TLS), a must
  • Authentication and authorisation - a must (methods to be discussed) - wmo-im/wis2pilot#109.


golfvert avatar golfvert commented on June 12, 2024

With QOS=1, if the same client-id is used after a disconnection (without cleanSession), the client will get the messages it missed while disconnected, up to a certain limit.
Typically, mosquitto has additional parameters to avoid "killing" the broker with a very large number of messages in the queue:

# The maximum number of QoS 1 and 2 messages to hold in a queue per client
# above those that are currently in-flight.  Defaults to 1000. Set
# to 0 for no maximum (not recommended).
# See also queue_qos0_messages.
# See also max_queued_bytes.
#max_queued_messages 1000

So, if we go for QOS=1 we also have to define a reasonable duration (or number of messages) for the persistent queue.
How long ? How many messages ?
Do we want to address glitches (eg. 10 seconds) ? small problem (1 minute) ? more ?
Then above that limit, messages will have disappeared and if we want users to get all messages anyhow, then we need a replay service.
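What happens above the limit can be sketched as follows. The function name `queue_while_offline` is hypothetical, and the drop policy (newly arriving messages are discarded once the queue is full) is an assumption for illustration; a given broker may behave differently.

```python
from collections import deque


def queue_while_offline(message_ids, max_queued):
    """Split messages published during an outage into kept and dropped.

    Assumption: once the queue is full, newly arriving messages are dropped.
    Real brokers may apply a different policy.
    """
    queue = deque()
    dropped = []
    for mid in message_ids:
        if len(queue) < max_queued:
            queue.append(mid)
        else:
            dropped.append(mid)
    return list(queue), dropped


kept, dropped = queue_while_offline(range(1500), max_queued=1000)
```

With 1500 messages published during the outage and a limit of 1000, the last 500 never reach the client through MQTT and would have to come from a replay service.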

If we have a replay service, then, do we need to make a difference between a glitch and a 2-hour interruption ?

I agree with:

  • retain = false
  • MQTT 5.0 is highly preferred
  • The "S" version of the protocol (that is TLS), a must - Do we impose valid (that is not self-signed) certs ?
  • Authentication and authorisation - a must (methods to be discussed)

If QOS=1 we need also the number of messages in the queue or the size of the queue.


josusky avatar josusky commented on June 12, 2024

Yes, my main concern is "glitches", that is, short unplanned outages that are basically inevitable. The new system shall not be worse than the old one. Message switching systems in GTS have persistent queues or store files somewhere (and retry if the storing fails), so a short outage (usually) does not cause data loss.
In the case of a longer outage (hours or days) it is expected/acceptable that either some data will be missing or some extra step is needed to get them. Even if the extra step is automated it needs to be signaled/logged because it may take non-trivial amount of time, CPU/network resources etc.
The exact size of the persistent queues is to be discussed. Or perhaps we could do some testing/measurement to see what is a reasonable size. But as a first guess I would go for a few (let's say 5?) minutes.


golfvert avatar golfvert commented on June 12, 2024

With the caveat that the configuration, on the brokers, is not time based but based on the number of messages or the size of the queue in bytes.
Depending on how "large" the subscription is, the same limit (e.g. 1000 messages or 8MB) will provide more or less protection against glitches.
I'll speak to André (Verne). And report back.
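Since the broker limit is a message count rather than a duration, a target outage length has to be converted using the subscription's message rate. A minimal sketch, where the function name and both rates are hypothetical values chosen only to show the spread:

```python
import math


def required_queue_depth(outage_seconds, messages_per_second):
    """Messages a queue must hold to cover an outage of the given length."""
    return math.ceil(outage_seconds * messages_per_second)


# Hypothetical subscription rates, chosen only to show the spread.
narrow_subscription = required_queue_depth(300, 0.5)   # 5 minutes at 0.5 msg/s
wide_subscription = required_queue_depth(300, 20)      # 5 minutes at 20 msg/s
```

Under these assumed rates, the same 5-minute target needs 150 messages for the narrow subscription and 6000 for the wide one, so a single limit cannot give both the same protection.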


kurt-hectic avatar kurt-hectic commented on June 12, 2024

It is perhaps a good idea to reach out to both WIS2 consumers and producers regarding the issue and details of persistent connections / queues, as both likely stand to gain from implementing it.

On the side of consumers different types of use-cases may have to be distinguished. NWP or warning related centers (and a replay service!) are high-volume / high-importance clients that benefit from GB implementing possibly substantial queues using persistent sessions. The public at large perhaps does not have the need for persistent sessions, or if they are implemented, they can be smaller.

Producers (WIS2 nodes) also gain from persistent sessions. It is practically unavoidable that the (external) internet connection of a country will go down for some time. In this case no GB would be connected, and none would get the data published while the internet connection was down. With persistent sessions this data would become available on re-subscription by default, without the need for a complex re-publishing mechanism.

GBs would have to allocate significant memory to provide persistent sessions to all users, considering that they are open to the public. One idea would be to implement QoS / persistent sessions / queue size on a per-user basis (if supported by the broker software); another would be to expose a dedicated broker instance to "high-value" users such as NWP or alerting related centers.


golfvert avatar golfvert commented on June 12, 2024

"GC would have to allocate significant memory to provide persistent sessions to all users" I don't think so...
Users will NOT have a direct subscription to the GC. Only the GBs will have connections from Users.
GC (as all WIS2Node) will only have subscription from the Global Brokers. So, the size of persistent queues is a no-brainer except for the GBs.
Then, if a persistent queue is available (this is a parameter on the broker side), then the users will have the opportunity to use them or not. They could either use a new client-id in case of failure in their connection to the GB. They can choose to discard all messages. Furthermore, they could go for QOS=0.
If we have 2000 messages with an average size of 1KB, then each client may require a maximum of 2MB to hold the queue. If there are 1000 clients, then it could be 2GB of storage. Storage can be done on RAM and/or disk.
If we have 100000 clients on the GB (I guess this will never be the case) then holding all those queues may/will become a problem.
Persistent queues are implemented (or not, but this is a global parameter) on the broker and are used or not by the subscribers.
It can have performance implication when the number of subscribers is large.
This is only the case for GBs in our design.
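The storage estimate in this comment can be written as a small worst-case calculation. The function name is hypothetical, and 1KB is taken as 1000 bytes to match the round numbers used above.

```python
def queue_storage_bytes(clients, messages_per_queue, avg_message_bytes):
    """Upper bound on storage if every client's queue is full."""
    return clients * messages_per_queue * avg_message_bytes


KB = 1000  # decimal kilobyte, matching the round numbers in the comment

per_client = queue_storage_bytes(1, 2000, 1 * KB)        # 2_000_000 bytes = 2 MB
thousand = queue_storage_bytes(1_000, 2000, 1 * KB)      # 2_000_000_000 bytes = 2 GB
hundred_k = queue_storage_bytes(100_000, 2000, 1 * KB)   # 200_000_000_000 bytes = 200 GB
```

This is an upper bound: it is only reached if every client is offline long enough for its queue to fill.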


kurt-hectic avatar kurt-hectic commented on June 12, 2024

@golfvert I meant GB, not GC, comment is updated.

The requirement of a NWP or world data center, which would subscribe to "cache/+/+/data/core/weather/surface-based-observations/synop" (or similar topic yielding a large number of notifications) would likely not be met by a queue of 2000 assuming your model of 300 eventual centers, sharing 1 obs per second.

The idea of different service levels for different users is that a malicious (anonymous) user could connect with persistent sessions set, subscribe to "cache/#" and disconnect, a large number of times with different client_ids and possibly exhaust memory / disk required to queue messages in the event "high-value" center goes offline.
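To see why a "cache/#" subscription is the worst case for queue growth, it helps to recall how MQTT wildcards match. The sketch below is a simplified matcher: the function name is hypothetical, the example topic uses placeholder levels "x" and "y", and the special rules for topics starting with "$" are omitted.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic filter matches a topic name.

    '+' matches exactly one level, '#' matches the remaining levels
    (including none). Special handling of topics starting with '$' is omitted.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


synop_filter = "cache/+/+/data/core/weather/surface-based-observations/synop"
example_topic = "cache/x/y/data/core/weather/surface-based-observations/synop"

matches_synop = topic_matches(synop_filter, example_topic)
matches_everything = topic_matches("cache/#", example_topic)
```

The "cache/#" filter matches every topic under "cache", so each abandoned persistent session subscribed to it fills at the full notification rate.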


golfvert avatar golfvert commented on June 12, 2024

"300 eventual centers, sharing 1 obs per second."
300 centres, maybe.
1 obs per second, unlikely.

In the Res. 1 and GBON, aren't we talking about one per hour ?
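The two rate assumptions lead to very different outage coverage for a 2000-message queue. A rough sketch, assuming one notification per observation, a steady rate, and a subscriber that receives all of them (the function name is hypothetical):

```python
def seconds_until_full(max_queued, centres, obs_per_centre_per_second):
    """Time for an offline client's queue to fill at a steady publish rate."""
    rate = centres * obs_per_centre_per_second
    return max_queued / rate


per_second = seconds_until_full(2000, 300, 1)          # about 6.7 seconds
per_hour = seconds_until_full(2000, 300, 1 / 3600)     # about 24000 s, i.e. 6.7 hours
```

At one observation per second per centre the queue covers only a few seconds of outage. At one per hour it covers several hours.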

"The idea of different service levels for different users" could be interesting. However, in the three implementations of MQTT I have looked at (mosquitto, verne, emqx) this is a global parameter. So, it can't be 10000 for VIPs and 10 for others.

2000 messages is of course debatable. My understanding of the implementations of MQTT is that we can choose one value....

"a large number of times with different client_ids and possibly exhaust memory / disk": this kind of issue will have to be managed by the GB whether we go for 10, 1000 or 10000 messages in the persistent queue.


efucile avatar efucile commented on June 12, 2024

Decision

All brokers (not only Global Brokers) shall comply with the following recommendations:

  • QOS=1
  • retain = false
  • MQTT 5.0 should be used
  • The "S" version of the protocol (that is TLS) should be used and if it is used shall provide a valid certificate
  • Authentication and authorisation shall be used with user/password

An annex in the guide is required to add these recommendations for all the services providing a broker (node, GB, GC). The technical specs for those services will point to the annex.


kaiwirt avatar kaiwirt commented on June 12, 2024

I would like to add to the list that Global Brokers should support shared subscriptions (topic $share).
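For readers unfamiliar with shared subscriptions: in MQTT 5.0 a filter of the form `$share/<group>/<filter>` lets several clients in the same group split the matching messages between them. The sketch below only illustrates the idea. The helper names, group name and worker ids are hypothetical, and round-robin is an assumed balancing strategy that a real broker need not follow.

```python
from itertools import cycle


def parse_shared_subscription(topic_filter):
    """Split '$share/<group>/<filter>' into (group, filter).

    Returns (None, topic_filter) for an ordinary subscription.
    """
    parts = topic_filter.split("/", 2)
    if len(parts) == 3 and parts[0] == "$share" and parts[1] and parts[2]:
        return parts[1], parts[2]
    return None, topic_filter


def distribute(messages, members):
    """Assign each message to one group member in turn.

    Round-robin is an assumption; brokers are free to balance differently.
    """
    assignment = {member: [] for member in members}
    for message, member in zip(messages, cycle(members)):
        assignment[member].append(message)
    return assignment


group, topic_filter = parse_shared_subscription("$share/ingest/cache/#")
shares = distribute(["m1", "m2", "m3", "m4"], ["worker-a", "worker-b"])
```

Each message goes to exactly one member of the group, which lets a high-volume consumer spread the load over several connections.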


6a6d74 avatar 6a6d74 commented on June 12, 2024

see PR #84 ... complete. Closing issue.

