Quoting Martin Duke: RFC-Compliant ICMP PMTU messages i

Path MTU Discovery about base-drafts HOT 23 CLOSED

quicwg commented on May 25, 2024

Path MTU Discovery

from base-drafts.

Comments (23)

ianswett commented on May 25, 2024

Agreed that (a) seems largely impractical. (b) is semi-practical, but the checksum is still computable by a passive observer.

Something along the lines of (c) is what I would recommend. It's somewhat similar to what our current experiment does today, though we ignore ICMP entirely and rely on loss as a signal. The negative of course is that

I don't think we want to support small MTUs for QUIC, so we should probably pick some minimum size that we expect all forthcoming CHLOs to fit into. Multipacket CHLOs make stateless rejects impractical, and networks that don't support packets at least ~1200 bytes in size are exceedingly rare.

alagoutte commented on May 25, 2024

@ianswett If I remember correctly, the CHLO is always padded to the maximum size (1370 bytes for IPv4 and 1350 bytes for IPv6)? Is that to avoid amplification?

ianswett commented on May 25, 2024

All handshake packets are always padded to the maximum packet size to ensure the path supports the chosen MTU. So if the path doesn't, the handshake fails. This has proven to be a very practical approach, only removing <1% of users who don't support 1350/1370 MTU sizes.

Conveniently, the CHLO needs to be padded for anti-amplification reasons as well.
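A minimal sketch of the padding step described above, assuming padding is a run of zero bytes and using the 1350-byte size quoted in this thread. `pad_handshake_payload` and `MAX_PACKET_SIZE` are hypothetical names for illustration, not part of any QUIC implementation.

```python
MAX_PACKET_SIZE = 1350  # size quoted in the thread; an assumption for this sketch


def pad_handshake_payload(payload: bytes, target: int = MAX_PACKET_SIZE) -> bytes:
    """Pad a handshake payload with zero bytes up to the target size."""
    if len(payload) > target:
        raise ValueError("payload does not fit in a single packet")
    return payload + b"\x00" * (target - len(payload))


padded = pad_handshake_payload(b"CHLO placeholder")
print(len(padded))
```

If the padded packet is lost on a path that cannot carry it, the handshake fails, which is the behaviour the comment describes.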

martinduke commented on May 25, 2024

"(b) is semi-practical, but the checksum is still computable by a passive observer."

Indeed, all ICMP is vulnerable to a passive observer, but if the header echo isn't there it's vulnerable to off-path attacks that guess the 4-tuple.

It seems odd to spend an enormous amount of energy to save a handful of bytes in headers and frames, and then turn around and use a conservatively low MTU, when a very common IPv4 UDP MSS is 1472 Bytes.

Regarding option (c), it makes a lot of sense for CHLO to be a max-size packet for these purposes, but it's not sufficient. Furthermore, using loss as the signal instead of ICMP seems like an inefficient solution.

ianswett commented on May 25, 2024

In practice today, we're losing less than 10% of the potential packet size, but I agree it isn't ideal.
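The "less than 10%" figure can be checked with a short calculation. This is only a sketch: it assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header without options (40 bytes for IPv6), and an 8-byte UDP header, and it uses the 1350/1370-byte packet sizes mentioned earlier in the thread.

```python
UDP_HEADER = 8
IPV4_HEADER = 20   # no IP options assumed
IPV6_HEADER = 40
ETHERNET_MTU = 1500  # assumed link MTU


def udp_payload(mtu: int, ip_header: int = IPV4_HEADER) -> int:
    """Largest UDP payload that fits in one unfragmented IP packet."""
    return mtu - ip_header - UDP_HEADER


def unused_fraction(packet_size: int, mtu: int = ETHERNET_MTU,
                    ip_header: int = IPV4_HEADER) -> float:
    """Fraction of the available UDP payload left unused by packet_size."""
    available = udp_payload(mtu, ip_header)
    return (available - packet_size) / available


print(udp_payload(ETHERNET_MTU))                      # 1472
print(f"{unused_fraction(1370):.1%}")                 # IPv4 case
print(f"{unused_fraction(1350, ip_header=IPV6_HEADER):.1%}")  # IPv6 case
```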

Can you clarify why padding the CHLO is not sufficient?

martinduke commented on May 25, 2024

The server must probe the server->client path.
If the path changes (particularly with multipath), there must be some means of probing the MTU or you will have lots of packet fragmentation.

vasilvv commented on May 25, 2024

I am not sure QUIC can in general rely on the ICMP messages for path MTU discovery, since the ICMP messages are normally consumed by the kernel, which may not expose them to individual clients.

martinduke commented on May 25, 2024

I agree that user-space implementations will be reliant on OSes that do whatever they want with UDP ICMP messages. Perhaps the Path MTU discovery section needs lots of SHOULDs instead of MUSTs, because as QUIC migrates into the kernel (?) it can fix these problems. But at the very least, we can set the sockopts to not use the DF bit most of the time.
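For reference, a user-space sketch of the socket option mentioned above. It is Linux-specific and assumes the `IP_MTU_DISCOVER` values from `<linux/in.h>`; the numeric fallbacks are used because not every Python build exports these names from the `socket` module. The helper names are hypothetical.

```python
import socket
import sys

# Linux values from <linux/in.h>, used as fallbacks if the socket module
# does not export the names.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)  # never set DF
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)      # always set DF


def set_dont_fragment(sock: socket.socket, enabled: bool) -> None:
    """Turn the DF bit on or off for an IPv4 UDP socket (Linux only)."""
    if not sys.platform.startswith("linux"):
        raise NotImplementedError("this sketch only covers Linux")
    mode = IP_PMTUDISC_DO if enabled else IP_PMTUDISC_DONT
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, mode)


def dont_fragment_mode(sock: socket.socket) -> int:
    """Read back the current path MTU discovery mode of the socket."""
    return sock.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER)
```

An implementation could leave DF off for ordinary traffic and switch it on only around full-size probe packets; other operating systems expose different options and are not covered here.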

I should really put something together in a detailed proposal. But the outline should be something like this.

Conditions in which QUIC MUST or MAY send a packet at full-size (using PAD frames as necessary) with the DF bit set. -- this would certainly include the first CHLO and, probably, the second server-generated packet, in addition to packets involving new 4-tuples.
Reactions to loss of those packets
Reactions to ICMP packet too big messages -- would include SHOULDs involving storage of IP header fields, which would be most secure. I think some would say that QUIC SHOULD ignore these packets entirely, but I would disagree.

Relatedly, we could set a relatively high minimum MTU for QUIC connections (~1000 Bytes?). For legitimately low-MTU links, this would cause lots of fragmentation, but it would substantially mitigate off-path attackers.

What do all of you think?

martinduke commented on May 25, 2024

I should also add that the very first packet is a poor one to use to wait for loss, as the RTT is entirely unknown and must use a conservative RTO value. It might be better to use a relatively conservative value, as the draft does, at first, and probe upwards to see if there's more capacity to unlock.

rjshade commented on May 25, 2024

What's the benefit of relying on ICMP responses vs. doing MTU discovery at the QUIC layer ("packetization layer PMTUD" RFC 4821)?

martinduke commented on May 25, 2024

> What's the benefit of relying on ICMP responses vs. doing MTU discovery at the QUIC layer ("packetization layer PMTUD" RFC 4821)?

Thanks for the reference; I had not seen that RFC.

Though ICMP messages have their disadvantages, there are four advantages over a loss-based scheme.

Immediate reporting of the actual PMTU, rather than a search process that is likely to undershoot the actual PMTU.
Precision only comes with many probes -> many losses that must be retransmitted.
Loss is an overloaded signal, meaning congestion or RF problems in different contexts. Therefore there is the possibility of error in interpreting a loss.
In many cases, detecting a loss will take much longer than an ICMP message sent from an in-path router.

Meanwhile, an MTU underestimate has not only packet overhead considerations, but also directly impacts the gross throughput possible via congestion control (which operates in multiples of the MSS).
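To make the cost of a loss-based search concrete, here is a rough sketch of a binary search over probe sizes. It is not the RFC 4821 algorithm itself; `probe` is a hypothetical callback that returns True when a packet of the given size is acknowledged, and the bounds and precision are assumptions chosen for illustration.

```python
def search_pmtu(probe, low=1200, high=1500, precision=8):
    """Binary-search the largest size for which probe(size) succeeds.

    `low` is assumed to work already (for example, validated by the handshake).
    Returns (best_size, probes_sent, probes_lost).
    """
    sent = lost = 0
    while high - low > precision:
        size = (low + high + 1) // 2
        sent += 1
        if probe(size):
            low = size          # probe was acknowledged: size is usable
        else:
            lost += 1           # probe was lost: treat size as too big
            high = size - 1
    return low, sent, lost


# Simulated path whose true limit is 1472 bytes (unknown to the sender).
true_pmtu = 1472
best, sent, lost = search_pmtu(lambda size: size <= true_pmtu)
print(best, sent, lost)
```

Tightening `precision` costs more probes and more lost probes, and the result can still undershoot the true PMTU by up to `precision` bytes. A well-formed ICMP message would report the value in one step.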

ianswett commented on May 25, 2024

I think I'd like a "trust but verify" approach to path MTU. In an ideal case, QUIC would get the ICMP message and verify that it really could get a non-fragmented packet through with that size. As long as the size was larger than the chosen handshake size, it would try it.

Some of what we've discussed (i.e., padding the CHLO and SHLO and setting the DF bit) is what the implementation does today, and not including it in the draft was an oversight we really need to fix.

Actually, QUIC's congestion control (and I believe FreeBSD's) operates in bytes, not MSS. But I agree it's likely the network and host are more efficient with larger packets.

But please do a pull request with what you describe above, because I think you're going in a good direction, and it's just a matter of working out some details, which is easy to do in the comments of a PR.

martinduke commented on May 25, 2024

> I think I'd like a "trust but verify" approach to path MTU. In an ideal case, QUIC would get the ICMP message and verify that it really could get a non-fragmented packet through with that size. As long as the size was larger than the chosen handshake size, it would try it.

Stacks should ignore ICMP messages that increase the PMTU. The "only" issues with ICMP are non-conforming routers, and attackers (especially "off-path" attackers) that drive the MTU down to the minimum value.
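A small sketch of that rule, combined with the minimum-MTU floor suggested earlier in the thread. The 1200-byte floor and the function name are assumptions for illustration only.

```python
MIN_PMTU = 1200  # hypothetical floor; the thread floats values around 1000-1200


def apply_packet_too_big(current_pmtu: int, reported_mtu: int,
                         min_pmtu: int = MIN_PMTU) -> int:
    """Return the PMTU to use after an ICMP Packet Too Big message."""
    if reported_mtu >= current_pmtu:
        # ICMP must never increase the estimate, so ignore the message.
        return current_pmtu
    # Clamp so that a forged message cannot push the PMTU below the floor.
    return max(reported_mtu, min_pmtu)


print(apply_packet_too_big(1400, 1500))  # increase is ignored
print(apply_packet_too_big(1400, 1300))  # legitimate-looking decrease is applied
print(apply_packet_too_big(1400, 68))    # absurdly small value is clamped
```

The floor limits how far an off-path attacker can drive the MTU down, at the cost of fragmentation on links that really are smaller than the floor.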

> Some of what we've discussed (i.e., padding the CHLO and SHLO and setting the DF bit) is what the implementation does today, and not including it in the draft was an oversight we really need to fix.

That's great, but again, CHLO and SHLO will often have long RTOs, so loss-based MTU discovery is uniquely ill-suited to these packets.

> Actually, QUIC's congestion control (and I believe FreeBSD's) operates in bytes, not MSS. But I agree it's likely the network and host are more efficient with larger packets.

I believe there's already a comment that QUIC congestion control is poorly spelled out in the draft. But the draft says it uses TCP congestion controls, which define their initial cwnd in multiples of MSS. In the absence of ABC (Appropriate Byte Counting, RFC 3465), which is not listed in the draft, acknowledgments increment cwnd in multiples of MSS as well.
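The difference between the two accounting styles can be sketched as follows. This is a simplified slow-start illustration, not the draft's algorithm: the 10-segment initial window and the function names are assumptions, and the byte-counting rule follows the general shape of RFC 3465 (growth by bytes acknowledged, capped at a small multiple of the MSS).

```python
def initial_window(mss: int, segments: int = 10) -> int:
    """Initial cwnd expressed as a multiple of the MSS (segment count assumed)."""
    return segments * mss


def on_ack_packet_counting(cwnd: int, mss: int) -> int:
    """Without ABC: each ACK grows cwnd by one full MSS in slow start."""
    return cwnd + mss


def on_ack_byte_counting(cwnd: int, mss: int, bytes_acked: int,
                         limit: int = 2) -> int:
    """ABC style: grow by the bytes acknowledged, capped at limit * MSS."""
    return cwnd + min(bytes_acked, limit * mss)


for mss in (1350, 1472):
    print(mss, initial_window(mss))

# An ACK that covers only 500 bytes grows cwnd differently in the two schemes.
print(on_ack_packet_counting(13500, 1350))
print(on_ack_byte_counting(13500, 1350, bytes_acked=500))
```

With MSS-based accounting, a smaller packet size directly shrinks both the initial window and each increment, which is the throughput concern raised above.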

> But please do a pull request with what you describe above, because I think you're going in a good direction, and it's just a matter of working out some details, which is easy to do in the comments of a PR.

It might take me a week or two to get to it, but I will do so. Thanks for the encouragement!

ianswett commented on May 25, 2024

> Stacks should ignore ICMP messages that *increase* the PMTU. The "only" issues with ICMP are non-conforming routers, and attackers (especially "off-path" attackers) that drive the MTU down to the minimum value.

Right, what I had in mind was completing the handshake, then sending out a RFC 4821 style PMTUD packet and if an ICMP message comes back, try that size one more time to see if it gets through. If the original probe got through, even though an ICMP message was received, then QUIC should ignore the ICMP message and stick with the probed size.

> That's great, but again, CHLO and SHLO will often have long RTOs, so loss-based MTU discovery is uniquely ill-suited to these packets.

Today, most paths either block all UDP or support largish (i.e., >1400-byte) MTUs, so there needs to be good handling for timeouts. Use of ICMP messages in the handshake could be useful when available, but it's an optimization to allow a few extra people to speak QUIC, at least on today's public internet, where it's necessary to have a TCP fallback. But we should specify what should happen in a fallback-free world where ICMP is available to clients and servers.

Good point, I'll make sure ABC gets added along with a more fleshed out congestion control section.

> It might take me a week or two to get to it, but I will do so. Thanks for the encouragement!

Looking forward to it.

martinduke commented on May 25, 2024

I'm not sure if the PR pings people who are tracking this issue, but I submitted two pull requests:
#105
#106

The first is my preferred version, which strongly recommends ICMP-based PMTU discovery. It makes somewhat bold assumptions about the real world:

ICMP black holes are rare enough to be handled by a MAY if people want to use PLPMTUD in addition to ICMP.
I personally tend to work in kernel space, but my quick glance at UDP socket APIs suggests that it's not a very hard problem to modify normal DF settings/ICMP handling in a user-space implementation.
It strongly discourages a fixed, conservative PMTU. IMO it seems perverse to leave ~100 bytes on the table given all the complexity we've introduced to save a handful of bytes in packet and frame headers.

The second PR trashes all those assumptions, and is a very permissive (and wordy) spec that basically allows anything. It still adds a bunch of SHOULDs that make ICMP-based discovery work better in a QUIC context. This is the one piece where I feel strongly that QUIC's packetization section should not just reference a bunch of MTU RFCs.

I am interested in feedback on one or both, in particular which PR is a better basis for further editing.

MikeBishop commented on May 25, 2024

I think the spirit of QUIC so far has been "Be as efficient as possible; if we break something, there's always TCP." In that vein, the first seems more in keeping. On a purely technical basis, I don't have enough context to opine.

martinduke commented on May 25, 2024

On Tue, Dec 13, 2016 at 9:04 AM, ianswett wrote:

> Right, what I had in mind was completing the handshake, then sending out a RFC 4821 style PMTUD packet and if an ICMP message comes back, try that size one more time to see if it gets through. If the original probe got through, even though an ICMP message was received, then QUIC should ignore the ICMP message and stick with the probed size.

Because there is no retransmission ambiguity, there is no need to try the larger size again. If the ack comes back for the original packet, then the ICMP message is spurious. I probably should have put this consideration in the pull request.
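The interaction between a probe's acknowledgment and a competing ICMP message could look roughly like this. It is a sketch under stated assumptions: `ProbeState` and its fields are hypothetical, one probe is tracked at a time, and the minimum-size floor discussed elsewhere in the thread is left out for brevity.

```python
class ProbeState:
    """Tracks one outstanding PMTU probe (all names here are hypothetical)."""

    def __init__(self, pmtu: int, probe_size: int):
        self.pmtu = pmtu              # currently validated PMTU
        self.probe_size = probe_size  # size of the probe in flight
        self.probe_acked = False

    def on_probe_ack(self) -> None:
        # The probe arrived unfragmented, so the larger size is valid.
        self.probe_acked = True
        self.pmtu = max(self.pmtu, self.probe_size)

    def on_packet_too_big(self, reported_mtu: int) -> None:
        if self.probe_acked and reported_mtu < self.probe_size:
            # The ack proves the path carried probe_size: ICMP is spurious.
            return
        if reported_mtu < self.pmtu:
            self.pmtu = reported_mtu


state = ProbeState(pmtu=1350, probe_size=1450)
state.on_probe_ack()
state.on_packet_too_big(1300)   # ignored, because the probe was acknowledged
print(state.pmtu)
```

Because each packet number is sent once, an ack for the probe identifies it unambiguously, so no second probe at the same size is needed.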

mcmanus commented on May 25, 2024

Both #105 and #106 shift us from MAY use some kind of PMTUD to SHOULD use some kind of PMTUD (the details of which vary). Functionally, that creates a requirement on implementations that I don't think is justified as necessary by the experience so far and the complexity laid out in the PR.

Given Ian's experience of 90% effectiveness (see his comment in #64), I would be wary of introducing ICMP into this at all.

martinduke commented on May 25, 2024

> Given Ian's experience of 90% effectiveness (see his comment in #64), I would be wary of introducing ICMP into this at all.

'90% effectiveness' means we're leaving about 150 bytes per datagram on the table. It would be fine to have a protocol that didn't packetize data all that efficiently in the name of simplicity, but if that is the goal then we should absolutely get rid of the many variable-length header fields, which introduce a ton of complexity for less than 100 bytes of savings in most cases.

mcmanus commented on May 25, 2024

I should have been more clear that I was making 2 different (but related) comments:

1] PMTUD overall ought to remain a MAY.
2] In describing PMTUD we could choose to detail a loss-based in-band approach or an ICMP approach (or a hybrid, etc.).

I meant to advocate for the in-band approach because of concerns over the complexity of ICMP given its rather small impact here.

* Part of the complexity is simply that ICMP is a whole different protocol stack, often with different same-host consumers than the QUIC stack (as has been mentioned).
* A bigger part of the complexity, IMO, is that ICMP introduces unauthenticated and unencrypted inputs into the system. So you have to at least add the complexity of verifying them independently, which undermines a lot of their original advantages over a loss-based approach anyhow (e.g., partially the argument about search space, the argument about being faster), and who knows if this also enables meaningful traffic analysis such as identifying reliable vs. non-reliable streams, etc.

Much better, in my opinion, to keep all QUIC inputs authenticated as much as possible, and this is a place where it seems possible.

I'm not actually a big fan of the variable-length header fields, but your argument isn't really apples to apples. Variable-length-encoded bytes are truly saved bandwidth, while the 150-byte MTU shortcoming relates to packet overhead ratios. Adding 150 bytes of data to each packet has about the same bandwidth impact as saving 6 or 7 actual bytes, if my arithmetic worked out.
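The closing arithmetic can be reproduced with a short calculation. This is only a sketch: the per-packet overhead of 60 to 70 bytes (IP, UDP, and QUIC headers plus authentication) is an assumed figure that does not come from the thread, and 1350 and 1500 bytes are illustrative packet sizes standing in for the 150-byte gap.

```python
def equivalent_header_savings(per_packet_overhead: float,
                              small_size: int, large_size: int) -> float:
    """Header bytes that small packets would need to shed to match large ones.

    Carrying the same data in large_size packets needs small_size / large_size
    as many packets, so only that fraction of the per-packet overhead remains.
    The difference is the equivalent saving per small packet.
    """
    return per_packet_overhead * (1 - small_size / large_size)


for overhead in (60, 70):  # assumed per-packet overhead in bytes
    saving = equivalent_header_savings(overhead, 1350, 1500)
    print(overhead, round(saving, 1))
```

Under these assumptions the larger packet is worth roughly as much as trimming 6 to 7 bytes from every packet's headers, which matches the estimate in the comment.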

martinthomson commented on May 25, 2024

I think that #106 is closer now, we should discuss at the interim.

mnot commented on May 25, 2024

As discussed in Tokyo, @martinduke to propose text for PR #106 to reduce the default packet size to the IPv6 default and recommend PLPMTUD with optional usage of ICMP information

martinthomson commented on May 25, 2024

#106 was merged, so this is now done.
