MTU mismatches are one of the most consistently underestimated risks in network migrations. Not because engineers do not know about MTU — they do. But because standard testing almost never catches it. And in the pressure of a migration window, nobody is specifically looking for it.
MTU problems are infrastructure-layer failures that present as application-layer symptoms. Slow applications. Intermittent timeouts. Selective failures that follow a pattern nobody initially connects to the network.
Ping, traceroute, telnet, SSH, HTTPS — none of these will reveal an MTU problem unless you are specifically testing for it. Small packets pass cleanly. The mismatch only becomes visible when production traffic — larger packets, higher volumes, real application behaviour — hits the path. By then, the migration is live and the pressure to diagnose and fix is intense.
Here is what it looks like in production. Three real examples.
GRE tunnel used for internet failover between data centres
Some clients were fine. Others were timing out intermittently. The pattern that eventually gave it away: clients on IPv4-only ISPs were unaffected, while clients on dual-stack ISPs were the ones timing out.
The additional encapsulation overhead from IPv6 headers was pushing packets over the GRE tunnel's effective MTU. Standard connectivity tests — all passing. Real traffic from dual-stack clients — failing selectively, in a pattern that looked like an application problem, a routing problem, a firewall problem — before anyone looked at MTU. The fix was straightforward once identified. Getting to the identification took significantly longer than it should have.
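The arithmetic behind that failure pattern is easy to sketch. The figures below assume a 1500-byte underlay MTU and classic GRE overhead (24 bytes); the 1440-byte payload is an illustrative value chosen to fall between the two limits:

```python
# Sketch of the overhead arithmetic behind the dual-stack failure pattern.
# Assumes a 1500-byte underlay MTU and classic GRE overhead (20-byte outer
# IPv4 header + 4-byte GRE header); real tunnels may differ.
UNDERLAY_MTU = 1500
GRE_OVERHEAD = 20 + 4                       # outer IPv4 header + GRE header
TUNNEL_MTU = UNDERLAY_MTU - GRE_OVERHEAD    # 1476 bytes inside the tunnel

IPV4_HEADER = 20
IPV6_HEADER = 40

payload = 1440                              # same application payload in both cases

ipv4_packet = IPV4_HEADER + payload         # 1460 -> fits inside 1476
ipv6_packet = IPV6_HEADER + payload         # 1480 -> exceeds 1476

print(f"tunnel MTU: {TUNNEL_MTU}")
print(f"IPv4 packet {ipv4_packet} bytes: {'ok' if ipv4_packet <= TUNNEL_MTU else 'too big'}")
print(f"IPv6 packet {ipv6_packet} bytes: {'ok' if ipv6_packet <= TUNNEL_MTU else 'too big'}")
```

The same payload fits when carried over IPv4 but not over IPv6: the 20 extra bytes of IPv6 header are exactly the kind of margin a GRE tunnel has already consumed.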
VXLAN to ACI fabric migration over DCI
Applications running within the VXLAN fabric — fine. The same applications, once their VLANs were extended over the DCI into the ACI fabric — severely degraded throughput. Not down. Just slow enough to be noticed by users and painful enough to generate escalations.
The VXLAN encapsulation overhead combined with the DCI path MTU created fragmentation that only became visible at production traffic volumes. In a test window with low traffic and small payloads, everything looked healthy. The issue was invisible until the application was under real load with real packet sizes.
Remote site moved to SD-WAN — Cisco ISE RADIUS proxy broke
Not the SD-WAN configuration. Not the ISE policy. Not the authentication logic. The ISE interface MTU — not adjusted to account for the overhead introduced by the SD-WAN overlay. RADIUS packets that had always been fine on the legacy WAN were now being fragmented. Authentication appeared to work intermittently, which made diagnosis harder — intermittent failures are always harder to pin down than consistent ones. Adjusting the ISE interface MTU fixed it immediately.
Why standard testing does not catch this
The underlying issue is that most pre-migration testing uses small packets, low volumes and controlled conditions. MTU problems are load-sensitive and packet-size-sensitive. A ping test with default packet sizes passes. A file transfer, a database synchronisation, an application with large payload sizes — these expose the problem that the ping test missed.
Every overlay technology reduces the effective MTU of the path beneath it. GRE adds 24 bytes. VXLAN adds 50 bytes. IPsec adds variable overhead depending on the cipher and authentication. SD-WAN overlays add encapsulation that varies by vendor and configuration. These overheads compound. A path that passes through multiple overlay technologies can lose 100 bytes or more from the effective MTU before a single byte of application data is sent.
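The compounding effect can be made concrete with a small helper. The GRE and VXLAN figures are the standard ones from the paragraph above; the IPsec figure is an assumed worst-case value, since ESP overhead varies with cipher, padding and authentication:

```python
# Illustrative effective-MTU calculator. GRE and VXLAN overheads are the
# standard figures; the IPsec value is an assumed worst-case (ESP overhead
# varies with cipher, padding and authentication).
OVERHEADS = {
    "gre": 24,        # outer IPv4 header (20) + GRE header (4)
    "vxlan": 50,      # outer Ethernet + IPv4 + UDP + VXLAN headers
    "ipsec-esp": 73,  # assumed worst-case ESP overhead; vendor-specific
}

def effective_mtu(base_mtu: int, overlays: list[str]) -> int:
    """Subtract each overlay's overhead from the base path MTU."""
    return base_mtu - sum(OVERHEADS[o] for o in overlays)

# A 1500-byte path carrying VXLAN inside an IPsec tunnel loses 123 bytes
# before a single byte of application data is sent:
print(effective_mtu(1500, ["ipsec-esp", "vxlan"]))  # 1377
```

This is exactly the calculation the LLD should show for each path: base MTU, each overlay named, each overhead subtracted, and the resulting end-to-end figure.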
What to design for
MTU needs to be treated as a first-class design consideration — not an implementation detail left for the engineers to sort out during build. At design stage, every overlay in the path should be identified, its overhead documented, and the effective MTU calculated end-to-end. That calculated MTU should be validated against the application requirements.
Jumbo frames help but do not eliminate the problem: they raise the ceiling, but overlay overhead still reduces it, and not every device in the path may support jumbo frames. Path MTU Discovery helps too, but do not rely on PMTUD as your safety net: it is frequently broken by firewalls that drop ICMP Type 3 Code 4 (Fragmentation Needed) messages, silently, with no indication that fragmentation or blackholing is occurring.
Test explicitly for MTU before go-live. Not with ping defaults, but with ping extended to maximum packet size with the don't-fragment bit set, with iPerf or equivalent under load, and with the actual application where possible. Document the expected MTU at every significant point in the design. Include it in the LLD with specific values, not as a generic note that MTU should be considered. At a minimum, the design should ensure that:
- Every overlay in the path is identified and its overhead documented. GRE, VXLAN, IPsec, SD-WAN — each one reduces the effective MTU. The LLD should show the calculation, not just a note that MTU is to be confirmed.
- The effective MTU is validated against application requirements. Particularly for applications with large payloads — databases, file transfer, backup, VoIP with large RTP packets.
- ICMP is permitted appropriately for PMTUD. Firewalls that silently drop ICMP Type 3 Code 4 messages disable Path MTU Discovery and make MTU problems significantly harder to diagnose.
- ISE, RADIUS and authentication infrastructure MTU is specifically checked. Authentication protocols are frequently overlooked in MTU analysis and consistently cause post-migration issues.
- Explicit MTU testing is included in the pre-go-live checklist. Not ping with defaults. Specific tests at maximum packet size, under load, across the actual production path.
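The maximum-packet-size test in the last item is essentially a search for the largest payload that crosses the path unfragmented. The sketch below simulates that search against a stubbed path; in production the probe would be a don't-fragment ping of the given size (e.g. `ping -M do -s <size>` on Linux), and `path_passes` here is a stand-in for that probe, not a real network call:

```python
# Binary search for the largest probe size a path accepts, as an explicit
# MTU test would do with don't-fragment pings. `path_passes` is a stub
# standing in for a real probe of the given size.
def find_max_passing_size(path_passes, lo: int = 64, hi: int = 9000) -> int:
    """Largest size in [lo, hi] for which path_passes(size) is True."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if path_passes(mid):
            best = mid       # this size fits; try larger
            lo = mid + 1
        else:
            hi = mid - 1     # too big; try smaller
    return best

# Simulated path: 1500-byte underlay minus 50 bytes of VXLAN overhead.
simulated_path_mtu = 1450
result = find_max_passing_size(lambda size: size <= simulated_path_mtu)
print(result)  # 1450
```

Comparing the measured figure against the calculated effective MTU in the LLD turns a vague "MTU to be confirmed" note into a pass/fail check that fits naturally in a pre-go-live checklist.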