icmp packet loss with async mtu check

Bug #2046202 reported by Márton Kiss
Affects: charm-magpie
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

In some customer environments with Cisco Nexus switches, the magpie tests randomly lose a single ICMP packet under load.

2023-12-08 08:45:10 INFO unit.magpie-internal-space/0.juju-log server.go:325 ping stdout PING 10.99.14.7 (10.99.14.7) 8972(9000) bytes of data.
8980 bytes from 10.99.14.7: icmp_seq=1 ttl=64 time=0.209 ms
8980 bytes from 10.99.14.7: icmp_seq=2 ttl=64 time=0.184 ms
8980 bytes from 10.99.14.7: icmp_seq=3 ttl=64 time=0.202 ms
8980 bytes from 10.99.14.7: icmp_seq=4 ttl=64 time=0.196 ms
8980 bytes from 10.99.14.7: icmp_seq=5 ttl=64 time=0.522 ms
8980 bytes from 10.99.14.7: icmp_seq=6 ttl=64 time=2.34 ms <- response time raised
8980 bytes from 10.99.14.7: icmp_seq=7 ttl=64 time=1.58 ms
8980 bytes from 10.99.14.7: icmp_seq=8 ttl=64 time=0.860 ms
8980 bytes from 10.99.14.7: icmp_seq=9 ttl=64 time=1.29 ms
8980 bytes from 10.99.14.7: icmp_seq=10 ttl=64 time=1.42 ms <- packet loss happening after this event
8980 bytes from 10.99.14.7: icmp_seq=12 ttl=64 time=2.01 ms
8980 bytes from 10.99.14.7: icmp_seq=13 ttl=64 time=2.24 ms
8980 bytes from 10.99.14.7: icmp_seq=14 ttl=64 time=1.94 ms
8980 bytes from 10.99.14.7: icmp_seq=15 ttl=64 time=0.151 ms
8980 bytes from 10.99.14.7: icmp_seq=16 ttl=64 time=0.154 ms <- response time normalised
...

--- 10.99.14.7 ping statistics ---
40 packets transmitted, 39 received, 2.5% packet loss, time 4028ms
rtt min/avg/max/mdev = 0.151/0.502/2.340/0.652 ms

When the magpie test was executed with check_iperf=false, no packet loss was experienced at all. The goal of the mtu check should be to make sure that, for example, a 9k ICMP packet is able to pass. However, this call uses the same async ping call as the ping mesh check, with the same settings and parallel execution. Based on several test runs, it seems that the mtu test floods the switches with large ICMP packets, and (due to the hook execution order) it runs on multiple units at the same time, together with the iperf testing. This appears to saturate the bandwidth of the 25Gbit link, and ICMP has no QoS configured in those environments.

https://opendev.org/openstack/charm-magpie/src/branch/master/src/lib/charms/layer/magpie_tools.py#L801
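For illustration, the pattern described above is roughly an asyncio fan-out like the following. This is a minimal sketch of the concurrency shape only, not the actual magpie_tools.py code; the helper names and the one-packet-per-try structure are assumptions:

import asyncio

PING_TRIES = 20   # charm default referenced in this report
PAYLOAD = 8972    # 9000-byte frame minus 28 bytes of ICMP/IP headers

async def ping_once(target: str) -> int:
    # -c 1: single echo request, -M do: forbid fragmentation,
    # -s: payload size so the packet is 9000 bytes on the wire
    proc = await asyncio.create_subprocess_exec(
        "ping", "-c", "1", "-M", "do", "-s", str(PAYLOAD), target,
        stdout=asyncio.subprocess.DEVNULL,
    )
    return await proc.wait()

async def mtu_check(targets: list[str]) -> None:
    # Every target x every try is scheduled at once, so the whole
    # burst of jumbo ICMP packets hits the fabric within a second or two.
    tasks = [ping_once(t) for t in targets for _ in range(PING_TRIES)]
    await asyncio.gather(*tasks)

# asyncio.run(mtu_check(["10.99.14.7", "10.99.14.8"]))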

I reverted the ping code to synchronous execution for the mtu check, and in that case the packet loss did not happen at all. It would be great to consider sending only a single packet for testing, and running only one iperf + mtu ICMP check per deployment at a time, to avoid overloading the switches from multiple directions (for example via subsequent charm actions instead of random hook execution).

The patch used for sync mtu testing:
https://pastebin.canonical.com/p/VdjTgdmryC/
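Conceptually, the synchronous variant serializes the same probes, along these lines. Again this is only a sketch under the same assumptions as above; the actual change is in the pastebin:

import subprocess

def mtu_check_sync(targets, tries=20, payload=8972):
    # One ping at a time: the jumbo ICMP probes are spread out
    # instead of being fired as a single burst across all peers.
    for target in targets:
        for _ in range(tries):
            subprocess.run(
                ["ping", "-c", "1", "-M", "do", "-s", str(payload), target],
                stdout=subprocess.DEVNULL,
                check=False,
            )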

Revision history for this message
Nobuto Murata (nobuto) wrote :

Can you confirm what value was set for the following options?

- check_iperf
- ping_mesh_mode

It sounds like both are set to True. In our internal process nowadays, check_iperf=false is expected and the iperf check can be run using an action. But we couldn't flip the default value of check_iperf out of the box because of backward compatibility.

Changed in charm-magpie:
status: New → Incomplete
Revision history for this message
Márton Kiss (marton-kiss) wrote :

Both were set to true; that is the default setting.

Revision history for this message
Nobuto Murata (nobuto) wrote :

I think we can flip the default of ping_mesh_mode to be on the safe side. However, for the check_iperf part, it's hard to flip the default, and you should really disable it and use the action instead.

Changed in charm-magpie:
status: Incomplete → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-magpie (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/charm-magpie/+/903431

Revision history for this message
Márton Kiss (marton-kiss) wrote :

Nobuto, I think the ICMP packet loss here was happening with check_iperf=true, and the ping mesh does not add much to the story. When check_iperf is enabled, the execution is the following:
- ping mtu check
- iperf speed test

The ping mtu check in the current code uses the config settings from the charm, where the default ping_tries is set to 20. Due to the async execution of the pings, they happen as fast as possible, so with 9 nodes that means 9x20 pings with a 9k packet size within 1-2 seconds (see the quick count below). Those steps are then repeated on, for example, half of the units at the same time, and a usual 9-node deployment with vlans consists of 9 x 7 units.
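To make the scale concrete, a quick count using the figures from this comment (the 1-2 second window and the "half of the units" estimate are taken from the report as-is):

# Quick count of the jumbo ICMP burst described above, using the
# figures from the report (9 nodes, ping_tries=20, 9 x 7 units,
# roughly half of them running the check in the same hook window).
nodes = 9
ping_tries = 20
units = 9 * 7                        # 63 units across the vlans
concurrent_units = units // 2        # ~31 units at the same time

pings_per_unit = nodes * ping_tries            # 180 jumbo pings per unit
total_pings = concurrent_units * pings_per_unit
print(pings_per_unit, total_pings)             # 180 per unit, ~5580 within 1-2 s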

So if using check_iperf is not recommended, it would be great to deprecate this option, or to add documentation for the config option explaining the drawbacks of using it.

Revision history for this message
Nobuto Murata (nobuto) wrote :

> So if using check_iperf is not recommended, it would be great to deprecate this option, or to add documentation for the config option explaining the drawbacks of using it.

Again, the default of on is there for backward compatibility. The charm is being rewritten with the operator framework, and the behavioral change could be done as part of that, as a new major version that breaks backward compatibility.
