Extremely Slow kernel/initrd/squashfs Transfers due to nginx sendfile on
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Expired | Undecided | Unassigned |
Bug Description
Since upgrading to MAAS 2.5.0, most hosts exhibit extremely slow transfer speeds for the kernel, initramfs and squashfs when PXE booting from MAAS. The transfer speed settles at around 1 Mbit/sec, which is slow enough to cause many common MAAS operations to time out and fail.
I can reproduce this in a PXE environment on the same L2 domain, from deployed hosts wget'ing the image on the same L2 domain, and from a desktop browser on a different L3 subnet. The only resolutions I've been able to work out so far are:
* Reboot the MAAS host - this resolves the speed issues for a short while (<1 hour or a few PXE boots?).
* Run tcpdump on the MAAS host - As soon as tcpdump is started on the MAAS host, transfer speeds increase significantly, which makes it tricky to diagnose at a network level what is happening! This fix persists for as long as tcpdump is running; as soon as tcpdump is stopped, speeds return to 1 Mbit/sec.
* Disable the sendfile option in /var/lib/
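For anyone wanting to script the third workaround, here is a minimal sketch of flipping an on/off directive in nginx config text. The helper name `set_nginx_flag` and the sample config are illustrative assumptions, not part of MAAS; the config path is truncated in this report, so the sketch works on a string and leaves it to you to supply the real file.

```python
import re


def set_nginx_flag(config_text: str, directive: str, value: str) -> str:
    """Return config_text with every active `<directive> on|off;` set to value.

    Hypothetical helper for illustration. Commented-out lines are left
    untouched because the pattern only matches a directive that starts a line
    (after optional indentation).
    """
    if value not in ("on", "off"):
        raise ValueError("value must be 'on' or 'off'")
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)" + re.escape(directive) + r"[ \t]+(?:on|off)[ \t]*;",
        re.MULTILINE,
    )
    return pattern.sub(
        lambda m: f"{m.group('indent')}{directive} {value};", config_text
    )


# Made-up sample config, not the file shipped by MAAS.
sample = """\
http {
    sendfile on;
    # sendfile on;  (commented lines are left alone)
    tcp_nopush on;
}
"""

patched = set_nginx_flag(sample, "sendfile", "off")
print(patched)
```

nginx has to reload its configuration before an edited file takes effect, and a file generated by MAAS may well be overwritten later, so treat this as a diagnostic aid rather than a permanent fix.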
I'm unsure why exactly disabling sendfile fixes the issue, given that enabling it is supposed to significantly speed up I/O operations. Equally, the fact that running tcpdump fixes the issue until terminated is also rather odd!
As running tcpdump on the MAAS host resolves the issue temporarily, I instead attempted to run tcpdump both on a deployed host in the same L2 broadcast domain wget'ing the images, and Wireshark on the desktop browser. Both showed an initial high throughput, which quickly slowed down to about 1Mbit/sec with huge numbers of Out-of-Order TCP segments, TCP retransmissions and other TCP errors.
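To put a number on the slowdown from a client without relying on wget's progress output, something like the following sketch could be used. The function names (`mbit_per_sec`, `drain`, `measure_url`) are hypothetical, and the URL you pass would be whatever image URL your MAAS host serves; none of that comes from this report.

```python
import io
import time
import urllib.request


def mbit_per_sec(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and a duration into megabits per second."""
    return num_bytes * 8 / seconds / 1e6


def drain(stream, chunk_size: int = 64 * 1024):
    """Read a file-like object to the end; return (total_bytes, elapsed_seconds)."""
    total = 0
    start = time.monotonic()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.monotonic() - start


def measure_url(url: str, timeout: float = 30.0) -> float:
    """Download url, discard the body, and return the average rate in Mbit/s."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        total, elapsed = drain(resp)
    # Guard against a zero duration on very small responses.
    return mbit_per_sec(total, max(elapsed, 1e-9))


# Offline sanity check of the helpers using an in-memory stream.
total, _ = drain(io.BytesIO(b"x" * 200_000))
print(total, mbit_per_sec(1_000_000, 8.0))
```

Running `measure_url` once with sendfile on and once with it off (or with and without tcpdump running on the MAAS host) would give comparable averages, though an average hides the "fast start, then collapse" pattern seen in the captures.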
My environment is as follows:
* Single Virtual Machine running MAAS Rack and Region controller.
* Linux maas 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
* nginx version: nginx/1.14.0 (Ubuntu)
* ESXi VM - 2 vCPU, 4GB RAM, 1x NIC using VMXNET3, portgroup with VLAN.
* ESXi Hypervisor - 6.5.0 Update 2 (Build 10884925)
I originally posted about this on the MAAS forums at https:/
Many thanks.
Changed in maas:
milestone: none → 2.6.0
status: New → Triaged
summary:
- Extremely Slow kernel/initrd/squashfs Transfers
+ [2.5] Extremely Slow kernel/initrd/squashfs Transfers due to nginx sendfile on
I managed to reproduce this, although the other way around (with MAAS in a container and PXE booting VMs).
I'm going to do some performance testing to see whether turning off sendfile, tcp_nopush and tcp_nodelay causes a bad regression, since they all seem to be causing problems with VMs.
If there are no severe regressions, let's turn them off.