Extremely Slow kernel/initrd/squashfs Transfers due to nginx sendfile on

Bug #1811537 reported by KingJ
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
MAAS      Expired   Undecided    Unassigned

Bug Description

Since upgrading to MAAS 2.5.0, most hosts exhibit extremely slow transfer speeds for the kernel, initramfs and squashfs when PXE booting from MAAS. The transfer speed settles at around 1Mbit/sec, which is slow enough to cause many common MAAS operations to time out and fail.
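
To put the 1Mbit/sec figure in perspective, here is a minimal arithmetic sketch. The image size used is a hypothetical round number for illustration, not a measurement from this report, and `transfer_seconds` is a helper name introduced here.

```python
def transfer_seconds(size_bytes: int, rate_mbit_s: float) -> float:
    """Seconds needed to move size_bytes at rate_mbit_s (1 Mbit = 1,000,000 bits)."""
    return size_bytes * 8 / (rate_mbit_s * 1_000_000)

# Hypothetical image size, chosen only to illustrate the scale of the slowdown.
IMAGE_BYTES = 100_000_000  # 100 MB

slow = transfer_seconds(IMAGE_BYTES, 1.0)     # the rate observed in this report
fast = transfer_seconds(IMAGE_BYTES, 1000.0)  # an assumed 1 Gbit/sec link for comparison

print(f"at 1 Mbit/sec: {slow:.0f} s, at 1 Gbit/sec: {fast:.1f} s")
```

At 1Mbit/sec every 100 MB costs over 13 minutes, so a transfer that would normally finish in about a second can easily outlast an operation timeout.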

I can reproduce this in a PXE environment on the same L2 domain, with deployed hosts wget'ing the image on the same L2 domain, and from a desktop browser on a different L3 subnet. The only resolutions I've been able to work out so far are:

 * Reboot the MAAS host - this resolves the speed issues for a short while (<1 hour or a few PXE boots?).
 * Run a tcpdump on the MAAS host - As soon as tcpdump is run on the MAAS host, transfer speeds significantly increase - making it tricky to diagnose at a network level what is happening! This fix persists for as long as tcpdump is running; as soon as tcpdump is stopped, speeds return to 1Mbit/sec.
 * Disable the sendfile option in /var/lib/maas/http/nginx.conf and restart the maas-http service. This resolves the issue indefinitely.
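
The third workaround can be sketched as a small text edit. This is an illustration only: `set_nginx_directive` is a helper name introduced here, and the sample config is a hypothetical excerpt, not the actual contents of the MAAS-generated nginx.conf.

```python
import re

def set_nginx_directive(config_text: str, directive: str, value: str) -> str:
    """Set every uncommented `<directive> ...;` line to `<directive> <value>;`."""
    # Match the directive only at the start of a line (after optional
    # indentation), so commented-out lines like "# sendfile on;" are untouched.
    pattern = re.compile(
        rf"^([ \t]*){re.escape(directive)}[ \t]+[^;\n]*;", re.MULTILINE
    )
    return pattern.sub(lambda m: f"{m.group(1)}{directive} {value};", config_text)

# Hypothetical excerpt for illustration; the real file will differ.
SAMPLE_CONF = """\
http {
    sendfile on;
    # sendfile on;
    tcp_nopush on;
}
"""

patched = set_nginx_directive(SAMPLE_CONF, "sendfile", "off")
print(patched)
```

To apply the same change for real, the text of /var/lib/maas/http/nginx.conf would be passed through the function and written back, followed by a restart of the maas-http service as described above.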

I'm unsure why exactly disabling sendfile fixes the issue, given that enabling it is supposed to significantly speed up I/O operations. Equally, the fact that running tcpdump fixes the issue until terminated is also rather odd!

As running tcpdump on the MAAS host resolves the issue temporarily, I instead attempted to run tcpdump both on a deployed host in the same L2 broadcast domain wget'ing the images, and Wireshark on the desktop browser. Both showed an initial high throughput, which quickly slowed down to about 1Mbit/sec with huge numbers of Out-of-Order TCP segments, TCP retransmissions and other TCP errors.
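
The "fast start, then collapse to about 1Mbit/sec" pattern can also be observed from the client side without a packet capture. Below is a minimal sketch that reads a stream and records throughput per time window. The function name, window size and chunk size are assumptions made here, and the demo uses an in-memory stream with a fake clock so that it runs without a network.

```python
import io
import itertools
import time
from typing import BinaryIO, Callable, List, Tuple

def sample_throughput(
    stream: BinaryIO,
    window_s: float = 1.0,
    chunk_size: int = 64 * 1024,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[float, float]]:
    """Read stream to EOF; return (seconds_since_start, mbit_per_s) per window."""
    samples: List[Tuple[float, float]] = []
    start = window_start = clock()
    window_bytes = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # EOF; a trailing partial window is not reported
        window_bytes += len(chunk)
        now = clock()
        if now - window_start >= window_s:
            mbit_per_s = window_bytes * 8 / (now - window_start) / 1e6
            samples.append((now - start, mbit_per_s))
            window_start, window_bytes = now, 0
    return samples

# Demo with a fake clock that advances one second per call and an in-memory
# payload, so each 125,000-byte chunk corresponds to exactly 1 Mbit/sec.
fake_ticks = itertools.count()
demo = sample_throughput(
    io.BytesIO(bytes(250_000)),
    window_s=1.0,
    chunk_size=125_000,
    clock=lambda: float(next(fake_ticks)),
)
print(demo)
```

For a real measurement, the stream would be the response object returned by `urllib.request.urlopen` for whichever image URL is being tested, with the default monotonic clock. A healthy transfer should show roughly constant samples, while the behaviour described here would show high early samples followed by values near 1.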

My environment is as follows:

 * Single Virtual Machine running MAAS Rack and Region controller.
 * Linux maas 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
 * nginx version: nginx/1.14.0 (Ubuntu)
 * ESXi VM - 2 vCPU, 4GB RAM, 1x NIC using VMXNET3, portgroup with VLAN.
 * ESXi Hypervisor - 6.5.0 Update 2 (Build 10884925)

I originally posted about this on the MAAS forums at https://discourse.maas.io/t/slow-image-download-unless-running-tcpdump/309 and was encouraged to submit a bug report.

Many thanks.

KingJ (kj-kingj) wrote :
Changed in maas:
milestone: none → 2.6.0
status: New → Triaged
summary: - Extremely Slow kernel/initrd/squashfs Transfers
+ [2.5] Extremely Slow kernel/initrd/squashfs Transfers due to nginx
+ sendfile on
Björn Tillenius (bjornt) wrote : Re: [2.5] Extremely Slow kernel/initrd/squashfs Transfers due to nginx sendfile on

I managed to reproduce this, although the other way around (with MAAS in a container and PXE booting VMs).

I'm going to do some performance testing to see whether turning off sendfile, tcp_nopush and tcp_nodelay causes a bad regression, since they all seem to be causing problems with VMs.

If there are no severe regressions, let's turn them off.

Jerzy Husakowski (jhusakowski) wrote :

Does it still happen on a supported MAAS version (3.1 or later)?

summary: - [2.5] Extremely Slow kernel/initrd/squashfs Transfers due to nginx
- sendfile on
+ Extremely Slow kernel/initrd/squashfs Transfers due to nginx sendfile on
no longer affects: maas/2.5
Changed in maas:
milestone: 2.6.0 → none
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired