Comment 52 for bug 1668129

Dan Streetman (ddstreet) wrote :

> This bug is still present on 14.04 using linux-generic-lts-xenial kernel 4.4.0-87-generic.

That's correct, and there is no planned change for the standard kernel. Only the linux-aws kernel is being changed to address this issue, by disabling Xen memory ballooning, as described in comment 50.

A bit more detail on the issue:

1. AWS Xen hypervisor boots Linux and provides an e820 map and a Xen balloon target.
2. Ubuntu kernel boots and sets up all memory listed in the e820 map.
3. Xen balloon driver notices total memory doesn't quite match its target, and so requests some pages from Xen hypervisor.
4. AWS Xen hypervisor allows Ubuntu kernel balloon driver to have exactly 11 more pages, which are registered with the Ubuntu kernel as hotplugged memory (hypervisor rejects requests for any more balloon pages).
5. The new balloon hotplugged pages are enabled (via udev or kernel config or sysfs), which makes them available for general use.
6. If any NVMe I/O operation uses any of those 11 balloon pages for DMA, the hypervisor sees that the page physical address is outside its e820 map address range (because it was a hotplugged page) and fails the NVMe I/O.
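To make step 6 concrete, here is a toy model of the address check. It is only a sketch: the actual hypervisor logic is not public, and the e820 ranges, the hotplug base address, the 4 KiB page size, and the function names below are all made-up assumptions for illustration.

```python
from typing import List, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Each entry is a (start, end) physical address range, end exclusive.
E820Map = List[Tuple[int, int]]


def in_e820(e820: E820Map, addr: int, length: int = PAGE_SIZE) -> bool:
    """Return True if [addr, addr + length) lies entirely inside one range."""
    return any(start <= addr and addr + length <= end for start, end in e820)


def dma_allowed(e820: E820Map, page_addr: int) -> bool:
    """Toy stand-in for the hypervisor check: DMA is accepted only for
    pages that fall inside the e820 map handed to the guest at boot."""
    return in_e820(e820, page_addr)


# Hypothetical boot-time map: low memory plus one large usable range.
boot_map: E820Map = [(0x0000_0000, 0x0009_F000), (0x0010_0000, 0x1_0000_0000)]

# Hypothetical placement of the 11 balloon pages, hotplugged just past
# the end of the boot-time map.
hotplug_base = 0x1_0000_0000
balloon_pages = [hotplug_base + i * PAGE_SIZE for i in range(11)]

if __name__ == "__main__":
    print("boot page ok:", dma_allowed(boot_map, 0x0010_0000))
    print("balloon pages ok:", [dma_allowed(boot_map, p) for p in balloon_pages])
```

In this model every page from the boot-time map passes the check and every balloon page fails it, which is why I/O only breaks when one of those few pages happens to be picked for DMA.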

The problem here lies either in #4 or #6 above, meaning that the hypervisor either should reject all requests for additional hotplugged memory pages (step 4) or it should allow DMA using hotplugged memory pages (step 6). Any change to the Ubuntu kernel is only working around this hypervisor problem by not enabling any hotplugged pages.

AWS is well aware of this and is investigating what changes can be made to their hypervisor, but I am not part of those discussions and so I can't provide any more detail on if or when AWS might fix #4 and/or #6. I will note that the Amazon Linux kernel has Xen ballooning disabled, and I believe the RHEL kernel does as well, so they have both only worked around this issue.

Until the AWS hypervisor is changed, there are various options to work around the issue:

Trusty:
The trusty 14.04 release does have Xen ballooning enabled, and it does hotplug memory. However, the udev rules do not enable the hotplugged memory, so this issue does not exist in trusty (unless the hotplugged memory is manually enabled).

Xenial with 4.4 kernel:
The standard 4.4 kernel in Xenial does have Xen ballooning enabled, because it may be desired under non-AWS Xen hypervisors. The recommended way to work around the issue is to edit the 40-vm-hotadd.rules udev rules file as described in comment 29.

Xenial with HWE kernel, or Zesty:
Starting with the 4.8 kernel, hotplug memory is automatically onlined, so in addition to editing the udev rule as described above (in Xenial with 4.4 kernel), you also must add a kernel boot param as described in comment 44.
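To check whether a workaround took effect, one option is to look at the memory hotplug state the kernel exposes in sysfs. The following is a rough sketch, assuming the usual /sys/devices/system/memory layout (one memoryN directory per block with a "state" file, plus an "auto_online_blocks" file on kernels that support automatic onlining). The function names are my own, and the "root" parameter exists only so the code can be pointed at a test directory.

```python
import os
import re
from typing import Dict, Optional

SYSFS_MEMORY = "/sys/devices/system/memory"


def memory_block_states(root: str = SYSFS_MEMORY) -> Dict[str, str]:
    """Map each memoryN block directory to the contents of its state file."""
    states: Dict[str, str] = {}
    if not os.path.isdir(root):
        return states
    for name in sorted(os.listdir(root)):
        if not re.fullmatch(r"memory\d+", name):
            continue
        try:
            with open(os.path.join(root, name, "state")) as f:
                states[name] = f.read().strip()
        except OSError:
            continue  # block vanished or is unreadable; skip it
    return states


def auto_online_policy(root: str = SYSFS_MEMORY) -> Optional[str]:
    """Return the auto-online policy, or None if the kernel has no such file."""
    try:
        with open(os.path.join(root, "auto_online_blocks")) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    states = memory_block_states()
    offline = [name for name, state in states.items() if state != "online"]
    print("auto-online policy:", auto_online_policy())
    print("blocks not online:", offline or "none")
```

If the workaround is in place, I would expect any hotplugged balloon memory to show up as a block that is not online, and the auto-online policy (where the file exists) to read "offline".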

Xenial linux-aws:
The linux-aws kernel has Xen ballooning disabled in the kernel configuration, so it will not cause any memory to be hotplugged, thus avoiding the problem; no other workaround is required when using the linux-aws kernel.

I am marking this as "Won't Fix" for the standard Xenial kernel.