Comment 24 for bug 1668129

Dan Streetman (ddstreet) wrote:

> We see an address of 0xfc7ffb000

Hi Matt,

I don't think you're accounting for the additional pages due to the Xen balloon, are you? The balloon increases physical memory after boot. If you check /proc/zoneinfo, look at the Normal zone's spanned pages and start_pfn, e.g.:

Node 0, zone Normal
  pages free 15116671
        min 7661
        low 22873
        high 38085
   node_scanned 0
        spanned 15499264
        present 15499264
        managed 15212161
...
  start_pfn: 1048576

and so,
$ printf "%x\n" $[ 1048576 + 15499264 ]
fc8000

meaning the Normal zone spans up to pfn 0xfc8000, so the address you see (pfn 0xfc7ffb) falls inside the zone, among the pages added by the balloon memory region...
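
To spell that out, here's a quick bash sketch of my own (assuming the zoneinfo layout shown above; it just automates the arithmetic):

  addr=0xfc7ffb000                    # address from the error report
  pfn=$(( addr >> 12 ))               # 4K pages, so pfn = addr / 4096
  read start spanned < <(awk '/zone +Normal/{n=1} n && /spanned/{s=$2} n && /start_pfn/{print $2, s; exit}' /proc/zoneinfo)
  printf 'pfn=%x  zone end=%x\n' "$pfn" $(( start + spanned ))

pfn 0xfc7ffb sits just under the zone end of 0xfc8000, i.e. in the tail that the balloon added (the boot-time max pfn, as noted below, is only fc0000).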

I disabled Ubuntu's memory hotadd (commented it out in /lib/udev/rules.d/40-vm-hotadd.rules) and rebooted, and the Normal zone's present pages were reduced so that the end is fc0000, matching the boot-time max pfn; I then tried to reproduce the problem, and it seems gone!
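
For reference, the memory hot-add rule I commented out looks roughly like this (quoting from memory, so the exact match keys in your copy of 40-vm-hotadd.rules may differ):

  SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

i.e. it automatically onlines every memory section the kernel hot-adds, which appears to be how the ballooned pages end up counted in the Normal zone.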

So I think that must be the issue: the hypervisor's NVMe driver isn't expecting any pages from the Xen-ballooned region. I checked Amazon Linux and saw why it isn't affected:

$ grep XEN_BALLOON /boot/config-4.4.41-36.55.amzn1.x86_64
# CONFIG_XEN_BALLOON is not set
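
For comparison, the same check against the Ubuntu kernel (I'd expect it to show CONFIG_XEN_BALLOON=y, and probably the memory-hotplug variant as well, but verify on your own instance):

$ grep XEN_BALLOON /boot/config-$(uname -r)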

I suspect that avoids quite a lot of problems for Amazon Linux, as Xen ballooning is quite annoying (see bug 1518457 comment 126, for example).

Maybe Ubuntu should disable Xen ballooning for AWS as well? If not, then this looks like a hypervisor bug: it needs to accept pages from the ballooned region too.