[Hyper-V] Memory Ballooning re-broken in 16.04

Bug #1584597 reported by lf
This bug affects 5 people
Affects                Status   Importance  Assigned to
linux (Ubuntu)         Expired  High        Unassigned
linux (Ubuntu Xenial)  Expired  High        Unassigned

Bug Description

Regression: #1294283 fixed memory ballooning in Hyper-V in 14.04, but the bug has returned in 16.04.

Steps to reproduce:

1. Create a Gen2 Hyper-V VM
2. Install Ubuntu 16.04
3. modprobe hv_balloon
4. Observe that memory usage according to Hyper-V Manager does not change

This has been reproduced on 2 fresh Ubuntu Server 16.04 instances.
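
For anyone trying to confirm the driver state before reproducing, a quick sketch of checks to run inside the guest (exact output will vary by kernel and host):

```shell
# Sanity checks inside the guest before/after the reproduction steps.
lsmod | grep hv_balloon                     # is the balloon module loaded?
dmesg | grep -i hv_balloon                  # any INFO_TYPE / hot-add messages?
grep -E 'MemTotal|MemFree' /proc/meminfo    # guest-side view of memory
```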

I installed linux-generic-xenial on a 14.04 instance, rebooted, loaded hv_balloon with modprobe, and memory usage in Hyper-V Manager dropped to the correct level.

Environment:
Host: Windows NanoServer 2016

Ubuntu Server 16.04 version_signature: `Ubuntu 4.4.0-21.37-generic 4.4.6`
Ubuntu Server 14.04 version_signature: `Ubuntu 4.4.0-22.40~14.04.1-generic 4.4.8`

Revision history for this message
lf (l-ububtu-3) wrote :

I'd like to post the output of lspci -vnvn, but it doesn't return anything on Hyper-V.

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1584597

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged
Changed in linux (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → High
Revision history for this message
Joshua R. Poulson (jrp) wrote : Re: Hyper-V Memory Ballooning re-broken in 16.04

I have verified that 4.4.0-22 does not balloon memory back to Hyper-V and 4.4.0-21 does. We are investigating, but there do not seem to be many memory-related Hyper-V changes between these two kernels.

Revision history for this message
Joshua R. Poulson (jrp) wrote :

I have tried a number of different reproduction scenarios at this point on 4.4.0-21 and 4.4.0-22 and cannot reproduce it reliably except on a fresh install.

Why are you modprobing hv_balloon? It should always be present and should never need to be loaded manually.

Revision history for this message
Joshua R. Poulson (jrp) wrote :

Per https://technet.microsoft.com/en-us/library/dn531029.aspx the recommended kernel and tools to install are as follows:

# apt-get update
# apt-get install --install-recommends linux-virtual-lts-xenial
# apt-get install --install-recommends linux-tools-virtual-lts-xenial linux-cloud-tools-virtual-lts-xenial

Of course, the VM has to be rebooted after this. I *don't* recreate the problem when I do this, but I did recreate it on a VM that had been running for a couple of weeks.

tags: added: kernel-da-key kernel-hyper-v
tags: added: xenial
Revision history for this message
lf (l-ububtu-3) wrote :

Oddly, the issue seems to have fixed itself. It's quite unclear how it managed to do so though.

Revision history for this message
lf (l-ububtu-3) wrote :

I just reinstalled a VM and, on that VM, it doesn't work.

It would be really nice to have some visibility into the inner workings of the hv_balloon module to figure out what's going on with it.

Revision history for this message
Joshua R. Poulson (jrp) wrote :

@l-ububtu-3 What was the procedure on the reinstall?

Based on my results, I will add some more checks to our long-term testing efforts to see if ballooning messages are being dropped.

An explanation of Hyper-V dynamic memory in general is here: https://technet.microsoft.com/en-us/library/hh831766(v=ws.11).aspx

For the Linux driver the process is relatively straightforward. When there is less memory pressure than what is allocated, the guest balloon driver allocates buffers to itself and releases that memory back to Hyper-V. As a result, the Linux guest will think the memory is present but in use, while Hyper-V reports a lower assigned memory. When there is more pressure than what is free, the balloon driver first requests memory from Hyper-V and, as it is granted, releases the buffers it had previously allocated. Once no more ballooned buffers remain, Hyper-V assigns additional memory and the driver "hot adds" it to the guest. If the pressure does not change, nothing happens.
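
The flow described above can be sketched as a small decision function. This is an illustration, NOT the real hv_balloon code; the function name and units are hypothetical.

```python
def balloon_step(pressure_kb, assigned_kb, ballooned_kb):
    """Decide what the guest balloon driver does for one pressure report.

    pressure_kb:  memory the guest currently needs
    assigned_kb:  memory Hyper-V has currently assigned to the VM
    ballooned_kb: memory already handed back to the host via balloon buffers
    """
    available_kb = assigned_kb - ballooned_kb
    if pressure_kb < available_kb:
        # Less pressure than allocation: inflate the balloon, i.e. allocate
        # buffers in the guest and release those pages to Hyper-V.
        return "inflate"
    if pressure_kb > available_kb and ballooned_kb > 0:
        # More pressure: first deflate, releasing previously ballooned buffers.
        return "deflate"
    if pressure_kb > available_kb:
        # No balloon left to deflate: Hyper-V assigns more memory and the
        # driver hot-adds it to the guest.
        return "hot-add"
    return "no-change"
```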

Revision history for this message
Jason Couture (plaguethenet) wrote :

This bug is causing some serious issues for us with the 16.04 rollout. We oversubscribe, which is admittedly our problem, and our 16.04 VMs spike to 8 GB during boot but then settle at a reasonable 2 GB; each VM nevertheless continues to hog 8 GB. We have 5 in total, so that's a grand total of 30 GB held for no reason at all.

We get a single set of messages from the balloon driver at boot, and then nothing at all.
This is all we get:
[ 62.565094] hv_balloon: Received INFO_TYPE_MAX_PAGE_CNT
[ 62.565153] hv_balloon: Data Size is 8

uname -a:
Linux ContainerHost1 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

/proc/version_signature:
Ubuntu 4.4.0-24.43-generic 4.4.10

free -h:
              total   used   free   shared  buff/cache  available
Mem:           7.9G   2.3G   4.8G     9.9M        830M       5.5G
Swap:          2.0G   721M   1.3G

Hyper-V shows 7790 MB assigned to this VM.
Hyper-V settings are set for 512-8192 MB, with a 20% buffer.

According to Hyper-V's description of dynamic memory, this VM should only have about 2.76 GB of RAM attached to it at this point in time.

Hyper-V host is 2012 R2

Do we have any confirmation that this is actually a bug? Or any ETA on a fix?

Can I help you collect more info to fix the problem?

Revision history for this message
Jason Couture (plaguethenet) wrote : apport information

AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jun 21 09:43 seq
 crw-rw---- 1 root audio 116, 33 Jun 21 09:43 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 16.04
HibernationDevice: RESUME=/dev/mapper/ContainerHost1--vg-swap_1
InstallationDate: Installed on 2016-05-12 (39 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.3)
IwConfig: Error: [Errno 2] No such file or directory
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Microsoft Corporation Virtual Machine
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 hyperv_fb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-24-generic.efi.signed root=/dev/mapper/hostname--vg-root ro
ProcVersionSignature: Ubuntu 4.4.0-24.43-generic 4.4.10
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-24-generic N/A
 linux-backports-modules-4.4.0-24-generic N/A
 linux-firmware 1.157
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial xenial xenial
Uname: Linux 4.4.0-24-generic x86_64
UnreportableReason: The report belongs to a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: False
dmi.bios.date: 11/26/2012
dmi.bios.vendor: Microsoft Corporation
dmi.bios.version: Hyper-V UEFI Release v1.0
dmi.board.asset.tag: None
dmi.board.name: Virtual Machine
dmi.board.vendor: Microsoft Corporation
dmi.board.version: Hyper-V UEFI Release v1.0
dmi.chassis.asset.tag: 8451-8042-7430-5443-4053-9006-43
dmi.chassis.type: 3
dmi.chassis.vendor: Microsoft Corporation
dmi.chassis.version: Hyper-V UEFI Release v1.0
dmi.modalias: dmi:bvnMicrosoftCorporation:bvrHyper-VUEFIReleasev1.0:bd11/26/2012:svnMicrosoftCorporation:pnVirtualMachine:pvrHyper-VUEFIReleasev1.0:rvnMicrosoftCorporation:rnVirtualMachine:rvrHyper-VUEFIReleasev1.0:cvnMicrosoftCorporation:ct3:cvrHyper-VUEFIReleasev1.0:
dmi.product.name: Virtual Machine
dmi.product.version: Hyper-V UEFI Release v1.0
dmi.sys.vendor: Microsoft Corporation

tags: added: apport-collected
Revision history for this message
Jason Couture (plaguethenet) wrote : CRDA.txt

apport information

Revision history for this message
Jason Couture (plaguethenet) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Jason Couture (plaguethenet) wrote : JournalErrors.txt

apport information

Revision history for this message
Jason Couture (plaguethenet) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Jason Couture (plaguethenet) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Jason Couture (plaguethenet) wrote : ProcModules.txt

apport information

Revision history for this message
Jason Couture (plaguethenet) wrote : UdevDb.txt

apport information

Revision history for this message
Jason Couture (plaguethenet) wrote : WifiSyslog.txt

apport information

Revision history for this message
Jason Couture (plaguethenet) wrote : Re: Hyper-V Memory Ballooning re-broken in 16.04

Screenshot of Hyper-V settings and VM state, next to the VM.

Revision history for this message
Joshua R. Poulson (jrp) wrote :

Thanks for all the additional information, we'll take a look.

Revision history for this message
Joshua R. Poulson (jrp) wrote :

In your screenshot, memory demand reported by Hyper-V is quite a bit higher than within the guest. How soon after the peak memory demand was the screenshot taken?

Revision history for this message
Alex Ng (alexng-v) wrote :

Hi Jason,

As Josh mentioned, the screenshot shows that demand as seen by Hyper-V is still quite high compared to what is displayed inside the guest. If this number doesn't settle, then Hyper-V would have no reason to reclaim memory. Does the Hyper-V memory demand number ever settle after some time?

With the "free" command, you should be seeing the following if memory is being returned to the host successfully:

- Because Linux guests return memory via ballooning instead of hot-unplug, you'll never see the "total" memory decrease (i.e. it will always reflect the spike).

- Instead, you should see that "used" memory increases and "free" memory decreases. This indicates the kernel has allocated some memory so nothing else can use it, and that Hyper-V has reclaimed it (i.e. the used memory should increase by roughly the amount that the host has reclaimed).
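
A quick way to look for this signature is to compare two snapshots of `free -k` values. A sketch, where the dictionary keys are assumed to have been parsed from the "Mem:" row:

```python
def looks_ballooned(before, after):
    """True if memory appears to have been returned to the host via balloon:
    'total' never shrinks, while 'used' grows and 'free' drops even though
    no guest process owns the extra 'used' memory."""
    return (after["total"] == before["total"]
            and after["used"] > before["used"]
            and after["free"] < before["free"])
```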

Revision history for this message
Jason Couture (plaguethenet) wrote :
Revision history for this message
Jason Couture (plaguethenet) wrote :

This is the first time I've seen "free" drop down as low as it should, but the memory still hasn't been reclaimed by Hyper-V. This particular VM has been up 19 days. Am I wrong in thinking that dynamic memory should have scaled back to maybe 2.2-ish GB? Roughly?

I tried looking at hv_balloon.c, but I'm rather rusty, so I don't think I'll find anything myself.

Revision history for this message
Jason Couture (plaguethenet) wrote :

Here's another screenshot from a different VM, same thing as previous screenshot.

Demand should be roughly 3.6 GB, but instead it's around 7 GB.

This box has been running for 7 days, I think this is the machine from the original screenshot.

Revision history for this message
Jason Couture (plaguethenet) wrote :

My apologies for so many separate comments, I'll try to keep the noise down.

I should add that these boxes run redis, and the amount of data in redis is fairly static. It's a devtest environment. Memory demand peaks at boot and then falls off to a constant usage. So it's safe to assume that the memory usage has remained at the levels it is now overnight, if not since boot.

Revision history for this message
Alex Ng (alexng-v) wrote :

In both screenshots, it appears that the "buff/cache" value is very high.

Perhaps the balloon driver is reporting the "buff/cache" as in use to the host. Can you try running the following commands to flush write buffers and free the page cache?

1) sync
2) echo 3 > /proc/sys/vm/drop_caches

After running the above, do you see that the "buff/cache" value decreases and does this result in the memory being reclaimed by Hyper-V?

Revision history for this message
Jason Couture (plaguethenet) wrote :

On this machine it had no impact, but the other one dropped about 1 GB of RAM when I did that; less than should have been reclaimed, but some reclamation at least.

Revision history for this message
Jason Couture (plaguethenet) wrote :

Screenshot of the VM that dropped some RAM.

Revision history for this message
Alex Ng (alexng-v) wrote :

Thanks. In both VMs, it looks like the buff/cached memory dropped by 2GB. After some time, did the Hyper-V host eventually reclaim this memory? Can you check if the Hyper-V host's assigned memory also drops after some time?

Also, can you tell us if you tried this on an older build where it doesn't repro?

Revision history for this message
Alex Ng (alexng-v) wrote :

One other caveat I should mention:

Even if the VM has reduced its memory consumption, Hyper-V does not necessarily reclaim the unused memory from that VM.

Generally, Hyper-V reclaims unused memory from a VM if it's seeing memory pressure from other VMs or if the Hyper-V host itself is seeing memory pressure.

The behavior you are seeing might be expected. Indeed, if your physical host has a lot of physical memory, Hyper-V may decide that enough memory is available to satisfy all VMs and it doesn't need to reclaim.

If say a bunch of other VMs come online that also need a lot of memory, perhaps we would see Hyper-V start to reclaim the unused memory from the existing VMs.

Revision history for this message
Jason Couture (plaguethenet) wrote :

So both VMs are still sitting at the same memory usage according to Hyper-V.
Memory demand hasn't seemed to budge either.

I tried this on a 15.10 box we had and got very different results. I logged in (the server needs a GUI for some tools) and memory went from 2 GB to 6 GB. I did a few things, ran uname -a to get the kernel version, and then logged out.

While the allocated memory stayed at 6 GB, the memory demand dropped to about 2 GB (I missed the demand in the first screenshot, you'll have to take my word for it).

In contrast (screenshot #6), the VM is still demanding 4.5 GB vs the 2.8 GB it's using. Buffers are back up to 3.0 GB though; I can rerun the commands and wait a few minutes if you think that would make a difference.

I increased memory pressure by setting almost all VMs' reserve buffers to 100%; a lot of VMs had memory ballooned out for use by other VMs, except for the 16.04 machines. (Screenshot 10)

It seems that, except for the one time the VM had memory removed, the 16.04 machines keep reporting a high demand.

Is there anything else I can try to help you out?

Revision history for this message
Alex Ng (alexng-v) wrote :

The hv_balloon driver hasn't changed between 15.10 and 16.04, so there shouldn't be any difference in the way the driver reports demand to Hyper-V.

To provide a further breakdown of the memory usage, can you show the output of "cat /proc/meminfo"?

Might help to compare this info between 16.04 and 15.10.

Revision history for this message
Jason Couture (plaguethenet) wrote :

Here you go

root@ContainerHost1:~# cat /proc/meminfo
MemTotal: 8328584 kB
MemFree: 130152 kB
MemAvailable: 3530108 kB
Buffers: 273056 kB
Cached: 3071076 kB
SwapCached: 598656 kB
Active: 3785684 kB
Inactive: 3125476 kB
Active(anon): 2047484 kB
Inactive(anon): 2241632 kB
Active(file): 1738200 kB
Inactive(file): 883844 kB
Unevictable: 3660 kB
Mlocked: 3660 kB
SwapTotal: 2097148 kB
SwapFree: 1293344 kB
Dirty: 32316 kB
Writeback: 0 kB
AnonPages: 3010444 kB
Mapped: 404308 kB
Shmem: 719664 kB
Slab: 959700 kB
SReclaimable: 820980 kB
SUnreclaim: 138720 kB
KernelStack: 17200 kB
PageTables: 33148 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 6261440 kB
Committed_AS: 7291440 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1992704 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 202152 kB
DirectMap2M: 8183808 kB

Revision history for this message
Alex Ng (alexng-v) wrote :

> Committed_AS: 7291440 kB

The amount of committed memory reported by meminfo is about 7 GB, so it shouldn't be surprising that the demand seen by Hyper-V is also high.

In general, the Hyper-V demand is calculated as the Committed_AS value plus some buffer (roughly 700 MB if you have 8 GB of total RAM). So for the above meminfo report, I would expect Hyper-V to show that roughly 7-8 GB is in use.

Can you show /proc/meminfo when things have calmed down (as well as a screenshot of what Hyper-V is showing at the same time)?
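
The rule of thumb above can be written out as a small calculation. This is a rough sketch; the 700 MB per 8 GB buffer ratio is taken from this comment, not from the hv_balloon driver source.

```python
def expected_demand_kb(committed_as_kb, mem_total_kb):
    """Estimate the demand Hyper-V would report for a Linux guest:
    Committed_AS plus a buffer proportional to total RAM
    (~700 MB per 8 GB, per the comment above)."""
    buffer_kb = mem_total_kb * (700 * 1024) // (8 * 1024 * 1024)
    return committed_as_kb + buffer_kb

# For the meminfo report above (Committed_AS 7291440 kB, MemTotal ~8 GB),
# this predicts roughly 7.6 GB of demand, within the 7-8 GB range.
```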

Revision history for this message
erty (ertymail) wrote :

I had the same issue because my memory parameters weren't multiples of 128 MB.
See Notes 8 and 9 in the official Hyper-V documentation: https://technet.microsoft.com/ru-ru/windows-server-docs/compute/hyper-v/supported-ubuntu-virtual-machines-on-hyper-v
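
The constraint erty cites can be checked trivially; a sketch (the function name is made up for illustration):

```python
def aligned_to_128mb(*values_mb):
    """True if every dynamic-memory value (startup, minimum, maximum, in MB)
    is a multiple of 128 MB, as the Hyper-V docs recommend for Linux guests."""
    return all(v % 128 == 0 for v in values_mb)
```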

Revision history for this message
Evan (evancox10) wrote :

OP seems to be MIA, but I'm also seeing higher than expected memory usage. This is actually showing up in both Ubuntu 15.10 and 16.04. I've attached screenshots of `cat /proc/meminfo` from both an Ubuntu 15.10 Desktop and an Ubuntu MATE 16.04.1 Desktop install.

The 15.10 VM has dynamic memory set to 512 MiB - 4096 MiB. Assigned memory is maxed out at 4096 MiB, memory demand is reported as 3604 MiB.

The 16.04.1 VM has dynamic memory set to 512 MiB - 2048 MiB. Assigned memory is maxed out at 2048 MiB, memory demand is reported as 2088 MiB.

Both of these are with the VM idling for days and nothing going on.

The host computer has 16 GiB of memory, with 4.4 GiB available. Memory usage by host applications (other than VMs) is relatively light.

Revision history for this message
Evan (evancox10) wrote :

Sorry, the second screenshot didn't get attached. Here it is. See post #38 for details.

Revision history for this message
Bernhard (galmok) wrote :

I seem to experience a kernel crash (or an application crash in the early moments of this bug) due to Hyper-V ballooning (or at least dynamic memory). I am using Xenial on Hyper-V on Windows Server 2012 64-bit. I started using dynamic memory (1024 MB start memory, allowed to grow to 3072 MB). Apparently this worked fine; I saw memory rise with demand, and memory was released back to the host when not needed. But the release of memory apparently causes memory corruption. I noticed this by using the stress tool, invoked like this:

bme@zabbix:~$ stress --vm-bytes $(awk '/MemFree/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1
stress: info: [3698] dispatching hogs: 0 cpu, 0 io, 1 vm, 0 hdd
stress: FAIL: [3699] (522) memory corruption at: 0x7fd801e00010
stress: FAIL: [3698] (394) <-- worker 3699 returned error 1
stress: WARN: [3698] (396) now reaping child worker processes
stress: FAIL: [3698] (451) failed run completed in 0s

The memory corruption only occurred when Hyper-V had reclaimed memory (ballooning). Disabling dynamic memory made the corruption go away.

Revision history for this message
Alex Ng (alexng-v) wrote :

Bernhard (galmok), the memory corruption you're seeing is likely a known issue that happens when Linux guests are running on Windows Server 2012. This was fixed recently in the upstream Linux kernels, so hopefully a future update will have these patches.

Revision history for this message
Alex Ng (alexng-v) wrote :

Evan, from your screenshots, I can see that the memory demand values and the /proc/meminfo values do match closely to each other.

On the 16.04.1 VM, /proc/meminfo shows the Committed_AS value is ~1800 MB. The balloon driver then adds a buffer of ~300 MB (this buffer is calculated according to how much total memory is assigned to the VM). So ~1800 MB + ~300 MB = ~2100 MB, which matches closely with the 2088 MB memory demand that you're seeing from the host.

A similar calculation on the 15.10 VM also shows that the meminfo values match closely with the reported memory demand (~2900 MB committed + ~500 MB buffer = ~3400 MB, compared to 3600 MB).

The main culprit in both cases appears to be the large amount of committed memory. Would be worth investigating what the cause of this memory usage is.

Joshua R. Poulson (jrp)
summary: - Hyper-V Memory Ballooning re-broken in 16.04
+ [Hyper-V] Memory Ballooning re-broken in 16.04
Joshua R. Poulson (jrp)
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in linux (Ubuntu Xenial):
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Xenial) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Xenial):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired