[Hyper-V] udev trying to modify owner and mode of /dev/vmbus/hv_vss

Bug #1496927 reported by Chris Valean on 2015-09-17
This bug affects 5 people
Affects           Importance   Assigned to
linux (Ubuntu)    High         Joseph Salisbury
Wily              High         Joseph Salisbury
Xenial            High         Joseph Salisbury

Bug Description

While testing Ubuntu 15.10 Wily Werewolf (development branch) 64 bit we found the below issue.

Repro rate: 100%
Repro details:
Hyper-V: Server 2012 (no R2) - WS 2012 is the only affected platform
Daily build version: 2015.09.10
Kernel: kernel 4.2.0-7-generic

While these messages are being logged, we also noticed that systemd-udevd constantly sits at around 100% CPU on the VM.

On Server 2012 we don't have the Live backup with VSS feature, so these messages should not occur in the first place.

After 20 minutes we have over 3000 messages in the logs, all like below:
Sep 14 17:43:33 ubuntu systemd-udevd[14292]: setting owner of /dev/vmbus/hv_vss to uid=0, gid=0 failed: No such file or directory
Sep 14 17:44:00 ubuntu systemd-udevd[14292]: setting mode of /dev/vmbus/hv_vss to 020600 failed: No such file or directory
Sep 14 17:44:00 ubuntu systemd-udevd[14292]: setting owner of /dev/vmbus/hv_vss to uid=0, gid=0 failed: No such file or directory
Sep 14 17:44:01 ubuntu systemd-udevd[14292]: setting mode of /dev/vmbus/hv_vss to 020600 failed: No such file or directory
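
For anyone who wants to quantify the flood on their own VM, here is a minimal Python sketch that tallies these failure messages by kind and decodes the mode value from the log. The regular expression and the function names are illustrative assumptions modelled on the four lines quoted above; they are not part of udev or of any Ubuntu tool.

```python
import re
import stat
from collections import Counter

# Pattern modelled on the systemd-udevd failure lines quoted in this report.
FAILURE_RE = re.compile(
    r"systemd-udevd\[\d+\]: setting (owner|mode) of (\S+) to (.+?) failed: (.+)$"
)


def count_failures(lines):
    """Return a Counter keyed by (kind, device path) for matching log lines."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def describe_mode(octal_text):
    """Render an octal st_mode string such as '020600' in ls-style notation."""
    return stat.filemode(int(octal_text, 8))


# The four sample lines from the bug description.
SAMPLE = [
    "Sep 14 17:43:33 ubuntu systemd-udevd[14292]: setting owner of /dev/vmbus/hv_vss to uid=0, gid=0 failed: No such file or directory",
    "Sep 14 17:44:00 ubuntu systemd-udevd[14292]: setting mode of /dev/vmbus/hv_vss to 020600 failed: No such file or directory",
    "Sep 14 17:44:00 ubuntu systemd-udevd[14292]: setting owner of /dev/vmbus/hv_vss to uid=0, gid=0 failed: No such file or directory",
    "Sep 14 17:44:01 ubuntu systemd-udevd[14292]: setting mode of /dev/vmbus/hv_vss to 020600 failed: No such file or directory",
]

if __name__ == "__main__":
    for (kind, path), n in sorted(count_failures(SAMPLE).items()):
        print(kind, path, n)
    print(describe_mode("020600"))
```

The mode 020600 decodes to a character device readable and writable by its owner only, so udev is applying ordinary device-node permissions to a node that has already disappeared by the time it gets there.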

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1496927

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Chris Valean (cvalean) on 2015-09-17
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Is this issue only happening in Wily, or does it also happen in earlier releases?

tags: added: kernel-da-key wily
Changed in linux (Ubuntu):
importance: Undecided → Medium
Chris Valean (cvalean) wrote :

We're seeing this only on Wily, and only on Windows Server 2012.

It does not show up with other Ubuntu releases, nor on the other supported Windows Server releases.

Ovidiu Rusu (orusu) wrote :

Details:
Hyper-V: Server 2012 (no R2)
Kernel: 4.2.0-12-generic
Daily build version: 01-Oct-2015

The same problem occurs for me on the latest ISO. I tried disabling AppArmor, but nothing changed. My second idea was to disable hv-vss-daemon, and that seems to help a little: fewer error messages are shown, but the systemd-udevd process still consumes a lot of CPU.

Chris Valean (cvalean) wrote :

Joe, adding to what my colleague Ovidiu said, the behavior remains and can be seen in the Oct 12th build (kernel 4.2.0-16) when running on WS 2012 (without R2).

systemd-udevd still uses an unusually high amount of CPU time while the system is idle.

In some other circumstances, the boot time can go up to 3-5 minutes without any clear output on what is causing the delay.

Changed in linux (Ubuntu Wily):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Jirayut Nimsaeng (winggundamth) wrote :

I can confirm that this bug happens to me too. I'm using Ubuntu 14.04.3 LTS and installed the new kernel with this command:

sudo apt-get install -y linux-generic-lts-wily linux-tools-virtual-lts-wily linux-cloud-tools-virtual-lts-wily hv-kvp-daemon-init

After rebooting into the new kernel, 4.2.0-18-generic, systemd-udevd shows high CPU usage immediately. If I run "udevadm monitor", it shows events like these repeating continuously:

UDEV [20358.123891] add /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/vmbus_6722212 (vmbus)
KERNEL[20358.123989] remove /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/vmbus_6722212 (vmbus)
UDEV [20358.124105] remove /devices/virtual/misc/vmbus!hv_vss (misc)
KERNEL[20358.124336] add /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/vmbus_6722213 (vmbus)
UDEV [20358.124356] remove /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/vmbus_6722212 (vmbus)
KERNEL[20358.124403] add /devices/virtual/misc/vmbus!hv_vss (misc)
UDEV [20358.124753] add /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/vmbus_6722213 (vmbus)
UDEV [20358.124853] add /devices/virtual/misc/vmbus!hv_vss (misc)
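
To make the add/remove churn in output like this easier to see, the lines can be tallied per subsystem and action. The following is a minimal Python sketch; the regular expression and the function name are assumptions based only on the eight lines shown above, not a complete parser for "udevadm monitor" output.

```python
import re
from collections import Counter

# Pattern modelled on the monitor lines above:
# source, [timestamp], action, device path, (subsystem).
EVENT_RE = re.compile(
    r"^(KERNEL|UDEV)\s*\[(\d+\.\d+)\]\s+(\w+)\s+(\S+)\s+\((\w+)\)\s*$"
)


def tally_events(lines):
    """Count events per (subsystem, action), ignoring lines that do not match."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            _source, _timestamp, action, _devpath, subsystem = match.groups()
            counts[(subsystem, action)] += 1
    return counts


# The sample captured in the comment above.
VMBUS = "/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/"
SAMPLE = [
    "UDEV [20358.123891] add " + VMBUS + "vmbus_6722212 (vmbus)",
    "KERNEL[20358.123989] remove " + VMBUS + "vmbus_6722212 (vmbus)",
    "UDEV [20358.124105] remove /devices/virtual/misc/vmbus!hv_vss (misc)",
    "KERNEL[20358.124336] add " + VMBUS + "vmbus_6722213 (vmbus)",
    "UDEV [20358.124356] remove " + VMBUS + "vmbus_6722212 (vmbus)",
    "KERNEL[20358.124403] add /devices/virtual/misc/vmbus!hv_vss (misc)",
    "UDEV [20358.124753] add " + VMBUS + "vmbus_6722213 (vmbus)",
    "UDEV [20358.124853] add /devices/virtual/misc/vmbus!hv_vss (misc)",
]

if __name__ == "__main__":
    for key, n in sorted(tally_events(SAMPLE).items()):
        print(key, n)
```

Note that the vmbus device name changes from vmbus_6722212 to vmbus_6722213 within this short sample, which suggests the device is being re-created over and over rather than reconfigured in place.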

So I removed all the Wily kernels and went back to the Vivid kernel, and everything returned to normal.

Jirayut Nimsaeng (winggundamth) wrote :

I forgot to mention that my system runs on Hyper-V on Windows Server 2008 R2.

Chris Valean (cvalean) wrote :

Joe, do you have any update on this issue, or can you raise its priority?
Thank you!

Changed in linux (Ubuntu Wily):
importance: Medium → High
Changed in linux (Ubuntu):
importance: Medium → High
tags: added: kernel-hyper-v
Joseph Salisbury (jsalisbury) wrote :

Are there specific steps to reproduce this issue, or does it just require running Hyper-V with a Wily kernel?

Yes. Just install the Wily kernel on Hyper-V with Windows Server 2008 R2 or Windows Server 2012 (no R2).

Joseph Salisbury (jsalisbury) wrote :

I can confirm that this cannot be reproduced on Windows Server with R2. I'm creating a second Hyper-V environment without R2. Does it matter if this is a Gen1 or Gen2 guest?

Chris Valean (cvalean) wrote :

Hi Joe,
On 2012 and prior you have only Gen1.
Please use Windows Server 2012 (*without* R2) with a Wily VM for the direct repro.
Given Jirayut's replies, it seems that this comes down to the Wily kernel, so with Wily on WS2012 you'll have the direct repro.

Joseph Salisbury (jsalisbury) wrote :

I'm now able to reproduce the bug. I should be able to perform a bisect now and identify the commit that caused this.

Changed in linux (Ubuntu):
status: Confirmed → In Progress
Changed in linux (Ubuntu Wily):
status: Confirmed → In Progress
Joseph Salisbury (jsalisbury) wrote :

Yes, this is definitely a regression. I booted the Wily guest with the mainline kernel (v4.2-rc2) and the bug still exists. I then booted with the Vivid kernels (3.19 based) and the bug went away. I'll bisect this down and should have an update shortly.

tags: added: xenial
Chris Valean (cvalean) wrote :

Joe, do you have any updates on this?
Thank you!

Joseph Salisbury (jsalisbury) wrote :

I moved my test system back over to Windows Server 2012 R2 to do some testing for bug 1470250. I'll switch back to the setup without the R2 updates and continue the bisect on this bug.

Serge (serge-0) wrote :

Hi guys,

The same bug is present in Ubuntu 15.04 with kernel 4.2.0.19 when running it as a VM on MS Windows Server 2012 (no R2) with the Hyper-V role. I downgraded the kernel to version 3.19.0-39 and the high CPU load disappeared completely.
Currently I don't see systemd-udevd at the top of the list when starting atop.

Serge (serge-0) wrote :

I've also noticed that when launching or rebooting a VM with kernel 4.2.x, it takes too long (about 2 minutes) to bring up eth0. There is no such issue with kernel 3.19.x in Ubuntu 15.04/15.10.

Serge (serge-0) wrote :

Finally, after some effort, I've managed to get linux-tools and linux-cloud-tools working in Ubuntu 15.10 with kernel 3.19.0-39-generic.

As these tools for kernel 3.19.0-39 are available in the Vivid distro only (I forgot to mention that I had upgraded 15.04 to 15.10), I had to download them manually as files and then install them in the VM.

Here are the names of the packages:

linux-cloud-tools-3.19.0-39-generic_3.19.0-39.44_amd64.deb
linux-cloud-tools-3.19.0-39_3.19.0-39.44_amd64.deb
linux-lts-vivid-cloud-tools-3.19.0-39_3.19.0-39.44~14.04.1_amd64.deb
linux-tools-3.19.0-39-generic_3.19.0-39.44~14.04.1_amd64.deb
linux-lts-vivid-tools-3.19.0-39_3.19.0-39.44~14.04.1_amd64.deb

Henrik (fridstrom) wrote :

Any progress on this Joseph?

Chris Valean (cvalean) wrote :

We're also seeing this on the proposed 4.3 rebased kernel (from bug ID 1519917) on top of 15.10.
So this first appeared in 4.2 and is still present in 4.3.

Ovidiu Rusu (orusu) wrote :

I tried to boot an Ubuntu 16.04 daily build from 09-Feb on Windows Server 2012, but it takes around 6-7 minutes. If I try to reboot it after boot, the VM hangs. Also, systemd-udevd consumes a lot of CPU.

Joshua R. Poulson (jrp) wrote :

Joe, be aware that the vss daemon for Hyper-V only supports host versions 5.0 and greater. As a result, it won't actually *do* anything on Windows Server 2012. I'm not sure what the underlying issue is, but is it spinning in an attempt to start the daemon? This really slows down our VMs and we do a lot of testing so it would be really good to figure out what changed to cause this behavior.

Sidney McGeer (sidneymcgeer) wrote :

Not sure if helpful but I just tried this on 4.5.0-040500rc5 and the problem is gone.

systemd-udevd is running at less than 1% of CPU

Joseph Salisbury (jsalisbury) wrote :

That's good news, thanks, Sidney. @Chris, can you confirm this issue is resolved on 4.5.0-040500rc5? If it is, I can perform a reverse bisect to identify the fix, unless an obvious fix sticks out in the git logs.

Chris Valean (cvalean) wrote :

Hi Joe,

I have a 15.04 install running linux-next 4.5.0-rc5 from Feb 26th, on which I can confirm Sidney's update: udev/systemd is no longer slowing the system down to a halt.

As the default 3.19 kernel on 15.04 does not show this behavior, I installed kernel 4.2.8 from the Ubuntu mainline PPA tree.
Running 4.2.8 on 15.04 brings back the original problem with systemd-udevd CPU cycles.
It would be great if you could start the bisect as high as kernel 4.4 from Xenial, as that one was also showing the same symptoms, so you can narrow down the patches between 4.4 (Xenial) and 4.5 upstream.
Thank you!

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update, Chris. I'll perform a reverse bisect to find the commit that fixes this issue.
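
For readers unfamiliar with the term, a reverse bisect is the same binary search as a normal bisect, except that it looks for the first commit where the problem is gone instead of the first commit where it appears. The Python sketch below illustrates the idea only: the function name, the commit labels and the is_fixed callback are hypothetical, and in practice the search is driven by git bisect with a kernel build and a boot test on the Hyper-V host at each step.

```python
def first_fixed(commits, is_fixed):
    """Return the first entry of an ordered list for which is_fixed() is True.

    Assumes the oldest entry is still broken, the newest entry is fixed, and
    the state flips exactly once in between (the usual bisect assumption).
    """
    low, high = 0, len(commits) - 1
    if is_fixed(commits[low]):
        raise ValueError("oldest commit is already fixed; widen the range")
    if not is_fixed(commits[high]):
        raise ValueError("newest commit is still broken; widen the range")
    # Invariant: commits[low] is broken, commits[high] is fixed.
    while high - low > 1:
        mid = (low + high) // 2
        if is_fixed(commits[mid]):
            high = mid
        else:
            low = mid
    return commits[high]


if __name__ == "__main__":
    # Hypothetical history of ten commits where the fix landed in "c6".
    history = ["c%d" % i for i in range(10)]
    print(first_fixed(history, lambda c: history.index(c) >= 6))
```

Each step halves the remaining range, so the number of kernels to build and boot grows only logarithmically with the number of commits between the broken and the fixed version.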

Joseph Salisbury (jsalisbury) wrote :

I believe the following commit will fix this bug:

commit ed9ba608e4851144af8c7061cbb19f751c73e998
Author: Olaf Hering <email address hidden>
Date: Mon Dec 14 16:01:42 2015 -0800

    Drivers: hv: vss: run only on supported host versions

I built a Wily test kernel with this commit, which can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1496927/

Can you confirm this test kernel fixes the bug?

Joseph Salisbury (jsalisbury) wrote :

Commit ed9ba608e is in Xenial as of the Ubuntu-4.4.0-9.24 kernel. So if commit ed9ba608e is the real fix, this bug should already be fixed in Xenial.
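
As a rough way to check whether a given Ubuntu kernel package already carries the fix, the package version can be compared against the first fixed versions named in this thread: Ubuntu-4.4.0-9.24 for Xenial, and linux 4.2.0-35.40 for Wily, which is reported further down. The Python sketch below is an assumption-laden helper, not an official tool: the table, the function name and the three-state return value are illustrative, and it only understands full package versions such as "4.2.0-35.40", not "uname -r" strings.

```python
import re

# First fixed package versions as reported in this bug thread.
# Key: upstream base version. Value: (ABI number, upload number).
FIXED_IN = {
    (4, 2, 0): (35, 40),  # Wily: linux 4.2.0-35.40
    (4, 4, 0): (9, 24),   # Xenial: Ubuntu-4.4.0-9.24
}

# Full Ubuntu kernel package version, e.g. "4.2.0-35.40".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def has_fix(package_version):
    """Return True or False for a known series, or None if it cannot be told."""
    match = VERSION_RE.match(package_version)
    if match is None:
        return None
    major, minor, patch, abi, upload = (int(part) for part in match.groups())
    fixed = FIXED_IN.get((major, minor, patch))
    if fixed is None:
        # Series not covered by this thread (for example 3.19, which
        # commenters report as unaffected).
        return None
    return (abi, upload) >= fixed


if __name__ == "__main__":
    for version in ("4.2.0-35.40", "4.4.0-9.24", "3.19.0-39.44"):
        print(version, has_fix(version))
```

A result of None means the helper has no information about that series, which is different from the fix being absent.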

Adrian Suhov (asuhov) wrote :

Hi!
I tested Wily with the test kernel from http://kernel.ubuntu.com/~jsalisbury/lp1496927/ and it fixes the bug. I also tested Xenial with 4.4.0-11-generic kernel and the issue is fixed there too.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Adrian. I'll submit an SRU request to have commit ed9ba608e included in Wily.

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Released
Joseph Salisbury (jsalisbury) wrote :

I submitted an SRU request for this fix to be included in Wily.

Brad Figg (brad-figg) on 2016-03-14
Changed in linux (Ubuntu Wily):
status: In Progress → Fix Committed
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-wily
Chris Valean (cvalean) wrote :

We've tested the proposed kernel 4.2.0.35.38 and the issue is no longer present. Basic functionality has also been tested.

Thank you!

tags: added: verification-done-wily
removed: verification-needed-wily
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.2.0-35.40

---------------
linux (4.2.0-35.40) wily; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1557706

  [ Upstream Kernel Changes ]

  * Revert "workqueue: make sure delayed work run in local cpu"
    - LP: #1556269
  * Revert "ALSA: hda - Fix noise on Gigabyte Z170X mobo"
    - LP: #1556269
  * KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX
    - LP: #1552592
  * locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
    - LP: #1545330
  * [media] usbvision fix overflow of interfaces array
    - LP: #1556269
  * [media] usbvision: fix crash on detecting device with invalid
    configuration
    - LP: #1556269
  * ASN.1: Fix non-match detection failure on data overrun
    - LP: #1556269
  * iw_cxgb3: Fix incorrectly returning error on success
    - LP: #1556269
  * EVM: Use crypto_memneq() for digest comparisons
    - LP: #1556269
  * vmstat: explicitly schedule per-cpu work on the CPU we need it to run
    on
    - LP: #1556269
  * x86/entry/compat: Add missing CLAC to entry_INT80_32
    - LP: #1556269
  * iio-light: Use a signed return type for ltr501_match_samp_freq()
    - LP: #1556269
  * iio: add IIO_TRIGGER dependency to STK8BA50
    - LP: #1556269
  * iio: add HAS_IOMEM dependency to VF610_ADC
    - LP: #1556269
  * iio: dac: mcp4725: set iio name property in sysfs
    - LP: #1556269
  * iommu/vt-d: Fix 64-bit accesses to 32-bit DMAR_GSTS_REG
    - LP: #1556269
  * iio: light: acpi-als: Report data as processed
    - LP: #1556269
  * iio:adc:ti_am335x_adc Fix buffered mode by identifying as software
    buffer.
    - LP: #1556269
  * ASoC: rt5645: fix the shift bit of IN1 boost
    - LP: #1556269
  * ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2
    - LP: #1556269
  * cgroup: make sure a parent css isn't offlined before its children
    - LP: #1556269
  * ARM: OMAP2+: Fix wait_dll_lock_timed for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix l2dis_3630 for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix save_secure_ram_context for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix l2_inv_api_params for rodata
    - LP: #1556269
  * ARM: OMAP2+: Fix ppa_zero_params and ppa_por_params for rodata
    - LP: #1556269
  * rtlwifi: rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded
    - LP: #1556269
  * PCI/AER: Flush workqueue on device remove to avoid use-after-free
    - LP: #1556269
  * ARM: dts: Fix wl12xx missing clocks that cause hangs
    - LP: #1556269
  * libata: disable forced PORTS_IMPL for >= AHCI 1.3
    - LP: #1556269
  * mac80211: Requeue work after scan complete for all VIF types.
    - LP: #1556269
  * rfkill: fix rfkill_fop_read wait_event usage
    - LP: #1556269
  * ARM: dts: at91: sama5d4: fix instance id of DBGU
    - LP: #1556269
  * ARM: dts: at91: sama5d4ek: add phy address and IRQ for macb0
    - LP: #1556269
  * ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
    - LP: #1556269
  * crypto: shash - Fix has_key setting
    - LP: #1556269
  * Input: vmmouse - fix absolute device registration
    - LP: #1556269
  * spi: atmel: fix gpio chip-select in case of non-DT platform
    - LP: #1556269
  ...

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released