[Hyper-V] KVP daemon fails to start on first boot of disco VM
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| linux (Ubuntu) |
Undecided
|
Seth Forshee | ||
| Xenial |
Undecided
|
Unassigned | ||
| Bionic |
Undecided
|
Unassigned | ||
| Disco |
Undecided
|
Unassigned |
Bug Description
SRU Justification
Impact: The KVP daemon fails to start on first boot due to being started before the hv_kvp device appears.
Fix: Update the hv-kvp-daemon service file to start the daemon after device node appears.
Regression Potential: The changes are only to the hv-kvp-daemon service file and adding a udev rule, so the worst case regression would be that the service does not start. In testing the service did start as expected.
Test Case: See comment #15.
---
Launching a recent daily image of disco on azure results in a VM in which the hv-kvp-
$ systemctl status -o cat hv-kvp-
● hv-kvp-
Loaded: loaded (/lib/systemd/
Active: failed (Result: exit-code) since Thu 2019-03-14 13:07:15 UTC; 11min a
Main PID: 219 (code=exited, status=1/FAILURE)
Started Hyper-V KVP Protocol Daemon.
KVP starting; pid is:219
open /dev/vmbus/hv_kvp failed; error: 2 No such file or directory
hv-kvp-
hv-kvp-
The instance was created with:
$ az vm create --resource-group [redacted] --image Canonical:
As best as I can tell, the /dev/vmbus/hv_kvp isn't available when the hv-kvp-
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.10-0ubuntu23
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: N/A
DistroRelease: Ubuntu 19.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Microsoft Corporation Virtual Machine
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=screen-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB: 0 hyperv_fb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: disco uec-images
Uname: Linux 4.18.0-1011-azure x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy netdev plugdev sudo video
_MarkForUpload: True
dmi.bios.date: 06/02/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 090007
dmi.board.name: Virtual Machine
dmi.board.vendor: Microsoft Corporation
dmi.board.version: 7.0
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: Microsoft Corporation
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.name: Virtual Machine
dmi.product.uuid: 3b0f2160-
dmi.product.
dmi.sys.vendor: Microsoft Corporation
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
tags: | added: disco |
apport information
tags: | added: apport-collected uec-images |
description: | updated |
apport information
Francis Ginther (fginther) wrote : Lspci.txt | #4 |
apport information
apport information
apport information
apport information
apport information
Francis Ginther (fginther) wrote : UdevDb.txt | #9 |
apport information
apport information
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
tags: | added: kernel-hyper-v |
Tobias Koch (tobijk) wrote : | #11 |
This also happens in Bionic minimal images and seems to be a simple race where the daemon is launched when the device is not yet visible.
It seems Fedora had the same or a similar issue:
https:/
From the conversation in that bug report, my understanding is that this daemon supports features that can be enabled/disabled while the system is live and that the daemon should therefore be launched/stopped dynamically when the /dev/vmbus/hv_kvp device node comes or goes.
Since this race condition happening or not happening may depend on host, hypervisor and boot speed, it could pop up also on existing releases and thus should probably be treated with a certain amount of urgency.
tags: | added: bionic cosmic xenial |
Seth Forshee (sforshee) wrote : | #12 |
I agree with comment #11, that this seems like a service problem. I took a quick look at the kernel side. Normally I might suggest that we make the relevant driver built-in, but in this case it already is. It looks like the device gets created in response to a message from the host (afaict, I'm unfamiliar with hyperv), and I'm not sure how we would speed that up. In any case it seems like modifying the service to only start once its dependencies are available is a good idea.
Dimitri John Ledkov (xnox) wrote : | #13 |
It's funny how long it took them to figure it out, but yes
https:/
is the right answer. Please use that patch / combination of tagging the device with tag&wants using udev rule + making the service bind to it.
Changed in linux (Ubuntu): | |
assignee: | nobody → Seth Forshee (sforshee) |
status: | Confirmed → In Progress |
Seth Forshee (sforshee) wrote : | #14 |
Please try the attached package. I believe you should be able to install it alone; none of the other packages look to be locked to a specific version of linux-cloud-
Changed in linux (Ubuntu): | |
status: | In Progress → Incomplete |
Francis Ginther (fginther) wrote : | #15 |
This fixed the problem that I'm seeing on disco with a 20190402 serial. My test was to recreate the problem by removing /dev/vmbus/hv_kvp then rebooting. This simple test showed that the service came up again in a failed state with 'systemctl status hv-kvp-
Then I installed this updated package, and redid the same experiment. This time the service was running with no errors.
description: | updated |
Changed in linux (Ubuntu): | |
status: | Incomplete → In Progress |
Changed in linux (Ubuntu): | |
status: | In Progress → Fix Committed |
Launchpad Janitor (janitor) wrote : | #16 |
This bug was fixed in the package linux - 5.0.0-11.12
---------------
linux (5.0.0-11.12) disco; urgency=medium
* linux: 5.0.0-11.12 -proposed tracker (LP: #1824383)
* hns3: PPU_PF_
(LP: #1824194)
- net: hns3: fix for not calculating tx bd num correctly
* disco: unable to use iptables/enable ufw under -virtual kernel
(LP: #1823862)
- [Packaging] add bpfilter to linux-modules
* Make shiftfs a module rather than built-in (LP: #1824354)
- [Config] CONFIG_SHIFT_FS=m
* shiftfs: chown sets untranslated ids in lower fs (LP: #1824350)
- SAUCE: shiftfs: use translated ids when chaning lower fs attrs
* [Hyper-V] KVP daemon fails to start on first boot of disco VM (LP: #1820063)
- [Packaging] bind hv_kvp_daemon startup to hv_kvp device
linux (5.0.0-10.11) disco; urgency=medium
* linux: 5.0.0-10.11 -proposed tracker (LP: #1823936)
* Apparmor enforcement failure in lxc selftests (LP: #1823379)
- SAUCE: apparmor: Restore Y/N in /sys for apparmor's "enabled"
* systemd cause kernel trace "BUG: unable to handle kernel paging request at
6db23a14" on Cosmic i386 (LP: #1813244)
- openvswitch: fix flow actions reallocation
linux (5.0.0-9.10) disco; urgency=medium
* linux: 5.0.0-9.10 -proposed tracker (LP: #1823228)
* Packaging resync (LP: #1786013)
- [Packaging] resync git-ubuntu-log
- [Packaging] update helper scripts
- [Packaging] resync retpoline extraction
* Huawei Hi1822 NIC has poor performance (LP: #1820187)
- net-next/hinic: replace disable_
* Add uid shifting overlay filesystem (shiftfs) (LP: #1823186)
- shiftfs: uid/gid shifting bind mount
- shiftfs: rework and extend
- shiftfs: support some btrfs ioctls
- [Config] enable shiftfs
* Cannot boot or install - have to use nomodeset (LP: #1821820)
- Revert "drm/i915/fbdev: Actually configure untiled displays"
* Disco update: v5.0.6 upstream stable release (LP: #1823060)
- netfilter: nf_tables: fix set double-free in abort path
- dccp: do not use ipv6 header for ipv4 flow
- genetlink: Fix a memory leak on error path
- gtp: change NET_UDP_TUNNEL dependency to select
- ipv6: make ip6_create_rt_rcu return ip6_null_entry instead of NULL
- mac8390: Fix mmio access size probe
- mISDN: hfcpci: Test both vendor & device ID for Digium HFC4S
- net: aquantia: fix rx checksum offload for UDP/TCP over IPv6
- net: datagram: fix unbounded loop in __skb_try_
- net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec
- net: phy: meson-gxl: fix interrupt support
- net: rose: fix a possible stack overflow
- net: stmmac: fix memory corruption with large MTUs
- net-sysfs: call dev_hold if kobject_
- net: usb: aqc111: Extend HWID table by QNAP device
- packets: Always register packet sk in the same order
- rhashtable: Still do rehash when we get EEXIST
- sctp: get sctphdr by offset in sctp_compute_cksum
- sctp: use memdup_user instead of vmemdup_user
- tcp: do not use ipv6 header for ipv4 flow
- tipc: allow servic...
Changed in linux (Ubuntu): | |
status: | Fix Committed → Fix Released |
Changed in linux (Ubuntu Disco): | |
status: | New → Fix Released |
Changed in linux (Ubuntu): | |
status: | Fix Released → Incomplete |
Changed in linux (Ubuntu Bionic): | |
status: | New → In Progress |
Marcelo Cerri (mhcerri) wrote : | #17 |
For bionic and xenial: https:/
Changed in linux (Ubuntu Bionic): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Xenial): | |
status: | New → Fix Committed |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-bionic |
tags: | added: id-5c9d168ed6c5704cac05e595 |
Launchpad Janitor (janitor) wrote : | #19 |
This bug was fixed in the package linux - 4.15.0-74.84
---------------
linux (4.15.0-74.84) bionic; urgency=medium
* bionic/linux: 4.15.0-74.84 -proposed tracker (LP: #1856749)
* [Hyper-V] KVP daemon fails to start on first boot of disco VM (LP: #1820063)
- [Packaging] bind hv_kvp_daemon startup to hv_kvp device
* Unrevert "arm64: Use firmware to detect CPUs that are not affected by
Spectre-v2" (LP: #1854207)
- arm64: Get rid of __smccc_
- arm64: Use firmware to detect CPUs that are not affected by Spectre-v2
* Bionic kernel panic on Cavium ThunderX CN88XX (LP: #1853485)
- SAUCE: irqchip/gic-v3-its: Add missing return value in
its_
linux (4.15.0-73.82) bionic; urgency=medium
* bionic/linux: 4.15.0-73.82 -proposed tracker (LP: #1854819)
* CVE-2019-14901
- SAUCE: mwifiex: Fix heap overflow in mmwifiex_
* CVE-2019-14896 // CVE-2019-14897
- SAUCE: libertas: Fix two buffer overflows at parsing bss descriptor
* CVE-2019-14895
- SAUCE: mwifiex: fix possible heap overflow in mwifiex_
* CVE-2019-18660: patches for Ubuntu (LP: #1853142) // CVE-2019-18660
- powerpc/64s: support nospectre_v2 cmdline option
- powerpc/book3s64: Fix link stack flush on context switch
- KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel
* Please add patch fixing RK818 ID detection (LP: #1853192)
- SAUCE: mfd: rk808: Fix RK818 ID template
* [SRU][B/
whitelist (LP: #1853246)
- HID: quirks: Fix keyboard + touchpad on Lenovo Miix 630
- Input: elan_i2c - export the device id whitelist
- HID: quirks: Refactor ELAN 400 and 401 handling
* Lenovo dock MAC Address pass through doesn't work in Ubuntu (LP: #1827961)
- r8152: Add macpassthru support for ThinkPad Thunderbolt 3 Dock Gen 2
* s390/dasd: reduce the default queue depth and nr of hardware queues
(LP: #1852257)
- s390/dasd: reduce the default queue depth and nr of hardware queues
* External microphone can't work on some dell machines with the codec alc256
or alc236 (LP: #1853791)
- SAUCE: ALSA: hda/realtek - Move some alc256 pintbls to fallback table
- SAUCE: ALSA: hda/realtek - Move some alc236 pintbls to fallback table
* Memory leak in net/xfrm/
(LP: #1853197)
- xfrm: Fix memleak on xfrm state destroy
* CVE-2019-19083
- drm/amd/display: memory leak
* update ENA driver for DIMLIB dynamic interrupt moderation (LP: #1853180)
- net: ena: add intr_moder_
- net: ena: switch to dim algorithm for rx adaptive interrupt moderation
- net: ena: reimplement set/get_coalesce()
- net: ena: enable the interrupt_
- net: ena: remove code duplication in
ena_
- net: ena: remove old adaptive interrupt moderation code from ena_netdev
- net: ena: remove ena_restore_
- net: ena: remov...
Changed in linux (Ubuntu Bionic): | |
status: | Fix Committed → Fix Released |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-xenial |
Francis Ginther (fginther) wrote : | #21 |
I was unable to reproduce this failure on either a xenial minimal or standard image. I went ahead and ran through the testing documented in comment #15 with no issue. Each time, the device file was present as expected and the hv-kvp-
tags: |
added: verification-done-xenial removed: verification-needed-xenial |
Launchpad Janitor (janitor) wrote : | #22 |
This bug was fixed in the package linux - 4.4.0-173.203
---------------
linux (4.4.0-173.203) xenial; urgency=medium
* xenial/linux: 4.4.0-173.203 -proposed tracker (LP: #1859718)
* CVE-2019-14615
- drm/i915/gen9: Clear residual context state on context switch
linux (4.4.0-172.202) xenial; urgency=medium
* xenial/linux: 4.4.0-172.202 -proposed tracker (LP: #1858594)
* tools/perf fails to build after Xenial update to 4.4.208 upstream stable
release (LP: #1858798)
- Revert "perf report: Add warning when libunwind not compiled in"
* CVE-2019-18885
- btrfs: refactor btrfs_find_device() take fs_devices as argument
- btrfs: merge btrfs_find_device and find_device
* Integrate Intel SGX driver into linux-azure (LP: #1844245)
- [Packaging] Add systemd service to load intel_sgx
* Xenial update: 4.4.208 upstream stable release (LP: #1858462)
- btrfs: do not leak reloc root if we fail to read the fs root
- btrfs: handle ENOENT in btrfs_uuid_
- ALSA: hda/ca0132 - Keep power on during processing DSP response
- ALSA: hda/ca0132 - Avoid endless loop
- drm: mst: Fix query_payload ack reply struct
- iio: light: bh1750: Resolve compiler warning and make code more readable
- spi: Add call to spi_slave_abort() function when spidev driver is released
- staging: rtl8188eu: fix possible null dereference
- rtlwifi: prevent memory leak in rtl_usb_probe
- IB/iser: bound protection_sg size by data_sg size
- media: am437x-vpfe: Setting STD to current value is not an error
- media: i2c: ov2659: fix s_stream return value
- media: i2c: ov2659: Fix missing 720p register config
- media: ov6650: Fix stored frame format not in sync with hardware
- tools/power/
- usb: renesas_usbhs: add suspend event support in gadget mode
- hwrng: omap3-rom - Call clk_disable_
- regulator: max8907: Fix the usage of uninitialized variable in
max8907_
- media: flexcop-usb: fix NULL-ptr deref in flexcop_
- samples: pktgen: fix proc_cmd command result check logic
- mwifiex: pcie: Fix memory leak in mwifiex_
- media: ti-vpe: vpe: fix a v4l2-compliance warning about invalid pixel format
- media: ti-vpe: vpe: fix a v4l2-compliance failure about frame sequence
number
- media: ti-vpe: vpe: Make sure YUYV is set as default format
- extcon: sm5502: Reset registers during initialization
- x86/mm: Use the correct function type for native_set_fixmap()
- perf report: Add warning when libunwind not compiled in
- iio: adc: max1027: Reset the device at probe time
- Bluetooth: hci_core: fix init for HCI_USER_CHANNEL
- drm/gma500: fix memory disclosures due to uninitialized bytes
- x86/ioapic: Prevent inconsistent state when moving an interrupt
- arm64: psci: Reduce the waiting time for cpu_psci_cpu_kill()
- libata: Ensure ata_port probe has completed before detach
- pinctrl: sh-pfc: sh7734: Fix duplicate TCLK1_B
- bnx2x: Fix PF-VF communication over multi-cos queu...
Changed in linux (Ubuntu Xenial): | |
status: | Fix Committed → Fix Released |
Pat Viafore (patviafore) wrote : | #23 |
We are no longer seeing this issue with the latest images.
Thiago Martins (martinx) wrote : | #24 |
I'm seeing this problem with Ubuntu 18.04.4 with HWE Kernel, fully upgraded.
Terry Rudd (terrykrudd) wrote : | #25 |
Thiago, can you be more specific about what you're seeing and with exactly which kernel?
Thiago Martins (martinx) wrote : | #26 |
Terry,
Ubuntu 18.04.4 with Kernel 5.3.0-40-generic, running on QEMU, takes more than 1:30 minutes to boot, it stops here for a while:
---
[ *] A start job is running for sys-devi…
---
I'm using the following image to build my QEMU VMs:
http://
Cheers!
Thiago
Terry Rudd (terrykrudd) wrote : | #27 |
Thank you Thiago, we will investigate further.
Thiago Martins (martinx) wrote : | #28 |
This is also happening on bare-metal machines as well!
It starts to happen after installing the "linux-
Also, Ubuntu 20.04 has the same problem.
20.04 slow boot stops for 1:30 min heree:
---
A start job is running for /sys/dev…
---
Thiago Martins (martinx) wrote : | #29 |
Maybe related to https:/
Marcelo Cerri (mhcerri) wrote : | #30 |
Hi.
Do you know if the problem happens with other kernel versions such as 4.15 or if it didn't use to happen before?
Donald Babcock (dbabcock) wrote : | #31 |
I am seeing this on 18.04 Linux 5.3.0-1020-azure VM. Sister VM on same kernel, same patch level, same host does not see the issue.
I isolated it to a Hyper-V integration service difference between the issue vm and it's non-affect sibling. The issue reliably appears and goes away if "data exchange services" under "integration services" are disabled or re-enabled on the VM. Data Exchange disabled produces the issue.
Clue was found here https:/
Brian Candler (b-candler) wrote : | #32 |
Seeing this on bare metal (Dell R740xd) with Ubuntu 18.04 and linux-image-
Ladar Levison (kingladar) wrote : | #33 |
This bug returned with the Groovy Gorilla release. In this case, the `linux-
vagrant init --box-version=3.1.0
vagrant up --provider=libvirt
If pulling the console up using virt-viewer you will see the boot process hang.
I managed to fix the bug with following:
# Disable the daemon to remove the symlihk.
systemctl disable hv-kvp-
# Override the default unit file with a version that won't hang during boot ups.
cat <<-EOF > /etc/systemd/
[Unit]
Description=Hyper-V KVP Protocol Daemon
ConditionVirtua
ConditionPathEx
DefaultDependen
BindsTo=
After=systemd-
Before=
Conflicts=
RequiresMountsF
[Service]
ExecStart=
[Install]
WantedBy=
EOF
See:
https:/
https:/
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1820063
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.