[Hyper-V] kvp daemon start failure on Zesty

Bug #1664663 reported by Chris Valean
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Invalid
High
Joseph Salisbury
Zesty
Invalid
High
Joseph Salisbury

Bug Description

Hello,

On Ubuntu Server 17.04 daily from Febr 14th we see that the KVP daemon although it is enabled in systemd, it fails to start at system boot.

This is shown by error: "KVP: read failed; error:9 Bad file descriptor"

I then manually ran the kvp binary, it returned the KVP data, and it's also automatically loading at boot. However, this is of course not the normal behavior.
17.04 daily from Febr 14th is the first build we covered, so I cannot say if this is a regression introduced only at some point in 17.04 dailies.

In syslog these are the only messages for KVP:
Feb 14 08:01:22 ubuntu systemd[1]: Started Hyper-V KVP Protocol Daemon.
Feb 14 08:01:22 ubuntu KVP: KVP starting; pid is:1837
Feb 14 08:01:22 ubuntu KVP: KVP LIC Version: 3.1
Feb 14 08:02:31 ubuntu KVP: read failed; error:9 Bad file descriptor

systemd status:
root@ubuntu:~# systemctl status hv-kvp-daemon.service
● hv-kvp-daemon.service - Hyper-V KVP Protocol Daemon
   Loaded: loaded (/lib/systemd/system/hv-kvp-daemon.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-02-14 08:02:31 PST; 1min 41s ago
 Main PID: 1837 (code=exited, status=1/FAILURE)
      CPU: 4ms

Feb 14 08:01:22 ubuntu systemd[1]: Started Hyper-V KVP Protocol Daemon.
Feb 14 08:01:22 ubuntu KVP[1837]: KVP starting; pid is:1837
Feb 14 08:01:22 ubuntu KVP[1837]: KVP LIC Version: 3.1
Feb 14 08:02:31 ubuntu systemd[1]: hv-kvp-daemon.service: Main process exited, code=exited, status=1/FAILURE
Feb 14 08:02:31 ubuntu systemd[1]: hv-kvp-daemon.service: Unit entered failed state.
Feb 14 08:02:31 ubuntu systemd[1]: hv-kvp-daemon.service: Failed with result 'exit-code'.

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1664663

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Chris Valean (cvalean)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you see if this also happens with the latest mainline kernel:
 http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc8

tags: added: zesty
Revision history for this message
Chris Valean (cvalean) wrote :

Hi Joe,

The error comes from the kvp daemon, which is not part of the kernel itself.
Or along with the kernel-ppa are you also building linux-tools/cloud-tools?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

It sounds like this is only happening on Zesty, so let me build an early Zesty kernel version with it's associated linux-tools and cloud-tools packages.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built an early Zesty kernel and tools packages. The packages can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1664663/

Can you test out these packages? That will tell us if this is a regression. If the bug still exists, we can try some earlier versions.

Changed in linux (Ubuntu Zesty):
status: Confirmed → Triaged
tags: added: kernel-hyper-v
Revision history for this message
Chris Valean (cvalean) wrote :

The same error appears from KVP also on the test kernel 4.8.0-22_4.8.0-22.24~lp1664663.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built another 4.8.0-22 test kernel, but using a Yakkety tree. That kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1664663/yakkety

Can you also give that kernel a try?

Revision history for this message
Stuart Miller (spmiller) wrote :

I have also seen this in 16.04.2, with the same symptoms as the OP.

Restarting the service (sudo systemctl restart hv-kvp-daemon.service) seems to have fixed this problem across reboots.

Revision history for this message
Chris Valean (cvalean) wrote :

Hi Joe,

Using today's ISO - zesty-server-amd64.iso from March 8th, and doing a fresh install, then installing linux-tools-generic and linux-cloud-tools-generic, I see that the KVP daemon starts at the reboot without the previous issue.
The tools are running on top of kernel 4.10.0-9.

However, now another issue surfaced.

A recent KVP daemon patch introduced a KVP_SCRIPT_PATH definition:
#define KVP_SCRIPT_PATH “/usr/libexec/hypervkvpd/

LIS KVP implementation has 3 scripts:
hv_get_dns_info
hv_get_dhcp_info
hv_set_ifconfig

These scripts are put by default under /usr/sbin, however with the above change, now these must be moved under /usr/libexec/hypervkvpd.
Or of course, default to /usr/sbin from the kvp daemon source file if you want to keep the current structure.

Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Triaged → In Progress
Revision history for this message
Chris Valean (cvalean) wrote :

Hi Stuart,

From your comment #8, I took the 16.04.2 server ISO and did an install.
This uses kernel 4.4.0-62 and after installing the linux-tools/cloud-tools, then I haven't ran into the initial issue.
Where you seeing the exact message from kvp that it didn't start?

Note that if the kernel is 4.4.0-62, you must explicitly use the version for the tools:
#apt install linux-tools-4.4.0-62-generic linux-cloud-tools-4.4.0-62-generic

Revision history for this message
Stuart Miller (spmiller) wrote :

Hi Chris,

Thanks for investigating. I've tried again to confirm the problem still occurs -- and it does, although sometimes it takes multiple restarts of the hv-kvp-daemon.service for it to start successfully.

Some diagnostic information is below.

Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-62-generic x86_64)

user@server:~$ ps -ax | grep hv
   129 ? S< 0:00 [hv_vmbus_con]
   359 ? S 0:00 [hv_balloon]
   601 ? Ss 0:00 /usr/lib/linux-tools/4.4.0-62-generic/hv_vss_daemon -n
   603 ? Ss 0:00 /usr/lib/linux-tools/4.4.0-62-generic/hv_fcopy_daemon -n

user@server:~$ sudo systemctl status hv-kvp-daemon.service
● hv-kvp-daemon.service - Hyper-V KVP Protocol Daemon
   Loaded: loaded (/lib/systemd/system/hv-kvp-daemon.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2017-03-09 20:50:24 UTC; 4min 45s ago
  Process: 585 ExecStart=/usr/sbin/hv_kvp_daemon -n (code=exited, status=1/FAILURE)
 Main PID: 585 (code=exited, status=1/FAILURE)

Mar 09 20:49:24 server systemd[1]: Started Hyper-V KVP Protocol Daemon.
Mar 09 20:49:24 server KVP[585]: KVP starting; pid is:585
Mar 09 20:49:24 server KVP[585]: KVP LIC Version: 3.1
Mar 09 20:50:24 server KVP[585]: read failed; error:9 Bad file descriptor
Mar 09 20:50:24 server systemd[1]: hv-kvp-daemon.service: Main process exited, code=exited, status=1/FAILURE
Mar 09 20:50:24 server systemd[1]: hv-kvp-daemon.service: Unit entered failed state.
Mar 09 20:50:24 server systemd[1]: hv-kvp-daemon.service: Failed with result 'exit-code'.

user@server:~$ dpkg -s linux-tools-4.4.0-62-generic
Package: linux-tools-4.4.0-62-generic
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 250
Maintainer: Ubuntu Kernel Team <email address hidden>
Architecture: amd64
Source: linux
Version: 4.4.0-62.83
Depends: linux-tools-4.4.0-62
Description: Linux kernel version specific tools for version 4.4.0-62
 This package provides the architecture dependant parts for kernel
 version locked tools (such as perf and x86_energy_perf_policy) for
 version 4.4.0-62 on
 64 bit x86.

user@server:~$ dpkg -s linux-cloud-tools-4.4.0-62-generic
Package: linux-cloud-tools-4.4.0-62-generic
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 248
Maintainer: Ubuntu Kernel Team <email address hidden>
Architecture: amd64
Source: linux
Version: 4.4.0-62.83
Depends: linux-cloud-tools-4.4.0-62
Description: Linux kernel version specific cloud tools for version 4.4.0-62
 This package provides the architecture dependant parts for kernel
 version locked tools for cloud for version 4.4.0-62 on
 64 bit x86.

Help > About Hyper-V Manager
Hyper-V Manager
Microsoft Corporation
Version: 10.0.14393.0

vmms.exe
Version
10.0.14393.351

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel after changing KVP_SCRIPTS_PATH to /usr/sbin/hypervkvpd. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1664663

Can you give that kernel a try? If it resolves the issue, is that the change you want to make instead of moving the scripts to /usr/libexec/hypervkvpd/? I don't believe that directory currently exists.

Revision history for this message
Joshua R. Poulson (jrp) wrote :

Long is working on a patch for submission upstream that should help with this, and I will add it here when he does.

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Changed in linux (Ubuntu Zesty):
status: In Progress → Incomplete
Revision history for this message
Joshua R. Poulson (jrp) wrote :
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Zesty):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Changed in linux (Ubuntu Zesty):
status: Confirmed → In Progress
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a Zesty test kernel with commit a3ade8cc4. It required commits 5c1bec61fdfcd, 37cdd991fac81 and 631e63a9f346c as prereqs.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1664663/

Can you test this kernel and see if it resolves this bug?

Revision history for this message
Chris Valean (cvalean) wrote :

Please merge the changes.

Sorry for the delay on this, we did some testing and we're not seeing any issues in the test kernel.
We don't have a clear repro to exercise the bug behavior, but the changes do not break the functionality.

Revision history for this message
Chris Valean (cvalean) wrote :

Joe, I take that back - comment #16 is incorrect.

I'm seeing a call trace kvp related when using a high numbers of cores on the VM on the test kernel provided.
We cannot repro that on 4.10.0-19 and nor in the latest one 4.10.0-26.

Can you please apply that patch on top of current 4.10.0-26 so we can re-test that?
Thank you!

Call trace for reference:

[ 72.265165] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 72.265177] IP: hv_begin_read+0xc/0x20 [hv_vmbus]
[ 72.265178] PGD 0
[ 72.265179]
[ 72.265182] Oops: 0002 [#1] SMP
[ 72.265783] Modules linked in: hv_utils(+) hv_storvsc aesni_intel(+) aes_x86_64 ptp hid scsi_transport_fc hv_netvsc c
rypto_simd pps_core hyperv_keyboard glue_helper cryptd psmouse pata_acpi floppy fjes hv_vmbus
[ 72.265798] CPU: 48 PID: 300 Comm: ksoftirqd/48 Not tainted 4.10.0-22-generic #24~lp1664663
[ 72.265800] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
[ 72.265801] task: ffff91907d620000 task.stack: ffffb78a00e6c000
[ 72.265807] RIP: 0010:hv_begin_read+0xc/0x20 [hv_vmbus]
[ 72.265809] RSP: 0018:ffffb78a00e6fe00 EFLAGS: 00010202
[ 72.265811] RAX: 0000000000000000 RBX: ffff9190764ba000 RCX: 0000000040000084
[ 72.265812] RDX: ffff9190764ba1c0 RSI: 0000000000000000 RDI: ffff9190764ba120
[ 72.265813] RBP: ffffb78a00e6fe00 R08: 0000000000000027 R09: 00000000000193c0
[ 72.265814] R10: 00000000eac0c6e6 R11: 0000000000000010 R12: ffff9190764ba000
[ 72.265815] R13: ffff9190764ba120 R14: ffffffff86005130 R15: 0000000000000006
[ 72.265817] FS: 0000000000000000(0000) GS:ffff91907e200000(0000) knlGS:0000000000000000
[ 72.265818] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 72.265820] CR2: 0000000000000008 CR3: 0000000046a09000 CR4: 00000000001406e0
[ 72.265823] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 72.265824] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 72.265825] Call Trace:
[ 72.265832] vmbus_on_event+0x49/0x70 [hv_vmbus]
[ 72.265840] tasklet_action+0x5e/0x110
[ 72.265847] __do_softirq+0x104/0x2af
[ 72.265850] run_ksoftirqd+0x29/0x60
[ 72.265855] smpboot_thread_fn+0x10a/0x160
[ 72.265860] kthread+0x109/0x140
[ 72.265863] ? sort_range+0x30/0x30
[ 72.265867] ? kthread_create_on_node+0x60/0x60
[ 72.265870] ret_from_fork+0x2c/0x40
[ 72.265871] Code: eb f0 89 c6 48 c7 c7 70 2f 21 c0 e8 7e d0 19 c5 eb d8 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f
44 00 00 48 8b 07 55 48 89 e5 <c7> 40 08 01 00 00 00 0f ae f0 5d c3 0f 1f 84 00 00 00 00 00 0f
[ 72.265903] RIP: hv_begin_read+0xc/0x20 [hv_vmbus] RSP: ffffb78a00e6fe00
[ 72.265904] CR2: 0000000000000008
[ 72.265922] ---[ end trace 896d8fc4b37b0731 ]---
[ 72.265923] Kernel panic - not syncing: Fatal exception in interrupt
[ 72.269147] Kernel Offset: 0x4200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff
)
[ 72.269147] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the patch on top of current 4.10.0-26. It can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1664663/

Revision history for this message
Chris Valean (cvalean) wrote :

The new test kernel based on 4.10.0-26 also has this behavior.
We also checked the latest upstream and we're not seeing the failure, so let me discuss this internally to see why this fails on 4.10 but not on upstream.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

It's been a while since this bug was updated. Does this bug still exist, Chris?

Revision history for this message
Chris Valean (cvalean) wrote :

We do not have a clear repro on this issue, but recently there have been several fixes for rescind, so those might help.
Let's close this for the time being.

Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Zesty):
status: In Progress → Invalid
Revision history for this message
Eduardo Otubo (otubo) wrote :

This patch fixes the problem.

commit 7fa32e5ec28b1609abc0b797b58267f725fc3964
Author: K. Y. Srinivasan <email address hidden>
Date: Tue Nov 14 06:53:33 2017 -0700

    Drivers: hv: vmbus: Fix a rescind issue

    The current rescind processing code will not correctly handle
    the case where the host immediately rescinds a channel that has
    been offerred. In this case, we could be blocked in the open call and
    since the channel is rescinded, the host will not respond and we could
    be blocked forever in the vmbus open call.i Fix this problem.

    Signed-off-by: K. Y. Srinivasan <email address hidden>
    Cc: <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Eduardo, the Zesty kernel is now EOL, so this bug has been marked as invalid.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.