[Hyper-V] KVP daemon crashes at startup

Bug #1779640 reported by Ionut Lenghel
This bug affects 3 people
Affects           Status      Importance   Assigned to     Milestone
linux (Ubuntu)    Confirmed   High         Marcelo Cerri
  Bionic          Confirmed   High         Marcelo Cerri
  Disco           Won't Fix   High         Marcelo Cerri

Bug Description

While testing the Bionic daily build with kernel 4.15.0-20-generic, we saw the following issue with the KVP daemon:

The KVP daemon crashes after approximately 2 minutes of uptime and enters a failed state. The daemon can be started manually, after which it returns to the active (running) state.
After the daemon enters the failed state, /var/log/syslog contains the following error messages:

Apr 25 04:28:46 bionicDaily KVP: read failed; error:9 Bad file descriptor
Apr 25 04:28:46 bionicDaily systemd[1]: hv-kvp-daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 04:28:46 bionicDaily systemd[1]: hv-kvp-daemon.service: Failed with result 'exit-code'.
Apr 25 04:28:59 bionicDaily systemd[1]: Started Hyper-V KVP Protocol Daemon.

Note: A similar issue was discussed in this thread: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1664663, but the fixing commit appears to be included in this Bionic build.

Ionut Lenghel (ilenghel)
affects: linux-azure (Ubuntu) → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1779640

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Ionut Lenghel (ilenghel) wrote :

I have tested this on kernel 4.15.0-24.26 and linux-cloud-tools 4.15.0-24.26 and the issue persists.

Chris Valean (cvalean)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

We were discussing whether this specific bug is a regression from bug 1766857. Do you recall prior kernel versions that did not have this specific issue? If so, we can perform a bisect to narrow down which commit introduced it.

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
Changed in linux (Ubuntu):
status: Confirmed → Triaged
tags: added: kernel-hyper-v
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Ionut Lenghel (ilenghel) wrote :

We haven't seen it in Xenial, but we managed to reproduce it as far back as the Bionic daily build from the 2nd of March. We haven't tested Bionic daily builds from before the 2nd of March, nor have I tried to reproduce it on Zesty or Artful.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test some earlier kernels so that we can bisect this issue? If so, the following kernels would be a good starting point:

Artful: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/15007007
Early Bionic: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable/+build/13870786

Chris Valean (cvalean) wrote :

I missed the reply on this thread, my apologies.
I have now tried to install the packages from those PPAs; however, apt does not detect the kernels.

I added the PPA repository and ran apt update, but apt detects neither of the kernels, so I must be doing something wrong.
On another note, the linux-tools packages seem to require a different libbinutils version, which does not appear to have been built in the mentioned repositories.

Joseph Salisbury (jsalisbury) wrote :

Are you able to manually download the .deb packages from those pages and use dpkg to install them?

sudo dpkg -i linux-image-4.14.0-13-generic_4.14.0-13.15_amd64.deb
sudo dpkg -i linux-image-extra-4.14.0-13-generic_4.14.0-13.15_amd64.deb

If we can't install earlier kernels, we may have to use a debugging method other than a bisect.

Joseph Salisbury (jsalisbury) wrote :

Another good test would be to see if this bug is still happening on newer releases such as Cosmic or even Disco.

Thanks

Dan Watkins (oddbloke) wrote :

On disco, we're seeing the following in the cloud image testing:

Dec 17 19:37:43 jenk-ituzztgzwdxjesafyoft KVP: KVP starting; pid is:204
Dec 17 19:37:43 jenk-ituzztgzwdxjesafyoft KVP: open /dev/vmbus/hv_kvp failed; error: 2 No such file or directory
Dec 17 19:37:43 jenk-ituzztgzwdxjesafyoft systemd[1]: hv-kvp-daemon.service: Main process exited, code=exited, status=1/FAILURE
Dec 17 19:37:43 jenk-ituzztgzwdxjesafyoft systemd[1]: hv-kvp-daemon.service: Failed with result 'exit-code'.

Is this the same issue, or should I file a distinct bug?

Joseph Salisbury (jsalisbury) wrote :

Hi Dan,

Can you confirm the tools packages are installed? Please paste the output of:
dpkg -l | grep tools

Dan Watkins (oddbloke) wrote :

This was from our test suite, so I don't have direct access to an instance, but using the manifest, it is:

$ grep tools ~/manifest.txt
bcache-tools 1.0.8-3
dosfstools 4.1-2
initramfs-tools 0.131ubuntu15
initramfs-tools-bin 0.131ubuntu15
initramfs-tools-core 0.131ubuntu15
linux-azure-cloud-tools-4.18.0-1006 4.18.0-1006.6
linux-azure-tools-4.18.0-1006 4.18.0-1006.6
linux-cloud-tools-4.18.0-1006-azure 4.18.0-1006.6
linux-cloud-tools-azure 4.18.0.1006.7
linux-cloud-tools-common 4.18.0-11.12
linux-tools-4.18.0-1006-azure 4.18.0-1006.6
linux-tools-azure 4.18.0.1006.7
linux-tools-common 4.18.0-11.12
net-tools 1.60+git20161116.90da8a0-2ubuntu1
open-vm-tools 2:10.3.5-3
squashfs-tools 1:4.3-6ubuntu2
ubuntu-advantage-tools 17

Joseph Salisbury (jsalisbury) wrote :

Do you have a way to pause a test, so you can access the instance? It would be good to know if the file actually exists:

ls -l /dev/vmbus/hv_kvp

Dan Watkins (oddbloke) wrote :

Yep, I have now done so:

$ ls -la /dev/vmbus/hv_kvp
crw------- 1 root root 10, 55 Dec 18 22:31 /dev/vmbus/hv_kvp

Joe, I've also imported your SSH keys into <email address hidden> if there's any other poking around you want to do.

Joseph Salisbury (jsalisbury) wrote :

Are you able to start the daemon manually after confirming that the file exists?

Dan Watkins (oddbloke) wrote : Re: [Bug 1779640] Re: [Hyper-V] KVP daemon crashes at startup

On Wed, Dec 19, 2018 at 04:10:45PM -0000, Joseph Salisbury wrote:
> Are you able to manually able to start the daemon after seeing that the
> file exists?

Yes.

tags: added: id-5c17e732a66adc67521fc988
Francis Ginther (fginther) wrote :

Could this be a race condition between the creation of the device file and the start of hv-kvp-daemon.service?

[from /var/log/syslog]
Jan 16 10:26:45 kvp-disco-test systemd[1]: Started Hyper-V KVP Protocol Daemon.
...
Jan 16 10:26:45 kvp-disco-test KVP: KVP starting; pid is:213
Jan 16 10:26:45 kvp-disco-test KVP: open /dev/vmbus/hv_kvp failed; error: 2 No such file or directory
Jan 16 10:26:45 kvp-disco-test systemd[1]: hv-kvp-daemon.service: Main process exited, code=exited, status=1/FAILURE

$ stat /dev/vmbus/hv_kvp
  File: /dev/vmbus/hv_kvp
  Size: 0 Blocks: 0 IO Block: 4096 character special file
Device: 6h/6d Inode: 212 Links: 1 Device type: a,37
Access: (0600/crw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-01-16 10:26:56.868312550 +0000
Modify: 2019-01-16 10:26:56.868312550 +0000
Change: 2019-01-16 10:26:56.868312550 +0000

I realize this isn't the creation time (I'm not sure that can even be retrieved), but it's suspicious. Also, starting the service manually once logged in works, and it stays running.
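In the stat output above, the node's change time (10:26:56) is roughly 11 seconds after the failed open at 10:26:45, which is consistent with the race hypothesis, although ctime is not creation time. One way to test the hypothesis is to hold the daemon back until the device node exists. Below is a minimal bash sketch under that assumption, not the packaged fix: wait_for_path is a hypothetical helper, and running it before the daemon (for example from a wrapper script or an ExecStartPre= step) is likewise an assumption. The default 30-second timeout and the 1-second polling interval are arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of linux-cloud-tools): poll until a path
# exists or a timeout expires. Assumption: waiting for /dev/vmbus/hv_kvp
# before starting the KVP daemon would sidestep the suspected race.

# wait_for_path PATH [TIMEOUT_SECONDS]
# Returns 0 as soon as PATH exists, 1 if it is still missing after the timeout.
wait_for_path() {
    local path=$1
    local timeout=${2:-30}   # arbitrary default, in seconds
    local waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s waiting for $path" >&2
            return 1
        fi
        sleep 1               # poll once per second
        waited=$((waited + 1))
    done
    return 0
}
```

A hypothetical use would be `wait_for_path /dev/vmbus/hv_kvp 30 || exit 1` before starting the daemon. Polling keeps the sketch simple. A systemd-level dependency on the device, which Marcelo suggests investigating below, would be the cleaner direction if the race is confirmed.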

Changed in linux (Ubuntu):
status: In Progress → Confirmed
Changed in linux (Ubuntu Bionic):
status: In Progress → Confirmed
Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Bionic):
assignee: Joseph Salisbury (jsalisbury) → nobody
Terry Rudd (terrykrudd)
Changed in linux (Ubuntu):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Marcelo Cerri (mhcerri)
Marcelo Cerri (mhcerri) wrote :

Sorry for the delay. This might be a dependency problem in the KVP systemd service. I will run some tests locally. Can you confirm whether the problem also happens with the linux-azure kernel?

Marcelo Cerri (mhcerri) wrote :

Does it also happen with the latest generic Bionic kernel (4.15.0-45)?

Marcelo Cerri (mhcerri) wrote :

I wasn't able to reproduce it in hyper-v or in azure using the 4.15.0-45 generic kernel.

Francis Ginther (fginther) wrote :

I am not able to reproduce the crash on Bionic. There is an issue on Disco, but it is not a crash; rather, the daemon fails to start, as mentioned in comment 16. As this appears to be a different issue, I can open a new bug.
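The thread describes two distinct failure signatures: the original crash while running ("read failed; error:9 Bad file descriptor", i.e. EBADF) and the Disco startup failure ("open /dev/vmbus/hv_kvp failed; error: 2 No such file or directory", i.e. ENOENT). The following minimal bash sketch sorts a log excerpt into one of those two buckets. The function name and its output labels are hypothetical, and matching on the exact message text is an assumption based only on the log lines quoted in this report.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read KVP log lines on stdin and print which failure
# signature they contain. The patterns come from the excerpts quoted in
# this bug report; other daemon versions may word the messages differently.
classify_kvp_failure() {
    local log
    log=$(cat)
    if grep -qF 'open /dev/vmbus/hv_kvp failed; error: 2' <<<"$log"; then
        # ENOENT at startup: the device node was not there yet
        echo "startup-enoent"
    elif grep -qF 'read failed; error:9' <<<"$log"; then
        # EBADF while running: the crash originally reported in this bug
        echo "runtime-ebadf"
    else
        echo "none"
    fi
}
```

For example, `grep KVP /var/log/syslog | classify_kvp_failure` would print startup-enoent for the Disco excerpts above and runtime-ebadf for the original Bionic excerpt.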

Francis Ginther (fginther) wrote :

I've filed a new bug for the kvp daemon not starting on disco: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820063

Brad Figg (brad-figg)
tags: added: cscc
lilideng (lilideng) wrote :

Try the following Hyper-V command; you will see the hv-kvp-daemon service exit with "KVP: read failed; error:9 Bad file descriptor":

Disable-VMIntegrationService -Name "Key-Value Pair Exchange" -VMName $VMName -ComputerName $hvServer

Kernel versions:
11/21/2019 07:10:21 : [INFO ] Old kernel: 4.15.0-38-generic
11/21/2019 07:10:21 : [INFO ] New kernel: 5.0.0-1026-azure

Steve Langasek (vorlon) wrote :

Ubuntu 19.04 reached end of life on January 23, 2020. I am therefore marking this bug 'wontfix' for that release.

Changed in linux (Ubuntu Disco):
status: Confirmed → Won't Fix