[Hyper-V] KVP daemon crashes at startup

Bug #1779640 reported by Ionut Lenghel on 2018-07-02
This bug affects 3 people
Affects          Status  Importance  Assigned to    Milestone
linux (Ubuntu)           High        Marcelo Cerri
Bionic                   High        Marcelo Cerri
Disco                    High        Marcelo Cerri

Bug Description

While testing the Bionic daily build with kernel 4.15.0-20-generic, we saw the following issue with the KVP daemon:

The KVP daemon crashes after approximately 2 minutes of uptime and enters a failed state. The daemon can be started manually, after which it returns to the active (running) state.
The error messages from /var/log/syslog after the daemon enters the failed state are the following:

Apr 25 04:28:46 bionicDaily KVP: read failed; error:9 Bad file descriptor
Apr 25 04:28:46 bionicDaily systemd[1]: hv-kvp-daemon.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 04:28:46 bionicDaily systemd[1]: hv-kvp-daemon.service: Failed with result 'exit-code'.
Apr 25 04:28:59 bionicDaily systemd[1]: Started Hyper-V KVP Protocol Daemon.

Note: A similar issue was discussed in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1664663, but the fixing commit seems to be included in this Bionic build.
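
For anyone triaging this, a minimal sketch of how the KVP failure lines could be pulled out of a syslog-style file. The helper name kvp_failures is hypothetical and not part of any Ubuntu tool; the patterns are taken only from the two daemon messages quoted in this report ("read failed" and "open ... failed"), so other failure messages would not be matched.

```shell
#!/bin/sh
# kvp_failures LOGFILE
# Hypothetical helper: print one summary line per KVP daemon failure.
# Only matches the message shapes quoted in this bug report:
#   KVP: read failed; error:9 Bad file descriptor
#   KVP: open /dev/vmbus/hv_kvp failed; error: 2 No such file or directory
kvp_failures() {
    # Select the failure lines, then reduce each to "<call> errno=<n> (<text>)".
    grep -E 'KVP: (read|open [^ ]+) failed; error: ?[0-9]+' "$1" |
        sed -E 's/.*KVP: (read|open) .*error: ?([0-9]+) (.*)$/\1 errno=\2 (\3)/'
}
```

Usage would be something like `kvp_failures /var/log/syslog`. Separating errno 9 (read on a bad descriptor) from errno 2 (device node missing at open) may help tell apart the different failures reported in this thread.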

Ionut Lenghel (ilenghel) on 2018-07-02
affects: linux-azure (Ubuntu) → linux (Ubuntu)

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1779640

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Ionut Lenghel (ilenghel) wrote :

I have tested this on kernel 4.15.0-24.26 and linux-cloud-tools 4.15.0-24.26 and the issue persists.

Chris Valean (cvalean) on 2018-07-02
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

We were discussing whether this specific bug was a regression in bug 1766857. Do you recall prior kernel versions that did not have this specific issue? If so, we can perform a bisect to narrow down which commit introduced this.

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
Changed in linux (Ubuntu):
status: Confirmed → Triaged
tags: added: kernel-hyper-v
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Ionut Lenghel (ilenghel) wrote :

We haven't seen it in Xenial, but we managed to reproduce it as far back as the daily Bionic build from the 2nd of March. We haven't tested Bionic daily builds from before the 2nd of March, nor did I try to reproduce it on Zesty or Artful.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test some prior kernels to see if we can bisect this issue down? If so, the following kernels would be a good starting point:

Artful: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/15007007
Early Bionic: https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable/+build/13870786

Chris Valean (cvalean) wrote :

Missed the reply on this thread, my apologies.
I have now tried to install the packages from those PPAs, but apt does not detect the kernels.

I added the PPA repo and ran apt update, but apt then detects neither of the kernels, so I must be doing something wrong.
On another note, the linux-tools packages seem to require a different libbinutils version, which does not appear to have been built in the mentioned repos.

Joseph Salisbury (jsalisbury) wrote :

Are you able to manually download the .deb packages from those pages and use dpkg to install them? For example:

sudo dpkg -i linux-image-4.14.0-13-generic_4.14.0-13.15_amd64.deb
sudo dpkg -i linux-image-extra-4.14.0-13-generic_4.14.0-13.15_amd64.deb

If we can't install earlier kernels, we may have to use a different method of debugging than a bisect.

Joseph Salisbury (jsalisbury) wrote :

Another good test would be to see if this bug is still happening on newer releases such as Cosmic or even Disco.

Thanks

Dan Watkins (daniel-thewatkins) wrote :

On disco, we're seeing the following in the cloud image testing:

Dec 17 19:37:43 jenk-ituzztgzwdxjesafyoft KVP: KVP starting; pid is:204
Dec 17 19:37:43 jenk-ituzztgzwdxjesafyoft KVP: open /dev/vmbus/hv_kvp failed; error: 2 No such file or directory
Dec 17 19:37:43 jenk-ituzztgzwdxjesafyoft systemd[1]: hv-kvp-daemon.service: Main process exited, code=exited, status=1/FAILURE
Dec 17 19:37:43 jenk-ituzztgzwdxjesafyoft systemd[1]: hv-kvp-daemon.service: Failed with result 'exit-code'.

Is this the same issue, or should I file a distinct bug?

Joseph Salisbury (jsalisbury) wrote :

Hi Dan,

Can you confirm the tools packages are installed? Please paste the output of:
dpkg -l | grep tools

This was from our test suite, so I don't have direct access to an instance, but using the manifest, it is:

$ grep tools ~/manifest.txt
bcache-tools 1.0.8-3
dosfstools 4.1-2
initramfs-tools 0.131ubuntu15
initramfs-tools-bin 0.131ubuntu15
initramfs-tools-core 0.131ubuntu15
linux-azure-cloud-tools-4.18.0-1006 4.18.0-1006.6
linux-azure-tools-4.18.0-1006 4.18.0-1006.6
linux-cloud-tools-4.18.0-1006-azure 4.18.0-1006.6
linux-cloud-tools-azure 4.18.0.1006.7
linux-cloud-tools-common 4.18.0-11.12
linux-tools-4.18.0-1006-azure 4.18.0-1006.6
linux-tools-azure 4.18.0.1006.7
linux-tools-common 4.18.0-11.12
net-tools 1.60+git20161116.90da8a0-2ubuntu1
open-vm-tools 2:10.3.5-3
squashfs-tools 1:4.3-6ubuntu2
ubuntu-advantage-tools 17
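
Since "grep tools" also matches unrelated packages (net-tools, open-vm-tools and so on), a narrower filter may be easier to read. A minimal sketch, assuming a manifest in the "name version" format shown above; the helper name cloud_tools_pkgs is hypothetical. As far as I know, the Hyper-V daemons ship in the linux-cloud-tools packages, which is why the filter keys on that substring.

```shell
#!/bin/sh
# cloud_tools_pkgs MANIFEST
# Hypothetical helper: list only packages whose name contains "cloud-tools".
# Assumes the package name is the first field and the version the second,
# as in the manifest above (in "dpkg -l" output the name is the second field).
cloud_tools_pkgs() {
    awk '$1 ~ /cloud-tools/ { print $1, $2 }' "$1"
}
```

Run against the manifest above with `cloud_tools_pkgs ~/manifest.txt`, this should leave only the linux-*cloud-tools* entries, which makes a version mismatch between the kernel-specific and common packages easier to spot.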

Joseph Salisbury (jsalisbury) wrote :

Do you have a way to pause a test, so you can access the instance? It would be good to know if the file actually exists:

ls -l /dev/vmbus/hv_kvp

Yep, I have now done so:

$ ls -la /dev/vmbus/hv_kvp
crw------- 1 root root 10, 55 Dec 18 22:31 /dev/vmbus/hv_kvp

Joe, I've also imported your SSH keys in to <email address hidden> if there's any other poking around you want to do.

Joseph Salisbury (jsalisbury) wrote :

Are you able to manually start the daemon after seeing that the file exists?

On Wed, Dec 19, 2018 at 04:10:45PM -0000, Joseph Salisbury wrote:
> Are you able to manually start the daemon after seeing that the
> file exists?

Yes.

tags: added: id-5c17e732a66adc67521fc988
Francis Ginther (fginther) wrote :

Could this be a race condition with the device file creation and start of the hv-kvp-daemon.service?

[from /var/log/syslog]
Jan 16 10:26:45 kvp-disco-test systemd[1]: Started Hyper-V KVP Protocol Daemon.
...
Jan 16 10:26:45 kvp-disco-test KVP: KVP starting; pid is:213
Jan 16 10:26:45 kvp-disco-test KVP: open /dev/vmbus/hv_kvp failed; error: 2 No such file or directory
Jan 16 10:26:45 kvp-disco-test systemd[1]: hv-kvp-daemon.service: Main process exited, code=exited, status=1/FAILURE

$ stat /dev/vmbus/hv_kvp
  File: /dev/vmbus/hv_kvp
  Size: 0 Blocks: 0 IO Block: 4096 character special file
Device: 6h/6d Inode: 212 Links: 1 Device type: a,37
Access: (0600/crw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-01-16 10:26:56.868312550 +0000
Modify: 2019-01-16 10:26:56.868312550 +0000
Change: 2019-01-16 10:26:56.868312550 +0000

I realize this isn't the creation time (I can't figure out whether I can even get that), but it's suspicious. Also, manually starting the service once logged in works, and it stays running.
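
In the logs above the daemon starts at 10:26:45, while the device node timestamps read 10:26:56, about 11 seconds later. If the race theory holds, one way to test it by hand would be to wait for the node before starting the daemon. This is only a sketch for experimenting, not a proposed fix for the service unit; the helper name wait_for_path and the 30 second default are assumptions.

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until PATH exists.
# Returns 0 as soon as PATH exists, 1 if the timeout expires first.
# Uses a plain existence check (-e); for the real device node a
# character-device check (-c) would be stricter.
wait_for_path() {
    path="$1"
    remaining="${2:-30}"
    while [ ! -e "$path" ]; do
        # Give up once the budget of one-second waits is used up.
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example, `wait_for_path /dev/vmbus/hv_kvp 30 && sudo systemctl start hv-kvp-daemon.service` would start the daemon only after the node shows up. If the daemon then stays running on every boot, that would support the ordering explanation.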

Changed in linux (Ubuntu):
status: In Progress → Confirmed
Changed in linux (Ubuntu Bionic):
status: In Progress → Confirmed
Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Bionic):
assignee: Joseph Salisbury (jsalisbury) → nobody
Terry Rudd (terrykrudd) on 2019-01-24
Changed in linux (Ubuntu):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Marcelo Cerri (mhcerri)
Marcelo Cerri (mhcerri) wrote :

Sorry for the delay. That might be a dependency problem on the KVP systemd service. I will run some tests locally. Can you confirm if the problem also happens with the linux-azure kernel?

Marcelo Cerri (mhcerri) wrote :

Does it also happen with the latest generic Bionic kernel (4.15.0-45)?

Marcelo Cerri (mhcerri) wrote :

I wasn't able to reproduce it on Hyper-V or on Azure using the 4.15.0-45 generic kernel.

Francis Ginther (fginther) wrote :

I am not able to reproduce the crash on Bionic. There is an issue on Disco, but it is a failure to start the daemon rather than a crash, as mentioned in comment 16. As this appears to be a different issue, I can open a new bug.

Francis Ginther (fginther) wrote :

I've filed a new bug for the kvp daemon not starting on disco: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820063

Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc