Ubuntu 18.04 hangs at "kvm: exiting hardware virtualization" on AMD servers with DVD

Bug #1777674 reported by Sujith Pandel on 2018-06-19
This bug affects 1 person
Affects            Importance   Assigned to
dellserver         Undecided    Unassigned
linux (Ubuntu)     High         Unassigned
  Bionic           High         Unassigned
systemd (Ubuntu)   Undecided    Unassigned

Bug Description

Ubuntu 18.04 hangs at "kvm: exiting hardware virtualization" on AMD servers when under graceful reboot stress for 12hrs.

* This hang is observed only when the onboard SATA DVD drive is connected.

* Not seen with Ubuntu 16.04.4 (HWE kernel v4.13)

* Seen with Ubuntu 18.04 (4.15.0-23-generic, 4.15.0-20-generic)

Steps:
Set up a DellEMC AMD server with an onboard DVD drive, install Ubuntu 18.04, and start the reboot stress test.
After a few reboots, observe that the machine hangs at "kvm: exiting hardware virtualization" and does not proceed with further reboot cycles.

Only physical reset helps in continuing the reboot test.

Sujith Pandel (sujithpandel) wrote :

Logs where the hang is not seen:
<snip>
[ 75.448494] kvm: exiting hardware virtualization
[ 76.187482] reboot: Restarting system
[ 76.191663] reboot: machine restart
[ 76.197621] ACPI MEMORY or I/O RESET_REG.
</snip>

The last 3 lines are not seen when the hang is observed.
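
The check described above can be scripted against a saved console capture. This is only a minimal sketch: it assumes the console output of a single shutdown has been saved to a text file, and the function name `reboot_hung` is hypothetical, not something used in this report.

```shell
#!/bin/sh
# Hypothetical helper: decide from a saved console log whether a reboot hung.
# Assumes the log covers one shutdown only.
# Exit status: 0 = hang signature found, 1 = reboot proceeded, 2 = inconclusive.
reboot_hung() {
    log=$1
    # Without the kvm line the capture never reached the point of interest.
    grep -q 'kvm: exiting hardware virtualization' "$log" || return 2
    # Inspect everything printed from the first kvm line to the end of the log.
    if sed -n '/kvm: exiting hardware virtualization/,$p' "$log" |
        grep -q 'reboot: Restarting system'; then
        return 1
    fi
    return 0
}
```

For example, `reboot_hung console.log && echo "hang detected"` would flag a capture that stops at the kvm line.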

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1777674

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic

Moving the bug report back to Confirmed. Messages such as "kvm: exiting hardware virtualization" and everything printed after it are not stored on the physical disks, because the disks are already unmounted at that point.
Such messages are available only in the console logs, which are already uploaded to this report.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
status: Confirmed → New
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.18-rc1 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc1

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: New → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Sujith Pandel (sujithpandel) wrote :

Update:
I was under the assumption that kvm/kvm_amd module was the suspect.
But I was wrong.
Reason: I ran 'modprobe -r kvm_amd kvm' before 'systemctl reboot' in the reboot script, and the system still hung with the Ubuntu 18.04 kernel.
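
The experiment above could be scripted roughly as follows. The actual reboot script is not shown in this report, so this is only a sketch: the counter file path, the reboot limit, the `DRY_RUN` switch and the function names are all hypothetical. Only the two commands passed to `run` come from the comment above.

```shell
#!/bin/sh
# Sketch of one reboot-stress iteration (not the reporter's actual script).
# DRY_RUN defaults to 1, so the commands are only printed unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
COUNT_FILE=${COUNT_FILE:-/var/tmp/reboot-count}   # hypothetical path
MAX_REBOOTS=${MAX_REBOOTS:-1000}                  # hypothetical limit

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reboot_iteration() {
    count=$(cat "$COUNT_FILE" 2>/dev/null || echo 0)
    count=${count:-0}
    if [ "$count" -ge "$MAX_REBOOTS" ]; then
        echo "reached $count reboots, stopping"
        return 0
    fi
    # Record the cycle number before going down (also done in dry-run mode).
    echo $((count + 1)) > "$COUNT_FILE"
    # Unload the KVM modules first, then request a graceful reboot.
    run modprobe -r kvm_amd kvm
    run systemctl reboot
}
```

To use it for real, something would have to call `reboot_iteration` with `DRY_RUN=0` as root on every boot, for instance from a systemd unit; that wiring is left out here.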

@jsalisbury -
I will run the test today and update you by tomorrow.

Sujith Pandel (sujithpandel) wrote :

I am observing that when no network is available (interfaces not configured for dhcp/static), the graceful reboot hangs on almost every reboot.

* Issue is observed with Ubuntu 18.04 kernel 4.15.0-23-generic
* Also observed with 4.18-rc1 hosted at http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc1

Attaching the log which contains failure for both these cases.

Do you think systemd could be the suspect?

Sujith Pandel (sujithpandel) wrote :

I think I narrowed it down to these:
v4.14 - No issue observed
v4.15-rc1 - Issue observed

Wherever the issue is observed, if I blacklist or rmmod ahci driver before reboot, no issue is observed (because the SATA ODD is gone now).

summary: Ubuntu 18.04 hangs at "kvm: exiting hardware virtualization" on AMD
- servers
+ servers with DVD
Sujith Pandel (sujithpandel) wrote :

Any update on this?

Dimitri John Ledkov (xnox) wrote :

How is reboot requested/triggered?
systemd has a 30-minute timeout on reboot, so a stuck shutdown should time out and force a reboot after 30 minutes.

no longer affects: systemd (Ubuntu Bionic)
Changed in systemd (Ubuntu):
status: New → Incomplete
Sujith Pandel (sujithpandel) wrote :

The reboot is triggered using 'systemctl reboot'.
The 30-minute systemd timeout does not help; the system stays hung and needs a manual power reset.

Dimitri John Ledkov (xnox) wrote :

Could you please download and install https://launchpad.net/ubuntu/+source/finalrd/3/+build/15227702/+files/finalrd_3_all.deb

(this is a link from https://launchpad.net/ubuntu/+source/finalrd/3/+build/15227702 )

And check if that helps with reboots? This should perform pivot-root from rootfs to initramfs, and generally improve shutdown reliability.

However, I do suspect this to be a kernel issue. If ahci is really identified as a kernel shutdown culprit, we might want to ship an executable script in /lib/systemd/system-shutdown/yank-modules.sh or somesuch which would rmmod ahci if possible.
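
A minimal sketch of what such a script might look like, purely as an illustration of the idea and not a shipped Ubuntu script. It relies on systemd-shutdown running executables in the system-shutdown directory with the action (halt, poweroff, reboot or kexec) as the first argument. The `RMMOD` override and the function name `yank_modules` are hypothetical, and the choice to act only on reboot is an assumption, since this report only covers reboots.

```shell
#!/bin/sh
# Sketch of the yank-modules.sh idea; not a shipped Ubuntu script.
# systemd-shutdown passes the action (halt, poweroff, reboot or kexec) as $1.
RMMOD=${RMMOD:-rmmod}   # overridable so the logic can be tried without touching modules

yank_modules() {
    case "$1" in
        reboot)
            # Best effort: if ahci is still in use or not loaded, rmmod fails and we carry on.
            "$RMMOD" ahci 2>/dev/null || true
            ;;
    esac
    return 0
}

# Act only when an action argument was passed on the command line.
if [ "$#" -gt 0 ]; then
    yank_modules "$1"
fi
```

The file would need to be executable to be picked up. If the root disk sits behind the same ahci controller, the removal would most likely fail, which matches the "if possible" caveat above.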

So far I do not have any indications that it is indeed systemd holding up reboot, and not the kernel.

Dimitri John Ledkov (xnox) wrote :

Ideally, we'd want to bisect and fix the kernel itself so that it shuts down reliably. Yanking ahci sounds like a large hammer. Invalidating the systemd task, and pinging kernely people.

Changed in systemd (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Bionic):
status: Incomplete → In Progress
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v4.14 final and v4.15-rc1. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
1be2172e96e33bfa22a5c7a651f768ef30ce3984

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1777674

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Changed in linux (Ubuntu Bionic):
status: In Progress → Confirmed
Changed in linux (Ubuntu):
status: In Progress → Confirmed
assignee: Joseph Salisbury (jsalisbury) → nobody
Changed in linux (Ubuntu Bionic):
assignee: Joseph Salisbury (jsalisbury) → nobody