Installer causes VM to be paused

Bug #1776269 reported by John Bester
This bug affects 1 person
Affects        Status   Importance  Assigned to  Milestone
qemu (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

Ubuntu 4.15.0-23.25-generic 4.15.18
Linux sbri19 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I have an Ubuntu 18.04 Server with a Windows 10 VM with AMD Graphics card passthrough. The driver happily installed and worked like a charm in Ubuntu Server 17.10. The server was built just before Ubuntu 18.04 was released and Ubuntu Server 17.10 was installed as a temporary measure. After upgrading to Ubuntu Server 18.04, I had to reinstall the AMD driver. Unfortunately, the VM goes into PAUSE mode during the driver installation (hardware detected without a problem). To resume the VM, it needs to be reset.

The attachments show the log of the VM as well as the error when trying to resume the VM after it has been paused.

Ubuntu 18.04 has libvirt 4.0.0, and Ubuntu 17.10 had libvirt 3.6, so this seems to be a bug that slipped in. Since libvirt is already on version 4.4, is there a PPA repository with a later version that I can test to see if the bug has been resolved?

affects: linux (Ubuntu) → qemu (Ubuntu)
Christian Ehrhardt  (paelzer) wrote :

Hi John,
I didn't merge a newer libvirt for Cosmic yet that you could test right now.
Also the attachment you mentioned didn't make it to the bug, would you mind attaching it (again)?

Usually, dropping to paused means that qemu ran into an assertion or something similar.
So please have a look (maybe you already did) at the guest log at /var/log/libvirt/qemu/<guestname>.log

I can tag this to give you a ping once I have a newer libvirt, but that will be a while (vacation time). But let's sort out the logs first.

Changed in qemu (Ubuntu):
status: New → Incomplete
John Bester (john-bester) wrote :

Sorry about the attachment. The POST timed out and I had to go back in the browser and submit again. I think the attachment got lost in the process.

John Bester (john-bester) wrote :

How difficult would it be to have a PPA repository with the qemu / libvirt version that was distributed with 17.10? I know that 18.04 is an LTS and therefore big software changes will not happen. I will have to rebuild the server and install 17.10 if I do not get a solution to this problem quickly. Any help would be greatly appreciated.

Christian Ehrhardt  (paelzer) wrote :

Ah, you mean the reverse: to have the Artful version used in Bionic.

You can do so without even a PPA:
  cp /etc/apt/sources.list /etc/apt/sources.list.d/artful.list
  sed -i.bak 's/bionic/artful/g' /etc/apt/sources.list.d/artful.list
  apt update
  apt-cache policy libvirt-daemon-system
libvirt-daemon-system:
  Installed: 4.0.0-1ubuntu8.1
  Candidate: 4.0.0-1ubuntu8.1
  Version table:
 *** 4.0.0-1ubuntu8.1 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     4.0.0-1ubuntu8 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages
     3.6.0-1ubuntu6.7 500
        500 http://archive.ubuntu.com/ubuntu artful-updates/main amd64 Packages
     3.6.0-1ubuntu6.3 500
        500 http://security.ubuntu.com/ubuntu artful-security/main amd64 Packages
     3.6.0-1ubuntu5 500
        500 http://archive.ubuntu.com/ubuntu artful/main amd64 Packages

You can then run this to downgrade the components:
  apt install libvirt-daemon-system=3.6.0-1ubuntu6.7 libvirt-clients=3.6.0-1ubuntu6.7 libvirt-daemon=3.6.0-1ubuntu6.7 libvirt0=3.6.0-1ubuntu6.7

Not that I'd recommend it in general, but for debugging it is certainly an easy way to do it.

Christian Ehrhardt  (paelzer) wrote :

Although, FYI, your error looks more like a qemu issue to me.

Christian Ehrhardt  (paelzer) wrote :

Actually, checking the issue in more detail, it seems even more likely that neither libvirt nor qemu is "it".

Instead, many users report similar issues [1][2] caused by the host grabbing (part of) the device.
That then leads to a conflict.

So I'd ask you to check whether any of that applies to your bionic system; maybe it has a newer or additional driver that does this, compared to what your 17.10 had.

[1]: https://www.redhat.com/archives/vfio-users/2016-March/msg00088.html
[2]: https://www.reddit.com/r/VFIO/comments/4sxt3h/cant_get_pci_passthrough_quite_right/

John Bester (john-bester) wrote :

Thanks. This put me on the right track and I was able to solve the problem. I will summarise the steps I took. (The PCI address in my case is 26:00.0 - take that into account when reading further)

On the KVM host, execute the following:
dmesg | grep -i 'vfio.*26[:]00'

This produced the following:
vfio-pci 0000:26:00.0: BAR 0: can't reserve [mem 0xe0000000-0xefffffff 64bit pref]

This entry indicates that the VM cannot reserve all of the I/O memory of the PCI device.

To find out which module grabbed the IO memory, do the following in the KVM host:

grep -B 5 -A 5 "26[:]00" /proc/iomem

This had the following entry:
e0000000-e01effff : efifb

This address block is inside the block that could not be reserved for the VM, and efifb (EFI Frame Buffer) grabbed this memory. In my case, the PCI device was also the primary display device configured for the PC in EFI. It could also be vesafb (VESA Frame Buffer). The vesafb or efifb driver uses this address block for virtual consoles. (See /sys/class/vtconsole/vtcon0 and vtcon1.)
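
The overlap check described above can be sketched as a small script. This is only an illustration, not part of the original steps: the helper names ranges_overlap and find_overlaps are made up for this example, and it assumes /proc/iomem lines have the form "start-end : owner" as in the output shown above. Run it as root, since unprivileged reads of /proc/iomem typically show zeroed addresses.

```shell
#!/bin/sh
# Sketch: list /proc/iomem entries that overlap a given address range.
# ranges_overlap and find_overlaps are hypothetical helper names.

# Succeed if [A_START, A_END] and [B_START, B_END] overlap.
# Arguments are hex numbers without the 0x prefix.
ranges_overlap() {
    [ $((0x$1)) -le $((0x$4)) ] && [ $((0x$3)) -le $((0x$2)) ]
}

# Usage: find_overlaps START END [FILE]   (FILE defaults to /proc/iomem)
# Prints every "start-end : owner" entry that overlaps START-END.
find_overlaps() {
    fo_start=$1
    fo_end=$2
    fo_file=${3:-/proc/iomem}
    # Strip indentation and split each line into "start end owner".
    sed -n 's/^[[:space:]]*\([0-9a-fA-F]\{1,\}\)-\([0-9a-fA-F]\{1,\}\) : \(.*\)$/\1 \2 \3/p' "$fo_file" |
    while read -r fo_lo fo_hi fo_owner; do
        if ranges_overlap "$fo_start" "$fo_end" "$fo_lo" "$fo_hi"; then
            printf '%s-%s : %s\n' "$fo_lo" "$fo_hi" "$fo_owner"
        fi
    done
}

# Example with the BAR 0 range from the dmesg line above:
#   find_overlaps e0000000 efffffff
```

Any owner other than the PCI bus or device itself (for example efifb or vesafb) that shows up in the output is a candidate for the conflict.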

I tried the following options from the given references:
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
reboot - this did not solve the problem
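
A note on the attempt above: writes to sysfs do not persist across a reboot, so the unbind would normally have to be repeated on each boot, before the VM is started. A minimal sketch of doing that for all virtual consoles follows. The helper name unbind_consoles and its optional root-directory argument are assumptions made for this example (the argument exists only so the loop can be tried against a scratch directory), and whether this alone releases the framebuffer memory depends on the system.

```shell
#!/bin/sh
# Sketch: unbind every virtual console found under /sys/class/vtconsole.
# unbind_consoles is a hypothetical helper; the optional argument replaces
# the /sys prefix so the loop can be exercised without touching the host.
unbind_consoles() {
    uc_root=${1:-/sys}
    for uc_bind in "$uc_root"/class/vtconsole/vtcon*/bind; do
        # Skip the unexpanded pattern when no console entries exist.
        [ -e "$uc_bind" ] || continue
        echo 0 > "$uc_bind"
    done
}

# On the KVM host, as root, before starting the VM:
#   unbind_consoles
```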

I also tried a solution which requires editing /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="<other options> video=vesa:off,efifb:off"
update-initramfs -u
update-grub
reboot - this did not solve the problem either

Then I tried some more options:
GRUB_CMDLINE_LINUX_DEFAULT="<other options> vga=normal nofb nomodeset video=efifb:off"
update-initramfs -u
update-grub
reboot - this solved the problem and after reboot efifb did not lock that address range any more.
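
To confirm the result after the reboot, the two checks can be combined in a small script. This is a sketch only: check_efifb_released is a hypothetical helper name, and the optional file arguments exist just so it can be tried against saved copies of /proc/cmdline and /proc/iomem.

```shell
#!/bin/sh
# Sketch: verify that the kernel was booted with efifb disabled and that
# efifb no longer owns a memory range.
# Usage: check_efifb_released [CMDLINE_FILE] [IOMEM_FILE]
check_efifb_released() {
    ce_cmdline=${1:-/proc/cmdline}
    ce_iomem=${2:-/proc/iomem}
    if ! grep -q 'efifb:off' "$ce_cmdline"; then
        echo "efifb:off is not on the kernel command line"
        return 1
    fi
    if grep -q 'efifb' "$ce_iomem"; then
        echo "efifb still holds a memory range:"
        grep 'efifb' "$ce_iomem"
        return 1
    fi
    echo "efifb does not hold any memory range"
}

# On the KVM host after the reboot:
#   check_efifb_released
```

If the first check fails, the new GRUB_CMDLINE_LINUX_DEFAULT value was probably not picked up, for example because update-grub was not run.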

Reference: https://support.digium.com/community/s/article/How-to-disable-the-Linux-frame-buffer-if-it-s-causing-problems

Robie Basak (racb) wrote :

Thank you for reporting back - it'll be helpful to others searching for a solution to the same problem.

I think you've concluded that the cause wasn't a bug in the qemu package in Ubuntu then? So I'll set the bug status to Invalid to reflect this. If I've misunderstood, please do explain and reopen.

Changed in qemu (Ubuntu):
status: Incomplete → Invalid