framebuf STOP BSOD and performance regression ubuntu10.21 to ubuntu10.22

Bug #1752761 reported by Carl Morgan on 2018-03-02
This bug affects 9 people
Affects         Status         Importance   Assigned to
qemu (Ubuntu)   Invalid        Undecided    Unassigned
Trusty          Fix Released   High         Marc Deslauriers
Xenial          Fix Released   High         Marc Deslauriers

Bug Description

Hi,

Corporate environment: Windows XenU platforms running under QEMU HVM (qemu-system-x86) on multiple Ubuntu Xen0 platforms. Established, stable production environment (1-5+ years), with the Ubuntu and Windows nodes getting the latest patches etc. Dell R6XX series server hardware.

After updates from mainline (1:2.5+dfsg-5ubuntu10 to 1:2.5+dfsg-5ubuntu10.22), rebooting XenU VMs is very slow and they repeatedly blue-screen.

Windows Server 2012 does come up, after 4+ minutes of booting.

Windows Server 2008 R2, Windows 7 Pro and Windows 10 Pro VMs fail to boot with a blue screen "framebuf" STOP (PNG available).

Booting to safe mode (very slow, ~4 mins to login screen) and removing the video drivers lets the next reboot succeed; Windows then auto-updates the drivers and the following reboot fails again.

Testing completed on Windows Server 2008 R2 images, including migration of VM disk devices to other Dell rack servers:

o Xenial Xen0 server - Same issues
o Trusty Xen0 server - Same issues
o Precise Xen0 server - Fast boot / no issues

On Xenial systems, downgrading qemu-system-x86 to version 1:2.5+dfsg-5ubuntu10 restores the previous performance and stability (~25 secs to login screen): all good.
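
As a rough sketch of the downgrade described above, the helper below only prints the commands rather than running them. The function name `print_downgrade_cmds` is hypothetical, and the `apt-mark hold` step is an added suggestion that does not appear in the report. It assumes the requested version is still available from a configured archive or PPA; otherwise the .deb files would have to be installed locally with dpkg.

```shell
#!/bin/sh
# Print (do not run) the commands for pinning qemu-system-x86 to one version.
# Assumption: the requested version can still be fetched by apt.
print_downgrade_cmds() {
    version="$1"
    printf 'apt-get install qemu-system-x86=%s\n' "$version"
    # Suggested extra step (not from the report): hold the package so a
    # later upgrade does not pull the affected build back in.
    printf 'apt-mark hold qemu-system-x86\n'
}
```

For example, `print_downgrade_cmds '1:2.5+dfsg-5ubuntu10.21'` prints the two commands for the last build reported as working. A held package needs `apt-mark unhold qemu-system-x86` before a fixed build can be installed.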

Tested PPA versions of qemu-system-x86 with local dpkg installs, version ubuntu10.21 works fine, ubuntu10.22 fails. Proposed ubuntu10.23 also fails.

QEMU command line used (unchanged between good and bad observations):

/usr/bin/qemu-system-i386
-xen-domid 9
-chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-9,server,nowait
-no-shutdown
-mon chardev=libxl-cmd,mode=control
-chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-9,server,nowait
-mon chardev=libxenstat-cmd,mode=control
-nodefaults
-name HOSTNAME
-vnc <IP ADDRESS>:<SCREEN>,to=99
-display none
-serial pty
-device cirrus-vga,vgamem_mb=8
-boot order=c
-usb
-usbdevice tablet
-smp 2,maxcpus=2
-device rtl8139,id=nic0,netdev=net0,mac=XX:XX:XX:XX:XX:XX
-netdev type=tap,id=net0,ifname=vif9.0-emu,script=no,downscript=no
-machine xenfv
-m 6992
-drive file=/dev/VG-xen/HOSTNAME-disk,if=ide,index=0,media=disk,format=raw,cache=writeback

Xen CFG:
name = '<HOSTNAME>'
builder = 'hvm'
memory = 7000
vcpus=2
shadow_memory = 8
acpi=1
vif = ['type=ioemu, bridge=xenbr0']
disk = [ 'phy:/dev/VG-xen/HOSTNAME-disk,hda,w']
boot='c'
usbdevice='tablet'
vnc=1
vncdisplay=<DISPLAY>
vnclisten='<IP ADDRESS>'
vncconsole=1
serial='pty'
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'restart'

Xen GPL gplpv_Vista2008x64_0.11.0.373.msi drivers being used ( https://wiki.univention.de/index.php/Installing-signed-GPLPV-drivers )

Hi Carl,
thank you for your detailed report!

This is the second qemu/xen bug for regressions in these security updates in one day - with none (=0) of them over the last two years - unlikely to be a coincidence.

@Marc - I don't see an obvious change, but you have way more context on these changes since you have backported them. Do you have any info of a potential regression in them?
Maybe the CVE-2018-5683 change?

@Carl - your detailed steps are already great.
Could you check whether the same applies to an unmodified (gplpv) and non-prepared/installed Windows as well?
Maybe by using an ISO of [1] in your already prepared setup - and, if it fails as well, sharing the commands you used? That would make it even easier to reproduce.

@Carl - we have another report of even Lubuntu ISOs stalling. If you could (since all other parts of your setup are already ready), try the same with a boot from [2] - that is reported to hang with the new version.

If the steps above could be confirmed I'd expect that helps Marc a lot to look into the individual changes in this regard.

Probably related to bug: 1752375

[1]: https://www.microsoft.com/software-download/windows10
[2]: http://cdimage.ubuntu.com/lubuntu/releases/17.10.1/release/lubuntu-17.10.1-desktop-amd64.iso

P.S. If this doesn't reproduce for us, but Marc provided PPA builds with individual fixes - would you be willing and able to check them?


Hi Christian,

Thanks for your reply:

Pt 2) LUBUNTU CD boot - not an option at this time - UAT / production Xen0 with other non Windows PV VMs running on them.

Pt 1) Yes I can do some additional testing on a different R620 Xen0 / Xenial platform.

Currently I can cause/remove the issue by just installing the 10.21 and 10.22 versions. This can be done live / without a reboot of the base Xen platform. From other testing I can tell you the recovery ISO CD is very slow to boot, but it did complete to initial screens previously (although I couldn't tell you exactly which of the newer versions of QEMU that was on).

I can build a clean image from ISO fairly easily - do you have a preference for Windows 2008 R2 vs Win7 Pro? I believe it is the same issue, but most of my immediate problems are with Win Server VMs. I also have a Trusty system with the same issues - does it add anything if I run tests on it?

Finally (slightly dumb question), does the DBG version of the DEB (.DDEB) provide any additional information / crash reports - is that worth installing? Do you want me to work with .22 or the proposed .23? We can probably run development cuts from a PPA, but that needs a bit of risk control work at my end.

@Marc - Does the BSOD screen shot / address / Windows MEM Dump file stuff give you anything?

Regards,

Carl

@Carl - the ddebs help if you need a crash report, for example when a program has segfaulted.
Then with the ddebs you can get more info from the symbols.
But this BSOD crash is "inside" the guest, so the ddebs on the Xen host would not add additional information at the moment.

@Carl - stick with .22 - the .23 in proposed is something completely different.

I have also subscribed SMB, as he might have a Xen environment ready to test things.
@SMB - did you see any regression with the things you usually run?
@SMB - it might need graphics reading "Windows, Live Boot, ..." so if you have console only guests could you try a graphical one?

Carl Morgan (carl-morgan) wrote :

Simple Xen config (hvm/8GB/2VCPUs)

Windows Server 2008 R2 ISO ( SW_DVD5_Windows_Svr_DC_EE_SE_Web_2008_R2_64Bit_English_w_SP1_MLF_X17-22580.ISO )

Xenial - 16.04 - w/ qemu 1:2.5+dfsg-5ubuntu10.21

DVD Boot to 'Install button' - ~60 secs
Standard Full Install to first reboot - ~4mins
Second reboot - ~2mins
Password setup and to login screen - ~1.5mins
Total Build to desktop - < 9mins
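
The "< 9mins" total can be checked by adding up the phases. The snippet below is only a worked sum of the approximate figures quoted above, converted to seconds; the variable names are illustrative.

```shell
#!/bin/sh
# Phase durations in seconds, taken from the approximate figures above.
dvd_boot=60        # DVD boot to 'Install button' (~60 secs)
install=240        # standard full install to first reboot (~4 mins)
second_reboot=120  # second reboot (~2 mins)
login=90           # password setup and to login screen (~1.5 mins)

total=$((dvd_boot + install + second_reboot + login))
printf 'total: %ds (%dm%02ds)\n' "$total" $((total / 60)) $((total % 60))
# prints "total: 510s (8m30s)"
```

That is about 8.5 minutes for a complete install, including two reboots, on 10.21.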

Carl Morgan (carl-morgan) wrote :

Shutdown Windows VM

apt-get install qemu-system-x86=1:2.5+dfsg-5ubuntu10.22

xm create testWin2K8R8.cfg

4m40s to reach the graphical 'Windows booting' screen
5m30s: framebuf BSOD, memory dump and reboot

Marc Deslauriers (mdeslaur) wrote :

Hi,

Could you try another graphics interface besides cirrus-vga, just to try and isolate which patch caused the issue?

Thanks!
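
One possible way to run that test is to boot the guest from a copy of its config that asks for the standard VGA adapter. This is a minimal sketch, not something taken from the thread: the function name `make_stdvga_cfg` is hypothetical, and it assumes the toolstack honours the boolean `stdvga` option (documented in xl.cfg, where it is deprecated in favour of `vga="stdvga"`).

```shell
#!/bin/sh
# Write a copy of a Xen guest config that requests the standard VGA adapter.
# Assumption: the toolstack in use accepts "stdvga = 1".
make_stdvga_cfg() {
    src="$1"
    dst="$2"
    # Drop any existing stdvga/vga lines so the new setting is unambiguous.
    # Lines such as vnc=1 or vncdisplay=... do not match and are kept.
    sed -E '/^[[:space:]]*(stdvga|vga)[[:space:]]*=/d' "$src" > "$dst"
    printf 'stdvga = 1\n' >> "$dst"
}
```

The copy can then be started in place of the original config. Checking the resulting QEMU command line shows whether the cirrus-vga device was actually replaced.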

Marc Deslauriers (mdeslaur) wrote :

Actually, I'm beginning to suspect the CVE-2017-11334 patches. I'm preparing a test package that reverts them and will upload it here:

https://launchpad.net/~ubuntu-security-proposed/+archive/ubuntu/ppa/+packages

Once the package is uploaded and built in the PPA, could you please test it?
Thanks!

Carl Morgan (carl-morgan) wrote :

Hi Marc,

Tested on the same Windows Server 2008 R2 VM image with the -24test PPA package installed: < 30 sec boot to login screen. All looks good again.

Let me know how we can binary search the issue if that helps. I'm happy to compile code if it is 'out-of-the-box' type stuff.

Cheers,

Carl

Marc Deslauriers (mdeslaur) wrote :

Hi Carl,

Thanks for testing the package in the PPA, it pretty much confirms my suspicion.

I'll publish a security update regression USN on Monday.

Marc.

Changed in qemu (Ubuntu Trusty):
status: New → In Progress
Changed in qemu (Ubuntu Xenial):
status: New → In Progress
Changed in qemu (Ubuntu Trusty):
assignee: nobody → Marc Deslauriers (mdeslaur)
Changed in qemu (Ubuntu Xenial):
assignee: nobody → Marc Deslauriers (mdeslaur)
Changed in qemu (Ubuntu):
status: New → Invalid
Changed in qemu (Ubuntu Trusty):
importance: Undecided → High
Changed in qemu (Ubuntu Xenial):
importance: Undecided → High
Carl Morgan (carl-morgan) wrote :

BTW: I can confirm this issue (5min boot) is also a problem with the latest Trusty qemu-system updates

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.40

---------------
qemu (2.0.0+dfsg-2ubuntu1.40) trusty-security; urgency=medium

  * SECURITY REGRESSION: Xen regression (LP: #1752761)
    - debian/patches/CVE-2017-11334-1.patch: removed.
    - debian/patches/CVE-2017-11334-2.patch: removed.

 -- Marc Deslauriers <email address hidden> Sun, 04 Mar 2018 10:11:19 -0500

Changed in qemu (Ubuntu Trusty):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:2.5+dfsg-5ubuntu10.24

---------------
qemu (1:2.5+dfsg-5ubuntu10.24) xenial-security; urgency=medium

  * SECURITY REGRESSION: Xen regression (LP: #1752761)
    - debian/patches/CVE-2017-11334-1.patch: removed.
    - debian/patches/CVE-2017-11334-2.patch: removed.
  * This package does _not_ contain the changes from
    (1:2.5+dfsg-5ubuntu10.23) in xenial-proposed.

 -- Marc Deslauriers <email address hidden> Fri, 02 Mar 2018 08:14:50 -0500

Changed in qemu (Ubuntu Xenial):
status: In Progress → Fix Released
Marc Deslauriers (mdeslaur) wrote :

Regression fix has now been published: https://usn.ubuntu.com/usn/usn-3575-2/

Thanks!
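
To check whether a host already carries the regression fix, the installed version can be compared against the fixed versions quoted in the changelog entries above. This is a sketch under stated assumptions: the function name `has_regression_fix` is hypothetical, and it relies on `dpkg --compare-versions`, so it only works on a dpkg-based system.

```shell
#!/bin/sh
# Succeed if the given version is at or beyond the fixed build for the
# named release. The fixed versions are the ones quoted in this report.
has_regression_fix() {
    release="$1"
    installed="$2"
    case "$release" in
        trusty) fixed='2.0.0+dfsg-2ubuntu1.40' ;;
        xenial) fixed='1:2.5+dfsg-5ubuntu10.24' ;;
        *) echo "unknown release: $release" >&2; return 2 ;;
    esac
    dpkg --compare-versions "$installed" ge "$fixed"
}
```

A possible call on a Xenial host is `has_regression_fix xenial "$(dpkg-query -W -f='${Version}' qemu-system-x86)" && echo fixed`. A negative result does not by itself mean the host is affected: 10.21 is older than the fix but was reported as working.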

summary: - Regression in vga handling ubuntu10.21 to ubuntu10.22
+ framebuf STOP BSOD and performance regression ubuntu10.21 to ubuntu10.22