Linux rtc self test fails in a VM under xenial

Bug #1649718 reported by Seth Forshee
This bug affects 1 person
Affects          Status         Importance   Assigned to    Milestone
linux (Ubuntu)   Fix Released   High         Seth Forshee
  Xenial         Fix Released   High         Seth Forshee

Bug Description

== SRU Justification ==

Impact: A race in kvm can result in the EOI signal for the rtc irq to be lost. After this happens no more rtc interrupts will be delivered to the guest.

Fix: Three upstream cherry picks which fix the problem.

Regression Potential: These patches have been upstream since 4.6, so they're well-tested at this point. Thus regressions are unlikely.
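
One way to picture how an EOI can be lost is sketched below. This is a toy user-space model, not kernel code: the struct and function names are hypothetical, and the mechanism (the vector programmed for the RTC pin changing between delivery and EOI) is an assumption inferred from the titles of the three cherry picks, which track the irq vector in dest_map and match EOIs against it. The real race in kvm may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_VCPUS 8

/* Hypothetical stand-in for the RTC bookkeeping; not the kernel's layout. */
struct toy_rtc_status {
        int pending_eoi;                 /* EOIs still outstanding */
        bool dest[TOY_MAX_VCPUS];        /* vCPUs that received the irq */
        uint8_t vectors[TOY_MAX_VCPUS];  /* vector used at delivery time */
};

struct toy_ioapic {
        uint8_t rte_vector;              /* vector currently programmed for the RTC pin */
        struct toy_rtc_status rtc;
};

/* Deliver an RTC irq to one vCPU. Returns false when the irq is dropped
 * because the previous one has not been EOI'd yet. */
static bool toy_rtc_deliver(struct toy_ioapic *io, int vcpu)
{
        assert(vcpu >= 0 && vcpu < TOY_MAX_VCPUS);
        if (io->rtc.pending_eoi > 0)
                return false;
        memset(&io->rtc, 0, sizeof(io->rtc));
        io->rtc.dest[vcpu] = true;
        io->rtc.vectors[vcpu] = io->rte_vector;
        io->rtc.pending_eoi = 1;
        return true;
}

/* Racy variant: match the EOI against whatever vector the pin has now. */
static void toy_eoi_match_current(struct toy_ioapic *io, int vcpu, uint8_t vector)
{
        assert(vcpu >= 0 && vcpu < TOY_MAX_VCPUS);
        if (vector != io->rte_vector || !io->rtc.dest[vcpu])
                return;                  /* EOI ignored, pending_eoi stays set */
        io->rtc.dest[vcpu] = false;
        io->rtc.pending_eoi--;
}

/* Variant in the spirit of the fix: match against the vector recorded
 * for this vCPU when the irq was delivered. */
static void toy_eoi_match_recorded(struct toy_ioapic *io, int vcpu, uint8_t vector)
{
        assert(vcpu >= 0 && vcpu < TOY_MAX_VCPUS);
        if (!io->rtc.dest[vcpu] || io->rtc.vectors[vcpu] != vector)
                return;
        io->rtc.dest[vcpu] = false;
        io->rtc.pending_eoi--;
}
```

In this model, delivering with vector 0x38, changing rte_vector to 0x48, and then calling toy_eoi_match_current() with 0x38 leaves pending_eoi at 1, so every later toy_rtc_deliver() returns false. Calling toy_eoi_match_recorded() with the same arguments clears it.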

---

ADT testing for the linux package hangs at the kernel's rtc selftest, for example:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-zesty/zesty/amd64/l/linux/20161212_132117_a258d@/log.gz

Running this test manually, I've observed that it hangs for me with various kernel versions going back to 4.4 in a VM on my machine, which is running xenial. The test runs to completion in a VM on a different machine running zesty.

This is the section of the test which produces the hang:

        /* Turn on update interrupts (one per second) */
        retval = ioctl(fd, RTC_UIE_ON, 0);
        if (retval == -1) {
                if (errno == EINVAL) {
                        fprintf(stderr,
                                "\n...Update IRQs not supported.\n");
                        goto test_READ;
                }
                perror("RTC_UIE_ON ioctl");
                exit(errno);
        }

        fprintf(stderr, "Counting 5 update (1/sec) interrupts from reading %s:",
                        rtc);
        fflush(stderr);
        for (i=1; i<6; i++) {
                /* This read will block */
                retval = read(fd, &data, sizeof(unsigned long));
                if (retval == -1) {
                        perror("read");
                        exit(errno);
                }
                fprintf(stderr, " %d",i);
                fflush(stderr);
                irqcount++;
        }

The read blocks indefinitely most of the time. After boot it might return once or twice before it hangs, but running the test subsequently always hangs on the first read. I'll attach the full source for the test (rtctest.c).
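
For automated runs, a bounded wait makes this failure show up as a timeout instead of a hang. The helper below is a minimal sketch, not part of rtctest.c: the name read_irq_with_timeout and its return convention are made up for illustration, and it assumes the descriptor supports select(), as the RTC character device does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec seconds for fd to become readable, then read one
 * unsigned long from it (the size the test above reads per interrupt).
 * Returns 1 on success, 0 on timeout, -1 on error or short read.
 */
static int read_irq_with_timeout(int fd, int timeout_sec, unsigned long *data)
{
        fd_set readfds;
        struct timeval tv;
        int rc;

        assert(data != NULL);
        assert(fd >= 0 && fd < FD_SETSIZE);

        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        rc = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (rc <= 0)
                return rc;      /* 0: timed out, -1: select() failed */

        if (read(fd, data, sizeof(*data)) != (ssize_t)sizeof(*data))
                return -1;
        return 1;
}
```

In the loop above, the blocking read() could be swapped for read_irq_with_timeout(fd, 5, &data), treating a return of 0 as a test failure. The 5 second limit is an arbitrary choice for interrupts expected once per second.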

Seth Forshee (sforshee) wrote :
Ryan Harper (raharper) wrote : Re: [Bug 1649718] [NEW] Linux rtc self test fails in a VM under xenial

Can you attach the qemu version, kernel version and qemu command line used?

A quick search shows this possibly related bug in RHEL:

https://bugzilla.redhat.com/show_bug.cgi?id=1184691

Which was closed WONTFIX with no reason. More digging is required.

Seth Forshee (sforshee) wrote :

In xenial I have 1:2.5+dfsg-5ubuntu10.7, I also saw the problem with 1:2.5+dfsg-5ubuntu10.6. In zesty I have 1:2.6.1+dfsg-0ubuntu8.

I'm launching the VMs using virsh, the log shows this (for xenial):

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=spice /usr/bin/kvm-spice -name zesty-server -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu Haswell -m 2048 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 5ef1be70-b464-45c2-82fd-aef4d786371d -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-zesty-server/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/images/ubuntu16.04.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f7:cf:f6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev 
spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

In zesty I have:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=spice /usr/bin/kvm-spice -name guest=zesty-server,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-zesty-server/master-key.aes -machine pc-i440fx-yakkety,accel=kvm,usb=off -cpu Broadwell-noTSX -m 1024 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid cbd77ef3-578e-4dcd-8f75-3c2bd6919f7e -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-zesty-server/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -glob...


Ryan Harper (raharper) wrote : Re: [Bug 1649718] Re: Linux rtc self test fails in a VM under xenial

Trusty on Xenial does not hang.
Xenial on Xenial hangs.
Xenial on Zesty does not hang.

Kernel hopping continues.

Seth Forshee (sforshee) wrote :

Looking at kernel 4.9, the RTC_UIE_ON ioctl sets an alarm via the cmos_rtc driver. This does a couple of things.

1. Disables the CMOS RTC alarm interrupt, writes an alarm time to the CMOS RTC, then enables the alarm interrupt.

2. When configured to emulate an RTC using the HPET (which our kernel is), makes sure HPET timer 1 is initialized for RTC emulation.

Based on this I looked at the diffs for hw/timer/mc146818rtc.c (which seems to be the CMOS RTC device) and hw/timer/hpet.c between the two qemu versions. The differences all look pretty trivial, I don't see anything which looks like it would have fixed the problem.
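
For readers unfamiliar with the alarm interface, step 1 has a rough user-space analogue in the RTC_* ioctls from <linux/rtc.h>. The sketch below is an illustration only, not what the driver does internally: the function names are hypothetical, and the one second offset is an assumption chosen to match the "one per second" update interrupts.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

/* Advance a time of day by one second, wrapping at the minute, hour and
 * day boundaries. The date fields are left untouched. */
static void advance_one_second(struct rtc_time *tm)
{
        assert(tm != NULL);
        if (++tm->tm_sec < 60)
                return;
        tm->tm_sec = 0;
        if (++tm->tm_min < 60)
                return;
        tm->tm_min = 0;
        if (++tm->tm_hour < 24)
                return;
        tm->tm_hour = 0;
}

/*
 * Disable the alarm interrupt, program an alarm one second after the
 * current RTC time, then enable the alarm interrupt again.
 * Returns 0 on success, -1 if any ioctl fails (errno is set by ioctl).
 */
static int arm_alarm_next_second(int fd)
{
        struct rtc_time tm;

        if (ioctl(fd, RTC_AIE_OFF, 0) == -1)
                return -1;
        if (ioctl(fd, RTC_RD_TIME, &tm) == -1)
                return -1;
        advance_one_second(&tm);
        if (ioctl(fd, RTC_ALM_SET, &tm) == -1)
                return -1;
        return ioctl(fd, RTC_AIE_ON, 0);
}
```

After arm_alarm_next_second(fd) succeeds on an open RTC device, a read on fd should return once the alarm fires. In the failing configuration described in this bug, that read would be expected to block instead.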

Ryan Harper (raharper) wrote :

It's a guest kernel + qemu interaction. In the past, there was an issue w.r.t. ensuring RTC interrupts were being delivered; that's what's currently broken.

Compare the output of grep rtc /proc/interrupts where it works (Trusty) vs. where it fails, and you can see that we get at most a few interrupts to the RTC device.

There isn't a whole lot of change to qemu's RTC between 2.5 and 2.6.1 in yakkety. So, I need to see if running yakkety qemu on Xenial fixes things as well so we can continue to narrow down where the regression was introduced.
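
The /proc/interrupts comparison can also be scripted. The following is a minimal sketch with hypothetical helper names; it assumes the usual layout of an irq label ending in a colon, followed by one count per CPU, followed by a description containing "rtc".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line. Parsing stops
 * at the first token after the colon that is not a decimal number. */
static unsigned long sum_irq_line(const char *line)
{
        const char *p;
        unsigned long total = 0;

        assert(line != NULL);
        p = strchr(line, ':');
        if (p == NULL)
                return 0;       /* header line or malformed input */
        p++;

        for (;;) {
                char *end;
                unsigned long v = strtoul(p, &end, 10);

                if (end == p)
                        break;  /* reached the irq chip name */
                total += v;
                p = end;
        }
        return total;
}

/* Total interrupt count over all lines that mention "rtc". */
static unsigned long rtc_irq_total(FILE *f)
{
        char line[4096];
        unsigned long total = 0;

        assert(f != NULL);
        while (fgets(line, sizeof(line), f) != NULL) {
                if (strstr(line, "rtc") != NULL)
                        total += sum_irq_line(line);
        }
        return total;
}
```

Calling rtc_irq_total() on /proc/interrupts before and after the update interrupt loop gives a delta to compare. A guest where delivery works should gain roughly one interrupt per second of the test, while an affected guest gains at most a few.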

Ryan Harper (raharper) wrote :

The other oddity: running the guest with 1 CPU always works for me in the cases where it failed before.

Ryan Harper (raharper) wrote :

I think this is related to the in-kernel irqchip in 4.4's kvm module.

If one runs with the kernel_irqchip=off property set on the qemu command line, then even Xenial on Xenial SMP never fails.

Installing the 16.04 HWE Edge kernel (4.8.0-30) on my Xenial host resolves this as well.

Seth Forshee (sforshee) wrote :

On Wed, Dec 14, 2016 at 08:17:53PM -0000, Ryan Harper wrote:
> I think this is related to kvm module in 4.4's in-kernel irq chip.
>
> If one runs with kernel_irqchip=off property set on the qemu command, then
> even Xenial on Xenial SMP
> never fails.
>
> Installing the 16.04 HWE Edge kernel (4.8.0-30) on my Xenial host resolves
> this as well.

This looks promising - https://lkml.org/lkml/2016/2/29/427

I'll test and see if those patches fix the problem.

Seth Forshee (sforshee) wrote :

Those kernel patches fix the problem. Changing the package to linux, I'll get those patches into xenial's kernel.

Ryan, thanks for the help.

affects: qemu (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → High
status: New → In Progress
Changed in linux (Ubuntu Xenial):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → High
status: New → In Progress
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Seth Forshee (sforshee)
description: updated
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Seth Forshee (sforshee) wrote :

Verified fix in 4.4.0-63.84.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-63.84

---------------
linux (4.4.0-63.84) xenial; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1660704

  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * Kdump through NMI SMP and single core not working on Ubuntu16.10
    (LP: #1630924)
    - x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
    - SAUCE: hv: don't reset hv_context.tsc_page on crash

  * [regression 4.8.0-14 -> 4.8.0-17] keyboard and touchscreen lost on Acer
    Chromebook R11 (LP: #1630238)
    - [Config] CONFIG_PINCTRL_CHERRYVIEW=y

  * Call trace when testing fstat stressor on ppc64el with virtual keyboard and
    mouse present (LP: #1652132)
    - SAUCE: HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL

  * VLAN SR-IOV regression for IXGBE driver (LP: #1658491)
    - ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths

  * "Out of memory" errors after upgrade to 4.4.0-59 (LP: #1655842)
    - mm, page_alloc: convert alloc_flags to unsigned
    - mm, compaction: change COMPACT_ constants into enum
    - mm, compaction: distinguish COMPACT_DEFERRED from COMPACT_SKIPPED
    - mm, compaction: simplify __alloc_pages_direct_compact feedback interface
    - mm, compaction: distinguish between full and partial COMPACT_COMPLETE
    - mm, compaction: abstract compaction feedback to helpers
    - mm, oom: protect !costly allocations some more
    - mm: consider compaction feedback also for costly allocation
    - mm, oom, compaction: prevent from should_compact_retry looping for ever for
      costly orders
    - mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION
    - mm, oom: prevent premature OOM killer invocation for high order request

  * Backport 3 patches to fix bugs with AIX clients using IBMVSCSI Target Driver
    (LP: #1657194)
    - SAUCE: ibmvscsis: Fix max transfer length
    - SAUCE: ibmvscsis: fix sleeping in interrupt context
    - SAUCE: ibmvscsis: Fix srp_transfer_data fail return code

  * NVMe: adapter is missing after abnormal shutdown followed by quick reboot,
    quirk needed (LP: #1656913)
    - nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

  * Ubuntu 16.10 KVM SRIOV: if enable sriov while ping flood is running ping
    will stop working (LP: #1625318)
    - PCI: Do any VF BAR updates before enabling the BARs
    - PCI: Ignore BAR updates on virtual functions
    - PCI: Update BARs using property bits appropriate for type
    - PCI: Separate VF BAR updates from standard BAR updates
    - PCI: Don't update VF BARs while VF memory space is enabled
    - PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
    - PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
    - PCI: Add comments about ROM BAR updating

  * Linux rtc self test fails in a VM under xenial (LP: #1649718)
    - kvm: x86: Convert ioapic->rtc_status.dest_map to a struct
    - kvm: x86: Track irq vectors in ioapic->rtc_status.dest_map
    - kvm: x86: Check dest_map->vector to match eoi signals for rtc

  * Xenial update to v4.4.44 stable releas...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released