cmos_interrupt not getting called

Bug #2011854 reported by Cory Todd
This bug affects 1 person
Affects                     Status         Importance  Assigned to  Milestone
ubuntu-kernel-tests         New            Undecided   Unassigned
linux-oracle-5.15 (Ubuntu)  Fix Committed  Medium      Cory Todd
  Focal                     Fix Released   Medium      Cory Todd
  Jammy                     Fix Released   Medium      Cory Todd

Bug Description

The ubuntu_ltp_kernel_misc rtc01 test case has exposed a possible regression of the RTC cmos driver on certain Oracle clouds.

This was observed on jammy:linux-oracle-5.15.0-1031.37 on the following instances:
- VM.DenseIO2.8
- VM.Standard2.1

rtc01 0 TINFO : RTC READ TEST:
rtc01 1 TPASS : RTC READ TEST Passed
rtc01 0 TINFO : Current RTC date/time is 16-3-2023, 17:36:04.
rtc01 0 TINFO : RTC ALARM TEST :
rtc01 0 TINFO : Alarm time set to 17:36:09.
rtc01 0 TINFO : Waiting 5 seconds for the alarm...
rtc01 2 TFAIL : rtc01.c:151: Timed out waiting for the alarm
rtc01 0 TINFO : RTC UPDATE INTERRUPTS TEST :
rtc01 0 TINFO : Waiting for 5 update interrupts...
rtc01 3 TFAIL : rtc01.c:208: Timed out waiting for the update interrupt
rtc01 0 TINFO : RTC Tests Done!

Notice that we successfully enable RTC_AIE_ON, unlike on VM.Standard.A1.Flex-4c.8m, which does not support it. I confirmed with bpftrace on -1029 that we see the cmos interrupt (triggered as part of the LTP read_alarm_test):

sudo bpftrace -e 'kprobe:cmos_interrupt { printf("%s\n", probe) }'
Attaching 1 probe...
kprobe:cmos_interrupt

On -1031, no interrupt is detected for either the alarm test or the update test.
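
For context, the alarm part of the test comes down to a handful of ioctls on the RTC device. The Python sketch below is not the LTP source. It assumes the device is /dev/rtc0, a 64-bit Linux system with the generic ioctl encoding (as on x86-64), and the nine-int struct rtc_time layout from linux/rtc.h. The helper names are made up for illustration.

```python
import fcntl
import os
import select
import struct

# struct rtc_time: sec, min, hour, mday, mon, year, wday, yday, isdst
RTC_TIME_FMT = "9i"
RTC_TIME_SIZE = struct.calcsize(RTC_TIME_FMT)


def _ioc(direction, nr, size):
    # Generic Linux _IOC() encoding: dir << 30 | size << 16 | type << 8 | nr,
    # with the RTC ioctl type character 'p'.
    return (direction << 30) | (size << 16) | (ord("p") << 8) | nr


RTC_AIE_ON = _ioc(0, 0x01, 0)               # _IO('p', 0x01)
RTC_AIE_OFF = _ioc(0, 0x02, 0)              # _IO('p', 0x02)
RTC_ALM_SET = _ioc(1, 0x07, RTC_TIME_SIZE)  # _IOW('p', 0x07, struct rtc_time)
RTC_RD_TIME = _ioc(2, 0x09, RTC_TIME_SIZE)  # _IOR('p', 0x09, struct rtc_time)


def alarm_after(hour, minute, second, delay):
    """Return (hour, minute, second) `delay` seconds later, wrapping at midnight."""
    total = (hour * 3600 + minute * 60 + second + delay) % 86400
    return total // 3600, (total % 3600) // 60, total % 60


def wait_for_alarm(path="/dev/rtc0", delay=5, timeout=None):
    """Arm the alarm `delay` seconds ahead; return True if the interrupt arrives."""
    if timeout is None:
        timeout = delay + 2
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(RTC_TIME_SIZE)
        fcntl.ioctl(fd, RTC_RD_TIME, buf)  # fills buf in place
        tm = list(struct.unpack(RTC_TIME_FMT, buf))
        tm[2], tm[1], tm[0] = alarm_after(tm[2], tm[1], tm[0], delay)
        fcntl.ioctl(fd, RTC_ALM_SET, struct.pack(RTC_TIME_FMT, *tm))
        fcntl.ioctl(fd, RTC_AIE_ON, 0)
        try:
            # The device becomes readable when cmos_interrupt reports an event.
            ready, _, _ = select.select([fd], [], [], timeout)
            if ready:
                os.read(fd, 8)  # consume the interrupt record (unsigned long)
            return bool(ready)
        finally:
            fcntl.ioctl(fd, RTC_AIE_OFF, 0)
    finally:
        os.close(fd)
```

Calling wait_for_alarm() as root on a kernel where the interrupt is delivered should return True; on the affected -1031 kernel the expectation, going by the log above, is a timeout and a False result.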

Cory Todd (corytodd)
tags: added: sru-20230227
removed: sr20230227
Cory Todd (corytodd)
description: updated
Cory Todd (corytodd) wrote :

After some more debugging, I found that the AIE bits are getting set, which agrees with our attempt to set the alarm. We also see that the pnp probe completes successfully, as evidenced by these dmesg entries:

[ 1.715329] rtc_cmos 00:00: RTC can wake from S4
[ 1.721453] rtc_cmos 00:00: registered as rtc0
[ 1.727449] rtc_cmos 00:00: setting system clock to 2023-03-16T21:43:47 UTC (1679003027)
[ 1.736086] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram

Those messages would only show if cmos_do_probe completed normally. It appears that acpi_cmos_wake_setup (formerly rtc_wake_setup) is no longer called on the same code paths as it was previously.

Cory Todd (corytodd) wrote :

I confirmed that the kernel -update (-1030) is able to pass this test.

Cory Todd (corytodd) wrote :

It was found that this breakage is due to the use_acpi_alarm module parameter getting enabled. This appears to be set in rtc-cmos:use_acpi_alarm_quirks, which used to check the FADT for a cleared ACPI_FADT_LOW_POWER_S0 flag.

This check was removed in

f2a7f3238777 ("rtc: rtc-cmos: Do not check ACPI_FADT_LOW_POWER_S0")

Reverting this commit restores the expected AIE and UIE behavior. I am not sure yet what the proper solution is.
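
To make the effect of the removed check concrete, here is a simplified Python model of the quirk decision before and after f2a7f3238777. It is not kernel code. The Platform fields are invented stand-ins, the remaining conditions (Intel CPU, HPET enabled, BIOS year 2015 or later) come from a reading of the upstream driver and should be treated as assumptions for this kernel, and the example FADT value is assumed rather than read from the affected instances.

```python
from dataclasses import dataclass

ACPI_FADT_LOW_POWER_S0 = 1 << 21  # FADT flag bit as defined in the ACPI headers


@dataclass
class Platform:
    # Hypothetical stand-ins for what the driver inspects at init time.
    intel_cpu: bool
    hpet_enabled: bool
    bios_year: int
    fadt_flags: int


def use_acpi_alarm(p: Platform, check_low_power_s0: bool) -> bool:
    """Model of the quirk; check_low_power_s0=True is the pre-commit behaviour."""
    if not p.intel_cpu:
        return False
    if check_low_power_s0 and not (p.fadt_flags & ACPI_FADT_LOW_POWER_S0):
        return False  # the early return that the commit removed
    if not p.hpet_enabled:
        return False
    if p.bios_year < 2015:
        return False
    return True


# A VM-like platform whose FADT does not advertise low-power S0 (assumed values).
vm = Platform(intel_cpu=True, hpet_enabled=True, bios_year=2016, fadt_flags=0)
print(use_acpi_alarm(vm, check_low_power_s0=True))   # False
print(use_acpi_alarm(vm, check_low_power_s0=False))  # True
```

Under these assumptions, the same platform flips from leaving use_acpi_alarm unset to enabling it once the FADT check is gone, which is consistent with the revert restoring the old behavior.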

description: updated
Cory Todd (corytodd) wrote :

The core issue is that with f2a7f3238777, the kernel on the affected systems no longer uses hpet for the rtc alarm and update interrupts. This may indicate that their non-hpet implementation is faulty. With the ACPI_FADT_LOW_POWER_S0 check removed from use_acpi_alarm_quirks, the kernel expects to use the acpi alarm by way of use_hpet_alarm().
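
One way to check which path a running system ended up on is to read the module parameter back from sysfs. The sketch below assumes the parameter is exported at /sys/module/rtc_cmos/parameters/use_acpi_alarm and that boolean parameters read back as Y or N. Whether hpet is enabled has to be supplied by the caller, since this sketch does not detect it. The function names are illustrative only.

```python
from pathlib import Path

# Assumed sysfs location of the rtc_cmos module parameter.
PARAM = Path("/sys/module/rtc_cmos/parameters/use_acpi_alarm")


def read_use_acpi_alarm(path=PARAM):
    """Return True/False for a Y/N (or 1/0) bool parameter, None if unreadable."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None
    return value in ("Y", "y", "1")


def uses_hpet_alarm(hpet_enabled, acpi_alarm):
    # Mirrors the relationship described above: hpet is only used for the
    # RTC interrupts when the acpi alarm has not been selected.
    return bool(hpet_enabled) and not acpi_alarm
```

For example, print(read_use_acpi_alarm()) on -1030 and on -1031 would be expected to differ on the affected instances if the analysis above holds.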

Other facts:
- this patch has only been applied to Jammy. Kinetic and Focal do not have the latest patches affecting rtc-cmos.
- applying these patches to jammy:linux-gcp works correctly on a g1-small instance. That system is able to use the acpi alarm for both alarm and update interrupts.

I see two options:

1) Revert f2a7f3238777 as a SAUCE patch. The justification for this commit seems to be based on the suspend-to-idle use case and does not account for the alarm-from-S0 use case.
2) Patch in a more specific quirk and get it upstreamed. It seems like the quirk that was removed was hiding a bug on this particular system.

Cory Todd (corytodd)
affects: linux-oracle (Ubuntu) → linux-oracle-5.15 (Ubuntu)
Changed in linux-oracle-5.15 (Ubuntu):
status: New → In Progress
assignee: nobody → Cory Todd (corytodd)
importance: Undecided → Medium
Changed in linux-oracle-5.15 (Ubuntu Focal):
assignee: nobody → Cory Todd (corytodd)
Changed in linux-oracle-5.15 (Ubuntu Jammy):
assignee: nobody → Cory Todd (corytodd)
Changed in linux-oracle-5.15 (Ubuntu Focal):
status: New → In Progress
Changed in linux-oracle-5.15 (Ubuntu Jammy):
status: New → In Progress
Cory Todd (corytodd)
Changed in linux-oracle-5.15 (Ubuntu):
status: In Progress → Fix Committed
Changed in linux-oracle-5.15 (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux-oracle-5.15 (Ubuntu Focal):
status: In Progress → Fix Committed
Cory Todd (corytodd)
Changed in linux-oracle-5.15 (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux-oracle-5.15 (Ubuntu Jammy):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oracle-5.15 - 5.15.0-1032.38~20.04.1

---------------
linux-oracle-5.15 (5.15.0-1032.38~20.04.1) focal; urgency=medium

  * focal/linux-oracle-5.15: 5.15.0-1032.38~20.04.1 -proposed tracker
    (LP: #2012656)

  [ Ubuntu: 5.15.0-1032.38 ]

  * jammy/linux-oracle: 5.15.0-1032.38 -proposed tracker (LP: #2012655)
  * cmos_interrupt not getting called (LP: #2011854)
    - SAUCE: Revert "rtc: rtc-cmos: Do not check ACPI_FADT_LOW_POWER_S0"

linux-oracle-5.15 (5.15.0-1031.37~20.04.1) focal; urgency=medium

  * focal/linux-oracle-5.15: 5.15.0-1031.37~20.04.1 -proposed tracker
    (LP: #2008340)

  [ Ubuntu: 5.15.0-1031.37 ]

  * jammy/linux-oracle: 5.15.0-1031.37 -proposed tracker (LP: #2008342)
  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2023.02.27)
  * jammy/linux: 5.15.0-68.75 -proposed tracker (LP: #2008349)
  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2023.02.27)
  * Ubuntu 22.04 kernel 5.15.0-46-generic leaks kernel memory in kmalloc-2k
    slabs (LP: #1987430)
    - SAUCE: audit: fix memory leak of audit_log_lsm()
  * [EGS] Backport intel_idle support for Eagle Stream Ubuntu 22.04 release
    (LP: #2003267)
    - intel_idle: add SPR support
    - intel_idle: add 'preferred_cstates' module argument
    - intel_idle: add core C6 optimization for SPR
    - cpuidle: intel_idle: Drop redundant backslash at line end
    - intel_idle: Fix the 'preferred_cstates' module parameter
    - intel_idle: Fix SPR C6 optimization
    - intel_idle: make SPR C1 and C1E be independent
  * Fix speaker mute hotkey doesn't work on Dell G16 series (LP: #2003161)
    - platform/x86: dell-wmi: Add a keymap for KEY_MUTE in type 0x0010 table
  * Fix the ACPI _CPC not found error from kernel dmesg on some dynamic SSDT
    table loaded firmwares (LP: #2006077)
    - ACPI: bus: Avoid using CPPC if not supported by firmware
    - ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
    - ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked
  * rtcpie in timers from ubuntu_kernel_selftests randomly failing
    (LP: #1814234)
    - SAUCE: selftest: rtcpie: Force passing unreliable subtest
  * Jammy update: v5.15.87 upstream stable release (LP: #2007441)
    - usb: dwc3: qcom: Fix memory leak in dwc3_qcom_interconnect_init
    - cifs: fix oops during encryption
    - nvme-pci: fix doorbell buffer value endianness
    - nvme-pci: fix mempool alloc size
    - nvme-pci: fix page size checks
    - ACPI: resource: do IRQ override on LENOVO IdeaPad
    - ACPI: resource: do IRQ override on XMG Core 15
    - ACPI: resource: do IRQ override on Lenovo 14ALC7
    - block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
    - ata: ahci: Fix PCS quirk application for suspend
    - nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
    - nvmet: don't defer passthrough commands with trivial effects to the
      workqueue
    - fs/ntfs3: Validate BOOT record_size
    - fs/ntfs3: Add overflow check for attribute size
    - fs/ntfs3: Validate data run offset
    - fs/ntfs3: Add null pointer check to attr_load_runs_vcn
    - fs/ntfs3: Fix...

Changed in linux-oracle-5.15 (Ubuntu Focal):
status: Fix Committed → Fix Released
Cory Todd (corytodd)
Changed in linux-oracle-5.15 (Ubuntu Jammy):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew) wrote :

This is now affecting K-oracle-5.19.0-1020.23 on the same two instances as in the bug report:
- VM.DenseIO2.8
- VM.Standard2.1

tags: added: 5.19 kinetic oracle sru-20230320 ubuntu-ltp-kernel-misc
Po-Hsu Lin (cypressyew)
tags: added: sru-20230515