[Dell Alienware M15x] Possible race condition involving rtc wakealarm when hibernating a system

Bug #571977 reported by Jeff Lane 
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Invalid | Medium | Unassigned |

Bug Description

I'm working on a test script that sets a time in the future in /sys/class/rtc/rtc0/wakealarm and then uses pm-suspend to hibernate a system. I believe I've found a race condition that's causing my tests to fail.

First, the steps to recreate:
0: cat /proc/driver/rtc and verify that alarm_IRQ says 'no'
1: echo '+180' > /sys/class/rtc/rtc0/wakealarm
1.5: cat /proc/driver/rtc and verify that alarm_IRQ says 'yes' and the correct alarm time is set
2: sudo pm-suspend
3: wait 3 minutes
4: system wakes itself
5: wait for system to fully wake (disk activity to stop, or at the very least, keyboard and mouse function to resume on desktop)
6: cat /proc/driver/rtc and verify that current time is > alarm time and alarm_IRQ still says 'yes'
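
The alarm_IRQ checks in steps 0, 1.5 and 6 could be automated in a test script roughly as follows. This is only a minimal Python sketch and not part of the original test: parse_proc_rtc, alarm_armed and read_proc_rtc are hypothetical helper names, and it assumes /proc/driver/rtc keeps the "key : value" layout seen in this report.

```python
def parse_proc_rtc(text):
    """Turn 'key : value' lines (as printed by /proc/driver/rtc) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as 20:37:11 stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def alarm_armed(fields):
    """True if the parsed status reports alarm_IRQ as 'yes'."""
    return fields.get("alarm_IRQ") == "yes"


def read_proc_rtc(path="/proc/driver/rtc"):
    """Read and parse the live status file (path is an assumption)."""
    with open(path) as status:
        return parse_proc_rtc(status.read())


# Sample text copied from the output in this report.
SAMPLE = """\
rtc_time : 20:35:16
rtc_date : 2010-04-29
alrm_time : 20:37:11
alrm_date : 2010-04-29
alarm_IRQ : yes
"""
```

A script would call alarm_armed(read_proc_rtc()) before setting the alarm, after setting it, and again after resume, and compare the three results against 'no', 'yes' and the value expected after wake.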

The test, when putting the system into an S3 state, does not suffer from this issue. It DOES when I'm using S4. I think the reason is that S3 wakes quickly enough that the kernel can register that the alarm fired and reset /proc/driver/rtc accordingly, however, when waking from suspend, the kernel takes far longer to wake, causing it to think that even though the rtc's alarm_IRQ fired the IRQ didn't fire, so the kernel does not reset /proc/driver/rtc.

For example, this is the output from the following command (my comments are marked with ##):

# watch -n 5 'cat /proc/driver/rtc |head -5'

## First observation: note that alrm_date is empty. This is after echoing '0' to /sys/class/rtc/rtc0/wakealarm
rtc_time : 20:35:11
rtc_date : 2010-04-29
alrm_time : 20:38:03
alrm_date : ****-**-29
alarm_IRQ : no
## wakealarm set
rtc_time : 20:35:16
rtc_date : 2010-04-29
alrm_time : 20:37:11
alrm_date : 2010-04-29
alarm_IRQ : yes
## executing pm-hibernate now
rtc_time : 20:35:21
rtc_date : 2010-04-29
alrm_time : 20:37:11
alrm_date : 2010-04-29
alarm_IRQ : yes
rtc_time : 20:35:26
rtc_date : 2010-04-29
alrm_time : 20:37:11
alrm_date : 2010-04-29
alarm_IRQ : yes
## System is now asleep.
## IRQ must be firing, because the system wakes itself at this point after sleeping for the prescribed number of seconds (180)
rtc_time : 20:38:16
rtc_date : 2010-04-29
alrm_time : 20:37:11
alrm_date : 2010-04-30
alarm_IRQ : yes
## First report after system is fully awake. Note that rtc_time is now more than 60 seconds past the alarm time.

I'm not sure what's actually causing this behaviour, but it seems as though the kernel isn't registering that the IRQ fired during a hibernate (or the rtc is broken, but it works fine during S3 tests, where I can verify that the IRQ fires and alarm_IRQ resets to 'no').

In any case, there is a race that is not hit in S3 testing, because the kernel resumes from that sleep state almost instantly, but that is hit (or at least lost) when resuming from S4, because resuming from that state takes so much longer.

Because of this, subsequent setting of /sys/class/rtc/rtc0/wakealarm will fail without first clearing it with a '0', and if a piece of software is actually looking to see if the RTC fired its alarm_IRQ, that software will believe that the IRQ has not been fired due to /driver/proc/rtc incorrectly reporting the event.
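
The workaround described above (clear the alarm with '0', then set the new one) might look like this in a test script. It is a minimal Python sketch under stated assumptions: set_wakealarm is a hypothetical helper name, the default path is the sysfs file used in this report, and writing to it requires root.

```python
WAKEALARM_PATH = "/sys/class/rtc/rtc0/wakealarm"


def set_wakealarm(seconds_from_now, path=WAKEALARM_PATH):
    """Clear any stale alarm, then arm a new one N seconds in the future."""
    if seconds_from_now <= 0:
        raise ValueError("seconds_from_now must be positive")
    # Step 1: write '0' to clear whatever alarm is still recorded.
    with open(path, "w") as alarm:
        alarm.write("0\n")
    # Step 2: write '+N', the same relative form as echo '+180' in the steps above.
    with open(path, "w") as alarm:
        alarm.write("+%d\n" % seconds_from_now)
```

The file is opened twice on purpose so that each value reaches the kernel as its own write, the same way two separate echo commands would.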

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-21-generic 2.6.32-21.32
Regression: No
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-21.32-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-21-generic x86_64
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: bladernr 2029 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf0f20000 irq 22'
   Mixer name : 'IDT 92HD83C1X5'
   Components : 'HDA:111d7604,102802a2,00100104'
   Controls : 16
   Simple ctrls : 10
Card1.Amixer.info:
 Card hw:1 'NVidia'/'HDA NVidia at 0xcdefc000 irq 16'
   Mixer name : 'Nvidia ID a'
   Components : 'HDA:10de000a,10de0101,00100100'
   Controls : 0
   Simple ctrls : 0
Card1.Amixer.values:

Date: Thu Apr 29 20:55:42 2010
HibernationDevice: RESUME=UUID=f4e6db09-5257-40b2-ba2a-0718fc0b3f0d
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MachineType: Alienware M15x
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-21-generic root=UUID=acc23352-13ab-4854-b1d7-a1099a5bf3a5 ro quiet splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34
SourcePackage: linux
dmi.bios.date: 03/11/2010
dmi.bios.vendor: Alienware
dmi.bios.version: A05
dmi.board.vendor: Alienware
dmi.board.version: A05
dmi.chassis.type: 8
dmi.chassis.vendor: Alienware
dmi.chassis.version: A05
dmi.modalias: dmi:bvnAlienware:bvrA05:bd03/11/2010:svnAlienware:pnM15x:pvrA05:rvnAlienware:rn:rvrA05:cvnAlienware:ct8:cvrA05:
dmi.product.name: M15x
dmi.product.version: A05
dmi.sys.vendor: Alienware
---
Architecture: i386
DistroRelease: Ubuntu 10.04
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release i386 (20091028.5)
Package: linux (not installed)
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Tags: lucid
Uname: Linux 2.6.34-999-generic i686
UnreportableReason: The running kernel is not an Ubuntu kernel
UserGroups:

Jeff Lane  (bladernr) wrote :

Ugh... this is what I get for writing kernel bugs late in the evening... :( Please see the following changes to my original narrative below:

"I'm working on a test script that sets a time in the future in /sys/class/rtc/rtc0/wakealarm and then uses pm-suspend to hibernate a system. I believe I've found a race condition that's causing my tests to fail."

That should have said that I am using pm-hibernate to put the system into S4. I use pm-suspend to put the system into S3.

"The test, when putting the system into an S3 state, does not suffer from this issue. It DOES when I'm using S4. I think the reason is that S3 wakes quickly enough that the kernel can register that the alarm fired and reset /proc/driver/rtc accordingly, however, when waking from suspend, the kernel takes far longer to wake, causing it to think that even though the rtc's alarm_IRQ fired the IRQ didn't fire, so the kernel does not reset /proc/driver/rtc."

That should have said ". . . however, when waking from HIBERNATE, the kernel takes far longer to wake, causing it to think that the rtc's alarm IRQ never fired (even though it did), so the kernel never updates /proc/driver/rtc to reflect this in the alarm_IRQ entry."

And additionally, "because of this, subsequent setting of /sys/class/rtc/rtc0/wakealarm will fail without first clearing it with a '0' and if a piece of software is actually looking to see if the RTC fired it's alarm_IRQ, that software will believe that the IRQ has not been fired due to /driver/proc/rtc incorrectly reporting the event."

should have said /proc/driver/rtc, NOT /driver/proc/rtc. Sigh... yesterday was a long and draining day :(

Manoj Iyer (manjo) wrote :

Looks like a bug in pm scripts where rtc is not cleared after thaw from hibernate.

Jeremy Foshee (jeremyfoshee) wrote :

Hi Jeff,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Jeff Lane  (bladernr)
tags: added: apport-collected
description: updated
Changed in linux (Ubuntu):
status: Incomplete → New
tags: removed: needs-upstream-testing
Jeff Lane  (bladernr) wrote :

Recreated with the daily mainline for today...

root@rogue:/home/bladernr# uname -a
Linux rogue 2.6.34-999-generic #201005011008 SMP Sat May 1 10:09:28 UTC 2010 i686 GNU/Linux
root@rogue:/home/bladernr# cat /proc/driver/rtc
rtc_time : 18:56:51
rtc_date : 2010-05-03
alrm_time : 18:45:30
alrm_date : 2010-05-04
alarm_IRQ : yes
alrm_pending : no
24hr : yes
periodic_IRQ : no
update_IRQ : no
HPET_emulated : no
DST_enable : no
periodic_freq : 1024
batt_status : okay
root@rogue:/home/bladernr# cat /sys/class/rtc/rtc0/since_epoch
1272913021
root@rogue:/home/bladernr# cat /sys/class/rtc/rtc0/wakealarm
1272998730

As you can see, the alarm_IRQ entry was not reset, nor was wakealarm, even though my test box DID wake itself at the correct time after being hibernated...
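
As a quick arithmetic check on the two sysfs values above, here is a small Python sketch. The helper names are illustrative only, and it assumes the RTC is kept in UTC, which is consistent with the rtc_time printed just before. It suggests the recorded alarm is not simply stale: it sits a little under 24 hours in the future, matching the alrm_date of 2010-05-04 shown in /proc/driver/rtc.

```python
from datetime import datetime, timezone


def seconds_until_alarm(since_epoch, wakealarm):
    """Positive if the recorded alarm is still in the future."""
    return wakealarm - since_epoch


def split_hms(seconds):
    """Break a number of seconds into (hours, minutes, seconds)."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return hours, minutes, secs


def format_utc(epoch_seconds):
    """Render an epoch value as a UTC date and time string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Values copied from the since_epoch and wakealarm output above.
SINCE_EPOCH = 1272913021
WAKEALARM = 1272998730

delta = seconds_until_alarm(SINCE_EPOCH, WAKEALARM)
print(delta, split_hms(delta))    # 85709 (23, 48, 29)
print(format_utc(SINCE_EPOCH))    # 2010-05-03 18:57:01
print(format_utc(WAKEALARM))      # 2010-05-04 18:45:30
```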

I tried doing an apport-collect for this bug, but apport refuses to allow me to file anything when running a mainline kernel (even though you asked me to do so)...

So if you need logs or other info of whatever sort, ask here and be detailed in what you want.

Colin Ian King (colin-king) wrote :

@Jeff,

Can you add the output from:

sudo dmidecode

to this bug? Thanks

Colin Ian King (colin-king) wrote :

I'm not entirely sure we can trust the CMOS RTC alarm state getting propagated through to the /proc/driver/rtc interface on all hardware. Perhaps one should use the rtc ioctl() interface to set/get RTC status. There is some sketchy documentation in the kernel source tree under Documentation/rtc.txt

Specifically, perhaps probing the RTC alarm date using the RTC_ALM_READ ioctl() may fetch a date that shows whether or not the alarm has fired.

The documentation illustrates how to read the current alarm setting using:

        /* Read the current alarm settings */
        retval = ioctl(fd, RTC_ALM_READ, &rtc_tm);
        if (retval == -1) {
                perror("RTC_ALM_READ ioctl");
                exit(errno);
        }

        fprintf(stderr, "Alarm time now set to %02d:%02d:%02d.\n",
                rtc_tm.tm_hour, rtc_tm.tm_min, rtc_tm.tm_sec);

Maybe this is a more reliable method rather than checking the state via /proc/driver/rtc
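
For a test script written in Python rather than C, the same RTC_ALM_READ call could be sketched as below. This is an untested-on-hardware sketch under several assumptions: the device node is /dev/rtc0, the ioctl number is built with the generic _IOR encoding used on x86 (some other architectures encode the direction bits differently), and struct rtc_time is nine C ints. ior, decode_rtc_time and read_alarm are hypothetical helper names. Per the rtc(4) man page, only the hour, minute and second fields are meaningful for RTC_ALM_READ, so only those are decoded.

```python
import fcntl
import struct

# struct rtc_time from <linux/rtc.h>: nine C ints
# (tm_sec, tm_min, tm_hour, tm_mday, tm_mon, tm_year, tm_wday, tm_yday, tm_isdst).
RTC_TIME_FORMAT = "9i"


def ior(type_char, number, size):
    """Build a read-direction ioctl request like the kernel's _IOR macro.

    Assumes the generic layout (direction in bits 30-31, size in bits 16-29).
    """
    return (2 << 30) | (size << 16) | (ord(type_char) << 8) | number


# RTC_ALM_READ is _IOR('p', 0x08, struct rtc_time) in <linux/rtc.h>.
RTC_ALM_READ = ior("p", 0x08, struct.calcsize(RTC_TIME_FORMAT))


def decode_rtc_time(raw):
    """Return the HH:MM:SS part of a packed struct rtc_time."""
    sec, minute, hour = struct.unpack(RTC_TIME_FORMAT, raw)[:3]
    return "%02d:%02d:%02d" % (hour, minute, sec)


def read_alarm(device="/dev/rtc0"):
    """Ask the RTC driver for the current alarm time (needs read access)."""
    buf = bytearray(struct.calcsize(RTC_TIME_FORMAT))
    with open(device, "rb") as rtc:
        # With a mutable buffer, fcntl.ioctl fills buf in place.
        fcntl.ioctl(rtc, RTC_ALM_READ, buf)
    return decode_rtc_time(bytes(buf))
```

Calling read_alarm() before hibernating and again after resume would give a second source to compare against the alrm_time line in /proc/driver/rtc.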

Jeff Lane  (bladernr) wrote :

Colin, here's dmidecode from my netbook. It shows the behaviour, though it does not apparently support wake on alarm_IRQ anyway.

I'll post same from my other machine shortly...

Jeff Lane  (bladernr) wrote :

Colin, here is dmidecode from my other system (Alienware M15x, Core i7).

THIS system DOES support wake from alarm IRQ in S4. The previous attachment is from my netbook that does not wake from hibernate...

So this'll give you two good sets of dmidecode data.

Brad Figg (brad-figg)
tags: added: acpi
tags: added: acpi-brightness
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
penalvch (penalvch)
tags: added: needs-upstream-testing
penalvch (penalvch) wrote :

Jeff Lane, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc3

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: bios-outdated-a09
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
summary: - [Lucid] Possible race condition involving rtc wakealarm when hibernating
- a system
+ [Dell Alienware M15x] Possible race condition involving rtc wakealarm
+ when hibernating a system
Jeff Lane  (bladernr) wrote :

This was an issue in 2010 on Lucid. Given that it's 2014 and I'm on Saucy at this point, I have no idea whether it still occurs, nor any interest in pursuing this. :(

Changed in linux (Ubuntu):
status: Incomplete → Invalid