Android tracking-panda build 9+ re-enters suspend immediately after resume

Bug #869514 reported by Zach Pfeffer
20
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Linaro Android
Won't Fix
High
Unassigned
linaro-landing-team-ti
Invalid
High
warmcat

Bug Description

When this build suspends, it crashes.

https://android-build.linaro.org/builds/~linaro-android/tracking-panda/#build=9

Pressing F1 does not, bring it out of this state.

Revision history for this message
Zach Pfeffer (pfefferz) wrote :

Here's the dmesg of the run and the crash.

Changed in linaro-android:
status: New → Confirmed
importance: Undecided → High
summary: - tracking-panda build 9 crashes during suspend
+ Android tracking-panda build 9 crashes during suspend
Revision history for this message
Anmar Oueja (anmar) wrote : Re: Android tracking-panda build 9 crashes during suspend

I assigned this to Andy since he is looking into suspend issues already.

Changed in linaro-landing-team-ti:
assignee: nobody → warmcat (andy-warmcat)
Revision history for this message
warmcat (andy-warmcat) wrote :

Yes mem suspend is broken actually in our vanilla tilt tree the android one is based on, it's not android specific issue. I spent some hours on it already unfortunately it dies very late in suspend or very early in resume while interrupts are off and printascii in there changes the behaviour.

Revision history for this message
warmcat (andy-warmcat) wrote :

I fixed mem suspend in tilt-tracking after finding various issues. But it's still broken in Android a different way now, imprecise external abort on resume part, another tough one to diagnose. So I'll continue looking at that next week.

I've pushed that differently-broken Android tree anyway since it has the 4460 boot fix and Dave Martin's __linux__ patchset.

Revision history for this message
warmcat (andy-warmcat) wrote :

I found another genuine bug in Linus 3.1-rc9 revolving around twl6030 interrupt service racing with i2c unit coming out of suspend. That only showed up on Android because Android uses twl6030 rtc as wake alarm.

Now in current tilt-android-tracking 5606bae5b312bd4f30be10f1c7e595b6886300d2 we return from suspend OK, but it seems we are immediately pushed back into suspend, presumably by the rootfs but actually I don't know.

There are some WARN_ON stacktraces related to something else we are working on, then at the end of suspend

...
[ 107.466979] omap_hwmod: dss_hdmi: _wait_target_disable failed
[ 107.631622] omapdss DISPC error: timeout waiting for EVSYNC
[ 107.730468] omapdss DISPC error: timeout waiting for EVSYNC
[ 107.867950] omapdss DISPC error: timeout waiting for EVSYNC
[ 107.987548] omapdss DISPC error: timeout waiting for EVSYNC
[ 107.990600] omap_hwmod: aess: _wait_target_disable failed
[ 108.207611] wakeup wake lock: event1-1336
[ 108.212127] PM: resume of devices complete after 1551.544 msecs
[ 108.218383] PM: Finishing wakeup.
[ 108.223083] Restarting tasks ... ~
[ 108.227600] adb_read(24)
[ 108.229980] done.
[ 108.233428] adb_read: waiting for online state
[ 108.233734] suspend: exit suspend, ret = 0 (2000-01-02 01:50:08.526092432 UTC)

[ 108.233856] PM: Syncing filesystems ... done.
[ 108.233856] PM: Preparing system for mem sleep
[ 108.284851] Freezing user space processes ... (elapsed 0.03 seconds) done.
[ 108.319458] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[ 108.342987] PM: Entering mem sleep
...

I added a blank line at the point we are out of suspend, but you can see 120us later we are re-entering suspend. The other log stuff is OK.

Something else related, I see the following in the logs first as the rootfs is booting

[ 25.301391] request_suspend_state: wakeup (3->0) at 25313598632 (2000-01-02 01:48:55.726837156 UTC)

After ~ 1 minute activity timeout, it logs this before doing the initial suspend

[ 89.013824] request_suspend_state: sleep (0->3) at 89025848140 (2000-01-02 01:49:59.439086664 UTC)

But, I don't see any more request_suspend_state, even after we woke, is that possibly related to re-enter suspend immediately?

Added Vishal since he may have insight into rootfs-side questions.

warmcat (andy-warmcat)
summary: - Android tracking-panda build 9 crashes during suspend
+ Android tracking-panda build 9+ re-enters suspend immediately after
+ resume
Changed in linaro-landing-team-ti:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
vishal (vishalbhoj) wrote :

The device wakes up only if we press and hold F1 till the display resumes to the Android GUI .

I appears that if I press and release F1 once the event type reported to android userspace is wrong and hence The device doesn't resume.here is the part of logcat:
01-01 01:42:35.570 I/power ( 1377): *** set_screen_state 0 //SUSPEND CALL
01-01 01:42:40.873 D/EventHub( 1377): /dev/input/event4 got: t0=50, t1=866637, type=0, code=0, v=1
01-01 01:42:40.873 D/EventHub( 1377): /dev/input/event3 got: t0=50, t1=866637, type=0, code=0, v=1
01-01 01:42:40.873 D/EventHub( 1377): /dev/input/event2 got: t0=50, t1=866637, type=0, code=0, v=1
01-01 01:42:40.873 D/EventHub( 1377): /dev/input/event0 got: t0=50, t1=866149, type=0, code=0, v=1
01-01 01:42:52.417 D/EventHub( 1377): /dev/input/event4 got: t0=54, t1=110259, type=0, code=0, v=1
01-01 01:42:52.417 D/EventHub( 1377): /dev/input/event3 got: t0=54, t1=110259, type=0, code=0, v=1
01-01 01:42:52.417 D/EventHub( 1377): /dev/input/event2 got: t0=54, t1=110259, type=0, code=0, v=1
01-01 01:42:52.417 D/EventHub( 1377): /dev/input/event0 got: t0=54, t1=109924, type=0, code=0, v=1

If I press and hold F1 Key then the device resumes with a keypress event .Here is the log:
01-01 01:43:01.687 D/EventHub( 1377): /dev/input/event4 got: t0=63, t1=775237, type=0, code=0, v=1
01-01 01:43:01.687 D/EventHub( 1377): /dev/input/event3 got: t0=63, t1=775207, type=0, code=0, v=1
01-01 01:43:01.687 D/EventHub( 1377): /dev/input/event2 got: t0=63, t1=712554, type=4, code=4, v=458810
01-01 01:43:01.687 D/EventHub( 1377): /dev/input/event2 got: t0=63, t1=726074, type=1, code=59, v=1
01-01 01:43:01.687 D/EventHub( 1377): iev.code=59 keyCode=82 flags=0x00000002 err=0
01-01 01:43:01.687 I/power ( 1377): *** set_screen_state 1
01-01 01:43:01.687 D/EventHub( 1377): /dev/input/event2 got: t0=63, t1=726074, type=0, code=0, v=0
01-01 01:43:01.687 D/EventHub( 1377): /dev/input/event2 got: t0=63, t1=775176, type=1, code=59, v=0
01-01 01:43:01.687 D/EventHub( 1377): iev.code=59 keyCode=82 flags=0x00000002 err=0
01-01 01:43:01.687 D/EventHub( 1377): /dev/input/event2 got: t0=63, t1=775207, type=0, code=0, v=1
01-01 01:43:01.687 D/EventHub( 1377): /dev/input/event0 got: t0=63, t1=774963, type=0, code=0, v=1

Attached is the full logcat.

Revision history for this message
vishal (vishalbhoj) wrote :

Attached is the logs from serial terminal and logcats

Revision history for this message
warmcat (andy-warmcat) wrote :

Thanks for the extra logs Vishal.

As discussed on IRC it does look like evdev syncs are coming but not always the key down nor the key up that actually woke us. We don't seem to get a key up alone coming or a key down when it doesn't arrive, which is interesting because key up event is discrete in time and if this is a race with something you would think it would win the race sometimes.

I can imagine a few scenarios that would do this but as you point out it looks like it is kernelside at the moment.

So we'll have a try in the next days to see if we can get closer to the root cause.

Revision history for this message
vishal (vishalbhoj) wrote :

Updating this info about the above logs.

<bhoj> The first time it resumes is due to the press and release of F1 key .its around 73 seconds
<bhoj> and it goes back to suspend at 76 second timestamp
<bhoj> In between this userspace got wrong event
<bhoj> and after that I wake it by keeping F1 pressed which is the log at the end of the file around 83 seconds
<bhoj> and second time it got the right key press event and it resumes properly
<agreen> yeah I see it
<agreen> but this thing about 'wrong' event is wrong
<agreen> if type=0 is sync then it's a legit event okay
<agreen> if you look at the sequence where it does see the F1, a type=0 is also coming afterwards
<bhoj> yes I agree its a legit event but that is not suffecient to wake android ...
<agreen> so it seems more correct to say the F1 key down and key up messages are just missing
<bhoj> android will wake when It receives F1 keypress event
<agreen> right not saying it's not busted
<agreen> agreed
<agreen> just saying it's not getting a mangled, 'wrong' event
<bhoj> okay I agree to that :)
<agreen> (when it doesn't work) but the two events that should be there in addition to the type=0 are missing
<agreen> ok ^^
<bhoj> yes ,so it should be fixed at the kernel level right ?
<agreen> I guess it looks that way at the moment
<bhoj> just bringing to your notice .I see is I get 3 type=0 events ...ideally I should have got 1 type=0 event and following 2 corresponding to key press event ...
<agreen> Either that or accept any wake as a valid wake on Panda for now
<agreen> yeah type 0 is a 'sync' iiui that just flushes queued multi-events as an atomic action
<agreen> so it's a houskeeping message
<bhoj> I cannot change it in userspace since its a common code for all the platform and I will have to justify that fix.
<bhoj> I think present workaround is press and hold F1 till the board wakes up ...
<agreen> well, I cansee us spending some time on this alright, but likewise I have to justify going on a sabbatical to wrestle with something quite esoteric
<agreen> let's leave it we'll take a look in next days and see if we can at least arrive at a deeper understanding of root cause
<bhoj> okay ...

Revision history for this message
vishal (vishalbhoj) wrote :

workaround:
For the 11.10 RC release builds https://android-build.linaro.org/builds/~linaro-android/tracking-panda-11.10-release/#build=5 ,
device will resume from suspend if F1 is pressed and held till the device completely resumes.

vishal (vishalbhoj)
Changed in linaro-android:
status: Confirmed → In Progress
status: In Progress → Confirmed
Alexander Sack (asac)
tags: added: linaro-android panda-tracking suspend-resume
tags: added: panda-4460-tracking
Revision history for this message
warmcat (andy-warmcat) wrote :

BTW on current tilt-android-tracking (it's 3.2-rc0-pre now) suspend is broken completely same way as linux-linaro-3.1 does to it. But tilt-tracking itself, on the same basis, suspend is working 100%. Not sure what to make of that.... maybe it's config related.

Revision history for this message
warmcat (andy-warmcat) wrote :

tilt-3.1 has working suspend again thanks to Jassi, the same patch helps on tracking but something else (probably simpler) is still broken there.

Jassi also reported he saw resume key event every time in Ubuntu...

Revision history for this message
warmcat (andy-warmcat) wrote :

No longer seems to be the case on ICS, it wakes on any key

Changed in linaro-landing-team-ti:
status: Confirmed → Invalid
vishal (vishalbhoj)
Changed in linaro-android:
status: Confirmed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.