Pandaboard ES freezes with the default CPU scaling governor ondemand

Bug #971091 reported by P. S.
This bug affects 19 people
Affects                   Status        Importance  Assigned to  Milestone
Linaro Ubuntu             Fix Released  Medium      Unassigned
linaro-landing-team-ti    Fix Released  Medium      Unassigned
linux-ti-omap4 (Ubuntu)   Fix Released  Medium      Unassigned

Bug Description

Pandaboard ES freezes sporadically with the CPU scaling governor default setting, which is "ondemand".
It is a complete freeze, no syslog entry, no serial access, no network, no keyboard/mouse anymore. When pressing the reset button, it will not even reboot. The SD card interface seems to be hanging. Pulling and re-inserting the SD card before pressing the reset button, or a global power-cycle will re-boot the board properly.

The error occurs during normal operation on the desktop, but can also be reproduced in an unattended way:
1. Create a RAM disk by adding this to fstab
     none /tmp tmpfs defaults,noatime,mode=1777,size=600M 0 0
2. Mount it and create a big file on the RAM disk:
     dd bs=1M count=210 if=/dev/urandom of=/tmp/a
3. Change to /tmp and start an endless loop:
     while true; do cp a b; date; sleep 3; done

Note that the "sleep" is important to cause the governor to switch CPU speed up and down. Wait for 1 to 8 hrs and find the Pandaboard ES in frozen state. You can do the same with a bigger file and /tmp on the SD card. This will often produce the error faster (possibly due to CPU idling at flash write delays) but will stress your flash card.
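
For unattended runs it helps to know when the last copy succeeded. The following is a minimal sketch of the loop above with every pass logged to a file; the function name repro_loop, its arguments and the log location are illustrative and not part of the original report. The log has to live on persistent storage (not on the tmpfs), otherwise it is lost with the power cycle.

```shell
#!/bin/sh
# Sketch only: repro_loop and its arguments are hypothetical helpers that wrap
# the dd/cp/sleep steps from the report and add a timestamped log.

# repro_loop DIR LOG SIZE_MB PAUSE_S [PASSES]
#   DIR      working directory (the report uses a tmpfs mounted on /tmp)
#   LOG      log file on persistent storage, e.g. somewhere on the SD card
#   SIZE_MB  size of the source file in MiB (the report uses 210)
#   PAUSE_S  idle time between copies in seconds (the report uses 3)
#   PASSES   number of copies; 0 or omitted means loop until interrupted
repro_loop() {
    dir=$1 log=$2 size_mb=$3 pause_s=$4 passes=${5:-0}

    # Create the big source file once.
    dd bs=1M count="$size_mb" if=/dev/urandom of="$dir/a" 2>/dev/null || return 1

    n=0
    while [ "$passes" -eq 0 ] || [ "$n" -lt "$passes" ]; do
        cp "$dir/a" "$dir/b" || return 1
        n=$((n + 1))
        echo "$(date '+%Y-%m-%d %H:%M:%S') pass $n" >> "$log"
        # The pause lets the CPU go idle so the governor scales down again.
        sleep "$pause_s"
    done
}
```

A run equivalent to the steps above would be `repro_loop /tmp "$HOME/repro.log" 210 3`. After a freeze and power cycle, the last line of the log gives a rough time of the hang, although the final entries may be missing if they had not yet been flushed to the card.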

Setting the CPU scaling governor to "performance" completely solves the problem. I guess there might be a HW issue on the Pandaboard ES that kicks in with frequent CPU speed changes.
Solution proposal: set the CPU scaling governor to "performance" as default until the issue is further analysed.
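
For a quick test, the governor can be inspected and switched through the kernel's cpufreq sysfs files without installing anything. This is a minimal sketch: show_governors and set_governor are hypothetical helper names, and the optional root argument exists only so the functions can be tried against a scratch directory. Writing the real files needs root, and the setting lasts only until the next reboot.

```shell
#!/bin/sh
# Sketch only: helper names and the optional SYSFS_CPU_ROOT argument are
# illustrative. The scaling_governor files are the standard cpufreq interface.

# show_governors [SYSFS_CPU_ROOT]: print the active governor of every CPU.
show_governors() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        cpu=${f#"$root"/}              # e.g. cpu0/cpufreq/scaling_governor
        echo "${cpu%%/*}: $(cat "$f")"
    done
}

# set_governor NAME [SYSFS_CPU_ROOT]: write NAME to every CPU's governor file.
set_governor() {
    gov=$1 root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$f" ] || continue
        echo "$gov" > "$f" || return 1
    done
}
```

Run as root, `set_governor performance; show_governors` should then list "performance" for each CPU.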

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-1411-omap4 3.2.0-1411.14
ProcVersionSignature: Ubuntu 3.2.0-1411.14-omap4 3.2.9
Uname: Linux 3.2.0-1411-omap4 armv7l
ApportVersion: 1.95-0ubuntu1
Architecture: armhf
Date: Sun Apr 1 23:01:52 2012
ProcEnviron:
 TERM=xterm
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-ti-omap4
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Paolo Pisati (p-pisati) wrote :

do you have the pvr-sgx driver installed? can you cut&paste the output of `lsmod`?

P. S. (sadowsky46) wrote :

No, I did not add any graphics driver to the default Beta2 load. Here's the lsmod:
Module                  Size  Used by
joydev                  9848  0
usbhid                 37879  0
cpufreq_powersave       1020  0
cpufreq_conservative    6707  0
cpufreq_ondemand        7344  0
cpufreq_userspace       2449  0
bnep                   11629  2
rfcomm                 39016  0
bluetooth             169469  10 bnep,rfcomm
wl12xx_sdio             4168  0
wl12xx                153822  1 wl12xx_sdio
mac80211              480412  1 wl12xx
cfg80211              196495  2 wl12xx,mac80211
leds_gpio               3705  0

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-ti-omap4 (Ubuntu):
status: New → Confirmed
warmcat (andy-warmcat) wrote :

I already found out the cause for this on the new tree and fixed it.

I'll prepare a patch on tilt-3.1 with the same fix.

warmcat (andy-warmcat) wrote :

Please try the attached... something is wrong with my boot setup atm, tilt-3.1 with or without this patch blows up in per_cpu code early in boot so I can't test it. It's either boot pieces or toolchain, I moved to Linaro 4.6.3 toolchain, which works OK on 3.3 stuff.

tags: added: patch
P. S. (sadowsky46) wrote :

Andy, thanks for your effort. I think I'm not experienced enough to apply this patch and get the kernel running on my PandaES. Is there any "cookbook"-style hint available anywhere?

Paolo Pisati (p-pisati) wrote :
Revision history for this message
Paolo Pisati (p-pisati) wrote :

don't use the above kernel: i got all kinds of weird hangs and panics minutes after booting.

Paolo Pisati (p-pisati) wrote :

ok, seems to really be that patch (stock 1412.15 is stable here): Andy care to comment on that patch?

P. S. (sadowsky46) wrote :

Warning came too late ;-)
I tried the kernel, it seems to work with the "performance" setting. But it crashes as soon as I switch to "ondemand":

[ 166.025482] Unable to handle kernel paging request at virtual address 00011ecb
[ 166.031646] pgd = ec0e8000
[ 166.031646] [00011ecb] *pgd=00000000
[ 166.039520] Internal error: Oops: 1 [#3] PREEMPT SMP
[ 166.039520] Modules linked in: cpufreq_powersave cpufreq_conservative cpufreq_ondemand cpufreq_userspace rfcomm bnep bluetooth joydev wl12xx_sdio wl12xx mac80211 two
[ 166.047332] CPU: 1 Tainted: G D W (3.2.0-1412-omap4 #15~lp971091)
[ 166.070709] PC is at timerqueue_add+0x54/0xc4
[ 166.077697] LR is at 0xfffee169
[ 166.081024] pc : [<c02be614>] lr : [<fffee169>] psr: 200e0193
[ 166.081024] sp : ec14de50 ip : 00011ebb fp : ec14de6c
[ 166.091491] r10: a3fb8a7c r9 : c0078f3c r8 : 00000026
[ 166.093627] r7 : 00000001 r6 : ee347ae8 r5 : c12465e4 r4 : ee7ee830
[ 166.105499] r3 : ee347ae0 r2 : 00011ebd r1 : 00000026 r0 : a3fb8a7c
[ 166.112152] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[ 166.112152] Control: 10c5387d Table: ac0e804a DAC: 00000015
[ 166.119964] Process Xorg (pid: 805, stack limit = 0xec14c2f8)
[ 166.119964] Stack: (0xec14de50 to 0xec14e000)
[ 166.119964] de40: ee7ee830 c12465d8 c12465d8 00000001
[ 166.119964] de60: ec14de94 ec14de70 c0077cb0 c02be5cc 00000325 ee7ee830 c12465d8 c12465d8
[ 166.153808] de80: 00000001 00000026 ec14df04 ec14de98 c0078dc0 c0077c08 00000000 00000000
[ 166.153808] dea0: ec14dec4 00000038 c09155a0 c05d13c4 ee7ee830 c0078f3c ec14ded4 c05d13a8
[ 166.153808] dec0: 00000000 00000026 a3fb8a7c 00000026 c05d13a8 200e0193 ec14def4 ec14df70
[ 166.153808] dee0: ec112040 01312d00 3b9aca00 ee7ee830 ec14c000 00000000 ec14df24 ec14df08
[ 166.188201] df00: c0078f3c c0078ae0 00000000 00000001 00000001 ee7ee830 ec14df6c ec14df28
[ 166.188201] df20: c0057ea0 c0078f14 00000001 00000000 01312d00 00000000 c007ec90 c005827c
[ 166.188201] df40: 00000000 00004e20 00000100 00000000 00000000 00000000 00000068 c000dee8
[ 166.188201] df60: ec14dfa4 ec14df70 c00580a4 c0057d38 00000000 00004e20 00000000 00004e20
[ 166.222595] df80: bebf9a30 c0073a08 000000a5 3805b455 000003e8 00000c8c 00000000 ec14dfa8
[ 166.231201] dfa0: c000dc60 c005801c 000003e8 00000c8c 00000000 bebf9a28 00000000 00004e20
[ 166.231201] dfc0: 000003e8 00000c8c 00000000 00000068 00000001 b6f95f74 b6f8b000 b782f328
[ 166.248382] dfe0: b6f8b154 bebf9a24 b6f592b5 b6bc049c 200e0110 00000000 00000000 00000000
[ 166.248382] [<c02be614>] (timerqueue_add+0x54/0xc4) from [<c0077cb0>] (enqueue_hrtimer+0xb4/0xf0)
[ 166.248382] [<c0077cb0>] (enqueue_hrtimer+0xb4/0xf0) from [<c0078dc0>] (__hrtimer_start_range_ns+0x2ec/0x434)
[ 166.276763] [<c0078dc0>] (__hrtimer_start_range_ns+0x2ec/0x434) from [<c0078f3c>] (hrtimer_start+0x34/0x3c)
[ 166.287017] [<c0078f3c>] (hrtimer_start+0x34/0x3c) from [<c0057ea0>] (do_setitimer+0x174/0x264)
[ 166.287017] [<c0057ea0>] (do_setitimer+0x174/0x264) from [<c00580a4>] (sys_setitimer+0x94/0xf8)
[ 166.287017] [<c00580a4>] (sys_setitimer+0x94/0xf8) from [<c000dc60>] (ret_fast_syscal...


David Long (dave-long) wrote :

I have reproduced assorted kernel panics. I am investigating.

Changed in linux-ti-omap4 (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Trevor Robinson (scurrilous) wrote :

For anyone that winds up here just looking for a workaround:

update-rc.d ondemand disable
apt-get -y install cpufrequtils
echo 'ENABLE="true"
GOVERNOR="performance"
MAX_SPEED="0"
MIN_SPEED="0"' > /etc/default/cpufrequtils
cpufreq-set -r -g performance

After doing this, 2 Pandaboards that would freeze every few hours have been running for several days.

Paul W Panish (ppanish) wrote :

I ran "update-rc.d ondemand disable" which renamed all links to ondemand to K01ondemand so that they should not be run during system initialization. However, the system is still coming up with the CPU governor set to ondemand. I can add cpufreq-set to my rc.local, but it would be nice to know why this is happening. Is it a default kernel mode?
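
One way to check which governor the kernel itself selects at boot is to read its build configuration. A minimal sketch, assuming the configuration is installed as /boot/config-$(uname -r), as Ubuntu kernel packages normally do; default_governor is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: default_governor is an illustrative helper. It looks for the
# CONFIG_CPU_FREQ_DEFAULT_GOV_<NAME>=y line in a kernel config file.

# default_governor [CONFIG_FILE]: print the compiled-in default governor
# in lower case, e.g. "ondemand" or "performance".
default_governor() {
    cfg=${1:-/boot/config-$(uname -r)}
    sed -n 's/^CONFIG_CPU_FREQ_DEFAULT_GOV_\([A-Z_]*\)=y$/\1/p' "$cfg" |
        tr 'A-Z' 'a-z'
}
```

If this prints "ondemand", the kernel starts in that governor regardless of the init script, and an explicit cpufreq-set call (from cpufrequtils or rc.local) is still needed to switch to "performance".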

Ricardo Salveti (rsalveti) wrote :

Andy, do you have any update for this bug? I know Ubuntu is still using 3.2, but I remember we had to disable CPU_FREQ with 3.3 to get it to run without freeze on 4460, so I'm not sure if the fix is already around somewhere.

Changed in linaro-landing-team-ti:
status: New → Confirmed
Changed in linaro-ubuntu:
status: New → Confirmed
importance: Undecided → Medium
warmcat (andy-warmcat) wrote :

AFAIK this should be solved in tilt-3.4, the problem was coming from frequency update code which should now be in good shape.

Changed in linaro-landing-team-ti:
status: Confirmed → Fix Committed
warmcat (andy-warmcat)
Changed in linaro-landing-team-ti:
status: Fix Committed → Fix Released
Usman Ahmad (usman-ah)
Changed in linaro-landing-team-ti:
importance: Undecided → Medium
Gao Xianchao (gxcmaillist) wrote :

STILL NOT FIXED???

Ricardo Salveti (rsalveti) wrote :

Not able to reproduce with latest kernel from TI (3.4.0-2-linaro-lt-omap #2~ci+120825182553-Ubuntu). Marking it as fix committed for now, but please re-open it in case of issues.

Changed in linaro-ubuntu:
status: Confirmed → Fix Committed
Paolo Pisati (p-pisati) wrote :

we switched to CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y since Ubuntu-3.2.0-1405.7

Changed in linux-ti-omap4 (Ubuntu):
status: Confirmed → Fix Released
Paul W Panish (ppanish) wrote :

I don't believe this has been fixed in the Ubuntu release. My system crashed twice within 12 hours after I updated the kernel and switched back to "ondemand". I'm verifying now that it doesn't crash in "performance" mode.

ppanish@ppanda:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04.1 LTS"
ppanish@ppanda:~$ uname -a
Linux ppanda 3.2.0-1419-omap4 #26-Ubuntu SMP PREEMPT Wed Sep 12 14:32:40 UTC 2012 armv7l armv7l armv7l GNU/Linux

If there's any specific information you'd like me to gather let me know.

Paul W Panish (ppanish) wrote :

No crash in "performance" mode after 24+ hours. This has definitely not been fixed in the release version with the 3.2.0-1419-omap4 kernel. I don't know about the repository builds. THIS SHOULD NOT BE MARKED AS FIXED!! Please reactivate the bug.

I hope no one is saying that disabling frequency scaling is a legitimate fix for this problem. That would be like saying if you power down the board there are no issues with how it runs.

Fathi Boudra (fboudra)
Changed in linaro-ubuntu:
status: Fix Committed → Fix Released
Ben Gamari (bgamari) wrote :

Is it true that the fix implemented here was to disable frequency scaling? I agree that this can be considered a workaround, but to claim that the problem is solved goes a bit too far. Users expect CPU power management to function on a modern system. Disabling it as a long-term solution is not an option unless documented hardware errata leave no other choice.

Robert Nelson (robertcnelson) wrote :

Well, last I checked, even with mainline v3.9.x one needs to explicitly disable cpufreq on the omap4460 to even boot to a console prompt. I haven't personally tried v3.10/v3.11/v3.12, as I've been busy with other projects and that Panda ES just keeps running on v3.9.x. Long term, remember that division was canned at TI last November, so it's now up to users/community/ex-TI'ers to fix it (or someone at TI who happens to have an interest in the board).

Tommi Tikkanen (tommi-exe) wrote :

After changing governor to "performance", crashes were still appearing.

However, I traced this to be most likely caused by CPU overheating. /var/log/syslog.* recorded messages like
omap_monitor_zone:hot spot temp 86874, and even temperatures over 100 C.

This is just a reminder to all who switch to performance mode: remember to attach a heat sink!
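
To see how hot a board has been running, the syslog messages quoted above can be scanned for the highest reported value. A minimal sketch: max_hot_spot_temp and millideg_to_c are hypothetical helper names, and the conversion assumes the logged number is in millidegrees Celsius (so 86874 would be about 86.9 C), which the report does not state explicitly.

```shell
#!/bin/sh
# Sketch only: both helpers are illustrative. The message text matched here is
# the "omap_monitor_zone:hot spot temp <value>" line quoted in the comment.

# max_hot_spot_temp LOGFILE...: print the highest "hot spot temp" value found.
max_hot_spot_temp() {
    cat "$@" 2>/dev/null |
        sed -n 's/.*omap_monitor_zone:hot spot temp \([0-9][0-9]*\).*/\1/p' |
        sort -n | tail -n 1
}

# millideg_to_c VALUE: convert an assumed millidegree reading to degrees C.
millideg_to_c() {
    awk -v v="$1" 'BEGIN { printf "%.1f\n", v / 1000 }'
}
```

For example, `millideg_to_c "$(max_hot_spot_temp /var/log/syslog /var/log/syslog.1)"` prints the peak from the two most recent logs. Older rotated logs are usually gzip-compressed and would have to be decompressed first.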
