[ARM] Reboot sometimes fails on highbank
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Eilt project | | Undecided | Unassigned |
linux (Ubuntu) | | Medium | Unassigned |
Bug Description
Reproduced on:
Linux version 3.2.0-30-highbank (buildd@chort) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #48-Ubuntu SMP PREEMPT Fri Aug 24 20:04:03 UTC 2012 (Ubuntu 3.2.0-30.
Linux version 3.2.0-32-highbank (buildd@musimon) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #51-Ubuntu SMP PREEMPT Thu Sep 27 00:36:21 UTC 2012 (Ubuntu 3.2.0-32.
Evidently does not affect:
Linux version 3.5.0-16-highbank (buildd@shedir) (gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-2ubuntu1) ) #25-Ubuntu SMP PREEMPT Sat Sep 29 01:44:21 UTC 2012 (Ubuntu 3.5.0-16.
When I use busybox's reboot command, I usually get:
Freeing init memory: 176K
pre-reboot
sd 2:0:0:0: [sda] Synchronizing SCSI cache
Restarting system.
U-Boot 2012.07 (Sep 21 2012 - 14:54:04)
Instead, sometimes I get:
Freeing init memory: 176K
pre-reboot
sd 2:0:0:0: [sda] Synchronizing SCSI cache
Restarting system.
Reboot failed -- System halted
To reproduce, I created an initrd that contains only busybox and a simple /init shell script that immediately reboots. I set a node to persistently boot "pxe" and kept serving the same kernel and initrd, which gives a continuously cycling test. On affected kernels, the failure reproduces within around five reboots.
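For anyone repeating the reboot loop, a captured serial console log can be tallied automatically. The following is only a minimal sketch in Python: it assumes the console output has been saved as text and that a successful reboot is recognisable by a U-Boot banner following "Restarting system.", as in the logs above. The function name tally_reboots is made up for this example.

```python
def tally_reboots(console_log):
    """Count reboot outcomes in a captured console log.

    Returns (succeeded, failed). A reboot attempt starts at the line
    "Restarting system." and is resolved by whichever marker comes next:
    a U-Boot banner (success) or "Reboot failed -- System halted" (failure).
    An attempt with no marker after it (for example a silent hang) is
    left uncounted.
    """
    succeeded = 0
    failed = 0
    pending = False  # True between "Restarting system." and its outcome
    for line in console_log.splitlines():
        text = line.strip()
        if text == "Restarting system.":
            pending = True
        elif pending and text.startswith("U-Boot "):
            succeeded += 1
            pending = False
        elif pending and "Reboot failed -- System halted" in text:
            failed += 1
            pending = False
    return succeeded, failed


if __name__ == "__main__":
    sample = """\
Restarting system.
U-Boot 2012.07 (Sep 21 2012 - 14:54:04)
Freeing init memory: 176K
pre-reboot
sd 2:0:0:0: [sda] Synchronizing SCSI cache
Restarting system.
Reboot failed -- System halted
"""
    print(tally_reboots(sample))
```

The first U-Boot banner of a session (before any reboot has been requested) is deliberately not counted, since it is not the outcome of a reboot attempt.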
Impact: when this strikes, MAAS fails to deploy nodes. MAAS needs three reboots to fully deploy a node, so the probability of a deployment failing is higher than the probability of a single reboot failing. This will cause great difficulty when using nodes at scale.
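To put a rough number on that, here is a small Python sketch. It assumes each reboot fails independently with the same probability p, so a deployment needing n reboots fails with probability 1 - (1 - p)^n. The value p = 0.2 below is purely illustrative and not a measured rate; the function name is made up for this example.

```python
def deploy_failure_probability(p_reboot_fail, reboots=3):
    """Probability that at least one of `reboots` independent reboots fails."""
    if not 0.0 <= p_reboot_fail <= 1.0:
        raise ValueError("p_reboot_fail must be between 0 and 1")
    if reboots < 0:
        raise ValueError("reboots must not be negative")
    # The deployment succeeds only if every reboot succeeds.
    return 1.0 - (1.0 - p_reboot_fail) ** reboots


if __name__ == "__main__":
    p = 0.2  # hypothetical per-reboot failure rate, for illustration only
    print("single reboot: %.3f" % deploy_failure_probability(p, reboots=1))
    print("three reboots: %.3f" % deploy_failure_probability(p, reboots=3))
```

With the illustrative p = 0.2, the three-reboot figure comes out at roughly 0.49, compared with 0.2 for a single reboot.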
ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-highbank 3.2.0.32.35
ProcVersionSign
Uname: Linux 3.2.0-31-highbank armv7l
AcpiTables:
AlsaDevices:
total 0
crw-rw---T 1 root audio 116, 1 Oct 3 12:20 seq
crw-rw---T 1 root audio 116, 33 Oct 3 12:20 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu13
Architecture: armhf
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg: eth0: no IPv6 routers present
Date: Wed Oct 3 12:23:30 2012
HibernationDevice: RESUME=
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
PciMultimedia:
ProcFB:
ProcKernelCmdLine: console=ttyAMA0 root=UUID=
ProcModules:
RelatedPackageV
linux-
linux-
linux-firmware 1.79.1
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
---
AcpiTables:
AlsaDevices:
total 0
crw-rw---T 1 root audio 116, 1 Oct 3 12:20 seq
crw-rw---T 1 root audio 116, 33 Oct 3 12:20 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu13
Architecture: armhf
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg: eth0: no IPv6 routers present
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
Package: linux (not installed)
PciMultimedia:
ProcFB:
ProcKernelCmdLine: console=ttyAMA0 root=UUID=
ProcModules:
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.79.1
RfKill: Error: [Errno 2] No such file or directory
Tags: precise
Uname: Linux 3.2.0-31-highbank armv7l
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
Robie Basak (racb) wrote : | #2 |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
tags: | added: kernel-da-key |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1061070
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Robie Basak (racb) wrote : BootDmesg.txt | #4 |
apport information
tags: | added: apport-collected |
description: | updated |
Robie Basak (racb) wrote : ProcCpuinfo.txt | #5 |
apport information
Robie Basak (racb) wrote : ProcEnviron.txt | #6 |
apport information
Robie Basak (racb) wrote : ProcInterrupts.txt | #7 |
apport information
Robie Basak (racb) wrote : UdevDb.txt | #8 |
apport information
Robie Basak (racb) wrote : UdevLog.txt | #9 |
apport information
Robie Basak (racb) wrote : WifiSyslog.txt | #10 |
apport information
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
It is strange that you don't see it on 3.5.0, but I believe this commit will fix the problem. It is in the process of being applied for quantal.
commit 1f191ef8c716e17
Author: Rob Herring <email address hidden>
Date: Tue Sep 18 15:09:31 2012 -0500
ARM: highbank: retry wfi on reset request
In some cases, an interrupt can occur and prevent cause failure to enter
wfi. This causes reset to hang. Retrying the wfi should be enough to
prevent reset from hanging.
Signed-off-by: Rob Herring <email address hidden>
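As a rough illustration of the idea in the commit message, here is a toy Python model. It is not the kernel change itself, only a sketch of why retrying helps: if a single wfi attempt can be cut short by an interrupt, one attempt sometimes fails, whereas retrying until the wfi sticks does not. The function names, the wake_probability parameter and the bound on attempts are all hypothetical.

```python
import random


def wfi(rng, wake_probability):
    """Toy stand-in for one wfi attempt.

    Returns True if the core stayed in wfi (the reset can proceed) and
    False if a simulated interrupt prevented it.
    """
    return rng.random() >= wake_probability


def reset_once(rng, wake_probability):
    """Single wfi attempt: a wake-up means the reset hangs."""
    return wfi(rng, wake_probability)


def reset_with_retry(rng, wake_probability, max_attempts=1000):
    """Retry wfi until it succeeds.

    The bound exists only so that this toy model always terminates.
    """
    for _ in range(max_attempts):
        if wfi(rng, wake_probability):
            return True
    return False


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 1000
    single = sum(reset_once(rng, 0.2) for _ in range(trials))
    retried = sum(reset_with_retry(rng, 0.2) for _ in range(trials))
    print("single attempt succeeded:", single, "of", trials)
    print("with retry succeeded:", retried, "of", trials)
```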
Changed in linux (Ubuntu): | |
assignee: | nobody → Ike Panhc (ikepanhc) |
Robie Basak (racb) wrote : | #12 |
What I've found so far is that this patch seems to affect the problem on Precise, but only changes the failure symptom: I got a message about flushing sda and then a BMC hang instead.
I had to leave this for the time being so I regret that this report isn't quite complete, but hopefully will help Ike if he carries on with it. It's pretty easy to reproduce by arranging an initrd to cause an immediate reboot, and then leaving a node running in a reboot loop for a while.
Changed in linux (Ubuntu): | |
assignee: | Ike Panhc (ikepanhc) → Girish Sanenahalli (girish-cs7036) |
Changed in linux (Ubuntu): | |
assignee: | Girish Sanenahalli (girish-cs7036) → Ike Panhc (ikepanhc) |
Changed in linux (Ubuntu): | |
assignee: | Ike Panhc (ikepanhc) → Haitao Zhang (minipanda) |
summary: |
- Reboot sometimes fails on highbank
+ [ARM] Reboot sometimes fails on highbank |
Changed in linux (Ubuntu): | |
assignee: | Haitao Zhang (minipanda) → Ike Panhc (ikepanhc) |
Changed in linux (Ubuntu): | |
assignee: | Ike Panhc (ikepanhc) → nobody |
Subscribing Ike to have a look at this, please.