[ARM] Reboot sometimes fails on highbank

Bug #1061070 reported by Robie Basak on 2012-10-03
This bug affects 2 people
Affects            Importance  Assigned to
The Eilt project   Undecided   Unassigned
linux (Ubuntu)     Medium      Unassigned

Bug Description

Reproduced on:

Linux version 3.2.0-30-highbank (buildd@chort) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #48-Ubuntu SMP PREEMPT Fri Aug 24 20:04:03 UTC 2012 (Ubuntu 3.2.0-30.48-highbank 3.2.27)
Linux version 3.2.0-32-highbank (buildd@musimon) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #51-Ubuntu SMP PREEMPT Thu Sep 27 00:36:21 UTC 2012 (Ubuntu 3.2.0-32.51-highbank 3.2.30)

Evidently does not affect:

Linux version 3.5.0-16-highbank (buildd@shedir) (gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-2ubuntu1) ) #25-Ubuntu SMP PREEMPT Sat Sep 29 01:44:21 UTC 2012 (Ubuntu 3.5.0-16.25-highbank 3.5.4)

When I use busybox's reboot command, I usually get:

Freeing init memory: 176K
pre-reboot
sd 2:0:0:0: [sda] Synchronizing SCSI cache
Restarting system.

U-Boot 2012.07 (Sep 21 2012 - 14:54:04)

Instead, sometimes I get:

Freeing init memory: 176K
pre-reboot
sd 2:0:0:0: [sda] Synchronizing SCSI cache
Restarting system.
Reboot failed -- System halted

To reproduce, I created an initrd that contains only busybox and a simple /init sh script that immediately reboots. I set a node to persistently boot "pxe" and served the same kernel and initrd on every boot, giving a continuously cycling test. On affected kernels, the failure reproduces within around five reboots.
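
For anyone repeating the cycling test, the failure rate can be tallied from a captured serial console log. The following is a minimal sketch, not part of the original test setup: it assumes the console output of the whole run was saved as text, treats every "Restarting system." line as one reboot attempt, and counts an attempt as failed when "Reboot failed -- System halted" appears before the next attempt. The function name `tally_reboots` is made up for illustration.

```python
from typing import Tuple

RESTART_MARKER = "Restarting system."
FAILURE_MARKER = "Reboot failed -- System halted"


def tally_reboots(console_log: str) -> Tuple[int, int]:
    """Return (attempts, failures) for a captured console log."""
    attempts = 0
    failures = 0
    pending = False  # True while the latest attempt has not been marked failed
    for line in console_log.splitlines():
        if RESTART_MARKER in line:
            attempts += 1
            pending = True
        elif FAILURE_MARKER in line and pending:
            failures += 1
            pending = False
    return attempts, failures


if __name__ == "__main__":
    sample = (
        "Restarting system.\n"
        "\n"
        "U-Boot 2012.07 (Sep 21 2012 - 14:54:04)\n"
        "Restarting system.\n"
        "Reboot failed -- System halted\n"
    )
    attempts, failures = tally_reboots(sample)
    print(f"{failures} of {attempts} reboot attempts failed")
```

A node that halts needs to be power-cycled before the loop continues, so a long unattended run would also need some out-of-band reset, which this sketch does not cover.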

Impact: when this strikes, MAAS fails to deploy nodes. MAAS needs three reboots to fully deploy a node, so the probability of a failed deployment is higher than the probability of a single reboot failing. This will cause great difficulty when using nodes at scale.
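
To make that arithmetic concrete: if each reboot fails independently with probability p, a deployment that needs three reboots fails with probability 1 - (1 - p)^3. The sketch below evaluates this. The independence assumption and the example value p = 0.2 (loosely based on the failure showing up within around five reboots) are illustrative assumptions, not measurements.

```python
def deploy_failure_probability(p_reboot_fail: float, reboots: int = 3) -> float:
    """Probability that at least one of `reboots` independent reboots fails."""
    if not 0.0 <= p_reboot_fail <= 1.0:
        raise ValueError("p_reboot_fail must be between 0 and 1")
    return 1.0 - (1.0 - p_reboot_fail) ** reboots


if __name__ == "__main__":
    p = 0.2  # assumed per-reboot failure rate, for illustration only
    print(f"per-reboot failure: {p:.0%}")
    print(f"deployment failure: {deploy_failure_probability(p):.1%}")
```

Under these assumptions, a 20% per-reboot failure rate would make roughly half of all deployments fail.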

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-highbank 3.2.0.32.35
ProcVersionSignature: Ubuntu 3.2.0-31.50-highbank 3.2.28
Uname: Linux 3.2.0-31-highbank armv7l
AcpiTables:

AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Oct 3 12:20 seq
 crw-rw---T 1 root audio 116, 33 Oct 3 12:20 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu13
Architecture: armhf
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg: eth0: no IPv6 routers present
Date: Wed Oct 3 12:23:30 2012
HibernationDevice: RESUME=UUID=c9ff4c94-9a06-45db-9517-811db708d878
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
PciMultimedia:

ProcFB:

ProcKernelCmdLine: console=ttyAMA0 root=UUID=cd7e02ed-5dad-4dff-ba5d-a5afe52991c1 nosplash
ProcModules:

RelatedPackageVersions:
 linux-restricted-modules-3.2.0-31-highbank N/A
 linux-backports-modules-3.2.0-31-highbank N/A
 linux-firmware 1.79.1
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
---
Package: linux (not installed)
Tags: precise
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo

Robie Basak (racb) wrote :
tags: added: bot-stop-nagging
Robie Basak (racb) wrote :

Subscribing Ike to have a look at this, please.

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1061070

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected
description: updated

Robie Basak (racb) wrote : UdevDb.txt

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

It is strange that you don't see it on 3.5.0, but I believe this commit will fix the problem. It is in the process of being applied to quantal.

commit 1f191ef8c716e17b768cb3371b6c507b3932e33e
Author: Rob Herring <email address hidden>
Date: Tue Sep 18 15:09:31 2012 -0500

    ARM: highbank: retry wfi on reset request

    In some cases, an interrupt can occur and cause a failure to enter
    wfi. This causes reset to hang. Retrying the wfi should be enough to
    prevent reset from hanging.

    Signed-off-by: Rob Herring <email address hidden>
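
As a rough illustration of the commit's reasoning, here is a toy Python model, not kernel code. It assumes the reset only takes effect once the CPU actually enters wfi, and that a pending interrupt makes a wfi attempt return without entering it. The names `wfi`, `restart_once`, and `restart_with_retry` are invented for this sketch, and the iterator of booleans stands in for whether an interrupt is pending at each attempt.

```python
from typing import Iterator


def wfi(interrupt_pending: bool) -> bool:
    """Model one wfi attempt.

    Returns True if the CPU entered wfi, False if a pending interrupt
    made the attempt return immediately.
    """
    return not interrupt_pending


def restart_once(interrupts: Iterator[bool]) -> str:
    """Single wfi attempt, modelling the behaviour before the fix."""
    if wfi(next(interrupts)):
        return "reset"
    return "Reboot failed -- System halted"


def restart_with_retry(interrupts: Iterator[bool]) -> str:
    """Retry wfi until it is entered, modelling the idea behind the fix."""
    while not wfi(next(interrupts)):
        pass  # woken by an interrupt: try again
    return "reset"


if __name__ == "__main__":
    # One interrupt arrives at the first attempt, none at the second.
    print(restart_once(iter([True, False])))
    print(restart_with_retry(iter([True, False])))
```

In this model the single attempt falls through to the halt message when an interrupt is pending, while the retry loop reaches the reset on the next attempt.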

Ike Panhc (ikepanhc) on 2012-10-15
Changed in linux (Ubuntu):
assignee: nobody → Ike Panhc (ikepanhc)
Robie Basak (racb) wrote :

What I've found so far is that this patch seems to affect the problem on Precise, but just changes the failure symptom. I got a message about flushing sda and then a BMC hang instead.

I had to leave this for the time being, so I regret that this report isn't quite complete, but hopefully it will help Ike if he carries on with it. It's pretty easy to reproduce by arranging an initrd to cause an immediate reboot, and then leaving a node running in a reboot loop for a while.

Changed in linux (Ubuntu):
assignee: Ike Panhc (ikepanhc) → Girish Sanenahalli (girish-cs7036)
Adolfo Jayme (fitojb) on 2013-01-05
Changed in linux (Ubuntu):
assignee: Girish Sanenahalli (girish-cs7036) → Ike Panhc (ikepanhc)
Haitao Zhang (minipanda) on 2013-01-10
Changed in linux (Ubuntu):
assignee: Ike Panhc (ikepanhc) → Haitao Zhang (minipanda)
summary: - Reboot sometimes fails on highbank
+ [ARM] Reboot sometimes fails on highbank
Changed in linux (Ubuntu):
assignee: Haitao Zhang (minipanda) → Ike Panhc (ikepanhc)
Ike Panhc (ikepanhc) on 2014-02-21
Changed in linux (Ubuntu):
assignee: Ike Panhc (ikepanhc) → nobody