Linaro Ubuntu Engineering Builds

Snowball: Android hangs while bootup with both soft/hard boot

Reported by Abhishek Paliwal on 2011-10-12
This bug affects 5 people
Affects          Importance  Assigned to
IglooCommunity   High        Ramesh Chandrasekaran
Linaro Android   Medium      Unassigned
Linaro Ubuntu    Medium      Unassigned

Bug Description

Description:
==========
Rebooting the device causes it to hang before it has booted completely.
How to test this:
Over the serial console, give the command "reboot": the DUT does not boot up completely.
It hangs while booting up.

The hang is intermittent: it is seen in approximately 2 out of 10 reboots.

Last few lines from bootup logs (both times):
--------------Logs---------------------------------------
av8100_hdmi av8100_hdmi.0: HDMI display probed
av8100 0-0070: chip version:2
av8100_hdmi av8100_hdmi.0: Framebuffer created (av8100_hdmi)
--------------Logs---------------------------------------
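To count hangs over many reboot attempts, a host-side sketch may help. It assumes (hypothetically) one captured serial log per boot attempt, and treats a boot as hung when the last non-empty line of its log is the av8100 "Framebuffer created" message quoted above. The function names and the heuristic itself are illustrative only, not part of the original report.

```shell
#!/bin/sh
# Hypothetical heuristic: a boot counts as hung if the last non-empty line
# of its serial log ends with the av8100 framebuffer message from the hang logs.
HANG_MARKER='av8100_hdmi av8100_hdmi.0: Framebuffer created (av8100_hdmi)'

# $1 = serial log of one boot attempt; returns 0 if it looks hung.
is_hung_boot() {
    # Strip serial-console carriage returns and blank lines, keep the last line.
    last=$(tr -d '\r' < "$1" | grep -v '^[[:space:]]*$' | tail -n 1)
    case "$last" in
        *"$HANG_MARKER") return 0 ;;   # a leading kernel timestamp is allowed
        *)               return 1 ;;
    esac
}

# Prints "hung/total" for all logs given.
hang_rate() {
    hung=0
    total=0
    for log in "$@"; do
        total=$((total + 1))
        if is_hung_boot "$log"; then
            hung=$((hung + 1))
        fi
    done
    echo "$hung/$total"
}
```

For example, `hang_rate boot-*.log` (hypothetical file names) prints a figure directly comparable to the "2/10" rate reported here.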

Reproducible:
===========
Yes, 2/10

IMPACT:
========
Bootup Instability.

Steps:
==========
1. Once the device is up and running, give the "reboot" command on the serial console.
Issue: The device does not reboot completely and hangs while booting up.
Expected Behavior: Boot-up should be error free.

Hardware:
==========
Snowball: V5
Other setup:
USB cable (mini B type) connected to Host machine
USB-serial port logging
HDMI-HDMI connection to Acer Monitor.

Software:
==========
https://android-build.linaro.org/builds/~linaro-android/staging-snowball-11.10-release/
build 1

Logs:
======
Serial port logs attached.

Anmar Oueja (anmar) on 2011-10-12
Changed in linaro-landing-team-ste:
status: New → Confirmed

The same issue is also observed sometimes (~4/10 times) when booting the device by pressing the Power ON key.

https://android-build.linaro.org/builds/~linaro-android/landing-snowball-11.10-release/
#build1

Lee Jones (lag) on 2011-10-17
Changed in linaro-landing-team-ste:
importance: Undecided → Wishlist
assignee: nobody → Lee Jones (lag)
importance: Wishlist → Medium
milestone: none → 2011.11

Issue seen very frequently (5/10 times) while booting up the device (via both hard boot and soft boot)
using the following build:
https://android-build.linaro.org/builds/~linaro-android/staging-snowball-11.10-release/#build=6

Issue seen very frequently (via both hard boot and soft boot) on:
https://android-build.linaro.org/builds/~linaro-android/landing-snowball-11.11-release/#build=1

Lee Jones (lag) on 2011-11-23
Changed in linaro-landing-team-ste:
milestone: 2011.11 → none
Anmar Oueja (anmar) wrote :

Confirmed to exist on Ubuntu and Android.

summary: - Snowball: Device hangs while bootup when giving "reboot" command on
- serial console
+ Snowball: Device hangs while bootup with both soft/hard boot

This is affecting the LAVA team, which uses the Snowball PDK.

Anmar Oueja (anmar) on 2011-12-14
Changed in linaro-landing-team-ste:
importance: Medium → Critical
Anmar Oueja (anmar) on 2011-12-14
Changed in linaro-landing-team-ste:
assignee: Lee Jones (lag) → Philippe Langlais (philang)
Philippe Langlais (philang) wrote :

First analysis:

Hardware & Software:
-----------------------------
Snowball: V5 & V11
PRCMU firmware v3.4.0 & v3.4.3
Kernel: stable-*-ux500-3.0 on Igloo Kernel

The kernel trace:
-----------------------
reboot calls machine_restart()->ux500_restart()->prcmu_system_reset()->db8500_prcmu_system_reset(),
which finally calls writel(1, PRCM_APE_SOFTRST); the kernel then freezes here, waiting indefinitely for the PRCMU to issue a software reset.
On an HREF board (PRCMU v3.4.0) with the same kernel, this sequence reboots the board 100% of the time.
On my Snowball V5 (PRCMU v3.4.0) this never works; on my SB V11 (PRCMU v3.4.3, same kernel or the old 2.6.38) it sometimes works, but this is not 100% reproducible.

This is strange behavior. I don't think it's a kernel problem; more probably it is a problem with the PRCMU reset-sequence timing, specific to the Snowball hardware.

To go further, I need to look into the PRCMU firmware and analyse the reset-sequence timing (which requires the schematic to find the right pins) and the PRCMU STM trace. This isn't simple.
Other questions I have:
Is the PRCMU firmware Snowball specific, and where are the sources?
Even if I can find the PRCMU source and rebuild it, how can I reflash it onto the SB?
...

Philippe Langlais (philang) wrote :

After much effort, I succeeded in setting up my STM trace environment on Snowball with a Lauterbach JTAG CombiProbe,
and took traces after issuing a reboot command on the console.

One file contains the PRCMU trace (trace-PRCMU-reboot-pb.log in the attached .tar.bz2).
At the end we see the RESET command sent by the Linux kernel: IT6_A9WD_SFTRST_IN & SOFT_RESET_ACTIVE.
Beyond that, only a specialist can tell whether it's OK.

The second file, Trace_PRCMU_reboot+ftrace.log (in the attached .tar.bz2), contains an STM capture with console, PRCMU and ftrace function traces interleaved.
How to read this trace:
"ssssss.nnnnnnnnn" is the STM timestamp in seconds.nanoseconds.
Lines beginning with [ssssss.nnnnnnnnn 5:00] are PRCMU traces, e.g. "[000113.337969740 5:00]IT6_A9WD_SFTRST_IN"
Lines beginning with [ssssss.nnnnnnnnn 0:00]<d>string are Linux console printk output, e.g. "[000113.337983960 0:00]<0>db8500_prcmu_system_reset after writel(1, PRCM_APE_SOFTRST)"
Lines of the form [ssssss.nnnnnnnnn 0:00]function <-caller are ftrace function traces, e.g. "[000113.315268900 0:00]ux500_restart <-machine_restart+0x58"
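The three line prefixes are regular enough to split the interleaved capture mechanically. A minimal sketch (the function name is hypothetical) that labels each line by its STM channel and prefix, following the formats described here:

```shell
#!/bin/sh
# Label one line of the interleaved STM capture:
#   [ts 5:00]...             -> prcmu   (PRCMU trace)
#   [ts 0:00]<d>...          -> console (Linux printk, <d> = log level digit)
#   [ts 0:00]func <-caller   -> ftrace  (function trace)
classify_stm_line() {
    # Drop any leading whitespace before the timestamp.
    line=${1#"${1%%[![:space:]]*}"}
    case "$line" in
        "["*" 5:00]"*)          echo prcmu ;;
        "["*" 0:00]<"[0-7]">"*) echo console ;;
        "["*" 0:00]"*" <-"*)    echo ftrace ;;
        *)                      echo other ;;
    esac
}
```

For example, to keep only the PRCMU lines: `while IFS= read -r l; do [ "$(classify_stm_line "$l")" = prcmu ] && printf '%s\n' "$l"; done < Trace_PRCMU_reboot+ftrace.log`.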

I can't do any more software tracing; the next step is to analyse the reset pin behavior in parallel with an analyzer.
But before going further, I am waiting for a workaround patch from Robert Marklund based on the use of a watchdog (PRCMU or AB).

Remark:
I have also often noticed that the reset button has no effect on the Snowball board. Is this an already known hardware problem?

This is PRCMU related: we get stuck somewhere in the boot ROM after the reset has happened.

This is a workaround until it's fixed:

# Select bank 0x02, register 0x02 and set a 1 s timeout on the ab8500 wdt
echo 0x02 > /sys/kernel/debug/ab8500/register-bank
echo 0x02 > /sys/kernel/debug/ab8500/register-address
echo 0x01 > /sys/kernel/debug/ab8500/register-value

# Select register 0x01, then set reboot and enable the wdt
echo 0x01 > /sys/kernel/debug/ab8500/register-address
echo 0x11 > /sys/kernel/debug/ab8500/register-value

To use this in place of the normal commands, put it in your .bashrc:
reboot ()
{
  # Select bank 0x02, register 0x02 and set a 1 s timeout on the ab8500 wdt
  echo 0x02 > /sys/kernel/debug/ab8500/register-bank
  echo 0x02 > /sys/kernel/debug/ab8500/register-address
  echo 0x01 > /sys/kernel/debug/ab8500/register-value

  # Select register 0x01, then set reboot and enable the wdt
  echo 0x01 > /sys/kernel/debug/ab8500/register-address
  echo 0x11 > /sys/kernel/debug/ab8500/register-value
}

poweroff ()
{
  # Select bank 0x02, register 0x02 and set a 1 s timeout on the ab8500 wdt
  echo 0x02 > /sys/kernel/debug/ab8500/register-bank
  echo 0x02 > /sys/kernel/debug/ab8500/register-address
  echo 0x01 > /sys/kernel/debug/ab8500/register-value

  # Select register 0x01, then enable the wdt without reboot (0x01 instead of 0x11)
  echo 0x01 > /sys/kernel/debug/ab8500/register-address
  echo 0x01 > /sys/kernel/debug/ab8500/register-value
}
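The reboot() and poweroff() functions above differ only in the final control value (0x11 for reboot, 0x01 for poweroff). A sketch that factors out the shared register sequence; the AB8500_DEBUGFS override and the function names are hypothetical additions, so the sequence can be dry-run against a scratch directory instead of the real debugfs.

```shell
#!/bin/sh
# Hypothetical override; defaults to the debugfs path used above.
AB8500_DEBUGFS=${AB8500_DEBUGFS:-/sys/kernel/debug/ab8500}

# $1 = final wdt control value, written after the 1 s timeout is set.
# Same write order as the original functions.
ab8500_wdt_arm() {
    d=$AB8500_DEBUGFS
    echo 0x02 > "$d/register-bank"    &&  # bank 0x02
    echo 0x02 > "$d/register-address" &&  # timeout register
    echo 0x01 > "$d/register-value"   &&  # 1 s timeout
    echo 0x01 > "$d/register-address" &&  # control register
    echo "$1" > "$d/register-value"       # enable the wdt
}

wdt_reboot()   { ab8500_wdt_arm 0x11; }  # enable + reboot, as in reboot()
wdt_poweroff() { ab8500_wdt_arm 0x01; }  # enable only, as in poweroff()
```

The distinct names avoid shadowing the system reboot/poweroff commands while experimenting; the original functions shadow them deliberately.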

/R

Anmar Oueja (anmar) wrote :

Robert: Can you please add this patch to our IK kernel tree (TB perhaps?). We need it for future images for LAVA integration. Please ping Lee if you need help.

Changed in igloocommunity:
assignee: nobody → Robert Marklund (robert-marklund)
importance: Undecided → Critical
Anmar Oueja (anmar) wrote :

Please don't close this bug, since Robert's fix (#11) is just a workaround and not a real fix.

The patch has been sent to the IGLOO kernel.

/R

Anmar Oueja (anmar) on 2012-01-04
Changed in igloocommunity:
status: New → Fix Committed
milestone: none → 2012.01

On
https://android-build.linaro.org/builds/~linaro-android/landing-snowball/#build=110

the device hangs after the reboot command is given. The serial console shows the following messages:
--------------------
root@android:/ #
root@android:/ # reboot
[ 107.014587] SysRq : Emergency Remount R/O
[ 107.823577] EXT4-fs (mmcblk1p3): re-mounted. Opts: (null)
[ 107.895965] EXT4-fs (mmcblk1p5): re-mounted. Opts: (null)
[ 107.926971] Emergency Remount complete
[ 107.934173] musb-hdrc musb-hdrc: remove, state 1
[ 107.938995] usb usb1: USB disconnect, device number 1
[ 107.944091] usb 1-1: USB disconnect, device number 2
[ 108.019653] musb-hdrc musb-hdrc: USB bus 1 deregistered
[ 108.026275] Restarting system.
-------------

I am not sure if the kernel in that build has the workaround that
Robert Marklund provided. I will ask Mathieu.

Anmar Oueja (anmar) on 2012-01-05
Changed in linaro-landing-team-ste:
assignee: Philippe Langlais (philang) → nobody
no longer affects: linaro-landing-team-ste

I think we are waiting for the fix from PRCMU and should not bother with the workaround.

Anmar Oueja (anmar) wrote :

Robert indicated that a new PRCMU update is required to fix this issue. I will touch base with him this week.

Changed in igloocommunity:
assignee: Robert Marklund (robert-marklund) → Thomas Espersson (espersson)
Thomas Espersson (espersson) wrote :

An error report has been filed inside STE to get a new PRCMU version; no progress yet.

Changed in igloocommunity:
status: Fix Committed → In Progress
Changed in igloocommunity:
assignee: Thomas Espersson (espersson) → Mats Bergström (mats-m-bergstrom)
Anmar Oueja (anmar) wrote :

ChiThu found a workaround that addresses the needs of the LAVA team, hence the lowered priority. We still need the real fix.

Changed in igloocommunity:
importance: Critical → High
Lee Jones (lag) on 2012-01-19
Changed in igloocommunity:
milestone: 2012.01 → 2012.02
Le Chi Thu (le-chi-thu) wrote :

LAVA has implemented the work around.

Changed in linaro-ubuntu:
status: New → Fix Released
Changed in linaro-android:
status: New → Fix Released
Anmar Oueja (anmar) wrote :

I am assigning this to Thomas, since he is chasing the PRCMU changes inside ST-Ericsson.

Changed in igloocommunity:
assignee: Mats Bergström (mats-m-bergstrom) → Thomas Espersson (espersson)
Thomas Espersson (espersson) wrote :

New PRCMU binaries are now available (FIDO 405382).
They will be tested by Mats B (after the USB work).

Changed in igloocommunity:
assignee: Thomas Espersson (espersson) → Mats Bergström (mats-m-bergstrom)
Ricardo Salveti (rsalveti) wrote :

Reopening this for Linaro Ubuntu, as it is not yet properly fixed in our images.

Changed in linaro-ubuntu:
status: Fix Released → In Progress
importance: Undecided → Medium
Anmar Oueja (anmar) wrote :

Mats: Any update on the PRCMU testing?

Anmar Oueja (anmar) on 2012-02-20
Changed in igloocommunity:
milestone: 2012.02 → 2012.03
Changed in linaro-android:
status: Fix Released → New
Changed in igloocommunity:
assignee: Mats Bergström (mats-m-bergstrom) → Jayeeta Bandyopadhyay (jayeeta)
Changed in igloocommunity:
assignee: Jayeeta Bandyopadhyay (jayeeta) → Sunil Kamath (sunil-kamath)
Anmar Oueja (anmar) wrote :

To be tested against the latest PRCMU.

Sunil Kamath (sunil-kamath) wrote :

I am able to reboot every time I enter reboot at the command prompt; it just takes 1 min to reset.

Complete details as below:

I am currently using release 176 for this test.
I am also using startup files created from "tag 5.19" by the security team, and the equivalent ISSW (which doesn't check and verify).
This also contains the 5.19-specific PRCMU binary, which is better than what we have been using so far.

When I give the reboot command, it shows:

root@android:/ # reboot
[ 27.905975] SysRq : Emergency Remount R/O
[ 27.931060] EXT4-fs (mmcblk0p5): re-mounted. Opts: (null)
[ 27.961730] EXT4-fs (mmcblk0p6): re-mounted. Opts: (null)
[ 27.969848] Emergency Remount complete
[ 28.018615] Restarting system.

and then it takes 1 min to restart.

I have tried 5 consecutive reboots with the new binaries and it reboots OK.

I will carry on with further tests and debugging in Bangalore next week.

Sunil Kamath (sunil-kamath) wrote :

I got some conflicting results when trying linaro_android_4.0.3, as sometimes the boot itself is not fine. I previously used landing-snowball.xml; I am now syncing with tracking-snowball.xml and trying the same.

Sunil Kamath (sunil-kamath) wrote :

We have the error posted in https://bugs.launchpad.net/igloocommunity/+bug/934034.
This happens even after taking the expected startup patch.
We are testing this by creating a single ICS eMMC image.

Sunil Kamath (sunil-kamath) wrote :

After using the proper LAMC, we were able to reproduce the issue properly. Debugging is ongoing. We have also sent a request for PRCMU support, which is required for both the graphics and the reset issues.

Sunil Kamath (sunil-kamath) wrote :

In addition to that, I could also see some dependency on USB for a graceful reboot.
Every time during a soft reboot, I can see a crash:

The crash points at the adbd process (an issue during adb_release).

To test this and find the root cause, I tried blocking the process:
setprop persist.service.adb.enable 0

Finally, it still ended up in the USB issue itself:
[ 38.883544] [<c0026a68>] (clk_disable+0x44/0x50) from [<c033ecf8>] (ab8500_usb_phy_disable+0x64/0xec)
[ 38.883575] [<c033ecf8>] (ab8500_usb_phy_disable+0x64/0xec) from [<c033ee98>] (ab8500_usb_phy_disable_work+0x3c/0x50)
[ 38.883605] [<c033ee98>] (ab8500_usb_phy_disable_work+0x3c/0x50) from [<c004f8a4>] (process_one_work+0x138/0x4ac)

Related to this, I can see the following warning just as we reboot:
WARNING: at arch/arm/mach-ux500/clock.c:161 clk_disable+0x44/0x50()

It clearly shows that something on the USB side is blocking a graceful reboot.

I can also see that ab8500_usb_phy_disable gets called even though nothing is connected to USB.
This is not the right design.
It should get called only if something is connected.
Calling clk_disable when the clock is not enabled is not right.

Discussing this with the USB team.

According to them, the same architecture is followed on the 8500 too.
Comparing the code of the two doesn't leave much scope for change, as the code is almost identical between Snowball and U8500.
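To check whether a given build still hits this path, a console log can be scanned for the clk_disable warning together with the backtrace frame linking clk_disable to ab8500_usb_phy_disable. A minimal sketch (the function name is hypothetical); the clock.c line number is matched loosely, since it can differ between kernel versions.

```shell
#!/bin/sh
# Returns 0 if a console log shows the unbalanced clk_disable() warning
# reached from the AB8500 USB PHY disable path, as in the backtrace above.
has_usb_clk_warning() {  # $1 = console log
    grep -q 'WARNING: at arch/arm/mach-ux500/clock\.c:[0-9]* clk_disable' "$1" &&
    grep -q '(clk_disable+[^)]*) from .*(ab8500_usb_phy_disable+' "$1"
}
```

Both patterns are basic regular expressions, so the literal "(" and "+" in the backtrace need no escaping.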

Sunil Kamath (sunil-kamath) wrote :

I have also got new PRCMU binaries:

1) latest 8500 prcmu.
2) 9500 prcmu.

Version details:
PRCMU FW release v2.3.5.2 for 8500, and v4.2.6.0 for 9500.
Compared with v3.4.17 (what we have now), these fix many bugs.

An open question is whether other equivalent boot binaries need to be used as well.

I have got the PRCMU binaries signed by the security team.
Testing is in progress.

Sunil Kamath (sunil-kamath) wrote :

I tried the two latest PRCMU binaries:

PRCMU firmware: U9500, version 2.6
PRCMU firmware: U8500, version 3.5.2

With both, the behavior is the same during a soft reset.

Checking with the USB team about "ab8500_usb_phy_disable", "adb_release" and "clk_disable".
Solving this should ensure a graceful reset.

Changed in igloocommunity:
milestone: 2012.03 → 2012.04
Sunil Kamath (sunil-kamath) wrote :

The adbd crash we were seeing is because adb_release was getting called twice.
When we commented out the code in ab8500_usb_phy_disable_work, it rebooted fine.

Got inputs from USB expert Supriya.

But it's still not clear why adb_release is getting called twice.
Will investigate after the current release.

But we definitely need to use the new boot binaries (tag 5.19).

Fathi Boudra (fboudra) on 2012-03-28
Changed in linaro-ubuntu:
status: In Progress → New
Sunil Kamath (sunil-kamath) wrote :

For ICS, we can clearly see that the reboot depends on a graceful USB exit.

I have opened a new issue for this:
#965926 Graceful reboot is not possible as "adb_release" gets called twice

This issue depends on issue 965926.

Anmar Oueja (anmar) on 2012-04-09
summary: - Snowball: Device hangs while bootup with both soft/hard boot
+ Snowball: Android hangs while bootup with both soft/hard boot
Changed in igloocommunity:
milestone: 2012.04 → 2012.05
Anmar Oueja (anmar) wrote :

To be tested with stable-android-ux500-3.3-1 tomorrow.

Sunil Kamath (sunil-kamath) wrote :

The issue is not expected to be solved by this migration; it depends on the adb_release issue. Analysis of issue 965926 is ongoing. I will be able to check further once 965926 progresses on 3.3-1.

Sunil Kamath (sunil-kamath) wrote :

The investigation of the dependency issue is still ongoing.
This issue depends on both adb and power management.

It is mandatory that we use the latest boot binaries as well.

We will check this once the dependency issue, 965926, gets resolved.
Targeting this release for now.

Sunil Kamath (sunil-kamath) wrote :

Still waiting for updates on the above-mentioned issue; otherwise we will have to push this to the next release.

Changed in igloocommunity:
milestone: 2012.05 → 2012.06
Changed in igloocommunity:
milestone: 2012.06 → 2012.07
Changed in igloocommunity:
assignee: Sunil Kamath (sunil-kamath) → Ramesh Chandrasekaran (ramesh-chandrasekaran)
milestone: 2012.07 → 2012.09
Changed in igloocommunity:
importance: High → Medium
importance: Medium → Low
importance: Low → High

1) At reboot, there is a crash on the USB side.

     Analysis
     ---------
     a) As part of reboot, the composite driver's "composite_unbind" function removes the adb USB configuration (android_config_driver).
     b) After this, as part of the sysfs close, adb_release again tries to remove the same config, which has already been removed.

     Fix
     ----
     In the adb_release code path, remove the adb USB config only when the composite device is connected.

     Yet to get review and comments from the USB team on this.

2) The second issue during reboot is a warning from the clock framework, again for the USB device:

       [ 78.357788] WARNING: at arch/arm/mach-ux500/clock.c:167 clk_disable+0x48/0x50()
       [ 78.374908] usb usb1: USB disconnect, device number 1
       [ 78.382843] musb-hdrc musb-hdrc: USB bus 1 deregistered
       [ 78.391265] Modules linked in: cw1200_core(C) mac80211 gator(O)
       [ 78.397277] [<c001f9cc>] (unwind_backtrace+0x0/0x148) from [<c06933c4>] (dump_stack+0x20/0x24)
       [ 78.409484] Restarting system.
       [ 78.412597] [<c06933c4>] (dump_stack+0x20/0x24) from [<c0036e34>] (warn_slowpath_common+0x64/0x74)
       [ 78.421630] [<c0036e34>] (warn_slowpath_common+0x64/0x74) from [<c0036e70>] (warn_slowpath_null+0x2c/0x34)
       [ 78.431304] [<c0036e70>] (warn_slowpath_null+0x2c/0x34) from [<c002aa88>] (clk_disable+0x48/0x50)
       [ 78.440216] [<c002aa88>] (clk_disable+0x48/0x50) from [<c039b0ac>] (ab8500_usb_phy_disable+0x60/0xc4)
       [ 78.449462] [<c039b0ac>] (ab8500_usb_phy_disable+0x60/0xc4) from [<c039b230>] (ab8500_usb_phy_disable_work+0x3c/0x54)
       [ 78.460266] [<c039b230>] (ab8500_usb_phy_disable_work+0x3c/0x54) from [<c0052504>] (process_one_work+0x138/0x50c)
       [ 78.470581] [<c0052504>] (process_one_work+0x138/0x50c) from [<c0052c90>] (worker_thread+0x1b0/0x488)
       [ 78.479858] [<c0052c90>] (worker_thread+0x1b0/0x488) from [<c005a4f8>] (kthread+0xa4/0xa8)
       [ 78.488189] [<c005a4f8>] (kthread+0xa4/0xa8) from [<c0017a70>] (kernel_thread_exit+0x0/0x8)

     Analysis in progress for this.

3) Finally, as the logs above show, the "Restarting system" message indicates that a restart was triggered. Maybe the PRCMU issue mentioned by Robert Marklund in the posts above is stopping the reboot. I need to check with his workaround and will update here.

The flow goes up to setting PRCM_APE_SOFTRST, after which the PRCMU is supposed to reset all the IPs and then reset the AP.

The workaround given by Robert just enables the main watchdog timer in the AB8500, which later hard-resets the AP9500.

As already discussed, this seems to be a PRCMU issue to me (debugging may be needed on the PRCMU side to tell what happens during reset).

Changed in igloocommunity:
milestone: 2012.09 → none
Botao Sun (botao-sun) wrote :

For the STE Snowball reboot issue on the Linaro Ubuntu Linux image, please refer to, and update related information at:

https://bugs.launchpad.net/linaro-linux-baseline/+bug/1103231

Fathi Boudra (fboudra) on 2013-06-07
Changed in igloocommunity:
status: In Progress → New
status: New → Won't Fix
Changed in linaro-android:
status: New → Won't Fix
Changed in linaro-ubuntu:
status: New → Won't Fix