ofono does not start after upgrade (initclt stalls when sending an event)

Bug #1480785 reported by Alfonso Sanchez-Beato
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical System Image
Incomplete
Undecided
Unassigned
upstart (Ubuntu)
Expired
Undecided
Unassigned

Bug Description

Some times, on first boot after either using ubuntu-device-flash or upgrading the phone via OTA, some processes do not start because initctl seems to stall and does not send events that trigger the launch of those processes. For instance, I see in my arale:

$ ps -ef | grep initclt
root 742 563 1 08:25 ? 00:00:00 initctl emit android
root 1826 746 0 08:25 ? 00:00:00 initctl emit rild-ready OFONO_RIL_DEVICE=mtk OFONO_RIL_NUM_SIM_SLOTS=1

and "ofonod" is not started because "rild-ready" is not emitted.

A second boot solves the issue and I do not see any initctl process still executing after some seconds.

This bug can certainly scare people when they see that they do not have telephony just after an OTA.

I have reproduced in

$ system-image-cli -i
current build number: 80
device name: arale
channel: ubuntu-touch/devel-proposed/meizu.en
last update: 2015-08-03 08:24:54
version version: 80
version ubuntu: 20150802
version device: 20150709-8965e37
version custom: 20150716-819-8-42

Bill Filler (bfiller)
Changed in upstart (Ubuntu):
importance: Undecided → Critical
status: New → Confirmed
Changed in canonical-devices-system-image:
milestone: none → ww40-2015
importance: Undecided → Critical
summary: - initclt stalls when sending an event
+ ofono does not start after upgrade (initclt stalls when sending an
+ event)
Revision history for this message
Bill Filler (bfiller) wrote :

I just reproduced this, using ubuntu-device-flash to flash the rc-proposed on arale. My first boot after upgrade I get no ofono running.

Revision history for this message
Bill Filler (bfiller) wrote :

Here is the output of ps -ef | grep ril

phablet@ubuntu-phablet:~$ ps -ef | grep ril
radio 1752 700 0 10:04 ? 00:00:00 /system/bin/rild
root 1769 779 0 10:04 ? 00:00:00 initctl emit rild-ready OFONO_RIL_DEVICE=mtk OFONO_RIL_NUM_SIM_SLOTS=1
phablet 4687 4670 0 10:47 pts/18 00:00:00 grep --color=auto ril

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

From an old comment by sjr

"initctl emit" will not return until all side-effects of the event have completed. In some cases, this can actually block forever.
If you don't want to wait, use "initctl emit --no-wait filesystem"

Changed in canonical-devices-system-image:
status: New → Confirmed
assignee: nobody → Steve Langasek (vorlon)
Revision history for this message
Steve Langasek (vorlon) wrote :

This is not a bug in upstart itself, upstart is behaving by design. If these initctl calls are blocking the boot but should not, then the jobs should be changed to call with the --no-wait argument as you found. If the initctl calls need to block the boot but are failing to return, the bug is in whatever is failing to run to completion.

Revision history for this message
Steve Langasek (vorlon) wrote :

The ofono job on the Ubuntu phone has the following start rule:

  start on started dbus and rild-ready

The rild-ready event will have been emitted as soon as 'initctl emit' is called, but the command won't return until all jobs depending on it have finished. If ofono has not started, the cause will be that the 'started dbus' event has not triggered, rather than that rild hasn't started correctly.

dbus is 'start on filesystem'. The filesystem event happens late in boot and requires all the filesystems to have been successfully mounted. So something is probably blocking one or more filesystem mounts from completing.

Additional information to gather when this problem occurs:

ps awx|grep mountall
initctl list
status dbus

If this problem is consistently reproducible, modifying /etc/init/mountall.conf to run mountall with --verbose and capturing /var/log/upstart/mountall.log would also be helpful.

Changed in canonical-devices-system-image:
status: Confirmed → Incomplete
Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :
Download full text (5.6 KiB)

Got it again, the android event seems stalled too and maybe that's stopping other things:

root@ubuntu-phablet:/home/phablet# ps -ef|grep initctl
root 749 562 0 12:38 ? 00:00:01 initctl emit android
root 1852 753 0 12:38 ? 00:00:00 initctl emit rild-ready OFONO_RIL_DEVICE=mtk OFONO_RIL_NUM_SIM_SLOTS=1
root 3798 3769 0 12:42 pts/9 00:00:00 grep --color=auto initctl

root@ubuntu-phablet:/home/phablet# ps awx|grep mountall
 3786 pts/9 S+ 0:00 grep --color=auto mountall

root@ubuntu-phablet:/home/phablet# initctl list
mountnfs-bootclean.sh start/running
rsyslog start/running, process 697
udev start/running, process 884
upstart-udev-bridge start/running, process 777
whoopsie start/running, process 1554
mountall-net stop/waiting
mtp-state stop/waiting
passwd stop/waiting
rc stop/waiting
tty4 stop/waiting
ureadahead-other stop/waiting
apport start/running
rild start/running, process 753
bluetooth-touch-grouper stop/waiting
click-system-hooks stop/waiting
console-setup stop/waiting
hwclock-save stop/waiting
plymouth-log stop/waiting
tty5 stop/waiting
zram-touch stop/waiting
mountall.sh start/running
urfkill start/running, process 895
procps-instance stop/waiting
rfkill-store stop/waiting
set.pretty-hostname stop/waiting
tethering stop/waiting
ubuntu-location-provider-here-posclientd stop/waiting
failsafe stop/waiting
dbus start/running, process 729
lxc-android-config start/post-start, process 561
 post-start process 562
resolvconf start/running
alsa-utils stop/waiting
bluetooth-touch-flo stop/waiting
lxc-android-boot stop/waiting
mounted-var stop/waiting
plymouth-shutdown stop/waiting
plymouth stop/waiting
system-watchdog stop/waiting
udev-fallback-graphics stop/waiting
ssh start/running, process 1796
checkroot.sh start/running
lxc start/running
network-manager start/running, process 1288
ubuntu-location-service start/running, process 1853
bluetooth-touch stop/waiting
control-alt-delete stop/waiting
factory-wipe stop/waiting
hwclock stop/waiting
mounted-proc stop/waiting
alsa-store stop/waiting
setvtrgb stop/waiting
shutdown stop/waiting
cron start/running, process 1502
lightdm start/running, process 1887
mountkernfs.sh start/running
upstart-local-bridge start/running, process 560
alsa-restore stop/waiting
apport-noui stop/waiting
mountall stop/waiting
mounted-debugfs stop/waiting
ubuntu-location-provider-here-after-wizard stop/waiting
hostname.sh start/running
android-usb-state stop/waiting
console stop/waiting
mounted-run stop/waiting
ofono stop/waiting
bluetooth start/running, process 739
checkfs.sh start/running
checkroot-bootclean.sh start/running
mountnfs.sh start/running
plymouth-stop start/running
ufw start/running
kmod stop/waiting
rcS stop/waiting
ssh-keygen stop/waiting
ubuntu-location-provider-here-slpgwd stop/waiting
wait-for-state stop/waiting
bootmisc.sh start/running
android-tools-adbd start/running, process 2379
bluetooth-touch-maguro stop/waiting
flush-early-job-log stop/waiting
lxc-instance stop/waiting
rc-sysinit stop/waiting
enable-cpu-hotplugging stop/waiting
powerd start/running, process 758
upstart-socket-bridge start/running, process 1438
cgproxy stop/waiting
force-mtp stop/wa...

Read more...

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 1480785] Re: ofono does not start after upgrade (initclt stalls when sending an event)

On Fri, Sep 04, 2015 at 10:44:33AM -0000, Alfonso Sanchez-Beato wrote:
> Got it again, the android event seems stalled too and maybe that's
> stopping other things:

> root@ubuntu-phablet:/home/phablet# ps -ef|grep initctl
> root 749 562 0 12:38 ? 00:00:01 initctl emit android
> root 1852 753 0 12:38 ? 00:00:00 initctl emit rild-ready OFONO_RIL_DEVICE=mtk OFONO_RIL_NUM_SIM_SLOTS=1
> root 3798 3769 0 12:42 pts/9 00:00:00 grep --color=auto initctl

The android event is stalled beacuse the rild-ready event chains from it.

> root@ubuntu-phablet:/home/phablet# ps awx|grep mountall
> 3786 pts/9 S+ 0:00 grep --color=auto mountall

Ok.

> root@ubuntu-phablet:/home/phablet# initctl list
<snip>
> mountall stop/waiting
<snip>
> ofono stop/waiting

Ok, not what I expected to see here. This means that ofono's start has not
been triggered (if it had, we would see start/starting or start/running),
*and* mountall is not running.

We will definitely need some debug logs out of mountall. Can you please do
the following as root:

 mount -orw,remount /
 sed -i -e's/--daemon/& --verbose/' /etc/init/mountall.conf
 mount -oro,remount /

And when you can reproduce the problem after a reboot, attach
/var/log/upstart/mountall.log to this bug?

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

@steve if this only happens on the first boot will the steps in the last comment help or do we need to make a change in the package so it is already set for first boot after flash

Revision history for this message
Steve Langasek (vorlon) wrote :

On Tue, Sep 08, 2015 at 02:02:53PM -0000, Pat McGowan wrote:
> @steve if this only happens on the first boot will the steps in the last
> comment help or do we need to make a change in the package so it is
> already set for first boot after flash

Pat, if this is reproducible after an OTA, you can safely change the file
before the OTA - mountall isn't getting updated, so this file won't get
clobbered after the OTA.

If it's only reproducible on first boot after flashing, then yes you would
need to get the change into the filesystem some other way. I'm not sure
this is a change that we want in production however, so I don't know the
best way to put this change in so that it doesn't accidentally leak into a
stable OTA update.

Changed in canonical-devices-system-image:
milestone: ww40-2015 → backlog
Changed in canonical-devices-system-image:
importance: Critical → Undecided
Changed in upstart (Ubuntu):
status: Confirmed → Incomplete
importance: Critical → Undecided
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for upstart (Ubuntu) because there has been no activity for 60 days.]

Changed in upstart (Ubuntu):
status: Incomplete → Expired
Steve Langasek (vorlon)
Changed in canonical-devices-system-image:
assignee: Steve Langasek (vorlon) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.