Watchdog is too aggressive, can lead to unusable device
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical System Image | | Critical | Alejandro J. Cura | |
| upstart-watchdog (Ubuntu) | | Critical | dobey | |
Bug Description
upstart-watchdog reboots the phone if a process is restarting repeatedly.
If apport is disabled (as proposed for retail devices), a process that encounters a problem may restart repeatedly, causing the watchdog to reboot the phone. If the underlying condition persists, the same issue can recur the next time the phone starts.
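To make the failure mode concrete, here is a minimal Python sketch of a naive policy: request a reboot whenever a job respawns more than `max_respawns` times within `window_s` seconds. The class name, thresholds, and crash interval are all hypothetical; this report does not say how upstart-watchdog actually detects repeated restarts. Because the respawn history is kept in memory, each boot starts fresh, so a service that crashes on every start triggers a reboot on every boot.

```python
from collections import deque


class NaiveWatchdog:
    """Hypothetical policy: request a reboot when a job respawns
    more than `max_respawns` times within `window_s` seconds."""

    def __init__(self, max_respawns: int = 3, window_s: float = 60.0) -> None:
        self.max_respawns = max_respawns
        self.window_s = window_s
        self.respawns = deque()  # timestamps of recent respawns, oldest first

    def on_respawn(self, now: float) -> bool:
        """Record a respawn at time `now`; return True if a reboot is requested."""
        self.respawns.append(now)
        # Forget respawns that fell out of the sliding window
        # (the deque is never empty here, since `now` was just appended).
        while now - self.respawns[0] > self.window_s:
            self.respawns.popleft()
        return len(self.respawns) > self.max_respawns


def count_watchdog_reboots(boots: int, crash_interval_s: float = 2.0) -> int:
    """Simulate `boots` boots of a device whose service crashes on every start."""
    reboots = 0
    for _ in range(boots):
        watchdog = NaiveWatchdog()  # in-memory state does not survive a reboot
        t = 0.0
        while True:
            t += crash_interval_s  # the service crashes and is respawned
            if watchdog.on_respawn(t):
                reboots += 1
                break
    return reboots


if __name__ == "__main__":
    # Every simulated boot ends in a watchdog reboot: a boot loop.
    print(count_watchdog_reboots(5))  # 5
```

Nothing in this policy remembers that the previous boot also ended in a watchdog reboot; that missing memory is what turns one persistent crash into a boot loop.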
I hit this in bug 1498080, where a process (mediascanner) was crashing.
My phone was then stuck in a boot loop, with the watchdog rebooting it because of repeated mediascanner crashes.
We should probably re-think this.
This solution was implemented to address bug #1394350.
The original idea was to include a check on the number of reboots and go into recovery.
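The reboot-count idea can be sketched as follows, assuming a counter file on persistent storage that is cleared once a boot is considered healthy. The file location, the limit of three, the function names, and the "healthy boot" signal are all hypothetical; the report does not specify them.

```python
import tempfile
from pathlib import Path

MAX_WATCHDOG_REBOOTS = 3  # hypothetical limit before falling back to recovery


def next_action(counter_file: Path) -> str:
    """Called when the watchdog would reboot; returns "reboot" or "recovery".

    The counter is kept on disk so it survives the reboot it triggers."""
    count = int(counter_file.read_text()) if counter_file.exists() else 0
    count += 1
    counter_file.write_text(str(count))
    return "recovery" if count > MAX_WATCHDOG_REBOOTS else "reboot"


def mark_boot_healthy(counter_file: Path) -> None:
    """Reset the counter once the device has stayed up long enough (hypothetical signal)."""
    counter_file.unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        counter = Path(d) / "watchdog-reboots"
        print([next_action(counter) for _ in range(5)])
        # ['reboot', 'reboot', 'reboot', 'recovery', 'recovery']
```

The key design point is that the decision depends on state that outlives a single boot; without it, every boot looks like the first one.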
Related branches
- Mathieu Trudel-Lapierre: Approve on 2015-10-05
- Alejandro J. Cura (community): Approve on 2015-10-05

Diff: 100 lines (+56/-19), 3 files modified:
- debian/changelog (+7/-0)
- debian/session-watchdog.conf (+26/-11)
- debian/upstart-watchdog.system-watchdog.upstart (+23/-8)
| description: | updated |
| Changed in canonical-devices-system-image: | |
| assignee: | nobody → Pat McGowan (pat-mcgowan) |
| importance: | Undecided → Critical |
| milestone: | none → ww46-2015 |
| status: | New → Confirmed |
| Changed in canonical-devices-system-image: | |
| milestone: | ww46-2015 → ww40-2015 |
| Pat McGowan (pat-mcgowan) wrote : | #1 |
| tags: | added: hotfix |
| Pat McGowan (pat-mcgowan) wrote : | #2 |
In the short term, let's not include this package. Please check with phonedations before disabling it.
| Changed in upstart-watchdog (Ubuntu): | |
| assignee: | nobody → Łukasz Zemczak (sil2100) |
| Łukasz Zemczak (sil2100) wrote : | #3 |
I have the seed changes prepared for upload. Are we sure we want to remove upstart-watchdog from the images? I suppose it does protect us from certain issues, like the device hanging up completely, right? Anyway, waiting for the final +1.
| Pat McGowan (pat-mcgowan) wrote : | #4 |
The short-term decision is to disable the session watchdog and leave the system watchdog in place,
then work toward a more suitable solution.
| Łukasz Zemczak (sil2100) wrote : | #5 |
I have a package of upstart-watchdog ready that has the session watchdog disabled, waiting for QA to confirm that this does fix the particular issue at hand.
| Jean-Baptiste Lallement (jibel) wrote : | #6 |
This is a problem on krillin/
The phone reboots after a few minutes then enters a boot loop. Logs attached.
| Changed in upstart-watchdog (Ubuntu): | |
| importance: | Undecided → Critical |
| Changed in canonical-devices-system-image: | |
| status: | Confirmed → Triaged |
| Changed in upstart-watchdog (Ubuntu): | |
| status: | New → Confirmed |
| Jean-Baptiste Lallement (jibel) wrote : | #7 |
The log indicates that the system watchdog is causing the boot loop on devel-proposed:
Oct 2 07:28:51 ubuntu-phablet watchdog: 'ubuntu-
Moreover, it is rebooting the system for a service that is important but not vital to the phone's critical features. In this case there would be no location, but the dialler and data should still work.
I think we should consider the option of disabling the system watchdog too.
| Jean-Baptiste Lallement (jibel) wrote : | #8 |
On devel-proposed I disabled both watchdogs: the service crashes, but the shell starts and, for example, I can make a call and networking is up.
Here is what the watchdog noticed, and why it would have rebooted had it not been disabled:
Oct 2 08:05:47 ubuntu-phablet watchdog: 'ubuntu-
Oct 2 08:05:48 ubuntu-phablet watchdog: 'ubuntu-
Oct 2 08:05:48 ubuntu-phablet watchdog: 'ubuntu-
Oct 2 08:06:11 ubuntu-phablet watchdog: 'ubuntu-
Even a system service crash that makes the device unusable is no worse than rebooting it in a loop.
| Changed in canonical-devices-system-image: | |
| assignee: | Pat McGowan (pat-mcgowan) → Alejandro J. Cura (alecu) |
| Pat McGowan (pat-mcgowan) wrote : | #9 |
We are working on limiting the watchdog to one reboot; with that in place, we can choose to keep watching both session and system services.
I will note, although without specifics, that:
- users have reported spontaneous reboots of their phones
- users have reported boot loops that were not explained
| Changed in upstart-watchdog (Ubuntu): | |
| assignee: | Łukasz Zemczak (sil2100) → Rodney Dawes (dobey) |
| Changed in upstart-watchdog (Ubuntu): | |
| status: | Confirmed → In Progress |
| Launchpad Janitor (janitor) wrote : | #10 |
This bug was fixed in the package upstart-watchdog - 0.4
---------------
upstart-watchdog (0.4) wily; urgency=medium
* Don't trigger reboot if reboot was triggered in last hour.
(LP: #1498133)
-- Rodney Dawes <email address hidden> Mon, 05 Oct 2015 15:22:53 -0400
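The 0.4 rule ("Don't trigger reboot if reboot was triggered in last hour") can be sketched as a timestamp check. This is a minimal Python illustration, not the packaged implementation; the stamp-file location and the `should_reboot` name are hypothetical.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional

REBOOT_COOLDOWN_S = 60 * 60  # "last hour", per the 0.4 changelog entry


def should_reboot(stamp: Path, now: Optional[float] = None) -> bool:
    """Return True (and record the time) unless a watchdog reboot
    was already triggered within the last hour.

    `stamp` is a hypothetical file holding the wall-clock time of the last
    watchdog-triggered reboot; it must live on persistent storage."""
    now = time.time() if now is None else now
    if stamp.exists():
        last = float(stamp.read_text())
        if now - last < REBOOT_COOLDOWN_S:
            return False  # rebooted recently: keep the device up instead of looping
    stamp.write_text(str(now))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        stamp = Path(d) / "last-watchdog-reboot"
        print(should_reboot(stamp, now=0.0))     # True: first reboot allowed
        print(should_reboot(stamp, now=1800.0))  # False: within the hour
```

The comparison uses wall-clock time because a monotonic clock restarts at every boot. If the clock jumps backwards after a reboot, the difference goes negative and the reboot is suppressed, which errs on the side of leaving the device usable.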
| Changed in upstart-watchdog (Ubuntu): | |
| status: | In Progress → Fix Released |
| Changed in canonical-devices-system-image: | |
| status: | Triaged → Fix Committed |
| Changed in canonical-devices-system-image: | |
| status: | Fix Committed → Fix Released |


See also bug #1381075: "upstart should report applications that hit respawn limit to errors.ubuntu.com".