Pi3 kernel crashes and is unreliable

Bug #1630586 reported by Simon Fels
This bug affects 1 person
Affects                 Status         Importance   Assigned to     Milestone
Snappy                  Fix Released   Undecided    Unassigned
linux-raspi2 (Ubuntu)   Fix Released   Critical     Paolo Pisati

Bug Description

I built an image today with:

 $ sudo ubuntu-image --channel edge -o pi3.img pi3.model
 $ cat pi3.model

type: model
authority-id: 4BKZlf4WMNBKgQfij0rftmp5BzDdVhlf
series: 16
brand-id: 4BKZlf4WMNBKgQfij0rftmp5BzDdVhlf
model: pi3
architecture: armhf
gadget: pi3
kernel: pi2-kernel
timestamp: 2016-09-09T08:27:36+00:00
...

Once flashed to an SD card and booted on the Pi3, it crashes again and again in the same place (__uart_start -> __dabt_svc) without any user interaction. The device can't be used afterwards, as the crash takes down the whole system. I reduced the cable connections to just HDMI and power, but that made no difference.

See attached picture.

Kernel snap is pi2-kernel_15.snap

Tags: patch
Simon Fels (morphis) wrote :
  • syslog (332.0 KiB, application/octet-stream)
Simon Fels (morphis) wrote :
description: updated
Changed in linux-raspi2 (Ubuntu):
importance: Undecided → Critical
description: updated
Paolo Pisati (p-pisati) wrote :

I've reproduced it just once using today's daily image [1]. I flashed the image about 10 times, but got the panic on the screen only once, right before the "Press enter to configure" text should have been printed.

[1] 5 October 2016 image:

flag@harukaze:~/Downloads$ ls -la ubuntu-core-16-pi3.img\(1\)*
-rw-rw-r-- 1 flag flag 621553664 ott 5 18:08 ubuntu-core-16-pi3.img(1)
-rw-rw-r-- 1 flag flag 334496896 ott 5 18:08 ubuntu-core-16-pi3.img(1).xz

flag@harukaze:~/Downloads$ md5sum ubuntu-core-16-pi3.img\(1\)
ebdde9fba284b70e2f7ecd3fbf418549 ubuntu-core-16-pi3.img(1)

Paolo Pisati (p-pisati) wrote :

With the TTL cable unplugged I have a better chance of catching this, but it's still spotty: it took me more than 5 reflashes and reboots to get this (see the attached blurry screenshot).

Do you hit it regularly on first boot? That might be useful for testing a fix.

Here's a short video of the crash: http://people.canonical.com/~ppisati/lp1630586/Crash_rpi3.mp4

Paolo Pisati (p-pisati) wrote :

The stack trace shows a NULL pointer dereference in __uart_start(), which made me think the attached patch would solve it, but the patch alone wasn't enough.

Another peculiarity of the serial port on the rpi3 is that it's a software serial, and to avoid baud rate changes we fixed the core frequency in config.txt:

core_freq=250

With the above line commented out in config.txt and a patched kernel (carrying the aforementioned patch), I can't reproduce this panic anymore. Here's a prebuilt image with these modifications:

http://people.canonical.com/~ppisati/lp1630586/ubuntu-core-16-pi3-lp1630586.img.xz

To get the serial console back, people will have to manually uncomment the core_freq line in config.txt. I'll keep digging into this; in the meantime, it would be nice to have some testing of this image.
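
For anyone testing, the core_freq toggle described above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, the path to config.txt is left to the caller (where the boot partition is mounted is not stated in this report), and GNU sed is assumed for the in-place -i option.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; GNU sed assumed for -i.
# $1 is the path to config.txt on the mounted boot partition (caller-supplied).

# Comment out an active core_freq= line (the workaround tested in this comment).
disable_core_freq() {
    sed -i 's/^core_freq=/#core_freq=/' "$1"
}

# Uncomment a previously commented core_freq= line (brings the fixed
# core frequency, and with it the serial console setup, back).
enable_core_freq() {
    sed -i 's/^#core_freq=/core_freq=/' "$1"
}
```

Calling disable_core_freq with the path to config.txt mirrors the modification in the image above; enable_core_freq reverses it. Lines that do not start with core_freq= are left untouched.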

Simon Fels (morphis) wrote :

@Paolo: Will give this image a try tomorrow morning. Thanks!

tags: added: patch
Paolo Pisati (p-pisati)
Changed in linux-raspi2 (Ubuntu):
assignee: nobody → Paolo Pisati (p-pisati)
Paolo Pisati (p-pisati) wrote :

http://people.canonical.com/~ppisati/lp1630586/pi3-newbsp.img.xz

Here's today's ubuntu-core edge with a new kernel. With this image[*] I was no longer able to reproduce the oops. Can I get some testing?

*: I flashed it more than 10 times and booted it on a raspi3

Federico Gimenez (fgimenez) wrote :

With the image from [1] I wasn't able to reproduce the oops in 10 consecutive boots (logs attached). The image was flashed with:

$ sudo dd if=~/Desktop/pi3-newbsp.img of=/dev/sdc bs=4M oflag=sync status=noxfer

[1] http://people.canonical.com/~ppisati/lp1630586/pi3-newbsp.img.xz
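
When reflashing this many times, it may help to rule out a bad write before blaming the kernel. A minimal sketch of a read-back check follows; verify_flash is a hypothetical helper (not something used in this report), and GNU cmp is assumed for the -n byte limit.

```shell
#!/bin/sh
# Sketch only: verify_flash is a hypothetical helper; GNU cmp assumed for -n.
# Compares the image against the first <image size> bytes of the target,
# since the card is normally larger than the image written to it.
verify_flash() {
    image=$1
    target=$2
    # Arithmetic expansion strips any padding that wc may print.
    size=$(($(wc -c < "$image")))
    cmp -n "$size" "$image" "$target"
}
```

It would be run with the uncompressed image and the device as arguments, for example the pi3-newbsp.img file and /dev/sdc from the dd command above. Reading the raw device usually needs root, and a zero exit status means the compared bytes match.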

Paolo Pisati (p-pisati) wrote :

The fix was committed, and it'll be present in the next Xenial/raspi2 kernel.

Changed in linux-raspi2 (Ubuntu):
status: New → Fix Committed
Michael Vogt (mvo)
Changed in snappy:
status: New → Fix Released
Changed in linux-raspi2 (Ubuntu):
status: Fix Committed → Fix Released