First trust prompt leads to reboot (arale, midori)

Bug #1551811 reported by Víctor R. Ruiz on 2016-03-01
74
This bug affects 12 people
Affects Status Importance Assigned to Milestone
Canonical System Image
Critical
Unassigned
Mir
Invalid
Undecided
Unassigned
Telegram app
Undecided
Unassigned
arale
Undecided
Unassigned
trust-store
Undecided
Unassigned
mir (Ubuntu)
Critical
Unassigned
qtdeclarative-opensource-src (Ubuntu)
Undecided
Unassigned

Bug Description

Test case.
- Flash the phone.
- Open Telegram app.
- Fill the account details.
- Add a contact (if needed).
- Select a contact to send a message.
- Tap on the clip to attach a photo.
- Select the camera.

Expected result.
- After accepting the permission dialog, camera app allows to take a picture.

Actual result.
- After accepting the permission dialog, phone reboots.

I'm not sure whether this is a problem with Telegram or with the camera. However, I couldn't reproduce the problem using the messaging-app. And once the permission dialog is accepted, the reboot is gone.

com.ubuntu.telegram_2.0.8.0_armhf.click
current build number: 259
device name: arale
channel: ubuntu-touch/rc-proposed/meizu.en

--------------------------

Same happens with the following steps:

1. Flash device (arale) with

$ ubuntu-device-flash touch --device arale --channel ubuntu/rc-proposed/meizu.en --recovery-image <path-to-recovery-with-adb-support> --bootstrap

2. Wait until flash process is done
3. Go through first-start wizard
4. Pass tutorial
5. Directly launch camera-app, approve trust-prompt

Repeat 1-5 multiple times and you will see a frozen device after step 5. You can verify it happened due to a kernel panic mentioned in comment #7.

You can reset the trust db entry for the camera app using :
$rm -rf ~/.local/share/CameraService
$restart cameraservice-trust-stored

Run the following to get the command line for all threads so you can match the pid from the kmsg later (see below):
ps H -eo 'pid tid cmd:250 comm'

To verify it was really the kernel panic run

$ sudo cat /proc/last_kmsg

once the device has rebooted.

Michał Karnicki (karni) wrote :

If the device reboots, I'm fairly sure this shouldn't happen on a "lower" level. Is this better suited for canonical-devices-system-image perhaps?

Víctor R. Ruiz (vrruiz) wrote :

Using image 260, I cannot reproduce this.

Changed in canonical-devices-system-image:
status: New → Invalid
Changed in telegram-app:
status: New → Invalid
Dave Morley (davmor2) wrote :

This happened to me just on a fresh flash of rc-proposed image 260

All I did was open the camera app and when the trust prompt appeared accepted permission for the camera app to use the camera. System rebooted after a short delay.

Changed in canonical-devices-system-image:
status: Invalid → Confirmed
Dave Morley (davmor2) wrote :
Víctor R. Ruiz (vrruiz) wrote :
Dave Morley (davmor2) wrote :
Changed in canonical-devices-system-image:
importance: Undecided → Critical
milestone: none → ww08-2016
Simon Fels (morphis) wrote :

The actual crash happens on the kernel side:

[ 282.818416] Process QSGRenderThread (pid: 7530, stack limit = 0xd47a2248)
[ 282.818436] Stack: (0xd47a3e20 to 0xd47a4000)
[ 282.818456] 3e20: d47a3e44 c08dd0d4 d47a3e44 00000000 c08dd0d4 dec06000 d47a3ec0 d4bdd380
[ 282.818482] 3e40: d47a3e84 d47a3e50 c06feed0 c06d7b24 c1379448 d47a3ec0 d4bdd380 00000000
[ 282.818507] 3e60: 00000000 00000000 00000004 00000009 d47a3ec0 d4bdd380 d47a3eb4 d47a3e88
[ 282.818533] 3e80: c06e1b8c c06fee94 ad19ed48 c0186709 df0787f8 ad19ed48 d4bdd380 ad19ed48
[ 282.818559] 3ea0: c0186709 df0787f8 d47a3eec d47a3eb8 c06cf368 c06e19f4 00000081 b627d7a8
[ 282.818583] 3ec0: 00000009 00000018 ad19ed84 00000004 ad19ed88 00000008 00000000 d8af7f00
[ 282.818609] 3ee0: d47a3efc d47a3ef0 c015265c c06cf2e4 d47a3f74 d47a3f00 c015314c c0152630
[ 282.818635] 3f00: d47a3f1c d47a3f10 c08dd184 c0080bec d47a3f3c d47a3f20 c0080a90 c08dd158
[ 282.818661] 3f20: ffffffff b627d7a8 000007f8 00000001 00000000 00000081 b627d7a8 c015c444
[ 282.818686] 3f40: 00000001 d8af7f00 d47a3f74 00000000 d8af7f00 ad19ed48 c0186709 00000001
[ 282.818711] 3f60: 0000000e 00000000 d47a3fa4 d47a3f78 c0153258 c0152c98 c08dbce8 00000001
[ 282.818736] 3f80: c0186709 00000008 ad19ed88 00000036 c000e2e4 d47a2000 00000000 d47a3fa8
[ 282.818762] 3fa0: c000e0c0 c0153204 c0186709 00000008 0000000e c0186709 ad19ed48 ad19ed30
[ 282.818787] 3fc0: c0186709 00000008 ad19ed88 00000036 ad19f59c b61901a4 ad19f49c 00000000
[ 282.818813] 3fe0: 00000018 ad19ed20 b43d3c8c b43bf6b8 200e0010 0000000e 49191230 785be128
[ 282.818854] [<c06d7b8c>] (PVRSRVLookupHandle+0x74/0xf8) from [<c06feed0>] (PVRSRVBridgeEventObjectOpen+0x48/0x110)
[ 282.818896] [<c06feed0>] (PVRSRVBridgeEventObjectOpen+0x48/0x110) from [<c06e1b8c>] (BridgedDispatchKM+0x1a4/0x210)
[ 282.818936] [<c06e1b8c>] (BridgedDispatchKM+0x1a4/0x210) from [<c06cf368>] (PVRSRV_BridgeDispatchKM+0x90/0xc0)
[ 282.818975] [<c06cf368>] (PVRSRV_BridgeDispatchKM+0x90/0xc0) from [<c015265c>] (vfs_ioctl+0x38/0x4c)
[ 282.819011] [<c015265c>] (vfs_ioctl+0x38/0x4c) from [<c015314c>] (do_vfs_ioctl+0x4c0/0x56c)
[ 282.819045] [<c015314c>] (do_vfs_ioctl+0x4c0/0x56c) from [<c0153258>] (SyS_ioctl+0x60/0x8c)
[ 282.819080] [<c0153258>] (SyS_ioctl+0x60/0x8c) from [<c000e0c0>] (ret_fast_syscall+0x0/0x30)
[ 282.819107] Code: 1a000011 e3550000 e51b3020 0a00000a (e5930004)
[ 282.819128] ---[ end trace acfe48d73450b7ec ]---
[ 282.819144] Kernel panic - not syncing: Fatal exception
[ 287.508653]
[ 287.508653] No log for WDT
[ 287.517300] ipanic_data_to_sd: dump SYS_WQ_LOG failed[-3]
[ 287.517320] ipanic_data_to_sd: dump SYS_MMPROFILE failed[-3]
[ 287.547513] Rebooting in 1 seconds..
[ 288.547462] machine_shutdown: start, Process (QSGRenderThread:7530). father trust-stored-sk:2592. grandfather upstart:202

kevin gunn (kgunn72) wrote :

adding mir b/c that looks like a gpu kernel panic - might be the driver itself, so we might be limited or have to look for a workaround.

@reporters - does anyone know if this behavior is identical or exists on other HW/gpu's ? e.g. mako, frieza, krillin? if not, can someone test?

Changed in mir (Ubuntu):
importance: Undecided → Critical
assignee: nobody → Cemil Azizoglu (cemil-azizoglu)
Changed in canonical-devices-system-image:
assignee: nobody → kevin gunn (kgunn72)
milestone: ww08-2016 → 11
Cemil Azizoglu (cemil-azizoglu) wrote :

I took a look and was able to repro on Arale with the latest rc-proposed but it's difficult.

In my instance of the crash, I did not see the PVR kernel logs reported by Simon in comment #7. (Also, I don't see them in the logs attached by Dave and Victor). How were they obtained?

From the logs I obtained, I saw a couple of lines spewed out by the PVR kernel side :

./logs/logs:Mar 17 17:51:36 ubuntu-phablet kernel: [ 688.646227] PVR_K:(Error): 104: RGXGetDeviceHealthStatus: KCCB for DM2 has not progressed (ROFF=10 WOFF=11) [0, ]
./logs/logs:Mar 17 20:55:38 ubuntu-phablet kernel: [ 2670.497641] PVR_K:(Error): 105: RGXGetDeviceHealthStatus: KCCB for DM0 has not progressed (ROFF=1 WOFF=2) [0, ]

but not when the crash happened. So I don't think they are related.

I did notice a constant stream of the following messages, but they seem to happen even when the camera is seemingly working normally. And sometimes, they don't happen.
...
./logs/logs:Mar 17 21:04:59 ubuntu-phablet kernel: [ 7.331986] [mt-i2c]ERROR,454: id=0,addr: 10, transfer error
./logs/logs:Mar 17 21:04:59 ubuntu-phablet kernel: [ 7.363560] [mt-i2c]ERROR,454: id=0,addr: 10, transfer error
./logs/logs:Mar 17 21:04:59 ubuntu-phablet kernel: [ 7.363701] [mt-i2c]ERROR,454: id=0,addr: 10, transfer error
...

I can investigate the GL (PVR) driver side of things more if I someone can tell me how/when PVRSRV related logs appear and how to collect them?

kevin gunn (kgunn72) wrote :

it's very possible the PVR spew is a red-herring. so let's consider incomplete for the gfx side of things until someone has more to add

Changed in mir (Ubuntu):
status: New → Incomplete
Simon Fels (morphis) wrote :

@Kevin: I've seen it only on arale so far and also Dave says that is the only device he saw this so far on. So suspecting an issue on the device side is valid.

Simon Fels (morphis) wrote :

@Cemil: Also that the syslog's don't show the actual crash I've seen is likely because this is a kernel panic and syslog might not get the kernel output before the device is rebooted. Use something like

$ cat /proc/last_kmsg

on the next boot to get the kernel side output only which is stored in memory and remembered from last boot (not if you turn the device off).

Simon Fels (morphis) wrote :

Just reproduce the crash again. Still caused by the same kernel panic as mentioned in comment 7.

description: updated
Simon Fels (morphis) wrote :

Updated the descriptions with the steps I used to reproduce this.

Changed in mir (Ubuntu):
status: Incomplete → Confirmed
status: Confirmed → New

@Simon, thanks for the clear steps.

I see that the panic is caused by the 'trust-prompt' executable's quick scene graph renderer (QSGRenderThread) thread. It happens in the Camera app too, but easier to repro using the Telegram app.

Updating the description with a quicker method to repro.

description: updated
tags: added: regression-proposed

I (with tvoss' help) created some tools to recreate the scenario in a more controlled setup (e.g. no Qt). I have tested the trusted prompt feature at the mir level, as well as the unity8 level, with Camera app and with other apps. I haven't been able to repro the issue after thousands of iterations.

It involves running the camera unconfined like so :
phablet@ubuntu-phablet:/usr/share/click/preinstalled/com.ubuntu.camera/current$ ./camera-app --desktop_file_hint=dialer-app

and then running the following in a script doing thousands of iterations :
mir_demo_client_prompt_helper_provider.bin -p <camera-pid> -m /run/user/32011/mir_socket_trusted -- --desktop_file_hint=dialer-app

mir_demo_client_prompt_helper_provider.bin sets up a trust session and forks a gl app (simulating the dialog box) that renders 20 frames and exits (so it could be run automatically.

This setup didn't yield any panic or crash.

One thing I haven't tried is using trivial code (no Qt) as the actual TPP - this way the timing that TPP cutting in right before Camera starts to render can be recreated. Whereas the above case requires the Camera app to have been launched and rendering already. This involves modifying the lp:trust-store code that I'm not familiar with, but with some help, am willing to give it a try.

I 've tested with a trivial TPP (one that doesn't use Qt, that only connects to and disconnects from U8) and wasn't able to repro the issue. Thanks to Thomas for his help. As I said before, the camera driver is a bit flakey - I get errors like

[ 933.537054] [mt-i2c]ERROR,454: id=0,addr: 50, transfer error
[ 933.537093] [mt-i2c]ERROR,460: I2C_ACKERR

or of the following sort,

[ 1006.938770] inv_mpu: error to reset fifo
[ 1007.055090] inv_mpu: error in header parsing=3c00

but the camera preview still seems to work, though the startup delay varies slightly due to these.

This seems to indicate Qt Quick Scenegraph Renderer is doing something bogus. I guess we could also try a TPP which sets up EGL only to glClear the surface to be 100% sure.

Also, Alberto told me that he saw this issue on N4 while doing some unrelated release testing, though he couldn't repro it later after trying 20 times or so.

Changed in mir (Ubuntu):
assignee: Cemil Azizoglu (cemil-azizoglu) → nobody
status: New → Invalid
Changed in mir:
status: New → Incomplete
status: Incomplete → Invalid

As the last step to isolate the issue, I've created a trivial GL-based TPP and was NOT able to repro after trying many times. I assume at this point that the Qt QuickSceneGraph Renderer the actual TPP uses is doing something funny.

I am off the next couple of days and if anyone wants to try the TPP is in silo 80 (https://requests.ci-train.ubuntu.com/#/ticket/1187). Please use the packages with the '15.04.20160330-0ubuntu1' tag as 'citrain device upgrade' picks up the older versions, probably because not all 'arch'es built successfully the last time.

At this time, I'll remove myself and Mir from the bug, as I am not familiar with the QSG.

Thomas Voß (thomas-voss) wrote :

@Cemil: Thanks for the investigation. With Cemil's findings, I'm marking the bug as invalid for trust-store, marking Qt (Quick) as affected instead.

Changed in trust-store:
status: New → Invalid
Pat McGowan (pat-mcgowan) wrote :

best to fix in the kernel

Changed in canonical-devices-system-image:
assignee: kevin gunn (kgunn72) → Yuan-Chen Cheng (ycheng-twn)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qtdeclarative-opensource-src (Ubuntu):
status: New → Confirmed
Selene ToyKeeper (toykeeper) wrote :

Not sure if it helps, but I encountered this using a slightly different sequence of steps in bug 1569660: Install uTorch, run it, then try to grant it permission to access camera-service. I get a kernel panic every time, and am never able to grant access. This was with arale rc-proposed 304.

Michael Terry (mterry) wrote :

Same here. On my arale rc-proposed 315 image, I install Authenticator, run it, click the camera icon, grant it permission, and get a reboot. I got it twice in a row, then it worked the third time... (that time I was plugged into adb, so maybe that makes a difference?) Same kernel stack trace as comment #7 when it did happen though.

Víctor R. Ruiz (vrruiz) wrote :

Still happening

current build number: 322
device name: arale
channel: ubuntu-touch/rc-proposed/meizu.en

Changed in canonical-devices-system-image:
milestone: 11 → 12
Pat McGowan (pat-mcgowan) wrote :

This will block midori rc

summary: - Opening the camera for the first time leads to reboot
+ First trust prompt leads to reboot (arale, midori)
no longer affects: camera-app
Changed in arale:
status: New → Confirmed
Changed in midori:
status: New → Confirmed
gue5t gue5t (gue5t) wrote :

Midori is a web browser with no phone-specific code and does nothing that could effect wider system state or cause reboots. I don't know what "midori rc" is. I'm removing this bug from the Midori bug tracker.

no longer affects: midori
no longer affects: midori
Changed in canonical-devices-system-image:
milestone: 12 → 12.1
Changed in canonical-devices-system-image:
milestone: fi → backlog
Changed in canonical-devices-system-image:
assignee: Yuan-Chen Cheng (ycheng-twn) → nobody
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments