Mir

[usc] Mir gives up and shuts down due to input with multimonitor qtmir (std::exception::what: Failure sending input event)

Bug #1496069 reported by Gerry Boland on 2015-09-15
This bug affects 2 people
Affects        Status         Importance   Assigned to       Milestone
Mir            Fix Released   Critical     Alberto Aguirre   0.17.0
mir (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

I'm implementing proper multimonitor support in QtMir in this branch:
lp:~gerboland/qtmir/multimonitor/

Please build & install on device. You don't need to change unity8. The immediate goal is for unity8 to remain on the tablet screen, and have nothing (well a black screen) drawn on the external monitor.

This is mostly happening, but I'm experiencing problems related to input.

Steps to repro:
1. When you plug in the external monitor, nothing will appear on it (for later), but unity8 should still be interactive on the N7
    - unity8 will flicker when the display config changes, due to it tearing down the old gl context and creating a new one.
    - unity8 is not getting any touch input events. I checked that USC is not sending it any. [1]
    - however the vol up/down key does cause the volume notification to appear.

2. Now unplug the external monitor.
    - *sometimes* unity8 works fine, gets input events again just fine and works great.
    - *sometimes* unity8 crashes as soon as you tap on the screen. It crashes because USC shuts down [2]! Relevant failure message:

ERROR: /home/gerry/dev/projects/mir/mir-0.15/src/server/input/android/input_sender.cpp(218): Throw in function void mir::input::android::InputSender::ActiveTransfer::send(mir::input::android::InputSendEntry&&)
Dynamic exception type: N5boost16exception_detail10clone_implINS0_19error_info_injectorISt13runtime_errorEEEE
std::exception::what: Failure sending input event :
9, "Bad file descriptor"

Something has gone screwy with input, then.
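
For readers unfamiliar with the error code: errno 9 is EBADF, which a send reports when the descriptor it is given is no longer open. The following is a minimal sketch of that situation, not Mir code; the helper name `errno_from_send_on_closed_fd` is made up for illustration, and it assumes a Linux system (it uses `MSG_NOSIGNAL`).

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of Mir): sends one byte on a socket whose
// descriptor has already been closed, and returns the errno that results.
int errno_from_send_on_closed_fd()
{
    int fds[2];
    int const rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    assert(rc == 0);
    (void)rc;

    int const stale_fd = fds[0];
    close(fds[0]);  // the channel goes away...

    char const byte = 'x';
    errno = 0;
    // ...but the sender still holds the old descriptor number and uses it.
    ssize_t const sent = send(stale_fd, &byte, 1, MSG_NOSIGNAL);
    int const err = (sent < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

On Linux the helper returns EBADF, the same "Bad file descriptor" seen in the exception text. Whether the input channel in this bug was closed in this way is not established by the log; the sketch only shows what the error code means.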

Perhaps QtMir needs to do something special to its input receiver when the multimonitor situation changes and its surface is destroyed & recreated??
Guidance appreciated
-G

[1] http://pastebin.ubuntu.com/12418660/ - is USC log with input reporting enabled. Started with 1 monitor, then was 2, then 1 again. Observe while in multimonitor case, no input events are "Published" and "Received"
[2] http://pastebin.ubuntu.com/12419006/ - USC backtrace when it shuts down

Gerry Boland (gerboland) wrote :

Am using Nexus7 & a slimport cable to reproduce. Am working on the rc-proposed image.

Changed in mir:
assignee: nobody → Alberto Aguirre (albaguirre)
summary: - [usc] crash due to input with multimonitor qtmir
+ [usc] Mir gives up and shuts down due to input with multimonitor qtmir
+ (std::exception::what: Failure sending input event)
Changed in mir:
importance: Undecided → High
milestone: none → 0.17.0
Changed in mir:
status: New → In Progress
kevin gunn (kgunn72) on 2015-09-16
Changed in mir:
importance: High → Critical
Gerry Boland (gerboland) wrote :

I did some mucking around, it might help.

I was concentrating on the Nexus7. Due to the multimonitor group buffer post design of Android, QtMir currently crashes with my multimonitor branch if it runs as host server with multiple displays. But it works fine as a nested server.

So I wanted to check if this case works ok. Here's how you can reproduce it:
0. Have my multimonitor branch installed.
1. Push the demos/qml-demo-shell directory of my multimonitor branch to the device. Install mir-demos.
2. ssh into the device, and do:
        sudo stop lightdm
3. Turn the backlight back on:
        echo 255 > /sys/devices/platform/msm_fb.591617/leds/lcd-backlight/brightness
4. Start a host mir server & share the socket:
        sudo mir_proving_server --display-config=sidebyside
        sudo chmod 777 /tmp/mir_socket
5. In another terminal on the device, launch the QML demo:
        export MIR_SOCKET=/tmp/mir_socket
        export QT_QPA_PLATFORM=mirserver
        qmlscene qml-demo-shell.qml
You should see the unity logo on a mild gradient. Tapping the logo makes it spin, wheeee.

6. Now plug in external monitor. You should see a second copy of the unity logo on a mild gradient. This is actually a separate scene, it's not a clone. Tap the logo, one will spin but the other won't.

Now I find that tapping the unity logo is not reliable: random tapping occasionally sets the logo going, but usually not.

To see the input events Qt is getting, run

      QT_LOGGING_RULES='qtmir.*=true' qmlscene qml-demo-shell.qml

I've noticed the coordinates of the input events Qt is printing are off, so that's probably QtMir's fault and I'll fix it.

** But there are also portions of the N7 screen I tap where Qt isn't getting input at all. Right now the bottom right of the shell on the tablet is not getting any. **

Input seems to work better if I start the QML demo with external display connected, then unplug, then replug.

Gerry Boland (gerboland) wrote :

Now considering the case that most accurately reflects this bug, where shell/unity8 is not going to draw on the external display: just do

qmlscene Shell.qml

The external display will be black. But interacting with the single shell is fine. (rotation animation double speed, unsure why).

Now bring back unity-system-compositor:
    sudo start lightdm
    stop unity8
    export MIR_SOCKET=/run/mir_socket
    QT_LOGGING_RULES='qtmir.*=true' MIR_SERVER_NAME=session-0 qmlscene Shell.qml
On plugging in the external display, QML is not getting input events any more. If I unplug, it does get input events, but something gets confused about the target window size: events in some portions of the screen are rejected.

I did get a rendering hang in this case though too, this being the culprit:

Thread 2 (Thread 0xaf897450 (LWP 9247)):
#0 0xffffffff in __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1 0xffffffff in __pthread_cond_wait (cond=0x1ab0c98, mutex=0x1ab0c80) at pthread_cond_wait.c:186
#2 0xffffffff in std::condition_variable::wait(std::unique_lock<std::mutex>&) () at /usr/lib/arm-linux-gnueabihf/libstdc++.so.6
#3 0xffffffff in MirWaitHandle::wait_for_all() (__p=..., __lock=..., this=0x1ab0c98) at /usr/include/c++/4.9/condition_variable:98
#4 0xffffffff in MirWaitHandle::wait_for_all() (this=0x1ab0c80) at /build/mir-GYyRoo/mir-0.15.1+15.04.20150903/src/client/mir_wait_handle.cpp:53
#5 0xffffffff in mir::client::BufferStream::request_and_wait_for_next_buffer() (this=0x1ab0b40) at /build/mir-GYyRoo/mir-0.15.1+15.04.20150903/src/client/buffer_stream.cpp:316
#6 0xffffffff in mir::client::android::EGLNativeSurfaceInterpreter::driver_returns_buffer(ANativeWindowBuffer*, int) (this=0x1aab0ac, fence_fd=<optimized out>) at /build/mir-GYyRoo/mir-0.15.1+15.04.20150903/src/platforms/android/client/egl_native_surface_interpreter.cpp:53
#7 0xffffffff in mir::graphics::android::MirNativeWindow::queueBuffer(ANativeWindowBuffer*, int) (this=<optimized out>, buffer=<optimized out>, fence=<optimized out>) at /build/mir-GYyRoo/mir-0.15.1+15.04.20150903/src/common/graphics/android/mir_native_window.cpp:212
#8 0xffffffff in ()

Gerry Boland (gerboland) wrote :

Can also get mir_proving_server to fail with:

0. USC stopped
1. external display unplugged
2. sudo mir_proving_server --display-config=sidebyside
3. QT_LOGGING_RULES='qtmir.*=true' qmlscene Shell.qml
4. tap unity logo to start it spinning
5. plug in display (unity logo spins x2 speed??)
6. unplug display
crash!

Daniel van Vugt (vanvugt) wrote :

Side note: We have a script in mir-utils to fix the backlight like you do. Just run 'mirbacklight' or 'mirbacklight <percentage>' (note: arale is odd and needs 99, not 100).

Daniel van Vugt (vanvugt) wrote :

std::exception::what: Failure sending input event :
9, "Bad file descriptor"

Could that be related to the fd exhaustion of bug 1495871?

Daniel van Vugt (vanvugt) wrote :

"unity logo spins x2 speed": That generally means you're not using user_id correctly when calling compositor_acquire. Each monitor needs to provide a separate user_id so that the client's frames are only consumed at normal speed (the speed of the fastest monitor).

Gerry Boland (gerboland) wrote :

@vanvugt - the thing is, the unity logo is being drawn & animated by the nested server itself, it's not a client surface being composited! I.e. it is USC/Mir calling compositor_acquire.

I expected the nested server (i.e. Qt's renderer) to be throttled to 60fps by eglSwapBuffers as per usual.

Gerry Boland (gerboland) wrote :

> std::exception::what: Failure sending input event :
> 9, "Bad file descriptor"
>
> Could that be related to the fd exhaustion of bug 1495871?

Good idea, checked it, but not the culprit.

Daniel van Vugt (vanvugt) wrote :

QtMir is throttled to 60 FPS. But if you're somehow calling compositor_acquire(SOME_USER_ID) twice per frame, then your clients of QtMir are going to get 120FPS (not evenly spaced but groups of two frames with 16ms between groups).
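
To make the user_id point concrete, here is a toy model of the behaviour described above. It is a sketch under assumptions, not Mir's buffer queue: `FrameQueue`, `UserId` (an int here) and `last_frame_after` are hypothetical names, and the rule "a frame stays current until a user that has already seen it asks again" is a simplified reading of the comment.

```cpp
#include <cassert>
#include <set>

// Hypothetical model, not Mir's implementation.
using UserId = int;

class FrameQueue
{
public:
    // Returns the index of the client frame handed to this compositor.
    int compositor_acquire(UserId user_id)
    {
        if (seen_by.count(user_id) != 0)
        {
            // This user already had the current frame: consume the next one.
            ++current_frame;
            seen_by.clear();
        }
        seen_by.insert(user_id);
        return current_frame;
    }

private:
    int current_frame = 0;
    std::set<UserId> seen_by;
};

// Simulates two monitors that each composite once per refresh and returns
// the index of the last client frame handed out.
int last_frame_after(int refreshes, bool shared_user_id)
{
    assert(refreshes > 0);
    UserId const first_monitor = 1;
    UserId const second_monitor = shared_user_id ? 1 : 2;

    FrameQueue queue;
    int last = 0;
    for (int i = 0; i < refreshes; ++i)
    {
        queue.compositor_acquire(first_monitor);
        last = queue.compositor_acquire(second_monitor);
    }
    return last;
}
```

In this model, 60 refreshes with separate IDs hand out frames 0 to 59, while 60 refreshes with a shared ID hand out frames 0 to 119, i.e. twice as many client frames in the same time.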

Gerry Boland (gerboland) wrote :

@vanvugt as I said before, there is *no* client of QtMir in this case. It appears either QtMir is going at 120fps, or something is confusing Qt's animation system into doubling its perceived frame rate. Anyway, that's not the main problem in this bug; I will investigate it myself.

Alberto Aguirre (albaguirre) wrote :

It seems that only touch is affected, key input is still delivered correctly.

The root cause is that surfaces are marked occluded in a side-by-side configuration (unless they overlap the two monitors) which then get excluded when calling scene->for_each.

I'm looking into why they are marked occluded in a side-by-side configuration.

Daniel van Vugt (vanvugt) wrote :

If you're using mir_proving_server (or in general any server with the built-in MultiThreadedCompositor) then you may find the answer in: --compositor-report=log (or env MIR_SERVER_COMPOSITOR_REPORT=log)

That will show the frame rate of the individual (per-display) compositors.

Alberto Aguirre (albaguirre) wrote :

So back to the input problem, it's an issue with how we composite with DisplayGroups. We use a single compositor ID, which then makes the RenderingTracker toggle surface visibility on/off.

The RenderingTracker is intended to track if a surface is occluded in all monitors to properly update the surface visibility status. However, since a single compositor ID is used, the RenderingTracker then incorrectly determines the surface is not visible anywhere after the second composition pass.
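
A rough sketch of why the shared ID matters, assuming the tracker hides a surface once every known compositor has reported it occluded. This is not Mir's RenderingTracker: `OcclusionTracker`, `CompositorId` and the method names are hypothetical and only illustrate the reasoning.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical, simplified stand-in; not Mir's RenderingTracker.
using CompositorId = int;

class OcclusionTracker
{
public:
    explicit OcclusionTracker(std::set<CompositorId> compositors)
        : all_compositors{std::move(compositors)}
    {
    }

    // A compositor drew the surface on its display.
    void rendered_in(CompositorId id)
    {
        assert(all_compositors.count(id) != 0);
        occluded_for.erase(id);
    }

    // A compositor skipped the surface (occluded or off its display).
    void occluded_in(CompositorId id)
    {
        assert(all_compositors.count(id) != 0);
        occluded_for.insert(id);
    }

    // Hidden only once every known compositor has reported it occluded.
    bool visible() const
    {
        return occluded_for != all_compositors;
    }

private:
    std::set<CompositorId> all_compositors;
    std::set<CompositorId> occluded_for;
};
```

With one shared ID, a pass for the tablet display calls `rendered_in(0)` and the pass for the external display then calls `occluded_in(0)`, so the surface ends up reported as not visible. With IDs 0 and 1, the same two passes leave it visible, because only compositor 1 reported it occluded.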

Alberto Aguirre (albaguirre) wrote :

I've separated out the various issues I encountered while debugging this, so I'll leave this bug to be just about:

"std::exception::what: Failure sending input event :
9, "Bad file descriptor"
"

Input not dispatched:
https://bugs.launchpad.net/mir/+bug/1498045

Wrong input coordinates with nested:
https://bugs.launchpad.net/mir/+bug/1498540

Exception during hwc set when unplugging external monitor
https://bugs.launchpad.net/mir/+bug/1498550

Hang due to mismatch between number of buffers between client and server (workaround in mir 0.16)
https://bugs.launchpad.net/mir/+bug/1441553

Hang in nested server when plugging external display:
https://bugs.launchpad.net/mir/+bug/1498571

Alberto Aguirre (albaguirre) wrote :

So after fixing
https://bugs.launchpad.net/mir/+bug/1498045

I cannot replicate this crash anymore... so I'll consider this one indirectly fixed.

"
ERROR: /home/gerry/dev/projects/mir/mir-0.15/src/server/input/android/input_sender.cpp(218): Throw in function void mir::input::android::InputSender::ActiveTransfer::send(mir::input::android::InputSendEntry&&)
Dynamic exception type: N5boost16exception_detail10clone_implINS0_19error_info_injectorISt13runtime_errorEEEE
std::exception::what: Failure sending input event :
9, "Bad file descriptor"
"

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → Fix Released
Changed in mir (Ubuntu):
status: New → Fix Released
Changed in mir (Ubuntu):
status: Fix Released → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.17.0+15.10.20151008.2-0ubuntu1

---------------
mir (0.17.0+15.10.20151008.2-0ubuntu1) wily; urgency=medium

  [ Alexandros Frantzis ]
  * New upstream release 0.17.0 (https://launchpad.net/mir/+milestone/0.17.0)
    - ABI summary: Only servers and graphics drivers need rebuilding;
      . Mirclient ABI unchanged at 9
      . Mirserver ABI bumped to 35
      . Mircommon ABI unchanged at 5
      . Mirplatform ABI bumped to 11
      . Mirprotobuf ABI bumped to 3
      . Mirplatformgraphics ABI bumped to 6
      . Mirclientplatform ABI unchanged at 3
    - Enhancements:
      . Introduce libmircookie, a simple mechanism for a group of cooperating
        processes to hand out and verify difficult-to-forge timestamps to
        untrusted 3rd parties.
      . More refactorings to support renderers other than GL.
      . Add MirBlob to the client API - a tool for serializing and
        deserializing data.
      . Introduce a libinput based input platform, not yet used by default.
      . Provide a mechanism for the shell to send events on surface
        construction.
      . Provide mir::shell::DisplayConfigurationController allowing shells
        to correctly change the display configuration, notifying clients
        as appropriate.
      . New DSO versioning guide.
      . Send events pertaining to the output a surface is currently on (dpi,
        form factor, scale) to clients.
    - Bug fixes:
      . [enhancement] XMir specific documentation should live in its own
        subsection (LP: #1200114)
      . Nested servers need cursor support (LP: #1289072)
      . Mir cursor is missing/invisible until the client sets it multiple
        times (LP: #1308133)
      . [regression] Fullscreen software surfaces (like Xmir -sw) can crash
        the Mir server (LP: #1493721)
      . [usc] Mir gives up and shuts down due to input with multimonitor qtmir
        (std::exception::what: Failure sending input event) (LP: #1496069)
      . Mouse cursor disappears upon entering the surface area of a nested
        client (LP: #1496849)
      . [android] input is not dispatched when attaching an external monitor
        (LP: #1498045)
      . [android] input coordinates are scaled incorrectly when an external
        display is connected (LP: #1498540)
      . [android] std::exception::what: error during hwc set() when unplugging
        external monitor (LP: #1498550)
      . tests do not compile without precompiled headers (LP: #1498829)
      . [android] std::exception::what: Failed to monitor fd: Operation not
        permitted when unplugging external display in a nested configuration
        (LP: #1499042)
      . Mir suddenly no longer builds since 'mesa (11.0.0-1ubuntu1) wily':
        /usr/include/EGL/eglplatform.h:100:35: fatal error:
        android/native_window.h: No such file or directory (LP: #1499134)
      . [android] various crashes when unplugging external display on a
        nested configuration (LP: #1501927)
      . Cursor becomes visible by itself when an external monitor is connected
        (LP: #1502200)
      . mesa FTBFS due to missing Requires in mirclient (LP: #1503450)

  [ CI Trai...

Changed in mir (Ubuntu):
status: Triaged → Fix Released