Mir

[gallium] EGL clients using a gallium driver (radeon, nouveau, freedreno) that saturate the GPU cause the Mir server to slow, freeze and stutter, displaying very few frames

Bug #1211700 reported by Daniel van Vugt
This bug affects 1 person
Affects          Status     Importance   Assigned to       Milestone
Mir              Triaged    Medium       Daniel van Vugt
mesa (Ubuntu)               Undecided    Unassigned
mir (Ubuntu)                Medium       Unassigned

Bug Description

GPU-heavy clients on Mesa gallium drivers (e.g. radeon, nouveau and freedreno) cause the Mir server to slow down, sometimes to a halt. This shows up as a frozen screen, surfaces that cannot be moved, or VT switching that does not work (for a while).

For example:
   mir_demo_client_egltriangle -n
   mir_demo_client_eglplasma -n

The -n flag (swapinterval = 0) seems to make the client overload the Mir server to the point where the server only rarely manages to render a physical frame.
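A back-of-the-envelope model shows why -n is so much harsher than the default. This is a simplification and does not describe Mir's buffer queue. It assumes that with a swap interval of 1 each swap waits for the next vblank of a 60 Hz display, and that with a swap interval of 0 only render time limits the client. The hypothetical helper below computes the resulting ceiling on the client's frame rate. The 0.23 ms render time is the figure egltriangle reports in the client logs later in this report.

```python
def max_client_fps(render_time_ms: float, swap_interval: int,
                   refresh_hz: float = 60.0) -> float:
    """Upper bound on a client's frame rate under a simple vsync model.

    Hypothetical helper, not part of Mir or EGL:
      swap_interval == 0 -> unthrottled, only render time limits the rate
      swap_interval >= 1 -> each swap waits for `swap_interval` vblanks
    """
    if render_time_ms <= 0:
        raise ValueError("render_time_ms must be positive")
    if swap_interval < 0:
        raise ValueError("swap_interval must be >= 0")
    frame_ms = render_time_ms
    if swap_interval > 0:
        vsync_period_ms = 1000.0 / refresh_hz
        # A frame cannot complete faster than the vblank(s) it must wait for.
        frame_ms = max(render_time_ms, swap_interval * vsync_period_ms)
    return 1000.0 / frame_ms


print(round(max_client_fps(0.23, 0)))  # unthrottled (-n)   → 4348
print(round(max_client_fps(0.23, 1)))  # vsync-throttled    → 60
```

The unthrottled ceiling of roughly 4,348 FPS is close to the 4,189 to 4,197 FPS the client actually logs on the AMD CEDAR machine. That suggests the client really is running flat out, whereas a swap interval of 1 would cap it at 60.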

$ es2_info
EGL_VERSION = 1.4 (Gallium)
EGL_VENDOR = Mesa Project
EGL_EXTENSIONS = EGL_WL_bind_wayland_display EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync EGL_KHR_surfaceless_context EGL_NOK_swap_region EGL_NV_post_sub_buffer
EGL_CLIENT_APIS = OpenGL OpenGL_ES OpenGL_ES2 OpenVG
GL_VERSION: OpenGL ES 3.0 Mesa 9.2.0-devel
GL_RENDERER: Gallium 0.4 on AMD CEDAR
...
$ es2_info
EGL_VERSION = 1.4 (Gallium)
EGL_VENDOR = Mesa Project
EGL_EXTENSIONS = EGL_WL_bind_wayland_display EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync EGL_KHR_surfaceless_context EGL_NOK_swap_region EGL_NV_post_sub_buffer
EGL_CLIENT_APIS = OpenGL OpenGL_ES OpenGL_ES2 OpenVG
GL_VERSION: OpenGL ES 3.0 Mesa 9.2.0-devel
GL_RENDERER: Gallium 0.4 on NVA8
...

Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: radeon+nouveau: Unthrottled EGL clients cause Mir to slow, sometimes to a halt

Confirmed on nouveau as well as radeon.

summary: - radeon: Unthrottled EGL clients cause Mir to slow, sometimes to a halt
+ radeon+nouveau: Unthrottled EGL clients cause Mir to slow, sometimes to
+ a halt
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

For nouveau at least, this seems to be a timing/flooding problem. Inserting a 10ms sleep between frames in the demo client solves it.
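The 10 ms sleep workaround effectively caps the client at about 100 frames per second. Below is a minimal sketch of a client-side limiter along those lines. `FrameLimiter` and `FakeClock` are hypothetical names, and the clock and sleep functions are injectable so the example runs deterministically. Instead of a fixed sleep, this variant sleeps only for whatever remains of the interval.

```python
import time
from typing import Callable, Optional


class FrameLimiter:
    """Enforce a minimum interval between frames (hypothetical helper)."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # start time of the previous frame

    def wait(self) -> float:
        """Block until the next frame may start; return the time slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self._last + self.min_interval_s - now
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()  # re-read the clock after sleeping
        self._last = now
        return slept


class FakeClock:
    """Deterministic stand-in for time.monotonic/time.sleep (hypothetical)."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fake = FakeClock()
limiter = FrameLimiter(0.010, clock=fake.now, sleep=fake.sleep)
for _ in range(5):
    limiter.wait()
    fake.t += 0.00023  # pretend each frame takes 0.23 ms to render
print(round(fake.t, 5))
```

With 0.23 ms of rendering per frame, five frames now take a little over 40 ms instead of about 1 ms. This only treats the symptom on the client side. Any client that does not throttle itself can still flood the GPU.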

summary: - radeon+nouveau: Unthrottled EGL clients cause Mir to slow, sometimes to
- a halt
+ Unthrottled EGL clients cause Mir to slow, sometimes to a halt
description: updated
description: updated
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: Unthrottled EGL clients cause Mir to slow, sometimes to a halt

This problem seems to be amplified by bug 1195811 with nouveau :(

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Feels similar to an old Compiz bug 1007299

summary: - Unthrottled EGL clients cause Mir to slow, sometimes to a halt
+ [radeon] [nouveau] Unthrottled EGL clients cause Mir to slow, sometimes
+ to a halt
Changed in mir:
importance: Undecided → High
kevin gunn (kgunn72)
Changed in mir:
importance: High → Medium
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: [radeon] [nouveau] Unthrottled EGL clients cause Mir to slow, sometimes to a halt

This bug will be a problem in practice, because real clients could cripple the whole desktop. I'm unsure whether Medium is appropriate or whether it should still be High.

tags: added: performance
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Some people may have experienced an improvement in Mir release 0.10.0, due to bug 1379685 being fixed.

Changed in mir:
status: New → Confirmed
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Still happening on wily with nouveau :(

[1433234604.336598] mirserver: Mir version 0.14.0
[1433234604.340480] mirserver: GL vendor: nouveau
[1433234604.340494] mirserver: GL renderer: Gallium 0.4 on NVD9
[1433234604.340498] mirserver: GL version: OpenGL ES 3.0 Mesa 10.5.2
[1433234604.340502] mirserver: GLSL version: OpenGL ES GLSL ES 3.00

Unthrottling any Mir client makes the display stutter and almost freeze. The client apparently starves the compositor of GPU cycles.

summary: - [radeon] [nouveau] Unthrottled EGL clients cause Mir to slow, sometimes
- to a halt
+ [radeon] [nouveau] Unthrottled EGL clients cause Mir to slow, freeze and
+ stutter
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: [radeon] [nouveau] Unthrottled EGL clients cause Mir to slow, freeze and stutter

Confirmed again on zesty with AMD CEDAR:

Client:
[2017-02-23 15:14:09.845719] perf: mir_demo_client_egltriangle: 4197.00 FPS, render time 0.23ms, buffer lag 0.48ms (3 buffers)
[2017-02-23 15:14:10.845913] perf: mir_demo_client_egltriangle: 4189.00 FPS, render time 0.23ms, buffer lag 0.48ms (4 buffers)
[2017-02-23 15:14:11.846025] perf: mir_demo_client_egltriangle: 4194.00 FPS, render time 0.23ms, buffer lag 0.48ms (4 buffers)
[2017-02-23 15:14:12.846037] perf: mir_demo_client_egltriangle: 4194.00 FPS, render time 0.23ms, buffer lag 0.48ms (3 buffers)
[2017-02-23 15:14:13.846235] perf: mir_demo_client_egltriangle: 4195.00 FPS, render time 0.23ms, buffer lag 0.48ms (3 buffers)

Server:
[2017-02-23 15:14:10.525057] compositor: Display 0x7f7b38426ba0 averaged 0.180 FPS, 5499.426 ms/frame, latency 0.134 ms, 3 frames over 16.659 sec, 0% bypassed
[2017-02-23 15:14:39.307688] compositor: Display 0x7f7b38426ba0 averaged 0.034 FPS, 28703.507 ms/frame, latency 0.219 ms, 1 frames over 28.782 sec, 0% bypassed
[2017-02-23 15:14:42.539820] compositor: Display 0x7f7b38426ba0 averaged 0.309 FPS, 3158.218 ms/frame, latency 0.211 ms, 1 frames over 3.232 sec, 0% bypassed
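The gap between client and compositor is easier to see once the frame rates are extracted from the logs. Below is a small parsing sketch. The regular expressions are written against the sample lines above and are an assumption, not a documented Mir log format.

```python
import re
from statistics import mean

# Patterns written against the sample lines in this report (assumed format).
CLIENT_RE = re.compile(r"perf: (?P<name>\S+): (?P<fps>[\d.]+) FPS")
SERVER_RE = re.compile(r"compositor: Display \S+ averaged (?P<fps>[\d.]+) FPS")


def fps_values(lines, pattern):
    """Return the FPS figure from every line that matches `pattern`."""
    values = []
    for line in lines:
        match = pattern.search(line)
        if match:
            values.append(float(match.group("fps")))
    return values


client_log = [
    "[2017-02-23 15:14:09.845719] perf: mir_demo_client_egltriangle: 4197.00 FPS, render time 0.23ms, buffer lag 0.48ms (3 buffers)",
    "[2017-02-23 15:14:10.845913] perf: mir_demo_client_egltriangle: 4189.00 FPS, render time 0.23ms, buffer lag 0.48ms (4 buffers)",
]
server_log = [
    "[2017-02-23 15:14:10.525057] compositor: Display 0x7f7b38426ba0 averaged 0.180 FPS, 5499.426 ms/frame, latency 0.134 ms, 3 frames over 16.659 sec, 0% bypassed",
    "[2017-02-23 15:14:42.539820] compositor: Display 0x7f7b38426ba0 averaged 0.309 FPS, 3158.218 ms/frame, latency 0.211 ms, 1 frames over 3.232 sec, 0% bypassed",
]

client_fps = mean(fps_values(client_log, CLIENT_RE))
server_fps = mean(fps_values(server_log, SERVER_RE))
print(f"client {client_fps:.1f} FPS vs compositor {server_fps:.3f} FPS")
```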

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Interestingly, the same problem occurs with software rendering (GBM_ALWAYS_SOFTWARE=1):

Client:
[2017-02-23 15:27:04.483795] perf: mir_demo_client_egltriangle: 107.24 FPS, render time 9.16ms, buffer lag 27.72ms (4 buffers)
[2017-02-23 15:27:05.487787] perf: mir_demo_client_egltriangle: 137.58 FPS, render time 7.12ms, buffer lag 21.46ms (4 buffers)
[2017-02-23 15:27:06.494955] perf: mir_demo_client_egltriangle: 101.29 FPS, render time 9.72ms, buffer lag 30.37ms (4 buffers)
[2017-02-23 15:27:07.503672] perf: mir_demo_client_egltriangle: 106.15 FPS, render time 9.27ms, buffer lag 26.98ms (4 buffers)
[2017-02-23 15:27:08.505853] perf: mir_demo_client_egltriangle: 147.70 FPS, render time 6.60ms, buffer lag 20.29ms (4 buffers)
[2017-02-23 15:27:09.507812] perf: mir_demo_client_egltriangle: 128.87 FPS, render time 7.60ms, buffer lag 15.45ms (4 buffers)

Server:
[2017-02-23 15:27:03.626119] compositor: Display 0x7ff2440008c0 averaged 1.049 FPS, 943.246 ms/frame, latency 1.800 ms, 2 frames over 1.905 sec, 0% bypassed
[2017-02-23 15:27:05.630513] compositor: Display 0x7ff2440008c0 averaged 0.997 FPS, 994.407 ms/frame, latency 0.811 ms, 2 frames over 2.004 sec, 0% bypassed
[2017-02-23 15:27:07.450491] compositor: Display 0x7ff2440008c0 averaged 1.098 FPS, 897.537 ms/frame, latency 1.111 ms, 2 frames over 1.819 sec, 0% bypassed
[2017-02-23 15:27:08.524524] compositor: Display 0x7ff2440008c0 averaged 0.931 FPS, 1063.914 ms/frame, latency 1.862 ms, 1 frames over 1.074 sec, 0% bypassed
[2017-02-23 15:27:09.530076] compositor: Display 0x7ff2440008c0 averaged 0.994 FPS, 1001.222 ms/frame, latency 1.750 ms, 1 frames over 1.005 sec, 0% bypassed

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

More interesting is that the hardware cursor keeps responding perfectly smoothly. That makes me think we might have a problem in our compositor logic, since the compositor reports orders of magnitude fewer frames than the client does.
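The size of that gap can be quantified using one representative client figure and one compositor figure from each set of logs above. The choice of which lines to pair is arbitrary.

```python
import math


def orders_of_magnitude(client_fps: float, compositor_fps: float) -> float:
    """log10 of how many client frames there are per composited frame."""
    return math.log10(client_fps / compositor_fps)


# (client FPS, compositor FPS) pairs copied from the logs in this report.
cases = {
    "hardware (AMD CEDAR)": (4194.0, 0.180),
    "software (GBM_ALWAYS_SOFTWARE=1)": (107.24, 1.049),
}

for label, (client, compositor) in cases.items():
    ratio = client / compositor
    print(f"{label}: {ratio:,.0f}x, about 10^{orders_of_magnitude(client, compositor):.1f}")
```

That works out to roughly four orders of magnitude with the hardware driver and two with software rendering.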

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The cursor moving is actually not interesting. Hardware cursor movement is handled by a separate thread of the Mir server, and the cursor is drawn by the display hardware separately from the compositor's output. So the Mir compositor can be hung while the hardware cursor keeps moving smoothly.
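Below is a toy model of that decoupling. It uses Python threads with hypothetical names and is not taken from Mir's source. One thread stands in for the compositor and blocks indefinitely, as if stuck inside eglSwapBuffers. An independent thread keeps updating a cursor position the whole time.

```python
import threading
import time

gpu_done = threading.Event()   # stands in for the GPU finishing queued work
stop = threading.Event()
cursor_lock = threading.Lock()
cursor_moves = 0


def compositor() -> None:
    # Blocks until the (simulated) flooded GPU catches up.
    gpu_done.wait()


def cursor() -> None:
    global cursor_moves
    while not stop.is_set():
        with cursor_lock:
            cursor_moves += 1  # e.g. moving a hardware cursor plane
        time.sleep(0.001)


comp = threading.Thread(target=compositor)
cur = threading.Thread(target=cursor)
comp.start()
cur.start()

time.sleep(0.05)  # the compositor stays "hung" for 50 ms...
with cursor_lock:
    moves_while_hung = cursor_moves
compositor_still_blocked = comp.is_alive()

stop.set()
gpu_done.set()
comp.join()
cur.join()
print(compositor_still_blocked, moves_while_hung > 0)
```

The cursor keeps moving only because it never waits on the compositor. So a smooth cursor does not show that the compositor is healthy.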

It turns out my compositor thread is hung deep in the radeon driver code below eglSwapBuffers, which supports the idea that this is purely a GPU flooding/starvation problem:

#0 pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007fffef2624ab in cnd_wait (mtx=0x7ffff7ebd1c8, cond=0x7ffff7ebd1f0)
    at ../../../../include/c11/threads_posix.h:159
#2 util_queue_job_wait (fence=fence@entry=0x7ffff7ebd1c8)
    at ../../../../src/gallium/auxiliary/util/u_queue.c:46
#3 0x00007fffef59770f in radeon_drm_cs_sync_flush (rcs=0x7ffff7e95010)
    at ../../../../../../src/gallium/winsys/radeon/drm/radeon_drm_cs.c:489
#4 radeon_drm_cs_flush (rcs=0x7ffff7e95010, flags=2, pfence=<optimised out>)
    at ../../../../../../src/gallium/winsys/radeon/drm/radeon_drm_cs.c:680
#5 0x00007fffef46cd47 in r600_context_gfx_flush (context=0x555555809b90,
    flags=2, fence=0x7fffe8c94838)
    at ../../../../../src/gallium/drivers/r600/r600_hw_context.c:283
#6 0x00007fffef5b666e in r600_flush_from_st (ctx=0x555555809b90,
    fence=0x7fffe8c948c0, flags=<optimised out>)
    at ../../../../../src/gallium/drivers/radeon/r600_pipe_common.c:357
#7 0x00007fffef0b513a in st_context_flush (stctxi=0x55555581f620, flags=2,
    fence=<optimised out>) at ../../../src/mesa/state_tracker/st_manager.c:506
#8 0x00007fffef1c860c in dri_flush (cPriv=<optimised out>,
    dPriv=<optimised out>, flags=<optimised out>, reason=<optimised out>)
    at ../../../../../src/gallium/state_trackers/dri/dri_drawable.c:532
#9 0x00007ffff4322714 in ?? ()
   from /usr/lib/x86_64-linux-gnu/mesa-egl/libEGL.so.1
#10 0x00007ffff4312faf in eglSwapBuffers ()
   from /usr/lib/x86_64-linux-gnu/mesa-egl/libEGL.so.1
#11 0x00007fffee5c18fb in mir::graphics::mesa::helpers::EGLHelper::swap_buffers
    (this=0x555555807e08)
    at /home/dan/bzr/mir/toy/src/platforms/mesa/server/display_helpers.cpp:402
#12 0x00007fffee5953b3 in mir::graphics::mesa::DisplayBuffer::swap_buffers (
    this=0x555555807d60)
    at /home/dan/bzr/mir/toy/src/platforms/mesa/server/kms/display_buffer.cpp:284
#13 0x00007ffff5e2bbbb in mir::renderer::gl::CurrentRenderTarget::swap_buffers
    (this=0x7fffe40008c8)
    at /home/dan/bzr/mir/toy/src/renderers/gl/renderer.cpp:75
#14 0x00007ffff5e2c878 in mir::renderer::gl::Renderer::render (
    this=0x7fffe40008c0,
    renderables=std::vector of length 2, capacity 2 = {...})
    at /home/dan/bzr/mir/toy/src/renderers/gl/renderer.cpp:217
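In the backtrace, the compositor's flush is waiting in util_queue_job_wait for its command submission to complete. A crude way to see why that wait can grow so long is to model the GPU as a single first-in, first-out queue shared by client and compositor, with no priorities. The model, the function name and its parameters are assumptions for illustration. They do not describe how the radeon driver or kernel actually schedule work.

```python
def compositor_latency_ms(client_submit_every_ms: float,
                          client_job_ms: float,
                          compositor_submit_at_ms: float,
                          compositor_job_ms: float = 1.0) -> float:
    """Toy model: one GPU queue, strictly first-in first-out, no priorities.

    The client submits a job every `client_submit_every_ms`, each costing
    `client_job_ms` of GPU time. Returns how long after its submission the
    compositor's job finishes.
    """
    if client_submit_every_ms <= 0:
        raise ValueError("client_submit_every_ms must be positive")
    gpu_free_at = 0.0  # time at which the GPU finishes all queued work
    t = 0.0
    # Replay every client submission made before the compositor's job.
    while t < compositor_submit_at_ms:
        start = max(t, gpu_free_at)  # wait for earlier jobs to drain
        gpu_free_at = start + client_job_ms
        t += client_submit_every_ms
    # The compositor's job queues behind everything already submitted.
    start = max(compositor_submit_at_ms, gpu_free_at)
    return start + compositor_job_ms - compositor_submit_at_ms


# GPU keeps up with the client: the compositor waits only for its own job.
print(compositor_latency_ms(1.0, 0.5, 10.0))      # → 1.0
# Client submits 0.5 ms jobs every 0.25 ms: the backlog grows without bound.
print(compositor_latency_ms(0.25, 0.5, 1000.0))   # → 1001.0
print(compositor_latency_ms(0.25, 0.5, 2000.0))   # → 2001.0
```

If a client submits work faster than the GPU can retire it, the compositor's wait in this model grows with how long the client has been running. The real driver stack is more complex, but the pattern is consistent with the multi-second ms/frame figures in the server logs.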

summary: - [radeon] [nouveau] Unthrottled EGL clients cause Mir to slow, freeze and
- stutter
+ [gallium] EGL clients that saturate the GPU cause the Mir server to
+ slow, freeze and stutter, showing very few frames
description: updated
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: [gallium] EGL clients that saturate the GPU cause the Mir server to slow, freeze and stutter, showing very few frames

I can't find upstream bug reports for the issue right now, but I did find a research paper about fixing the "GPU command bomb" problem:

https://www.usenix.org/legacy/event/atc11/tech/final_files/Kato.pdf
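The paper's own approach is more involved. The sketch below illustrates only the general remedy: letting a privileged context such as the compositor get ahead of queued client work. It is a toy, non-preemptive priority scheduler for a single GPU in which every job takes one time unit. The function name and priority levels are hypothetical.

```python
import heapq
from typing import List, Tuple

HIGH, LOW = 0, 1  # smaller number = higher priority (hypothetical levels)


def run_order(jobs: List[Tuple[float, int, str]]) -> List[str]:
    """Non-preemptive priority scheduling on one GPU (toy model).

    `jobs` holds (submit_time, priority, name) tuples. Whenever the GPU is
    free, the highest-priority job already submitted runs next; ties go to
    the earliest submission. Each job takes 1 time unit.
    """
    pending = sorted(jobs)  # ordered by submit time
    ready: list = []
    order: List[str] = []
    now, i = 0.0, 0
    while i < len(pending) or ready:
        # Admit every job submitted by `now`.
        while i < len(pending) and pending[i][0] <= now:
            submit_time, prio, name = pending[i]
            heapq.heappush(ready, (prio, submit_time, name))
            i += 1
        if not ready:  # GPU idle: skip ahead to the next submission
            now = pending[i][0]
            continue
        _, _, name = heapq.heappop(ready)
        order.append(name)
        now += 1.0
    return order


client_jobs = [(0.0, LOW, f"c{k}") for k in range(5)]
print(run_order(client_jobs + [(0.5, HIGH, "comp")]))
```

With plain FIFO ordering, the compositor job would run after all five client jobs. With priorities, it runs as soon as the job already executing finishes.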

summary: - [gallium] EGL clients that saturate the GPU cause the Mir server to
- slow, freeze and stutter, showing very few frames
+ [gallium] EGL clients using a gallium driver (radeon, nouveau,
+ freedreno) that saturate the GPU cause the Mir server to slow, freeze
+ and stutter, displaying very few frames
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Changed in mir:
assignee: nobody → Daniel van Vugt (vanvugt)
Changed in mir:
milestone: none → 1.0.0
status: Confirmed → In Progress
Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 1.0.0

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → In Progress
Changed in mir:
status: In Progress → Triaged
milestone: 1.0.0 → none
Revision history for this message
Michał Sawicz (saviq) wrote :

Syncing task from Mir.

Changed in mir (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Mesa is slowly getting there...

https://www.phoronix.com/scan.php?page=news_item&px=Intel-Lands-EGL-Ctx-Priority

Hopefully radeon and nouveau will catch up too.
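This assumes the Mesa work referred to here is the EGL_IMG_context_priority extension. Under that extension, a compositor requests a high-priority context by adding EGL_CONTEXT_PRIORITY_LEVEL_IMG to the attribute list it passes to eglCreateContext. The sketch below only builds that list; actually creating a context needs a real EGL binding. The helper name is hypothetical. The priority is a request, and the level actually granted can be checked afterwards with eglQueryContext.

```python
# Token values as defined for EGL core and the EGL_IMG_context_priority extension.
EGL_NONE = 0x3038
EGL_CONTEXT_CLIENT_VERSION = 0x3098
EGL_CONTEXT_PRIORITY_LEVEL_IMG = 0x3100
EGL_CONTEXT_PRIORITY_HIGH_IMG = 0x3101
EGL_CONTEXT_PRIORITY_MEDIUM_IMG = 0x3102
EGL_CONTEXT_PRIORITY_LOW_IMG = 0x3103


def context_attribs(extensions: str, gles_version: int = 2,
                    want_high: bool = True) -> list:
    """Build an EGL_NONE-terminated attribute list for eglCreateContext.

    Hypothetical helper. The priority attribute is only added when the
    display's extension string advertises EGL_IMG_context_priority, since an
    unrecognised attribute can make context creation fail.
    """
    attribs = [EGL_CONTEXT_CLIENT_VERSION, gles_version]
    if want_high and "EGL_IMG_context_priority" in extensions.split():
        attribs += [EGL_CONTEXT_PRIORITY_LEVEL_IMG, EGL_CONTEXT_PRIORITY_HIGH_IMG]
    attribs.append(EGL_NONE)
    return attribs


# Extension string from the es2_info output in the bug description.
mesa_9_2 = ("EGL_WL_bind_wayland_display EGL_KHR_image_base EGL_KHR_image_pixmap "
            "EGL_KHR_image EGL_KHR_reusable_sync EGL_KHR_fence_sync "
            "EGL_KHR_surfaceless_context EGL_NOK_swap_region EGL_NV_post_sub_buffer")
print([hex(a) for a in context_attribs(mesa_9_2)])
print([hex(a) for a in context_attribs(mesa_9_2 + " EGL_IMG_context_priority")])
```

The Mesa 9.2 extension string in the bug description does not advertise the extension, so on those systems the compositor would fall back to an ordinary-priority context.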
