[Mali GPU] corruption in software clients on some platforms when GL-rendered on the server

Bug #1573014 reported by Kevin DuBois
Affects               Status     Importance  Assigned to   Milestone
Mir                   Won't Fix  High        Kevin DuBois
mir-android-platform  New        Undecided   Unassigned

Bug Description

Android software clients will tear on tagged platforms because the EGL sync extensions (eglCreateSyncKHR) are unavailable: they are disabled and/or non-functioning on those platforms.

This is an intentional regression of bug 1517205, introduced in lp:mir r3466 in order to solve other performance problems.

Test case: Run two or more mir_demo_client_flicker instances and move their windows so all are visible simultaneously.
Expected: No tearing/corruption.
Observed: Tearing and corruption.

Changed in mir:
milestone: none → 0.23.0
description: updated
Kevin DuBois (kdub)
description: updated
Kevin DuBois (kdub) wrote :

Problem is turning out to be a bit of a bear...

The corruption is due to releasing a buffer that's still being used as a texture in the server's GL render loop. Unfortunately, the EGL sync extensions that we need to get this right don't appear to be working properly on Mali.

Installing sync points on Mali can be very slow, taking 500us to 1ms per fence install (see lp: 1563287).
This can be worked around by installing the fence after eglSwapBuffers (the path surfaceflinger takes), which limits install time to a reasonable 50-80us.

Even with this workaround, the sync points still appear to be broken: the wait on the sync point (even after glFlush and eglSwapBuffers) seems to complete only when the hwc commit happens, a delay of many ms. That is incorrect; the fence should clear very shortly after the GPU commands are issued.

I've experimented with moving the synchronization from the compositor loop (as it sends back the buffers) to the client (when it tries to access the buffers). The fences still seem to be tied to the post/vsync event, though, so this degrades our swapinterval-0 performance unacceptably.
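For readers unfamiliar with the race being described, here is a minimal sketch that models it without EGL. It is not the Mir code: Buffer, Fence, composite_async, client_draw and run_frames are hypothetical names, and a std::shared_future stands in for the fence that eglCreateSyncKHR would install and eglClientWaitSyncKHR would wait on.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical two-scanline software buffer shared by client and server.
struct Buffer {
    std::atomic<int> top{0};
    std::atomic<int> bottom{0};
};

// Stand-in for an EGL fence sync object. It becomes ready once the
// simulated GPU has finished sampling; the value is true if the sampled
// frame was torn.
using Fence = std::shared_future<bool>;

// Simulated GL composite: keeps reading the buffer on another thread,
// the way a GPU keeps sampling a texture after the draw call returns.
Fence composite_async(Buffer& buf) {
    return std::async(std::launch::async, [&buf] {
        int first = buf.top.load();
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        int last = buf.bottom.load();
        return first != last;  // mismatch: the client wrote mid-read
    }).share();
}

// Simulated CPU client drawing the next frame into the buffer.
void client_draw(Buffer& buf, int frame) {
    buf.top.store(frame);
    buf.bottom.store(frame);
}

// Runs a few frames and returns how many torn frames were sampled.
int run_frames(int frames, bool wait_for_fence) {
    Buffer buf;
    int torn = 0;
    for (int frame = 1; frame <= frames; ++frame) {
        Fence fence = composite_async(buf);
        if (wait_for_fence) {
            fence.wait();  // release the buffer only after the read is done
        } else {
            // Early release: the client gets the buffer back mid-read.
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
        client_draw(buf, frame);
        if (fence.get()) ++torn;
    }
    return torn;
}
```

Calling run_frames(5, true) models a working fence and samples no torn frames. Calling run_frames(5, false) models releasing the buffer early and will usually report tearing, although the exact count depends on thread timing.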

summary: - corrpution in software clients on some platforms
+ corrpution in swapinterval 0 software clients on some platforms when GL-
+ rendered on the server
Kevin DuBois (kdub) wrote : Re: corrpution in swapinterval 0 software clients on some platforms when GL-rendered on the server

Narrowed the description a bit... it seems this happens when the client renders via the CPU and the server elects to GL-composite the texture. If it's hw rendered, we benefit from GPU serialization, and if it's overlays, we don't have to rely on the seemingly broken EGL fences.
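The narrowed condition can be written down as a small predicate. This is only an illustrative sketch; the enum and function names are hypothetical and do not come from the Mir source.

```cpp
// How the client produced the buffer contents (hypothetical names).
enum class ClientRendering { Cpu, Gpu };

// How the server puts the buffer on screen (hypothetical names).
enum class ServerComposition { Gl, Overlay };

// True only for the combination described above: a CPU-rendered client
// buffer that the server samples as a GL texture. GPU-rendered clients
// are serialized by the GPU, and overlays bypass GL compositing.
bool depends_on_egl_fence(ClientRendering client, ServerComposition server) {
    return client == ClientRendering::Cpu && server == ServerComposition::Gl;
}
```

Of the four combinations, only the CPU client with GL compositing returns true, which matches the case where the corruption is reported.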

Kevin DuBois (kdub) wrote :

External evidence from Chromium that Mali has dubious fencing: https://src.chromium.org/viewvc/chrome?view=revision&revision=290680

Kevin DuBois (kdub) wrote :

Built a native client against surfaceflinger that demonstrates broken fencing on the Mali T720. The same client was tested on Adreno-320 series hardware (with success). The test case is pretty primitive, so I'm not sure sync via EGL_KHR_fence_sync will be achievable on device.

kevin gunn (kgunn72)
summary: - corrpution in swapinterval 0 software clients on some platforms when GL-
- rendered on the server
+ [Mali GPU] corrpution in swapinterval 0 software clients on some
+ platforms when GL-rendered on the server
Kevin DuBois (kdub) wrote : Re: [Mali GPU] corrpution in swapinterval 0 software clients on some platforms when GL-rendered on the server

The attachment has an Android.mk, so it will produce a binary called "broken_fences" when compiled from within an Android build tree.

Push that binary to the device and run it with ./broken_fences while surfaceflinger is running. It prints "FAILURE" and exits with code -1 on devices that don't have functioning sync.

Kevin DuBois (kdub)
Changed in mir:
status: In Progress → Confirmed
summary: - [Mali GPU] corrpution in swapinterval 0 software clients on some
+ [Mali GPU] corruption in swapinterval 0 software clients on some
platforms when GL-rendered on the server
Kevin DuBois (kdub)
Changed in mir:
milestone: 0.23.0 → none
Daniel van Vugt (vanvugt) wrote :

Correction: It's not just swap interval 0, but also swap interval 1. Test case: run mir_compsitor_performance_test on krillin and you will see corruption in the flicker clients.

Suggestion: ShmBuffer would totally solve this problem. It's now shared code and has mature non-corrupting GL support too.

summary: - [Mali GPU] corruption in swapinterval 0 software clients on some
- platforms when GL-rendered on the server
+ [Mali GPU] corruption in software clients on some platforms when GL-
+ rendered on the server
description: updated
Daniel van Vugt (vanvugt) wrote :

The Android platform was deleted from lp:mir at revision 4155.

Changed in mir:
status: Confirmed → Won't Fix