Mir

[regression] [OTA-10] Spread animation stutters badly with only a few apps opened

Bug #1563287 reported by Omer Akram on 2016-03-29
This bug affects 21 people
Affects                  Status         Importance  Assigned to      Milestone
Canonical System Image   Fix Released   High        Stephen M. Webb
Mir                      Fix Released   High        Kevin DuBois
  0.20                   Fix Released   High        Kevin DuBois
  0.21                   Fix Committed  High        Kevin DuBois
libhybris                Invalid        Undecided   Unassigned
mir (Ubuntu)             Fix Released   High        Unassigned
qtmir (Ubuntu)           Invalid        Undecided   Unassigned
unity8 (Ubuntu)          Invalid        Undecided   Unassigned

Bug Description

Regression in OTA-10, tested on krillin and arale on rc-proposed.
I also tried the fix for bug 1556763, but that does not appear to be related to this either.

The right edge switcher on both krillin and arale stutters when I drag it in from the right edge and then swipe through the list of open windows.

This was not the case a few months ago, so we clearly regressed here.

Omer Akram (om26er) wrote :

Subscribed Daniel to this report, as he is probably someone who cares about smoothness in the OS.

description: updated
kevin gunn (kgunn72) on 2016-03-29
Changed in canonical-devices-system-image:
milestone: none → 11
assignee: nobody → kevin gunn (kgunn72)
importance: Undecided → High
Changed in unity8 (Ubuntu):
importance: Undecided → High
Changed in canonical-devices-system-image:
assignee: kevin gunn (kgunn72) → Michał Sawicz (saviq)
Gerry Boland (gerboland) wrote :

Need to profile and check the code to see how this could have regressed.

Stutter could be due to the GPU having to do too much work per frame, or to inefficiencies in unity8's code. Can't say for certain without measuring.

I know of several optimizations to be done with spread still.

kevin gunn (kgunn72) wrote :

@Omer can you refine the regression claim a little? When did you last see the right edge perform better? The reason I ask is so we can measure that and potentially determine what regressed and when.
Also, in your assessment, is there some minimum number or particular set of applications that needs to be open in the spread to see this?

kevin gunn (kgunn72) wrote :

@omer video might also be nice

Omer Akram (om26er) wrote :

Kevin: I just tested and it's all good on the last OTA (image 30), so the delta is between then and now (rc-proposed). I have also attached the video (the second is being uploaded).

Daniel van Vugt (vanvugt) wrote :

Nice videos. Although I found I had to save them offline before they play nicely and the difference is clear.

I've been wondering about spread performance actually. In particular how we implement texture smoothing. If it's just GL_LINEAR filtering that should be fast. But if someone has unwittingly turned on mipmapping then it could get very slow.

tags: added: performance regression regression-proposed
summary: - Right edge switcher stutters badly with only a few apps opened
+ [regression] Right edge switcher stutters badly with only a few apps
+ opened
Changed in canonical-devices-system-image:
status: New → Confirmed

Just wondering, could the recent SDK release have affected this?
Or are we relying entirely on our own objects within unity8?

Daniel van Vugt (vanvugt) wrote :

Omer: One simple but important question: Do you see a significant change in 'top' output with the regression?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qtmir (Ubuntu):
status: New → Confirmed
Changed in unity8 (Ubuntu):
status: New → Confirmed
Timo Jyrinki (timo-jyrinki) wrote :

Agreed that on OTA-10 this is far worse on the Bq E4.5 with 4+ apps open, when scrolling so that 4+ app tiles are shown on screen at the same time. It's still smooth with 2-3 tiles shown; scroll further and it immediately starts to slow down a lot.

It used to be 60fps and a good thing to demo on Bq, now it can't be used to show off anymore.

Andrea Bernabei (faenil) wrote :

I agree the change is visible, and there is a clear regression compared to what we had a few weeks ago.

I also noticed the same in browser scrolling, for what it's worth...

Daniel van Vugt (vanvugt) wrote :

The BQ screen is 67Hz BTW :)

Timo Jyrinki (timo-jyrinki) wrote :

It's also not just the task switcher, but also when opening for example a new app. Quite often now the new tile slides in from the right with low fps instead of being smooth.

SB (emehntehtt) wrote :

I am experiencing this on my BQ as well, quite a serious regression unfortunately.

Daniel van Vugt (vanvugt) wrote :

OK, here's a summary of some bits that changed between OTA-9 and OTA-10:

* mir upgraded from mir 0.18.1+15.04.20160115-0ubuntu1 to 0.20.3+15.04.20160322-0ubuntu1
* qtmir upgraded from 0.4.7+15.04.20160122-0ubuntu1 to qtmir 0.4.8+15.04.20160330-0ubuntu1
* unity8 upgraded from 8.11+15.04.20160122-0ubuntu1 to 8.11+15.04.20160318.1-0ubuntu1
* unity-system-compositor upgraded from 0.2.0+15.04.20151222.1-0ubuntu1 to 0.4.3+15.04.20160323-0ubuntu1

That last one includes "Enable dynamic performance boosting whenever the screen is on" which I guess needs checking in case we've allowed the phone performance to scale down too low now.

summary: - [regression] Right edge switcher stutters badly with only a few apps
- opened
+ [regression] [OTA-10] Right edge switcher stutters badly with only a few
+ apps opened
description: updated

Here are performance measurements of OTA-9 and OTA-10's Unity8 spread animation on krillin. You can see the Unity8/QtMir render time has just about doubled, and the frame rate roughly halved:

OTA-9:
[1460535448.978692] perf: Mir nested display for output #1: 64.80 FPS, render time 14.44ms, buffer lag 37.48ms (3 buffers)
[1460535449.981195] perf: Mir nested display for output #1: 61.87 FPS, render time 15.16ms, buffer lag 33.33ms (3 buffers)
[1460535451.194805] perf: Mir nested display for output #1: 53.58 FPS, render time 17.50ms, buffer lag 31.80ms (3 buffers)
[1460535452.207826] perf: Mir nested display for output #1: 67.12 FPS, render time 13.88ms, buffer lag 37.20ms (3 buffers)
[1460535453.217917] perf: Mir nested display for output #1: 63.36 FPS, render time 14.77ms, buffer lag 32.58ms (3 buffers)
[1460535454.220495] perf: Mir nested display for output #1: 65.86 FPS, render time 14.13ms, buffer lag 31.42ms (3 buffers)
[1460535455.228508] perf: Mir nested display for output #1: 63.49 FPS, render time 14.73ms, buffer lag 32.47ms (3 buffers)

OTA-10:
[2016-04-13 16:57:14.969391] perf: : 27.39 FPS, render time 35.50ms, buffer lag 75.76ms (3 buffers)
[2016-04-13 16:57:15.986094] perf: : 43.30 FPS, render time 22.03ms, buffer lag 46.16ms (3 buffers)
[2016-04-13 16:57:16.990369] perf: : 33.86 FPS, render time 28.53ms, buffer lag 62.27ms (3 buffers)
[2016-04-13 16:57:18.017071] perf: : 37.03 FPS, render time 25.98ms, buffer lag 53.13ms (3 buffers)
[2016-04-13 16:57:19.032120] perf: : 41.37 FPS, render time 23.15ms, buffer lag 50.95ms (3 buffers)
[2016-04-13 16:57:20.064561] perf: : 29.06 FPS, render time 33.39ms, buffer lag 67.45ms (3 buffers)
[2016-04-13 16:57:21.083266] perf: : 48.13 FPS, render time 19.79ms, buffer lag 43.90ms (3 buffers)
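
As a rough way to summarize the two logs above, the averages can be computed with a short script. This is only a sketch: `PERF_RE` and `summarize` are illustrative names (not part of Mir), and the regular expression assumes the "NN FPS, render time NNms" layout seen in these particular log lines.

```python
import re
from statistics import mean

# Assumed layout, taken from the log lines in this comment:
#   "... 64.80 FPS, render time 14.44ms, buffer lag ..."
PERF_RE = re.compile(r"([\d.]+) FPS, render time ([\d.]+)ms")

def summarize(log_text):
    """Return (mean FPS, mean render time in ms) over all perf lines."""
    samples = [(float(fps), float(ms)) for fps, ms in PERF_RE.findall(log_text)]
    return mean(s[0] for s in samples), mean(s[1] for s in samples)

OTA9_LOG = """
[1460535448.978692] perf: Mir nested display for output #1: 64.80 FPS, render time 14.44ms, buffer lag 37.48ms (3 buffers)
[1460535449.981195] perf: Mir nested display for output #1: 61.87 FPS, render time 15.16ms, buffer lag 33.33ms (3 buffers)
[1460535451.194805] perf: Mir nested display for output #1: 53.58 FPS, render time 17.50ms, buffer lag 31.80ms (3 buffers)
[1460535452.207826] perf: Mir nested display for output #1: 67.12 FPS, render time 13.88ms, buffer lag 37.20ms (3 buffers)
[1460535453.217917] perf: Mir nested display for output #1: 63.36 FPS, render time 14.77ms, buffer lag 32.58ms (3 buffers)
[1460535454.220495] perf: Mir nested display for output #1: 65.86 FPS, render time 14.13ms, buffer lag 31.42ms (3 buffers)
[1460535455.228508] perf: Mir nested display for output #1: 63.49 FPS, render time 14.73ms, buffer lag 32.47ms (3 buffers)
"""

OTA10_LOG = """
[2016-04-13 16:57:14.969391] perf: : 27.39 FPS, render time 35.50ms, buffer lag 75.76ms (3 buffers)
[2016-04-13 16:57:15.986094] perf: : 43.30 FPS, render time 22.03ms, buffer lag 46.16ms (3 buffers)
[2016-04-13 16:57:16.990369] perf: : 33.86 FPS, render time 28.53ms, buffer lag 62.27ms (3 buffers)
[2016-04-13 16:57:18.017071] perf: : 37.03 FPS, render time 25.98ms, buffer lag 53.13ms (3 buffers)
[2016-04-13 16:57:19.032120] perf: : 41.37 FPS, render time 23.15ms, buffer lag 50.95ms (3 buffers)
[2016-04-13 16:57:20.064561] perf: : 29.06 FPS, render time 33.39ms, buffer lag 67.45ms (3 buffers)
[2016-04-13 16:57:21.083266] perf: : 48.13 FPS, render time 19.79ms, buffer lag 43.90ms (3 buffers)
"""

fps9, render9 = summarize(OTA9_LOG)
fps10, render10 = summarize(OTA10_LOG)
print(f"OTA-9:  {fps9:.2f} FPS, {render9:.2f} ms")
print(f"OTA-10: {fps10:.2f} FPS, {render10:.2f} ms")
print(f"render time ratio: {render10 / render9:.2f}x")
```

On the fourteen samples above this gives a mean render time of about 14.9ms for OTA-9 against 26.9ms for OTA-10 (roughly 1.8x), and a mean frame rate of about 62.9 against 37.2 FPS.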

Daniel van Vugt (vanvugt) wrote :

Since the display on krillin refreshes at 66.58Hz, the target render time to get under is 15ms.
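
The 15ms figure is just the refresh period. A minimal sketch of that arithmetic follows; the helper names are made up for illustration, and the sample render times are taken from the perf logs earlier in this report.

```python
def frame_budget_ms(refresh_hz):
    """Time available to render one frame at the given refresh rate."""
    return 1000.0 / refresh_hz

def meets_deadline(render_ms, refresh_hz):
    """True if a frame rendered in render_ms fits in one refresh period."""
    return render_ms <= frame_budget_ms(refresh_hz)

KRILLIN_HZ = 66.58  # krillin refresh rate quoted above

budget = frame_budget_ms(KRILLIN_HZ)
print(f"budget: {budget:.2f} ms")

# Sample render times from the OTA-9 and OTA-10 logs in this report.
for render_ms in (14.44, 17.50, 35.50):
    print(render_ms, meets_deadline(render_ms, KRILLIN_HZ))
```

At 66.58Hz the budget works out to just over 15.0ms, so a 14.44ms frame fits while 17.50ms and 35.50ms frames do not.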

Daniel van Vugt (vanvugt) wrote :

Also confirmed that OTA-10 images 31 and 32 are slow only in the spread, and only with a few windows or more.

The Mir performance report for Unity8 is showing smooth performance in Unity8 other than a well populated spread. In fact, performance of the spread varies dynamically according to the number of windows in it.

If you view the top of the window stack then you get 4-5 windows visible and it's slow and stuttery. If you go to the bottom of the stack though and wiggle your finger, it becomes fast again -- because at the bottom of the stack only two windows come into view simultaneously.

So the regression is probably somewhere between Unity8 and QtMir, where our ability to put multiple surfaces on screen at once has degraded.

Daniel van Vugt (vanvugt) wrote :

CPU usage during spread animation:
   OTA-9: 60%
   OTA-10: 70%
So that's a noticeable increase but not an indication we're CPU bound at all. We are close to being CPU bound, but OTA-10 isn't that much worse than OTA-9's CPU usage.

The relatively low increase in CPU usage, high increase in render time, and sensitivity to viewport clipping (comment #21) all point to the regression being in the GPU load (Qt/QML/shaders).

Changed in qtmir:
status: New → Confirmed
summary: - [regression] [OTA-10] Right edge switcher stutters badly with only a few
- apps opened
+ [regression] [OTA-10] Spread animation stutters badly with only a few
+ apps opened
Daniel van Vugt (vanvugt) wrote :

Bisected the regression and found it happened in rc-proposed krillin image 266 on 24 Feb 2016.

See the small diff attached for what changed.
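
For reference, bisecting over image numbers is an ordinary binary search between a known-good and a known-bad image. The sketch below is not the procedure actually used here: `first_bad` is an illustrative helper, the 250 and 300 endpoints are hypothetical, and the lambda stands in for flashing an image and checking the spread by hand.

```python
def first_bad(known_good, known_bad, is_bad):
    """Return the first image number for which is_bad() is true.

    Assumes known_good is fine, known_bad shows the regression, and every
    image after the first bad one is also bad.
    """
    lo, hi = known_good, known_bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid   # regression is at mid or earlier
        else:
            lo = mid   # regression is after mid
    return hi

# Hypothetical endpoints; the predicate stands in for a manual test on device.
result = first_bad(250, 300, lambda image: image >= 266)
print(result)
```

With these assumed endpoints the search converges on image 266 after five manual checks.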

Changed in mir:
status: New → Confirmed
importance: Undecided → High
Changed in unity8 (Ubuntu):
status: Confirmed → Invalid
Daniel van Vugt (vanvugt) wrote :

Both Unity8 and QtMir are excused. On the day of the regression, unity8 was unchanged and qtmir changed but not in logic.

This leaves only Mir. On the day of the regression Mir was upgraded from 0.19.3 to 0.20.0. And looking at the source code, a large amount of code changed in Mir's buffering/GL logic between those releases.

Changed in qtmir:
status: Confirmed → Invalid
Changed in qtmir (Ubuntu):
status: Confirmed → Invalid
Changed in mir:
milestone: none → 0.22.0
Changed in mir (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in mir:
status: Confirmed → Triaged
Changed in unity8 (Ubuntu):
importance: High → Undecided
Kevin DuBois (kdub) wrote :

The 'buffering/GL' logic changes were installation of EGL sync points during GL draws so that software clients (eg, Xmir) have synchronized buffer releases. (fix https://bugs.launchpad.net/mir/+bug/1517205).

Poking around this today, it seems that the actual installation of the sync points has some time cost (we have more cost on each buffer mapping with that proper synchronization). So, if many buffers need mapping, the time cost adds up. It is unreasonable that the driver should have any cost when calling eglCreateSyncFence, but that's mali code we don't have much control over.

We have quirks that allow for disabling the synchronization, but we should avoid that if possible.

Kevin DuBois (kdub) wrote :

bit of measuring... the increase seems to average ~500us per map

Daniel van Vugt (vanvugt) wrote :

Yeah, I don't think 500us is affordable. That's 0.5ms, and according to comment #19 we only have 1-2ms (1000-2000us) of headroom before we miss the frame deadline.

So yes, that 500us times-a-couple will hurt us, and would explain it.
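
To put rough numbers on that, here is a small sketch using the ~500us per map measured above and the 1-2ms of headroom. The function names are illustrative only, and it assumes the cost simply adds up linearly per mapped buffer.

```python
SYNC_COST_US = 500  # approximate extra cost per buffer map, as measured above

def added_cost_ms(num_maps):
    """Extra render time if every map pays the sync cost (linear assumption)."""
    return num_maps * SYNC_COST_US / 1000

def maps_within(headroom_ms):
    """How many maps fit into the given headroom before the deadline is missed."""
    return int(headroom_ms * 1000 // SYNC_COST_US)

for headroom in (1.0, 2.0):
    print(f"{headroom} ms headroom -> {maps_within(headroom)} maps")

for windows in (2, 4, 5):
    print(f"{windows} windows -> +{added_cost_ms(windows)} ms")
```

Under that assumption only 2 to 4 maps fit in the available headroom, while the 4-5 windows visible at the top of the spread would add 2.0-2.5ms.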

Daniel van Vugt (vanvugt) wrote :

On a related note, I mentioned in the description of this MP how in theory we might fall off the edge with smoothness for overlaid surfaces (and Unity8 is our only overlaid surface):
    https://code.launchpad.net/~vanvugt/mir/ClientLatency-of-overlays/+merge/291348

Daniel van Vugt (vanvugt) wrote :

Wow. Turns out manual testing with mir-demos (and --compositor-report=log) shows quite a difference on krillin:

One triangle:
lp:mir/0.19 = 1.1ms/frame
lp:mir = 14.2ms/frame

Five triangles:
lp:mir/0.19 = 1.8ms/frame
lp:mir = 36.8ms/frame
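
A quick sketch of what those four numbers imply, treating each triangle as one client. The dictionary layout and helper names are illustrative, and with only two data points per branch the per-client figure is a crude linear estimate.

```python
# ms/frame from the mir-demos runs above, keyed by number of triangle clients.
measurements = {
    "lp:mir/0.19": {1: 1.1, 5: 1.8},
    "lp:mir": {1: 14.2, 5: 36.8},
}

def per_extra_client_ms(times):
    """Average added frame time per additional client between 1 and 5 clients."""
    return (times[5] - times[1]) / (5 - 1)

def slowdown(clients):
    """How many times slower lp:mir is than lp:mir/0.19 for a client count."""
    return measurements["lp:mir"][clients] / measurements["lp:mir/0.19"][clients]

for branch, times in measurements.items():
    print(f"{branch}: {per_extra_client_ms(times):.3f} ms per extra client")

for clients in (1, 5):
    print(f"{clients} client(s): {slowdown(clients):.1f}x slower")
```

That works out to roughly 0.18ms per extra client on lp:mir/0.19 against about 5.65ms on lp:mir, with trunk about 13x slower for one client and about 20x slower for five.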

Changed in mir:
assignee: nobody → Daniel van Vugt (vanvugt)
status: Triaged → In Progress
Daniel van Vugt (vanvugt) wrote :

Bisected. As kdub suggested, the regression occurred at lp:mir r3297. Mir's compositor report shows (running 5 triangles on krillin):

r3296: 1.8ms/frame
r3297: 35.0ms/frame

------------------------------------------------------------
revno: 3297 [merge]
author: Kevin DuBois <email address hidden>
committer: Tarmac
branch nick: development-branch
timestamp: Tue 2016-02-09 17:07:23 +0000
message:
  repropose already landed branch introducing sync fences
  (https://code.launchpad.net/~kdub/mir/egl-sync-fences/+merge/278181)
  while avoiding the mx4/powervr regression that caused its reversion.

  fixes: LP: #1517205. Fixes: https://bugs.launchpad.net/bugs/1517205.

  Approved by PS Jenkins bot, Alexandros Frantzis, mir-ci-bot, Alan Griffiths.
------------------------------------------------------------

Changed in mir:
assignee: Daniel van Vugt (vanvugt) → nobody
status: In Progress → Triaged
Kevin DuBois (kdub) wrote :

So to restate simply, the egl synchronization is costing 500us/client (which is unreasonable, but a cost we cannot control without MTK/ARM help). Trying to figure out alternatives.

Kevin DuBois (kdub) wrote :

Delving a bit deeper, it seems that we're wasting a fair amount of time in the mali driver during this operation on threading operations, including some calls to pthread_getspecific (i.e. TLS), so it could be a cost of hybris.

Kevin DuBois (kdub) wrote :

It's looking like the use of the eglCreateImageSyncKHR extension is causing thread activity (creation/destruction/TLS).
Once mir started using these functions, it caused this timing increase on Mali cores, and lp: #1524414 on powervr ones. Linked to hybris, as it's looking like this is a hybris/glibc threading issue. For mir, we can disable the use of these sync extensions on mali until we can fix the deeper TLS problems.

kevin gunn (kgunn72) on 2016-04-18
summary: - [regression] [OTA-10] Spread animation stutters badly with only a few
- apps opened
+ [regression] [OTA-10] [Mali GPU] Spread animation stutters badly with
+ only a few apps opened
Changed in canonical-devices-system-image:
assignee: Michał Sawicz (saviq) → Stephen M. Webb (bregma)

It's worth noting the one client case (comment #29) is also significant (was 1ms/frame, now 14ms/frame). That's more than 500us.

Also, Omer pointed out in the description that there is a regression on arale too (MTK). Although I haven't tested on arale yet.

Daniel van Vugt (vanvugt) wrote :

Regarding fixing bug 1517205, it appears the new EGL synchronization was only ever required to resolve problems with software clients. But this bug is mostly about hardware clients (Unity8 rendering to USC).

So history would seem to suggest that we could live without the new EGL synchronization on hardware clients, and get the performance back.

Daniel van Vugt (vanvugt) wrote :

I can confirm with the original report that arale (MTK) also suffers a significant regression in r3297:

one triangle on arale:
r3296: 1.3ms/frame
r3297: 2.6ms/frame

five triangles on arale:
r3296: 2.3ms/frame
r3297: 11.0ms/frame

It's not as major as the regression on krillin/mali, but still major.

summary: - [regression] [OTA-10] [Mali GPU] Spread animation stutters badly with
- only a few apps opened
+ [regression] [OTA-10] Spread animation stutters badly with only a few
+ apps opened
Kevin DuBois (kdub) wrote :

arale should not be using egl sync fences.

Kevin DuBois (kdub) wrote :

@we can live without the synchronization...
This was the suggestion to roll back the use of the sync extensions on mali. The current plan is to see how far we can get with hybris before OTA-11, and if we can't fix the performance in time, we'll roll back the sync extensions.

The synchronization should be lightweight, and it should be done even with hardware clients. I think we're just fortunate that the commands are getting serialized between the clients and the server in the GPU.

@duflu - please don't alter the title

summary: - [regression] [OTA-10] Spread animation stutters badly with only a few
- apps opened
+ [regression] [OTA-10] [krillin] Spread animation stutters badly with
+ only a few apps opened
kevin gunn (kgunn72) wrote :

oops - i see what you mean

summary: - [regression] [OTA-10] [krillin] Spread animation stutters badly with
- only a few apps opened
+ [regression] [OTA-10] Spread animation stutters badly with only a few
+ apps opened
kevin gunn (kgunn72) wrote :

@duflu can you specify which image revision you are running on arale?

Kevin DuBois (kdub) wrote :

@duflu, If you're testing with rev3297 you're likely to see the same sort of slowdown on arale. We disabled the fences in arale in 0.20.1, so our current images, as well as trunk should be fine (with this bug) on arale by now.

kevin gunn (kgunn72) wrote :

Just want to publicize that we've added a manual test to our test spec in the near term to capture this. We've also added a task to the Mir team's backlog to create an automated test that would catch any future escape of this bug.

Changed in mir:
status: Triaged → In Progress
assignee: nobody → Kevin DuBois (kdub)
Mir CI Bot (mir-ci-bot) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.22.0

Changed in mir:
status: In Progress → Fix Committed
Daniel van Vugt (vanvugt) wrote :

Fix committed into lp:mir/0.21 at revision 3429, scheduled for release in Mir 0.21.1

Daniel van Vugt (vanvugt) wrote :

Fix committed into lp:mir/0.20 at revision 3339, scheduled for release in Mir 0.20.4

kevin gunn (kgunn72) wrote :

branch now associated with bug for automated CI test to help capture this kind of regression in the future.

Changed in canonical-devices-system-image:
status: Confirmed → In Progress
Changed in mir:
status: Fix Committed → Fix Released
Changed in canonical-devices-system-image:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mir - 0.22.1+16.04.20160516.2-0ubuntu2

---------------
mir (0.22.1+16.04.20160516.2-0ubuntu2) yakkety; urgency=medium

  [ Dimitri John Ledkov ]
  * Fix FTBFS error: call of overloaded ‘abs(float)’ is ambiguous, by
    including cmath c++ header.

 -- Łukasz 'sil2100' Zemczak <email address hidden> Thu, 19 May 2016 21:58:43 +0200

Changed in mir (Ubuntu):
status: Triaged → Fix Released
kevin gunn (kgunn72) wrote :

this shoulda been ota10 milestone, it's already confirmed to be released

Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released
Kevin DuBois (kdub) on 2016-07-06
Changed in libhybris:
status: New → Invalid
Daniel van Vugt (vanvugt) wrote :

kgunn: OTA-10 introduced this bug. The fix only came in OTA-11.

Daniel van Vugt (vanvugt) wrote :

Correction: Mir 0.21.1 does not exist yet, but it might in future.

Michał Sawicz (saviq) on 2017-03-13
no longer affects: qtmir