Mir

Clients freeze on startup if 10 or more are already running

Bug #1267323 reported by Daniel van Vugt
This bug affects 2 people
Affects       Status        Importance  Assigned to     Milestone
Mir           Fix Released  High        Alan Griffiths
mir (Ubuntu)  Fix Released  High        Unassigned

Bug Description

Clients freeze on startup if 10 or more are already running.

Test case:
1. mir_demo_server_shell &
2. mir_demo_client_egltriangle -q &
3. Repeat #2 ten more times.

Expected: Each new client appears and is animated
Observed: The 11th and subsequent clients freeze on startup, at least until you move one with Alt+drag; the moved client then starts animating.

Tags: performance

Daniel van Vugt (vanvugt) wrote :

Probably related to the frontend thread pool --> bug 1233001

Daniel van Vugt (vanvugt) wrote :

Rather than looking for ways to dynamically pool threads, I suggest first looking to see why/how clients occupy so much time of the frontend that this is a problem at all. In theory if we're doing things right, the time spent in the front end for each message should be so small that we could survive with one thread.

Daniel van Vugt (vanvugt) wrote :

Also, the bug is not a matter of processing power. I see it occur reliably even on high-end i7 hardware.

Alan Griffiths (alan-griffiths) wrote :

I don't think this is related to bug 1233001 (except in so far as with fewer threads the problem described here is more obvious).

Using the scenario described in bug 1274208 I find that the current client call is waiting on a condition variable in SwitchingBundle::client_acquire() (line 189) while the compositor is also waiting on a CV in CompositingFunctor::operator()() (line 94).

This looks like a race condition as a client should not be waiting on a buffer to be released from the rendering queue at the same time the compositor is waiting for there to be something to render.

Changed in mir:
assignee: nobody → Alan Griffiths (alan-griffiths)
status: Triaged → In Progress
Alan Griffiths (alan-griffiths) wrote :

The problem is that we don't consume buffers from an occluded surface. So when it swaps buffers we block. But that block occurs on a frontend thread - so the number of frontend threads indirectly limits the number of occluded surfaces.

Supporting evidence for this can be gleaned by hacking OcclusionFilter::operator() to return false - we then see the expected behaviour. (But this isn't a good idea for other reasons.)
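
The mechanism described above can be reproduced outside Mir with a small standalone sketch. This is not Mir code: FixedPool, requests_started_while_occluded and the pool size used below are hypothetical names and values chosen only to mirror the test case. Each "swap buffers" request blocks its worker thread until a buffer is released, so a fixed pool of N threads can have at most N such requests in flight, and any further request waits in the queue.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fixed-size frontend thread pool (not Mir's class).
class FixedPool {
public:
    explicit FixedPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~FixedPool() {
        { std::lock_guard<std::mutex> lk(m_); stopping_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void post(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // runs outside the lock; may block for a long time
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Posts one blocking "swap buffers" request per client and reports how many
// requests managed to start while every surface is treated as occluded.
std::size_t requests_started_while_occluded(std::size_t pool_threads,
                                            std::size_t clients) {
    assert(pool_threads > 0);
    std::mutex m;
    std::condition_variable cv;
    std::size_t started = 0;
    bool buffer_released = false;
    std::size_t observed = 0;
    {
        FixedPool pool(pool_threads);
        for (std::size_t i = 0; i < clients; ++i) {
            pool.post([&] {
                std::unique_lock<std::mutex> lk(m);
                ++started;
                cv.notify_all();
                // Occluded surface: nothing consumes the buffer, so block here.
                cv.wait(lk, [&] { return buffer_released; });
            });
        }
        const std::size_t limit = std::min(pool_threads, clients);
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait_for(lk, std::chrono::seconds(5),
                        [&] { return started >= limit; });
        }
        // Give any remaining request a chance to start, if it could.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        {
            std::lock_guard<std::mutex> lk(m);
            observed = started;
            buffer_released = true;  // unblock everything so the pool can exit
        }
        cv.notify_all();
    }  // pool destructor drains the queue and joins the workers
    return observed;
}
```

Calling requests_started_while_occluded(10, 11) models eleven occluded clients against a ten-thread pool: ten requests start and block, and the eleventh is never picked up until a buffer is released.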

Daniel van Vugt (vanvugt) wrote :

The blocking of client_acquire is correct behaviour and a critical performance feature. When a surface is invisible, this puts the client rendering to sleep. The client can't stay busy and occupy your CPU time (as easily) while it's out of view. See --> bug 1227739.

Also the blocking of client_acquire for shorter times is what keeps clients well behaved and rendering only at the monitor refresh rate (at most).
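
As a rough illustration of that throttling effect, here is a minimal sketch that uses a counting gate in place of the real buffer queue. BufferGate, frames_rendered and the buffer counts are hypothetical and far simpler than SwitchingBundle; the only point is that a client which must acquire a buffer before rendering can never get more than the number of free buffers ahead of the compositor, and it sleeps entirely when nothing is being composited.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical counting gate standing in for a surface's buffer queue.
class BufferGate {
public:
    explicit BufferGate(int free_buffers) : free_(free_buffers) {}

    // Blocks until a buffer is free. Returns false once stop() was called.
    bool client_acquire() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return stopped_ || free_ > 0; });
        if (stopped_) return false;
        --free_;
        return true;
    }

    // The compositor has consumed a frame, so one buffer becomes free again.
    void compositor_release() {
        { std::lock_guard<std::mutex> lk(m_); ++free_; }
        cv_.notify_one();
    }

    void stop() {
        { std::lock_guard<std::mutex> lk(m_); stopped_ = true; }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int free_;
    bool stopped_ = false;
};

// Runs a client that renders as fast as it is allowed to, while the
// compositor consumes `composited_frames` frames (0 models an occluded
// surface). Returns how many frames the client managed to render.
int frames_rendered(int nbuffers, int composited_frames) {
    assert(nbuffers >= 0 && composited_frames >= 0);
    BufferGate gate(nbuffers);
    std::atomic<int> frames{0};
    std::thread client([&] {
        while (gate.client_acquire()) ++frames;  // "render" one frame
    });

    for (int i = 0; i < composited_frames; ++i) gate.compositor_release();

    // Wait until the client has used every buffer it could possibly get.
    const int ceiling = nbuffers + composited_frames;
    const auto deadline =
        std::chrono::steady_clock::now() + std::chrono::seconds(5);
    while (frames.load() < ceiling &&
           std::chrono::steady_clock::now() < deadline)
        std::this_thread::sleep_for(std::chrono::milliseconds(1));

    // Extra time in which an unthrottled client would keep rendering.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    gate.stop();
    client.join();
    return frames.load();
}
```

With three buffers and no compositing, frames_rendered(3, 0) reports three frames and the client then sleeps; with five composited frames it reports eight, never more.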

So the SwitchingBundle and Occlusion classes are behaving correctly, as designed. And it sounds like we need to be smarter on the front end. Allowing all clients to render continuously without blocking would be a significant performance setback.

One potential solution is to get away from the limitations of the request-response protocol model that led us here, and try to be more asynchronous and event-driven. That would also solve related performance issues like bug 1253868.

Alan Griffiths (alan-griffiths) wrote :

Yes. I know the clients need to block. But we don't need to waste threads by blocking one for each client. (There will be an MP along shortly.)
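
One way to keep the client blocked without parking a server thread is to defer the reply: the request handler stores a completion callback and returns immediately, and the reply is sent later when a buffer is released. The sketch below only illustrates that idea and is not necessarily what the merge proposal does; DeferredSwapper and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical sketch: the client still waits for its reply, but no server
// thread waits with it.
class DeferredSwapper {
public:
    using Reply = std::function<void(int buffer_id)>;

    // Called on a frontend thread. Never blocks: it either replies at once
    // or queues the reply for later.
    void request_buffer(Reply reply) {
        assert(reply && "a reply callback is required");
        int id = 0;
        {
            std::lock_guard<std::mutex> lk(m_);
            if (free_.empty()) {
                pending_.push_back(std::move(reply));
                return;  // the thread is free to serve other clients
            }
            id = free_.front();
            free_.pop_front();
        }
        reply(id);  // invoked outside the lock
    }

    // Called when the compositor is done with a buffer. Completes the oldest
    // waiting request, or keeps the buffer for the next one.
    void buffer_released(int id) {
        Reply reply;
        {
            std::lock_guard<std::mutex> lk(m_);
            if (pending_.empty()) {
                free_.push_back(id);
                return;
            }
            reply = std::move(pending_.front());
            pending_.pop_front();
        }
        reply(id);
    }

    // Number of requests still waiting for a buffer.
    std::size_t pending() const {
        std::lock_guard<std::mutex> lk(m_);
        return pending_.size();
    }

private:
    mutable std::mutex m_;
    std::deque<int> free_;
    std::deque<Reply> pending_;
};
```

In this model eleven occluded clients simply become eleven queued callbacks, so the number of waiting clients is no longer tied to the size of the thread pool.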

Daniel van Vugt (vanvugt) wrote :

Sounds good.

Changed in mir:
milestone: none → 0.1.5
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir/devel at revision None, scheduled for release in mir, milestone Unknown

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → In Progress
Changed in mir:
milestone: 0.1.5 → 0.1.6
Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → Fix Released
Daniel van Vugt (vanvugt) wrote :

This bug was fixed in the package mir - 0.1.6+14.04.20140310-0ubuntu1
---------------
mir (0.1.6+14.04.20140310-0ubuntu1) trusty; urgency=medium

Changed in mir (Ubuntu):
importance: Undecided → High
status: New → Fix Released