Mir

[regression] mir_demo_client_eglplasma results in 100% cpu and hardly any updates in dragonboard

Bug #1665802 reported by Alberto Aguirre on 2017-02-17
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mir
Opinion
High
Daniel van Vugt
0.26
Opinion
High
Daniel van Vugt
mesa (Ubuntu)
High
Unassigned

Bug Description

On a dragonboard 410c,

With Mir 0.25 mir_demo_client_eglplasma renders steadily at 15FPS and with low cpu usage.

With mir 0.26.1, the process takes 100% cpu (according to top) and frames hardly update on screen, but the client performance is 60-27FPS.

To replicate:

mir_demo_server --arw-file --launch-client mir_demo_client_eglplasma

Related branches

Changed in mir:
importance: Undecided → Critical
Daniel van Vugt (vanvugt) wrote :

Sounds like it's spinning (experiencing swap interval 0 behaviour by accident), similar to bug 1661072 and bug 1661128.

1. Can you provide the output from 'mirout' or just the display config part of the server log?

2. Also please provide output from 'env MIR_CLIENT_PERF_REPORT=log mir_demo_client_eglplasma' ...

3. Do other clients like egltriangle suffer?

Changed in mir:
status: New → Incomplete
milestone: none → 1.0.0
tags: added: performance
Daniel van Vugt (vanvugt) wrote :

Actually this could also be bug 1639725 ... The present workaround for that is to remove all vsync and render unthrottled:

http://bazaar.launchpad.net/~mir-team/mir/development-branch/view/head:/src/client/mir_surface.cpp#L523

Changed in mir:
milestone: 1.0.0 → none
Daniel van Vugt (vanvugt) wrote :

Also note "frames hardly update" is a common consequence of compositing swap interval 0 on some drivers (if they have inadequate GPU time sharing between processes). See bug 1211700.

Alberto Aguirre (albaguirre) wrote :

The rate is 60 (confirmed on client) and here's the server spew on dragonboard:

[2017-02-21 16:05:35.673186] mirserver: Initial display configuration:
[2017-02-21 16:05:35.673356] mirserver: Output 33: HDMI-A connected, used
[2017-02-21 16:05:35.673489] mirserver: EDID monitor name: VW246
[2017-02-21 16:05:35.673590] mirserver: EDID manufacturer: ACI
[2017-02-21 16:05:35.673678] mirserver: EDID product code: 9458
[2017-02-21 16:05:35.673792] mirserver: Physical size 24.0" 530x300mm
[2017-02-21 16:05:35.673860] mirserver: Power is on
[2017-02-21 16:05:35.673922] mirserver: Current mode 1920x1080 60.00Hz
[2017-02-21 16:05:35.673977] mirserver: Preferred mode 1920x1080 60.00Hz
[2017-02-21 16:05:35.674026] mirserver: Orientation normal
[2017-02-21 16:05:35.674074] mirserver: Logical size 1920x1080
[2017-02-21 16:05:35.674119] mirserver: Logical position +0+0

Alberto Aguirre (albaguirre) wrote :

I confirmed this bug is caused by client side vsync by making

mcl::BufferStream::swap_buffers_sync simply just calling "swap_buffers([](){})->wait_for_all();"
without wait_for_vsync or changing the swap interval to 0.

Then cpu usage is low and rendering is smooth.

Alberto Aguirre (albaguirre) wrote :

Output of mirout:

Connected to server: <default>
Max 1 simultaneous outputs
Output 33: HDMI-A, connected, "VW246", 1920x1080+0+0, enabled, on, 530mm x 300mm (24.0"), normal, 1.00x, unknown, monitor
    1920x1080 60.00*+ 59.94 59.98 50.00
    1600x1200 60.00
    1680x1050 59.88
    1280x1024 75.02 60.01
    1440x900 59.90
    1280x960 60.00
    1366x768 59.78
    1152x864 75.00
    1280x720 60.00 59.94 50.00
    1024x768 75.07 70.06 60.00
     832x624 74.55
     800x600 75.00 72.18 60.31 56.25
     720x576 50.00
     720x480 60.00 59.94
     640x480 75.00 72.80 66.66 60.00 59.94
     720x400 70.08

Alberto Aguirre (albaguirre) wrote :

[2017-02-21 23:20:08.433959] perf: mir_demo_client_eglplasma.bin: 57.65 FPS, render time 1.75ms, buffer lag 54.37ms (4 buffers)
[2017-02-21 23:20:09.433867] perf: mir_demo_client_eglplasma.bin: 60.00 FPS, render time 0.47ms, buffer lag 60.64ms (4 buffers)
[2017-02-21 23:20:10.450471] perf: mir_demo_client_eglplasma.bin: 60.03 FPS, render time 0.55ms, buffer lag 73.48ms (4 buffers)
[2017-02-21 23:20:11.467188] perf: mir_demo_client_eglplasma.bin: 60.03 FPS, render time 0.53ms, buffer lag 49.46ms (4 buffers)
[2017-02-21 23:20:12.467195] perf: mir_demo_client_eglplasma.bin: 60.00 FPS, render time 0.53ms, buffer lag 49.46ms (3 buffers)
[2017-02-21 23:20:13.483590] perf: mir_demo_client_eglplasma.bin: 60.03 FPS, render time 0.66ms, buffer lag 48.52ms (4 buffers)
[2017-02-21 23:20:14.483623] perf: mir_demo_client_eglplasma.bin: 60.00 FPS, render time 0.49ms, buffer lag 49.51ms (4 buffers)
[2017-02-21 23:20:15.500524] perf: mir_demo_client_eglplasma.bin: 60.03 FPS, render time 0.40ms, buffer lag 49.59ms (3 buffers)
[2017-02-21 23:20:16.500635] perf: mir_demo_client_eglplasma.bin: 60.00 FPS, render time 0.46ms, buffer lag 49.54ms (3 buffers)
[2017-02-21 23:20:17.517178] perf: mir_demo_client_eglplasma.bin: 60.03 FPS, render time 0.46ms, buffer lag 49.53ms (3 buffers)
[2017-02-21 23:20:18.533536] perf: mir_demo_client_eglplasma.bin: 60.03 FPS, render time 0.45ms, buffer lag 49.52ms (3 buffers)
[2017-02-21 23:20:19.550187] perf: mir_demo_client_eglplasma.bin: 60.03 FPS, render time 0.55ms, buffer lag 49.44ms (3 buffers)
[2017-02-21 23:20:20.582459] perf: mir_demo_client_eglplasma.bin: 45.54 FPS, render time 9.19ms, buffer lag 53.94ms (4 buffers)
[2017-02-21 23:20:21.585826] perf: mir_demo_client_eglplasma.bin: 27.91 FPS, render time 32.85ms, buffer lag 74.65ms (4 buffers)
[2017-02-21 23:20:22.589183] perf: mir_demo_client_eglplasma.bin: 27.91 FPS, render time 33.43ms, buffer lag 74.07ms (3 buffers)
[2017-02-21 23:20:23.592578] perf: mir_demo_client_eglplasma.bin: 27.91 FPS, render time 33.47ms, buffer lag 74.02ms (3 buffers)
[2017-02-21 23:20:24.595900] perf: mir_demo_client_eglplasma.bin: 27.91 FPS, render time 33.41ms, buffer lag 74.08ms (3 buffers)
[2017-02-21 23:20:25.599285] perf: mir_demo_client_eglplasma.bin: 27.91 FPS, render time 33.43ms, buffer lag 74.07ms (3 buffers)
[2017-02-21 23:20:26.602642] perf: mir_demo_client_eglplasma.bin: 27.91 FPS, render time 32.88ms, buffer lag 74.61ms (3 buffers)
[2017-02-21 23:20:27.605998] perf: mir_demo_client_eglplasma.bin: 27.91 FPS, render time 33.44ms, buffer lag 74.05ms (3 buffers)
[2017-02-21 23:20:28.609350] perf: mir_demo_client_eglplasma.bin: 27.91 FPS, render time 33.49ms, buffer lag 74.00ms (3 buffers)

Changed in mir:
status: Incomplete → Triaged
Alberto Aguirre (albaguirre) wrote :

When the fps drops down to 27fps from the report above, that's when high cpu usage starts.

And yep looking at the PosixTimestamp in sleep_until after that point, they are in the past (so app is spinning - hence the high cpu usage).

Decreasing the size of the window helps.

It may be indeed this driver doesn't like swapInterval 0.

Changed in mir:
assignee: nobody → Alberto Aguirre (albaguirre)
Daniel van Vugt (vanvugt) wrote :

That's correct, and not a bug in itself. The timestamps are meant to be in the past if the client has fallen behind (its render time jumped from almost zero to 33ms). That's a feature and it has test cases :)

It's possible what's triggering this is bug 1666372 so try the workaround listed for that, or just give your server --composite-delay=0.

However the trigger is one thing, high CPU usage during "27.91 FPS" is still unexplained. I would hazard a guess that this device might have a broken kernel which fails to implement sleep-until-the past correctly [1].

I intentionally did not check if the target time was in the past because it would be an unnecessary extra dip into the kernel. But if this device proves to have a broken timer implementation then it would be a simple workaround.

Can you please try modifying the code to just skip the call to sleep_until for timestamps in the past?

[1] See TIMER_ABSTIME in http://pubs.opengroup.org/onlinepubs/009695399/functions/clock_nanosleep.html

Changed in mir:
importance: Critical → High
status: Triaged → Incomplete
milestone: none → 1.0.0
Daniel van Vugt (vanvugt) wrote :

Another concerning aspect is that we appear to have lost a buffer (dropped from 4 to 3). Where did it go?

So three questions:
  1. Can you please try modifying the code to just skip the call to sleep_until for timestamps in the past?
  2. Any idea why the number of buffers in the stream dropped from 4 to 3? Did the compositor leak a bypass buffer?
  3. What thread/function exactly is spinning the CPU? Is it inside clock_nanosleep? Needs some profiling.

Daniel van Vugt (vanvugt) wrote :

  4. Do other demos like egltriangle and flicker suffer this bug at all?

Daniel van Vugt (vanvugt) wrote :

  5. Does using a small window below 480x480 avoid the problem? (that's the threshold for nested passthrough)

Alberto Aguirre (albaguirre) wrote :

I tried --composite-delay=0 yesterday but didn't help.

1. Tried that - still has high cpu usage
2. It goes back to 4 occasionally
3. The main thread, but not sure where yet. It's not in clock_nanosleep.

So with mir 0.25 I get around 15fps out of eglplasma, but no skipped frames.

Using -n to set swap interval to 0 in mir_client_demo_eglplasma, under mir 0.25 results in the same behavior, client gets around 27fps, high cpu usage, and compositor doesn't seem to get any chance to render things,

So I suppose I can mark this bug as a duplicate of lp:1211700.

It does mean we need a way to disable client side vsync for problematic drivers.

Daniel van Vugt (vanvugt) wrote :

That analysis might be incorrect. This is not the same as bug 1211700 unless we show that vsync has somehow gone missing (sleeps into the future spin or are never requested). The thing about bug 1211700 is that it's solved by any sleep of 10ms or so. And that's essentially what client-side vsync does.

Please find answers to the rest of the questions when you can:
  3. What thread/function exactly is spinning the CPU? Is it inside clock_nanosleep? Needs some profiling.
  4. Do other demos like egltriangle and flicker (software) suffer this bug at all?
  5. Does using a small window below 480x480 avoid the problem? (that's the threshold for nested passthrough)

"It does mean we need a way to disable client side vsync for problematic drivers."
I doubt that, simply because drivers have no interest in the swap interval in use (we implement it in Mir). Even bug 1211700 is just a scheduling/starvation problem that is solved with better timing, which is what client-side vsync should provide.

Needs more debugging...

Daniel van Vugt (vanvugt) wrote :

  6. What driver does the dragonboard use? Is it mesa-kms based or android?

Alberto Aguirre (albaguirre) wrote :

3. Not sure. This is snap world, so it's cumbersome to bring profiler + deps to do the job.

4. No - I suspect because the driver can keep up with the very simple shader used in those.
5. It's not a nested server.
6. Yes mesa-kms, it's freedreno.

""It does mean we need a way to disable client side vsync for problematic drivers."
I doubt that, simply because drivers have no interest in the swap interval in use (we implement it in Mir)."

From my experiments, removing client side vsync and not using swap interval 0, resolves the issues I'm seeing. So it's a potential workaround.

Daniel van Vugt (vanvugt) wrote :

My concern is not that such a workaround increases latency again, but that it requires more work to get right than a potential proper fix might :)

Daniel van Vugt (vanvugt) wrote :

OK, I think we just about understand the problem now.

On IRC Alberto pointed out that he was testing with mir_demo_server which will fail to bypass due to its little titlebar. By switching to mir_proving_server that halved the GPU load and everything became smooth.

Alberto also pointed out at least one scenario where Mir 0.25 could only achieve 30FPS but Mir 0.26.1 could briefly sustain 60FPS. So client-side vsync is trying to improve things like it did in bug 1388490.

It appears that by asking the Freedreno chip to render 1920x1080x120Hz (composited, no bypass) we have hit or exceeded the capabilities of the hardware. It's happy at 60Hz with bypass, or with a lighter-weight client like egltriangle instead of eglplasma.

When a client takes too long to render in Mir 0.26, client-side vsync notices and schedules the next frame immediately without delay so that the client might catch up. Normally this works well and does the job of avoiding the large jump down to 30Hz. However a new problem arises if you have a driver that can't timeshare it's CPU very well (like in bug 1211700). If such a driver can't meet the frame deadline, then the new catch-up logic scheduling the next client frame immediately will saturate the GPU, starving the compositor of time to get it to the screen. So you have a very busy client and very few compositor frames.

It seems the answer is that we need to be more attentive to systems that don't have the GPU/CPU power to keep up with their displays, and find an elegant way to throttle them down to a lower framerate without repeatedly trying to catch up and make the problem worse.

description: updated
description: updated
Daniel van Vugt (vanvugt) wrote :

I've just spent more time analysing the similar bug 1211700 that radeon experiences and found the common factor is the Mesa gallium code -- and that's also where the server process is hung in my case.

So I've generalized bug 1211700 and arguably this could now be a duplicate of that. But I won't mark it as a duplicate just yet since this one is a special case we might be able to work around more easily than the swap interval 0 case of bug 1211700.

The high CPU usage part of this bug is apparently not a bug. After I extracted some more data points from Alberto today it became clear that Mir 0.26 is trying to and partially succeeding in achieving significantly higher frame rates than 0.25 could. Mir 0.25 only had lower resource usage because the same client was achieving only a quarter to half the FPS that 0.26 does. Mir 0.26 (client-side vsync) tries harder to achieve a perfect smooth frame rate, but that's what's saturating the Freedreno GPU, triggering bug 1211700.

So once again, this bug is basically a duplicate of bug 1211700, but I won't mark it as duplicate while I try to come up with a special fix just for this one (which is easier in theory).

Changed in mir:
assignee: Alberto Aguirre (albaguirre) → Daniel van Vugt (vanvugt)
status: Incomplete → In Progress
Daniel van Vugt (vanvugt) wrote :

WORKAROUND (1):

Could you test these environment settings with Mir 0.26? The last one should in theory give you the same performance as you got in Mir 0.25 ...

MIR_CLIENT_FORCE_SWAP_INTERVAL=2
MIR_CLIENT_FORCE_SWAP_INTERVAL=3
MIR_CLIENT_FORCE_SWAP_INTERVAL=4

WORKAROUND (2):

Use only fullscreen RGBX clients and a server that supports bypass properly (mir_proving_server, not mir_demo_server). That will ensure the server process is never contending for GPU time and won't freeze or stutter.

Daniel van Vugt (vanvugt) wrote :

-Freedreno chip
+Adreno chip

I've been trying to find out what the specified fill rate is for the chip, but would hazard a guess that in advertising 1920x1200 they actually mean it can fill:
  1920 x 1200 x 32bit x 60Hz (553 MB/s)
So that's one GL process, and not a separate fullscreen GL client + GL server (which doubles the required fill rate to over 1GB/s).

Handy then that we have fullscreen bypass to eliminate the server GL load. Perhaps that's what we should rely on in future demos rather than trying to show both a client and server on screen simultaneously.

Another workaround that should work is to lower the screen resolution... Sufficiently lower resolution and you should be able to render a large client and shell simultaneously.

Alberto Aguirre (albaguirre) wrote :

Workaround 1:

Yeah it works, but you need to know what is the maximum fps a specific application can handle - so not scalable.

Workaround 2:
Didn't help in case of eglplasma

In any case, the cpu usage is caused by the driver, dri_flush specifically (actually looks like a bunch of drmCommandWriteRead->drmIoctl so perhaps the time is actually spent in the kernel driver.

Daniel van Vugt (vanvugt) wrote :

Workaround 2 might work for egltriangle but not eglplasma because eglplasma is more of a GPU hog (also not a realistic shader you would use in production).

Daniel van Vugt (vanvugt) wrote :

I've now tried some additional ideas like overscheduling the compositor and giving the server higher priority than the client. Neither works, which is not entirely surprising if the GPU is independent of CPU scheduling.

Daniel van Vugt (vanvugt) wrote :

Dropped priority.

Now we understand this problem, it's just going to be a series of workarounds in Mir, and Chris suggested another good one in the hangout today that might also fix it for radeon and nouveau (bug 1211700). Plus we have some workarounds proven already in comment #20.

Long term we need a proper fix in the driver, so it's a Mesa bug. A solution in theory already exists:
   https://www.usenix.org/legacy/event/atc11/tech/final_files/Kato.pdf

Changed in mir:
importance: High → Medium
Changed in mesa (Ubuntu):
importance: Undecided → High
Changed in mir:
importance: Medium → High
Daniel van Vugt (vanvugt) wrote :

Alberto,

Please try this experiment: lp:~vanvugt/mir/gpu-load-balance
It improves the situation on radeon quite a lot (from below 0.1FPS to 20-40FPS).

Mir CI Bot (mir-ci-bot) wrote :

Fix committed into lp:mir at revision 4072, scheduled for release in mir, milestone 1.0.0

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → In Progress
Changed in mir:
milestone: 1.0.0 → none
status: In Progress → Won't Fix
Daniel van Vugt (vanvugt) wrote :

OK, I am done with this for now. Here are two relevant branches to try:

lp:~vanvugt/mir/testfix-1211700
Proof of concept workaround which also solves bug 1211700 in some cases. Not good enough to release though.

lp:~vanvugt/mir/workaround-1665802
A proper workaround that should work nicely if you set the environment variable it introduces.

I'm proposing that neither of those should land though. This bug is a perfect storm of four things that are not Mir bugs:
  1. Poor quality driver that lacks GPU fair scheduling (bug 1211700).
  2. Slow mobile GPU.
  3. Choice of an over-the-top demo 'eglplasma' that the GPU can only run at 15 FPS at best.
  4. Accidentally failing to trigger fullscreen bypass (which halves the performance potential of the system).

You can fix #3 and #4 by simply not choosing eglplasma on this low-end configuration, and by ensuring that demos are always fullscreen bypassed. Then you can have smooth fullscreen demos, and no workarounds need ever land.

Changed in mir:
status: Won't Fix → Opinion
Daniel van Vugt (vanvugt) wrote :

Alberto:

I think what I was recommending in that last comment is to just not run eglplasma on dragaonboard. It's a poor demo if the best it can get is 15FPS (you just never noticed it was so low). Change your demos to use egltriangle (fullscreen to get bypass) and you won't hit this bug any more, plus you should get 60 FPS.

Long term, you should push for a driver fix (bug 1211700) to resolve it.

Daniel van Vugt (vanvugt) wrote :

P.S. client-side vsync isn't just a good idea that improves performance on non-buggy drivers [2], but apparently also the norm. It is the design used by gtk and clutter too. I had no idea it was so common till recently.

If you disable client-side vsync in Mir then you take latency backwards [1], and take rendering performance (on mature drivers) backwards [2]

[1] https://docs.google.com/spreadsheets/d/1RbTVDbx04ohkF4-md3wAlgmxbSI1DttstnT6xdcXhZQ/pubchart?oid=1566479835&format=interactive
[2] https://bugs.launchpad.net/mir/+bug/1388490

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers