Extra glClear() calls (stellarium 0.15, Raspberry Pi)

Bug #1661375 reported by Eric Anholt on 2017-02-02

Bug Description

I was wondering why stellarium was so slow on vc4 (Raspberry Pi), and VC4_DEBUG=perf told me that we were getting some extra frame draws due to multiple glClear() calls within a frame. Here's an example of an extraneous glClear call I found in apitrace:

11207 glClearColor(red = 0, green = 0, blue = 0, alpha = 1)
11208 glClear(mask = GL_COLOR_BUFFER_BIT)
11209 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 7)
11210 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 49680, data = blob(49680))
11211 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 0)
11212 glClearColor(red = 0, green = 0, blue = 0, alpha = 0)
11213 glClear(mask = GL_COLOR_BUFFER_BIT)

For vc4 and other tiled renderers, it's really important that all the glClear() calls are stacked up at the start of the frame, before any other drawing has happened. I've got some tricks to recognize a series of glClear()s as a single clear, and I succeed at this one, but non-tiled renderers won't go to the work of merging the two clears you've done here, so it's a waste for them.
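A minimal sketch of how such back-to-back clears could be found in a text dump like the excerpt above. It assumes each line has the form `<call number> <function>(<args>)`, as in the excerpt, and that a clear is "repeated" when the same buffer bit is cleared again with no draw call, framebuffer rebind, or buffer swap in between. The helper name `find_repeated_clears` is hypothetical and not part of apitrace or stellarium.

```python
import re

# Matches lines such as "11208 glClear(mask = GL_COLOR_BUFFER_BIT)".
CALL_RE = re.compile(r"^\s*(\d+)\s+(\w+)\((.*)\)")


def find_repeated_clears(lines):
    """Return (earlier_call, later_call, buffer_bit) triples where the same
    buffer bit is cleared twice with nothing drawn in between."""
    last_clear = {}  # buffer bit name -> call number of the pending clear
    repeated = []
    for line in lines:
        m = CALL_RE.match(line)
        if not m:
            continue
        number, name, args = int(m.group(1)), m.group(2), m.group(3)
        if name == "glClear":
            # args looks like "mask = GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT"
            mask = args.split("=", 1)[-1]
            for bit in (b.strip() for b in mask.split("|")):
                if bit in last_clear:
                    repeated.append((last_clear[bit], number, bit))
                last_clear[bit] = number
        elif (name.startswith("glDraw")          # also matches glDrawBuffers: a conservative reset
              or name == "glBindFramebuffer"     # render target may have changed
              or "SwapBuffers" in name):         # assumed frame boundary
            last_clear.clear()
    return repeated


if __name__ == "__main__":
    excerpt = [
        "11207 glClearColor(red = 0, green = 0, blue = 0, alpha = 1)",
        "11208 glClear(mask = GL_COLOR_BUFFER_BIT)",
        "11209 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 7)",
        "11212 glClearColor(red = 0, green = 0, blue = 0, alpha = 0)",
        "11213 glClear(mask = GL_COLOR_BUFFER_BIT)",
    ]
    for earlier, later, bit in find_repeated_clears(excerpt):
        print(f"{bit} cleared at call {earlier} and again at call {later}")
```

The sketch only reports pairs; deciding which of the two clears to drop still requires looking at the application code.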

The actual vc4 performance penalty is here:

17488 glDrawElements(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_INT, indices = blob(144))
17489 glVertexAttrib3fv(index = 3, v = {0.001041667, 0, 0})
17490 glVertexAttrib3fv(index = 4, v = {0, -0.001851852, 0})
17491 glVertexAttrib3fv(index = 5, v = {-1, 1, 1})
17492 glStencilMask(mask = 255)
17493 glClearStencil(s = 0)
17494 glScissor(x = 0, y = 0, width = 1920, height = 1080)
17495 glClear(mask = GL_STENCIL_BUFFER_BIT)
17496 glDisable(cap = GL_STENCIL_TEST)
17497 glStencilFunc(func = GL_ALWAYS, ref = 0, mask = 255)
17498 glDisable(cap = GL_SCISSOR_TEST)
17499 glColorMask(red = GL_FALSE, green = GL_FALSE, blue = GL_FALSE, alpha = GL_FALSE)
17500 glUseProgram(program = 24)
17501 glEnable(cap = GL_STENCIL_TEST)
17502 glStencilMask(mask = 128)
17503 glStencilOp(fail = GL_KEEP, zfail = GL_KEEP, zpass = GL_REPLACE)
17504 glStencilFunc(func = GL_ALWAYS, ref = 128, mask = 255)

stellarium has done some drawing in the frame, and then decides to clear the stencil for the first time. I could potentially notice that the stencil has never been used before, but if I do so then I have the problem that stencil and depth are in the same buffer, so I can't just clear stencil on its own (I would have to draw a full-screen quad with func=GL_ALWAYS). I'm going to try to fix vc4 to draw a quad instead of flushing at this point, but we could do better if you cleared color and depth at the start of the frame. Doing so would hurt non-tiled renderers, unfortunately, so maybe the solution would be for me to expose visuals with depth but not stencil (so that you can communicate to the driver that it can always ignore the depth bits).
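A rough sketch of how clears that arrive after drawing has already started could be located in a text dump. The assumptions are mine: lines follow the `<call number> <function>(<args>)` form of the excerpt, a frame ends at any call whose name contains `SwapBuffers` (the excerpt does not show the frame boundary), and only `glDrawArrays` and `glDrawElements` count as drawing. Framebuffer rebinds are ignored, so clears of an off-screen target would be reported too. The name `find_mid_frame_clears` is hypothetical.

```python
import re

# Captures the call number and the function name of each trace line.
CALL_RE = re.compile(r"^\s*(\d+)\s+(\w+)\(")

# Assumption: these two calls are enough to detect "drawing has started".
DRAW_CALLS = {"glDrawArrays", "glDrawElements"}


def find_mid_frame_clears(lines):
    """Return the call numbers of glClear calls issued after the first
    draw call of a frame."""
    drawn = False  # has anything been drawn in the current frame?
    hits = []
    for line in lines:
        m = CALL_RE.match(line)
        if not m:
            continue
        number, name = int(m.group(1)), m.group(2)
        if "SwapBuffers" in name:
            drawn = False              # assumed frame boundary
        elif name in DRAW_CALLS:
            drawn = True
        elif name == "glClear" and drawn:
            hits.append(number)
    return hits


if __name__ == "__main__":
    excerpt = [
        "17488 glDrawElements(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_INT, indices = blob(144))",
        "17493 glClearStencil(s = 0)",
        "17495 glClear(mask = GL_STENCIL_BUFFER_BIT)",
    ]
    print(find_mid_frame_clears(excerpt))
```

Run over a whole dump, this lists the calls that are candidates for the flush described above; it says nothing about whether a given driver actually flushes there.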

Attaching the stellarium apitrace trace dump

Alexander Wolf (alexwolf) wrote :

Thank you very much for the report and for suggesting the apitrace utility!

tags: added: opengl
gzotti (georg-zotti) wrote :

Thank you, Eric, for going into these depths. I had tried to build apitrace to send a trace for the whiteout issue, but got stuck on a build error.

I think your VC4 drivers are a big leap forward to great graphics on the small devices! May the final issues be found soon!

What version of Qt do you have to run Stellarium 0.15 on a Pi? Current development has changed the use of a few important OpenGL-related widget classes which only came to completion in Qt5.4. Debian is likely more up-to-date than Raspbian with its Qt5.3.2. When you say that clearing color and depth hurts non-tiled renderers, maybe this is what happens now. We saw a performance loss on Windows but a gain on Linux, both on NVidia hardware (of different generations, though). Rather strange.

I hope my colleagues with more in-depth knowledge of OpenGL can have a look into this. Is it possible to identify tiled from non-tiled renderers and decide at runtime whether to glClear() or not?

Eric Anholt (eric-anholt) wrote :

There's no hint telling you what kind of renderer you have. But not doing that redundant clear in the first case would help all GPUs.

Qt is 5.7.1+dfsg-3, but is Qt doing all your scene drawing? All this overhead seems to be in your scene, not in recognizable toolkit widgets.

gzotti (georg-zotti) wrote :

I am not sure at the moment, we are doing most drawing ourselves, but maybe Qt does the first glClear() hidden in some function?

Alexander Wolf (alexwolf) wrote :

Eric, do you have the Scenery3D plugin enabled?

Florian Schaukowitsch (fschauk) wrote :

Note that we had a change of the internal widget structure recently to replace the deprecated QGLWidget with the newer QOpenGLWidget (which also uses an explicit FBO for rendering). This is not in 0.15, and quite possibly affects this.

In 0.15, the first duplicated glClear can easily be removed (we clear in StelSkyItem::paint and then again later in StelCore::preDraw before we have drawn anything).

In the current trunk version, only one glClear is explicitly called by us at all.
I don't think we use the stencil buffer for anything in our rendering, but Qt most likely uses it for rendering the widgets on top of our graphics. It seems to clear it when it starts the widget rendering, this is what your excerpt shows. We can probably not do much about that.

Scenery3d is a whole different beast, but it should not affect the rest of the program if no scene is shown.

r9130 now always clears color/depth/stencil at the start of the frame. Can someone with a Raspi test whether this commit improved things?

Changed in stellarium:
assignee: nobody → Florian Schaukowitsch (fschauk)
status: New → Confirmed
gzotti (georg-zotti) wrote :

Thank you Florian!

Raspbian still comes with Qt5.3.2, so a build here has the old QGLWidget classes.

I just tried r9131.

I cannot really notice a performance difference. It is still at 10-12 fps (with extra star catalogs downloaded) on a 1920x1200 screen. The framerate is OK for me, but whenever a planet sphere comes into view (the Moon, or when I just tried to zoom into Mars or Venus), the shader error still triggers the dmesg problems described on the VC4 site.


Zooming on the sun seems ok, I see no problems! Is there a decisive difference in shader complexity here?

Alexander Wolf (alexwolf) wrote :

Any news on this report?

tags: added: performance
gzotti (georg-zotti) wrote :

Raspberry Pi issues will be followed up again as soon as Raspbian includes Mesa 17. As of September 2017 it still includes Mesa 13, which has an ugly bug.

gzotti (georg-zotti) wrote :

News on this issue; see also https://github.com/anholt/mesa/issues/62.

Instructions: On Raspbian Stretch, you must still build libdrm and Mesa 17 as described by Eric Anholt at https://github.com/anholt/mesa/wiki/VC4-complete-Raspbian-upgrade, then build Stellarium 0.16+ from sources; otherwise use the installable 0.15.0. On Ubuntu Mate 16.04.3 LTS you must activate the VC4 overlay as described at https://ubuntu-mate.community/t/tutorial-activate-opengl-driver-for-ubuntu-mate-16-04/7094, then install the latest Stellarium from our ppa.

Open issues:
- rendering of 3D OBJ planets causes driver messages and does not work.
- some out-of-memory problems (?) leading to a system freeze, exit to a black screen or other bad behaviour. Not sure yet what causes this or how to avoid it.

Otherwise Stellarium/trunk now works quite well here too, with up to 30fps (typically rather 12-20fps) on a 1400x1050 screen, which is faster than my AMD netbook. Just don't overstress it.

But yes, it seems there is some slight performance loss between 0.15.0 and 0.16, maybe with the QOpenGLWidget switch :-(
