Flickering Moon

Bug #1411958 reported by Nick Fedoseev
This bug affects 2 people
Affects: Stellarium
Status: Fix Released
Importance: High
Assigned to: gzotti

Bug Description

Moon flickers:
- if Atmosphere is turned On
  and
- if more than 50% of Moon's disk is illuminated.

See
https://www.youtube.com/watch?v=oLnVz1Cm5ic&feature=youtu.be

Changed in stellarium:
milestone: none → 0.13.2
importance: Undecided → High
status: New → Confirmed
tags: added: solar-system
Alexander Wolf (alexwolf) wrote :

Nick, are you seeing it on an x64 system?

Alexander Wolf (alexwolf) wrote :

I can confirm it on Linux x64, but I don't see flickering on Windows x86.

Alexander Wolf (alexwolf) wrote :

I can reproduce it on Windows x64.

tags: added: x64
Nick Fedoseev (nick-ut2uz) wrote :

Yes, I am running the x64 version.

gzotti (georg-zotti) wrote :

OK, I also see it now on the x64 RC4. There is no flicker on MinGW/win32. A real showstopper! :-(

I think there is some numerical cancellation happening here, but at the moment I don't see how, if we use doubles only, and why this involves lunar brightness exclusively. In particular, I don't see where to avoid it. There is a division with an end result very close to 1; if some user of this result subtracts 1 and uses the result, strange consequences may be expected. But where in the code around lunar brightness is that?

The previous solution for refraction was numerically wrong but does not cause flicker for the moon on x64. Shall we, if necessary, postpone really fixing it towards 0.14 or further (if at all possible?)
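
The effect suspected here (a quotient very close to 1 from which 1 is later subtracted) can be reproduced in isolation. The following is a minimal sketch, not taken from Stellarium; the function names are made up for illustration.

```cpp
#include <cstdio>

// Hypothetical helpers, not Stellarium code: compute a/b - 1 in two precisions.
float ratioMinusOneFloat(float a, float b) { return a / b - 1.0f; }
double ratioMinusOneDouble(double a, double b) { return a / b - 1.0; }

// Print both results for a quotient that differs from 1 by about 1e-8.
void demoCancellation()
{
    // 1.00000001 is closer to 1.0f than to the next float above it,
    // so the literal rounds to exactly 1.0f and the difference is lost.
    std::printf("float : %g\n", ratioMinusOneFloat(1.00000001f, 1.0f));
    // In double precision the difference survives with about 8 good digits.
    std::printf("double: %g\n", ratioMinusOneDouble(1.00000001, 1.0));
}
```

Calling demoCancellation() prints both results side by side. Whether this mechanism is actually involved in the Moon rendering is not established at this point in the thread.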

Alexander Wolf (alexwolf) wrote :

You can find the calculation of the Moon's brightness in Planet.cpp (lines 1226-1243). As for the refraction: Georg, can you write unit tests for refraction and extinction? They could be helpful here.

Alexander Wolf (alexwolf) wrote :

I switched the airmass calculation to double precision and the flickering is gone for me. Can you check the latest revision?

Alexander Wolf (alexwolf) wrote :

Sorry, I was wrong!

Nick Fedoseev (nick-ut2uz) wrote :

Being a newcomer, I don't fully understand what's going on, but perhaps the modification described below will help with locating the problem.
In Planet.cpp, line 1244,
  " mediump float roughness = 1.;\n"

I changed the value from 1. to 0.01:

  " mediump float roughness = 0.01;\n"

... and the flickering stops.

But I do not like the Moon image in this case.

Nick Fedoseev (nick-ut2uz) wrote :

One more observation (Planet.cpp, line 1244): using
    lum = cosAngleLightNormal;
instead of the original
    lum = max(0.0, cosAngleLightNormal) * (A + B * max(0.0, gamma) * C) * 2.;

produces a nice non-flickering Moon image at night, but it is not so good in daytime.

Alexander Wolf (alexwolf) wrote :

I reduced a couple of coefficients in the Oren-Nayar model for rough surfaces; the flickering is gone now and the brightness of the Moon matches the previous version.

Nick and Georg, please check the latest revision (r7331+).

Changed in stellarium:
status: Confirmed → In Progress
gzotti (georg-zotti) wrote :

This issue should not have to do with airmass or extinction. The projection geometry has changed due to bugfixing refraction, and now we likely see numerical effects in some vector computations.

I am just downloading Qt5.3.2/64 for MSVC2013...
My best bet would be changing one or two significant mediump to highp (but which ones? Maybe all down to gamma?), and not destroying the Oren-Nayar reflection model for lunar surface roughness! Turning down roughness likely removes the brightness peak at full moon.

From what I have read, however, highp may not be available (or defined away to be equal to mediump) on OpenGL ES2, depending on implementation. If the solution depends on highp, this would again be quite stupid. (like #define double float)

Alexander Wolf (alexwolf) wrote :

But the model is not destroyed. Roughness can be set in the range 0.0-1.0 (currently 0.01, previously 1.0). As for the coefficient 2.0 for the final luminosity: this coefficient doesn't exist in the original article. I don't see any visual changes.

gzotti (georg-zotti) wrote :

Actually the final luminosity factor should not be 2, or details from the texture become lost in glare. (Maybe a global reduction scale should be added to fight this, but it's another issue.) A roughness of 0.01 means almost no retro-reflection; the lunar rim may be dark at full moon, which is visually bad. I have not compiled it yet, though.
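
To see why the roughness value matters so much, it helps to evaluate the two coefficients of the textbook Oren-Nayar model for both settings discussed here. This is a sketch under the assumption that the shader's A and B follow the standard published form; the exact constants used in Planet.cpp are not quoted in this thread, and the names below are illustrative.

```cpp
#include <cstdio>

// Textbook Oren-Nayar coefficients for a surface roughness sigma.
// Assumption: Stellarium's shader may use different constants.
struct OrenNayarCoeffs { double A; double B; };

OrenNayarCoeffs orenNayarCoeffs(double sigma)
{
    const double s2 = sigma * sigma;
    OrenNayarCoeffs c;
    c.A = 1.0 - 0.5 * s2 / (s2 + 0.33);   // weight of the Lambert-like term
    c.B = 0.45 * s2 / (s2 + 0.09);        // weight of the roughness term
    return c;
}

// Print A and B for the two roughness values mentioned in the thread.
void printCoeffs()
{
    const double sigmas[] = { 1.0, 0.01 };
    for (double sigma : sigmas) {
        const OrenNayarCoeffs c = orenNayarCoeffs(sigma);
        std::printf("roughness %.2f: A = %.5f, B = %.5f\n", sigma, c.A, c.B);
    }
}
```

In this form, B is close to zero at roughness 0.01, so the term B * max(0.0, gamma) * C contributes almost nothing and the surface behaves nearly like a plain Lambertian one, which is consistent with the darker rim described above.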

Nick Fedoseev (nick-ut2uz) wrote :

7333: no flickering but very poor render quality for daylight.

To avoid waiting for a full recompile each time, I added the following to Planet.cpp:
 // Load the fragment shader source from an external file at startup.
 QString myfsrc;
 {
  QFile file("fsrc.txt");
  if (!file.open(QIODevice::ReadOnly))
   return;

  myfsrc = file.readAll();
 }

and use myfsrc instead of fsrc for GL:

 arr = "#define IS_MOON\n\n";
 arr += myfsrc;
etc.

Now the result of a modification to the GL shader program is visible within a couple of seconds.
Just restart Stellarium.

My current fsrc.txt is attached.

A simple startup.ssc also helps to locate the Moon and set the initial time:

core.setDate("2015:01:01T21:21:32");
LandscapeMgr.setFlagAtmosphere(true);
core.selectObjectByName("Moon", false);
StelMovementMgr.setFlagTracking(true);

Hope this will be useful for this task.

gzotti (georg-zotti) wrote :

OK, switching to highp does not help either, unfortunately.

Removing the rough retroreflecting part is like switching off the flickering tube in a two-tube fluorescent lamp. It does not flicker any longer, but the lamp looks different (has less light in total). Same for the lunar rim here. The O-N reflection model is the right one to use for the lunar surface. Just that some bad effect is going on here. What's the difference between GCC and MSVC here? Still no flicker in MinGW.

Alexander Wolf (alexwolf) wrote :

On Linux I use GCC and Clang and I get the same behaviour with both: the Moon is flickering.

OK, I have reverted the changes to the model coefficients.

Changed in stellarium:
milestone: 0.13.2 → 0.13.3
Nick Fedoseev (nick-ut2uz) wrote :

This may be useful:
the flickering does not occur if the Moon phase is in the range third quarter - new - first quarter (less than 50% of the disk is illuminated).

Nick Fedoseev (nick-ut2uz) wrote :

Attached is an animation with the two states of Moon flickering. The central part is unchanged; it looks like the image is almost always in the wrong, overbright state (as described in Bug #1416644), with occasional flicks to the normal state.

gzotti (georg-zotti) wrote :

Note that the correct state for (almost) full moon is the one with the bright rim here! The overexposure of the moon is another problem, but the shader program explicitly makes the moon appear retroreflective around full moon. Otherwise (with light falloff at the rim) it looks like the usual rough plastic. The amount of retroreflectivity can be controlled with the roughness factor. However, I think what we see is buried somewhere in implicit float/double conversions and loss of numerical precision.

What puzzles me most is that it is only visible on 64bit systems. I have seen a hint that GCC may use x87 FPU instructions (with 80bit extended precision floating point numbers) on 32bit systems, while on 64bit systems only 64bit SSE instructions are used. But this is yet unconfirmed! And I don't know yet which vector causes these troubles.

Nick Fedoseev (nick-ut2uz) wrote :

Finally found the source of the problem.

void Planet::drawSphere(StelPainter* painter, float screenSz, bool drawOnlyRing)
{
...
line 1620:
 StelApp::getInstance().getCore()->getHeliocentricEclipticModelViewTransform()->forward(eyePos);
 projector->getModelViewTransform()->backward(eyePos);

eyePos is occasionally equal to (nan, nan, nan).
Since I'm still a newcomer, I've used a simple patch, which just hides the problem instead of actually fixing the bug:

 static Vec3d eyePos0;
 if (eyePos[0] != eyePos[0]) // NaN is the only value not equal to itself
  eyePos = eyePos0; // restore the last non-NaN value in place of NaN
 else
  eyePos0 = eyePos; // save the non-NaN value

... and the flickering is gone.
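
The same "keep the last good value" idea can be written as a small standalone helper. This is only a sketch: Vec3 is a stand-in for Stellarium's Vec3d, keepLastFinite is a hypothetical name, and, unlike the patch above, it checks all three components and takes the fallback as a parameter instead of a function-local static.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };   // stand-in for Stellarium's Vec3d

// Return v unless one of its components is NaN; in that case return the
// last usable value. lastGood is updated whenever v is usable.
Vec3 keepLastFinite(const Vec3& v, Vec3& lastGood)
{
    if (std::isnan(v.x) || std::isnan(v.y) || std::isnan(v.z))
        return lastGood;
    lastGood = v;
    return v;
}
```

Like the original patch, this only masks the symptom; it does not explain where the NaN comes from.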

Enjoy.

gzotti (georg-zotti) wrote : Re: [Bug 1411958] Re: Flickering Moon

Ah, that's great, thank you!

My guess is some remaining asin(x>1) or so somewhere in refraction... Must
look where it is.
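
For reference, this is how an out-of-range argument to asin() turns into NaN, and one common way to guard against it. The sketch is not taken from RefractionExtinction.cpp; clampedAsin is a hypothetical helper name.

```cpp
#include <cmath>

// std::asin is only defined on [-1, 1]. Rounding can push a computed sine
// slightly outside that range, and std::asin then returns NaN.
// This helper clamps the argument first. A NaN input is deliberately not
// clamped, so it still propagates and stays visible to the caller.
double clampedAsin(double s)
{
    if (s > 1.0)
        s = 1.0;
    else if (s < -1.0)
        s = -1.0;
    return std::asin(s);
}
```

Clamping is only appropriate when the overshoot is known to be rounding noise; a value far outside the range usually points to a real error upstream.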

Alexander Wolf (alexwolf) wrote :

A fix has been committed as revision 7374 of the trunk branch of Stellarium's Bazaar repository at Launchpad: http://bazaar.launchpad.net/~stellarium/stellarium/trunk/revision/7374

Changed in stellarium:
assignee: nobody → Nick Fedoseev (nick-ut2uz)
status: In Progress → Fix Committed
Alexander Wolf (alexwolf) wrote :

It's a tentative fix and I hope Georg can find a proper fix in the refraction code.

gzotti (georg-zotti) wrote :

Alex or Nick, please try r7378 on 64bit. There is a qdebug in Planet.cpp in Nick's hack-fix area. If all works, the hack can be removed again, and Refraction cleaned up somewhat. I cross my fingers...

Alexander Wolf (alexwolf) wrote :

It's fine for me. Nick?

gzotti (georg-zotti) wrote :

does the qdebug show in the log? (I hope not...)

Nick Fedoseev (nick-ut2uz) wrote :

---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Error!

Program: J:\launchpad-build64\src\Qt5Cored.dll
Module: 5.4.0
File: global\qglobal.cpp
Line: 2810

ASSERT: "fabs(sinGeo)<=1.0" in file J:\launchpad\src\core\RefractionExtinction.cpp, line 117

(Press Retry to debug the application)

---------------------------
Abort   Retry   Ignore
---------------------------

Nick Fedoseev (nick-ut2uz) wrote :

Flickering comes back

Nick Fedoseev (nick-ut2uz) wrote :

One more message
---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Error!

Program: J:\launchpad-build64\src\Qt5Cored.dll
Module: 5.4.0
File: global\qglobal.cpp
Line: 2810

ASSERT: "fabs(sinObs)<=1.0" in file J:\launchpad\src\core\RefractionExtinction.cpp, line 157

(Press Retry to debug the application)

Nick Fedoseev (nick-ut2uz) wrote :

sinGeo is NaN when Q_ASSERT catches the event.
sinObs is NaN as well.

Look at the forward() <-> backward() round trip: it is a black hole where valid values are sometimes turned into NaN.
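
The range assertions fire for NaN because every ordered comparison involving NaN evaluates to false. A short sketch of that behaviour follows; the function names are illustrative and not from the Stellarium sources.

```cpp
#include <cmath>

// Same shape as the failing check "fabs(sinGeo)<=1.0":
// true only for a real value inside [-1, 1], false for NaN.
bool isValidSine(double s)
{
    return std::fabs(s) <= 1.0;
}

// Logically "the same" test written via negation. It differs for NaN:
// fabs(NaN) > 1.0 is false, so this returns true and lets NaN through.
bool rangeCheckThatMissesNaN(double s)
{
    return !(std::fabs(s) > 1.0);
}
```

So an assertion written in the first form doubles as a NaN detector, while the negated form would silently accept NaN.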

gzotti (georg-zotti) wrote :

Wow! r7379 please. It implies either Vector3.length() causes NaN, or the vector itself comes with NaN's.

Nick Fedoseev (nick-ut2uz) wrote :

---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Error!

Program: J:\launchpad-build64\src\Qt5Cored.dll
Module: 5.4.0
File: global\qglobal.cpp
Line: 2810

ASSERT: "length>0.0" in file J:\launchpad\src\core\RefractionExtinction.cpp, line 116

(Press Retry to debug the application)

Nick Fedoseev (nick-ut2uz) wrote :

update 7380, still the same:
---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Error!

Program: J:\launchpad-build64\src\Qt5Cored.dll
Module: 5.4.0
File: global\qglobal.cpp
Line: 2810

ASSERT: "length>0.0" in file J:\launchpad\src\core\RefractionExtinction.cpp, line 116

(Press Retry to debug the application)

gzotti (georg-zotti) wrote :

Not quite the same, it reveals that it is possible that length()=0, or even elementary NaNs in the position vectors? Some more Q_ASSERTs to trace this, in r7383. This is getting weird.

Nick Fedoseev (nick-ut2uz) wrote :

r7383: zooming out to the maximum or searching for the Moon results in
---------------------------
Microsoft Visual C++ Runtime Library
---------------------------
Debug Error!

Program: J:\launchpad-build64\src\Qt5Cored.dll
Module: 5.4.0
File: global\qglobal.cpp
Line: 2810

ASSERT: "(fabs(altAzPos[0])>0.0) || (fabs(altAzPos[1])>0.0) || (fabs(altAzPos[2])>0.0)" in file J:\launchpad\src\core\RefractionExtinction.cpp, line 121

(Press Retry to debug the application)

gzotti (georg-zotti) wrote :

I understand that we have a zero vector then, right? This is crazy. Anything in the logfile for r7384? I make a zenith vector from a zero vector. I have no idea which visible consequences result here!

Nick Fedoseev (nick-ut2uz) wrote :

r7384: very slow (11 FPS). No asserts, occasional flickering remains.
The log is attached.

gzotti (georg-zotti) wrote :

r7386: The same, just without the qDebug()s. What does the rendering look like?
I wonder where the zero vectors come from, and why they are only on 64bit systems. Also, what happens now when replacing them with zenith vectors?

I did not do anything more to prevent flickering, just try to locate unexpected numbers, and do something so that in case of NaN there are log entries. In the end, NaNs are the problem!
Sorry, I must stop for today. Good luck if you are able to continue on this crazy thing!

Nick Fedoseev (nick-ut2uz) wrote :

r7386: The situation has become worse for debugging. Now no NaN is detected, but the flickering remains. There is still something wrong with eyePos: setting eyePos to a fixed value manually stops the flickering. How can we get back to the initial state? At least we had NaN as a detector.

gzotti (georg-zotti) wrote :

NaN was caused by dividing by vector length, where I assumed that object
position (on the sphere) is always different from zero (centre of sphere).
There are cases in 64bit builds where this assumption is wrong (therefore
AltAzPos.length()==0, and dividing by zero caused NaN instead of a runtime
error), while on 32bit builds it always holds.

I now test if all elements of AltAzPos are zero (which is something I
never expected, when this coordinate should be on the sphere), and set
coordinate to zenith in this case. You can see:
src/core/RefractionExtinction lines 120 and 175. Test and Assert as much
as you can here to track this stupid beast.

As I said, I don't know which consequence setting to zenith has, because I
don't see whose coordinate this is. Maybe the current flicker is caused by
setting position to 0/0/1 i.e. zenith. - I really have no idea why any
object coordinates can become zero, and why only in 64bit-builds. It is
not really cured by trapping NaN or 0/0/1, these are only hints at some
unexpected error condition.
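
The failure mode and the workaround described in this comment can be sketched in a few lines. Vec3 is a stand-in for Stellarium's vector type and both function names are hypothetical; the actual code in RefractionExtinction.cpp is not reproduced here.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };   // stand-in for Stellarium's Vec3d

double length(const Vec3& v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Unguarded normalization: for the zero vector this evaluates 0.0 / 0.0,
// which under IEEE 754 yields NaN rather than a runtime error.
Vec3 normalizeUnguarded(const Vec3& v)
{
    const double len = length(v);
    return { v.x / len, v.y / len, v.z / len };
}

// Guarded variant: substitute the zenith (0, 0, 1) for a zero vector,
// as described in the comment above.
Vec3 normalizeOrZenith(const Vec3& v)
{
    const double len = length(v);
    if (len == 0.0)
        return { 0.0, 0.0, 1.0 };
    return { v.x / len, v.y / len, v.z / len };
}
```

The guard removes the NaN, but the substituted direction is arbitrary, so anything computed from it downstream can still look wrong.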

gzotti (georg-zotti)
Changed in stellarium:
status: Fix Committed → In Progress
Nick Fedoseev (nick-ut2uz) wrote :

I hope that there is a math description somewhere. It's too complex for me to understand the meaning of "getHeliocentricEclipticModelViewTransform" and its role in the coordinate transformation.

I just dumped eyePos before and after the following transformation:
 StelApp::getInstance().getCore()->getHeliocentricEclipticModelViewTransform()->forward(eyePos);

What is the physical meaning of a vector like
3.81742e-34 5.35747e-17 1.45342e-17
?

One can see very strange behavior: the values "before" change very smoothly, while the values "after" jitter greatly. The rows with three zeros, of course, correspond to a flick.

Any comment will be greatly appreciated. In the dump below, the first three columns are eyePos before forward() and the last three are eyePos after it:

-0.67329 0.719558 1.48537e-05 7.40768e-17 -2.16526e-17 7.98107e-17
-0.67329 0.719558 1.48537e-05 -4.13477e-17 -9.6979e-18 3.57464e-17
-0.67329 0.719558 1.48536e-05 0 0 0
-0.67329 0.719558 1.48536e-05 -7.40778e-17 -3.19227e-17 -9.43443e-17
-0.67329 0.719558 1.48535e-05 -4.13472e-17 -6.32725e-17 2.12127e-17
-0.67329 0.719558 1.48535e-05 3.81742e-34 5.35747e-17 1.45342e-17
-0.67329 0.719558 1.48534e-05 8.26939e-17 1.93956e-17 -7.14946e-17
-0.67329 0.719558 1.48534e-05 8.61452e-18 -1.25282e-17 -1.65838e-16
-0.67329 0.719558 1.48534e-05 0 0 0
-0.67329 0.719558 1.48533e-05 -9.02566e-34 5.35748e-17 1.45338e-17
-0.67329 0.719558 1.48533e-05 4.13464e-17 9.69769e-18 -3.5748e-17
-0.67329 0.719558 1.48532e-05 -4.99584e-17 -1.04318e-16 1.72519e-16
-0.67329 0.719558 1.48532e-05 -7.40807e-17 7.52248e-17 -6.52743e-17
-0.67329 0.719558 1.48531e-05 -8.26919e-17 -7.29701e-17 5.69635e-17
-0.67329 0.719558 1.48531e-05 -3.27355e-17 3.1347e-17 -1.15556e-16
-0.67329 0.719558 1.4853e-05 -1.15427e-16 -4.16233e-17 -5.85914e-17
-0.67329 0.719558 1.4853e-05 -8.60905e-18 -4.10441e-17 1.51305e-16
-0.67329 0.719558 1.48529e-05 -8.60847e-18 -9.46188e-17 1.36772e-16
-0.67329 0.719558 1.48529e-05 4.13452e-17 6.32725e-17 -2.12164e-17
-0.67329 0.719558 1.48528e-05 4.13451e-17 -4.38776e-17 -5.02824e-17
-0.67329 0.719558 1.48528e-05 -7.40832e-17 2.16481e-17 -7.98059e-17
-0.67329 0.719558 1.48528e-05 -1.15428e-16 1.19505e-17 -4.40558e-17
-0.67329 0.719558 1.48527e-05 4.13447e-17 9.69741e-18 -3.575e-17
-0.67329 0.719558 1.48527e-05 -4.13446e-17 -9.69738e-18 3.57502e-17
-0.67329 0.719558 1.48526e-05 -4.13444e-17 -9.69736e-18 3.57504e-17
-0.67329 0.719558 1.48526e-05 -1.15429e-16 -4.16255e-17 -5.85865e-17
-0.67329 0.719558 1.48525e-05 -1.15429e-16 1.19495e-17 -4.40539e-17
-0.67329 0.719558 1.48525e-05 0 0 0
-0.67329 0.719558 1.48525e-05 -4.13439e-17 -6.32725e-17 2.1219e-17

gzotti (georg-zotti) wrote :

The (at least approximately normalized) eyePos is transformed to something very small (~1e-17; this also happens on win32/GCC, where such numbers may already be "dangerous", but there is no flicker) or, on win64, sometimes exactly zero. The backward projection also becomes quite small; after normalisation it is fed to the shader. In the case of zero I can only guess that it stays at zero. I also don't fully understand this chain of forward/backward projections. But apparently forward() allows zeros to happen in 64bit builds, which causes our trouble. Where and why does the rounding occur?

In StelCore.cpp lines 658 and 670 there are copy constructor calls for Refraction. Those have not been implemented explicitly as far as I can see. Do they behave differently in GCC/32 vs MSVC/CLANG/64?

Nick Fedoseev (nick-ut2uz) wrote :

If we try to stop time several times, we can catch "dim state" of the Moon.
We see no flickering, just a stable image, which looks much better than "bright" for me.
No blinks means no random processing, no multitasking issues, and finally the ability to reproduce the state easily, just by setting data to caught values.

An alternative way to catch the dim state is to play with the F5 dialog while time is stopped. The dim state can also be reproduced easily just by stepping time forward/backward.

Two sets below are taken by qDebug dump of eyePos. 3x values before forward(), 3x before backward(), 3x after backward():

// dim
-0.679101 0.714234 4.75443e-06 2.66814e-17 0.62864 -0.777696 -0.133013 0.822989 0.548518

// bright
 -0.679101 0.714235 4.75225e-06 1.16685e-16 4.68546e-17 3.64692e-17 0.000167991 -0.00264058 0.000248317

Now it's time to learn the math. What's going on inside forward() and backward()? I understand matrix operations, but I can't imagine why such a small input change produces such a big output difference. We are dealing with a highly unstable matrix transformation. A very unusual case indeed.
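
One plausible reading of the large output difference is that the intermediate vector is mathematically zero and only rounding noise of order 1e-17 remains; normalizing such a vector later turns that noise into a full-size direction. The sketch below illustrates this with two consecutive "after forward()" rows from the dump earlier in this thread. Vec3 and the function names are illustrative, and this is not a confirmed diagnosis of the Stellarium code path.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };   // stand-in for Stellarium's Vec3d

// Scale v to unit length (assumes a non-zero vector).
Vec3 normalized(const Vec3& v)
{
    const double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// Cosine of the angle between two vectors after normalization.
double cosAngle(const Vec3& a, const Vec3& b)
{
    const Vec3 na = normalized(a);
    const Vec3 nb = normalized(b);
    return na.x * nb.x + na.y * nb.y + na.z * nb.z;
}

// Two consecutive samples of eyePos after forward(), copied from the dump.
const Vec3 sampleA = { 7.40768e-17, -2.16526e-17, 7.98107e-17 };
const Vec3 sampleB = { -4.13477e-17, -9.6979e-18, 3.57464e-17 };
```

For these two samples cosAngle(sampleA, sampleB) comes out close to zero, meaning the normalized directions are roughly perpendicular although both inputs are zero to within rounding. A direction derived this way can jump arbitrarily from frame to frame.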

gzotti (georg-zotti) wrote :

I had just thought to extract matrices from forward- and backward-computations and pre-multiply them before applying to the eyepos. This is however not possible as one of the projectors may be a Refraction itself with a non-Matrix part...

So there is a refraction computation in the forward(eyepos) transformation, but this eyepoint is only used in the Moon shader and some settings in Ring face-culling. The positional difference between refracted and unrefracted positions for illumination angle changes is totally insignificant. Nick or Alex, can you please try Planet.cpp, line ca. 1612:

StelApp::getInstance().getCore()->getHeliocentricEclipticModelViewTransform(StelCore::RefractionOff)->forward(eyePos);

and remove Nick's hack-fix.

Is this enough to avoid the flicker in 64bit builds?

Nick Fedoseev (nick-ut2uz) wrote :

After suggested changes:
- flickering is much less frequent, but still exists;
- the flickered (dim) state looks different and can no longer be caught with F5/time manipulation as described above.

Also NAN no longer appears here, so "dirty hack" can't be used for flicker detection :(

Alexander Wolf (alexwolf) wrote :

I see the same behaviour as described in Nick's message.

gzotti (georg-zotti) wrote :

OK, thanks.
NaN is no longer produced because I avoid the (totally unexpected before) division by zero.

In RefractionExtinction.cpp near lines 120 and 176 I currently replace occurrences of a zero eyepoint by 0/0/1, which apparently creates the different appearance of the flicker-state moon. But it still means there is a zero eyepoint under some circumstances.

Nick, can you try to replace in the line following, "altAz[2]=1;" by simple "return;".

If this fails to stop it, we need to get deeper into the matrix stuff.

Alexander Wolf (alexwolf) wrote :

I checked the solutions from comments #45 and #48, and now the Moon is not flickering for me.

gzotti (georg-zotti) wrote :

Great! I committed this (but left some comments for further testing if required) as r7393. Hopefully this comes to an end now!

Nick Fedoseev (nick-ut2uz) wrote :

r7394:
I see no flickering in x64

Changed in stellarium:
assignee: Nick Fedoseev (nick-ut2uz) → gzotti (georg-zotti)
status: In Progress → Fix Committed
Changed in stellarium:
status: Fix Committed → Fix Released