384.90-0ubuntu0.16.04.2 EGL crashes at startup

Bug #1731968 reported by helltone
This bug affects 7 people
Affects                               Status     Importance  Assigned to  Milestone
nvidia-graphics-drivers (Ubuntu)      Confirmed  Undecided   Unassigned
nvidia-graphics-drivers-384 (Ubuntu)  Confirmed  Undecided   Unassigned
nvidia-graphics-drivers-390 (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Release: Ubuntu 16.04.3 LTS
Package version: 384.90-0ubuntu0.16.04.2

In the latest driver, 384.90-0ubuntu0.16.04.2, EGL initialisation is broken: the GL context returned is inconsistent, which leads to crashes. I have prepared a minimised testcase that shows the problem with the latest driver. I have also verified that it works with both previous versions, 384.90-0ubuntu0.16.04.1 and 384.81-0ubuntu1, on a clean install.

Here's the code: https://gist.github.com/funchal/bff0a8d6dae5b3ace1a88c392416b5bc

It can be compiled using "gcc main.c -lGL -lEGL". The crash is:

egl 1.4
a.out: main.c:59: main: Assertion `renderer' failed.
Aborted (core dumped)

This is caused by glGetString returning NULL for the GL renderer string. This isn't the only way to trigger a crash (attempting to use the GL context in other ways will also crash), but it shows the regression in a minimal testcase.

Previous drivers successfully complete the testcase with return code 0.

I have tested this on both a desktop machine with a GTX 1080, and a display-less server with a Tesla K80, with a fresh Ubuntu install.

helltone (gafunchal) wrote :

Attaching testcase

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-384 (Ubuntu):
status: New → Confirmed
Ilya Melnikov (rayanayar) wrote :

I can confirm the problem on GT640 and GT430.
Version 384.90-0ubuntu0.16.04.1 - works fine.
Version 384.90-0ubuntu0.16.04.2 - occasional crashes of applications.
Version 384.111-0ubuntu0.16.04.1 - crashes (same as previous).

KDE konsole hangs very often. This can be checked with a primitive testcase:
watch -n 1 konsole -e bash -c echo

After a few seconds, konsole crashes:
Application: Konsole (konsole), signal: Aborted
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Current thread is 1 (Thread 0x7fdfd5433940 (LWP 26581))]

Thread 2 (Thread 0x7fdfc29db700 (LWP 26582)):
#0 0x00007fdfd4eec27d in read () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007fdfc8586073 in ?? () from /usr/lib/nvidia-384/tls/libnvidia-tls.so.384.111
#2 0x00007fdfcc96c6f0 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007fdfcc928e74 in g_main_context_check () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x00007fdfcc929330 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007fdfcc92949c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007fdfd1a3e37b in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#7 0x00007fdfd19e6ffa in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#8 0x00007fdfd180f9e4 in QThread::exec() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9 0x00007fdfd5506515 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5DBus.so.5
#10 0x00007fdfd1814808 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#11 0x00007fdfcd0606ba in start_thread (arg=0x7fdfc29db700) at pthread_create.c:333
#12 0x00007fdfd4efc41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fdfd5433940 (LWP 26581)):
[KCrash Handler]
#6 0x00007fdfd4e2a428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#7 0x00007fdfd4e2c02a in __GI_abort () at abort.c:89
#8 0x00007fdfd4e6c7ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fdfd4f85ed8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#9 0x00007fdfd4e7537a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>, str=0x7fdfd4f85fe8 "double free or corruption (out)", action=3) at malloc.c:5006
#10 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3867
#11 0x00007fdfd4e7953c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
#12 0x00007fdfd1a12c1c in QMetaCallEvent::~QMetaCallEvent() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#13 0x00007fdfd1a12c79 in QMetaCallEvent::~QMetaCallEvent() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x00007fdfd19eb89f in QCoreApplication::removePostedEvents(QObject*, int) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x00007fdfd1a1502a in QObjectPrivate::~QObjectPrivate() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007fdfd1a151d9 in QObjectPrivate::~QObjectPrivate() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007fdfd1a1d0dc in QObject::~QObject() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#18 0x00007fdfc1ce3299 in...


Mathew Hodson (mhodson)
tags: added: testcase xenial
tags: added: regression-update
Matthew Matl (mmatl) wrote :

This is still a problem in 396.45 on Ubuntu 16.04. The minimal test case fails, and EGL doesn't work cleanly. Tested on a clean installation with an Nvidia Titan Xp and a Titan X (Pascal).

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-390 (Ubuntu):
status: New → Confirmed
Matthew Matl (mmatl) wrote :

This works when the driver is installed using the runfile method, but fails with the packaged drivers, so it appears to be a packaging issue specific to Ubuntu. Confirmed to work on Fedora and Arch with 390-series drivers as well.

Saxon Druce (saxondruce) wrote :

I've been documenting my attempt to work around this problem here: https://stackoverflow.com/questions/47415198/missing-gl-version-from-glewinit-using-egl/54668271#54668271

I discovered an nvidia blog post - https://devblogs.nvidia.com/linking-opengl-server-side-rendering/ - which says:

"If you want to use EGL context management instead, link against libOpenGL.so and libEGL.so."

After installing the nvidia-410 package, the EGL test program from the original post still fails, as reported there, when linked against libGL.so:

$ wget https://gist.githubusercontent.com/funchal/bff0a8d6dae5b3ace1a88c392416b5bc/raw/1427821a2390a30779881ab59c55b5550a468919/main.c
$ gcc main.c -lGL -lEGL
$ ./a.out
egl 1.5
a.out: main.c:53: main: Assertion `renderer' failed.
Aborted (core dumped)

However, it does work when linked against libOpenGL.so:

$ wget https://gist.githubusercontent.com/funchal/bff0a8d6dae5b3ace1a88c392416b5bc/raw/1427821a2390a30779881ab59c55b5550a468919/main.c
$ gcc main.c -L/usr/lib/nvidia-410 -lOpenGL -lEGL
$ ./a.out
egl 1.5
renderer: Tesla K80/PCIe/SSE2
version: 4.6.0 NVIDIA 410.78

When the driver is installed via a runfile, as mentioned by mmatl in post #8, the test program works when linked against either libGL.so or libOpenGL.so.
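
The two link variants can also be compared in one pass. The following is only a minimal sketch, not part of the original testcase: the try_variant helper, the egl-test-* binary names, and the SRC and NVIDIA_LIBDIR variables are hypothetical, and the library directory is assumed to be the /usr/lib/nvidia-410 path used in the commands above. It expects main.c from the gist in the current directory.

```shell
#!/bin/sh
# Build the test case once per link variant and report whether it runs.
# SRC and NVIDIA_LIBDIR are assumptions; adjust them to the local setup.
SRC="${SRC:-main.c}"
NVIDIA_LIBDIR="${NVIDIA_LIBDIR:-/usr/lib/nvidia-410}"

# try_variant LABEL LINKER_FLAGS...
# Compiles $SRC with the given flags plus -lEGL, then runs the result.
try_variant() {
    label="$1"
    shift
    bin="./egl-test-$label"
    if [ ! -f "$SRC" ]; then
        echo "$label: skipped ($SRC not found)"
        return 0
    fi
    if ! gcc "$SRC" -o "$bin" "$@" -lEGL; then
        echo "$label: link failed"
        return 1
    fi
    # The test program asserts on a NULL renderer string, so a non-zero
    # exit status (including an abort) counts as a runtime failure.
    if "$bin" >/dev/null 2>&1; then
        echo "$label: ok"
    else
        echo "$label: failed at runtime"
        return 1
    fi
}

try_variant libGL -lGL
try_variant libOpenGL -L"$NVIDIA_LIBDIR" -lOpenGL
```

On a system showing the behaviour above, the libGL line should report a runtime failure with the packaged driver, while both lines should report ok with a runfile install.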

The difference between the packaged and runfile drivers may be due to the dependencies linked into libGL.so (I included a comparison in the stackoverflow post).
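
One way to make such a comparison is to list what each library is linked against with ldd. This is a hedged sketch rather than the exact procedure from the stackoverflow post: the show_deps helper and the NVIDIA_LIBDIR variable are hypothetical, and the file names under /usr/lib/nvidia-410 are assumptions that may differ between driver packages.

```shell
#!/bin/sh
# List the shared-library dependencies of each GL entry-point library.
# NVIDIA_LIBDIR and the file names below are assumptions; adjust as needed.
NVIDIA_LIBDIR="${NVIDIA_LIBDIR:-/usr/lib/nvidia-410}"

# show_deps PATH
# Prints the names of the libraries PATH is linked against, one per line.
show_deps() {
    lib="$1"
    if [ ! -e "$lib" ]; then
        echo "missing: $lib"
        return 0
    fi
    echo "== $lib"
    # ldd prints "name => path (address)"; keep only the name column.
    ldd "$lib" | awk '{ print $1 }' | sort
}

show_deps "$NVIDIA_LIBDIR/libGL.so"
show_deps "$NVIDIA_LIBDIR/libOpenGL.so"
```

Running the same script against a runfile install (with NVIDIA_LIBDIR pointing at wherever that install placed the libraries) and diffing the two outputs should show whether the packaged libGL.so pulls in different dependencies.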

The workaround I settled on is therefore to link against libOpenGL.so instead of libGL.so.
