384.90-0ubuntu0.16.04.2 EGL crashes at startup

Bug #1731968 reported by helltone on 2017-11-13
This bug affects 7 people
Affects:
nvidia-graphics-drivers (Ubuntu)
nvidia-graphics-drivers-384 (Ubuntu)
nvidia-graphics-drivers-390 (Ubuntu)

Bug Description

Release: Ubuntu 16.04.3 LTS
Package version: 384.90-0ubuntu0.16.04.2

In the latest driver, 384.90-0ubuntu0.16.04.2, EGL initialisation is broken: the GL context it returns is inconsistent, which leads to crashes. I have prepared a minimised testcase that demonstrates the problem with the latest driver. On a clean install, the same testcase works fine with both previous versions, 384.90-0ubuntu0.16.04.1 and 384.81-0ubuntu1.

Here's the code: https://gist.github.com/funchal/bff0a8d6dae5b3ace1a88c392416b5bc

It can be compiled using "gcc main.c -lGL -lEGL". The crash is:

egl 1.4
a.out: main.c:59: main: Assertion `renderer' failed.
Aborted (core dumped)

The assertion fails because glGetString returns NULL when asked for the GL renderer string. This isn't the only way to trigger a crash (attempting to use the GL context in other ways will also crash), but it shows the regression in a minimal testcase.

Previous drivers successfully complete the testcase with return code 0.
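
The NULL return described above can also be reported as a diagnostic instead of an assertion failure. The following is a minimal sketch, not the code from the gist: the helper name `count_missing_gl_strings`, the `SKETCH_GL_*` macros and the two stub getters are hypothetical and exist only for illustration. The getter is passed in as a function pointer with the same shape as glGetString, so the sketch compiles without GL headers.

```c
#include <stddef.h>
#include <stdio.h>

/* Values of the standard OpenGL string enums (GL_VENDOR, GL_RENDERER,
 * GL_VERSION), repeated here so the sketch compiles without GL headers. */
#define SKETCH_GL_VENDOR   0x1F00u
#define SKETCH_GL_RENDERER 0x1F01u
#define SKETCH_GL_VERSION  0x1F02u

/* Same shape as glGetString: const GLubyte *glGetString(GLenum name). */
typedef const unsigned char *(*get_string_fn)(unsigned int name);

/* Query the three identification strings through the given getter and
 * return how many of them came back NULL. */
int count_missing_gl_strings(get_string_fn get_string)
{
    static const struct {
        unsigned int name;
        const char *label;
    } queries[] = {
        { SKETCH_GL_VENDOR,   "vendor"   },
        { SKETCH_GL_RENDERER, "renderer" },
        { SKETCH_GL_VERSION,  "version"  },
    };
    int missing = 0;

    for (size_t i = 0; i < sizeof queries / sizeof queries[0]; ++i) {
        const unsigned char *value = get_string(queries[i].name);
        if (value == NULL) {
            /* This is the condition the testcase's assertion trips on. */
            fprintf(stderr, "%s: NULL\n", queries[i].label);
            ++missing;
        } else {
            printf("%s: %s\n", queries[i].label, (const char *)value);
        }
    }
    return missing;
}

/* Hypothetical stand-in for a getter that behaves like the broken driver. */
const unsigned char *broken_get_string(unsigned int name)
{
    (void)name;
    return NULL;
}

/* Hypothetical stand-in for a getter that behaves like a working driver. */
const unsigned char *healthy_get_string(unsigned int name)
{
    (void)name;
    return (const unsigned char *)"stub";
}
```

In a real program, glGetString itself would be passed as the getter, after the EGL context has been made current. A non-zero return value then indicates the inconsistent context described in this report.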

I have tested this on both a desktop machine with a GTX 1080, and a display-less server with a Tesla K80, with a fresh Ubuntu install.

helltone (gafunchal) wrote :

Attaching testcase

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-384 (Ubuntu):
status: New → Confirmed
Ilya Melnikov (rayanayar) wrote :

I confirm problems for GT640 and GT430.
Version 384.90-0ubuntu0.16.04.1 - works fine.
Version 384.90-0ubuntu0.16.04.2 - occasional crashes of applications.
Version 384.111-0ubuntu0.16.04.1 - crashes (same as previous).

KDE konsole hangs very often. A primitive testcase to check this:
watch -n 1 konsole -e bash -c echo

After a few seconds, konsole crashes:
Application: Konsole (konsole), signal: Aborted
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Current thread is 1 (Thread 0x7fdfd5433940 (LWP 26581))]

Thread 2 (Thread 0x7fdfc29db700 (LWP 26582)):
#0 0x00007fdfd4eec27d in read () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007fdfc8586073 in ?? () from /usr/lib/nvidia-384/tls/libnvidia-tls.so.384.111
#2 0x00007fdfcc96c6f0 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007fdfcc928e74 in g_main_context_check () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x00007fdfcc929330 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007fdfcc92949c in g_main_context_iteration () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6 0x00007fdfd1a3e37b in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#7 0x00007fdfd19e6ffa in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#8 0x00007fdfd180f9e4 in QThread::exec() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9 0x00007fdfd5506515 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5DBus.so.5
#10 0x00007fdfd1814808 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#11 0x00007fdfcd0606ba in start_thread (arg=0x7fdfc29db700) at pthread_create.c:333
#12 0x00007fdfd4efc41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7fdfd5433940 (LWP 26581)):
[KCrash Handler]
#6 0x00007fdfd4e2a428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#7 0x00007fdfd4e2c02a in __GI_abort () at abort.c:89
#8 0x00007fdfd4e6c7ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fdfd4f85ed8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#9 0x00007fdfd4e7537a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>, str=0x7fdfd4f85fe8 "double free or corruption (out)", action=3) at malloc.c:5006
#10 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3867
#11 0x00007fdfd4e7953c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
#12 0x00007fdfd1a12c1c in QMetaCallEvent::~QMetaCallEvent() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#13 0x00007fdfd1a12c79 in QMetaCallEvent::~QMetaCallEvent() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#14 0x00007fdfd19eb89f in QCoreApplication::removePostedEvents(QObject*, int) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#15 0x00007fdfd1a1502a in QObjectPrivate::~QObjectPrivate() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#16 0x00007fdfd1a151d9 in QObjectPrivate::~QObjectPrivate() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#17 0x00007fdfd1a1d0dc in QObject::~QObject() () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#18 0x00007fdfc1ce3299 in...


tags: added: testcase xenial
tags: added: regression-update
Matthew Matl (mmatl) wrote :

This is still a problem in 396.45 on Ubuntu 16.04. The minimal test case fails, and EGL doesn't work cleanly. Tested on a clean installation with an Nvidia Titan Xp and a Titan X (Pascal).

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-390 (Ubuntu):
status: New → Confirmed
Matthew Matl (mmatl) wrote :

This works when the driver is installed using the runfile method, but fails with the packaged drivers, so it appears to be a packaging issue specific to Ubuntu. Confirmed to work on Fedora and Arch with 390-series drivers as well.

Saxon Druce (saxondruce) wrote :

I've been documenting my attempt to work around this problem here: https://stackoverflow.com/questions/47415198/missing-gl-version-from-glewinit-using-egl/54668271#54668271

I discovered an nvidia blog post - https://devblogs.nvidia.com/linking-opengl-server-side-rendering/ - which says:

"If you want to use EGL context management instead, link against libOpenGL.so and libEGL.so."

After installing the nvidia-410 package, the EGL test program from the original post still fails when linked against libGL.so, just as originally reported:

$ wget https://gist.githubusercontent.com/funchal/bff0a8d6dae5b3ace1a88c392416b5bc/raw/1427821a2390a30779881ab59c55b5550a468919/main.c
$ gcc main.c -lGL -lEGL
$ ./a.out
egl 1.5
a.out: main.c:53: main: Assertion `renderer' failed.
Aborted (core dumped)

However it does work if linking against libOpenGL.so:

$ wget https://gist.githubusercontent.com/funchal/bff0a8d6dae5b3ace1a88c392416b5bc/raw/1427821a2390a30779881ab59c55b5550a468919/main.c
$ gcc main.c -L/usr/lib/nvidia-410 -lOpenGL -lEGL
$ ./a.out
egl 1.5
renderer: Tesla K80/PCIe/SSE2
version: 4.6.0 NVIDIA 410.78

When the driver is installed via a runfile, as mentioned by mmatl in post #8, the test program works when linked against either libGL.so or libOpenGL.so.

The difference between the packaged and runfile drivers may be due to the dependencies linked into libGL.so (I included a comparison in the stackoverflow post).

The workaround with the packaged drivers is therefore to link against libOpenGL.so instead of libGL.so.
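
To see which of the two libraries a given installation can actually load, a small runtime probe may help. This is a sketch under assumptions: the names `probe_symbol` and `report_gl_libraries` are hypothetical, and the sonames `libOpenGL.so.0` and `libGL.so.1` are the usual ones but may not be on the loader path for every package layout (the packaged driver above needed `-L/usr/lib/nvidia-410`). The probe only checks that a library loads and exports glGetString. It does not show that the call returns a valid string under an EGL context, which is what fails in this bug.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

enum probe_result {
    PROBE_OK = 0,          /* library loaded and symbol found */
    PROBE_NO_LIBRARY = 1,  /* dlopen failed */
    PROBE_NO_SYMBOL = 2    /* library loaded, symbol missing */
};

/* Try to load `soname` and look up `symbol` in it. */
enum probe_result probe_symbol(const char *soname, const char *symbol)
{
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        const char *reason = dlerror();
        fprintf(stderr, "%s: %s\n", soname, reason ? reason : "cannot load");
        return PROBE_NO_LIBRARY;
    }

    void *address = dlsym(handle, symbol);
    enum probe_result result = (address != NULL) ? PROBE_OK : PROBE_NO_SYMBOL;

    dlclose(handle);
    return result;
}

/* Report, for each candidate library, whether it exports glGetString. */
void report_gl_libraries(void)
{
    static const char *const candidates[] = { "libOpenGL.so.0", "libGL.so.1" };

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        enum probe_result r = probe_symbol(candidates[i], "glGetString");
        printf("%s: %s\n", candidates[i],
               r == PROBE_OK ? "loadable, exports glGetString" :
               r == PROBE_NO_SYMBOL ? "loadable, glGetString missing" :
               "not loadable");
    }
}
```

Calling `report_gl_libraries()` from `main` prints one line per library. On older glibc versions the program needs to be linked with `-ldl`.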
