Comment 76 for bug 1750937

John Sopko (sopko) wrote : Re: 4.4.0-116 Kernel update on 2/21 breaks Nvidia drivers (on 14.04 and 16.04) by an insufficient compiler!

I manage 300+ machines running openafs, which, like nvidia, ships a kernel module that dkms has to build. I also manage dozens of nvidia gpu servers where users have sudo access and can install anything they want. Here is a snippet of what I found. Note this is for 16.04 systems, but 14.04 systems running the 4.4.0-116 kernel will have similar problems:

Short story, if your machine is not using the Ubuntu supplied gcc you
will have issues with afs and nvidia built kernel modules or any dkms
built kernel modules. Longer story below.

NOTE! This problem affects at least openafs, nvidia, VirtualBox, and
any other dkms built module. I am going to forward this info to
<email address hidden>. This started with the latest Ubuntu 4.4.0-116
kernel version.

Looking through that bug and testing took me hours. The short story is
the machines having issues with openafs.ko module are ones that have
the Ubuntu toolchain ppa that has a gcc compiler suite that does not
support the "retpoline" feature which was recently put in to fix the
Spectre security issue. The nvidia module will also have issues.

The machines using the Ubuntu supplied gcc compiler are the ones that
are not having issues. But, host olympia was a special case.

The compiler that works, as reported by "gcc -v":

gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)

The ones that don't work, like the one on host bvisionserver8:

gcc version 5.4.1 20160904
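
Rather than comparing version strings, you can ask the compiler directly.
This is a minimal sketch, assuming the kernel build probes for retpoline
support with the upstream gcc flags -mindirect-branch=thunk-extern and
-mindirect-branch-register; I have not checked that the Ubuntu 4.4 kernel
uses exactly these. The function name cc_supports_retpoline is made up
for illustration.

```shell
# Return 0 if the given compiler (default: gcc) accepts the retpoline
# flags, non-zero otherwise. ASSUMPTION: these are the flags the kernel
# build probes for; the function name is hypothetical.
cc_supports_retpoline() {
    local cc="${1:-gcc}" tmp rc
    # Compile an empty C file to a scratch object so nothing real is touched.
    tmp=$(mktemp) || return 2
    "$cc" -Werror -mindirect-branch=thunk-extern -mindirect-branch-register \
        -c -x c /dev/null -o "$tmp" >/dev/null 2>&1
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Usage would be something like "cc_supports_retpoline /usr/bin/gcc-5 &&
echo ok". A compiler without retpoline support rejects the unknown
options and the function returns non-zero.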

You can use "apt-cache policy gcc" to show what repo the compiler
comes from. WARNING, /usr/bin/gcc is a link to /usr/bin/gcc-5, the gcc
package is a meta package and you need to query gcc-5. If you query
gcc it shows coming from the standard Ubuntu repo but /usr/bin/gcc-5
is coming from the toolchain repo.
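
To check this across many machines, the repo of the installed version
can be pulled out of the "apt-cache policy" output with a little awk.
This is only a sketch: it assumes the usual layout where the installed
version is marked with "***" and its repo lines are indented by eight
spaces, and the function name installed_origins is made up.

```shell
# Read `apt-cache policy <pkg>` output on stdin and print the repository
# URL(s) that provide the installed ("***") version.
# ASSUMPTION: repo lines under a version entry are indented 8 spaces.
installed_origins() {
    awk '
        /^ \*\*\* /              { inst = 1; next }  # installed version entry
        inst && /^        [0-9]/ {                   # "<priority> <url> ..." line
            if ($2 != "/var/lib/dpkg/status") print $2
            next
        }
        inst                     { inst = 0 }        # next version entry starts
    '
}
```

Run it as "apt-cache policy gcc-5 | installed_origins". Anything under
ppa.launchpad.net/ubuntu-toolchain-r means the compiler is not the
Ubuntu supplied one.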

A good gcc-5 shows:
----------------------------

classroom:55% apt-cache policy gcc-5
gcc-5:
  Installed: 5.4.0-6ubuntu1~16.04.9
  Candidate: 5.4.0-6ubuntu1~16.04.9
  Version table:
 *** 5.4.0-6ubuntu1~16.04.9 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        100 /var/lib/dpkg/status
     5.3.1-14ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

The bad compilers show:
----------------------------------

bvisionserver8:/> apt-cache policy gcc-5
gcc-5:
  Installed: 5.4.1-2ubuntu1~16.04
  Candidate: 5.4.1-2ubuntu1~16.04
  Version table:
 *** 5.4.1-2ubuntu1~16.04 500
        500 http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu xenial/main amd64 Packages
        100 /var/lib/dpkg/status
     5.4.0-6ubuntu1~16.04.9 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     5.3.1-14ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

And you can see that the
/etc/apt/sources.list.d/ubuntu-toolchain-r-ubuntu-test-xenial.list
repo is configured on those machines.

On a good machine modinfo openafs shows that retpoline is turned on in
the vermagic: line:

classroom:56% modinfo openafs
filename: /lib/modules/4.4.0-116-generic/updates/dkms/openafs.ko
license: http://www.openafs.org/dl/license10.html
srcversion: 4E1BEB8CE16072EF8E64542
depends:
vermagic: 4.4.0-116-generic SMP mod_unload modversions retpoline

And it is not turned on on a bad machine:

bvisionserver8:/> modinfo openafs
filename: /lib/modules/4.4.0-116-generic/updates/dkms/openafs.ko
license: http://www.openafs.org/dl/license10.html
srcversion: 66044F5DC18AA3288DB22FF
depends:
vermagic: 4.4.0-116-generic SMP mod_unload modversions
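
The vermagic comparison above is easy to script when you have a lot of
hosts to check. A minimal sketch, with a made-up function name, that
looks for "retpoline" as a whole word in a vermagic string:

```shell
# Return 0 if the vermagic string passed as $1 contains the word
# "retpoline", 1 otherwise. The function name is hypothetical.
vermagic_has_retpoline() {
    # Pad with spaces so only a whole-word match counts.
    case " $1 " in
        *" retpoline "*) return 0 ;;
        *)               return 1 ;;
    esac
}
```

For example: vermagic_has_retpoline "$(modinfo -F vermagic openafs)" ||
echo "openafs was built without retpoline". The same check works for
the nvidia module or any other dkms built module.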