4.4.0-116 Kernel update on 2/21 breaks Nvidia drivers (on 14.04 and 16.04) due to outdated gcc-4.8

Bug #1750937 reported by Bill Miller on 2018-02-22
This bug affects 63 people
Affects: gcc-4.8 (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Running fine with nvidia-384 until this kernel update came along. When booted into the new kernel, got super low resolution and nvidia-settings was missing most of its functionality - could not change resolution.

Rebooted into 4.4.0-112 kernel and all was well.

The root cause of the problem has been found to be installing the -116 kernel without a sufficiently updated version of gcc. In my case, my system received the gcc update AFTER the kernel update.

Uninstalling the -116 kernel and reinstalling it with the updated version of gcc solved the problem for me.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: xorg 1:7.7+1ubuntu8.1
ProcVersionSignature: Ubuntu 4.4.0-112.135~14.04.1-generic 4.4.98
Uname: Linux 4.4.0-112-generic x86_64
NonfreeKernelModules: nvidia_uvm nvidia_drm nvidia_modeset nvidia
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.111 Tue Dec 19 23:51:45 PST 2017
 GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
ApportVersion: 2.14.1-0ubuntu3.27
Architecture: amd64
CurrentDesktop: LXDE
Date: Wed Feb 21 19:23:39 2018
DistUpgraded: Fresh install
DistroCodename: trusty
DistroVariant: ubuntu
DkmsStatus:
 bbswitch, 0.7, 4.4.0-112-generic, x86_64: installed
 bbswitch, 0.7, 4.4.0-116-generic, x86_64: installed
 nvidia-384, 384.111, 4.4.0-112-generic, x86_64: installed
 nvidia-384, 384.111, 4.4.0-116-generic, x86_64: installed
GraphicsCard:
 NVIDIA Corporation Device [10de:1c82] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: eVga.com. Corp. Device [3842:6253]
InstallationDate: Installed on 2015-03-03 (1086 days ago)
InstallationMedia: Ubuntu 14.04.2 LTS "Trusty Tahr" - Release amd64 (20150218.1)
MachineType: ASUSTeK COMPUTER INC. M11AD
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-112-generic root=UUID=5a88d2a1-0a24-415b-adc2-28435b13248a ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/15/2013
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0302
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: M11AD
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0302:bd08/15/2013:svnASUSTeKCOMPUTERINC.:pnM11AD:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnM11AD:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: M11AD
dmi.product.version: System Version
dmi.sys.vendor: ASUSTeK COMPUTER INC.
version.compiz: compiz 1:0.9.11.3+14.04.20160425-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.67-1ubuntu0.14.04.2
version.libgl1-mesa-dri: libgl1-mesa-dri N/A
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.nvidia-graphics-drivers: nvidia-graphics-drivers N/A
version.xserver-xorg-core: xserver-xorg-core N/A
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati N/A
version.xserver-xorg-video-intel: xserver-xorg-video-intel N/A
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau N/A
xserver.bootTime: Wed Feb 21 18:48:14 2018
xserver.configfile: default
xserver.errors:

xserver.logfile: /var/log/Xorg.0.log
xserver.outputs:

xserver.version: 2:1.18.3-1ubuntu2.3~trusty4

Bill Miller (wbmilleriii) wrote :

In the kernel log when I booted with the -116 kernel, I found this

nvidia: version magic '4.4.0-116-generic SMP mod_unload modversions ' should be '4.4.0-116-generic SMP mod_unload modversions retpoline '

This line doesn't occur in the kernel log for the -112 kernel, the case when the driver works.
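
To see which modules the kernel rejected for this reason, the message format quoted above can be filtered out of the kernel log. This is only a sketch: the function name `vermagic_mismatches` is made up for illustration, and it assumes the log lines look like the one shown in this comment.

```shell
# List each module named in a "version magic ... should be ..." kernel log line.
# Reads log text on stdin and prints the unique module names, one per line.
vermagic_mismatches() {
    awk '/: version magic .* should be / {
        line = $0
        sub(/^\[[ 0-9.]*\] */, "", line)    # drop a leading "[ 119.939634] " timestamp, if any
        sub(/: version magic.*/, "", line)  # keep only the module name before the colon
        print line
    }' | LC_ALL=C sort -u
}

# Example (reading the kernel log may need root on some systems):
#   dmesg | vermagic_mismatches
```

Any module it prints was built without the flag the running kernel expects.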

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xorg (Ubuntu):
status: New → Confirmed
mtron (mtron) wrote :

same here on trusty with kernel 4.4.0-116-generic and NVIDIA Driver 340.102 on a GeForce GT 420

Same here with the 384.89 driver (Trusty with 4.4.0-116-generic kernel)

nurgazy (nurgasemetey) wrote :

Same here with 384.111 driver (Linux Mint with 4.4.0-116-generic kernel).

Dmesg

[ 119.939634] nvidia: version magic '4.4.0-116-generic SMP mod_unload modversions ' should be '4.4.0-116-generic SMP mod_unload modversions retpoline '
[ 119.943728] nvidia: version magic '4.4.0-116-generic SMP mod_unload modversions ' should be '4.4.0-116-generic SMP mod_unload modversions retpoline '

reetp (jcrisp) wrote :

Same here on a variety of machines

Nvidia 340.102 on Nvidia NVS 300

Bernhard (baumber) wrote :

It affects me too:

Nvidia NVS 300 and GT 430 . Nvidia 340.x and 384.111

Tim Habigt (narfg) wrote :

It seems to me that all modules that were re-built with DKMS are affected by this. For example, I get the same error message when I try to load the "vboxdrv" module from VirtualBox.

Mythological (mythological2) wrote :

One more to report, this one got my son's computer, an older Asrock machine. Uninstalling the nVidia drivers allowed booting into the desktop, but when none of the available nVidia drivers worked we made the mistake of trying to install the Linux driver from nVidia's site. Apparently somewhere in the process it uninstalled the Nouveau drivers, and now we can't even log into the system except via ssh.

I then tried reinstalling the Nouveau package xserver-xorg-video-nouveau and it claims it cannot be installed because

The following packages have unmet dependencies:
xserver-xorg-video-nouveau : Depends: xorg-video-abi-15
                             Depends: xserver-xorg-core (>= 2:1.14.99.902)
                             Recommends: libgl1-mesa-dri (>= 9.0)

If I tried to reinstall the xorg/xserver stuff, no matter what I tried I kept getting these:

The following packages have unmet dependencies:
unity-control-center : Depends: libcheese-gtk23 (>= 3.4.0) but it is not going to be installed
                       Depends: libcheese7 (>= 3.0.1) but it is not going to be installed
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.

But if I try to install those I get:

libcheese-gtk23 is already the newest version.
libcheese7 is already the newest version.

And if I tried to force install them, it showed that it is already running libcheese-gtk23:amd64 (3.10.2-0ubuntu2) and libcheese7:amd64 (3.10.2-0ubuntu2) which are both higher versions than what xorg seems to require, so I don't understand what is going on here. Anyway, at this point the system is pretty much totally broken as far as getting into the desktop is concerned.

Is there any chance you can fix this and get a new kernel update out very soon, like tonight or tomorrow? Because if not, it looks like I may have to reinstall from scratch, and even then I'm not entirely certain I won't run into the exact same issue.

Bill Miller (wbmilleriii) wrote :

Mythological, instead of waiting for a new kernel, you could install the -112 instead, it still works fine.

gumbeto (gumbeto) on 2018-02-22
summary: - 14.04 Kernel update on 2/21 breaks Nvidia drivers
+ 4.4.0-116 Kernel update on 2/21 breaks Nvidia drivers (on 14.04 and
+ 16.04)

Same thing here on 16.04, with nvidia driver 384.111.
Kernel 4.4.0-112: no problems
Kernel 4.4.0-116: low resolution, and can't log in in graphical mode.

I have tried to uninstall and reinstall nvidia drivers to no effect.

I did a diff for dmesg output on both cases. The relevant bits are:

> Spectre V2 mitigation: Mitigation: Full generic retpoline
> Spectre V2 mitigation: Speculation control IBPB not-supported IBRS not-supported

and

< nvidia: module license 'NVIDIA' taints kernel.
< Disabling lock debugging due to kernel taint
< nvidia: module verification failed: signature and/or required key missing - tainting kernel
---
> nvidia: version magic '4.4.0-116-generic SMP mod_unload modversions ' should be '4.4.0-116-generic SMP mod_unload modversions retpoline '
> nvidia: version magic '4.4.0-116-generic SMP mod_unload modversions ' should be '4.4.0-116-generic SMP mod_unload modversions retpoline '

Mythological (mythological2) wrote :

Bill Miller, how do I install "-112"? I am afraid I have already dug myself into so deep a hole I don't want to make any more mistakes, and I have no idea what you mean by "-112". Please explain, or point me to a page that does.

Bill Miller (wbmilleriii) wrote :

Mythological, you have to fix the system to the point that you can install packages. I am not sure you are at that point yet from your description. Once you are there, you can install an old kernel by following the procedure outlined here: https://askubuntu.com/questions/928146/install-an-older-kernel-14-04-lts

If you can only log into a terminal, you need to identify the 4 kernel packages you want and install them with sudo apt install [packagenames]
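
One way to identify those packages from a terminal is to filter the package names known to apt by kernel ABI version. The sketch below is only an illustration: the function name `kernel_pkgs_for` and the default flavour `generic` are assumptions, and it deliberately hard-codes no package names.

```shell
# Print the linux-image*/linux-headers* package names that belong to one
# kernel ABI version (for example 4.4.0-112). Reads package names on stdin.
# The flavour argument ("generic" by default) is an assumption; adjust as needed.
kernel_pkgs_for() {
    abi="$1"
    flavour="${2:-generic}"
    awk -v abi="$abi" -v flavour="$flavour" '
        /^linux-(image|headers)/ {
            pos = index($0, "-" abi)
            if (pos == 0) next
            # Text after "-<abi>": empty, or "-<flavour>" for flavoured packages.
            rest = substr($0, pos + length(abi) + 1)
            if (rest == "" || rest == ("-" flavour)) print
        }' | LC_ALL=C sort
}

# Example:
#   apt-cache pkgnames linux- | kernel_pkgs_for 4.4.0-112
```

Whether the older packages are still offered depends on the configured repositories, so the list is worth reviewing before passing it to sudo apt install.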

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-384 (Ubuntu):
status: New → Confirmed
Mythological (mythological2) wrote :

Bill Miller, I can ssh into the system and install packages - apt-get still works, etc. Right now the only thing I can't do is get into the desktop; it shows a login screen but won't accept my password (even though my password works fine in ssh). I think it is because when I tried to install the nVidia Linux drivers it removed Nouveau, and I can't seem to get it back.

What I am not understanding is what you meant by "the -112" - I assume you are talking about a version number or something, but of what? I went to that link but they are using Synaptic and I have no way to get to that because I can't get into the desktop. Also, unlike that guy, as far as I know I did not make "the mistake of purging the functional kernel", which means that the previous kernel is probably still around on the hard drive; I just don't know how you would set it up to boot into the previous kernel rather than the new kernel.

Rather than continue this discussion here I suggest we move it to https://ubuntuforums.org/showthread.php?t=2385621&p=13742281#post13742281 as this is probably not appropriate for the bug tracker, and I suspect many others are affected by this and would like to find a solution.

Erik Kratzenberg (erik5) wrote :

This also seems to affect ZFS on Linux modules:

zavl: version magic '4.4.0-116-generic SMP mod_unload modversions ' should be '4.4.0-116-generic SMP mod_unload modversions retpoline '

Wadcutter (wadcutter) wrote :

Not only do I have the same video issue as others on Trusty with the Nvidia 340.102 driver for a GeForce 8400GS, but I am also unable to log in. Entering the password just keeps flipping me back to the "Enter PW" box. The old 112 kernel works just fine, so I will stick with that until a fix is available.

Jason (linuxguy39) wrote :

Add me to the list of people with this problem. nvidia 384.111 worked great until this kernel; now I have the same problem as the original poster. Had to go back to 112 to get it to work.

chipschap (bobnewell) wrote :

FWIW some NVIDIA-related changes were apparently made in 114:

2018-02-09 - Khalid Elmously <email address hidden>
linux (4.4.0-114.137) xenial; urgency=medium
* linux: 4.4.0-114.137 -proposed tracker (LP: #1748484)
* ALSA backport missing NVIDIA GPU codec IDs to patch table to
Ubuntu 16.04 LTS Kernel (LP: #1744117)
- ALSA: hda - Add missing NVIDIA GPU codec IDs to patch table
* Shutdown hang on 16.04 with iscsi targets (LP: #1569925)

I don't know if this caused the problem, but regressing from 116 to 112 also solved it for me (after about an hour of utter frustration).

Kaan Akşit (kunguz) wrote :

Having the exact same issue with the nvidia-390 package on my Ubuntu 16.04. I rolled back to nvidia-387 and had the same issue. Later I rolled back to nvidia-384, and still had the same issue.

Then I downloaded NVIDIA-Linux-x86_64-384.111.run from Nvidia's website to see if I could get it up and running as Jason claimed in his earlier post. I started installing the driver manually after killing the X server. It reported that it was unable to load the kernel module 'Nvidia.ko', and the installation simply failed.

I am curious to find out how Jason switched to that version of Nvidia drivers.

So, one last time, I tried removing all nvidia-* packages and installing the nvidia-331 package from the repository... and the result is the same on my end.

Mythological (mythological2) wrote :

Kaan Akşit: You are going down the same rabbit hole that I did and it won't work, and will just mess up your system. The problem is not the nVidia driver, it's the kernel upgrade. Go to the thread at https://ubuntuforums.org/showthread.php?t=2385621&p=13742281#post13742281 and you will see what you need to do.

This does not only affect the NVIDIA modules, but seems to affect all kernel modules built on the system itself (meaning all DKMS-built modules and, for example, the VMware Workstation/Player modules).

Nucleo (admar505) wrote :

ok so boot in to old 112 is ok, but not ideal. this also may be affecting some ethernet/usb connections as well.

Jay Flory (jayflory2) wrote :

The newer Ubuntu kernel has the new retpoline Spectre mitigation. In order to load, your kernel modules will need to be compiled with a version of gcc that supports retpoline. You can determine if the module has it with the command "modinfo <module>" and look for the version magic.

If you are using DKMS then your kernel modules should have updated automatically. However this depends on the version of gcc in use.

If you are using a Ubuntu version of gcc, then gcc probably updated when you got the newer kernel. The changes to gcc necessary to support retpoline should have been backported to most active versions of gcc. However if you have installed a custom version of gcc then your kernel module probably will not build correctly.
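
The version magic check described above can be scripted. A minimal sketch, assuming only that the vermagic string is a space-separated list of flags as in the messages quoted in this report; the function name `vermagic_has_retpoline` is hypothetical.

```shell
# Succeeds (exit status 0) if a vermagic string contains the "retpoline" flag.
# Reads the string on stdin, for example the output of: modinfo -F vermagic <module>
vermagic_has_retpoline() {
    # Put each flag on its own line, then look for an exact match.
    tr -s ' \t' '\n\n' | grep -qx 'retpoline'
}

# Example:
#   modinfo -F vermagic -k 4.4.0-116-generic nvidia-384 | vermagic_has_retpoline \
#       && echo "built with retpoline" || echo "needs a rebuild"
```

Matching the flag as a whole word avoids false positives from any other flag that merely contains the same letters.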

Bill Miller (wbmilleriii) wrote :

Jay Flory, that's important information and sounds like it can lead those of us affected to a solution. However, my version of gcc is the default one for 14.04, I do not have a custom version, yet I still have this problem. Do you know what version of gcc I should install?

Bill Miller (wbmilleriii) wrote :

Looking at apt logs, it appears my system got a gcc update the day AFTER the kernel was pushed out. I'll try reinstalling the kernel and see if that makes a difference.
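
The same ordering check can be made against dpkg's own log. This is a sketch under assumptions: the function name `gcc_kernel_timeline` is invented here, and it relies on the usual dpkg.log line layout of date, time, action, package, old version, new version.

```shell
# Print when gcc-* and linux-image-* packages were installed or upgraded,
# in log order, so the two events can be compared.
# Reads dpkg.log text on stdin.
gcc_kernel_timeline() {
    awk '($3 == "install" || $3 == "upgrade") &&
         $4 ~ /^(gcc-[0-9]|linux-image-[0-9])/ {
             print $1, $2, $3, $4, $6    # date, time, action, package, new version
         }'
}

# Example:
#   gcc_kernel_timeline < /var/log/dpkg.log
```

If the kernel line appears before the gcc line, the DKMS modules for that kernel were built with the older compiler.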

Tim Habigt (narfg) wrote :

Yes, I also got the gcc update after the kernel update. With the new gcc, I could fix the Nvidia kernel module.

Here's how:
# Show the current vermagic string of the kernel module
modinfo nvidia-384 -k 4.4.0-116-generic

# It will probably say:
# vermagic: 4.4.0-116-generic SMP mod_unload modversions
# Here the retpoline string is missing.
# You can fix this by removing and re-building the module with DKMS

sudo dkms remove nvidia-384/384.111 -k 4.4.0-116-generic
sudo dkms install nvidia-384/384.111 -k 4.4.0-116-generic

# After that the modinfo command will show
# vermagic: 4.4.0-116-generic SMP mod_unload modversions retpoline

You can now use the new kernel. You will probably have to do this with all your DKMS kernel modules.
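
For several DKMS modules, the remove/install pair above can be generated from `dkms status`. The sketch below only prints the commands and runs nothing; the function name `dkms_rebuild_cmds` is hypothetical, and it assumes the "name, version, kernel, arch: state" line format seen in this report (other dkms versions may print a different layout).

```shell
# Turn `dkms status` lines into the remove/install commands needed to rebuild
# every module registered for one kernel. Prints the commands; executes nothing.
dkms_rebuild_cmds() {
    kernel="$1"
    awk -F', ' -v kernel="$kernel" '
        NF >= 3 && $3 == kernel {
            sub(/^[ \t]+/, "", $1)    # tolerate indented lines
            printf "dkms remove %s/%s -k %s\n",  $1, $2, kernel
            printf "dkms install %s/%s -k %s\n", $1, $2, kernel
        }'
}

# Example: review the output first, then run the printed commands with sudo.
#   dkms status | dkms_rebuild_cmds 4.4.0-116-generic
```

Printing instead of executing keeps the step reviewable, which matters on a machine that is already half broken.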

Bill Miller (wbmilleriii) wrote :

Jay Flory, your information was the key! I removed the -116 kernel and reinstalled it, and the graphics are fine now. It looks like this was indeed due to a timing problem between when my system got the gcc update and the kernel update.

It appears that I'll have to reinstall virtualbox as well.

Thanks so much for sharing your insight.

Jay Flory (jayflory2) wrote :

Unfortunately I don't currently have a 14.04 installation so I don't know which versions of gcc are available to you. I am using the package "gcc-5" from the Xenial (16.04) repositories and it is working. You may also want to check:

1. "which gcc"
a. Make sure this points to the version of gcc that you expect.
b. '/usr/bin/gcc'
2. "gcc --version"

Shouldn't the -116 kernel be blacklisted? On a fresh install I'm still seeing the -116 update available.

Bill Miller (wbmilleriii) wrote :

As long as you have the updated gcc installed before you install the kernel, it works fine.

gumbeto (gumbeto) wrote :

That was it Jay Flory! I too had a different version of gcc in my 16.04 installation (an upgrade I got from the ubuntu-toolchain-r ppa, which I used in order to test something with gcc-7).

I use Timeshift, so I was able to go back to before adding the ppa. Upgrading normally from that point, while using the gcc that came from the official repos (gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609) I did not have any problems any more: everything loaded happily with -116.

Thanks :)

How would the average person know to (or know how to) avoid this problem, before it happens?

Is there something in the -116 kernel that's so great, and time sensitive that it's worth putting folks through the huge inconvenience of tracking down the root cause and booting into a previous kernel?

Bill Miller (wbmilleriii) wrote :

The -116 kernel contains protections against the Spectre / Meltdown exploits. So yeah, it's pretty great.

Clifford Mather (cjmather) wrote :

I took the *-116 kernel a few days ago on my 16.04.3 desktop, got a mouse freeze after booting, and went back to *-112.

I see from my dpkg.log that version 5.4.0-6ubuntu1~16.04.9 of gcc installed over a day before the *-116 kernel. I see also from https://ubuntu.pkgs.org/16.04/ubuntu-updates-main-amd64/gcc-5_5.4.0-6ubuntu1~16.04.9_amd64.deb.html that the .8 version added support for thunk-extern, which is needed to build the Spectre retpoline changes. So at least in my case, the gcc version seemed sufficient to build DKMS drivers at the time of the *-116 kernel upgrade.

I also am seeing kernel panics from other DKMS drivers on the *-116 kernel, and on other kernels that have merged the Spectre retpoline changes.

Brad Figg (brad-figg) wrote :

@cjmather
If you are sure that you have the latest gcc and you are still seeing kernel panics with the -116 kernel, please file a new bug and report back here what that bug # is.

Glen Pike (glenpike) wrote :

I had the same problem with my -116 kernel upgrade.

So I am running `gcc version 5.4.1 20160904 (Ubuntu 5.4.1-2ubuntu1~16.04)`, but it seems this was already installed - I can't see in the dpkg.log* files when it was added. I am assuming that this is a newer version of GCC than is needed, but how do I know it contains the retpoline support needed to compile my dkms modules? As I said, it didn't come down with the update.

Can you just clarify that I should uninstall & reinstall my -116 kernel to fix the issue, or do I need a different version of GCC?

Thanks

description: updated
Jacek Wieczorek (mrjjot) wrote :

Following instructions provided by @cjjefcoat, I managed to get kernel -116 running.
It turns out that just removing the "ppa:ubuntu-toolchain-r/test" ppa and reinstalling the gcc-4.8 wasn't sufficient to get it downgraded to "gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4". Anyway, *purging* the ppa did the trick.
Thanks!

Bronson (bronson-philippa) wrote :

The instructions provided by @cjjefcoat also fixed it for me.

Paul Lieverse (paul-lieverse) wrote :

Had the same issue on 14.04 with HWE kernel.

Last Friday morning I got offered the -116 kernel - which broke the nvidia driver module, forcing me to revert back to the -112 kernel.

Only today (not checked during the weekend) I got offered the compiler update, from 4.8.4-2ubuntu1~14.04.3 to 4.8.4-2ubuntu1~14.04.4.

Didn't re-install the -116 kernel yet, but given other people's experiences I trust that should now go fine.

However, it does show that there seems to be some dependency or distribution issue between this kernel and the required compiler version, offering the kernel update before the compiler update was available, and not holding off the kernel update if the compiler update isn't in place yet.

Alberto Milone (albertomilone) wrote :

It seems like the kernel and the nvidia driver weren't built using the same compiler.

Please, either ssh into your machine, or press CTRL+ALT+F2 and do the following:

1) Make sure security updates are enabled, and update your system ("apt-cache policy gcc-5" should report 5.4.0-6ubuntu1~16.04.9 on Xenial)

2) Make sure no driver from NVIDIA's website is installed. If it is, please uninstall it first.

3) When gcc-5 is updated, re-install the nvidia package from the archive:
sudo apt-get install --reinstall nvidia-384 (or nvidia-340)

4) Restart your computer.

Santiago (santiagobanchero) wrote :

I resolved the problem with #57 instructions.
Thanks!

Silvano (spaccabyte) wrote :

trying the comment 57

sudo ppa-purge ppa: ubuntu- toolchain- r / test
Updating packages lists
PPA to be removed: ppa: ppa:
Warning: Could not find package list for PPA: ppa: ppa:

Mythological (mythological2) wrote :

Silvano (spaccabyte), if you never added that PPA (and the majority of Ubuntu users will not have) then there is nothing to remove. In that case, you probably have the "standard" gcc and could just follow the instructions at https://ubuntuforums.org/showthread.php?t=2385621&s=f41dc2da85d14b7d5854118ae4594254&p=13742673#post13742673 in post #7. You may need to exit and restart Synaptic during the process, if it won't let you reinstall the kernel after removing it.

henczati (henczati) wrote :

As mentioned in #47, same problem on trusty + xenial HWE stack (14.04.5).

Additionally, had a gcc version mismatch, don't know where that came from.

Meta package gcc was 4:4.8.2-1ubuntu6, but /usr/bin/gcc (and corresponding package gcc-4.8) was 4.8.5-1ubuntu1~14.04 .
I also had installed just gcc-4.9-base (without even gcc-4.9) due to some dependencies.

Went back to -112 kernel.

Tried #57 (@cjjefcoat) steps, but did not trigger a driver rebuild.

I had a lot of PPA-s and manual file-wrapped deb lines in /etc/apt/sources.list.d/.
I had ubuntu-toolchain-r/ppa, not ubuntu-toolchain-r/test, but IIRC purging failed somehow (can't remember, couple of days ago), probably due to the same mismatch.

So just deleted all entries from /etc/apt/sources.list.d/, manually cleaned /etc/apt/sources.list, updated repo cache.

Then tried to force-install (downgrade) gcc-4.8 and gcc-4.8-base to 4.8.4-2ubuntu1~14.04.4, to make it in sync with the gcc package again.
After force-picking all the needed dependencies, I ended up with
sudo apt-get install gcc-4.8 gcc-4.8-base:i386=4.8.4-2ubuntu1~14.04.4 gcc-4.8-base:amd64=4.8.4-2ubuntu1~14.04.4 g++-4.8=4.8.4-2ubuntu1~14.04.4 libstdc++-4.8-dev=4.8.4-2ubuntu1~14.04.4 libgcc-4.8-dev=4.8.4-2ubuntu1~14.04.4 cpp-4.8=4.8.4-2ubuntu1~14.04.4 libgl1-mesa-dri libgfortran3=4.8.4-2ubuntu1~14.04.4 libstdc++6=4.8.4-2ubuntu1~14.04.4 libquadmath0=4.8.4-2ubuntu1~14.04.4 libgomp1=4.8.4-2ubuntu1~14.04.4 libitm1=4.8.4-2ubuntu1~14.04.4 libatomic1=4.8.4-2ubuntu1~14.04.4 libasan0=4.8.4-2ubuntu1~14.04.4 libtsan0=4.8.4-2ubuntu1~14.04.4

Somewhere in the process (do not remember when) the nvidia driver, dkms, the HWE stack (kernel + xorg) and some other stuff got uninstalled (due to missing dependencies).
Probably I could have avoided that and was just too strict on removing potentially screwed up stuff.
I think I totally removed gcc-4.8 or even all of gcc before reinstalling with the (assumed) correct version.

So after "aligning" gcc, I installed back the xenial kernel (a'la ubuntu wiki HWE page, for multiarch):
sudo apt-get install --install-recommends linux-generic-lts-xenial xserver-xorg-core-lts-xenial xserver-xorg-lts-xenial xserver-xorg-video-all-lts-xenial xserver-xorg-input-all-lts-xenial libwayland-egl1-mesa-lts-xenial libgl1-mesa-glx-lts-xenial libgl1-mesa-glx-lts-xenial:i386 libglapi-mesa-lts-xenial:i386

There the amd64 and i386 versions of libgl1-mesa-dri-lts-xenial had a /etc/drirc conflict.
Apt was useless, just returned a dpkg fail (with or without -f).
Overwriting /etc/drirc with /etc/drirc.dpkg-new was not enough, had to remove /etc/drirc altogether.
THEN on `sudo apt-get install -f` apt asked me to resolve the /etc/drirc conflict using apt's command-line prompt.
One said Unigine Heaven 3.0 this and that setting, the other said "Unigine Heaven 3.0 and older contain too many bugs and can't be supported [...]".
Went with the second, seemingly newer one.

Then came the nvidia driver (with forcing the dependencies previously removed by apt):
sudo apt-get -f install nvidia-340 dkms acpid lib32gcc1 libc6-i386 nvidia-prime nvidia-opencl-icd-340

This compiled the driver and
modinfo nvidia-340 -k 4.4.0-116-generic | grep vermagic
said ret...


Fox Liu (tianxiang1989) wrote :

@cjjefcoat
Thanks for your advice. I met the same issue in VirtualBox after the Ubuntu 14.04 kernel update to 116.

After downgrading gcc to 4.8.4 and reinstalling the 116 kernel and VirtualBox 5.2, it works!

Vincent (v.+) wrote :

Same error here with evdi's DisplayLinkManager process, used to drive a DisplayLink screen from a USB 3.0 hub/docking station like https://www.anker.com/store/USB-3.0-Docking-Station/68ANDOCKS-BA

Went back to 4.4.0-112 to get it working.
Will wait for a future 4.4.0-(>116) kernel to retry.

are you sure it has arrived with _some_ gcc version already?

publications indicate it's still very experimental, and i can not see any of the typical switches being honored by the gcc versions on my machine (and i was able to switch them back and forth using "update-alternatives"). needless to say that even after a few attempts a call to "modinfo" does not at all report the "retpoline" option for the dkms built nvidia kernel modules (using -384 on top of 4.4.0-116).

https://www.phoronix.com/scan.php?page=news_item&px=GCC-7-Gets-Retpolines
https://www.phoronix.com/scan.php?page=article&item=gcc8-mindirect-thunk&num=1

can you please be more specific with the compiler versions you are using? please give exact versions. thank you.

are you thinking of something like this?

https://launchpad.net/~jonathonf/+archive/ubuntu/gcc-7.3

switching to "beta" version of gcc (Ubuntu 7.3.0-5ubuntu1~14.04.york0) 7.3.0
and rebuilding the kernel modules in question for 4.4.0-116 seemingly did the job.

is this really what you wanted from a good number of the end users by publishing those kernels?

summary: 4.4.0-116 Kernel update on 2/21 breaks Nvidia drivers (on 14.04 and
- 16.04)
+ 16.04) by an insufficient compiler!
John Sopko (sopko) wrote :

I manage 300+ machines that run openafs that has a dkms built kernel module like the nvidia module that needs to be built. I also manage dozens of nvidia gpu servers where users have sudo access and can install anything they want. Here is a snippet of what I found. Note this is for 16.04 systems but 14.04 systems running the 4.4.0-116 kernel will have similar problems:

Short story, if your machine is not using the Ubuntu supplied gcc you
will have issues with afs and nvidia built kernel modules or any dkms
built kernel modules. Longer story below.

NOTE! this problem affects at least, openafs, nvidia, virtual box or
any dkms built module. I am going to forward this info to
<email address hidden>. This started with the latest Ubuntu 4.4.0-116
kernel version.

Looking through that bug and testing took me hours. The short story is
the machines having issues with openafs.ko module are ones that have
the Ubuntu toolchain ppa that has a gcc compiler suite that does not
support the "retpoline" feature which was recently put in to fix the
Spectre security issue. The nvidia module will also have issues.

The machines using the Ubuntu supplied gcc compiler are the ones that
are not having issues. But, host olympia was a special case.

The compiler that works, using "gcc -v"

gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)

The ones that don't work like host bvisionserver8:

gcc version 5.4.1 20160904

You can use "apt-cache policy gcc" to show what repo the compiler
comes from. WARNING, /usr/bin/gcc is a link to /usr/bin/gcc-5, the gcc
package is a meta package and you need to query gcc-5. If you query
gcc it shows coming from the standard Ubuntu repo but /usr/bin/gcc-5
is coming from the toolchain repo.

A good gcc-5 shows:
----------------------------

classroom:55% apt-cache policy gcc-5
gcc-5:
  Installed: 5.4.0-6ubuntu1~16.04.9
  Candidate: 5.4.0-6ubuntu1~16.04.9
  Version table:
 *** 5.4.0-6ubuntu1~16.04.9 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main
amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/main
amd64 Packages
        100 /var/lib/dpkg/status
     5.3.1-14ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

The bad compilers show:
----------------------------------

bvisionserver8:/> apt-cache policy gcc-5
gcc-5:
  Installed: 5.4.1-2ubuntu1~16.04
  Candidate: 5.4.1-2ubuntu1~16.04
  Version table:
 *** 5.4.1-2ubuntu1~16.04 500
        500 http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu
xenial/main amd64 Packages
        100 /var/lib/dpkg/status
     5.4.0-6ubuntu1~16.04.9 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main
amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/main
amd64 Packages
     5.3.1-14ubuntu2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

And you can see that the
/etc/apt/sources.list.d/ubuntu-toolchain-r-ubuntu-test-xenial.list
repo is configured on those machines.
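
The comparison between the good and bad `apt-cache policy gcc-5` listings above can be scripted. This is a sketch, not a definitive tool: the function name `installed_origins` is hypothetical, and the parsing assumes the output layout quoted above, where the installed version is marked with `***` and its sources follow on indented lines.

```shell
# Print the repository URLs that provide the currently installed version
# (the entry marked "***") in `apt-cache policy <package>` output read on stdin.
installed_origins() {
    awk '
        /^ *\*\*\* / { in_installed = 1; next }                  # start of the installed entry
        in_installed && $1 ~ /^-?[0-9]+$/ && $2 ~ /^[a-z]+:\/\// { print $2; next }
        in_installed && NF == 2 && $2 ~ /^-?[0-9]+$/ { in_installed = 0 }   # next version entry
    '
}

# Example: flag a compiler that comes from a PPA rather than the Ubuntu archive.
#   apt-cache policy gcc-5 | installed_origins | grep ppa.launchpad.net
```

Querying gcc-5 rather than the gcc meta package matters here, for the reason given in the warning above.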

On a good machine modinfo openafs shows that retpoline is turned on in
the vermagic: line:

classroom:56% modinfo openafs
filename: /lib/modules/4.4.0-116-generic/updates/dkms...


ke bai (beckybai) wrote :

Solved the problem with the instructions from #57 under Ubuntu 16.04.4 LTS.

At first, nvidia-smi could not find the driver, so I tried to reinstall the nvidia driver. In the process, I ran into exactly the same problem as

https://devtalk.nvidia.com/default/topic/1030325/nvidia-driver-installation-nvidia-version-magic-4-4-0-116-generic-smp-mod_unload-modversions-should-be-4-4-0-116-generic-smp-mod_unload-modversions-retpoline-/

And I also cannot log in and get to the desktop.

Thanks so much!

Bryce Schober (bryce-schober) wrote :

On Ubuntu 14.04, I've installed the ubuntu-toolchain-r/test ppa in order to get a gcc-5 compiler, and the assumedly-conflicting 4.8.5-2ubuntu1~14.04.1 came from that PPA along for the ride.

Is there a better PPA option for installing gcc-5 on Ubuntu 14.04?

gumbeto (gumbeto) wrote :

Bryce Schober (bryce-schober): I had a very similar issue, but on 16.04 and wanting gcc-7. Adding the toolchain ppa made gcc-5 subject to upgrading, triggering this issue. I managed to deal with it with package pinning and aptitude (see comment 58).

Here are a few more details in case you want to try this yourself (I recommend using Timeshift or similar in case you want to go back).

$ cat /etc/apt/preferences.d/toolchain.pref
Package: *
Pin: release o=LP-PPA-ubuntu-toolchain-r-test
Pin-Priority: 60

Here are some quick instructions on how to use aptitude: https://makandracards.com/makandra/47558-how-to-install-packages-from-newer-ubuntu-releases#planning-the-upgrade

Arna Ghosh (arnatubai) wrote :

Hello everyone,
Thanks for your answers and this discussion. I ran into this problem yesterday. Although I tried the instructions by @cjjefcoat in #57, they didn't give me back my display as expected. Unity got screwed up: I could see the display, but the rendering was not right. The windows didn't have title bars, most keyboard functionality didn't work, and I didn't even have the task bar.
I went back to the -112 kernel but got the same result. So I reinstalled all the NVIDIA drivers and uninstalled CUDA, with no luck at all.
Finally I tried restarting Unity by following #14 here - https://bugs.launchpad.net/ubuntu/+source/sddm/+bug/1569357/comments/14

and it worked for me :D
Then I tried running everything on the 116 kernel and it works. So in case Unity gets screwed up for any of you after you reinstall the 116 kernel, try restarting Unity and hope it works :D

Cheers

lithorus (lithorus) wrote :

I had the same problem with zfs too.

Changed :
CONFIG_RETPOLINE=y
to
CONFIG_RETPOLINE=n
in
/boot/config-4.4.0-116-generic

and reinstalled the nvidia/zfs kernel modules.

To me it seems that the config file shipped with the kernel does not match the version used for compiling.
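
A read-only way to see what the shipped config file says, without editing it, is sketched below. The function name kernel_config_retpoline is hypothetical, and the sketch assumes the config lives at /boot/config-<release>, as in the path above.

```shell
#!/bin/sh
# Report whether a kernel config file has CONFIG_RETPOLINE=y.
# $1: path to the config file (assumed default: /boot/config-<running kernel>).
# Prints "enabled", "disabled" or "unreadable"; never modifies the file.
kernel_config_retpoline() {
    config=${1:-/boot/config-$(uname -r)}
    if [ ! -r "$config" ]; then
        echo "unreadable"
    elif grep -q '^CONFIG_RETPOLINE=y$' "$config"; then
        echo "enabled"
    else
        # Covers "=n", "# CONFIG_RETPOLINE is not set" and a missing entry.
        echo "disabled"
    fi
}
```

For example, kernel_config_retpoline /boot/config-4.4.0-116-generic inspects the file mentioned above. Note that this only reads the config; it says nothing about which compiler was used to build the modules.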

I had the same problem; however, I don't have an Nvidia driver (Intel HD Graphics instead).

I got this message before entering a loop:

video: version magic '3.13.0-143-generic SMP mod_unload modversions ' should be '3.13.0-143-generic SMP mod_unload modversions retpoline '
drm: version magic '3.13.0-143-generic SMP mod_unload modversions ' should be '3.13.0-143-generic SMP mod_unload modversions retpoline '
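
To pick messages like these out of a longer dmesg log, a small shell sketch follows. It assumes the exact message format shown above; classify_vermagic_line is a made-up helper name, not part of any package.

```shell
#!/bin/sh
# Classify one kernel log line of the "version magic '...' should be '...'" form.
# Prints:
#   retpoline-mismatch  the kernel expects "retpoline" but the module lacks it
#   other-mismatch      any other version-magic complaint
#   no-match            the line is not a version-magic message
classify_vermagic_line() {
    line=$1
    case $line in
        *"version magic '"*"' should be '"*"'"*) ;;
        *) echo "no-match"; return 0 ;;
    esac
    # Module's own magic: text between "version magic '" and "' should be '".
    have=${line%%"' should be '"*}
    have=${have#*"version magic '"}
    # Kernel's expected magic: text after "' should be '" up to the last quote.
    want=${line#*"' should be '"}
    want=${want%"'"*}
    case $want in
        *retpoline*)
            case $have in
                *retpoline*) echo "other-mismatch" ;;
                *) echo "retpoline-mismatch" ;;
            esac ;;
        *) echo "other-mismatch" ;;
    esac
}
```

One possible use is to feed it the log line by line: dmesg | while IFS= read -r l; do classify_vermagic_line "$l"; done | sort | uniq -c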

BETLOG (betlog-hax) wrote :

Yep, mine reverts to 1024x760 76Hz and has no options whatsoever.
It's a bit of a showstopper.

dmesg:
nvidia: version magic '3.13.0-143-generic SMP mod_unload modversions ' should be '3.13.0-143-generic SMP mod_unload modversions retpoline '

works:
Linux betlogbrick 3.13.0-142-generic #191-Ubuntu SMP Fri Feb 2 12:13:35 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

fails:
*3.13.0-143*

Could we please know when this bug is going to be fixed at last?

It is not uncommon, actually quite common I would say, to have an NVIDIA graphics card and a more recent version of GCC installed than the one in the Ubuntu distribution, which is quite old. And then the machine breaks when the new kernel is installed!

All the fixes above risk breaking the machine irreparably. Even reinstalling Ubuntu from scratch may not work in the long run if the bug is not fixed in the new kernel version, because one also needs to use the most recent versions of GCC, and thus the problem is bound to recur. And having to keep using kernel 112 also looks quite unsatisfactory as a long-term solution.

We need the matter to be fixed once and for all.

Sebastien (olorinseb) wrote :

I'm running Ubuntu 14.04 kernel 3.13.0-143-generic.
I have the infinite login loop problem.
On the older kernel (-142) everything still works correctly.

I have gcc 4.8.5

fu2qi4 (fu2qi4) wrote :

How to install the latest gcc compiler without pain

sudo add-apt-repository --remove ppa:ubuntu-toolchain-r/test # the best remedy whatsoever
sudo add-apt-repository --remove ppa:jonathonf/gcc-7.2 # (also broken https://askubuntu.com/questions/1009433/dependency-issues-while-installing-gcc-7-3-from-jonathon-fs-ppa)

sudo add-apt-repository -y ppa:jonathonf/gcc
sudo apt-get update
sudo apt-get install gcc-8 g++-8 # will install 7.3.0-5ubuntu1~16.04.york0
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 40 --slave /usr/bin/g++ g++ /usr/bin/g++-8

As an Asus ROG user I'm aware of the ongoing struggle with the Nvidia driver, which conflicts with Debian-based systems (https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=nvidia-driver;dist=unstable https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=852152 ).

I disabled the Nvidia card by editing the boot config:

Restart your computer and enter the BIOS (hold down F2).
Change the boot priority to boot from your USB stick first.
Save and reboot.
When the GRUB boot loader menu is visible, highlight “Try Ubuntu GNOME without installing” (use your up and down arrows).
Press “e” on your keyboard; this will allow you to edit the boot config.
nouveau.modeset=0 tpm_tis.interrupts=0 acpi_osi=Linux i915.preliminary_hw_support=1 idle=nomwait

Run the command: sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get -y dist-upgrade
Remove all Nvidia installs (if any) with this command:
sudo apt-get purge nvidia*
Run the command: sudo apt-get install nvidia-361-updates nvidia-prime
Reboot: sudo reboot

I used boot-repair software https://sourceforge.net/p/boot-repair/home/Home/
It was indispensable
=========================================
Links, used in combination with each other. None of them is a silver bullet on its own.
https://jeremymdyson.wordpress.com/2016/04/27/ubuntugnome-16-04-on-asus-rog-gl552v/
https://www.howtogeek.com/114884/how-to-repair-grub2-when-ubuntu-wont-boot/
https://askubuntu.com/questions/624966/cant-login-after-nvidia-driver-install-v-14-04
https://www.reddit.com/r/linux/comments/4etbsw/nouveau_error_during_installation_of_ubuntu_gnome/

fu2qi4 (fu2qi4) wrote :

To Sebastien (olorinseb) about the "infinite login loop problem":
It's a different bug, reported here:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=891960

https://devtalk.nvidia.com/default/topic/1029683/suspend-swap-group-failed-resume-swap-group-failed-nvidia-390-25/

According to Debian staff, it's an Nvidia bug, not Debian's.
It is supposed to be fixed in a new Nvidia driver update.

Ross Campbell (ross-campbell) wrote :

I have multiple Ubuntu 14.04 systems with newer compilers, and I don't really want to remove them...

Can we get updated compilers in the ubuntu-toolchain-r/test PPA
... or optional kernels without 'retpoline'?

Sebastien (olorinseb) wrote :

Today, I booted my PC on the 3.13.0-142 kernel.

There were lots of updates for several compilers.

Right after, I rebooted my PC into the -143 kernel and ran dkms remove nvidia, then dkms install nvidia.

Then modinfo nvidia | grep vermagic showed "retpoline".

At this point I knew everything was fixed.

Rebooted the PC, loaded the -143 kernel.

It works. Finally.

Does it mean the -143 kernel was compiled with a different version of gcc than the default 14.04 gcc?
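
The remove/install sequence described in this comment can be sketched as a small script. The run and rebuild_dkms_module helpers are hypothetical names; the module name, version and kernel release to pass are whatever dkms status reports on your machine (nvidia-384, 384.111 and 4.4.0-116-generic are used only as example values taken from this report). The sketch assumes the updated compiler is already installed before the rebuild.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild one DKMS module for one kernel so that it is recompiled
# with whatever gcc is installed now.
# $1: DKMS module name, $2: module version, $3: kernel release.
rebuild_dkms_module() {
    module=$1 version=$2 kernel=$3
    # Stop at the first failing step instead of continuing blindly.
    run dkms remove -m "$module" -v "$version" -k "$kernel" &&
    run dkms install -m "$module" -v "$version" -k "$kernel"
}

# Example (as root, or with DRY_RUN=1 to preview):
#   DRY_RUN=1 rebuild_dkms_module nvidia-384 384.111 4.4.0-116-generic
```

Afterwards the modinfo | grep vermagic check from the comment above shows whether the rebuilt module carries "retpoline".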

Alex Muntada (alex.muntada) wrote :

Just a quick note to confirm that removing and reinstalling the previously failing DKMS modules for kernel 4.4.0-116, as explained in comment 30, worked wonders for our vboxdrv and retpoline issue.

Thanks!

no longer affects: xorg (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

affects: gcc → gcc-4.8 (Ubuntu)
Changed in gcc-4.8 (Ubuntu):
status: New → Confirmed
no longer affects: nvidia-graphics-drivers-384 (Ubuntu)

Bill Miller, given that you traced the root cause of your issue to an outdated gcc-4.8 package (since updated), and that the problem is resolved once the update is received, I'll mark this Fix Released.

Changed in gcc-4.8 (Ubuntu):
status: New → Fix Released
no longer affects: linux (Ubuntu)
summary: 4.4.0-116 Kernel update on 2/21 breaks Nvidia drivers (on 14.04 and
- 16.04) by an insufficient compiler!
+ 16.04) due to outdated gcc-4.8

just found this on the GCC thread:

Launchpad Janitor (janitor) wrote on 2018-03-19: #7
This bug was fixed in the package gcc-4.8 - 4.8.5-4ubuntu8

---------------
gcc-4.8 (4.8.5-4ubuntu8) bionic; urgency=medium

  [ Matthias Klose ]
  * Stop build gcj in current distro releases. Addresses: #892536.
  * Split off the documentation part from the retpoline patches.
  * Fix build dependency on realpath.

  [ Steve Beattie ]
  * Add retpoline support for x86 via adding -mindirect-branch=,
    -mindirect-branch-register, and -mfunction-return= support
    (LP: #1749261)
    - 0001-i386-Move-struct-ix86_frame-to-machine_function.diff,
      0002-i386-Use-reference-of-struct-ix86_frame-to-avoid-cop.diff,
      0003-i386-Use-const-reference-of-struct-ix86_frame-to-avo.diff,
      0004-x86-Add-mindirect-branch.diff,
      0005-x86-Add-mfunction-return.diff,
      0006-x86-Add-mindirect-branch-register.diff,
      0007-x86-Add-V-register-operand-modifier.diff,
      0008-x86-Disallow-mindirect-branch-mfunction-return-with-.diff,
      0009-Use-INVALID_REGNUM-in-indirect-thunk-processing.diff:
      implement -mindirect-branch=<keep|thunk|thunk-inline|thunk-extern>
      with attribute support, -mindirect-branch-register, and
      -mfunction-return=<thunk|thunk-inline|thunk-extern> with
      attribute support. Thanks to H.J. Lu.

 -- Matthias Klose <email address hidden> Mon, 19 Mar 2018 16:12:10 +0800

Changed in gcc-4.8 (Ubuntu):
status: Confirmed → Fix Released
Martin W. (webi123) wrote 2 hours ago: #8
I've seen that the fix is currently only released for bionic.
What is the release date for trusty (gcc-4.8.5 Ubuntu 4.8.5-2ubuntu1~14.04.1) ?

Bill Miller (wbmilleriii) wrote :

Christopher M. Penalver - sounds good. Thanks! Bill

alex-mobigo (alex-mobigo) wrote :

Followed #57 with success (had exactly the same problem), no need for this:
"Then after removing and reinstalling the 4.4.0-116 kernel"

So what's the official approach for 14.04 with 4.4.0-116 kernel?

user1033 (user1033) wrote :

14.04 here as well. I didn't try purging the PPA, so I don't know if that works, but what I did was the following:

1. disable the PPA (and sudo apt update)

2. Synaptic refused to apply the changes, so:

$ sudo apt install gcc-4.9-base=4.9.3-0ubuntu4 lib32gcc1=1:4.9.3-0ubuntu4 gcc-4.9-base:i386=4.9.3-0ubuntu4 libmpfr4=3.1.2-1 libcloog-isl4=0.18.2-1 gcc-4.8-base=4.8.4-2ubuntu1~14.04.4 gcc-4.8-base:i386=4.8.4-2ubuntu1~14.04.4 cpp-4.8=4.8.4-2ubuntu1~14.04.4 gcc-4.8=4.8.4-2ubuntu1~14.04.4 libatomic1=4.8.4-2ubuntu1~14.04.4 libasan0=4.8.4-2ubuntu1~14.04.4 libgcc-4.8-dev=4.8.4-2ubuntu1~14.04.4 libgomp1=4.8.4-2ubuntu1~14.04.4 libgfortran3=4.8.4-2ubuntu1~14.04.4 libquadmath0=4.8.4-2ubuntu1~14.04.4 libitm1=4.8.4-2ubuntu1~14.04.4 libtsan0=4.8.4-2ubuntu1~14.04.4 libstdc++6=4.8.4-2ubuntu1~14.04.4 libstdc++6:i386=4.8.4-2ubuntu1~14.04.4 libgcc1=1:4.9.3-0ubuntu4 libgcc1:i386=1:4.9.3-0ubuntu4

...should list 21 packages to install, none to remove.

3. for good measure, reinstall the kernel (4.4.0-119 came out in the meantime) and the nVidia driver

4. reboot

5. $ modinfo nvidia-xxx -k 4.4.0-11x-generic | grep vermagic
...should return retpoline at the end (just replace the x's with your specific versions).
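
The check in step 5 can be scripted roughly as below. It relies on modinfo's -F (print one field) and -k (kernel release) options; the function names are illustrative only, and the module name to pass is the one from your own system.

```shell
#!/bin/sh
# Decide from a vermagic string whether the module was built with retpoline.
# Prints "retpoline" or "no-retpoline".
vermagic_status() {
    # Pad with spaces so "retpoline" only matches as a whole word.
    case " $1 " in
        *" retpoline "*) echo "retpoline" ;;
        *) echo "no-retpoline" ;;
    esac
}

# Look up a module's vermagic for a given kernel and classify it.
# $1: module name, $2: kernel release (defaults to the running kernel).
module_vermagic_status() {
    magic=$(modinfo -k "${2:-$(uname -r)}" -F vermagic "$1") || return 1
    vermagic_status "$magic"
}
```

A module that prints "no-retpoline" here for a retpoline-enabled kernel is the one that will be rejected with the "version magic" error at load time.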

Bryce Schober (bryce-schober) wrote :

Just a follow-up note for 14.04 users that still want a source for a newer gcc-5 (with c++14 support) or gcc-6/7/8:

I have purged the ppa:ubuntu-toolchain-r/test and switched to the ppa:jonathonf/gcc per comment #86 and not had any further nvidia driver problems.

Still having missing-retpoline issues with vboxdrv 5.2.8 on 4.4.0-119 (14.04). Has anyone worked around this?

Dmitry S (dstepanovs) wrote :

Running Ubuntu 16.04 (xenial) in VirtualBox (Windows host). After an Ubuntu update I was affected too: the vboxdrv dkms module won't install. dmesg gives:
[ 260.993242] vboxdrv: version magic '4.4.0-119-generic SMP mod_unload modversions ' should be '4.4.0-119-generic SMP mod_unload modversions retpoline '

$ uname -a
Linux ... 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

In my case I had two GCC versions installed side by side (4.9 and 5):
$ sudo update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection Path Priority Status
------------------------------------------------------------
  0 /usr/bin/gcc-5 20 auto mode
* 1 /usr/bin/gcc-4.9 10 manual mode
  2 /usr/bin/gcc-5 20 manual mode

update-alternatives was pointing to 4.9, so gcc --version was giving me 4.9:
$ gcc --version
gcc (Ubuntu 4.9.3-13ubuntu2) 4.9.3

Switched to version 5 and re-installed virtualbox-dkms:

$ sudo update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection Path Priority Status
------------------------------------------------------------
  0 /usr/bin/gcc-5 20 auto mode
  1 /usr/bin/gcc-4.9 10 manual mode
* 2 /usr/bin/gcc-5 20 manual mode

Press <enter> to keep the current choice[*], or type selection number: 2

$ sudo apt-get install virtualbox-dkms --reinstall
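
Whether the currently selected gcc can build retpoline-tagged modules can be probed directly by asking it to accept the -mindirect-branch=thunk-extern and -mindirect-branch-register options named in the gcc-4.8 changelog quoted earlier in this thread. This is only a sketch: compiler_retpoline_support is a hypothetical helper, and the idea that the kernel build keys the "retpoline" vermagic tag on these same flags is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Probe whether a compiler accepts the retpoline code-generation flags.
# $1: compiler command (defaults to "gcc", i.e. whatever update-alternatives selects).
# Prints "supported", "unsupported" or "unknown" (no temp file could be created).
compiler_retpoline_support() {
    cc=${1:-gcc}
    # Compile into a temp file rather than /dev/null, which is safer as root.
    out=$(mktemp) || { echo "unknown"; return 0; }
    # Compile an empty C translation unit; only flag acceptance matters.
    if "$cc" -mindirect-branch=thunk-extern -mindirect-branch-register \
            -x c -c /dev/null -o "$out" >/dev/null 2>&1
    then
        echo "supported"
    else
        # Also reached when the compiler is missing or not an x86 gcc.
        echo "unsupported"
    fi
    rm -f "$out"
}
```

Running compiler_retpoline_support gcc-4.9 and compiler_retpoline_support gcc-5 side by side would show which of the two alternatives is the safer default before reinstalling virtualbox-dkms.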

Simon Reed (xubuntu-o) wrote :

Running Xubuntu 14.04, I just upgraded from 3.13.0-143 to 3.13.0-144 and needed to re-apply the fix.

The fix I have used is to edit /usr/src/linux-headers-3.13.0-144/include/linux/vermagic.h

$ cd /usr/src/linux-headers-3.13.0-144/include/linux
$ diff vermagic.h vermagic.h.WAS
27,28c27
< /* #ifdef RETPOLINE */
< #if 1
---
> #ifdef RETPOLINE
