Jammy 22.04.3 gcc compiler no longer builds modules for 6.5.0 kernel series

Bug #2051457 reported by Mike Ferreira
This bug affects 7 people
Affects                  Status     Importance  Assigned to  Milestone
dkms (Ubuntu)            Confirmed  Undecided   Unassigned
gcc-defaults (Ubuntu)    Won't Fix  Undecided   Unassigned
linux (Ubuntu)           Confirmed  Undecided   Unassigned

Bug Description

Jammy has moved to the 6.5.0 kernel series, and that kernel series is compiled with gcc-12. The previous 6.2.0 series kernels were compiled with gcc-11.

The current version of gcc in jammy is 11.4.

Many NVidia drivers, WiFi drivers, VirtualBox (from our repo), etc. fail to build their modules because of a gcc version mismatch: the running kernel was built with gcc-12, while the modules are being built with the default gcc-11.
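
A quick way to confirm the mismatch is to compare the compiler recorded in the running kernel's banner with the default gcc. The sketch below is only a rough check: the helper names gcc_major_from_banner and report_gcc_match are made up for illustration, and it assumes the banner in /proc/version has the usual "gcc... (Distro X.Y.Z...) X.Y.Z" shape.

```shell
#!/bin/sh
# Print the major GCC version found in a kernel banner string
# (the text of /proc/version). Hypothetical helper, not part of any package.
gcc_major_from_banner() {
    printf '%s\n' "$1" |
        sed -n 's/.*gcc[^ ]* ([^)]*) \([0-9][0-9]*\)\..*/\1/p'
}

# Report whether the kernel's compiler and the default gcc agree.
# Arguments: kernel GCC major version, default GCC major version.
report_gcc_match() {
    if [ "$1" = "$2" ]; then
        echo "match: kernel and default gcc are both gcc-$1"
    else
        echo "mismatch: kernel built with gcc-$1, default gcc is gcc-$2"
    fi
}
```

On a live system this could be called as: report_gcc_match "$(gcc_major_from_banner "$(cat /proc/version)")" "$(gcc -dumpversion | cut -d. -f1)"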

I recognized this while helping Users with NVidia driver compile errors on the forum, and came up with this work-around for them:
https://ubuntuforums.org/showthread.php?t=2494273&p=14175164#post14175164

Summary:
Install gcc-12 & g++-12 on 22.04.3 and use it as the compiler.
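
One common way to carry out a switch like this is with update-alternatives; the linked forum post may differ in detail, so treat the following as a sketch under that assumption. The function name gcc_switch_commands and the priority value are illustrative, not taken from the post. It only prints the commands, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print (do not run) the commands that install a given GCC major version and
# register it as the default compiler through update-alternatives.
# Hypothetical helper; the priority is simply the major version times 10.
gcc_switch_commands() {
    ver="$1"
    echo "sudo apt install gcc-$ver g++-$ver"
    echo "sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$ver $((ver * 10)) --slave /usr/bin/g++ g++ /usr/bin/g++-$ver"
    echo "sudo update-alternatives --set gcc /usr/bin/gcc-$ver"
}
```

Calling gcc_switch_commands 12 prints three lines for gcc-12. Later comments in this report suggest that installing gcc-12 alone, without changing the default, may already be enough.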

Since then, this work-around has helped to resolve problems with running the 6.5.0 series kernels and building modules for other applications and hardware drivers: VirtualBox, WiFi Drivers, etc. for use with the 6.5.0 series.

I have run this for over a month as my default with no ill effects. I have continued to recommend this work-around to many users to solve their problems...

*** I think it is time to look at pushing through gcc-12 as the default compiler for 22.04.3 through the normal updates channel.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: gcc 4:11.2.0-1ubuntu1
ProcVersionSignature: Ubuntu 6.5.0-14.14~22.04.1-generic 6.5.3
Uname: Linux 6.5.0-14-generic x86_64
NonfreeKernelModules: zfs nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: pass
CloudArchitecture: x86_64
CloudID: none
CloudName: none
CloudPlatform: none
CloudSubPlatform: config
CurrentDesktop: GNOME
Date: Sun Jan 28 08:23:37 2024
InstallationDate: Installed on 2022-09-19 (496 days ago)
InstallationMedia: Ubuntu-Server 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809)
SourcePackage: gcc-defaults
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Mike Ferreira (mafoelffen) wrote :
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gcc-defaults (Ubuntu):
status: New → Confirmed
Revision history for this message
Mike Ferreira (mafoelffen) wrote :

Note: This will not help with the NVidia Legacy drivers and building their modules for 6.5.0 series kernels. The only thing I have found so far for them is to not use 6.5.0 kernels yet, which is covered in the second half of that work-around.

I see on Discourse that Canonical says they are working on that:
RE: https://discourse.ubuntu.com/t/introducing-kernel-6-8-for-the-24-04-noble-numbat-release/41958
>>>
Open issues (WIP):

    Fix the build of all the supported nvidia drivers (dkms)
>>>

description: updated
Revision history for this message
Ubfan (ubfan1) wrote :

There may be a few issues involved here.
Hardware RTX3080, driver Nvidia 535.146.06? and 535.154.05
1) Phasing of the Nvidia drivers. When the hwe package update to the 6.5 kernel occurred, the Nvidia package(s) were held back due to phasing. The Nvidia module could not be built, so black screen. Falling back to the older 6.2 kernel, of course, still worked.
2) Source of the Nvidia modules and script/build files. With only the standard repos (no graphics-drivers ppa), forcing the held-back phased Nvidia packages worked, and the Nvidia module built successfully. Only after seeing this problem and looking around did I notice that gcc-12 is installed, though not manually by me. The apt list output shows an automatic installation:
$ apt list g{cc,++}-*{0,1,2,3,4,5,6,7,8,9} --installed
Listing... Done
g++-10/jammy-updates,jammy-security,now 10.5.0-1ubuntu1~22.04 amd64 [installed]
g++-11/jammy-updates,jammy-security,now 11.4.0-1ubuntu1~22.04 amd64 [installed,automatic]
gcc-10/jammy-updates,jammy-security,now 10.5.0-1ubuntu1~22.04 amd64 [installed,automatic]
gcc-11/jammy-updates,jammy-security,now 11.4.0-1ubuntu1~22.04 amd64 [installed,automatic]
gcc-12/jammy-updates,jammy-security,now 12.3.0-1ubuntu1~22.04 amd64 [installed,automatic]
gcc-9/jammy-updates,jammy-security,now 9.5.0-1ubuntu1~22.04 amd64 [installed,automatic]

I presume if the Nvidia modules were not phased, and installed with the 6.5 kernel, the gcc-12 would have been downloaded too, and everything would work as expected. The only manual action I took was to add the option to the upgrade: sudo apt -o APT::Get::Always-Include-Phased-Updates=true upgrade

I don't see any depends or reverse depends on the kernel or Nvidia packages which would explain how the gcc-12 showed up, so maybe in some install script.

Revision history for this message
Matthias Klose (doko) wrote :

We are not changing major software versions in an LTS release. There should be other ways to communicate which compiler to use.

Changed in gcc-defaults (Ubuntu):
status: Confirmed → Won't Fix
Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

Going forward, I see this just getting worse unless something changes to adapt. Usually an LTS stays within a narrow range of kernel series. This sandbox has changed. Cause and effect.

So if LTS no longer supports building working modules on the current running kernel for the hardware stack that is the default installed with Desktop Edition, and the hinted plan is to extend 22.04 to 12 years, like 24.04, then you will just let it fail and dead-end within that LTS period?

Is that the official stance now? Just asking.

Maybe the HWE Stack should include a framework convention to support this?

Or is the work Canonical mentioned (quoted above) on making the nvidia dkms builds work going to take another path?

If you know of another way, please share that wisdom. It doesn't help anyone to keep a different/smarter way to solve that a secret. Does it?

Revision history for this message
Michael Rakijas (rocky714) wrote :

I have this problem as well. I've made multiple approaches trying to install the stock package for CUDA:

cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb

but had problems with the 530 Nvidia driver. Got suggestions to try to install the 535 Nvidia driver first so that the 530 driver install would be bypassed as deprecated. The 535 package would not install with the following packages being held back:

gjs libgjs0g

These attempts were both on a fresh install. I may try the suggestion in

https://ubuntuforums.org/showthread.php?t=2494273&page=2&p=14175164#post14175164

but this might be excessive for me and I've got work to do.

-Rocky714

Revision history for this message
Ubfan (ubfan1) wrote :

And yet to me, it looks like things will just work if gcc-12 is simply installed, not as the default.
Checking some file dates, I see my gcc-12 was brought in (by something) last April, and has just been sitting around. My /bin/gcc and /bin/cc links are years old (and link to gcc-11). The /lib/modules/6.5.0-14-generic/kernel/nvidia-535/bits/nvidia.o was compiled (last week) with gcc-12.3 (correct for my kernel 6.5). It looks like if gcc-12 is installed, it will be used for building an nvidia module, without changing the system /bin/gcc default of gcc-11. When you (MAFoElfen) were trying your workaround by changing links (/bin/gcc to gcc-12), did you ever try the nvidia driver install with gcc-12 present but before changing the links? If the right compiler gets used when present, that suggests a simple fix: add a gcc-12 dependency to the hwe kernel package (to allow any modules to be built, if they do what nvidia apparently does). I'll try out some jammy iso boots if I cannot find a spare partition to install to and see what happens (I've never bothered to figure out how to make my VMs use Nvidia). I can understand not wanting to change the default gcc, but simply adding one should not cause problems.

Revision history for this message
Ubfan (ubfan1) wrote :

Testing was done with the Ubuntu 22.04.1 ISO, booted off disk, no persistence.
The Nvidia GPU was an RTX 3080 (mobile).
The gcc-12 and the generic hwe packages were installed, then the Nvidia
driver 535 selected in the Software & Updates/Additional Drivers tab. Clicked
the apply button, and the nvidia.ko module(s) built successfully using gcc-12.
The build was done while running the default 5.15 kernel, and succeeded whether or
not gcc-11 was even present.

Steps Taken
1) Boot the Ubuntu 22.04.1 Desktop ISO. The default kernel used is 5.15.
2) Set up wireless.
3) In Software & Updates, add the universe and multiverse repositories. Click on the "update" button when presented.
4) sudo apt update (just in case)
5) sudo apt install gcc-12 build-essential
   (Note: the Nvidia module will still build without the build-essential package and its /bin/gcc link.)
6) Install the generic hwe packages. Note: using the hwe packages without "generic" in their names immediately failed on the Nvidia module creation step. This installed the 6.5.0-15 kernel.
   sudo apt install linux-generic-hwe-22.04 linux-headers-generic-hwe-22.04 linux-image-generic-hwe-22.04
7) In Software & Updates, under the Additional Drivers tab, select Nvidia 535. Click on the apply button, and watch the progress bar. No errors should occur.
8) Examine the nvidia.ko module created.
   strings /lib/modules/6.5.0-15-generic/kernel/nvidia-535/nvidia.ko | grep gcc
   The output should show that gcc 12.3 was used (whether or not /bin/gcc -> gcc-11 is even present).
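
The check in step 8 can be wrapped in a small filter. This is a rough heuristic rather than an official tool: gcc_majors_in_strings is a made-up name, and it assumes the compiler string embedded in the module carries a dotted version such as 12.3.0 on a line that mentions gcc. Any other dotted version on such a line would also be picked up.

```shell
#!/bin/sh
# Read `strings` output on stdin and print the distinct GCC major versions
# mentioned on lines containing "gcc" (case-insensitive), one per line.
# Hypothetical helper; the format of the embedded string is an assumption.
gcc_majors_in_strings() {
    grep -i 'gcc' |
        grep -oE '[0-9]+\.[0-9]+\.[0-9]+' |
        cut -d. -f1 |
        sort -un
}
```

For the module from step 8 this could be used as: strings /lib/modules/6.5.0-15-generic/kernel/nvidia-535/nvidia.ko | gcc_majors_in_strings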

Did not try a reboot, since persistence was not even used. The test was quick since everything is in memory.

It looks like just the presence of gcc-12 will solve this problem, and that may be addressed by adding a dependency on it to some (hwe, kernel ?) package.

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

Ubfan1 and I have been talking about this on the Forum, where we have to help other Users with this.
RE: https://ubuntuforums.org/showthread.php?t=2494834

I think his idea is genius: if the matching compiler version is brought in as a dependency of the kernel on kernel updates, so that the compiler is the same version the kernel was compiled with, then this solves the problem. Note that it does not have to replace the old compiler; it can be installed in addition, alongside it.

He just tested this on some fresh 22.04.3 installs and it works.

Sorry, It's hard to grasp that the answer is
"Not going to fix",
...when there is a drastic problem that is being caused by the current update path, that is causing Users to have non-working installations.

We are just here to report it and to help in any way to resolve it.

If this is not the accepted solution, then let's work together to find one that is.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dkms (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Ubfan (ubfan1) wrote :

A just-finished test install of the (same) Ubuntu 22.04.1 Desktop ISO now resulted in gcc-12 being automatically installed and the Nvidia 545 driver modules compiling successfully. After a fresh install, I set up wireless, installed build-essential, and then used Software & Updates/Additional Drivers to select the Nvidia 545 driver, which compiled successfully (gcc 11.4) under the 5.15 kernel. Ran apt update, and the new 6.5 kernel and gcc-12 were downloaded, and the nvidia 545 modules compiled successfully with gcc-12. Looks like a fix has been applied to pull in gcc-12.

Revision history for this message
Scott Moore (scottbomb) wrote :

Been using 22.04 with Nvidia 390 for a long time now with no issues. Suddenly, I ran into the dkms problem with a kernel upgrade. Thankfully, I found the solution is to lock my kernel at 6.12. Hopefully I don't have to keep it that way for long!

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

Link to this bug, just for package 'nvidia-driver-390': https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-390/+bug/2051436, which is a duplicate of : https://bugs.launchpad.net/ubuntu/jammy/+source/nvidia-graphics-drivers-390/+bug/2028165

For nvidia-driver-390, there is another work-around now, credit to Daniel Letzeisen (dtl131), with builds for Mantic & Jammy, that works with the 6.5.0 series kernels until the patch is released:
>>>
sudo add-apt-repository ppa:dtl131/nvidiaexp
sudo apt update
sudo apt install nvidia-driver-390
>>>
I tested it and confirm it installs, builds, and works fine for Jammy with 6.5.0-17.
