compilation errors due to "peermem" module

Bug #1960256 reported by D M
This bug affects 1 person
Affects                               Status  Importance  Assigned to  Milestone
nvidia-graphics-drivers-470 (Ubuntu)  New     Undecided   Unassigned

Bug Description

Installed with the following:

apt-get install --no-install-recommends nvidia-driver-470 nvidia-modprobe libnvidia-cfg1-470 libnvidia-common-470 libnvidia-compute-470 libnvidia-decode-470 libnvidia-encode-470 libnvidia-extra-470 libnvidia-fbc1-470 libnvidia-gl-470 libnvidia-ifr1-470 nvidia-compute-utils-470 nvidia-dkms-470 nvidia-driver-470 nvidia-kernel-common-470 nvidia-kernel-source-470 nvidia-utils-470 xserver-xorg-video-nvidia-470

The install generates /var/crash/nvidia-dkms-470.0.crash; its last 20 lines (tail -20) are:

   /usr/bin/ld.bfd -m elf_x86_64 -z max-page-size=0x200000 -r -o /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.o /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-drv.o […] { echo /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o […] /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-format.o; echo; } > /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.mod
 make -f ./scripts/Makefile.modpost
   sed 's/ko$/o/' /var/lib/dkms/nvidia/470.103.01/build/modules.order | scripts/mod/modpost -m -a -i ./Module.symvers -I /var/lib/dkms/nvidia/470.103.01/build/Module.symvers -e /usr/src/ofa_kernel/default/Module.symvers -o /var/lib/dkms/nvidia/470.103.01/build/Module.symvers -s -T -
 FATAL: parse error in symbol dump file
 scripts/Makefile.modpost:93: recipe for target '__modpost' failed
 make[2]: *** [__modpost] Error 1
 Makefile:1675: recipe for target 'modules' failed
 make[1]: *** [modules] Error 2
 make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-97-generic'
 Makefile:80: recipe for target 'modules' failed
 make: *** [modules] Error 2
DKMSKernelVersion: 5.4.0-97-generic
Date: Mon Feb 7 11:32:36 2022
Package: nvidia-dkms-470 470.103.01-0ubuntu0.18.04.1
PackageVersion: 470.103.01-0ubuntu0.18.04.1
SourcePackage: nvidia-graphics-drivers-470
Title: nvidia-dkms-470 470.103.01-0ubuntu0.18.04.1: nvidia kernel module failed to build

The nvidia module is registered with DKMS (status "added"):

# dkms status -k `uname -r`
iser, 4.7: added
kernel-mft-dkms, 4.13.0, 5.4.0-97-generic, x86_64: installed
knem, 1.1.3.90mlnx1: added
mlnx-ofed-kernel, 4.7: added
nvidia, 470.103.01: added
rshim, 1.8, 5.4.0-97-generic, x86_64: installed
srp, 4.7: added
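
The status listing above mixes modules that are only registered ("added") with modules that are built and installed for the running kernel. A minimal sketch for filtering out the former, assuming the "name, version: state" line layout shown above (the function name dkms_added_only is made up for illustration):

```shell
#!/usr/bin/env bash
# List DKMS modules that are registered ("added") but not built or installed.
# Reads `dkms status`-style lines on stdin, e.g. "nvidia, 470.103.01: added".
# ASSUMPTION: the state is the text after the last ": " on each line.
dkms_added_only() {
    awk -F': ' '{ sub(/[ \t\r]+$/, "") } $NF == "added" { print $1 }'
}

# Usage (needs dkms on the system):
#   dkms status -k "$(uname -r)" | dkms_added_only
```

On the output above this would print iser, knem, mlnx-ofed-kernel, nvidia and srp, i.e. the Mellanox OFED 4.7 modules are not built for 5.4.0-97-generic either.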

However, running a build fails:

 dkms build nvidia/470.103.01

Kernel preparation unnecessary for this kernel. Skipping...
applying patch disable_fstack-clash-protection_fcf-protection.patch...patching file Kbuild
Hunk #1 succeeded at 82 (offset 11 lines).

Building module:
cleaning build area...
unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j16 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-97-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.4.0-97-generic/build LD=/usr/bin/ld.bfd modules.........(bad exit status: 2)
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-dkms-470.0.crash'
Error! Bad return status for module build on kernel: 5.4.0-97-generic (x86_64)
Consult /var/lib/dkms/nvidia/470.103.01/build/make.log for more information.

I suspect the failure is related to the "peermem" module:

  { echo /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-drv.o [...] /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-format.o; echo; } > /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.mod
  /usr/bin/ld.bfd -m elf_x86_64 -z max-page-size=0x200000 -r -o /var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem.o /var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem/nvidia-peermem.o
  { echo /var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem/nvidia-peermem.o; echo; } > /var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem.mod
make -f ./scripts/Makefile.modpost
  sed 's/ko$/o/' /var/lib/dkms/nvidia/470.103.01/build/modules.order | scripts/mod/modpost -m -a -i ./Module.symvers -I /var/lib/dkms/nvidia/470.103.01/build/Module.symvers -e /usr/src/ofa_kernel/default/Module.symvers -o /var/lib/dkms/nvidia/470.103.01/build/Module.symvers -s -T -
FATAL: parse error in symbol dump file
scripts/Makefile.modpost:93: recipe for target '__modpost' failed
make[2]: *** [__modpost] Error 1
Makefile:1675: recipe for target 'modules' failed
make[1]: *** [modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-97-generic'
Makefile:80: recipe for target 'modules' failed
make: *** [modules] Error 2
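
The log does not say which symbol dump file modpost failed to parse. A minimal sketch for narrowing that down: it lists the Module.symvers files handed to modpost (the -i, -I and -e arguments visible in the log above) and summarises how many tab-separated fields their lines have. The function names are made up for illustration, and a difference in field counts between files would only be a hint, not proof, of a format mismatch:

```shell
#!/usr/bin/env bash
# Print every Module.symvers path that modpost was given as an input
# (-i, -I or -e) in a make.log, one per line, without duplicates.
# Relative paths such as ./Module.symvers are relative to the kernel
# build directory, not to the DKMS build directory.
symvers_inputs() {
    grep -oE -- '-[iIe] [^ ]*Module\.symvers' "$1" | cut -d' ' -f2 | LC_ALL=C sort -u
}

# Summarise how many tab-separated fields the lines of a symbol dump have.
# Output: "<field count> <number of lines>", one line per distinct count.
symvers_field_counts() {
    awk -F'\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# Usage sketch:
#   for f in $(symvers_inputs /var/lib/dkms/nvidia/470.103.01/build/make.log); do
#       [ -r "$f" ] && { echo "== $f"; symvers_field_counts "$f"; }
#   done
```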

The nvidia-peermem module does not seem to be supported on (unpatched?) 5.4 kernels, per this thread I found:

https://forums.linuxmint.com/viewtopic.php?p=2106512#p2106512

Downloading and running "NVIDIA-Linux-x86_64-470.103.01.run" directly also fails, UNLESS the following option is used:

  --no-peermem
      Do not install the nvidia-peermem kernel module. This kernel module provides support for peer-to-peer memory sharing with Mellanox HCAs (Host Channel Adapters) via GPUDirect RDMA (Remote Direct Memory Access).

BUT only when the Nvidia installer did NOT try to register the module with DKMS; otherwise the build failed with:

  CC [M] /var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem/nvidia-peermem.o
  LD [M] /var/lib/dkms/nvidia/470.103.01/build/nvidia.o
ld -r -o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-interface.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-pci.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-acpi.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-cray.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-dma.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-i2c.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-mmap.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-p2p.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-pat.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-procfs.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-procfs-utils.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-usermap.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-vm.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-vtophys.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/os-interface.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/os-mlock.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/os-pci.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/os-registry.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/os-usermap.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-modeset-interface.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-pci-table.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-kthread-q.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-memdbg.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-ibmnpu.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-report-err.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-rsync.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-msi.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-caps.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv-frontend.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nv_uvm_interface.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nvlink_linux.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/nvlink_caps.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/linux_nvswitch.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/procfs_nvswitch.o /var/lib/dkms/nvidia/470.103.01/build/nvidia/i2c_nvswitch.o
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-crtc.c: In function 'plane_req_config_update':
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-crtc.c:89:9: warning: unused variable 'ret' [-Wunused-variable]
     int ret = 0;
         ^~~
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-crtc.c: In function 'nv_drm_plane_atomic_set_property':
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-crtc.c:371:32: warning: unused variable 'nv_drm_plane_state' [-Wunused-variable]
     struct nv_drm_plane_state *nv_drm_plane_state =
                                ^~~~~~~~~~~~~~~~~~
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-modeset.c: In function '__will_generate_flip_event':
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-modeset.c:96:23: warning: unused variable 'primary_plane' [-Wunused-variable]
     struct drm_plane *primary_plane = crtc->primary;
                       ^~~~~~~~~~~~~
  LD [M] /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.o
ld -r -o /var/lib/dkms/nvidia/470.103.01/build/nvidia-modeset/nv-modeset-interface.o /var/lib/dkms/nvidia/470.103.01/build/nvidia-modeset/nvidia-modeset-linux.o /var/lib/dkms/nvidia/470.103.01/build/nvidia-modeset/nv-kthread-q.o
  LD [M] /var/lib/dkms/nvidia/470.103.01/build/nvidia-modeset.o
  LD [M] /var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem.o
  LD [M] /var/lib/dkms/nvidia/470.103.01/build/nvidia-uvm.o
  Building modules, stage 2.
  MODPOST 5 modules
FATAL: parse error in symbol dump file
scripts/Makefile.modpost:93: recipe for target '__modpost' failed
make[2]: *** [__modpost] Error 1
Makefile:1675: recipe for target 'modules' failed
make[1]: *** [modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-97-generic'
Makefile:80: recipe for target 'modules' failed
make: *** [modules] Error 2

So it seems that attempts to install the driver:

* either via the Ubuntu package, or
* via the Nvidia shell installer

fail when the build goes through DKMS, because DKMS does not (has no way to?) tell the build infrastructure not to build the "peermem" sub-module.

Running "bash NVIDIA-Linux-x86_64-470.103.01.run --no-peermem" and NOT hooking it into DKMS allows the driver to be built.
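
The DKMS build command quoted earlier passes an empty NV_EXCLUDE_BUILD_MODULES='' to make. A minimal, untested sketch of how that assignment could be rewritten to name nvidia-peermem follows. It ASSUMES that the NVIDIA kernel Makefile skips the modules listed in this variable, which this report has not verified, and the function name exclude_peermem is made up for illustration:

```shell
#!/usr/bin/env bash
# Rewrite an empty NV_EXCLUDE_BUILD_MODULES='' assignment in a make command
# line (read from stdin) so that it names nvidia-peermem.
# ASSUMPTION: the NVIDIA kernel Makefile leaves out modules listed in
# NV_EXCLUDE_BUILD_MODULES; this is not confirmed anywhere in this report.
exclude_peermem() {
    sed "s/NV_EXCLUDE_BUILD_MODULES=''/NV_EXCLUDE_BUILD_MODULES='nvidia-peermem'/"
}

# Usage sketch: preview the changed command line before editing anything.
#   echo "env NV_VERBOSE=1 'make' -j16 NV_EXCLUDE_BUILD_MODULES='' modules" | exclude_peermem
```

Even if the variable works as assumed, DKMS may still expect a nvidia-peermem.ko when the package's dkms.conf lists it as a built module, so the change may not be enough on its own.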

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: nvidia-driver-470 470.103.01-0ubuntu0.18.04.1
ProcVersionSignature: Ubuntu 5.4.0-97.110~18.04.1-generic 5.4.162
Uname: Linux 5.4.0-97-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.9-0ubuntu7.27
Architecture: amd64
Date: Mon Feb 7 11:57:39 2022
SourcePackage: nvidia-graphics-drivers-470
UpgradeStatus: No upgrade log present (probably fresh install)

D M (dmone2022)
summary: - compilation due to "peermem" module
+ compilation errors due to "peermem" module