mumax3 test suite fails against glibc 2.38

Bug #2032624 reported by Simon Chopin
This bug affects 1 person
Affects                   | Status       | Importance | Assigned to
GLibC                     | New          | Medium     |
Ubuntu                    | Fix Released | Undecided  | Unassigned
aspectc++ (Debian)        | New          | Unknown    |
aspectc++ (Ubuntu)        | New          | Undecided  | Unassigned
cbmc (Debian)             | Fix Released | Unknown    |
cbmc (Ubuntu)             | Fix Released | Undecided  | Unassigned
cxref (Debian)            | Fix Released | Unknown    |
cxref (Ubuntu)            | Fix Released | Undecided  | Unassigned
gauche-c-wrapper (Ubuntu) | New          | Undecided  | Unassigned
glibc (Ubuntu)            | Won't Fix    | Medium     | Unassigned
mumax3 (Ubuntu)           | Fix Released | Critical   | Unassigned
nvidia-nccl (Ubuntu)      | Fix Released | Undecided  | Unassigned
pyvkfft (Ubuntu)          | Fix Released | Undecided  | Unassigned
rocm-hipamd (Debian)      | Fix Released | Unknown    |
rocm-hipamd (Ubuntu)      | Fix Released | Undecided  | Unassigned
stdgpu-contrib (Ubuntu)   | New          | Undecided  | Unassigned

Bug Description

The autopkgtests fail with the following error:

921s nvcc -std=c++03 -ccbin=/usr/bin/cuda-gcc --compiler-options -Werror --compiler-options -Wall -Xptxas -O3 -ptx -arch=compute_50 -code=sm_50 copypadmul2.cu -o copypadmul2_50.ptx
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(30): error: identifier "__Float32x4_t" is undefined
922s
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(31): error: identifier "__Float64x2_t" is undefined
922s
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(40): error: identifier "__SVFloat32_t" is undefined
922s
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(41): error: identifier "__SVFloat64_t" is undefined
922s
922s /usr/include/aarch64-linux-gnu/bits/math-vector.h(42): error: identifier "__SVBool_t" is undefined

Marking as critical as this blocks the glibc transition.

Simon Chopin (schopin)
Changed in glibc (Ubuntu):
importance: Undecided → Critical
tags: added: update-excuse
tags: added: foundations-todo
Mitchell Dzurick (mitchdz) wrote :

This commit https://sourceware.org/git/?p=glibc.git;a=commit;h=cd94326a1326c4e3f1ee7a8d0a161cc0bdcaf07e added the file `sysdeps/aarch64/fpu/bits/math-vector.h`.

On a mantic system, the header is installed at /usr/include/aarch64-linux-gnu/bits/math-vector.h. Before this commit, the aarch64 version of the file did only one thing:
#include <bits/libm-simd-decl-stubs.h>

And after the commit, a few types are added such as

#if __GNUC_PREREQ(9, 0)
# define __ADVSIMD_VEC_MATH_SUPPORTED
typedef __Float32x4_t __f32x4_t;
typedef __Float64x2_t __f64x2_t;
...

Simply commenting out the new types is enough to fix this issue, but completely removing the newly added support for libmvec is not a great idea.

Perhaps nvidia-cuda-toolkit-gcc needs to be rebuilt with support for these types?
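
To make the version gate quoted above concrete, here is a minimal C sketch of the comparison behind `__GNUC_PREREQ(9, 0)` as defined in glibc's `<features.h>`. The helper names `gnuc_prereq` and `advsimd_typedefs_selected` are made up for illustration and are not part of glibc. The point is that any front end advertising itself as GCC 9 or newer takes the typedef branch, whether or not it actually knows `__Float32x4_t`.

```c
/* Mirrors the comparison done by glibc's __GNUC_PREREQ(maj, min), but takes
   the advertised compiler version as arguments instead of reading the
   predefined __GNUC__ and __GNUC_MINOR__ macros. */
static int gnuc_prereq(int have_major, int have_minor,
                       int want_major, int want_minor)
{
    return ((have_major << 16) + have_minor)
           >= ((want_major << 16) + want_minor);
}

/* Hypothetical helper: returns 1 if a front end advertising the given GCC
   version would compile the AdvSIMD typedefs guarded by
   __GNUC_PREREQ(9, 0), and 0 otherwise. */
int advsimd_typedefs_selected(int have_major, int have_minor)
{
    return gnuc_prereq(have_major, have_minor, 9, 0);
}
```

With these definitions, `advsimd_typedefs_selected(12, 0)` returns 1 and `advsimd_typedefs_selected(8, 5)` returns 0, so a parser that reports itself as GCC 12 is handed typedefs it may not understand.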

Graham Inggs (ginggs) wrote :

The nvidia-cuda-toolkit-gcc package only contains the /usr/bin/cuda-g++ and /usr/bin/cuda-gcc wrappers and has a dependency on the highest supported g++, currently g++-12.

See: https://packages.ubuntu.com/mantic/devel/nvidia-cuda-toolkit-gcc

Mitchell Dzurick (mitchdz) wrote :

Tried a no-change rebuild of nvidia-cuda-toolkit (https://launchpad.net/~mitchdz/+archive/ubuntu/nvidia-cuda-toolkit-mantic-merge) using the proposed archive and that did not solve the problem.

Mitchell Dzurick (mitchdz) wrote :

Ah, I posted my comment right after yours, ginggs. Thanks for the pointer! You're right, on my system cuda-gcc just points to gcc-12:

$ ll $(which /usr/bin/cuda-gcc)
lrwxrwxrwx 1 root root 6 Aug 23 14:17 /usr/bin/cuda-gcc -> gcc-12*

I tried using gcc-13 instead, hoping that version would recognize these new types, but I'm still seeing __Float32x4_t undefined, along with some additional undefined types:

nvcc -std=c++11 -ccbin=/usr/bin/g++-13 --allow-unsupported-compiler --compiler-options -Werror --compiler-options -Wall -Xptxas -O3 -ptx -arch=compute_50 -code=sm_50 copypadmul2.cu -o copypadmul2_50.ptx
...
/usr/include/stdlib.h(147): error: identifier "_Float64" is undefined
/usr/include/stdlib.h(153): error: identifier "_Float128" is undefined
/usr/include/stdlib.h(159): error: identifier "_Float32x" is undefined
/usr/include/stdlib.h(165): error: identifier "_Float64x" is undefined
...

One more note: these particular CUDA code snippets don't really need these types, so finding a way to not include them would work (maybe patching libc6-dev to add another preprocessor directive). Ultimately I think that's a bad idea, though, because someone could want a .cu file that uses Arm SIMD extensions alongside the CUDA code.

Graham Inggs (ginggs) wrote :

Some similar reports I found (although from some years ago):

https://forums.developer.nvidia.com/t/nvcc-compilation-errors-on-24-2-l4t-platform-tx1/45937

https://github.com/InsightSoftwareConsortium/ITK/issues/1959

"The user space in R23.x is 32-bit. NEON is also from the 32-bit compatibility mode that makes ARMv8 able to execute armhf. The errors tend to imply that some 32-bit compatibility mode library for NEON is missing."

Seems to imply some mismatch between NEON (32-bit) and arm64?

Graham Inggs (ginggs) wrote :

We'll ignore this failure and allow glibc to migrate; that does not preclude further investigation.

Note that mumax3/arm64 is not built in Debian, and did not build in jammy, so we may end up removing the arm64 binary.

Matthias Klose (doko) wrote :

Removing packages from mantic:
 mumax3 3.10-8 in mantic arm64
Comment: LP: #2032624, remove mumax3 binary on arm64
1 package successfully removed.

Daniel van Vugt (vanvugt) wrote :

This seems to be causing bug 2033747 too.

Graham Inggs (ginggs) wrote :

This also seems to cause nvidia-nccl to FTBFS on arm64 in the test rebuild

https://people.canonical.com/~ginggs/ftbfs-report/test-rebuild-20230830-mantic-mantic.html

tags: added: ftbfs
Graham Inggs (ginggs) wrote :

cxref also FTBFS on arm64 in the test rebuild

Graham Inggs (ginggs) wrote :

Also gauche-c-wrapper, rocm-hipamd and stdgpu-contrib

Heinrich Schuchardt (xypron) wrote :

cbmc fails to build from source on arm64 with LTO disabled as reported in LP 2036745:

Failed test: fmod1
CBMC version 5.89.0 (cbmc-5.89.0) 64-bit arm64 linux
Parsing main.c
file /usr/include/aarch64-linux-gnu/bits/math-vector.h line 30: syntax error before '__f32x4_t'
PARSING ERROR

https://launchpadlibrarian.net/688275364/buildlog_ubuntu-mantic-arm64.cbmc_5.89.0-2ubuntu1~ppa1_BUILDING.txt.gz

In the upstream glibc bug, Simon Chopin (schopin) wrote :

The use of vector types such as __Float32x4_t in the aarch64 math-vector.h header breaks quite a few programs that parse C code themselves but use GCC as their preprocessor. GCC's preprocessor selects the code paths that use its own intrinsic types, which the consuming programs don't implement.

I'm not sure if this qualifies as a bug in glibc, as it seems reasonable to rely on those types, but we've seen this happen in quite a few instances in Ubuntu:

https://bugs.launchpad.net/ubuntu/+source/mumax3/+bug/2032624

Simon Chopin (schopin)
Changed in glibc:
importance: Unknown → Medium
status: Unknown → New
In the upstream glibc bug, Simon Chopin (schopin) wrote :

I posted a tentative patch adding a way to work around those types at https://sourceware.org/pipermail/libc-alpha/2023-September/151770.html

I'll ship it in my next Ubuntu upload for Mantic as a way to unblock us due to our fairly tight schedule, but I'm hoping we can come up with a better long-term solution.

Simon Chopin (schopin) wrote :

I'll be shipping a temporary workaround patch that disables the vec types if __ARM_VEC_MATH_DISABLED is defined. We still need to patch each failure individually to add that flag to the preprocessor step (not at build time but at runtime!), but at least the patching should be easier and quicker than providing proper support for the various vector types.

We shouldn't bother upstreaming those fixes to Debian, as I'm pretty sure the final glibc part of the solution will look fairly different from my current patch, but at least we can get those packages working in the meantime.
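
As a rough illustration of the escape hatch described above, the following C sketch shows how an opt-out macro can switch off a block of typedefs. Only the macro name `__ARM_VEC_MATH_DISABLED` comes from this thread; the guard condition and the `DEMO_*` and `demo_*` names are assumptions, and the actual d/p/lp2032624.patch may be written differently. The `#define` at the top stands in for a consumer passing `-D__ARM_VEC_MATH_DISABLED` to its preprocessor step.

```c
/* Stand-in for a consumer passing -D__ARM_VEC_MATH_DISABLED on the command
   line of its preprocessor step. The macro name is taken from the comment
   above; everything else in this sketch is hypothetical. */
#define __ARM_VEC_MATH_DISABLED 1

/* Assumed shape of the guard: enable the vector typedefs only for a
   GCC-compatible front end that has not opted out. */
#if defined(__GNUC__) && !defined(__ARM_VEC_MATH_DISABLED)
# define DEMO_VEC_TYPES_ENABLED 1
#else
# define DEMO_VEC_TYPES_ENABLED 0
#endif

/* Returns 1 if the guarded typedef block would be compiled, 0 otherwise. */
int demo_vec_types_enabled(void)
{
    return DEMO_VEC_TYPES_ENABLED;
}
```

Because the macro is defined before the guard is evaluated, `demo_vec_types_enabled()` returns 0 here; without the define, a GCC-compatible compiler would return 1.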

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package glibc - 2.38-1ubuntu5

---------------
glibc (2.38-1ubuntu5) mantic; urgency=medium

  * Update from upstream release branche:
    - CVE-2023-4527: Stack read overflow with large TCP responses in
      no-aaaa mode
    - CVE-2023-4806: use after free in getcanonname
    - LP: #2031909: Fix oversized __io_vtables
  * d/p/u/0001-Fix-leak-in-getaddrinfo-introduced-by-the-fix-for-CV:
    Cherry-picked to fix a regression in one of the previous CVE fixes
    (LP: #2037516, CVE-2023-5156)
  * d/p/lp2032624.patch: add an escape hatch in arm64 math-vector.h.
    This should help fixing multiple FTBFS (LP: #2032624)

 -- Simon Chopin <email address hidden> Wed, 27 Sep 2023 16:38:18 +0200

Changed in glibc (Ubuntu):
status: New → Fix Released
Simon Chopin (schopin) wrote :

Reopening in glibc, as I had some upstream feedback that basically means my workaround is not a good idea. I agree with them, and thus we should drop it, both in upcoming releases and in the upcoming Mantic SRU, to avoid users starting to depend on it, however unlikely that would be.

Changed in glibc (Ubuntu):
importance: Critical → Medium
status: Fix Released → In Progress
In the upstream glibc bug, Connor-baker (connor-baker) wrote :

Adding some additional context:

We're running into this issue in Nixpkgs: https://github.com/NixOS/nixpkgs/pull/264599#pullrequestreview-1707381631.

The GLIBC 2.38 update introduces intrinsics for `aarch64-linux` in `math.h`.

NVCC (NVIDIA's CUDA Compiler) declares itself to be the same compiler as its host compiler. This causes inclusion of unsupported `aarch64-linux` intrinsics. NVCC is now unable to compile any CUDA file for `aarch64-linux` because it does not support these intrinsics: https://forums.developer.nvidia.com/t/nvcc-fails-to-build-with-arm-neon-instructions-cpp-vs-cu/248355/2.

I'll be submitting the same patch I've made for Nixpkgs.
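
For readers unfamiliar with how a header can tell NVCC apart from its host compiler: nvcc defines `__CUDACC__` while compiling CUDA source files, in addition to the host compiler's `__GNUC__`. The sketch below, in plain C, shows one possible guard built on that macro. It is an assumption for illustration, not the Nixpkgs patch mentioned above, and the `demo_*` names are hypothetical.

```c
/* Returns 1 when this translation unit is being compiled by nvcc as CUDA
   source (nvcc defines __CUDACC__ in that case), 0 otherwise. */
int demo_is_cuda_frontend(void)
{
#ifdef __CUDACC__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical guard: expose host vector typedefs only to a GCC-compatible
   compiler that is not acting as the CUDA front end. */
int demo_host_vector_types(void)
{
#if defined(__GNUC__) && !defined(__CUDACC__)
    return 1;
#else
    return 0;
#endif
}
```

Built with a plain C compiler, `demo_is_cuda_frontend()` returns 0. Under a guard of this shape the two functions can never both return 1, which is the property a CUDA-aware header check would rely on.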

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package glibc - 2.38-3ubuntu1

---------------
glibc (2.38-3ubuntu1) noble; urgency=medium

  * debian/patches/git-updates.diff: update from upstream stable branch
    Dropped changes, superseded by the upstream git updates:
    - debian/patches/CVE-2023-4911.patch: terminate immediately if end of
      input is reached in elf/dl-tunables.c.
    - d/p/u/0001-Fix-leak-in-getaddrinfo-introduced-by-the-fix-for-CV:
      Cherry-picked to fix a regression in one of the previous CVE fixes
  * Merge 2.38-3 from Debian experimental
    Dropped changes, included in Debian:
    - debian/patches/hurd-i386/git-powerpc-longjmp.diff: Fix build after chk
      hidden builtin fix.
  * Drop d/p/lp2032624.patch as advised by upstream.
    Downstream users will have to actually implement those types or stop
    pretending they're GCC. (LP: #2032624)
  * d/p/lp2031495.patch: fix test suite on armhf for -prof variant
    (LP: #2031495)
  * d/control.in/i386: fix math-vector-fortran.h file move (LP: #2039234)

 -- Simon Chopin <email address hidden> Mon, 23 Oct 2023 18:54:07 +0200

Changed in glibc (Ubuntu):
status: In Progress → Fix Released
Cory Bloor (slavik81) wrote :

All HIP language libraries have been FTBFS on arm64 since the vector types were added, so this issue has been blocking the libraries from syncing for several months. The only ones that have been able to update have been the ones that had always been broken on arm64 for other reasons.

I've opened a merge request for the glibc package that fixes the issue for rocm-hipamd, using the following patch:

```
--- glibc.orig/sysdeps/aarch64/fpu/bits/math-vector.h
+++ glibc/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -101,7 +101,8 @@ typedef __attribute__ ((__neon_vector_ty
 typedef __attribute__ ((__neon_vector_type__ (2))) double __f64x2_t;
 #endif

-#if __GNUC_PREREQ(10, 0) || __glibc_clang_prereq(11, 0)
+#if (__GNUC_PREREQ(10, 0) || __glibc_clang_prereq(11, 0)) \
+ && !defined(__HIP_DEVICE_COMPILE__)
 # define __SVE_VEC_MATH_SUPPORTED
 typedef __SVFloat32_t __sv_f32_t;
 typedef __SVFloat64_t __sv_f64_t;
```

I think the only real alternative would be to remove the arm64 build of rocm-hipamd from the archive. The existing ROCm libraries in noble all FTBFS on arm64, but the versions that successfully built on arm64 with older copies of glibc are blocking transitions from proposed to release.

Simon Chopin (schopin) wrote :

I'm much more OK with removing the binary than uploading that patch to glibc.

System headers are just that: headers that reflect the system they're installed in. The fact that you can sometimes get away with using system headers when cross-compiling to a different environment is just an accident, not a feature. Your compiling environment should be providing its own *complete* set of headers that matches the target environment.

Simon Chopin (schopin) wrote :

(and of course not include the system headers in its include paths)

Graham Inggs (ginggs) wrote :

rocm-hipamd's arm64 binaries were removed in LP: #2061048

Changed in rocm-hipamd (Ubuntu):
status: New → Fix Released
Changed in glibc (Ubuntu):
status: Fix Released → Won't Fix
Graham Inggs (ginggs)
Changed in mumax3 (Ubuntu):
status: New → Fix Released
Graham Inggs (ginggs) wrote :

aspectc++, cbmc and nvidia-nccl have new versions in noble-proposed which are unable to migrate due to missing builds on arm64.

Please remove the following binaries from noble:

 aspectc++ | 1:2.3+git20230726-1 | noble/universe | arm64
 libpuma-dev | 1:2.3+git20230726-1 | noble/universe | arm64
 cbmc | 5.12-5 | noble/universe | arm64
 libnccl-dev | 2.18.3-1-1 | noble/multiverse | arm64
 libnccl2 | 2.18.3-1-1 | noble/multiverse | arm64

The only reverse-dependency is gloo-cuda, which has never built on arm64.

Łukasz Zemczak (sil2100) wrote :

$ remove-package -m "Remove the arm64 binaries to unblock new proposed packages (LP: #2032624)" -s noble -a arm64 -b aspectc++ libpuma-dev cbmc libnccl-dev libnccl2
Removing packages from noble:
 aspectc++ 1:2.3+git20230726-1 in noble arm64
 libpuma-dev 1:2.3+git20230726-1 in noble arm64
 cbmc 5.12-5 in noble arm64
 libnccl-dev 2.18.3-1-1 in noble arm64
 libnccl2 2.18.3-1-1 in noble arm64
Comment: Remove the arm64 binaries to unblock new proposed packages (LP: #2032624)
Remove [y|N]? y
5 packages successfully removed.

Graham Inggs (ginggs)
Changed in cbmc (Ubuntu):
status: New → Fix Released
affects: aspectc++ (Ubuntu) → ubuntu
Changed in ubuntu:
status: New → Fix Released
Changed in nvidia-nccl (Ubuntu):
status: New → Fix Released
Changed in cbmc (Debian):
status: Unknown → New
Changed in aspectc++ (Debian):
status: Unknown → New
Changed in cxref (Debian):
status: Unknown → New
Changed in rocm-hipamd (Debian):
status: Unknown → New
Graham Inggs (ginggs) wrote :

pyvkfft seems fixed in 2024.1.2+ds1-1

Changed in pyvkfft (Ubuntu):
status: New → Fix Released
Changed in cbmc (Debian):
status: New → Confirmed
Changed in cxref (Debian):
status: New → Fix Released
Graham Inggs (ginggs) wrote :

cxref was fixed in version 1.6e-9

Changed in cxref (Ubuntu):
status: New → Fix Released
Changed in cbmc (Debian):
status: Confirmed → Fix Released
Changed in rocm-hipamd (Debian):
status: New → Fix Released