2023-02-21 15:47:30 |
Cory Bloor |
bug |
|
|
added bug |
2023-02-21 15:47:30 |
Cory Bloor |
attachment added |
|
debian patch that fixes this bug in ubuntu 23.04 https://bugs.launchpad.net/bugs/2007993/+attachment/5648969/+files/0003-fix-static-initialization-order.patch |
|
2023-02-21 16:22:04 |
Ubuntu Foundations Team Bug Bot |
tags |
|
patch |
|
2023-02-21 16:22:12 |
Ubuntu Foundations Team Bug Bot |
bug |
|
|
added subscriber Ubuntu Review Team |
2023-02-28 19:15:21 |
Stefano Rivera |
nominated for series |
|
Ubuntu Jammy |
|
2023-02-28 19:15:21 |
Stefano Rivera |
bug task added |
|
rocr-runtime (Ubuntu Jammy) |
|
2023-02-28 19:15:46 |
Stefano Rivera |
rocr-runtime (Ubuntu): status |
New |
Fix Released |
|
2023-02-28 19:16:04 |
Stefano Rivera |
bug watch added |
|
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1031089 |
|
2023-02-28 19:16:04 |
Stefano Rivera |
bug task added |
|
rocr-runtime (Debian) |
|
2023-02-28 19:41:07 |
Stefano Rivera |
attachment added |
|
rocr-runtime_5.0.0-1ubuntu0.1.debdiff https://bugs.launchpad.net/debian/+source/rocr-runtime/+bug/2007993/+attachment/5650532/+files/rocr-runtime_5.0.0-1ubuntu0.1.debdiff |
|
2023-02-28 19:41:25 |
Stefano Rivera |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2023-02-28 19:41:34 |
Stefano Rivera |
bug |
|
|
added subscriber Stefano Rivera |
2023-02-28 20:59:33 |
Cory Bloor |
description |
# System Information:
Description: Ubuntu 22.04.2 LTS
Release: 22.04
# Package Version:
libhsa-runtime64-1:
Installed: 5.0.0-1
Source: rocr-runtime
# What was done:
# on Ubuntu 22.04 or 22.10 with an AMD GPU installed
apt install rocminfo kmod
rocminfo
# What was seen:
ROCk module is loaded
Segmentation fault (core dumped)
Note that the rocminfo utility will not try to initialize libhsa-runtime64 unless you have an AMD GPU installed, which is required to reproduce this problem.
After some debugging, I came to the conclusion that this is a null pointer dereference in libhsa-runtime64. The order of static initialization is different when building the rocr-runtime package on Ubuntu as compared to on Debian, and this results in the package working on Debian but crashing when it's rebuilt for Ubuntu. A couple of static variables are being copied before they are initialized, leading to a null pointer dereference later on in the program.
# What was expected:
rocminfo should not crash
# Debian Bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1031089
# Debian Patch:
https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.2.3-3/debian/patches/0003-fix-static-initialization-order.patch
The patch applied to the Debian package has fixed this bug in Ubuntu 23.04. It would be great if the fix could also be applied to Ubuntu 22.04 LTS. There's not a lot of ROCm functionality in Jammy, but fixing this bug would at least get the basics like rocminfo working. |
[ Impact ]
The rocr-runtime provides the basic interface between compute code written to run on AMD GPUs and the AMDGPU/AMDKFD driver within the kernel. On Ubuntu 22.04, the library crashes with a segfault during initialization. This bug makes the library unusable.
On Ubuntu 22.04, the main use for this library is in rocminfo, which provides AMD GPU users with a description of the compute capabilities of their hardware. For example, rocminfo provides the name of the ISA for the hardware, which is useful for choosing compiler flags when building GPU libraries from source. Invoking rocminfo is also an easy way for novice users to find information about their hardware (e.g., for inclusion in bug reports filed against GPU libraries). It would therefore be useful if this fix could be backported to Ubuntu 22.04.
The fix changes the order of initialization of a pair of static variables in the rocr-runtime by moving them into the same translation unit, thereby ensuring the order is both deterministic and correct.
[ Test Plan ]
To reproduce this bug, you will need an AMD GPU installed on the machine. Then the following terminal commands should be sufficient to cause a segfault originating in the rocr-runtime:
apt install rocminfo kmod
rocminfo
Once the bug is fixed, you should see detailed information about your installed GPU hardware printed to standard output. This bug is deterministic at runtime, so it is relatively easy to verify if you have the necessary hardware.
On Ubuntu 22.04, the rocminfo utility is the only package that depends on rocr-runtime, so this simple test is fairly comprehensive.
[ Where problems could occur ]
The rocr-runtime package is already badly broken, so the risk associated with backporting a fix is low. If a mistake were made in fixing this bug, the most likely outcome would be that the package remains broken.
[ Other info ]
The same fix is in use on Debian Unstable, Ubuntu 23.04 and upstream, so it is already being used in other environments (albeit with different versions of rocr-runtime).
[ Original bug report ]
# System Information:
Description: Ubuntu 22.04.2 LTS
Release: 22.04
# Package Version:
libhsa-runtime64-1:
Installed: 5.0.0-1
Source: rocr-runtime
# What was done:
# on Ubuntu 22.04 or 22.10 with an AMD GPU installed
apt install rocminfo kmod
rocminfo
# What was seen:
ROCk module is loaded
Segmentation fault (core dumped)
Note that the rocminfo utility will not try to initialize libhsa-runtime64 unless you have an AMD GPU installed, which is required to reproduce this problem.
After some debugging, I came to the conclusion that this is a null pointer dereference in libhsa-runtime64. The order of static initialization is different when building the rocr-runtime package on Ubuntu as compared to on Debian, and this results in the package working on Debian but crashing when it's rebuilt for Ubuntu. A couple of static variables are being copied before they are initialized, leading to a null pointer dereference later on in the program.
# What was expected:
rocminfo should not crash
# Debian Bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1031089
# Debian Patch:
https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.2.3-3/debian/patches/0003-fix-static-initialization-order.patch
The patch applied to the Debian package has fixed this bug in Ubuntu 23.04. It would be great if the fix could also be applied to Ubuntu 22.04 LTS. There's not a lot of ROCm functionality in Jammy, but fixing this bug would at least get the basics like rocminfo working. |
|
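The description above attributes the crash to static variables being copied before they are initialized, and says the fix moves them into the same translation unit. A minimal C++ sketch of that general idea follows. It is not the actual patch: every name in it (build_table, g_table, g_table_copy, first_entry) is hypothetical and chosen only for illustration.

```cpp
// Illustrative sketch only: every name here is hypothetical and none is
// taken from the rocr-runtime sources.
#include <cassert>

// Stand-in for a table that is built at program start-up.
static const int* build_table() {
    static const int table[3] = {10, 20, 30};
    return table;
}

// Both globals need dynamic initialization. Because they are defined in the
// same translation unit, they are initialized in the order written here.
const int* g_table = build_table();   // defined first, so initialized first
const int* g_table_copy = g_table;    // copies an already-initialized pointer

// If g_table_copy were defined in a different translation unit, its
// initializer could run before g_table's and copy a null pointer, which
// this dereference would then crash on.
int first_entry() {
    assert(g_table_copy != nullptr);
    return g_table_copy[0];
}
```

Calling first_entry() from main() would read through the copied pointer. With both definitions in one file, the C++ rules for ordered dynamic initialization guarantee the copy is made after g_table has been set, which is the property the description relies on.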
2023-02-28 23:06:39 |
Cory Bloor |
attachment added |
|
the test results for the proposed package https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2007993/+attachment/5650566/+files/rocr-runtime_5.0.0-1ubuntu0.1-test-results.txt |
|
2023-03-29 07:17:51 |
Bug Watch Updater |
rocr-runtime (Debian): status |
Unknown |
Fix Released |
|
2023-03-30 14:02:31 |
Robie Basak |
rocr-runtime (Ubuntu Jammy): status |
New |
Fix Committed |
|
2023-03-30 14:02:33 |
Robie Basak |
bug |
|
|
added subscriber SRU Verification |
2023-03-30 14:02:35 |
Robie Basak |
tags |
patch |
patch verification-needed verification-needed-jammy |
|
2023-03-30 18:53:05 |
Cory Bloor |
attachment added |
|
jammy verification log https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2007993/+attachment/5659198/+files/libhsa-runtime64-1_5.0.0-1ubuntu0.1_verification.txt |
|
2023-03-30 18:54:38 |
Cory Bloor |
tags |
patch verification-needed verification-needed-jammy |
patch verification-done-jammy verification-needed |
|
2023-04-13 13:10:32 |
Andreas Hasenack |
tags |
patch verification-done-jammy verification-needed |
patch verification-needed verification-needed-jammy |
|
2023-04-13 17:50:13 |
Andreas Hasenack |
tags |
patch verification-needed verification-needed-jammy |
patch verification-done-jammy verification-needed |
|
2023-04-13 18:23:55 |
Andreas Hasenack |
nominated for series |
|
Ubuntu Kinetic |
|
2023-04-13 18:23:55 |
Andreas Hasenack |
bug task added |
|
rocr-runtime (Ubuntu Kinetic) |
|
2023-04-21 21:23:27 |
Cory Bloor |
attachment added |
|
rocr-runtime_5.1.0-2ubuntu0.1.debdiff https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2007993/+attachment/5666560/+files/rocr-runtime_5.1.0-2ubuntu0.1.debdiff |
|
2023-05-17 10:01:32 |
Robie Basak |
bug |
|
|
added subscriber Ubuntu Sponsors Team |
2023-05-19 22:20:16 |
Erich Eickmeyer |
rocr-runtime (Ubuntu Kinetic): status |
New |
In Progress |
|
2023-05-19 22:20:25 |
Erich Eickmeyer |
rocr-runtime (Ubuntu Kinetic): assignee |
|
Erich Eickmeyer (eeickmeyer) |
|
2023-05-19 23:50:11 |
Steve Langasek |
rocr-runtime (Ubuntu Kinetic): status |
In Progress |
Fix Committed |
|
2023-05-19 23:50:16 |
Steve Langasek |
tags |
patch verification-done-jammy verification-needed |
patch verification-done-jammy verification-needed verification-needed-kinetic |
|
2023-05-19 23:50:48 |
Steve Langasek |
removed subscriber Ubuntu Sponsors Team |
|
|
|
2023-05-20 08:06:25 |
Cory Bloor |
attachment added |
|
kinetic verification log https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2007993/+attachment/5674282/+files/libhsa-runtime64-1_5.1.0-2ubuntu0.1_verification.txt |
|
2023-05-20 08:07:17 |
Cory Bloor |
tags |
patch verification-done-jammy verification-needed verification-needed-kinetic |
patch verification-done-jammy verification-done-kinetic verification-needed |
|
2023-05-20 16:00:56 |
Erich Eickmeyer |
rocr-runtime (Ubuntu Kinetic): assignee |
Erich Eickmeyer (eeickmeyer) |
|
|
2023-05-31 08:17:50 |
Launchpad Janitor |
rocr-runtime (Ubuntu Kinetic): status |
Fix Committed |
Fix Released |
|
2023-05-31 08:17:55 |
Chris Halse Rogers |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2023-05-31 08:20:10 |
Launchpad Janitor |
rocr-runtime (Ubuntu Jammy): status |
Fix Committed |
Fix Released |
|