Activity log for bug #2007993

Date Who What changed Old value New value Message
2023-02-21 15:47:30 Cory Bloor bug added bug
2023-02-21 15:47:30 Cory Bloor attachment added debian patch that fixes this bug in ubuntu 23.04 https://bugs.launchpad.net/bugs/2007993/+attachment/5648969/+files/0003-fix-static-initialization-order.patch
2023-02-21 16:22:04 Ubuntu Foundations Team Bug Bot tags patch
2023-02-21 16:22:12 Ubuntu Foundations Team Bug Bot bug added subscriber Ubuntu Review Team
2023-02-28 19:15:21 Stefano Rivera nominated for series Ubuntu Jammy
2023-02-28 19:15:21 Stefano Rivera bug task added rocr-runtime (Ubuntu Jammy)
2023-02-28 19:15:46 Stefano Rivera rocr-runtime (Ubuntu): status New Fix Released
2023-02-28 19:16:04 Stefano Rivera bug watch added https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1031089
2023-02-28 19:16:04 Stefano Rivera bug task added rocr-runtime (Debian)
2023-02-28 19:41:07 Stefano Rivera attachment added rocr-runtime_5.0.0-1ubuntu0.1.debdiff https://bugs.launchpad.net/debian/+source/rocr-runtime/+bug/2007993/+attachment/5650532/+files/rocr-runtime_5.0.0-1ubuntu0.1.debdiff
2023-02-28 19:41:25 Stefano Rivera bug added subscriber Ubuntu Stable Release Updates Team
2023-02-28 19:41:34 Stefano Rivera bug added subscriber Stefano Rivera
2023-02-28 20:59:33 Cory Bloor description

Old value:

# System Information:
Description: Ubuntu 22.04.2 LTS
Release: 22.04

# Package Version:
libhsa-runtime64-1:
  Installed: 5.0.0-1
  Source: rocr-runtime

# What was done:
    # on Ubuntu 22.04 or 22.10 with an AMD GPU installed
    apt install rocminfo kmod
    rocminfo

# What was seen:
    ROCk module is loaded
    Segmentation fault (core dumped)

Note that the rocminfo utility will not try to initialize libhsa-runtime64 unless you have an AMD GPU installed, which is required to reproduce this problem.

After some debugging, I came to the conclusion that this is a null pointer dereference in libhsa-runtime64. The order of static initialization is different when building the rocr-runtime package on Ubuntu as compared to on Debian, and this results in the package working on Debian but crashing when it's rebuilt for Ubuntu. A couple of static variables are being copied before they are initialized, leading to a null pointer dereference later on in the program.

# What was expected:
rocminfo should not crash

# Debian Bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1031089

# Debian Patch:
https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.2.3-3/debian/patches/0003-fix-static-initialization-order.patch

The patch applied to the Debian package has fixed this bug in Ubuntu 23.04. It would be great if the fix could also be applied to Ubuntu 22.04 LTS. There's not a lot of ROCm functionality in Jammy, but fixing this bug would at least get the basics like rocminfo working.

New value:

[ Impact ]

The rocr-runtime provides the basic interface between compute code written to run on AMD GPUs and the AMDGPU/AMDKFD driver within the kernel. On Ubuntu 22.04, the library crashes with a segfault during initialization. This bug makes the library unusable.

On Ubuntu 22.04, the main use for this library is in rocminfo, which provides AMD GPU users with a description of the compute capabilities of their hardware. For example, rocminfo provides the name of the ISA for the hardware, which is useful for choosing compiler flags when building GPU libraries from source. Invoking rocminfo is also an easy way for novice users to find information about their hardware (e.g., for inclusion in bug reports filed against GPU libraries). It would therefore be useful if this fix could be backported to Ubuntu 22.04.

The fix changes the order of initialization of a pair of static variables in the rocr-runtime by moving them into the same translation unit, thereby ensuring the order is both deterministic and correct.

[ Test Plan ]

To reproduce this bug, you will need an AMD GPU installed on the machine. Then the following terminal commands should be sufficient to cause a segfault originating in the rocr-runtime:

    apt install rocminfo kmod
    rocminfo

Once the bug is fixed, you should see detailed information about your installed GPU hardware printed to standard output.

This bug is deterministic at runtime, so it is relatively easy to verify if you have the necessary hardware. On Ubuntu 22.04, the rocminfo utility is the only package that depends on rocr-runtime, so this simple test is fairly comprehensive.

[ Where problems could occur ]

The rocr-runtime package is already badly broken, so the risk associated with backporting a fix is low. If a mistake were made in fixing this bug, the most likely outcome would be that the package remains broken.

[ Other info ]

The same fix is in use on Debian Unstable, Ubuntu 23.04 and upstream, so it is already being used in other environments (albeit with different versions of rocr-runtime).

[ Original bug report ]

# System Information:
Description: Ubuntu 22.04.2 LTS
Release: 22.04

# Package Version:
libhsa-runtime64-1:
  Installed: 5.0.0-1
  Source: rocr-runtime

# What was done:
    # on Ubuntu 22.04 or 22.10 with an AMD GPU installed
    apt install rocminfo kmod
    rocminfo

# What was seen:
    ROCk module is loaded
    Segmentation fault (core dumped)

Note that the rocminfo utility will not try to initialize libhsa-runtime64 unless you have an AMD GPU installed, which is required to reproduce this problem.

After some debugging, I came to the conclusion that this is a null pointer dereference in libhsa-runtime64. The order of static initialization is different when building the rocr-runtime package on Ubuntu as compared to on Debian, and this results in the package working on Debian but crashing when it's rebuilt for Ubuntu. A couple of static variables are being copied before they are initialized, leading to a null pointer dereference later on in the program.

# What was expected:
rocminfo should not crash

# Debian Bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1031089

# Debian Patch:
https://salsa.debian.org/rocm-team/rocr-runtime/-/blob/debian/5.2.3-3/debian/patches/0003-fix-static-initialization-order.patch

The patch applied to the Debian package has fixed this bug in Ubuntu 23.04. It would be great if the fix could also be applied to Ubuntu 22.04 LTS. There's not a lot of ROCm functionality in Jammy, but fixing this bug would at least get the basics like rocminfo working.
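
Illustrative note on the description above: the fix it describes, placing both static variables in one translation unit, can be sketched in a few lines of C++. This is only a minimal sketch. The names `Allocator`, `make_allocator`, `default_allocator`, `default_allocator_copy`, and `default_allocator_name` are hypothetical and are not taken from rocr-runtime or from the Debian patch. The sketch relies on the C++ rule that non-local variables with ordered dynamic initialization, defined in a single translation unit, are initialized in the order of their definitions, whereas the relative order across translation units is unspecified.

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime object that other globals depend on.
struct Allocator {
    const char* name;
};

// Not constexpr, so the globals below are not constant-initialized by rule.
Allocator* make_allocator() {
    static Allocator instance{"system"};
    return &instance;
}

// Both globals are defined in this one translation unit, so they are
// initialized in definition order: the source first, then the copy.
Allocator* default_allocator = make_allocator();
Allocator* default_allocator_copy = default_allocator;

// If the copy were defined in a different source file, it could be
// initialized before the source and capture the zero-initialized (null)
// pointer, which would only fail later, at the point of use below.
const char* default_allocator_name() {
    assert(default_allocator_copy != nullptr);
    return default_allocator_copy->name;
}
```

Calling `default_allocator_name()` from `main` is safe here because the copy is guaranteed to be taken after the source is initialized. Keeping dependent globals together trades some code organization for an ordering that no longer depends on link order or toolchain.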
2023-02-28 23:06:39 Cory Bloor attachment added the test results for the proposed package https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2007993/+attachment/5650566/+files/rocr-runtime_5.0.0-1ubuntu0.1-test-results.txt
2023-03-29 07:17:51 Bug Watch Updater rocr-runtime (Debian): status Unknown Fix Released
2023-03-30 14:02:31 Robie Basak rocr-runtime (Ubuntu Jammy): status New Fix Committed
2023-03-30 14:02:33 Robie Basak bug added subscriber SRU Verification
2023-03-30 14:02:35 Robie Basak tags patch patch verification-needed verification-needed-jammy
2023-03-30 18:53:05 Cory Bloor attachment added jammy verification log https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2007993/+attachment/5659198/+files/libhsa-runtime64-1_5.0.0-1ubuntu0.1_verification.txt
2023-03-30 18:54:38 Cory Bloor tags patch verification-needed verification-needed-jammy patch verification-done-jammy verification-needed
2023-04-13 13:10:32 Andreas Hasenack tags patch verification-done-jammy verification-needed patch verification-needed verification-needed-jammy
2023-04-13 17:50:13 Andreas Hasenack tags patch verification-needed verification-needed-jammy patch verification-done-jammy verification-needed
2023-04-13 18:23:55 Andreas Hasenack nominated for series Ubuntu Kinetic
2023-04-13 18:23:55 Andreas Hasenack bug task added rocr-runtime (Ubuntu Kinetic)
2023-04-21 21:23:27 Cory Bloor attachment added rocr-runtime_5.1.0-2ubuntu0.1.debdiff https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2007993/+attachment/5666560/+files/rocr-runtime_5.1.0-2ubuntu0.1.debdiff
2023-05-17 10:01:32 Robie Basak bug added subscriber Ubuntu Sponsors Team
2023-05-19 22:20:16 Erich Eickmeyer rocr-runtime (Ubuntu Kinetic): status New In Progress
2023-05-19 22:20:25 Erich Eickmeyer rocr-runtime (Ubuntu Kinetic): assignee Erich Eickmeyer (eeickmeyer)
2023-05-19 23:50:11 Steve Langasek rocr-runtime (Ubuntu Kinetic): status In Progress Fix Committed
2023-05-19 23:50:16 Steve Langasek tags patch verification-done-jammy verification-needed patch verification-done-jammy verification-needed verification-needed-kinetic
2023-05-19 23:50:48 Steve Langasek removed subscriber Ubuntu Sponsors Team
2023-05-20 08:06:25 Cory Bloor attachment added kinetic verification log https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2007993/+attachment/5674282/+files/libhsa-runtime64-1_5.1.0-2ubuntu0.1_verification.txt
2023-05-20 08:07:17 Cory Bloor tags patch verification-done-jammy verification-needed verification-needed-kinetic patch verification-done-jammy verification-done-kinetic verification-needed
2023-05-20 16:00:56 Erich Eickmeyer rocr-runtime (Ubuntu Kinetic): assignee Erich Eickmeyer (eeickmeyer)
2023-05-31 08:17:50 Launchpad Janitor rocr-runtime (Ubuntu Kinetic): status Fix Committed Fix Released
2023-05-31 08:17:55 Chris Halse Rogers removed subscriber Ubuntu Stable Release Updates Team
2023-05-31 08:20:10 Launchpad Janitor rocr-runtime (Ubuntu Jammy): status Fix Committed Fix Released