Activity log for bug #1797963

Date Who What changed Old value New value Message
2018-10-15 19:19:45 bugproxy bug added bug
2018-10-15 19:19:47 bugproxy tags architecture-ppc64le bugnameltc-172349 severity-high targetmilestone-inin1804
2018-10-15 19:19:48 bugproxy ubuntu: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2018-10-15 19:19:51 bugproxy affects ubuntu linux (Ubuntu)
2018-10-15 19:21:58 Frank Heimes bug task added ubuntu-power-systems
2018-10-15 19:22:12 Frank Heimes ubuntu-power-systems: importance Undecided High
2018-10-15 19:22:26 Frank Heimes ubuntu-power-systems: assignee Canonical Kernel Team (canonical-kernel-team)
2018-10-15 19:23:27 Joseph Salisbury linux (Ubuntu): importance Undecided High
2018-10-15 19:49:00 Manoj Iyer linux (Ubuntu): assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) Canonical Kernel Team (canonical-kernel-team)
2018-10-15 19:52:09 Joseph Salisbury nominated for series Ubuntu Cosmic
2018-10-15 19:52:09 Joseph Salisbury bug task added linux (Ubuntu Cosmic)
2018-10-15 19:52:09 Joseph Salisbury nominated for series Ubuntu Bionic
2018-10-15 19:52:09 Joseph Salisbury bug task added linux (Ubuntu Bionic)
2018-10-15 19:52:15 Joseph Salisbury linux (Ubuntu Bionic): status New In Progress
2018-10-15 19:52:18 Joseph Salisbury linux (Ubuntu Cosmic): status New In Progress
2018-10-15 19:52:21 Joseph Salisbury linux (Ubuntu Bionic): importance Undecided High
2018-10-15 19:52:23 Joseph Salisbury linux (Ubuntu Bionic): assignee Joseph Salisbury (jsalisbury)
2018-10-15 19:52:26 Joseph Salisbury linux (Ubuntu Cosmic): assignee Canonical Kernel Team (canonical-kernel-team) Joseph Salisbury (jsalisbury)
2018-10-16 05:05:58 Frank Heimes ubuntu-power-systems: status New In Progress
2018-10-16 14:17:25 Joseph Salisbury description

Old value:

Description
We're not able to unwind the stack from within __kernel_clock_gettime in the Linux vDSO on Summit. This affects both DDT and MAP (via GDB and libunwind). The issue is more serious than it may first appear: the function appears to be called fairly often by the CUDA runtime, and it can fall back to a syscall, which makes it relatively time-consuming (and therefore more likely to be encountered).

To reproduce:
Compile $CUDA_DIR/samples/0_Simple/matrixMul (attached is a small patch to modify the Makefile to compile outside of the samples directory).
Run the following GDB commands:

user@deb3qwsp1:/usr/local/cuda-10.0/samples/0_Simple/matrixMul$ gdb ./matrixMul
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64le-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./matrixMul...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x8284
(gdb) run
Starting program: /usr/local/cuda-10.0/samples/0_Simple/matrixMul/matrixMul
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".

Breakpoint 1, 0x0000000100008284 in main ()
(gdb) break *(__kernel_clock_gettime+144)
Breakpoint 2 at 0x7ffff7f805e4: file /build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S, line 127.
(gdb) continue
Continuing.
[Matrix Multiply Using CUDA] - Starting...

Breakpoint 2, __kernel_clock_gettime () at /build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S:127
127 /build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S: No such file or directory.
(gdb) bt
#0  __kernel_clock_gettime () at /build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S:127
#1  0x00007ffff7b8f530 in ?? () from /lib/powerpc64le-linux-gnu/libc.so.6
#2  0x00007ffff6b81118 in ?? () from /usr/lib/powerpc64le-linux-gnu/libcuda.so.1
#3  0x00007ffff6a69c70 in ?? () from /usr/lib/powerpc64le-linux-gnu/libcuda.so.1
#4  0x00007ffff6bf0ba0 in cuInit () from /usr/lib/powerpc64le-linux-gnu/libcuda.so.1
#5  0x000000010003ca50 in cudart::__loadDriverInternalUtil() ()
#6  0x00007ffff7f05274 in __pthread_once_slow (
    once_control=0x1000c00f0 <cudart::globalState::loadDriver()::loadDriverControl>,
    init_routine=0x10003c950 <cudart::__loadDriverInternalUtil()>) at pthread_once.c:116
#7  0x000000010008ea88 in cudart::cuosOnce(int*, void (*)()) ()
#8  0x00000001000410a8 in cudart::globalState::initializeDriver() ()
#9  0x000000010005ec90 in cudaGetDeviceCount ()
#10 0x0000000100009930 in gpuGetMaxGflopsDeviceId() ()
#11 0x0000000100009bf4 in findCudaDevice(int, char const**) ()
#12 0x000000010000836c in main ()
(gdb) step
128 in /build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S
(gdb) bt
#0  __kernel_clock_gettime () at /build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S:128
#1  0x0000000000000000 in ?? ()
(gdb)

Note: __kernel_clock_gettime+144 is currently the point in the function at which the syscall is made, and it is liable to change if the kernel is updated. It corresponds to the "sc" instruction here:
https://gitlab.com/TeeFirefly/linux-kernel/blob/7408b38cfdf9b0c6c3bda97402c75bd27ef69a85/arch/powerpc/kernel/vdso64/gettimeofday.S#L127
and can be rediscovered if needed by disassembling the function.

Note that a backtrace can be collected before entering the syscall, but not during it. The inability to unwind also prevents GDB from being able to "finish" (step out of) the function:

(gdb) finish
Run till exit from #0  __kernel_clock_gettime ()
    at /build/linux-ZIBxfV/linux-4.15.0/arch/powerpc/kernel/vdso64/gettimeofday.S:128
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0x0

Command aborted.
(gdb)

The cause of the issue is a lack of Call Frame Information (CFI) in the syscall code path, so a potential fix could be to save the link register and add the corresponding CFI directive for the syscall code path (as is done for the alternative code path).

This has now been accepted upstream in the powerpc tree as git commit https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=56d20861c027498b5a1112b4f9f05b56d906fdda ("powerpc/vdso: Correct call frame information")

New value:

== SRU Justification ==
IBM is requesting this commit in Bionic and Cosmic. They report that they are not able to unwind the stack from within __kernel_clock_gettime in the Linux vDSO on Summit. This affects both DDT and MAP (via GDB and libunwind). The issue is more serious than it may first appear: the function appears to be called fairly often by the CUDA runtime, and it can fall back to a syscall, which makes it relatively time-consuming (and therefore more likely to be encountered).

This commit is currently still in linux-next.

== Fix ==
56d20861c027 ("powerpc/vdso: Correct call frame information")
linux-next

== Regression Potential ==
Low. Limited to powerpc.

Original Bug Description
(Identical to the old value above.)
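A CUDA-free way to drive the same syscall fallback may be useful when reproducing this without the matrixMul sample. The Python sketch below is hypothetical and rests on two assumptions: that glibc's clock_gettime() dispatches through the vDSO's __kernel_clock_gettime, and that on this 4.15 kernel the vDSO fast path handles only CLOCK_REALTIME and CLOCK_MONOTONIC, so another clock ID such as CLOCK_BOOTTIME falls through to the "sc" instruction. The names FALLBACK_CLOCK and sample_fallback_clock are illustrative only; time.CLOCK_BOOTTIME requires Python 3.7+ on Linux.

```python
"""Hypothetical CUDA-free driver for the __kernel_clock_gettime syscall fallback.

Assumptions (not verified against every kernel/glibc combination):
  * glibc routes clock_gettime() through the vDSO's __kernel_clock_gettime;
  * the vDSO fast path handles only CLOCK_REALTIME and CLOCK_MONOTONIC, so
    other clock IDs such as CLOCK_BOOTTIME take the "sc" syscall fallback.
"""
import time

# Illustrative choice of a clock ID assumed to miss the vDSO fast path.
FALLBACK_CLOCK = time.CLOCK_BOOTTIME


def sample_fallback_clock(iterations: int = 1000) -> float:
    """Call clock_gettime() repeatedly; return the last reading in seconds."""
    reading = 0.0
    for _ in range(iterations):
        # Each call goes through libc's clock_gettime() with FALLBACK_CLOCK.
        reading = time.clock_gettime(FALLBACK_CLOCK)
    return reading


if __name__ == "__main__":
    print(f"CLOCK_BOOTTIME reading: {sample_fallback_clock():.9f} s")
```

If those assumptions hold, running this under GDB and placing the breakpoint on the "sc" instruction once the process has started should stop in the syscall path without involving CUDA. The +144 offset from the transcript is specific to that kernel build and may need to be rediscovered by disassembling __kernel_clock_gettime.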
2018-10-23 09:23:01 Kleber Sacilotto de Souza linux (Ubuntu Bionic): status In Progress Fix Committed
2018-10-23 09:23:04 Kleber Sacilotto de Souza linux (Ubuntu Cosmic): status In Progress Fix Committed
2018-10-24 13:35:34 Brad Figg tags architecture-ppc64le bugnameltc-172349 severity-high targetmilestone-inin1804 architecture-ppc64le bugnameltc-172349 severity-high targetmilestone-inin1804 verification-needed-bionic
2018-10-24 14:50:04 Brad Figg tags architecture-ppc64le bugnameltc-172349 severity-high targetmilestone-inin1804 verification-needed-bionic architecture-ppc64le bugnameltc-172349 severity-high targetmilestone-inin1804 verification-needed-bionic verification-needed-cosmic
2018-10-26 19:01:36 Mike Ranweiler tags architecture-ppc64le bugnameltc-172349 severity-high targetmilestone-inin1804 verification-needed-bionic verification-needed-cosmic architecture-ppc64le bugnameltc-172349 severity-high targetmilestone-inin1804 verification-done-bionic verification-needed-cosmic
2018-11-05 15:27:54 Frank Heimes ubuntu-power-systems: status In Progress Fix Committed
2018-11-13 18:51:26 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2018-11-13 18:51:26 Launchpad Janitor cve linked 2017-13168
2018-11-13 18:51:26 Launchpad Janitor cve linked 2018-15471
2018-11-13 18:51:26 Launchpad Janitor cve linked 2018-16658
2018-11-13 18:51:26 Launchpad Janitor cve linked 2018-9363
2018-11-13 19:09:36 Launchpad Janitor linux (Ubuntu Cosmic): status Fix Committed Fix Released
2018-11-14 16:03:19 Joseph Salisbury linux (Ubuntu): status Fix Committed Fix Released
2018-11-14 16:33:53 Andrew Cloke ubuntu-power-systems: status Fix Committed Fix Released
2019-07-24 20:56:26 Brad Figg tags architecture-ppc64le bugnameltc-172349 severity-high targetmilestone-inin1804 verification-done-bionic verification-needed-cosmic architecture-ppc64le bugnameltc-172349 cscc severity-high targetmilestone-inin1804 verification-done-bionic verification-needed-cosmic