Process hangs on HW watchpoint on Power9

Bug #1708451 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
The Ubuntu-power-systems project | Fix Released | Medium | Canonical Kernel Team |
linux (Ubuntu) | Fix Released | Undecided | Ubuntu on IBM Power Systems Bug Triage |

Bug Description

== Comment: #0 - Pedro Miraglia Franco de Carval <email address hidden> - 2017-06-30 12:52:06 ==
After setting a hardware watchpoint on a variable through gdb, the traced process hangs on the instruction that writes to the watched address: gdb never reports the watchpoint hit, and the process keeps running without getting past that instruction.

Steps to reproduce:

Create a program to trace:

pedromfc@perch:~$ cat wp_test.c
int my_global = 0;

void main( void ) {
  my_global = 1;
}

pedromfc@perch:~$ gcc -g -O0 -o wp_test wp_test.c

Debug the binary:

pedromfc@perch:~$ gdb -q wp_test
Reading symbols from wp_test...done.
(gdb) watch my_global
Hardware watchpoint 1: my_global
(gdb) run
Starting program: /home/pedromfc/wp_test

[Program hangs here; interrupted manually]

^C
Program received signal SIGINT, Interrupt.
0x00000000200007a0 in main () at wp_test.c:4
4 my_global = 1;
(gdb) disas
Dump of assembler code for function main:
   0x0000000020000780 <+0>: addis r2,r12,2
   0x0000000020000784 <+4>: addi r2,r2,30592
   0x0000000020000788 <+8>: std r31,-8(r1)
   0x000000002000078c <+12>: stdu r1,-48(r1)
   0x0000000020000790 <+16>: mr r31,r1
   0x0000000020000794 <+20>: nop
   0x0000000020000798 <+24>: addi r9,r2,-32492
   0x000000002000079c <+28>: li r10,1
=> 0x00000000200007a0 <+32>: stw r10,0(r9)
   0x00000000200007a4 <+36>: nop
   0x00000000200007a8 <+40>: addi r1,r31,48
   0x00000000200007ac <+44>: ld r31,-8(r1)
   0x00000000200007b0 <+48>: blr
   0x00000000200007b4 <+52>: .long 0x0
   0x00000000200007b8 <+56>: .long 0x0
   0x00000000200007bc <+60>: .long 0x1000180
End of assembler dump.

(gdb) info reg r9
r9 0x20020014 537002004
(gdb) p /x &my_global
$1 = 0x20020014
(gdb) continue
Continuing.

[As before, nothing happens]

^C
Program received signal SIGINT, Interrupt.
0x00000000200007a0 in main () at wp_test.c:4
4 my_global = 1;
(gdb) disass
Dump of assembler code for function main:
   0x0000000020000780 <+0>: addis r2,r12,2
   0x0000000020000784 <+4>: addi r2,r2,30592
   0x0000000020000788 <+8>: std r31,-8(r1)
   0x000000002000078c <+12>: stdu r1,-48(r1)
   0x0000000020000790 <+16>: mr r31,r1
   0x0000000020000794 <+20>: nop
   0x0000000020000798 <+24>: addi r9,r2,-32492
   0x000000002000079c <+28>: li r10,1
=> 0x00000000200007a0 <+32>: stw r10,0(r9)
   0x00000000200007a4 <+36>: nop
   0x00000000200007a8 <+40>: addi r1,r31,48
   0x00000000200007ac <+44>: ld r31,-8(r1)
   0x00000000200007b0 <+48>: blr
   0x00000000200007b4 <+52>: .long 0x0
   0x00000000200007b8 <+56>: .long 0x0
   0x00000000200007bc <+60>: .long 0x1000180
End of assembler dump.
(gdb) continue
Continuing.
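As a cross-check of the `info reg r9` / `p /x &my_global` output above: under the 64-bit ELFv2 ABI, r12 holds a function's own address when it is entered through its global entry point, so main's prologue derives the TOC pointer in r2 from r12 = 0x20000780 and then forms the store address in r9. The C sketch below replays those three instructions; the helper names are hypothetical and only model `addi`/`addis` for a non-r0 source register.

```c
#include <stdint.h>

/* Hypothetical helpers modelling two PowerPC instructions on 64-bit
 * registers. SI is the signed 16-bit immediate from the disassembly.
 * RA is never r0 in main's prologue, so the RA=0 special case is ignored. */
static uint64_t ppc_addi(uint64_t ra, int16_t si)
{
    return ra + (uint64_t)(int64_t)si;           /* RT = RA + EXTS(SI)         */
}

static uint64_t ppc_addis(uint64_t ra, int16_t si)
{
    return ra + (uint64_t)((int64_t)si * 65536); /* RT = RA + EXTS(SI || 0x0000) */
}

/* Replay main's address computation, given r12 = global entry address. */
static uint64_t my_global_address(uint64_t r12)
{
    uint64_t r2 = ppc_addis(r12, 2);   /* addis r2,r12,2     -> TOC, high part */
    r2 = ppc_addi(r2, 30592);          /* addi  r2,r2,30592  -> TOC pointer    */
    return ppc_addi(r2, -32492);       /* addi  r9,r2,-32492 -> &my_global     */
}
```

With r12 = 0x20000780, r2 becomes 0x20027f00 and r9 becomes 0x20020014 (537002004 decimal), the same value gdb prints for `&my_global`. So the `stw r10,0(r9)` the process is stuck on is exactly the store to the watched variable.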

From another terminal, confirm that the program is still running (but stuck on the same instruction):

pedromfc@perch:~$ ps -C gdb,wp_test -o pid,comm,state
  PID COMMAND S
19178 gdb S
19193 wp_test R
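The same check can be done without `ps`. A minimal C sketch, assuming Linux procfs: it reads the state letter (third field) of `/proc/<pid>/stat`. The helper `proc_state` is hypothetical, and treating a pid of 0 as "the calling process" is an illustrative convention of this sketch, not a kernel interface.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the one-letter state of process `pid`
 * (R, S, D, T, t, Z, ...) from /proc/<pid>/stat, or '?' on error.
 * A pid of 0 means the calling process. */
static char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];

    if (pid == 0)
        pid = getpid();
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 is "(comm)", and comm may itself contain spaces or ')'.
     * The state letter follows the LAST ')' and a single space. */
    char *end = strrchr(buf, ')');
    if (end == NULL || end[1] != ' ' || end[2] == '\0')
        return '?';
    return end[2];
}
```

Called with the PID shown by `ps` (19193 here), a stuck `wp_test` would report `R`, whereas a tracee halted at a watchpoint or breakpoint would typically report `t` (tracing stop).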

This happened on this machine model:

cpu : POWER9 (raw), altivec supported
revision : 1.0 (pvr 004e 0100)
machine : PowerNV 8375-42A
firmware : OPAL
SN : 13C665W

With this kernel:

pedromfc@perch:~$ uname -a
Linux perch 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:03:05 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

It also happened with this kernel, on another machine:
[pedromfc@zzfp342p1 ~]$ uname -a
Linux zzfp342p1 4.11.0-10.el7a.ppc64le #1 SMP Wed Jun 21 20:50:21 EDT 2017 ppc64le ppc64le ppc64le GNU/Linux

On a POWER8 machine, the watchpoint triggers as expected and the process stops immediately after the store instruction:

pedromfc@genoa:~$ uname -a
Linux genoa 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:57:29 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
pedromfc@genoa:~$ gdb wp_test
...

Reading symbols from wp_test...done.
(gdb) watch my_global
Hardware watchpoint 1: my_global
(gdb) run
Starting program: /home/pedromfc/wp_test
Hardware watchpoint 1: my_global

Old value = 0
New value = 1
main () at wp_test.c:5
5 }
(gdb)

Upstream fix: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=d89ba5353f301971dd7d2f9fdf25c4432728f38e

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-156292 severity-medium targetmilestone-inin1704
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Frank Heimes (fheimes)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Medium
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Joseph Salisbury (jsalisbury) wrote:

It looks like commit d89ba5353f301971dd7d2f9fdf25c4432728f38e was cc'd to stable, so it should make its way into the Ubuntu kernels through the normal stable update process. Is there a need to have this SRU'd before the update arrives from upstream?

tags: added: kernel-da-key
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → In Progress
bugproxy (bugproxy) wrote: Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-09-21 05:27 EDT-------
Pedro,

The kernel packages are hosted in:

http://pokgsa.ibm.com/gsa/pokgsa/home/d/i/diegodo/web/public/156292

Can you please check?

bugproxy (bugproxy) wrote:

------- Comment From <email address hidden> 2017-11-09 08:48 EDT-------
(In reply to comment #24)
> Any updates? Does 17.10 have the same issue? If not, since P9 support is
> targeted for Ubuntu 18.04 I would say we can close it without requiring an
> SRU for 17.04.

Sorry, I wasn't able to test the backport.

It seems that the patch that fixes this made its way into both 17.04 and 17.10 anyway, so the issue shouldn't be present there.

http://kernel.ubuntu.com/git/ubuntu/ubuntu-zesty.git/tree/arch/powerpc/kernel/exceptions-64s.S
http://kernel.ubuntu.com/git/ubuntu/ubuntu-artful.git/tree/arch/powerpc/kernel/exceptions-64s.S

Thank you for the support.

------- Comment From <email address hidden> 2017-11-09 10:29 EDT-------
Marking the bug as fix already available, based on the previous comment.

Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: In Progress → Fix Released
Changed in linux (Ubuntu):
status: New → Fix Released