Comment 48 for bug 1483343

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-02-06 03:07 EDT-------
(In reply to comment #71)
> Hi,
>
> Ran runltp on 17.04, and ended up with continuous call traces on console.
>
> ---uname output---
> Linux ltc-garri1 4.9.0-15-generic #16-Ubuntu SMP Fri Jan 20 15:28:49 UTC
> 2017 ppc64le ppc64le ppc64le GNU/Linux
>
> ---Call Trace---
> [332850.158228] Call Trace:
> [332850.158253] [c000001fcec4fb50] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
> [332850.159375] [c000001fcec4fbb0] [c000000000b193a0] __schedule+0x300/0x990
> [332850.159872] [c000001fcec4fc90] [c000000000b19e88] _cond_resched+0x78/0xa0
> [332850.160475] [c000001fcec4fcc0] [d00000001c810278]
> torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]

This one is weird, because torture_rwlock_write_lock_irq will acquire a lock with irq disabled, so it will not get any change to reschedule..

And the asm code for /lib/modules/4.9.0-15-generic/kernel/kernel/locking/locktorture.ko:

0000000000000248 <torture_rwlock_write_lock_irq>:
248: f8 ff 4c e8 ld r2,-8(r12)
248: R_PPC64_ENTRY *ABS*
24c: 14 62 42 7c add r2,r2,r12
250: a6 02 08 7c mflr r0
254: 01 00 00 48 bl 254 <torture_rwlock_write_lock_irq+0xc>
254: R_PPC64_REL24 _mcount
258: a6 02 08 7c mflr r0
25c: f8 ff e1 fb std r31,-8(r1)
260: 10 00 01 f8 std r0,16(r1)
264: d1 ff 21 f8 stdu r1,-48(r1)
268: 00 00 22 3d addis r9,r2,0
268: R_PPC64_TOC16_HA .toc
26c: 00 00 e9 eb ld r31,0(r9)
26c: R_PPC64_TOC16_LO_DS .toc
270: 30 00 7f 38 addi r3,r31,48
274: 01 00 00 48 bl 274 <torture_rwlock_write_lock_irq+0x2c>
274: R_PPC64_REL24 _raw_write_lock_irqsave
278: 00 00 00 60 nop
27c: 18 00 3f e9 ld r9,24(r31)
280: 40 00 69 f8 std r3,64(r9)
284: 00 00 60 38 li r3,0
288: 30 00 21 38 addi r1,r1,48

so torture_rwlock_write_lock_irq+0x30/0x60 is

278: 00 00 00 60 nop

right after the return of call to _raw_write_lock_irqsave, which should be fixed up by the instruction that recover the toc, IIUC. How that end up being a call to _cond_resched() is still unknown.

Besides, Pooja, I cannot reproduce this in a PowerKVM guest with 16 VCPUs. What's your setup for the machine that hit the problem? And how many times you need to run the testcase in order to hit one?

Regards,
Boqun

> [332850.161157] [c000001fcec4fcf0] [d00000001c810988]
> lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.170046] [c000001fcec4fd80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.173108] [c000001fcec4fe30] [c00000000000c0e8]
> ret_from_kernel_thread+0x5c/0x74
> [332850.174169] Task dump for CPU 10:
> [332850.174202] lock_torture_wr R running task 0 37019 2
> 0x00000804
> [332850.174734] Call Trace:
> [332850.175218] [c000001fcf3ab980] [c0000000000d0008] irq_exit+0xe8/0x120
> (unreliable)
> [332850.175813] [c000001fcf3ab9a0] [c000000000024e10]
> timer_interrupt+0xa0/0xe0
> [332850.176438] --- interrupt: d00000001c6008c0 at
> finish_task_switch+0xb0/0x340
> [332850.176438] LR = stutter_wait+0x8c/0x130 [torture]
> [332850.177950] [c000001fcf3ab9d0] [c000000000009514]
> decrementer_common+0x114/0x180 (unreliable)
> [332850.178517] --- interrupt: 901 at torture_shutdown_init+0x2a4/0x770
> [torture]
> [332850.178517] LR = stutter_wait+0x8c/0x130 [torture]
> [332850.179557] [c000001fcf3abcc0] [d00000001c810278]
> torture_rwlock_write_lock_irq+0x30/0x60 [locktorture] (unreliable)
> [332850.180675] [c000001fcf3abcf0] [d00000001c810988]
> lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.181646] [c000001fcf3abd80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.194048] [c000001fcf3abe30] [c00000000000c0e8]
> ret_from_kernel_thread+0x5c/0x74
> [332850.196658] Task dump for CPU 12:
> [332850.198840] lock_torture_wr R running task 0 36921 2
> 0x00000804
> [332850.201377] Call Trace:
> [332850.201402] [c000001fc3747890] [c00000000016127c]
> hrtimer_interrupt+0x11c/0x2a0 (unreliable)
> [332850.206296] [c000001fc3747a60] [c00000000001c3a0] __switch_to+0x2e0/0x4c0
> [332850.208647] [c000001fc3747ac0] [c000000000b193a0] __schedule+0x300/0x990
> [332850.211007] [c000001fc3747b00] [c00000000015bc78]
> lock_timer_base+0xa8/0x100
> [332850.214641] [c000001fc3747b60] [c00000000015bde8]
> try_to_del_timer_sync+0x68/0xa0
> [332850.215236] [c000001fc3747ba0] [c00000000015be90]
> del_timer_sync+0x70/0x90
> [332850.215817] [c000001fc3747bd0] [c000000000b1de14]
> schedule_timeout+0x234/0x470
> [332850.216369] [c000001fc3747cc0] [d00000001c810278]
> torture_rwlock_write_lock_irq+0x30/0x60 [locktorture]
> [332850.217397] [c000001fc3747cf0] [d00000001c810988]
> lock_torture_writer+0xc0/0x270 [locktorture]
> [332850.218433] [c000001fc3747d80] [c0000000000f53c0] kthread+0x110/0x130
> [332850.219013] [c000001fc3747e30] [c00000000000c0e8]
> ret_from_kernel_thread+0x5c/0x74
>
>
> Thanks,
> Pooja