Ubuntu
linux package

Bug #1574814
Comment #18

Comment 18 for bug 1574814

Revision history for this message

Ming Lei (tom-leiming) wrote on 2016-05-17: Re: [Bug 1574814] Re: ThunderX: soft lockup in cursor_timer_handler() Edit

#18

On Mon, May 16, 2016 at 5:25 PM, Ming Lei <email address hidden> wrote:
> On Fri, May 13, 2016 at 7:22 AM, dann frazier
> <email address hidden> wrote:
>> I used ftrace to do some duration measuring of the timer function
>> fb_flashcursor(). I noticed several places where this timer takes around
>> 98 ms to complete. This time seems to be due to multiple calls to
>> __memcpy_toio() in ast_dirty_update():
>>
>> for (i = y; i <= y2; i++) {
>> /* assume equal stride for now */
>> src_offset = dst_offset = i * afbdev->afb.base.pitches[0] + (x * bpp);
>> memcpy_toio(bo->kmap.virtual + src_offset, afbdev->sysram + src_offset, (x2 - x + 1) * bpp);
>>
>>
>> My theory is that this is causing mod_timer() to block on the other CPU, resulting in the soft lockup.
>>
>> Also - I built a custom d-i using pristine 4.6-rc7, and I am able to
>> easily reproduce this. I think the next step here is to report this to
>> upstream.
>
> Hi Dann,
>
> Andrew asked me to take a look at the issue, and from my tracing,
> most of times, the cpu 'hangs' in the following line of code:
>
> __mod_timer():
> ....
> out_unlock:
> spin_unlock_irqrestore(&base->lock, flags);
>
> If I added two trace points around the above line, most of times only
> the trace point before the line is dumped, and the one after the line
> can't dumped.

Looks the above issue is caused by passing 'jiffies' to mod_timer,
and 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
when the issue happened.

The following patch(workaround) can make the issue disappeared:

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 6e92917..5e880ee 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -1095,6 +1095,8 @@ static void fbcon_init(struct vc_data *vc, int init)
con_copy_unimap(vc, svc);

        ops = info->fbcon_par;
+ if (vc->vc_cur_blink_ms)
+ vc->vc_cur_blink_ms = 125;
        ops->cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms);
        p->con_rotate = initial_rotation;
        set_blitting_type(vc, info);

Thanks,

>
> Thanks,
>
>>
>> --
>> You received this bug notification because you are subscribed to linux
>> in Ubuntu.
>> https://bugs.launchpad.net/bugs/1574814
>>
>> Title:
>> ThunderX: soft lockup in cursor_timer_handler() Edit
>>
>> Status in linux package in Ubuntu:
>> Confirmed
>>
>> Bug description:
>> I booted a Cavium ThunderX crb1s 2.0 system using the netboot mini iso via virtual media:
>> http://ports.ubuntu.com/ubuntu-ports/dists/xenial/main/installer-arm64/20101020ubuntu451/images/netboot/mini.iso
>>
>> During boot I observed the following lockup on the serial console:
>>
>> [ 28.128327] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>> [ 84.912299] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [swapper/14:0]
>> [ 84.922718] Modules linked in: hid_generic(E) usbhid(E) hid(E) usb_storage(E) mdio_thunder(E) nicvf(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) drm(E) nicpf(E) thunder_bgx(E) mdio_cavium(E)
>> [ 84.922749]
>> [ 84.922754] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G E 4.4.0-21-generic #37-Ubuntu
>> [ 84.922757] Hardware name: Cavium ThunderX CN88XX board (DT)
>> [ 84.922761] task: ffff801f6c9d4100 ti: ffff801f6c9e8000 task.ti: ffff801f6c9e8000
>> [ 84.922771] PC is at cursor_timer_handler+0x30/0x58
>> [ 84.922775] LR is at cursor_timer_handler+0x30/0x58
>> [ 84.922778] pc : [<ffff8000004ec4f0>] lr : [<ffff8000004ec4f0>] pstate: 00400145
>> [ 84.922781] sp : ffff801f6c9ebc20
>> [ 84.922784] x29: ffff801f6c9ebc20 x28: ffff8000f94398d8
>> [ 84.922789] x27: ffff801f6c9ebd00 x26: ffff801f7b3bebb8
>> [ 84.922793] x25: ffff801f6c9e8000 x24: ffff800000e5ec00
>> [ 84.922798] x23: ffff801f667d9800 x22: ffff8000004ec4c0
>> [ 84.922802] x21: 0000000000000100 x20: ffff8000f94398d8
>> [ 84.922807] x19: ffff8000f9439800 x18: 0000ffffc76a5358
>> [ 84.922811] x17: 0000ffff97bbd2a8 x16: ffff8000002a5040
>> [ 84.922816] x15: 000000003e4cf1e0 x14: 0000000000000008
>> [ 84.922820] x13: 0000000000000000 x12: 003d090000000000
>> [ 84.922824] x11: 00000000003d0900 x10: ffff80000090f200
>> [ 84.922829] x9 : 00003d0900000000 x8 : 000000000000000e
>> [ 84.922833] x7 : ffff801f7b3c5008 x6 : 00000000ffffffff
>> [ 84.922837] x5 : 0000000000000000 x4 : 0000000000000001
>> [ 84.922842] x3 : 0000000000000000 x2 : ffff801f6c899e05
>> [ 84.922846] x1 : ffff801f667d99e0 x0 : 0000000000000000
>> [ 84.922850]
>> [ 101.008387] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>> [ 101.180375] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>> [ 101.342677] random: nonblocking pool is initialized
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574814/+subscriptions

On Mon, May 16, 2016 at 5:25 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Fri, May 13, 2016 at 7:22 AM, dann frazier
> <dann.frazier@canonical.com> wrote:
>> I used ftrace to do some duration measuring of the timer function
>> fb_flashcursor(). I noticed several places where this timer takes around
>> 98 ms to complete. This time seems to be due to multiple calls to
>> __memcpy_toio() in ast_dirty_update():
>>
>>         for (i = y; i <= y2; i++) {
>>                 /* assume equal stride for now */
>>                 src_offset = dst_offset = i * afbdev->afb.base.pitches[0] + (x * bpp);
>>                 memcpy_toio(bo->kmap.virtual + src_offset, afbdev->sysram + src_offset, (x2 - x + 1) * bpp);
>>
>>
>> My theory is that this is causing mod_timer() to block on the other CPU, resulting in the soft lockup.
>>
>> Also - I built a custom d-i using pristine 4.6-rc7, and I am able to
>> easily reproduce this. I think the next step here is to report this to
>> upstream.
>
> Hi Dann,
>
> Andrew asked me to take a look at the issue, and from my tracing,
> most of times, the cpu 'hangs' in the following line of code:
>
> __mod_timer():
> ....
> out_unlock:
>      spin_unlock_irqrestore(&base->lock, flags);
>
> If I added two trace points around the above line, most of times only
> the trace point before the line is dumped, and the one after the line
> can't dumped.

Looks the above issue is caused by passing 'jiffies' to mod_timer,
and 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
when the issue happened.

The following patch(workaround) can make the issue disappeared:

ops = info->fbcon_par;
+       if (vc->vc_cur_blink_ms)
+               vc->vc_cur_blink_ms = 125;
        ops->cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms);
        p->con_rotate = initial_rotation;
        set_blitting_type(vc, info);

Thanks,

>
> Thanks,
>
>>
>> --
>> You received this bug notification because you are subscribed to linux
>> in Ubuntu.
>> https://bugs.launchpad.net/bugs/1574814
>>
>> Title:
>>   ThunderX: soft lockup in cursor_timer_handler() Edit
>>
>> Status in linux package in Ubuntu:
>>   Confirmed
>>
>> Bug description:
>>   I booted a Cavium ThunderX crb1s 2.0 system using the netboot mini iso via virtual media:
>>     http://ports.ubuntu.com/ubuntu-ports/dists/xenial/main/installer-arm64/20101020ubuntu451/images/netboot/mini.iso
>>
>>   During boot I observed the following lockup on the serial console:
>>
>>   [ 28.128327] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>   [ 84.912299] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [swapper/14:0]
>>   [ 84.922718] Modules linked in: hid_generic(E) usbhid(E) hid(E) usb_storage(E) mdio_thunder(E) nicvf(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) drm(E) nicpf(E) thunder_bgx(E) mdio_cavium(E)
>>   [ 84.922749]
>>   [ 84.922754] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G E 4.4.0-21-generic #37-Ubuntu
>>   [ 84.922757] Hardware name: Cavium ThunderX CN88XX board (DT)
>>   [ 84.922761] task: ffff801f6c9d4100 ti: ffff801f6c9e8000 task.ti: ffff801f6c9e8000
>>   [ 84.922771] PC is at cursor_timer_handler+0x30/0x58
>>   [ 84.922775] LR is at cursor_timer_handler+0x30/0x58
>>   [ 84.922778] pc : [<ffff8000004ec4f0>] lr : [<ffff8000004ec4f0>] pstate: 00400145
>>   [ 84.922781] sp : ffff801f6c9ebc20
>>   [ 84.922784] x29: ffff801f6c9ebc20 x28: ffff8000f94398d8
>>   [ 84.922789] x27: ffff801f6c9ebd00 x26: ffff801f7b3bebb8
>>   [ 84.922793] x25: ffff801f6c9e8000 x24: ffff800000e5ec00
>>   [ 84.922798] x23: ffff801f667d9800 x22: ffff8000004ec4c0
>>   [ 84.922802] x21: 0000000000000100 x20: ffff8000f94398d8
>>   [ 84.922807] x19: ffff8000f9439800 x18: 0000ffffc76a5358
>>   [ 84.922811] x17: 0000ffff97bbd2a8 x16: ffff8000002a5040
>>   [ 84.922816] x15: 000000003e4cf1e0 x14: 0000000000000008
>>   [ 84.922820] x13: 0000000000000000 x12: 003d090000000000
>>   [ 84.922824] x11: 00000000003d0900 x10: ffff80000090f200
>>   [ 84.922829] x9 : 00003d0900000000 x8 : 000000000000000e
>>   [ 84.922833] x7 : ffff801f7b3c5008 x6 : 00000000ffffffff
>>   [ 84.922837] x5 : 0000000000000000 x4 : 0000000000000001
>>   [ 84.922842] x3 : 0000000000000000 x2 : ffff801f6c899e05
>>   [ 84.922846] x1 : ffff801f667d99e0 x0 : 0000000000000000
>>   [ 84.922850]
>>   [ 101.008387] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>   [ 101.180375] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>   [ 101.342677] random: nonblocking pool is initialized
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574814/+subscriptions

Ubuntulinux package

Comment 18 for bug 1574814

Ubuntu
linux package