Bug #1574814 “ThunderX: soft lockup in cursor_timer_handler()” : Bugs : linux package : Ubuntu

dann frazier (dannf) wrote on 2016-04-25:

#1

dmesg (68.0 KiB, text/plain)

dann frazier (dannf) wrote on 2016-04-25:

#2

screenshot of vga console (98.9 KiB, image/png)

dann frazier (dannf) wrote on 2016-04-25:

#3

The traceback isn't always the same - here it was stuck in mod_timer():

[ 180.912269] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [swapper/9:0]
[ 180.912293] Modules linked in: hid_generic(E) usbhid(E) hid(E) usb_storage(E) mdio_thunder(E) nicvf(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) drm(E) nicpf(E) thunder_bgx(E) mdio_cavium(E)
[ 180.912294]
[ 180.912298] CPU: 9 PID: 0 Comm: swapper/9 Tainted: G E 4.4.0-21-generic #37-Ubuntu
[ 180.912299] Hardware name: Cavium ThunderX CN88XX board (DT)
[ 180.912302] task: ffff801f6c9d0000 ti: ffff801f6c9cc000 task.ti: ffff801f6c9cc000
[ 180.912308] PC is at _raw_spin_unlock_irqrestore+0x2c/0x38
[ 180.912312] LR is at mod_timer+0x12c/0x270
[ 180.912314] pc : [<ffff8000008f5dc4>] lr : [<ffff800000131d2c>] pstate: 60400145
[ 180.912315] sp : ffff801f6c9cfba0
[ 180.912318] x29: ffff801f6c9cfba0 x28: ffff801f662982d8
[ 180.912320] x27: ffff801f6c9cfd00 x26: ffff801f7b369bb8
[ 180.912323] x25: ffff800000d69b80 x24: 0000000000000000
[ 180.912325] x23: ffff801f7b369b80 x22: ffff800000d8a000
[ 180.912328] x21: ffff801f7b369b80 x20: ffff801f7b369b80
[ 180.912330] x19: 0000000000000140 x18: 0000000000000069
[ 180.912332] x17: 0000000000468078 x16: ffff8000000c5b98
[ 180.912335] x15: 0000ffff7b779000 x14: 0000000000000008
[ 180.912337] x13: 0000000000000000 x12: 003d090000000000
[ 180.912340] x11: 00000000003d0900 x10: ffff80000090f200
[ 180.912342] x9 : 00003d0900000000 x8 : 000000000000000d
[ 180.912344] x7 : ffff801f7b3b4008 x6 : 0000000000000000
[ 180.912347] x5 : 0000000000000000 x4 : 0000000000000001
[ 180.912349] x3 : ffff801f6c9d0000 x2 : 00000000ffff8afb
[ 180.912351] x1 : 0000000000000140 x0 : 000000000000927c
[ 180.912352]

dann frazier (dannf) wrote on 2016-04-26:

#4

Note that, while I haven't been able to reproduce this by booting into Ubuntu from disk, I can easily reproduce it by booting d-i from disk - specifically, the linux and initrd.gz here:

http://ports.ubuntu.com/ubuntu-ports/dists/xenial/main/installer-arm64/20101020ubuntu451/images/netboot/ubuntu-installer/arm64/

dann frazier (dannf) wrote on 2016-04-28:

#5

upstream-kernel-softlockup.log (41.2 KiB, text/plain)

I have been able to reproduce a soft lockup using a pure upstream kernel (post-4.5 git, after the thunderx pcie drivers were merged) and the d-i initrd mentioned above. I've attached a log of one of those boots.

I used the Ubuntu config with the following exceptions:

CONFIG_DRM=y and CONFIG_DRM_AST=y - because, so far, I've only seen this with vga enabled.
CONFIG_KVM=n - because some commits fail to boot due to a panic in armpmu code, and KVM won't build w/ ARMPMU disabled.
(and just pressed enter during make oldconfig for everything else).

I could not reproduce with 4.6.0-rc1, so I thought this would be a good bisect candidate. Unfortunately, that went off into the weeds for a couple of reasons: vga doesn't work with some changesets, so I had to skip those, and I also suspect that this issue does not appear in every boot of a broken kernel, so a broken kernel can be wrongly marked as good.

Ming Lei (tom-leiming) wrote on 2016-04-29: Re: [Bug 1574814] Re: ThunderX: soft lockup in cursor_timer_handler() Edit

#6

It can be triggered 100% by running 'tcpdump -I ethX'.

dann frazier (dannf) wrote on 2016-05-03:

#7

On Fri, Apr 29, 2016 at 2:06 AM, Ming Lei <email address hidden> wrote:
> It can be triggered 100% by running 'tcpdump -I ethX'.

Thanks Ming. I let that run for a few hours, but was unable to
reproduce. Are you seeing the same traceback along w the softlockup
msg?

-dann

Ming Lei (tom-leiming) wrote on 2016-05-03:

#8

On Tue, May 3, 2016 at 10:35 AM, dann frazier
<email address hidden> wrote:
> On Fri, Apr 29, 2016 at 2:06 AM, Ming Lei <email address hidden> wrote:
>> It can be triggered 100% by running 'tcpdump -I ethX'.
>
> Thanks Ming. I let that run for a few hours, but was unable to
> reproduce. Are you seeing the same traceback along w the softlockup
> msg?

Yes, it can be always triggered by 'tcpdump -I eth7' on cvm13.

>
> -dann
>
> --
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https://bugs.launchpad.net/bugs/1574814
>
> Title:
> ThunderX: soft lockup in cursor_timer_handler() Edit
>
> Status in linux package in Ubuntu:
> Confirmed
>
> Bug description:
> I booted a Cavium ThunderX crb1s 2.0 system using the netboot mini iso via virtual media:
> http://ports.ubuntu.com/ubuntu-ports/dists/xenial/main/installer-arm64/20101020ubuntu451/images/netboot/mini.iso
>
> During boot I observed the following lockup on the serial console:
>
> [ 28.128327] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
> [ 84.912299] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [swapper/14:0]
> [ 84.922718] Modules linked in: hid_generic(E) usbhid(E) hid(E) usb_storage(E) mdio_thunder(E) nicvf(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) drm(E) nicpf(E) thunder_bgx(E) mdio_cavium(E)
> [ 84.922749]
> [ 84.922754] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G E 4.4.0-21-generic #37-Ubuntu
> [ 84.922757] Hardware name: Cavium ThunderX CN88XX board (DT)
> [ 84.922761] task: ffff801f6c9d4100 ti: ffff801f6c9e8000 task.ti: ffff801f6c9e8000
> [ 84.922771] PC is at cursor_timer_handler+0x30/0x58
> [ 84.922775] LR is at cursor_timer_handler+0x30/0x58
> [ 84.922778] pc : [<ffff8000004ec4f0>] lr : [<ffff8000004ec4f0>] pstate: 00400145
> [ 84.922781] sp : ffff801f6c9ebc20
> [ 84.922784] x29: ffff801f6c9ebc20 x28: ffff8000f94398d8
> [ 84.922789] x27: ffff801f6c9ebd00 x26: ffff801f7b3bebb8
> [ 84.922793] x25: ffff801f6c9e8000 x24: ffff800000e5ec00
> [ 84.922798] x23: ffff801f667d9800 x22: ffff8000004ec4c0
> [ 84.922802] x21: 0000000000000100 x20: ffff8000f94398d8
> [ 84.922807] x19: ffff8000f9439800 x18: 0000ffffc76a5358
> [ 84.922811] x17: 0000ffff97bbd2a8 x16: ffff8000002a5040
> [ 84.922816] x15: 000000003e4cf1e0 x14: 0000000000000008
> [ 84.922820] x13: 0000000000000000 x12: 003d090000000000
> [ 84.922824] x11: 00000000003d0900 x10: ffff80000090f200
> [ 84.922829] x9 : 00003d0900000000 x8 : 000000000000000e
> [ 84.922833] x7 : ffff801f7b3c5008 x6 : 00000000ffffffff
> [ 84.922837] x5 : 0000000000000000 x4 : 0000000000000001
> [ 84.922842] x3 : 0000000000000000 x2 : ffff801f6c899e05
> [ 84.922846] x1 : ffff801f667d99e0 x0 : 0000000000000000
> [ 84.922850]
> [ 101.008387] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
> [ 101.180375] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
> [ 101.342677] random: nonblocking pool is initialized
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574814/+s...


Radha Mohan Chintakuntla (rchintakuntla) wrote on 2016-05-03: Re: ThunderX: soft lockup in cursor_timer_handler() Edit

#9

Ming,
The "-I" option of tcpdump selects monitor mode, which typically applies only to wifi interfaces. So even if you run it on Thunder's NIC interfaces, it will return saying that this is not supported.

And BTW, what is eth7 ?
From 16.04 release all interfaces are coming up as "enP2xxxxx"

Ming Lei (tom-leiming) wrote on 2016-05-03: Re: [Bug 1574814] Re: ThunderX: soft lockup in cursor_timer_handler() Edit

#10

On Tue, May 3, 2016 at 1:14 PM, Radha Mohan Chintakuntla
<email address hidden> wrote:
> Ming,
> The "-I" option of tcpdump is monitoring mode typically applicable only to wifi interfaces. So even if you run it on Thunder's NIC interfaces it will return saying that this is not supported.
>

Even without the '-I', the issue still can be triggered.

> And BTW, what is eth7 ?
> From 16.04 release all interfaces are coming up as "enP2xxxxx"

Yeah, maybe this box isn't shipped with 16.04, and its kernel is

ubuntu@arm64:~$ uname -a
Linux arm64 4.2.0 #2 SMP Mon Dec 14 04:01:19 CST 2015 aarch64 aarch64
aarch64 GNU/Linux

but the log is very similar to Dann's report:

[337056.617650] PC is at _raw_spin_unlock_irqrestore+0x30/0x40
[337056.617657] LR is at mod_timer+0x110/0x238



Radha Mohan Chintakuntla (rchintakuntla) wrote on 2016-05-04: Re: ThunderX: soft lockup in cursor_timer_handler() Edit

#11

I tried a PXE-based install (just booting until the installer does network init) multiple times with Cavium UEFI and did not see this issue, but the first time I tried AMI UEFI, the "NMI softlockup ..." appeared after some time, while still on the opening screen of "debian-installer".

dann frazier (dannf) wrote on 2016-05-04:

#12

fyi, I've found that removing the ast kernel module from the d-i initrd seems to avoid the issue.

Radha Mohan Chintakuntla (rchintakuntla) wrote on 2016-05-04:

#13

I think AMI UEFI has the aspeed vga driver but Cavium UEFI doesn't, so this might be causing issues?

Joseph Salisbury (jsalisbury) on 2016-05-05

Changed in linux (Ubuntu):
importance:	Undecided → Medium
tags:	added: kernel-da-key

dann frazier (dannf) wrote on 2016-05-10:

#14

perf timechart of first 30s (15.6 MiB, image/svg+xml)

I created a perf timechart (attached) to see if that would illuminate the problem. This is taken from the installer environment for the first 30s.

In this boot, the soft lockup occurred on CPU 11:
[ 32.815351] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [swapper/11:0]

This blocked period appears to be visible from the chart, but it isn't clear to me what it was executing at the time.

dann frazier (dannf) wrote on 2016-05-11:

#15

I captured a perf data file, starting just before the NMI kicked in - in this case, it again occurred on CPU 11. I found that CPU 11 had spent a lot of its time in cursor_timer_handler():

-   16.92%  swapper  [kernel.kallsyms]      [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      + 16.87% mod_timer
      + 0.05% cursor_timer_handler
-   12.15%  swapper  [kernel.kallsyms]      [k] queue_work_on
   - queue_work_on
      + 12.00% cursor_timer_handler
      + 0.15% call_timer_fn
+   10.98%  swapper  [kernel.kallsyms]      [k] run_timer_softirq
-    2.23%  swapper  [kernel.kallsyms]      [k] mod_timer
   - mod_timer
      + 1.97% cursor_timer_handler
      + 0.26% call_timer_fn

Looking at the profile of the other CPUs, I found one that was interesting - CPU #12 - which appears to be the one actually running the cursor update code CPU 11 is scheduling:

-   42.18%  kworker/u96:2  [kernel.kallsyms]  [k] ww_mutex_unlock
   - ww_mutex_unlock
      - 40.70% ast_dirty_update
           ast_imageblit
           soft_cursor
           bit_cursor
           fb_flashcursor
           process_one_work
           worker_thread
           kthread
           ret_from_fork
      + 1.48% ast_imageblit
-   40.15%  kworker/u96:2  [kernel.kallsyms]  [k] __memcpy_toio
   - __memcpy_toio
      + 31.54% ast_dirty_update
      + 8.61% ast_imageblit

I wonder if this path is blocking, preventing the timer handler on CPU #11 from rescheduling. Indeed, when I time how long the handler takes from start to finish, I occasionally see times of around 0.1s.


dann frazier (dannf) wrote on 2016-05-12:

#16

I used ftrace to do some duration measuring of the timer function fb_flashcursor(). I noticed several places where this timer takes around 98 ms to complete. This time seems to be due to multiple calls to __memcpy_toio() in ast_dirty_update():

for (i = y; i <= y2; i++) {
        /* assume equal stride for now */
        src_offset = dst_offset = i * afbdev->afb.base.pitches[0] + (x * bpp);
        memcpy_toio(bo->kmap.virtual + src_offset, afbdev->sysram + src_offset, (x2 - x + 1) * bpp);
}

My theory is that this is causing mod_timer() to block on the other CPU, resulting in the soft lockup.

Also - I built a custom d-i using pristine 4.6-rc7, and I am able to easily reproduce this. I think the next step here is to report this to upstream.

Ming Lei (tom-leiming) wrote on 2016-05-16: Re: [Bug 1574814] Re: ThunderX: soft lockup in cursor_timer_handler() Edit

#17

On Fri, May 13, 2016 at 7:22 AM, dann frazier
<email address hidden> wrote:
> I used ftrace to do some duration measuring of the timer function
> fb_flashcursor(). I noticed several places where this timer takes around
> 98 ms to complete. This time seems to be due to multiple calls to
> __memcpy_toio() in ast_dirty_update():
>
> for (i = y; i <= y2; i++) {
> /* assume equal stride for now */
> src_offset = dst_offset = i * afbdev->afb.base.pitches[0] + (x * bpp);
> memcpy_toio(bo->kmap.virtual + src_offset, afbdev->sysram + src_offset, (x2 - x + 1) * bpp);
>
>
> My theory is that this is causing mod_timer() to block on the other CPU, resulting in the soft lockup.
>
> Also - I built a custom d-i using pristine 4.6-rc7, and I am able to
> easily reproduce this. I think the next step here is to report this to
> upstream.

Hi Dann,

Andrew asked me to take a look at the issue, and from my tracing,
most of the time, the cpu 'hangs' in the following line of code:

__mod_timer():
....
out_unlock:
     spin_unlock_irqrestore(&base->lock, flags);

If I add two trace points around the above line, most of the time only
the trace point before the line is dumped, and the one after the line
is not dumped.

Thanks,



dann frazier (dannf) on 2016-05-16

summary:

- ThunderX: soft lockup in cursor_timer_handler() Edit
+ ThunderX: soft lockup in cursor_timer_handler()

Ming Lei (tom-leiming) wrote on 2016-05-17:

#18

On Mon, May 16, 2016 at 5:25 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Fri, May 13, 2016 at 7:22 AM, dann frazier
> <dann.frazier@canonical.com> wrote:
>> I used ftrace to do some duration measuring of the timer function
>> fb_flashcursor(). I noticed several places where this timer takes around
>> 98 ms to complete. This time seems to be due to multiple calls to
>> __memcpy_toio() in ast_dirty_update():
>>
>>         for (i = y; i <= y2; i++) {
>>                 /* assume equal stride for now */
>>                 src_offset = dst_offset = i * afbdev->afb.base.pitches[0] + (x * bpp);
>>                 memcpy_toio(bo->kmap.virtual + src_offset, afbdev->sysram + src_offset, (x2 - x + 1) * bpp);
>>
>>
>> My theory is that this is causing mod_timer() to block on the other CPU, resulting in the soft lockup.
>>
>> Also - I built a custom d-i using pristine 4.6-rc7, and I am able to
>> easily reproduce this. I think the next step here is to report this to
>> upstream.
>
> Hi Dann,
>
> Andrew asked me to take a look at the issue, and from my tracing,
> most of the time the CPU 'hangs' at the following line of code:
>
> __mod_timer():
> ....
> out_unlock:
>      spin_unlock_irqrestore(&base->lock, flags);
>
> If I add two trace points around the above line, most of the time only
> the trace point before the line is dumped, and the one after the line
> is never dumped.

It looks like the above issue is caused by passing plain 'jiffies' to
mod_timer(): 'ops->cur_blink_jiffies' is observed to be zero in
cursor_timer_handler() when the issue happens.
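
To illustrate why a zero interval matters, here is a minimal user-space sketch. It is a toy model, not kernel code, and it assumes only that the handler re-arms its own timer at "now + interval", as the analysis above describes. The names toy_timer, toy_handler and toy_run_tick are hypothetical.

```c
#include <assert.h>

/* Toy model, NOT kernel code: a single self-rearming timer. */
struct toy_timer {
    unsigned long expires;   /* tick at which the timer is due */
    unsigned long interval;  /* stands in for ops->cur_blink_jiffies */
    int pending;             /* nonzero while the timer is armed */
};

/* Re-arm relative to "now", mimicking mod_timer(t, jiffies + interval). */
static void toy_handler(struct toy_timer *t, unsigned long now)
{
    t->expires = now + t->interval;
    t->pending = 1;
}

/*
 * Run the timer for as long as it is due at tick `now`.
 * Returns how many times the handler ran; gives up at `cap`
 * so that the demonstration itself always terminates.
 */
static unsigned long toy_run_tick(struct toy_timer *t, unsigned long now,
                                  unsigned long cap)
{
    unsigned long runs = 0;

    assert(cap > 0);
    while (t->pending && t->expires <= now && runs < cap) {
        t->pending = 0;
        toy_handler(t, now);
        runs++;
    }
    return runs;
}
```

With a timer due at tick 100 and interval 31, toy_run_tick(&t, 100, 1000) returns 1, because the re-armed timer is due in the future. With interval 0 it returns the cap of 1000: the timer is due again the moment the handler re-arms it, so the loop never drains on its own.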

The following patch (a workaround) makes the issue disappear:

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index 6e92917..5e880ee 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -1095,6 +1095,8 @@ static void fbcon_init(struct vc_data *vc, int init)
                con_copy_unimap(vc, svc);

        ops = info->fbcon_par;
+       if (vc->vc_cur_blink_ms)
+               vc->vc_cur_blink_ms = 125;
        ops->cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms);
        p->con_rotate = initial_rotation;
        set_blitting_type(vc, info);

Thanks,

>
> Thanks,
>
>>
>> --
>> You received this bug notification because you are subscribed to linux
>> in Ubuntu.
>> https://bugs.launchpad.net/bugs/1574814
>>
>> Title:
>>   ThunderX: soft lockup in cursor_timer_handler() Edit
>>
>> Status in linux package in Ubuntu:
>>   Confirmed
>>
>> Bug description:
>>   I booted a Cavium ThunderX crb1s 2.0 system using the netboot mini iso via virtual media:
>>     http://ports.ubuntu.com/ubuntu-ports/dists/xenial/main/installer-arm64/20101020ubuntu451/images/netboot/mini.iso
>>
>>   During boot I observed the following lockup on the serial console:
>>
>>   [ 28.128327] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>   [ 84.912299] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [swapper/14:0]
>>   [ 84.922718] Modules linked in: hid_generic(E) usbhid(E) hid(E) usb_storage(E) mdio_thunder(E) nicvf(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) drm(E) nicpf(E) thunder_bgx(E) mdio_cavium(E)
>>   [ 84.922749]
>>   [ 84.922754] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G E 4.4.0-21-generic #37-Ubuntu
>>   [ 84.922757] Hardware name: Cavium ThunderX CN88XX board (DT)
>>   [ 84.922761] task: ffff801f6c9d4100 ti: ffff801f6c9e8000 task.ti: ffff801f6c9e8000
>>   [ 84.922771] PC is at cursor_timer_handler+0x30/0x58
>>   [ 84.922775] LR is at cursor_timer_handler+0x30/0x58
>>   [ 84.922778] pc : [<ffff8000004ec4f0>] lr : [<ffff8000004ec4f0>] pstate: 00400145
>>   [ 84.922781] sp : ffff801f6c9ebc20
>>   [ 84.922784] x29: ffff801f6c9ebc20 x28: ffff8000f94398d8
>>   [ 84.922789] x27: ffff801f6c9ebd00 x26: ffff801f7b3bebb8
>>   [ 84.922793] x25: ffff801f6c9e8000 x24: ffff800000e5ec00
>>   [ 84.922798] x23: ffff801f667d9800 x22: ffff8000004ec4c0
>>   [ 84.922802] x21: 0000000000000100 x20: ffff8000f94398d8
>>   [ 84.922807] x19: ffff8000f9439800 x18: 0000ffffc76a5358
>>   [ 84.922811] x17: 0000ffff97bbd2a8 x16: ffff8000002a5040
>>   [ 84.922816] x15: 000000003e4cf1e0 x14: 0000000000000008
>>   [ 84.922820] x13: 0000000000000000 x12: 003d090000000000
>>   [ 84.922824] x11: 00000000003d0900 x10: ffff80000090f200
>>   [ 84.922829] x9 : 00003d0900000000 x8 : 000000000000000e
>>   [ 84.922833] x7 : ffff801f7b3c5008 x6 : 00000000ffffffff
>>   [ 84.922837] x5 : 0000000000000000 x4 : 0000000000000001
>>   [ 84.922842] x3 : 0000000000000000 x2 : ffff801f6c899e05
>>   [ 84.922846] x1 : ffff801f667d99e0 x0 : 0000000000000000
>>   [ 84.922850]
>>   [ 101.008387] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>   [ 101.180375] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>   [ 101.342677] random: nonblocking pool is initialized
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574814/+subscriptions

Ming Lei (tom-leiming) wrote on 2016-05-17:

#19

On Tue, May 17, 2016 at 12:12 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Mon, May 16, 2016 at 5:25 PM, Ming Lei <ming.lei@canonical.com> wrote:
>> On Fri, May 13, 2016 at 7:22 AM, dann frazier
>> <dann.frazier@canonical.com> wrote:
>>> I used ftrace to do some duration measuring of the timer function
>>> fb_flashcursor(). I noticed several places where this timer takes around
>>> 98 ms to complete. This time seems to be due to multiple calls to
>>> __memcpy_toio() in ast_dirty_update():
>>>
>>>         for (i = y; i <= y2; i++) {
>>>                 /* assume equal stride for now */
>>>                 src_offset = dst_offset = i * afbdev->afb.base.pitches[0] + (x * bpp);
>>>                 memcpy_toio(bo->kmap.virtual + src_offset, afbdev->sysram + src_offset, (x2 - x + 1) * bpp);
>>>
>>>
>>> My theory is that this is causing mod_timer() to block on the other CPU, resulting in the soft lockup.
>>>
>>> Also - I built a custom d-i using pristine 4.6-rc7, and I am able to
>>> easily reproduce this. I think the next step here is to report this to
>>> upstream.
>>
>> Hi Dann,
>>
>> Andrew asked me to take a look at the issue, and from my tracing,
>> most of the time the CPU 'hangs' at the following line of code:
>>
>> __mod_timer():
>> ....
>> out_unlock:
>>      spin_unlock_irqrestore(&base->lock, flags);
>>
>> If I add two trace points around the above line, most of the time only
>> the trace point before the line is dumped, and the one after the line
>> is never dumped.
>
> It looks like the above issue is caused by passing plain 'jiffies' to
> mod_timer(): 'ops->cur_blink_jiffies' is observed to be zero in
> cursor_timer_handler() when the issue happens.
>
> The following patch (a workaround) makes the issue disappear:
>
> diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
> index 6e92917..5e880ee 100644
> --- a/drivers/video/console/fbcon.c
> +++ b/drivers/video/console/fbcon.c
> @@ -1095,6 +1095,8 @@ static void fbcon_init(struct vc_data *vc, int init)
>                 con_copy_unimap(vc, svc);
>
>         ops = info->fbcon_par;
> +       if (vc->vc_cur_blink_ms)

oops, it should be 'if (!vc->vc_cur_blink_ms)'

> +               vc->vc_cur_blink_ms = 125;
>         ops->cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms);
>         p->con_rotate = initial_rotation;
>         set_blitting_type(vc, info);
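
The corrected check can be sketched outside the kernel as follows. This is only an illustration of the intended logic, not the patch itself: toy_blink_ms_or_default and toy_ms_to_ticks are hypothetical helpers, the 125 ms default is taken from the workaround above, and the conversion is a simplified, rounding-up stand-in for msecs_to_jiffies() with the tick rate passed in explicitly.

```c
#include <assert.h>

/* Default taken from the workaround patch quoted above. */
#define TOY_DEFAULT_BLINK_MS 125U

/* Keep a nonzero blink interval; replace zero (unset) with the default. */
static unsigned int toy_blink_ms_or_default(unsigned int blink_ms)
{
    if (!blink_ms)  /* corrected test: override only when the value is zero */
        blink_ms = TOY_DEFAULT_BLINK_MS;
    return blink_ms;
}

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
static unsigned long toy_ms_to_ticks(unsigned int ms, unsigned int hz)
{
    assert(hz > 0);
    return ((unsigned long)ms * hz + 999UL) / 1000UL;
}
```

With the uncorrected test, a zero value stays zero and converts to zero ticks, which is the case that triggers the lockup. With the corrected test, zero becomes 125 ms, which is 32 ticks at an assumed tick rate of 250 Hz.
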
>
> Thanks,
>
>>
>> Thanks,
>>
>>>
>>> --
>>> You received this bug notification because you are subscribed to linux
>>> in Ubuntu.
>>> https://bugs.launchpad.net/bugs/1574814
>>>
>>> Title:
>>>   ThunderX: soft lockup in cursor_timer_handler() Edit
>>>
>>> Status in linux package in Ubuntu:
>>>   Confirmed
>>>
>>> Bug description:
>>>   I booted a Cavium ThunderX crb1s 2.0 system using the netboot mini iso via virtual media:
>>>     http://ports.ubuntu.com/ubuntu-ports/dists/xenial/main/installer-arm64/20101020ubuntu451/images/netboot/mini.iso
>>>
>>>   During boot I observed the following lockup on the serial console:
>>>
>>>   [ 28.128327] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>>   [ 84.912299] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [swapper/14:0]
>>>   [ 84.922718] Modules linked in: hid_generic(E) usbhid(E) hid(E) usb_storage(E) mdio_thunder(E) nicvf(E) ast(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ttm(E) drm(E) nicpf(E) thunder_bgx(E) mdio_cavium(E)
>>>   [ 84.922749]
>>>   [ 84.922754] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G E 4.4.0-21-generic #37-Ubuntu
>>>   [ 84.922757] Hardware name: Cavium ThunderX CN88XX board (DT)
>>>   [ 84.922761] task: ffff801f6c9d4100 ti: ffff801f6c9e8000 task.ti: ffff801f6c9e8000
>>>   [ 84.922771] PC is at cursor_timer_handler+0x30/0x58
>>>   [ 84.922775] LR is at cursor_timer_handler+0x30/0x58
>>>   [ 84.922778] pc : [<ffff8000004ec4f0>] lr : [<ffff8000004ec4f0>] pstate: 00400145
>>>   [ 84.922781] sp : ffff801f6c9ebc20
>>>   [ 84.922784] x29: ffff801f6c9ebc20 x28: ffff8000f94398d8
>>>   [ 84.922789] x27: ffff801f6c9ebd00 x26: ffff801f7b3bebb8
>>>   [ 84.922793] x25: ffff801f6c9e8000 x24: ffff800000e5ec00
>>>   [ 84.922798] x23: ffff801f667d9800 x22: ffff8000004ec4c0
>>>   [ 84.922802] x21: 0000000000000100 x20: ffff8000f94398d8
>>>   [ 84.922807] x19: ffff8000f9439800 x18: 0000ffffc76a5358
>>>   [ 84.922811] x17: 0000ffff97bbd2a8 x16: ffff8000002a5040
>>>   [ 84.922816] x15: 000000003e4cf1e0 x14: 0000000000000008
>>>   [ 84.922820] x13: 0000000000000000 x12: 003d090000000000
>>>   [ 84.922824] x11: 00000000003d0900 x10: ffff80000090f200
>>>   [ 84.922829] x9 : 00003d0900000000 x8 : 000000000000000e
>>>   [ 84.922833] x7 : ffff801f7b3c5008 x6 : 00000000ffffffff
>>>   [ 84.922837] x5 : 0000000000000000 x4 : 0000000000000001
>>>   [ 84.922842] x3 : 0000000000000000 x2 : ffff801f6c899e05
>>>   [ 84.922846] x1 : ffff801f667d99e0 x0 : 0000000000000000
>>>   [ 84.922850]
>>>   [ 101.008387] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>>   [ 101.180375] usb 1-1.1: reset high-speed USB device number 3 using xhci_hcd
>>>   [ 101.342677] random: nonblocking pool is initialized
>>>
>>> To manage notifications about this bug go to:
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574814/+subscriptions

dann frazier (dannf) wrote on 2016-05-17:

#20

Upstream thread: https://lists.freedesktop.org/archives/dri-devel/2016-May/107693.html

Kamal Mostafa (kamalmostafa) on 2016-05-24

Changed in linux (Ubuntu Xenial):
status:	New → Fix Committed

Kamal Mostafa (kamalmostafa) wrote on 2016-06-14:

#21

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags:

added: verification-needed-xenial

dann frazier (dannf) wrote on 2016-06-14:

#22

I rebuilt d-i against the kernel in proposed and booted it a couple of times. I have not reproduced this issue, so marking it verified.

~ # uname -a
Linux cvm3 4.4.0-25-generic #44-Ubuntu SMP Fri Jun 10 18:15:04 UTC 2016 aarch64 GNU/Linux
~ # dmesg | grep -i soft
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-di root=UUID=bc611283-1f38-4993-a2b9-883922c7ed1f ro earlycon=pl011,0x87e024000000 hardlockup_all_cpu_backtrace=1 softlockup_all_cpu_backtrace=1 earlycon=pl011,0x87e024000000 apt-setup/proposed=true
[ 0.000000] software IO TLB [mem 0xfbfed000-0xfffed000] (64MB) mapped at [ffff8000fb1ed000-ffff8000ff1ecfff]
[ 0.229859] CPU features: detected feature: Software prefetching using PRFM
[ 174.913338] xor: measuring software checksum speed

tags:

added: verification-done-xenial
removed: verification-needed-xenial

Launchpad Janitor (janitor) wrote on 2016-06-15:

#23

This bug was fixed in the package linux - 4.4.0-25.44

---------------
linux (4.4.0-25.44) xenial; urgency=low

[ Kamal Mostafa ]

* Release Tracking Bug
    - LP: #1591289

* Xenial update to v4.4.13 stable release (LP: #1590455)
    - MIPS64: R6: R2 emulation bugfix
    - MIPS: math-emu: Fix jalr emulation when rd == $0
    - MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC
    - MIPS: Don't unwind to user mode with EVA
    - MIPS: Avoid using unwind_stack() with usermode
    - MIPS: Fix siginfo.h to use strict posix types
    - MIPS: Fix uapi include in exported asm/siginfo.h
    - MIPS: Fix watchpoint restoration
    - MIPS: Flush highmem pages in __flush_dcache_page
    - MIPS: Handle highmem pages in __update_cache
    - MIPS: Sync icache & dcache in set_pte_at
    - MIPS: ath79: make bootconsole wait for both THRE and TEMT
    - MIPS: Reserve nosave data for hibernation
    - MIPS: Loongson-3: Reserve 32MB for RS780E integrated GPU
    - MIPS: Use copy_s.fmt rather than copy_u.fmt
    - MIPS: Fix MSA ld_*/st_* asm macros to use PTR_ADDU
    - MIPS: Prevent "restoration" of MSA context in non-MSA kernels
    - MIPS: Disable preemption during prctl(PR_SET_FP_MODE, ...)
    - MIPS: ptrace: Fix FP context restoration FCSR regression
    - MIPS: ptrace: Prevent writes to read-only FCSR bits
    - MIPS: Fix sigreturn via VDSO on microMIPS kernel
    - MIPS: Build microMIPS VDSO for microMIPS kernels
    - MIPS: lib: Mark intrinsics notrace
    - MIPS: VDSO: Build with `-fno-strict-aliasing'
    - affs: fix remount failure when there are no options changed
    - ASoC: ak4642: Enable cache usage to fix crashes on resume
    - Input: uinput - handle compat ioctl for UI_SET_PHYS
    - ARM: mvebu: fix GPIO config on the Linksys boards
    - ARM: dts: at91: fix typo in sama5d2 PIN_PD24 description
    - ARM: dts: exynos: Add interrupt line to MAX8997 PMIC on exynos4210-trats
    - ARM: dts: imx35: restore existing used clock enumeration
    - ath9k: Add a module parameter to invert LED polarity.
    - ath9k: Fix LED polarity for some Mini PCI AR9220 MB92 cards.
    - ath10k: fix debugfs pktlog_filter write
    - ath10k: fix firmware assert in monitor mode
    - ath10k: fix rx_channel during hw reconfigure
    - ath10k: fix kernel panic, move arvifs list head init before htt init
    - ath5k: Change led pin configuration for compaq c700 laptop
    - hwrng: exynos - Fix unbalanced PM runtime put on timeout error path
    - rtlwifi: rtl8723be: Add antenna select module parameter
    - rtlwifi: btcoexist: Implement antenna selection
    - rtlwifi: Fix logic error in enter/exit power-save mode
    - rtlwifi: pci: use dev_kfree_skb_irq instead of kfree_skb in
      rtl_pci_reset_trx_ring
    - aacraid: Relinquish CPU during timeout wait
    - aacraid: Fix for aac_command_thread hang
    - aacraid: Fix for KDUMP driver hang
    - hwmon: (ads7828) Enable internal reference
    - mfd: intel-lpss: Save register context on suspend
    - mfd: intel_soc_pmic_core: Terminate panel control GPIO lookup table
      correctly
    - PM / Runtime: Fix error path in pm_runtime_force_resume()
    - cpuidle: Indicate when a device has been unregistered
    - cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()
    - clk: bcm2835: Fix PLL poweron
    - clk: at91: fix check of clk_register() returned value
    - clk: bcm2835: pll_off should only update CM_PLL_ANARST
    - clk: bcm2835: divider value has to be 1 or more
    - pinctrl: exynos5440: Use off-stack memory for pinctrl_gpio_range
    - PCI: Disable all BAR sizing for devices with non-compliant BARs
    - media: v4l2-compat-ioctl32: fix missing reserved field copy in
      put_v4l2_create32
    - mm: use phys_addr_t for reserve_bootmem_region() arguments
    - wait/ptrace: assume __WALL if the child is traced
    - QE-UART: add "fsl,t1040-ucc-uart" to of_device_id
    - powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel
    - powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()
    - powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()
    - xen/events: Don't move disabled irqs
    - xen: use same main loop for counting and remapping pages
    - sunrpc: fix stripping of padded MIC tokens
    - drm/gma500: Fix possible out of bounds read
    - drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
    - drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
    - drm/vmwgfx: Fix order of operation
    - drm/amdgpu: use drm_mode_vrefresh() rather than mode->vrefresh
    - drm/amdgpu: Fix hdmi deep color support.
    - drm/i915/fbdev: Fix num_connector references in intel_fb_initial_config()
    - drm/fb_helper: Fix references to dev->mode_config.num_connector
    - drm/atomic: Verify connector->funcs != NULL when clearing states
    - drm/i915: Don't leave old junk in ilk active watermarks on readout
    - drm/imx: Match imx-ipuv3-crtc components using device node in platform data
    - ext4: fix hang when processing corrupted orphaned inode list
    - ext4: clean up error handling when orphan list is corrupted
    - ext4: fix oops on corrupted filesystem
    - ext4: address UBSAN warning in mb_find_order_for_block()
    - ext4: silence UBSAN in ext4_mb_init()
    - PM / sleep: Handle failures in device_suspend_late() consistently
    - dma-debug: avoid spinlock recursion when disabling dma-debug
    - scripts/package/Makefile: rpmbuild add support of RPMOPTS
    - gcov: disable tree-loop-im to reduce stack usage
    - xfs: disallow rw remount on fs with unknown ro-compat features
    - xfs: Don't wrap growfs AGFL indexes
    - xfs: xfs_iflush_cluster fails to abort on error
    - xfs: fix inode validity check in xfs_iflush_cluster
    - xfs: skip stale inodes in xfs_iflush_cluster
    - xfs: print name of verifier if it fails
    - xfs: handle dquot buffer readahead in log recovery correctly
    - Linux 4.4.13

* 168c:001c [HP Compaq Presario C700 Notebook PC] Wireless led button doesn't
    switch colors (LP: #972604)
    - ath5k: Change led pin configuration for compaq c700 laptop

* Extended statistics from balloon for proper memory management (LP: #1587091)
    - mm/page_alloc.c: calculate 'available' memory in a separate function
    - virtio_balloon: export 'available' memory to balloon statistics

* CAPI: CGZIP AFU contexts do not receive interrupts after heavy afu
    open/close (LP: #1588468)
    - misc: cxl: use kobj_to_dev()
    - cxl: Move common code away from bare-metal-specific files
    - cxl: Move bare-metal specific code to specialized files
    - cxl: Define process problem state area at attach time only
    - cxl: Introduce implementation-specific API
    - cxl: Rename some bare-metal specific functions
    - cxl: Isolate a few bare-metal-specific calls
    - cxl: Update cxl_irq() prototype
    - cxl: IRQ allocation for guests
    - powerpc: New possible return value from hcall
    - cxl: New hcalls to support cxl adapters
    - cxl: Separate bare-metal fields in adapter and AFU data structures
    - cxlflash: Simplify PCI registration
    - cxlflash: Unmap problem state area before detaching master context
    - cxlflash: Split out context initialization
    - cxlflash: Simplify attach path error cleanup
    - cxlflash: Reorder user context initialization
    - cxl: Add guest-specific code
    - cxl: sysfs support for guests
    - cxl: Support to flash a new image on the adapter from a guest
    - cxl: Parse device tree and create cxl device(s) at boot
    - cxl: Support the cxl kernel API from a guest
    - cxl: Adapter failure handling
    - cxl: Add tracepoints around the cxl hcall
    - cxlflash: Use new cxl_pci_read_adapter_vpd() API
    - cxl: Remove cxl_get_phys_dev() kernel API
    - cxl: Ignore probes for virtual afu pci devices
    - cxl: Poll for outstanding IRQs when detaching a context

* NVMe max_segments queue parameter gets set to 1 (LP: #1588449)
    - nvme: set queue limits for the admin queue
    - nvme: fix max_segments integer truncation
    - block: fix blk_rq_get_max_sectors for driver private requests

* workaround cavium thunderx silicon erratum 23144 (LP: #1589704)
    - irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144

* Xenial update to v4.4.12 stable release (LP: #1588945)
    - Btrfs: don't use src fd for printk
    - perf/x86/intel/pt: Generate PMI in the STOP region as well
    - perf/core: Fix perf_event_open() vs. execve() race
    - perf test: Fix build of BPF and LLVM on older glibc libraries
    - ext4: iterate over buffer heads correctly in move_extent_per_page()
    - arm64: Fix typo in the pmdp_huge_get_and_clear() definition
    - arm64: Ensure pmd_present() returns false after pmd_mknotpresent()
    - arm64: Implement ptep_set_access_flags() for hardware AF/DBM
    - arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
    - arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str
    - arm/arm64: KVM: Enforce Break-Before-Make on Stage-2 page tables
    - kvm: arm64: Fix EC field in inject_abt64
    - remove directory incorrectly tries to set delete on close on non-empty
      directories
    - fs/cifs: correctly to anonymous authentication via NTLMSSP
    - fs/cifs: correctly to anonymous authentication for the LANMAN authentication
    - fs/cifs: correctly to anonymous authentication for the NTLM(v1)
      authentication
    - fs/cifs: correctly to anonymous authentication for the NTLM(v2)
      authentication
    - asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
    - ring-buffer: Use long for nr_pages to avoid overflow failures
    - ring-buffer: Prevent overflow of size in ring_buffer_resize()
    - crypto: caam - fix caam_jr_alloc() ret code
    - crypto: talitos - fix ahash algorithms registration
    - crypto: sun4i-ss - Replace spinlock_bh by spin_lock_irq{save|restore}
    - clk: qcom: msm8916: Fix crypto clock flags
    - sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded
      systems
    - mfd: omap-usb-tll: Fix scheduling while atomic BUG
    - Input: pwm-beeper - fix - scheduling while atomic
    - irqchip/gic: Ensure ordering between read of INTACK and shared data
    - irqchip/gic-v3: Configure all interrupts as non-secure Group-1
    - can: fix handling of unmodifiable configuration options
    - mmc: mmc: Fix partition switch timeout for some eMMCs
    - mmc: sdhci-acpi: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
    - ACPI / osi: Fix an issue that acpi_osi=!* cannot disable ACPICA internal
      strings
    - dell-rbtn: Ignore ACPI notifications if device is suspended
    - mmc: longer timeout for long read time quirk
    - mmc: sdhci-pci: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
    - Bluetooth: vhci: fix open_timeout vs. hdev race
    - Bluetooth: vhci: purge unhandled skbs
    - Bluetooth: vhci: Fix race at creating hci device
    - mei: fix NULL dereferencing during FW initiated disconnection
    - mei: amthif: discard not read messages
    - mei: bus: call mei_cl_read_start under device lock
    - USB: serial: mxuport: fix use-after-free in probe error path
    - USB: serial: keyspan: fix use-after-free in probe error path
    - USB: serial: quatech2: fix use-after-free in probe error path
    - USB: serial: io_edgeport: fix memory leaks in attach error path
    - USB: serial: io_edgeport: fix memory leaks in probe error path
    - USB: serial: option: add support for Cinterion PH8 and AHxx
    - USB: serial: option: add more ZTE device ids
    - USB: serial: option: add even more ZTE device ids
    - usb: gadget: f_fs: Fix EFAULT generation for async read operations
    - usb: f_mass_storage: test whether thread is running before starting another
    - usb: misc: usbtest: fix pattern tests for scatterlists.
    - usb: gadget: udc: core: Fix argument of dev_err() in
      usb_gadget_map_request()
    - staging: comedi: das1800: fix possible NULL dereference
    - KVM: x86: fix ordering of cr0 initialization code in vmx_cpu_reset
    - MIPS: KVM: Fix timer IRQ race when freezing timer
    - MIPS: KVM: Fix timer IRQ race when writing CP0_Compare
    - KVM: x86: mask CPUID(0xD,0x1).EAX against host value
    - xen/x86: actually allocate legacy interrupts on PV guests
    - tty: vt, return error when con_startup fails
    - TTY: n_gsm, fix false positive WARN_ON
    - tty/serial: atmel: fix hardware handshake selection
    - Fix OpenSSH pty regression on close
    - serial: 8250_pci: fix divide error bug if baud rate is 0
    - serial: 8250_mid: use proper bar for DNV platform
    - serial: 8250_mid: recognize interrupt source in handler
    - serial: samsung: Reorder the sequence of clock control when call
      s3c24xx_serial_set_termios()
    - locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()
    - clk: bcm2835: add locking to pll*_on/off methods
    - mcb: Fixed bar number assignment for the gdd
    - ALSA: hda/realtek - New codecs support for ALC234/ALC274/ALC294
    - ALSA: hda - Fix headphone noise on Dell XPS 13 9360
    - ALSA: hda/realtek - Add support for ALC295/ALC3254
    - ALSA: hda - Fix headset mic detection problem for one Dell machine
    - IB/srp: Fix a debug kernel crash
    - thunderbolt: Fix double free of drom buffer
    - SIGNAL: Move generic copy_siginfo() to signal.h
    - UBI: Fix static volume checks when Fastmap is used
    - hpfs: fix remount failure when there are no options changed
    - hpfs: implement the show_options method
    - scsi: Add intermediate STARGET_REMOVE state to scsi_target_state
    - Revert "scsi: fix soft lockup in scsi_remove_target() on module removal"
    - kbuild: move -Wunused-const-variable to W=1 warning level
    - Linux 4.4.12

* [Hyper-V] fixes for kdump when running on a VM (LP: #1588965)
    - clocksource: Allow unregistering the watchdog

* net_admin apparmor denial when using Go (LP: #1465724)
    - SAUCE: kernel: Add noaudit variant of ns_capable()
    - SAUCE: net: Use ns_capable_noaudit() when determining net sysctl permissions

* [Hyper-V] Put tools/hv/lsvmbus in /usr/sbin (LP: #1585311)
    - [Debian] Install lsvmbus in cloud tools
    - SAUCE: tools/hv/lsvmbus -- convert to python3
    - SAUCE: tools/hv/lsvmbus -- add manual page

* btrfs: file write crashes with false ENOSPC during snapshot creation since
    kernel 4.4 - fix available (LP: #1584052)
    - btrfs: Continue write in case of can_not_nocow

* boot stalls on USB detection errors (LP: #1437492)
    - usb: core: hub: hub_port_init lock controller instead of bus

* [Bug]KNL:Spread MWAIT cache lines over all nodes (LP: #1585850)
    - kernel/fork.c: allocate idle task for a CPU always on its local node

* [Hyper-V] PCI Passthrough kernel hang and explicit barriers (LP: #1581243)
    - PCI: hv: Report resources release after stopping the bus
    - PCI: hv: Add explicit barriers to config space access

* Kernel 4.2.X and 4.4.X - Fix USB3.0 link power management (LPM)
    claim/release logic in USBFS (LP: #1577024)
    - USB: leave LPM alone if possible when binding/unbinding interface drivers

* STC840.20:tuleta:tul516p01 panic after injecting Leaf EEH (LP: #1581034)
    - NVMe: Fix namespace removal deadlock
    - NVMe: Requeue requests on suspended queues
    - NVMe: Move error handling to failed reset handler
    - blk-mq: End unstarted requests on dying queue

* conflicting modules in udebs - arc4.ko (LP: #1582991)
    - [Config] Remove arc4 from nic-modules

* CVE-2016-4482 (LP: #1578493)
    - USB: usbfs: fix potential infoleak in devio

* mlx5_core kexec fail (LP: #1585978)
    - net/mlx5: Add pci shutdown callback

* backport fix for /proc/net issues with containers (LP: #1584953)
    - netfilter: Set /proc/net entries owner to root in namespace

* CVE-2016-4951 (LP: #1585365)
    - tipc: check nl sock before parsing nested attributes

* CVE-2016-4578 (LP: #1581866)
    - ALSA: timer: Fix leak in events via snd_timer_user_ccallback
    - ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt

* CVE-2016-4569 (LP: #1580379)
    - ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS

* s390/pci: fix use after free in dma_init (LP: #1584828)
    - s390/pci: fix use after free in dma_init

* s390/mm: fix asce_bits handling with dynamic pagetable levels (LP: #1584827)
    - s390/mm: fix asce_bits handling with dynamic pagetable levels

* CAPI: CGZIP Wrong CAPI MMIO timeout (256usec desired but 1usec default
    setting in cxl.ko driver) (LP: #1584066)
    - powerpc: Define PVR value for POWER8NVL processor
    - cxl: Configure the PSL for two CAPI ports on POWER8NVL
    - cxl: Increase timeout for detection of AFU mmio hang

* ThunderX: soft lockup in cursor_timer_handler() (LP: #1574814)
    - SAUCE: tty: vt: Fix soft lockup in fbcon cursor blink timer.

* debian.master/.../getabis bogus warnings "inconsistant compiler versions"
    and "not a git repository" (LP: #1584890)
    - [debian] getabis: Only git add $abidir if running in local repo
    - [debian] getabis: Fix inconsistent compiler versions check

* Backport cxlflash patch related to EEH recovery into Xenial SRU stream
    (LP: #1584935)
    - cxlflash: Fix to resolve dead-lock during EEH recovery

* Xenial update to 4.4.11 stable release (LP: #1584912)
    - decnet: Do not build routes to devices without decnet private data.
    - route: do not cache fib route info on local routes with oif
    - packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
    - net: sched: do not requeue a NULL skb
    - bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
    - cdc_mbim: apply "NDP to end" quirk to all Huawei devices
    - net: use skb_postpush_rcsum instead of own implementations
    - vlan: pull on __vlan_insert_tag error path and fix csum correction
    - openvswitch: use flow protocol when recalculating ipv6 checksums
    - ipv4/fib: don't warn when primary address is missing if in_dev is dead
    - net/mlx4_en: fix spurious timestamping callbacks
    - bpf: fix check_map_func_compatibility logic
    - samples/bpf: fix trace_output example
    - net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
    - gre: do not pull header in ICMP error processing
    - net_sched: introduce qdisc_replace() helper
    - net_sched: update hierarchical backlog too
    - sch_htb: update backlog as well
    - sch_dsmark: update backlog as well
    - netem: Segment GSO packets on enqueue
    - net: fec: only clear a queue's work bit if the queue was emptied
    - VSOCK: do not disconnect socket when peer has shutdown SEND only
    - net: bridge: fix old ioctl unlocked net device walk
    - bridge: fix igmp / mld query parsing
    - uapi glibc compat: fix compile errors when glibc net/if.h included before
      linux/if.h
    - net: fix a kernel infoleak in x25 module
    - net: thunderx: avoid exposing kernel stack
    - tcp: refresh skb timestamp at retransmit time
    - net/route: enforce hoplimit max value
    - ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
    - ocfs2: fix posix_acl_create deadlock
    - zsmalloc: fix zs_can_compact() integer overflow
    - crypto: qat - fix invalid pf2vf_resp_wq logic
    - crypto: hash - Fix page length clamping in hash walk
    - crypto: testmgr - Use kmalloc memory for RSA input
    - ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
    - ALSA: usb-audio: Yet another Phoneix Audio device quirk
    - ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
    - ALSA: hda - Fix white noise on Asus UX501VW headset
    - ALSA: hda - Fix broken reconfig
    - spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
    - spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
    - spi: spi-ti-qspi: Handle truncated frames properly
    - pinctrl: at91-pio4: fix pull-up/down logic
    - regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
    - perf/core: Disable the event on a truncated AUX record
    - vfs: add vfs_select_inode() helper
    - vfs: rename: check backing inode being equal
    - ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
    - workqueue: fix rebind bound workers warning
    - regulator: s2mps11: Fix invalid selector mask and voltages for buck9
    - regulator: axp20x: Fix axp22x ldo_io voltage ranges
    - atomic_open(): fix the handling of create_error
    - qla1280: Don't allocate 512kb of host tags
    - tools lib traceevent: Do not reassign parg after collapse_tree()
    - get_rock_ridge_filename(): handle malformed NM entries
    - Input: max8997-haptic - fix NULL pointer dereference
    - Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
    - drm/radeon: fix PLL sharing on DCE6.1 (v2)
    - drm/i915: Bail out of pipe config compute loop on LPT
    - drm/i915/bdw: Add missing delay during L3 SQC credit programming
    - drm/radeon: fix DP link training issue with second 4K monitor
    - nf_conntrack: avoid kernel pointer value leak in slab name
    - Linux 4.4.11

* Support Edge Gateway's Bluetooth LED (LP: #1512999)
    - SAUCE: Bluetooth: Support for LED on Marvell modules

* Support Edge Gateway's WIFI LED (LP: #1512997)
    - SAUCE: mwifiex: Switch WiFi LED state according to the device status

* Marvell wireless driver update for FCC regulation (LP: #1528910)
    - mwifiex: parse adhoc start/join result
    - mwifiex: handle start AP error paths correctly
    - mwifiex: set regulatory info from EEPROM
    - mwifiex: don't follow AP if country code received from EEPROM
    - mwifiex: correction in region code to country mapping
    - mwifiex: update region_code_index array
    - mwifiex: use world for unidentified region code
    - SAUCE: mwifiex: add iw vendor command support

* Kernel can be oopsed using remap_file_pages (LP: #1558120)
    - Revert "UBUNTU: SAUCE: mm/mmap: fix oopsing on remap_file_pages"
    - SAUCE: AUFS: mm/mmap: fix oopsing on remap_file_pages aufs mmap: bugfix,
      mainly for linux-4.5-rc5, remap_file_pages(2) emulation

* cgroup namespace update (LP: #1584163)
    - Revert "UBUNTU: SAUCE: cgroup mount: ignore nsroot="
    - Revert "UBUNTU: SAUCE: (noup) cgroup namespaces: add a 'nsroot=' mountinfo
      field"
    - cgroup, kernfs: make mountinfo show properly scoped path for cgroup
      namespaces
    - kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
    - cgroup: fix compile warning

* Missing libunwind support in perf (LP: #1248289)
    - [Config] add binutils-dev to the Build-Depends: to fix perf unwinding

* e1000 Tx Unit Hang (LP: #1582328)
    - e1000: Double Tx descriptors needed check for 82544
    - e1000: Do not overestimate descriptor counts in Tx pre-check

* Unsharing user and ipc namespaces simultaneously makes mqueue unmountable
    (LP: #1582378)
    - SAUCE: (namespace) mqueue: Super blocks must be owned by the user ns which
      owns the ipc ns

* Pull in the amdgpu/radeon code from Linux 4.5.3 (LP: #1580526)
    - drm/radeon: rework fbdev handling on chips with no connectors
    - drm/radeon/mst: fix regression in lane/link handling.
    - drm/amd/powerplay: add uvd/vce dpm enabling flag to fix the performance
      issue for CZ
    - drm/amd/powerplay: fix segment fault issue in multi-display case.
    - drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail

* aufs CONFIG_AUFS_EXPORT build option should be enabled (LP: #1121699)
    - [Config] enable CONFIG_AUFS_EXPORT

* promote *_diag modules from linux-image-extra to linux-image (LP: #1580355)
    - [Config] Update inclusion list for CRIU

* [Xenial] net: updates to ethtool and virtio_net for speed/duplex support
    (LP: #1581132)
    - ethtool: add speed/duplex validation functions
    - ethtool: make validate_speed accept all speeds between 0 and INT_MAX
    - virtio_net: add ethtool support for set and get of settings
    - virtio_net: validate ethtool port setting and explain the user validation

* perf tool: Display event codes for Generic HW (PMU) events (LP: #1578211)
    - powerpc/perf: Remove PME_ prefix for power7 events
    - powerpc/perf: Export Power8 generic and cache events to sysfs

* Mellanox ConnectX4 MTU limits: max and min (LP: #1528466)
    - net/mlx5: Introduce a new header file for physical port functions
    - net/mlx5e: Device's mtu field is u16 and not int
    - net/mlx5e: Fix minimum MTU

* Miscellaneous Ubuntu changes
    - [Config] CONFIG_CAVIUM_ERRATUM_23144=y

-- Kamal Mostafa <kamal@canonical.com>  Fri, 10 Jun 2016 10:07:13 -0700

Changed in linux (Ubuntu):
status:	Confirmed → Fix Released

Launchpad Janitor (janitor) wrote on 2016-06-27:

#24

This bug was fixed in the package linux - 4.4.0-28.47

---------------
linux (4.4.0-28.47) xenial; urgency=low

[ Luis Henriques ]

* Release Tracking Bug
    - LP: #1595874

* Linux netfilter local privilege escalation issues (LP: #1595350)
    - netfilter: x_tables: don't move to non-existent next rule
    - netfilter: x_tables: validate targets of jumps
    - netfilter: x_tables: add and use xt_check_entry_offsets
    - netfilter: x_tables: kill check_entry helper
    - netfilter: x_tables: assert minimum target size
    - netfilter: x_tables: add compat version of xt_check_entry_offsets
    - netfilter: x_tables: check standard target size too
    - netfilter: x_tables: check for bogus target offset
    - netfilter: x_tables: validate all offsets and sizes in a rule
    - netfilter: x_tables: don't reject valid target size on some architectures
    - netfilter: arp_tables: simplify translate_compat_table args
    - netfilter: ip_tables: simplify translate_compat_table args
    - netfilter: ip6_tables: simplify translate_compat_table args
    - netfilter: x_tables: xt_compat_match_from_user doesn't need a retval
    - netfilter: x_tables: do compat validation via translate_table
    - netfilter: x_tables: introduce and use xt_copy_counters_from_user

* Linux netfilter IPT_SO_SET_REPLACE memory corruption (LP: #1555338)
    - netfilter: x_tables: validate e->target_offset early
    - netfilter: x_tables: make sure e->next_offset covers remaining blob size
    - netfilter: x_tables: fix unconditional helper

linux (4.4.0-27.46) xenial; urgency=low

[ Kamal Mostafa ]

* Release Tracking Bug
    - LP: #1594906

* Support Edge Gateway's Bluetooth LED (LP: #1512999)
    - Revert "UBUNTU: SAUCE: Bluetooth: Support for LED on Marvell modules"

linux (4.4.0-26.45) xenial; urgency=low

[ Kamal Mostafa ]

* Release Tracking Bug
    - LP: #1594442

* linux: Implement secure boot state variables (LP: #1593075)
    - SAUCE: UEFI: Add secure boot and MOK SB State disabled sysctl

* failures building userspace packages that include ethtool.h (LP: #1592930)
    - ethtool.h: define INT_MAX for userland