Regression in 4.4.0-65-generic causes very frequent system crashes

Bug #1669611 reported by Stéphane Graber
This bug affects 2 people
| Affects | Importance | Assigned to |
|---|---|---|
| linux (Ubuntu) | Critical | Tim Gardner |
| Xenial | Critical | Unassigned |
| Yakkety | Critical | Unassigned |
| Zesty | Critical | Tim Gardner |

Bug Description

After upgrading to 4.4.0-65-generic, all of our Jenkins test runners die every 10 minutes or so. They stop responding on the network, on the console, and on the serial console.

The kernel backtraces we got are:
```
buildd04 login: [ 1443.707658] BUG: unable to handle kernel paging request at 2d5e501d
[ 1443.707969] IP: [<c11fb2ef>] mntget+0xf/0x20
[ 1443.708086] *pdpt = 0000000024056001 *pde = 0000000000000000
[ 1443.708237] Oops: 0002 [#1] SMP
[ 1443.708325] Modules linked in: ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables xt_comment veth ebtable_filter ebtables dm_snapshot dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c binfmt_misc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables zram lz4_compress bridge stp llc kvm_intel ppdev kvm irqbypass crc32_pclmul aesni_intel aes_i586 xts lrw gf128mul ablk_helper cryptd joydev input_leds serio_raw parport_pc 8250_fintek i2c_piix4 mac_hid lp parport autofs4 btrfs xor raid6_pq psmouse virtio_scsi pata_acpi floppy
[ 1443.710365] CPU: 1 PID: 14167 Comm: apparmor_parser Not tainted 4.4.0-65-generic #86-Ubuntu
[ 1443.710505] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1443.710651] task: f5920a00 ti: e63f2000 task.ti: e63f2000
[ 1443.710776] EIP: 0060:[<c11fb2ef>] EFLAGS: 00010286 CPU: 1
[ 1443.710875] EIP is at mntget+0xf/0x20
[ 1443.710946] EAX: f57e4d90 EBX: 00000000 ECX: c1d333cc EDX: 0002801d
[ 1443.711088] ESI: c1d36404 EDI: c1d36408 EBP: e63f3de8 ESP: e63f3de8
[ 1443.711228] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1443.711334] CR0: 80050033 CR2: 2d5e501d CR3: 35072440 CR4: 001406f0
[ 1443.711471] Stack:
[ 1443.711593] e63f3e04 c1203752 c13b7f71 c1d333cc eebb5980 e59d71e0 000041ed e63f3e30
[ 1443.711822] c130546b e59d7230 1a628dcf 00000003 ffffffff e63f3e58 6c0a010a e53b6800
[ 1443.712044] 000000de eebb5980 e63f3e44 c13055be 00000000 00000000 00000000 e63f3e6c
[ 1443.712264] Call Trace:
[ 1443.712314] [<c1203752>] simple_pin_fs+0x32/0xa0
[ 1443.712421] [<c13b7f71>] ? vsnprintf+0x321/0x420
[ 1443.712516] [<c130546b>] securityfs_create_dentry+0x5b/0x150
[ 1443.712632] [<c13055be>] securityfs_create_dir+0x2e/0x30
[ 1443.712729] [<c133a3c6>] __aa_fs_profile_mkdir+0x46/0x3c0
[ 1443.712826] [<c1345000>] aa_replace_profiles+0x4c0/0xbc0
[ 1443.712927] [<c10798c5>] ? ns_capable_common+0x55/0x80
[ 1443.713022] [<c1338ee7>] policy_update+0x97/0x230
[ 1443.713122] [<c1302189>] ? security_file_permission+0x39/0xc0
[ 1443.713247] [<c1339118>] profile_replace+0x98/0xe0
[ 1443.713346] [<c1339080>] ? policy_update+0x230/0x230
[ 1443.713445] [<c11dd99f>] __vfs_write+0x1f/0x50
[ 1443.713535] [<c11ddf7c>] vfs_write+0x8c/0x1b0
[ 1443.713633] [<c11de971>] SyS_write+0x51/0xb0
[ 1443.713738] [<c100385d>] do_fast_syscall_32+0x8d/0x150
[ 1443.713838] [<c17bcd9c>] sysenter_past_esp+0x3d/0x61
[ 1443.713938] Code: c0 74 09 83 42 10 01 89 d0 5b 5d c3 3b 5b 10 b8 fe ff ff ff 75 e3 eb eb 8d 74 26 00 55 89 e5 3e 8d 74 26 00 85 c0 74 06 8b 50 14 <64> ff 02 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5 3e
[ 1443.715713] EIP: [<c11fb2ef>] mntget+0xf/0x20 SS:ESP 0068:e63f3de8
[ 1443.715852] CR2: 000000002d5e501d
```

```
buildd07 login: [ 1262.522071] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 1262.522339] IP: [<ffffffff8122fbd8>] mntput_no_expire+0x68/0x180
[ 1262.522464] PGD 439912067 PUD 43997f067 PMD 0
[ 1262.522556] Oops: 0002 [#1] SMP
[ 1262.522760] Modules linked in: ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables xt_comment veth ebtable_filter ebtables dm_snapshot dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c binfmt_misc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc zram lz4_compress zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ppdev ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd input_leds joydev i2c_piix4 serio_raw 8250_fintek parport_pc mac_hid lp parport autofs4 btrfs xor raid6_pq psmouse virtio_scsi pata_acpi floppy
[ 1262.535658] CPU: 10 PID: 163332 Comm: apparmor_parser Tainted: P O 4.4.0-65-generic #86-Ubuntu
[ 1262.536544] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1262.536666] task: ffff88043c3fd400 ti: ffff88044fed0000 task.ti: ffff88044fed0000
[ 1262.536773] RIP: 0010:[<ffffffff8122fbd8>] [<ffffffff8122fbd8>] mntput_no_expire+0x68/0x180
[ 1262.536949] RSP: 0018:ffff88044fed3d70 EFLAGS: 00010206
[ 1262.537046] RAX: 0000000000000000 RBX: ffff88046a74e480 RCX: 0000000000000000
[ 1262.537205] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff81f38c40
[ 1262.537354] RBP: ffff88044fed3d88 R08: 0000000000000000 R09: 0000000000000fff
[ 1262.537512] R10: ffff88043aeedb00 R11: 0000000000000246 R12: ffffffff821dbde8
[ 1262.537787] R13: ffff88046a74e4d0 R14: ffff880626c8ca38 R15: ffff880626c91388
[ 1262.538021] FS: 00007fe98994c740(0000) GS:ffff880627480000(0000) knlGS:0000000000000000
[ 1262.538324] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1262.538492] CR2: 0000000000000008 CR3: 0000000446b88000 CR4: 00000000001406e0
[ 1262.538688] Stack:
[ 1262.538749] ffffffff821dbde0 ffffffff821dbde8 ffff88043c12d730 ffff88044fed3d98
[ 1262.539034] ffffffff8122fd14 ffff88044fed3db8 ffffffff81236ea5 ffff88043b269600
[ 1262.539372] ffff88043b269000 ffff88044fed3dd8 ffffffff8134aae9 ffff88043c12d758
[ 1262.539657] Call Trace:
[ 1262.539717] [<ffffffff8122fd14>] mntput+0x24/0x40
[ 1262.539888] [<ffffffff81236ea5>] simple_release_fs+0x45/0x50
[ 1262.540041] [<ffffffff8134aae9>] securityfs_remove+0x99/0xb0
[ 1262.540198] [<ffffffff81382892>] __aa_fs_profile_rmdir+0x72/0xd0
[ 1262.540358] [<ffffffff8138c9b0>] __remove_profile+0x40/0xd0
[ 1262.540494] [<ffffffff8138e9e0>] aa_remove_profiles+0xe0/0x370
[ 1262.540654] [<ffffffff813819a4>] profile_remove+0x144/0x2e0
[ 1262.540833] [<ffffffff8120e668>] __vfs_write+0x18/0x40
[ 1262.540945] [<ffffffff8120eff9>] vfs_write+0xa9/0x1a0
[ 1262.541082] [<ffffffff8120df8f>] ? do_sys_open+0x1bf/0x2a0
[ 1262.541233] [<ffffffff8120fcb5>] SyS_write+0x55/0xc0
[ 1262.541359] [<ffffffff8183c672>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 1262.541521] Code: ff 85 c0 0f 85 f6 00 00 00 8b 43 30 a9 00 00 00 01 0f 85 e8 00 00 00 0d 00 00 00 01 48 8b 53 70 4c 8d 6b 50 89 43 30 48 8b 43 78 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 70
[ 1262.544762] RIP [<ffffffff8122fbd8>] mntput_no_expire+0x68/0x180
[ 1262.545050] RSP <ffff88044fed3d70>
[ 1262.545142] CR2: 0000000000000008
```

```
buildd06 login: [ 1330.969096] BUG: unable to handle kernel paging request at 2d5b2000
[ 1330.971286] IP: [<c11fb2ef>] mntget+0xf/0x20
[ 1330.972781] *pdpt = 00000000298be001 *pde = 0000000000000000
[ 1330.980802] Oops: 0002 [#1] SMP
[ 1330.982401] Modules linked in: ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables xt_comment dm_snapshot dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c veth ebtable_filter ebtables binfmt_misc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc zram lz4_compress kvm_intel ppdev kvm irqbypass crc32_pclmul aesni_intel aes_i586 xts lrw gf128mul joydev input_leds ablk_helper cryptd serio_raw i2c_piix4 8250_fintek parport_pc mac_hid lp parport autofs4 btrfs xor raid6_pq psmouse virtio_scsi floppy pata_acpi
[ 1330.987438] CPU: 0 PID: 21383 Comm: lxd Not tainted 4.4.0-65-generic #86-Ubuntu
[ 1330.987629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1330.987827] task: f565e400 ti: e6c22000 task.ti: e6c22000
[ 1330.987952] EIP: 0060:[<c11fb2ef>] EFLAGS: 00210286 CPU: 0
[ 1330.988080] EIP is at mntget+0xf/0x20
[ 1330.988174] EAX: d1dbd190 EBX: 00000000 ECX: c1d333cc EDX: 00004000
[ 1330.988332] ESI: c1d36404 EDI: c1d36408 EBP: e6c23ec8 ESP: e6c23ec8
[ 1330.988493] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1330.988625] CR0: 80050033 CR2: 2d5b2000 CR3: 35762380 CR4: 001406f0
[ 1330.988787] Stack:
[ 1330.988851] e6c23ee4 c1203752 e6c23eec c1d333cc eec950c0 00000000 eec950dc e6c23eec
[ 1330.989186] c130566c e6c23f20 c1338784 eebd6dc0 00000003 e3127100 e6c23f14 eebd6e30
[ 1330.989509] 000001ed e3127100 eebd6dc0 eebd6dc0 c17e7140 e3127100 e6c23f48 c11e8a2e
[ 1330.989832] Call Trace:
[ 1330.989902] [<c1203752>] simple_pin_fs+0x32/0xa0
[ 1330.990048] [<c130566c>] securityfs_pin_fs+0x1c/0x20
[ 1330.990176] [<c1338784>] ns_mkdir_op+0xe4/0x370
[ 1330.990306] [<c11e8a2e>] vfs_mkdir+0xee/0x1d0
[ 1330.990436] [<c1349ec0>] ? wrap_apparmor_path_mkdir+0x20/0x30
[ 1330.990595] [<c11edc2f>] SyS_mkdirat+0xcf/0x110
[ 1330.990725] [<c1003773>] do_syscall_32_irqs_on+0x53/0xb0
[ 1330.990855] [<c17bcdf1>] entry_INT80_32+0x31/0x31
[ 1330.990982] Code: c0 74 09 83 42 10 01 89 d0 5b 5d c3 3b 5b 10 b8 fe ff ff ff 75 e3 eb eb 8d 74 26 00 55 89 e5 3e 8d 74 26 00 85 c0 74 06 8b 50 14 <64> ff 02 5d c3 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5 3e
[ 1330.993186] EIP: [<c11fb2ef>] mntget+0xf/0x20 SS:ESP 0068:e6c23ec8
[ 1330.993386] CR2: 000000002d5b2000
```

```
buildd05 login: [ 1400.626415] BUG: unable to handle kernel paging request at 2d62201d
[ 1400.634202] IP: [<c11fc961>] mntput_no_expire+0x11/0x160
[ 1400.634805] *pdpt = 00000000350d5001 *pde = 0000000000000000
[ 1400.634971] Oops: 0002 [#1] SMP
[ 1400.635078] Modules linked in: ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter ip6_tables xt_comment veth ebtable_filter ebtables binfmt_misc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc zram lz4_compress kvm_intel kvm irqbypass crc32_pclmul aesni_intel aes_i586 xts input_leds joydev lrw gf128mul i2c_piix4 ppdev 8250_fintek ablk_helper mac_hid cryptd serio_raw parport_pc lp parport autofs4 btrfs xor raid6_pq psmouse virtio_scsi pata_acpi floppy
[ 1400.638087] CPU: 4 PID: 10296 Comm: lxd Not tainted 4.4.0-65-generic #86-Ubuntu
[ 1400.638300] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 1400.638488] task: e5369e00 ti: e1072000 task.ti: e1072000
[ 1400.638615] EIP: 0060:[<c11fc961>] EFLAGS: 00210282 CPU: 4
[ 1400.638743] EIP is at mntput_no_expire+0x11/0x160
[ 1400.638866] EAX: f546a180 EBX: c1d36404 ECX: 00000000 EDX: 0002801d
[ 1400.639040] ESI: c1d36408 EDI: e2f4e6d0 EBP: e1073e64 ESP: e1073e54
[ 1400.639206] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1400.639371] CR0: 80050033 CR2: 2d62201d CR3: 352eb1a0 CR4: 001406f0
[ 1400.639620] Stack:
[ 1400.639692] f4ad0848 c1d36404 c1d36408 e2f4e6d0 e1073e6c c11fcad0 e1073e7c c12037f7
[ 1400.641243] eb808e00 f4ad0800 e1073e8c c1305634 e2f4e73c e2f4e724 e1073eb4 c133a892
[ 1400.641582] c1af6740 00000001 00000000 00000000 e2f4e708 00200286 e2f4e6c0 e2f4e71c
[ 1400.641920] Call Trace:
[ 1400.641989] [<c11fcad0>] mntput+0x20/0x40
[ 1400.642083] [<c12037f7>] simple_release_fs+0x37/0x40
[ 1400.642221] [<c1305634>] securityfs_remove+0x74/0x90
[ 1400.642359] [<c133a892>] __aa_fs_ns_rmdir+0x152/0x1d0
[ 1400.642487] [<c1359c20>] destroy_ns+0x90/0xa0
[ 1400.642622] [<c1359b64>] __aa_remove_ns+0x24/0x50
[ 1400.642929] [<c133966f>] ns_rmdir_op+0x15f/0x3a0
[ 1400.643030] [<c11e8161>] vfs_rmdir+0xa1/0x110
[ 1400.643099] [<c11ed0dd>] do_rmdir+0x1cd/0x1f0
[ 1400.643169] [<c11edddd>] SyS_unlinkat+0x2d/0x40
[ 1400.643239] [<c1003773>] do_syscall_32_irqs_on+0x53/0xb0
[ 1400.643307] [<c17bcdf1>] entry_INT80_32+0x31/0x31
[ 1400.643373] Code: f0 e8 34 3e 1c 00 3b 05 74 27 b9 c1 7c dc 89 d8 5b 5e 5f 5d c3 90 8d 74 26 00 55 89 e5 57 56 53 83 ec 04 3e 8d 74 26 00 8b 50 24 <64> ff 0a 8b 50 70 85 d2 74 0d 83 c4 04 5b 5e 5f 5d c3 90 8d 74
[ 1400.644544] EIP: [<c11fc961>] mntput_no_expire+0x11/0x160 SS:ESP 0068:e1073e54
[ 1400.644675] CR2: 000000002d62201d
```


Stéphane Graber (stgraber) wrote :

We can reproduce this very easily by triggering an LXD testsuite run, which creates and deletes a large number of AppArmor profiles and namespaces. A busy LXD host would likely hit this eventually too, if the similar bug we had before is any indication.
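The backtraces show the crashes on the paths behind profile loading (`profile_replace`), profile removal (`profile_remove`), and AppArmor namespace mkdir/rmdir (`ns_mkdir_op`, `ns_rmdir_op`). A minimal load/unload loop might look like the sketch below, assuming `apparmor_parser`'s `-r` (replace) and `-R` (remove) options and that it reads the profile from standard input when no file is given. It only exercises the profile paths, not namespaces; the `stress-N` profile names are hypothetical; it has not been tested against an affected kernel; and with `dry_run=False` it must run as root, so only try it on a disposable machine.

```python
import subprocess

PARSER = "apparmor_parser"  # assumes the AppArmor userspace tools are installed


def profile_text(name: str) -> str:
    # Hypothetical, intentionally empty profile: only the load/unload churn matters.
    return f"profile {name} {{\n}}\n"


def build_commands(iterations: int, prefix: str = "stress-"):
    """Return (argv, stdin_text) pairs that load (-r) and then remove (-R) each profile."""
    cmds = []
    for i in range(iterations):
        text = profile_text(f"{prefix}{i}")
        cmds.append(([PARSER, "-r"], text))  # replace/load: the profile_replace path
        cmds.append(([PARSER, "-R"], text))  # remove: the profile_remove path
    return cmds


def run(cmds, dry_run: bool = True) -> None:
    for argv, text in cmds:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(argv), "<<", text.splitlines()[0])
            continue
        # Assumes the profile is read from stdin when no file argument is given; needs root.
        subprocess.run(argv, input=text, text=True, check=True)


if __name__ == "__main__":
    run(build_commands(3))
```

Running the commands in a tight loop (or from several processes at once) is closer to what the LXD testsuite does than a single pass.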

Seth Arnold (seth-arnold) wrote :

I might be reading too much into the tea leaves, but this looked odd:

```
$ aa-decode 2d5e501d
Decoded: -^P
$ aa-decode 0000000000000008
Decoded:
$ aa-decode 2d5b2000
Decoded: -[
$ aa-decode 2d62201d
Decoded: -b
```

Do any of these sound familiar?

Thanks
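`aa-decode` turns a hex string into the characters it encodes. A rough Python equivalent is sketched below, assuming that behaviour; `aa-decode` itself may render non-printable bytes differently, whereas here they are shown as `.`. It makes the pattern easier to see: the three 32-bit faulting addresses (CR2) all start with `0x2d` (`-`) followed by another printable byte, which is what you might expect if a pointer had been overwritten with string data. That reading is speculative, as noted above.

```python
def decode(hex_str: str) -> str:
    """Decode a hex string byte by byte, showing non-printable bytes as '.'."""
    data = bytes.fromhex(hex_str)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


# Faulting addresses (CR2) from the four backtraces in the bug description.
ADDRESSES = ["2d5e501d", "0000000000000008", "2d5b2000", "2d62201d"]

if __name__ == "__main__":
    for addr in ADDRESSES:
        print(f"{addr:>16} -> {decode(addr)!r}")
```

For example, `decode("2d5e501d")` gives `'-^P.'`, where the trailing `.` stands for the non-printable byte `0x1d`.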

tags: added: kernel-key
Stéphane Graber (stgraber) wrote :

Running the same thing on Zesty to see whether the problem is present there too.
We get something slightly different, but the result is the same: all the test runners crash.

```
buildd07 login: [ 976.607283] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lxd:34563]
[ 988.645772] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [lxd:22980]
[ 1004.605673] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lxd:34563]
[ 1009.642113] INFO: rcu_sched self-detected stall on CPU
[ 1009.645498] 3-...: (13599 ticks this GP) idle=769/140000000000001/0 softirq=32564/32569 fqs=6049
[ 1009.649690] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1009.649697] 3-...: (13599 ticks this GP) idle=769/140000000000001/0 softirq=32564/32569 fqs=6049
[ 1009.649699] 11-...: (1 GPs behind) idle=8cd/140000000000000/0 softirq=36685/36686 fqs=6049
[ 1009.649700] (detected by 9, t=15002 jiffies, g=20785, c=20784, q=16519)
[ 1009.663598] (t=15005 jiffies g=20785 c=20784 q=16519)
[ 1016.645667] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [lxd:22980]
[ 1036.606795] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lxd:34563]
[ 1044.645665] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [lxd:22980]
[ 1064.605727] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lxd:34563]
[ 1072.645669] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [lxd:22980]
[ 1092.605690] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lxd:34563]
[ 1100.645669] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [lxd:22980]
[ 1120.605669] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lxd:34563]
[ 1128.645698] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [lxd:22980]
[ 1148.605669] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lxd:34563]
[ 1156.645721] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [lxd:22980]
[ 1176.605670] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lxd:34563]
[ 1184.645668] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [lxd:22980]
[ 1189.665664] INFO: rcu_sched self-detected stall on CPU
[ 1189.669683] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1189.669689] 3-...: (56078 ticks this GP) idle=769/140000000000001/0 softirq=32564/32569 fqs=26269
[ 1189.669695] 11-...: (1 GPs behind) idle=8cd/140000000000000/0 softirq=36685/36686 fqs=26269
[ 1189.669696] (detected by 2, t=60007 jiffies, g=20785, c=20784, q=16775)
[ 1189.691113] 3-...: (56078 ticks this GP) idle=769/140000000000001/0 softirq=32564/32569 fqs=26272
[ 1189.692748] (t=60012 jiffies g=20785 c=20784 q=16775)
[ 1212.645668] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [lxd:22980]
[ 1216.605666] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [lxd:34563]
[ 1240.645876] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [lxd:22980]
[ 1244.606272] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [lxd:34563]
[ 1268.645669] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [lxd:22980]
[ 1272.608277] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [lxd:34563]
[ 1296.645701] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [lxd:22980]
[ 1300.605699] NMI wa...
```

Changed in linux (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Critical
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Yakkety likely has the same issue, as we merged many of the same AppArmor SAUCE patches into both Xenial and Yakkety.

Cascardo.

John Johansen (jjohansen) wrote :

The issue appears to be refcount-related. I am still chasing it down, but for this release we should revert:

UBUNTU: SAUCE: apparmor: fix lock ordering for mkdir
UBUNTU: SAUCE: apparmor: fix leak on securityfs pin count
UBUNTU: SAUCE: apparmor: fix reference count leak when securityfs_setup_d_inode() fails
UBUNTU: SAUCE: apparmor: fix not handling error case when securityfs_pin_fs() fails

A kernel with these patches reverted has been tested, and it fixes the issue.
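The comment only says the problem "appears to be refcount related"; the exact defect in the reverted patches is not identified in this thread. As a rough illustration, here is a simplified Python model (not kernel code) of the `(mount, count)` pair that `simple_pin_fs()` and `simple_release_fs()` maintain for securityfs, based on my reading of the upstream helpers. `Mount`, `PinnedFs`, and `UseAfterFree` are hypothetical names. The model shows how one extra reference drop in an otherwise balanced pin/release sequence leaves a freed mount that the next pin passes to `mntget()`, which is one way to end up with crashes in `mntget()` and `mntput_no_expire()` like those in the traces.

```python
class UseAfterFree(Exception):
    pass


class Mount:
    """Hypothetical stand-in for struct vfsmount with a plain refcount."""
    def __init__(self):
        self.refcount = 1  # reference returned by the initial kernel mount
        self.freed = False


def mntget(mnt):
    if mnt.freed:
        raise UseAfterFree("mntget on freed mount")
    mnt.refcount += 1
    return mnt


def mntput(mnt):
    if mnt is None:
        return
    if mnt.freed:
        raise UseAfterFree("mntput on freed mount")
    mnt.refcount -= 1
    if mnt.refcount == 0:
        mnt.freed = True


class PinnedFs:
    """Models the mount pointer and pin count managed by simple_pin_fs/simple_release_fs."""
    def __init__(self):
        self.mount = None
        self.count = 0

    def pin(self):
        new = None
        if self.mount is None:
            new = Mount()
            self.mount = new
        mntget(self.mount)
        self.count += 1
        mntput(new)  # drop the mount-time reference; the pins now own the mount
        return self.mount

    def release(self):
        mnt = self.mount
        self.count -= 1
        if self.count == 0:
            self.mount = None
        mntput(mnt)


def balanced_scenario():
    fs = PinnedFs()
    mnt = fs.pin()  # user A
    fs.pin()        # user B
    fs.release()    # B
    fs.release()    # A: last pin gone, so the mount is dropped and freed
    return fs.mount is None, mnt.freed, fs.count


def unbalanced_scenario():
    fs = PinnedFs()
    fs.pin()        # user A
    fs.pin()        # user B
    mnt = fs.mount
    fs.release()    # B releases its pin correctly...
    mntput(mnt)     # ...but a hypothetical error path drops one reference too many
    try:
        fs.pin()    # user C: mntget() on a freed mount, as in the simple_pin_fs traces
    except UseAfterFree as exc:
        return str(exc)
    return "no crash"


if __name__ == "__main__":
    print(balanced_scenario())
    print(unbalanced_scenario())
```

In the unbalanced case, user A's later `release()` would likewise call `mntput()` on the freed mount, mirroring the `simple_release_fs`/`mntput_no_expire` traces.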

Tim Gardner (timg-tpi) wrote :

Reverted on Zesty (master-next):
21c9d3b UBUNTU: SAUCE: apparmor: fix lock ordering for mkdir
994ebf6 UBUNTU: SAUCE: apparmor: fix leak on securityfs pin count
090d374 UBUNTU: SAUCE: apparmor: fix reference count leak when securityfs_setup_d_inode() fails
806f146 UBUNTU: SAUCE: apparmor: fix not handling error case when securityfs_pin_fs() fails

Changed in linux (Ubuntu Zesty):
assignee: nobody → Tim Gardner (timg-tpi)
status: Triaged → Fix Committed
Changed in linux (Ubuntu Xenial):
status: Triaged → Fix Committed
Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
tags: added: verification-needed-xenial
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Stéphane Graber (stgraber) wrote : Re: [Bug 1669611] Re: Regression in 4.4.0-65-generic causes very frequent system crashes

I'll install -67 on our Jenkins runners and see whether we can still reproduce it. The changelog is a bit confusing, as it shows a whole bunch of AppArmor reverts, including the commits that were meant to fix this issue, so it's unclear whether a proper implementation of the fix was then applied on top. If not, this kernel obviously wouldn't fix the issue.

Stéphane Graber (stgraber) wrote :

Oh, I got the two bug reports mixed up. So -67 is just the revert. If so, then it's fine: we've been running a pre-upload build of it provided by Jon for a while now and haven't seen any full hangs. We do still run into the original AppArmor bug, but at least it's no worse than before.

Brad Figg (brad-figg) wrote :

Based on comment #10, I'm marking this as verified.

tags: added: verification-done-xenial verification-done-yakkety
removed: verification-needed-xenial verification-needed-yakkety
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-67.88

---------------
linux (4.4.0-67.88) xenial; urgency=low

  * linux: 4.4.0-67.88 -proposed tracker (LP: #1667052)

  * Recent KVM RTC cherry-picks break (some) Windows Live-Migrations
    (LP: #1668594)
    - kvm: x86: correctly reset dest_map->vector when restoring LAPIC state

  * Regression in 4.4.0-65-generic causes very frequent system crashes
    (LP: #1669611)
    - Revert "UBUNTU: SAUCE: apparmor: fix lock ordering for mkdir"
    - Revert "UBUNTU: SAUCE: apparmor: fix leak on securityfs pin count"
    - Revert "UBUNTU: SAUCE: apparmor: fix reference count leak when
      securityfs_setup_d_inode() fails"
    - Revert "UBUNTU: SAUCE: apparmor: fix not handling error case when
      securityfs_pin_fs() fails"

  * Upgrade Redpine RS9113 driver to support AP mode (LP: #1665211)
    - SAUCE: Redpine driver to support Host AP mode

  * NFS client : permission denied when trying to access subshare, since kernel
    4.4.0-31 (LP: #1649292)
    - fs: Better permission checking for submounts

  * [Hyper-V] SAUCE: pci-hyperv fixes for SR-IOV on Azure (LP: #1665097)
    - SAUCE: PCI: hv: Fix wslot_to_devfn() to fix warnings on device removal
    - SAUCE: pci-hyperv: properly handle pci bus remove
    - SAUCE: pci-hyperv: lock pci bus on device eject

  * [Hyper-V/Azure] Please include Mellanox OFED drivers in Azure kernel and
    image (LP: #1650058)
    - net/mlx4_en: Fix bad WQE issue
    - net/mlx4_core: Fix racy CQ (Completion Queue) free
    - net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT
      transitions
    - net/mlx4_core: Avoid command timeouts during VF driver device shutdown

  * Xenial update to v4.4.49 stable release (LP: #1664960)
    - ARC: [arcompact] brown paper bag bug in unaligned access delay slot fixup
    - selinux: fix off-by-one in setprocattr
    - Revert "x86/ioapic: Restore IO-APIC irq_chip retrigger callback"
    - cpumask: use nr_cpumask_bits for parsing functions
    - hns: avoid stack overflow with CONFIG_KASAN
    - ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write
    - target: Don't BUG_ON during NodeACL dynamic -> explicit conversion
    - target: Use correct SCSI status during EXTENDED_COPY exception
    - target: Fix early transport_generic_handle_tmr abort scenario
    - target: Fix COMPARE_AND_WRITE ref leak for non GOOD status
    - ARM: 8642/1: LPAE: catch pending imprecise abort on unmask
    - mac80211: Fix adding of mesh vendor IEs
    - netvsc: Set maximum GSO size in the right place
    - scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed
      send
    - scsi: aacraid: Fix INTx/MSI-x issue with older controllers
    - scsi: mpt3sas: disable ASPM for MPI2 controllers
    - xen-netfront: Delete rx_refill_timer in xennet_disconnect_backend()
    - ALSA: seq: Fix race at creating a queue
    - ALSA: seq: Don't handle loop timeout at snd_seq_pool_done()
    - drm/i915: fix use-after-free in page_flip_completed()
    - Linux 4.4.49

  * NFS client : kernel 4.4.0-57 crash with nfsv4 enries in /etc/fstab
    (LP: #1650336)
    - SUNRPC: fix refcounting problems with auth_g...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
tags: added: kernel-da-key
removed: kernel-key
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-13.15

---------------
linux (4.10.0-13.15) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1671614

  * ehci-platform needed in usb-modules udeb (LP: #1671589)
    - d-i: add ehci-platform to usb-modules

  * irqchip/gic-v3-its: Enable cacheable attribute Read-allocate hints
    (LP: #1671598)
    - irqchip/gic-v3-its: Enable cacheable attribute Read-allocate hints

  * iommu: Fix static checker warning in iommu_insert_device_resv_regions
    (LP: #1671599)
    - iommu: Fix static checker warning in iommu_insert_device_resv_regions

  * QDF2400: Fix panic introduced by erratum 1003 (LP: #1671602)
    - arm64: Avoid clobbering mm in erratum workaround on QDF2400

  * QDF2400 PCI ports require ACS quirk (LP: #1671601)
    - PCI: Add ACS quirk for Qualcomm QDF2400 and QDF2432

  * tty: pl011: Work around QDF2400 E44 stuck BUSY bit (LP: #1671600)
    - tty: pl011: Work around QDF2400 E44 stuck BUSY bit

  * CVE-2017-2636
    - tty: n_hdlc: get rid of racy n_hdlc.tbuf

  * Sync virtualbox to 5.1.16-dfsg-1 in zesty (LP: #1671470)
    - ubuntu: vbox -- Update to 5.1.16-dfsg-1

 -- Tim Gardner <email address hidden> Thu, 09 Mar 2017 06:16:24 -0700

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Stefan Bader (smb) wrote :

This bug was fixed in Ubuntu-4.8.0-44.47.

Changed in linux (Ubuntu Yakkety):
importance: Undecided → Critical
status: New → Fix Released