sysfs test ubuntu_stress_smoke_test will cause kernel oops on X-lowlatency

Bug #1796250 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status        Importance  Assigned to     Milestone
ubuntu-kernel-tests  Fix Released  Undecided   Colin Ian King
linux (Ubuntu)       Won't Fix     Medium      Colin Ian King

Bug Description

This was found on an SRU testing node, "gonzo", running the 4.4 amd64 lowlatency kernel.

The issue could not be reproduced with kernel 4.4.0-137 from -updates (although it is somewhat random, see comment #4), nor with the amd64 generic kernel 4.4.0-138 from -proposed.

16:50:01 DEBUG| [stdout] timer STARTING
16:50:05 ERROR| [stderr] /home/ubuntu/autotest/client/tests/ubuntu_stress_smoke_test/ubuntu_stress_smoke_test.sh: line 111: 39506 Killed ./stress-ng -v -t ${DURATION} --${s} ${INSTANCES} ${STRESS_OPTIONS} &> ${TMP_FILE}
16:50:05 DEBUG| [stdout] timer RETURNED 137
16:50:05 DEBUG| [stdout] timer FAILED (kernel oopsed)
16:50:05 DEBUG| [stdout] [ 1418.982110] BUG: unable to handle kernel paging request at 0000000100000001
16:50:05 DEBUG| [stdout] [ 1419.065329] IP: [<ffffffff811f5ac7>] kmem_cache_alloc+0x77/0x1f0
16:50:05 DEBUG| [stdout] [ 1419.137102] PGD 16f6dd067 PUD 0
16:50:05 DEBUG| [stdout] [ 1419.175602] Oops: 0000 [#6] SMP
16:50:05 DEBUG| [stdout] [ 1419.214101] Modules linked in: unix_diag binfmt_misc vhost_net vhost macvtap cuse macvlan dccp_ipv4 dccp jitterentropy_rng algif_rng ghash_generic salsa20_generic salsa20_x86_64 camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 cast6_avx_x86_64 cast6_generic cast_common serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common xts algif_skcipher tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 algif_hash af_alg aufs kvm_amd kvm ipmi_devintf ipmi_ssif irqbypass dcdbas ipmi_si fam15h_power acpi_power_meter joydev input_leds ipmi_msghandler serio_raw i2c_piix4 k10temp amd64_edac_mod 8250_fintek mac_hid shpchp edac_mce_amd edac_core ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
16:50:05 DEBUG| [stdout] [ 1420.062172] scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mptsas mptscsih pata_acpi hid_generic aesni_intel aes_x86_64 mptbase lrw gf128mul glue_helper psmouse ahci ablk_helper usbhid cryptd pata_atiixp libahci scsi_transport_sas bnx2 hid
16:50:05 DEBUG| [stdout] [ 1420.472200] CPU: 3 PID: 39506 Comm: ubuntu_stress_s Tainted: G D 4.4.0-138-generic #164-Ubuntu
16:50:05 DEBUG| [stdout] [ 1420.588693] Hardware name: Dell Inc. PowerEdge R415/08WNM9, BIOS 1.9.3 04/26/2012
16:50:05 DEBUG| [stdout] [ 1420.678138] task: ffff880177823800 ti: ffff88016a5b0000 task.ti: ffff88016a5b0000
16:50:05 DEBUG| [stdout] [ 1420.767584] RIP: 0010:[<ffffffff811f5ac7>] [<ffffffff811f5ac7>] kmem_cache_alloc+0x77/0x1f0
16:50:05 DEBUG| [stdout] [ 1420.868478] RSP: 0018:ffff88016a5b3bd0 EFLAGS: 00010202
16:50:05 DEBUG| [stdout] [ 1420.931924] RAX: 0000000000000000 RBX: 00000000024000c0 RCX: 000000000129e216
16:50:05 DEBUG| [stdout] [ 1421.017209] RDX: 000000000129e215 RSI: 00000000024000c0 RDI: 000000000001a5c0
16:50:05 DEBUG| [stdout] [ 1421.102496] RBP: ffff88016a5b3c00 R08: ffff8802156da5c0 R09: 0000000100000001
16:50:05 DEBUG| [stdout] [ 1421.187782] R10: ffff880000000ff0 R11: 0000000000000ff0 R12: 00000000024000c0
16:50:05 DEBUG| [stdout] [ 1421.273070] R13: ffffffff811d53e8 R14: ffff880215003b00 R15: ffff880215003b00
16:50:05 DEBUG| [stdout] [ 1421.358354] FS: 00007f1323076700(0000) GS:ffff8802156c0000(0000) knlGS:0000000000000000
16:50:05 DEBUG| [stdout] [ 1421.455081] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
16:50:05 DEBUG| [stdout] [ 1421.523728] CR2: 0000000100000001 CR3: 000000016f6dc000 CR4: 00000000000406f0
16:50:05 DEBUG| [stdout] [ 1421.609013] Stack:
16:50:05 DEBUG| [stdout] [ 1421.632941] 0000000000000000 ffff88019c080a28 0000000000000000 ffffc00000000fff
16:50:05 DEBUG| [stdout] [ 1421.721371] 00007fffffffefec ffff88020ff29000 ffff88016a5b3c38 ffffffff811d53e8
16:50:05 DEBUG| [stdout] [ 1421.809805] 0000000000000002 ffff88019c080a28 ffffc00000000fff 00007fffffffefec
16:50:05 DEBUG| [stdout] [ 1421.898237] Call Trace:
16:50:05 DEBUG| [stdout] [ 1421.927367] [<ffffffff811d53e8>] anon_vma_prepare+0x48/0x180
16:50:05 DEBUG| [stdout] [ 1421.996017] [<ffffffff811c8c3d>] handle_mm_fault+0x13ed/0x1b70
16:50:05 DEBUG| [stdout] [ 1422.066743] [<ffffffff812350cf>] ? atime_needs_update+0x6f/0xd0
16:50:05 DEBUG| [stdout] [ 1422.138510] [<ffffffff81235163>] ? touch_atime+0x33/0xd0
16:50:05 DEBUG| [stdout] [ 1422.202997] [<ffffffff8119781c>] ? generic_file_read_iter+0x5dc/0x6b0
16:50:05 DEBUG| [stdout] [ 1422.281009] [<ffffffff811cd5b8>] ? find_vma+0x68/0x70
16:50:05 DEBUG| [stdout] [ 1422.342374] [<ffffffff811c3176>] ? follow_page_mask+0x36/0x3a0
16:50:05 DEBUG| [stdout] [ 1422.413100] [<ffffffff811c35fe>] __get_user_pages+0x11e/0x600
16:50:05 DEBUG| [stdout] [ 1422.482788] [<ffffffff811c3ee2>] get_user_pages+0x52/0x60
16:50:05 DEBUG| [stdout] [ 1422.548314] [<ffffffff8121ea03>] copy_strings.isra.20+0x173/0x350
16:50:05 DEBUG| [stdout] [ 1422.622161] [<ffffffff8121ec14>] copy_strings_kernel+0x34/0x40
16:50:05 DEBUG| [stdout] [ 1422.692898] [<ffffffff8121f9cc>] do_execveat_common.isra.31+0x4cc/0x770
16:50:05 DEBUG| [stdout] [ 1422.772986] [<ffffffff8121feca>] SyS_execve+0x3a/0x50
16:50:05 DEBUG| [stdout] [ 1422.834353] [<ffffffff81857a25>] stub_execve+0x5/0x5
16:50:05 DEBUG| [stdout] [ 1422.894679] [<ffffffff818576ce>] ? entry_SYSCALL_64_fastpath+0x22/0xc1
16:50:05 DEBUG| [stdout] [ 1422.973724] Code: 08 65 4c 03 05 53 77 e1 7e 49 83 78 10 00 4d 8b 08 0f 84 2b 01 00 00 4d 85 c9 0f 84 22 01 00 00 49 63 47 20 48 8d 4a 01 49 8b 3f <49> 8b 1c 01 4c 89 c8 65 48 0f c7 0f 0f 94 c0 84 c0 74 bb 49 63
16:50:05 DEBUG| [stdout] [ 1423.199831] RIP [<ffffffff811f5ac7>] kmem_cache_alloc+0x77/0x1f0
16:50:05 DEBUG| [stdout] [ 1423.272646] RSP <ffff88016a5b3bd0>
16:50:05 DEBUG| [stdout] [ 1423.314250] CR2: 0000000100000001
16:50:05 DEBUG| [stdout] [ 1423.353892] ---[ end trace 7c103db725e9179b ]---
16:50:05 DEBUG| [stdout]
16:50:05 DEBUG| [stdout] timerfd STARTING
16:50:10 DEBUG| [stdout] timerfd RETURNED 0
16:50:10 DEBUG| [stdout] timerfd FAILED (kernel oopsed)
16:50:10 DEBUG| [stdout] [ 1424.951012] BUG: unable to handle kernel paging request at 0000000100000001
16:50:10 DEBUG| [stdout] [ 1425.034231] IP: [<ffffffff811f584b>] kmem_cache_alloc_trace+0x7b/0x1f0
16:50:10 DEBUG| [stdout] [ 1425.112252] PGD 2133e0067 PUD 0
16:50:10 DEBUG| [stdout] [ 1425.150753] Oops: 0000 [#7] SMP
16:50:10 DEBUG| [stdout] [ 1425.189259] Modules linked in: unix_diag binfmt_misc vhost_net vhost macvtap cuse macvlan dccp_ipv4 dccp jitterentropy_rng algif_rng ghash_generic salsa20_generic salsa20_x86_64 camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 cast6_avx_x86_64 cast6_generic cast_common serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common xts algif_skcipher tgr192 wp512 rmd320 rmd256 rmd160 rmd128 md4 algif_hash af_alg aufs kvm_amd kvm ipmi_devintf ipmi_ssif irqbypass dcdbas ipmi_si fam15h_power acpi_power_meter joydev input_leds ipmi_msghandler serio_raw i2c_piix4 k10temp amd64_edac_mod 8250_fintek mac_hid shpchp edac_mce_amd edac_core ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
16:50:10 DEBUG| [stdout] [ 1426.037337] scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mptsas mptscsih pata_acpi hid_generic aesni_intel aes_x86_64 mptbase lrw gf128mul glue_helper psmouse ahci ablk_helper usbhid cryptd pata_atiixp libahci scsi_transport_sas bnx2 hid
16:50:10 DEBUG| [stdout] [ 1426.447365] CPU: 3 PID: 1404 Comm: irqbalance Tainted: G D 4.4.0-138-generic #164-Ubuntu
16:50:10 DEBUG| [stdout] [ 1426.557614] Hardware name: Dell Inc. PowerEdge R415/08WNM9, BIOS 1.9.3 04/26/2012
16:50:10 DEBUG| [stdout] [ 1426.647061] task: ffff88020f444600 ti: ffff880211c50000 task.ti: ffff880211c50000
16:50:10 DEBUG| [stdout] [ 1426.736514] RIP: 0010:[<ffffffff811f584b>] [<ffffffff811f584b>] kmem_cache_alloc_trace+0x7b/0x1f0
16:50:10 DEBUG| [stdout] [ 1426.843647] RSP: 0018:ffff880211c53c20 EFLAGS: 00010202
16:50:10 DEBUG| [stdout] [ 1426.907100] RAX: 0000000000000000 RBX: 00000000024080c0 RCX: 000000000129e216
16:50:10 DEBUG| [stdout] [ 1426.992392] RDX: 000000000129e215 RSI: 00000000024080c0 RDI: 000000000001a5c0
16:50:10 DEBUG| [stdout] [ 1427.077678] RBP: ffff880211c53c60 R08: ffff8802156da5c0 R09: ffff880215003b00
16:50:10 DEBUG| [stdout] [ 1427.162965] R10: 0000000100000001 R11: 0000000000000000 R12: 00000000024080c0
16:50:10 DEBUG| [stdout] [ 1427.248251] R13: ffffffff81285f92 R14: ffff88009b8fa700 R15: ffff880215003b00
16:50:10 DEBUG| [stdout] [ 1427.333539] FS: 00007efedee3e740(0000) GS:ffff8802156c0000(0000) knlGS:0000000000000000
16:50:10 DEBUG| [stdout] [ 1427.430270] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
16:50:10 DEBUG| [stdout] [ 1427.498916] CR2: 0000000100000001 CR3: 000000021219f000 CR4: 00000000000406f0
16:50:10 DEBUG| [stdout] [ 1427.584201] Stack:
16:50:10 DEBUG| [stdout] [ 1427.608128] ffff880211c53c40 ffff880211c53c58 0000000000000028 ffff880213963a40
16:50:10 DEBUG| [stdout] [ 1427.696560] ffff880213741fe8 0000000000000000 ffff88009b8fa700 ffff88009b8fa710
16:50:10 DEBUG| [stdout] [ 1427.784992] ffff880211c53c98 ffffffff81285f92 ffff88009b8fa700 ffff88020f81f800
16:50:10 DEBUG| [stdout] [ 1427.873425] Call Trace:
16:50:10 DEBUG| [stdout] [ 1427.902556] [<ffffffff81285f92>] proc_reg_open+0x32/0x110
16:50:10 DEBUG| [stdout] [ 1427.968086] [<ffffffff812159c2>] do_dentry_open+0x202/0x310
16:50:10 DEBUG| [stdout] [ 1428.035699] [<ffffffff81285f60>] ? proc_reg_release+0x70/0x70
16:50:10 DEBUG| [stdout] [ 1428.105391] [<ffffffff81216b54>] vfs_open+0x54/0x80
16:50:10 DEBUG| [stdout] [ 1428.164686] [<ffffffff812228ab>] ? may_open+0x5b/0xf0
16:50:10 DEBUG| [stdout] [ 1428.226058] [<ffffffff81226b6c>] path_openat+0x59c/0x1340
16:50:10 DEBUG| [stdout] [ 1428.291592] [<ffffffff81852ffd>] ? __schedule+0x30d/0x7f0
16:50:10 DEBUG| [stdout] [ 1428.357122] [<ffffffff81852ff1>] ? __schedule+0x301/0x7f0
16:50:10 DEBUG| [stdout] [ 1428.422648] [<ffffffff81228b01>] do_filp_open+0x91/0x100
16:50:10 DEBUG| [stdout]
16:50:10 DEBUG| [stdout] tlb-shootdown STARTING
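
For triage, the log above can be condensed mechanically. The following is a minimal sketch, not the logic used by ubuntu_stress_smoke_test.sh: the helper names `decode_exit_status` and `summarize_oopses` are hypothetical, and the regular expressions only assume the 4.4-era oops header format shown in this report. It decodes a shell status such as 137 (128 plus the signal number) and pairs each faulting address with the function named on the following `IP:` line.

```python
import re
import signal

# Patterns for the two oops header lines as they appear in this report's log.
FAULT_RE = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<[0-9a-f]+>\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def decode_exit_status(status):
    """Return the signal name if a shell exit status encodes death by signal, else None."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None
    return None


def summarize_oopses(log_text):
    """Return (faulting_address, function) pairs, one per oops header found in log_text."""
    results = []
    address = None
    for line in log_text.splitlines():
        fault = FAULT_RE.search(line)
        if fault:
            address = fault.group(1)
            continue
        ip = IP_RE.search(line)
        if ip and address is not None:
            results.append((address, ip.group(1)))
            address = None  # wait for the next "BUG:" line before pairing again
    return results
```

Applied to the two oopses above, this would show the same faulting address, 0000000100000001, in both kmem_cache_alloc and kmem_cache_alloc_trace, which hints at shared slab corruption rather than a fault in the timer or timerfd stressors themselves.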

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-138-lowlatency 4.4.0-138.164
ProcVersionSignature: Ubuntu 4.4.0-138.164-lowlatency 4.4.155
Uname: Linux 4.4.0-138-lowlatency x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Oct 5 07:58 seq
 crw-rw---- 1 root audio 116, 33 Oct 5 07:58 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CurrentDmesg:

Date: Fri Oct 5 07:59:12 2018
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub
 Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. PowerEdge R310
PciMultimedia:

ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-138-lowlatency root=UUID=7b91a2b8-2e02-407e-a51d-766f6d969020 ro console=ttyS0,1152008n1
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-138-lowlatency N/A
 linux-backports-modules-4.4.0-138-lowlatency N/A
 linux-firmware 1.157.20
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/17/2011
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.8.2
dmi.board.name: 05XKKK
dmi.board.vendor: Dell Inc.
dmi.board.version: A05
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.8.2:bd08/17/2011:svnDellInc.:pnPowerEdgeR310:pvr:rvnDellInc.:rn05XKKK:rvrA05:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R310
dmi.sys.vendor: Dell Inc.

Po-Hsu Lin (cypressyew) wrote :

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1796250

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Po-Hsu Lin (cypressyew) wrote : Re: timer test in ubuntu_stress_smoke_test will cause kernel oops on X-lowlatency

Tested on a KVM guest with the 4.4.0-138 amd64 lowlatency kernel and did not see this issue.

This still needs to be tested on "gonzo".

Po-Hsu Lin (cypressyew) wrote :

This issue appears to be somewhat random: with the 4.4.0-137-lowlatency amd64 kernel on gonzo, the kernel oops happened during the sysfs and tee tests.

description: updated
description: updated

Colin Ian King (colin-king) wrote :

I believe the sysfs test is what causes the issue; the subsequent tee and timer stressors only appear to be broken because the sysfs stressor left the kernel in a mangled, broken state.

I was able to run the tee and timer stressors on a cleanly booted 4.4.0-137-lowlatency kernel without any issues. Running the sysfs stressor caused a load of oopses as follows:

[ 740.873265] iounmap: bad address ffffc90004da0000
[ 740.929503] CPU: 4 PID: 1788 Comm: stress-ng-sysfs Not tainted 4.4.0-138-lowlatency #164-Ubuntu
[ 740.929509] Hardware name: Dell Inc. PowerEdge R415/08WNM9, BIOS 1.9.3 04/26/2012
[ 740.929512] 0000000000000286 5dc48b8f33ce7222 ffff8800da813bc0 ffffffff8140dc61
[ 740.929536] ffff8800df1f9d80 ffffc90004da0000 ffff8800da813be0 ffffffff8106ec1f
[ 740.929541] ffff8800df1f9d80 ffffc90004da0000 ffff8800da813bf0 ffffffff8106ec5c
[ 740.929544] Call Trace:
[ 740.929553] [<ffffffff8140dc61>] dump_stack+0x63/0x82
[ 740.929558] [<ffffffff8106ec1f>] iounmap.part.1+0x7f/0x90
[ 740.929563] [<ffffffff8106ec5c>] iounmap+0x2c/0x30
[ 740.929572] [<ffffffff814a1bb1>] acpi_os_map_cleanup.part.9+0x31/0x40
[ 740.929587] [<ffffffff8185b71e>] acpi_os_unmap_iomem+0xbe/0xf0
[ 740.929607] [<ffffffff8155614d>] read_log+0xad/0x170
[ 740.929619] [<ffffffff81555e77>] tpm_binary_bios_measurements_open+0x37/0x90
[ 740.929626] [<ffffffff8121a7d2>] do_dentry_open+0x202/0x310
[ 740.929632] [<ffffffff81555e40>] ? tpm_ascii_bios_measurements_show+0x260/0x260
[ 740.929645] [<ffffffff8121b964>] vfs_open+0x54/0x80
[ 740.929656] [<ffffffff8122771b>] ? may_open+0x5b/0xf0
[ 740.929662] [<ffffffff8122ac36>] path_openat+0x1b6/0x13a0
[ 740.929670] [<ffffffff8106e34a>] ? __do_page_fault+0x23a/0x440
[ 740.929678] [<ffffffff8119fb91>] ? free_one_page+0x191/0x340
[ 740.929686] [<ffffffff8122d9f1>] do_filp_open+0x91/0x100
[ 740.929695] [<ffffffff8123e26e>] ? mntput_no_expire+0x2e/0x1b0
[ 740.929702] [<ffffffff8123b907>] ? __alloc_fd+0xc7/0x190
[ 740.929720] [<ffffffff8121bd38>] do_sys_open+0x138/0x2b0
[ 740.929735] [<ffffffff8121bece>] SyS_open+0x1e/0x20
[ 740.929747] [<ffffffff81864e8e>] entry_SYSCALL_64_fastpath+0x22/0xc1
[ 749.735460] BUG: unable to handle kernel paging request at 0000000100000001
[ 749.818685] IP: [<ffffffff811fa246>] kmem_cache_alloc_trace+0x76/0x210
[ 749.896706] PGD d3e68067 PUD 0
[ 749.934172] Oops: 0000 [#1] PREEMPT SMP
[ 749.981003] Modules linked in: kvm_amd kvm irqbypass ipmi_devintf ipmi_ssif dcdbas input_leds joydev amd64_edac_mod 8250_fintek k10temp fam15h_power serio_raw edac_mce_amd ipmi_si i2c_piix4 acpi_power_meter edac_core ipmi_msghandler shpchp mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel hid_generic aes_x86_64 mptsas lrw mptscsih gf128mul usbhid glue_helper ablk_helper pata_acpi hid cryptd psmouse pata_atiixp mptbase ahci bnx2 libahci scsi_transport_sas
[ 750.743813] CPU: 4 PID: 1791 Comm: stress-ng-sysfs Not tai...

Colin Ian King (colin-king) wrote :

Exercising /sys/kernel/security/tpm0/ascii_bios_measurements causes:

[ 381.545913] iounmap: bad address ffffc9000e620000
[ 381.602107] CPU: 2 PID: 1833 Comm: stress-ng-sysfs Not tainted 4.4.0-138-lowlatency #164-Ubuntu
[ 381.602109] Hardware name: Dell Inc. PowerEdge R415/08WNM9, BIOS 1.9.3 04/26/2012
[ 381.602111] 0000000000000286 fc02aae09141f60c ffff8800daa03bc0 ffffffff8140dc61
[ 381.602117] ffff88021145cbc0 ffffc9000e620000 ffff8800daa03be0 ffffffff8106ec1f
[ 381.602124] ffff88021145cbc0 ffffc9000e620000 ffff8800daa03bf0 ffffffff8106ec5c
[ 381.602127] Call Trace:
[ 381.602134] [<ffffffff8140dc61>] dump_stack+0x63/0x82
[ 381.602138] [<ffffffff8106ec1f>] iounmap.part.1+0x7f/0x90
[ 381.602140] [<ffffffff8106ec5c>] iounmap+0x2c/0x30
[ 381.602144] [<ffffffff814a1bb1>] acpi_os_map_cleanup.part.9+0x31/0x40
[ 381.602147] [<ffffffff8185b71e>] acpi_os_unmap_iomem+0xbe/0xf0
[ 381.602151] [<ffffffff8155614d>] read_log+0xad/0x170
[ 381.602153] [<ffffffff81555f07>] tpm_ascii_bios_measurements_open+0x37/0x90
[ 381.602158] [<ffffffff8121a7d2>] do_dentry_open+0x202/0x310
[ 381.602160] [<ffffffff81555ed0>] ? tpm_binary_bios_measurements_open+0x90/0x90
[ 381.602164] [<ffffffff8121b964>] vfs_open+0x54/0x80
[ 381.602166] [<ffffffff8122771b>] ? may_open+0x5b/0xf0
[ 381.602169] [<ffffffff8122ac36>] path_openat+0x1b6/0x13a0
[ 381.602172] [<ffffffff8119fb91>] ? free_one_page+0x191/0x340
[ 381.602175] [<ffffffff8122d9f1>] do_filp_open+0x91/0x100
[ 381.602179] [<ffffffff813a7050>] ? common_file_perm+0x70/0x1b0
[ 381.602181] [<ffffffff8123b907>] ? __alloc_fd+0xc7/0x190
[ 381.602185] [<ffffffff8121bd38>] do_sys_open+0x138/0x2b0
[ 381.602187] [<ffffffff8121bece>] SyS_open+0x1e/0x20
[ 381.602191] [<ffffffff81864e8e>] entry_SYSCALL_64_fastpath+0x22/0xc1

Colin Ian King (colin-king) wrote :

It appears to hang when more than one thread is concurrently opening /sys/kernel/security/tpm0/ascii_bios_measurements
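
The access pattern described here, several threads opening the same file at once, can be sketched as below. This is an illustrative sketch only, not the stress-ng sysfs stressor: `hammer_open` and its parameters are hypothetical names, and the example is meant to be pointed at a harmless file. Running it as root against the TPM file on an affected kernel may oops or hang the machine, as this report shows.

```python
import threading


def hammer_open(path, threads=4, iterations=100):
    """Open and read `path` from several threads at once; return the count of successful reads."""
    successes = 0
    lock = threading.Lock()
    start = threading.Barrier(threads)

    def worker():
        nonlocal successes
        start.wait()  # release all threads together to maximise overlapping opens
        for _ in range(iterations):
            try:
                with open(path, "rb") as handle:
                    handle.read()
            except OSError:
                continue  # unreadable or missing file: count nothing, keep going
            with lock:
                successes += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return successes
```

Python releases its interpreter lock around the open and read system calls, so the opens can overlap inside the kernel even though the Python code itself is serialised.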

Colin Ian King (colin-king) wrote :

..after this hangs, I see:

[ 261.473664] Oops: 0000 [#1] PREEMPT SMP
[ 261.520488] Modules linked in: ipmi_ssif ipmi_devintf kvm_amd kvm dcdbas irqbypass input_leds serio_raw joydev amd64_edac_mod edac_mce_amd fam15h_power edac_core k10temp ipmi_si shpchp i2c_piix4 ipmi_msghandler 8250_fintek mac_hid acpi_power_meter ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul hid_generic ghash_clmulni_intel aesni_intel mptsas aes_x86_64 pata_acpi lrw mptscsih gf128mul usbhid ahci glue_helper ablk_helper hid cryptd psmouse pata_atiixp mptbase libahci bnx2 scsi_transport_sas
[ 262.283269] CPU: 3 PID: 1846 Comm: stress-ng Not tainted 4.4.0-138-lowlatency #164-Ubuntu
[ 262.381038] Hardware name: Dell Inc. PowerEdge R415/08WNM9, BIOS 1.9.3 04/26/2012
[ 262.470483] task: ffff8800d8323900 ti: ffff8800ded34000 task.ti: ffff8800ded34000
[ 262.559926] RIP: 0010:[<ffffffff811fa502>] [<ffffffff811fa502>] kmem_cache_alloc+0x72/0x200
[ 262.660817] RSP: 0018:ffff8800ded37d18 EFLAGS: 00010206
[ 262.724260] RAX: 0000000000000000 RBX: ffff88020f630e10 RCX: 0000000022a95403
[ 262.809545] RDX: 0000000022a95203 RSI: 0000000022a95203 RDI: 000000000001a740
[ 262.894829] RBP: ffff8800ded37d48 R08: ffffffff811d8ebc R09: 0000000000000001
[ 262.980114] R10: ffff88021eff9000 R11: 0000000000000000 R12: 0000000002000200
[ 263.065397] R13: 00007ffcfd834000 R14: ffff880215003b00 R15: ffff880215003b00
[ 263.150683] FS: 00007f0599fc9700(0000) GS:ffff8802156c0000(0000) knlGS:0000000000000000
[ 263.247404] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 263.316050] CR2: 00007ffcfd834000 CR3: 00000000d83a6000 CR4: 00000000000406f0
[ 263.401334] Stack:
[ 263.425258] 00007f059881dfff ffff88020f630e10 0000000000000000 ffff8800d3d84f40
[ 263.513690] 0000000000000001 ffff8800dae95a40 ffff8800ded37da0 ffffffff811d8ebc
[ 263.602120] 0000000000000000 ffff88020ef23518 ffff88020f630e88 ffff88020ef23590
[ 263.690550] Call Trace:
[ 263.719678] [<ffffffff811d8ebc>] anon_vma_clone+0x6c/0x200
[ 263.786248] [<ffffffff811d9082>] anon_vma_fork+0x32/0x140
[ 263.851779] [<ffffffff81084554>] copy_process+0x1474/0x1c70
[ 263.919385] [<ffffffff81084ee0>] _do_fork+0x80/0x390
[ 263.979714] [<ffffffff81085299>] SyS_clone+0x19/0x20
[ 264.040041] [<ffffffff81864e8e>] entry_SYSCALL_64_fastpath+0x22/0xc1
[ 264.117009] Code: 08 48 39 f2 75 e7 48 83 78 10 00 4c 8b 28 0f 84 40 01 00 00 4d 85 ed 0f 84 37 01 00 00 49 63 46 20 49 8b 3e 48 8d 8a 00 02 00 00 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 ab 49
[ 264.343111] RIP [<ffffffff811fa502>] kmem_cache_alloc+0x72/0x200
[ 264.415924] RSP <ffff8800ded37d18>
[ 264.457529] CR2: 00007ffcfd834000
[ 264.497083] ---[ end trace 9af64fddce7496da ]---

and further opens seem to hang up.

Colin Ian King (colin-king) wrote :

..and after that last oops we get:

ubuntu@gonzo:~$ dmesg
Killed
ubuntu@gonzo:~$ top
Killed
ubuntu@gonzo:~$ ps
Killed

Colin Ian King (colin-king) wrote :

This also reproduces on the previous kernel, 4.4.0-137-lowlatency, so it is not a regression in -138.

Colin Ian King (colin-king) wrote :

It is also reproducible all the way back to 4.4.0-21-lowlatency.

Changed in linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → Medium
status: Incomplete → In Progress
summary: - timer test in ubuntu_stress_smoke_test will cause kernel oops on
+ sysfs test ubuntu_stress_smoke_test will cause kernel oops on
X-lowlatency

Colin Ian King (colin-king) wrote :

Coarse bisect: fixed between 4.8 and 4.9

Colin Ian King (colin-king) wrote :

Repeated the coarse bisect and now got something more reasonable:

4.8 - fail
4.9 - fail
4.10 - ok
4.11 - ok
4.12 - ok
4.14 - ok
4.18 - ok
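
The boundary implied by these results can be read off mechanically. The sketch below is only an illustration of how a coarse bisect narrows the range; `first_good` and `RESULTS` are hypothetical names, and the data is copied from the list above.

```python
def first_good(results):
    """Given (version, passed) pairs in release order, return (last_bad, first_good).

    Assumes the results are monotonic (every failure precedes every pass),
    which is what a bisect relies on; raises ValueError otherwise.
    """
    last_bad = None
    first_ok = None
    for version, passed in results:
        if passed:
            if first_ok is None:
                first_ok = version
        else:
            if first_ok is not None:
                raise ValueError(f"non-monotonic result at {version}")
            last_bad = version
    return last_bad, first_ok


# Results of the repeated coarse bisect, in release order.
RESULTS = [
    ("4.8", False), ("4.9", False), ("4.10", True), ("4.11", True),
    ("4.12", True), ("4.14", True), ("4.18", True),
]
```

Here `first_good(RESULTS)` gives `("4.9", "4.10")`, so the fix has to lie among the commits between those two releases, which is the range a full bisect would then search.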

Colin Ian King (colin-king) wrote :

Bisected, fixed with upstream commit:

From 748935eeb72c34368ab514a2bfdf75161768cec0 Mon Sep 17 00:00:00 2001
From: Nayna Jain <email address hidden>
Date: Mon, 14 Nov 2016 05:00:52 -0500
Subject: [PATCH] tpm: have event log use the tpm_chip

Colin Ian King (colin-king) wrote :

The number of prerequisite fixes needed before 748935eeb72c343 can be applied makes this an overly involved backport, and I doubt it will be SRU'able. The problem *only* occurs when two TPM-related interfaces are accessed as root in a fast multi-threaded race, and only on a few specific x86 devices, so the set of backport changes is very risky for such a corner case.

For now, I'm going to force the stress-ng sysfs test to skip the tpm files on older kernels to work around this issue. (Ugh).

Workaround committed in stress-ng:

http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=c7fcb4112b97188c8fcba6138b29b5c5a82938ea
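
The idea behind the workaround, skipping the TPM files on older kernels, can be sketched as follows. This is not the code from the stress-ng commit above (stress-ng is written in C, and its actual check is not reproduced here). The names `should_skip`, `kernel_version`, `FIXED_IN` and `TPM_PREFIX` are hypothetical, and the 4.10 cutoff is an assumption taken from the coarse bisect in this report.

```python
import re

# Assumption: 4.10 is treated as the first fixed release, per the coarse bisect in this report.
FIXED_IN = (4, 10)
# Assumption: the affected files live under this securityfs prefix (tpm0, tpm1, ...).
TPM_PREFIX = "/sys/kernel/security/tpm"


def kernel_version(release):
    """Parse (major, minor) from a `uname -r` style string such as '4.4.0-138-lowlatency'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def should_skip(path, release):
    """Return True if `path` is a TPM securityfs file and the kernel predates the fix."""
    return path.startswith(TPM_PREFIX) and kernel_version(release) < FIXED_IN
```

Comparing (major, minor) tuples rather than strings matters here, since a plain string comparison would order "4.10" before "4.4".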

Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Po-Hsu Lin (cypressyew) wrote :

Verified on the same node with the same kernel.
The stress smoke test can pass without any issue.
Thanks!

Changed in ubuntu-kernel-tests:
status: New → Fix Released
Po-Hsu Lin (cypressyew)
Changed in ubuntu-kernel-tests:
assignee: nobody → Colin Ian King (colin-king)
Changed in linux (Ubuntu):
status: Fix Committed → Won't Fix