Comment 9 for bug 1733662

Rod Smith (rodsmith) wrote : Re: System hang with Linux kernel 4.13, not with 4.10

Joseph, I've just tested 4.15-rc4. The script crashed while bringing CPU 9 back up, and the system became responsive to only the simplest commands. This was accompanied by the following dmesg output:

[ 166.722460] Hardware name: Cisco Systems Inc UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.10c.0.032320160820 03/23/2016
[ 166.722540] RIP: 0010:__kmalloc_track_caller+0xc5/0x210
[ 166.722578] RSP: 0000:ffffb75e8c7cbb08 EFLAGS: 00010206
[ 166.722615] RAX: 0000000000000000 RBX: 43ea0882f873c0e8 RCX: 00000000000001bf
[ 166.722663] RDX: 00000000000001be RSI: 0000000000000000 RDI: 0000000000021040
[ 166.722711] RBP: ffffb75e8c7cbb40 R08: ffff9cc35d341eaa R09: ffff9ca3ff807c00
[ 166.722757] R10: ffffb75e8c7cbd08 R11: bc159441a547de42 R12: ffff9cc35d341eaa
[ 166.722805] R13: 00000000014000c0 R14: 0000000000000007 R15: ffff9ca3ff807c00
[ 166.722852] FS: 0000000000000000(0000) GS:ffff9cc3ff240000(0000) knlGS:0000000000000000
[ 166.722905] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 166.722945] CR2: 0000000000000000 CR3: 0000001be7e09001 CR4: 00000000001606e0
[ 166.722992] Call Trace:
[ 166.723020] ? idr_alloc_cmn+0x97/0xd0
[ 166.723051] ? kstrdup_const+0x23/0x30
[ 166.723081] kstrdup+0x31/0x60
[ 166.723107] kstrdup_const+0x23/0x30
[ 166.723137] __kernfs_new_node+0x2c/0x120
[ 166.723168] kernfs_new_node+0x28/0x50
[ 166.723197] kernfs_create_dir_ns+0x34/0x90
[ 166.723229] sysfs_create_dir_ns+0x40/0x90
[ 166.723261] kobject_add_internal+0xac/0x2b0
[ 166.723294] kobject_add+0x71/0xd0
[ 166.723323] ? device_private_init+0x23/0x70
[ 166.723356] device_add+0x12c/0x680
[ 166.723385] cpu_device_create+0xe1/0x100
[ 166.723418] ? __slab_alloc+0x20/0x40
[ 166.723449] ? _cond_resched+0x19/0x40
[ 166.723481] cacheinfo_cpu_online+0x29a/0x3f0
[ 166.723515] ? get_cpu_cacheinfo+0x50/0x50
[ 166.723549] cpuhp_invoke_callback+0x9b/0x550
[ 166.723587] ? padata_replace+0xf0/0xf0
[ 166.725151] cpuhp_thread_fun+0xc4/0x150
[ 166.726682] smpboot_thread_fn+0xec/0x160
[ 166.728221] kthread+0x11e/0x140
[ 166.729701] ? sort_range+0x30/0x30
[ 166.731145] ? kthread_create_worker_on_cpu+0x70/0x70
[ 166.732551] ret_from_fork+0x1f/0x30
[ 166.733906] Code: 4d 01 e0 4d 8b 18 4d 33 99 40 01 00 00 4c 89 c3 4c 31 db 65 48 0f c7 0f 0f 94 c0 84 c0 74 ac 4d 39 d8 74 14 49 63 41 20 48 01 c3 <48> 33 1b 49 33 99 40 01 00 00 0f 18 0b 41 f7 c5 00 80 00 00 0f
[ 166.736776] RIP: __kmalloc_track_caller+0xc5/0x210 RSP: ffffb75e8c7cbb08
[ 166.738188] ---[ end trace 39ce10746b0f4324 ]---

If you want direct access to the affected hardware, that can be arranged. (If you've already got access to the certification network in 1SS, the affected system on which I've been doing most of the testing is boldore.) I'm also happy to run tests using test kernels that you give me.
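
For anyone who wants to try a similar offline/online cycle on other hardware, here is a minimal sketch. It is not the test script referred to above, which is not shown in this comment. It assumes the standard Linux sysfs CPU hotplug control file at /sys/devices/system/cpu/cpuN/online; the function names set_cpu_online and cycle_cpu are made up for illustration, and writing to the real control file requires root.

```python
from pathlib import Path

# Standard sysfs location for per-CPU hotplug control files on Linux.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def set_cpu_online(cpu: int, online: bool, root: Path = SYSFS_CPU_ROOT) -> None:
    """Write "1" or "0" to cpu<N>/online to bring a CPU up or take it down.

    Hypothetical helper for illustration. The `root` parameter exists only
    so the function can be pointed at a scratch directory instead of sysfs.
    """
    control = root / f"cpu{cpu}" / "online"
    control.write_text("1" if online else "0")


def cycle_cpu(cpu: int, root: Path = SYSFS_CPU_ROOT) -> None:
    """Take one CPU offline, then bring it back online."""
    set_cpu_online(cpu, False, root)
    set_cpu_online(cpu, True, root)
```

Calling cycle_cpu(9) as root would take CPU 9 down and then bring it back up, which is the step at which the trace above was logged. Note that cpu0 often has no online file, so a loop over all CPUs should skip any CPU whose control file is missing.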