Joseph, I've just tested 4.15-rc4. The script crashed, and the system became responsive to only the simplest commands, when bringing CPU 9 back up. This was accompanied by the following dmesg output:
[ 166.722460] Hardware name: Cisco Systems Inc UCSC-C240-M4L/UCSC-C240-M4L, BIOS C240M4.2.0.10c.0.032320160820 03/23/2016
[ 166.722540] RIP: 0010:__kmalloc_track_caller+0xc5/0x210
[ 166.722578] RSP: 0000:ffffb75e8c7cbb08 EFLAGS: 00010206
[ 166.722615] RAX: 0000000000000000 RBX: 43ea0882f873c0e8 RCX: 00000000000001bf
[ 166.722663] RDX: 00000000000001be RSI: 0000000000000000 RDI: 0000000000021040
[ 166.722711] RBP: ffffb75e8c7cbb40 R08: ffff9cc35d341eaa R09: ffff9ca3ff807c00
[ 166.722757] R10: ffffb75e8c7cbd08 R11: bc159441a547de42 R12: ffff9cc35d341eaa
[ 166.722805] R13: 00000000014000c0 R14: 0000000000000007 R15: ffff9ca3ff807c00
[ 166.722852] FS: 0000000000000000(0000) GS:ffff9cc3ff240000(0000) knlGS:0000000000000000
[ 166.722905] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 166.722945] CR2: 0000000000000000 CR3: 0000001be7e09001 CR4: 00000000001606e0
[ 166.722992] Call Trace:
[ 166.723020] ? idr_alloc_cmn+0x97/0xd0
[ 166.723051] ? kstrdup_const+0x23/0x30
[ 166.723081] kstrdup+0x31/0x60
[ 166.723107] kstrdup_const+0x23/0x30
[ 166.723137] __kernfs_new_node+0x2c/0x120
[ 166.723168] kernfs_new_node+0x28/0x50
[ 166.723197] kernfs_create_dir_ns+0x34/0x90
[ 166.723229] sysfs_create_dir_ns+0x40/0x90
[ 166.723261] kobject_add_internal+0xac/0x2b0
[ 166.723294] kobject_add+0x71/0xd0
[ 166.723323] ? device_private_init+0x23/0x70
[ 166.723356] device_add+0x12c/0x680
[ 166.723385] cpu_device_create+0xe1/0x100
[ 166.723418] ? __slab_alloc+0x20/0x40
[ 166.723449] ? _cond_resched+0x19/0x40
[ 166.723481] cacheinfo_cpu_online+0x29a/0x3f0
[ 166.723515] ? get_cpu_cacheinfo+0x50/0x50
[ 166.723549] cpuhp_invoke_callback+0x9b/0x550
[ 166.723587] ? padata_replace+0xf0/0xf0
[ 166.725151] cpuhp_thread_fun+0xc4/0x150
[ 166.726682] smpboot_thread_fn+0xec/0x160
[ 166.728221] kthread+0x11e/0x140
[ 166.729701] ? sort_range+0x30/0x30
[ 166.731145] ? kthread_create_worker_on_cpu+0x70/0x70
[ 166.732551] ret_from_fork+0x1f/0x30
[ 166.733906] Code: 4d 01 e0 4d 8b 18 4d 33 99 40 01 00 00 4c 89 c3 4c 31 db 65 48 0f c7 0f 0f 94 c0 84 c0 74 ac 4d 39 d8 74 14 49 63 41 20 48 01 c3 <48> 33 1b 49 33 99 40 01 00 00 0f 18 0b 41 f7 c5 00 80 00 00 0f
[ 166.736776] RIP: __kmalloc_track_caller+0xc5/0x210 RSP: ffffb75e8c7cbb08
[ 166.738188] ---[ end trace 39ce10746b0f4324 ]---
If you want direct access to the affected hardware, that can be arranged. (If you've already got access to the certification network in 1SS, the affected system on which I've been doing most of the testing is boldore.) I'm also happy to run tests using test kernels that you give me.