Brief Description
-----------------
STX: Kernel crash with call trace at kernel/cpuset.c:955 update_cpumasks_hier
Severity
--------
Major
Steps to Reproduce
------------------
1. The testcase boots two instances on a compute node (compute-1, with local_lvm backing). One instance uses a flavor with 27 vcpus:
| flavor:extra_specs | {"hw:cpu_policy": "dedicated", "aggregate_instance_extra_specs:storage": "local_lvm", "hw:numa_node.0": "0"} |
| flavor:vcpus | 27 |
| OS-EXT-SRV-ATTR:hostname | tenant2-vm-42 |
| OS-EXT-SRV-ATTR:instance_name | instance-0000002f |
| wrs-res:topology | node:0, 1024MB, pgsize:2M, 1s,27c,1t, vcpus:0-26, pcpus:7,16,52,8,44,50,14,51,15,10,46,11,47,42,6,41,5,9,45,48,12,49,13,17,53,40,4, pol:ded, thr:pre |
nova --os-username 'tenant2' --os-password 'Li69nux*' --os-project-name tenant2 --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-region-name RegionOne boot --image 30907c74-c29d-4b6c-9c3a-6c6900696e45 --flavor 55218cee-4e9e-4925-bee1-308de6ac8493 --key-name keypair-tenant2 --availability-zone nova:compute-1 --nic net-id=10107d29-d294-4caa-98e0-0d6d2d80e3a9,vif-model=virtio --nic net-id=a838c71b-0936-46b2-b15e-959d23001a08,vif-model=virtio tenant2-vm-42 --poll
2. Both instances were then deleted successfully:
[2018-10-14 01:09:09,844] 2323 INFO MainThread vm_helper.delete_vms:: Deleting vm(s): ['76308c5d-dacb-43a7-a91e-3c932b27adc5', 'ad10817f-350c-47d5-81c5-ddc1a345a855']
...
[2018-10-14 01:09:41,340] 2395 INFO MainThread vm_helper.delete_vms:: VM(s) deleted successfully: ['76308c5d-dacb-43a7-a91e-3c932b27adc5', 'ad10817f-350c-47d5-81c5-ddc1a345a855']
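As a sanity check on the topology reported in step 1, the following minimal sketch parses the wrs-res:topology string and confirms that the number of pinned pcpus matches the number of vcpus. The function name parse_topology is hypothetical, and the sketch assumes the vcpus field always has the range form "first-last" seen in this report.

```python
import re

# Topology string copied from the instance details in step 1.
TOPOLOGY = (
    "node:0, 1024MB, pgsize:2M, 1s,27c,1t, vcpus:0-26, "
    "pcpus:7,16,52,8,44,50,14,51,15,10,46,11,47,42,6,41,5,9,45,48,12,49,13,"
    "17,53,40,4, pol:ded, thr:pre"
)


def parse_topology(topology):
    """Return (vcpus, pcpus) as lists of ints from a topology string.

    Hypothetical helper: assumes 'vcpus:first-last' and a comma-separated
    'pcpus:' list, as in the string above.
    """
    vcpu_match = re.search(r"vcpus:(\d+)-(\d+)", topology)
    pcpu_match = re.search(r"pcpus:([\d,]+)", topology)
    if not vcpu_match or not pcpu_match:
        raise ValueError("topology string lacks vcpus or pcpus field")
    first, last = int(vcpu_match.group(1)), int(vcpu_match.group(2))
    vcpus = list(range(first, last + 1))
    # The captured group may end with a trailing comma; skip empty items.
    pcpus = [int(p) for p in pcpu_match.group(1).split(",") if p]
    return vcpus, pcpus


vcpus, pcpus = parse_topology(TOPOLOGY)
print(len(vcpus), len(pcpus), len(set(pcpus)))
```

If the three printed counts are equal, each vcpu is pinned to a distinct pcpu, which is what the dedicated policy in the flavor requests.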
Expected Behavior
------------------
Did not expect kernel crash on compute-1 in step 2
Actual Behavior
----------------
Kernel crashes on compute-1; see the Call Trace below.
See kern.log from 2018-10-14T01:05:53 to 2018-10-14T01:07:33.366, after the launch of the instance with 27 vcpus (step 1 above).
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009394] ------------[ cut here ]------------
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009406] WARNING: CPU: 0 PID: 55779 at kernel/cpuset.c:955 update_cpumasks_hier+0x3d1/0x430
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009407] Modules linked in: dm_snapshot cuse fuse xt_REDIRECT nf_nat_redirect ip6table_raw ip6table_mangle xt_nat xt_conntrack xt_mark xt_connmark iptable_raw xt_comment iptable_nat xt_CHECKSUM iptable_mangle nbd ebtable_filter ebtables tun openvswitch nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack binfmt_misc virtio_net nfsd auth_rpcgss nfsv3 nfs_acl nfs lockd grace fscache cls_u32 sch_sfq sch_htb ip6table_filter ip6_tables iptable_filter sunrpc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c dm_mod iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd joydev lpc_ich mei_me mei i2c_i801 ioatdma ipmi_si ipmi_devintf ipmi_msghandler tpm_crb(O) acpi_power_meter ixgbevf(O) vfio_pci vfio_iommu_type1 vfio irqbypass ip_tables ext4 mbcache jbd2 xprtrdma(O) svcrdma(O) rpcrdma(O) nvmet_rdma(O) nvme_rdma(O) mlx4_en(O) ib_srp(O) ib_isert(O) ib_iser(O) rdma_rxe(O) mlx5_ib(O) sd_mod crc_t10dif crct10dif_generic mlx4_ib(O) mlx4_core(O) rdma_ucm(O) rdma_cm(O) iw_cm(O) ib_ucm(O) ib_uverbs(O) ib_cm(O) ib_core(O) crct10dif_pclmul crct10dif_common crc32c_intel ixgbe(O) igb i2c_algo_bit i2c_core ahci libahci mlx5_core(O) mlxfw(O) mlx_compat(O) devlink dca tpm_tis(O) tpm_tis_core(O) tpm(O) i40e(O) e1000e(O)
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009470] CPU: 0 PID: 55779 Comm: nova-compute Kdump: loaded Tainted: G O ------------ 3.10.0-862.11.6.el7.36.tis.x86_64 #1
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009472] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0022.062820171903 06/28/2017
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009473] Call Trace:
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009480] [<ffffffff9d7b4039>] dump_stack+0x19/0x1b
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009487] [<ffffffff9d079a38>] __warn+0xd8/0x100
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009488] [<ffffffff9d079b7d>] warn_slowpath_null+0x1d/0x20
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009490] [<ffffffff9d109af1>] update_cpumasks_hier+0x3d1/0x430
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009495] [<ffffffff9d1c8bfe>] ? __kmalloc+0x2e/0x2a0
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009497] [<ffffffff9d10534c>] ? cgroup_task_count+0x4c/0x60
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009501] [<ffffffff9d31d33c>] ? heap_init+0x1c/0x50
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009503] [<ffffffff9d109fea>] cpuset_write_resmask+0x49a/0x9c0
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009504] [<ffffffff9d109b50>] ? update_cpumasks_hier+0x430/0x430
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009506] [<ffffffff9d101a94>] cgroup_file_write+0x1d4/0x2d0
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009510] [<ffffffff9d1e4738>] ? __sb_start_write+0x58/0x110
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009516] [<ffffffff9d2a71a7>] ? security_file_permission+0x27/0xa0
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009520] [<ffffffff9d1e17c0>] vfs_write+0xc0/0x1f0
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009521] [<ffffffff9d1e25ef>] SyS_write+0x7f/0xf0
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009526] [<ffffffff9d7c1fdb>] system_call_fastpath+0x22/0x27
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009527] ---[ end trace 6293f46623053f31 ]---
2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009628] ------------[ cut here ]------------
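To find how many such traces kern.log contains over the reported time window, a small script along these lines could list each WARNING header. This is only a sketch: the function name find_kernel_warnings is hypothetical, and the pattern assumes every header has the same layout as the one in the excerpt above.

```python
import re

# Matches a kernel WARNING header as it appears in the excerpt above.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>\w+)\+"
)


def find_kernel_warnings(lines):
    """Return one dict per WARNING header found in the given log lines."""
    hits = []
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            hits.append({
                # The first whitespace-delimited field is the log timestamp.
                "timestamp": text.split(" ", 1)[0],
                "cpu": int(match.group("cpu")),
                "pid": int(match.group("pid")),
                "file": match.group("file"),
                "line": int(match.group("line")),
                "symbol": match.group("symbol"),
            })
    return hits


# Header line copied from the kern.log excerpt in this report.
SAMPLE = [
    "2018-10-14T01:05:53.803 compute-1 kernel: warning [ 9284.009406] "
    "WARNING: CPU: 0 PID: 55779 at kernel/cpuset.c:955 "
    "update_cpumasks_hier+0x3d1/0x430",
]

for hit in find_kernel_warnings(SAMPLE):
    print(hit)
```

To run it against the real log, pass an open file object for kern.log instead of SAMPLE; the path to that file on compute-1 is not given in this report.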
Reproducibility
---------------
System Configuration
--------------------
2 controller, 2 compute
Branch/Pull Time/Commit
-----------------------
Master as of: 2018-10-12_20-18-00
Timestamp/Logs
--------------
see inline
Requested more info from the reporter:
Aside from the kernel traceback in the logs, what was the system impact? I assume the compute node did not reboot.