Brief Description
-----------------
Bootstrap failed at the 'Saving config in sysinv database' task because '/sys/devices/system/node/node1' does not exist.
Severity
--------
Bootstrap interrupted
Steps to Reproduce
------------------
# ansible-playbook /usr/share/ansible/stx-ansible/playbooks/bootstrap.yml -vvvv
(after several minutes)
TASK [bootstrap/persist-config : debug] *****************************************************************************************************************************
task path: /usr/share/ansible/stx-ansible/playbooks/roles/bootstrap/persist-config/tasks/update_sysinv_database.yml:71
ok: [localhost] => {
"populate_result": {
"changed": true,
"failed": false,
"failed_when_result": false,
"msg": "non-zero return code",
"rc": 1,
"stderr": "No handlers could be found for logger \"cgtsclient.common.http\"\nTraceback (most recent call last):\n File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 1046, in <module>\n inventory_config_complete_wait(client, controller)\n File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 1000, in inventory_config_complete_wait\n wait_initial_inventory_complete(client, controller)\n File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 991, in wait_initial_inventory_complete\n raise ConfigFail('Timeout waiting for controller inventory '\n__main__.ConfigFail: Timeout waiting for controller inventory completion\n",
"stderr_lines": [
"No handlers could be found for logger \"cgtsclient.common.http\"",
"Traceback (most recent call last):",
" File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 1046, in <module>",
" inventory_config_complete_wait(client, controller)",
" File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 1000, in inventory_config_complete_wait",
" wait_initial_inventory_complete(client, controller)",
" File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 991, in wait_initial_inventory_complete",
" raise ConfigFail('Timeout waiting for controller inventory '",
"__main__.ConfigFail: Timeout waiting for controller inventory completion"
],
"stdout": "Populating system config...\nSystem type is All-in-one\nSystem config completed.\nPopulating load config...\nLoad config completed.\nPopulating management network...\nPopulating pxeboot network...\nPopulating oam network...\nPopulating multicast network...\nPopulating cluster host network...\nPopulating cluster pod network...\nPopulating cluster service network...\nNetwork config completed.\nPopulating/Updating DNS config...\nDNS config completed.\nManagement mac = 00:00:00:00:00:00\nRoot fs device = /dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0\nBoot device = /dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0\nConsole = tty0\nTboot = false\nInstall output = text\nHost values = {'tboot': 'false', 'install_output': 'text', 'rootfs_device': '/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0', 'boot_device': '/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0', 'availability': 'offline', 'mgmt_mac': '00:00:00:00:00:00', 'console': 'tty0', 'mgmt_ip': '192.168.204.3', 'hostname': 'controller-0', 'operational': 'disabled', 'invprovision': 'provisioning', 'administrative': 'locked', 'personality': 'controller'}\nHost controller-0 created.\nFailed to update the initial system config.\n",
"stdout_lines": [
"Populating system config...",
"System type is All-in-one",
"System config completed.",
"Populating load config...",
"Load config completed.",
"Populating management network...",
"Populating pxeboot network...",
"Populating oam network...",
"Populating multicast network...",
"Populating cluster host network...",
"Populating cluster pod network...",
"Populating cluster service network...",
"Network config completed.",
"Populating/Updating DNS config...",
"DNS config completed.",
"Management mac = 00:00:00:00:00:00",
"Root fs device = /dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0",
"Boot device = /dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0",
"Console = tty0",
"Tboot = false",
"Install output = text",
"Host values = {'tboot': 'false', 'install_output': 'text', 'rootfs_device': '/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0', 'boot_device': '/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0', 'availability': 'offline', 'mgmt_mac': '00:00:00:00:00:00', 'console': 'tty0', 'mgmt_ip': '192.168.204.3', 'hostname': 'controller-0', 'operational': 'disabled', 'invprovision': 'provisioning', 'administrative': 'locked', 'personality': 'controller'}",
"Host controller-0 created.",
"Failed to update the initial system config."
]
}
}
Read vars_file 'vars/common/main.yml'
Read vars_file 'host_vars/bootstrap/default.yml'
TASK [bootstrap/persist-config : Fail if populate config script throws an exception] ********************************************************************************
task path: /usr/share/ansible/stx-ansible/playbooks/roles/bootstrap/persist-config/tasks/update_sysinv_database.yml:73
fatal: [localhost]: FAILED! => {
"changed": false,
"msg": "Failed to provision initial system configuration."
}
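The ConfigFail above is raised by a polling loop that gives up after a fixed wait: inventory never completes because the sysinv agent keeps crashing (see the log below), so the playbook times out. A minimal sketch of that wait-with-timeout pattern (hypothetical names and values, not the actual StarlingX code):

```python
import time


class ConfigFail(Exception):
    """Mirrors the exception type seen in the traceback above."""


def wait_for(predicate, timeout=1200, interval=10):
    # Poll `predicate` until it returns True, or raise ConfigFail once
    # `timeout` seconds have elapsed. Sketch only; the real timeout and
    # interval values in populate_initial_config.py may differ.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(interval)
    raise ConfigFail('Timeout waiting for controller inventory completion')
```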
# vim /var/log/sysinv.log
sysinv 2021-01-20 02:17:11.326 76126 INFO sysinv.conductor.manager [-] Cannot generate the configuration for controller-0, the host is not inventoried yet.
sysinv 2021-01-20 02:17:11.335 76126 INFO sysinv.agent.rpcapi [-] config_apply_runtime_manifest: fanout_cast: sending config 7ec0a704-6f60-43e1-9351-92b880c89385 {'classes': ['platform::compute::grub::runtime', 'platform::compute::config::runtime'], 'force': False, 'personalities': ['controller', 'worker'], 'host_uuids': [u'121a5e37-d4b1-4c67-aa4f-a43e739d014a']} to agent
sysinv 2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task [-] Error during AgentManager._agent_audit: [Errno 2] No such file or directory: '/sys/devices/system/node/node1/hugepages': OSError: [Errno 2] No such file or directory: '/sys/devices/system/node/node1/hugepages'
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task Traceback (most recent call last):
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/periodic_task.py", line 180, in run_periodic_tasks
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task task(self, context)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1124, in _agent_audit
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task force_updates=None)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task return f(*args, **kwargs)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1143, in agent_audit
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task self.ihost_inv_get_and_report(icontext)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 866, in ihost_inv_get_and_report
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task imemory = self._inode_operator.inodes_get_imemory()
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/node.py", line 562, in inodes_get_imemory
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task imemory = self._inode_get_memory_hugepages()
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/node.py", line 343, in _inode_get_memory_hugepages
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task subdirs = self._get_immediate_subdirs(hugepages)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/node.py", line 275, in _get_immediate_subdirs
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task return [name for name in listdir(dir)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task OSError: [Errno 2] No such file or directory: '/sys/devices/system/node/node1/hugepages'
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task
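The crash happens in `_get_immediate_subdirs`, which calls `listdir()` on a path that may not exist. A defensive variant would tolerate the missing directory instead of killing the periodic task; a minimal sketch (not the actual sysinv code, just the pattern):

```python
import os


def get_immediate_subdirs(path):
    # Return the immediate subdirectories of `path`, sorted. Unlike the
    # listdir() call in the traceback above, a missing directory yields an
    # empty list instead of raising OSError/FileNotFoundError.
    if not os.path.isdir(path):
        return []
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )
```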
# ls /sys/devices/system/node/
has_cpu has_memory has_normal_memory node0 online possible power uevent
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: HygonGenuine
CPU family: 24
Model: 0
Model name: Hygon C86 7185 32-core Processor
Stepping: 1
CPU MHz: 2000.000
BogoMIPS: 4000.00
Hypervisor vendor: VMware
Virtualization type: full
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc nopl tsc_reliable cpuid pni ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw pti ibpb fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt xsaveopt xsavec xsaves clzero arat overflow_recov succor
Expected Behavior
------------------
sysinv should detect only one NUMA node, matching the lscpu output (NUMA node(s): 1) and the single node0 entry under /sys/devices/system/node/.
Actual Behavior
----------------
sysinv detected 4 NUMA nodes (apparently one per CPU socket), then crashed trying to read '/sys/devices/system/node/node1', which does not exist.
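One robust way to get the node count is to enumerate the node directories that actually exist in sysfs rather than inferring it from the socket count. A sketch of that approach (illustrative only, not the sysinv implementation):

```python
import glob
import os
import re


def numa_node_ids(sysfs_root='/sys/devices/system/node'):
    # Enumerate NUMA node ids from the nodeN directories actually present
    # in sysfs, instead of deriving a count from the number of CPU sockets
    # (which is 4 here while only node0 exists).
    nodes = []
    for path in glob.glob(os.path.join(sysfs_root, 'node[0-9]*')):
        m = re.fullmatch(r'node(\d+)', os.path.basename(path))
        if m:
            nodes.append(int(m.group(1)))
    return sorted(nodes)
```

On the machine above this would return [0], so no code path would ever touch node1.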
Reproducibility
---------------
100%
System Configuration
--------------------
All
Branch/Pull Time/Commit
-----------------------
http://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/centos/flock/outputs/iso/
Last Pass
---------
Likely never
Timestamp/Logs
--------------
See above
Test Activity
-------------
Developer Testing
Workaround
----------
None
This issue also reproduces on our master image 20210510T040411Z from today.