stx4.0:Ansible-playbook failed while Saving config in sysinv database

Bug #1912421 reported by HongyiChang
This bug affects 3 people
Affects: StarlingX
Status: Won't Fix
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

Brief Description
-----------------
Bootstrap failed at 'Saving config in sysinv database' because '/sys/devices/system/node/node1' was not found.

Severity
--------
Bootstrap interrupted

Steps to Reproduce
------------------
# ansible-playbook /usr/share/ansible/stx-ansible/playbooks/bootstrap.yml -vvvv
(after several minutes)
TASK [bootstrap/persist-config : debug] *****************************************************************************************************************************
task path: /usr/share/ansible/stx-ansible/playbooks/roles/bootstrap/persist-config/tasks/update_sysinv_database.yml:71
ok: [localhost] => {
    "populate_result": {
        "changed": true,
        "failed": false,
        "failed_when_result": false,
        "msg": "non-zero return code",
        "rc": 1,
        "stderr": "No handlers could be found for logger \"cgtsclient.common.http\"\nTraceback (most recent call last):\n File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 1046, in <module>\n inventory_config_complete_wait(client, controller)\n File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 1000, in inventory_config_complete_wait\n wait_initial_inventory_complete(client, controller)\n File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 991, in wait_initial_inventory_complete\n raise ConfigFail('Timeout waiting for controller inventory '\n__main__.ConfigFail: Timeout waiting for controller inventory completion\n",
        "stderr_lines": [
            "No handlers could be found for logger \"cgtsclient.common.http\"",
            "Traceback (most recent call last):",
            " File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 1046, in <module>",
            " inventory_config_complete_wait(client, controller)",
            " File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 1000, in inventory_config_complete_wait",
            " wait_initial_inventory_complete(client, controller)",
            " File \"/tmp/.ansible-root/tmp/ansible-tmp-1611108390.09-174241063462522/populate_initial_config.py\", line 991, in wait_initial_inventory_complete",
            " raise ConfigFail('Timeout waiting for controller inventory '",
            "__main__.ConfigFail: Timeout waiting for controller inventory completion"
        ],
        "stdout": "Populating system config...\nSystem type is All-in-one\nSystem config completed.\nPopulating load config...\nLoad config completed.\nPopulating management network...\nPopulating pxeboot network...\nPopulating oam network...\nPopulating multicast network...\nPopulating cluster host network...\nPopulating cluster pod network...\nPopulating cluster service network...\nNetwork config completed.\nPopulating/Updating DNS config...\nDNS config completed.\nManagement mac = 00:00:00:00:00:00\nRoot fs device = /dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0\nBoot device = /dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0\nConsole = tty0\nTboot = false\nInstall output = text\nHost values = {'tboot': 'false', 'install_output': 'text', 'rootfs_device': '/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0', 'boot_device': '/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0', 'availability': 'offline', 'mgmt_mac': '00:00:00:00:00:00', 'console': 'tty0', 'mgmt_ip': '192.168.204.3', 'hostname': 'controller-0', 'operational': 'disabled', 'invprovision': 'provisioning', 'administrative': 'locked', 'personality': 'controller'}\nHost controller-0 created.\nFailed to update the initial system config.\n",
        "stdout_lines": [
            "Populating system config...",
            "System type is All-in-one",
            "System config completed.",
            "Populating load config...",
            "Load config completed.",
            "Populating management network...",
            "Populating pxeboot network...",
            "Populating oam network...",
            "Populating multicast network...",
            "Populating cluster host network...",
            "Populating cluster pod network...",
            "Populating cluster service network...",
            "Network config completed.",
            "Populating/Updating DNS config...",
            "DNS config completed.",
            "Management mac = 00:00:00:00:00:00",
            "Root fs device = /dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0",
            "Boot device = /dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0",
            "Console = tty0",
            "Tboot = false",
            "Install output = text",
            "Host values = {'tboot': 'false', 'install_output': 'text', 'rootfs_device': '/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0', 'boot_device': '/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:0:0', 'availability': 'offline', 'mgmt_mac': '00:00:00:00:00:00', 'console': 'tty0', 'mgmt_ip': '192.168.204.3', 'hostname': 'controller-0', 'operational': 'disabled', 'invprovision': 'provisioning', 'administrative': 'locked', 'personality': 'controller'}",
            "Host controller-0 created.",
            "Failed to update the initial system config."
        ]
    }
}
Read vars_file 'vars/common/main.yml'
Read vars_file 'host_vars/bootstrap/default.yml'

TASK [bootstrap/persist-config : Fail if populate config script throws an exception] ********************************************************************************
task path: /usr/share/ansible/stx-ansible/playbooks/roles/bootstrap/persist-config/tasks/update_sysinv_database.yml:73
fatal: [localhost]: FAILED! => {
    "changed": false,
    "msg": "Failed to provision initial system configuration."
}
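
For context, the timeout above comes from a polling loop in populate_initial_config.py (lines 991-1000 of the traceback). A minimal sketch of that pattern, with hypothetical client and attribute names where the script is not quoted above:

import time

class ConfigFail(Exception):
    pass

def wait_initial_inventory_complete(client, host, timeout=1800, interval=10):
    # Poll sysinv until the controller reports its initial inventory.
    # If the agent audit keeps crashing (see sysinv.log below), the host
    # never becomes inventoried and the timeout fires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        host = client.ihost.get(host.uuid)  # hypothetical cgtsclient lookup
        if getattr(host, 'inv_state', None) == 'inventoried':
            return
        time.sleep(interval)
    raise ConfigFail('Timeout waiting for controller inventory completion')

In other words, the Ansible failure is only a symptom; the root cause is in the agent audit shown in sysinv.log below.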
# vim /var/log/sysinv.log
sysinv 2021-01-20 02:17:11.326 76126 INFO sysinv.conductor.manager [-] Cannot generate the configuration for controller-0, the host is not inventoried yet.
sysinv 2021-01-20 02:17:11.335 76126 INFO sysinv.agent.rpcapi [-] config_apply_runtime_manifest: fanout_cast: sending config 7ec0a704-6f60-43e1-9351-92b880c89385 {'classes': ['platform::compute::grub::runtime', 'platform::compute::config::runtime'], 'force': False, 'personalities': ['controller', 'worker'], 'host_uuids': [u'121a5e37-d4b1-4c67-aa4f-a43e739d014a']} to agent
sysinv 2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task [-] Error during AgentManager._agent_audit: [Errno 2] No such file or directory: '/sys/devices/system/node/node1/hugepages': OSError: [Errno 2] No such file or directory: '/sys/devices/system/node/node1/hugepages'
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task Traceback (most recent call last):
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/periodic_task.py", line 180, in run_periodic_tasks
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task task(self, context)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1124, in _agent_audit
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task force_updates=None)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task return f(*args, **kwargs)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1143, in agent_audit
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task self.ihost_inv_get_and_report(icontext)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 866, in ihost_inv_get_and_report
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task imemory = self._inode_operator.inodes_get_imemory()
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/node.py", line 562, in inodes_get_imemory
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task imemory = self._inode_get_memory_hugepages()
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/node.py", line 343, in _inode_get_memory_hugepages
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task subdirs = self._get_immediate_subdirs(hugepages)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/agent/node.py", line 275, in _get_immediate_subdirs
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task return [name for name in listdir(dir)
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task OSError: [Errno 2] No such file or directory: '/sys/devices/system/node/node1/hugepages'
2021-01-20 02:17:11.346 70080 ERROR sysinv.openstack.common.periodic_task
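
The crash is an unguarded os.listdir() in _get_immediate_subdirs(): the agent assumes each expected NUMA node directory (and its hugepages/ subdirectory) exists. A minimal sketch of the failing pattern and a guarded variant, with hypothetical function names (sysinv/agent/node.py is only partially quoted above):

import os

SYSFS_NODE_DIR = '/sys/devices/system/node'

def get_immediate_subdirs(path):
    # Failing pattern from the traceback: os.listdir() raises
    # OSError [Errno 2] when 'path' itself does not exist.
    return [name for name in os.listdir(path)
            if os.path.isdir(os.path.join(path, name))]

def get_hugepage_sizes_per_node():
    # Guarded variant: enumerate only the nodeN directories that actually
    # exist in sysfs instead of assuming a node count, and skip nodes
    # without a hugepages/ subdirectory.
    sizes = {}
    for entry in sorted(os.listdir(SYSFS_NODE_DIR)):
        if not (entry.startswith('node') and entry[4:].isdigit()):
            continue
        hugepages = os.path.join(SYSFS_NODE_DIR, entry, 'hugepages')
        if os.path.isdir(hugepages):
            sizes[entry] = get_immediate_subdirs(hugepages)
    return sizes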
# ls /sys/devices/system/node/
has_cpu has_memory has_normal_memory node0 online possible power uevent
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: HygonGenuine
CPU family: 24
Model: 0
Model name: Hygon C86 7185 32-core Processor
Stepping: 1
CPU MHz: 2000.000
BogoMIPS: 4000.00
Hypervisor vendor: VMware
Virtualization type: full
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc nopl tsc_reliable cpuid pni ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw pti ibpb fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt xsaveopt xsavec xsaves clzero arat overflow_recov succor
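
The mismatch between the socket count and the real NUMA topology can be confirmed directly from sysfs; a small standard-library diagnostic sketch:

import glob
import os

# NUMA node directories the kernel actually exposes (here: only node0).
nodes = sorted(glob.glob('/sys/devices/system/node/node[0-9]*'))
print('NUMA nodes in sysfs:', [os.path.basename(n) for n in nodes])

# 'online' holds the online node list, e.g. '0' for a single node.
with open('/sys/devices/system/node/online') as f:
    print('online nodes:', f.read().strip())

On this VM, lscpu reports Socket(s): 4 but NUMA node(s): 1, so any code that derives the node count from the socket count will look for node1 through node3 and fail.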

Expected Behavior
------------------
Only one NUMA node should be found, matching the actual topology.

Actual Behavior
----------------
sysinv detected 4 NUMA nodes and tried to read /sys/devices/system/node/node1, which does not exist.

Reproducibility
---------------
100%

System Configuration
--------------------
All

Branch/Pull Time/Commit
-----------------------
http://mirror.starlingx.cengn.ca/mirror/starlingx/release/latest_release/centos/flock/outputs/iso/

Last Pass
---------
Likely never

Timestamp/Logs
--------------
See above

Test Activity
-------------
Developer Testing

Workaround
----------
None

Tags: stx.config
summary: Ansible-playbook failed while Saving config in sysinv database → stx4.0:Ansible-playbook failed while Saving config in sysinv database
Ghada Khalil (gkhalil)
tags: added: stx.config
Revision history for this message
Alexandru Dimofte (adimofte) wrote :

I see this issue on today's master image 20210510T040411Z as well.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

screening: marking as low priority due to lack of activity

Changed in starlingx:
importance: Undecided → Low
status: New → Triaged
Revision history for this message
Peter Pikna (teraformer) wrote :

I have this issue too. I tried it with the image releases 5.0.1/ (21-Sep-2021 18:27) and 6.0.0/ (26-Jan-2022 19:34). Both failed with this issue.

Revision history for this message
Ramaswamy Subramanian (rsubrama) wrote :

No progress on this bug for more than 2 years. Candidate for closure.

If there is no update, this issue is targeted to be closed as 'Won't Fix' in 2 weeks.

Revision history for this message
John Kung (john-kung) wrote :

As per the prior comment, this is now closed as Won't Fix.

Changed in starlingx:
status: Triaged → Won't Fix