Activity log for bug #2059402

Date Who What changed Old value New value Message
2024-03-28 09:30:13 Andrew Vaillancourt bug added bug
2024-03-28 09:30:13 Andrew Vaillancourt attachment added sysinv.log.1 https://bugs.launchpad.net/bugs/2059402/+attachment/5760201/+files/sysinv.log.1
2024-03-28 17:01:23 Ghada Khalil description Brief Description ----------------- Modifying the HTTP port via 'system service-parameter-modify http config http_port="8887"' fails somewhat silently in that it appears to be successful (no error message upon setting the new HTTP port as '8887' and service-parameter-list showing port as '8887' after issuing port modify command) but config-out-of-date alarm does NOT clear until setting the port back to 8080. Appears to be related to a parsing issue of a list as a dict as indicated by ERROR tracebacks following the failure in sysinv: https://opendev.org/starlingx/config/src/commit/ecdb0d3b9fa33b369830d0845fcef6a8b75d0624/sysinv/sysinv/sysinv/sysinv/conductor/manager.py#L11537 See ERROR tracebacks from /var/log/sysinv.log in 'Timestamps/Logs' section. Severity -------- Major Steps to Reproduce ------------------ .... TC-name: testcases/wrcp/regression/networking/test_calico_network_policy.py::test_calico_network_policy 1. system service-parameter-modify http config http_port="8887" 2. config out of date alarms appear for both controllers and do not clear 3. test teardown includes setting port back to original '8080' Note: these are only 3 highlighted steps. Must refer to testcase for full procedure which may be relevant to recovery / and creating necessary conditions for failure. 
Expected Behavior
------------------
Able to change http_port; the config-out-of-date alarms on both controllers clear.

Actual Behavior
----------------
The system appears to successfully change http_port:

system service-parameter-modify http config http_port="8887"
+-------------+--------------------------------------+
| Property    | Value                                |
+-------------+--------------------------------------+
| uuid        | 8ecf412c-5361-4155-bc71-d321675daafd |
| service     | http                                 |
| section     | config                               |
| name        | http_port                            |
| value       | 8887                                 |
| personality | None                                 |
| resource    | None                                 |
+-------------+--------------------------------------+

The new port '8887' is reflected in system service-parameter-list. The configuration does not succeed and the alarms do not clear (given upwards of 30m for config to clear, 3/3 times). Given that sysinv reports the operation as a failure, the alarm not clearing without intervention is not surprising.

Reproducibility
---------------
Reproducible 8/8. Reviewing test case history, this has failed with the same sysinv error, "Change of system parameter HTTP failed", on all STX loads since 2024-03-06_19-00-09 on AIO-DX and AIO-SX.
Could not find any other test runs on standard labs: Lab / # repros: - R750_003_004 5 - WRCP_SX_014 3 System Configuration -------------------- AIO-DX - IPv4 Lab-name: r750_003_004 Branch/Pull Time/Commit ----------------------- 2024-03-20_19-00-10 Last Pass --------- 2024-03-06_19-00-09 Timestamp/Logs -------------- sysinv 2024-03-26 18:00:58.663 450446 INFO sysinv.conductor.manager [-] Change of system parameter HTTP failed, error: {"class": "SysinvException", "module": "sysinv.common.exception", "message": "Failed to execute runtime manifest for host controller-0", "tb": ["Traceback (most recent call last):\n", " File \"/usr/lib/python3/dist-packages/sysinv/puppet/common.py\", line 93, in puppet_apply_manifest\n subprocess.check_call(cmd, stdout=fnull, stderr=fnull) # pylint: disable=not-callable\n", " File \"/usr/lib/python3.9/subprocess.py\", line 373, in check_call\n raise CalledProcessError(retcode, cmd)\n", "subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/var/run/platform/puppet/24.03/hieradata', 'controller-0', 'controller', 'runtime', '/tmp/tmptrxxtjto.yaml']' returned non-zero exit status 1.\n", "\nDuring handling of the above exception, another exception occurred:\n\n", "Traceback (most recent call last):\n", " File \"/usr/lib/python3/dist-packages/sysinv/agent/manager.py\", line 1976, in config_apply_runtime_manifest\n self._apply_runtime_manifest(config_dict, hieradata_path=hieradata_path)\n", " File \"/usr/lib/python3/dist-packages/sysinv/agent/manager.py\", line 2047, in _apply_runtime_manifest\n puppet.puppet_apply_manifest(self._hostname,\n", " File \"/usr/lib/python3/dist-packages/sysinv/puppet/common.py\", line 98, in puppet_apply_manifest\n raise exception.SysinvException(_(msg))\n", "sysinv.common.exception.SysinvException: Failed to execute runtime manifest for host controller-0\n"], "args": ["Failed to execute runtime manifest for host controller-0"], "kwargs": {"code": 500}} sysinv 2024-03-26 
18:00:59.692 5461 ERROR sysinv.puppet.common [-] Failed to execute runtime manifest for host controller-1: subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/opt/platform/puppet/24.03/hieradata', 'controller-1', 'controller', 'runtime', '/tmp/tmpwr_24mu0.yaml']' returned non-zero exit status 1. <snip> sysinv 2024-03-26 18:06:51.413 5673 INFO sysinv.agent.manager [-] Agent config applied 6349673d-d26c-48c9-a8fa-27bc7ccb48cd sysinv 2024-03-26 18:06:51.477 5673 INFO sysinv.agent.manager [-] Caught exception _retry_on_config_exception. Retrying... Exception: Remote error: AttributeError 'list' object has no attribute 'get' Traceback (most recent call last): File "/usr/lib/python3/dist-packages/zerorpc/core.py", line 167, in _async_task functor.pattern.process_call(self._context, bufchan, event, functor) File "/usr/lib/python3/dist-packages/zerorpc/patterns.py", line 30, in process_call result = functor(*req_event.args, **req_event.kwargs) File "/usr/lib/python3/dist-packages/zerorpc/decorators.py", line 44, in __call__ return self._functor(*args, **kargs) File "/usr/lib/python3/dist-packages/sysinv/zmq_rpc/zmq_rpc.py", line 42, in method retval = getattr(self.target, func)(context, **kwargs) File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 6519, in iconfig_update_by_ihost self.report_config_status(context, config_dict, status, error) File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 10339, in report_config_status success = _process_config_report( File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 10226, in _process_config_report callback_success(*callback_success_args) File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 11379, in report_sysparam_http_update_success for helmrepo in helmrepo_list.get("items"): AttributeError: 'list' object has no attribute 'get' Test Activity ------------- Regression Testing Brief Description ----------------- 
Modifying the HTTP port via 'system service-parameter-modify http config http_port="8887"' fails somewhat silently in that it appears to be successful (no error message upon setting the new HTTP port as '8887' and service-parameter-list showing port as '8887' after issuing port modify command) but config-out-of-date alarm does NOT clear until setting the port back to 8080. Appears to be related to a parsing issue of a list as a dict as indicated by ERROR tracebacks following the failure in sysinv: https://opendev.org/starlingx/config/src/commit/ecdb0d3b9fa33b369830d0845fcef6a8b75d0624/sysinv/sysinv/sysinv/sysinv/conductor/manager.py#L11537 See ERROR tracebacks from /var/log/sysinv.log in 'Timestamps/Logs' section. Severity -------- Minor Steps to Reproduce ------------------ .... TC-name: testcases/wrcp/regression/networking/test_calico_network_policy.py::test_calico_network_policy 1. system service-parameter-modify http config http_port="8887" 2. config out of date alarms appear for both controllers and do not clear 3. test teardown includes setting port back to original '8080' Note: these are only 3 highlighted steps. Must refer to testcase for full procedure which may be relevant to recovery / and creating necessary conditions for failure. 
Expected Behavior
------------------
Able to change http_port; the config-out-of-date alarms on both controllers clear.

Actual Behavior
----------------
The system appears to successfully change http_port:

system service-parameter-modify http config http_port="8887"
+-------------+--------------------------------------+
| Property    | Value                                |
+-------------+--------------------------------------+
| uuid        | 8ecf412c-5361-4155-bc71-d321675daafd |
| service     | http                                 |
| section     | config                               |
| name        | http_port                            |
| value       | 8887                                 |
| personality | None                                 |
| resource    | None                                 |
+-------------+--------------------------------------+

The new port '8887' is reflected in system service-parameter-list. The configuration does not succeed and the alarms do not clear (given upwards of 30m for config to clear, 3/3 times). Given that sysinv reports the operation as a failure, the alarm not clearing without intervention is not surprising.

Reproducibility
---------------
Reproducible 8/8. Reviewing test case history, this has failed with the same sysinv error, "Change of system parameter HTTP failed", on all STX loads since 2024-03-06_19-00-09 on AIO-DX and AIO-SX.
Could not find any other test runs on standard labs: Lab / # repros:   - R750_003_004 5   - WRCP_SX_014 3 System Configuration -------------------- AIO-DX - IPv4 Lab-name: r750_003_004 Branch/Pull Time/Commit ----------------------- 2024-03-20_19-00-10 Last Pass --------- 2024-03-06_19-00-09 Timestamp/Logs -------------- sysinv 2024-03-26 18:00:58.663 450446 INFO sysinv.conductor.manager [-] Change of system parameter HTTP failed, error: {"class": "SysinvException", "module": "sysinv.common.exception", "message": "Failed to execute runtime manifest for host controller-0", "tb": ["Traceback (most recent call last):\n", " File \"/usr/lib/python3/dist-packages/sysinv/puppet/common.py\", line 93, in puppet_apply_manifest\n subprocess.check_call(cmd, stdout=fnull, stderr=fnull) # pylint: disable=not-callable\n", " File \"/usr/lib/python3.9/subprocess.py\", line 373, in check_call\n raise CalledProcessError(retcode, cmd)\n", "subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/var/run/platform/puppet/24.03/hieradata', 'controller-0', 'controller', 'runtime', '/tmp/tmptrxxtjto.yaml']' returned non-zero exit status 1.\n", "\nDuring handling of the above exception, another exception occurred:\n\n", "Traceback (most recent call last):\n", " File \"/usr/lib/python3/dist-packages/sysinv/agent/manager.py\", line 1976, in config_apply_runtime_manifest\n self._apply_runtime_manifest(config_dict, hieradata_path=hieradata_path)\n", " File \"/usr/lib/python3/dist-packages/sysinv/agent/manager.py\", line 2047, in _apply_runtime_manifest\n puppet.puppet_apply_manifest(self._hostname,\n", " File \"/usr/lib/python3/dist-packages/sysinv/puppet/common.py\", line 98, in puppet_apply_manifest\n raise exception.SysinvException(_(msg))\n", "sysinv.common.exception.SysinvException: Failed to execute runtime manifest for host controller-0\n"], "args": ["Failed to execute runtime manifest for host controller-0"], "kwargs": {"code": 500}} sysinv 2024-03-26 
18:00:59.692 5461 ERROR sysinv.puppet.common [-] Failed to execute runtime manifest for host controller-1: subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/opt/platform/puppet/24.03/hieradata', 'controller-1', 'controller', 'runtime', '/tmp/tmpwr_24mu0.yaml']' returned non-zero exit status 1. <snip> sysinv 2024-03-26 18:06:51.413 5673 INFO sysinv.agent.manager [-] Agent config applied 6349673d-d26c-48c9-a8fa-27bc7ccb48cd sysinv 2024-03-26 18:06:51.477 5673 INFO sysinv.agent.manager [-] Caught exception _retry_on_config_exception. Retrying... Exception: Remote error: AttributeError 'list' object has no attribute 'get' Traceback (most recent call last):   File "/usr/lib/python3/dist-packages/zerorpc/core.py", line 167, in _async_task     functor.pattern.process_call(self._context, bufchan, event, functor)   File "/usr/lib/python3/dist-packages/zerorpc/patterns.py", line 30, in process_call     result = functor(*req_event.args, **req_event.kwargs)   File "/usr/lib/python3/dist-packages/zerorpc/decorators.py", line 44, in __call__     return self._functor(*args, **kargs)   File "/usr/lib/python3/dist-packages/sysinv/zmq_rpc/zmq_rpc.py", line 42, in method     retval = getattr(self.target, func)(context, **kwargs)   File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 6519, in iconfig_update_by_ihost     self.report_config_status(context, config_dict, status, error)   File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 10339, in report_config_status     success = _process_config_report(   File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 10226, in _process_config_report     callback_success(*callback_success_args)   File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 11379, in report_sysparam_http_update_success     for helmrepo in helmrepo_list.get("items"): AttributeError: 'list' object has no attribute 'get' Test Activity ------------- Regression 
Testing
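
The final traceback in the description ends with `helmrepo_list.get("items")` raising `AttributeError: 'list' object has no attribute 'get'`, which suggests the value was a plain list where the code expected a dict with an "items" key. The sketch below only illustrates that mismatch and one tolerant way to read either shape. It is not the sysinv fix: the helper name `helm_repo_items` and the sample entries are hypothetical, and the real shape of `helmrepo_list` is an assumption taken from the traceback alone.

```python
def helm_repo_items(helmrepo_list):
    """Return repo entries from either a dict with an "items" key or a plain list.

    Hypothetical helper for illustration; not part of sysinv.
    """
    if isinstance(helmrepo_list, dict):
        # Dict-shaped response: entries live under "items" (may be missing or None).
        return list(helmrepo_list.get("items") or [])
    if isinstance(helmrepo_list, list):
        # List-shaped response: the value already is the list of entries.
        return list(helmrepo_list)
    # Anything else (for example None) is treated as "no entries".
    return []


# Reproduce the error from the log: calling .get() on a list fails.
try:
    [].get("items")
except AttributeError as exc:
    error_message = str(exc)

# Both shapes yield the same entries through the helper (sample data is made up).
as_dict = {"items": [{"name": "repo-a"}, {"name": "repo-b"}]}
as_list = [{"name": "repo-a"}, {"name": "repo-b"}]
names_from_dict = [repo["name"] for repo in helm_repo_items(as_dict)]
names_from_list = [repo["name"] for repo in helm_repo_items(as_list)]
```

Normalising the value once at the point of use keeps the loop body unchanged regardless of which shape arrives; whether that is appropriate here depends on what the caller is actually supposed to receive.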