Comment 0 for bug 2059402

Andrew Vaillancourt (availlancourt) wrote :

Brief Description
-----------------

Modifying the HTTP port via 'system service-parameter-modify http config http_port="8887"' fails silently. The command appears to succeed: no error is returned when setting the port to '8887', and 'system service-parameter-list' shows the port as '8887' afterwards. However, the config-out-of-date alarm does NOT clear until the port is set back to 8080.

This appears to be caused by a list being handled as a dict, as indicated by the tracebacks logged by sysinv after the failure:

https://opendev.org/starlingx/config/src/commit/ecdb0d3b9fa33b369830d0845fcef6a8b75d0624/sysinv/sysinv/sysinv/sysinv/conductor/manager.py#L11537

See the tracebacks from /var/log/sysinv.log in the 'Timestamp/Logs' section.

Severity
--------
Major

Steps to Reproduce
------------------
TC-name: testcases/wrcp/regression/networking/test_calico_network_policy.py::test_calico_network_policy

1. system service-parameter-modify http config http_port="8887"
2. config out of date alarms appear for both controllers and do not clear
3. test teardown includes setting port back to original '8080'

Note: these are only the 3 key steps. Refer to the test case for the full procedure, which may be relevant both to recovery and to creating the conditions necessary for the failure.

Expected Behavior
-----------------
http_port can be changed, and the config-out-of-date alarms on both controllers clear.

Actual Behavior
---------------

The system appears to successfully change http_port:

system service-parameter-modify http config http_port="8887"
+-------------+--------------------------------------+
| Property    | Value                                |
+-------------+--------------------------------------+
| uuid        | 8ecf412c-5361-4155-bc71-d321675daafd |
| service     | http                                 |
| section     | config                               |
| name        | http_port                            |
| value       | 8887                                 |
| personality | None                                 |
| resource    | None                                 |
+-------------+--------------------------------------+

The new port '8887' is also reflected in 'system service-parameter-list'.
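
For anyone scripting this check, a minimal sketch of reading the stored value back out of the table output shown above. The helper name parse_property_table is hypothetical and not part of the test framework; the sample is a shortened copy of the real output. Note that a matching value only shows the parameter was accepted, not that the runtime manifest was applied.

```python
def parse_property_table(text):
    """Parse a two-column '| Property | Value |' CLI table into a dict."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip borders ('+---+---+') and anything that is not a table row.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip malformed rows and the header row.
        if len(cells) != 2 or cells[0] == "Property":
            continue
        result[cells[0]] = cells[1]
    return result


# Shortened copy of the service-parameter-modify output from this report.
sample = """
+-------------+--------------------------------------+
| Property    | Value                                |
+-------------+--------------------------------------+
| name        | http_port                            |
| value       | 8887                                 |
+-------------+--------------------------------------+
"""

params = parse_property_table(sample)
print(params["name"], params["value"])
```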

The configuration is not applied and the alarms do not clear (waited upwards of 30 minutes for the config to clear; alarms persisted 3/3 times).

Since sysinv reports the operation as failed, it is not surprising that the alarm does not clear without intervention.

Reproducibility
---------------
Reproducible 8/8

Per the test case history, this has failed with the same sysinv error, "Change of system parameter HTTP failed", on all STX loads since 2024-03-06_19-00-09 on AIO-DX and AIO-SX. No test runs on standard labs were found.

Lab / # repros:
  - R750_003_004 5
  - WRCP_SX_014 3

System Configuration
--------------------
AIO-DX - IPv4
Lab-name: r750_003_004

Branch/Pull Time/Commit
-----------------------
2024-03-20_19-00-10

Last Pass
---------
2024-03-06_19-00-09

Timestamp/Logs
--------------

sysinv 2024-03-26 18:00:58.663 450446 INFO sysinv.conductor.manager [-] Change of system parameter HTTP failed, error: {"class": "SysinvException", "module": "sysinv.common.exception", "message": "Failed to execute runtime manifest for host controller-0", "tb": ["Traceback (most recent call last):\n", " File \"/usr/lib/python3/dist-packages/sysinv/puppet/common.py\", line 93, in puppet_apply_manifest\n subprocess.check_call(cmd, stdout=fnull, stderr=fnull) # pylint: disable=not-callable\n", " File \"/usr/lib/python3.9/subprocess.py\", line 373, in check_call\n raise CalledProcessError(retcode, cmd)\n", "subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/var/run/platform/puppet/24.03/hieradata', 'controller-0', 'controller', 'runtime', '/tmp/tmptrxxtjto.yaml']' returned non-zero exit status 1.\n", "\nDuring handling of the above exception, another exception occurred:\n\n", "Traceback (most recent call last):\n", " File \"/usr/lib/python3/dist-packages/sysinv/agent/manager.py\", line 1976, in config_apply_runtime_manifest\n self._apply_runtime_manifest(config_dict, hieradata_path=hieradata_path)\n", " File \"/usr/lib/python3/dist-packages/sysinv/agent/manager.py\", line 2047, in _apply_runtime_manifest\n puppet.puppet_apply_manifest(self._hostname,\n", " File \"/usr/lib/python3/dist-packages/sysinv/puppet/common.py\", line 98, in puppet_apply_manifest\n raise exception.SysinvException(_(msg))\n", "sysinv.common.exception.SysinvException: Failed to execute runtime manifest for host controller-0\n"], "args": ["Failed to execute runtime manifest for host controller-0"], "kwargs": {"code": 500}}
sysinv 2024-03-26 18:00:59.692 5461 ERROR sysinv.puppet.common [-] Failed to execute runtime manifest for host controller-1: subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/opt/platform/puppet/24.03/hieradata', 'controller-1', 'controller', 'runtime', '/tmp/tmpwr_24mu0.yaml']' returned non-zero exit status 1.
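
The error payload in the first log line above is a JSON object with the traceback escaped inside it, which is hard to read in place. A minimal sketch for unpacking it follows; it assumes the payload runs to the end of the line, as it does here. extract_error is a hypothetical helper name, and sample_line is an abbreviated copy of the real line, not the full log entry.

```python
import json

MARKER = "Change of system parameter HTTP failed, error: "


def extract_error(log_line):
    """Return the decoded JSON error payload from a sysinv log line, or None."""
    _, found, payload = log_line.partition(MARKER)
    if not found:
        return None
    return json.loads(payload)


# Abbreviated copy of the log line above (most traceback frames omitted).
sample_line = (
    'sysinv 2024-03-26 18:00:58.663 450446 INFO sysinv.conductor.manager [-] '
    'Change of system parameter HTTP failed, error: '
    '{"class": "SysinvException", '
    '"message": "Failed to execute runtime manifest for host controller-0", '
    '"tb": ["Traceback (most recent call last):\\n", '
    '"sysinv.common.exception.SysinvException: '
    'Failed to execute runtime manifest for host controller-0\\n"], '
    '"kwargs": {"code": 500}}'
)

error = extract_error(sample_line)
print(error["class"], error["kwargs"]["code"])
# The "tb" entries already end in newlines, so joining them restores the traceback.
print("".join(error["tb"]), end="")
```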

<snip>

sysinv 2024-03-26 18:06:51.413 5673 INFO sysinv.agent.manager [-] Agent config applied 6349673d-d26c-48c9-a8fa-27bc7ccb48cd
sysinv 2024-03-26 18:06:51.477 5673 INFO sysinv.agent.manager [-] Caught exception _retry_on_config_exception. Retrying... Exception: Remote error: AttributeError 'list' object has no attribute 'get'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zerorpc/core.py", line 167, in _async_task
    functor.pattern.process_call(self._context, bufchan, event, functor)
  File "/usr/lib/python3/dist-packages/zerorpc/patterns.py", line 30, in process_call
    result = functor(*req_event.args, **req_event.kwargs)
  File "/usr/lib/python3/dist-packages/zerorpc/decorators.py", line 44, in __call__
    return self._functor(*args, **kargs)
  File "/usr/lib/python3/dist-packages/sysinv/zmq_rpc/zmq_rpc.py", line 42, in method
    retval = getattr(self.target, func)(context, **kwargs)
  File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 6519, in iconfig_update_by_ihost
    self.report_config_status(context, config_dict, status, error)
  File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 10339, in report_config_status
    success = _process_config_report(
  File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 10226, in _process_config_report
    callback_success(*callback_success_args)
  File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 11379, in report_sysparam_http_update_success
    for helmrepo in helmrepo_list.get("items"):
AttributeError: 'list' object has no attribute 'get'
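
To illustrate the failure mode in the traceback above: report_sysparam_http_update_success calls .get("items") on helmrepo_list, which only works if the value is a dict. The sketch below is NOT the sysinv code or a proposed patch. It is a hypothetical helper (iter_helm_repos) showing one way to tolerate both shapes, assuming the value is either a dict with an "items" key or a bare list of repo entries. Which shape the upstream helper is meant to return has not been verified here, and the repo entries are made-up placeholders.

```python
def iter_helm_repos(helmrepo_list):
    """Yield repo entries from either {'items': [...]} or a bare list."""
    if isinstance(helmrepo_list, dict):
        # Dict shape: entries live under "items" (may be missing or None).
        items = helmrepo_list.get("items") or []
    elif isinstance(helmrepo_list, list):
        # List shape: the value already is the list of entries.
        items = helmrepo_list
    else:
        items = []
    for helmrepo in items:
        yield helmrepo


# Placeholder data, not taken from a real system.
as_dict = {"items": [{"metadata": {"name": "example-repo"}}]}
as_list = [{"metadata": {"name": "example-repo"}}]

for source in (as_dict, as_list):
    print([repo["metadata"]["name"] for repo in iter_helm_repos(source)])
```

With the current code, the second shape raises the AttributeError seen in the log, so the success callback never completes and the agent keeps retrying.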

Test Activity
-------------
Regression Testing