Modifying http_port fails silently, causing config-out-of-date alarms to be raised and to persist

Bug #2059402 reported by Andrew Vaillancourt
This bug affects 1 person
Affects: StarlingX
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Brief Description
-----------------

Modifying the HTTP port via 'system service-parameter-modify http config http_port="8887"' fails somewhat silently. The command appears to succeed: no error message is returned when setting the new HTTP port to '8887', and service-parameter-list shows the port as '8887' after the modify command. However, the config-out-of-date alarm does NOT clear until the port is set back to 8080.

This appears to be related to a list being parsed as a dict, as indicated by the ERROR tracebacks that follow the failure in sysinv:

https://opendev.org/starlingx/config/src/commit/ecdb0d3b9fa33b369830d0845fcef6a8b75d0624/sysinv/sysinv/sysinv/sysinv/conductor/manager.py#L11537
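The AttributeError in the 'Timestamp/Logs' section is raised in report_sysparam_http_update_success, which calls helmrepo_list.get("items") on a value that turns out to be a list. The minimal sketch below reproduces that failure mode and shows one defensive way to accept both shapes. It assumes (this report does not confirm it) that the value is sometimes a Kubernetes-style dict with an "items" key and sometimes a bare list; iter_helmrepos and the sample entry are hypothetical.

```python
from typing import Any, List


def iter_helmrepos(helmrepo_list: Any) -> List[Any]:
    """Hypothetical helper (not part of sysinv): return the repository
    entries whether the value is a Kubernetes-style list object
    ({"items": [...]}) or already a bare Python list."""
    if isinstance(helmrepo_list, dict):
        # Dict shape: entries live under "items"; a missing or null key means no entries.
        return list(helmrepo_list.get("items") or [])
    if isinstance(helmrepo_list, list):
        # List shape: the value already is the sequence of entries.
        return helmrepo_list
    raise TypeError(f"unexpected helmrepo_list type: {type(helmrepo_list).__name__}")


# Illustrative data only; the real entries are not shown in this report.
entries = [{"metadata": {"name": "example-repo"}}]

# Reproduces the failure mode seen in the traceback: calling .get() on a list.
try:
    entries.get("items")
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'get'

# The tolerant helper returns the same entries for both shapes.
assert iter_helmrepos({"items": entries}) == iter_helmrepos(entries)
```

Whether such handling belongs in the callback or in whatever produces helmrepo_list is not determined by the logs in this report.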

See the ERROR tracebacks from /var/log/sysinv.log in the 'Timestamp/Logs' section.

Severity
--------
Minor

Steps to Reproduce
------------------
TC-name: testcases/wrcp/regression/networking/test_calico_network_policy.py::test_calico_network_policy

1. system service-parameter-modify http config http_port="8887"
2. Config-out-of-date alarms are raised for both controllers and do not clear
3. Test teardown sets the port back to the original value, '8080'

Note: these are only the 3 key steps. Refer to the test case for the full procedure, which may be relevant to recovery and to creating the conditions necessary for the failure.

Expected Behavior
------------------
The http_port change is applied, and the config-out-of-date alarms on both controllers clear.

Actual Behavior
----------------

The system appears to change http_port successfully:

system service-parameter-modify http config http_port="8887"
+-------------+--------------------------------------+
| Property    | Value                                |
+-------------+--------------------------------------+
| uuid        | 8ecf412c-5361-4155-bc71-d321675daafd |
| service     | http                                 |
| section     | config                               |
| name        | http_port                            |
| value       | 8887                                 |
| personality | None                                 |
| resource    | None                                 |
+-------------+--------------------------------------+

The new port, '8887', is reflected in 'system service-parameter-list'.

However, the configuration does not succeed and the alarms do not clear (in 3 of 3 attempts, the alarms were given upwards of 30 minutes to clear).

Since sysinv reports the operation as a failure, it is not surprising that the alarm does not clear without intervention.

Reproducibility
---------------
Reproducible 8/8

Reviewing the test case history, this test has failed with the same sysinv error, "Change of system parameter HTTP failed", on all STX loads since the last pass (2024-03-06_19-00-09), on both AIO-DX and AIO-SX. No other test runs could be found on standard labs.

Repros per lab:
  - R750_003_004: 5
  - WRCP_SX_014: 3

System Configuration
--------------------
AIO-DX - IPv4
Lab-name: r750_003_004

Branch/Pull Time/Commit
-----------------------
2024-03-20_19-00-10

Last Pass
---------
2024-03-06_19-00-09

Timestamp/Logs
--------------

sysinv 2024-03-26 18:00:58.663 450446 INFO sysinv.conductor.manager [-] Change of system parameter HTTP failed, error: {"class": "SysinvException", "module": "sysinv.common.exception", "message": "Failed to execute runtime manifest for host controller-0", "tb": ["Traceback (most recent call last):\n", " File \"/usr/lib/python3/dist-packages/sysinv/puppet/common.py\", line 93, in puppet_apply_manifest\n subprocess.check_call(cmd, stdout=fnull, stderr=fnull) # pylint: disable=not-callable\n", " File \"/usr/lib/python3.9/subprocess.py\", line 373, in check_call\n raise CalledProcessError(retcode, cmd)\n", "subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/var/run/platform/puppet/24.03/hieradata', 'controller-0', 'controller', 'runtime', '/tmp/tmptrxxtjto.yaml']' returned non-zero exit status 1.\n", "\nDuring handling of the above exception, another exception occurred:\n\n", "Traceback (most recent call last):\n", " File \"/usr/lib/python3/dist-packages/sysinv/agent/manager.py\", line 1976, in config_apply_runtime_manifest\n self._apply_runtime_manifest(config_dict, hieradata_path=hieradata_path)\n", " File \"/usr/lib/python3/dist-packages/sysinv/agent/manager.py\", line 2047, in _apply_runtime_manifest\n puppet.puppet_apply_manifest(self._hostname,\n", " File \"/usr/lib/python3/dist-packages/sysinv/puppet/common.py\", line 98, in puppet_apply_manifest\n raise exception.SysinvException(_(msg))\n", "sysinv.common.exception.SysinvException: Failed to execute runtime manifest for host controller-0\n"], "args": ["Failed to execute runtime manifest for host controller-0"], "kwargs": {"code": 500}}
sysinv 2024-03-26 18:00:59.692 5461 ERROR sysinv.puppet.common [-] Failed to execute runtime manifest for host controller-1: subprocess.CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/opt/platform/puppet/24.03/hieradata', 'controller-1', 'controller', 'runtime', '/tmp/tmpwr_24mu0.yaml']' returned non-zero exit status 1.

<snip>

sysinv 2024-03-26 18:06:51.413 5673 INFO sysinv.agent.manager [-] Agent config applied 6349673d-d26c-48c9-a8fa-27bc7ccb48cd
sysinv 2024-03-26 18:06:51.477 5673 INFO sysinv.agent.manager [-] Caught exception _retry_on_config_exception. Retrying... Exception: Remote error: AttributeError 'list' object has no attribute 'get'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zerorpc/core.py", line 167, in _async_task
    functor.pattern.process_call(self._context, bufchan, event, functor)
  File "/usr/lib/python3/dist-packages/zerorpc/patterns.py", line 30, in process_call
    result = functor(*req_event.args, **req_event.kwargs)
  File "/usr/lib/python3/dist-packages/zerorpc/decorators.py", line 44, in __call__
    return self._functor(*args, **kargs)
  File "/usr/lib/python3/dist-packages/sysinv/zmq_rpc/zmq_rpc.py", line 42, in method
    retval = getattr(self.target, func)(context, **kwargs)
  File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 6519, in iconfig_update_by_ihost
    self.report_config_status(context, config_dict, status, error)
  File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 10339, in report_config_status
    success = _process_config_report(
  File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 10226, in _process_config_report
    callback_success(*callback_success_args)
  File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 11379, in report_sysparam_http_update_success
    for helmrepo in helmrepo_list.get("items"):
AttributeError: 'list' object has no attribute 'get'
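The "Change of system parameter HTTP failed" entry above carries two chained tracebacks: subprocess.check_call raises CalledProcessError when puppet-manifest-apply.sh exits with status 1, and puppet_apply_manifest in sysinv/puppet/common.py then raises SysinvException while handling it. The traceback also shows the command's stdout and stderr being sent to fnull, which appears to be a null device, so the puppet error itself would not show up in sysinv.log. The minimal sketch below mirrors that pattern under those assumptions; ManifestApplyError is a hypothetical stand-in for SysinvException, and a Python one-liner stands in for the puppet script.

```python
import subprocess
import sys


class ManifestApplyError(Exception):
    """Hypothetical stand-in for sysinv.common.exception.SysinvException."""


def apply_manifest(hostname, cmd):
    """Run cmd with its output discarded and wrap a non-zero exit in a
    generic error, mirroring the pattern visible in the traceback."""
    try:
        # stdout/stderr go to the null device, so the tool's own error
        # message never reaches the caller or its log.
        subprocess.check_call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.CalledProcessError:
        # Raising inside the except block implicitly chains the original
        # error as __context__; Python prints both tracebacks joined by
        # "During handling of the above exception, another exception occurred".
        raise ManifestApplyError(
            f"Failed to execute runtime manifest for host {hostname}")


# A Python one-liner that exits with status 1 stands in for the puppet script.
failing_cmd = [sys.executable, "-c", "raise SystemExit(1)"]
try:
    apply_manifest("controller-0", failing_cmd)
except ManifestApplyError as exc:
    print(exc)                         # Failed to execute runtime manifest for host controller-0
    print(exc.__context__.returncode)  # 1
```

In this pattern the only detail preserved in the log is the command line and its exit status; the reason for the non-zero exit has to be found in the output of the command itself.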

Test Activity
-------------
Regression Testing

Revision history: description updated by Ghada Khalil (gkhalil).