Various ansible system cli commands fail due to wrapped output

Bug #2018409 reported by Jim Gauld
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Dipankar Kar
Milestone: (none listed)

Bug Description

Brief Description
-----------------
Various configuration steps using ansible-playbooks can fail due to wrapped system CLI outputs.
There are a number of usages in multiple playbooks that can have unpredictable results.
For example, this issue causes an IP address to be mangled during AIO-SX Optimized Restore, but many more instances lurk throughout the ansible-playbooks code; the problem is not limited to Optimized Restore.

For example, to enable AIO-SX Optimized Backup and Restore, I had to change the following command in two places:
source /etc/platform/openrc
system addrpool-list | grep cluster-host-subnet | cut -d'|' -f8

This command needs the '--nowrap' option; otherwise the parsing yields mangled IP addresses such as "192.168." instead of "192.168.206.1".

The following change solves the issue:
system addrpool-list --nowrap | grep system-controller-oam-subnet | cut -d'|' -f8
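
To illustrate why the wrapped form breaks the parse, here is a minimal sketch that runs the same grep/cut pipeline over two made-up tables. The rows, the column layout and the `extract` helper are assumptions for illustration only and are not captured `system addrpool-list` output. The trailing `tr` just strips padding so the two values are easy to compare.

```shell
#!/bin/sh
# Hypothetical sample data: a table row whose long cells were wrapped onto a
# second line, and the same row printed on a single line (as with --nowrap).
wrapped='| u1 | cluster-host-subnet | 192.168. | 24 | random | 192.168.206.1- | 192.168. | 192.168. |
|    |                     | 206.0    |    |        | 192.168.206.254 | 206.1    | 206.2    |'
nowrap='| u1 | cluster-host-subnet | 192.168.206.0 | 24 | random | 192.168.206.1-192.168.206.254 | 192.168.206.1 | 192.168.206.2 |'

# Same pipeline as in the report, plus 'tr' to drop the cell padding.
extract() {
    grep cluster-host-subnet | cut -d'|' -f8 | tr -d ' '
}

# grep keeps only the first physical line of the wrapped row, so the
# continuation line holding the rest of the address is silently lost.
wrapped_value=$(printf '%s\n' "$wrapped" | extract)
nowrap_value=$(printf '%s\n' "$nowrap" | extract)

echo "wrapped: $wrapped_value"
echo "nowrap:  $nowrap_value"
```

Whether a given cell wraps depends on the data and the column widths, which is why the failure looks intermittent across systems.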

There are A BUNCH of 'system' commands that are missing '--nowrap'. There is a high probability of unexplained failures, since wrapping depends on the specific data. I did NOT do an exhaustive search, but I have EASILY identified many other instances of the same issue in: rehome-subcloud, update-subcloud, update-sc-admin-subnet, optimized-restore. I was only looking at 'addrpool-list', but really ALL system CLI commands should be audited and corrected where '--nowrap' is available.

There is also a possibility that some steps will PASS but carry corrupted values; which ones is unpredictable.
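
One way to make such silent corruption visible is to check a parsed value before using it. The following is only a sketch, assuming an IPv4 setup like the one in this report; `is_ipv4` is a hypothetical helper, not something in ansible-playbooks, and it checks the dotted-quad shape only (it does not verify that each octet is at most 255).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument looks like a dotted quad.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# A truncated value such as "192.168." is rejected instead of being passed on.
for candidate in "192.168.206.1" "192.168."; do
    if is_ipv4 "$candidate"; then
        echo "ok:      $candidate"
    else
        echo "mangled: $candidate"
    fi
done
```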

A thorough code audit must be performed in the ansible-playbooks repo to correct 'system' CLI command usages by adding '--nowrap'. Note that not all commands support --nowrap.

I was able to EASILY identify a subset of cases doing the following:
cd $REPO_ROOT/cgcs-root/stx/ansible-playbooks
grep -rs 'system addrpool-list' |grep -v nowrap
grep -rs 'system ' |grep -v nowrap
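
The second grep above matches every line containing 'system ', so a somewhat narrower variant may be easier to review. This is a sketch only: `audit_nowrap` is a hypothetical helper, and the assumption that the commands of interest end in '-list' or '-show' is mine, not something established in this report. The demo files are throwaway examples.

```shell
#!/bin/sh
# Hypothetical helper: print file:line:text for 'system <name>-list' or
# 'system <name>-show' invocations that do not mention --nowrap.
audit_nowrap() {
    grep -rnsE 'system [a-z0-9-]+-(list|show)' "$1" | grep -v -- '--nowrap'
}

# Demo on a temporary directory with one unfixed and one fixed task line.
demo_dir=$(mktemp -d)
printf '%s\n' "shell: system addrpool-list | grep oam | cut -d'|' -f8" > "$demo_dir/bad.yml"
printf '%s\n' "shell: system addrpool-list --nowrap | grep oam | cut -d'|' -f8" > "$demo_dir/good.yml"

hits=$(audit_nowrap "$demo_dir")
echo "$hits"
rm -rf "$demo_dir"
```

Each hit still needs a manual look: a command split across lines can be flagged even though '--nowrap' sits on a continuation line, and not every command supports the option.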

Enjoy.

Severity
--------
Critical: Various features will fail completely, sometimes with an unintelligible ansible-playbooks traceback.

Steps to Reproduce
------------------
VDM VBox AIO-SX Optimized Backup and Restore.
The ansible traceback shows the IP address parsed as "192.168.", which is both malformed and wrong: the address had been wrapped across lines inside its table cell.

Expected Behavior
------------------
AIO-SX Optimized Backup and Restore should PASS.
No system CLI command called by ansible should produce wrapped output.


Actual Behavior
----------------
Running the 'restore' of the Optimized B&R:
ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml \
-e "restore_mode=optimized" \
-e "initial_backup_dir=/home/sysadmin" -e "ansible_become_pass=Li69nux*" -e "admin_password=Li69nux*" \
-e "backup_filename=localhost_platform_backup_2023_04_21_11_18_03.tgz" \
-e "restore_registry_filesystem=true"

This will fail here due to not having '--nowrap':

TASK [optimized-restore/restore-data : Configure controller host addresses] ****************************************************************************************************
Wednesday 19 April 2023 14:23:01 +0000 (0:00:04.419) 0:03:45.708 *******
fatal: [localhost]: FAILED! => changed=true
  cmd:
  - ip
  - addr
  - add
  - 192.168.
  - dev
  - lo
  - scope
  - host
  delta: '0:00:00.006106'
  end: '2023-04-19 14:23:01.761379'
  msg: non-zero return code
  rc: 1
  start: '2023-04-19 14:23:01.755273'
  stderr: 'Error: any valid prefix is expected rather than "192.168.".'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Note the output of the command with --nowrap :
[sysadmin@localhost ~(keystone_admin)]$ system addrpool-list --nowrap | grep cluster-host-subnet | cut -d'|' -f8
 192.168.206.1

Reproducibility
---------------
100% with VDM AIO-SX for Optimized B&R.
Unknown for the other system commands; hidden issues caused by wrapping are likely still lurking.

System Configuration
--------------------
AIO-SX, IPv4
Underlying issue is generic for all configs.

Branch/Pull Time/Commit
-----------------------
Current loads: Apr, May 2023.

Last Pass
---------
Unknown.

Timestamp/Logs
--------------
None provided.
The issue is clear from the problem statement.

Test Activity
-------------
Developer Testing, Patchback testing of subcloud upgrades.

Workaround
----------
Partial workaround. Not all underlying issues due to this bug have yet been identified.
Workaround for Optimized Restore:
ostree unlock --hotfix, remount stages mounts
Manually change the ansible code to add the '--nowrap' option to various 'system' commands.

Ghada Khalil (gkhalil)
tags: added: stx.config
John Kung (john-kung)
Changed in starlingx:
assignee: nobody → Dipankar Kar (dipkar)
John Kung (john-kung) wrote :
Changed in starlingx:
importance: Undecided → Medium
status: New → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.9.0