The following is a Root Cause Analysis (RCA) on why we needed to revert patch 772755
Summary:
- 16.1 - getting error "IndexError: list index out of range" when the playbook does not contain the groups property.
  - 16.1 - to reproduce on 16.1, use node-health.yaml as the playbook
    - Modify node-health.yaml and comment out groups:
      - /usr/share/ansible/validation-playbooks/node-health.yaml
    - Run command to show problem
- FIX - file utils.py was modified to report more info about the problem
  - 16.1: ORIGINAL version iterates through the playbooks and fails with the above Python exception.
  - 16.1: FIXED - see box below - we changed the code to return an informative error if/when groups are NOT FOUND.
- Merge to master - it turns out the above patch was NOT necessary, since the code in master was changed to return an empty list if the key was not found.
  - master: code in master was changed to return an empty list if the key was not found.
    - For example - ceph-pg.yaml - contains an empty [] group list - which is valid.
  - master: the empty [] group list is VALID - thus we need to remove our initial patch.
Detail:
- 16.1 - getting error "IndexError: list index out of range" when playbook does not contain group property.
- 16.1 - to reproduce on 16.1, using node-health.yaml as playbook
    ---
    - hosts: undercloud
      vars:
        metadata:
          name: Node health check
          description: |
            Check if all overcloud nodes can be connected to before starting a
            scale-up or an upgrade.
          # groups:
          #   - pre-upgrade
      roles:
        - node_health
- Run command to show problem
    openstack tripleo validator run --group pre-update

    (undercloud) [stack@undercloud-0 ansible]$ openstack tripleo validator run --group pre-update
    Exception occured while running the command
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
        super(Command, self).run(parsed_args)
      File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
        return super(Command, self).run(parsed_args)
      File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
        return_code = self.take_action(parsed_args) or 0
      File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_validator.py", line 388, in take_action
        self._run_validator_run(parsed_args)
      File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_validator.py", line 364, in _run_validator_run
        t.field_names = results[0].keys()
    IndexError: list index out of range
    list index out of range
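The last frame of the traceback fails because `results` is empty when `results[0]` is evaluated. The following is a minimal sketch of that failure mode and of one possible guard, in plain Python. `field_names_for` is a hypothetical helper used only for illustration; it is not part of tripleoclient.

```python
def field_names_for(results):
    """Return column names for a list of result dicts (hypothetical helper).

    Returns an empty list instead of raising when there are no results.
    """
    if not results:
        return []
    return list(results[0].keys())


# Indexing an empty list raises the same error type seen in the traceback.
try:
    [][0]
except IndexError as exc:
    print("IndexError:", exc)

print(field_names_for([]))
print(field_names_for([{"UUID": "1", "Status": "PASSED"}]))
```

Whether an empty result should be reported as an error or as an empty table is a design choice for the client; the sketch only shows where the exception comes from.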
- FIX - file utils.py was modified to report more info about the problem
- 16.1: ORIGINAL version iterates through and returns with above python fault.
    def parse_all_validations_on_disk(path, groups=None):
        """
        Return a list of validations metadata
        Can be sorted by Groups
        """
        results = []
        validations_abspath = glob.glob("{path}/*.yaml".format(path=path))
        if isinstance(groups, six.string_types):
            groups = [groups]
        for pl in validations_abspath:
            val = Validation(pl)
          +--------------------------------------------------------+
          | if not groups or set(groups).intersection(val.groups): |
          |     results.append(val.get_metadata)                   |
          +--------------------------------------------------------+
        return results
- 16.1: FIXED - see box below - we changed things to return with informative error if/when groups are NOT FOUND.
    ---
    - hosts: undercloud
      vars:
        metadata:
          name: Validate requested Ceph Placement Groups
          description: |
            In Ceph Lumionus and newer the Placement Group overdose protection check
            (https://ceph.com/community/new-luminous-pg-overdose-protection) is
            executed by Ceph before a pool is created. If the check does not pass,
            then the pool is not created. When TripleO deploys Ceph it triggers
            ceph-ansible which creates the pools that OpenStack needs. This
            validation runs the same check that the overdose protection uses to
            determine if the user should update their CephPools, PG count, or number
            of OSD. Without this check a deployer may have to wait until after Ceph
            is running but before the pools are created to realize the deployment
            will fail.
        +------------+
        | groups: [] |
        +------------+
      tasks:
        - include_role:
            name: ceph
            tasks_from: ceph-pg
- master: the empty [] group list is VALID - thus we need to remove our initial patch.
- Root Cause Analysis (RCA) on why we needed to revert patch 772755
  - PROBLEM: On master - patch 772755 is causing an error when a playbook contains an empty group list []
    - 772755: Add an error message when groups is not found in the playbook
    - https://review.opendev.org/c/openstack/validations-libs/+/772755
  - TESTED: 16.1: patch 772755 WORKED, as validation.py did not have the change to return an empty group list [] when a group is not found.
  - TESTED: Master: patch 772755 FAILED
  - SOLUTION: Revert patch https://review.opendev.org/c/openstack/validations-libs/+/772755
Detail:
- 16.1 - getting error "IndexError: list index out of range" when playbook does not contain group property.
- 16.1 - to reproduce on 16.1, using node-health.yaml as playbook
- Modify node-health.yaml and comment out groups:
- /usr/share/ansible/validation-playbooks/node-health.yaml
    ---
    - hosts: undercloud
      vars:
        metadata:
          name: Node health check
          description: |
            Check if all overcloud nodes can be connected to before starting a
            scale-up or an upgrade.
          # groups:
          #   - pre-upgrade
      roles:
        - node_health
- Run command to show problem
    openstack tripleo validator run --group pre-update

    (undercloud) [stack@undercloud-0 ansible]$ openstack tripleo validator run --group pre-update
    Exception occured while running the command
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
        super(Command, self).run(parsed_args)
      File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
        return super(Command, self).run(parsed_args)
      File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
        return_code = self.take_action(parsed_args) or 0
      File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_validator.py", line 388, in take_action
        self._run_validator_run(parsed_args)
      File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_validator.py", line 364, in _run_validator_run
        t.field_names = results[0].keys()
    IndexError: list index out of range
    list index out of range
- FIX - file utils.py was modified to report more info about the problem
- 16.1: ORIGINAL version iterates through and returns with above python fault.
    // version: 16.1
    // file: /usr/lib/python3.6/site-packages/validations_libs/utils.py
    def parse_all_validations_on_disk(path, groups=None):
        """
        Return a list of validations metadata
        Can be sorted by Groups
        """
        results = []
        validations_abspath = glob.glob("{path}/*.yaml".format(path=path))
        if isinstance(groups, six.string_types):
            groups = [groups]
        for pl in validations_abspath:
            val = Validation(pl)
          +--------------------------------------------------------+
          | if not groups or set(groups).intersection(val.groups): |
          |     results.append(val.get_metadata)                   |
          +--------------------------------------------------------+
        return results
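The boxed condition selects a validation when no group filter was requested, or when the requested groups overlap the validation's own groups. Below is a small sketch of that selection logic on plain data, assuming each validation is reduced to a (name, groups) pair. `select_by_group` and the sample data are illustrative only and are not part of validations-libs.

```python
def select_by_group(validations, groups=None):
    """Mimic the boxed filter on (name, groups) pairs (illustrative helper)."""
    if isinstance(groups, str):
        groups = [groups]
    selected = []
    for name, val_groups in validations:
        # Same shape as the boxed test: no filter, or at least one shared group.
        if not groups or set(groups).intersection(val_groups):
            selected.append(name)
    return selected


sample = [
    ("node-health", ["pre-upgrade"]),
    ("ceph-pg", []),
]

print(select_by_group(sample))                 # no filter: both are selected
print(select_by_group(sample, "pre-upgrade"))  # only node-health matches
print(select_by_group(sample, "pre-update"))   # nothing matches: empty list
```

Note that a validation with an empty groups list is still selected when no filter is given, which is why an empty list is a usable value for this code path.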
- 16.1: FIXED - see box below - we changed things to return with informative error if/when groups are NOT FOUND.
    // version: master
    // file: validations-libs/validations_libs/utils.py
    // gerrit patch: https://review.opendev.org/c/openstack/validations-libs/+/772755
    def parse_all_validations_on_disk(path, groups=None):
        results = []
        if not groups:
            groups = []
        else:
            groups = convert_data(groups)
        validations_abspath = glob.glob("{path}/*.yaml".format(path=path))
        for pl in validations_abspath:
            val = Validation(pl)
          +--------------------------------------------------------------------------+
          | if not val.groups:                                                       |
          |     msg = 'Group not found in playbook - please add appropriate group'  |
          |     raise RuntimeError(msg)                                              |
          +--------------------------------------------------------------------------+
            if not groups or set(groups).intersection(val.groups):
                results.append(val.get_metadata)
        return results
- We tested above fix manually on 16.1
- Merge to master - turns out, above patch was NOT necessary - since code in master was changed to return an empty list if the key was not found.
- master: code in master was changed to return an empty list if the key was not found.
    // version: master
    // file: validations-libs/validations_libs/validation.py
    // diff/patch: https://opendev.org/openstack/validations-libs/commit/eb62054a336853c3ea641d781a8b577abf407fca
     @property
     def groups(self):
    -    return self.dict['vars']['metadata'].get('groups')
    +    |if self.has_
    +    |    groups = self.dict[
    +    |    if groups:                                               |
    +    |        return groups                                        |
    +    |    else:                                                    |
    +    |        return []                                            |
    +    |else:                                                        |
    +    |    raise NameError(                                         |
    +    |        "No metadata found in validation {}".format(self.id) |
    +    |    )                                                        |
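The behaviour described by this diff can be sketched on a plain dictionary. This is an illustration only: `groups_of` is a hypothetical standalone function, and the metadata check is written as a simple key lookup because the guard condition in the quoted diff is truncated.

```python
def groups_of(playbook, validation_id="example"):
    """Return the groups list of a parsed playbook dict (hypothetical helper).

    A missing or empty 'groups' key yields []; missing metadata raises NameError.
    """
    metadata = playbook.get("vars", {}).get("metadata")
    if metadata is None:
        raise NameError(
            "No metadata found in validation {}".format(validation_id)
        )
    return metadata.get("groups") or []


print(groups_of({"vars": {"metadata": {"groups": ["pre-upgrade"]}}}))
print(groups_of({"vars": {"metadata": {"groups": []}}}))
print(groups_of({"vars": {"metadata": {"name": "Node health check"}}}))
```

The second and third calls both produce an empty list, so callers can no longer tell a missing key from an explicit empty list by looking at the return value.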
- For example - ceph-pg.yaml - contains an empty [] group list - which is valid.
    // version: master
    // file: tripleo-validations/playbooks/ceph-pg.yaml
    // gerrit patch: https://review.opendev.org/c/openstack/tripleo-validations/+/776874
    ---
    - hosts: undercloud
      vars:
        metadata:
          name: Validate requested Ceph Placement Groups
          description: |
            In Ceph Lumionus and newer the Placement Group overdose protection check
            (https://ceph.com/community/new-luminous-pg-overdose-protection) is
            executed by Ceph before a pool is created. If the check does not pass,
            then the pool is not created. When TripleO deploys Ceph it triggers
            ceph-ansible which creates the pools that OpenStack needs. This
            validation runs the same check that the overdose protection uses to
            determine if the user should update their CephPools, PG count, or number
            of OSD. Without this check a deployer may have to wait until after Ceph
            is running but before the pools are created to realize the deployment
            will fail.
        +------------+
        | groups: [] |
        +------------+
      tasks:
        - include_role:
            name: ceph
            tasks_from: ceph-pg
- master: the empty [] group list is VALID - thus we need to remove our initial patch.
    // version: master
    // file: validations-libs/validations_libs/utils.py
    // gerrit patch: https://review.opendev.org/c/openstack/validations-libs/+/777463
    | if not val.groups:                                                       | <<< REMOVE
    |     msg = 'Group not found in playbook - please add appropriate group'  | <<< REMOVE
    |     raise RuntimeError(msg)                                              | <<< REMOVE
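To illustrate why the removed check cannot work on master: once the groups value is an empty list both for a missing key and for an explicit `groups: []`, a truthiness test cannot tell the two cases apart. The sketch below works on raw metadata dictionaries and assumes one wanted to flag only the missing-key case. `check_groups` is hypothetical and not part of validations-libs.

```python
def check_groups(metadata):
    """Classify the 'groups' entry of a metadata dict (hypothetical helper)."""
    if "groups" not in metadata:
        return "missing"   # key absent, e.g. groups commented out
    if not metadata["groups"]:
        return "empty"     # explicit 'groups: []', valid on master
    return "set"


node_health = {"name": "Node health check"}
ceph_pg = {"name": "Validate requested Ceph Placement Groups", "groups": []}

# A truthiness test, as in the reverted check, treats both playbooks the same way.
print(not node_health.get("groups"), not ceph_pg.get("groups"))
# A key-presence test distinguishes them.
print(check_groups(node_health), check_groups(ceph_pg))
```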