Comment 0 for bug 1917785

Revision history for this message
Owen McGonagle (omcgonag) wrote :

- Root Cause Analysis (RCA) on why we needed to revert patch 772755

- PROBLEM: On master - patch 772755 is causing an error when a playbook contains an empty group list []
  - 772755: Add an error message when groups is not found in the playbook
  - https://review.opendev.org/c/openstack/validations-libs/+/772755
- TESTED: 16.1: patch 772755 WORKED, as validation.py did not have the change to return an empty group list [] when a group is not found.
- TESTED: Master: patch 772755 FAILED
- SOLUTION: Revert patch https://review.opendev.org/c/openstack/validations-libs/+/772755

The following is a Root Cause Analysis (RCA) on why we needed to revert patch 772755

Summary:

- 16.1 - getting error "IndexError: list index out of range" when playbook does not contain group property.
- 16.1 - to reproduce on 16.1, using node-health.yaml as playbook
- Modify node-health.yaml and comment out groups:
- /usr/share/ansible/validation-playbooks/node-health.yaml
- Run command to show problem
- FIX - file utils.py was modified to report more info about the problem
- 16.1: ORIGINAL version iterates through and returns with above python fault.
- 16.1: FIXED - see box below - we changed things to return with informative error if/when groups are NOT FOUND.
- Merge to master - turns out, above patch was NOT necessary - since code in master was changed to return
  an empty list if the key was not found.
- master: code in master was changed to return an empty list if the key was not found.
- For example - ceph-pg.yaml - contains an empty [] group list - which is valid.
- master: the empty [] group list is VALID - thus we need to remove our initial patch.

Detail:

- 16.1 - getting error "IndexError: list index out of range" when playbook does not contain group property.

- 16.1 - to reproduce on 16.1, using node-health.yaml as playbook

- Modify node-health.yaml and comment out groups:

- /usr/share/ansible/validation-playbooks/node-health.yaml

  ---
  - hosts: undercloud
    vars:
      metadata:
        name: Node health check
        description: |
          Check if all overcloud nodes can be connected to before starting a
          scale-up or an upgrade.
  # groups:
  # - pre-upgrade
    roles:
      - node_health

- Run command to show problem

  openstack tripleo validator run --group pre-update

  (undercloud) [stack@undercloud-0 ansible]$ openstack tripleo validator run --group pre-update
  Exception occured while running the command
  Traceback (most recent call last):
    File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
      super(Command, self).run(parsed_args)
    File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
      return super(Command, self).run(parsed_args)
    File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
      return_code = self.take_action(parsed_args) or 0
    File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_validator.py", line 388, in take_action
      self._run_validator_run(parsed_args)
    File "/usr/lib/python3.6/site-packages/tripleoclient/v1/tripleo_validator.py", line 364, in _run_validator_run
      t.field_names = results[0].keys()
  IndexError: list index out of range
  list index out of range

- FIX - file utils.py was modified to report more info about the problem

- 16.1: ORIGINAL version iterates through and returns with above python fault.

  // version: 16.1
  // file: /usr/lib/python3.6/site-packages/validations_libs/utils.py

  def parse_all_validations_on_disk(path, groups=None):
    """
        Return a list of validations metadata
        Can be sorted by Groups
    """
    results = []
    validations_abspath = glob.glob("{path}/*.yaml".format(path=path))
    if isinstance(groups, six.string_types):
        groups = [groups]

    for pl in validations_abspath:
        val = Validation(pl)

       +------------------------------------------------------+
       |if not groups or set(groups).intersection(val.groups):|
       | results.append(val.get_metadata) |
       +------------------------------------------------------+

    return results

- 16.1: FIXED - see box below - we changed things to return with informative error if/when groups are NOT FOUND.

  // version: master
  // file: validations-libs/validations_libs/utils.py
  // gerrit patch: https://review.opendev.org/c/openstack/validations-libs/+/772755

  def parse_all_validations_on_disk(path, groups=None):

    results = []
    if not groups:
        groups = []
    else:
        groups = convert_data(groups)

    validations_abspath = glob.glob("{path}/*.yaml".format(path=path))

    for pl in validations_abspath:
        val = Validation(pl)

       +----------------------------------------------------------------------+
       |if not val.groups: |
       | msg = 'Group not found in playbook - please add appropriate group'|
       | raise RuntimeError(msg) |
       +----------------------------------------------------------------------+

        if not groups or set(groups).intersection(val.groups):
            results.append(val.get_metadata)
    return results

- We tested above fix manually on 16.1

- Merge to master - turns out, above patch was NOT necessary - since code in master was changed to return an empty list if the key was not found.

- master: code in master was changed to return an empty list if the key was not found.

  // version: master
  // file: validations-libs/validations_libs/validation.py
  // diff/patch: https://opendev.org/openstack/validations-libs/commit/eb62054a336853c3ea641d781a8b577abf407fca

   @property
    def groups(self):

    - return self.dict['vars']['metadata'].get('groups')

       +-------------------------------------------------------------+
    + |if self.has_metadata_dict: |
    + | groups = self.dict['vars']['metadata'].get('groups') |
    + | if groups: |
    + | return groups |
    + | else: |
    + | return [] |
    + |else: |
    + | raise NameError( |
    + | "No metadata found in validation {}".format(self.id) |
    + | ) |
       | |
       +-------------------------------------------------------------+

- For example - ceph-pg.yaml - contains an empty [] group list - which is valid.

  // version: master
  // file: tripleo-validations/playbooks/ceph-pg.yaml
  // gerrit patch: https://review.opendev.org/c/openstack/tripleo-validations/+/776874

  ---
  - hosts: undercloud
    vars:
      metadata:
        name: Validate requested Ceph Placement Groups
        description: |
          In Ceph Lumionus and newer the Placement Group overdose protection check
          (https://ceph.com/community/new-luminous-pg-overdose-protection) is
          executed by Ceph before a pool is created. If the check does not pass,
          then the pool is not created. When TripleO deploys Ceph it triggers
          ceph-ansible which creates the pools that OpenStack needs. This
          validation runs the same check that the overdose protection uses to
          determine if the user should update their CephPools, PG count, or number
          of OSD. Without this check a deployer may have to wait until after Ceph
          is running but before the pools are created to realize the deployment
          will fail.
       +----------+
       |groups: []|
       +----------+
    tasks:
      - include_role:
          name: ceph
          tasks_from: ceph-pg

- master: the empty [] group list is VALID - thus we need to remove our initial patch.

  // version: master
  // file: validations-libs/validations_libs/utils.py
  // gerrit patch: https://review.opendev.org/c/openstack/validations-libs/+/777463

       +----------------------------------------------------------------------+ <<< REMOVE
       |if not val.groups: | <<< REMOVE
       | msg = 'Group not found in playbook - please add appropriate group'| <<< REMOVE
       | raise RuntimeError(msg) | <<< REMOVE
       +----------------------------------------------------------------------+ <<< REMOVE