When the push_images playbook fails, there is no indication from sysinv/fm (e.g. alarms)

Bug #1989373 reported by Lucas Borges
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Lucas Borges

Bug Description

Brief Description
-----------------
When the push_images playbook fails, there is no indication from sysinv/fm (e.g. alarms).

Severity
--------
Major

Steps to Reproduce
------------------
(This is more of an enhancement request.)

Expected Behavior
------------------
When the push_images playbook fails, there should be some indication from sysinv/fm (e.g. an alarm).

Actual Behavior
----------------
For example, when an upgrade ends in activation-failed because the push_images playbook failed,
there is no indication from sysinv/fm (e.g. alarms).

Reproducibility
---------------
Have seen it happen a few times, especially with upgrades.

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------

Last Pass
---------
N/A

Timestamp/Logs
--------------

log:
TASK [common/push-docker-images : Push imported images to local registry] ******
Friday 29 October 2021 18:34:31 +0000 (0:00:00.121) 0:00:23.290 ********
fatal: [localhost]: FAILED! =>
  msg: |-
    The conditional check 'images_archive_exists' failed. The error was: error while evaluating conditional (images_archive_exists): 'images_archive_exists' is undefined

    The error appears to have been in '/usr/share/ansible/stx-ansible/playbooks/roles/common/push-docker-images/tasks/main.yml': line 136, column 5, but may
    be elsewhere in the file depending on the exact syntax problem.

    The offending line appears to be:

    - block:
      - name: Push imported images to local registry
        ^ here

PLAY RECAP *********************************************************************
localhost : ok=50 changed=8 unreachable=0 failed=1

[sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list
+----------+------------------------------------------------------------------+----------------------+----------+---------------+
| Alarm ID | Reason Text                                                      | Entity ID            | Severity | Time Stamp    |
+----------+------------------------------------------------------------------+----------------------+----------+---------------+
| 280.002  | subcloud12 load sync_status is out-of-sync                       | subcloud=subcloud12. | major    | 2021-10-29T18 |
|          |                                                                  | resource=load        |          | :34:47.326561 |
|          |                                                                  |                      |          |               |
| 280.002  | subcloud11 load sync_status is out-of-sync                       | subcloud=subcloud11. | major    | 2021-10-29T18 |
|          |                                                                  | resource=load        |          | :34:46.124997 |
|          |                                                                  |                      |          |               |
| 100.104  | File System threshold exceeded ; threshold 80.00%, actual 86.05% | host=controller-1.   | major    | 2021-10-29T17 |
|          |                                                                  | filesystem=/         |          | :59:27.549153 |
|          |                                                                  |                      |          |               |
| 900.005  | System Upgrade in progress.                                      | host=controller      | minor    | 2021-10-29T17 |
|          |                                                                  |                      |          | :57:45.663360 |
|          |                                                                  |                      |          |               |
+----------+------------------------------------------------------------------+----------------------+----------+---------------+
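
For anyone scripting a check in the meantime: the PLAY RECAP line in the log above is the only place the failure shows up, since none of the listed alarms mention push_images. A minimal Python sketch that reads the recap counters and reports whether any host failed might look like this. The function names are hypothetical and the regular expression only assumes the `host : key=value ...` layout shown in the log.

```python
import re

# Matches one host line of an Ansible PLAY RECAP, e.g.
# "localhost : ok=50 changed=8 unreachable=0 failed=1"
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap_line(line):
    """Return (host, counters) for a recap line, or None if it does not match."""
    match = RECAP_LINE.match(line.strip())
    if match is None:
        return None
    counters = {}
    for pair in match.group("counters").split():
        key, value = pair.split("=")
        counters[key] = int(value)
    return match.group("host"), counters


def playbook_failed(output):
    """True if any host in the recap reports failed or unreachable tasks."""
    for line in output.splitlines():
        parsed = parse_recap_line(line)
        if parsed is None:
            continue  # not a recap host line (task banner, error text, ...)
        _, counters = parsed
        if counters.get("failed", 0) > 0 or counters.get("unreachable", 0) > 0:
            return True
    return False
```

Feeding it the recap line from this report, `playbook_failed(...)` returns True because `failed=1`.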

Test Activity
-------------
Feature Testing

Workaround
----------
None provided.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.config
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Lucas Borges (lborges)
tags: added: stx.8.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/857227
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/2e96dde4f3516242a97c2abb62ef2cb6b302b811
Submitter: "Zuul (22348)"
Branch: master

commit 2e96dde4f3516242a97c2abb62ef2cb6b302b811
Author: Lucas Borges <email address hidden>
Date: Mon Sep 12 16:21:19 2022 -0300

    Raise alarm push_images playbook failed upgrade

    When an upgrade is performed and there is a network
    failure or something similar in the push_images playbook
    step, an alarm is created indicating that the problem is occurring
    For most tests, I simulate network issues to raise alarm.

    Test plan
      PASS: Run initial bootstrap for CentOS/Debian successfully
      PASS: Run unlock for CentOS/Debian successfully
      PASS: Run upgrade playbook
      PASS: Verified alarm set during bootstrap/upgrade
            (simulating network loss)
      PASS: Verified alarm cleared when network is recovered

    Closes-bug: 1989373

    Signed-off-by: Lucas Borges <email address hidden>
    Change-Id: I6f15e745f8cb061aca0cf04e240d9a17079b19ba
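
The commit description above says an alarm is set when the push_images step fails and cleared once the network recovers. A minimal Python sketch of that set/clear pattern follows. It is not the code from the merged change: `run_with_alarm`, `raise_alarm` and `clear_alarm` are hypothetical names, and the two callbacks stand in for whatever FM interface the deployment actually uses.

```python
import subprocess
import sys


def run_with_alarm(command, raise_alarm, clear_alarm):
    """Run command; raise the alarm on failure, clear it on success.

    raise_alarm(reason) and clear_alarm() are hypothetical callbacks
    supplied by the caller, not real FM API calls.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise_alarm(f"push_images playbook failed (exit code {result.returncode})")
        return False
    # A later successful run clears any alarm left by an earlier failure.
    clear_alarm()
    return True


if __name__ == "__main__":
    # Stand-in command that always fails, used only to exercise the wrapper.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    run_with_alarm(
        failing,
        raise_alarm=lambda reason: print("ALARM SET:", reason),
        clear_alarm=lambda: print("ALARM CLEARED"),
    )
```

Passing the alarm actions in as callbacks keeps the sketch independent of any particular fault-management client. A real integration would replace them with calls into FM.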

Changed in starlingx:
status: In Progress → Fix Released