[tripleo-ha-utils] 'Remove resources associated to remote nodes' task doesn't have a good filter to identify target resources.

Bug #1839910 reported by Keigo Noha
Affects: tripleo
Status: Fix Released
Importance: Undecided
Assigned to: Keigo Noha

Bug Description

During a rollback of instance HA, the operation fails at the 'Remove resources associated to remote nodes' task.
The error is:
~~~
TASK [instance-ha : Remove resources associated to remote nodes] *******************************************************************************************************************
fatal: [undercloud-0 -> controller-2]: FAILED! =>{"changed": true,"cmd":"for resourceid in $(pcs resource show | grep compute | grep -v -e Stopped: -e Started: -e disabled -e remote | awk '{print $3}')\n do\npcs resource cleanup $resourceid\n pcs --force resource delete $resourceid\n done","delta": "0:00:03.667994","end": "2019-08-13 11:02:15.318707","failed": true,"msg": "non-zero return code","rc": 1,"start": "2019-08-13 11:02:11.650713","stderr":"Error: Unable to forget failed operations of resource: FAILED\nResource 'FAILED' not found\nError performing operation: No such device or address\nError: Resource 'FAILED' does not exist.\nError: Unable to forget failed operations of resource: FAILED\nResource 'FAILED' not found\nError performing operation: No such device or address\nError: Resource 'FAILED' does not exist.","stderr_lines": ["Error: Unable to forget failed operations of resource: FAILED","Resource 'FAILED' not found","Error performing operation: No such device or address","Error: Resource 'FAILED' does not exist.","Error: Unable to forget failed operations of resource: FAILED","Resource 'FAILED' not found","Error performing operation: No such device or address","Error: Resource 'FAILED' does not exist."],"stdout":"Cleaned up nova-compute-checkevacuate:0 on compute-1\nCleaned up nova-compute-checkevacuate:1 on compute-0\nCleaned up nova-compute-checkevacuate:2 on compute-1\nCleaned up nova-compute-checkevacuate:2 on compute-0\nCleaned up nova-compute-checkevacuate:2 on controller-2\nCleaned up nova-compute-checkevacuate:2 on controller-1\nCleaned up nova-compute-checkevacuate:2 on controller-0\nCleaned up nova-compute-checkevacuate:3 on compute-1\nCleaned up nova-compute-checkevacuate:3 on compute-0\nCleaned up nova-compute-checkevacuate:3 on controller-2\nCleaned up nova-compute-checkevacuate:3 on controller-1\nCleaned up nova-compute-checkevacuate:3 on controller-0\nCleaned up nova-compute-checkevacuate:4 on compute-1\nCleaned up 
nova-compute-checkevacuate:4 on compute-0\nCleaned up nova-compute-checkevacuate:4 on controller-2\nCleaned up nova-compute-checkevacuate:4 on controller-1\nCleaned up nova-compute-checkevacuate:4 on controller-0\nRemoving Constraint - location-nova-compute-checkevacuate-clone\nRemoving Constraint - order-nova-compute-checkevacuate-clone-nova-compute-clone-mandatory\nDeleting Resource - nova-compute-checkevacuate\nCleaned up nova-compute:0 on compute-1\nCleaned up nova-compute:1 on compute-0\nCleaned up nova-compute:2 on compute-1\nCleaned up nova-compute:2 on compute-0\nCleaned up nova-compute:2 on controller-2\nCleaned up nova-compute:2 on controller-1\nCleaned up nova-compute:2 on controller-0\nCleaned up nova-compute:3 on compute-1\nCleaned up nova-compute:3 on compute-0\nCleaned up nova-compute:3 on controller-2\nCleaned up nova-compute:3 on controller-1\nCleaned up nova-compute:3 on controller-0\nCleaned up nova-compute:4 on compute-1\nCleaned up nova-compute:4 on compute-0\nCleaned up nova-compute:4 on controller-2\nCleaned up nova-compute:4 on controller-1\nCleaned up nova-compute:4 on controller-0\nWaiting for 2 replies from the CRMd.. 
OK\nRemoving Constraint - location-nova-compute-clone\nRemoving Constraint - order-nova-compute-clone-nova-evacuate-mandatory\nDeleting Resource - nova-compute","stdout_lines": ["Cleaned up nova-compute-checkevacuate:0 on compute-1", "Cleaned up nova-compute-checkevacuate:1 on compute-0", "Cleaned up nova-compute-checkevacuate:2 on compute-1", "Cleaned up nova-compute-checkevacuate:2 on compute-0", "Cleaned up nova-compute-checkevacuate:2 on controller-2", "Cleaned up nova-compute-checkevacuate:2 on controller-1", "Cleaned up nova-compute-checkevacuate:2 on controller-0", "Cleaned up nova-compute-checkevacuate:3 on compute-1", "Cleaned up nova-compute-checkevacuate:3 on compute-0", "Cleaned up nova-compute-checkevacuate:3 on controller-2", "Cleaned up nova-compute-checkevacuate:3 on controller-1", "Cleaned up nova-compute-checkevacuate:3 on controller-0", "Cleaned up nova-compute-checkevacuate:4 on compute-1", "Cleaned up nova-compute-checkevacuate:4 on compute-0", "Cleaned up nova-compute-checkevacuate:4 on controller-2", "Cleaned up nova-compute-checkevacuate:4 on controller-1", "Cleaned up nova-compute-checkevacuate:4 on controller-0", "Removing Constraint - location-nova-compute-checkevacuate-clone", "Removing Constraint - order-nova-compute-checkevacuate-clone-nova-compute-clone-mandatory", "Deleting Resource - nova-compute-checkevacuate", "Cleaned up nova-compute:0 on compute-1", "Cleaned up nova-compute:1 on compute-0", "Cleaned up nova-compute:2 on compute-1", "Cleaned up nova-compute:2 on compute-0", "Cleaned up nova-compute:2 on controller-2", "Cleaned up nova-compute:2 on controller-1", "Cleaned up nova-compute:2 on controller-0", "Cleaned up nova-compute:3 on compute-1", "Cleaned up nova-compute:3 on compute-0", "Cleaned up nova-compute:3 on controller-2", "Cleaned up nova-compute:3 on controller-1", "Cleaned up nova-compute:3 on controller-0", "Cleaned up nova-compute:4 on compute-1", "Cleaned up nova-compute:4 on compute-0", "Cleaned up nova-compute:4 
on controller-2", "Cleaned up nova-compute:4 on controller-1", "Cleaned up nova-compute:4 on controller-0", "Waiting for 2 replies from the CRMd.. OK", "Removing Constraint - location-nova-compute-clone", "Removing Constraint - order-nova-compute-clone-nova-evacuate-mandatory", "Deleting Resource - nova-compute"]}
~~~

The current task is defined as:
~~~
    - name: Remove resources associated to remote nodes
      shell: |
        for resourceid in $(pcs resource show | grep compute | grep -v -e Stopped: -e Started: -e disabled -e remote | awk '{print $3}')
         do
          pcs resource cleanup $resourceid
          pcs --force resource delete $resourceid
         done
~~~

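However, the grep filter is insufficient. It excludes the 'Stopped:' and 'Started:' summary lines, but while a clone is in a transient state, pcs lists each instance on its own line, and those lines pass the filter. For such a line, awk '{print $3}' returns the state word (for example 'Starting', or 'FAILED' as in the error above) instead of a resource ID.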
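
To make the failure mode concrete, the sketch below feeds two lines copied from the pcs output in this report through the same awk expression the task uses. The helper name third_field is illustrative only and does not exist in tripleo-ha-utils.

```shell
# Hypothetical helper: print the third whitespace-separated field,
# which is what the task's awk '{print $3}' call does.
third_field() {
    awk '{print $3}'
}

# A per-instance line, shown while the clone is still starting.
# The third field is the state word, not a resource ID.
printf ' nova-compute\t(systemd:openstack-nova-compute):\tStarting compute-1\n' | third_field

# A 'Clone Set' line. Here the third field is the clone ID.
printf ' Clone Set: nova-compute-clone [nova-compute]\n' | third_field
```

The first call yields the state word, which the loop then passes to pcs resource cleanup and pcs resource delete as if it were a resource ID.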

The output of pcs resource show when the issue occurs is shown below.
~~~
        " ip-10.0.0.109\t(ocf::heartbeat:IPaddr2):\tStarted controller-2",
        " ip-172.17.3.14\t(ocf::heartbeat:IPaddr2):\tStarted controller-0",
        " Clone Set: haproxy-clone [haproxy]",
        " Started: [ controller-0 controller-1 controller-2 ]",
        " Stopped: [ compute-0 compute-1 ]",
        " Master/Slave Set: galera-master [galera]",
        " Masters: [ controller-0 controller-1 controller-2 ]",
        " Stopped: [ compute-0 compute-1 ]",
        " ip-192.168.24.9\t(ocf::heartbeat:IPaddr2):\tStarted controller-1",
        " ip-172.17.4.12\t(ocf::heartbeat:IPaddr2):\tStarted controller-2",
        " Clone Set: rabbitmq-clone [rabbitmq]",
        " Started: [ controller-0 controller-1 controller-2 ]",
        " Stopped: [ compute-0 compute-1 ]",
        " Master/Slave Set: redis-master [redis]",
        " Masters: [ controller-1 ]",
        " Slaves: [ controller-0 controller-2 ]",
        " Stopped: [ compute-0 compute-1 ]",
        " ip-172.17.1.11\t(ocf::heartbeat:IPaddr2):\tStarted controller-0",
        " ip-172.17.1.18\t(ocf::heartbeat:IPaddr2):\tStarted controller-1",
        " openstack-cinder-volume\t(systemd:openstack-cinder-volume):\tStarted controller-2",
        " nova-evacuate\t(ocf::openstack:NovaEvacuate):\tStopped",
        " Clone Set: nova-compute-checkevacuate-clone [nova-compute-checkevacuate]",
        " Started: [ compute-0 compute-1 ]",
        " Stopped: [ controller-0 controller-1 controller-2 ]",
        " Clone Set: nova-compute-clone [nova-compute]",
        " nova-compute\t(systemd:openstack-nova-compute):\tStarting compute-1",
        " nova-compute\t(systemd:openstack-nova-compute):\tStarting compute-0",
        " Stopped: [ controller-0 controller-1 controller-2 ]",
        " compute-1\t(ocf::pacemaker:remote):\tStarted controller-0",
        " compute-0\t(ocf::pacemaker:remote):\tStarted controller-1"
~~~

The resources this task targets all appear on 'Clone Set' lines.
So the filter should select 'Clone Set' lines explicitly instead of excluding known state words.
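
A minimal sketch of such a filter follows. It is not the merged patch (see the review linked below for the actual change). The function names sample_output and clone_set_ids are assumptions made for this example, and the sample lines are copied from the pcs output above so the sketch runs without a cluster.

```shell
# Replays the compute-related part of the 'pcs resource show' output
# from this report (printf is used so the tabs match the real output).
sample_output() {
    printf ' Clone Set: nova-compute-checkevacuate-clone [nova-compute-checkevacuate]\n'
    printf ' Started: [ compute-0 compute-1 ]\n'
    printf ' Stopped: [ controller-0 controller-1 controller-2 ]\n'
    printf ' Clone Set: nova-compute-clone [nova-compute]\n'
    printf ' nova-compute\t(systemd:openstack-nova-compute):\tStarting compute-1\n'
    printf ' nova-compute\t(systemd:openstack-nova-compute):\tStarting compute-0\n'
    printf ' Stopped: [ controller-0 controller-1 controller-2 ]\n'
    printf ' compute-1\t(ocf::pacemaker:remote):\tStarted controller-0\n'
    printf ' compute-0\t(ocf::pacemaker:remote):\tStarted controller-1\n'
}

# Hypothetical filter: keep only 'Clone Set:' lines that mention compute,
# then print the clone ID (third field). State words never reach awk.
clone_set_ids() {
    grep compute | grep 'Clone Set:' | awk '{print $3}'
}

# Print the IDs the loop would act on. In the real task the input would be
# the output of 'pcs resource show' instead of sample_output.
for resourceid in $(sample_output | clone_set_ids); do
    echo "would clean up and delete: $resourceid"
done
```

Selecting by 'Clone Set:' is an allow-list, so a state word that was not anticipated (such as 'Starting' or 'FAILED') cannot slip through the way it can with the exclusion list in the current task.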

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ha-utils (master)

Fix proposed to branch: master
Review: https://review.opendev.org/676092

Changed in tripleo:
assignee: nobody → Keigo Noha (knoha)
status: New → In Progress
Keigo Noha (knoha)
summary: - [triple-ha-utils] 'Remove resources associated to remote nodes' task
+ [tripleo-ha-utils] 'Remove resources associated to remote nodes' task
doesn't have a good filter to identify target resources.
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ha-utils (master)

Reviewed: https://review.opendev.org/676092
Committed: https://git.openstack.org/cgit/openstack/tripleo-ha-utils/commit/?id=928e74d86e9558c31aab881eb88fb85bbd497afa
Submitter: Zuul
Branch: master

commit 928e74d86e9558c31aab881eb88fb85bbd497afa
Author: Keigo Noha <email address hidden>
Date: Tue Aug 13 12:54:48 2019 +0900

    Modify grep rule to identify 'Clone Set' resources

    Current filter rule gets wrong results in transient condition.
    This commit changes the grep filter rule to increase the robustness of
    the result.

    Closes-Bug: #1839910

    Change-Id: I92aba5bc7eb9e8a90d6a2c321b3a2a20414f1779

Changed in tripleo:
status: In Progress → Fix Released