armada raises a 404 exception on a transient resource

Bug #1948850 reported by Iago Filipe
This bug affects 2 people
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Low         Iago Filipe

Bug Description

Brief Description
-----------------
An application reapply fails when armada tries to delete a job that no longer exists. During the reapply, armada receives a 404 response from the Kubernetes API for a resource (a job in this case) that does not exist. This happens because the job is transient: it has a limited lifecycle and disappears every x minutes. Armada should handle the 404 response gracefully and continue the application reapply when the resource is not found.

Severity
--------
Minor

Steps to reproduce
------------------
Execute an application reapply.

Actual Behavior
----------------
The application reapply fails if the job is not found.

Expected Behavior
-----------------
The application reapply continues if the job is not found.
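
The handling requested here could look roughly like the sketch below. It is not the actual armada fix. The helper name delete_ignoring_not_found and the delete_fn callable are hypothetical, and the ApiException class is a local stand-in that only mirrors the status and reason attributes of kubernetes.client.exceptions.ApiException so that the example runs without a cluster.

```python
class ApiException(Exception):
    """Local stand-in for kubernetes.client.exceptions.ApiException.

    Assumption: only the `status` and `reason` attributes matter here.
    """

    def __init__(self, status, reason=""):
        super().__init__(f"({status})\nReason: {reason}")
        self.status = status
        self.reason = reason


def delete_ignoring_not_found(delete_fn, name, namespace):
    """Call delete_fn(name, namespace) and tolerate a 404 response.

    Returns True if the delete call succeeded, False if the resource
    was already gone. Any other API error is re-raised unchanged.
    """
    try:
        delete_fn(name, namespace)
    except ApiException as exc:
        if exc.status == 404:
            # The transient job expired before we could delete it;
            # treat this as "nothing left to do" and carry on.
            return False
        raise
    return True


def _already_gone(name, namespace):
    # Hypothetical delete call simulating the failure in this report.
    raise ApiException(404, "Not Found")


if __name__ == "__main__":
    deleted = delete_ignoring_not_found(
        _already_gone, "cinder-volume-usage-audit-1631977200", "openstack"
    )
    print("deleted" if deleted else "job not found, continuing")
```

Only the 404 case is swallowed. Other status codes still propagate, so genuine API failures would continue to stop the reapply.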

Reproducibility
---------------
Intermittent.

System Configuration
--------------------
Duplex

Branch/Pull Time/Commit
-----------------------
Unknown

Last Pass
---------
Unknown

Timestamp/Logs
--------------
2021-09-18 15:15:27.699 1075 INFO armada.handlers.tiller [-] [chart=openstack-cinder]: Deleting job cinder-volume-usage-audit-1631977200 in namespace: openstack
2021-09-18 15:15:27.699 1075 DEBUG armada.handlers.k8s [-] [chart=openstack-cinder]: Watching to delete job cinder-volume-usage-audit-1631977200, Wait timeout=1800 _delete_item_action /usr/local/lib/python3.6/dist-packages/armada/handlers/k8s.py:155
2021-09-18 15:15:27.716 1075 ERROR armada.handlers.k8s [-] [chart=openstack-cinder]: Exception when deleting job: name=cinder-volume-usage-audit-1631977200, namespace=openstack: kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found

Workaround
----------
Reapply the application.

Iago Filipe (ifest1)
Changed in starlingx:
assignee: nobody → Iago Filipe (ifest1)
Iago Filipe (ifest1)
description: updated
description: updated
Iago Filipe (ifest1)
description: updated
summary: - application reapply breaking: job was not found
+ armada raises a 404 exception on a transient resource
Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.containers
OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (master)

Change abandoned by "Iago Filipe <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/814906
Reason: https://opendev.org/starlingx/integ/src/commit/7e4d52bc3b305279fbbff56e7a8abf6f2e590f06/kubernetes/armada/centos/files/0001-Add-Helm-v2-client-initialization-using-tiller-postS.patch specifies the symbolic link creation inside the tiller container, thus no need to update the helmv2-cli anymore.

Ghada Khalil (gkhalil) wrote :

@Iago Filipe, given that the review associated with this LP was abandoned, should this LP be marked as "Rejected"? It appears that a software change in starlingx is not required. Please update the status accordingly. Thanks.

Changed in starlingx:
importance: Undecided → Low
Iago Filipe (ifest1) wrote :

@Ghada Khalil, we didn't have to change anything on starlingx, so yes, we can close it unless we want to address the armada bug on this launchpad.

Ghada Khalil (gkhalil) wrote :

@Iago Filipe, I am not sure what you mean by "address the armada bug on this launchpad". I would expect that you need to have the bug reported to the armada project, not use a starlingx LP.

Iago Filipe (ifest1)
Changed in starlingx:
status: In Progress → Invalid
Changed in starlingx:
status: Invalid → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/815008
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/f48fa1ac6ef50c77e8541f56b15c2cdeed250c87
Submitter: "Zuul (22348)"
Branch: master

commit f48fa1ac6ef50c77e8541f56b15c2cdeed250c87
Author: Iago Estrela <email address hidden>
Date: Thu Oct 21 11:01:54 2021 -0300

    Update Armada image version to ddbdd72

    This change updates StarlingX to use the Armada image based on Helm2
    branch and fixes bug[1].

    Test Plan:
    PASS: Verify that application apply doesn't break
    PASS: Verify that application upload doesn't break
    PASS: Verify that application remove doesn't break
    PASS: Verify that application delete doesn't break
    PASS: Lock standby controller and reapply all apps
    PASS: Unlock standby controller and reapply all apps
    PASS: Swact and repeat the previous two tests

    Regression:
    PASS: Verify StarlingX build and install
    PASS: Verify Armada logs for unexpected errors
    PASS: Verify that Armada pod has no errors

    [1] https://review.opendev.org/c/airship/armada/+/814747

    Closes-Bug: 1948850

    Signed-off-by: Iago Estrela <email address hidden>
    Change-Id: I5d5b537b77d6c8df383d28a7c4e809b69a3ac29d

Changed in starlingx:
status: In Progress → Fix Released