Application apply failure during BMC test automation run

Bug #1875653 reported by Anujeyan Manokeran
This bug affects 1 person
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: Anujeyan Manokeran

Bug Description

Brief Description
-----------------
Application apply failed during the automation run of test_bmc_swact.py::test_bmc_swact[ipmi]. The failure was observed at the beginning of the test, before the host-swact. The automation logs captured "Unexpected process termination while application-apply". After this message, the application apply was in a failed state. The BMC host reset test case test_bmc_host_reset[dynamic] was executed immediately before this test.

 [2020-04-26 20:15:51,129] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+----------------------------------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+----------------------------------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
| 20548942-b9cc-4fea-8b90-9926995ae20b | 400.003 | Evaluation license key will expire on 30-sep-2020; there are 157 days remaining in this evaluation | host=controller-1 | minor | 2020-04-26T20:15:21.740131 |
| c40514ab-4e98-49fa-b1dc-791a8363f977 | 400.003 | Evaluation license key will expire on 30-sep-2020; there are 157 days remaining in this evaluation | host=controller-0 | minor | 2020-04-26T20:15:19.582710 |
| 861cac6d-4165-47a1-b226-b2235999e404 | 100.114 | NTP address 64:ff9b::cc11:cd18 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::cc11:cd18 | minor | 2020-04-26T20:13:16.118204 |
| 98d7ee3c-73da-4057-8ff7-7e03c1dd794b | 100.114 | NTP address 64:ff9b::a29f:c801 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::a29f:c801 | minor | 2020-04-26T20:13:16.111961 |
| 6ac6bab8-f36c-46bd-8ef6-71ce5cf7bf56 | 100.114 | NTP address 64:ff9b::d806:246 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::d806:246 | minor | 2020-04-26T20:13:16.110186 |
| 18f39a79-6ae9-4b62-b703-987f992dcff8 | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major | 2020-04-26T19:53:36.819981 |
+--------------------------------------+----------+----------------------------------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
controller-0:~$
[2020-04-26 20:15:51,130] 314 DEBUG MainThread ssh.send :: Send 'echo $?'

[2020-04-26 20:08:19,427] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'

[2020-04-26 20:14:04,488] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-1'

| 1eef601e-51e0-4732-a391-c1916bdf27c2 | 750.002 | Application Apply Failure | k8s_application=platform-integ-apps | major | 2020-04-26T20:20:27.280027 |

Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-04-26 20:15:52,884] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------------------------------------------------+
| cert-manager | 1.0-0 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| nginx-ingress-controller | 1.0-0 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied | completed |
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | uploaded | Unexpected process termination while application-apply was in progress. The |
| | | | | | application status has changed from 'applying' to 'uploaded'. |
| | | | | | |
+--------------------------+---------+-----------------------------------+----------------------------------------+----------+---------------------------------------------------------------------------------+
controller-0:~$
[2020-04-26 20:15:52,884] 314 DEBUG MainThread ssh.send :: Send 'echo $?'
[2020-04-26 20:15:52,987] 436 DEBUG MainThread ssh.expect :: Output:

me Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2020-04-26 20:21:32,282] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+----------------------------------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+----------------------------------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
| 1eef601e-51e0-4732-a391-c1916bdf27c2 | 750.002 | Application Apply Failure | k8s_application=platform-integ-apps | major | 2020-04-26T20:20:27.280027 |
| 5b5eb497-6ce1-4cff-944d-13db73de45b2 | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-1.ntp | major | 2020-04-26T20:20:16.604167 |
| a64f13fa-1173-4196-afce-f2b115915264 | 100.114 | NTP address 64:ff9b::cdce:4602 is not a valid or a reachable NTP server. | host=controller-1.ntp=64:ff9b::cdce:4602 | minor | 2020-04-26T20:20:16.596451 |
| 7b201006-727a-4fa6-a236-c7eb6dfbaaf5 | 100.114 | NTP address 64:ff9b::d051:1f4 is not a valid or a reachable NTP server. | host=controller-1.ntp=64:ff9b::d051:1f4 | minor | 2020-04-26T20:20:16.590079 |
| ec36cd2a-e5bd-46a1-8deb-15ec699b746f | 100.114 | NTP address 64:ff9b::d837:d016 is not a valid or a reachable NTP server. | host=controller-1.ntp=64:ff9b::d837:d016 | minor | 2020-04-26T20:20:16.587748 |
| b1e2efc9-5068-41f8-a5d9-f420f55606ac | 700.016 | Multi-Node Recovery Mode | subsystem=vim | major | 2020-04-26T20:16:04.308050 |
| 20548942-b9cc-4fea-8b90-9926995ae20b | 400.003 | Evaluation license key will expire on 30-sep-2020; there are 157 days remaining in this evaluation | host=controller-1 | minor | 2020-04-26T20:15:21.740131 |
| c40514ab-4e98-49fa-b1dc-791a8363f977 | 400.003 | Evaluation license key will expire on 30-sep-2020; there are 157 days remaining in this evaluation | host=controller-0 | minor | 2020-04-26T20:15:19.582710 |
| 861cac6d-4165-47a1-b226-b2235999e404 | 100.114 | NTP address 64:ff9b::cc11:cd18 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::cc11:cd18 | minor | 2020-04-26T20:13:16.118204 |
| 98d7ee3c-73da-4057-8ff7-7e03c1dd794b | 100.114 | NTP address 64:ff9b::a29f:c801 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::a29f:c801 | minor | 2020-04-26T20:13:16.111961 |
| 6ac6bab8-f36c-46bd-8ef6-71ce5cf7bf56 | 100.114 | NTP address 64:ff9b::d806:246 is not a valid or a reachable NTP server. | host=controller-0.ntp=64:ff9b::d806:246 | minor | 2020-04-26T20:13:16.110186 |
| 18f39a79-6ae9-4b62-b703-987f992dcff8 | 100.114 | NTP configuration does not contain any valid or reachable NTP servers. | host=controller-0.ntp | major | 2020-04-26T19:53:36.819981 |
+--------------------------------------+----------+----------------------------------------------------------------------------------------------------+------------------------------------------+----------+----------------------------+
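A test case could detect the Application Apply Failure alarm (750.002) shown above directly, instead of inferring the failure later. The following is a minimal Python sketch. It assumes the `alarm-list --nowrap --uuid` table layout captured in this log. The function name `find_alarms` is hypothetical, and the function only parses text that the test framework has already collected. It does not call the CLI.

```python
from typing import Dict, List

# Column order assumed from the `alarm-list --nowrap --uuid` output in this log.
ALARM_COLUMNS = ["uuid", "alarm_id", "reason", "entity_id", "severity", "timestamp"]


def find_alarms(alarm_list_output: str, alarm_id: str) -> List[Dict[str, str]]:
    """Return the table rows whose Alarm ID equals alarm_id (hypothetical helper)."""
    matches: List[Dict[str, str]] = []
    for line in alarm_list_output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border rows (+---+) and shell prompts
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != len(ALARM_COLUMNS) or cells[0] == "UUID":
            continue  # skip the header row and anything with an unexpected layout
        row = dict(zip(ALARM_COLUMNS, cells))
        if row["alarm_id"] == alarm_id:
            matches.append(row)
    return matches
```

For example, `find_alarms(output, "750.002")` on the table above would return a single row whose `entity_id` is `k8s_application=platform-integ-apps`.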

Severity
--------
Major

Steps to Reproduce
------------------
1. Prior to this test, the test case test_bmc_host_reset[dynamic] was executed.
   That test has the following steps:
    1. Lock all the compute hosts.
    2. Reset the host using BMC.
    3. Unlock the host.

2. As the description says, after "Unexpected process termination while application-apply" the application apply was in a failed state.

System Configuration
--------------------
wf-8-12: AIO-DX + worker.

Expected Behavior
------------------
No failure during application apply.

Actual Behavior
----------------
As the description says, there was no output, and the application apply was stuck.

Reproducibility
---------------
Intermittent. This was not seen in the previous run on the regular lab.

Load
----
2020-04-25_13-17-56

Last Pass
---------
Timestamp/Logs
--------------
2020-04-26T20:20:27.280027

Test Activity
-------------
Regression test

description: updated
Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :
tags: added: stx.retestneeded
Revision history for this message
Brent Rowsell (brent-rowsell) wrote :

So you were applying an app and swacted during the apply?

Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :

Automation was not applying the app. The system was applying the app after the previous BMC host reset test. According to the logs, the application apply was in progress and never succeeded.
I will change the above test steps, because this was triggered by multiple host reboots from the previous test.

description: updated
Revision history for this message
Brent Rowsell (brent-rowsell) wrote :

OK, but did a swact happen while the app was being applied?

Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :

There was no swact. Test test_bmc_swact.py::test_bmc_swact[ipmi] was aborted before the swact.

Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → Triaged
tags: added: stx.4.0 stx.apps
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.containers
Changed in starlingx:
assignee: nobody → Frank Miller (sensfan22)
Revision history for this message
Frank Miller (sensfan22) wrote :

Marking this LP as invalid. The TC code will need to be updated to not attempt a controller swact if an application apply is in progress.

Here is the timeline for this issue:

1. controller-1 was active. All other hosts were locked due to the previous TC.
2. controller-0 and the other hosts were unlocked at 20:08:19
3. The system is designed to re-apply applications, if needed, after hosts are unlocked. The platform-integ-apps app was re-applied after controller-0 became unlocked-enabled (sysinv.log):
sysinv 2020-04-26 20:13:21.934 322002 INFO sysinv.conductor.manager [-] There has been an overrides change, setting up reapply of platform-integ-apps
sysinv 2020-04-26 20:13:22.930 322002 INFO sysinv.conductor.manager [-] Reapplying platform-integ-apps app

4. Before the application apply completed, the TC performed a controller swact:
[2020-04-26 20:14:04,488] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-1'

5. This terminated the application apply that was in progress.
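The suggested TC change, not swacting while an application apply is in progress, could be sketched as a polling guard that runs before `host-swact`. This is a minimal Python sketch under several assumptions:

* The test framework can return raw `system application-list` output as text. `get_app_list` is a hypothetical callable, for example a wrapper around the ssh.send/ssh.expect calls seen in the logs.
* Every status other than `applying` in `IN_PROGRESS_STATES` is an assumed transitional state.
* The timeout and poll interval are arbitrary.

```python
import time
from typing import Callable, Dict

# 'applying' is the state named in this report; the others are assumed transitional states.
IN_PROGRESS_STATES = {"applying", "uploading", "removing", "updating", "recovering"}


def parse_application_list(output: str) -> Dict[str, str]:
    """Map application name -> status from `system application-list` table text."""
    statuses: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border rows (+---+) and shell prompts
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) < 5 or not cells[0] or cells[0] == "application":
            continue  # header row, or a wrapped continuation of the progress column
        statuses[cells[0]] = cells[4]  # columns: application, version, manifest name, manifest file, status
    return statuses


def wait_for_no_apply_in_progress(get_app_list: Callable[[], str],
                                  timeout: float = 1800.0,
                                  poll_interval: float = 30.0) -> Dict[str, str]:
    """Poll until no application is in a transitional state; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        statuses = parse_application_list(get_app_list())
        busy = sorted(app for app, status in statuses.items() if status in IN_PROGRESS_STATES)
        if not busy:
            return statuses
        if time.monotonic() >= deadline:
            raise TimeoutError(f"applications still in progress: {busy}")
        time.sleep(poll_interval)
```

A test case would call `wait_for_no_apply_in_progress(...)` immediately before sending `host-swact`. In this run, that would presumably have delayed the 20:14:04 swact until the platform-integ-apps re-apply started at 20:13:22 had finished. The guard only narrows the race, because sysinv could still start a re-apply between the check and the swact.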

Changed in starlingx:
status: Triaged → Invalid
assignee: Frank Miller (sensfan22) → Anujeyan Manokeran (anujeyan)
Ghada Khalil (gkhalil)
tags: removed: stx.retestneeded