System host-swact has unexpected behavior

Bug #1879722 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Won't Fix
Importance: Medium
Assigned to: Bin Qian
Milestone: (none listed)

Bug Description

Brief Description
-----------------
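On a DX plus system, a host swact was requested with system host-swact. The SSH connection to the active controller through the floating IP was lost, and after reconnecting through the floating IP the active controller had not changed (still controller-1). After a short time the connection was lost again. About 9 minutes after the swact command was sent, the connection through the floating IP reached the other host (controller-0).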

Severity
--------
Major

Steps to Reproduce
------------------
system host-swact

TC-name: z_containers/test_custom_containers.py::test_upload_charts_via_helm_upload

Expected Behavior
------------------
system host-swact moves the active controller role to the other controller.

Actual Behavior
----------------
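host-swact did not behave as expected: the active controller was unchanged after the first reconnect, and the other controller only became active about 9 minutes after the command was sent.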

Reproducibility
---------------
Unknown - first time this is seen in sanity, will monitor

System Configuration
--------------------
Multi-node system

Lab-name: WP_8-12

Branch/Pull Time/Commit
-----------------------
2020-05-19_20-00-00

Last Pass
---------
2020-05-15_20-00-00

Timestamp/Logs
--------------
[2020-05-20 10:54:03,885] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-1'

[2020-05-20 10:54:31,721] 436 DEBUG MainThread ssh.expect :: Output: packet_write_wait: Connection to 2620:10a:a001:a103::149 port 22: Broken pipe

controller-1:~$
[2020-05-20 10:56:08,376] 1290 INFO MainThread ssh.connect :: Successfully connected to 2620:10a:a001:a103::149 from 128.224.151.254!
[2020-05-20 10:56:08,376] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-05-20 10:56:08,377] 314 DEBUG MainThread ssh.send :: Send 'export TMOUT=0'
[2020-05-20 10:56:08,491] 436 DEBUG MainThread ssh.expect :: Output:
controller-1:~$

[2020-05-20 10:58:21,055] 436 DEBUG MainThread ssh.expect :: Output: packet_write_wait: Connection to 2620:10a:a001:a103::149 port 22: Broken pipe

[2020-05-20 11:03:31,007] 1290 INFO MainThread ssh.connect :: Successfully connected to 2620:10a:a001:a103::149 from 128.224.151.254!
[2020-05-20 11:03:31,007] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-05-20 11:03:31,008] 314 DEBUG MainThread ssh.send :: Send 'export TMOUT=0'
[2020-05-20 11:03:31,122] 436 DEBUG MainThread ssh.expect :: Output:
controller-0:~$
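
The intervals in the log excerpt above can be checked with a short script. This is only a sketch for reading the timeline: the timestamps are copied from the log lines, while the variable names and the helper parse_log_time are made up for illustration and are not part of the test framework.

```python
from datetime import datetime

# Timestamp layout used by the automation log, e.g. "2020-05-20 10:54:03,885".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp: str) -> datetime:
    """Hypothetical helper: parse one timestamp from the log excerpt."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


# Timestamps copied from the log lines above.
swact_sent = parse_log_time("2020-05-20 10:54:03,885")        # host-swact command sent
first_drop = parse_log_time("2020-05-20 10:54:31,721")        # first "Broken pipe"
first_reconnect = parse_log_time("2020-05-20 10:56:08,376")   # prompt shows controller-1
second_drop = parse_log_time("2020-05-20 10:58:21,055")       # second "Broken pipe"
second_reconnect = parse_log_time("2020-05-20 11:03:31,007")  # prompt shows controller-0

intervals = {
    "first outage": first_reconnect - first_drop,
    "connected to controller-1": second_drop - first_reconnect,
    "second outage": second_reconnect - second_drop,
    "swact sent until controller-0 prompt": second_reconnect - swact_sent,
}

for label, delta in intervals.items():
    print(f"{label}: {delta.total_seconds():.3f} s")
```

By this reading, roughly 9.5 minutes passed between sending the swact command and the first prompt on controller-0, with a second outage of a little over 5 minutes accounting for most of it.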

Test Activity
-------------
Sanity

Tags: stx.5.0 stx.ha
Peng Peng (ppeng)
tags: added: stx.retestneeded
Peng Peng (ppeng) wrote :
Brent Rowsell (brent-rowsell) wrote :

Can you clarify this statement?

"System reconnected to different host after 9 mins."

Yang Liu (yliu12) wrote :

I believe Peng meant that after the system host-swact command was sent, he was not able to SSH to the system via the OAM floating IP until 9 minutes later. Normally a swact takes much less time than that.

There were 4 swacts done in this suite, and only one of them had this issue.

Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - intermittent issue with swact. The swact took 9 minutes. This should be reviewed in case there is a serious issue.

Changed in starlingx:
status: New → Invalid
status: Invalid → Triaged
importance: Undecided → Medium
assignee: nobody → Bin Qian (bqian20)
Ghada Khalil (gkhalil)
tags: added: stx.4.0 stx.ha
Ghada Khalil (gkhalil) wrote :

Issue seen once and not reported since the original occurrence. Additionally the swact did complete, but took longer than expected. Based on these two reasons, this should not hold up stx.4.0. Moving to stx.5.0 to monitor and allow for further investigation of the available logs.

tags: added: stx.5.0
removed: stx.4.0
Bart Wensley (bartwensley) wrote :

I am closing this LP as "Won't fix". The issue has only been seen once and recovered automatically.

If the issue is seen again, please re-open the LP and provide new logs (the link to the existing logs is no longer working).

Changed in starlingx:
status: Triaged → Won't Fix
Ghada Khalil (gkhalil)
tags: removed: stx.retestneeded