standby controller not going active after forced reboot of active controller

Bug #1821050 reported by Chris Friesen
Affects     Status    Importance   Assigned to   Milestone
StarlingX   Invalid   High         Bin Qian

Bug Description

Brief Description
-----------------
In a two-node system, when the active controller is forced into an uncontrolled reboot (by crashing the kernel via sysrq), the standby controller sometimes takes a long time (multiple minutes) to go active.

One possible issue is that I think the system in question has direct-connected mgmt/infra links.

Severity
--------
Major

Steps to Reproduce
------------------
On active controller, as root, run:
echo 1 > /proc/sys/kernel/sysrq; echo c > /proc/sysrq-trigger

Expected Behavior
------------------
The standby controller should go active.

Actual Behavior
----------------
The standby controller stayed standby for multiple minutes.
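
To put a number on "multiple minutes", the takeover delay could be timed from the standby controller with a small polling helper. This is only a sketch: `wait_until` is a hypothetical helper name, and the command passed to it (shown as `is_active` in the comment) is a placeholder for whatever check reports that this controller has gone active; neither is a StarlingX tool.

```shell
# wait_until TIMEOUT INTERVAL CMD [ARGS...]
# Poll CMD every INTERVAL seconds until it succeeds.
# On success, print the elapsed whole seconds and return 0.
# If TIMEOUT seconds pass first, report on stderr and return 1.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    start=$(date +%s)
    until "$@"; do
        now=$(date +%s)
        if [ $((now - start)) -ge "$timeout" ]; then
            echo "timed out after $((now - start))s" >&2
            return 1
        fi
        sleep "$interval"
    done
    echo $(( $(date +%s) - start ))
}

# Hypothetical usage on the standby, started just before crashing the active:
#   wait_until 600 1 is_active
# where is_active is a placeholder for a site-specific check.
```

Running this on the standby immediately before triggering the crash on the active controller would give an approximate time-to-active in seconds for each attempt.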

Reproducibility
---------------
Intermittent but frequent

System Configuration
--------------------
Two-node system; I think the mgmt/infra links are direct-connected.

Branch/Pull Time/Commit
-----------------------
<email address hidden>"
BUILD_NUMBER="6"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-03-05 06:00:00 +0000"

Timestamp/Logs
--------------
Timestamp is somewhere around 19:46:56

Tags: stx.2.0 stx.ha
Chris Friesen (cbf123) wrote :

controller-0 sm logs

Chris Friesen (cbf123) wrote :

controller-1 sm logs

Ghada Khalil (gkhalil) wrote :

Marking as release gating; issue related to HA.

Changed in starlingx:
importance: Undecided → High
assignee: nobody → Bin Qian (bqian20)
status: New → Triaged
tags: added: stx.2019.05 stx.ha
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Bin Qian (bqian20) wrote :

The issue was not reproducible with the newest build (May 14).

Bin Qian (bqian20) wrote :

Tested on AIO-DX and AIO-DX/DC.

Bin Qian (bqian20)
Changed in starlingx:
status: Triaged → Invalid