OVS agents were declared dead due to controller swact
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Bart Wensley |
Bug Description
Title
-----
OVS agents were declared dead due to controller swact
Brief Description
-----------------
On one lock operation for controller-1 (at 2019-02-...), the following OVS agents were declared dead:
2019-02-26 23:48:27,659.659 22 WARNING neutron.
Open vSwitch agent 2019-02-26 23:47:12 compute-0
Open vSwitch agent 2019-02-26 23:47:12 compute-1
I think this happened because the agents failed to report their state due to a messaging timeout:
2019-02-26 23:48:43,650.650 121 ERROR neutron.
I expect this happened because of a temporary rabbitmq outage when the rabbitmq pod on controller-1 was deleted as part of the lock. Someone from the neutron team should look at this; we may need to make the neutron server more tolerant of temporary messaging interruptions.
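For context, the neutron server marks an agent dead when its last state report is older than the configured agent_down_time (agents report every report_interval seconds). Below is a minimal, simplified sketch of that check in Python; the names and structure are illustrative, not the actual neutron implementation:

    # Simplified illustration of the neutron server's agent liveness check
    # (approximate names; not the real neutron code).
    from datetime import datetime, timedelta

    AGENT_DOWN_TIME = 75   # seconds; neutron.conf [DEFAULT] agent_down_time (default 75)
    REPORT_INTERVAL = 30   # seconds; agent-side report_interval (default 30)

    def is_agent_down(last_heartbeat: datetime) -> bool:
        # If no state report has arrived within agent_down_time, the server
        # treats the agent as dead, which is what triggers the "declared dead"
        # handling seen in this bug.
        return datetime.utcnow() - last_heartbeat > timedelta(seconds=AGENT_DOWN_TIME)

    # Example: a rabbitmq outage around the swact delays consecutive reports,
    # so the last successful heartbeat is ~80 s old and the agent is marked dead.
    last_heartbeat = datetime.utcnow() - timedelta(seconds=2 * REPORT_INTERVAL + 20)
    print(is_agent_down(last_heartbeat))   # True

With the defaults (report_interval 30 s, agent_down_time 75 s), a messaging outage of only a couple of report intervals is enough to cross the threshold, which is consistent with the roughly 75 s gap between the last agent heartbeats (23:47:12) and the warning (23:48:27) in the logs above.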
Severity
--------
Major - This results in VMs being migrated unnecessarily; they should not be migrated by controller lock/swact operations.
Steps to Reproduce
------------------
Repeated controller lock/unlock operations (with swact in between).
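For reference, one way to drive this cycle with the StarlingX system CLI (a sketch, assuming controller-0 starts as the active controller) is:

    system host-lock controller-1      # lock the standby controller
    system host-unlock controller-1    # unlock and wait for it to become available
    system host-swact controller-0     # swact so controller-1 becomes active
    system host-lock controller-0      # lock the now-standby controller-0
    system host-unlock controller-0
    system host-swact controller-1     # swact back and repeat

The failure described above was seen on a lock of controller-1, which deletes the rabbitmq pod running on that controller.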
Expected Behavior
------------------
The neutron server should be tolerant of a very short rabbitmq outage and not declare the OVS agents to be dead.
Actual Behavior
----------------
See above
Reproducibility
---------------
Intermittent - only seen on one out of eight lock/unlock operations.
System Configuration
--------------------
2 + 2 system (kubernetes)
Branch/Pull Time/Commit
-----------------------
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="f/stein"
JOB="STX_
<email address hidden>"
BUILD_NUMBER="54"
BUILD_HOST=
BUILD_DATE=
Timestamp/Logs
--------------
See above.
Changed in starlingx:
assignee: Joseph Richard (josephrichard) → Bart Wensley (bartwensley)
tags: added: stx.2.0 removed: stx.2019.05
Marking as release gating; medium priority as the issue is intermittent.
If a neutron change is required, a neutron launchpad will be needed.