HAProxy marked swift as down for ~20 seconds after VIPs removal
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Fuel for OpenStack | Fix Released | Medium | Michael Polenchuk | | |
Bug Description
Fuel version info (8.0 build #169): http://
The health check 'Check state of haproxy backends on controllers' failed after the public and management VIPs were deleted:
2015-11-17 05:09:36 DEBUG (test_haproxy) Dead backends ['swift node-2 Status: DOWN 1/3/L7OSessions: 0 Rate: 0 ']
Steps to reproduce:
1. Delete the public and management VIPs 10 times (ip netns exec haproxy ip addr del ${vip} dev b_management)
2. Wait while they are being restored
3. Verify that they are restored
4. Run OSTF
Expected result: all health checks pass
Actual result: the check 'Check state of haproxy backends on controllers' fails
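Step 1 can be sketched as a small shell helper. This is only an illustration built around the command quoted in the steps: the function names, the `DRY_RUN` switch, the repeat count and the pause arguments are hypothetical, and the pause between deletions is an assumption (the report does not say whether each deletion waited for the VIP to come back). With `DRY_RUN=1`, the default here, the commands are only printed.

```shell
#!/bin/sh
# Sketch of step 1: repeatedly delete a VIP inside the haproxy namespace.
# Only the `ip netns exec haproxy ip addr del ... dev ...` command comes from
# the report; everything else is a hypothetical wrapper.

delete_vip() {
    vip="$1"   # VIP address as listed by `ip addr` in the namespace
    dev="$2"   # interface inside the namespace, e.g. b_management
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be executed without touching the network.
        echo "ip netns exec haproxy ip addr del ${vip} dev ${dev}"
    else
        ip netns exec haproxy ip addr del "${vip}" dev "${dev}"
    fi
}

repeat_delete() {
    vip="$1"
    dev="$2"
    count="${3:-10}"   # number of deletions (10 in the report)
    pause="${4:-0}"    # assumed: seconds to wait for the VIP to be restored
    i=1
    while [ "$i" -le "$count" ]; do
        delete_vip "$vip" "$dev"
        sleep "$pause"
        i=$((i + 1))
    done
}
```

For example, `DRY_RUN=0 repeat_delete "${vip}" b_management 10 60` would run the real command ten times with a 60-second pause; the pause value would need tuning to how quickly the cluster restores the address.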
According to the OSTF logs, the node-3 controller was used to check the HAProxy status. Here is the relevant part of the haproxy logs on node-3:
<129>Nov 17 05:09:24 node-3 haproxy[27338]: Server swift/node-2 is DOWN, reason: Layer7 timeout, check duration: 10001ms. 2 active and 0 backup servers left.
<133>Nov 17 05:09:43 node-3 haproxy[27338]: Server swift/node-2 is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 2030ms.
The same issue was also detected by haproxy on node-2:
<129>Nov 17 05:09:28 node-2 haproxy[28450]: Server swift/node-2 is DOWN, reason: Layer4 connection problem, info: "Invalid argument", check duration: 10001ms. 2 active and 0
active, 0 requeued, 0 remaining in queue.
<133>Nov 17 05:09:46 node-2 haproxy[28450]: Server swift/node-2 is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 2089ms.
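The DOWN and UP timestamps above give a downtime of 19 seconds as seen from node-3 and 18 seconds as seen from node-2. A rough way to pull such intervals out of a haproxy log is sketched below; the function name `haproxy_downtime` is hypothetical, and the script assumes the syslog layout shown in the excerpts (time in the third whitespace-separated field, host in the fourth) and does not handle intervals that cross midnight.

```shell
#!/bin/sh
# Print how long each backend server stayed DOWN, per reporting host.
# Reads haproxy syslog lines on stdin; assumes the layout seen in this report.
haproxy_downtime() {
    awk '
    / is (DOWN|UP),/ {
        # Find the "Server" keyword; the backend/server name follows it.
        for (i = 1; i <= NF; i++) if ($i == "Server") break
        key = $4 " " $(i + 1)            # reporting host + backend/server
        state = $(i + 3)
        sub(/,$/, "", state)             # "DOWN," -> "DOWN"

        # Convert HH:MM:SS to seconds since midnight.
        split($3, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]

        if (state == "DOWN") {
            down[key] = secs
        } else if (state == "UP" && (key in down)) {
            print key, (secs - down[key]) "s"
            delete down[key]
        }
    }'
}
```

Usage would look like `haproxy_downtime < haproxy.log`, where the log path is a placeholder. Note that each DOWN line is logged only after the failed check has run for its reported duration (10001 ms here), so the backend was probably unresponsive for somewhat longer than the printed interval.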
I checked the atop logs on node-2, and the node wasn't overloaded at that time. According to the swift logs on node-2, object replication was running there at 05:09, which may be the cause of the issue:
<46>Nov 17 05:09:19 node-2 swift-object-
<46>Nov 17 05:09:19 node-2 swift-object-
<46>Nov 17 05:09:19 node-2 swift-object-
<46>Nov 17 05:09:19 node-2 swift-object-
<46>Nov 17 05:09:19 node-2 swift-object-
<46>Nov 17 05:09:36 node-2 swift-container
<46>Nov 17 05:09:36 node-2 swift-container
<46>Nov 17 05:09:36 node-2 swift-container
<46>Nov 17 05:09:36 node-2 swift-container
<46>Nov 17 05:09:36 node-2 swift-container
<46>Nov 17 05:09:36 node-2 swift-container
<46>Nov 17 05:09:43 node-2 swift-account-
<46>Nov 17 05:09:43 node-2 swift-account-
<46>Nov 17 05:09:43 node-2 swift-account-
<46>Nov 17 05:09:43 node-2 swift-account-
<46>Nov 17 05:09:43 node-2 swift-account-
<46>Nov 17 05:09:43 node-2 swift-account-
<46>Nov 17 05:09:49 node-2 swift-object-
<46>Nov 17 05:09:49 node-2 swift-object-
<46>Nov 17 05:09:49 node-2 swift-object-
<46>Nov 17 05:09:49 node-2 swift-object-
<46>Nov 17 05:09:49 node-2 swift-object-
Diagnostic snapshot is attached.
tags: | added: area-library |
Changed in fuel: | |
assignee: | nobody → Fuel Library Team (fuel-library) |
importance: | Undecided → Medium |
Changed in fuel: | |
status: | New → Confirmed |
Changed in fuel: | |
assignee: | Fuel Library Team (fuel-library) → Michael Polenchuk (mpolenchuk) |
tags: | added: area-library removed: area-qa |
tags: | added: area-qa removed: area-library |
Please elaborate: what exactly is the subject of the reported issue? Is it the 20 seconds of downtime of several swift backends? This looks like no issue at all, since there were 2 active servers anyway.