Swift hangs after controller maintenance mode

Bug #1535733 reported by Artem Panchenko
Affects              Status    Importance  Assigned to      Milestone
Fuel for OpenStack   Invalid   High        Dmitry Bilunov
8.0.x                Invalid   High        Dmitry Bilunov

Bug Description

Swift backend is marked as down in HAProxy after a controller goes through maintenance mode, so the OSTF HA test fails:

2016-01-19 04:51:40 DEBUG (test_haproxy) Dead backends ['swift node-5 Status: DOWN/L7TOUT Sessions: 0 Rate: 0 ']
2016-01-19 04:51:40 ERROR (nose_storage_plugin) fuel_health.tests.ha.test_haproxy.HAProxyCheck.test_001_check_state_of_backends
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 67, in testPartExecutor
    yield
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 601, in run
    testMethod()
  File "/usr/lib/python2.7/site-packages/fuel_health/tests/ha/test_haproxy.py", line 92, in test_001_check_state_of_backends
    "Step 2 failed: Some haproxy backend has down state.")
  File "/usr/lib/python2.7/site-packages/fuel_health/common/test_mixins.py", line 164, in verify_response_true
    self.fail(message.format(failed_step_msg, msg))
  File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 666, in fail
    raise self.failureException(msg)
AssertionError: Step 2 failed: Some haproxy backend has down state.. Please refer to OpenStack logs for more details.
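
For reference, the failing check boils down to "no HAProxy backend line may report DOWN". A minimal sketch of that logic (not the actual fuel_health code), parsing the same haproxy-status.sh output shown in the debug section below:

#!/usr/bin/env python
# Minimal sketch, not the real fuel_health test: fail if any backend line
# printed by haproxy-status.sh reports a DOWN status.
import subprocess

def dead_backends():
    # haproxy-status.sh prints lines like:
    #   swift node-5 Status: DOWN/L7TOUT Sessions: 0 Rate: 0
    out = subprocess.check_output(['haproxy-status.sh']).decode()
    return [line for line in out.splitlines() if 'Status: DOWN' in line]

if __name__ == '__main__':
    dead = dead_backends()
    assert not dead, 'Some haproxy backend has down state: %s' % dead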

Steps to reproduce:

1. Create an environment with 3 controllers
2. Switch on maintenance mode on 1 controller
3. Wait until the controller reboots
4. Exit maintenance mode
5. Check that the controller becomes available (see the probe sketch after this list)
6. Run OSTF HA tests
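
A minimal sketch for step 5: poll a TCP port on the controller until it accepts connections again (the port is an arbitrary example, the address is node-5's address as seen in the debug output below):

# Minimal sketch for step 5: wait until the controller accepts TCP
# connections again. The port is an arbitrary example (SSH); the address is
# node-5's address from the debug output below.
import socket
import time

def wait_until_up(addr='10.109.21.6', port=22, timeout=600, interval=10):
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            socket.create_connection((addr, port), timeout=5).close()
            return True
        except OSError:
            time.sleep(interval)
    return False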

Expected result: all tests pass
Actual result: the HAProxy backend test fails, see the error below.

 - Check state of haproxy backends on controllers (failure) Some haproxy backend has down state.. Please refer to OpenStack logs for more details.

Some debug info:

root@node-5:~# haproxy-status.sh | grep swift
swift FRONTEND Status: OPEN Sessions: 0 Rate: 0
swift node-5 Status: DOWN/L7TOUT Sessions: 0 Rate: 0
swift node-1 Status: UP/L7OK Sessions: 0 Rate: 0
swift node-2 Status: UP/L7OK Sessions: 0 Rate: 0
swift BACKEND Status: UP Sessions: 0 Rate: 0
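
The same states can be read straight from the HAProxy admin socket with the standard "show stat" command; a minimal sketch (the socket path is an assumption, check the "stats socket" line in haproxy.cfg):

# Minimal sketch: read backend states from the HAProxy admin socket instead
# of haproxy-status.sh. The socket path is an assumption; adjust it to the
# "stats socket" setting in haproxy.cfg.
import csv
import socket

def show_stat(sock_path='/var/lib/haproxy/stats'):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(b'show stat\n')
    data = b''
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        data += chunk
    s.close()
    # Output is CSV whose header line starts with "# "; the columns used
    # here are pxname (proxy), svname (server) and status.
    rows = csv.DictReader(data.decode().lstrip('# ').splitlines())
    return [(r['pxname'], r['svname'], r['status']) for r in rows]

for pxname, svname, status in show_stat():
    if pxname == 'swift':
        print(pxname, svname, status)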

haproxy logs:

<129>Jan 19 04:13:10 node-5 haproxy[22681]: Server swift/node-5 is DOWN, reason: Layer7 timeout, check duration: 10001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

root@node-5:~# curl -v -m 3 -I 10.109.21.6:8080
* Rebuilt URL to: 10.109.21.6:8080/
* Hostname was NOT found in DNS cache
* Trying 10.109.21.6...
* Connected to 10.109.21.6 (10.109.21.6) port 8080 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 10.109.21.6:8080
> Accept: */*
>
* Operation timed out after 3001 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 3001 milliseconds with 0 bytes received
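
So the TCP connect succeeds but no HTTP response ever arrives, which is exactly what HAProxy reports as L7TOUT (layer-7 timeout). The same probe in a few lines of Python, for completeness:

# Reproduce the failing layer-7 check: the TCP connect succeeds, but the
# hung swift-proxy-server never answers, so recv() times out just like the
# curl call above.
import socket

def head_probe(addr='10.109.21.6', port=8080, timeout=3):
    request = ('HEAD / HTTP/1.1\r\n'
               'Host: {0}:{1}\r\n'
               'Connection: close\r\n\r\n').format(addr, port)
    s = socket.create_connection((addr, port), timeout=timeout)  # connects fine
    try:
        s.sendall(request.encode())
        return s.recv(4096)  # raises socket.timeout on the hung proxy
    finally:
        s.close()

try:
    print(head_probe())
except socket.timeout:
    print('connected, but no HTTP response within the timeout (matches L7TOUT)')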

root@node-5:~# netstat -ltnp | grep 8080
tcp 0 0 10.109.21.6:8080 0.0.0.0:* LISTEN 5446/python

strace shows a lot of the following lines when attached to the swift-proxy-server process:

5446 1453180229.960847 select(0, NULL, NULL, NULL, {0, 9979}) = 0 (Timeout)
5446 1453180229.971009 wait4(0, 0x7ffc62630900, WNOHANG, NULL) = 0

It looks like the process is stuck in a loop and never checks its listening sockets. Here are its open file descriptors:

root@node-5:~# ls -la /proc/5446/fd
total 0
dr-x------ 2 root root 0 Jan 19 04:53 .
dr-xr-xr-x 9 swift swift 0 Jan 19 01:43 ..
lrwx------ 1 root root 64 Jan 19 04:53 0 -> /dev/null
lrwx------ 1 root root 64 Jan 19 04:53 1 -> /dev/null
lrwx------ 1 root root 64 Jan 19 04:53 10 -> socket:[871573]
lrwx------ 1 root root 64 Jan 19 04:53 11 -> socket:[871574]
lrwx------ 1 root root 64 Jan 19 04:53 12 -> socket:[871575]
lrwx------ 1 root root 64 Jan 19 04:53 13 -> socket:[871576]
lrwx------ 1 root root 64 Jan 19 04:53 14 -> socket:[871577]
lrwx------ 1 root root 64 Jan 19 04:53 15 -> socket:[871578]
lrwx------ 1 root root 64 Jan 19 04:53 16 -> socket:[871579]
lrwx------ 1 root root 64 Jan 19 04:53 2 -> /dev/null
lrwx------ 1 root root 64 Jan 19 04:53 3 -> socket:[870873]
lrwx------ 1 root root 64 Jan 19 04:53 4 -> socket:[870874]
lrwx------ 1 root root 64 Jan 19 04:53 5 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Jan 19 04:53 6 -> socket:[871571]
lrwx------ 1 root root 64 Jan 19 04:53 7 -> socket:[871483]
lrwx------ 1 root root 64 Jan 19 04:53 8 -> socket:[871572]
lr-x------ 1 root root 64 Jan 19 04:53 9 -> /dev/urandom
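
To confirm whether any of those socket fds is still the 10.109.21.6:8080 listener, the socket:[inode] links can be matched against /proc/net/tcp; a small helper sketch:

# Minimal sketch: map the socket:[inode] links under /proc/<pid>/fd to
# listening TCP sockets in /proc/net/tcp, to see which fd (if any) is the
# 8080 listener of the hung process.
import os
import re

def listening_sockets(pid):
    listeners = {}  # inode -> "ip:port" for sockets in LISTEN state (st == 0A)
    with open('/proc/net/tcp') as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            local, st, inode = fields[1], fields[3], fields[9]
            if st != '0A':
                continue
            hexip, hexport = local.split(':')
            # the local address is little-endian hex, e.g. 06156D0A:1F90
            ip = '.'.join(str(int(hexip[i:i + 2], 16)) for i in range(6, -2, -2))
            listeners[inode] = '%s:%d' % (ip, int(hexport, 16))
    fd_dir = '/proc/%s/fd' % pid
    for fd in os.listdir(fd_dir):
        m = re.match(r'socket:\[(\d+)\]', os.readlink(os.path.join(fd_dir, fd)))
        if m and m.group(1) in listeners:
            print('fd %s -> %s (LISTEN)' % (fd, listeners[m.group(1)]))

listening_sockets(5446)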

Diagnostic snapshot: https://drive.google.com/file/d/0BzaZINLQ8-xkaF9RV0gzN1NSQmc/view?usp=sharing

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

Note: after I sent SIGHUP to the swift-proxy-server process, it reloaded and started working fine:

root@node-5:~# kill -1 5446
root@node-5:~# curl -v -m 3 -I 10.109.21.6:8080
* Rebuilt URL to: 10.109.21.6:8080/
* Hostname was NOT found in DNS cache
* Trying 10.109.21.6...
* Connected to 10.109.21.6 (10.109.21.6) port 8080 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 10.109.21.6:8080
> Accept: */*
>
< HTTP/1.1 404 Not Found
HTTP/1.1 404 Not Found

root@node-5:~# netstat -ltnp | grep 8080
tcp 0 0 10.109.21.6:8080 0.0.0.0:* LISTEN 14684/python
root@node-5:~# haproxy-status.sh | grep swift
swift FRONTEND Status: OPEN Sessions: 0 Rate: 0
swift node-5 Status: UP/L7OK Sessions: 0 Rate: 0
swift node-1 Status: UP/L7OK Sessions: 0 Rate: 0
swift node-2 Status: UP/L7OK Sessions: 0 Rate: 0
swift BACKEND Status: UP Sessions: 0 Rate: 0
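
Until the root cause is found, the same workaround can be scripted; a sketch (the pid file path is an assumption about the swift packaging, the pid can also be taken from netstat as above):

# Sketch of the workaround above: send SIGHUP to the swift proxy master
# process so it reloads and starts answering again. The pid file path is an
# assumption about the packaging; the pid can also be read from netstat.
import os
import signal

with open('/var/run/swift/proxy-server.pid') as f:
    pid = int(f.read().strip())
os.kill(pid, signal.SIGHUP)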

Maciej Relewicz (rlu)
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
status: New → Confirmed
tags: added: area-library
tags: added: team-bugfix
Changed in fuel:
status: Confirmed → Triaged
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Bilunov (dbilunov)
Changed in fuel:
status: Triaged → In Progress
Revision history for this message
Dmitry Bilunov (dbilunov) wrote :

Cannot reproduce this problem on build 465.

Changed in fuel:
status: In Progress → Invalid