Swift hangs after controller maintenance mode
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Invalid | High | Dmitry Bilunov |
8.0.x | Invalid | High | Dmitry Bilunov |
Bug Description
The Swift backend is marked as down in HAProxy after controller maintenance mode, so the OSTF test fails:
2016-01-19 04:51:40 DEBUG (test_haproxy) Dead backends ['swift node-5 Status: DOWN/L7TOUT Sessions: 0 Rate: 0 ']
2016-01-19 04:51:40 ERROR (nose_storage_
Traceback (most recent call last):
File "/usr/lib/
yield
File "/usr/lib/
testMethod()
File "/usr/lib/
"Step 2 failed: Some haproxy backend has down state.")
File "/usr/lib/
self.
File "/usr/lib/
raise self.failureExc
AssertionError: Step 2 failed: Some haproxy backend has down state.. Please refer to OpenStack logs for more details.
Steps to reproduce:
1. Create an environment with 3 controllers
2. Switch on maintenance mode on 1 controller
3. Wait until the controller reboots
4. Exit maintenance mode
5. Check that the controller becomes available
6. Run OSTF HA tests
Expected result: all tests pass
Actual result: the test for HAProxy backends fails, see the error below.
- Check state of haproxy backends on controllers (failure) Some haproxy backend has down state.. Please refer to OpenStack logs for more details.
Some debug info:
root@node-5:~# haproxy-status.sh | grep swift
swift FRONTEND Status: OPEN Sessions: 0 Rate: 0
swift node-5 Status: DOWN/L7TOUT Sessions: 0 Rate: 0
swift node-1 Status: UP/L7OK Sessions: 0 Rate: 0
swift node-2 Status: UP/L7OK Sessions: 0 Rate: 0
swift BACKEND Status: UP Sessions: 0 Rate: 0
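The status lines above have a regular shape, so the kind of check the OSTF test performs can be approximated with a small parser. This is only a minimal sketch, not the actual OSTF implementation: the function name `dead_backends` and the regular expression are assumptions inferred from the output format shown in this report.

```python
import re

# Matches lines such as "swift node-5 Status: DOWN/L7TOUT Sessions: 0 Rate: 0".
# The pattern is inferred from the output in this report, not from haproxy-status.sh itself.
STATUS_LINE = re.compile(
    r"^(?P<proxy>\S+)\s+(?P<server>\S+)\s+Status:\s+(?P<status>\S+)"
)


def dead_backends(output):
    """Return (proxy, server, status) tuples for servers whose status starts with DOWN."""
    dead = []
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if not match:
            continue
        # FRONTEND and BACKEND rows are aggregates, not individual servers.
        if match.group("server") in ("FRONTEND", "BACKEND"):
            continue
        if match.group("status").startswith("DOWN"):
            dead.append(
                (match.group("proxy"), match.group("server"), match.group("status"))
            )
    return dead


sample = """\
swift FRONTEND Status: OPEN Sessions: 0 Rate: 0
swift node-5 Status: DOWN/L7TOUT Sessions: 0 Rate: 0
swift node-1 Status: UP/L7OK Sessions: 0 Rate: 0
swift node-2 Status: UP/L7OK Sessions: 0 Rate: 0
swift BACKEND Status: UP Sessions: 0 Rate: 0
"""
print(dead_backends(sample))  # [('swift', 'node-5', 'DOWN/L7TOUT')]
```

For the output captured on node-5, only the `swift node-5` row is reported, which matches the "Dead backends" line in the OSTF log.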
haproxy logs:
<129>Jan 19 04:13:10 node-5 haproxy[22681]: Server swift/node-5 is DOWN, reason: Layer7 timeout, check duration: 10001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
root@node-5:~# curl -v -m 3 -I 10.109.21.6:8080
* Rebuilt URL to: 10.109.21.6:8080/
* Hostname was NOT found in DNS cache
* Trying 10.109.21.6...
* Connected to 10.109.21.6 (10.109.21.6) port 8080 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 10.109.21.6:8080
> Accept: */*
>
* Operation timed out after 3001 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 3001 milliseconds with 0 bytes received
root@node-5:~# netstat -ltnp | grep 8080
tcp 0 0 10.109.21.6:8080 0.0.0.0:* LISTEN 5446/python
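The curl and netstat output above shows a port that accepts TCP connections but never answers at the HTTP layer, which is what HAProxy reports as L7TOUT. The sketch below shows one way to tell that state apart from a refused connection. The function name `probe_http` and its return labels are hypothetical and chosen for illustration; the demo uses a local listening socket that never calls `accept()` as a stand-in for the hung proxy.

```python
import socket


def probe_http(host, port, timeout=3.0):
    """Classify an endpoint as 'refused', 'connect-timeout', 'no-response',
    'closed', or 'responded' (labels are illustrative, not a standard)."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect-timeout"
    with conn:
        conn.settimeout(timeout)
        request = "HEAD / HTTP/1.1\r\nHost: {}:{}\r\nConnection: close\r\n\r\n".format(
            host, port
        )
        conn.sendall(request.encode("ascii"))
        try:
            data = conn.recv(1024)
        except socket.timeout:
            # The TCP handshake succeeded, but nothing answered the request.
            return "no-response"
    return "responded" if data else "closed"


# Demo: a socket that listens but never accepts. The kernel completes the
# handshake from the listen backlog, so the client connects and then waits.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
print(probe_http("127.0.0.1", demo_port, timeout=0.5))
listener.close()
```

A "no-response" result together with a LISTEN entry in netstat suggests the process holds the socket but is not servicing it, rather than a network or firewall problem.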
strace attached to the swift-proxy-server process shows many repetitions of the following lines:
5446 1453180229.960847 select(0, NULL, NULL, NULL, {0, 9979}) = 0 (Timeout)
5446 1453180229.971009 wait4(0, 0x7ffc62630900, WNOHANG, NULL) = 0
It looks like the process is stuck in a loop and does not check its listening sockets. Here are its open file descriptors:
root@node-5:~# ls -la /proc/5446/fd
total 0
dr-x------ 2 root root 0 Jan 19 04:53 .
dr-xr-xr-x 9 swift swift 0 Jan 19 01:43 ..
lrwx------ 1 root root 64 Jan 19 04:53 0 -> /dev/null
lrwx------ 1 root root 64 Jan 19 04:53 1 -> /dev/null
lrwx------ 1 root root 64 Jan 19 04:53 10 -> socket:[871573]
lrwx------ 1 root root 64 Jan 19 04:53 11 -> socket:[871574]
lrwx------ 1 root root 64 Jan 19 04:53 12 -> socket:[871575]
lrwx------ 1 root root 64 Jan 19 04:53 13 -> socket:[871576]
lrwx------ 1 root root 64 Jan 19 04:53 14 -> socket:[871577]
lrwx------ 1 root root 64 Jan 19 04:53 15 -> socket:[871578]
lrwx------ 1 root root 64 Jan 19 04:53 16 -> socket:[871579]
lrwx------ 1 root root 64 Jan 19 04:53 2 -> /dev/null
lrwx------ 1 root root 64 Jan 19 04:53 3 -> socket:[870873]
lrwx------ 1 root root 64 Jan 19 04:53 4 -> socket:[870874]
lrwx------ 1 root root 64 Jan 19 04:53 5 -> anon_inode:
lrwx------ 1 root root 64 Jan 19 04:53 6 -> socket:[871571]
lrwx------ 1 root root 64 Jan 19 04:53 7 -> socket:[871483]
lrwx------ 1 root root 64 Jan 19 04:53 8 -> socket:[871572]
lr-x------ 1 root root 64 Jan 19 04:53 9 -> /dev/urandom
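The repeating `select(...)` timeout of about 10 ms followed by `wait4(..., WNOHANG) = 0` in the strace output is the pattern a process produces when it polls for child exit without blocking. Whether PID 5446 is a parent waiting on worker processes is an assumption here; the report does not confirm it. The sketch below only reproduces the syscall pattern, and the helper name `wait_for_child` is hypothetical. It polls one specific child PID, whereas the trace shows `wait4(0, ...)`, which waits for any child in the process group.

```python
import os
import time


def wait_for_child(pid, poll_interval=0.01):
    """Poll for a child's exit without blocking; return (exit_status, polls)."""
    polls = 0
    while True:
        # With WNOHANG, waitpid returns (0, 0) while the child is still running,
        # which strace shows as "wait4(...) = 0".
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        polls += 1
        if done_pid == pid:
            return status, polls
        # Short sleep between polls; some Python versions implement it via select().
        time.sleep(poll_interval)


child_pid = os.fork()
if child_pid == 0:
    time.sleep(0.1)  # child: pretend to be a worker for a moment
    os._exit(0)      # leave immediately without running parent cleanup code

status, polls = wait_for_child(child_pid)
print("child exited with status", status, "after", polls, "polls")
```

If the trace does come from such a loop, the process being traced never touches the listening socket itself, so the hang would have to be looked for in its children.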
Diagnostic snapshot: https:/
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
status: New → Confirmed
tags: added: area-library
tags: added: team-bugfix
Changed in fuel:
status: Confirmed → Triaged
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Bilunov (dbilunov)
Changed in fuel:
status: Triaged → In Progress
Changed in fuel:
status: In Progress → Invalid
Note: after I sent SIGHUP to the swift-proxy-server process, it reloaded and started working fine:
root@node-5:~# kill -1 5446
root@node-5:~# curl -v -m 3 -I 10.109.21.6:8080
* Rebuilt URL to: 10.109.21.6:8080/
* Hostname was NOT found in DNS cache
* Trying 10.109.21.6...
* Connected to 10.109.21.6 (10.109.21.6) port 8080 (#0)
> HEAD / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: 10.109.21.6:8080
> Accept: */*
>
< HTTP/1.1 404 Not Found
HTTP/1.1 404 Not Found
root@node-5:~# netstat -ltnp | grep 8080
tcp 0 0 10.109.21.6:8080 0.0.0.0:* LISTEN 14684/python
root@node-5:~# haproxy-status.sh | grep swift
swift FRONTEND Status: OPEN Sessions: 0 Rate: 0
swift node-5 Status: UP/L7OK Sessions: 0 Rate: 0
swift node-1 Status: UP/L7OK Sessions: 0 Rate: 0
swift node-2 Status: UP/L7OK Sessions: 0 Rate: 0
swift BACKEND Status: UP Sessions: 0 Rate: 0