Stale connections when VIP is moved

Bug #1376200 reported by Jan Provaznik
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Jan Provaznik

Bug Description

API services connect to mysql through the VIP and haproxy. The DB connection is kept open for subsequent requests (IOW, no new DB connection is made for each request coming to an API service). When the API service and the VIP run on the same machine and the API service opens a connection through haproxy to the DB, then if the VIP is moved to a different machine (e.g. because of a short network outage), this "stale" connection from the API service to the DB (haproxy) remains open. If the API service writes something to the socket of this connection it doesn't get the expected "socket closed" error; the connection is presented as alive, but nothing ever comes back after writing to the socket, so the request times out.

Sockets of the API<->haproxy(DB) connection are visible in /proc/PID/fd of both the API and haproxy processes, which is probably why the socket is still alive and writable - I guess this is a system feature, not a bug.

If the VIP is running on a different machine than the API service and the VIP is moved, then the API service gets the expected "socket closed" error when it next writes to the open socket. IOW, if the API<->haproxy connection is not on the same machine, the socket is not writable. What looks like a bug is that the behavior differs depending on whether the connection is on the local machine or to a different machine.
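The "connection is presented as alive" part can be illustrated with a small self-contained Python sketch (plain sockets on loopback, not the actual keystone/haproxy setup): the first write to a socket whose peer has already gone away still succeeds, because the kernel just buffers the data - the writer only learns about the dead peer later, or, when the packets are blackholed as in the moved-VIP case, never.

```python
import socket
import threading

def demo_write_after_peer_close():
    # Listener that accepts one connection and closes it immediately,
    # simulating the far end of the connection going away.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def server():
        conn, _ = srv.accept()
        conn.close()  # peer is gone; the client is not told synchronously

    t = threading.Thread(target=server)
    t.start()

    cli = socket.create_connection(("127.0.0.1", port))
    t.join()

    # The first write after the peer closed still "succeeds": the kernel
    # buffers the data and returns immediately. This is why the API
    # service believes the connection is alive.
    sent = cli.send(b"SELECT 1")
    cli.close()
    srv.close()
    return sent

print(demo_write_after_peer_close())  # prints 8: all bytes "sent"
```

Whether the writer ever finds out depends on whether an RST can make it back - which is exactly the difference between the local and remote VIP cases described above.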

Anyway, the result is that because TripleO deploys API+haproxy on all controller nodes by default, there is a significant chance that after a VIP failover an API service will stop responding.

How to reproduce:
1) set public_workers = 1 in keystone.conf (this is not necessary but makes debugging easier)
2) make sure the VIP is on the same machine that will handle the API request (we use the "source" balancing strategy in haproxy, so it's typically ctrl0 - you can bump the priority in keepalived.conf on ctrl0 and restart keepalived)
3) run "nova --debug list" - this request passes; the important part is that keystone creates a DB connection and leaves it open
4) stop keepalived on ctrl0 - the VIP is moved
5) run "nova --debug list" - this request times out doing keystone auth

Here are the important bits from my reproducer plus strace (file descriptor 7 in the strace output is the open connection to the DB - socket 13484432):

[tripleo@ibm-x3550m3-15 tripleo]$ nova --debug list

[root@overcloud-controller0-5i7ountubwja ~]# ps auxf|grep keyst
root 4744 0.0 0.0 6188 2036 pts/0 S+ 11:06 0:00 | \_ grep --color=auto keyst
keystone 4507 2.4 1.2 56336 38192 ? Ss 11:05 0:01 /opt/stack/venvs/keystone/bin/python /opt/stack/venvs/keystone/bin/keystone-all --config-dir /etc/keystone

[root@overcloud-controller0-5i7ountubwja ~]# lsof -P -p 4507|grep 3306
keystone- 4507 keystone 7u IPv4 13484432 0t0 TCP overcloud-controller0-5i7ountubwja:55872->overcloud-controller0-5i7ountubwja:3306 (ESTABLISHED)

[root@overcloud-controller0-5i7ountubwja ~]# service keepalived stop

[tripleo@ibm-x3550m3-15 tripleo]$ nova --debug list

***** strace -p 4507 *****

epoll_wait(4, {{EPOLLIN, {u32=5, u64=13803923081439215621}}}, 1023, 60000) = 1
epoll_ctl(4, EPOLL_CTL_DEL, 5, bf9162d0) = 0
accept(5, {sa_family=AF_INET, sin_port=htons(39460), sin_addr=inet_addr("192.0.2.32")}, [16]) = 8
fcntl64(8, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(8, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl64(8, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(8, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1412074933, 109700}, NULL) = 0
accept(5, 0xbf91658c, [16]) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=5, u64=13803923081439215621}}) = 0
gettimeofday({1412074933, 110286}, NULL) = 0
recv(8, "POST /v2.0/tokens HTTP/1.1\r\nHost"..., 8192, 0) = 352
getsockname(8, {sa_family=AF_INET, sin_port=htons(5000), sin_addr=inet_addr("192.0.2.30")}, [16]) = 0
gettimeofday({1412074933, 111258}, NULL) = 0
gettimeofday({1412074933, 117067}, NULL) = 0
poll([{fd=7, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(7, "$\1\0\0\3SELECT user.id AS user_id, "..., 296) = 296
read(7,

**** strace when keystone connects to VIP on different machine and this VIP is moved ****
[root@overcloud-controller0-5i7ountubwja ~]# strace -p 4507
Process 4507 attached
gettimeofday({1412075269, 212490}, NULL) = 0
epoll_wait(4, {{EPOLLIN, {u32=5, u64=13807746495685853189}}}, 1023, 60000) = 1
epoll_ctl(4, EPOLL_CTL_DEL, 5, bf9ef830) = 0
accept(5, {sa_family=AF_INET, sin_port=htons(57621), sin_addr=inet_addr("192.0.2.30")}, [16]) = 6
fcntl64(6, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl64(6, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1412075285, 975720}, NULL) = 0
accept(5, 0xbf9efaec, [16]) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=5, u64=13807746495685853189}}) = 0
gettimeofday({1412075285, 976194}, NULL) = 0
recv(6, "POST /v2.0/tokens HTTP/1.1\r\nHost"..., 8192, 0) = 352
getsockname(6, {sa_family=AF_INET, sin_port=htons(5000), sin_addr=inet_addr("192.0.2.30")}, [16]) = 0
gettimeofday({1412075285, 977137}, NULL) = 0
gettimeofday({1412075285, 983250}, NULL) = 0
poll([{fd=7, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(7, "$\1\0\0\3SELECT user.id AS user_id, "..., 296) = 296
read(7, 0xabe1b68, 16384) = -1 ECONNRESET (Connection reset by peer)
shutdown(7, SHUT_RDWR) = -1 ENOTCONN (Transport endpoint is not connected)
close(7) = 0
gettimeofday({1412075285, 985898}, NULL) = 0
gettimeofday({1412075285, 986305}, NULL) = 0
gettimeofday({1412075285, 986557}, NULL) = 0
epoll_wait(4, {}, 1023, 0) = 0
gettimeofday({1412075285, 986960}, NULL) = 0
gettimeofday({1412075285, 987402}, NULL) = 0
stat64("/opt/stack/venvs/keystone/lib/python2.7/site-packages/keystone/common/wsgi.py", {st_mode=S_IFREG|0644, st_size=29175, ...}) = 0
open("/opt/stack/venvs/keystone/lib/python2.7/site-packages/keystone/common/wsgi.py", O_RDONLY|O_LARGEFILE) = 7
...

Changed in tripleo:
importance: Undecided → High
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

I wonder if we can just get haproxy to shutdown all connections when the VIP moves away.

Changed in tripleo:
status: New → Triaged
Revision history for this message
Jan Provaznik (jan-provaznik) wrote :

Currently we use infinite timeouts for mysql in haproxy. Setting the timeouts to e.g. 60s would close API<->haproxy connections after that timeout, BUT if the API uses the connection within the first 60s it still hangs reading from the socket. IOW a 60s timeout just shrinks the time window in which you can get an API service stuck (in a production setup this window would most probably still be hit).
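For reference, such a timeout would go in the haproxy mysql section, along these lines (illustrative values only - as noted, this does not actually fix the problem, it only narrows the window):

```
listen mysql
    bind 192.0.2.16:3306
    # finite timeouts instead of the current infinite ones
    timeout client 60s
    timeout server 60s
```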

@Clint: yes, we can add a notify script to keepalived which restarts the haproxy service when the VIP is moved away from a failed server.

Revision history for this message
Jan Provaznik (jan-provaznik) wrote :

Update: restarting haproxy doesn't seem to solve the issue.

Changed in tripleo:
assignee: nobody → Jan Provaznik (jan-provaznik)
Revision history for this message
Jan Provaznik (jan-provaznik) wrote :

So far the only thing which seems to solve this issue is defining an explicit route for traffic going to the VIP on the local node:

# ip ro replace local virtual_ip dev lo src local_ip

This makes sure that when clients connect to other services through the VIP, the source address of the connection will be "local_ip" instead of "virtual_ip".
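If this were wired into keepalived, it could hang off the state-transition hooks; a rough, hypothetical sketch (the instance name and script paths are made up, not from the actual fix):

```
vrrp_instance VI_1 {
    # ... existing VIP configuration ...

    # add the explicit local route when this node takes the VIP,
    # remove it again when the VIP moves away
    notify_master "/etc/keepalived/vip-route-add.sh"
    notify_backup "/etc/keepalived/vip-route-del.sh"
    notify_fault  "/etc/keepalived/vip-route-del.sh"
}
```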

Revision history for this message
Jan Provaznik (jan-provaznik) wrote :

The problem is that if the VIP runs on nodeX and any client on nodeX opens a connection to the VIP, then by default the VIP is also used as the source address of the created connection. For example, if there is br-ex with the following IPs:

5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether 00:28:6d:32:0e:18 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.19/24 brd 192.0.2.255 scope global dynamic br-ex
       valid_lft 83888sec preferred_lft 83888sec
    inet 192.0.2.16/32 scope global br-ex
       valid_lft forever preferred_lft forever
    inet 192.0.2.17/32 scope global br-ex
       valid_lft forever preferred_lft forever

Where 192.0.2.16 and 192.0.2.17 are the VIPs and 192.0.2.19 is used for talking to other nodes, then by default the route to a VIP is:
# ip ro get 192.0.2.16
local 192.0.2.16 dev lo src 192.0.2.16
    cache <local>

And keystone's DB connection looks like:
keystone- 6155 keystone 8u IPv4 445414 0t0 TCP 192.0.2.16:48131->192.0.2.16:3306 (ESTABLISHED)

Then when the VIP is moved away from this node, the system doesn't know how to route packets for the moved VIP and the connection stalls.

A solution is to add an explicit route (see previous comment) for the VIP; this route is then taken into account when the system chooses a source address for new connections. The routing and connection then look like:
# ip ro get 192.0.2.16
local 192.0.2.16 dev lo src 192.0.2.19
    cache <local>

a connection in lsof output:
cinder-sc 6225 cinder 8u IPv4 312695 0t0 TCP 192.0.2.19:41421->192.0.2.16:3306 (ESTABLISHED)

And then when the VIP is moved, the connection is closed as expected.
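The "closed as expected" case (the ECONNRESET seen in the second strace) can also be reproduced with plain loopback sockets, outside the haproxy setup: if the peer's socket is torn down while the client's data sits unread, TCP answers with an RST, and the next read fails immediately instead of hanging. A minimal sketch:

```python
import socket
import threading
import time

def demo_reset_on_read():
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def server():
        conn, _ = srv.accept()
        # Close while the client's data is still unread: TCP sends an
        # RST instead of a graceful FIN.
        time.sleep(0.2)
        conn.close()

    t = threading.Thread(target=server)
    t.start()

    cli = socket.create_connection(("127.0.0.1", port))
    cli.send(b"SELECT 1")   # succeeds, like write(7, ...) = 296 above
    t.join()
    time.sleep(0.1)         # let the RST arrive
    try:
        cli.recv(16384)     # like read(7, ...) = -1 ECONNRESET
        result = "no error"
    except ConnectionResetError:
        result = "ECONNRESET"
    cli.close()
    srv.close()
    return result

print(demo_reset_on_read())  # prints ECONNRESET
```

This is the healthy failure mode: the client library sees the reset, drops the connection and can reconnect, which is what the explicit-route fix restores for the local-VIP case.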

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (master)

Fix proposed to branch: master
Review: https://review.openstack.org/135610

Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/135610
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=815a3b3b77ed5fe47f9cad704235cbe12e8fe038
Submitter: Jenkins
Branch: master

commit 815a3b3b77ed5fe47f9cad704235cbe12e8fe038
Author: Jan Provaznik <email address hidden>
Date: Wed Nov 19 15:24:46 2014 +0100

    Set explicit route for VIPs

    When VIPs are moved around explicit route for each VIP is
    set to make sure that connections opened from local node to VIP
    don't use VIP as source address - if VIP is used as source address
    and is moved away then connections are stalled.

    This explicit routes are removed when VIPs are moved to another node.

    Closes-Bug: 1376200
    Change-Id: Ie5d55fbe33231519015b313b1f9210ac3614266e

Changed in tripleo:
status: In Progress → Fix Committed
Derek Higgins (derekh)
Changed in tripleo:
status: Fix Committed → Fix Released