Connections to DB are refusing to die after VIP is switched

Bug #1917068 reported by Michal Arbet
This bug affects 1 person
Affects          Status          Importance   Assigned to     Milestone
kolla-ansible    Fix Released    Medium       Michal Arbet
Train            New             Medium       Unassigned
Ussuri           Fix Committed   Medium       Unassigned
Victoria         Fix Committed   Medium       Unassigned
Wallaby          Fix Committed   Medium       Michal Arbet

Bug Description

Hi,

On a production environment deployed with kolla-ansible we found strange behaviour when switching the VIP between controllers (under load).
When the VIP is switched from the keepalived master to the backup, DB connections on the host that previously held the VIP are dead (all keystone wsgi workers are busy, waiting for a DB reply).

Test env:
- 2 controllers: HAProxy, keepalived, OpenStack services, DB, etc.
- 2 Computes

How to reproduce:

1. Generate as much traffic as you can to reproduce the issue (issue tokens against keystone on VIP:5000 with curl; see the sketch just below this list)
2. Check the keystone logs (there will be a large number of 201 responses on both controllers)
3. Restart keepalived OR restart networking OR ifdown/ifup the interface on the current keepalived master
   (the VIP will be switched to the secondary host)
4. Check the keystone logs again
5. You can see that the keystone access log is frozen (on the host where the VIP was before); after a while 503 and 504 responses appear
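
A minimal load loop for step 1, assuming the standard keystone v3 password-auth endpoint; the VIP address and the credentials are placeholders:

    # Issue tokens against keystone through the VIP in a tight loop.
    # VIP, user and password are placeholders; a successful issue returns 201.
    while true; do
      curl -s -o /dev/null -w '%{http_code}\n' \
        -H 'Content-Type: application/json' \
        -d '{"auth":{"identity":{"methods":["password"],"password":{"user":{"name":"admin","domain":{"name":"Default"},"password":"CHANGE_ME"}}}}}' \
        http://VIP:5000/v3/auth/tokens
    done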

Why is this happening?

Normally, when the keepalived master is not reachable, the secondary keepalived takes over the VIP and sends gratuitous ARP (GARP) to the network; all clients refresh their ARP tables and everything should keep working.

The problem is that the wsgi processes hold a connection pool to the DB, and these connections are dead after the switch; they do not know that the ARP mapping changed (the host probably ignored the GARP because of the very small window during which the VIP was still assigned to it).

So the wsgi processes keep trying to write to the file descriptor/socket of the DB connection, but wait for a reply indefinitely. Simply put, these connections are completely dead and the application layer cannot fix it, because the application layer (oslo.db/SQLAlchemy) does not know they are broken.
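
While the stall is happening, the dead connections can be observed retransmitting from the controller that lost the VIP; a quick check, assuming the DB (MariaDB/Galera) listens on port 3306:

    # Established DB connections keep retransmitting into the void after the
    # VIP moved away; -o shows the timer, whose third field in
    # timer:(on,<backoff>,<n>) is the number of unanswered retransmissions.
    ss -tno state established '( dport = :3306 )'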

The problem above eventually resolves itself: how long that takes depends on the kernel option net.ipv4.tcp_retries2, which controls how many retransmissions are attempted on a TCP connection before the kernel kills it. In my case it took around 930-940 seconds every time I tried it (with the default net.ipv4.tcp_retries2=15). Of course the retransmissions cannot succeed, as the VIP is gone and now served by another host/MAC.
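
As a sanity check on those numbers, a back-of-the-envelope sketch (assuming the retransmission timeout starts around 200 ms, doubles on every retry and is capped at TCP_RTO_MAX = 120 s) reproduces the observed stall:

    # Approximate lifetime of an unacknowledged TCP write with tcp_retries2=15.
    awk 'BEGIN {
      rto = 0.2; total = 0
      for (i = 1; i <= 16; i++) {              # 15 retransmissions + final wait
        total += rto
        rto = (rto * 2 > 120) ? 120 : rto * 2  # exponential backoff, capped at 120 s
      }
      printf "approx stall: %.1f s\n", total   # ~924.6 s, matching the 930-940 s above
    }'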

Decreasing tcp_retries2 to 1 fixed the issue immediately.

Here is a detailed article about TCP sockets that refuse to die: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/

Red Hat also suggests tuning this kernel option for HA solutions, as noted here: https://access.redhat.com/solutions/726753

"In a High Availability (HA) situation consider decreasing the setting to 3." << From RedHat

Here is also a video of the issue (left: controller0, right: controller1, bottom: logs, middle: VIP switch monitor):

https://download.kevko.ultimum.cloud/video_debug.mp4

I will provide a fix and push it for review.

Revision history for this message
Michal Arbet (michalarbet) wrote :
Changed in kolla-ansible:
status: New → In Progress
assignee: nobody → Michal Arbet (michalarbet)
Revision history for this message
Michal Arbet (michalarbet) wrote :
Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: Undecided → Medium
Revision history for this message
Mark Goddard (mgoddard) wrote :

Thinking about the GARP. If the NIC was bounced, then it might not see the GARP from the new master. There do seem to be options to tune GARP transmission: https://serverfault.com/questions/821809/keepalived-send-gratuitous-arp-periodically
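
For reference, that tuning lives in keepalived's global_defs; a sketch with illustrative values (option names per the keepalived documentation, not something the kolla-ansible keepalived template is known to expose, so treat it as an assumption):

    global_defs {
        vrrp_garp_master_repeat 5       # number of GARPs sent when becoming MASTER
        vrrp_garp_master_refresh 10     # keep re-sending GARPs every 10 s while MASTER
    }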

Revision history for this message
Michal Arbet (michalarbet) wrote : Re: [Bug 1917068] Re: Connections to DB are refusing to die after VIP is switched

Well,

I've tested a lot of keepalived configurations and nothing fixed the issue.

What I proposed is the standard way this should be fixed ... check the subjects I've attached...


Revision history for this message
Michal Arbet (michalarbet) wrote :

What I forgot to say: this is a problem for the DB connections that are already ESTABLISHED in the connection pool.


Revision history for this message
Michal Arbet (michalarbet) wrote :

As you can see below, this option is already used elsewhere, as I said:

https://codesearch.opendev.org/?q=net.ipv4.tcp_retries2&i=nope&files=&excludeFiles=&repos=

I think the discussion should be about the option's value, not about the fix itself.


Revision history for this message
chalansonnet (schalans) wrote :

Hello,

Just tried this on my environment:
CentOS 7.8 RDO deploy with the Kolla-Ansible Stein release
2 network nodes with HAProxy and keepalived
3 controller nodes with keystone
3 DB nodes with RabbitMQ & Galera

net.ipv4.tcp_retries2 default value: 15 retries

Steps I followed:
Generate some looping requests to keystone.
Log onto the keepalived master holding the public and private VIPs.
Shut off the keepalived container.
=> The VIP was recreated almost instantly on the other network controller.

Keystone logs:
Lost access to the database VIP <= maybe this is a different configuration from yours, but all services connect through the MariaDB VIP.
Requests to keystone were stuck for 180 sec.

Second test:
net.ipv4.tcp_retries2 set to 10
Requests to keystone were stuck for 120 sec.

Third test:
net.ipv4.tcp_retries2 set to 5
Requests to keystone were stuck for 60 sec.

So you are right, we can tune the failover of the HAProxy VIP with this setting!
I will do more tests; for me, 180 sec to fail over was acceptable.

Greetings,
Stephane Chalansonnet
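
A rough way to reproduce the timing measurement above: timestamp every token request, trigger the failover, and see how long the gap in the output lasts (VIP and the AUTH_JSON payload are placeholders, same body as the earlier curl sketch):

    AUTH_JSON='{"auth":{"identity":{"methods":["password"],"password":{"user":{"name":"admin","domain":{"name":"Default"},"password":"CHANGE_ME"}}}}}'
    while true; do
      printf '%s ' "$(date +%T)"
      curl -s -o /dev/null -w '%{http_code}\n' --max-time 300 \
        -H 'Content-Type: application/json' -d "$AUTH_JSON" \
        http://VIP:5000/v3/auth/tokens
      sleep 1
    done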

Revision history for this message
Michal Arbet (michalarbet) wrote :

Hi,

Well yes, thank you for replicating the same behaviour. I had been investigating this issue and how to fix it for more than a week :)

I have already proposed a patch, but I am planning to change my review to make this option configurable ... today or tomorrow.

Just want to say that, for example, k8s is very sensitive when keystone gets stuck ...

Michal


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/777772
Committed: https://opendev.org/openstack/kolla-ansible/commit/09d0409ed4a69f514355925754e832e752f817ca
Submitter: "Zuul (22348)"
Branch: master

commit 09d0409ed4a69f514355925754e832e752f817ca
Author: Michal Arbet <email address hidden>
Date: Fri Feb 26 17:50:31 2021 +0100

    Allow user to set sysctl_net_ipv4_tcp_retries2

    This patch is adding configuration option to
    manipulate with kernel option sysctl_net_ipv4_tcp_retries2.

    More informations about kernel option in [1][2]
    and RedHat suggestion [3] to set for DBs and HA.

    [1]: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
    [2]: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
    [3]: https://access.redhat.com/solutions/726753

    Closes-Bug: #1917068
    Change-Id: Ia0decbbfa4e33b1889b635f8bb1c9094567a2ce6

Changed in kolla-ansible:
status: In Progress → Fix Released
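
For operators picking up this fix, usage looks roughly like the sketch below; the variable name, its value and the reconfigure scope are assumptions and should be checked against the merged change and the kolla-ansible documentation:

    # Illustrative only: verify the exact variable name in the merged review.
    echo 'haproxy_host_ipv4_tcp_retries2: 6' >> /etc/kolla/globals.yml

    # Re-run the relevant role so the sysctl is applied on the hosts.
    kolla-ansible -i /etc/kolla/multinode reconfigure --tags haproxy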
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/798096

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/798096
Committed: https://opendev.org/openstack/kolla-ansible/commit/8521ddca289cac3a585f3a289be3111209bd1c02
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 8521ddca289cac3a585f3a289be3111209bd1c02
Author: Michal Arbet <email address hidden>
Date: Fri Feb 26 17:50:31 2021 +0100

    Allow user to set sysctl_net_ipv4_tcp_retries2

    This patch is adding configuration option to
    manipulate with kernel option sysctl_net_ipv4_tcp_retries2.

    More informations about kernel option in [1][2]
    and RedHat suggestion [3] to set for DBs and HA.

    [1]: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
    [2]: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
    [3]: https://access.redhat.com/solutions/726753

    Closes-Bug: #1917068
    Change-Id: Ia0decbbfa4e33b1889b635f8bb1c9094567a2ce6
    (cherry picked from commit 09d0409ed4a69f514355925754e832e752f817ca)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/799242

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/799243

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/799242
Committed: https://opendev.org/openstack/kolla-ansible/commit/d61340ba35a9bee019b9b22420f431f042b29cf2
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit d61340ba35a9bee019b9b22420f431f042b29cf2
Author: Michal Arbet <email address hidden>
Date: Fri Feb 26 17:50:31 2021 +0100

    Allow user to set sysctl_net_ipv4_tcp_retries2

    This patch is adding configuration option to
    manipulate with kernel option sysctl_net_ipv4_tcp_retries2.

    More informations about kernel option in [1][2]
    and RedHat suggestion [3] to set for DBs and HA.

    [1]: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
    [2]: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
    [3]: https://access.redhat.com/solutions/726753

    Closes-Bug: #1917068
    Change-Id: Ia0decbbfa4e33b1889b635f8bb1c9094567a2ce6
    (cherry picked from commit 09d0409ed4a69f514355925754e832e752f817ca)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/799243
Committed: https://opendev.org/openstack/kolla-ansible/commit/2acd4f711467bafa6f7e6deddd966910be61a993
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 2acd4f711467bafa6f7e6deddd966910be61a993
Author: Michal Arbet <email address hidden>
Date: Fri Feb 26 17:50:31 2021 +0100

    Allow user to set sysctl_net_ipv4_tcp_retries2

    This patch is adding configuration option to
    manipulate with kernel option sysctl_net_ipv4_tcp_retries2.

    More informations about kernel option in [1][2]
    and RedHat suggestion [3] to set for DBs and HA.

    [1]: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
    [2]: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
    [3]: https://access.redhat.com/solutions/726753

    Closes-Bug: #1917068
    Change-Id: Ia0decbbfa4e33b1889b635f8bb1c9094567a2ce6
    (cherry picked from commit 09d0409ed4a69f514355925754e832e752f817ca)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 11.1.0

This issue was fixed in the openstack/kolla-ansible 11.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 12.1.0

This issue was fixed in the openstack/kolla-ansible 12.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 10.3.0

This issue was fixed in the openstack/kolla-ansible 10.3.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 13.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 13.0.0.0rc1 release candidate.
