Comment 6 for bug 1917068

Michal Arbet (michalarbet) wrote : Re: [Bug 1917068] Re: Connections to DB are refusing to die after VIP is switched

The codesearch results below also show that this option is used, as I said:

https://codesearch.opendev.org/?q=net.ipv4.tcp_retries2&i=nope&files=&excludeFiles=&repos=

I think the discussion has to be about the option's value, not about the fix itself.
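For context on how a given tcp_retries2 value maps to wall-clock time, here is a rough Python sketch. It is an approximation under assumptions taken from the Cloudflare article quoted in the bug description: the retransmission timeout starts at TCP_RTO_MIN (200 ms), doubles on each retry, and is capped at TCP_RTO_MAX (120 s); real timings also depend on the measured RTT.

```python
# Rough estimate of how long a dead TCP connection lingers for a given
# net.ipv4.tcp_retries2 value, assuming Linux defaults: the retransmission
# timeout starts at TCP_RTO_MIN (200 ms), doubles per retry, and is capped
# at TCP_RTO_MAX (120 s). Real connections scale this by the measured RTT.
TCP_RTO_MIN = 0.2    # seconds
TCP_RTO_MAX = 120.0  # seconds

def worst_case_timeout(tcp_retries2: int) -> float:
    """Sum the back-off intervals until the kernel gives up on the socket."""
    return sum(min(TCP_RTO_MIN * 2 ** i, TCP_RTO_MAX)
               for i in range(tcp_retries2 + 1))

print(worst_case_timeout(15))  # default: ~924.6 s, close to the ~930 s observed
print(worst_case_timeout(3))   # Red Hat's HA suggestion: ~3.0 s
```

This lines up with the ~930-940 seconds reported in the bug for the default of 15.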

On Mon, 1 Mar 2021 at 12:51, Michal Arbet <email address hidden>
wrote:

> What I forgot to say: this is a problem for connections to the DB which
> are ESTABLISHED in the DB pool.
>
> On Mon, 1 Mar 2021 at 12:20, Mark Goddard <email address hidden>
> wrote:
>
>> Thinking about the GARP: if the NIC was bounced, then it might not see
>> the GARP from the new master. There do seem to be options to tune GARP
>> transmission:
>> https://serverfault.com/questions/821809/keepalived-send-gratuitous-arp-periodically
>>
>> --
>> You received this bug notification because you are subscribed to the bug
>> report.
>> https://bugs.launchpad.net/bugs/1917068
>>
>> Title:
>> Connections to DB are refusing to die after VIP is switched
>>
>> Status in kolla-ansible:
>> In Progress
>> Status in kolla-ansible train series:
>> New
>> Status in kolla-ansible ussuri series:
>> New
>> Status in kolla-ansible victoria series:
>> New
>> Status in kolla-ansible wallaby series:
>> In Progress
>>
>> Bug description:
>> Hi,
>>
>> On a production kolla-ansible environment we found strange behaviour
>> when switching the VIP between controllers (under load).
>> When the VIP is switched from the master keepalived to the backup,
>> connections to the DB are dead on the host where the VIP was before the
>> switch (the keystone WSGI workers are all busy waiting for a DB reply).
>>
>>
>> Test env:
>> - 2 Controllers - HAProxy, keepalived, OS services, DB, etc.
>> - 2 Computes
>>
>> How to reproduce:
>>
>> 1. Generate as much traffic as you can to reproduce the issue (curl
>> token-issue requests to the keystone VIP:5000)
>> 2. Check the keystone logs (there will be a large number of 201s on
>> both controllers)
>> 3. Restart keepalived, restart networking, or ifdown/ifup the interface
>> on the current keepalived master
>> (the VIP will be switched to the secondary host)
>> 4. Check the keystone logs again
>> 5. You can see that the keystone access log is frozen (on the host
>> where the VIP was before); after a while there will be 503s and 504s
>>
>> Why is this happening?
>>
>> Normally, when the master keepalived is not reachable, the secondary
>> keepalived takes the VIP and sends a GARP to the network, and all
>> clients refresh their ARP tables, so everything should work.
>>
>> The problem is that the WSGI processes have a connection pool to the
>> DB, and these connections are dead after the switch; they don't know
>> that the ARP mapping changed (the host probably missed the GARP because
>> there is a very tiny window when the VIP was still assigned to it).
>>
>> So, the WSGI processes are trying to write to the file
>> descriptor/socket of the DB connection, but wait for a reply forever.
>> Simply put, these connections are totally dead, and the application
>> layer can't fix it, because the app layer (oslo.db/sqlalchemy) doesn't
>> know it is broken.
>>
>> The above problem resolves itself after some time -> how long depends
>> on the kernel option net.ipv4.tcp_retries2, which controls how many
>> retransmissions are sent on a TCP connection before the kernel kills
>> it. In my case it was around 930-940 seconds every time I tried it
>> (with the default net.ipv4.tcp_retries2=15). Of course, the
>> retransmissions cannot succeed, as the VIP is gone and hosted by
>> another host/MAC.
>>
>> Decreasing tcp_retries2 to 1 fixed the issue immediately.
>>
>> Here is a detailed article about TCP sockets that refuse to die ->
>> https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
>>
>> Red Hat also suggests tuning this kernel option for HA solutions, as
>> noted here -> https://access.redhat.com/solutions/726753
>>
>> "In a High Availability (HA) situation consider decreasing the setting
>> to 3." << from Red Hat
>>
>>
>> Here is also a video of the issue (left: controller0, right:
>> controller1, bottom: logs, middle: VIP monitor showing the switch)
>>
>> https://download.kevko.ultimum.cloud/video_debug.mp4
>>
>> I will provide a fix and push it for review.
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/kolla-ansible/+bug/1917068/+subscriptions
>>
>
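For reference, the kind of sysctl change under discussion can be sketched as follows. The exact value is the open question in this thread: 1 worked in the reporter's test, while the Red Hat article quoted above suggests 3 for HA setups.

```shell
# Inspect the current retransmission limit (the Linux default is 15).
sysctl net.ipv4.tcp_retries2

# Apply a lower value at runtime (3 is the Red Hat HA suggestion).
sudo sysctl -w net.ipv4.tcp_retries2=3

# Persist the setting across reboots (file name is illustrative).
echo 'net.ipv4.tcp_retries2 = 3' | sudo tee /etc/sysctl.d/90-tcp-retries2.conf
```

Note this is a host-wide setting, so it affects every TCP connection on the box, not just the DB pool connections described in the bug.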