OpenStack-Ansible

Bug #1981463
Comment #3

Damian Dąbrowski (damiandabrowski) wrote on 2022-10-21:

Hi again!

I'm so sorry for the huge delay; I was overwhelmed by other things.

I spent some time on this issue today and here are my findings:

1. These "Aborted connection .* (Got timeout reading communication packets)" messages are just warnings about connections that MariaDB aborted because they exceeded wait_timeout. I see no reason why they could be harmful.

2. As you mentioned, increasing wait_timeout is only a workaround: it may prevent these warnings, but it may lead to other issues. For example, if haproxy fails over between Galera nodes (due to network issues etc.), the Galera nodes will keep a lot of "stale" connections open for a long time because of the high wait_timeout. In that case you may easily exceed the connection limit in haproxy. I believe that was the main reason why we decided to decrease wait_timeout.

3. I tried to reproduce your issue. Basically, I set `openstack_db_connection_recycle_time=60` (it also affects wait_timeout). Then I prepared a script which concurrently executes the "list servers" API call 30 times, waits 65 seconds (to let MariaDB abort the connections), and then repeats this 4 more times.
I didn't notice any performance difference between the first attempt and the rest.
Script I used: https://paste.openstack.org/show/bMlSdIiAfVXGkiyyjvI1/
Output: https://paste.openstack.org/show/bOVA9iBfSscvLk1YzUz2/
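For readers who cannot reach the paste, the test shape described in point 3 can be sketched roughly as below. This is a minimal sketch under assumptions, not the script linked above: the names `run_burst` and `reproduce` are hypothetical, and the API call is passed in as a plain callable so any client can be plugged in.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_burst(call: Callable[[], None], concurrency: int) -> List[float]:
    """Run `call` `concurrency` times in parallel and return each call's latency in seconds."""
    def timed() -> float:
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed) for _ in range(concurrency)]
        return [f.result() for f in futures]


def reproduce(call: Callable[[], None], concurrency: int = 30,
              idle_seconds: float = 65.0, rounds: int = 5) -> List[float]:
    """Fire `rounds` bursts, idling between them so the DB server can drop idle connections.

    Returns the median latency of each burst (hypothetical helper, for illustration).
    """
    medians = []
    for i in range(rounds):
        if i > 0:
            # Idle longer than wait_timeout (60 s in the test above) before the next burst.
            time.sleep(idle_seconds)
        medians.append(statistics.median(run_burst(call, concurrency)))
    return medians
```

For example, with openstacksdk and a `clouds.yaml` entry (the cloud name `mycloud` is a placeholder), `conn = openstack.connect(cloud="mycloud")` followed by `reproduce(lambda: list(conn.compute.servers()))` would issue 30 concurrent "list servers" calls per round. If MariaDB dropping idle connections caused a slowdown, it would be expected to show up as higher medians in the rounds after the first one.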

So maybe your problem is somewhere else? memcached? Maybe you hit max_overflow or some other limit? It's hard to say with the information provided :/
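On the max_overflow question: with a SQLAlchemy QueuePool (what oslo.db-based services typically use for MySQL/MariaDB), each process can hold at most pool_size + max_overflow connections, and that limit multiplies across API and worker processes. A back-of-the-envelope sketch follows; the class and function names are hypothetical and the values are illustrative, not OpenStack-Ansible defaults.

```python
from dataclasses import dataclass


@dataclass
class PoolSettings:
    """Hypothetical per-service SQLAlchemy pool settings (values are illustrative)."""
    pool_size: int
    max_overflow: int


def max_connections_per_process(p: PoolSettings) -> int:
    # A QueuePool keeps up to pool_size connections and may open up to max_overflow
    # extra ones; beyond that, a checkout waits up to the pool timeout and then fails.
    return p.pool_size + p.max_overflow


def worst_case_total(p: PoolSettings, processes: int) -> int:
    # Each API/worker process has its own pool, so the per-process limit multiplies.
    return processes * max_connections_per_process(p)


settings = PoolSettings(pool_size=5, max_overflow=10)
print(worst_case_total(settings, processes=8))  # 8 * (5 + 10) = 120
```

Comparing such a total against MariaDB's max_connections and haproxy's connection limits is one way to check whether a pool or proxy limit, rather than wait_timeout, is what requests are waiting on.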