Hi again!
I'm so sorry for this huge delay; I was overwhelmed by other things.
I spent some time on this issue today and here are my findings:
1. These "Aborted connection .* (Got timeout reading communication packets)" occurrences are just warnings about connections that mariadb aborted because of wait_timeout. I see no reason why they would be harmful.
2. As you mentioned, increasing wait_timeout is only a workaround: it may prevent these warnings but can lead to other issues. For example, if haproxy fails over between galera nodes (due to network issues etc.), the galera nodes will keep a lot of "stale" connections open for a long time because of the high wait_timeout. In that case you may easily exceed the connection limit in haproxy. I believe that was the main reason why we decided to decrease wait_timeout.
3. I tried to reproduce your issue by setting `openstack_db_connection_recycle_time=60` (it also affects wait_timeout). Then I prepared a script which concurrently executes the "list servers" API call 30 times, waits 65 seconds (to let mariadb abort the connections) and does the same again 4 more times.
I didn't notice any performance difference between the first attempt and the rest.
Script I used: https://paste.openstack.org/show/bMlSdIiAfVXGkiyyjvI1/
Output: https://paste.openstack.org/show/bOVA9iBfSscvLk1YzUz2/
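In case the paste is not reachable, the test described in point 3 can be sketched roughly like this. This is not the script linked above, only a minimal illustration of the same idea. The function names (`run_round`, `run_test`, `summarize`) and the `call` argument are made up for the example; `call` stands for whatever performs one "list servers" request in your environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_round(call, concurrency=30):
    """Run `call` `concurrency` times in parallel; return per-call durations (seconds)."""
    def timed(_):
        start = time.perf_counter()
        call()  # hypothetical: one "list servers" API request
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(concurrency)))


def run_test(call, rounds=5, concurrency=30, pause=65, sleep=time.sleep):
    """Run `rounds` rounds, pausing `pause` seconds between them.

    The pause is chosen to be longer than wait_timeout so that the idle
    database connections are aborted before the next round starts.
    """
    results = []
    for i in range(rounds):
        if i > 0:
            sleep(pause)
        results.append(run_round(call, concurrency))
    return results


def summarize(results):
    """Print average and maximum duration for each round."""
    for i, durations in enumerate(results, start=1):
        avg = sum(durations) / len(durations)
        print(f"round {i}: n={len(durations)} avg={avg:.3f}s max={max(durations):.3f}s")
```

If the aborted connections caused a slowdown, rounds 2 to 5 should show noticeably higher averages than round 1 when you pass the results of `run_test(...)` to `summarize(...)`.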
So maybe your problem is somewhere else? memcached? Maybe you reached max_overflow or some other limit? Hard to say with the provided information :/