Activity log for bug #1981463

Date Who What changed Old value New value Message
2022-07-12 15:51:48 Alexander Binzxxxxxx bug added bug
2022-07-12 15:52:47 Alexander Binzxxxxxx description

Old value:

setup: using xena and pretty much default settings, so openstack_db_connection_recycle_time is 600 and galera_wait_timeout is 600 as well, while the timeout in haproxy for the galera frontend/backend is 5000s.

symptom: seeing galera connection aborts reported in haproxy in the ERSP column. In the mariadb log I get lines like:
"Aborted connection 594171 to db: 'placement' user: 'placement' host: 'hostA.mydomain.com' (Got timeout reading communication packets)"
Also the aborted connections counter is rising in mariadb. Such errors cause retries on the openstack side, causing things to go slow from time to time.

expectation: not getting those kinds of errors.

some analysis: MariaDB is actually dropping the connections at wait_timeout (=galera_wait_timeout=600) due to the connection being idle for a long time. The oslo.db config used in basically all openstack services does some connection pooling and is configured (e.g. in placement) with the following values (all default):
max_overflow = 50
max_pool_size = 5
pool_timeout = 30
connection_recycle_time = 600
So it should actually close connections and re-establish them before the timeout. Also, haproxy using timeouts of 5000s in frontend and backend should not matter here.

not a solution: increasing the wait_timeout in mariadb to 1200 or 3600.

(workaround) solution, but maybe not a good one: increasing the wait_timeout in mariadb to 7200.

I am not sure where the issue is actually coming from, but here are my best guesses:
* there is a bug on the openstack end not setting the config values in the lower layer library
* there is some bug in the sql db facing lib code causing pooling and refresh not to work properly
* the timeout in mariadb must be higher than in oslo.db
* haproxy may still cause some issue here and the 5000s may be part of that

impact: mostly annoying errors causing retries and slowing things down without any big impact, so I consider this a minor bug.

New value:

setup: using xena and pretty much default settings, so openstack_db_connection_recycle_time is 600 and galera_wait_timeout is 600 as well, while the timeout in haproxy for the galera frontend/backend is 5000s.

symptom: seeing galera connection aborts reported in haproxy in the ERSP column. In the mariadb log I get lines like:
"Aborted connection 594171 to db: 'placement' user: 'placement' host: 'hostA.mydomain.com' (Got timeout reading communication packets)"
Also the aborted connections counter is rising in mariadb. Such errors cause retries on the openstack side, causing things to go slow from time to time. Of course this happens after the connection has been idle for the wait_timeout period.

expectation: not getting those kinds of errors.

some analysis: MariaDB is actually dropping the connections at wait_timeout (=galera_wait_timeout=600) due to the connection being idle for a long time. The oslo.db config used in basically all openstack services does some connection pooling and is configured (e.g. in placement) with the following values (all default):
max_overflow = 50
max_pool_size = 5
pool_timeout = 30
connection_recycle_time = 600
So it should actually close connections and re-establish them before the timeout. Also, haproxy using timeouts of 5000s in frontend and backend should not matter here.

not a solution: increasing the wait_timeout in mariadb to 1200 or 3600.

(workaround) solution, but maybe not a good one: increasing the wait_timeout in mariadb to 7200.

I am not sure where the issue is actually coming from, but here are my best guesses:
* there is a bug on the openstack end not setting the config values in the lower layer library
* there is some bug in the sql db facing lib code causing pooling and refresh not to work properly
* the timeout in mariadb must be higher than in oslo.db
* haproxy may still cause some issue here and the 5000s may be part of that

impact: mostly annoying errors causing retries and slowing things down without any big impact, so I consider this a minor bug.
2022-07-19 18:38:07 Dmitriy Rabotyagov openstack-ansible: assignee Damian Dąbrowski (damiandabrowski)
2022-10-21 18:27:01 Damian Dąbrowski openstack-ansible: status New Incomplete
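
The timeout relationship guessed at in the description (the mariadb timeout must be higher than the one in oslo.db) can be sketched as a small sanity check. This is only an illustration, not part of openstack-ansible or oslo.db: the `TimeoutSettings` class and `check_timeouts` function are hypothetical names, and the rule that the pool recycle time should be strictly below both the server and the proxy idle timeout is an assumption taken from the reporter's guesses, not a confirmed root cause.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimeoutSettings:
    """Hypothetical container for the three timeouts named in the report (seconds)."""
    connection_recycle_time: int  # oslo.db pool setting
    wait_timeout: int             # MariaDB wait_timeout (galera_wait_timeout)
    proxy_timeout: int            # HAProxy frontend/backend timeout


def check_timeouts(s: TimeoutSettings) -> List[str]:
    """Return warnings for combinations that may let idle pooled connections be cut."""
    warnings = []
    # Assumption: the client should recycle a connection before the server drops it.
    if s.connection_recycle_time >= s.wait_timeout:
        warnings.append(
            "connection_recycle_time (%d) is not below wait_timeout (%d)"
            % (s.connection_recycle_time, s.wait_timeout)
        )
    # Assumption: the proxy should not cut idle connections before the pool recycles them.
    if s.connection_recycle_time >= s.proxy_timeout:
        warnings.append(
            "connection_recycle_time (%d) is not below proxy_timeout (%d)"
            % (s.connection_recycle_time, s.proxy_timeout)
        )
    return warnings


# Values from the report: both timeouts at 600s, haproxy at 5000s.
reported = TimeoutSettings(connection_recycle_time=600, wait_timeout=600, proxy_timeout=5000)
# The reporter's workaround: wait_timeout raised to 7200s.
workaround = TimeoutSettings(connection_recycle_time=600, wait_timeout=7200, proxy_timeout=5000)

for name, settings in (("reported", reported), ("workaround", workaround)):
    print(name, check_timeouts(settings) or "no warnings")
```

With the reported defaults the check flags the equal recycle time and wait_timeout; with the 7200s workaround it raises nothing. It does not explain why 1200 or 3600 were reported as insufficient, so that part of the bug remains open.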