OpenStack-Ansible

Activity log for bug #1981463

Date	Who	What changed	Old value	New value	Message
2022-07-12 15:51:48	Alexander Binzxxxxxx	bug			added bug
2022-07-12 15:52:47	Alexander Binzxxxxxx	description	setup: using xena and pretty much default settings. so openstack_db_connection_recycle_time is 600 and galera_wait_timeout as well while timeout in haproxy for galera frontend/backend is 5000s symptom: seeing galera connection aborts reported in haproxy in ERSP column. In the mariadb log I get lines like: "Aborted connection 594171 to db: 'placement' user: 'placement' host: 'hostA.mydomain.com' (Got timeout reading communication packets)" Also aborted connections counter is rising in mariadb. Such errors cause retries on openstack side causing things to go slow from time to time. expectation: not getting those kinds of errors some analysis: mariadb is actually dropping the connections at wait_timeout (=galera_wait_timeout=600) due to connection being idle for a long time. oslo.db config used in basically all openstack services is doing some connection pooling and is configured (e.g. in placement) with the following values (all default): max_overflow = 50 max_pool_size = 5 pool_timeout = 30 connection_recycle_time = 600 So it should actually close connections and re-establish them before the timeout. also haproxy using timeouts with 5000s in frontend and backend should not matter here. not a solution: increasing the wait_timeout in mariadb to 1200 or 3600. (workaround) solution but may not be a good one: increasing the wait_timeout in mariadb to 7200. I am not sure where the issue is actually coming from but here are my best guesses: * there is a bug on the openstack end not setting the config values in lower layer library * there is some bug in the sql db facing lib code causing pooling and refresh not to work properly. * the timeout in mariadb must be higher than in oslo.db * haproxy may still cause some issue here and the 5000s may be part of that. impact: mostly annoying errors causing retries and slowing things down without any big impact. so I consider this a minor bug	setup: using xena and pretty much default settings. so openstack_db_connection_recycle_time is 600 and galera_wait_timeout as well while timeout in haproxy for galera frontend/backend is 5000s symptom: seeing galera connection aborts reported in haproxy in ERSP column. In the mariadb log I get lines like: "Aborted connection 594171 to db: 'placement' user: 'placement' host: 'hostA.mydomain.com' (Got timeout reading communication packets)" Also aborted connections counter is rising in mariadb. Such errors cause retries on openstack side causing things to go slow from time to time. Of course after wait_timeout period in idle. expectation: not getting those kinds of errors some analysis: mariadb is actually dropping the connections at wait_timeout (=galera_wait_timeout=600) due to connection being idle for a long time. oslo.db config used in basically all openstack services is doing some connection pooling and is configured (e.g. in placement) with the following values (all default): max_overflow = 50 max_pool_size = 5 pool_timeout = 30 connection_recycle_time = 600 So it should actually close connections and re-establish them before the timeout. also haproxy using timeouts with 5000s in frontend and backend should not matter here. not a solution: increasing the wait_timeout in mariadb to 1200 or 3600. (workaround) solution but may not be a good one: increasing the wait_timeout in mariadb to 7200. I am not sure where the issue is actually coming from but here are my best guesses: * there is a bug on the openstack end not setting the config values in lower layer library * there is some bug in the sql db facing lib code causing pooling and refresh not to work properly. * the timeout in mariadb must be higher than in oslo.db * haproxy may still cause some issue here and the 5000s may be part of that. impact: mostly annoying errors causing retries and slowing things down without any big impact. so I consider this a minor bug
2022-07-19 18:38:07	Dmitriy Rabotyagov	openstack-ansible: assignee		Damian Dąbrowski (damiandabrowski)
2022-10-21 18:27:01	Damian Dąbrowski	openstack-ansible: status	New	Incomplete
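
Note on the timing question raised in the description: the sketch below is a toy model, not oslo.db or SQLAlchemy code. It illustrates one possible reading of the guess that "the timeout in mariadb must be higher than in oslo.db". It assumes connection_recycle_time is only checked lazily, when a connection is taken out of the pool (SQLAlchemy documents its pool_recycle option as working this way), so a pooled connection that sits idle is never closed by the client on its own. The names PooledConnection, server_dropped, needs_recycle and checkout are hypothetical and exist only for this illustration; the numbers are the ones quoted in the report.

```python
from dataclasses import dataclass


@dataclass
class PooledConnection:
    created_at: float    # seconds: when the connection was opened
    last_used_at: float  # seconds: when the client last sent a query


def server_dropped(conn: PooledConnection, now: float, wait_timeout: float) -> bool:
    """Model of the server side: idle for wait_timeout or longer means closed."""
    return now - conn.last_used_at >= wait_timeout


def needs_recycle(conn: PooledConnection, now: float, recycle_time: float) -> bool:
    """Model of a lazy client-side recycle check, run only at checkout."""
    return now - conn.created_at > recycle_time


def checkout(conn: PooledConnection, now: float,
             recycle_time: float, wait_timeout: float) -> str:
    """Classify what happens when an idle pooled connection is checked out."""
    dropped = server_dropped(conn, now, wait_timeout)
    recycled = needs_recycle(conn, now, recycle_time)
    if dropped and recycled:
        # The server gave up first and logged an aborted connection;
        # the client only notices the age at this checkout.
        return "aborted_then_recycled"
    if dropped:
        return "aborted_stale"
    if recycled:
        return "recycled_cleanly"
    return "reused"


# Values from the report: recycle time 600, wait_timeout 600 (default) or 7200.
idle = PooledConnection(created_at=0.0, last_used_at=0.0)
print(checkout(idle, now=900.0, recycle_time=600.0, wait_timeout=600.0))
print(checkout(idle, now=900.0, recycle_time=600.0, wait_timeout=7200.0))
```

In this model, equal values for the two timeouts mean the server always closes an idle connection no later than the client would consider recycling it. Whether that is what actually happens in the reported deployment is not established by the log above.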