Neutron DBDeadlocks a ridiculous amount in successful CI runs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Critical | Kevin Benton |
Bug Description
This came up in the -qa channel when trying to figure out why a neutron test failed and there is a big fat DBDeadlock in the q-svc logs:
We find that this shows up a ton in a 7 day check/gate run:
498 hits in 7 days, check and gate.
The interesting thing is that 85% of those are successful runs.
For example, this was a successful run where the DBDeadlock shows up:
This is a serviceability / QA issue for anyone trying to deploy neutron at scale: when things go bad, how is an operator supposed to cut through the noise in the logs to determine what's actually a real failure and what can be ignored?
If these DBDeadlocks are just getting retried with a retry decorator, there should be a way to only trace when we fail and raise up the DBDeadlock error; we shouldn't be logging each time. For example, if we hit a DBDeadlock, retry, and then it's OK, don't trace that first DB error. If we retry like 5 times and eventually punt, then trace the error.
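To make the ask concrete, here is a minimal sketch of a retry decorator that stays quiet on deadlocks that get retried and only traces once it gives up. This is not the oslo.db wrapper; the `DBDeadlock` class, `retry_on_deadlock`, `max_attempts`, `delay`, and `create_port` below are all hypothetical stand-ins for illustration.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


class DBDeadlock(Exception):
    """Hypothetical stand-in for the real deadlock exception."""


def retry_on_deadlock(max_attempts=5, delay=0.0):
    """Retry the wrapped call on DBDeadlock; trace only when all attempts fail."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_attempts:
                        # Out of attempts: a real failure, so log the traceback.
                        LOG.exception(
                            "DB deadlock persisted after %d attempts", attempt)
                        raise
                    # Still recoverable: note it at debug level, no traceback.
                    LOG.debug("DB deadlock on attempt %d of %d, retrying",
                              attempt, max_attempts)
                    time.sleep(delay)
        return wrapper
    return decorator


# Hypothetical operation that deadlocks twice and then succeeds.
calls = {"count": 0}


@retry_on_deadlock(max_attempts=5)
def create_port():
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock()
    return "port-created"
```

With this shape, a run where `create_port()` succeeds on its third attempt leaves nothing at ERROR level in the log, while a call that deadlocks on every attempt produces exactly one traceback before the exception is re-raised.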
Changed in neutron:
status: Fix Committed → Fix Released
Changed in neutron:
milestone: liberty-rc1 → 7.0.0
Example trace:
2015-09-11 16:36:08.026 ERROR oslo_db.api [req-545d550d-a494-405b-bbc1-561d2eb918ba admin admin] DB error.
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api Traceback (most recent call last):
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 136, in wrapper
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     return f(*args, **kwargs)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/api/v2/base.py", line 491, in create
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     obj = obj_creator(request.context, **kwargs)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/db/l3_hamode_db.py", line 386, in create_router
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     self).create_router(context, router)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/db/l3_db.py", line 186, in create_router
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     self.delete_router(context, router_db.id)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     six.reraise(self.type_, self.value, self.tb)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/db/l3_db.py", line 181, in create_router
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     gw_info, router=router_db)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/db/l3_gwmode_db.py", line 70, in _update_router_gw_info
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     context, router_id, info, router=router)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/db/l3_db.py", line 402, in _update_router_gw_info
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     ext_ips)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/db/l3_dvr_db.py", line 180, in _create_gw_port
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     ext_ips)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/db/l3_db.py", line 378, in _create_gw_port
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     self._create_router_gw_port(context, router, new_network, ext_ips)
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/db/l3_db.py", line 296, in _create_router_gw_port
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     context.elevated(), {'port': port_data})
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/opt/stack/new/neutron/neutron/plugins/common/utils.py", line 140, in create_port
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     return core_plugin.create_port(context, {'port': port_data})
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 146, in wrapper
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api     ectxt.value = e.inner_exc
2015-09-11 16:36:08.026 26473 ERROR oslo_db.api   File "/usr/local/lib/python2.7/dist-packages/oslo_...