CONSTRAINT routerl3agentbindings failure during gate tests

Bug #1217998 reported by Clint Byrum
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | High | ZhiQiang Fan |

Bug Description

http://logs.openstack.org/60/43760/3/check/gate-tempest-devstack-vm-neutron/dd1e380/logs/screen-q-svc.txt.gz

The log shows a foreign key constraint failure, which is almost always evidence of some kind of race or poor error handling:

2013-08-27 18:45:06.141 29478 ERROR neutron.openstack.common.rpc.amqp [-] Exception during message handling
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp Traceback (most recent call last):
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/opt/stack/new/neutron/neutron/openstack/common/rpc/amqp.py", line 424, in _process_data
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp **args)
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/opt/stack/new/neutron/neutron/common/rpc.py", line 44, in dispatch
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp neutron_ctxt, version, method, namespace, **kwargs)
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/opt/stack/new/neutron/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/opt/stack/new/neutron/neutron/db/l3_rpc_base.py", line 47, in sync_routers
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp plugin.auto_schedule_routers(context, host, router_ids)
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/opt/stack/new/neutron/neutron/db/agentschedulers_db.py", line 302, in auto_schedule_routers
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp self, context, host, router_ids)
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/opt/stack/new/neutron/neutron/scheduler/l3_agent_scheduler.py", line 113, in auto_schedule_routers
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp context.session.add(binding)
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 456, in __exit__
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp self.commit()
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 368, in commit
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp self._prepare_impl()
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 347, in _prepare_impl
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp self.session.flush()
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp File "/opt/stack/new/neutron/neutron/openstack/common/db/sqlalchemy/session.py", line 542, in _wrap
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp raise exception.DBError(e)
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp DBError: (IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`ovs_neutron`.`routerl3agentbindings`, CONSTRAINT `routerl3agentbindings_ibfk_2` FOREIGN KEY (`router_id`) REFERENCES `routers` (`id`) ON DELETE CASCADE)') 'INSERT INTO routerl3agentbindings (id, router_id, l3_agent_id) VALUES (%s, %s, %s)' ('c0771426-7e66-430e-beb9-4c3334e43039', '4dae479a-33a1-4d60-9a82-f0ee09cb9491', '039d8fe8-8993-4c9e-89a4-196dccb2878a')
2013-08-27 18:45:06.141 29478 TRACE neutron.openstack.common.rpc.amqp
2013-08-27 18:45:06.142 29478 ERROR neutron.openstack.common.rpc.common [-] Returning exception (IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`ovs_neutron`.`routerl3agentbindings`, CONSTRAINT `routerl3agentbindings_ibfk_2` FOREIGN KEY (`router_id`) REFERENCES `routers` (`id`) ON DELETE CASCADE)') 'INSERT INTO routerl3agentbindings (id, router_id, l3_agent_id) VALUES (%s, %s, %s)' ('c0771426-7e66-430e-beb9-4c3334e43039', '4dae479a-33a1-4d60-9a82-f0ee09cb9491', '039d8fe8-8993-4c9e-89a4-196dccb2878a') to caller
2013-08-27 18:45:06.142 29478 ERROR neutron.openstack.common.rpc.common [-] ['Traceback (most recent call last):\n', ' File "/opt/stack/new/neutron/neutron/openstack/common/rpc/amqp.py", line 424, in _process_data\n **args)\n', ' File "/opt/stack/new/neutron/neutron/common/rpc.py", line 44, in dispatch\n neutron_ctxt, version, method, namespace, **kwargs)\n', ' File "/opt/stack/new/neutron/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch\n result = getattr(proxyobj, method)(ctxt, **kwargs)\n', ' File "/opt/stack/new/neutron/neutron/db/l3_rpc_base.py", line 47, in sync_routers\n plugin.auto_schedule_routers(context, host, router_ids)\n', ' File "/opt/stack/new/neutron/neutron/db/agentschedulers_db.py", line 302, in auto_schedule_routers\n self, context, host, router_ids)\n', ' File "/opt/stack/new/neutron/neutron/scheduler/l3_agent_scheduler.py", line 113, in auto_schedule_routers\n context.session.add(binding)\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 456, in __exit__\n self.commit()\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 368, in commit\n self._prepare_impl()\n', ' File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 347, in _prepare_impl\n self.session.flush()\n', ' File "/opt/stack/new/neutron/neutron/openstack/common/db/sqlalchemy/session.py", line 542, in _wrap\n raise exception.DBError(e)\n', "DBError: (IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (`ovs_neutron`.`routerl3agentbindings`, CONSTRAINT `routerl3agentbindings_ibfk_2` FOREIGN KEY (`router_id`) REFERENCES `routers` (`id`) ON DELETE CASCADE)') 'INSERT INTO routerl3agentbindings (id, router_id, l3_agent_id) VALUES (%s, %s, %s)' ('c0771426-7e66-430e-beb9-4c3334e43039', '4dae479a-33a1-4d60-9a82-f0ee09cb9491', '039d8fe8-8993-4c9e-89a4-196dccb2878a')\n"]

Full gate logs are here:

http://logs.openstack.org/60/43760/3/check/gate-tempest-devstack-vm-neutron/dd1e380/
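MySQL error 1452 means the INSERT references a parent row that does not exist: here, a router_id with no matching row in routers. The sketch below reproduces that failure with Python's built-in sqlite3 module as a stand-in for the MySQL database used in the gate. The schema is a simplified assumption that keeps only the columns named in the failing INSERT and the router_id foreign key; it is not neutron's actual model definition, and make_db and bind are hypothetical helper names.

```python
import sqlite3


def make_db():
    """Create a simplified, in-memory stand-in for the two tables in the log."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE routers (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE routerl3agentbindings ("
        " id TEXT PRIMARY KEY,"
        " router_id TEXT REFERENCES routers (id) ON DELETE CASCADE,"
        " l3_agent_id TEXT)"
    )
    return conn


def bind(conn, binding_id, router_id, agent_id):
    """Insert one binding; raises sqlite3.IntegrityError if router_id is dangling."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO routerl3agentbindings (id, router_id, l3_agent_id)"
            " VALUES (?, ?, ?)",
            (binding_id, router_id, agent_id),
        )


if __name__ == "__main__":
    conn = make_db()
    with conn:
        conn.execute("INSERT INTO routers (id) VALUES ('r1')")
    bind(conn, "b1", "r1", "agent-1")  # parent row exists, so this succeeds
    try:
        bind(conn, "b2", "no-such-router", "agent-1")
    except sqlite3.IntegrityError as exc:
        print("IntegrityError:", exc)
```

The second bind fails the same way the gate INSERT did. Because of ON DELETE CASCADE, deleting r1 afterwards would also remove b1 automatically; the constraint only bites when a binding is written for a router that is already gone.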

Tags: l3-ipam-dhcp
ZhiQiang Fan (aji-zqfan)
Changed in neutron:
assignee: nobody → ZhiQiang Fan (aji-zqfan)
ZhiQiang Fan (aji-zqfan)
Changed in neutron:
status: New → Confirmed
ZhiQiang Fan (aji-zqfan) wrote :

This was introduced in change I22e8d11b9676cbcfe9e72449031bb63071be8314.

https://github.com/openstack/neutron/blob/master/neutron/scheduler/l3_agent_scheduler.py#L101
set(unscheduled_router_ids) should be set([r['id'] for r in routers]), because unscheduled_router_ids may contain router_ids supplied by the user or plugin that may not exist. The local variable routers is provided by l3_db, so (ignoring the race problem) it is safer than the user's input.
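A minimal sketch of the difference, assuming plain lists and dicts in place of neutron's DB layer. The names get_routers, unscheduled_ids_before and unscheduled_ids_after are hypothetical and are not neutron's API. The "before" version trusts the IDs passed in over RPC; the "after" version keeps only IDs that the DB query actually returned, as proposed above.

```python
def get_routers(db_rows, router_ids):
    """Hypothetical stand-in for the DB query: return only rows that exist."""
    wanted = set(router_ids)
    return [row for row in db_rows if row["id"] in wanted]


def unscheduled_ids_before(router_ids, bound_ids):
    # Trusts caller-supplied IDs, so an unknown or deleted ID leaks through
    # and later becomes an INSERT that violates the foreign key.
    return set(router_ids) - set(bound_ids)


def unscheduled_ids_after(routers, bound_ids):
    # Uses only IDs the DB query returned: set([r['id'] for r in routers]).
    return set([r["id"] for r in routers]) - set(bound_ids)


if __name__ == "__main__":
    db_rows = [{"id": "r1"}, {"id": "r2"}]       # routers that exist
    requested = ["r1", "r2", "deleted-router"]  # IDs received over RPC
    bound = {"r2"}                              # r2 already has an agent
    routers = get_routers(db_rows, requested)
    print(sorted(unscheduled_ids_before(requested, bound)))  # ['deleted-router', 'r1']
    print(sorted(unscheduled_ids_after(routers, bound)))     # ['r1']
```

As the comment notes, this only drops IDs that were already invalid when the query ran; it does not close the window in which a router is deleted after the query.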

A fix patch will be uploaded soon.

ZhiQiang Fan (aji-zqfan) wrote :
yong sheng gong (gongysh) wrote :

So the failure in https://review.openstack.org/#/c/43558/ patch set 6 is due to the router_id coming from 'user/plugin'?

Changed in neutron:
importance: Undecided → High
ZhiQiang Fan (aji-zqfan) wrote : Re: [Bug 1217998] Re: CONSTRAINT routerl3agentbindings failure during gate tests

Actually, I'm not quite sure, but:

http://logs.openstack.org/58/43558/6/check/gate-tempest-devstack-vm-neutron/ed83002/logs/screen-q-svc.txt.gz#_2013-08-28_19_56_46_810

The router 605e9368-1fc0-44c3-8ce0-39eb9c5ed7c3 is deleted, which triggers
sync_routers(605e9368-1fc0-44c3-8ce0-39eb9c5ed7c3) ->
auto_schedule_routers(605e9368-1fc0-44c3-8ce0-39eb9c5ed7c3), and bang.

But I'm confused: if deleting a router causes this exception, why haven't
we caught it before? I can't figure out whether the root cause is in
neutron or in tempest.
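One possible reading of the sequence above is a window between reading a router and writing its binding. The sketch below simulates that interleaving sequentially with sqlite3, again using a simplified stand-in schema; setup, schedule and demo_race are hypothetical names. The scheduler takes a snapshot of existing routers, a concurrent delete then removes r2, and the later INSERT for r2 still hits the foreign key. Catching IntegrityError per router here only illustrates tolerating the race; it is not what the merged fix does.

```python
import sqlite3


def setup():
    """Simplified, in-memory stand-in schema with two existing routers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE routers (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE routerl3agentbindings ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " router_id TEXT REFERENCES routers (id) ON DELETE CASCADE,"
        " l3_agent_id TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO routers (id) VALUES (?)", [("r1",), ("r2",)])
    return conn


def schedule(conn, router_ids, agent_id):
    """Bind each router to agent_id; return the IDs whose router row was gone."""
    failed = []
    for router_id in router_ids:
        try:
            with conn:  # one transaction per binding, rolled back on error
                conn.execute(
                    "INSERT INTO routerl3agentbindings (router_id, l3_agent_id)"
                    " VALUES (?, ?)",
                    (router_id, agent_id),
                )
        except sqlite3.IntegrityError:
            failed.append(router_id)
    return failed


def demo_race():
    conn = setup()
    # 1. The scheduler reads the routers that exist right now.
    snapshot = [row[0] for row in conn.execute("SELECT id FROM routers ORDER BY id")]
    # 2. Before the bindings are written, a concurrent request deletes r2.
    with conn:
        conn.execute("DELETE FROM routers WHERE id = 'r2'")
    # 3. The INSERT for r2 now violates the foreign key despite step 1.
    failed = schedule(conn, snapshot, "agent-1")
    count = conn.execute("SELECT COUNT(*) FROM routerl3agentbindings").fetchone()[0]
    return failed, count


if __name__ == "__main__":
    print(demo_race())  # (['r2'], 1)
```

Filtering against a DB read, as in the proposed fix, narrows this window but cannot remove it on its own, which matches the "ignore race problem" caveat in the earlier comment.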

Meanwhile, while diving into the source code, I found that
neutron.agent.l3_agent.L3NATAgent._sync_routers_task() gets router_ids
from self.conf.router_id if not self.conf.use_namespaces. Is there any
guarantee that this router cannot be deleted? I can't find one.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/47218

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/47218
Committed: http://github.com/openstack/neutron/commit/3e1116eb0f1d94530707cd6ef4b37f17e9a13918
Submitter: Jenkins
Branch: master

commit 3e1116eb0f1d94530707cd6ef4b37f17e9a13918
Author: ZhiQiang Fan <email address hidden>
Date: Thu Sep 19 01:53:44 2013 +0800

    Ensure router exists when auto_schedule_routers

    Currently, the auto_schedule_routers() accepts parameter router_ids,
    which may contain invalid router ids, since we've already filtered
    them via plugin.get_routers(), we can directly use that safe object.

    Closes-Bug: #1217998
    Closes-Bug: #1210877

    Change-Id: I6196f16cca65fee4e848173d0a0a10fde967195d

Changed in neutron:
status: In Progress → Fix Committed
tags: added: l3-ipam-dhcp
Changed in neutron:
milestone: none → havana-rc1
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: havana-rc1 → 2013.2