NSX: neutron sync backend race condition

Bug #1354226 reported by Aaron Rosen
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Expired
Medium
Unassigned

Bug Description

If you run with multiple_api workers there is a race condition in the nsx sync backend as we start reading from the database before it's connected if running with 1 worker this doesn't occur as things don't occur concurrently.

-07 16:47:44.311 ERROR neutron.openstack.common.loopingcall [req-fa4f72be-3b10-49ef-a9a6-0204c1af1920 None None] in dynamic looping call
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall Traceback (most recent call last):
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/opt/stack/neutron/neutron/openstack/common/loopingcall.py", line 123, in _inner
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall idle = self.f(*self.args, **self.kw)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/opt/stack/neutron/neutron/plugins/vmware/common/sync.py", line 670, in _synchronize_state
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall scan_missing=scan_missing)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/opt/stack/neutron/neutron/plugins/vmware/common/sync.py", line 344, in _synchronize_lswitches
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall filters=filters)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/opt/stack/neutron/neutron/db/common_db_mixin.py", line 176, in _get_collection
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall items = [dict_func(c, fields) for c in query]
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2353, in __iter__
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall return self._execute_and_instances(context)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2368, in _execute_and_instances
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall result = conn.execute(querycontext.statement, self._params)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in execute
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall params)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall compiled_sql, distilled_params
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall context)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo/db/sqlalchemy/compat/handle_error.py", line 123, in _handle_dbapi_exception
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall six.reraise(type(newraise), newraise, sys.exc_info()[2])
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo/db/sqlalchemy/compat/handle_error.py", line 100, in _handle_dbapi_exception
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall per_fn = fn(ctx)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo/db/sqlalchemy/exc_filters.py", line 319, in handler
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall context.is_disconnect)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall File "/usr/local/lib/python2.7/dist-packages/oslo/db/sqlalchemy/exc_filters.py", line 260, in _is_db_connection_error
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall raise exception.DBConnectionError(operational_error)
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall DBConnectionError: (OperationalError) (2006, 'MySQL server has gone away') 'SELECT networks.tenant_id AS networks_tenant_id, networks.id AS networks_id, networks.name AS networks_name, networks.status AS networks_status, networks.admin_state_up AS networks_admin_state_up, networks.shared AS networks_shared, ipavailabilityranges_1.allocation_pool_id AS ipavailabilityranges_1_allocation_pool_id, ipavailabilityranges_1.first_ip AS ipavailabilityranges_1_first_ip, ipavailabilityranges_1.last_ip AS ipavailabilityranges_1_last_ip, ipallocationpools_1.id AS ipallocationpools_1_id, ipallocationpools_1.subnet_id AS ipallocationpools_1_subnet_id, ipallocationpools_1.first_ip AS ipallocationpools_1_first_ip, ipallocationpools_1.last_ip AS ipallocationpools_1_last_ip, subnets_1.tenant_id AS subnets_1_tenant_id, subnets_1.id AS subnets_1_id, subnets_1.name AS subnets_1_name, subnets_1.network_id AS subnets_1_network_id, subnets_1.ip_version AS subnets_1_ip_version, subnets_1.cidr AS subnets_1_cidr, subnets_1.gateway_ip AS subnets_1_gateway_ip, subnets_1.enable_dhcp AS subnets_1_enable_dhcp, subnets_1.shared AS subnets_1_shared, subnets_1.ipv6_ra_mode AS subnets_1_ipv6_ra_mode, subnets_1.ipv6_address_mode AS subnets_1_ipv6_address_mode, networksecuritybindings_1.network_id AS networksecuritybindings_1_network_id, networksecuritybindings_1.port_security_enabled AS networksecuritybindings_1_port_security_enabled, networkqueuemappings_1.network_id AS networkqueuemappings_1_network_id, networkqueuemappings_1.queue_id AS networkqueuemappings_1_queue_id, externalnetworks_1.network_id AS externalnetworks_1_network_id \nFROM networks LEFT OUTER JOIN externalnetworks ON networks.id = externalnetworks.network_id LEFT OUTER JOIN subnets AS subnets_1 ON networks.id = subnets_1.network_id LEFT OUTER JOIN ipallocationpools AS ipallocationpools_1 ON subnets_1.id = ipallocationpools_1.subnet_id LEFT OUTER JOIN ipavailabilityranges AS ipavailabilityranges_1 ON ipallocationpools_1.id = ipavailabilityranges_1.allocation_pool_id LEFT OUTER JOIN networksecuritybindings AS networksecuritybindings_1 ON networks.id = networksecuritybindings_1.network_id LEFT OUTER JOIN networkqueuemappings AS networkqueuemappings_1 ON networks.id = networkqueuemappings_1.network_id LEFT OUTER JOIN externalnetworks AS externalnetworks_1 ON networks.id = externalnetworks_1.network_id \nWHERE externalnetworks.network_id IS NULL' ()
2014-08-07 16:47:44.311 TRACE neutron.openstack.common.loopingcall

Tags: db
Aaron Rosen (arosen)
Changed in neutron:
assignee: nobody → Aaron Rosen (arosen)
Changed in neutron:
importance: Undecided → High
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

That looks like a one case of a whole class of problems that vendor drivers might experience.
I'd be interested to see how you fix this.

Changed in neutron:
status: New → Confirmed
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

After some thought: usually 'MySQL server has gone away' means mysql was there, but connection was lost (broken or closed)
Is it a case here?

Changed in neutron:
importance: High → Medium
status: Confirmed → Incomplete
tags: added: db
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This bug is > 172 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Changed in neutron:
assignee: Aaron Rosen (arosen) → nobody
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.