Race condition with multiple neutron-servers can allow a router to be scheduled twice

Bug #1230323 reported by Ed Bak
This bug affects 4 people
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Li Ma

Bug Description

In an environment with multiple neutron-servers, I have observed that a router can get scheduled to an l3-agent more than once. Running "neutron l3-agent-list-hosting-router <router id>" shows the router scheduled twice to the same l3-agent, or perhaps to two different agents. This can be reproduced using devstack with a second neutron-server configured on another host. The quickest way to trigger the race condition is to run a script against each of the neutron-servers that adds (neutron l3-agent-router-add) and removes (neutron l3-agent-router-remove) a router from an l3 agent. There is no locking or other coordination across multiple neutron-server processes to prevent this.
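
The interleaving described above can be illustrated with a minimal sketch. This is not Neutron code: the `bindings` table, its columns, and the `is_bound`/`bind` helpers are hypothetical stand-ins, and an in-memory SQLite database replaces the shared backend. It only shows how two servers that each check before inserting can both write the same binding when nothing enforces uniqueness on the pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical simplified binding table: a surrogate id is the only key,
# so nothing stops the same (router_id, l3_agent_id) pair appearing twice.
conn.execute(
    "CREATE TABLE bindings ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "router_id TEXT NOT NULL, "
    "l3_agent_id TEXT NOT NULL)"
)


def is_bound(router_id, agent_id):
    """Return True if the router is already bound to the agent."""
    row = conn.execute(
        "SELECT COUNT(*) FROM bindings "
        "WHERE router_id = ? AND l3_agent_id = ?",
        (router_id, agent_id),
    ).fetchone()
    return row[0] > 0


def bind(router_id, agent_id):
    """Insert a binding row without any uniqueness protection."""
    conn.execute(
        "INSERT INTO bindings (router_id, l3_agent_id) VALUES (?, ?)",
        (router_id, agent_id),
    )


# Simulated interleaving: both servers run their check before either inserts.
server_a_sees_binding = is_bound("router-1", "agent-1")
server_b_sees_binding = is_bound("router-1", "agent-1")

if not server_a_sees_binding:
    bind("router-1", "agent-1")
if not server_b_sees_binding:
    bind("router-1", "agent-1")

duplicates = conn.execute(
    "SELECT COUNT(*) FROM bindings "
    "WHERE router_id = 'router-1' AND l3_agent_id = 'agent-1'"
).fetchone()[0]
print(duplicates)
```

The interleaving is forced here for determinism; in a real deployment it depends on timing between the neutron-server processes, which is why a tight add/remove loop makes it easier to hit.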

Ed Bak (ed-bak2)
Changed in neutron:
assignee: nobody → Ed Bak (ed-bak2)
Ed Bak (ed-bak2)
Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/48357

Ed Bak (ed-bak2)
Changed in neutron:
assignee: Ed Bak (ed-bak2) → nobody
Tom Fifield (fifieldt)
Changed in neutron:
status: In Progress → Confirmed
Changed in neutron:
importance: Undecided → High
Li Ma (nick-ma-z) wrote :

I know it is a race condition problem. IMO, it seems weird to launch two or more neutron-servers in one OpenStack deployment and operate both of them. Load balancing on those processes does make sense.

Changed in neutron:
assignee: nobody → Li Ma (nick-ma-z)
Li Ma (nick-ma-z)
Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/73234

Changed in neutron:
milestone: none → juno-1
Li Ma (nick-ma-z) wrote :

Submitted a new bug to describe the second case.
https://bugs.launchpad.net/neutron/+bug/1308302

Kyle Mestery (mestery)
Changed in neutron:
milestone: juno-1 → juno-2
tags: added: l3-ipam-dhcp
Kyle Mestery (mestery)
Changed in neutron:
milestone: juno-2 → none
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/73234
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fbc6b991a79a147bf1411b643f5c304062e5956a
Submitter: Jenkins
Branch: master

commit fbc6b991a79a147bf1411b643f5c304062e5956a
Author: Li Ma <email address hidden>
Date: Fri Feb 21 00:57:25 2014 -0800

    Race condition of L3-agent to add/remove routers

    This race condition happens when repeatedly calling
    l3-agent-router-add and l3-agent-router-remove
    by different neutron-servers at the same time.

    The primary key constraint is added for the pair of
    (router_id and l3_agent_id).

    During migration, verification is done if the current
    records violate the PK constraint defined in this bug
    fix, and sanitize the data before schema modification.

    Due to different dialects of database engines, different
    sql statements are executed correspondingly to do
    the verification.

    Change-Id: Ia541e023b757b2e77c4eec9bb1670632c7a271fa
    Closes-Bug: #1230323
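
The approach in the commit message (sanitize existing rows, then make the pair the primary key) can be sketched as follows. This is an illustration under stated assumptions, not the actual migration: the real change runs through the Alembic migration with dialect-specific SQL, whereas this sketch uses in-memory SQLite, hypothetical table names (`bindings_old`, `bindings`), and rebuilds the table because SQLite cannot add a primary key to an existing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-fix table that already contains a duplicated binding.
conn.execute(
    "CREATE TABLE bindings_old ("
    "id INTEGER PRIMARY KEY, "
    "router_id TEXT NOT NULL, "
    "l3_agent_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO bindings_old (router_id, l3_agent_id) VALUES (?, ?)",
    [
        ("router-1", "agent-1"),
        ("router-1", "agent-1"),  # duplicate left behind by the race
        ("router-2", "agent-1"),
    ],
)

# Step 1: sanitize. Keep one row per (router_id, l3_agent_id) pair.
conn.execute(
    "DELETE FROM bindings_old WHERE id NOT IN ("
    "SELECT MIN(id) FROM bindings_old GROUP BY router_id, l3_agent_id)"
)

# Step 2: move the data into a table whose primary key is the pair itself.
conn.execute(
    "CREATE TABLE bindings ("
    "router_id TEXT NOT NULL, "
    "l3_agent_id TEXT NOT NULL, "
    "PRIMARY KEY (router_id, l3_agent_id))"
)
conn.execute(
    "INSERT INTO bindings SELECT router_id, l3_agent_id FROM bindings_old"
)
conn.execute("DROP TABLE bindings_old")


def bind(router_id, agent_id):
    """Try to insert a binding; return False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO bindings (router_id, l3_agent_id) VALUES (?, ?)",
            (router_id, agent_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = bind("router-3", "agent-2")
second = bind("router-3", "agent-2")  # rejected by the primary key
rows = conn.execute("SELECT COUNT(*) FROM bindings").fetchone()[0]
print(first, second, rows)
```

With the constraint in place the database itself rejects the second insert, so correctness no longer depends on coordination between neutron-server processes.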

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → juno-3
status: Fix Committed → Fix Released
Akihiro Motoki (amotoki)
tags: added: icehouse-backport-potential
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-3 → 2014.2
Tom Verdaat (tom-verdaat) wrote :

Looks like this commit might be causing problems! I can't get the Neutron database schema installed on a fresh installation of OpenStack Juno on Ubuntu 14.04; it fails on this migration. I'm using ML2 with the OVS plugin, with L3 HA and DVR enabled as well. It's a fresh installation, so the database is empty when starting "neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head"

Traceback below. Added the full log as an attachment.

INFO [alembic.migration] Running upgrade 37f322991f59 -> 31d7f831a591, add constraint for routerid
Traceback (most recent call last):
  File "/usr/bin/neutron-db-manage", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/cli.py", line 173, in main
    CONF.command.func(config, CONF.command.name)
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/cli.py", line 83, in do_upgrade_downgrade
    do_alembic_command(config, cmd, revision, sql=CONF.command.sql)
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/cli.py", line 61, in do_alembic_command
    getattr(alembic_command, cmd)(config, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/alembic/command.py", line 125, in upgrade
    script.run_env()
  File "/usr/lib/python2.7/dist-packages/alembic/script.py", line 203, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/usr/lib/python2.7/dist-packages/alembic/util.py", line 212, in load_python_file
    module = load_module_py(module_id, path)
  File "/usr/lib/python2.7/dist-packages/alembic/compat.py", line 58, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/alembic_migrations/env.py", line 108, in <module>
    run_migrations_online()
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/alembic_migrations/env.py", line 100, in run_migrations_online
    context.run_migrations()
  File "<string>", line 7, in run_migrations
  File "/usr/lib/python2.7/dist-packages/alembic/environment.py", line 688, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/usr/lib/python2.7/dist-packages/alembic/migration.py", line 258, in run_migrations
    change(**kw)
  File "/usr/lib/python2.7/dist-packages/neutron/db/migration/alembic_migrations/versions/31d7f831a591_add_constraint_for_routerid.py", line 85, in upgrade
    cols=['router_id', 'l3_agent_id']
  File "<string>", line 7, in create_primary_key
  File "/usr/lib/python2.7/dist-packages/alembic/operations.py", line 518, in create_primary_key
    schema)
  File "/usr/lib/python2.7/dist-packages/alembic/ddl/impl.py", line 135, in add_constraint
    self._exec(schema.AddConstraint(const))
  File "/usr/lib/python2.7/dist-packages/alembic/ddl/impl.py", line 76, in _exec
    conn.execute(construct, *multiparams, **params)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/sql/ddl.py", line 69, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/usr/lib/python2.7/dist-pac...


Li Ma (nick-ma-z) wrote :

I cannot find any problems in my deployment. Could you provide more detailed information about your database? It seems that something conflicts with the database's behavior.

Tom Verdaat (tom-verdaat) wrote :

My issue is the same as the one reported in bug #1384555 (https://bugs.launchpad.net/neutron/+bug/1384555)

The SQL doesn't work on MySQL 5.6 / MariaDB 10.0, apparently because as of these versions they forbid altering a column with a FK constraint.

Unfortunately that bug doesn't seem to get a lot of activity. Any chance you, as the original author, could fix it?

Li Ma (nick-ma-z) wrote :

It is fixed by https://review.openstack.org/#/c/132273/
You can check it.
