Bug #1550886 “L3 Agent's fullsync is raceful with creation of HA...” : Bugs : neutron

John Schwarz (jschwarz) on 2016-02-28

Changed in neutron:
assignee:	nobody → John Schwarz (jschwarz)

OpenStack Infra (hudson-openstack) on 2016-02-28

Changed in neutron:
status:	New → In Progress

John Schwarz (jschwarz) on 2016-02-29

description:

updated

Carl Baldwin (carl-baldwin) on 2016-02-29

Changed in neutron:
importance:	Undecided → Medium

LIU Yulong (dragon889) wrote on 2016-03-01:

#1

Maybe the following trace is related to this bug:
http://paste.openstack.org/show/488732/

Ann Taraday (akamyshnikova) wrote on 2016-03-01:

#2

Fix: https://review.openstack.org/#/c/284400/

Assaf Muller (amuller) wrote on 2016-03-18:

#3

New fix: https://review.openstack.org/#/c/285480/

OpenStack Infra (hudson-openstack) wrote on 2016-03-22: Change abandoned on neutron (master)

#4

Change abandoned by John Schwarz (<email address hidden>) on branch: master
Review: https://review.openstack.org/284400
Reason: This seems like a complicated patch and it looks like it's only going to get more complicated. I'm abandoning this in favour of https://review.openstack.org/#/c/285480/ which simplifies the code greatly and also solves the races at the same time.

OpenStack Infra (hudson-openstack) wrote on 2016-04-13: Fix merged to neutron (master)

#5

Reviewed: https://review.openstack.org/257059
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9c3c19f07ce52e139d431aec54341c38a183f0b7
Submitter: Jenkins
Branch: master

commit 9c3c19f07ce52e139d431aec54341c38a183f0b7
Author: Kevin Benton <email address hidden>
Date: Thu Feb 18 03:48:29 2016 -0800

Add ALLOCATING state to routers

    This patch adds a new ALLOCATING status to routers
    to indicate that the routers are still being built on the
    Neutron server. Any routers in this state are excluded in
    router retrievals by the L3 agent since they are not yet
    ready to be wired up.

    This is necessary when a router is made up of several
    distinct Neutron resources that cannot all be put
    into a single transaction. This patch applies this new
    state to HA routers while their internal HA ports and
    networks are being created/deleted so the L3 HA agent
    will never retrieve a partially formed HA router. It's
    important to note that the ALLOCATING status carries over
    until after the scheduling is done, which ensures that
    routers that weren't fully scheduled will not be sent to
    the agents.

    An HA router is placed in this state only when it is being
    created or converted to/from the HA state since this is
    disruptive to the dataplane.

    This patch also reverts the changes introduced in
    Iadb5a69d4cbc2515fb112867c525676cadea002b since they will
    be handled by the ALLOCATING logic instead.

    Co-Authored-By: Ann Kamyshnikova <email address hidden>
    Co-Authored-By: John Schwarz <email address hidden>

    APIImpact
    Closes-Bug: #1550886
    Related-bug: #1499647
    Change-Id: I22ff5a5a74527366da8f82982232d4e70e455570
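A minimal Python sketch of the ALLOCATING idea described above. It assumes routers are plain objects with a `status` field. `Router`, `create_ha_router`, `finish_allocation` and `routers_for_agent` are hypothetical names and are not neutron's code. Only the ALLOCATING status name comes from the commit. A router created in ALLOCATING stays invisible to agent queries until its HA resources exist and scheduling has finished.

```python
from dataclasses import dataclass
from typing import List

ROUTER_STATUS_ACTIVE = "ACTIVE"
ROUTER_STATUS_ALLOCATING = "ALLOCATING"  # status name taken from the commit


@dataclass
class Router:
    id: str
    ha: bool
    status: str = ROUTER_STATUS_ACTIVE


def create_ha_router(router_id: str, db: List[Router]) -> Router:
    # The router row is visible immediately, but flagged ALLOCATING.
    router = Router(id=router_id, ha=True, status=ROUTER_STATUS_ALLOCATING)
    db.append(router)
    # ... HA network, HA ports and scheduling would happen here ...
    return router


def finish_allocation(router: Router) -> None:
    # Flip to ACTIVE only after HA resources exist and scheduling is done.
    router.status = ROUTER_STATUS_ACTIVE


def routers_for_agent(db: List[Router]) -> List[Router]:
    # A fullsync never sees a partially built HA router.
    return [r for r in db if r.status != ROUTER_STATUS_ALLOCATING]
```

If an agent runs a fullsync between `create_ha_router` and `finish_allocation`, it gets an empty list rather than a half-built router.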

Changed in neutron:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2016-04-14: Fix proposed to neutron (stable/mitaka)

#6

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/305622

OpenStack Infra (hudson-openstack) wrote on 2016-04-14: Fix proposed to neutron (stable/liberty)

#7

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/305774

OpenStack Infra (hudson-openstack) wrote on 2016-04-18: Change abandoned on neutron (master)

#8

Change abandoned by John Schwarz (<email address hidden>) on branch: master
Review: https://review.openstack.org/284400
Reason: Fixed by the ALLOCATING patch

OpenStack Infra (hudson-openstack) wrote on 2016-04-19: Fix merged to neutron (stable/mitaka)

#9

Reviewed: https://review.openstack.org/305622
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=36305c0c4f4ebf498020f5956e103832da75f8a9
Submitter: Jenkins
Branch: stable/mitaka

commit 36305c0c4f4ebf498020f5956e103832da75f8a9
Author: Kevin Benton <email address hidden>
Date: Thu Feb 18 03:48:29 2016 -0800

Add ALLOCATING state to routers

    This patch adds a new ALLOCATING status to routers
    to indicate that the routers are still being built on the
    Neutron server. Any routers in this state are excluded in
    router retrievals by the L3 agent since they are not yet
    ready to be wired up.

    This is necessary when a router is made up of several
    distinct Neutron resources that cannot all be put
    into a single transaction. This patch applies this new
    state to HA routers while their internal HA ports and
    networks are being created/deleted so the L3 HA agent
    will never retrieve a partially formed HA router. It's
    important to note that the ALLOCATING status carries over
    until after the scheduling is done, which ensures that
    routers that weren't fully scheduled will not be sent to
    the agents.

    An HA router is placed in this state only when it is being
    created or converted to/from the HA state since this is
    disruptive to the dataplane.

    This patch also reverts the changes introduced in
    Iadb5a69d4cbc2515fb112867c525676cadea002b since they will
    be handled by the ALLOCATING logic instead.

    Co-Authored-By: Ann Kamyshnikova <email address hidden>
    Co-Authored-By: John Schwarz <email address hidden>

    APIImpact
    Closes-Bug: #1550886
    Related-bug: #1499647
    Change-Id: I22ff5a5a74527366da8f82982232d4e70e455570
    (cherry picked from commit 9c3c19f07ce52e139d431aec54341c38a183f0b7)

tags:

added: in-stable-mitaka

Doug Hellmann (doug-hellmann) wrote on 2016-05-09: Fix included in openstack/neutron 8.1.0

#10

This issue was fixed in the openstack/neutron 8.1.0 release.

OpenStack Infra (hudson-openstack) wrote on 2016-05-09: Fix proposed to neutron (master)

#11

Fix proposed to branch: master
Review: https://review.openstack.org/314250

OpenStack Infra (hudson-openstack) wrote on 2016-05-26: Fix merged to neutron (master)

#12

Reviewed:  https://review.openstack.org/314250
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3bf73801df169de40d365e6240e045266392ca63
Submitter: Jenkins
Branch:    master

commit a323769143001d67fd1b3b4ba294e59accd09e0e
Author: Ryan Moats <rmoats@us.ibm.com>
Date:   Tue Oct 20 15:51:37 2015 +0000

Revert "Improve performance of ensure_namespace"
    
    This reverts commit 81823e86328e62850a89aef9f0b609bfc0a6dacd.
    
    Unneeded optimization: this commit only improves execution
    time on the order of milliseconds, which is less than 1% of
    the total router update execution time at the network node.
    
    This also
    
    Closes-bug: #1574881
    
    Change-Id: Icbcdf4725ba7d2e743bb6761c9799ae436bd953b

commit 7fcf0253246832300f13b0aa4cea397215700572
Author: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Date:   Thu Apr 21 07:05:16 2016 +0000

Imported Translations from Zanata
    
    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure
    
    Change-Id: I9e930750dde85a9beb0b6f85eeea8a0962d3e020

commit 643b4431606421b09d05eb0ccde130adbf88df64
Author: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Date:   Tue Apr 19 06:52:48 2016 +0000

Imported Translations from Zanata
    
    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure
    
    Change-Id: I52d7460b3265b5460b9089e1cc58624640dc7230

commit 1ffea42ccdc14b7a6162c1895bd8f2aae48d5dae
Author: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Date:   Mon Apr 18 15:03:30 2016 +0000

Updated from global requirements
    
    Change-Id: Icb27945b3f222af1d9ab2b62bf2169d82b6ae26c

commit b970ed5bdac60c0fa227f2fddaa9b842ba4f51a7
Author: Kevin Benton <kevin@benton.pub>
Date:   Fri Apr 8 17:52:14 2016 -0700

Clear DVR MAC on last agent deletion from host
    
    Once all agents are deleted from a host, the DVR MAC generated
    for that host should be deleted as well to prevent a buildup of
    pointless flows generated in the OVS agent for hosts that don't
    exist.
    
    Closes-Bug: #1568206
    Change-Id: I51e736aa0431980a595ecf810f148ca62d990d20
    (cherry picked from commit 92527c2de2afaf4862fddc101143e4d02858924d)
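A toy Python model of the rule above: a host keeps its DVR MAC only while at least one agent remains registered on it. `DvrMacRegistry` and its methods are hypothetical and are not neutron's DB API.

```python
from typing import Dict, Optional, Set


class DvrMacRegistry:
    """One DVR MAC per host, kept only while the host has agents."""

    def __init__(self) -> None:
        self.agents_by_host: Dict[str, Set[str]] = {}
        self.mac_by_host: Dict[str, str] = {}

    def add_agent(self, host: str, agent_id: str, mac: str) -> None:
        self.agents_by_host.setdefault(host, set()).add(agent_id)
        # Reuse the host's MAC if one was already generated.
        self.mac_by_host.setdefault(host, mac)

    def delete_agent(self, host: str, agent_id: str) -> None:
        agents = self.agents_by_host.get(host, set())
        agents.discard(agent_id)
        if not agents:
            # Last agent on this host is gone: drop the MAC so other OVS
            # agents stop installing flows for a host that no longer exists.
            self.agents_by_host.pop(host, None)
            self.mac_by_host.pop(host, None)

    def mac_for(self, host: str) -> Optional[str]:
        return self.mac_by_host.get(host)
```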

commit eee9e58ed258a48c69effef121f55fdaa5b68bd6
Author: Mike Bayer <mike_mp@zzzcomputing.com>
Date:   Tue Feb 9 13:10:57 2016 -0500

Add an option for WSGI pool size
    
    Neutron currently hardcodes the number of
    greenlets used to process requests in a process to 1000.
    As detailed in
    http://lists.openstack.org/pipermail/openstack-dev/2015-December/082717.html
    
    this can cause requests to wait within one process
    for available database connection while other processes
    remain available.
    
    By adding a wsgi_default_pool_size option functionally
    identical to that of Nova, we can lower the number of
    greenlets per process to be more in line with a typical
    max database connection pool size.
    
    DocImpact: a previously unused configuration value
               wsgi_default_pool_size is now used to affect
               the number of greenlets used by the server. The
               default number of greenlets also changes from 1000
               to 100.
    Change-Id: I94cd2f9262e0f330cf006b40bb3c0071086e5d71
    (cherry picked from commit 9d573387f1e33ce85269d3ed9be501717eed4807)
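A rough illustration of why the pool size matters. Only the option name `wsgi_default_pool_size` comes from the commit. In this sketch, threads from `concurrent.futures` stand in for eventlet greenlets. `DB_CONNECTIONS` is an assumed per-process DB pool size, and `serve` is a hypothetical helper. Any request admitted beyond `DB_CONNECTIONS` can only wait on the DB pool inside this process. A smaller worker pool leaves those requests to be picked up by other processes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DB_CONNECTIONS = 5  # assumed per-process DB connection pool size


def serve(pool_size: int, n_requests: int) -> int:
    """Handle n_requests with at most pool_size concurrent workers.

    Returns the peak number of requests in flight at once.
    """
    db_pool = threading.BoundedSemaphore(DB_CONNECTIONS)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def handle() -> None:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        try:
            with db_pool:          # blocks while every connection is busy
                time.sleep(0.005)  # simulated query
        finally:
            with lock:
                in_flight -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        futures = [executor.submit(handle) for _ in range(n_requests)]
        for f in futures:
            f.result()  # surface any exception raised in a handler
    return peak
```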

commit bf66cc6f74133cfe6c1ab75287d39814ac44b068
Author: Clayton O'Neill <clayton.oneill@twcable.com>
Date:   Thu Mar 24 15:28:21 2016 +0000

Don't disconnect br-int from phys br if connected
    
    When starting up, we don't want to delete the patch port between br-int
    and the physical bridges. In liberty the br-int bridge was changed to
    not tear down flows on startup, and change
    I9801b76829021c9a0e6358982e1136637634a521 will change the physical
    bridges to not tear down flows also.
    
    Without this patch the patch port is torn down and not reinstalled until
    after the initial flows are set back up.
    
    Partial-Bug: #1514056
    Change-Id: I05bf5105a6f3acf6a313ce6799648a095cf8ec96
    (cherry picked from commit a549f30fad93508bf9dfdcfb20cd522f7add27b0)

commit 93795a4bda47605d5616476b2a456772308aa3c3
Author: Kevin Benton <kevin@benton.pub>
Date:   Mon Mar 28 14:14:15 2016 -0700

Fix deprecation warning for external_network_bridge
    
    We only want this to warn when a deployer has set anything other
    than a blank string. The oslo cfg would warn whenever it was set
    so it was incorrectly warning on the value we want operators to
    set it to.
    
    This changes the cfg option to not use deprecated for removal and
    the L3 agent config validation to emit a warning if it's not set
    to the value we are suggesting.
    
    Change-Id: If533cf7c4c379be78f5a15073accaff7f65973ab
    Closes-Bug: #1563070
    (cherry picked from 8382ac3717cf646145379456af94ce75000349a9)
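A small sketch of the validation behaviour described above. It warns only when `external_network_bridge` holds a non-empty value, so the recommended blank string stays silent. The function name and the log message are hypothetical and do not come from the L3 agent.

```python
import logging

LOG = logging.getLogger("l3_agent_config")


def validate_external_network_bridge(external_network_bridge: str) -> bool:
    """Return True if a deprecation warning was emitted.

    A blank string is the value operators are asked to use, so it must
    not trigger the warning; anything else does.
    """
    if external_network_bridge:
        LOG.warning(
            "external_network_bridge is deprecated; set it to an empty "
            "string (currently %r)", external_network_bridge)
        return True
    return False
```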

commit 36305c0c4f4ebf498020f5956e103832da75f8a9
Author: Kevin Benton <kevin@benton.pub>
Date:   Thu Feb 18 03:48:29 2016 -0800

Add ALLOCATING state to routers
    
    This patch adds a new ALLOCATING status to routers
    to indicate that the routers are still being built on the
    Neutron server. Any routers in this state are excluded in
    router retrievals by the L3 agent since they are not yet
    ready to be wired up.
    
    This is necessary when a router is made up of several
    distinct Neutron resources that cannot all be put
    into a single transaction. This patch applies this new
    state to HA routers while their internal HA ports and
    networks are being created/deleted so the L3 HA agent
    will never retrieve a partially formed HA router. It's
    important to note that the ALLOCATING status carries over
    until after the scheduling is done, which ensures that
    routers that weren't fully scheduled will not be sent to
    the agents.
    
    An HA router is placed in this state only when it is being
    created or converted to/from the HA state since this is
    disruptive to the dataplane.
    
    This patch also reverts the changes introduced in
    Iadb5a69d4cbc2515fb112867c525676cadea002b since they will
    be handled by the ALLOCATING logic instead.
    
    Co-Authored-By: Ann Kamyshnikova <akamyshnikova@mirantis.com>
    Co-Authored-By: John Schwarz <jschwarz@redhat.com>
    
    APIImpact
    Closes-Bug: #1550886
    Related-bug: #1499647
    Change-Id: I22ff5a5a74527366da8f82982232d4e70e455570
    (cherry picked from commit 9c3c19f07ce52e139d431aec54341c38a183f0b7)

commit 07401352a964b92ef4bc09a09800554e4a84cc87
Author: Hynek Mlnarik <hmlnarik@redhat.com>
Date:   Thu Feb 25 11:34:15 2016 +0100

Cleanup stale OVS flows for physical bridges
    
    Perform deletion of the stale flows in physical bridges consistently with
    br-int and br-tun, respecting drop_flows_on_start configuration option.
    Added tests for auxiliary bridge and functional tests for the physical
    bridge using VLAN/flat external network. Fixes part of the bug 1514056;
    together with [1] and [2], the bug should be considered fixed.
    
    The commit also fixes inconsistency between netmask of allocated IP
    addresses assigned in _create_test_port_dict and ip_len in _plug_ports
    of base.py.
    
    [1] https://review.openstack.org/#/c/297211/
    [2] https://review.openstack.org/#/c/297818/
    
    Co-Authored-By: Jian Wen <wenjianhn@gmail.com>
    Partial-Bug: 1514056
    Change-Id: I9801b76829021c9a0e6358982e1136637634a521
    (cherry picked from commit cacde308eef6f1d7005e555b4521332da95d3cf4)

commit 05a4a34b7e46c2e13a9bd874674804a94f342d0c
Author: Oleg Bondarev <obondarev@mirantis.com>
Date:   Thu Apr 7 16:45:52 2016 +0300

Notify resource_versions from agents only when needed
    
    resource_versions were included into agent state reports recently to
    support rolling upgrades (commit 97a272a892fcf488949eeec4959156618caccae8)
    The downside is that it brought additional processing when handling state
    reports on server side: update of local resources versions cache and
    more seriously rpc casts to all other servers to do the same.
    All this led to a visible performance degradation at scale with hundreds
    of agents constantly sending reports. Under load (rally test) agents
    may start "blinking" which makes cluster very unstable.
    
    In fact there is no need to send and update resource_versions in each state
    report. I see two cases when it should be done:
     1) agent was restarted (after it was upgraded);
     2) agent revived - which means that server was not receiving or being able
        to process state reports for some time (agent_down_time). During that
        time agent might be upgraded and restarted.
    
    So this patch makes agents include resource_versions info only on startup.
    After agent revival server itself will update version_manager with
    resource_versions taken from agent DB record - this is to avoid
    version_manager being outdated.
    
    Closes-Bug: #1567497
    Change-Id: I47a9869801f4e8f8af2a656749166b6fb49bcd3b
    (cherry picked from commit e532ee3fccd0820f9ab0efc417ee787fb8c870e9)
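A minimal agent-side sketch of "send resource_versions only on startup". `AgentStateReporter` and the report layout are hypothetical. The server-side half of the commit is not modelled here: on agent revival, the server refreshes its version_manager from the agent DB record.

```python
from typing import Any, Dict


class AgentStateReporter:
    """Toy agent that attaches resource_versions only to its first report."""

    def __init__(self, agent_type: str, resource_versions: Dict[str, str]):
        self.agent_type = agent_type
        self.resource_versions = resource_versions
        self._first_report = True

    def build_report(self) -> Dict[str, Any]:
        report: Dict[str, Any] = {"agent_type": self.agent_type}
        if self._first_report:
            # Only the startup report carries versions. Later periodic
            # reports stay small, so the server can skip the cache update
            # and the casts to other servers.
            report["resource_versions"] = self.resource_versions
            self._first_report = False
        return report
```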

commit 07fa3725c5a7fc68a41ed8af53ca2d3aad4c35b9
Author: Oleg Bondarev <obondarev@mirantis.com>
Date:   Tue Apr 5 16:18:03 2016 +0300

ADDRESS_SCOPE_MARK_IDS should not be global for L3 agent
    
    Otherwise agent becomes unable to handle more than 1024 routers
    (including deleted routers) and starts failing.
    
    It should be enough to distinguish address scopes inside router namespace,
    so this patch moves ADDRESS_SCOPE_MARK_IDS set to the RouterInfo class.
    
    Closes-Bug: #1566291
    Change-Id: I1e43bb3e68db4db93cc1dfc1383af0311bfb0f2d
    (cherry picked from commit 1cb43734808eded87210d2957d56b70c514d55c3)
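A sketch of the fix's core idea: each router gets its own finite pool of mark IDs, so the pool no longer runs out as the total number of routers grows. The 1024-entry range is an assumption chosen to match the limit quoted in the commit. This `RouterInfo` is a hypothetical stand-in for the real class, and it does not release marks when scopes are removed.

```python
from typing import Dict, Set

# Assumed mark range (1024 IDs); the point is only that it is finite.
MARK_ID_RANGE = range(1024, 2048)


class RouterInfo:
    """Each router owns its own pool of address scope mark IDs."""

    def __init__(self, router_id: str) -> None:
        self.router_id = router_id
        self.available_mark_ids: Set[int] = set(MARK_ID_RANGE)
        self.mark_by_scope: Dict[str, int] = {}

    def get_address_scope_mark(self, scope_id: str) -> int:
        # Marks only need to be unique inside this router's namespace.
        if scope_id not in self.mark_by_scope:
            self.mark_by_scope[scope_id] = self.available_mark_ids.pop()
        return self.mark_by_scope[scope_id]
```

With a single module-level set, the 1025th router would hit `KeyError` on `pop()`. With per-router sets, any number of routers can each mark their scopes.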

commit 9c58ae6a70125497a39612f71c53c60a8b35968e
Author: Eugene Nikanorov <enikanorov@mirantis.com>
Date:   Wed Mar 30 08:36:58 2016 -0700

Wrap all update/delete l3_rpc handlers with retries
    
    This is needed for mysql galera multimaster backends.
    
    Closes-Bug: #1564144
    Change-Id: Ia5a14d5ee91c6672d61904f669e9e845a7f262c9
    (cherry picked from commit d8f0ee5ecd67ee6ec956c7fdadce3c2a8e8301bf)
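With a Galera multi-master backend, concurrent writes through different masters can fail at commit time with a deadlock error. Such a write can then simply be retried. Below is a self-contained sketch of a retry wrapper for such handlers. `DBDeadlock` and `retry_on_deadlock` are local stand-ins for illustration, not neutron's or oslo.db's helpers.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DBDeadlock(Exception):
    """Stand-in for the deadlock error a Galera write conflict raises."""


def retry_on_deadlock(max_retries: int = 3, delay: float = 0.0):
    """Re-run the wrapped handler when it fails with DBDeadlock."""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # out of retries: surface the error
                    time.sleep(delay)
            raise AssertionError("unreachable")
        return wrapper
    return decorator
```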

commit fff909e899eb06ca2ba1b9219df8525db0980fdd
Author: Oleg Bondarev <obondarev@mirantis.com>
Date:   Thu Apr 7 19:27:38 2016 +0300

Values for [ml2]/physical_network_mtus should not be unique
    
    Obviously there could be physical networks with same MTU.
    The patch sets unique_values to False when parsing physical_network_mtus
    Unit test added.
    
    Closes-Bug: #1567502
    Change-Id: I46e5b5d3f7033a974fca40342af6dff7c71a9a4a
    (cherry picked from commit 5cdd7ae574312a7499a8768c1752d9bc27ff7d20)
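A standalone re-creation of the parsing behaviour the commit relies on. Keys must always be unique, while values are checked for uniqueness only when `unique_values` is true. This `parse_mappings` is written for illustration and is not the neutron helper itself.

```python
from typing import Dict, List


def parse_mappings(mapping_list: List[str],
                   unique_values: bool = True) -> Dict[str, str]:
    """Parse ["key:value", ...] into a dict."""
    mappings: Dict[str, str] = {}
    for mapping in mapping_list:
        mapping = mapping.strip()
        if not mapping:
            continue
        key, sep, value = mapping.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"Invalid mapping: {mapping!r}")
        if key in mappings:
            raise ValueError(f"Key {key!r} in mapping {mapping!r} not unique")
        if unique_values and value in mappings.values():
            raise ValueError(f"Value {value!r} in mapping {mapping!r} not unique")
        mappings[key] = value
    return mappings
```

Two physical networks sharing an MTU of 1500 only parse with `unique_values=False`. This is why the MTU option must not demand unique values.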

commit ece192b056c36568238abaf763b0ea9f99877c4c
Author: Oleg Bondarev <obondarev@mirantis.com>
Date:   Fri Apr 1 19:40:20 2016 +0300

Use new DB context when checking if agent is online during rescheduling
    
    Commit 9ec466cd42de94c2fad091edfd7583a5f47eb87a was not enough
    since it checked l3 agents in the same transaction that was used
    to fetch down bindings, so agents always appeared down even if they
    had actually come back online.
    This commit adds context creation on each iteration to make sure
    we use new transaction and fetch up-to-date info for the agent.
    
    Closes-Bug: #1522436
    Change-Id: I12a4e4f4e0c2042f0c0bf7eead42baca7b87a22b
    (cherry picked from commit 70068992e37c80e9aa8e70f017aa35132d7e5aee)

commit 2e2d75cbc2be356e962793db42905cecf28730d8
Author: Jakub Libosvar <libosvar@redhat.com>
Date:   Thu Apr 7 10:09:13 2016 +0000

ovsfw: Load vlan tag from other_config
    
    OVS agent stores vlan tag only to other_config before
    setup_port_filter() is called [1], leaving 'tag' column empty. This
    patch loads tag from correct place and modifies functional tests
    accordingly.
    
    Closes-Bug: 1566934
    [1] https://github.com/openstack/neutron/blob/1efed3a5321a259d27ec4a6e80352d35fc423c11/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L821
    
    Change-Id: Iaae46ce7362fedfc53af958600d6d712eb382e9f
    (cherry picked from commit dabd969090d35ec218d348f170edbef58163a79d)
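A sketch of reading the local VLAN tag from `other_config` instead of the `tag` column. The OVSDB Port record is modelled as a plain dict, which is an assumption. OVSDB `other_config` values are strings, hence the `int()` conversion. `get_port_vlan_tag` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def get_port_vlan_tag(port_record: Dict[str, Any]) -> Optional[int]:
    """Return the local VLAN tag the OVS agent stored in other_config.

    The 'tag' column may still be empty when setup_port_filter() runs,
    so it is deliberately not consulted here.
    """
    tag = port_record.get("other_config", {}).get("tag")
    return int(tag) if tag is not None else None
```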

commit 681441137b8590620c6a1f35ab5931edb22d7a31
Author: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Date:   Mon Apr 11 06:27:57 2016 +0000

Imported Translations from Zanata
    
    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure
    
    Change-Id: Ia032d91e89bd78e16412b9bc7f60f466c61a16e5

commit a2d1c46fe7952abc392d91ef3d00d203e0f38a4c
Author: Ihar Hrachyshka <ihrachys@redhat.com>
Date:   Thu Mar 24 16:16:17 2016 +0100

firewall: don't warn about a driver that does not accept bridge
    
    Drivers that do not need the bridge parameter are safe to ignore it.
    Actually, the base abstract interface for those drivers does not allow
    passing the new parameter.
    
    Ideally, the base class __init__ signature would change to accept the
    argument, but we can't do it without breaking API.
    
    So instead, just handle both types of drivers - those that accept the
    additional argument, and those that don't. And don't assume the latter
    are somehow wrong.
    
    Change-Id: Iceee46f63669b28e3a34d207216d864f1bfa5cf8
    Closes-Bug: #1561243
    (cherry picked from commit 193aa35325b897b960de88b510181b377e4c345c)
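One way to "handle both types of drivers" is sketched below. The keyword name `integration_bridge` and the function `load_firewall_driver` are assumptions, not necessarily what neutron uses. The sketch inspects the constructor signature rather than catching `TypeError`, because a bare `except TypeError` could also hide an unrelated error raised inside a driver's `__init__`.

```python
import inspect
from typing import Any


def load_firewall_driver(driver_class: type, integration_bridge: Any) -> Any:
    """Instantiate a driver, passing the bridge only if it accepts one."""
    params = inspect.signature(driver_class).parameters
    if "integration_bridge" in params:
        return driver_class(integration_bridge=integration_bridge)
    # Drivers following the base interface take no bridge; that is
    # legitimate, so no warning is logged.
    return driver_class()
```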

commit fa5eb530bb71c1b1fe1020802a63b05907695011
Author: Kevin Benton <kevin@benton.pub>
Date:   Wed Mar 16 01:35:26 2016 -0700

Add uselist=True to subnet rbac_entries relationship
    
    Because the join conditions for Subnet rbac entries
    are manually specified, SQLAlchemy is not
    automatically detecting that this relationship is a list.
    This adds the uselist=True kwarg to the relationship to
    ensure that it's always handled as a list.
    
    Change-Id: Ia4ae57ddd932260691584ae74c0305a79b2e60a9
    Closes-Bug: #1557959
    (cherry picked from commit 691f8f5ea54c04bfdfb76e25bda14665b05ed859)

commit 5853af9cba6733725d6c9ac0db644f426713f0cf
Author: Dustin Lundquist <dustin@null-ptr.net>
Date:   Thu Mar 31 12:04:31 2016 -0700

Iptables firewall prevent IP spoofed DHCP requests
    
    The DHCP rules in the fixed iptables firewall rules were too permissive.
    They permitted any UDP traffic with a source port of 68 and destination
    port of 67. Care must be taken since these rules return before the IP
    spoofing prevention rules. This patch splits the fixed DHCP rules into
    two, one for the discovery and request messages which take place before
    the instance has bound an IP address and a second to permit DHCP
    renewals.
    
    Change-Id: Ibc2b0fa80baf2ea8b01fa568cd1fe7a7e092e7a5
    Partial-Bug: #1558658
    (cherry picked from commit 6a93ee8ac1a901c255e3475a24f1afc11d8bf80f)
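A sketch of how the two narrower DHCP rules could be expressed, generated as iptables rule strings. The exact rule text and chain placement in neutron may differ; the helper names are hypothetical. Discovery and request messages come from 0.0.0.0 to the broadcast address before the instance has an IP. Renewals come from the instance's own bound address.

```python
from typing import List

DHCP_CLIENT_PORT = 68
DHCP_SERVER_PORT = 67


def dhcp_discovery_rule() -> str:
    # DISCOVER/REQUEST before an address is bound: 0.0.0.0 -> broadcast.
    return ("-s 0.0.0.0/32 -d 255.255.255.255/32 -p udp -m udp "
            f"--sport {DHCP_CLIENT_PORT} --dport {DHCP_SERVER_PORT} -j RETURN")


def dhcp_renewal_rules(fixed_ips: List[str]) -> List[str]:
    # Renewals are only accepted from the port's own fixed IPs, so a
    # spoofed source no longer slips past the anti-spoofing rules.
    return [
        f"-s {ip}/32 -p udp -m udp "
        f"--sport {DHCP_CLIENT_PORT} --dport {DHCP_SERVER_PORT} -j RETURN"
        for ip in fixed_ips
    ]
```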

commit c178bd9565ae699a79a8f13e89d1f8717a765cc1
Author: Armando Migliaccio <armamig@gmail.com>
Date:   Wed Mar 30 14:24:58 2016 -0700

Fix race conditions in IP availability API tests
    
    DHCP port creation is asynchronous with subnet creation.
    Therefore there is a time window where, depending on how fast
    the DHCP agent honors the request, the DHCP port IP allocation
    may or may not be accounted for in the total number of used IP
    for the network. To kill the race, do not run dhcp on the
    created subnets at all.
    
    Closes-bug: 1563883
    
    Change-Id: Idda25e65d04852d68a3c160cc9deefdb4ee82dcd
    (cherry picked from commit 27634bb2ba2637d694c7d0aa5758173d12ef579a)

commit ee32ea5e2bf2b01104c5bde6b6a5018dd0e15f57
Author: Ihar Hrachyshka <ihrachys@redhat.com>
Date:   Thu Apr 7 19:15:01 2016 +0200

Switched from fixtures to mock to mock out starting RPC consumers
    
    fixtures 2.0.0 broke us wildly, so instead of trying to make it work
    with new fixtures, I better just switch the mock to... mock.
    
    Change-Id: I58d7a750e263e4af54589ace07ac00bec34b553a
    Closes-Bug: #1567295
    (cherry picked from commit 2af86b8f6f749bf7b42a2c04b48c9a2dc28a46c9)

commit 77696d8529c7310e343722d809ab2e63bd8946f9
Author: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Date:   Fri Apr 8 06:36:58 2016 +0000

Imported Translations from Zanata
    
    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure
    
    Change-Id: I2f2ef73e04e007e9b450d51a72ee60e1ec788518

commit 3190494c0dfc9762d874399276327f096ce4bd92
Author: Armando Migliaccio <armamig@gmail.com>
Date:   Fri Apr 1 13:38:06 2016 -0700

Fix zuul_cloner errors during tox job setup
    
    Since [1], Tempest is a pip installable package, and that prevents zuul_cloner
    from working correctly. This change moves away from the existing logic in tox_install,
    and adds tempest as an explicit requirement for the api job. This is in fact
    the only tox target that needs Tempest to work.
    
    tox_install.sh has become less important now, but cleanup is left as follow
    up, to speed up gate salvation.
    
    [1] I25eac915c977ebaedced66ac896c5dd77259d193
    
    Change-Id: I00d882dde77a687ecb57ec200a34fd96256ad87a
    (cherry picked from commit 8a6913c534e39e93640ed99c67e66763a9d1da3c)

commit 9679285f547b301a511e02ebc763b5406fb03ffc
Author: Jamie Lennox <jamielennox@gmail.com>
Date:   Tue Mar 15 10:05:29 2016 +1100

Return oslo_config Opts to config generator
    
    We shouldn't be returning keystoneauth Opts to the oslo_config
    generator. Whilst it mostly works these objects are not interchangeable
    and it can result in problems. You can see this by entries such as:
    
      # Warning: Failed to format sample for tenant_name
      # isinstance() arg 2 must be a class, type, or tuple of classes and types
    
    in the currently generated config files.
    
    Keystoneauth provides a function that returns oslo_config options so
    fetch, process and return those instead.
    
    Change-Id: Ie3fad2381467b19189cbb332c41cea8b6cf6e264
    Closes-Bug: #1548433
    (cherry picked from commit c3db0707eff70f381913643891ba4e148977407d)

commit 04fb1476de3b9d81ad579586c7010b5cdd2248a2
Author: Hynek Mlnarik <hmlnarik@redhat.com>
Date:   Wed Mar 30 10:44:09 2016 +0200

Refactor and fix dummy process fixture
    
    Extracting the test fixture that creates a new process and leaves it
    running for a given amount of time into helpers where other fixtures for
    functional tests live. This both keeps the fixtures at one place and
    increases visibility of the fixture so that it can be reused in other
    tests. At the same time, the fixture is fixed as the original code
    omitted starting the process.
    
    Change-Id: I97aeb8d1d5773ef3d59e8f908aea34ccceb38378
    Related-Bug: 1561046
    (cherry picked from commit 2690eed19a749fb1b50bb38f3d01fce0f1497f39)

commit 844cae4960cb2e3aedad750ceb91aa675f8e1142
Author: Dmitry Sutyagin <dsutyagin@mirantis.com>
Date:   Fri Feb 12 12:18:14 2016 +0300

Switches metering agent to stateless iptables
    
    If state_less parameter is not specified then
    neutron-postrouting-bottom rule goes up in POSTROUTING
    chain, which causes premature NATing of traffic,
    for ex. traffic between internal networks becomes NATed.
    
    Closes-Bug: 1544508
    Co-Authored-By: Sergey Belous <sbelous@mirantis.com>
    Change-Id: I2e0011237d50a59d417cfee01dcd5f9d0da2e7f5
    (cherry picked from commit 5d2d1120fcdcd5977d3c760ac1520a841048d456)

commit 19ea6ba92379168e1bfff7a7235119cfbbc0172c
Author: Hynek Mlnarik <hmlnarik@redhat.com>
Date:   Wed Mar 23 14:51:59 2016 +0100

Remove obsolete keepalived PID files before start
    
    keepalived refuses to start and claims "daemon already started"
    when there is already a process with the same PID as found in
    either the VRRP or the main process PID file. This happens even
    when the new process is not keepalived. The situation
    can happen when the neutron node is reset and the obsolete PID
    files are not cleaned before neutron is started.
    
    This commit adds PID file cleanup before keepalived start.
    
    Closes-Bug: 1561046
    Change-Id: Ib6b6f2fe76fe82253f195c9eab6b243d9eb76fa2
    (cherry picked from commit e98fabb5836b12bc40a2b64a2668893ea73c2320)
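The commit only says that obsolete PID files are cleaned up before keepalived starts. In the sketch below, the policy (remove a PID file unless its PID belongs to a running keepalived) and the `is_running_keepalived` callback are assumptions. On Linux, such a callback could inspect `/proc/<pid>/cmdline`.

```python
import os
from typing import Callable, Iterable, Optional


def read_pid(path: str) -> Optional[int]:
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # unreadable or garbage content: no usable PID


def remove_stale_pid_files(pid_files: Iterable[str],
                           is_running_keepalived: Callable[[int], bool]) -> None:
    """Delete PID files that do not point at a live keepalived process."""
    for path in pid_files:
        if not os.path.exists(path):
            continue
        pid = read_pid(path)
        if pid is None or not is_running_keepalived(pid):
            # Otherwise keepalived refuses to start if an unrelated
            # process happens to own this recycled PID.
            os.remove(path)
```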

commit aafa702d2f389c0a4f3679df65be07940095ec29
Author: Kevin Benton <kevin@benton.pub>
Date:   Sun Mar 13 20:52:09 2016 -0700

Add IPAllocation object to session info to stop GC
    
    This adds the IPAllocation object created in the _store_ip_allocation
    method to the session info dictionary to prevent it from being
    immediately garbage collected. This is necessary because otherwise a
    new persistent object will be created when the fixed_ips relationship
    is referenced during the rest of the port create/update operations.
    This persistent object will then interfere with a retry operation
    that uses the same session if it tries to create a conflicting record.
    
    By preventing the object from being garbage collected, the reference
    to fixed IPs will re-use the newly created sqlalchemy object instead
    which will properly be cleaned up on a rollback.
    
    This also removes the 'passive_delete' option from the fixed_ips
    relationship on ports because IPAllocation objects would now be
    left in the session after port deletes. At first glance, this might
    look like a performance penalty because fixed_ips would be looked
    up before port deletes; however, we already do that in the IPAM
    code as well as the ML2 code so this relationship is already being
    loaded on the delete_port operation.
    
    Closes-Bug: #1556178
    Change-Id: Ieee1343bb90cf111c55e00b9cabc27943b46c350
    (cherry picked from commit 7d9169967fca3d81076cf60eb772f4506735a218)
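SQLAlchemy's Session keeps persistent objects in a weakly referencing identity map, and `Session.info` is an ordinary dict. The toy below uses `weakref.WeakValueDictionary` as an analogy for that mechanism; `ToySession` is not SQLAlchemy code. It shows why parking an object in `info` keeps it alive: an object referenced only by the identity map disappears once the caller drops it.

```python
import gc
import weakref


class IPAllocation:
    def __init__(self, ip_address: str) -> None:
        self.ip_address = ip_address


class ToySession:
    """A weak identity map plus a strong 'info' dict, like a Session."""

    def __init__(self) -> None:
        self.identity_map = weakref.WeakValueDictionary()
        self.info = {}

    def add(self, key: str, obj: IPAllocation, keep_alive: bool) -> None:
        self.identity_map[key] = obj
        if keep_alive:
            # Strong reference: the object survives after the caller drops
            # it, so later lookups reuse it instead of building a new one.
            self.info.setdefault("allocations", []).append(obj)
```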

commit 005d49d2f092039660a44896217a5c245dcc4685
Author: Vincent Untz <vuntz@suse.com>
Date:   Tue Nov 17 17:47:56 2015 +0100

Ensure metadata agent doesn't use SSL for UNIX socket
    
    The communication between the ns metadata proxy and the metadata agent
    is pure HTTP, and should not switch to HTTPS when neutron is using SSL.
    
    We're therefore telling wsgi.Server to forcefully disable SSL in that
    case.
    
    Change-Id: I2cb9fa231193bcd5c721c4d5cf0eb9c16e842349
    Closes-Bug: #1514424
    (cherry picked from commit 7a306e2918775ebb94d9e1408aaa2b7c3ed26fc6)

commit 905fd05f966e0c7d3851f59665e58746c50bad1e
Author: Swaminathan Vasudevan <swaminathan.vasudevan@hpe.com>
Date:   Fri Mar 25 12:38:13 2016 -0700

DVR: Increase the link-local address pair range
    
    The current dvr_fip_ns.py file has FIP_LL_SUBNET configured
    with a subnet prefixlen of /23, which only allows 255 pairs of
    link-local addresses to be generated. If the number of routers
    per node increases beyond the 255 limit, an assertion is raised.
    
    This patch increases the link-local address CIDR to a /18
    to allow for 8K routers. The new range was chosen to not
    overlap with the original, allowing for in-place upgrades
    without affecting existing routers.
    
    Closes-Bug: #1562110
    Change-Id: I6e11622ea9cc74b1d2428757f16aa0de504ac31a
    (cherry picked from commit 7b1b8c2de57457c2ec1ed784165a3e10e24151cf)
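
A quick check of the arithmetic above with Python's `ipaddress` module, assuming each link-local address pair is a /31. The two CIDRs are assumptions about the old and new `FIP_LL_SUBNET` values, not quoted from the commit. The /23 yields 256 raw /31 blocks, close to the 255 pairs cited; the /18 yields 8192, matching the 8K figure; and the two ranges do not overlap.

```python
import ipaddress

# Assumed values for the old and new FIP_LL_SUBNET ranges.
OLD_FIP_LL_SUBNET = ipaddress.ip_network('169.254.30.0/23')
NEW_FIP_LL_SUBNET = ipaddress.ip_network('169.254.64.0/18')


def pair_count(subnet):
    """Number of /31 blocks (link-local address pairs) in a subnet."""
    return sum(1 for _ in subnet.subnets(new_prefix=31))


print(pair_count(OLD_FIP_LL_SUBNET))                  # 256
print(pair_count(NEW_FIP_LL_SUBNET))                  # 8192
print(OLD_FIP_LL_SUBNET.overlaps(NEW_FIP_LL_SUBNET))  # False
```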

commit 93d719a554d9b179636afccd25e1018b6e5d1cc3
Author: Sreekumar S <sreesiv@gmail.com>
Date:   Fri Jan 22 19:09:49 2016 +0530

SG protocol validation to allow numbers or names
    
    The supplied SG rule protocol is validated against the DB rules'
    protocols by both number and name. The filter passed to the DB
    is modified so that it queries for records matching either the
    protocol name or its number, instead of exactly the form provided
    in the input. The protocol field of the returned DB rule record is
    validated against the supplied SG protocol field by either name
    or number.
    This way, the user can still enter a protocol name or number
    to create a rule, and API compatibility is maintained.
    
    Closes-Bug: #1215181
    (cherry picked from commit 913a64cc1175b3bd7efc7abe34895c32bf39a696)
    
    Also squashed the following regression fix:
    
    ===
    
    Don't drop 'protocol' from client supplied security_group_rule dict
    
    If protocol was present in the dict but was None, it was never
    put back after being popped out of the dict. This later resulted
    in a KeyError when trying to access the key in the dict.
    
    Change-Id: I4985e7b54117bee3241d7365cb438197a09b9b86
    Closes-Bug: #1566327
    (cherry picked from commit 5a41caa47a080fdbc1801e2771163734b9790c57)
    
    ===
    
    Change-Id: If4ad684e961433b8d9d3ec8fe2810585d3f6a093
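
A hedged sketch of name-or-number protocol matching for duplicate-rule detection, including the regression fix: the candidate rule is copied before 'protocol' is popped, so the caller's dict keeps the key even when its value is None. `IP_PROTOCOL_MAP`, `protocol_aliases`, `protocols_match` and `find_duplicate` are hypothetical helpers, and this sketch compares in Python rather than through a DB filter.

```python
# Hypothetical subset of an IP protocol name -> number table.
IP_PROTOCOL_MAP = {'icmp': 1, 'tcp': 6, 'udp': 17, 'icmpv6': 58}
_NAME_BY_NUMBER = {str(num): name for name, num in IP_PROTOCOL_MAP.items()}


def protocol_aliases(protocol):
    """Return every accepted spelling (name and number) of a protocol."""
    if protocol is None:
        return {None}
    value = str(protocol).lower()
    if value in IP_PROTOCOL_MAP:
        return {value, str(IP_PROTOCOL_MAP[value])}
    if value in _NAME_BY_NUMBER:
        return {value, _NAME_BY_NUMBER[value]}
    return {value}


def protocols_match(a, b):
    """True if a and b name the same protocol, by name or by number."""
    return bool(protocol_aliases(a) & protocol_aliases(b))


def find_duplicate(existing_rules, new_rule):
    """Return the first existing rule equivalent to new_rule, else None.

    new_rule is copied before 'protocol' is popped, so the caller's dict
    keeps its 'protocol' key even when the value is None.
    """
    candidate = dict(new_rule)
    candidate_protocol = candidate.pop('protocol', None)
    for rule in existing_rules:
        other = dict(rule)
        other_protocol = other.pop('protocol', None)
        if other == candidate and protocols_match(other_protocol,
                                                  candidate_protocol):
            return rule
    return None
```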

commit 33d3b8ce76d942950e2f999a6040787483adf729
Author: Kevin Benton <kevin@benton.pub>
Date:   Fri Apr 1 02:42:54 2016 -0700

L3 agent: match format used by iptables
    
    This fixes the iptables rules generated by the L3 agent
    (SNAT, DNAT, set-mark and metadata), and the DHCP agent
    (checksum-fill) to match the format that will be returned
    by iptables-save to prevent excessive extra replacement
    work done by the iptables manager.
    
    It also fixes the iptables test that was not passing the
    expected arguments (-p PROTO -m PROTO) for block rules.
    
    A simple test was added to the L3 agent to ensure that the
    rules have converged during the normal lifecycle tests.
    
    Closes-Bug: #1566007
    Change-Id: I5e8e27cdbf0d0448011881614671efe53bb1b6a1
    (cherry picked from commit b8d520ffe2afbffe26b554bff55165531e36e758)
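
To illustrate why generated rules must match iptables-save's format: iptables-save prints the implicit match module, so a rule written as `-p tcp --dport 80` comes back as `-p tcp -m tcp --dport 80`, and a plain string comparison then reports it as changed on every run. The normalizer below is a hypothetical sketch covering only the tcp/udp port case. It is not the iptables manager's actual code.

```python
# Options that make iptables load the protocol's implicit match module,
# which iptables-save then prints as an explicit "-m PROTO".
PORT_OPTIONS = {'--dport', '--sport'}


def normalize_rule(rule):
    """Rewrite "-p tcp --dport 80" into "-p tcp -m tcp --dport 80"."""
    tokens = rule.split()
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        out.append(token)
        i += 1
        if token == '-p' and i < len(tokens):
            proto = tokens[i]
            out.append(proto)
            i += 1
            already_explicit = tokens[i:i + 2] == ['-m', proto]
            uses_ports = any(t in PORT_OPTIONS for t in tokens[i:])
            if proto in ('tcp', 'udp') and uses_ports and not already_explicit:
                out.extend(['-m', proto])
    return ' '.join(out)


def rules_to_replace(generated, saved):
    """Generated rules that do not appear verbatim in iptables-save output."""
    saved_set = set(saved)
    return [r for r in generated if r not in saved_set]


saved = ['-p tcp -m tcp --dport 80 -j ACCEPT']
generated = ['-p tcp --dport 80 -j ACCEPT']
print(rules_to_replace(generated, saved))  # the raw rule looks stale
print(rules_to_replace([normalize_rule(r) for r in generated], saved))  # []
```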

commit 7b2fcaa1734bc1f26a1c4c2f0ace6fedbaa171e2
Author: Armando Migliaccio <armamig@gmail.com>
Date:   Fri Apr 1 18:02:47 2016 -0700

Use right class method in IP availability tests
    
    This should have been skip_checks. Spotted while working on the
    fix for bug 1563883, argh.
    
    Change-Id: If609c285c2363967aba91a8ae1560d99391654d9
    Related-bug: 1563883
    (cherry picked from commit 2b0ce0ba8bf6abeb62654a00c12cfaad89f86e7c)

commit 93cdf8eb559cf78895200b3ec6775afe60b6c638
Author: Kevin Benton <kevin@benton.pub>
Date:   Sun Feb 21 21:31:59 2016 -0800

Make L3 HA interface creation concurrency safe
    
    This patch creates a function to handle the creation of the
    L3HA interfaces for a router in a manner that handles the
    HA network not existing, or an existing one being deleted
    by another worker before the interfaces could be created.
    
    Closes-Bug: #1548285
    Change-Id: Ibac0c366362aa76615e448fbe11d6d6b031732fe
    (cherry-picked from commit 7512d8aa26a945a695e889e0a97c6414cec6ac10)

commit d93466923f3b5627c55510f9bc7229a7b584b430
Author: Jakub Libosvar <libosvar@redhat.com>
Date:   Fri Apr 1 14:53:03 2016 +0000

ovsfw: Remove vlan tag before injecting packets to port
    
    Open vSwitch takes care of VLAN tagging when normal switching is
    used. When ingress packets are accepted, the
    actions=output:<port_number> action is used, but then we need to
    explicitly take care of stripping out the VLAN tags.
    
    Closes-Bug: 1564947
    Change-Id: If3fc44c9fd1ac0f7bc9dfe9dc48e76352e981f8e
    (cherry picked from commit 0f9ec7b72a8ca173b760f20323f90bffefa91681)

commit 33c01f4d49a4561bac56831483ac22c4a4b3f47d
Author: OpenStack Proposal Bot <openstack-infra@lists.openstack.org>
Date:   Tue Apr 5 06:48:18 2016 +0000

Imported Translations from Zanata
    
    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure
    
    Change-Id: Ief6281fde6ab1f72b26bfb11528fc64c71b407b9

commit 05ac0125fe861fbfb09d48eec97a29539b51b4e2
Author: YAMAMOTO Takashi <yamamoto@midokura.com>
Date:   Fri Mar 25 15:22:46 2016 +0900

test_network_ip_availability: Skip IPv6 tests when configured so
    
    Use a class attribute known by the base class so that IPv6 tests
    are skipped appropriately when CONF.network_feature_enabled.ipv6
    is False.
    
    Closes-Bug: #1561857
    Change-Id: I93f76b7f7cd94ff484d2e4507500af97578ac71a
    (cherry picked from commit 61a5bcb52e23fa39839c30cb4f13f8a26e115516)
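
A minimal unittest sketch of the pattern in the two test commits above: a `skip_checks` class hook on the base class inspects a class attribute and skips the whole class when IPv6 is disabled. `_ip_version`, `IPV6_ENABLED` (standing in for `CONF.network_feature_enabled.ipv6`) and the test classes are assumptions. Here the hook is called from `setUpClass` directly, whereas tempest's base test class calls such a hook during its own class setup.

```python
import unittest

# Hypothetical stand-in for CONF.network_feature_enabled.ipv6.
IPV6_ENABLED = False


class BaseNetworkTest(unittest.TestCase):
    # Hypothetical class attribute that the base class knows to inspect.
    _ip_version = 4

    @classmethod
    def skip_checks(cls):
        """Run once per class, before any resources are created."""
        if cls._ip_version == 6 and not IPV6_ENABLED:
            raise unittest.SkipTest('IPv6 is disabled in the configuration')

    @classmethod
    def setUpClass(cls):
        cls.skip_checks()
        super().setUpClass()


class IPv4AvailabilityTest(BaseNetworkTest):
    def test_ip_version(self):
        self.assertEqual(self._ip_version, 4)


class IPv6AvailabilityTest(BaseNetworkTest):
    _ip_version = 6

    def test_ip_version(self):
        self.assertEqual(self._ip_version, 6)


def run_test_classes(*test_classes):
    """Run the given TestCase classes and return the collected result."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(cls) for cls in test_classes)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Running both classes executes the IPv4 test and records the IPv6 class as a single skip.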

commit 38894cc7603f90ed8a5938e37fcc773c926fbfe4
Author: Eugene Nikanorov <enikanorov@mirantis.com>
Date:   Mon Mar 21 20:05:47 2016 -0700

Retry updating agents table in case of deadlock
    
    Updating the agents table is a contentious operation which
    can fail often if the MySQL backend is in multi-master mode.
    This could lead to agents flapping and various issues
    such as sporadic rescheduling, port binding failures, etc.
    
    Change-Id: Ief392f9a09d86c185dc086055d2cbc1891ff1d7f
    Closes-Bug: #1560724
    (cherry picked from commit d5e4013556c7144d347ece267c4bd3c8dc87b24f)
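
A stdlib sketch of retrying on deadlock with exponential backoff. `DBDeadlock` and `retry_on_deadlock` are hypothetical. In OpenStack code this role is usually filled by oslo.db's `wrap_db_retry` decorator (which has a `retry_on_deadlock` option), though the commit message does not say which mechanism it uses.

```python
import functools
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for a driver-level deadlock error."""


def retry_on_deadlock(max_retries=5, initial_delay=0.01, max_delay=0.5):
    """Retry the wrapped call with exponential backoff on DBDeadlock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise  # out of retries: surface the error
                    time.sleep(delay)
                    delay = min(delay * 2, max_delay)
        return wrapper
    return decorator


calls = {'n': 0}


@retry_on_deadlock(max_retries=3, initial_delay=0)
def report_agent_state(agent_id):
    # Simulates an agents-table update that deadlocks twice, then succeeds.
    calls['n'] += 1
    if calls['n'] < 3:
        raise DBDeadlock('simulated multi-master conflict')
    return f'{agent_id} updated'


outcome = report_agent_state('l3-agent-1')
```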

commit aac460b0a7fec68fbb173ac8899274809e254a7a
Author: Vladimir Eremin <veremin@mirantis.com>
Date:   Thu Mar 17 19:32:29 2016 +0300

Allow to use several nics for physnet with SR-IOV
    
    According to the specs and docs, SRIOV_NIC.physical_device_mappings is
    not limited to a 1-1 mapping between physnets and NICs. However, the
    implementation required this. This bugfix unlocks 1-M mappings, so one
    physnet can be managed by many NICs.
    
    * introduced unique_keys in neutron.utils.parse_mappings
    * SRIOV_NIC.physical_device_mappings is parsed as dict with lists as
      values with parse_mappings(..., unique_keys=False)
    
    DocImpact
    Change-Id: I07b8682fdfe8389a35893cc662b87c94a00bd4a5
    Closes-Bug: #1558626
    (cherry picked from commit 46ddaf4288a1cac44d8afc0525b4ecb3ae2186a3)
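
A simplified, hypothetical re-implementation of the `unique_keys=False` behaviour described above, not the actual `neutron.utils.parse_mappings` code. It parses entries such as `physnet1:eth0`; with `unique_keys=False` it collects every NIC for a physnet into a list.

```python
def parse_mappings(mapping_list, unique_values=True, unique_keys=True):
    """Parse ['key:value', ...] into a dict (simplified sketch).

    With unique_keys=False each key maps to a list of values, so one
    physnet can be served by several NICs.
    """
    mappings = {}
    seen_values = set()
    for entry in mapping_list:
        entry = entry.strip()
        if not entry:
            continue
        key, sep, value = entry.partition(':')
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f'Invalid mapping: {entry!r}')
        if unique_values and value in seen_values:
            raise ValueError(f'Value {value!r} is mapped more than once')
        seen_values.add(value)
        if unique_keys:
            if key in mappings:
                raise ValueError(f'Key {key!r} is mapped more than once')
            mappings[key] = value
        else:
            values = mappings.setdefault(key, [])
            if value not in values:
                values.append(value)
    return mappings


print(parse_mappings(['physnet1:eth0', 'physnet1:eth1', 'physnet2:eth2'],
                     unique_keys=False))
```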

commit 90b9cd334b1b33df933bf1b61b38c6e087c431af
Author: Ihar Hrachyshka <ihrachys@redhat.com>
Date:   Thu Mar 17 16:20:52 2016 +0100

port security: gracefully handle resources with no bindings
    
    Resources could be created before the extension was enabled in the
    setup, in which case no bindings are created for them. We should then
    gracefully return the default (True) value when extracting the value
    using the mixin, and we should also create the binding model on an
    update request if there is no existing binding model for the resource.
    
    While at it, introduced a constant to store the default value for port
    security (True) and changed several tests to use the constant instead of
    extracting it from extension resource map.
    
    Change-Id: I8607cdecdc16c5f94635c94e2f02700c732806eb
    Closes-Bug: #1509312
    (cherry picked from commit b0519cf0ada3b3d9b76f84948f9ad3c142fc50be)
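
A sketch of the graceful handling described above, using a hypothetical in-memory store in place of the binding model: a missing binding reads as the default (True), and an update creates the binding if none exists. `DEFAULT_PORT_SECURITY` is an assumed name for the constant the commit introduces.

```python
from dataclasses import dataclass

DEFAULT_PORT_SECURITY = True  # assumed name for the new default constant


@dataclass
class PortSecurityBinding:
    """Hypothetical binding row: resource id -> port_security_enabled."""
    resource_id: str
    port_security_enabled: bool


class PortSecurityStore:
    def __init__(self):
        self._bindings = {}

    def get_port_security(self, resource_id):
        binding = self._bindings.get(resource_id)
        if binding is None:
            # Resource predates the extension: fall back to the default.
            return DEFAULT_PORT_SECURITY
        return binding.port_security_enabled

    def update_port_security(self, resource_id, enabled):
        binding = self._bindings.get(resource_id)
        if binding is None:
            # No binding yet: create one instead of failing the update.
            binding = PortSecurityBinding(resource_id, enabled)
            self._bindings[resource_id] = binding
        else:
            binding.port_security_enabled = enabled
        return binding
```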

commit 7174bc4c2a0a31e9036ec130cb65f91d1e5009a4
Author: venkata anil <anil.venkata@enovance.com>
Date:   Wed Mar 23 15:24:01 2016 +0000

Ignore exception when deleting linux bridge if it doesn't exist
    
    The Linux bridge agent does not handle the RuntimeError exception when
    it tries to delete a network's bridge that nova has deleted in parallel.
    The fullstack test has a similar scenario: it creates the network's
    bridge for the agent and deletes the bridge after the test, like nova.
    
    The Linux bridge agent has to ignore the RuntimeError exception if the
    bridge doesn't exist.
    
    Closes-bug: #1561040
    Change-Id: I428384fd42181ff6bc33f29369a7ff5ec163b532
    (cherry picked from commit 16b2ffdfd85eece8fb57a98d10bf35ad617d235a)
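
A hedged sketch of tolerating a concurrent bridge delete: the RuntimeError is swallowed only if the bridge is really gone, so genuine failures still propagate. `run_command` and `bridge_exists` are hypothetical injected helpers, which lets the logic run without root privileges.

```python
def delete_bridge(bridge_name, run_command, bridge_exists):
    """Delete a bridge, tolerating a concurrent delete by another actor.

    Returns True if this call deleted the bridge, False if it was already
    gone. run_command and bridge_exists are hypothetical helpers.
    """
    try:
        run_command(['ip', 'link', 'delete', bridge_name, 'type', 'bridge'])
    except RuntimeError:
        if bridge_exists(bridge_name):
            raise  # a real failure: the bridge is still there
        return False  # someone else (e.g. nova) already removed it
    return True
```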

commit 93d29d131c0234075ac547906f77900ce47cceec
Author: Clayton O'Neill <clayton.oneill@twcable.com>
Date:   Thu Mar 24 14:59:41 2016 +0000

Don't delete br-int to br-tun patch on startup
    
    When starting up, we don't want to delete the patch port between br-int
    and br-tun unless we're also dropping the flows. In Liberty both of
    these bridges were switched to not dump flows on startup and to put the
    bridges in secure mode so that default flood flows are not installed
    when the bridge is created.
    
    Without this patch the patch port is torn down and not reinstalled until
    br-tun is set up again.
    
    Partial-Bug: #1514056
    Change-Id: Ia518a99a2de5d1bda467fde57892c43970f88bcd
    (cherry picked from commit 8dce6a5c873c2c18e5a9c6165bf3974aead02588)

commit 211e0a65a0cf637ab58a20c91791b7eb6b8c8519
Author: Jakub Libosvar <libosvar@redhat.com>
Date:   Mon Feb 29 16:20:05 2016 +0100

functional: Update ref used from ovs branch-2.5.
    
    OVS 2.5.0 has been released, but we need a later commit on branch-2.5
    that fixes compilation against the latest Ubuntu kernel, which backported
    some changes that broke compilation of 2.5.0.
    
    Change-Id: Id70db79a8450d4f0125dd500f7f6ab8d103d98c3
    (cherry picked from commit 59b36ecca8f60996a60cd52f816573ace9b39309)

commit e2676ae8d188286b76f802d605f363e21011841a
Author: Kevin Benton <kevin@benton.pub>
Date:   Fri Mar 25 12:37:12 2016 -0700

DVR: rebind port if ofport changes
    
    When binding is called in DVR, check to see if the port was
    previously wired under a different ofport. If it was, first
    unbind the old port and then bind the new one.
    
    Change-Id: I372158c4a6986295e396d849a2c9c5372b271e08
    Closes-Bug: #1562467
    (cherry picked from commit 4731dbbef1f615b9ce6d18315e8ca9810e8a772d)
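
A sketch of the rebind check, assuming a hypothetical registry that remembers which ofport each DVR port was wired with. If the port reappears under a different ofport, the stale wiring is removed before the new one is bound.

```python
class DvrPortRegistry:
    """Hypothetical record of the ofport each DVR port was wired with."""

    def __init__(self):
        self.wired = {}    # port_id -> ofport
        self.actions = []  # log of (action, port_id, ofport) for inspection

    def _bind(self, port_id, ofport):
        self.wired[port_id] = ofport
        self.actions.append(('bind', port_id, ofport))

    def _unbind(self, port_id, ofport):
        del self.wired[port_id]
        self.actions.append(('unbind', port_id, ofport))

    def bind_port(self, port_id, ofport):
        old_ofport = self.wired.get(port_id)
        if old_ofport is not None and old_ofport != ofport:
            # Port was re-plugged under a new ofport: drop stale wiring first.
            self._unbind(port_id, old_ofport)
        self._bind(port_id, ofport)
```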

commit c6ef57a6d5f0397b1abf865815bd71d40292f482
Author: Jakub Libosvar <libosvar@redhat.com>
Date:   Wed Feb 24 16:34:07 2016 +0000

ovs-fw: Mark conntrack entries invalid if no rule is matched
    
    This patch makes sure that an existing connection breaks once the security
    group rule that allowed it is removed. To correctly track connections on
    the same hypervisor, zones were changed from per-port to per-network
    (based on the port's VLAN tag). This information is now stored in
    register 6. A test for RELATED connections was also added, to avoid
    marking such connections as invalid by REPLY rules.
    
    Closes-Bug: 1549370
    Change-Id: Ibb5942a980ddd8f2dd7ac328e9559a80c05789bb
    (cherry picked from commit 4f6aa3ffde2fd68b85bc5dfdaf6c2684931f3f61)

commit ef6ea62d5d1fc767a23e3caf2716e76f90d63f03
Author: Andreas Scheuring <andreas.scheuring@de.ibm.com>
Date:   Wed Mar 9 13:56:22 2016 +0100

l3: Send notify on router_create when ext gw is specified
    
    A router created with an external gateway specified now
    triggers a notify_router_updated event. This triggers scheduling
    of the router and creation of the network namespace on the target
    node.
    
    Change-Id: I7f6ff5edf6a9c5ffa6d8978c1f3de0e106b6a8bb
    Closes-Bug: #1535707
    (cherry picked from commit da00d1a186c55c91887c9546e893f6d075a2c2ad)

commit eb8ddb95bbb56bbdc658e15feebbf7f91d5ddf13
Author: Oleg Bondarev <obondarev@mirantis.com>
Date:   Tue Feb 16 18:03:52 2016 +0300

Move db query to fetch down bindings under try/except
    
    In case of intermittent DB failures, router and network auto-rescheduling
    tasks may fail due to an error when fetching down bindings from the DB.
    These queries need to be put under try/except to prevent an unexpected exit.
    
    Closes-Bug: #1546110
    Change-Id: Id48e899a5b3d906c6d1da4d03923bdda2681cd92
    (cherry picked from commit b6ec40cbf754de9d189f843cbddfca67d4103ee3)
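
A minimal sketch of one run of a periodic rescheduling task in which the DB query is wrapped in try/except, so an intermittent failure is logged and the task simply tries again on the next interval. `fetch_down_bindings` and `reschedule` are hypothetical callables.

```python
import logging

LOG = logging.getLogger(__name__)


def reschedule_down_routers(fetch_down_bindings, reschedule):
    """One run of a periodic rescheduling task (hypothetical helpers).

    A DB error while fetching bindings is logged and the run is skipped,
    so the periodic task survives and retries next interval.
    """
    try:
        bindings = fetch_down_bindings()
    except Exception:
        LOG.exception('Failed to fetch down bindings; will retry next run')
        return 0
    rescheduled = 0
    for binding in bindings:
        reschedule(binding)
        rescheduled += 1
    return rescheduled
```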

commit da1eee31057fb44ff758d99ac99eeb47b7caec6e
Author: Alex Oughton <alex.oughton@rackspace.com>
Date:   Fri Mar 18 11:12:10 2016 -0500

Close XenAPI sessions in neutron-rootwrap-xen-dom0
    
    Neutron with XenServer doesn't properly close XenAPI sessions.
    Because it creates these sessions so rapidly, the XenServer host
    eventually exceeds its maximum allowed number of connections.
    This patch adds a step that closes each session.
    
    Closes-Bug: 1558721
    Change-Id: Ida90a970c649745c492c28c41c4a151e4d940aa6
    (cherry picked from commit 9d21b5ad7edbf9ac1fd9254e97f56966f25de8e6)
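
A hedged sketch of guaranteeing that a session is closed, using a context manager with try/finally so the logout runs even if the wrapped command fails. `open_session` and `FakeSession` are hypothetical; with the real XenAPI Python bindings the cleanup would be a logout call such as `session.xenapi.session.logout()`.

```python
from contextlib import contextmanager


@contextmanager
def xenapi_session(open_session):
    """Yield a session from a hypothetical factory and always log it out."""
    session = open_session()
    try:
        yield session
    finally:
        session.logout()


class FakeSession:
    """Hypothetical session object used to exercise the cleanup path."""

    def __init__(self):
        self.open = True

    def logout(self):
        self.open = False


with xenapi_session(FakeSession) as session:
    print(session.open)  # True while in use
print(session.open)      # False once the block exits
```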

commit 1d51172babab64176716e723ae84f20e751a2ac3
Author: Kevin Benton <kevin@benton.pub>
Date:   Thu Mar 17 05:19:19 2016 -0700

Watch for 'new' events in ovsdb monitor for ofport
    
    This adjusts the event handling logic in the ovs interface
    monitor to watch for 'new' events as well because this is
    when ofports are assigned to the interface. Previously we
    were only watching for 'insert' events, which would never
    include the ofport so this led to every port being skipped
    initially by the OVS agent and then picked up on the next
    polling interval.
    
    Closes-Bug: #1560464
    Change-Id: I46ac508a839c6db779323d5afb083f3425f96e87
    (cherry picked from commit 62e88623d977d28c113a1e29458621f82d48f6e5)
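
A sketch of the event filtering described above, assuming monitor rows arrive as dicts with an `action` of 'insert', 'delete', 'old' or 'new' (ovsdb-client reports a modification as an old/new pair of rows). The row shape and the encodings treated as "no ofport yet" are assumptions, not the agent's actual data structures.

```python
# Hypothetical encodings of "no ofport assigned yet".
UNASSIGNED = (None, -1, [])


def ports_with_ofport(events):
    """Map interface name -> ofport from monitor rows.

    'insert' rows usually lack an ofport; it arrives later in a 'new'
    (post-update) row, so both actions are considered.
    """
    ready = {}
    for event in events:
        if event.get('action') not in ('insert', 'new'):
            continue
        ofport = event.get('ofport')
        if ofport in UNASSIGNED:
            continue
        ready[event['name']] = ofport
    return ready
```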

commit bd3e9c3b1e759d4296cdb9a816a63ef773ea5f63
Author: Stephen Eilert <stephen.eilert@hpe.com>
Date:   Tue Mar 15 13:54:23 2016 -0700

Removes host file contents from DHCP agent logs
    
    The change I3ad7864eeb2f959549ed356a1e34fa18804395cc addressed a
    performance impact by removing logging calls from inside the loop that
    creates the hosts file in memory. However, it also introduced the
    dumping of said hosts file to the logs.
    
    On deployments with a very high number of ports (>5K), this increases the
    load of the node substantially. The problem is worse when a high number
    of port updates is received in a short amount of time.
    
    There is also the administrative burden caused by the sheer amount of
    data being logged.
    
    If the content of the host file is required for debugging purposes, it
    can be found at /opt/stack/data/neutron/dhcp/<net-id>/host
    
    This patch is trivial and only removes the logging of the file contents.
    The 'building'/'done building' logging calls may still provide useful
    information.
    
    Change-Id: Ie176ef123c194c22dca0967a6acfb9a48c959e6c
    Closes-Bug: 1557640
    (cherry picked from commit ed7411fa1c4d6a26cea3a8737b3c29ce6ff64363)

Ihar Hrachyshka (ihar-hrachyshka) on 2016-05-27

tags:

added: neutron-proactive-backport-potential

Doug Hellmann (doug-hellmann) wrote on 2016-06-03: Fix included in openstack/neutron 9.0.0.0b1

#13

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

John Schwarz (jschwarz) wrote on 2016-06-08:

#15

Regarding the discussion of whether to include this in stable/liberty: the race occurs when creating and deleting HA routers, and is especially apparent when the router is the first one a tenant has created. In this case, the resulting race can produce a wide array of effects, such as not all resources being created and the router failing to be scheduled to the minimum required number of HA agents (even though more are available).

This can happen very easily when running Rally's create_and_delete_routers sample task, and has been reported on a few large deployments.

This rather complex patch makes sure an agent is not made aware of routers which are currently ALLOCATING. As a safeguard, during specific sensitive areas of the code (specifically when scheduling a router), the router's status is changed to ALLOCATING.
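
A toy model of the ALLOCATING safeguard described in #15: routers start out ALLOCATING, a sensitive step such as scheduling temporarily flips them back to ALLOCATING, and a full sync never hands ALLOCATING routers to an agent. `RouterStore` and its methods are hypothetical, and marking a router ERROR when the step fails is a design choice of this sketch, not something the comment states.

```python
from contextlib import contextmanager

ALLOCATING = 'ALLOCATING'
ACTIVE = 'ACTIVE'
ERROR = 'ERROR'


class RouterStore:
    """Toy server-side router table (hypothetical, not neutron's schema)."""

    def __init__(self):
        self.status = {}  # router_id -> status

    def create(self, router_id):
        # New routers start out ALLOCATING until their HA resources exist.
        self.status[router_id] = ALLOCATING

    @contextmanager
    def allocating(self, router_id):
        """Hide a router from agents while a sensitive step runs."""
        self.status[router_id] = ALLOCATING
        try:
            yield
        except Exception:
            self.status[router_id] = ERROR
            raise
        self.status[router_id] = ACTIVE

    def sync_routers(self):
        """Routers a full sync would return to an L3 agent."""
        return sorted(r for r, s in self.status.items() if s != ALLOCATING)


store = RouterStore()
store.create('r1')
with store.allocating('r1'):   # e.g. building HA interfaces and scheduling
    print(store.sync_routers())  # []
print(store.sync_routers())      # ['r1']
```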

John Schwarz (jschwarz) wrote on 2016-06-08:

#16

By the way, the fix for a router that was not scheduled to the minimum required number of HA agents is to manually re-create the agent or to manually schedule the router to more agents.

OpenStack Infra (hudson-openstack) wrote on 2016-07-25: Change abandoned on neutron (stable/liberty)

#17

Change abandoned by John Schwarz (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/305774

Ihar Hrachyshka (ihar-hrachyshka) on 2016-10-07

tags:

removed: neutron-proactive-backport-potential

neutron

L3 Agent's fullsync is raceful with creation of HA router
