500 error on router-gateway-set for DVR on second external network

Bug #1374473 reported by Armando Migliaccio
This bug affects 2 people
Affects    Status          Importance    Assigned to              Milestone
neutron    Fix Released    High          Swaminathan Vasudevan
Juno       Fix Released    Undecided     Unassigned

Bug Description

Under some circumstances, router-gateway-set on a DVR router fails with a 500 Internal Server Error.

Steps to reproduce:

1) Run Devstack with DVR *on* (devstack by default creates an external network and sets the gateway to the router)
2) Create an external network
3) Create a router
4) Set the gateway to the router
5) Observe the Internal Server Error

Expected outcome: the gateway is correctly set.
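
For reference, steps 2 to 4 can be scripted. This is only a sketch: it assumes an object exposing the python-neutronclient v2.0 calls create_network, create_router and add_gateway_router is passed in. The names "ext-net2" and "router2" are made up, and StubClient is a hypothetical stand-in so the snippet runs without a cloud.

```python
def set_gateway_on_second_external_network(neutron):
    """Steps 2-4 of the report, against a neutronclient-style object."""
    net = neutron.create_network(
        {"network": {"name": "ext-net2", "router:external": True}}
    )["network"]
    router = neutron.create_router({"router": {"name": "router2"}})["router"]
    # On an affected DVR setup, this is the call that returned HTTP 500.
    return neutron.add_gateway_router(router["id"], {"network_id": net["id"]})


class StubClient:
    """Hypothetical stand-in that only records calls; not a real client."""

    def __init__(self):
        self.calls = []

    def create_network(self, body):
        self.calls.append("create_network")
        return {"network": dict(body["network"], id="net-2")}

    def create_router(self, body):
        self.calls.append("create_router")
        return {"router": dict(body["router"], id="router-2")}

    def add_gateway_router(self, router_id, body):
        self.calls.append("add_gateway_router")
        return {"router": {"id": router_id, "external_gateway_info": body}}


stub = StubClient()
result = set_gateway_on_second_external_network(stub)
print(stub.calls)
```

Against a real deployment, the stub would be replaced by an authenticated client for the Devstack environment from step 1.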

This occurs with the latest Juno code. The underlying error is an attempted double binding of the router to the L3 agent.
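
To illustrate why a double binding surfaces as a database error, here is a minimal sketch using an in-memory SQLite table. The table and column names are hypothetical and simplified; they are not neutron's actual schema. The only assumption is that a (router, agent) pair may be stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified binding table: one row per (router, agent) pair.
conn.execute(
    "CREATE TABLE router_agent_bindings ("
    " router_id TEXT NOT NULL,"
    " l3_agent_id TEXT NOT NULL,"
    " PRIMARY KEY (router_id, l3_agent_id))"
)


def bind_router(router_id, agent_id):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO router_agent_bindings VALUES (?, ?)",
            (router_id, agent_id),
        )


bind_router("router-2", "agent-1")
try:
    # Binding the same router to the same agent again violates the key.
    bind_router("router-2", "agent-1")
    second_bind_failed = False
except sqlite3.IntegrityError:
    second_bind_failed = True

print(second_bind_failed)
```

If nothing above the database layer handles that error, the API caller only sees a generic server error.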

More details in:

http://paste.openstack.org/show/115614/

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Chuck Carlino (ccarlino)
Changed in neutron:
assignee: nobody → Chuck Carlino (ccarlino)
Chuck Carlino (ccarlino)
Changed in neutron:
assignee: Chuck Carlino (ccarlino) → nobody
Changed in neutron:
assignee: nobody → Andrey Epifanov (aepifanov)
Changed in neutron:
status: Confirmed → In Progress
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

This bug seems to be related to Bug #1377241.
It is some sort of timing issue that leads to the DBDuplicate error.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Sorry, this is a different issue: the scheduler is not able to handle multiple external networks for DVR routers. It is not related to Bug #1377241.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Armando, can we change the error description to "500 error on gateway set for the second external network"?

Armando Migliaccio (armando-migliaccio) wrote :

The description does mention that, and I'd rather keep the bug title short if I can. Let me see if that works out.

summary: - 500 error on router-gateway-set for DVR
+ 500 error on router-gateway-set for DVR on second external network
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/127201

Mike Smith (michael-smith6) wrote :

More details are needed on the steps/config. I can have 2 external networks added to a single DVR agent. I am using 2 routers, each with 1 external network. It takes special config on the plugin/server and agents. In your setups, are you able to configure a centralized router the same way?

Armando Migliaccio (armando-migliaccio) wrote :

It sounds like the issue does not manifest itself when you have more than one agent.

I did see the issue in a single host env that, granted, is not representative of any meaningful deployment scenario, but was enough to raise red flags all around.

If you confirm that this is a non-issue, perhaps we can simply mark this as invalid. After talking to Rajeev and Swami, I was under the impression that external network support for DVR was something that needed some attention.

Mike Smith (michael-smith6) wrote :

I will test on a single host env and follow up with Rajeev and Swami.

Mike Smith (michael-smith6) wrote :

I was able to get 2 external networks (one per router) working in a single node setup as well. I did not see the errors listed above. Both the l3_agent.ini and ml2_conf.ini files need specific config to get this working. An additional bridge needs to be created as well. Without seeing the specific configs that caused the errors, at this point I would have to vote this is a non-bug.

Armando Migliaccio (armando-migliaccio) wrote :

As mentioned in the bug report, this can be reproduced with the 'default' config for DVR without any custom changes to config or environment.

What specific config/additional bridge are you referring to?

It might not be a code bug, but it might be worth documenting how external networks work with DVR in case something different from the legacy case needs to be done.

Mike Smith (michael-smith6) wrote :

I used this link as a guide: http://blog.oddbit.com/2014/05/28/multiple-external-networks-wit/

For one thing, I created a second external bridge for the second external network. The ml2_conf.ini file then has a config to map the physical interfaces to the bridges ('bridge_mappings').

Also, in l3_agent.ini both 'external_network_bridge' and 'gateway_external_network_id' must be set to blank. Otherwise the agent tries to retrieve the network id from the plugin.
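
As a rough illustration of the two settings described above, the sketch below renders the corresponding ini fragments with Python's configparser. The physical network names (physnet1, physnet2), the bridge names (br-ex, br-ex2) and the [ovs] section name are assumptions for the example, not values taken from this report.

```python
import configparser
import io

# ml2_conf.ini fragment: one bridge per external physical network.
# physnet1/physnet2, br-ex/br-ex2 and the [ovs] section are assumed names.
ml2 = configparser.ConfigParser()
ml2["ovs"] = {"bridge_mappings": "physnet1:br-ex,physnet2:br-ex2"}

# l3_agent.ini fragment: both options left blank, as described above.
l3 = configparser.ConfigParser()
l3["DEFAULT"] = {
    "external_network_bridge": "",
    "gateway_external_network_id": "",
}


def render(cfg):
    """Return the ini text that configparser would write for cfg."""
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render(ml2))
print(render(l3))
```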

Have you tried doing multiple external networks using the 'default' config for a centralized router? I got different errors when I tried it.

Armando Migliaccio (armando-migliaccio) wrote :

Hi Mike,

Yes, I did try the default (i.e. plain Devstack) configuration with central routers and no complaints were raised; I was able to set the gateway for the router to the second external network. Now, this might just mask a failure down the line because the L3 agent is not configured correctly, but that is probably a separate issue.

I had a look at the blog you linked, as well as [1] (thanks Miguel!), while I was triaging this bug.

The system is clearly not behaving correctly, perhaps due to a misconfiguration error. But why would the central case failure mode be different from the dvr one? A misconfiguration should be detected, and the right error should be returned. An internal server error is not very nice :)

I think there's some value in addressing the 500 error. Perhaps a 409 might be more appropriate, if we manage to detect what's actually going on?

[1] http://www.ajo.es/post/86497974174/using-multiple-external-networks-in-openstack-neutron
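
The 409 idea could look roughly like the sketch below: the duplicate binding is detected and reported as a conflict instead of bubbling up as a 500. Everything here (DuplicateBindingError, set_gateway, bind) is hypothetical illustration code, not neutron's API layer.

```python
class DuplicateBindingError(Exception):
    """Hypothetical stand-in for a duplicate-entry database error."""


def set_gateway(bind, router_id, agent_id):
    """Return (http_status, message) for a gateway-set style request."""
    try:
        bind(router_id, agent_id)
    except DuplicateBindingError:
        # A known conflict is reported as 409 rather than a generic 500.
        return 409, "router %s is already bound to agent %s" % (
            router_id, agent_id)
    return 200, "gateway set"


bound = set()


def bind(router_id, agent_id):
    """Record a binding, refusing to store the same pair twice."""
    if (router_id, agent_id) in bound:
        raise DuplicateBindingError()
    bound.add((router_id, agent_id))


first = set_gateway(bind, "router-2", "agent-1")
second = set_gateway(bind, "router-2", "agent-1")
print(first[0], second[0])
```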

Changed in neutron:
status: In Progress → Confirmed
assignee: Andrey Epifanov (aepifanov) → nobody
assignee: nobody → Eugene Nikanorov (enikanorov)
Mike Smith (michael-smith6) wrote :

Thanks to Erick Colnick, the issue has been isolated to the sequence/timing of when the gateway-set occurs. If no VM ports are present when gateway-set occurs and ports are added later, an error occurs. But if the VM ports are added first (as I was doing in my test), there is no problem.

Early on in the DVR development we had similar issues because the notification path is different for when VM ports are added.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Eugene, Mike is currently working on this bug, but I see you listed as the owner. Can you assign it back to Mike?

Eugene Nikanorov (enikanorov) wrote :

Please feel free to reassign the bug to whoever you think is appropriate.

Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → Swaminathan Vasudevan (swaminathan-vasudevan)
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/142674

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/127201
Reason: Last patch update was October 9, abandoning due to the age of this change.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/142674
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3794b4a83e68041e24b715135f0ccf09a5631178
Submitter: Jenkins
Branch: master

commit 3794b4a83e68041e24b715135f0ccf09a5631178
Author: Swaminathan Vasudevan <email address hidden>
Date: Wed Dec 17 17:15:07 2014 -0800

    Fixes Multiple External Networks issue with DVR

    Current L3 agents can support more than one
    external network when configured properly.

    On DVR routers, router-gateway-set was
    returning a 500 error, when two external
    networks were configured in the system.

    The problem resides in the scheduler where the
    bind_router is called twice when the
    reschedule_router is called from update_router.

    The _schedule_router binds the snat
    and the qrouter with the respective agents.

    But after scheduling it does not return agent.

    And in the case of two external networks, the
    get_candidates always returns a valid candidate
    to be processed and hence the bind_router is
    called twice.

    This patch fixes the _schedule_router function
    and hence avoids the multiple calls to
    bind_router.

    This prevents the update_router from failing
    and causing the nested rollback for the
    transactions.

    Change-Id: I24d44c60a3ea5bbc9e3f44aa5191deff315723ca
    Closes-Bug: #1374473
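
The failure mode described in the commit message can be modelled with a toy scheduler. This is a simplified, hypothetical sketch: the method names echo the commit message, but the bodies and the control flow in reschedule_router are assumptions, not the neutron code.

```python
class Scheduler:
    """Toy model of the double bind_router call described above."""

    def __init__(self, return_agent):
        # return_agent=False models the behaviour before the patch.
        self.return_agent = return_agent
        self.bind_calls = []

    def get_candidates(self, router_id):
        # Per the commit message, a valid candidate is always returned.
        return ["agent-1"]

    def bind_router(self, router_id, agent):
        self.bind_calls.append((router_id, agent))

    def _schedule_router(self, router_id):
        agent = self.get_candidates(router_id)[0]
        self.bind_router(router_id, agent)
        return agent if self.return_agent else None

    def reschedule_router(self, router_id):
        agent = self._schedule_router(router_id)
        if agent is None:
            # The caller assumes nothing was scheduled and binds by itself.
            candidates = self.get_candidates(router_id)
            if candidates:
                self.bind_router(router_id, candidates[0])


before = Scheduler(return_agent=False)
before.reschedule_router("router-2")
after = Scheduler(return_agent=True)
after.reschedule_router("router-2")
print(len(before.bind_calls), len(after.bind_calls))
```

In this model, returning the agent from _schedule_router is enough to keep the caller from binding a second time.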

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-3 → 2015.1.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/195223

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/195223
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=884013c8cdfe97d297103b0feada849ecad7c113
Submitter: Jenkins
Branch: stable/juno

commit 884013c8cdfe97d297103b0feada849ecad7c113
Author: Swaminathan Vasudevan <email address hidden>
Date: Wed Dec 17 17:15:07 2014 -0800

    Fixes Multiple External Networks issue with DVR

    Current L3 agents can support more than one
    external network when configured properly.

    On DVR routers, router-gateway-set was
    returning a 500 error, when two external
    networks were configured in the system.

    The problem resides in the scheduler where the
    bind_router is called twice when the
    reschedule_router is called from update_router.

    The _schedule_router binds the snat
    and the qrouter with the respective agents.

    But after scheduling it does not return agent.

    And in the case of two external networks, the
    get_candidates always returns a valid candidate
    to be processed and hence the bind_router is
    called twice.

    This patch fixes the _schedule_router function
    and hence avoids the multiple calls to
    bind_router.

    This prevents the update_router from failing
    and causing the nested rollback for the
    transactions.

    Conflicts:
            neutron/tests/unit/test_l3_schedulers.py

    (cherry picked from commit 3794b4a83e68041e24b715135f0ccf09a5631178)
    Change-Id: I24d44c60a3ea5bbc9e3f44aa5191deff315723ca
    Closes-Bug: #1374473

tags: added: in-stable-juno
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/199032
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=236e408272bcb9b8e957524864e571b5afdc4623
Submitter: Jenkins
Branch: master

commit 236e408272bcb9b8e957524864e571b5afdc4623
Author: Oleg Bondarev <email address hidden>
Date: Tue Jul 7 12:02:58 2015 +0300

    DVR: fix router scheduling

    Fix scheduling of DVR routers to not stop scheduling once
    csnat portion was scheduled. See bug report for failing
    scenario.

    This partially reverts
    commit 3794b4a83e68041e24b715135f0ccf09a5631178
    and fixes bug 1374473 by moving csnat scheduling
    after general dvr router scheduling, so double binding does
    not happen.

    Closes-Bug: #1472163
    Related-Bug: #1374473
    Change-Id: I57c06e2be732e47b6cce7c724f6b255ea2d8fa32

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (feature/pecan)

Related fix proposed to branch: feature/pecan
Review: https://review.openstack.org/211492

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (feature/pecan)

Reviewed: https://review.openstack.org/211492
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a7b91632fc65ab9d2687298c68b1d715866d0356
Submitter: Jenkins
Branch: feature/pecan

commit 966203f89dee8fe61fb2dce654e36e510e80380f
Author: Sukhdev Kapur <email address hidden>
Date: Wed Jul 1 16:30:44 2015 -0700

    Neutron-Ironic integration patch

    This patch is in preparation for the integration
    of Ironic and Neutron. A new vnic_type is being
    added so that ML2 drivers can filter for all
    Ironic ports based upon match for 'baremetal'.
    Nova/Ironic will set this vnic_type when issuing
    port-create request to neutron.
    (e.g. binding:vnic_type = 'baremetal' )

    Change-Id: I25dc9472b31db052719db503a10c1fb1a55572ef
    Partial-Implements: blueprint neutron-ironic-integration

commit 236e408272bcb9b8e957524864e571b5afdc4623
Author: Oleg Bondarev <email address hidden>
Date: Tue Jul 7 12:02:58 2015 +0300

    DVR: fix router scheduling

    Fix scheduling of DVR routers to not stop scheduling once
    csnat portion was scheduled. See bug report for failing
    scenario.

    This partially reverts
    commit 3794b4a83e68041e24b715135f0ccf09a5631178
    and fixes bug 1374473 by moving csnat scheduling
    after general dvr router scheduling, so double binding does
    not happen.

    Closes-Bug: #1472163
    Related-Bug: #1374473
    Change-Id: I57c06e2be732e47b6cce7c724f6b255ea2d8fa32

commit e152f93878b9bb6af7cfedc9e045892fcf7d0615
Author: Assaf Muller <email address hidden>
Date: Sat Aug 8 21:15:03 2015 +0300

    TESTING.rst love

    Change-Id: I64b569048f8f87ea2fe63d861302b4020d36493d

commit 633c52cca1b383af2c900e1663c8682114acd177
Author: sridhargaddam <email address hidden>
Date: Wed Aug 5 10:49:33 2015 +0000

    Avoid dhcp_release for ipv6 addresses

    dhcp_release is only supported for IPv4 addresses [1] and not for
    IPv6 addresses [2]. There will be no effect when it is called with
    IPv6 address. This patch adds a corresponding note and avoids calling
    dhcp_release for IPv6 addresses.

    [1] http://manpages.ubuntu.com/manpages/trusty/man1/dhcp_release.1.html
    [2] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2013q2/007084.html

    Change-Id: I8b8316c9d3d011c2a687a3a1e2a4da5cf1b5d604

commit 2de8fad17402f38bbc30204ee2f4f99cf21cb69d
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Aug 10 06:11:06 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I2b423e83a7d0ac8b23239f81fe33dd8382c6fff6

commit fef79dc7b9162e03c8891645494c115b52d4d014
Author: Henry Gessau <email address hidden>
Date: Mon Aug 3 23:30:34 2015 -0400

    Consistent layout and headings for devref

    The lack of convention for heading levels among the independently
    written devref documents was starting to make the Table of Contents
    look rather messy when rendered in HTML.

    This patch does not cover the "Neutron Internals" section since its
    layo...

tags: added: in-feature-pecan
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/kilo)

Related fix proposed to branch: stable/kilo
Review: https://review.openstack.org/241816

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/241816
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bc43214239a551443ebafe9e9b0d595150a61c51
Submitter: Jenkins
Branch: stable/kilo

commit bc43214239a551443ebafe9e9b0d595150a61c51
Author: Oleg Bondarev <email address hidden>
Date: Tue Jul 7 12:02:58 2015 +0300

    DVR: fix router scheduling

    Fix scheduling of DVR routers to not stop scheduling once
    csnat portion was scheduled. See bug report for failing
    scenario.

    This partially reverts
    commit 3794b4a83e68041e24b715135f0ccf09a5631178
    and fixes bug 1374473 by moving csnat scheduling
    after general dvr router scheduling, so double binding does
    not happen.

    Closes-Bug: #1472163
    Related-Bug: #1374473
    (cherry picked from commit 236e408272bcb9b8e957524864e571b5afdc4623)

    Conflicts:
            neutron/scheduler/l3_agent_scheduler.py
            neutron/tests/unit/extensions/test_agent.py
            neutron/tests/unit/plugins/openvswitch/test_agent_scheduler.py

    Change-Id: I57c06e2be732e47b6cce7c724f6b255ea2d8fa32

tags: added: in-stable-kilo