tenant does not see network that is routable from tenant-visible network until neutron-server is restarted

Bug #1254555 reported by Clint Byrum
This bug affects 5 people
| Affects  | Status       | Importance | Assigned to      | Milestone |
| neutron  | Fix Released | High       | Eugene Nikanorov |           |
| Havana   | Fix Released | High       | Akihiro Motoki   |           |
| Icehouse | Fix Released | Undecided  | Unassigned       |           |
| tripleo  | Fix Released | Critical   | Unassigned       |           |

Bug Description

In TripleO we have a setup script[1] that does this as an admin:

neutron net-create default-net --shared
neutron subnet-create --ip_version 4 --allocation-pool start=10.0.0.2,end=10.255.255.254 --gateway 10.0.0.1 10.0.0.0/8 $ID_OF_default_net
neutron router-create default-router
neutron router-interface-add default-router $ID_OF_10.0.0.0/8_subnet
neutron net-create ext-net --router:external=True
neutron subnet-create ext-net $FLOATING_CIDR --disable-dhcp --allocation-pool start=$FLOATING_START,end=$FLOATING_END
neutron router-gateway-set default-router ext-net

I would then expect that all users will be able to see ext-net using 'neutron net-list' and that they will be able to create floating IPs on ext-net.

As of this commit:

commit c655156b98a0a25568a3745e114a0bae41bc49d1
Merge: 75ac6c1 c66212c
Author: Jenkins <email address hidden>
Date: Sun Nov 24 10:02:04 2013 +0000

    Merge "MidoNet: Added support for the admin_state_up flag"

I see that the ext-net network is not available after I do all of the above router/subnet creation. It does become available to tenants as soon as I restart neutron-server.

[1] https://git.openstack.org/cgit/openstack/tripleo-incubator/tree/scripts/setup-neutron

I can reproduce this at will using the TripleO devtest process on real hardware. I have not yet reproduced on VMs using the 'devtest' workflow.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
Eugene Nikanorov (enikanorov) wrote :

Looks very similar to https://bugs.launchpad.net/neutron/+bug/1251982
I guess the reasons can be the same. I'm looking into it.

Changed in neutron:
assignee: nobody → Eugene Nikanorov (enikanorov)
Changed in neutron:
importance: Undecided → High
status: New → In Progress
Clint Byrum (clint-fewbar) wrote :

Eugene, I'm pretty sure this is indeed the same as bug 1251982.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-incubator (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/58371

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-incubator (master)

Reviewed: https://review.openstack.org/58371
Committed: http://github.com/openstack/tripleo-incubator/commit/38ac22ce7c4ea4ee9a86e1b6f2c3e5e31c8ba652
Submitter: Jenkins
Branch: master

commit 38ac22ce7c4ea4ee9a86e1b6f2c3e5e31c8ba652
Author: Clint Byrum <email address hidden>
Date: Mon Nov 25 11:59:19 2013 -0800

    Work around neutron floatingip race condition

    Neutron has a problem where neutron-server will only return externally
    visible networks after the service has started sometimes. In order to
    deal with that we are ssh'ing into the controller node and restarting
    neutron-server before trying to create a floatingip. We try for two
    minutes before giving up.

    Change-Id: Ia4e476339a3d73a8c13ca898e0f0b582adc53828
    Related-Bug: #1254555
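
The workaround described in this commit message amounts to a retry loop with a deadline. A minimal Python sketch of that pattern follows; it is not the tripleo-incubator script itself. The function name `wait_for`, its parameters, and the stand-in callables are all hypothetical, and a real script would restart neutron-server over ssh and attempt the floating IP creation where the stand-ins are used.

```python
import time


def wait_for(check, timeout=120.0, interval=5.0, on_retry=None,
             clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or the timeout expires.

    on_retry, if given, runs after each failed attempt. In the scenario
    above it could be a (hypothetical) hook that restarts neutron-server.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("condition not met within %.0f seconds" % timeout)
        if on_retry is not None:
            on_retry()
        sleep(interval)


# Stand-in callables so the sketch runs on its own.
attempts = []
restarts = []


def fake_check():
    attempts.append(1)
    return len(attempts) >= 3  # pretend the third attempt succeeds


found = wait_for(fake_check, timeout=120, interval=0,
                 on_retry=lambda: restarts.append("restart"))
```

The default of 120 seconds mirrors the "two minutes" mentioned in the commit message; the polling interval is an arbitrary choice.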

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/58951

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/58951
Committed: http://github.com/openstack/neutron/commit/572ca29a0c8d8c832b2fa45895f4b58fd74d2fba
Submitter: Jenkins
Branch: master

commit 572ca29a0c8d8c832b2fa45895f4b58fd74d2fba
Author: Eugene Nikanorov <email address hidden>
Date: Thu Nov 28 12:46:41 2013 +0400

    Avoid loading policy when processing rpc requests

    When Neutron server is restarted in the environment where multiple agents
    are sending rpc requests to Neutron, it causes loading of policy.json
    before API extensions are loaded. That causes different policy check
    failures later on.
    This patch avoids loading policy when creating a Context in rpc layer.

    Change-Id: I66212baa937ec1457e0d284b5445de5243a8931f
    Partial-Bug: 1254555
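
To illustrate the idea in this commit message, here is a toy Python model of a context object that can be built without touching the policy engine. It is only a sketch under stated assumptions: `PolicyEngine`, `Context`, and the `consult_policy` flag are hypothetical names invented for this example, not Neutron's actual classes or parameters.

```python
class PolicyEngine:
    """Stand-in policy engine that counts how often its rules are loaded."""

    def __init__(self):
        self.load_count = 0
        self._admin_roles = None

    def admin_roles(self):
        if self._admin_roles is None:
            # First use triggers the (simulated) policy load.
            self.load_count += 1
            self._admin_roles = {"admin"}
        return self._admin_roles


class Context:
    """Toy request context; consult_policy=False skips the policy lookup."""

    def __init__(self, user_id, roles, engine, is_admin=None,
                 consult_policy=True):
        self.user_id = user_id
        self.roles = set(roles)
        if is_admin is None and consult_policy:
            is_admin = bool(self.roles & engine.admin_roles())
        self.is_admin = bool(is_admin)


engine = PolicyEngine()

# RPC path: the caller states is_admin explicitly, so no policy load happens.
rpc_ctx = Context("l3-agent", [], engine, is_admin=True, consult_policy=False)
loads_after_rpc = engine.load_count

# API path: the policy engine is consulted and therefore loaded.
api_ctx = Context("alice", ["member"], engine)
loads_after_api = engine.load_count
```

In this model the RPC-created context leaves the engine unloaded, so an early RPC request cannot force a policy load before extensions are registered.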

Eugene Nikanorov (enikanorov) wrote :

Changing priority to low. The particular symptoms are cured for now, but the issue remains.

Changed in neutron:
importance: High → Low
status: In Progress → Confirmed
yong sheng gong (gongysh) wrote :

I reproduced it today:
1. CentOS 6.5
2. devstack, installing OpenStack from trunk

$ . openrc demo demo
$ neutron net-list
+--------------------------------------+------------+--------------------------------------------------+
| id                                   | name       | subnets                                          |
+--------------------------------------+------------+--------------------------------------------------+
| 22774f33-2402-4ecf-8231-7c7dd6c41a87 | ys_network | 292ea106-b15e-4c14-aa9b-d9e582e611b1 10.0.2.0/24 |
+--------------------------------------+------------+--------------------------------------------------+

It shows no external networks.

yong sheng gong (gongysh) wrote :

<salv-orlando> I think enikanorov was working on this. The reason is that context.is_admin uses a policy to validate what it means to be an admin. To do so it loads the policy engine. The policy engine also has a rule for allowing everyone to see external networks. This rule needs to evaluate the router:external field and verify it matches True (as a boolean value); to do so it uses the converter which is specified in the RESOURCE_ATTRIBUTE_MAP.
<salv-orlando> There are some cases where this happens before all the extensions are loaded, and this would lead to skipping the conversion, with the result that the policy evaluation fails.
<salv-orlando> This happens usually when a plugin does db operations on initialization
<salv-orlando> there are several ways to fix it, and one would be doing the conversion at every evaluation, which is a bit expensive but perhaps negligible.
<salv-orlando> I don't know if enikanorov is still actively working on this bug. You can ask him.
<salv-orlando> gongysh^
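
The failure mode described in this IRC excerpt can be modelled in a few lines of Python. This is a toy sketch, not Neutron's policy code: `convert_to_boolean`, `build_field_check`, and the shape of the attribute maps are assumptions made for illustration. It shows how a rule value left as the string "True" never equals the boolean stored on the network.

```python
def convert_to_boolean(value):
    """Turn common string spellings into a bool (illustrative converter)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.lower() in ("true", "1"):
        return True
    if isinstance(value, str) and value.lower() in ("false", "0"):
        return False
    raise ValueError("not a boolean: %r" % (value,))


def build_field_check(attribute_map, resource, field, expected):
    """Build a check comparing target[field] with the rule's expected value.

    The converter is looked up once, at build time. If the attribute map has
    no entry for the field yet, the rule value stays an unconverted string.
    """
    attr = attribute_map.get(resource, {}).get(field, {})
    convert = attr.get("convert_to")
    rule_value = convert(expected) if convert else expected
    return lambda target: target.get(field) == rule_value


ext_net = {"name": "ext-net", "router:external": True}

# Policy built too early: the extension has not registered the attribute.
early_map = {"networks": {}}
early_check = build_field_check(early_map, "networks", "router:external", "True")

# Policy built after the extensions populated the map.
full_map = {"networks": {"router:external": {"convert_to": convert_to_boolean}}}
late_check = build_field_check(full_map, "networks", "router:external", "True")

print(early_check(ext_net))  # False: True != "True"
print(late_check(ext_net))   # True
```

Because the early check is cached, it keeps failing until it is rebuilt, which is consistent with the external network only appearing after a restart.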

Changed in tripleo:
status: Triaged → Fix Released
Akihiro Motoki (amotoki)
tags: added: havana-backport-potential
Akihiro Motoki (amotoki) wrote :

I see this bug in stable/havana many times when neutron-server is restarted.

I tested today's latest stable/havana on an all-in-one devstack. After stack.sh completed, I restarted neutron-server and then checked whether the external network was visible to regular users with the neutron command.

When neutron-server was restarted while all other neutron agents were running, I saw this bug 9 times out of 10.
On the other hand, when neutron-server is restarted after stopping all other neutron agents, this bug did not happen (I restarted neutron-server 10 times).

After the commit 572ca29a0c8d8c832b2fa45895f4b58fd74d2fba is applied,
I didn't see this bug with 10 restarts even when all other neutron agents are running.

According to the bug investigation above, the more neutron agents we have, the more often this bug occurs.
It occurs even on all-in-one devstack on a VM, so I guess this nearly always occurs in a usual deployment.

It affects all plugins with agents and there is a visible impact for users.
It is usual to restart neutron-server while neutron agents are running.
I will change the priority to High and propose a backport.

Changed in neutron:
importance: Low → High
milestone: none → icehouse-3
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/71859

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/havana)

Reviewed: https://review.openstack.org/71859
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5f959d76a0969e5d873dbcb6e80c834f4d376a4b
Submitter: Jenkins
Branch: stable/havana

commit 5f959d76a0969e5d873dbcb6e80c834f4d376a4b
Author: Eugene Nikanorov <email address hidden>
Date: Thu Nov 28 12:46:41 2013 +0400

    Avoid loading policy when processing rpc requests

    When Neutron server is restarted in the environment where multiple agents
    are sending rpc requests to Neutron, it causes loading of policy.json
    before API extensions are loaded. That causes different policy check
    failures later on.
    This patch avoids loading policy when creating a Context in rpc layer.

    Change-Id: I66212baa937ec1457e0d284b5445de5243a8931f
    Partial-Bug: 1254555

tags: added: in-stable-havana
Alan Pevec (apevec) wrote :

All commits here were with Partial-Bug: or Related-Bug: - what's left to fix or can this be closed?

tags: removed: havana-backport-potential in-stable-havana
Eugene Nikanorov (enikanorov) wrote :

Alan, the root cause of the issue is the ordering of policy loading and extension framework loading. It was not fixed in general, but so far no symptoms were seen after this fix was applied.

So I'm changing status to 'fix committed'

Changed in neutron:
status: Confirmed → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-3 → 2014.1
Miguel Angel Ajo (mangelajo) wrote :

It seems that we're still experiencing this problem (https://bugzilla.redhat.com/show_bug.cgi?id=1147017), even with the patch applied. Investigation is ongoing.

Brent Eagles (beagles) wrote :

The issue appears to remain. sync_routers in api/rpc/handlers/l3_rpc.py (in Juno; it's db/l3_rpc_base.py in Icehouse) creates an admin context. Apparently, this can be called before the extensions are loaded.

Brent Eagles (beagles) wrote :

Another possibly relevant detail, on the odd chance that getting RPCs before the extension manager is unexpected: the OVS plugin is in use where we are seeing this, and it starts its RPC worker, which appears to have the L3 stuff mixed in. Pretty messy. I wonder if this is largely a higher-risk scenario when the legacy Open vSwitch plugin is being used.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/127633

Assaf Muller (amuller) wrote :

@Miguel, in comment 15 you linked to a private bug.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/127633
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=eeff5d06b2099ed9813091926dd8cef58680ad8f
Submitter: Jenkins
Branch: master

commit eeff5d06b2099ed9813091926dd8cef58680ad8f
Author: Brent Eagles <email address hidden>
Date: Fri Oct 10 13:27:51 2014 -0230

    Reset policies after RESOURCE_ATTRIBUTE_MAP is populated

    The REST API relies on neutron-specific policy checking logic that is
    only available after the extensions are loaded and the
    RESOURCE_ATTRIBUTE_MAP is populated. This patch resets the policies
    immediately after these steps are done. This ensures that in the event
    the policies are prematurely loaded for any reason, the on-demand
    loading of the policies will reload the policies and properly configure
    the neutron specific checks on the next policy check.

    Change-Id: Ic2ab3f0179b0c192e63af0bc4268d92aa26bdabe
    Closes-Bug: #1398566
    Related-Bug: #1254555
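
The approach in this commit message (drop any prematurely loaded policies once the attribute map is populated, and let on-demand loading rebuild them) can be sketched with a toy Python class. `LazyPolicy` and its methods are hypothetical names for illustration only and do not reflect Neutron's real policy module.

```python
class LazyPolicy:
    """Toy policy holder that builds its checks on first use and caches them."""

    def __init__(self, attribute_map):
        self._attribute_map = attribute_map
        self._checks = None

    def reset(self):
        # Drop cached checks; the next enforce() call rebuilds them.
        self._checks = None

    def _load(self):
        known = set(self._attribute_map.get("networks", {}))
        # The external-network rule only works if the attribute is registered.
        self._checks = {"external_visible": "router:external" in known}

    def enforce(self, network):
        if self._checks is None:
            self._load()
        if not self._checks["external_visible"]:
            return False
        return network.get("router:external") is True


attribute_map = {"networks": {}}
policy = LazyPolicy(attribute_map)
ext_net = {"router:external": True}

before = policy.enforce(ext_net)  # premature load, rule is unusable

# Simulate the extensions registering the attribute afterwards.
attribute_map["networks"]["router:external"] = {"convert_to": bool}
stale = policy.enforce(ext_net)   # cached checks are still the early ones

policy.reset()                    # the step the commit message describes
after = policy.enforce(ext_net)   # checks rebuilt against the full map
```

In this model `before` and `stale` are both False and only `after` is True, which is why a reset placed right after extension loading is enough regardless of what triggered the early load.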

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/juno)

Related fix proposed to branch: stable/juno
Review: https://review.openstack.org/146572

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/icehouse)

Related fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/146603

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/146572
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=175e86977d6905e1e11766ca47e30b89c88d8827
Submitter: Jenkins
Branch: stable/juno

commit 175e86977d6905e1e11766ca47e30b89c88d8827
Author: Brent Eagles <email address hidden>
Date: Fri Oct 10 13:27:51 2014 -0230

    Reset policies after RESOURCE_ATTRIBUTE_MAP is populated

    The REST API relies on neutron-specific policy checking logic that is
    only available after the extensions are loaded and the
    RESOURCE_ATTRIBUTE_MAP is populated. This patch resets the policies
    immediately after these steps are done. This ensures that in the event
    the policies are prematurely loaded for any reason, the on-demand
    loading of the policies will reload the policies and properly configure
    the neutron specific checks on the next policy check.

    Change-Id: Ic2ab3f0179b0c192e63af0bc4268d92aa26bdabe
    Closes-Bug: #1398566
    Related-Bug: #1254555
    (cherry picked from commit eeff5d06b2099ed9813091926dd8cef58680ad8f)

tags: added: in-stable-juno
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/146603
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c9cb001626e90239b1ffe5583ab9a4349280856c
Submitter: Jenkins
Branch: stable/icehouse

commit c9cb001626e90239b1ffe5583ab9a4349280856c
Author: Brent Eagles <email address hidden>
Date: Fri Oct 10 13:27:51 2014 -0230

    Reset policies after RESOURCE_ATTRIBUTE_MAP is populated

    The REST API relies on neutron-specific policy checking logic that is
    only available after the extensions are loaded and the
    RESOURCE_ATTRIBUTE_MAP is populated. This patch resets the policies
    immediately after these steps are done. This ensures that in the event
    the policies are prematurely loaded for any reason, the on-demand
    loading of the policies will reload the policies and properly configure
    the neutron specific checks on the next policy check.

    Change-Id: Ic2ab3f0179b0c192e63af0bc4268d92aa26bdabe
    Closes-Bug: #1398566
    Related-Bug: #1254555
    (cherry picked from commit eeff5d06b2099ed9813091926dd8cef58680ad8f)

tags: added: in-stable-icehouse