DVR FloatingIP to unbound allowed_address_pairs does not work

Bug #1445255 reported by Davide Guerri on 2015-04-16
This bug affects 3 people
Affects: neutron
Importance: High
Assigned to: Swaminathan Vasudevan

Bug Description

I was trying to follow Aaron's guide here: http://blog.aaronorosen.com/implementing-high-availability-instances-with-neutron-using-vrrp/

VRRP is working fine, but with DVR enabled there is no way to get a floatingIP address working with a vIP.

There has been a discussion about this on #openstack-neutron on the 16th of April 2015:

[23:49:26] <kevinbenton> dguerri was trying to follow Aaron's guide here: http://blog.aaronorosen.com/implementing-high-availability-instances-with-neutron-using-vrrp/
[23:49:35] <kevinbenton> and it doesn't work with DVR
[23:50:49] <armax> kevinbenton: ok, but are we sure that’s because of an unbound port?
[23:51:37] <kevinbenton> armax: seems to be
[23:51:56] <kevinbenton> armax: no l3 agent will respond to an ARP request for the floating IP when i try it
[23:52:57] <armax> kevinbenton: ok, now I am with you
[23:53:53] <armax> kevinbenton: in aaron’s case the fip is associated to an unbound port
[23:54:05] <armax> kevinbenton: and yet routing works fine
[23:55:18] <armax> kevinbenton: I don’t think that for such scenario DVR makes much sense
[23:55:48] <armax> kevinbenton: because if we allowed to have the FIP namespace to land on the dvr_snat agent
[23:56:02] <armax> kevinbenton: you’re basically back to central routing
[23:56:07] <kevinbenton> armax: right
[23:56:11] <armax> kevinbenton: am I making any sense?
[23:56:29] <armax> kevinbenton: I am not saying that lack of VRRP support is nice
[23:56:37] <armax> kevinbenton: I am trying to wrap my head around this
[23:56:49] <kevinbenton> armax: i was thinking maybe there was some fallback logic where the SNAT one would host a floating IP if there wasn't another l3 agent that could handle it
[23:57:16] <kevinbenton> armax: for example if one of the compute nodes wasn't running the l3 agent
[23:57:35] <kevinbenton> armax: it would be the same scenario
[23:57:37] <kevinbenton> armax: right?

Changed in neutron:
status: New → Confirmed
importance: Undecided → Low

I think this is borderline between low and wishlist, but I can see why we'd want to keep the feature parity with centralized routing.

summary: - FloatingIP and allowed_address_pairs won't work with DVR
+ DVR FloatingIP to unbound port does not work
Davide Guerri (davide-guerri) wrote :

I am not sure about the priority.
If a FloatingIP to an unbound port does not work with DVR, the Allowed Address Pairs extension becomes completely pointless with DVR.

Allowed address pair by itself is not tied to the DVR or floatingIP.

With DVR, floatingIP traffic goes in and out through the compute node. So basically we create routers and assign floatingIPs based on the port binding as well as the "device_owner" of the port.

To support this scenario, we would need to manually set the "device_owner" for the port if it is intended to be used with a FloatingIP on a DVR router. Once the "device_owner" is set, it has to be added to the list of "DVR serviceable" ports.

Today we have a list of "DVR serviceable ports" such as "compute, lbaas-vip and dhcp". We might need to add this special port to that list in order for DVR to honor it.

Also, this port should be bound to a host, so that we know where to deploy our FloatingIP namespace.

This is all happening because of the dynamic nature of DVR in creating and deleting the FIP namespaces.
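
To make the two conditions above concrete (a "DVR serviceable" device_owner and a host binding), here is a minimal sketch. It is not Neutron's actual code: the helper names and the exact device_owner strings are assumptions chosen for illustration, and ports are modelled as plain dicts.

```python
# Illustrative sketch only; helper names and owner values are assumptions,
# not Neutron's real API.
COMPUTE_PREFIX = "compute:"
# Assumed stand-ins for the "lbaas-vip" and "dhcp" owners mentioned above.
DVR_SERVICED_OWNERS = ("neutron:LOADBALANCER", "network:dhcp")


def is_dvr_serviceable(port):
    """True if the port's device_owner is one DVR would service."""
    owner = port.get("device_owner") or ""
    return owner.startswith(COMPUTE_PREFIX) or owner in DVR_SERVICED_OWNERS


def fip_host_for_port(port):
    """Return the host where a FIP namespace could be deployed, else None."""
    host = port.get("binding:host_id") or None
    if host and is_dvr_serviceable(port):
        return host
    return None


vm_port = {"device_owner": "compute:nova", "binding:host_id": "compute-1"}
vip_port = {"device_owner": "", "binding:host_id": ""}  # unbound VRRP vIP port
```

With these inputs the VM port resolves to a host while the unbound vIP port does not, which is why a FloatingIP on the vIP port has no place to be deployed.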

Davide Guerri (davide-guerri) wrote :

Agreed about allowed address pair.
Nevertheless, I can't find a use case for it without binding a floating IP to the address used with allowed address pair

Fix proposed to branch: master
Review: https://review.openstack.org/175749

Changed in neutron:
assignee: nobody → yalei wang (yalei-wang)
status: Confirmed → In Progress

Change abandoned by yalei wang (<email address hidden>) on branch: master
Review: https://review.openstack.org/175749
Reason: bind to wrong bug id

yalei wang (yalei-wang) on 2015-04-21
Changed in neutron:
assignee: yalei wang (yalei-wang) → nobody
Changed in neutron:
assignee: nobody → Oleg Bondarev (obondarev)

I'd like to add that probably the bigger issue here is that floating ips won't work anymore on the compute node in the following case:

1) floating ip is associated with an unbound port
2) a VM is booted with that port
3) this VM is the first VM on a compute node

So I think we should consider raising the priority.

Copying detailed description from the duplicated bug:

A floating agent gateway port is created for a compute host only when a floating IP is associated with a VM residing on that host [1].
If a neutron port is associated with a floating IP before a VM is booted with that port, the floating agent gateway port won't be created (in case this is the first VM scheduled to that compute host).
In that case the l3 agent on the compute host will receive router info with a floating IP but no floating agent gateway port: it will subscribe the router to the fip namespace [2], but the namespace itself won't be created [3]:
 [dvr_router.py]

    def create_dvr_fip_interfaces(self, ex_gw_port):
        floating_ips = self.get_floating_ips()
        fip_agent_port = self.get_floating_agent_gw_interface(
            ex_gw_port['network_id'])
        LOG.debug("FloatingIP agent gateway port received from the plugin: "
                  "%s", fip_agent_port)
        if floating_ips:
            is_first = self.fip_ns.subscribe(self.router_id)
            if is_first and fip_agent_port:
                if 'subnets' not in fip_agent_port:
                    LOG.error(_LE('Missing subnet/agent_gateway_port'))
                else:
                    self.fip_ns.create_gateway_port(fip_agent_port)
        ...

Since the l3 agent has already subscribed the router to fip_ns, it won't ever create the fip namespace for that router. This results in floating IPs not working for ANY subsequent VMs on that compute host, no matter whether the floating IP was associated with a VM or with an unbound port (later associated with a VM).
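
The failure mode can be reproduced with a small stand-alone simulation. This is a sketch under assumptions, not the agent code: FipNamespace here is a toy class, and it assumes subscribe() reports "first" only while no router has subscribed yet, which is what the excerpt above implies.

```python
class FipNamespace:
    """Toy stand-in for the agent's FIP namespace object (hypothetical)."""

    def __init__(self):
        self.subscribers = set()
        self.created = False

    def subscribe(self, router_id):
        # Assumed behaviour: "first" only when nobody has subscribed before.
        is_first = not self.subscribers
        self.subscribers.add(router_id)
        return is_first

    def create_gateway_port(self, fip_agent_port):
        self.created = True


def create_dvr_fip_interfaces(fip_ns, router_id, floating_ips, fip_agent_port):
    """Same control flow as the excerpt: create only if first AND port given."""
    if floating_ips:
        is_first = fip_ns.subscribe(router_id)
        if is_first and fip_agent_port:
            fip_ns.create_gateway_port(fip_agent_port)


def simulate_missed_gateway():
    fip_ns = FipNamespace()
    # 1st update: floating IP present, but no agent gateway port yet.
    create_dvr_fip_interfaces(fip_ns, "router-1", ["203.0.113.10"], None)
    # 2nd update: the gateway port now exists, but is_first is already False.
    gw_port = {"id": "fip-gw-1", "subnets": [{"id": "subnet-1"}]}
    create_dvr_fip_interfaces(fip_ns, "router-1", ["203.0.113.10"], gw_port)
    return fip_ns.created
```

In this model the namespace is still not created after the second update, because the "first subscriber" opportunity was consumed while the gateway port was missing.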

I see two possible fixes:
 - add a callback for the PORT UPDATE event to the dvr server code to react when a port with a floating IP becomes associated with a VM.
This seems suboptimal given the many checks needed in a callback that will be called fairly often.

 - the l3 agent on a compute host should request floating agent gateway port creation by rpc when it receives router info with floating IPs but no floating agent gateway port. There is already such a method in the agent-to-plugin rpc interface, which now seems to be unused anywhere except tests. I'm not seeing any cons here, so that's what I'm going to propose.

[1] https://github.com/openstack/neutron/blob/master/neutron/db/l3_dvr_db.py#L214-L225
[2] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_router.py#L502
[3] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_router.py#L503-L507
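
A rough sketch of the second option (the agent asks the plugin for the gateway port when it is missing) might look like the following. The RPC client class and the method name request_agent_gateway_port are hypothetical placeholders; the comment above does not name the existing rpc method, so none is assumed here.

```python
class FakePluginRpc:
    """Hypothetical stand-in for the agent-to-plugin rpc client."""

    def __init__(self):
        self.calls = []

    def request_agent_gateway_port(self, network_id):
        # A real plugin would create (or look up) the port on the server side.
        self.calls.append(network_id)
        return {
            "id": "fip-gw-1",
            "network_id": network_id,
            "subnets": [{"id": "subnet-1"}],
        }


def ensure_fip_gateway(floating_ips, fip_agent_port, ext_net_id, plugin_rpc):
    """Return a gateway port, requesting one via rpc only when needed."""
    if floating_ips and not fip_agent_port:
        fip_agent_port = plugin_rpc.request_agent_gateway_port(ext_net_id)
    return fip_agent_port


rpc = FakePluginRpc()
port = ensure_fip_gateway(["203.0.113.10"], None, "ext-net", rpc)
```

The rpc call happens only when floating IPs are present and no gateway port arrived with the router info, so the common path adds no extra round trip.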

Changed in neutron:
importance: Low → High

Oleg, in this case we have two different problems.

1. The option to create a FIP namespace when port binding occurs will solve one of the problems, if the private port is intended for VMs.
2. The other issue is ports that are never going to be bound during their lifetime (handling this case may be complex). This is the case where you create a port, assign a floatingIP to it, and use that port as the "allowed-address-pair" for the purpose of running VRRP on it.

I was working on a fix that associates the FIP namespace with the "dvr_snat" node by default if the port binding is empty, but it still causes issues when I try to disassociate a FIP.
So I need to manually bind this port to the "dvr_snat" host and then configure the FIP namespace; that would be the right way to do it.
But I am still looking into options for how to forcibly bind the host.

Changed in neutron:
importance: High → Low
Oleg Bondarev (obondarev) wrote :

https://review.openstack.org/#/c/177507 is exactly what I was going to propose to fix bug 1447034

Changed in neutron:
assignee: Oleg Bondarev (obondarev) → nobody

Fix proposed to branch: master
Review: https://review.openstack.org/254439

Changed in neutron:
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
summary: - DVR FloatingIP to unbound port does not work
+ DVR FloatingIP to unbound allowed_address_pairs does not work
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Swaminathan Vasudevan (swaminathan-vasudevan)
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Swaminathan Vasudevan (swaminathan-vasudevan)

Folks,
We have been dissuading our customers from turning on DVR as it breaks floating-ip and inter-subnet traffic for allowed-address-pair IPs. We depend on allowed-address-pairs for high availability and scalability. We are hoping that the proposed bug fix (that has been +2ed) could make it to Mitaka.

Yes this has been in the pipeline for a while.

Reviewed: https://review.openstack.org/254439
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6185a09d130edb7a21e21a354b3fa12fcbebe8a6
Submitter: Jenkins
Branch: master

commit 6185a09d130edb7a21e21a354b3fa12fcbebe8a6
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Dec 4 16:44:44 2015 -0800

    DVR: Handle unbound allowed_address_pair port with FIP

    If an allowed_address_pair port associated with a FloatingIP
    is configured to a service_port, the allowed_address_pair port
    should inherit the service_ports host binding and device
    owner if device_owner is not configured.

    Hence the DVR will be able to deploy the FloatingIP for
    the provided allowed_address_pair.

    In this case if the associated port's admin state changes,
    the allowed_address_pairs device_owner and host binding will
    be reverted back to None.

    When associated service port is deleted the allowed_address_
    pairs device_owner and host binding will be reverted as well.

    Change-Id: I32b8d3e85a8e12fc146c419766596fcfb61f32f6
    Closes-Bug: #1445255
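
The inherit/revert behaviour described in the commit message can be illustrated as follows. This is a simplified sketch, not the merged patch: ports are plain dicts and both function names are made up for the example.

```python
def inherit_binding(aap_port, service_port):
    """Copy host binding and device_owner from the service port.

    Applied only when the allowed_address_pair port has no device_owner,
    mirroring the condition stated in the commit message.
    """
    updated = dict(aap_port)
    if not updated.get("device_owner"):
        updated["device_owner"] = service_port["device_owner"]
        updated["binding:host_id"] = service_port["binding:host_id"]
    return updated


def revert_binding(aap_port):
    """Reset to None, e.g. on admin state change or service port deletion."""
    updated = dict(aap_port)
    updated["device_owner"] = None
    updated["binding:host_id"] = None
    return updated


service_port = {"device_owner": "compute:nova", "binding:host_id": "compute-1"}
vip_port = {"device_owner": None, "binding:host_id": None}

bound_vip = inherit_binding(vip_port, service_port)
unbound_vip = revert_binding(bound_vip)
```

Once the vIP port carries a host binding, DVR has a compute host on which to deploy the FloatingIP; reverting returns the port to its unbound state.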

Changed in neutron:
status: In Progress → Fix Released
tags: added: mitaka-backport-potential

Should we raise priority to Normal here?

tags: added: liberty-backport-potential
Tom Verdaat (tom-verdaat) wrote :

@Swaminathan any progress on the Mitaka backport? You mentioned an issue with the patch but that was 30 days ago.

Also any chance you could look at the Liberty backport? Looks like Nate's patch needs some attention but he doesn't seem to be getting any assistance :-(

Hi Tom,
Yes, I do have a couple of patches in the review queue, and once they merge I will try to backport all three patches.

Thanks
Swami


Reviewed: https://review.openstack.org/295579
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=41e0fcd63348141e538bad76470c29d4dd7508b6
Submitter: Jenkins
Branch: stable/mitaka

commit 41e0fcd63348141e538bad76470c29d4dd7508b6
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Dec 4 16:44:44 2015 -0800

    DVR: Handle unbound allowed_address_pair port with FIP

    If an allowed_address_pair port associated with a FloatingIP
    is configured to a service_port, the allowed_address_pair port
    should inherit the service_ports host binding and device
    owner if device_owner is not configured.

    Hence the DVR will be able to deploy the FloatingIP for
    the provided allowed_address_pair.

    In this case if the associated port's admin state changes,
    the allowed_address_pairs device_owner and host binding will
    be reverted back to None.

    When associated service port is deleted the allowed_address_
    pairs device_owner and host binding will be reverted as well.

    Change-Id: I32b8d3e85a8e12fc146c419766596fcfb61f32f6
    Closes-Bug: #1445255
    (cherry picked from commit 6185a09d130edb7a21e21a354b3fa12fcbebe8a6)

tags: added: in-stable-mitaka

Change abandoned by Nate Johnston (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/299620
Reason: The illustrious Swaminathan Vasudevan has already solved this issue.

I would recommend that this fix be backported to Liberty, since there are still customers impacted by this bug.

This issue was fixed in the openstack/neutron 8.1.1 release.

Tom Verdaat (tom-verdaat) wrote :

+1 for backporting this fix to Liberty. We are impacted by it.

Given these kinds of blocking bugs still present for Liberty after being released over 7 months ago, we're (like most production clouds probably) holding off on Mitaka until there is a less buggy maintenance release version available...

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

Raising the priority since the bug makes compute node FIP connectivity completely broken.

Changed in neutron:
importance: Low → High

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/318276
Reason: Wrong Change-Id. Superseded by https://review.openstack.org/327017

Reviewed: https://review.openstack.org/327017
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=88fbfa26dea57c91b38510d6129e4aff71c0d80e
Submitter: Jenkins
Branch: stable/liberty

commit 88fbfa26dea57c91b38510d6129e4aff71c0d80e
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Dec 4 16:44:44 2015 -0800

    DVR: Handle unbound allowed_address_pair port with FIP

    If an allowed_address_pair port associated with a FloatingIP
    is configured to a service_port, the allowed_address_pair port
    should inherit the service_ports host binding and device
    owner if device_owner is not configured.

    Hence the DVR will be able to deploy the FloatingIP for
    the provided allowed_address_pair.

    In this case if the associated port's admin state changes,
    the allowed_address_pairs device_owner and host binding will
    be reverted back to None.

    When associated service port is deleted the allowed_address_
    pairs device_owner and host binding will be reverted as well.

    Closes-Bug: #1445255
    (cherry picked from commit 6185a09d130edb7a21e21a354b3fa12fcbebe8a6)

    Conflicts:
     neutron/tests/functional/services/l3_router/test_l3_dvr_router_plugin.py
     neutron/tests/unit/scheduler/test_l3_agent_scheduler.py

    Change-Id: I32b8d3e85a8e12fc146c419766596fcfb61f32f6

tags: added: in-stable-liberty

This issue was fixed in the openstack/neutron 7.1.2 release.

tags: removed: liberty-backport-potential mitaka-backport-potential