[SRU] Metadata service for instances is unavailable when the l3-agent on the compute host is dvr_snat mode

Bug #1606741 reported by Zhixin Li on 2016-07-27
This bug affects 4 people
Affects                              Importance  Assigned to
Ubuntu Cloud Archive (status tracked in Stein)
  Queens                             High        Unassigned
  Rocky                              High        Unassigned
  Stein                              Undecided   Unassigned
neutron                              Medium      Slawek Kaplonski
neutron (Ubuntu) (status tracked in Eoan)
  Bionic                             High        Unassigned
  Cosmic                             High        Unassigned
  Disco                              High        Unassigned
  Eoan                               High        Unassigned

Bug Description

[Impact]
Currently, if you deploy OpenStack with dvr and l3ha enabled (and more than one compute host), only instances booted on the compute host that runs the master VR have access to metadata. This patch ensures that both master and slave VRs have an associated haproxy ns-metadata process running locally on the compute host.

[Test Case]
* deploy OpenStack with dvr and l3ha enabled and 2 compute hosts
* create an ubuntu instance on each compute host
* check that both are able to access the metadata api (i.e. cloud-init completes successfully)
* verify that there is an ns-metadata haproxy process running on each compute host
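For the last step of the test case, a check along the following lines could be run on each compute host. It is a sketch, not part of the SRU: it assumes the neutron metadata proxy appears in the process table as haproxy started with a config file named `<router_id>.conf` under an `ns-metadata-proxy/` directory (below neutron's state_path, /var/lib/neutron by default); adjust the pattern if your deployment differs. The function name `metadata_proxy_routers` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list router IDs that have a metadata-proxy haproxy on this host.
# ASSUMPTION: the proxy runs as "haproxy -f <state_path>/ns-metadata-proxy/<router_id>.conf".

# Reads process command lines (one per line) on stdin and prints each
# router ID that has a matching haproxy process, de-duplicated.
metadata_proxy_routers() {
  grep -E 'haproxy .*ns-metadata-proxy/[^/ ]+\.conf' |
    sed -E 's#.*ns-metadata-proxy/([^/ ]+)\.conf.*#\1#' |
    sort -u
}
```

On a compute host, `ps -eo args | metadata_proxy_routers` should print the ID of every router with a local metadata proxy. With the fix, a router whose local copy is in standby should still be listed when instances on that host are attached to it.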

[Regression Potential]
None anticipated

=============================================================================

In my Mitaka environment there are five nodes: controller, network1, network2, computer1 and computer2. I start l3-agents in dvr_snat mode on all network and compute nodes and set enable_metadata_proxy to true in l3_agent.ini. Most neutron services work well, except the metadata proxy service. When I run "curl http://169.254.169.254" in an instance booted from cirros, it returns "curl: couldn't connect to host", and the instance can't fetch metadata on its first boot.

* Pre-conditions: start the l3-agent in dvr_snat mode on all compute and network nodes and set enable_metadata_proxy to true in l3_agent.ini.

* Step-by-step reproduction steps:
    1. Create a network and a subnet on this network;
    2. Create a router;
    3. Add the subnet to the router;
    4. Create an instance from cirros (or another image) on this subnet;
    5. Open the console of this instance, run 'curl http://169.254.169.254' in the shell and wait for the result.

* Expected output: 'curl http://169.254.169.254' returns the instance's metadata.

* Actual output: the command actually returns "curl: couldn't connect to host"

* Version:
  ** Mitaka
  ** All hosts run CentOS 7
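Steps 1 to 4 can be scripted roughly as below with the unified openstack client. This is a sketch under assumptions not stated in the report: admin credentials are loaded, an image named `cirros` and a flavor named `m1.tiny` exist, and the resource names and the 192.0.2.0/24 range are placeholders. The `--distributed --ha` options explicitly request a DVR+HA router; they should be unnecessary if `router_distributed` and `l3_ha` are already enabled in neutron.conf. The `run` and `reproduce` helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 1-4 with the openstack CLI.
# ASSUMPTIONS: admin credentials loaded; image "cirros" and flavor "m1.tiny"
# exist; names repro-net/repro-subnet/repro-router/repro-vm are placeholders.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reproduce() {
  # 1. Network and subnet (192.0.2.0/24 is a documentation range).
  run openstack network create repro-net
  run openstack subnet create --network repro-net --subnet-range 192.0.2.0/24 repro-subnet
  # 2. Router; --distributed/--ha request a DVR+HA router (admin-only options).
  run openstack router create --distributed --ha repro-router
  # 3. Attach the subnet to the router.
  run openstack router add subnet repro-router repro-subnet
  # 4. Boot an instance on the subnet.
  run openstack server create --image cirros --flavor m1.tiny --network repro-net repro-vm
}
```

`DRY_RUN=1 reproduce` only prints the commands; `reproduce` runs them. Step 5 still happens inside the guest console with `curl http://169.254.169.254`.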

Zhixin Li (lizhixin) on 2016-07-27
tags: added: l3-dvr-backlog
tags: added: l3-bgp
description: updated
description: updated
Zhixin Li (lizhixin) on 2016-07-27
Changed in neutron:
assignee: nobody → Zhixin Li (lizhixin)
description: updated
Zhixin Li (lizhixin) on 2016-07-27
tags: added: l3-ha
removed: l3-bgp
Zhixin Li (lizhixin) wrote :

I found that if a dvr_snat-mode l3-agent isn't the master for a given router, it destroys the metadata-proxy process in that router's namespace. Therefore, when the l3-agent on a compute node is started in dvr_snat mode, the instances on that compute node can't reach the metadata proxy service, and 'curl http://169.254.169.254' fails. If the dvr_snat-mode l3-agents on compute nodes didn't destroy the metadata-proxy processes in standby router namespaces, the instances on those compute nodes could fetch metadata.

Changed in neutron:
importance: Undecided → High

I am not sure why we are trying to run a dvr_snat mode agent on all compute nodes. Is it just for dvr_snat HA, or is there a specific use case here?

Zhixin Li (lizhixin) wrote :

Hi, Swaminathan:
   Honestly, I don't entirely agree with running a dvr_snat-mode l3-agent on a compute node, but I still consider this a bug.
   First, we allow an l3-agent to run on a compute node in dvr_snat mode, and it works well for most services except the metadata proxy, so we should make the metadata proxy service available as well.
   Second, many developers like to build OpenStack test environments in all-in-one mode for convenience. If we run a dvr_snat l3-agent on the compute node (all-in-one), we can simplify the deployment of an l3-ha and dvr environment; otherwise we have to add 2 extra network nodes, which wastes time and compute resources.
   Finally, should we separate network and compute nodes so strictly? We could also enable snat in a compute-node l3-agent, as in dragonflow. In that case we can have more HA routers and balance snat traffic across nodes, which may be useful in a small-scale environment.
   Maybe running a compute-node l3-agent in dvr_snat mode is technically a terrible idea, but I think it is useful in some cases, and some people may use it for their own reasons. Therefore, I think it's necessary to fix this bug.

Cheers.

Changed in neutron:
assignee: Zhixin Li (lizhixin) → Dongcan Ye (hellochosen)
status: New → In Progress
Changed in neutron:
assignee: Dongcan Ye (hellochosen) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Zhixin Li (lizhixin)
Zhixin Li (lizhixin) on 2016-09-30
Changed in neutron:
assignee: Zhixin Li (lizhixin) → nobody

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/352686
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
importance: High → Medium

Perhaps one of the reasons the patch for this bug was abandoned (metadata proxy overhead) has been addressed now that we use haproxy? If so, maybe you can update the patch if you still feel it's necessary.

Changed in neutron:
status: In Progress → Confirmed
status: Confirmed → Won't Fix
Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
status: Won't Fix → In Progress

Reviewed: https://review.openstack.org/639979
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6ae228cc2e75504d9a8f35e3480a66707f9d7246
Submitter: Zuul
Branch: master

commit 6ae228cc2e75504d9a8f35e3480a66707f9d7246
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 28 11:35:07 2019 +0100

    Spawn metadata proxy on dvr ha standby routers

    In case when L3 agent is running in dvr_snat mode on compute node,
    it is like that e.g. in some of the gate jobs, it may happen that
    same router is scheduled to be in standby mode on compute node and
    on same compute node there is instance connected to it.
    So in such case metadata proxy needs to be spawned in router namespace
    even if it is in standby mode.

    Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
    Closes-Bug: #1817956
    Closes-Bug: #1606741
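The behavioural change described in the commit message can be summarised with a small model. This is a hypothetical illustration, not neutron's implementation: the function name, its arguments and the use of "backup" for the keepalived standby state are assumptions, and the condition is read from the commit title ("Spawn metadata proxy on dvr ha standby routers").

```shell
#!/usr/bin/env bash
# HYPOTHETICAL model of the metadata-proxy decision for an HA router on one
# agent. Not neutron code; it only encodes the behaviour in the commit message.
#   $1  HA state of the router on this agent: "master" or "backup"
#   $2  "true" if the router is distributed (DVR), else "false"
#   $3  "before" or "after" the fix (defaults to "after")
metadata_proxy_action() {
  local state=$1 distributed=$2 version=${3:-after}
  if [ "$state" = "master" ]; then
    echo spawn
  elif [ "$version" = "after" ] && [ "$distributed" = "true" ]; then
    # After the fix: a standby DVR HA router still serves metadata
    # requests from instances on the same host, so keep the proxy.
    echo spawn
  else
    # Before the fix (or for non-DVR HA routers): only the master keeps it.
    echo destroy
  fi
}
```

In this model the only difference between the two versions is the standby + distributed case, which is the situation of a dvr_snat agent on a compute host whose local copy of the router is in standby.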

Changed in neutron:
status: In Progress → Fix Released

This issue was fixed in the openstack/neutron 14.0.0.0b3 development milestone.

Reviewed: https://review.openstack.org/642394
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3658c7155673077d712cd18fb99aa381bea9e843
Submitter: Zuul
Branch: stable/queens

commit 3658c7155673077d712cd18fb99aa381bea9e843
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 28 11:35:07 2019 +0100

    Spawn metadata proxy on dvr ha standby routers

    In case when L3 agent is running in dvr_snat mode on compute node,
    it is like that e.g. in some of the gate jobs, it may happen that
    same router is scheduled to be in standby mode on compute node and
    on same compute node there is instance connected to it.
    So in such case metadata proxy needs to be spawned in router namespace
    even if it is in standby mode.

    Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
    Closes-Bug: #1817956
    Closes-Bug: #1606741
    (cherry picked from commit 6ae228cc2e75504d9a8f35e3480a66707f9d7246)

tags: added: in-stable-queens

Reviewed: https://review.openstack.org/642393
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bc828851abca346946be6a2cdf87c0dbe262ea5e
Submitter: Zuul
Branch: stable/rocky

commit bc828851abca346946be6a2cdf87c0dbe262ea5e
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 28 11:35:07 2019 +0100

    Spawn metadata proxy on dvr ha standby routers

    In case when L3 agent is running in dvr_snat mode on compute node,
    it is like that e.g. in some of the gate jobs, it may happen that
    same router is scheduled to be in standby mode on compute node and
    on same compute node there is instance connected to it.
    So in such case metadata proxy needs to be spawned in router namespace
    even if it is in standby mode.

    Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
    Closes-Bug: #1817956
    Closes-Bug: #1606741
    (cherry picked from commit 6ae228cc2e75504d9a8f35e3480a66707f9d7246)

tags: added: in-stable-rocky

Reviewed: https://review.openstack.org/642396
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5aa1c315fcd904bd66cd07980124d7213a960d26
Submitter: Zuul
Branch: stable/pike

commit 5aa1c315fcd904bd66cd07980124d7213a960d26
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 28 11:35:07 2019 +0100

    Spawn metadata proxy on dvr ha standby routers

    In case when L3 agent is running in dvr_snat mode on compute node,
    it is like that e.g. in some of the gate jobs, it may happen that
    same router is scheduled to be in standby mode on compute node and
    on same compute node there is instance connected to it.
    So in such case metadata proxy needs to be spawned in router namespace
    even if it is in standby mode.

    Conflicts:
        neutron/tests/unit/agent/l3/test_agent.py

    Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
    Closes-Bug: #1817956
    Closes-Bug: #1606741
    (cherry picked from commit 6ae228cc2e75504d9a8f35e3480a66707f9d7246)

tags: added: in-stable-pike

This issue was fixed in the openstack/neutron 11.0.7 release.

This issue was fixed in the openstack/neutron 13.0.3 release.

This issue was fixed in the openstack/neutron 12.0.6 release.

description: updated
tags: added: sts-sru-needed
summary: - Metadata service for instances is unavailable when the l3-agent on the
- compute host is dvr_snat mode
+ [SRU] Metadata service for instances is unavailable when the l3-agent on
+ the compute host is dvr_snat mode
Edward Hope-Morley (hopem) wrote :
Changed in neutron (Ubuntu Eoan):
importance: Undecided → High
status: New → Fix Released
Changed in neutron (Ubuntu Disco):
importance: Undecided → High
status: New → Fix Released
Changed in neutron (Ubuntu Cosmic):
importance: Undecided → High
status: New → Triaged
Changed in neutron (Ubuntu Bionic):
importance: Undecided → High
status: New → Triaged
Corey Bryant (corey.bryant) wrote :

A patched version of the ubuntu neutron package has been uploaded to the cosmic unapproved queue [1] where it is awaiting SRU team review.

[1] https://launchpad.net/ubuntu/cosmic/+queue?queue_state=1&queue_text=neutron

Hello Zhixin, or anyone else affected,

Accepted neutron into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:13.0.2-0ubuntu3.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in neutron (Ubuntu Cosmic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Edward Hope-Morley (hopem) wrote :

cosmic-proposed verified using [Test Case].

tags: added: verification-done verification-done-cosmic
removed: verification-needed verification-needed-cosmic
Corey Bryant (corey.bryant) wrote :

Hello Zhixin, or anyone else affected,

Accepted neutron into rocky-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:rocky-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-rocky-needed to verification-rocky-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-rocky-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-rocky-needed
Corey Bryant (corey.bryant) wrote :

A patched version of the ubuntu neutron package has been uploaded to the bionic unapproved queue [1] where it is awaiting SRU team review.

[1] https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=neutron

Edward Hope-Morley (hopem) wrote :

rocky-proposed verified using [Test Case].

tags: added: verification-rocky-done
removed: verification-rocky-needed

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in neutron (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-bionic
removed: verification-done

Hello Zhixin, or anyone else affected,

Accepted neutron into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:12.0.5-0ubuntu5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:13.0.2-0ubuntu3.2

---------------
neutron (2:13.0.2-0ubuntu3.2) cosmic; urgency=medium

  * Backport fix for dvr+l3ha metadata service not available
    - d/p/Spawn-metadata-proxy-on-dvr-ha-standby-routers.patch (LP: #1606741)

 -- Edward Hope-Morley <email address hidden> Fri, 10 May 2019 17:24:31 +0100

Changed in neutron (Ubuntu Cosmic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package neutron - 2:13.0.2-0ubuntu3.2~cloud0
---------------

 neutron (2:13.0.2-0ubuntu3.2~cloud0) bionic-rocky; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:13.0.2-0ubuntu3.2) cosmic; urgency=medium
 .
   * Backport fix for dvr+l3ha metadata service not available
     - d/p/Spawn-metadata-proxy-on-dvr-ha-standby-routers.patch (LP: #1606741)

Hello Zhixin, or anyone else affected,

Accepted neutron into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
Edward Hope-Morley (hopem) wrote :

Bionic Queens verified using [Test Case]

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:12.0.5-0ubuntu5

---------------
neutron (2:12.0.5-0ubuntu5) bionic; urgency=medium

  * Backport fix for dvr+l3ha metadata service not available
    - d/p/Spawn-metadata-proxy-on-dvr-ha-standby-routers.patch (LP: #1606741)

 -- Corey Bryant <email address hidden> Mon, 13 May 2019 14:55:41 -0400

Changed in neutron (Ubuntu Bionic):
status: Fix Committed → Fix Released
Edward Hope-Morley (hopem) wrote :

Xenial Queens (UCA) verified using [Test Case]

tags: added: sts-sru-done verification-done verification-queens-done
removed: sts-sru-needed verification-needed verification-queens-needed

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package neutron - 2:12.0.5-0ubuntu5~cloud0
---------------

 neutron (2:12.0.5-0ubuntu5~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:12.0.5-0ubuntu5) bionic; urgency=medium
 .
   * Backport fix for dvr+l3ha metadata service not available
     - d/p/Spawn-metadata-proxy-on-dvr-ha-standby-routers.patch (LP: #1606741)
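To confirm that a host has picked up the fix (for example when reporting the tested version during SRU verification), the installed version can be compared with the fixed versions listed above. A sketch: `dpkg-query` and `dpkg --compare-versions` are standard dpkg tools, but the pocket names used as keys and the choice of the `neutron-common` binary package are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: check whether the installed neutron package includes the fix.
# Fixed versions are those listed in this bug; "neutron-common" is an
# ASSUMED package name (binaries from one source share the same version).

# Print the first version carrying the fix for a given pocket.
fixed_neutron_version() {
  case "$1" in
    cosmic)        echo "2:13.0.2-0ubuntu3.2" ;;
    bionic-rocky)  echo "2:13.0.2-0ubuntu3.2~cloud0" ;;
    bionic)        echo "2:12.0.5-0ubuntu5" ;;
    xenial-queens) echo "2:12.0.5-0ubuntu5~cloud0" ;;
    *)             return 1 ;;
  esac
}

# Exit status 0 if installed >= fixed, 1 if older, 2 on lookup errors.
neutron_has_fix() {
  local fixed installed
  fixed=$(fixed_neutron_version "$1") || return 2
  installed=$(dpkg-query -W -f='${Version}' neutron-common) || return 2
  dpkg --compare-versions "$installed" ge "$fixed"
}
```

For example, `neutron_has_fix bionic-rocky && echo 'fix present'`. The Cloud Archive builds need their own thresholds: in Debian version ordering `~` sorts before everything, even the end of the string, so 2:13.0.2-0ubuntu3.2~cloud0 is lower than 2:13.0.2-0ubuntu3.2.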
