Metadata not reachable when dvr_snat L3 agent is used on compute node

Bug #1817956 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Slawek Kaplonski

Bug Description

When L3 agents are deployed on compute nodes in dvr_snat agent mode (as is done in some CI jobs, for example) and DVR HA routers are used, metadata may not be reachable from instances.

For example, in the neutron-tempest-dvr-ha-multinode-full job we have:

- controller (all in one) with L3 agent in dvr mode,
- compute-1 with L3 agent in dvr_snat mode,
- compute-2 with L3 agent in dvr_snat mode.

Now, if a VM is scheduled on, e.g., host compute-2 and is connected to a dvr+ha router that is scheduled as active on compute-1 and standby on compute-2, then the metadata haproxy will not be spawned on compute-2 and the VM will not be able to reach the metadata IP.
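The scenario above can be modelled in a few lines. This is only an illustrative sketch and not neutron code: the `Host` class, the `proxy_spawned` and `hosts_missing_proxy` helpers, and the rule that a standby HA router instance gets no proxy are assumptions derived from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    """Hypothetical description of one node in the job topology."""
    name: str
    agent_mode: str            # "dvr" or "dvr_snat"
    ha_state: Optional[str]    # "active", "standby", or None (no HA instance)
    has_vm: bool               # True if a VM attached to the router runs here


def proxy_spawned(host):
    # Assumed behaviour before the fix: a standby HA router instance
    # does not get a metadata proxy in its namespace.
    return host.ha_state != "standby"


def hosts_missing_proxy(hosts):
    """Return names of hosts that run a VM but have no metadata proxy."""
    return [h.name for h in hosts if h.has_vm and not proxy_spawned(h)]


topology = [
    Host("controller", "dvr", None, has_vm=False),
    Host("compute-1", "dvr_snat", "active", has_vm=False),
    Host("compute-2", "dvr_snat", "standby", has_vm=True),
]

print(hosts_missing_proxy(topology))  # prints ['compute-2']
```

Under these assumptions, any host that holds both the standby router instance and a VM attached to that router ends up in the affected list.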

I found this when I tried to migrate the existing legacy neutron-tempest-dvr-ha-multinode-full job to zuulv3. It turned out that the legacy job is in fact a "nonHA" job: the "l3_ha" option is set to False there, so routers are created as nonHA dvr routers.
When I switched it to dvr+ha in https://review.openstack.org/#/c/633979/ I spotted the error described above.

Example of failed tests: http://logs.openstack.org/79/633979/16/check/neutron-tempest-dvr-ha-multinode-full/710fb3d/job-output.txt.gz - every VM to which SSH was not possible also cannot reach the metadata IP.
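One way to confirm the symptom from inside an instance is to probe the metadata service directly. The sketch below is an assumption-laden helper, not part of the job or of neutron: the name `metadata_reachable`, the 5 second default timeout, and the choice of the `/openstack/latest/meta_data.json` path on the well-known 169.254.169.254 address are all illustrative.

```python
import urllib.request

# Well-known link-local metadata address; the path is one of the standard
# OpenStack metadata endpoints (chosen here for illustration).
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def metadata_reachable(url=METADATA_URL, timeout=5.0):
    """Return True if the metadata URL answers with HTTP 200."""
    # Bypass any configured HTTP proxy: the metadata address is link-local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError (including HTTPError), refused connections and
        # timeouts are all subclasses of OSError.
        return False
```

Calling `metadata_reachable()` inside an affected VM would be expected to return False, while a VM on a host with a running metadata proxy should get True.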

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/639979

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/639979
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6ae228cc2e75504d9a8f35e3480a66707f9d7246
Submitter: Zuul
Branch: master

commit 6ae228cc2e75504d9a8f35e3480a66707f9d7246
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 28 11:35:07 2019 +0100

    Spawn metadata proxy on dvr ha standby routers

    In case when L3 agent is running in dvr_snat mode on compute node,
    it is like that e.g. in some of the gate jobs, it may happen that
    same router is scheduled to be in standby mode on compute node and
    on same compute node there is instance connected to it.
    So in such case metadata proxy needs to be spawned in router namespace
    even if it is in standby mode.

    Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
    Closes-Bug: #1817956
    Closes-Bug: #1606741
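The decision described in the commit message can be sketched as a single predicate. This is a minimal illustration under stated assumptions and not the merged neutron code: the function name `should_spawn_metadata_proxy` and its two parameters are hypothetical, and the rule "spawn when the instance is active or the router is distributed" is one reading of the commit message.

```python
def should_spawn_metadata_proxy(ha_state, distributed):
    """Decide whether the router namespace on this host needs a proxy.

    ha_state: "active" or "standby" for the HA router instance on this host.
    distributed: True for a DVR router, whose namespace serves local VMs.
    """
    # Assumption: previously only the active instance qualified. The commit
    # message asks for standby DVR HA routers to get a proxy as well,
    # because instances on the same host still send metadata requests.
    return ha_state == "active" or distributed


print(should_spawn_metadata_proxy("standby", distributed=True))   # True
print(should_spawn_metadata_proxy("standby", distributed=False))  # False
print(should_spawn_metadata_proxy("active", distributed=False))   # True
```

With this rule a standby instance of a non-distributed HA router still gets no proxy, which matches the idea that only DVR routers serve instances local to the host.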

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/642393

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/642394

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/642396

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b3

This issue was fixed in the openstack/neutron 14.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/642394
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3658c7155673077d712cd18fb99aa381bea9e843
Submitter: Zuul
Branch: stable/queens

commit 3658c7155673077d712cd18fb99aa381bea9e843
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 28 11:35:07 2019 +0100

    Spawn metadata proxy on dvr ha standby routers

    In case when L3 agent is running in dvr_snat mode on compute node,
    it is like that e.g. in some of the gate jobs, it may happen that
    same router is scheduled to be in standby mode on compute node and
    on same compute node there is instance connected to it.
    So in such case metadata proxy needs to be spawned in router namespace
    even if it is in standby mode.

    Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
    Closes-Bug: #1817956
    Closes-Bug: #1606741
    (cherry picked from commit 6ae228cc2e75504d9a8f35e3480a66707f9d7246)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/642393
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bc828851abca346946be6a2cdf87c0dbe262ea5e
Submitter: Zuul
Branch: stable/rocky

commit bc828851abca346946be6a2cdf87c0dbe262ea5e
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 28 11:35:07 2019 +0100

    Spawn metadata proxy on dvr ha standby routers

    In case when L3 agent is running in dvr_snat mode on compute node,
    it is like that e.g. in some of the gate jobs, it may happen that
    same router is scheduled to be in standby mode on compute node and
    on same compute node there is instance connected to it.
    So in such case metadata proxy needs to be spawned in router namespace
    even if it is in standby mode.

    Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
    Closes-Bug: #1817956
    Closes-Bug: #1606741
    (cherry picked from commit 6ae228cc2e75504d9a8f35e3480a66707f9d7246)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/642396
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5aa1c315fcd904bd66cd07980124d7213a960d26
Submitter: Zuul
Branch: stable/pike

commit 5aa1c315fcd904bd66cd07980124d7213a960d26
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 28 11:35:07 2019 +0100

    Spawn metadata proxy on dvr ha standby routers

    In case when L3 agent is running in dvr_snat mode on compute node,
    it is like that e.g. in some of the gate jobs, it may happen that
    same router is scheduled to be in standby mode on compute node and
    on same compute node there is instance connected to it.
    So in such case metadata proxy needs to be spawned in router namespace
    even if it is in standby mode.

    Conflicts:
        neutron/tests/unit/agent/l3/test_agent.py

    Change-Id: Id646ab2c184c7a1d5ac38286a0162dd37d72df6e
    Closes-Bug: #1817956
    Closes-Bug: #1606741
    (cherry picked from commit 6ae228cc2e75504d9a8f35e3480a66707f9d7246)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.7

This issue was fixed in the openstack/neutron 11.0.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.3

This issue was fixed in the openstack/neutron 13.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.6

This issue was fixed in the openstack/neutron 12.0.6 release.
