metadata server unreachable with provider networking only

Bug #1831935 reported by Jeff Hillman
Affects                                Status        Importance  Assigned to  Milestone
Charm Helpers                          Fix Released  High        David Ames
OpenStack Neutron Gateway Charm        Fix Released  High        David Ames
OpenStack Neutron Open vSwitch Charm   Fix Released  High        David Ames

Bug Description

In this scenario there is no Neutron Gateway; we are only using provider networking, specifically VLAN provider networking.

The one network and subnet created look as follows:

 openstack network show mgmt-1
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | zone1, zone2, zone3 |
| availability_zones | zone1, zone2 |
| created_at | 2019-06-06T17:31:19Z |
| description | |
| dns_domain | |
| id | 724aef7a-54a2-4daf-9aa3-98f008215b55 |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| is_default | None |
| is_vlan_transparent | None |
| mtu | 9000 |
| name | mgmt-1 |
| port_security_enabled | True |
| project_id | 143294d60ce54454b451214026857bc9 |
| provider:network_type | vlan |
| provider:physical_network | physnet1 |
| provider:segmentation_id | 1030 |
| qos_policy_id | None |
| revision_number | 3 |
| router:external | Internal |
| segments | None |
| shared | False |
| status | ACTIVE |
| subnets | 5d0cf549-4bca-410d-8514-90b805276324 |
| tags | |
| updated_at | 2019-06-06T17:31:20Z |
+---------------------------+--------------------------------------+

$ openstack subnet show mgmt-1
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 10.243.160.10-10.243.160.100 |
| cidr | 10.243.160.0/24 |
| created_at | 2019-06-06T17:31:20Z |
| description | |
| dns_nameservers | |
| enable_dhcp | True |
| gateway_ip | 10.243.160.254 |
| host_routes | |
| id | 5d0cf549-4bca-410d-8514-90b805276324 |
| ip_version | 4 |
| ipv6_address_mode | None |
| ipv6_ra_mode | None |
| name | mgmt-1 |
| network_id | 724aef7a-54a2-4daf-9aa3-98f008215b55 |
| project_id | 143294d60ce54454b451214026857bc9 |
| revision_number | 0 |
| segment_id | None |
| service_types | |
| subnetpool_id | None |
| tags | |
| updated_at | 2019-06-06T17:31:20Z |
+-------------------+--------------------------------------+

neutron-openvswitch has enable-local-dhcp-and-metadata set to True. This is verified with:

$ openstack network agent list | egrep -i 'dhcp|meta'
| 08586252-fe88-48e7-afd8-f95a3500dee1 | Metadata agent | compute15 | None | :-) | UP | neutron-metadata-agent |
| 09f469c9-0f02-4e20-9725-3e98120ae704 | DHCP agent | compute1 | zone1 | :-) | UP | neutron-dhcp-agent |
| 0a9bc81d-ff79-43a8-b279-ef22290d36d7 | Metadata agent | compute3 | None | :-) | UP | neutron-metadata-agent |
| 0c0ecf6a-aa56-4095-9fff-4f22ab99d00c | Metadata agent | compute5 | None | :-) | UP | neutron-metadata-agent |
| 170fb80a-f782-4aa7-8ac6-750d00ec0125 | Metadata agent | compute14 | None | :-) | UP | neutron-metadata-agent |
| 1e0f1511-11a4-4e43-bdca-722bde6b6c11 | DHCP agent | compute3 | zone1 | :-) | UP | neutron-dhcp-agent |
| 22a110e4-f928-4388-87ca-e8ebc9f15554 | DHCP agent | compute2 | zone1 | :-) | UP | neutron-dhcp-agent |
| 236387f3-3f24-4075-9cb7-e12280d36438 | DHCP agent | compute5 | zone1 | :-) | UP | neutron-dhcp-agent |
| 2d643308-cd4d-4d07-a54a-7ab99167c1b0 | Metadata agent | compute6 | None | :-) | UP | neutron-metadata-agent |
| 2f35c4bd-cf44-4187-920a-db5beeab05cf | DHCP agent | compute14 | zone3 | :-) | UP | neutron-dhcp-agent |
| 3db2dcbf-bdea-444a-afc6-75b49f0132a8 | DHCP agent | compute12 | zone3 | :-) | UP | neutron-dhcp-agent |
| 44a09595-eded-46eb-9d39-036870731433 | Metadata agent | compute8 | None | :-) | UP | neutron-metadata-agent |
| 44e02602-f0e3-45f0-9887-9739aaaef3de | DHCP agent | compute8 | zone2 | :-) | UP | neutron-dhcp-agent |
| 468e7408-8af1-433c-a77b-e474bc16f9f9 | Metadata agent | compute13 | None | :-) | UP | neutron-metadata-agent |
| 51ba61b8-72b5-49cb-a33e-17afe72a4a0b | DHCP agent | compute6 | zone2 | :-) | UP | neutron-dhcp-agent |
| 56f99f61-3a6f-48d2-931d-6128d0877486 | DHCP agent | compute10 | zone2 | :-) | UP | neutron-dhcp-agent |
| 61df9ce0-7ad7-4955-8d59-88fe45b24ff7 | DHCP agent | compute15 | zone3 | :-) | UP | neutron-dhcp-agent |
| 6897a6d1-8e56-4958-b300-9642fd895ad0 | Metadata agent | compute10 | None | :-) | UP | neutron-metadata-agent |
| 6ab71669-379a-44ab-a804-0805da23630e | Metadata agent | compute7 | None | :-) | UP | neutron-metadata-agent |
| 6ba578ee-ed55-4930-af7e-3c3dade6db79 | Metadata agent | compute12 | None | :-) | UP | neutron-metadata-agent |
| 7feaf689-05b7-483a-a812-3ffb129717ee | DHCP agent | compute4 | zone1 | :-) | UP | neutron-dhcp-agent |
| 80e7c6d1-9748-4803-8e54-1b7df807cb69 | Metadata agent | compute11 | None | :-) | UP | neutron-metadata-agent |
| a8a70e20-910b-4f5c-8414-01d4dcac42cf | Metadata agent | compute9 | None | :-) | UP | neutron-metadata-agent |
| b6a751e7-c86a-4347-8f5b-079fd019fcaf | DHCP agent | compute7 | zone2 | :-) | UP | neutron-dhcp-agent |
| bd95e896-496f-4b37-84bc-9e916a1cd313 | Metadata agent | compute4 | None | :-) | UP | neutron-metadata-agent |
| d5ec415a-f8c9-43b0-b128-0f07da1b2625 | Metadata agent | compute1 | None | :-) | UP | neutron-metadata-agent |
| dba5e158-6222-4cba-b9d3-19165918261b | DHCP agent | compute11 | zone3 | :-) | UP | neutron-dhcp-agent |
| f4640a2e-fc7c-415c-8f3d-aa0c68f6110c | Metadata agent | compute2 | None | :-) | UP | neutron-metadata-agent |
| fdde9c2b-2352-4876-8598-ef877eda724d | DHCP agent | compute13 | zone3 | :-) | UP | neutron-dhcp-agent |
| ff340680-8eff-4cb3-8685-36a9b0459141 | DHCP agent | compute9 | zone2 | :-) | UP | neutron-dhcp-agent |
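
For completeness, the charm option itself can also be checked directly. A minimal sketch, assuming a Juju-managed deployment with the application named neutron-openvswitch:

---

juju config neutron-openvswitch enable-local-dhcp-and-metadata

---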

The instance, when booting, gives the message:

[WARNING]: No active metadata service found

If I create an instance using config-drive (ignoring the metadata server), it starts up fine and has a route to 169.254.169.254 via one of the qdhcp namespaces.

Inside this instance I can ping both the IP of the namespace and the 169.254.169.254 address, but I cannot curl it.

Inside the namespace, if I curl http://169.254.169.254/ I get the following:

---

# curl http://169.254.169.254
<html>
 <head>
  <title>404 Not Found</title>
 </head>
 <body>
  <h1>404 Not Found</h1>
  The resource could not be found.<br /><br />

 </body>

---
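
For reference, the same request can be issued from the compute host without entering the namespace manually. A sketch, assuming the qdhcp namespace is named after the network UUID shown above:

---

sudo ip netns exec qdhcp-724aef7a-54a2-4daf-9aa3-98f008215b55 curl -sv http://169.254.169.254/

---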

Doing some googling, I found a suggestion to verify that the following rule exists in the iptables of the namespace:

-A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8775

I have no such rule, but this is likely because there is no Neutron Gateway (NGW) in this environment.
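
Since there is no Neutron Gateway, there are no qrouter namespaces to carry such a rule. A quick confirmation sketch from the compute host, reusing the same hypothetical namespace name as above:

---

sudo ip netns list    # expect only qdhcp-* entries, no qrouter-*
sudo ip netns exec qdhcp-724aef7a-54a2-4daf-9aa3-98f008215b55 iptables-save | grep 169.254.169.254

---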

It is also worth noting that, in the console output, the instance never attempts to call http://169.254.169.254.

Bundle can be found at:

Console log of last boot found at:

https://pastebin.canonical.com/p/Jyk22sfGT8/

Tags: cpe-onsite
Revision history for this message
Jeff Hillman (jhillman) wrote :

Doing some further testing, port 8775 is not open in the namespace, but it is open on the compute host.

A curl to http://<compute-with-namespace>:8775 provides:

1.0
2007-01-19
2007-03-01
2007-08-29
2007-10-10
2007-12-15
2008-02-01
2008-09-01
2009-04-04
latest

So there is either a rule missing or a service not running that would allow reaching the metadata service on the compute host that holds the namespace.
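
One way to confirm what is answering on 8775 on the compute host (a sketch, assuming the standard Ubuntu service name nova-api-metadata):

---

sudo ss -tlnp | grep 8775
systemctl status nova-api-metadata

---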

Revision history for this message
Jeff Hillman (jhillman) wrote :

As a test I switched firewall-driver from openvswitch to iptables_hybrid... no effect.

The main concern, I think, is that port 8775 is not open in the namespace.

Revision history for this message
Jeff Hillman (jhillman) wrote :

Enabling DVR on neutron-api gives us an ns-metadata-proxy service running on the compute host that was not there before.

This process has opened port 80 in the namespace.

From within the namespace, if we curl against the namespace IP we get metadata; if we curl against 169.254.169.254 we get nothing.

Curling either address from the instance fails (hangs).

Revision history for this message
Jeff Hillman (jhillman) wrote :

Tried enabling allow-automatic-l3agent-failover both with and without DVR; no change.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

Have you tried to enable this ``neutron-gateway`` configuration option [0]?

0: https://jaas.ai/neutron-gateway/262#charm-config-enable-isolated-metadata
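
If a neutron-gateway unit were present, that would be set with something along these lines (a sketch only; as noted below, this deployment has no neutron-gateway):

---

juju config neutron-gateway enable-isolated-metadata=True

---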

information type: Public → Private
Frode Nordahl (fnordahl)
information type: Private → Public
information type: Public → Private
Frode Nordahl (fnordahl)
description: updated
information type: Private → Public
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

I'm attaching the bundle that was initially linked in the description

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Frode,

Just a side comment: there is no neutron-gateway in Jeff's deployment, and neutron-openvswitch enables enable_isolated_metadata for the DHCP agent unconditionally:

➜ charm-neutron-openvswitch git:(stable/19.04) grep -RiP enable_isolated_metadata
templates/icehouse/dhcp_agent.ini:enable_isolated_metadata = True
templates/mitaka/dhcp_agent.ini:enable_isolated_metadata = True

In the Neutron DHCP agent code itself:

neutron/agent/linux/dhcp.py

METADATA_DEFAULT_PREFIX = 16
METADATA_DEFAULT_IP = '169.254.169.254'
METADATA_DEFAULT_CIDR = '%s/%d' % (METADATA_DEFAULT_IP,
                                   METADATA_DEFAULT_PREFIX)

# ...
        if self.conf.force_metadata or self.conf.enable_isolated_metadata:
            ip_cidrs.append(METADATA_DEFAULT_CIDR)

In the doc for provider networks:

neutron/doc/source/admin/deploy-ovs-provider.rst

#. In the ``dhcp_agent.ini`` file, configure the DHCP agent:

   .. code-block:: ini

      [DEFAULT]
      interface_driver = openvswitch
      enable_isolated_metadata = True
      force_metadata = True

   .. note::

      The ``force_metadata`` option forces the DHCP agent to provide
      a host route to the metadata service on ``169.254.169.254``
      regardless of whether the subnet contains an interface on a
      router, thus maintaining similar and predictable metadata behavior
      among subnets.

However, looking at the option descriptions in the documentation, I can see that there is no need to use both options, because force_metadata is the stronger one (it enables the metadata service unconditionally, not only for router-less networks):
https://github.com/openstack/neutron/blob/stable/queens/neutron/conf/agent/dhcp.py#L41-L57
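
A quick way to see which of the two options the DHCP agent is actually running with on a compute host (a sketch, assuming the standard configuration path):

---

grep -E '^(force_metadata|enable_isolated_metadata)' /etc/neutron/dhcp_agent.ini

---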

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Looking at the code, I could only find invocations of the functions that add metadata-related iptables rules to qrouter namespaces, not qdhcp namespaces:

➜ neutron git:(stable/queens) ✗ grep -RiP metadata_filter_rules

neutron/agent/metadata/driver.py: def metadata_filter_rules(cls, port, mark):
neutron/agent/metadata/driver.py: for c, r in proxy.metadata_filter_rules(proxy.metadata_port,

https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/metadata/driver.py#L172-L196
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/metadata/driver.py#L286-L294

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Based on the information Jeff provided, haproxy listens on port 80 in the qdhcp namespace, so there is no need for the iptables rule used in qrouter namespaces.

Revision history for this message
Jeff Hillman (jhillman) wrote :

Dmitrii had me run:

---

iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill

---

within the namespace, and now the instance can curl the metadata URL 169.254.169.254.
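
For reference, the full command as it would be run from the compute host rather than from a shell inside the namespace (a sketch, reusing the network UUID from the description for the namespace name):

---

sudo ip netns exec qdhcp-724aef7a-54a2-4daf-9aa3-98f008215b55 \
    iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill

---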

iptables-save from within the NS

https://www.irccloud.com/pastebin/V49B0uvB/

ss -tlpna from within the NS

https://www.irccloud.com/pastebin/sEwHqlwG/

ip a from within the NS

https://www.irccloud.com/pastebin/yeyDtoGs/

Re: https://review.opendev.org/#/c/654645/ where some checksumming was reverted.

Applying field-critical per Dmitrii

Revision history for this message
Jeff Hillman (jhillman) wrote :

It should also be noted that, from my observation, haproxy was not running in the namespace until we set enable-dvr=True on neutron-api.

Revision history for this message
Jeff Hillman (jhillman) wrote :

So, while the iptables rule allowed metadata access for an instance that was deployed with config-drive (so I could SSH in with my key), an instance created after that rule was added still shows the message:

[WARNING]: No active metadata service found

And it never tries http://169.254.169.254

So no key was applied and no metadata was found, again.

Revision history for this message
Jeff Hillman (jhillman) wrote :

Console log of instance not getting metadata

https://pastebin.canonical.com/p/qbFwvD4Kgd/

Revision history for this message
Jeff Hillman (jhillman) wrote :

I went to each dpdk interface on each host and ran:

---

ovs-vsctl set Interface dpdk-<whatever> mtu_request=9000

---

and I was able to boot an instance and it received my SSH key. I still got the "No active metadata service found" message, but I guess that is just irrelevant.
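
A sketch of how the MTU change can be verified on each host (the interface name placeholder is the same as above):

---

ovs-vsctl get Interface dpdk-<whatever> mtu mtu_request

---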

Testing without the mangle rule but with the MTU in place resulted in the instance not receiving my key (no metadata).

Re-applying the mangle rule one more time as a test resulted in my being able to log in with my key (got metadata).

So the rule is still required.

David Ames (thedac)
Changed in charm-neutron-openvswitch:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 19.07
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/664001

Revision history for this message
David Ames (thedac) wrote :

Summary:

We had three distinct issues:

1) MTU setting on the DPDK interfaces
This is resolved in master for neutron-openvswitch and the fix can be seen at [0].

2) An upstream neutron bug where the checksum for metadata traffic is not getting filled [1]
that bug will be tracked separately in LP Bug#1832021

3) The neutron-openvswitch charm was not setting force_metadata = True and was not installing haproxy, both of which are required for ns-metadata-proxy, which proxies metadata requests from the netns to the nova-api-metadata service.
This is being resolved in [2] and will be the focus of this bug.

[0] https://github.com/juju/charm-helpers/pull/333
[1] https://bugs.launchpad.net/neutron/+bug/1832021
[2] https://review.opendev.org/#/c/664001/

Once [2] lands, the neutron-openvswitch charm will be fully ready at master. It resolves the first and third problems and will NOT require enabling DVR.

Until [1] is resolved upstream, the workaround setting the checksum fill inside the qdhcp ip netns will remain necessary:

iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (master)

Reviewed: https://review.opendev.org/664001
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/commit/?id=a1639fe51f48b9a6cdb5185c5bffc4480f4e264b
Submitter: Zuul
Branch: master

commit a1639fe51f48b9a6cdb5185c5bffc4480f4e264b
Author: David Ames <email address hidden>
Date: Fri Jun 7 09:58:11 2019 -0700

    Enable isolated provider network metadata access

    When an isolated provider network with no virtual routers metadata
    access occurs in the qdhcp netns.

    Without the force_metadata option in dhcp_agent.ini and the haproxy
    package installed ns-metadata-proxy is not enabled. ns-metdata-proxy
    sits in the ip netns and proxies requests from 169.254.169.254 to the
    nova-api-metadata service outside the netns.

    This change adds the force_metadata option and installs haproxy when
    enable-local-dhcp-and-metadata is True.

    Closes-Bug: #1831935

    Change-Id: Iaad1501e8d7d58888ef0917b6700d22a7cf05ecf

Changed in charm-neutron-openvswitch:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (stable/19.04)

Fix proposed to branch: stable/19.04
Review: https://review.opendev.org/664260

Revision history for this message
David Ames (thedac) wrote :

Update:

The charm fix [0] has landed in master and is currently being backported to stable [1].

The checksum bug turned out to be a duplicate of LP Bug #1722584 [2]. The fix (a revert) is in upstream neutron but still needs SRU into Ubuntu packaging.

The final work for this bug will continue on LP Bug #1722584 [2].

[0] https://review.opendev.org/#/c/664001/
[1] https://review.opendev.org/664260
[2] https://bugs.launchpad.net/cloud-archive/+bug/1722584

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (stable/19.04)

Reviewed: https://review.opendev.org/664260
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/commit/?id=03eb908d822fcdc91f14c22434a0fd0b51761c90
Submitter: Zuul
Branch: stable/19.04

commit 03eb908d822fcdc91f14c22434a0fd0b51761c90
Author: David Ames <email address hidden>
Date: Fri Jun 7 09:58:11 2019 -0700

    Enable isolated provider network metadata access

    When an isolated provider network with no virtual routers metadata
    access occurs in the qdhcp netns.

    Without the force_metadata option in dhcp_agent.ini and the haproxy
    package installed ns-metadata-proxy is not enabled. ns-metdata-proxy
    sits in the ip netns and proxies requests from 169.254.169.254 to the
    nova-api-metadata service outside the netns.

    This change adds the force_metadata option and installs haproxy when
    enable-local-dhcp-and-metadata is True.

    Closes-Bug: #1831935

    Change-Id: Iaad1501e8d7d58888ef0917b6700d22a7cf05ecf
    (cherry picked from commit a1639fe51f48b9a6cdb5185c5bffc4480f4e264b)

James Page (james-page)
Changed in charm-neutron-openvswitch:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/701476

David Ames (thedac)
Changed in charm-neutron-openvswitch:
status: Fix Released → Triaged
milestone: 19.07 → 20.02
Changed in charm-helpers:
status: New → Triaged
Changed in charm-neutron-gateway:
status: New → Triaged
Changed in charm-helpers:
importance: Undecided → High
Changed in charm-neutron-gateway:
importance: Undecided → High
Changed in charm-neutron-openvswitch:
importance: Critical → High
Changed in charm-helpers:
assignee: nobody → David Ames (thedac)
Changed in charm-neutron-gateway:
assignee: nobody → David Ames (thedac)
milestone: none → 20.02
Changed in charm-neutron-openvswitch:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-gateway (master)

Fix proposed to branch: master
Review: https://review.opendev.org/704462

Changed in charm-neutron-gateway:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-gateway (master)

Reviewed: https://review.opendev.org/704462
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-gateway/commit/?id=a03fe36fa65b710b6cd8059b870c44204f3e3856
Submitter: Zuul
Branch: master

commit a03fe36fa65b710b6cd8059b870c44204f3e3856
Author: David Ames <email address hidden>
Date: Mon Jan 27 14:54:42 2020 -0800

    Make ovs_use_veth a config option

    This change uses a common DHCPAgentContext and takes care to check for a
    pre-existing setting in the dhcp_agent.ini. Only allowing a config
    change if there is no pre-existing setting.

    Please review and merge charm-helpers PR:
    https://github.com/juju/charm-helpers/pull/422

    Partial-Bug: #1831935

    func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/157
    Change-Id: Ia01c637b0837a4e594d16f6565c605460ad3f922

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (master)

Reviewed: https://review.opendev.org/701476
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/commit/?id=4075af6a1154246c9653549763e470bb74e7500f
Submitter: Zuul
Branch: master

commit 4075af6a1154246c9653549763e470bb74e7500f
Author: David Ames <email address hidden>
Date: Tue Jan 7 15:21:55 2020 -0800

    Make ovs_use_veth a config option

    This was originally fixed in commit 7578326 but this caused problems. It
    was subsequently reverted in commit 6d2e9ee.

    This change uses a common DHCPAgentContext and takes care to check for a
    pre-existing setting in the dhcp_agent.ini. Only allowing a config
    change if there is no pre-existing setting.

    Please review and merge charm-helpers PR:
    https://github.com/juju/charm-helpers/pull/422

    Partial-Bug: #1831935

    func-test-pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/157
    Change-Id: I4848a3246d3450540acb8d2f479dfa2e7767be60

Liam Young (gnuoy)
Changed in charm-neutron-openvswitch:
milestone: 20.02 → 20.05
Changed in charm-neutron-gateway:
milestone: 20.02 → 20.05
Revision history for this message
Nikolay Vinogradov (nikolay.vinogradov) wrote :

We're also facing this bug in our deployment at the moment.

VLAN provider networks; the metadata server is not accessible. tcpdump shows incorrect checksums on replies from the metadata server.

Revision history for this message
Nikolay Vinogradov (nikolay.vinogradov) wrote :

We're using charm neutron-openvswitch-269

Revision history for this message
David Ames (thedac) wrote :

This bug should actually be closed based on the above gerrit reviews. I'll hold off till I get confirmation from Nikolay.

Nikolay, can you please test with the 20.02 release version:
cs:neutron-openvswitch-273
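
For example (a sketch; the exact upgrade syntax may vary with the Juju version in use):

---

juju upgrade-charm neutron-openvswitch --switch cs:neutron-openvswitch-273

---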

Revision history for this message
Gábor Mészáros (gabor.meszaros) wrote :

Turning TCP TX checksumming off on the metadata-service-bound interface in the qdhcp namespace also makes it work:
`ethtool -K ns-5546cbc-c9 tx off`
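
Run from the compute host, that would look roughly like the following (a sketch; the qdhcp namespace name here is a placeholder):

---

sudo ip netns exec qdhcp-<network-id> ethtool -K ns-5546cbc-c9 tx off

---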

Revision history for this message
Nikolay Vinogradov (nikolay.vinogradov) wrote :

Tested cs:neutron-openvswitch-274. The metadata server is reachable with a VLAN provider network.

Revision history for this message
Nikolay Vinogradov (nikolay.vinogradov) wrote :

Sorry about the typo; it was tested on cs:neutron-openvswitch-273.

Revision history for this message
David Ames (thedac) wrote :

Nikolay, thank you for confirmation. I will close this bug.

Changed in charm-helpers:
status: Triaged → Fix Released
Changed in charm-neutron-gateway:
status: In Progress → Fix Released
Changed in charm-neutron-openvswitch:
status: In Progress → Fix Released