metadata server unreachable with provider networking only

Bug #1831935 reported by Jeff Hillman on 2019-06-06
This bug affects 1 person
Affects: OpenStack neutron-openvswitch charm
Status: Fix Released
Importance: Critical
Assigned to: David Ames
Milestone: 19.07

Bug Description

In this scenario there is no Neutron Gateway; we are only using provider networking, specifically VLAN provider networking.

The one network and subnet created look as follows:

$ openstack network show mgmt-1
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| admin_state_up | UP |
| availability_zone_hints | zone1, zone2, zone3 |
| availability_zones | zone1, zone2 |
| created_at | 2019-06-06T17:31:19Z |
| description | |
| dns_domain | |
| id | 724aef7a-54a2-4daf-9aa3-98f008215b55 |
| ipv4_address_scope | None |
| ipv6_address_scope | None |
| is_default | None |
| is_vlan_transparent | None |
| mtu | 9000 |
| name | mgmt-1 |
| port_security_enabled | True |
| project_id | 143294d60ce54454b451214026857bc9 |
| provider:network_type | vlan |
| provider:physical_network | physnet1 |
| provider:segmentation_id | 1030 |
| qos_policy_id | None |
| revision_number | 3 |
| router:external | Internal |
| segments | None |
| shared | False |
| status | ACTIVE |
| subnets | 5d0cf549-4bca-410d-8514-90b805276324 |
| tags | |
| updated_at | 2019-06-06T17:31:20Z |
+---------------------------+--------------------------------------+

$ openstack subnet show mgmt-1
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| allocation_pools | 10.243.160.10-10.243.160.100 |
| cidr | 10.243.160.0/24 |
| created_at | 2019-06-06T17:31:20Z |
| description | |
| dns_nameservers | |
| enable_dhcp | True |
| gateway_ip | 10.243.160.254 |
| host_routes | |
| id | 5d0cf549-4bca-410d-8514-90b805276324 |
| ip_version | 4 |
| ipv6_address_mode | None |
| ipv6_ra_mode | None |
| name | mgmt-1 |
| network_id | 724aef7a-54a2-4daf-9aa3-98f008215b55 |
| project_id | 143294d60ce54454b451214026857bc9 |
| revision_number | 0 |
| segment_id | None |
| service_types | |
| subnetpool_id | None |
| tags | |
| updated_at | 2019-06-06T17:31:20Z |
+-------------------+--------------------------------------+

neutron-openvswitch has enable-local-dhcp-and-metadata set to True. This is verified with:

$ openstack network agent list | egrep -i 'dhcp|meta'
| 08586252-fe88-48e7-afd8-f95a3500dee1 | Metadata agent | compute15 | None | :-) | UP | neutron-metadata-agent |
| 09f469c9-0f02-4e20-9725-3e98120ae704 | DHCP agent | compute1 | zone1 | :-) | UP | neutron-dhcp-agent |
| 0a9bc81d-ff79-43a8-b279-ef22290d36d7 | Metadata agent | compute3 | None | :-) | UP | neutron-metadata-agent |
| 0c0ecf6a-aa56-4095-9fff-4f22ab99d00c | Metadata agent | compute5 | None | :-) | UP | neutron-metadata-agent |
| 170fb80a-f782-4aa7-8ac6-750d00ec0125 | Metadata agent | compute14 | None | :-) | UP | neutron-metadata-agent |
| 1e0f1511-11a4-4e43-bdca-722bde6b6c11 | DHCP agent | compute3 | zone1 | :-) | UP | neutron-dhcp-agent |
| 22a110e4-f928-4388-87ca-e8ebc9f15554 | DHCP agent | compute2 | zone1 | :-) | UP | neutron-dhcp-agent |
| 236387f3-3f24-4075-9cb7-e12280d36438 | DHCP agent | compute5 | zone1 | :-) | UP | neutron-dhcp-agent |
| 2d643308-cd4d-4d07-a54a-7ab99167c1b0 | Metadata agent | compute6 | None | :-) | UP | neutron-metadata-agent |
| 2f35c4bd-cf44-4187-920a-db5beeab05cf | DHCP agent | compute14 | zone3 | :-) | UP | neutron-dhcp-agent |
| 3db2dcbf-bdea-444a-afc6-75b49f0132a8 | DHCP agent | compute12 | zone3 | :-) | UP | neutron-dhcp-agent |
| 44a09595-eded-46eb-9d39-036870731433 | Metadata agent | compute8 | None | :-) | UP | neutron-metadata-agent |
| 44e02602-f0e3-45f0-9887-9739aaaef3de | DHCP agent | compute8 | zone2 | :-) | UP | neutron-dhcp-agent |
| 468e7408-8af1-433c-a77b-e474bc16f9f9 | Metadata agent | compute13 | None | :-) | UP | neutron-metadata-agent |
| 51ba61b8-72b5-49cb-a33e-17afe72a4a0b | DHCP agent | compute6 | zone2 | :-) | UP | neutron-dhcp-agent |
| 56f99f61-3a6f-48d2-931d-6128d0877486 | DHCP agent | compute10 | zone2 | :-) | UP | neutron-dhcp-agent |
| 61df9ce0-7ad7-4955-8d59-88fe45b24ff7 | DHCP agent | compute15 | zone3 | :-) | UP | neutron-dhcp-agent |
| 6897a6d1-8e56-4958-b300-9642fd895ad0 | Metadata agent | compute10 | None | :-) | UP | neutron-metadata-agent |
| 6ab71669-379a-44ab-a804-0805da23630e | Metadata agent | compute7 | None | :-) | UP | neutron-metadata-agent |
| 6ba578ee-ed55-4930-af7e-3c3dade6db79 | Metadata agent | compute12 | None | :-) | UP | neutron-metadata-agent |
| 7feaf689-05b7-483a-a812-3ffb129717ee | DHCP agent | compute4 | zone1 | :-) | UP | neutron-dhcp-agent |
| 80e7c6d1-9748-4803-8e54-1b7df807cb69 | Metadata agent | compute11 | None | :-) | UP | neutron-metadata-agent |
| a8a70e20-910b-4f5c-8414-01d4dcac42cf | Metadata agent | compute9 | None | :-) | UP | neutron-metadata-agent |
| b6a751e7-c86a-4347-8f5b-079fd019fcaf | DHCP agent | compute7 | zone2 | :-) | UP | neutron-dhcp-agent |
| bd95e896-496f-4b37-84bc-9e916a1cd313 | Metadata agent | compute4 | None | :-) | UP | neutron-metadata-agent |
| d5ec415a-f8c9-43b0-b128-0f07da1b2625 | Metadata agent | compute1 | None | :-) | UP | neutron-metadata-agent |
| dba5e158-6222-4cba-b9d3-19165918261b | DHCP agent | compute11 | zone3 | :-) | UP | neutron-dhcp-agent |
| f4640a2e-fc7c-415c-8f3d-aa0c68f6110c | Metadata agent | compute2 | None | :-) | UP | neutron-metadata-agent |
| fdde9c2b-2352-4876-8598-ef877eda724d | DHCP agent | compute13 | zone3 | :-) | UP | neutron-dhcp-agent |
| ff340680-8eff-4cb3-8685-36a9b0459141 | DHCP agent | compute9 | zone2 | :-) | UP | neutron-dhcp-agent |
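
For completeness, the charm option itself can be checked from the Juju client; a minimal sketch, assuming the application is deployed under the name neutron-openvswitch:

---

# Sketch: confirm the charm option that enables per-compute DHCP and
# metadata agents (application name assumed).
juju config neutron-openvswitch enable-local-dhcp-and-metadata

---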

The instance, when booting, gives the message:

[WARNING]: No active metadata service found

If I create an instance using config-drive (bypassing the metadata server), it starts up fine and has a route to 169.254.169.254 via one of the qdhcp namespaces.

Inside this instance I can ping both the IP of the namespace and the 169.254.169.254 address, but I cannot curl it.

Inside the namespace, if I curl http://169.254.169.254/ I get the following:

---

# curl http://169.254.169.254
<html>
 <head>
  <title>404 Not Found</title>
 </head>
 <body>
  <h1>404 Not Found</h1>
  The resource could not be found.<br /><br />

 </body>

---
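
For reference, the namespace checks above can be reproduced from the compute host that holds the qdhcp namespace; a minimal sketch, with the namespace name derived from the network ID shown earlier:

---

# Sketch: run the same curl from inside the qdhcp namespace on the compute host.
sudo ip netns list | grep qdhcp
sudo ip netns exec qdhcp-724aef7a-54a2-4daf-9aa3-98f008215b55 curl -s http://169.254.169.254/

---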

Doing some googling, I found a suggestion to verify that the following rule exists in the iptables of the namespace:

-A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8775

I have no such rule, but this is likely because there is no Neutron Gateway (NGW) in this environment.
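
One way to look for that rule on a node that does run an L3 agent would be along these lines; a minimal sketch, with a hypothetical router namespace name:

---

# Sketch: look for the metadata REDIRECT rule inside a qrouter namespace
# (only present where an L3 agent manages a router; namespace name is hypothetical).
sudo ip netns exec qrouter-<router-id> iptables-save -t nat | grep 169.254.169.254

---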

It is also worth noting that, in the console output of the instance, it never attempts to call http://169.254.169.254

Bundle can be found at:

Console log of last boot found at:

https://pastebin.canonical.com/p/Jyk22sfGT8/

Jeff Hillman (jhillman) wrote :

Doing some further testing, port 8775 is not open in the namespace, but it is open on the compute host.

A curl to http://<compute-with-namespace>:8775 provides:

1.0
2007-01-19
2007-03-01
2007-08-29
2007-10-10
2007-12-15
2008-02-01
2008-09-01
2009-04-04
latest

So there is either a rule missing or a service not running that would allow reaching the metadata service on the compute host that holds the namespace.
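
The listening sockets can be compared between the compute host and the namespace with something like the following; a minimal sketch, reusing the namespace name from above:

---

# Sketch: compare listeners on the compute host vs. inside the qdhcp netns.
sudo ss -tlnp | grep 8775
sudo ip netns exec qdhcp-724aef7a-54a2-4daf-9aa3-98f008215b55 ss -tlnp

---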

Jeff Hillman (jhillman) wrote :

As a test I switched firewall-driver from openvswitch to iptables_hybrid... no effect.

The main concern, I think, is that port 8775 is not open in the namespace.
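
For reference, that test amounts to a charm configuration change along these lines; a minimal sketch, assuming the application name:

---

# Sketch: switch the neutron-openvswitch firewall driver and back.
juju config neutron-openvswitch firewall-driver=iptables_hybrid
juju config neutron-openvswitch firewall-driver=openvswitch

---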

Jeff Hillman (jhillman) wrote :

Enabling DVR on neutron-api gives us an ns-metadata-proxy service running on the compute host that was not there before.

This process has opened port 80 in the namespace.

From within the namespace, if we curl against the namespace IP we get metadata; if we curl against 169.254.169.254 we get nothing.

Curling either address from the instance fails (hangs).
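
The DVR change referred to here is a neutron-api charm option; a minimal sketch, assuming the application name:

---

# Sketch: enable DVR via the neutron-api charm.
juju config neutron-api enable-dvr=true

---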

Jeff Hillman (jhillman) wrote :

Tried enabling allow-automatic-l3agent-failover both with and without DVR; no change.

Frode Nordahl (fnordahl) wrote :

Have you tried to enable this ``neutron-gateway`` configuration option [0]?

0: https://jaas.ai/neutron-gateway/262#charm-config-enable-isolated-metadata
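
That option would be set on the neutron-gateway application, roughly as follows; a minimal sketch, and not directly applicable here since this deployment has no neutron-gateway:

---

# Sketch: the neutron-gateway charm option referenced above.
juju config neutron-gateway enable-isolated-metadata=true

---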

Frode Nordahl (fnordahl) on 2019-06-07
description: updated
information type: Private → Public

I'm attaching the bundle that was initially linked in the description

Dmitrii Shcherbakov (dmitriis) wrote :

Frode,

Just a side comment: there is no neutron gateway in Jeff's deployment, and neutron-openvswitch enables enable_isolated_metadata for the DHCP agent unconditionally:

➜ charm-neutron-openvswitch git:(stable/19.04) grep -RiP enable_isolated_metadata
templates/icehouse/dhcp_agent.ini:enable_isolated_metadata = True
templates/mitaka/dhcp_agent.ini:enable_isolated_metadata = True

In the Neutron DHCP agent code itself:

neutron/agent/linux/dhcp.py

METADATA_DEFAULT_PREFIX = 16
METADATA_DEFAULT_IP = '169.254.169.254'
METADATA_DEFAULT_CIDR = '%s/%d' % (METADATA_DEFAULT_IP,
                                   METADATA_DEFAULT_PREFIX)

# ...
        if self.conf.force_metadata or self.conf.enable_isolated_metadata:
            ip_cidrs.append(METADATA_DEFAULT_CIDR)

In the doc for provider networks:

neutron/doc/source/admin/deploy-ovs-provider.rst

#. In the ``dhcp_agent.ini`` file, configure the DHCP agent:

   .. code-block:: ini

      [DEFAULT]
      interface_driver = openvswitch
      enable_isolated_metadata = True
      force_metadata = True

   .. note::

      The ``force_metadata`` option forces the DHCP agent to provide
      a host route to the metadata service on ``169.254.169.254``
      regardless of whether the subnet contains an interface on a
      router, thus maintaining similar and predictable metadata behavior
      among subnets.

However, looking at the option documentation I can see that there is no need to use both options, because force_metadata is the stronger one (it enables metadata services unconditionally, not only for router-less networks):
https://github.com/openstack/neutron/blob/stable/queens/neutron/conf/agent/dhcp.py#L41-L57

Dmitrii Shcherbakov (dmitriis) wrote :

Looking at the code I could only find the invocations of functions that add metadata-related iptables rules into qrouter namespaces, not qdhcp namespaces:

➜ neutron git:(stable/queens) ✗ grep -RiP metadata_filter_rules

neutron/agent/metadata/driver.py: def metadata_filter_rules(cls, port, mark):
neutron/agent/metadata/driver.py: for c, r in proxy.metadata_filter_rules(proxy.metadata_port,

https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/metadata/driver.py#L172-L196
https://github.com/openstack/neutron/blob/stable/queens/neutron/agent/metadata/driver.py#L286-L294

Dmitrii Shcherbakov (dmitriis) wrote :

Based on the information Jeff provided, haproxy listens on port 80 in the qdhcp namespace, so there is no need for the iptables rule as there is in qrouter.

Jeff Hillman (jhillman) wrote :

Dmitrii had me run:

---

iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill

---

within the namespace, and now the instance can curl the metadata URL at 169.254.169.254.

iptables-save from within the NS

https://www.irccloud.com/pastebin/V49B0uvB/

ss -tlpna from within the NS

https://www.irccloud.com/pastebin/sEwHqlwG/

ip a from within the NS

https://www.irccloud.com/pastebin/yeyDtoGs/

RE: https://review.opendev.org/#/c/654645/ where some checksumming was reverted

Applying field-critical per Dmitrii

Jeff Hillman (jhillman) wrote :

It should also be noted that, from my observation, haproxy wasn't running in the namespace until we set enable-dvr=True on neutron-api.

Jeff Hillman (jhillman) wrote :

So, while the iptables rule allowed an instance deployed with config-drive to reach metadata (so I could SSH in with my key), a newly created instance after that rule was added still shows the message:

[WARNING]: No active metadata service found

And it never tries http://169.254.169.254

So no key was applied and no metadata found, again.

Jeff Hillman (jhillman) wrote :

Console log of instance not getting metadata

https://pastebin.canonical.com/p/qbFwvD4Kgd/

Jeff Hillman (jhillman) wrote :

I went to each DPDK interface on each host and ran:

---

ovs-vsctl set Interface dpdk-<whatever> mtu_request=9000

---

and I was able to boot an instance and it received my SSH key. I still got the "No metadata service found" message, but I guess that's just irrelevant.

Testing without the mangle rule, but with the MTU in place, resulted in the instance not receiving my key (no metadata).

Re-applying the mangle rule one more time as a test resulted in my being able to log in with my key (got metadata).

So the rule is still required.
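
A loop along these lines could apply the same MTU request to every DPDK interface on a host; a minimal sketch, discovering the interfaces via their OVSDB type column:

---

# Sketch: request a 9000-byte MTU on every DPDK interface known to OVS.
for iface in $(sudo ovs-vsctl --bare --columns=name find Interface type=dpdk); do
    sudo ovs-vsctl set Interface "$iface" mtu_request=9000
done

---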

David Ames (thedac) on 2019-06-07
Changed in charm-neutron-openvswitch:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → David Ames (thedac)
milestone: none → 19.07
David Ames (thedac) wrote :

Summary:

We had three distinct issues:

1) MTU setting on the DPDK interfaces
This is resolved in master for neutron-openvswitch and the fix can be seen at [0].

2) An upstream neutron bug where the checksum for metadata traffic is not getting filled [1].
That bug will be tracked separately as LP Bug #1832021.

3) The neutron-openvswitch charm was not setting force_metadata = True and was not installing haproxy, which are requirements for ns-metadata-proxy, which proxies metadata requests from the netns to the nova-api-metadata service.
This is being resolved in [2] and will be the focus of this bug.

[0] https://github.com/juju/charm-helpers/pull/333
[1] https://bugs.launchpad.net/neutron/+bug/1832021
[2] https://review.opendev.org/#/c/664001/

Once [2] lands, the neutron-openvswitch charm will be fully ready at master. It resolves the first and third problems and will NOT require enabling DVR.

Until [1] is resolved upstream, the workaround of setting the checksum fill inside the qdhcp ip netns will remain necessary:

iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill
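
A per-host loop to apply that rule in every qdhcp namespace might look like this; a minimal sketch that only adds the rule where it is not already present:

---

# Sketch: apply the checksum-fill workaround in every qdhcp namespace on this host.
for ns in $(ip netns list | awk '/^qdhcp-/ {print $1}'); do
    sudo ip netns exec "$ns" iptables -t mangle -C OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill 2>/dev/null || \
    sudo ip netns exec "$ns" iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill
done

---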

Reviewed: https://review.opendev.org/664001
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/commit/?id=a1639fe51f48b9a6cdb5185c5bffc4480f4e264b
Submitter: Zuul
Branch: master

commit a1639fe51f48b9a6cdb5185c5bffc4480f4e264b
Author: David Ames <email address hidden>
Date: Fri Jun 7 09:58:11 2019 -0700

    Enable isolated provider network metadata access

    When an isolated provider network with no virtual routers metadata
    access occurs in the qdhcp netns.

    Without the force_metadata option in dhcp_agent.ini and the haproxy
    package installed ns-metadata-proxy is not enabled. ns-metdata-proxy
    sits in the ip netns and proxies requests from 169.254.169.254 to the
    nova-api-metadata service outside the netns.

    This change adds the force_metadata option and installs haproxy when
    enable-local-dhcp-and-metadata is True.

    Closes-Bug: #1831935

    Change-Id: Iaad1501e8d7d58888ef0917b6700d22a7cf05ecf
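
Once the change is deployed, its effect can be spot-checked on a compute unit roughly as follows; a minimal sketch, assuming standard Ubuntu packaging paths:

---

# Sketch: confirm force_metadata is rendered and haproxy is installed.
grep -E '^(force_metadata|enable_isolated_metadata)' /etc/neutron/dhcp_agent.ini
dpkg -l haproxy

---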

Changed in charm-neutron-openvswitch:
status: In Progress → Fix Committed
David Ames (thedac) wrote :

Update:

The charm fix [0] has landed in master and is currently being backported to stable [1].

The checksum bug turned out to be a duplicate of LP Bug #1722584 [2]. The fix (a revert) is in upstream neutron but still needs an SRU into the Ubuntu packaging.

The final work for this bug will continue on LP Bug #1722584 [2].

[0] https://review.opendev.org/#/c/664001/
[1] https://review.opendev.org/664260
[2] https://bugs.launchpad.net/cloud-archive/+bug/1722584

Reviewed: https://review.opendev.org/664260
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/commit/?id=03eb908d822fcdc91f14c22434a0fd0b51761c90
Submitter: Zuul
Branch: stable/19.04

commit 03eb908d822fcdc91f14c22434a0fd0b51761c90
Author: David Ames <email address hidden>
Date: Fri Jun 7 09:58:11 2019 -0700

    Enable isolated provider network metadata access

    When an isolated provider network with no virtual routers metadata
    access occurs in the qdhcp netns.

    Without the force_metadata option in dhcp_agent.ini and the haproxy
    package installed ns-metadata-proxy is not enabled. ns-metdata-proxy
    sits in the ip netns and proxies requests from 169.254.169.254 to the
    nova-api-metadata service outside the netns.

    This change adds the force_metadata option and installs haproxy when
    enable-local-dhcp-and-metadata is True.

    Closes-Bug: #1831935

    Change-Id: Iaad1501e8d7d58888ef0917b6700d22a7cf05ecf
    (cherry picked from commit a1639fe51f48b9a6cdb5185c5bffc4480f4e264b)

James Page (james-page) on 2019-07-15
Changed in charm-neutron-openvswitch:
status: Fix Committed → Fix Released