Toggling DHCP on and off in a subnet causes new instances to be unreachable

Bug #1918914 reported by Arnoud de Jonge
This bug affects 2 people

Affects: neutron
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

After DHCP was turned on and off again on our network, new instances were not reachable. We found that they were still trying to get their network configuration via DHCP after that.

We run OpenStack Ussuri installed with OpenStack Kolla, with OVN networking enabled. force_config_drive is also set to true.
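
For reference, the relevant Nova option as set in our deployment (force_config_drive is a [DEFAULT] option in nova.conf):

  [DEFAULT]
  force_config_drive = true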

Steps to reproduce:

  openstack network create test
  openstack subnet create --no-dhcp --subnet-range 192.168.0.0/24 --network test test
  openstack router create test
  openstack router set test --external-gateway public
  openstack router add subnet test test

  openstack server create --network test --image e83d66e7-776a-4b59-a583-97dfcc5799f6 --flavor s3.small --key-name noudssh test-1

Network metadata:

{
   "links" : [
      {
         "ethernet_mac_address" : "fa:16:3e:b1:f6:ee",
         "id" : "tap7608d5b5-bd",
         "mtu" : 8942,
         "type" : "ovs",
         "vif_id" : "7608d5b5-bdc5-4215-a39c-acd8fa1318c2"
      }
   ],
   "networks" : [
      {
         "id" : "network0",
         "ip_address" : "192.168.0.237",
         "link" : "tap7608d5b5-bd",
         "netmask" : "255.255.255.0",
         "network_id" : "66a6378c-3e2d-4814-9412-4a784a81e516",
         "routes" : [
            {
               "gateway" : "192.168.0.1",
               "netmask" : "0.0.0.0",
               "network" : "0.0.0.0"
            }
         ],
         "services" : [],
         "type" : "ipv4"
      }
   ],
   "services" : []
}

Toggle DHCP and create new server:

  openstack subnet set --dhcp test
  openstack subnet set --no-dhcp test
  openstack server create --network test --image e83d66e7-776a-4b59-a583-97dfcc5799f6 --flavor s3.small --key-name noudssh test-2
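
Before booting test-2, the toggle can be confirmed with a hedged check (enable_dhcp is the subnet field to look at; it should print False here):

  openstack subnet show test -c enable_dhcp -f value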

Network metadata:

{
   "links" : [
      {
         "type" : "ovs",
         "id" : "tapee8f020a-1f",
         "vif_id" : "ee8f020a-1f2e-4db3-aab5-f6387fb45ba6",
         "ethernet_mac_address" : "fa:16:3e:94:05:35",
         "mtu" : 8942
      }
   ],
   "services" : [],
   "networks" : [
      {
         "network_id" : "66a6378c-3e2d-4814-9412-4a784a81e516",
         "link" : "tapee8f020a-1f",
         "type" : "ipv4_dhcp",
         "id" : "network0"
      }
   ]
}

As DHCP is now off, this instance stays unreachable.

I tried the same in a cluster with OVN disabled and it worked without any problem, so this seems to be OVN-related.
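
One way to see what an instance actually received is to read network_data.json straight off the config drive inside the guest (a hedged sketch; config drives carry the config-2 filesystem label and the file lives under openstack/latest/):

  sudo mount -o ro /dev/disk/by-label/config-2 /mnt
  python3 -m json.tool /mnt/openstack/latest/network_data.json
  sudo umount /mnt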

Tags: ovn
description: updated
Revision history for this message
Brian Haley (brian-haley) wrote :

Since this report is against Ussuri or later, I'm moving it to the neutron component, since that's where the OVN driver code now lives.

affects: networking-ovn → neutron
tags: added: ovn
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Arnoud:

I tried the steps you provided. When I create the first VM (test-1), there is no DHCP reply to the client. Once I enable DHCP in the subnet, the VM receives an IP address.

Then I disable the DHCP option in the subnet again and create the second VM. Of course, the second VM (test-2) does not receive an IP address, but the first one does not lose the one it was given and still has connectivity.

Am I forgetting something when trying to reproduce this issue?

BTW, the metadata namespace is correctly created and the metadata agent is serving data from this namespace (at least to the first VM, which receives an IP once DHCP is enabled again).
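
For anyone reproducing this, those checks boil down to something like the following (a hedged sketch; ovnmeta-<network_id> is the OVN metadata agent's namespace naming convention, and 169.254.169.254 is the standard metadata address):

  # On the compute node hosting the VM
  ip netns list | grep ovnmeta
  # From inside the guest
  curl http://169.254.169.254/openstack/latest/network_data.json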

Regards.

Revision history for this message
Arnoud de Jonge (arnoud-dejonge-4) wrote :

We're using the config drive, so test-1 and test-2 should get their IP through the config drive. In our case test-1 gets an IP as expected. Then I toggle DHCP on and off and launch test-2. I would expect test-2 to get an IP the same way as test-1, but it doesn't, and when I check the config drive for this one (see the JSON in my original post) it is now set to DHCP, which of course will not work.

When I delete the DHCP port from the subnet and launch a new server, it does get a static IP again, the same way I saw for test-1.
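
A hedged sketch of that workaround with the openstack CLI (network name "test" as in the reproduction steps; list first, then delete):

  # The DHCP port is the one owned by network:dhcp
  openstack port list --network test --device-owner network:dhcp -f value -c ID
  # Delete it, substituting the ID printed above
  openstack port delete <port-id>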

Hongbin Lu (hongbin.lu)
Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Max Khon (fjoe) wrote (last edit):

When you turn DHCP off, the instance is expected to get its IP address via the metadata service.

What are the Port_Binding properties of the localport for your network in this scenario?

In my case I see that the localport has an empty external_ids:neutron:cidrs property, which is why neutron-ovn-metadata-agent ignores it:

---
root@eq-os1:~# ovn-sbctl find Port_Binding type=localport
_uuid : b6329cbe-e80f-48a3-921d-e1031afd85d8
chassis : []
datapath : 097732e0-85d1-4744-a9c6-bafa0d861700
encap : []
external_ids : {"neutron:cidrs"="", "neutron:device_id"=ovnmeta-81954d74-51e6-4598-b6b6-3da3832f20df, "neutron:device_owner"="network:dhcp", "neutron:network_name"=neutron-81954d74-51e6-4598-b6b6-3da3832f20df, "neutron:port_name"="", "neutron:project_id"=f11221fbfbb844209cd49c7ca3a12a00, "neutron:revision_number"="1", "neutron:security_group_ids"=""}
gateway_chassis : []
ha_chassis_group : []
logical_port : "a557f47a-dae7-4150-96c2-71abbf48b84b"
mac : ["fa:16:3e:06:ed:9b"]
nat_addresses : []
options : {requested-chassis=""}
parent_port : []
tag : []
tunnel_key : 2
type : localport
virtual_parent : []
root@eq-os1:~#
---
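
A hedged one-liner to pull just that property out of every localport (--bare and --columns are standard ovn-sbctl output options; the grep pattern is illustrative):

  ovn-sbctl --bare --columns=external_ids find Port_Binding type=localport \
      | tr ',' '\n' | grep 'neutron:cidrs'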

Corresponding code in neutron-ovn-metadata-agent is:

--- neutron/agent/ovn/metadata/agent.py ---
        # If there's no metadata port or it doesn't have a MAC or IP
        # addresses, then tear the namespace down if needed. This might happen
        # when there are no subnets yet created so metadata port doesn't have
        # an IP address.
        if not (port and port.mac and
                port.external_ids.get(ovn_const.OVN_CIDRS_EXT_ID_KEY, None)):
            LOG.debug("There is no metadata port for network %s or it has no "
                      "MAC or IP addresses configured, tearing the namespace "
                      "down if needed", net_name)
            self.teardown_datapath(datapath, net_name)
            return
---

When I enable DHCP on this subnet (of an external provider network), neutron:cidrs becomes non-empty and metadata gets correctly provisioned.

Revision history for this message
yatin (yatinkarel) wrote :

It seems the original issue is a duplicate of [1], already fixed with [2][3] in Ussuri and available with tag 16.4.2. I think this can be closed, and it can be reopened if it still happens with >= 16.4.2.

The issue mentioned by @Max Khon seems to be expected behavior (no metadata provisioning if DHCP is disabled); at least the workaround is to use a config drive. Anyway, there is already a bug for it [4], so it can be discussed separately to see whether this use case can be supported.

[1] https://bugs.launchpad.net/networking-ovn/+bug/1950180
[2] https://review.opendev.org/c/openstack/neutron/+/813411
[3] https://review.opendev.org/c/openstack/neutron/+/812337
[4] https://bugs.launchpad.net/neutron/+bug/1976366

Changed in neutron:
status: Confirmed → Invalid