o-hm0 port doesn't receive IP

Bug #1946325 reported by Radu Malica
This bug affects 5 people
Affects                  Status   Importance  Assigned to  Milestone
OpenStack Octavia Charm  Triaged  High        Unassigned

Bug Description

I have deployed Octavia with charm version 34 (latest stable) and an OVN network on a functional OpenStack Wallaby cloud, based on Ubuntu Focal.

Everything works correctly when deploying an Amphora instance until Octavia tries to check its status by connecting to port 9443.

I had these in my logs:

2021-10-21 12:06:10.183 109815 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='10.21.0.27', port=9443): Max retries exceeded with url: // (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f9e3a580610>, 'Connection to 10.21.0.27 timed out. (connect timeout=10.0)'))

Both ports are up (ACTIVE) in openstack port list:

octavia-health-manager-octavia-13-listen-port:

fa:16:3e:cb:98:49 | ip_address='10.21.0.141', subnet_id='f7d379f9-b1f4-4449-a4f0-faaf349cc410' | ACTIVE |

Amphorae instance port:

fa:16:3e:10:87:64 | ip_address='10.21.0.27', subnet_id='f7d379f9-b1f4-4449-a4f0-faaf349cc410' | ACTIVE |

On the octavia LXD container, the o-hm0 interface has no IPv4 address, and there is neither a DHCP service enabled nor a config file in /etc/dhcp/octavia.

7: o-hm0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:cb:98:49 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::840b:6fff:fe8a:2be6/64 scope link
       valid_lft forever preferred_lft forever
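
The same check can be scripted by looking for an "inet" line in the ip a output. The following is a minimal Python sketch, assuming the output format shown above; the helper name has_ipv4 is made up for illustration and is not part of any charm or Octavia tooling.

```python
import re


def has_ipv4(ip_addr_output: str) -> bool:
    """Return True if the `ip addr` text contains an IPv4 ("inet") line."""
    # "inet" followed by whitespace matches IPv4 entries only;
    # IPv6 entries start with "inet6" and are skipped.
    return any(re.match(r"\s*inet\s", line)
               for line in ip_addr_output.splitlines())


# Output copied from the report: only a link-local IPv6 address is present.
SAMPLE = """\
7: o-hm0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:cb:98:49 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::840b:6fff:fe8a:2be6/64 scope link
       valid_lft forever preferred_lft forever
"""

print(has_ipv4(SAMPLE))  # False
```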

Once I manually added the port's IP address (ip a a 10.21.0.141/24 dev o-hm0), the logs changed:

2021-10-21 12:06:55.231 109815 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='10.21.0.27', port=9443): Max retries exceeded with url: // (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f9e3a5519a0>, 'Connection to 10.21.0.27 timed out. (connect timeout=10.0)'))

2021-10-21 12:07:00.476 109815 INFO octavia.controller.worker.v1.tasks.database_tasks [-] Mark ALLOCATED in DB for amphora: 0bbd52b6-f276-4d8f-b82e-8d0956cb4c70 with compute id 267312e3-6cc0-4a85-ae60-e036bf124490 for load balancer: e1aae613-47b3-4b88-9bd4-0cc3c1a2b7c8

2021-10-21 12:07:12.735 109815 INFO octavia.controller.worker.v1.tasks.database_tasks [-] Mark ACTIVE in DB for load balancer id: e1aae613-47b3-4b88-9bd4-0cc3c1a2b7c8

2021-10-21 12:07:13.685 109815 INFO octavia.controller.queue.v1.endpoints [-] Creating listener '5055f253-97ec-4709-8115-624f0c455560'...

2021-10-21 12:07:21.256 109815 INFO octavia.controller.queue.v1.endpoints [-] Creating pool '6c7bbbd8-3b13-4959-b902-9e8c9795c7d5'...

2021-10-21 12:07:28.420 109815 INFO octavia.controller.queue.v1.endpoints [-] Creating member '8663ec9d-873a-4c7b-a962-b916cebe127b'...

2021-10-21 12:07:42.795 109815 INFO octavia.controller.queue.v1.endpoints [-] Creating health monitor '016e6586-b1eb-4241-bb71-ae3e24d24178'...

After this, I can ping the Amphora instance:

root@juju-b73276-4-lxd-15:~# ping 10.21.0.27
PING 10.21.0.27 (10.21.0.27) 56(84) bytes of data.
64 bytes from 10.21.0.27: icmp_seq=1 ttl=64 time=2.14 ms
64 bytes from 10.21.0.27: icmp_seq=2 ttl=64 time=1.25 ms
64 bytes from 10.21.0.27: icmp_seq=3 ttl=64 time=0.650 ms
64 bytes from 10.21.0.27: icmp_seq=4 ttl=64 time=0.567 ms

Tags: sts
Radu Malica (radumalica)
summary: - hm0 port binding_failed on Wallaby/Focal
+ o-hm0 port doesn't receive IP
Radu Malica (radumalica)
description: updated
Paul Goins (vultaire) wrote :

Hello Radu,

I just hit a similar issue and have spent some time looking at this.

It looks like there's been a bug fix for a different bug, https://bugs.launchpad.net/charm-octavia/+bug/1893446, which will cause the octavia units to go into a blocked state if the port isn't set up right. That's in the stable/21.10 release of the OpenStack charms, which would currently mean cs:octavia-37.

This caused me to review code and docs a bit, and it seems to me like we both may want to run the configure-resources action. Quoting from the charm's README:

> By executing the configure-resources action the charm will create the resources required for operation of the Octavia service. If you want to manage these resources yourself you must set the create-mgmt-network configuration option to False.
>
> You can at any time use the configure-resources action to prompt immediate resource discovery.

Hope this helps; I'm going to see if we can run this on my side to resolve the issue.

Best Regards,
Paul Goins

Paul Goins (vultaire) wrote :

Unfortunately, this didn't work for me.

Some additional notes from my side: the associated ports are created in OpenStack and enabled, and the MAC address matches that of the o-hm0 adapter on the sample node I was checking, but the IP address isn't being assigned. I am not sure why.

Edward Hope-Morley (hopem) wrote :

Health manager ports not getting an IP address can be the result of more than one issue, but the first thing to check is whether the DHCP server is running and reachable. For v6 networks in neutron-ovs environments this is provided by radvd running in the network's router namespace, so I would track that down and (a) ensure it is running and (b) ensure your octavia unit can see v6 traffic coming from that network.

Hemanth Nakkina (hemanth-n) wrote :

I reproduced the issue on Xena and applied the following workaround:

On the ovn-nbdb leader unit, I ran the following commands:

$ CIDR_UUID=$(ovn-nbctl --bare --columns=_uuid find dhcp_options cidr="10.100.0.0/24")
$ ovn-nbctl lsp-set-dhcpv4-options <o-hm0_port_id_from_neutron> $CIDR_UUID

Here 10.100.0.0/24 is the LB management network.

I then ran dhclient o-hm0 on the octavia unit.
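
For anyone who needs to repeat this on several ports, the two ovn-nbctl steps can be wrapped in a small helper. This is only a sketch under assumptions: the function names are hypothetical, it assumes exactly one dhcp_options row exists for the management CIDR, and it must run where ovn-nbctl can reach the northbound DB. It uses the same ovn-nbctl invocations as above and nothing else.

```python
import subprocess
from typing import Callable, List


def run_cmd(argv: List[str]) -> str:
    """Run a command and return its stripped stdout (raises on failure)."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def set_dhcpv4_options(port_id: str, mgmt_cidr: str,
                       run: Callable[[List[str]], str] = run_cmd) -> str:
    """Attach the dhcp_options row for mgmt_cidr to the given logical switch port.

    port_id is the Neutron port ID of o-hm0; mgmt_cidr is the LB management
    network CIDR, e.g. "10.100.0.0/24".
    """
    # Step 1: look up the UUID of the dhcp_options row for the CIDR.
    out = run(["ovn-nbctl", "--bare", "--columns=_uuid",
               "find", "dhcp_options", f"cidr={mgmt_cidr}"])
    uuids = out.split()
    if len(uuids) != 1:
        # Assumption: one row per CIDR; refuse to guess otherwise.
        raise RuntimeError(
            f"expected one dhcp_options row for {mgmt_cidr}, got {len(uuids)}")
    # Step 2: set that row as the DHCPv4 options of the port.
    run(["ovn-nbctl", "lsp-set-dhcpv4-options", port_id, uuids[0]])
    return uuids[0]
```

The run parameter exists only so the logic can be exercised without a live OVN deployment; dhclient o-hm0 still has to be run on the octavia unit afterwards.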

tags: added: sts
Hemanth Nakkina (hemanth-n) wrote :

Analysis so far:

dhcpv4-options are not set on the LSP (logical switch port) if the device_owner has the prefix "neutron:".

Please see the following code to confirm the above statement:
https://opendev.org/openstack/neutron/src/branch/stable/xena/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L185-L189
https://opendev.org/openstack/neutron/src/branch/master/neutron/common/ovn/utils.py#L163-L169
https://opendev.org/openstack/neutron/src/branch/master/neutron/common/ovn/utils.py#L125-L127

The octavia charm sets device_owner to neutron:LOADBALANCERV2:
https://opendev.org/openstack/charm-octavia/src/branch/master/src/lib/charm/openstack/api_crud.py#L362
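
To make the rule concrete, here is a simplified Python model of the behaviour described above. It is not Neutron's actual code: the constant and function names are made up for illustration, and only the "neutron:" prefix named in this analysis is modelled (the real prefix list in Neutron may contain more entries).

```python
# Assumption: only the prefix discussed in this bug is listed here.
SKIPPED_OWNER_PREFIXES = ("neutron:",)


def gets_dhcpv4_options(device_owner: str) -> bool:
    """Simplified model: ports whose device_owner starts with a skipped
    prefix do not get dhcpv4-options set on their logical switch port."""
    return not device_owner.startswith(SKIPPED_OWNER_PREFIXES)


for owner in ("neutron:LOADBALANCERV2",   # what the charm sets
              "Octavia:LOADBALANCERV2",   # manual test port
              "Octavia:health-mgr"):      # what DevStack uses
    print(f"{owner}: {gets_dhcpv4_options(owner)}")
```

Under this model the charm's value is the only one of the three that ends up without DHCPv4 options, which matches the flows observed on the manually created ports.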

I confirmed this theory by creating a port manually and looking at ovn-sbctl lflow-list.

I see the following entries if I create the port manually with a device_owner that does not start with "neutron:":

OpenStack command:
openstack port create --network lb-mgmt-net --device-owner Octavia:LOADBALANCERV2 --security-group fe575293-a434-48fd-9b06-74770cab1a6f testoctavia4

SB Flows:
table=17(ls_in_dhcp_options ), priority=100 , match=(inport == "74afdfcd-7291-476f-bfae-877264cc0b61" && eth.src == fa:16:3e:db:75:61 && ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && udp.src == 68 && udp.dst == 67), action=(reg0[3] = put_dhcp_opts(offerip = 10.100.0.233, classless_static_route = {169.254.169.254/32,10.100.0.2, 0.0.0.0/0,10.100.0.1}, dns_server = {10.5.0.2}, domain_name = "octaviaovn.stsstack.qa.1ss.", lease_time = 43200, mtu = 1492, netmask = 255.255.255.0, router = 10.100.0.1, server_id = 10.100.0.1); next;)
table=17(ls_in_dhcp_options ), priority=100 , match=(inport == "74afdfcd-7291-476f-bfae-877264cc0b61" && eth.src == fa:16:3e:db:75:61 && ip4.src == 10.100.0.233 && ip4.dst == {10.100.0.1, 255.255.255.255} && udp.src == 68 && udp.dst == 67), action=(reg0[3] = put_dhcp_opts(offerip = 10.100.0.233, classless_static_route = {169.254.169.254/32,10.100.0.2, 0.0.0.0/0,10.100.0.1}, dns_server = {10.5.0.2}, domain_name = "octaviaovn.stsstack.qa.1ss.", lease_time = 43200, mtu = 1492, netmask = 255.255.255.0, router = 10.100.0.1, server_id = 10.100.0.1); next;)

Whereas when I create a port using the command below, I don't see the above flows:
openstack port create --network lb-mgmt-net --device-owner neutron:LOADBALANCERV2 --security-group fe575293-a434-48fd-9b06-74770cab1a6f testoctavia5

DevStack uses Octavia:health-mgr as the device_owner when creating the o-hm0 port:
https://github.com/openstack/octavia/blob/master/devstack/plugin.sh#L469

Please note this issue only occurs if the LB management network is IPv4. I am going to work backwards and see in which release the problem started.

Changed in charm-octavia:
importance: Undecided → High
Changed in charm-octavia:
assignee: nobody → Hemanth Nakkina (hemanth-n)
tags: added: seg
removed: sts
tags: added: sts
removed: seg
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-octavia (master)
Changed in charm-octavia:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-octavia (master)

Change abandoned by "Hemanth N <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-octavia/+/841365
Reason: I am not working on this patch. Please feel free to continue on this patch if interested.

Changed in charm-octavia:
assignee: Hemanth Nakkina (hemanth-n) → nobody
status: In Progress → Triaged