IPv6 mgmt network not working, Octavia can't talk to Amphora instance

Bug #1911788 reported by Hybrid512
Affects: OpenStack Octavia Charm
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I'm trying to make the Octavia charm work but I have run into a few issues.
My deployment appears to go well: everything is green, and I followed the Octavia charm guide to set it up automatically (i.e. auto-creating the Amphora image with diskimage retrofit and configuring the Octavia resources automatically through the charm).
Again, no errors; everything apparently goes well.

Once connected to Horizon, I spawn a few VMs and then try to create a new LB for them.
First, until an LB has been created, I constantly get a very annoying notification popup saying there is no LB available for listing.

Then I create an LB, filling in everything to get a very simple round-robin LB over 2 VMs on their private subnet.
Once completed, the LB is created but it stays stuck in "Offline/Pending Create" status until it ends up in an "Error" state, in which I can destroy it (it takes quite a few minutes before reaching that state).

While it was in the "Pending Create" state, I checked the Octavia unit logs and saw something like this:

2021-01-14 16:21:00.490 9938 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='fc00:fa21:3d5c:9cfd:f816:3eff:fef1:7fa1', port=9443): Max retries exceeded with url: // (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f61a01f9f10>, 'Connection to fc00:fa21:3d5c:9cfd:f816:3eff:fef1:7fa1 timed out. (connect timeout=10.0)'))

I then checked the instance list and found that the Amphora instance had been created correctly (I checked the console and it is fully booted, apparently with no errors) and that it has the IPv6 address fc00:fa21:3d5c:9cfd:f816:3eff:fef1:7fa1 attached to it.

I then SSHed to the Octavia units and tried to ping6 this IP, and got no answer from any of the units, so there seems to be an issue with the IPv6 management overlay network.

I tried to curl https://[fc00:fa21:3d5c:9cfd:f816:3eff:fef1:7fa1]:9443 and got no answer either.
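For reference, the connectivity checks from an Octavia unit looked roughly like the following (a sketch; the health-manager port is usually plugged into the unit as o-hm0, but the interface name may differ on other deployments):

# run on an Octavia unit, e.g. via `juju ssh octavia/0`
ip -6 addr show                                        # confirm the unit has an address on the lb-mgmt IPv6 subnet (o-hm0)
ping6 -c 3 fc00:fa21:3d5c:9cfd:f816:3eff:fef1:7fa1     # no replies in my case
curl -k -m 10 https://[fc00:fa21:3d5c:9cfd:f816:3eff:fef1:7fa1]:9443/
# even a TLS/certificate error back from the amphora would prove L3 connectivity; a timeout means there is none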

Here is a detailed description of my setup:
* Charmed OpenStack (20.10 stable) deployed through MAAS (2.9.0) with Juju (2.8.7), with the Focal series for every unit (and OpenStack Ussuri, so "distro" openstack-origin)
* 7 machines (3 control planes, 4 compute nodes)
* HA mode for every control plane application, in conjunction with the HACluster charm and VIPs assigned
* 3 spaces:
  - "pxe": non-routed untagged network for provisioning only
  - "ost-int": routed /24 VLAN subnet for the OpenStack internal network
  - "ost-pub": routed /24 VLAN subnet for both the OpenStack admin and public networks

"ost-int" and "ost-pub" are routed, they can talk to each other, there is no firewall in between.

You can find my exported bundle attached.

I have tried to deploy this bundle dozens of times, with multiple spaces or only one space, but that didn't change anything; I was never able to get Octavia working properly.
The Amphora is created and the LB is created, but the Octavia API can't talk to the Amphora instance.

Frode Nordahl (fnordahl) wrote :

Thank you for the bug. To diagnose communication issues between the Octavia units and the cloud we must start by looking at the state of tunnels and port bindings.

Take a look at the Neutron ports created by the Octavia charm, for example `octavia-health-manager-octavia-0-listen-port`:
- Does the `binding_host_id` match the FQDN of the Octavia container?
- Does the `binding_vif_type` field say 'ovs' or does it say 'binding_failed'?
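A quick way to check both fields in one go (a sketch; the port name below matches the one created for unit octavia/0 and will differ for the other units):

openstack port show octavia-health-manager-octavia-0-listen-port \
    -c binding_host_id -c binding_vif_type
# binding_host_id should be the container FQDN, e.g. juju-xxxxxx-0-lxd-NN.maas
# binding_vif_type should read 'ovs'; 'binding_failed' means the port could not be bound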

Since you are using OVN, you will also have rich logging in /var/log/ovn/ovn-controller.log, which would show evidence of any shortname vs. FQDN issues.
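Something along these lines should surface how the chassis registered itself (the grep pattern is only a suggestion):

# on each Octavia container
sudo grep -iE 'chassis|hostname' /var/log/ovn/ovn-controller.log | tail -n 20
hostname -f    # compare with the name ovn-controller registered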

There is an ongoing issue with deploying OVS/OVN in LXD containers on MAAS, as detailed in bug 1896630. The root of the issue is somewhere below the charms, but the linked bug contains steps to work around it.
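A minimal way to check whether a container is affected, assuming shell access to the unit (the actual workaround steps are in the linked bug):

# inside the Octavia container
hostname -f                                      # should return the FQDN (e.g. juju-xxxxxx-N-lxd-NN.maas), not just the short name
sudo ovs-vsctl get Open_vSwitch . external_ids   # shows the hostname ovn-controller registered for this chassis, if set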

Changed in charm-octavia:
status: New → Incomplete
Hybrid512 (walid-moghrabi) wrote :

Hi,

I took a look at the bug report you gave me ... I don't have the same behavior but it might be related.

------------------------------------------------------------------
openstack port list | grep octavia
| 6fd3e411-0f55-4f57-a49b-8978ac7045be | octavia-health-manager-octavia-0-listen-port | fa:16:3e:4b:c6:48 | ip_address='fc00:bee5:427a:2b79:f816:3eff:fe4b:c648', subnet_id='e7e22722-af55-4b8d-b126-d5cf2e037c0d' | DOWN |
| 8878130c-c8a6-44f1-a668-d669b00a8e0d | octavia-health-manager-octavia-2-listen-port | fa:16:3e:bd:2c:83 | ip_address='fc00:bee5:427a:2b79:f816:3eff:febd:2c83', subnet_id='e7e22722-af55-4b8d-b126-d5cf2e037c0d' | DOWN |
| fdce70ed-b861-4473-a483-2024b2733c75 | octavia-health-manager-octavia-1-listen-port | fa:16:3e:a0:1d:a2 | ip_address='fc00:bee5:427a:2b79:f816:3eff:fea0:1da2', subnet_id='e7e22722-af55-4b8d-b126-d5cf2e037c0d' | DOWN |
------------------------------------------------------------------

------------------------------------------------------------------
openstack network agent list | grep lxd
| juju-37c2ba-2-lxd-16.maas | OVN Controller agent | juju-37c2ba-2-lxd-16.maas | | :-) | UP | ovn-controller |
| juju-37c2ba-0-lxd-18.maas | OVN Controller agent | juju-37c2ba-0-lxd-18.maas | | :-) | UP | ovn-controller |
| juju-37c2ba-1-lxd-17.maas | OVN Controller agent | juju-37c2ba-1-lxd-17.maas | | :-) | UP | ovn-controller |
------------------------------------------------------------------

------------------------------------------------------------------
openstack port show 6fd3e411-0f55-4f57-a49b-8978ac7045be
+-------------------------+------------------------+
| Field                   | Value                  |
+-------------------------+------------------------+
| admin_state_up          | UP                     |
| allowed_address_pairs   |                        |
| binding_host_id         | juju-37c2ba-2-lxd-16   |
| binding_profile         |                        |
| binding_vif_details     |                        |
| binding_vif_type        | binding_failed         |
| binding_vnic_type       | normal                 |
| created_at              | 2021-01-15T13:17:51Z   |
| data_plane_status       | None                   |
| description             |                        |
| device_id               |                        |
| device_owner            | neutron:LOADBALANCERV2 |
...

Frode Nordahl (fnordahl) wrote :

Thank you for providing more information. As you can see from the port details, you are hitting exactly the same issue:
| binding_host_id | juju-37c2ba-2-lxd-16

The binding_host_id should have been set to the FQDN of the Octavia unit, but as explained in the referenced bug that is not happening because the container cannot establish its FQDN at initial deploy or after subsequent reboots.
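A quick way to see the mismatch side by side (a sketch; run the first command on the affected unit and the second against the cloud API):

hostname -f                                   # expected by Neutron/OVN: juju-37c2ba-2-lxd-16.maas
openstack port show 6fd3e411-0f55-4f57-a49b-8978ac7045be -c binding_host_id -c binding_vif_type
# a short-name binding_host_id together with binding_vif_type=binding_failed matches bug 1896630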

I will mark this bug as a duplicate of the referenced bug.
