octavia lb operating status is offline

Bug #2046877 reported by zhengrui
This bug affects 1 person
Affects   Status      Importance   Assigned to   Milestone
octavia   Confirmed   Medium       Unassigned

Bug Description

My LB works, but its operating_status is OFFLINE. The LB's provider is amphora and the network backend is OVN. The health-manager log shows something that may be useful:

2023-12-19 14:59:34.252 1037110 DEBUG octavia.amphorae.drivers.health.heartbeat_udp [-] Received packet from ('172.16.2.193', 30377) dorecv /usr/lib/python3.6/site-packages/octavia/amphorae/drivers/health/heartbeat_udp.py:95
2023-12-19 14:59:34.253 1037110 WARNING octavia.amphorae.drivers.health.heartbeat_udp [-] Health Manager experienced an exception processing a heartbeat message from ('172.16.2.193', 30377). Ignoring this packet. Exception: 'NoneType' object has no attribute 'encode'

But when I add heartbeat_key to octavia.conf, it shows that the HMACs are not equal, and now I have no idea what to do about it.

Here is my octavia.conf:

[DEFAULT]
debug = True
transport_url = rabbit://openstack:RABBIT_PASS@controller
[api_settings]
bind_host = 0.0.0.0
bind_port = 9876
enabled_provider_drivers = ovn:'Octavia OVN driver',amphora:'The Octavia Amphora driver'
healthcheck_enabled = True
healthcheck_refresh_interval = 5
[database]
connection = mysql+pymysql://octavia:OCTAVIA_DBPASS@controller/octavia
[health_manager]
bind_port = 5555
bind_ip = 172.16.2.2
controller_ip_port_list = 172.16.2.2:5555
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = octavia
password = OCTAVIA_PASS
[certificates]
ca_private_key = /etc/octavia/certs/private/server_ca.key.pem
ca_certificate = /etc/octavia/certs/server_ca.cert.pem
server_certs_key_passphrase = insecure-key-do-not-use-this-key
ca_private_key_passphrase = not-secure-passphrase
[haproxy_amphora]
server_ca = /etc/octavia/certs/server_ca-chain.cert.pem
client_cert = /etc/octavia/certs/private/client.cert-and-key.pem
bind_host = 0.0.0.0
bind_port = 9443
lb_network_interface = o-hm0
[controller_worker]
client_ca = /etc/octavia/certs/client_ca.cert.pem
amp_image_tag = Amphora
amp_flavor_id = 100
amp_boot_network_list=8d310f13-4c3c-4688-a2ff-e00b6bba7ae7
amp_secgroup_list = 58b02d55-5374-490c-9bcd-40e76b79e427
network_driver = allowed_address_pairs_driver
compute_driver = compute_nova_driver
amphora_driver = amphora_haproxy_rest_driver
[oslo_messaging]
topic = octavia_prov
[service_auth]
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = octavia
password = OCTAVIA_PASS
[neutron]
auth_url = http://controller:5000
auth_type = password
project_domain_name = default
user_domain_name = default
region_name = RegionOne
project_name = service
username = neutron
password = NEUTRON_PASS
[driver_agent]
status_socket_path = /var/run/octavia/status.sock
stats_socket_path = /var/run/octavia/stats.sock
get_socket_path = /var/run/octavia/get.sock
enabled_provider_agents = ovn,amphora
[ovn]
ovn_nb_connection = tcp:192.168.122.94:6641
ovn_sb_connection = tcp:192.168.122.94:6642
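
(Note that heartbeat_key is not set anywhere in this file. It belongs in the [health_manager] section, and the same value must be used by the amphora agents; the value below is only an illustrative placeholder:)

[health_manager]
heartbeat_key = insecure-example-heartbeat-key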

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi @zhengrui, are you using Juju to deploy Octavia, e.g. something like "juju deploy octavia" or "juju deploy <bundle.yaml>"? If not, then we'll need to re-allocate the bug to octavia. Thanks.

Changed in charm-octavia:
status: New → Incomplete
zhengrui (zhengrui)
affects: charm-octavia → octavia
zhengrui (zhengrui)
Changed in octavia:
status: Incomplete → New
Changed in octavia:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

Thanks for reporting this issue.
What deployment tool do you use? Is it a manual install?

I see 2 potential issues in this story:

1. We don't mention heartbeat_key in the installation guide.

There is a proposal for a new CentOS guide, which also specifies that heartbeat_key should be set in the Ubuntu guide, but it is still in review:
https://review.opendev.org/c/openstack/octavia/+/784022

2. When heartbeat_key is not set, the amphora agent and the services behave differently.

When heartbeat_key is not set, the value is None, but:
- the amphora agent signs the data with a "None" key (note the double quotes: it is a string)
- the health-manager verifies the data with a None key (not a string), hence the 'NoneType' object has no attribute 'encode' warning (see the sketch after this comment)

We could either ensure that a None key is handled in the same way on both sides, or add a big warning in the HM when the key is not set.
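
As a minimal illustration of point 2 (not Octavia's actual code), assuming both sides HMAC-sign the heartbeat payload with the configured key, and using a placeholder payload:

import hashlib
import hmac

payload = b"heartbeat-message"  # placeholder for a real heartbeat packet

# Amphora agent side: an unset key ends up as the string "None"
agent_key = str(None)  # -> "None"
sent_hmac = hmac.new(agent_key.encode("utf-8"), payload, hashlib.sha256).hexdigest()

# Health-manager side: an unset key stays as the None object
hm_key = None
try:
    hmac.new(hm_key.encode("utf-8"), payload, hashlib.sha256)
except AttributeError as exc:
    # 'NoneType' object has no attribute 'encode' -- the warning seen in the log above
    print(exc)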

Revision history for this message
zhengrui (zhengrui) wrote : Re: [Bug 2046877] octavia lb operating status is offline

@Gregory Thiemonge Thanks for your reply. Yes, I deployed Octavia manually on CentOS. I did not add heartbeat_key at first, and I noticed the "Exception: 'NoneType' object has no attribute 'encode'" warning in the log, so I followed advice I found online and added heartbeat_key to octavia.conf.
So the actual question is that my LB is OFFLINE and I cannot find where the problem is. Here is some information I collected in the attachments; I hope you can give me some suggestions.


Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

This WARNING:

2023-12-20 16:30:52.904 1194183 WARNING octavia.amphorae.backends.health_daemon.status_message [-] calculated hmac(hex=False): 1f3634c13fc0ef9b7f999a52e0bc80b7f587b8013a17782c58cdbae5cd5b62b6 not equal to msg hmac: 6139653333346536363264333261343363356335366233663734313265333536 dropping packet: octavia.common.exceptions.InvalidHMACException: calculated hmac: 35646161373335303962643234646363616532323137316637636130343934306430663831626536303162653661376131393132363530643334316362303866 not equal to msg hmac: 36356334376532653739363466623330663131376463316561623431626439376139653333346536363264333261343363356335366233663734313265333536 dropping packet

indicates that the heartbeat_key value is not the same on the octavia controllers and in the amphora instances.

It's a common problem when you update the value while some amphorae have already been created.

To fix it, there are 2 methods:
- update the settings in the existing amphorae: `openstack loadbalancer amphora configure <amphora_id>` (you can get the list of amphorae with `openstack loadbalancer amphora list`)
- recreate new amphorae for the existing LBs: `openstack loadbalancer amphora failover <amphora_id>` (note: `openstack loadbalancer failover <lb_id>` works too); a scripted version of this is sketched after this list
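
As a small scripted variant of the second method (a sketch, not part of Octavia; the "id" field name in the CLI's JSON output is an assumption):

import json
import subprocess

# List all amphorae with the CLI command mentioned above;
# -f json asks for machine-readable output.
result = subprocess.run(
    ["openstack", "loadbalancer", "amphora", "list", "-f", "json"],
    check=True, stdout=subprocess.PIPE)
amphorae = json.loads(result.stdout)

for amp in amphorae:
    amp_id = amp["id"]  # "id" is assumed to be the column name in the JSON output
    # Rebuild the amphora so it picks up the heartbeat_key currently
    # configured on the controllers.
    subprocess.run(
        ["openstack", "loadbalancer", "amphora", "failover", amp_id],
        check=True)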

Revision history for this message
zhengrui (zhengrui) wrote :

@Gregory Thiemonge Thanks for your help. I managed to turn the status to ONLINE. "openstack loadbalancer amphora configure" did not help, but the "openstack loadbalancer amphora failover" command fixed it.
Thanks again for your help.

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

Great news!
AFAIK "amphora configure" is supposed to update the heartbeat_key; we will also take a look at it.
Thanks!
