OpenStack-Ansible

Haproxy role fails when using FQDN

Bug #2006986 reported by Antoine Thys on 2023-02-11

This bug affects 1 person

Affects            Status  Importance  Assigned to  Milestone
OpenStack-Ansible  New     Undecided   Unassigned

Bug Description

When I try to use an FQDN in internal_lb_vip_address and external_lb_vip_address, the haproxy role uses this value directly for the frontend binding instead of an IP address.
HAProxy cannot use an FQDN for binding and expects an IP.

But if I run the playbook multiple times, the error no longer appears, yet haproxy doesn't work correctly.

Tags:

Antoine Thys (thystips) wrote on 2023-02-11:

And if I change internal_lb_vip_address to an IP address, I get an error because the certificate doesn't exist (I use Let's Encrypt with dns-01).

After creating the missing folder, I get another error from Ansible:

fatal: [node0]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: {{ haproxy_ssl_cert_path ~ '/haproxy_' ~ ansible_facts['hostname'] ~ '-' ~ item_name }}: {{ ('interface'in item and item['interface'] is truthy) | ternary(item['address'] ~ '-' ~ item_interface, item['address']) }}: {{ item['interface'] }}: 'dict object' has no attribute 'interface'\n\nThe error appears to be in '/etc/ansible/roles/haproxy_server/handlers/main.yml': line 16, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: regen pem\n ^ here\n"}
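The error quotes the template chain that builds the pem path. One plausible reading (an assumption, not confirmed in this thread) is that Jinja2 evaluates filter arguments before the filter runs, so both values passed to `ternary` are rendered even when the `'interface' in item` guard is false, and `item['interface']` then fails for a bind entry that has no interface. A minimal Python emulation of that eager evaluation; `ternary` and `pem_item_name` are hypothetical stand-ins, not Ansible code:

```python
# Emulates eager argument evaluation in a Jinja2 filter call such as
#   cond | ternary(a, b)
# Python, like Jinja2 filter calls, evaluates every argument before the call.


def ternary(condition, true_value, false_value):
    """Hypothetical stand-in for Ansible's `ternary` filter."""
    return true_value if condition else false_value


def pem_item_name(item):
    # Mirrors: ('interface' in item and item['interface'] is truthy)
    #          | ternary(item['address'] ~ '-' ~ item['interface'], item['address'])
    has_interface = "interface" in item and bool(item["interface"])
    # Both branch values are computed here, before ternary() runs, so
    # item["interface"] is looked up even when has_interface is False.
    return ternary(has_interface, item["address"] + "-" + item["interface"], item["address"])


print(pem_item_name({"address": "10.2.200.205", "interface": "br-mgmt"}))
# -> 10.2.200.205-br-mgmt

try:
    pem_item_name({"address": "10.2.200.205"})
except KeyError as exc:
    print("lookup failed:", exc)
# -> lookup failed: 'interface'
```

If this reading is right, the guard in the template does not protect the lookup; only a lazily evaluated conditional (for example an inline `if ... else ...` expression) would.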

Dmitriy Rabotyagov (noonedeadpunk) wrote on 2023-02-11 (last edit on 2023-02-11):

Hi Antoine,

Are you sure that haproxy cannot be given an FQDN to listen on? Literally all my deployments have haproxy bound to an FQDN (well, except one, which is a bit special). Keepalived, on the other hand, must indeed be given a network for the VIP.

So when internal/external_lb_vip_address are set to an FQDN, a couple of extra variables are required to make keepalived happy, as in the example: https://opendev.org/openstack/openstack-ansible/src/commit/68cb6f950a4f9ddd20326727dff82fc384e5f6b6/etc/openstack_deploy/user_variables.yml#L173-L181
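A minimal sketch of that rule, assuming the user variables have been loaded into a plain Python dict: when a VIP variable holds a hostname rather than an IP literal, the matching `haproxy_keepalived_*_vip_cidr` variable must also be set. `missing_keepalived_vars` is a hypothetical helper, not part of OpenStack-Ansible, and it only checks the CIDR variables, not the interface ones.

```python
import ipaddress


def is_ip_literal(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def missing_keepalived_vars(user_vars):
    """Hypothetical check: keepalived CIDR variables needed because a VIP is an FQDN."""
    missing = []
    for side in ("internal", "external"):
        vip = user_vars.get(f"{side}_lb_vip_address", "")
        cidr_var = f"haproxy_keepalived_{side}_vip_cidr"
        # An FQDN VIP gives keepalived no address to configure, so the CIDR is required.
        if vip and not is_ip_literal(vip) and not user_vars.get(cidr_var):
            missing.append(cidr_var)
    return missing


example = {
    "internal_lb_vip_address": "os-int.internal.antoinethys.net",
    "external_lb_vip_address": "os.internal.antoinethys.net",
    "haproxy_keepalived_external_vip_cidr": "10.2.200.200/32",
}
print(missing_keepalived_vars(example))
# -> ['haproxy_keepalived_internal_vip_cidr']
```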

You can also tell haproxy to bind to a specific IP instead of the internal/external_lb_vip_address using these variables:
https://opendev.org/openstack/openstack-ansible-haproxy_server/src/commit/445b15f9c3776d8f88102934395076f92edfdb25/defaults/main.yml#L235-L239

I know that the internal_lb_vip_address and external_lb_vip_address variables are awfully named, which leads to a lot of confusion, but we can't easily rename them without a huge mess for existing deployments.

I will mark this report as invalid, since I believe we have all the variables available to make the deployment work.

Changed in openstack-ansible:
status:	New → Invalid

Antoine Thys (thystips) wrote on 2023-02-11 (last edit on 2023-02-11):

Hi Dmitriy,

Thanks for your response.

The variables for keepalived are already defined in my case, with the correct IP addresses.

So I tried with haproxy_bind_external/internal_lb_vip_address and haproxy_bind_external/internal_lb_vip_interface, but another error occurred: the certificate in the configuration is `/etc/haproxy/ssl/haproxy_node0-10.2.200.205.pem`, but the certificate file in the ssl directory is `haproxy_node0-10.2.200.200-br-mgmt.pem`.
Another problem is that the certificate is self-signed rather than the certbot one.
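The two names differ in whether an interface suffix is appended. A small sketch of the naming rule as it appears in the handler template quoted in the Ansible error earlier in this thread (`haproxy_ssl_cert_path ~ '/haproxy_' ~ hostname ~ '-' ~ item_name`), assuming `haproxy_ssl_cert_path` is `/etc/haproxy/ssl` and that the `.pem` suffix is added as in the filenames seen in this report. If the side generating the pem sees the interface and the side writing the `bind` line does not, the names cannot match.

```python
# Assumption: matches the /etc/haproxy/ssl paths seen in this report.
HAPROXY_SSL_CERT_PATH = "/etc/haproxy/ssl"


def pem_path(hostname, address, interface=None):
    """Sketch of the pem naming rule from the handler template in the error."""
    # The interface is appended only when it is set and non-empty.
    item_name = f"{address}-{interface}" if interface else address
    return f"{HAPROXY_SSL_CERT_PATH}/haproxy_{hostname}-{item_name}.pem"


# Name produced when the bind entry carries an interface:
print(pem_path("node0", "10.2.200.205", "br-mgmt"))
# -> /etc/haproxy/ssl/haproxy_node0-10.2.200.205-br-mgmt.pem

# Name produced when no interface is part of the computation:
print(pem_path("node0", "10.2.200.205"))
# -> /etc/haproxy/ssl/haproxy_node0-10.2.200.205.pem
```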

This is my configuration:

###################################
haproxy_keepalived_external_vip_cidr: "10.2.200.200/24"
haproxy_keepalived_internal_vip_cidr: "10.2.200.205/24"
haproxy_keepalived_external_interface: br-mgmt
haproxy_keepalived_internal_interface: br-mgmt

haproxy_bind_external_lb_vip_address: 10.2.200.200
haproxy_bind_internal_lb_vip_address: 10.2.200.205

haproxy_bind_external_lb_vip_interface: br-mgmt
haproxy_bind_internal_lb_vip_interface: br-mgmt

# https://bugs.launchpad.net/openstack-ansible/+bug/2006938
# I deleted the --standalone argument in the task file
haproxy_ssl_letsencrypt_enable: True
haproxy_ssl_letsencrypt_install_method: "distro"
haproxy_ssl_letsencrypt_email: *****
haproxy_interval: 2000
haproxy_ssl_letsencrypt_setup_extra_params: "--dns-google --dns-google-credentials *****"
haproxy_ssl_letsencrypt_certbot_challenge: "dns-01"

haproxy_stats_enabled: true
haproxy_stats_prometheus_enabled: true
###################################

I tried both with and without defining the interfaces, with no effect, and also changed `haproxy_tls_vip_binds`, but I still have an issue when haproxy generates the pem.

EDIT:

My bad, the certbot certificate is generated after the handlers are flushed, but the role fails at that step.

Dmitriy Rabotyagov (noonedeadpunk) on 2023-02-11

Changed in openstack-ansible:
status:	Invalid → New

Dmitriy Rabotyagov (noonedeadpunk) wrote on 2023-02-11:

Can you kindly provide the OpenStack-Ansible version you're trying to deploy?

Dmitriy Rabotyagov (noonedeadpunk) wrote on 2023-02-11:

Also, providing the output from the failures is important, as I'm not really sure what is actually going wrong, and thus it would be hard to reproduce.

After all, haproxy should bind normally to an FQDN, as long as you don't use DNS round-robin (RR) for the VIP.
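HAProxy accepts a host name as the `bind` address and resolves it while parsing the configuration, so the name has to resolve on each haproxy host at that moment, ideally to a single address. A quick pre-flight sketch using the local resolver; `check_bind_host` is a hypothetical helper, and treating "more than one address" as a DNS RR symptom is an assumption:

```python
import socket


def resolve_ipv4(host):
    """Return the set of IPv4 addresses the local resolver returns for host."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def check_bind_host(host):
    """Hypothetical pre-flight check: the VIP name should map to exactly one address."""
    try:
        addresses = resolve_ipv4(host)
    except socket.gaierror as exc:
        return f"{host}: does not resolve ({exc})"
    if len(addresses) != 1:
        return f"{host}: resolves to several addresses {sorted(addresses)} (DNS RR?)"
    return f"{host}: ok -> {addresses.pop()}"


print(check_bind_host("127.0.0.1"))
# -> 127.0.0.1: ok -> 127.0.0.1
```

Running it on every haproxy host with the VIP names from this report (for example `os-int.internal.antoinethys.net`) would show whether each host resolves them the same way.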

Antoine Thys (thystips) wrote on 2023-02-11 (last edit on 2023-02-11):

Sorry, I'm trying to deploy with the stable/zed branch.

The Ansible output is long, but the main error is `[ALERT] (645654) : parsing [/root/.ansible/tmp/ansible-tmp-1676126717.3075213-24626-225079311140876/tmpkteen3fh:844] : 'bind 10.2.200.205:15671' : unable to stat SSL certificate from file '/etc/haproxy/ssl/haproxy_node1-10.2.200.205.pem' : No such file or directory.`

when running the `haproxy_server : Regenerate haproxy configuration` task.

If I check the files in /etc/haproxy/ssl:

```
root@node0:~# ls -la /etc/haproxy/ssl/
total 44
drwxr-xr-x 2 root root 4096 Feb 11 16:10 .
drwxr-xr-x 5 root root 64 Feb 11 16:10 ..
-rw-r--r-- 1 root root 2049 Feb 11 16:10 haproxy_node0-10.2.200.200-br-mgmt-ca.crt
-rw-r--r-- 1 root root 1911 Feb 11 16:10 haproxy_node0-10.2.200.200-br-mgmt.crt
-rw-r--r-- 1 root root 3243 Feb 11 16:10 haproxy_node0-10.2.200.200-br-mgmt.key
-rw-r--r-- 1 root root 7203 Feb 11 16:10 haproxy_node0-10.2.200.200-br-mgmt.pem
-rw-r--r-- 1 root root 2049 Feb 11 16:10 haproxy_node0-10.2.200.205-br-mgmt-ca.crt
-rw-r--r-- 1 root root 1911 Feb 11 16:10 haproxy_node0-10.2.200.205-br-mgmt.crt
-rw-r--r-- 1 root root 3243 Feb 11 16:10 haproxy_node0-10.2.200.205-br-mgmt.key
-rw-r--r-- 1 root root 7203 Feb 11 16:10 haproxy_node0-10.2.200.205-br-mgmt.pem
```
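To list which certificate files a rendered configuration expects but which are absent (as in the `[ALERT]` above), a small sketch that scans `bind` lines for the `crt` keyword and checks each path. It assumes the `crt <path>` form on a single line and ignores `crt-list`; `missing_crt_files` is a hypothetical helper:

```python
import os
import re

# Matches "crt <path>" as a whole keyword, so "crt-list" is not matched.
CRT_RE = re.compile(r"(?:^|\s)crt\s+(\S+)")


def missing_crt_files(config_text):
    """Return crt paths referenced on bind lines that do not exist on disk."""
    missing = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("bind "):
            continue
        for path in CRT_RE.findall(stripped):
            if not os.path.exists(path) and path not in missing:
                missing.append(path)
    return missing


# Sample bind line modelled on the ALERT above (the frontend name is hypothetical).
sample = """\
frontend example-front
    bind 10.2.200.205:15671 ssl crt /etc/haproxy/ssl/haproxy_node1-10.2.200.205.pem
"""
print(missing_crt_files(sample))
```

On a host where that pem is absent, the call prints a one-element list containing its path.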

If I run the playbook multiple times, I get haproxy_node0-10.2.200.200.pem but not haproxy_node0-10.2.200.205.pem, and on another run it fails on the next node for the same reason.

No issue with DNS, but if I don't set haproxy_bind_external/internal_lb_vip_address, haproxy uses the FQDN in the bind directive instead of the IP and fails.

Dmitriy Rabotyagov (noonedeadpunk) wrote on 2023-02-11:

Sorry, I will only be able to take a deeper look at the issue on Monday.

Do you happen to have the output from haproxy when it fails to bind to the FQDN? I ask because I use that setup quite widely as of today. Also, do you use the default haproxy version provided by the distro, or one of the latest ones from a third-party repo?

Antoine Thys (thystips) wrote on 2023-02-11:

Attachment: complete unreadable trace (8.8 KiB, text/plain)

No problem.

If haproxy_bind_external/internal_lb_vip_address and haproxy_bind_external/internal_lb_vip_interface are empty, and with:

internal_lb_vip_address: "os-int.internal.antoinethys.net"
external_lb_vip_address: "os.internal.antoinethys.net"

```
[ALERT] (741277) : parsing [/root/.ansible/tmp/ansible-tmp-1676129968.0616238-27539-96043168798635/tmp9m5rbtjo:195] : 'bind' : invalid address: 'os-int.internal.antoinethys.net' in 'os-int.internal.antoinethys.net:9001'
```

I use the default role for haproxy; I just deleted the `--standalone` argument in the certbot tasks, so it's the latest haproxy version in Ubuntu 22.04.

Antoine Thys (thystips) wrote on 2023-02-11:

Strange, this time it seems to be working with the following changes:

```
haproxy_keepalived_external_vip_cidr: "10.2.200.200/32"
haproxy_keepalived_internal_vip_cidr: "10.2.200.205/32"
```

instead of

```
haproxy_keepalived_external_vip_cidr: "10.2.200.200/24"
haproxy_keepalived_internal_vip_cidr: "10.2.200.205/24"
```

removing:

```
haproxy_keepalived_external_interface: br-mgmt
haproxy_keepalived_internal_interface: br-mgmt
haproxy_bind_external_lb_vip_address: 10.2.200.200
haproxy_bind_internal_lb_vip_address: 10.2.200.205
haproxy_bind_external_lb_vip_interface: br-mgmt
haproxy_bind_internal_lb_vip_interface: br-mgmt
```

and replacing the internal VIP:

```
internal_lb_vip_address: "10.2.200.205"
external_lb_vip_address: "os.internal.antoinethys.net"
```

I had already tried these changes separately, without success, but never all together.

I don't know whether this points to a real bug, but it's weird.

Dmitriy Rabotyagov (noonedeadpunk) wrote on 2023-02-11:

#10

I'm not sure about Let's Encrypt at the very least, but the keepalived CIDR should indeed be /32; otherwise things might get weird in other aspects as well (like the source IP of haproxy_hosts becoming the VIP, which is not whitelisted for some services).
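The address arithmetic behind the /24 vs /32 difference can be checked with Python's `ipaddress` module: with /24 the VIP is configured with a prefix covering the whole management subnet, including the other hosts' addresses, while /32 covers only the VIP itself. This sketch only illustrates the prefixes; which source address the kernel then chooses is a routing detail it does not model. `vip_prefix_summary` is a hypothetical helper.

```python
import ipaddress


def vip_prefix_summary(cidr, other_address):
    """Return (network, size, covers_other) for a keepalived VIP CIDR string."""
    network = ipaddress.ip_interface(cidr).network
    return str(network), network.num_addresses, ipaddress.ip_address(other_address) in network


for cidr in ("10.2.200.200/24", "10.2.200.200/32"):
    # 10.2.200.205 (the other VIP in this report) stands in for a neighbour on br-mgmt.
    print(cidr, vip_prefix_summary(cidr, "10.2.200.205"))
# -> 10.2.200.200/24 ('10.2.200.0/24', 256, True)
# -> 10.2.200.200/32 ('10.2.200.200/32', 1, False)
```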

Also, there should not be any issue with having the internal VIP as an FQDN.

I will try to reproduce this on Monday with a proper FQDN and Let's Encrypt.