[wallaby] With Ceph Dashboard enabled, Deploy hangs and fails at 'Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3'

Bug #1966453 reported by Ted Lum
Affects: tripleo
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========
With wallaby, when deploying an HA cluster with Ceph Dashboard enabled, the deployment fails with a timeout while waiting for containers to start at 'TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3'.

The haproxy container fails to start and exits with 'Starting proxy ceph_dashboard: cannot bind socket [10.100.4.40:8444]' when trying to bind to the ctlplane VIP.

ceph-mgr prevents haproxy from binding to port 8444 on the VIP because it has already bound the dashboard port on all interfaces:

Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp LISTEN 0 5 *:8444 *:* users:(("ceph-mgr",pid=49245,fd=53))

The dashboard server_addr is set to the controller's Storage address:

[ceph: root@overcloud-controller-0 /]# ceph config get mgr mgr/dashboard/overcloud-controller-0-lozwge/server_addr
10.100.7.163

However, ceph-mgr still binds to all interfaces.
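
The conflict described above can be reproduced in isolation. The following is a minimal sketch, not taken from the deployment: it uses the loopback address and a kernel-chosen port as stand-ins for the ctlplane VIP and port 8444, and the function name is made up for illustration. One socket plays the role of ceph-mgr listening on the wildcard address, the other plays haproxy trying to bind one specific address on the same port.

```python
import errno
import socket


def specific_bind_blocked_by_wildcard() -> bool:
    """Return True if a wildcard listener blocks a bind to one address on the same port."""
    wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    specific = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Stand-in for ceph-mgr: listen on every IPv4 address.
        # Port 0 asks the kernel for a free port (assumption: used instead of 8444).
        wildcard.bind(("0.0.0.0", 0))
        wildcard.listen(5)
        port = wildcard.getsockname()[1]
        try:
            # Stand-in for haproxy: bind a single address on the same port.
            # 127.0.0.1 replaces the ctlplane VIP in this sketch.
            specific.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        specific.close()
        wildcard.close()


if __name__ == "__main__":
    print("specific bind blocked:", specific_bind_blocked_by_wildcard())
```

If ceph-mgr were bound only to its Storage address, the second bind would target a different address and the two listeners could coexist on the same port.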

Steps to reproduce
==================

Deploy an OpenStack HA cluster with Ceph Dashboard enabled.

Expected result
===============

Deployment does not hang or eventually time out at 'TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3' while waiting for containers to start.

ceph-mgr binds port 8444 on the correct, specific IP address.

The haproxy container is able to start because it can bind to port 8444 on the ctlplane VIP.

Actual result
=============

Deployment hangs and eventually times out at 'TASK | Create containers managed by Podman for /var/lib/tripleo-config/container-startup-config/step_3' while waiting for containers to start.

ceph-mgr binds port 8444 on all interfaces.

The haproxy container exits immediately because it is unable to bind to port 8444 on the ctlplane VIP.

Environment
===========

wallaby

Using: https://trunk.rdoproject.org/centos8/component/tripleo/current/python3-tripleo-repos-0.1.1-0.20220214194848.cbbdde6.el8.noarch.rpm

python3-tripleoclient.noarch 16.4.1-0.20220321214520.2909e00.el8
ceph-ansible.noarch 6.0.7-1.el8s

ceph-mgr image: quay.io/ceph/daemon:v6.0.7-stable-6.0-pacific-centos-stream8

haproxy image: quay.io/tripleowallaby/openstack-haproxy:current-tripleo

Logs & Configs
==============

+---------------------------------+---------------------------+
| Name                            | Fixed IP Addresses        |
+---------------------------------+---------------------------+
| control_virtual_ip              | ip_address='10.100.4.40'  |
| storage_virtual_ip              | ip_address='10.100.7.40'  |
| overcloud-controller-0-ctlplane | ip_address='10.100.4.83'  |
| overcloud-controller-0_Storage  | ip_address='10.100.7.163' |
| overcloud-controller-1-ctlplane | ip_address='10.100.4.79'  |
| overcloud-controller-1_Storage  | ip_address='10.100.7.157' |
| overcloud-controller-2-ctlplane | ip_address='10.100.4.81'  |
| overcloud-controller-2_Storage  | ip_address='10.100.7.156' |
+---------------------------------+---------------------------+

[ceph: root@overcloud-controller-0 /]# ceph config get mgr mgr/dashboard/overcloud-controller-0-lozwge/server_addr
10.100.7.163

[ceph: root@overcloud-controller-1 /]# ceph config get mgr mgr/dashboard/overcloud-controller-1-rmrwkp/server_addr
10.100.7.157

[ceph: root@overcloud-controller-2 /]# ceph config get mgr mgr/dashboard/overcloud-controller-2-anemij/server_addr
10.100.7.156

[ceph: root@overcloud-controller-0 /]# ceph mgr services
{
    "dashboard": "http://10.100.7.163:8444/",
    "prometheus": "http://10.100.7.163:9283/"
}

[ceph: root@overcloud-controller-1 /]# ceph mgr services
{
    "dashboard": "http://10.100.7.163:8444/",
    "prometheus": "http://10.100.7.163:9283/"
}

[ceph: root@overcloud-controller-2 /]# ceph mgr services
{
    "dashboard": "http://10.100.7.163:8444/",
    "prometheus": "http://10.100.7.163:9283/"
}

http://cdn.tedlum.com/ansible-20220324T100343.log
http://cdn.tedlum.com/cephadm_command-20220324T174346.log

Ted Lum (tlum) wrote :

It turns out the IP address is being set under an invalid configuration key, so Ceph ignores it and falls back to its default of binding to all interfaces.

The task sets mgr/dashboard/overcloud-controller-0-lozwge/server_addr, but it needs to set mgr/dashboard/overcloud-controller-0.lozwge/server_addr (a dot, not a hyphen, before the daemon suffix).
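
The mismatch can be illustrated outside Ansible. This is a rough sketch under two assumptions: the mgr container name has the form ceph-<fsid>-mgr-<host>-<suffix> with hyphens throughout (the fsid below is made up), and the helper name key_from_container_name is hypothetical. Ansible's regex_replace filter is built on Python's re.sub, so the pattern from the existing task can be applied directly.

```python
import re

# Pattern used by the existing task to strip the container-name prefix.
PATTERN = r"^ceph-?(.*)-mgr."

# Hypothetical cluster fsid, for illustration only.
HYPOTHETICAL_FSID = "4b5c8c0a-ff60-454b-a1b4-9747aa737d19"


def key_from_container_name(container_name: str) -> str:
    """Build the config key the way the existing task does (hypothetical helper)."""
    daemon_id = re.sub(PATTERN, "", container_name)
    return f"mgr/dashboard/{daemon_id}/server_addr"


# Assumed container name: hyphens everywhere, so the dot in the daemon id is lost.
container_name = f"ceph-{HYPOTHETICAL_FSID}-mgr-overcloud-controller-0-lozwge"
derived = key_from_container_name(container_name)

# Key that matches the real daemon id, as stated in this report.
expected = "mgr/dashboard/overcloud-controller-0.lozwge/server_addr"

if __name__ == "__main__":
    print("derived: ", derived)
    print("expected:", expected)
    print("match:   ", derived == expected)
```

Because the hyphen separating the host name from the random suffix cannot be told apart from the hyphens inside the host name, no regex on the container name alone can reliably recover the dot.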

I would suggest using this patch, which gets the correct name directly from Ceph instead of deriving it from the container name with a brittle regex that does not work correctly.

index dc083e42..9567e42e 100644
--- a/tripleo_ansible/roles/tripleo_cephadm/tasks/dashboard/configure_dashboard_backends.yml
+++ b/tripleo_ansible/roles/tripleo_cephadm/tasks/dashboard/configure_dashboard_backends.yml
@@ -15,21 +15,30 @@
 # under the License.

 - name: Get the current mgr
-  command: |
-    {{ container_cli }} ps -a -f 'name=ceph-?(.*)-mgr.*' --format \{\{\.Names\}\}
+  shell: |
+    {{ tripleo_cephadm_bin }} ls --no-detail | jq -r '.[]|.name|select(startswith("mgr."))[4:]'
   register: ceph_mgr
   become: true
+  until: ceph_mgr.stdout|length > 0
+  retries: "24"
+  delay: "5"
+  ignore_errors: "false"
   delegate_to: "{{ dashboard_backend }}"

+- name: Fail if mgr daemon is not running
+  fail:
+    msg: "mgr daemon is not running"
+  when: ceph_mgr is undefined or ceph_mgr.stdout is undefined or ceph_mgr.stdout|length == 0
+
 - name: Check the resulting mgr container instance
   debug:
-    msg: "{{ ceph_mgr.stdout | regex_replace('^ceph-?(.*)-mgr.', '') }}"
+    msg: "'the mgr daemon id is ' returned {{ ceph_mgr.stdout }}"
   when: tripleo_cephadm_verbose

 - name: config the current dashboard backend
   command: |
     {{ tripleo_cephadm_ceph_cli }} config set \
-    mgr mgr/dashboard/{{ ceph_mgr.stdout | regex_replace('^ceph-?(.*)-mgr.', '') }}/server_addr \
+    mgr mgr/dashboard/{{ ceph_mgr.stdout }}/server_addr \
     {{ hostvars[dashboard_backend][tripleo_ceph_dashboard_net] }}
   become: true
   changed_when: false
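
For readers less familiar with jq, the filter in the patch can be approximated in Python. This is only a sketch: the sample JSON is invented and reduced to the "name" field that the filter reads, and mgr_daemon_ids is a hypothetical helper, not part of tripleo-ansible.

```python
import json


def mgr_daemon_ids(cephadm_ls_json: str) -> list:
    """Mimic jq '.[]|.name|select(startswith("mgr."))[4:]' on cephadm ls output."""
    daemons = json.loads(cephadm_ls_json)
    prefix = "mgr."
    return [
        d["name"][len(prefix):]  # drop the leading "mgr." (4 characters)
        for d in daemons
        if d.get("name", "").startswith(prefix)
    ]


# Invented sample; real output carries more fields per daemon.
SAMPLE = json.dumps([
    {"name": "mon.overcloud-controller-0"},
    {"name": "mgr.overcloud-controller-0.lozwge"},
])

ids = mgr_daemon_ids(SAMPLE)

if __name__ == "__main__":
    for daemon_id in ids:
        print(f"mgr/dashboard/{daemon_id}/server_addr")
```

Reading the daemon name from cephadm keeps the dot intact, so the resulting config key matches the one ceph-mgr actually consults.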
