os-keystone-install playbook fails with multiple controllers

Bug #1990008 reported by James Denton
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Distro: OpenStack-Ansible
Version: Zed
Point: Master

In a multi-node deployment with multiple controller nodes, a recent patch to the Keystone role results in an attempt to hit the Keystone endpoint via the VIP before the service has been re-enabled in haproxy.
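
For context, the "Set haproxy service state" task below comes from common-tasks/haproxy-endpoint-manage.yml, which takes the backend server for the node currently being deployed out of rotation before the role runs. A minimal sketch of that general pattern, assuming the community.general.haproxy module, a hypothetical backend name keystone_service-back, an admin socket at /var/run/haproxy.stat, and an assumed haproxy host group; the actual OSA tasks may differ:

    # Illustrative only: take this node's keystone backend out of rotation
    # on every haproxy node before reconfiguring the service.
    - name: Set haproxy service state
      community.general.haproxy:
        socket: /var/run/haproxy.stat        # assumed admin socket path
        backend: keystone_service-back       # hypothetical backend name
        host: "{{ inventory_hostname }}"
        state: disabled
      delegate_to: "{{ item }}"
      loop: "{{ groups['haproxy'] | default([]) }}"   # assumed group name

The playbook's post_tasks run the same include again to re-enable the backend, which is what has not yet happened when the failure below occurs.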

-=-=-=-=-

TASK [include_tasks] ********************************************************************************************************************************
included: /opt/openstack-ansible/playbooks/common-tasks/haproxy-endpoint-manage.yml for infra1_keystone_container-4e93eb38

TASK [Set haproxy service state] ********************************************************************************************************************
changed: [infra1_keystone_container-4e93eb38 -> loadbalancer1(10.0.236.150)] => (item=loadbalancer1)

TASK [Configure container] **************************************************************************************************************************
included: /opt/openstack-ansible/playbooks/common-tasks/os-lxc-container-setup.yml for infra1_keystone_container-4e93eb38

TASK [Set default bind mounts (bind var/log)] *******************************************************************************************************
ok: [infra1_keystone_container-4e93eb38]

...

TASK [systemd_service : Place the systemd timer] ****************************************************************************************************
skipping: [infra1_keystone_container-4e93eb38] => (item={'service_name': 'keystone-wsgi-public', 'enabled': True, 'state': 'started', 'execstarts': '/openstack/venvs/uwsgi-25.1.0.dev68-python3/bin/uwsgi --autoload --ini /etc/uwsgi/keystone-wsgi-public.ini', 'execreloads': '/openstack/venvs/uwsgi-25.1.0.dev68-python3/bin/uwsgi --reload /run/keystone-wsgi-public/uwsgi/keystone-wsgi-public.pid', 'config_overrides': {}})

TASK [systemd_service : Place the systemd socket] ***************************************************************************************************

TASK [systemd_service : Reload systemd on unit change] **********************************************************************************************

TASK [systemd_service : include_tasks] **************************************************************************************************************
included: /etc/ansible/roles/systemd_service/tasks/systemd_load.yml for infra1_keystone_container-4e93eb38 => (item={'service_name': 'keystone-wsgi-public', 'enabled': True, 'state': 'started', 'execstarts': '/openstack/venvs/uwsgi-25.1.0.dev68-python3/bin/uwsgi --autoload --ini /etc/uwsgi/keystone-wsgi-public.ini', 'execreloads': '/openstack/venvs/uwsgi-25.1.0.dev68-python3/bin/uwsgi --reload /run/keystone-wsgi-public/uwsgi/keystone-wsgi-public.pid', 'config_overrides': {}})

TASK [systemd_service : Load service keystone-wsgi-public] ******************************************************************************************
ok: [infra1_keystone_container-4e93eb38] => (item=)

TASK [systemd_service : Load timer keystone-wsgi-public] ********************************************************************************************
skipping: [infra1_keystone_container-4e93eb38] => (item=)

TASK [systemd_service : Load socket] ****************************************************************************************************************

TASK [os_keystone : Flush handlers] *****************************************************************************************************************

TASK [os_keystone : Wait for service to be up] ******************************************************************************************************
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (12 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (11 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (10 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (9 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (8 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (7 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (6 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (5 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (4 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (3 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (2 retries left).
FAILED - RETRYING: [infra1_keystone_container-4e93eb38]: Wait for service to be up (1 retries left).
fatal: [infra1_keystone_container-4e93eb38]: FAILED! => {"attempts": 12, "cache_control": "no-cache", "changed": false, "connection": "close", "content_length": "107", "content_type": "text/html", "elapsed": 0, "msg": "Status code was 503 and not [300]: HTTP Error 503: Service Unavailable", "redirected": false, "status": 503, "url": "http://10.0.236.150:5000"}

-=-=-=-=-

This seems to be related to the change made here:

https://github.com/openstack/openstack-ansible-os_keystone/commit/05c64f7651a93bfa987a939fce680c3d4b13df30

The re-enabling of the node in haproxy is a post_task[1] of os-keystone-install, which has not yet run by the time the "Wait for service to be up" task executes. The service is reachable directly, but not (yet) via the VIP. Reverting the patch appears to resolve the issue.

[1] https://github.com/openstack/openstack-ansible/blob/master/playbooks/os-keystone-install.yml#L98
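
Based on the fatal output above, the failing check is a uri-style poll of the internal VIP that expects Keystone's 300 Multiple Choices response from its root URL. A hedged reconstruction of that task follows; only the URL, expected status, and retry count are taken from the log, while the rest (task placement, delay) is illustrative:

    # Illustrative reconstruction of the failing check. While this node's
    # backend is still disabled in haproxy, the VIP answers 503 and every
    # retry fails.
    - name: Wait for service to be up
      ansible.builtin.uri:
        url: "http://10.0.236.150:5000"    # internal VIP from the log
        status_code: 300                   # keystone answers 300 on "/"
      register: _keystone_check
      until: _keystone_check.status == 300
      retries: 12                          # matches "attempts": 12 above
      delay: 5                             # illustrative value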

Changed in openstack-ansible:
status: New → In Progress
Dmitriy Rabotyagov (noonedeadpunk) wrote :

A bugfix has been proposed in https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/858385

It would be great if you could test it out.

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_keystone (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/858385
Committed: https://opendev.org/openstack/openstack-ansible-os_keystone/commit/7868766202b80205574efa66deb217ab2491d5c2
Submitter: "Zuul (22348)"
Branch: master

commit 7868766202b80205574efa66deb217ab2491d5c2
Author: Dmitriy Rabotyagov <email address hidden>
Date: Mon Sep 19 16:08:46 2022 +0200

    Bootstrap when running against last backend

    When deploying keystone for the first time, the aliveness check inside
    service_bootstrap cannot succeed in a multi-node setup, because the
    playbook has disabled the current backend. So we need to bootstrap the
    host only when running against the last host in the play. We also need
    to make sure that the following tasks do not fail when running against
    the earlier ones.

    Closes-Bug: #1990008
    Related-Bug: #1989326
    Change-Id: Ifa9a79c34265b225a5e24c30cae47d3f0fa0739f
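
The condition the commit message describes, running service_bootstrap only against the last backend in the play, can be expressed with Ansible's ansible_play_hosts magic variable. A minimal sketch of that gating, with a hypothetical task-file name; the merged change may implement it differently:

    # Illustrative only: run the bootstrap include solely on the last host
    # of the play, so every other backend is already back in rotation.
    - name: Bootstrap keystone against the last backend in the play
      ansible.builtin.include_tasks: keystone_service_bootstrap.yml   # hypothetical file name
      when: inventory_hostname == ansible_play_hosts[-1]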

Changed in openstack-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_keystone (stable/yoga)
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_keystone (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/859232
Committed: https://opendev.org/openstack/openstack-ansible-os_keystone/commit/36d06d4cc3c74fb749ac655f74e35092d6acea07
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 36d06d4cc3c74fb749ac655f74e35092d6acea07
Author: Dmitriy Rabotyagov <email address hidden>
Date: Mon Sep 19 16:08:46 2022 +0200

    Bootstrap when running against last backend

    When deploying keystone for the first time, the aliveness check inside
    service_bootstrap cannot succeed in a multi-node setup, because the
    playbook has disabled the current backend. So we need to bootstrap the
    host only when running against the last host in the play. We also need
    to make sure that the following tasks do not fail when running against
    the earlier ones.

    With that, we are now also able to check the service status during
    bootstrap against the internal VIP. This is important in environments
    with limited connectivity.

    Closes-Bug: #1990008
    Closes-Bug: #1989326
    Squashes Change-ID: I1a4aec40618237aa23b4f40b335c141071a56f08

    Change-Id: Ifa9a79c34265b225a5e24c30cae47d3f0fa0739f
    (cherry picked from commit 7868766202b80205574efa66deb217ab2491d5c2)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible-os_keystone yoga-eom

This issue was fixed in the openstack/openstack-ansible-os_keystone yoga-eom release.
