Designate HA may result in Resource not running res_designate_haproxy

Bug #1839021 reported by David Ames on 2019-08-05
This bug affects 1 person
Affects (Importance, Assigned to):
OpenStack Designate Charm: High, David Ames
OpenStack hacluster charm: Undecided, Unassigned
OpenStack keystone charm: Undecided, Unassigned
charm-interface-hacluster: High, David Ames

Bug Description

Designate HA deployments may end up with the hacluster unit reporting "Resource: res_designate_haproxy not running". There seems to be a race: haproxy is in fact running, but CRM is unaware of this.

This is seen particularly when running the full designate_ha openstack-mojo-spec with xenial <= queens.

Investigate timing, and compare for similarities with LP Bug #1837401.

Workaround:
crm resource cleanup res_designate_haproxy
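
For context, a minimal way to apply and verify the workaround on an affected unit might look like this (standard pacemaker/systemd commands; nothing here is charm-specific):

# check what pacemaker currently thinks of the resource
sudo crm status
# clear the failed state so pacemaker re-probes the resource
sudo crm resource cleanup res_designate_haproxy
# confirm haproxy itself really is running on the unit
sudo systemctl status haproxy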

Alex Kavanagh (ajkavanagh) wrote :

Confirmed. I've seen this with keystone and swift as well. I'll dig into this further today.

Changed in charm-hacluster:
status: New → Confirmed
Alex Kavanagh (ajkavanagh) wrote :

I should add that I've seen it during series upgrades. I thought it might be due to the reboot, but the fact that it's also seen on the designate_ha mojo spec probably means it's not due to the reboot and is instead a race.

The resource does seem to be 'broken' in crm (in my series upgrade test), and I got it working by manually cleaning up the crm resource using crm commands.

Alex Kavanagh (ajkavanagh) wrote :

So adding to my previous comment re: keystone and haproxy with the error:

keystone config:

public_endpoint = http://10.5.100.2:5000
admin_endpoint = http://10.5.100.2:35357

haproxy.cfg:

frontend tcp-in_public-port
    bind *:5000
    bind :::5000
    acl net_10.5.0.92 dst 10.5.0.92/255.255.0.0
    use_backend public-port_10.5.0.92 if net_10.5.0.92
    default_backend public-port_10.5.0.92

i.e. they are both configured on the same port, and keystone doesn't realise that it should be using haproxy. I'll dig into the code that makes the decision about whether haproxy is present or not.
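
A quick way to see which process actually owns a contested port on the unit (assuming the usual ss/lsof tooling is available) is something like:

# show the PID/program bound to the keystone public port
sudo ss -tlnp | grep ':5000'
# or, equivalently
sudo lsof -i :5000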

Alex Kavanagh (ajkavanagh) wrote :

So my previous comment was a red herring (public_endpoint and admin_endpoint don't actually affect where the keystone wsgi process listens).

The config is correct:

apache2/sites-enabled/wsgi-openstack-api.conf:

# Configuration file maintained by Juju. Local changes may be overwritten.

Listen 35347
Listen 4990
<VirtualHost *:35347>
... etc

However, actual keystone processes are listening on the original keystone ports (5000 and 35357):

# lsof | grep 5000
keystone- 2126 keystone 10u IPv4 23414 0t0 TCP *:5000 (LISTEN)
keystone- 3128 keystone 10u IPv4 23414 0t0 TCP *:5000 (LISTEN)
keystone- 3129 keystone 10u IPv4 23414 0t0 TCP *:5000 (LISTEN)

# lsof | grep 35357

keystone- 2126 keystone 9u IPv4 23413 0t0 TCP *:35357 (LISTEN)
keystone- 3126 keystone 9u IPv4 23413 0t0 TCP *:35357 (LISTEN)
keystone- 3126 keystone 15u IPv4 150493 0t0 TCP juju-6afeea-mojo-16.project.serverstack:35357->juju-6afeea-mojo-25.project.serverstack:51114 (ESTABLISHED)
keystone- 3127 keystone 9u IPv4 23413 0t0 TCP *:35357 (LISTEN)
keystone- 3127 keystone 10u IPv4 147044 0t0 TCP juju-6afeea-mojo-16.project.serverstack:35357->juju-6afeea-mojo-26.project.serverstack:33835 (ESTABLISHED)
keystone- 3127 keystone 13u IPv4 155157 0t0 TCP juju-6afeea-mojo-16.project.serverstack:35357->juju-6afeea-mojo-24.project.serverstack:56627 (ESTABLISHED)
keystone- 3127 keystone 16u IPv4 155159 0t0 TCP juju-6afeea-mojo-16.project.serverstack:35357->juju-6afeea-mojo-25.project.serverstack:52961 (ESTABLISHED)
keystone- 3128 keystone 9u IPv4 23413 0t0 TCP *:35357 (LISTEN)
keystone- 3129 keystone 9u IPv4 23413 0t0 TCP *:35357 (LISTEN)

However, it's also listening on the other ports (4990 and 35347).

Thus, one hypothesis is that the former are apache processes that weren't shut down and are left over after a restart.
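
One way to check that hypothesis is to map the listening PIDs back to their full command lines, since lsof truncates the command name to "keystone-"; for example, using a PID from the output above:

# show the full command line for one of the listeners on 5000/35357
ps -fp 2126
# and see which services systemd believes are running
systemctl status keystone apache2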

However, it seems that keystone.service itself is starting processes listening on 5000 and 35357:

root@juju-6afeea-mojo-16:/etc# lsof | egrep keystone.*5000
keystone- 11047 keystone 10u IPv4 160995 0t0 TCP *:5000 (LISTEN)
keystone- 11062 keystone 10u IPv4 160995 0t0 TCP *:5000 (LISTEN)
keystone- 11063 keystone 10u IPv4 160995 0t0 TCP *:5000 (LISTEN)
root@juju-6afeea-mojo-16:/etc# lsof | egrep keystone.*4990
root@juju-6afeea-mojo-16:/etc# lsof | egrep keystone.*35357
keystone- 11047 keystone 9u IPv4 160994 0t0 TCP *:35357 (LISTEN)
keystone- 11060 keystone 9u IPv4 160994 0t0 TCP *:35357 (LISTEN)
keystone- 11061 keystone 9u IPv4 ...


Alex Kavanagh (ajkavanagh) wrote :

So the mitaka keystone/hacluster pair bug is:

1. On the series upgrade, the keystone service either remains enabled or gets re-enabled.
2. However, on mitaka, keystone is configured to be accessed via apache2.
3. Thus, keystone.service shouldn't be enabled or running.

The outcome is:

1. keystone.service runs and binds to 5000/35357.
2. The "post-series-upgrade" hook fails because it can't start haproxy.
3. It can't start haproxy because keystone.service has already bound to 5000/35357.
4. The hacluster charm reports that res_ks_haproxy isn't running and blocks (this part of the bug).

Solution:

Ensure keystone.service doesn't get enabled (or at least disable it) during the series upgrade.
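
As a manual recovery on an already-affected unit, something along these lines should work (a sketch only, assuming systemd and the standard unit names):

# stop the stray keystone service and make sure it stays disabled
sudo systemctl stop keystone
sudo systemctl disable keystone
# let apache2 and haproxy take the ports back
sudo systemctl restart apache2
sudo systemctl restart haproxy
# clear the failed resource state in pacemaker
sudo crm resource cleanup res_ks_haproxy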

Changed in charm-keystone:
status: New → Confirmed
assignee: nobody → Alex Kavanagh (ajkavanagh)
David Ames (thedac) wrote :
Changed in charm-designate:
status: New → Triaged
importance: Undecided → High
assignee: nobody → David Ames (thedac)
milestone: none → 19.10
David Ames (thedac) on 2019-08-06
Changed in charm-interface-hacluster:
status: New → Triaged
importance: Undecided → High
assignee: nobody → David Ames (thedac)
Changed in charm-designate:
status: Triaged → Invalid

Fix proposed to branch: master
Review: https://review.opendev.org/674872

Changed in charm-interface-hacluster:
status: Triaged → In Progress

Reviewed: https://review.opendev.org/675127
Committed: https://git.openstack.org/cgit/openstack/charm-keystone/commit/?id=21d212cb2739a3ff1b08d8d96916dfd638f06ffe
Submitter: Zuul
Branch: master

commit 21d212cb2739a3ff1b08d8d96916dfd638f06ffe
Author: Alex Kavanagh <email address hidden>
Date: Wed Aug 7 15:09:13 2019 +0100

    Ensure that keystone service is paused if needed on series upgrade

    During series upgrade, the keystone packages get re-installed as the
    underlying Linux has been upgraded and new package sets are updated and
    then pulled in. For trusty->xenial this means that keystone.service
    gets enabled which then breaks haproxy. On install, on xenial+, the
    keystone.service is disabled in the install hook. This just replicates
    this in the series-upgrade hook.

    Change-Id: Ic5ed6cf354d5545b9e554e205a048955a381e0f5
    Closes-Bug: #1839021
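
In effect, the change makes the series-upgrade hook guarantee the same end state the install hook already guarantees on xenial+: keystone.service must not be left enabled. Roughly the shell equivalent (a sketch, not the actual charm code):

# keep the standalone keystone service from starting and grabbing 5000/35357
sudo systemctl stop keystone
sudo systemctl disable keystone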

Alex Kavanagh (ajkavanagh) wrote :

On a subsequent trusty->xenial series upgrade, the keystone hacluster unit on one node may still show a blocked state with "Resource: res_ks_haproxy not running".

However, if the associated keystone unit is not errored or blocked, it's likely that the crm retries have been exceeded. Running "sudo crm resource refresh" clears the status.
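
i.e., on the affected unit, something like (using the resource name from this deployment):

# check whether haproxy is actually up despite the reported failure
sudo systemctl status haproxy
# re-probe the resources and clear the exceeded fail counts
sudo crm resource refresh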

Changed in charm-keystone:
status: Confirmed → Invalid
assignee: Alex Kavanagh (ajkavanagh) → nobody