Upgrade from Queens to Rocky fails with "Paused-single-unit with hacluster" scenario

Bug #2012647 reported by Márton Kiss
Affects: OpenStack Keystone Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description


The Queens to Rocky upgrade of the keystone application failed while I was following the "Paused-single-unit with hacluster" scenario described in the upgrade documentation:
- https://docs.openstack.org/charm-guide/latest/admin/upgrades/openstack.html

In my case the upgrade plan looked like this:
```
juju config keystone action-managed-upgrade=True
juju config keystone openstack-origin=cloud:bionic-rocky
== upgrade keystone/1 (leader) ==
juju run-action --wait hacluster-keystone/0 pause
juju run-action --wait keystone/1 pause
juju run-action --wait keystone/1 openstack-upgrade
juju run-action --wait keystone/1 resume
juju run-action --wait hacluster-keystone/0 resume
== upgrade keystone/0 ==
juju run-action --wait hacluster-keystone/2 pause
juju run-action --wait keystone/0 pause
juju run-action --wait keystone/0 openstack-upgrade
juju run-action --wait keystone/0 resume
juju run-action --wait hacluster-keystone/2 resume
== upgrade keystone/2 ==
juju run-action --wait hacluster-keystone/1 pause
juju run-action --wait keystone/2 pause
juju run-action --wait keystone/2 openstack-upgrade
juju run-action --wait keystone/2 resume
juju run-action --wait hacluster-keystone/1 resume
```

After pausing hacluster-keystone/0 and the keystone/1 unit, the keystone/1 openstack-upgrade action hangs indefinitely in the config-changed-postupgrade hook:

```
2023-03-23 12:58:31 INFO unit.keystone/1.juju-log server.go:316 Keystone charm unit not ready - deferring identity-relation updates
2023-03-23 12:58:31 DEBUG unit.keystone/1.juju-log server.go:316 This unit (keystone/1) is in allowed unit list from mysql/0
2023-03-23 12:58:31 DEBUG unit.keystone/1.juju-log server.go:316 Database is initialised
2023-03-23 12:59:17 ERROR unit.keystone/1.juju-log server.go:316 The call within manager.py failed with the error: 'Unable to establish connection to http://localhost:35337/v3/auth/tokens: HTTPConnectionPool(host='localhost', port=35337): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f87d4409d68>: Failed to establish a new connection: [Errno 111] Connection refused',))'. The call was: path=['resolve_domain_id'], args=('ops-cni',), kwargs={}, api_version=None
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 Traceback (most recent call last):
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "./hooks/config-changed-postupgrade", line 937, in <module>
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 main()
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "./hooks/config-changed-postupgrade", line 930, in main
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 hooks.execute(sys.argv)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "/var/lib/juju/agents/unit-keystone-1/charm/charmhelpers/core/hookenv.py", line 956, in execute
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 self._hooks[hook_name]()
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "/var/lib/juju/agents/unit-keystone-1/charm/charmhelpers/contrib/openstack/utils.py", line 1893, in wrapped_f
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 return f(*args, **kwargs)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "/var/lib/juju/agents/unit-keystone-1/charm/charmhelpers/contrib/hardening/harden.py", line 93, in _harden_inner2
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 return f(*args, **kwargs)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "./hooks/config-changed-postupgrade", line 298, in config_changed_postupgrade
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 update_all_domain_backends()
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "./hooks/config-changed-postupgrade", line 354, in update_all_domain_backends
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 domain_backend_changed(relation_id=rid, unit=unit)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "./hooks/config-changed-postupgrade", line 661, in domain_backend_changed
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 create_or_show_domain(domain_name)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "/var/lib/juju/agents/unit-keystone-1/charm/hooks/keystone_utils.py", line 1122, in create_or_show_domain
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 domain_id = manager.resolve_domain_id(name)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "/var/lib/juju/agents/unit-keystone-1/charm/hooks/keystone_utils.py", line 1226, in __call__
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 return _proxy_manager_call(self._path, self.api_version, args, kwargs)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "/var/lib/juju/agents/unit-keystone-1/charm/charmhelpers/core/decorators.py", line 40, in _retry_on_exception_inner_2
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 return f(*args, **kwargs)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "/var/lib/juju/agents/unit-keystone-1/charm/hooks/keystone_utils.py", line 1270, in _proxy_manager_call
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 raise e
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 File "/var/lib/juju/agents/unit-keystone-1/charm/hooks/keystone_utils.py", line 1264, in _proxy_manager_call
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 raise RuntimeError(s)
2023-03-23 12:59:18 WARNING unit.keystone/1.openstack-upgrade logger.go:60 RuntimeError: The call within manager.py failed with the error: 'Unable to establish connection to http://localhost:35337/v3/auth/tokens: HTTPConnectionPool(host='localhost', port=35337): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f87d4409d68>: Failed to establish a new connection: [Errno 111] Connection refused',))'. The call was: path=['resolve_domain_id'], args=('ops-cni',), kwargs={}, api_version=None
```

The root cause is the following: the update_all_domain_backends() call in the config-changed-postupgrade hook tries to retrieve domain information from the keystone service on localhost, but the keystone service is not running there at all (the unit is paused), and nothing is listening on localhost port 35337 because haproxy was paused as well. Port 35337 should be served by haproxy (currently paused), which would redirect the traffic to one of the two active non-leader keystone units.
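
This can be confirmed on the paused unit with a trivial connection attempt (a quick check added here for illustration only, not part of the charm; the port number is taken from the traceback above):

```
# Quick check on the paused unit: nothing should be listening on the
# keystone admin API port, matching the [Errno 111] in the traceback.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2)
try:
    sock.connect(("localhost", 35337))
    print("something is listening on 35337")
except ConnectionRefusedError:
    print("connection refused on 35337 - the hook cannot reach keystone here")
finally:
    sock.close()
```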

The customer's environment was using keystone charm revision 323 (21.05 charm release); I assume the same error is present in the master branch of the keystone charm as well:
- https://opendev.org/openstack/charm-keystone/src/branch/master/hooks/keystone_utils.py#L961
- https://opendev.org/openstack/charm-keystone/src/branch/master/hooks/keystone_utils.py#L1258

In conclusion, the current keystone charms do not support action-managed-upgrade=True upgrades in a multi-unit environment where the hacluster's haproxy is in a paused state. Instead of the local address, this call should use the endpoint on the VIP if a VIP address is present in the configuration; the relevant function is quoted below, followed by a sketch of the change.

```
def get_local_endpoint(api_suffix=None):
    """Returns the URL for the local end-point bypassing haproxy/ssl"""
    if not api_suffix:
        api_suffix = get_api_suffix()

    keystone_port = determine_api_port(api_port('keystone-admin'),
                                       singlenode_mode=True)

    if config('prefer-ipv6'):
        ipv6_addr = get_ipv6_addr(exc_list=[config('vip')])[0]
        local_endpoint = 'http://[{}]:{}/{}/'.format(
            ipv6_addr,
            keystone_port,
            api_suffix)
    else:
        local_endpoint = 'http://localhost:{}/{}/'.format(
            keystone_port,
            api_suffix)

    return local_endpoint
```
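
One possible, untested sketch of that change follows; it reuses the same helpers as get_local_endpoint(), and the function name, the handling of multiple space-separated VIPs and the choice of the external admin port are assumptions on my side, not existing charm code (IPv6 handling omitted for brevity):

```
def get_cluster_endpoint(api_suffix=None):
    """Sketch only: prefer the VIP over localhost when one is configured."""
    if not api_suffix:
        api_suffix = get_api_suffix()

    vip = config('vip')
    if vip:
        # The VIP is fronted by the remaining active units, so use the
        # public admin port instead of the haproxy-bypassing local port.
        host = vip.split()[0]  # first VIP if several are configured
        port = api_port('keystone-admin')
    else:
        host = 'localhost'
        port = determine_api_port(api_port('keystone-admin'),
                                  singlenode_mode=True)

    return 'http://{}:{}/{}/'.format(host, port, api_suffix)
```

Whether manager.py can actually authenticate against the VIP endpoint from a paused unit (and over https when SSL is configured) would still need to be verified.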

Márton Kiss (marton-kiss) wrote:

I did some additional investigation, and port 35337 is served by apache2, not by the haproxy service:

/etc/apache2/sites-enabled/wsgi-openstack-api.conf

```
Listen 35337
Listen 4980
<VirtualHost *:35337>
    WSGIDaemonProcess keystone-admin processes=3 threads=1 user=keystone group=keystone \
                      display-name=%{GROUP}
    WSGIProcessGroup keystone-admin
    WSGIScriptAlias /krb /usr/bin/keystone-wsgi-admin
    WSGIScriptAlias / /usr/bin/keystone-wsgi-admin
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On
    <IfVersion >= 2.4>
      ErrorLogFormat "%{cu}t %M"
    </IfVersion>
    ErrorLog /var/log/apache2/keystone_error.log
    CustomLog /var/log/apache2/keystone_access.log combined

    <Directory /usr/bin>
        <IfVersion >= 2.4>
            Require all granted
        </IfVersion>
        <IfVersion < 2.4>
            Order allow,deny
            Allow from all
        </IfVersion>
    </Directory>
    IncludeOptional /etc/apache2/mellon*/sp-location*.conf
    IncludeOptional /etc/apache2/kerberos*/apache-kerberos.conf
</VirtualHost>
```

Resuming both the hacluster and keystone units and then re-triggering the config-changed-postupgrade hook allowed the hook to finish without error:

```
$ juju run-action --wait hacluster-keystone/0 resume
$ juju run-action --wait keystone/1 resume
$ juju run --timeout 300s -u keystone/1 hooks/config-changed-postupgrade
```
