Creating the Keystone service and endpoint Failed: Internal Server Error (HTTP 500)

Bug #1846789 reported by Larry Lile
This bug affects 5 people
Affects        Status        Importance  Assigned to        Milestone
kolla-ansible  Fix Released  High        Michal Nasiadka
  Stein        Triaged       High        Unassigned
  Train        Fix Released  High        Mark Goddard
  Ussuri       Fix Released  High        Michal Nasiadka

Bug Description

OS: Red Hat Enterprise Linux Server release 7.6 (Maipo)
Kernel: 3.10.0-957.27.2.el7.x86_64 #1
Docker Version: Client: Docker Engine - Community 19.03.2
Kolla version: 8.0.2.dev20

When deploying Stein with the control nodes on a separate network from the network, monitoring, and compute nodes, keystone fails to create the service and endpoint with the following error.

TASK [keystone : Creating the Keystone service and endpoint] **************************************************************************************
failed: [odt-rsd-ost-ctr-a1.example.com] (item={u'interface': u'admin', u'url': u'http://10.239.195.200:35357'}) => {"changed": true, "item": {"interface": "admin", "url": "http://10.239.195.200:35357"}, "msg": "'Traceback (most recent call last):\\n File \"/tmp/ansible_Ybr9ij/ansible_module_kolla_keystone_service.py\", line 58, in main\\n for _service in cloud.keystone_client.services.list():\\n File \"/opt/ansible/lib/python2.7/site-packages/shade/_legacy_clients.py\", line 95, in keystone_client\\n self._identity_client\\n File \"/opt/ansible/lib/python2.7/site-packages/shade/openstackcloud.py\", line 616, in _identity_client\\n \\'identity\\', min_version=2, max_version=\\'3.latest\\')\\n File \"/opt/ansible/lib/python2.7/site-packages/shade/openstackcloud.py\", line 506, in _get_versioned_client\\n if adapter.get_endpoint():\\n File \"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/adapter.py\", line 282, in get_endpoint\\n return self.session.get_endpoint(auth or self.auth, **kwargs)\\n File \"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/session.py\", line 1200, in get_endpoint\\n return auth.get_endpoint(self, **kwargs)\\n File \"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/identity/base.py\", line 380, in get_endpoint\\n allow_version_hack=allow_version_hack, **kwargs)\\n File \"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/identity/base.py\", line 271, in get_endpoint_data\\n service_catalog = self.get_access(session).service_catalog\\n File \"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/identity/base.py\", line 134, in get_access\\n self.auth_ref = self.get_auth_ref(session)\\n File \"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/identity/generic/base.py\", line 208, in get_auth_ref\\n return self._plugin.get_auth_ref(session, **kwargs)\\n File \"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/identity/v3/base.py\", line 184, in get_auth_ref\\n authenticated=False, log=False, **rkwargs)\\n File 
\"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/session.py\", line 1106, in post\\n return self.request(url, \\'POST\\', **kwargs)\\n File \"/opt/ansible/lib/python2.7/site-packages/keystoneauth1/session.py\", line 943, in request\\n raise exceptions.from_response(resp, method, url)\\nInternalServerError: Internal Server Error (HTTP 500)\\n'"}

This is the inventory configuration (truncated) used. [control] is on a different network from everything else; the kolla_internal_vip_address resides in the same network as [compute] and [network].

[control]
odt-rsd-ost-ctr-a1.example.com ansible_user=root ansible_password=password ansible_become=true
odt-rsd-ost-ctr-a2.example.com ansible_user=root ansible_password=password ansible_become=true
odt-rsd-ost-ctr-a3.example.com ansible_user=root ansible_password=password ansible_become=true

[network]
hv-10-017.example.com ansible_user=root ansible_password=password ansible_become=true
hv-10-022.example.com ansible_user=root ansible_password=password ansible_become=true
hv-10-027.example.com ansible_user=root ansible_password=password ansible_become=true

[compute]
hv-10-017.example.com ansible_user=root ansible_password=password ansible_become=true
hv-10-022.example.com ansible_user=root ansible_password=password ansible_become=true
hv-10-027.example.com ansible_user=root ansible_password=password ansible_become=true

[monitoring]
hv-10-017.example.com ansible_user=root ansible_password=password ansible_become=true
hv-10-022.example.com ansible_user=root ansible_password=password ansible_become=true
hv-10-027.example.com ansible_user=root ansible_password=password ansible_become=true

[storage]
hv-10-017.example.com ansible_user=root ansible_password=password ansible_become=true

[deployment]
localhost ansible_connection=local

[haproxy:children]
network

Keystone is reporting the following in both the admin and public error log:

2019-10-03 17:15:29.755102 mod_wsgi (pid=25): Target WSGI script '/usr/bin/keystone-wsgi-admin' cannot be loaded as Python module.
2019-10-03 17:15:29.755148 mod_wsgi (pid=25): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-admin'.
2019-10-03 17:15:29.755199 Traceback (most recent call last):
2019-10-03 17:15:29.755247 File "/usr/bin/keystone-wsgi-admin", line 52, in <module>
2019-10-03 17:15:29.755328 application = initialize_admin_application()
2019-10-03 17:15:29.755363 File "/usr/lib/python2.7/site-packages/keystone/server/wsgi.py", line 24, in initialize_public_application
2019-10-03 17:15:29.755420 name='public', config_files=flask_core._get_config_files())
2019-10-03 17:15:29.755452 File "/usr/lib/python2.7/site-packages/keystone/server/flask/core.py", line 151, in initialize_application
2019-10-03 17:15:29.755508 keystone.server.configure(config_files=config_files)
2019-10-03 17:15:29.755540 File "/usr/lib/python2.7/site-packages/keystone/server/__init__.py", line 28, in configure
2019-10-03 17:15:29.755595 keystone.conf.configure()
2019-10-03 17:15:29.755628 File "/usr/lib/python2.7/site-packages/keystone/conf/__init__.py", line 139, in configure
2019-10-03 17:15:29.755689 deprecated_since=versionutils.deprecated.STEIN))
2019-10-03 17:15:29.755722 File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2045, in __inner
2019-10-03 17:15:29.755778 result = f(self, *args, **kwargs)
2019-10-03 17:15:29.755809 File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2323, in register_cli_opt
2019-10-03 17:15:29.755913 raise ArgsAlreadyParsedError("cannot register CLI option")
2019-10-03 17:15:29.755983 ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Please attach (from the controller nodes):
* full keystone logs (those under /var/log/kolla/keystone)
* docker logs for the keystone container
* /var/log/kolla/ansible.log
(you can archive that together for a cleaner upload)

And also:
* globals.yml

ArgsAlreadyParsedError is a weird error to get from that.

Changed in kolla:
status: New → Incomplete
Larry Lile (llile) wrote :

The requested logs have been provided, with slight redactions to domain names.

md5sum cabcf4456e6077ed20035e288abcb226 1846789.tar.gz

Larry Lile (llile) wrote :

Just to add some data points here for reference:

1. Train using stable/train exhibits this problem

2. Stein using stable/stein exhibits this problem

3. Rocky using stable/rocky works as expected
   [control] on network 1
   [monitoring] on network 1
   [compute,network] on network 2
   [haproxy:children]: control (on network 1)

The scenario described in 3 is our target: control and haproxy are co-located on hosts on network 1, monitoring is on network 1, and compute, network, and storage are on network 2.

The reasoning behind this split is that control, haproxy, and monitoring are in a protected data center, while compute, network, and storage are in a utility data center without power protection.

Changed in kolla:
status: Incomplete → New
Radosław Piliszek (yoctozepto) wrote :

Was there no ansible.log in the logs dir?

Train is on master. stable/train is not there yet.

Are you using kolla-ansible from branch stable/stein when deploying those stein images?

Also, your definition of haproxy:children differs between the report (network) and the comment (control) - that changes behavior.

Changed in kolla:
status: New → Incomplete
Larry Lile (llile) wrote :

Hi,

Yes, to clarify, on train we are using master and on rocky we are using stable/rocky. Train and Stein exhibit the same behavior, Rocky works successfully.

With regard to the logs and current bug report:

Yes, we are using kolla-ansible stable/stein deploying the official stein images.

Yes, currently we are deploying haproxy:children to (network) not (control).

Once we get past the keystone issue we intend to change haproxy:children over to (control) but that seems to cause a different failure. However, I assumed it would be easier to figure out the keystone issue if I simplified the configuration.

Sorry for any confusion.

Larry Lile (llile) wrote :

The ansible.log files are attached as requested.

Changed in kolla:
status: Incomplete → New
affects: kolla → kolla-ansible
Changed in kolla-ansible:
importance: Undecided → High
Radosław Piliszek (yoctozepto) wrote :

I'm puzzled. Could you also include the generated configs on the deployed nodes? The /etc/kolla stuff.

Changed in kolla-ansible:
status: New → Incomplete
Larry Lile (llile) wrote :

The cluster has been redeployed since my last update, however the same issue persists.

I have gathered all deployment and configuration details along with logs for all subsystems.

Please let me know what else I can do to assist with debugging this issue.

Larry Lile (llile) wrote :

With regard to the data uploaded in 1846789_2.tar.gz, this is with haproxy running on the control nodes.

Larry Lile (llile) wrote :

The logs attached in 1846789_3.tar.gz show the same failure, but in this iteration I have haproxy running on network. So it appears that the location of haproxy does not affect the issue.

Larry Lile (llile) wrote :

I don't know what has changed, but I just had a successful deployment with kolla-ansible 8.0.2.dev24 using the Stein docker images. Mon Oct 14 17:03:22 EDT 2019

Mark Goddard (mgoddard) wrote :

Is it working reliably for you now?

Larry Lile (llile) wrote :

I tried moving haproxy to control and it failed; I am retrying with haproxy on network to see if I can reproduce the original result.

Larry Lile (llile) wrote :

I was able to re-deploy with haproxy on the network nodes. I did hit an issue with a heat versioning API failure during bootstrap; this seems to be an intermittent problem. Performing an additional deploy got past the issue.

TASK [heat : Running Heat bootstrap container] ***********************************************************************************************************************************
fatal: [odt-rsd-ost-ctr-a1.example.com -> odt-rsd-ost-ctr-a1.example.com]: FAILED! => {"changed": true, "msg": "Container exited with non-zero return code 1", "rc": 1, "stderr": "+ sudo -E kolla_set_configs\nINFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json\nINFO:__main__:Validating config file\nINFO:__main__:Kolla config strategy set to: COPY_ALWAYS\nINFO:__main__:Copying service configuration files\nINFO:__main__:Deleting /etc/heat/heat.conf\nINFO:__main__:Copying /var/lib/kolla/config_files/heat.conf to /etc/heat/heat.conf\nINFO:__main__:Setting permission for /etc/heat/heat.conf\nINFO:__main__:Writing out command to execute\n++ cat /run_command\n+ CMD=heat-api\n+ ARGS=\n+ [[ ! -n '' ]]\n+ . kolla_extend_start\n++ [[ ! -d /var/log/kolla/heat ]]\n++ mkdir -p /var/log/kolla/heat\n+++ stat -c %a /var/log/kolla/heat\n++ [[ 2755 != \\7\\5\\5 ]]\n++ chmod 755 /var/log/kolla/heat\n+++ whoami\n++ [[ heat == \\r\\o\\o\\t ]]\n++ . /usr/local/bin/kolla_heat_extend_start\n+++ [[ -n 0 ]]\n+++ heat-manage db_sync\n++++ openstack domain list\n++++ awk '{print $4}'\n++++ grep heat\nFailed to contact the endpoint at http://10.239.195.200:5000 for discovery. 
Fallback to using that endpoint as the base url.\nNot Found (HTTP 404) (Request-ID: req-1e9bea57-85a9-4a84-ac4c-b1d17dde6c2e)\n+++ CURRENT_HEAT_DOMAIN_NAME=\n+++ [[ heat_user_domain != '' ]]\n+++ openstack domain create heat_user_domain\n+++ openstack user create --domain heat_user_domain heat_domain_admin --password Op3nSt@ck\nget() takes exactly 1 argument (2 given)\n", "stderr_lines": ["+ sudo -E kolla_set_configs", "INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json", "INFO:__main__:Validating config file", "INFO:__main__:Kolla config strategy set to: COPY_ALWAYS", "INFO:__main__:Copying service configuration files", "INFO:__main__:Deleting /etc/heat/heat.conf", "INFO:__main__:Copying /var/lib/kolla/config_files/heat.conf to /etc/heat/heat.conf", "INFO:__main__:Setting permission for /etc/heat/heat.conf", "INFO:__main__:Writing out command to execute", "++ cat /run_command", "+ CMD=heat-api", "+ ARGS=", "+ [[ ! -n '' ]]", "+ . kolla_extend_start", "++ [[ ! -d /var/log/kolla/heat ]]", "++ mkdir -p /var/log/kolla/heat", "+++ stat -c %a /var/log/kolla/heat", "++ [[ 2755 != \\7\\5\\5 ]]", "++ chmod 755 /var/log/kolla/heat", "+++ whoami", "++ [[ heat == \\r\\o\\o\\t ]]", "++ . /usr/local/bin/kolla_heat_extend_start", "+++ [[ -n 0 ]]", "+++ heat-manage db_sync", "++++ openstack domain list", "++++ awk '{print $4}'", "++++ grep heat", "Failed to contact the endpoint at http://10.239.195.200:5000 for discovery. Fallback to using that endpoint as the base url.", "Not Found (HTTP 404) (Request-ID: req-1e9bea57-85a9-4a84-ac4c-b1d17dde6c2e)", "+++ CURRENT_H...


Larry Lile (llile) wrote :

This is currently working on Stein.

However, I have tried this on master (Train) with kolla-ansible 8.1.0.dev589 and it is still broken, though less broken. The kolla-ansible deploy completes successfully, but horizon or keystone seems to be broken. Attempting to log in through the horizon UI as admin results in "Unable to retrieve authorized projects."

Does that give us any insight into what fixed Stein?

Errors on master from keystone seem the same:

2019-10-14 11:13:24.616538 mod_wsgi (pid=24): Target WSGI script '/usr/bin/keystone-wsgi-public' cannot be loaded as Python module.
2019-10-14 11:13:24.616648 mod_wsgi (pid=24): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
2019-10-14 11:13:24.616710 Traceback (most recent call last):
2019-10-14 11:13:24.616766 File "/usr/bin/keystone-wsgi-public", line 52, in <module>
2019-10-14 11:13:24.617004 application = initialize_public_application()
2019-10-14 11:13:24.617040 File "/usr/lib/python2.7/site-packages/keystone/server/wsgi.py", line 24, in initialize_public_application
2019-10-14 11:13:24.617102 name='public', config_files=flask_core._get_config_files())
2019-10-14 11:13:24.617133 File "/usr/lib/python2.7/site-packages/keystone/server/flask/core.py", line 151, in initialize_application
2019-10-14 11:13:24.617239 keystone.server.configure(config_files=config_files)
2019-10-14 11:13:24.617274 File "/usr/lib/python2.7/site-packages/keystone/server/__init__.py", line 28, in configure
2019-10-14 11:13:24.617368 keystone.conf.configure()
2019-10-14 11:13:24.617406 File "/usr/lib/python2.7/site-packages/keystone/conf/__init__.py", line 139, in configure
2019-10-14 11:13:24.617470 deprecated_since=versionutils.deprecated.STEIN))
2019-10-14 11:13:24.617500 File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2045, in __inner
2019-10-14 11:13:24.617557 result = f(self, *args, **kwargs)
2019-10-14 11:13:24.617587 File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2323, in register_cli_opt
2019-10-14 11:13:24.617667 raise ArgsAlreadyParsedError("cannot register CLI option")
2019-10-14 11:13:24.617735 ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option

Larry Lile (llile) wrote :

My string of successful deployments seems to have ended abruptly. This is failing again.

Larry Lile (llile) wrote :

I'm still having consistent failures, with the occasional successful deployment. However, even when the deploy appears to be successful, keystone is still producing the error. Any ideas?

2019-10-22 10:51:13.797434 mod_wsgi (pid=18): Target WSGI script '/usr/bin/keystone-wsgi-public' cannot be loaded as Python module.
2019-10-22 10:51:13.797531 mod_wsgi (pid=18): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
2019-10-22 10:51:13.797596 Traceback (most recent call last):
2019-10-22 10:51:13.797652 File "/usr/bin/keystone-wsgi-public", line 52, in <module>
2019-10-22 10:51:13.797741 application = initialize_public_application()
2019-10-22 10:51:13.797776 File "/usr/lib/python2.7/site-packages/keystone/server/wsgi.py", line 24, in initialize_public_application
2019-10-22 10:51:13.797840 name='public', config_files=flask_core._get_config_files())
2019-10-22 10:51:13.797873 File "/usr/lib/python2.7/site-packages/keystone/server/flask/core.py", line 151, in initialize_application
2019-10-22 10:51:13.797933 keystone.server.configure(config_files=config_files)
2019-10-22 10:51:13.797966 File "/usr/lib/python2.7/site-packages/keystone/server/__init__.py", line 28, in configure
2019-10-22 10:51:13.798024 keystone.conf.configure()
2019-10-22 10:51:13.798057 File "/usr/lib/python2.7/site-packages/keystone/conf/__init__.py", line 139, in configure
2019-10-22 10:51:13.798174 deprecated_since=versionutils.deprecated.STEIN))
2019-10-22 10:51:13.798273 File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2045, in __inner
2019-10-22 10:51:13.798344 result = f(self, *args, **kwargs)
2019-10-22 10:51:13.798377 File "/usr/lib/python2.7/site-packages/oslo_config/cfg.py", line 2323, in register_cli_opt
2019-10-22 10:51:13.798434 raise ArgsAlreadyParsedError("cannot register CLI option")
2019-10-22 10:51:13.798490 ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option

Mark Goddard (mgoddard) wrote :

That suggests that this error might not be the problematic one.

Klemen Pogacnik (kemopq) wrote :

I'm working on the rocky release and have the same problem. It happens on each deploy on a three-node cluster. I haven't noticed this problem on single-node deployments.
After deploying keystone, its endpoint (port 5000) is quite unstable. It works five, six times and then fails with "500 Internal Server Error". Then it works again for some time and fails again. When the Internal Server Error is returned, the keystone error logs are the same as Larry had:

mod_wsgi (pid=20): Target WSGI script '/var/lib/kolla/venv/bin/keystone-wsgi-admin' cannot be loaded as Python module.
mod_wsgi (pid=20): Exception occurred processing WSGI script '/var/lib/kolla/venv/bin/keystone-wsgi-admin'.

I checked the keystone container on each node and there is no pattern; sometimes only one has this problem, sometimes two.
Cloud deployment then fails when some other module needs the keystone endpoint.

I found out that restarting the keystone containers and redeploying the cloud resolves the situation.

One more note: I didn't have these problems with the rocky version from Nov 7th, and my colleagues and I deployed single- and multi-node clouds many times.
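The flapping behaviour described here (a handful of successful requests, then a 500, then success again, depending on which wsgi worker answers) can be quantified by polling the endpoint and tallying status codes. A minimal sketch for illustration; the VIP URL in the comment is the one from this report and would differ per deployment:

```python
# Minimal probe for an intermittently failing keystone endpoint: repeat a
# GET and record the status codes, to see what fraction of responses are
# server errors (i.e. come from a wsgi worker that failed to load).
import urllib.error
import urllib.request

def probe(url, attempts=20):
    """Return the list of HTTP status codes seen over repeated GETs."""
    codes = []
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                codes.append(resp.status)
        except urllib.error.HTTPError as exc:
            codes.append(exc.code)  # e.g. 500 from a broken wsgi worker
    return codes

# Example (internal VIP from this report):
#   codes = probe("http://10.239.195.200:5000/v3")
#   print(sum(c >= 500 for c in codes), "server errors out of", len(codes))
```

Each 500 here corresponds to a request handled by a worker whose WSGI script failed to import, which matches the per-worker randomness described above.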

Michal Nasiadka (mnasiadka) wrote :

Probably found the cause - it seems we have some race-ish condition, and sometimes the public keystone wsgi comes up without fernet keys in /etc/keystone/fernet-keys/ - visible in this logfile:
https://88763c12f13d1aeca43c-63681721353a54dab1064b012b97b3cb.ssl.cf1.rackcdn.com/698452/2/check/kolla-ansible-ubuntu-source-ceph/09da3d0/primary/logs/kolla/keystone/keystone-apache-public-error.txt.gz

And then it persists in not coming up properly.

Working on patches in Kolla (fix fernet bootstrap so it properly sets the changed flag) and Kolla-Ansible (restart keystone containers after fernet bootstrap, or a similar solution).
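The fix direction described above (don't let the keystone wsgi come up key-less and then serve 500s until restarted) can be sketched as a simple wait loop. This is only an illustration of the idea, not the actual kolla-ansible patch; the directory path is the one from the logs above and the timeout values are arbitrary:

```python
# Illustration only -- not the actual kolla-ansible fix. The idea: block
# keystone (wsgi) startup until fernet_setup/fernet-push.sh has populated
# the key directory, so the wsgi process never starts key-less.
import os
import time

def wait_for_fernet_keys(key_dir, timeout=60.0, interval=1.0):
    """Return True once key_dir contains at least one key file, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if any(os.scandir(key_dir)):  # directory exists and is non-empty
                return True
        except FileNotFoundError:
            pass  # directory not created yet by the fernet bootstrap
        time.sleep(interval)
    return False

# e.g. call wait_for_fernet_keys("/etc/keystone/fernet-keys/") before
# starting the keystone wsgi processes, and abort startup on False.
```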

Changed in kolla-ansible:
status: Incomplete → Triaged
assignee: nobody → Michal Nasiadka (mnasiadka)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/698710

Changed in kolla-ansible:
status: Triaged → In Progress
Larry Lile (llile) wrote :

Hi Michal,

I have been tracking the updates to change 698710 on OpenDev. We are using one of the intermediate patches you supplied earlier with great success.

I wanted to thank you, and the reviewers, for all your hard work on this issue so far!

I'm guessing your final patch will address not only this presentation of the problem but many others that have yet to be reported or identified.

Thank you!

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/698710
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=0799782ce83d1057f262b44c979a15f9a1b05c72
Submitter: Zuul
Branch: master

commit 0799782ce83d1057f262b44c979a15f9a1b05c72
Author: Michal Nasiadka <email address hidden>
Date: Thu Dec 12 13:19:48 2019 +0100

    Fix keystone fernet bootstrap

    There are cases when a multinode deployment ends up in unusable
    keystone public wsgi on some nodes.

    The root cause is that keystone public wsgi doesn't find fernet
    keys on startup - and then persists on sending 500 errors to any
    requests - due to a race condition between
    fernet_setup/fernet-push.sh and keystone startup.

    Depends-On: https://review.opendev.org/703742/
    Change-Id: I63709c2e3f6a893db82a05640da78f492bf8440f
    Closes-Bug: #1846789

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/705949

Doug Szumski (dszumski) wrote :

I believe I have just hit this on Train also.

Radosław Piliszek (yoctozepto) wrote :

Very likely, this is Train-affecting.

Radosław Piliszek (yoctozepto) wrote :

Neither Train nor Stein had any related fixes. The fix by Michał is awaiting followup and backporting.

Doug Szumski (dszumski) wrote :

I'm seeing this on master in CI too now:

https://zuul.opendev.org/t/openstack/build/78feeaaa93014e1ab7ef8b7fafa70eb7/log/primary/logs/kolla/keystone/keystone-apache-public-error.txt#14944

```
2020-06-24 15:21:10.668579 mod_wsgi (pid=34, process='keystone-public', application=''): Loading Python script file '/var/lib/kolla/venv/bin/keystone-wsgi-public'.
2020-06-24 15:21:10.669624 mod_wsgi (pid=34): Failed to exec Python script file '/var/lib/kolla/venv/bin/keystone-wsgi-public'.
2020-06-24 15:21:10.669662 mod_wsgi (pid=34): Exception occurred processing WSGI script '/var/lib/kolla/venv/bin/keystone-wsgi-public'.
2020-06-24 15:21:10.669837 Traceback (most recent call last):
2020-06-24 15:21:10.669879 File "/var/lib/kolla/venv/bin/keystone-wsgi-public", line 52, in <module>
2020-06-24 15:21:10.669882 application = initialize_public_application()
2020-06-24 15:21:10.669889 File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystone/server/wsgi.py", line 24, in initialize_public_application
2020-06-24 15:21:10.669892 name='public', config_files=flask_core._get_config_files())
2020-06-24 15:21:10.669898 File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystone/server/flask/core.py", line 157, in initialize_application
2020-06-24 15:21:10.669901 keystone.server.configure(config_files=config_files)
2020-06-24 15:21:10.670073 File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystone/server/__init__.py", line 28, in configure
2020-06-24 15:21:10.670076 keystone.conf.configure()
2020-06-24 15:21:10.670081 File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystone/conf/__init__.py", line 137, in configure
2020-06-24 15:21:10.670084 deprecated_since=versionutils.deprecated.STEIN))
2020-06-24 15:21:10.670089 File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py", line 2055, in __inner
2020-06-24 15:21:10.670092 result = f(self, *args, **kwargs)
2020-06-24 15:21:10.670098 File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_config/cfg.py", line 2333, in register_cli_opt
2020-06-24 15:21:10.670100 raise ArgsAlreadyParsedError("cannot register CLI option")
2020-06-24 15:21:10.670115 oslo_config.cfg.ArgsAlreadyParsedError: arguments already parsed: cannot register CLI option
```

Doug Szumski (dszumski) wrote :

I think this goes beyond the fernet issues - on a single-node Train (CentOS 7) deploy I see the same issue. It is sporadic - sometimes, for example, a keystone endpoint list works and other times not. As soon as I restart Keystone the issue is gone.

It appears to be related to two keystone calls in close proximity. A keystone token issue never fails. An endpoint list always gets a token, but then the GET /v3/endpoints fails. An example:

```
==== # openstack endpoint list (FAILS) ==================

 ==> keystone-apache-admin-access.log <==
 192.168.33.2 - - [29/Jun/2020:16:29:15 +0000] "GET /v3 HTTP/1.1" 200 253 3251 "-" "openstacksdk/0.36.3 keystoneauth1/3.17.2 python-requests/2.22.0 CPython/2.7
 .5"
 192.168.33.2 - - [29/Jun/2020:16:29:15 +0000] "POST /v3/auth/tokens HTTP/1.1" 201 2347 331455 "-" "openstacksdk/0.36.3 keystoneauth1/3.17.2 python-requests/2.
 22.0 CPython/2.7.5"
 192.168.33.2 - - [29/Jun/2020:16:29:16 +0000] "POST /v3/auth/tokens HTTP/1.1" 201 2347 341851 "-" "openstacksdk/0.36.3 keystoneauth1/3.17.2 python-requests/2.
 22.0 CPython/2.7.5"

 ==> keystone-apache-public-access.log <==
 192.168.33.2 - - [29/Jun/2020:16:29:16 +0000] "GET / HTTP/1.1" 300 267 2879 "-" "openstacksdk/0.36.3 keystoneauth1/3.17.2 python-requests/2.22.0 CPython/2.7.5
 "
 192.168.33.2 - - [29/Jun/2020:16:29:16 +0000] "GET /v3/endpoints HTTP/1.1" 200 2926 37411 "-" "python-keystoneclient"

 ==> keystone-apache-public-error.log <==
 2020-06-29 16:29:16.734104 mod_wsgi (pid=23): Target WSGI script '/usr/bin/keystone-wsgi-public' cannot be loaded as Python module.
 2020-06-29 16:29:16.734150 mod_wsgi (pid=23): Exception occurred processing WSGI script '/usr/bin/keystone-wsgi-public'.
 2020-06-29 16:29:16.734195 Traceback (most recent call last): ...
```

Mark Goddard (mgoddard) wrote :

I do see a fernet error in the link you posted though, Doug.

SystemExit: /etc/keystone/fernet-keys/ does not contain keys, use keystone-manage fernet_setup to create Fernet keys.

Mark Goddard (mgoddard) wrote :

Could you try again with a depends-on to this patch? https://review.opendev.org/#/c/707080/

Doug Szumski (dszumski) wrote :

Thanks, I missed that. Trying with the above patch.

On my single node Train deploy I also see:

```
keystone/keystone-apache-public-error.log:2020-06-29 17:11:21.562830 File "/usr/lib/python2.7/site-packages/keystone/receipt/providers/fernet/core.py", line
 45, in __init__
keystone/keystone-apache-public-error.log:2020-06-29 17:11:21.562879 'Fernet keys.') % subs)
keystone/keystone-apache-public-error.log:2020-06-29 17:11:21.562902 SystemExit: /etc/keystone/fernet-keys/ does not contain keys, use keystone-manage fernet_
setup to create Fernet keys.
keystone/keystone-apache-public-error.log:2020-06-29 17:11:25.810353 File "/usr/lib/python2.7/site-packages/keystone/receipt/providers/fernet/core.py", line
 45, in __init__
keystone/keystone-apache-public-error.log:2020-06-29 17:11:25.810403 'Fernet keys.') % subs)
keystone/keystone-apache-public-error.log:2020-06-29 17:11:25.810427 SystemExit: /etc/keystone/fernet-keys/ does not contain keys, use keystone-manage fernet_
setup to create Fernet keys.
```

With the following fernet-keys folder in the container (but perhaps not when Keystone started):

(keystone)[root@controller1 fernet-keys]# ls -la
total 8
drwxrwx--- 2 keystone keystone 24 Jun 29 17:11 .
drwxr-x--- 1 root keystone 46 Jun 30 08:30 ..
-rw------- 1 keystone keystone 44 Jun 29 17:11 0
-rw------- 1 keystone keystone 44 Jun 29 17:11 1

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/train)

Reviewed: https://review.opendev.org/705949
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=ac38a48754de8484c2ee83534ba2df530caa2db3
Submitter: Zuul
Branch: stable/train

commit ac38a48754de8484c2ee83534ba2df530caa2db3
Author: Michal Nasiadka <email address hidden>
Date: Thu Dec 12 13:19:48 2019 +0100

    Fix keystone fernet bootstrap

    There are cases when a multinode deployment ends up in unusable
    keystone public wsgi on some nodes.

    The root cause is that keystone public wsgi doesn't find fernet
    keys on startup - and then persists on sending 500 errors to any
    requests - due to a race condition between
    fernet_setup/fernet-push.sh and keystone startup.

    Depends-On: https://review.opendev.org/705958
    Change-Id: I63709c2e3f6a893db82a05640da78f492bf8440f
    Closes-Bug: #1846789
    (cherry picked from commit 0799782ce83d1057f262b44c979a15f9a1b05c72)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.3.0

This issue was fixed in the openstack/kolla-ansible 9.3.0 release.
