Keystone register.yml task errors during deploy on existing environment

Bug #1797814 reported by Eric Miller
This bug affects 8 people
Affects        Status        Importance  Assigned to    Milestone
kolla-ansible  Fix Released  Medium      Mark Goddard
Queens         Fix Released  Medium      Mark Goddard
Rocky          Fix Released  Medium      Mark Goddard
Stein          Fix Released  Medium      Mark Goddard
Train          Fix Released  Medium      Mark Goddard

Bug Description

While running a deploy on an existing 7.0.0.0rc3devXX environment (which includes a "reconfigure" since that just calls deploy.yml), this task errors:

TASK [keystone : Creating admin project, user, role, service, and endpoint] *******************************************************************************
task path: /usr/share/kolla-ansible/ansible/roles/keystone/tasks/register.yml:2
fatal: [controller001]: FAILED! => {"msg": "The conditional check '(keystone_bootstrap.stdout | from_json).changed' failed. The error was: Invalid control character at: line 1 column 332 (char 331)"}

It appears that the include statement here:
https://github.com/openstack/kolla-ansible/blob/3e7fc62c51f61b6afeb922edc3796e6f9c16e847/ansible/roles/keystone/tasks/bootstrap.yml#L35

does not get run, so the keystone_bootstrap stdout is never generated.

At least that is my first guess. :)

Eric

Revision history for this message
Eric Miller (erickmiller) wrote :

I missed that the "failed_when" line was referring to the "register" variable in this task - so it is NOT related to the bootstrap.yml file I mentioned.

So, something in the docker exec statement failed. I'll re-run with -vvv to see if I can get the output from the command (I only had -vv this last run).
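For reference, the failing task boils down to roughly this pattern (a simplified sketch pieced together from the error above and the command quoted later in this thread, not the verbatim kolla-ansible role code; the region loop variable name is made up here):

```
# Simplified sketch of the register.yml task (not the verbatim role code):
# the bootstrap command's stdout is registered and then parsed as JSON in the
# changed_when/failed_when conditionals, so any non-JSON stdout makes the
# conditional itself blow up with the error shown above.
- name: Creating admin project, user, role, service, and endpoint
  command: >
    docker exec keystone kolla_keystone_bootstrap
    {{ openstack_auth.username }} {{ openstack_auth.password }}
    {{ openstack_auth.project_name }} admin
    {{ keystone_admin_url }} {{ keystone_internal_url }}
    {{ keystone_public_url }} {{ item }}
  register: keystone_bootstrap
  changed_when: (keystone_bootstrap.stdout | from_json).changed
  failed_when: (keystone_bootstrap.stdout | from_json).failed
  with_items: "{{ keystone_regions | default(['RegionOne']) }}"  # hypothetical variable name
```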

Eric

Revision history for this message
Eric Miller (erickmiller) wrote :

It looks like the failure is caused by this SQL statement error:

... in raise_mysql_exception
2018-10-14 19:04:07.664 192 ERROR keystone     raise errorclass(errno, errval)
2018-10-14 19:04:07.664 192 ERROR keystone DBError: (pymysql.err.InternalError) (1054, u"Unknown column 'description' in 'field list'") [SQL: u'INSERT INTO `role` (id, name, description, extra) VALUES (%(id)s, %(name)s, %(description)s, %(extra)s)'] [parameters: {'extra': '{}', 'description': None, 'name': 'reader', 'id': '<redacted>'}] (Background on this error at: http://sqlalche.me/e/2j85)
2018-10-14 19:04:07.664 192 ERROR keystone \u001b[00m", "changed": true}",

Revision history for this message
Eric Miller (erickmiller) wrote :

After reviewing a bit more - the SQL error above is "expected", indicating that the role already exists. And thus the "failed" return value, in the JSON response, should be "true".

This is the first chunk of the JSON response:

<controller001> (1, '\r\n{"changed": true, "end": "2018-10-14 19:04:07.801951", "stdout": "{\\"failed\\": true, \\"msg\\": \\"2018-10-14 19:04:07.013 192 DEBUG keystone.notifications [-] Callback: `keystone.application_credential.core.Manager._delete_app_creds_on_user_delete_callback` subscribed to event `identity.user.deleted`. register_event_callback /var/lib/kolla/venv/lib/python2.7/site-packages/keystone/notifications.py:286\\u001b[00m 2018-10-14 19:04:07.014 192 DEBUG keystone.notifications [-] Callback: `keystone.application_credential.core.Manager._delete_app_creds_on_user_delete_callback` subscribed to event `identity.user.disabled`. register_event_callback /var/lib/kolla/venv/lib/python2.7/site-packages/keystone/notifications.py:286\\u001b[00m

If I only take the "stdout" value:

{\\"failed\\": true, \\"msg\\": \\"2018-10-14 19:04:07.013 192 DEBUG keystone.notifications [-] Callback: `keystone.application_credential.core.Manager._delete_app_creds_on_user_delete_callback` subscribed to event `identity.user.deleted`. register_event_callback /var/lib/kolla/venv/lib/python2.7/site-packages/keystone/notifications.py:286\\u001b[00m 2018-10-14 19:04:07.014 192 DEBUG keystone.notifications [-] Callback: `keystone.application_credential.core.Manager._delete_app_creds_on_user_delete_callback` subscribed to event `identity.user.disabled`. register_event_callback /var/lib/kolla/venv/lib/python2.7/site-packages/keystone/notifications.py:286\\u001b[00m

I have no idea what it is complaining about at Character 331 (specified in the original error as the issue). But obviously the JSON parser in Ansible is unhappy.

Eric

Revision history for this message
Eric Miller (erickmiller) wrote :

I ran the docker command directly on the host so I could get clean output and it looks like this (the first few lines out of hundreds):

{"failed": true, "msg": "2018-10-14 19:55:01.731 210 DEBUG keystone.notifications [-] Callback: `keystone.application_credential.core.Manager._delete_app_creds_on_user_delete_callback` subscribed to event `identity.user.deleted`. register_event_callback /var/lib/kolla/venv/lib/python2.7/site-packages/keystone/notifications.py:286 2018-10-14 19:55:01.732 210 DEBUG keystone.notifications [-] Callback: `keystone.application_credential.core.Manager._delete_app_creds_on_user_delete_callback` subscribed to event `identity.user.disabled`. register_event_callback /var/lib/kolla/venv/lib/python2.7/site-packages/keystone/notifications.py:286 2018-10-14 19:55:01.733 210 DEBUG keystone.notifications [-] Callback:

It seems to be pointing to the space between "286" and "2018" here: notifications.py:286 2018-10-14

Any ideas?

Eric

Revision history for this message
Ingo Ebel (savar) wrote :

Any news on that?
I'm seeing the same when trying to deploy rocky.

fatal: [toscomp02]: FAILED! => {"msg": "The conditional check '(keystone_bootstrap.stdout | from_json).changed' failed. The error was: Invalid control character at: line 1 column 101 (char 100)"}

Doug Szumski (dszumski)
Changed in kolla-ansible:
status: New → Confirmed
Revision history for this message
Doug Szumski (dszumski) wrote :

I see something very similar to this on a three node deploy, in stein and in master.

The following task fails:

```
TASK [keystone : Creating admin project, user, role, service, and endpoint] ******************************************************************************************************************************************************************
fatal: [control01]: FAILED! => {"msg": "The conditional check '(keystone_bootstrap.stdout | from_json).changed' failed. The error was: Invalid control character at: line 1 column 284 (char 283)"}
```

If I run the command manually I get the following output:

```
$ docker exec keystone kolla_keystone_bootstrap admin top_secret admin admin http://172.28.128.254:35357 http://172.28.128.254:5000 http://172.28.128.254:5000 RegionOne
{"failed": true, "msg": "2019-07-01 12:47:09.993 92 WARNING keystone.access_rules_config.backends.json [-] No config file found for access rules, application credential access rules will be unavailable.: IOError: [Errno 2] No such file or directory: '/etc/keystone/access_rules.json' 2019-07-01 12:47:10.256 92 CRITICAL keystone [-] Unhandled error: ProgrammingError: (pymysql.err.ProgrammingError) (1146, u"Table 'keystone.project' doesn't exist") [SQL: u'INSERT INTO project (id, name, domain_id, description, enabled, extra, parent_id, is_domain) VALUES (%(id)s, %(name)s, %(domain_id)s, %(description)s, %(enabled)s, %(extra)s, %(parent_id)s, %(is_domain)s)'] [parameters: {'is_domain': 1, 'description': 'The default domain', 'extra': '{}', 'enabled': 1, 'domain_id': '<<keystone.domain.root>>', 'parent_id': None, 'id': 'default', 'name': 'Default'}] (Background on this error at: http://sqlalche.me/e/f405) 2019-07-01 12:47:10.256 92 ERROR keystone Traceback (most recent call last): 2019-07-01 12:47:10.256 92 ERROR keystone File "/usr/bin/keystone-manage", line 10, in <module> 2019-07-01 12:47:10.256 92 ERROR keystone sys.exit(main()) 2019-07-01 12:47:10.256 92 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 40, in main 2019-07-01 12:47:10.256 92 ERROR keystone cli.main(argv=sys.argv, developer_config_file=developer_config) 2019-07-01 12:47:10.256 92 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 1358, in main 2019-07-01 12:47:10.256 92 ERROR keystone CONF.command.cmd_class.main() 2019-07-01 12:47:10.256 92 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 179, in main 2019-07-01 12:47:10.256 92 ERROR keystone klass.do_bootstrap() 2019-07-01 12:47:10.256 92 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 170, in do_bootstrap 2019-07-01 12:47:10.256 92 ERROR keystone self.bootstrapper.bootstrap() 2019-07-01 12:47:10.256 92 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/bootstrap.py", line 61, in bootstrap 2019-07-01 12:47:10.256 92 ERROR keystone self._bootstrap_default_domain() 2019-07-01 12:47:10.256 92 ERROR keystone File "/usr/lib/python2.7/site-packages/keystone/cmd/bootstrap.py", line 82, in _bootstrap_default_domain 2019-07-01 12:47:10.256 92 ERROR keystone domain=default_domain) 2019-07-01 12:47:10.256 92 ERROR keysto...
```

Revision history for this message
Mark Goddard (mgoddard) wrote :

The problem is this:

Table 'keystone.project' doesn't exist

For some reason the DB bootstrap has not happened. We see this kind of error fairly often, because ansible/roles/keystone/tasks/bootstrap.yml only runs the bootstrap if the DB was created during the same run.
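
Roughly, the pre-fix logic looks like this (a hedged sketch with approximate module, variable and file names, not the verbatim role code):

```
# Hedged sketch of the pre-fix logic in bootstrap.yml: the service bootstrap,
# which runs keystone-manage db_sync, is only included when the database was
# created in this run. A deploy that fails between these two steps leaves an
# empty database, and re-runs then skip the sync entirely.
- name: Creating Keystone database
  mysql_db:
    name: keystone
    state: present
  register: database

- name: Running Keystone bootstrap container
  include_tasks: bootstrap_service.yml
  when: database.changed
```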

Changed in kolla-ansible:
assignee: nobody → Mark Goddard (mgoddard)
status: Confirmed → In Progress
Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: Undecided → Medium
Revision history for this message
Doug Szumski (dszumski) wrote :

A manual workaround (truncated in the post above) is to run the db_sync for the failed service, e.g.:

$ docker exec keystone keystone-manage db_sync

Changed in kolla-ansible:
assignee: Mark Goddard (mgoddard) → Radosław Piliszek (yoctozepto)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/650962
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=d5e5e885d11e338806425839a361d868c1f4ff10
Submitter: Zuul
Branch: master

commit d5e5e885d11e338806425839a361d868c1f4ff10
Author: Mark Goddard <email address hidden>
Date: Mon Apr 8 17:51:07 2019 +0100

    During deploy, always sync DB

    A common class of problems goes like this:

    * kolla-ansible deploy
    * Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
    * Re-run kolla-ansible deploy
    * Service fails to start

    This happens because the DB is created during the first run, but for some
    reason we fail before performing the DB sync. This means that on the second run
    we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
    already exists, and therefore still don't perform the DB sync. However this
    time, the command may complete without apparent error.

    We should be less careful about when we perform the DB sync, and do it whenever
    it is necessary. There is an argument for not doing the sync during a
    'reconfigure' command, although we will not change that here.

    This change always performs the DB sync during 'deploy' and
    'reconfigure' commands.

    Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
    Closes-Bug: #1823766
    Closes-Bug: #1797814
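
In role terms, the change amounts to something like the following (a hedged sketch, not the literal diff; it assumes the kolla_action variable carries the CLI command):

```
# Hedged sketch of the fix: include the service bootstrap (and hence the DB
# sync) whenever the action is deploy or reconfigure, rather than only when
# the database was created in this run.
- name: Running Keystone bootstrap container
  include_tasks: bootstrap_service.yml
  when: kolla_action in ['deploy', 'reconfigure']
```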

Changed in kolla-ansible:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/670546

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/670548

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/670549

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/stein)

Reviewed: https://review.opendev.org/670546
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=d62e927dacfa42822296e7a0720cb85019c08a9b
Submitter: Zuul
Branch: stable/stein

commit d62e927dacfa42822296e7a0720cb85019c08a9b
Author: Mark Goddard <email address hidden>
Date: Mon Apr 8 17:51:07 2019 +0100

    During deploy, always sync DB

    A common class of problems goes like this:

    * kolla-ansible deploy
    * Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
    * Re-run kolla-ansible deploy
    * Service fails to start

    This happens because the DB is created during the first run, but for some
    reason we fail before performing the DB sync. This means that on the second run
    we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
    already exists, and therefore still don't perform the DB sync. However this
    time, the command may complete without apparent error.

    We should be less careful about when we perform the DB sync, and do it whenever
    it is necessary. There is an argument for not doing the sync during a
    'reconfigure' command, although we will not change that here.

    This change always performs the DB sync during 'deploy' and
    'reconfigure' commands.

    Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
    Closes-Bug: #1823766
    Closes-Bug: #1797814
    (cherry picked from commit d5e5e885d11e338806425839a361d868c1f4ff10)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 8.0.0.0rc2

This issue was fixed in the openstack/kolla-ansible 8.0.0.0rc2 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/queens)

Reviewed: https://review.opendev.org/670549
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=8e219a91738048ff16f2caed4c24f85832211aad
Submitter: Zuul
Branch: stable/queens

commit 8e219a91738048ff16f2caed4c24f85832211aad
Author: Mark Goddard <email address hidden>
Date: Mon Apr 8 17:51:07 2019 +0100

    During deploy, always sync DB

    A common class of problems goes like this:

    * kolla-ansible deploy
    * Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
    * Re-run kolla-ansible deploy
    * Service fails to start

    This happens because the DB is created during the first run, but for some
    reason we fail before performing the DB sync. This means that on the second run
    we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
    already exists, and therefore still don't perform the DB sync. However this
    time, the command may complete without apparent error.

    We should be less careful about when we perform the DB sync, and do it whenever
    it is necessary. There is an argument for not doing the sync during a
    'reconfigure' command, although we will not change that here.

    This change always performs the DB sync during 'deploy' and
    'reconfigure' commands.

    Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
    Closes-Bug: #1823766
    Closes-Bug: #1797814
    (cherry picked from commit d5e5e885d11e338806425839a361d868c1f4ff10)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/670548
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=441c5a03f7e788336f3a7c1cb15dcec207073b07
Submitter: Zuul
Branch: stable/rocky

commit 441c5a03f7e788336f3a7c1cb15dcec207073b07
Author: Mark Goddard <email address hidden>
Date: Mon Apr 8 17:51:07 2019 +0100

    During deploy, always sync DB

    A common class of problems goes like this:

    * kolla-ansible deploy
    * Hit a problem, often in ansible/roles/*/tasks/bootstrap.yml
    * Re-run kolla-ansible deploy
    * Service fails to start

    This happens because the DB is created during the first run, but for some
    reason we fail before performing the DB sync. This means that on the second run
    we don't include ansible/roles/*/tasks/bootstrap_service.yml because the DB
    already exists, and therefore still don't perform the DB sync. However this
    time, the command may complete without apparent error.

    We should be less careful about when we perform the DB sync, and do it whenever
    it is necessary. There is an argument for not doing the sync during a
    'reconfigure' command, although we will not change that here.

    This change always performs the DB sync during 'deploy' and
    'reconfigure' commands.

    Change-Id: I82d30f3fcf325a3fdff3c59f19a1f88055b566cc
    Closes-Bug: #1823766
    Closes-Bug: #1797814
    (cherry picked from commit d5e5e885d11e338806425839a361d868c1f4ff10)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 6.2.2

This issue was fixed in the openstack/kolla-ansible 6.2.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.1.2

This issue was fixed in the openstack/kolla-ansible 7.1.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 9.0.0.0rc1 release candidate.

Revision history for this message
Rowan Potgieter (rowan-potgieter) wrote :

Hi, we have hit this issue using kolla-ansible 8.0.1 (installed from pip) as well as the `stable/stein` branch when deploying using the git repository.

From what we are seeing the error does not appear to be database related at all. There is actually a "NULL" value in the output of this command:

docker exec keystone kolla_keystone_bootstrap {{ openstack_auth.username }} {{ openstack_auth.password }} {{ openstack_auth.project_name }} admin {{ keystone_admin_url }} {{ keystone_internal_url }} {{ keystone_public_url }} {{ item }}

I ran the command manually and saved the result in file.txt. I then cat the file showing non-printable characters:
   $ cat -vE file.txt
   {"failed": true, "msg": "2019-11-25 12:57:43.952 541 WARNING keystone.access_rules_config.backends.json [-] No config file found for access rules, application credential access rules will be unavailable.: FileNotFoundError: [Errno 2] No such file or directory: '/etc/keystone/access_rules.json'^[[00m /etc/keystone/fernet-keys/ does not exist", "changed": true}$

I believe the ^[[00m in the "msg" field is causing the issue. If I pipe the response to jq I receive similar errors:
   parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 1, column 342

I think it is worth investigating why "docker exec keystone kolla_keystone_bootstrap" outputs a NULL character.

Revision history for this message
Rowan Potgieter (rowan-potgieter) wrote :

Sorry - minor correction to my previous statement, it's an escape character (^[) not a NULL character.
I tested piping the same file.txt through a tr command to remove non-printable characters:

# tr -cd "[:print:]\n" < file.txt
2019-11-25 13:53:52.264 619 WARNING keystone.access_rules_config.backends.json [-] No config file found for access rules, application credential access rules will be unavailable.: FileNotFoundError: [Errno 2] No such file or directory: '/etc/keystone/access_rules.json'[00m
/etc/keystone/fernet-keys/ does not exist
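
For what it's worth, the parse failure is easy to reproduce in isolation (a minimal sketch, not taken from kolla-ansible): from_json wraps Python's json.loads, which rejects unescaped control characters such as the ESC byte (0x1b) that the colour reset sequence starts with.

```
# Minimal reproduction sketch (not kolla-ansible code): from_json refuses
# unescaped control characters, so a raw ESC byte (0x1b) inside the JSON
# string triggers "Invalid control character at ...".
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Clean JSON parses fine
      debug:
        msg: "{{ '{\"failed\": true, \"changed\": true}' | from_json }}"

    - name: JSON carrying a raw ESC byte fails to parse
      debug:
        msg: "{{ ('{\"msg\": \"reset%c[00m\", \"changed\": true}' | format(27)) | from_json }}"
      ignore_errors: true
```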

Revision history for this message
Mark Goddard (mgoddard) wrote :

That's good information, Rowan - we do see some odd failures during this task. I did a little googling and ^[[00m is an escape sequence for terminal colour control that should reset the attributes to normal. Perhaps we need to remove the control codes from the output?
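
Something along these lines might work (a hedged sketch, not what the role currently does), stripping ANSI colour sequences from the registered stdout before parsing it:

```
# Hedged sketch (not the current role code): strip ANSI colour sequences
# (ESC [ ... m) from the bootstrap stdout before handing it to from_json.
- name: Creating admin project, user, role, service, and endpoint
  command: >
    docker exec keystone kolla_keystone_bootstrap
    {{ openstack_auth.username }} {{ openstack_auth.password }}
    {{ openstack_auth.project_name }} admin
    {{ keystone_admin_url }} {{ keystone_internal_url }}
    {{ keystone_public_url }} RegionOne
  register: keystone_bootstrap
  changed_when: >-
    (keystone_bootstrap.stdout
     | regex_replace('\x1b\[[0-9;]*m', '')
     | from_json).changed
  failed_when: >-
    (keystone_bootstrap.stdout
     | regex_replace('\x1b\[[0-9;]*m', '')
     | from_json).failed
```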

There is also the underlying issue, which is that /etc/keystone/fernet-keys/ does not exist.

Revision history for this message
Rowan Potgieter (rowan-potgieter) wrote :

Hi Mark, I would recommend removing the control codes; otherwise parsing will break any time you pipe stdout or stderr to from_json to get the changed status.

I never managed to figure out the issue with the missing fernet-keys, in the end we decided to perform a fresh kolla-ansible deploy since I suspect we broke something.

Revision history for this message
Mark Goddard (mgoddard) wrote :

Raised a separate bug for the control code issue: https://bugs.launchpad.net/kolla-ansible/+bug/1855701

Mark Goddard (mgoddard)
Changed in kolla-ansible:
status: Fix Committed → Fix Released