Ocata -> Pike: overcloud upgrade returns 500 error at the end (race condition?)

Bug #1712974 reported by Cédric Jeanneret (deactivated) on 2017-08-25
This bug affects 1 person
Affects: tripleo | Importance: High | Assigned to: Unassigned

Bug Description

Dear Stackers,

While upgrading our lab from Ocata to Pike, we ran into a strange situation.

Command used:
openstack overcloud deploy \
  --templates $TEMPLATES \
  -e ./upgrade-pike.yaml \
  -e ./openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml \
  $ENVIRONMENTS

The "upgrade-pike.yaml" file contains the commands that update the package source lists on the nodes, as described in http://tripleo.org/post_deployment/upgrade.html
The other variables are set and point to the correct directories/files.
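For anyone trying to reproduce this, here is a minimal sketch of how the variables can be filled in. It assumes bash, and the environment file names in `ENVIRONMENTS` are hypothetical placeholders (only the two `-e` files from the command above come from this report). Using arrays instead of an unquoted `$ENVIRONMENTS` string avoids word-splitting surprises. As written it is a dry run that only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical values: substitute the template path and environment files
# actually used for the original Ocata deployment.
TEMPLATES=/usr/share/openstack-tripleo-heat-templates
ENVIRONMENTS=(
  -e ./network-environment.yaml   # hypothetical file name
  -e ./node-info.yaml             # hypothetical file name
)

# Build the full command as an array so paths containing spaces stay intact.
DEPLOY_CMD=(
  openstack overcloud deploy
  --templates "$TEMPLATES"
  -e ./upgrade-pike.yaml
  -e ./openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml
  "${ENVIRONMENTS[@]}"
)

# Dry run: print the shell-quoted command instead of executing it.
# To run it for real, replace the next line with: "${DEPLOY_CMD[@]}"
printf '%q ' "${DEPLOY_CMD[@]}"; echo
```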

Everything seemed to work, but at the end we got this error:
2017-08-24 09:50:51Z [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully

 Stack overcloud UPDATE_COMPLETE

Internal Server Error (HTTP 500)

Yes, "COMPLETE", but right after that we got a 500 error, which in the end means "failed".

Current state: Horizon doesn't allow any login (it returns a 500), but we can apparently still call the API directly with the `openstack` command.
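Since Heat itself reported UPDATE_COMPLETE and the HTTP 500 only came afterwards, it may help to check the stack status on its own, separately from the client's result. A minimal sketch, assuming bash; `classify_stack_status` is a hypothetical helper, and the usage line relies on the standard `openstack stack show` output options (`-f value -c stack_status`) with the undercloud credentials sourced.

```shell
#!/usr/bin/env bash
# Map a Heat stack status of the form ACTION_STATE (e.g. UPDATE_COMPLETE)
# to a coarse verdict.
classify_stack_status() {
  case "$1" in
    *_COMPLETE)    echo "ok" ;;
    *_FAILED)      echo "failed" ;;
    *_IN_PROGRESS) echo "pending" ;;
    *)             echo "unknown" ;;
  esac
}

# Usage on the undercloud (stackrc sourced):
#   status=$(openstack stack show overcloud -f value -c stack_status)
#   classify_stack_status "$status"
```

If this prints `ok` while the deploy command still ended with the 500, the failure most likely happened in a step after the Heat update itself.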

After digging deeper, I found this odd message in /var/log/keystone/keystone.log:
2017-08-25 05:36:15.988 482029 DEBUG keystone.common.fernet_utils [req-58adc1c1-f203-4904-8152-10d0637c05bb - - - - -] Loaded 2 Fernet keys from /etc/keystone/fernet-keys, but `[fernet_tokens] max_active_keys = 5`; perhaps there have not been enough key rotations to reach `max_active_keys` yet? load_keys /usr/lib/python2.7/site-packages/keystone/common/fernet_utils.py:306
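The message only compares the number of key files on disk with `max_active_keys`. Each `keystone-manage fernet_rotate` adds one key until that limit is reached, so having 2 of 5 keys is expected on a repository that has not been rotated much. A minimal sketch to reproduce the check by hand, assuming bash and Keystone's usual layout where key files are named by integers (0 is the staged key, the highest number is the primary); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Count Fernet keys in a key repository: regular files with all-digit names.
count_fernet_keys() {
  local repo="${1:-/etc/keystone/fernet-keys}"
  local count=0 f
  for f in "$repo"/*; do
    # Skip anything that is not a regular file with an all-digit name.
    if [[ -f "$f" && "$(basename "$f")" =~ ^[0-9]+$ ]]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Compare the count with [fernet_tokens] max_active_keys (passed by hand).
check_fernet_keys() {
  local repo="$1" max_active="$2" loaded
  loaded=$(count_fernet_keys "$repo")
  if (( loaded < max_active )); then
    echo "info: $loaded/$max_active keys, fewer rotations than max_active_keys so far"
  else
    echo "ok: $loaded/$max_active keys"
  fi
}

# Usage on a controller (as root):
#   check_fernet_keys /etc/keystone/fernet-keys 5
```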

Searching for that message, I found many reports for older versions, which apparently have since been fixed, for example:
- https://bugs.launchpad.net/openstack-ansible/+bug/1510244
- https://bugs.launchpad.net/keystone/+bug/1473567
- https://bugzilla.redhat.com/show_bug.cgi?id=1393435 (state: WONTFIX..... ?!)

I'm not 100% sure this is the real issue, as it is a DEBUG message and not an ERROR, nor am I 100% sure Keystone is broken, since we can actually call the API (note: I didn't test every capability, only `openstack server list`).
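To check more than `openstack server list`, a small smoke test over a few read-only commands can show whether Keystone is really healthy; `openstack token issue` in particular goes straight through Keystone, which is what a Horizon login needs. A minimal sketch, assuming bash; `smoke_test` is a hypothetical helper and the command list in the usage comment is only a suggestion.

```shell
#!/usr/bin/env bash
# Run each command line (via bash -c), print PASS/FAIL per command,
# and return non-zero if any of them failed.
smoke_test() {
  local cmd failed=0
  for cmd in "$@"; do
    if bash -c "$cmd" >/dev/null 2>&1; then
      echo "PASS: $cmd"
    else
      echo "FAIL: $cmd"
      failed=1
    fi
  done
  return "$failed"
}

# Usage against the overcloud (overcloudrc sourced):
#   smoke_test "openstack token issue" "openstack endpoint list" \
#              "openstack project list" "openstack server list"
```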

Has anyone already hit that situation?

Thanks in advance.

Best regards,

C.

Update: this might be unrelated to Keystone. It is more likely a race condition between overcloud services being restarted and a query made by the deploy process at the end (it wants to display the Keystone endpoint at the end, right?)
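If this really is a race with services still restarting, one workaround is to wait until the endpoint stops answering with a 5xx before querying it again. This is not what the deploy client actually does; it is a minimal sketch assuming bash and curl, where `wait_for_http_ok` is a hypothetical helper and `KEYSTONE_URL` a hypothetical variable holding the overcloud Keystone public endpoint.

```shell
#!/usr/bin/env bash
# Re-run a status-printing command until it reports a non-5xx HTTP code.
# Arguments: number of attempts, delay in seconds, then the command to run.
wait_for_http_ok() {
  local attempts="$1" delay="$2"; shift 2
  local i code=""
  for ((i = 1; i <= attempts; i++)); do
    code=$("$@")
    # Accept 2xx-4xx (the service answers); retry on 5xx and on curl's
    # "000" (no connection). 10# forces base 10 so "000" is not octal.
    if [[ "$code" =~ ^[0-9]{3}$ ]] && (( 10#$code >= 200 && 10#$code < 500 )); then
      echo "$code"
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "${code:-none}"
  return 1
}

# Usage (KEYSTONE_URL is hypothetical):
#   wait_for_http_ok 30 10 curl -s -o /dev/null -w '%{http_code}' "$KEYSTONE_URL/v3"
```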

summary: - Ocata -> Pike: overcloud upgrade fails due to keystone/fernet
+ Ocata -> Pike: overcloud upgrade returns 500 error at the end (race
+ condition?)
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → pike-rc2
tags: added: upgrade
Changed in tripleo:
milestone: pike-rc2 → queens-1
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1

Cédric, is this still an issue?

Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: Triaged → Invalid