TASK [keystone : Initialise fernet key authentication] Failed

Bug #1748065 reported by Liyingjun
This bug affects 5 people
Affects         Status    Importance   Assigned to   Milestone
kolla-ansible   Triaged   Medium       Unassigned
Stein           Triaged   Medium       Unassigned
Train           Triaged   Medium       Unassigned
Ussuri          Triaged   Medium       Unassigned

Bug Description

Using Pike, deployment may fail with:
TASK [keystone : Initialise fernet key authentication] **************************************************************
fatal: [pike-controller1]: FAILED! => {"failed": true, "msg": "The conditional check 'fernet_create.stdout.split()[2] == 'SUCCESS' or fernet_create.stdout.find('Key repository is already initialized') != -1' failed. The error was: error while evaluating conditional (fernet_create.stdout.split()[2] == 'SUCCESS' or fernet_create.stdout.find('Key repository is already initialized') != -1): list object has no element 2"}
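
The check indexes into fernet_create.stdout.split(), so any run in which the bootstrap command prints fewer than three whitespace-separated tokens (for example, no output at all) raises this evaluation error instead of reporting the underlying failure. A minimal reproduction of the conditional, assuming nothing about the real task beyond the expression quoted above (the command is a stand-in that prints nothing):

# repro.yml: evaluates the same style of conditional against empty stdout
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Stand-in for a bootstrap command that fails before printing anything
      command: /bin/true
      register: fernet_create

    - name: Same shape of conditional as the failing task
      debug:
        msg: fernet bootstrap reported success
      when: fernet_create.stdout.split()[2] == 'SUCCESS' or
            fernet_create.stdout.find('Key repository is already initialized') != -1
      # Fails with: "error while evaluating conditional ...: list object has no element 2"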

Liyingjun (liyingjun)
Changed in kolla-ansible:
assignee: nobody → Liyingjun (liyingjun)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/548182

Changed in kolla-ansible:
status: New → In Progress
Revision history for this message
Maciej Kucia (maciejkucia) wrote :

I have a similar issue on Ocata.

2018-06-04 18:27:44,037 p=31385 u=root | TASK [keystone : Initialise fernet key authentication] **********************************************************************************************************************************
2018-06-04 18:27:44,037 p=31385 u=root | task path: /usr/share/kolla-ansible/ansible/roles/keystone/tasks/init_fernet.yml:2
2018-06-04 18:27:44,253 p=31385 u=root | Using module file /usr/lib/python2.7/site-packages/ansible/modules/commands/command.py
2018-06-04 18:27:44,562 p=31385 u=root | [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: {{ fernet_create.stdout.find('localhost | SUCCESS => ') != -1 and
(fernet_create.stdout.split('localhost | SUCCESS => ')[1]|from_json).changed }}

2018-06-04 18:27:44,577 p=31385 u=root | fatal: [ci-vcompute2-b]: FAILED! => {
    "msg": "The conditional check '(fernet_create.stdout.split()[2] == 'SUCCESS') or (fernet_create.stdout.find('Key repository is already initialized') != -1)' failed. The error was: error while evaluating conditional ((fernet_create.stdout.split()[2] == 'SUCCESS') or (fernet_create.stdout.find('Key repository is already initialized') != -1)): list object has no element 2"
}
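
The [WARNING] in this log is emitted because the whole expression is wrapped in {{ }} delimiters. A sketch of the same check without the delimiters, on an illustrative task shape (presumably a changed_when, since the expression inspects .changed; this is not the upstream fix):

- name: Initialise fernet key authentication        # illustrative task shape
  command: /bin/true                                 # placeholder for the real bootstrap call
  register: fernet_create
  changed_when: >-
    fernet_create.stdout.find('localhost | SUCCESS => ') != -1 and
    (fernet_create.stdout.split('localhost | SUCCESS => ')[1] | from_json).changed
  # Only the delimiters are removed here; the logic itself is unchanged.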

Revision history for this message
Eric Miller (erickmiller) wrote :

We needed to re-install our controller001 and ran into this issue when running "deploy".

Kolla Ansible should really consider all nodes equal and, when necessary, search for appropriate "existing" nodes to pull data from for "new" nodes, and/or delegate tasks to them.

This file:
/usr/share/kolla-ansible/ansible/roles/keystone/tasks/init_fernet.yml

delegates tasks to the first controller listed in the keystone group in the Ansible inventory.
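
A rough sketch of that delegation pattern, purely illustrative rather than the actual contents of init_fernet.yml:

- name: Initialise fernet key authentication        # illustrative, not the exact file contents
  command: /bin/true                                 # placeholder for the real bootstrap call
  register: fernet_create
  run_once: true
  delegate_to: "{{ groups['keystone'][0] }}"         # always the first host in the keystone group
# If that first host is the freshly re-installed controller, it has no existing
# key repository to copy from, which is the situation described here.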

I would have tried to change the order of the controllers listed, but I don't know what impact that has globally, since it appears that various Kolla Ansible scripts rely on the first controller being "special".

Eric

Revision history for this message
Eric Miller (erickmiller) wrote :

We ended up having to change the order of the controllers specified in the multinode inventory file, which solved our issues. The inventory file was returned to its original state after controller001 was re-deployed successfully. So it appears that changing the order of the controllers in the multinode inventory file isn't much of a problem, provided that the first controller listed is a "working" node.
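
For illustration only, the workaround expressed as a YAML inventory fragment (kolla-ansible's multinode file is INI-style and these hostnames are made up, but the ordering idea is the same): list a known-good controller first so that groups['control'][0] points at a node that already has the key repository.

all:
  children:
    control:
      hosts:
        controller002:        # healthy node, temporarily listed first
        controller001:        # node being re-deployed
        controller003: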

Eric

Revision history for this message
Benjamin (tumbl3w33d) wrote :

This is still broken in Stein and happened to me first when I added a control+compute node to an existing cluster. The mentioned workaround of making the new node first in the list of controllers worked. After the deployment I put the new node last again in the controller group and ran the reconfigure task to make sure all nodes know about the desired configuration. I'm not sure that last step was necessary, though.

Revision history for this message
Mark Goddard (mgoddard) wrote :

It looks like at least one cause of this issue is that a newly added controller does not have a copy of the fernet key repository. If this node becomes the first node in the keystone group, then the task "Initialise fernet key authentication" will create a new key, but presumably fails in some unexpected way.

I think what is required is:

* check if any of the keystone nodes has a fernet key repository
* if yes, sync to other keystone hosts
* if no, initialise on any keystone host and sync to other keystone hosts

The check could be something like:

docker exec -t keystone_fernet ls /etc/keystone/fernet-keys/
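
A rough sketch of that check-and-choose-a-source flow, assuming the keystone group name and the keystone_fernet container from above; the initialise and sync steps are left as comments because the exact commands depend on the role:

- name: Check whether this host already has a fernet key repository
  command: docker exec -t keystone_fernet ls /etc/keystone/fernet-keys/
  register: fernet_ls
  changed_when: false
  failed_when: false

- name: Record whether this host holds keys
  set_fact:
    has_fernet_keys: "{{ fernet_ls.rc == 0 and fernet_ls.stdout | trim | length > 0 }}"

- name: Pick a source host, preferring one that already has keys
  set_fact:
    fernet_source: >-
      {{ ((groups['keystone']
           | map('extract', hostvars)
           | selectattr('has_fernet_keys')
           | map(attribute='inventory_hostname')
           | list)
          + [groups['keystone'][0]]) | first }}

# Then: initialise the repository on fernet_source only if no host had keys,
# and push it from fernet_source to the remaining keystone hosts (e.g. via
# the existing fernet-push.sh), instead of always trusting groups['keystone'][0].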

Changed in kolla-ansible:
importance: Undecided → Medium
assignee: Liyingjun (liyingjun) → nobody
status: In Progress → Triaged
Revision history for this message
Mark Goddard (mgoddard) wrote :

Is anyone able to fix this?

Revision history for this message
Yang Youseok (ileixe) wrote :

@Mark

Hi Mark, we also encountered this issue following the steps below.

1. With one host (A) in the keystone group, deploy it.
2. Add one more host (B) to the keystone group.
3. Try to deploy B.
4. B could not be initialized since A's fernet-push.sh does not contain B's IP address.

I think kolla-ansible's fernet init step is counter-intuitive. IMHO, this comes from kolla-ansible depending on generated static files like fernet-*.sh.j2.

What I suggest is to determine the required keystone hosts dynamically rather than relying on a generated shell script. To be specific, the proposed steps would look like this (a rough sketch of step 4 follows the list):

1. With one host (A) in the keystone group, deploy it.
   1.1 A does not have any fernet-*.sh files.
2. Add one more host (B) to the keystone group.
3. Try to deploy B.
4. When B is deployed, it makes A push the fernet keys to the given IP addresses.
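
A rough sketch of what step 4 could look like if the push targets were computed from the live inventory instead of a pre-generated fernet-push.sh (the rsync transport, port and paths are assumptions based on the logs in this report, not upstream code):

- name: Push the fernet key repository from host A to every other keystone host
  command: >-
    docker exec -t keystone_fernet rsync -a -e 'ssh -p 8023'
    /etc/keystone/fernet-keys/
    {{ hostvars[item].ansible_host | default(item) }}:/etc/keystone/fernet-keys/
  loop: "{{ groups['keystone'] | difference([groups['keystone'][0]]) }}"
  delegate_to: "{{ groups['keystone'][0] }}"      # host A in the steps above
  run_once: true
  # The target list comes from the inventory at run time, so a newly added
  # host B is always included, regardless of what fernet-push.sh was
  # generated with on A.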

What do you think about it?

Revision history for this message
Mark Goddard (mgoddard) wrote :

Hi @Yang, just to be clear, when you are deploying the new host B, are you using kolla-ansible --limit <host B> ?

Revision history for this message
Yang Youseok (ileixe) wrote :

@Mark

Sorry, I did not notice your comment; I wasn't alerted to it. Yes, I was trying to deploy using --limit.

Revision history for this message
Scott Beck (scottbeck) wrote :

I'm seeing this exact error on a fresh install (after calling destroy and cleaning up Ceph) using Train:
TASK [keystone : Initialise fernet key authentication] ************************************************************************************************************************************************************************************************************************
FAILED - RETRYING: Initialise fernet key authentication (10 retries left).
FAILED - RETRYING: Initialise fernet key authentication (9 retries left).
fatal: [n5.smcstaff.org]: FAILED! => {"msg": "The conditional check 'fernet_create.stdout.split()[2] == 'SUCCESS' or fernet_create.stdout.find('Key repository is already initialized') != -1' failed. The error was: error while evaluating conditional (fernet_create.stdout.split()[2] == 'SUCCESS' or fernet_create.stdout.find('Key repository is already initialized') != -1): list object has no element 2"}

NO MORE HOSTS LEFT ************************************************************************************************************************************************************************************************************************************************************

PLAY RECAP ********************************************************************************************************************************************************************************************************************************************************************
localhost : ok=5 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
n1.smcstaff.org : ok=5 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
n2.smcstaff.org : ok=5 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
n4.smcstaff.org : ok=152 changed=86 unreachable=0 failed=0 skipped=72 rescued=0 ignored=1
n5.smcstaff.org : ok=168 changed=99 unreachable=0 failed=1 skipped=70 rescued=0 ignored=1

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Task: keystone : Initialise fernet key authentication
Tags: keystone
Host: primary
Time: 2020-04-28 18:33:36

Ansible version: 2.8.11

Msg
The conditional check 'fernet_create.stdout.split()[2] == 'SUCCESS' or fernet_create.stdout.find('Key repository is already initialized') != -1' failed. The error was: error while evaluating conditional (fernet_create.stdout.split()[2] == 'SUCCESS' or fernet_create.stdout.find('Key repository is already initialized') != -1): list object has no element 2

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

But there was also an earlier MariaDB failure (a docker daemon communication error), so secondary1 was removed from the run.

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

So secondary1 never got the keystone fernet nor ssh containers. Primary's keystone_fernet docker logs:

2020-04-28T18:33:34.365632590Z ++ cat /run_command
2020-04-28T18:33:34.368280291Z + CMD='crond -s -n'
2020-04-28T18:33:34.368308155Z + ARGS=
2020-04-28T18:33:34.368465594Z + sudo kolla_copy_cacerts
2020-04-28T18:33:34.387422919Z + [[ ! -n '' ]]
2020-04-28T18:33:34.387438741Z + . kolla_extend_start
2020-04-28T18:33:34.387661953Z ++ FERNET_SYNC=/usr/bin/fernet-node-sync.sh
2020-04-28T18:33:34.387672509Z ++ FERNET_TOKEN_DIR=/etc/keystone/fernet-keys
2020-04-28T18:33:34.387687241Z ++ [[ -f /usr/bin/fernet-node-sync.sh ]]
2020-04-28T18:33:34.387720257Z ++ /usr/bin/fernet-node-sync.sh
2020-04-28T18:33:34.493473328Z Warning: Permanently added '[192.0.2.3]:8023' (ECDSA) to the list of known hosts.
2020-04-28T18:33:34.656796401Z ssh: connect to host 192.0.2.1 port 8023: Connection refused
2020-04-28T18:33:34.657205371Z rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
2020-04-28T18:33:34.657251111Z rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.3]

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

So one issue is that the real problem is hidden. The second issue is the real problem itself, of which there may be several. Let's treat this bug report as fixing the 'hiding' problem and then create separate issues if applicable (as for MariaDB, we could only add retries there).
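
One hedged way to remove the 'hiding' part would be a check that never indexes into split() output, so a genuine failure surfaces the command's stdout/stderr (for example the rsync/ssh errors above); the command below is only a placeholder:

- name: Initialise fernet key authentication        # illustrative shape, not the upstream fix
  command: /bin/true                                 # placeholder for the real bootstrap call
  register: fernet_create
  until: >-
    'SUCCESS' in fernet_create.stdout
    or 'Key repository is already initialized' in fernet_create.stdout
  retries: 10
  delay: 5
  # 'in' on the whole stdout never raises, so when the command genuinely fails
  # the task reports its full output instead of "list object has no element 2".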

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by "Michal Nasiadka <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/548182
Reason: No updates since 2018
