AIO with Ceph deployment fails with Ansible error in setup-infrastructure.yml

Bug #1989367 reported by Peter Garlic
This bug affects 1 person
Affects            Status     Importance  Assigned to         Milestone
OpenStack-Ansible  Confirmed  Medium      Dmitriy Rabotyagov

Bug Description

Release 25.1.0: an AIO deployment on a single Rocky Linux 8.6 VM, using default values, inside an infrastructure behind a proxy, fails on the playbook "openstack-ansible setup-infrastructure.yml" with an Ansible undefined-variable error.

Error:

TASK [ceph-facts : generate cluster fsid] *************************************************************************************************************************************************************
fatal: [aio1_ceph-mon_container-4f3cbdc0 -> {{ groups[mon_group_name][0] }}]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'ansible.vars.hostvars.HostVarsVars object' has no attribute 'discovered_interpreter_python'\n\nThe error appears to be in '/etc/ansible/roles/ceph-ansible/roles/ceph-facts/tasks/facts.yml': line 169, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n block:\n - name: generate cluster fsid\n ^ here\n"}

How to reproduce:
- follow the quickstart guide
- set the scenario:
  export SCENARIO='aio_lxc_ceph'

# clone repo
git clone https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible
cd /opt/openstack-ansible
git checkout 25.1.0

# run scripts
scripts/bootstrap-ansible.sh
scripts/bootstrap-aio.sh

# Run playbooks
cd /opt/openstack-ansible/playbooks
openstack-ansible setup-hosts.yml
openstack-ansible setup-infrastructure.yml

The deployment stops with the above error.

Playbook run summary:

PLAY RECAP ********************************************************************************************************************************************************************************************
aio1 : ok=49 changed=29 unreachable=0 failed=0 skipped=6 rescued=0 ignored=0
aio1_ceph-mon_container-4f3cbdc0 : ok=30 changed=8 unreachable=0 failed=1 skipped=6 rescued=0 ignored=0
aio1_galera_container-5c2f6c3c : ok=79 changed=41 unreachable=0 failed=0 skipped=6 rescued=0 ignored=0
aio1_memcached_container-37e22d5f : ok=20 changed=11 unreachable=0 failed=0 skipped=1 rescued=0 ignored=0
aio1_rabbit_mq_container-4684b9a1 : ok=68 changed=32 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0
aio1_repo_container-8bad80a4 : ok=88 changed=42 unreachable=0 failed=0 skipped=23 rescued=0 ignored=0
aio1_utility_container-ec6c2183 : ok=70 changed=39 unreachable=0 failed=0 skipped=11 rescued=0 ignored=0

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Well, that sounds more like an issue in ceph-ansible, which we use for deploying Ceph. According to the output, the task fails here:
https://github.com/ceph/ceph-ansible/blob/d7bf53a576465f0f25409b116ec8ca1d797ad0b7/roles/ceph-facts/tasks/facts.yml#L169-L173

Could you kindly provide the output of these two tasks, which are supposed to define discovered_interpreter_python?
https://github.com/ceph/ceph-ansible/blob/d7bf53a576465f0f25409b116ec8ca1d797ad0b7/roles/ceph-facts/tasks/facts.yml#L18-L34

It would help to understand where this issue comes from.
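If the VM is rebuilt, one way to collect the output of those two tasks is to re-run the playbook with its console output saved to a file and then pull out the relevant task sections. This is a sketch under two assumptions: the log path shown in the comments is hypothetical, and the names of the tasks of interest contain the string `discovered_interpreter_python` (pass a different pattern if they do not). The `extract_tasks` helper is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every TASK section of a saved Ansible log whose
# header contains a pattern (plain substring match, not a regex). A section
# runs from its "TASK [" header up to the next TASK/PLAY/HANDLER header
# or the PLAY RECAP line.
#
# Possible usage (log path is only an example):
#   cd /opt/openstack-ansible/playbooks
#   openstack-ansible setup-infrastructure.yml 2>&1 | tee /tmp/setup-infrastructure.log
#   extract_tasks /tmp/setup-infrastructure.log
extract_tasks() {
  local log_file="$1"
  # Assumption: the relevant task names mention the fact name.
  local pattern="${2:-discovered_interpreter_python}"
  awk -v pat="$pattern" '
    # On every section header, decide whether the following lines are kept.
    /^(TASK|PLAY|RUNNING HANDLER) \[/ || /^PLAY RECAP/ {
      keep = (index($0, "TASK [") == 1 && index($0, pat) > 0)
    }
    keep { print }
  ' "$log_file"
}
```

Matching on a plain substring rather than a regex keeps task names containing brackets or dots from being misinterpreted.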

Peter Garlic (petergarlic) wrote :

Hi Dmitriy,

Excuse me for the late answer, but I was on vacation.

The test VM has been removed and I only kept a few log lines from the screen output, which you can find in the attachment. This is not a problem, though: I can rebuild it in the next few days and provide more info.

Let me know if there is any configuration flag, code or anything else that could help you debug the code.

Dmitriy Rabotyagov (noonedeadpunk) wrote :

I was able to reproduce the issue you mentioned. I will try to take care of it.

Changed in openstack-ansible:
status: New → Confirmed

Dmitriy Rabotyagov (noonedeadpunk) wrote :

So, after digging into the issue: it occurs because Ansible does not discover and define the `discovered_interpreter_python` fact for Rocky Linux.

For the Ceph deployment we leverage the ceph-ansible project, which assumes that `discovered_interpreter_python` is always defined. That is not the case for Rocky, which is why you get the failure above.

I believe this is a valid Ansible bug rather than an openstack-ansible one, as we can't really fix either the ceph-ansible logic or Ansible's. So I will mark this bug report as Invalid.

However, as a workaround, you can define the following in /etc/openstack_deploy/group_vars/ceph_all:

  ansible_python_interpreter: /usr/bin/python3
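Applying this workaround can be scripted. The sketch below assumes that `ceph_all` is (or will be) a single YAML file rather than a directory of variable files, and that `/usr/bin/python3` exists on the Ceph hosts and containers. The function name and its optional directory argument are hypothetical, added only so the snippet can be tried against a scratch copy instead of the real /etc/openstack_deploy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the ceph_all interpreter workaround to the OSA
# user config. Usage:
#   apply_ceph_interpreter_workaround             # uses /etc/openstack_deploy
#   apply_ceph_interpreter_workaround /tmp/osa    # scratch directory
apply_ceph_interpreter_workaround() {
  local config_dir="${1:-/etc/openstack_deploy}"
  local vars_file="${config_dir}/group_vars/ceph_all"
  local line='ansible_python_interpreter: /usr/bin/python3'

  mkdir -p "${config_dir}/group_vars" || return 1
  # Assumption: ceph_all is a plain file, not a directory of var files.
  if [ -d "$vars_file" ]; then
    echo "error: ${vars_file} is a directory; add the setting to a file inside it" >&2
    return 1
  fi
  touch "$vars_file" || return 1

  # Already applied: nothing to do, so re-running is harmless.
  # Note: a different existing ansible_python_interpreter value is NOT detected.
  if grep -qxF "$line" "$vars_file"; then
    return 0
  fi
  # Avoid gluing the new line onto an unterminated last line.
  if [ -s "$vars_file" ] && [ -n "$(tail -c 1 "$vars_file")" ]; then
    printf '\n' >> "$vars_file"
  fi
  printf '%s\n' "$line" >> "$vars_file"
}
```

Appending instead of overwriting keeps any other variables already defined for the `ceph_all` group intact.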

Changed in openstack-ansible:
status: Confirmed → Invalid
status: Invalid → Opinion
status: Opinion → Invalid

Dmitriy Rabotyagov (noonedeadpunk) wrote :

I didn't stop searching for the cause, and it turned out to be the result of defining OSA_ANSIBLE_PYTHON_INTERPRETER during bootstrap, which simply disables interpreter discovery:

https://opendev.org/openstack/openstack-ansible/src/commit/cb81bd1081b79a71a7bdd3254f9d3f9ab495ce49/scripts/bootstrap-ansible.sh#L74

At the same time, however, ansible_python_interpreter does not get defined for some reason. I'm quite interested in sorting this out and finding a proper fix.
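For context, Ansible only runs interpreter discovery, and therefore only sets the `discovered_interpreter_python` fact, when its `interpreter_python` setting (also settable through the `ANSIBLE_PYTHON_INTERPRETER` environment variable) is one of the `auto*` modes; an explicit path is used as-is. The sketch below encodes that rule so a given value can be checked. It is illustrative only: the function name is hypothetical, and it does not show how bootstrap-ansible.sh turns OSA_ANSIBLE_PYTHON_INTERPRETER into an Ansible setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given interpreter_python value makes
# Ansible run interpreter discovery (the only case in which it sets the
# discovered_interpreter_python fact); fail if the value is used as-is.
uses_interpreter_discovery() {
  case "$1" in
    auto|auto_legacy|auto_silent|auto_legacy_silent) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: inspect the value exported in the current shell, if any.
value="${ANSIBLE_PYTHON_INTERPRETER:-}"
if [ -z "$value" ]; then
  echo "ANSIBLE_PYTHON_INTERPRETER is not set here; ansible.cfg or the built-in default decides"
elif uses_interpreter_discovery "$value"; then
  echo "discovery mode ($value): discovered_interpreter_python gets set"
else
  echo "explicit interpreter ($value): no discovery, discovered_interpreter_python stays undefined"
fi
```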

Changed in openstack-ansible:
status: Invalid → Confirmed
importance: Undecided → Medium
assignee: nobody → Dmitriy Rabotyagov (noonedeadpunk)