ceph will not install with xena from scratch

Bug #1960175 reported by Dwane Pottratz
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Medium
Assigned to: Dmitriy Rabotyagov
Milestone: none

Bug Description

When installing OSA from scratch, Ceph will not install. I am using the origin/stable/xena branch. This used to work with the wallaby branch, but it is also broken now.

The playbook that is not working is ceph-install.yml:

```
fatal: [infra1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'all_ipv4_addresses'

The error appears to be in '/etc/ansible/roles/ceph-ansible/roles/ceph-facts/tasks/set_monitor_address.yml': line 2, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

---
- name: set_fact _monitor_addresses to monitor_address_block ipv4
 ^ here
"}
```

I am using three machines infra[1:3] for all of the infrastructure and control. I have a fourth machine as haproxy. I have tried with and without no_containers in openstack_user_config.yml.

I have tried with 'export ANSIBLE_CACHE_PLUGIN=memory' in /usr/local/bin/openstack-ansible.rc

If I run ansible gather_facts it shows the needed 'ansible_all_ipv4_addresses' dictionary item. These facts are not getting gathered in the playbook. I have verified they are not there in /etc/openstack_deploy/ansible_facts.
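
As a minimal sketch, the kind of ad-hoc check described above could look like this (the host pattern `mons` and the task name are illustrative, not from the actual deployment):

```
- hosts: mons
  gather_facts: true
  tasks:
    # With full fact gathering, the dictionary item is present
    - name: Show the fact the ceph playbooks need
      ansible.builtin.debug:
        var: ansible_all_ipv4_addresses
```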

The workaround is to install Ceph before using OSA to install Ceph. I am using https://github.com/ceph/ceph-ansible.git at the same version that OSA uses. However, even without the roles in OSA, it messes with the configuration: it will break ceph-rgw on the other two hosts.

Tags: ceph
Revision history for this message
Dwane Pottratz (dpcsar) wrote :
Revision history for this message
Dwane Pottratz (dpcsar) wrote :

Here is my user_variables.yml as well.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote (last edit ):

Hi!

This block of code is responsible for gathering facts: https://opendev.org/openstack/openstack-ansible/src/branch/stable/xena/playbooks/ceph-install.yml#L30-L37

As you can see, it's executed only when the `monitor_address_block` variable is defined. Looking at your user_variables, it's set to: monitor_address_block: "{{ cidr_networks.container }}"

From what I can tell, you tried to rely on cidr_networks from openstack_user_config; however, that is not a valid variable. openstack_user_config is used for inventory generation, and its contents can't be referenced directly as variables.

So `monitor_address_block` ends up undefined.

I think we have an issue in the documentation where it says this: https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/user_variables.yml.prod-ceph.example#L18-L20

Basically I believe it should be like this:

```
monitor_address_block: "{{ (container_networks['container_address']['address'] ~ '/' ~ container_networks['container_address']['netmask']) | ansible.netcommon.ipaddr('network/prefix') }}"
public_network: "{{ monitor_address_block }}"
cluster_network: "{{ (container_networks['storage_address']['address'] ~ '/' ~ container_networks['storage_address']['netmask']) | ansible.netcommon.ipaddr('network/prefix') }}"
```
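
For illustration (the addresses here are assumed, not taken from this environment): if container_address has address 192.168.2.21 and netmask 255.255.255.0, the first expression concatenates to '192.168.2.21/255.255.255.0', and the ansible.netcommon.ipaddr('network/prefix') filter renders that as '192.168.2.0/24', the CIDR form expected for monitor_address_block.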

Can you kindly check if this solution works for you?

Changed in openstack-ansible:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Dmitriy Rabotyagov (noonedeadpunk)
Revision history for this message
Dwane Pottratz (dpcsar) wrote :

Hi Dmitriy,

That didn't work. I also tried '192.168.2.0/24' for monitor_address_block.

I have found what the bug is. The latest OSA is using Ansible 2.11.6. According to the documentation for ansible.builtin.setup:

Filter: "As of Ansible 2.11, the type has changed from string to list and the default has became an empty list. A simple string is still accepted and works as a single pattern. The behavior prior to Ansible 2.11 remains."

Since you are passing what is now a list option as a plain string, this is broken.
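
As a hedged sketch (host pattern and task name are illustrative), the list form the documentation describes would look like this:

```
- hosts: mons
  gather_facts: false
  tasks:
    # Gather only the network facts the ceph playbook needs,
    # passing the filter pattern as a list per the 2.11 docs
    - name: Gather address facts
      ansible.builtin.setup:
        filter:
          - "ansible_all_ipv[4,6]_addresses"
        gather_subset: "!all,network"
```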

I am working on a fix and testing it. I am sure you will need to make something more complex to support different versions of Ansible.

Dwane

Revision history for this message
Dwane Pottratz (dpcsar) wrote :

Changing deployment_extra_facts_filter in ceph-install.yml

```
diff --git a/playbooks/ceph-install.yml b/playbooks/ceph-install.yml
index cf2a126c7..df0cb3f0f 100644
--- a/playbooks/ceph-install.yml
+++ b/playbooks/ceph-install.yml
@@ -30,7 +30,8 @@
     - name: Gather additional facts for monitor_address_block
       include_tasks: "common-tasks/gather-hardware-facts.yml"
       vars:
-        deployment_extra_facts_filter: "ansible_all_ipv[4,6]_addresses"
+        deployment_extra_facts_filter:
+          - "ansible_all_ipv[4,6]_addresses"
         deployment_extra_facts_subset: "!all,network"
       when: monitor_address_block is defined
       tags:
@@ -39,7 +40,8 @@
     - name: Gather additional facts for monitor_interface
       include_tasks: "common-tasks/gather-hardware-facts.yml"
       vars:
-        deployment_extra_facts_filter: "{{ 'ansible_' ~ monitor_interface | replace('-','_') }}"
+        deployment_extra_facts_filter:
+          - "{{ 'ansible_' ~ monitor_interface | replace('-','_') }}"
         deployment_extra_facts_subset: "!all,network"
       when: monitor_interface is defined
       tags:
@@ -48,7 +50,8 @@
     - name: Gather memory facts
       include_tasks: "common-tasks/gather-hardware-facts.yml"
       vars:
-        deployment_extra_facts_filter: "ansible_memtotal*"
+        deployment_extra_facts_filter:
+          - "ansible_memtotal*"
         deployment_extra_facts_subset: "!all,hardware"
       tags:
         - always
@@ -131,7 +134,8 @@
     - name: Gather memory facts
       include_tasks: "common-tasks/gather-hardware-facts.yml"
       vars:
-        deployment_extra_facts_filter: "ansible_memtotal*"
+        deployment_extra_facts_filter:
+          - "ansible_memtotal*"
         deployment_extra_facts_subset: "!all,hardware"
       tags:
         - always
@@ -189,7 +193,8 @@
     - name: Gather memory facts
       include_tasks: "common-tasks/gather-hardware-facts.yml"
       vars:
-        deployment_extra_facts_filter: "ansible_memtotal*"
+        deployment_extra_facts_filter:
+          - "ansible_memtotal*"
         deployment_extra_facts_subset: "!all,hardware"
       tags:
         - always
```

So even though Ansible states that a string will work, it doesn't. :(

Revision history for this message
Dwane Pottratz (dpcsar) wrote (last edit ):

Here is a patch that I came up with.

ansible version is 2.11.6

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote (last edit ):

So while it is indeed a list by default, as you posted, "A simple string is still accepted and works as a single pattern."

Based on that, I highly doubt the issue is really related to list vs. string for setup. I wrote a simple playbook to verify that the current way of calling setup is not broken (a reconstruction is sketched below). I also tested with Ansible 2.12.2 and the result was exactly the same.
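
A hedged reconstruction of that test playbook (the original was not attached; the host and task names are taken from the output below, the rest is assumed):

```
- hosts: aio1
  gather_facts: false
  tasks:
    # The fact must not be defined before setup runs
    - name: Clean facts
      ansible.builtin.debug:
        var: ansible_all_ipv4_addresses

    # Run setup with the string form of filter, as ceph-install.yml does
    - name: Gather additional facts
      ansible.builtin.setup:
        filter: "ansible_all_ipv[4,6]_addresses"
        gather_subset: "!all,network"

    # If the string form works, the fact is now defined
    - name: Gathered facts after setup
      ansible.builtin.debug:
        var: ansible_all_ipv4_addresses
```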

Output:

```
root@server-0207-1334:~/openstack-ansible# openstack-ansible test.yml
Variable files: "-e @/etc/openstack_deploy/user_ceph_aio.yml -e @/etc/openstack_deploy/user_secrets.yml -e @/etc/openstack_deploy/user_variables.yml -e @/etc/openstack_deploy/user_variables_barbican.yml -e @/etc/openstack_deploy/user_variables_ceph.yml "
[WARNING]: Unable to parse /etc/openstack_deploy/inventory.ini as an inventory source
Operations to perform:
  Apply all migrations: admin, api, auth, contenttypes, db, sessions
Running migrations:
  No migrations to apply.

PLAY [aio1] *******************************************************************************************************************************************************************************************************************************************************************
2022-02-07 19:43:47,625 INFO ansible: PLAY [aio1] *******************************************************************************************************************************************************************************************************************************************************************

TASK [Clean facts] ************************************************************************************************************************************************************************************************************************************************************
2022-02-07 19:43:47,898 INFO ansible: TASK [Clean facts] ************************************************************************************************************************************************************************************************************************************************************
ok: [aio1] => {
    "ansible_all_ipv4_addresses": "VARIABLE IS NOT DEFINED!"
}
2022-02-07 19:43:47,954 INFO ansible: ok: [aio1] => {
    "ansible_all_ipv4_addresses": "VARIABLE IS NOT DEFINED!"
}

TASK [Gather additional facts] ************************************************************************************************************************************************************************************************************************************************
2022-02-07 19:43:48,081 INFO ansible: TASK [Gather additional facts] ************************************************************************************************************************************************************************************************************************************************
ok: [aio1]
2022-02-07 19:43:50,153 INFO ansible: ok: [aio1]

TASK [Gathered facts after setup] *********************************************************************************************************************************************************************************************************************************************
2022-0...
```


Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Actually, a super easy workaround for the issue would be to define `monitor_address: "{{ (container_networks['storage_address']['address'] }}"` and just skip the block of code in ceph-ansible that determines the address from facts and the interface: https://github.com/ceph/ceph-ansible/blob/82eee4303bce3e41b5043bcb03fa3143dcdfd30d/roles/ceph-facts/tasks/set_monitor_address.yml#L2-L20

Revision history for this message
Dwane Pottratz (dpcsar) wrote (last edit ):

For what it's worth, I didn't see that same issue when using AIO; it worked without any problems. Only on bare metal, or on a VM without the AIO setup, am I seeing issues.

I presume that 'monitor_address' would be used in place of 'monitor_address_block' in user_variables.yml.

I also think that I want 'container_address' instead of 'storage_address', because the public address and the monitor address need to be on the same network (from my trials).

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Ah, yes, indeed it should be container_address, not storage_address; sorry, I copy/pasted the wrong thing from a previous reply :(

Revision history for this message
Dwane Pottratz (dpcsar) wrote :

That worked. However, you had an extra '(' in there on monitor_address.

user_variables.yml:

```
monitor_address: "{{ container_networks['container_address']['address'] }}"
public_network: "{{ cidr_networks.container }}"
cluster_network: "{{ cidr_networks.storage }}"
```

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

As I said, any usage of cidr_networks is not valid inside user_variables at the moment, so "{{ cidr_networks.container }}" will also lead to issues at later steps.

Revision history for this message
Dwane Pottratz (dpcsar) wrote :

Interesting... We are using 'container_networks' for monitor_address. Where is it getting that value?

I thought it came out of openstack_inventory.json. If that is the case, then I also see cidr_networks in openstack_inventory.json.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote (last edit ):

Well, it's not really intended to be in openstack_inventory.json. I don't see it there in either sandbox or production environments.

Also, during the team meeting another patch was brought up that likely solves the issue on master. We haven't backported it to Xena yet, and it's related to the change you were referencing: https://review.opendev.org/c/openstack/openstack-ansible/+/823796

It would be great if you could check whether it solves the issue.

Revision history for this message
Dwane Pottratz (dpcsar) wrote :

Hi Dmitriy,

I tried out master with the change. I am not seeing the same error as before, but it will not start the monitors, and /var/log/ceph/ceph-mon.infra3.log repeats this:

```
2022-02-08T17:55:54.643-0800 7f9e850e3700 1 mon.infra3@0(leader).osd e1 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 373293056 full_alloc: 373293056 kv_alloc: 272629760
2022-02-08T17:55:56.603-0800 7f9e828de700 1 mon.infra3@0(leader) e1 adding peer [v2:192.168.2.23:3300/0,v1:192.168.2.23:6789/0] to list of hints
2022-02-08T17:55:56.603-0800 7f9e828de700 1 mon.infra3@0(leader) e1 adding peer [v2:192.168.2.23:3300/0,v1:192.168.2.23:6789/0] to list of hints
2022-02-08T17:55:56.603-0800 7f9e828de700 1 mon.infra3@0(leader).elector(4) discarding election message: v2:192.168.2.23:3300/0 not in my monmap e1: 3 mons at {infra1=v1:0.0.0.0:0/1,infra2=[v2:192.168.2.22:3300/0,v1:192.168.2.22:6789/0],infra3=[v2:192.168.2.21:3300/0,v1:192.168.2.21:6789/0]}
```

Strangely, this is on my infra1 machine. Shouldn't it be ceph-mon.infra1.log?
The other machines seem to show the same message, but are named correctly.

Revision history for this message
Dwane Pottratz (dpcsar) wrote :

Hi Dmitriy,

I can confirm that the patch works on stable/xena.
git cherry-pick 61c550d0835449f08f00172fa2bfd6bab24ccce2

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Awesome, thanks! We will backport it and include it in the next Xena release.
Regarding your last question: yes, it should be ceph-mon.infra1.log, but the likely main issue is that mon1 is set to 0.0.0.0:0 and is also missing the v2 protocol, so it is worth checking how it's set up in ceph.conf.
I'd still blame cidr_networks, though...

Changed in openstack-ansible:
status: Triaged → Fix Committed
Changed in openstack-ansible:
status: Fix Committed → Fix Released