tripleo-ansible roles/tripleo_ceph_client putting ctlplane IPs in mon_host instead of storage network IPs when using composable networks

Bug #1912218 reported by John Fulton
This bug affects 1 person
Affects: tripleo
Status: Fix Committed
Importance: High
Assigned to: John Fulton

Bug Description

Deployed an overcloud (3 control, 3 ceph-storage, 2 compute) with network isolation, built 24 hours ago, using the fixes from the following bugs:

 https://launchpad.net/bugs/1912109
 https://launchpad.net/bugs/1912103

Glance public endpoint returned 503. Glance logs show:

ERROR glance_store._drivers.rbd rados.TimedOut: [errno 110] RADOS timed out (error connecting to the cluster)

On inspecting the controllers, it looks like ceph-ansible correctly configured /etc/ceph/ceph.conf with storage network IPs, but the new ceph client role (tripleo_ceph_client) wrote IPs from the provisioning network (192.168.24.x), not the storage network, to /var/lib/tripleo-config/ceph/ceph.conf.

[root@oc0-controller-0 ceph]# grep 'mon host' /etc/ceph/ceph.conf
mon host = [v2:172.16.11.183:3300,v1:172.16.11.183:6789],[v2:172.16.11.197:3300,v1:172.16.11.197:6789],[v2:172.16.11.187:3300,v1:172.16.11.187:6789]
[root@oc0-controller-0 ceph]# grep 'mon host' /var/lib/tripleo-config/ceph/ceph.conf
mon host = 192.168.24.14,192.168.24.8,192.168.24.7
[root@oc0-controller-0 ceph]#

Changed in tripleo:
status: New → Triaged
importance: Undecided → High
John Fulton (jfulton-org) wrote :

Root cause

This is the task that sets the IP:

    - name: Get ceph_mon_ip addresses
      set_fact:
        tripleo_ceph_client_mon_ips: "{{ (tripleo_ceph_client_mon_ips | default([]))
                                         | union([(hostvars[item]['storage_ip']
                                         | default(hostvars[item]['ctlplane_ip']))]) }}"
      loop: "{{ groups['ceph_mon'] | list }}"

This works with network isolation provided that you used the default name of the storage network. In my case I had customized it, as this excerpt from my roles file shows:

- name: Controller
  description: |
    Controller role that has all the controller services loaded and handles
    Database, Messaging and Network functions.
  CountDefault: 1
  tags:
    - primary
    - controller
    # Create external Neutron bridge for SNAT (and floating IPs when using
    # ML2/OVS without DVR)
    - external_bridge
  networks:
    External:
      subnet: external_cloud_0_subnet
    InternalApi:
      subnet: internal_api_cloud_0_subnet
    Storage:
      subnet: storage_cloud_0_subnet
    StorageMgmt:
      subnet: storage_mgmt_cloud_0_subnet
    Tenant:
      subnet: tenant_cloud_0_subnet
...

So the task does the right thing when I modify it like this:

    - name: Get ceph_mon_ip addresses
      set_fact:
        tripleo_ceph_client_mon_ips: "{{ (tripleo_ceph_client_mon_ips | default([]))
                                         | union([(hostvars[item]['storage_cloud_0_ip']
                                         | default(hostvars[item]['ctlplane_ip']))]) }}"
      loop: "{{ groups['ceph_mon'] | list }}"
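To check which per-network IP variables the mon hosts actually expose, and which hosts would silently fall back to ctlplane_ip, something like the following diagnostic tasks could be run. This is only a sketch: the task names are made up for illustration, and it assumes the per-network variables all end in `_ip`, as storage_ip, storage_cloud_0_ip and ctlplane_ip do above.

```yaml
# Diagnostic sketch, not part of tripleo_ceph_client.
- name: List the per-network IP hostvars each mon host exposes
  debug:
    # Keep only hostvars keys ending in "_ip", e.g. ctlplane_ip, storage_cloud_0_ip
    msg: "{{ item }}: {{ hostvars[item].keys() | select('match', '.*_ip$') | sort }}"
  loop: "{{ groups['ceph_mon'] | list }}"

- name: Report mon hosts that would fall back to ctlplane_ip
  debug:
    msg: "{{ item }} has no 'storage_ip' hostvar; mon_host would get {{ hostvars[item]['ctlplane_ip'] }}"
  # Only fires for hosts where the hard-coded key is missing
  when: hostvars[item]['storage_ip'] is not defined
  loop: "{{ groups['ceph_mon'] | list }}"
```

With a customized storage network name, the second task would be expected to fire for every mon host, which matches the ctlplane addresses seen in /var/lib/tripleo-config/ceph/ceph.conf.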

Focussing on the workaround:

                                         | union([(hostvars[item]['storage_ip']
vs
                                         | union([(hostvars[item]['storage_cloud_0_ip']

We should probably not hard-code either of the above.

We could backtrack further to determine what was set through the composable networks feature, for example:

 https://github.com/openstack/tripleo-heat-templates/blob/master/roles/Controller.yaml

has:

  networks:
    Storage:
      subnet: storage_subnet

While my customization used:

  networks:
    Storage:
      subnet: storage_cloud_0_subnet

So we COULD backtrack all the way up to networks, work our way down to Storage, and then take whatever subnet is set there to derive the name of the IP variable:

storage_subnet --> storage_ip
storage_cloud_0_subnet --> storage_cloud_0_ip
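
The mapping above could be expressed in the task itself rather than hard-coding the key. The following is a minimal sketch of that idea, not the fix that was merged: `storage_subnet_name` and `storage_ip_key` are hypothetical variables introduced for illustration, and how the subnet name would actually be passed into the role is left open.

```yaml
# Sketch only: derive the hostvars key from the subnet name instead of hard-coding it.
- name: Get ceph_mon_ip addresses
  vars:
    # Hypothetical input: the subnet set under networks -> Storage in the roles file
    storage_subnet_name: storage_cloud_0_subnet
    # storage_subnet -> storage_ip, storage_cloud_0_subnet -> storage_cloud_0_ip
    storage_ip_key: "{{ storage_subnet_name | regex_replace('_subnet$', '_ip') }}"
  set_fact:
    tripleo_ceph_client_mon_ips: "{{ (tripleo_ceph_client_mon_ips | default([]))
                                     | union([(hostvars[item][storage_ip_key]
                                     | default(hostvars[item]['ctlplane_ip']))]) }}"
  loop: "{{ groups['ceph_mon'] | list }}"
```

The ctlplane_ip fallback is kept as in the original task, so deployments without network isolation would behave as before.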

summary: tripleo-ansible roles/tripleo_ceph_client putting ctlplane IPs in
- mon_host instead of storage network IPs
+ mon_host instead of storage network IPs when using composable networks
John Fulton (jfulton-org) wrote :
Changed in tripleo:
status: Triaged → In Progress
assignee: nobody → John Fulton (jfulton-org)
John Fulton (jfulton-org) wrote :
Changed in tripleo:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 3.0.0

This issue was fixed in the openstack/tripleo-ansible 3.0.0 release.
