VM build fails after Train-Ussuri upgrade

Bug #1928690 reported by Albert Braden
This bug affects 1 person
Affects: kolla-ansible | Status: Fix Released | Importance: Undecided | Assigned to: Unassigned

Bug Description

What happened:

I upgraded my Train test cluster to Ussuri following these instructions:

https://docs.openstack.org/kolla-ansible/latest/user/operating-kolla.html#upgrade-procedure

The upgrade completed successfully with no failures, and the existing VMs are fine, but new VM builds fail with rados.PermissionDeniedError raised from rados.Rados.connect:

http://paste.openstack.org/show/805424/

I'm running external ceph so I looked at this document:

https://docs.openstack.org/kolla-ansible/latest/reference/storage/external-ceph-guide.html

It says that I need the following in /etc/kolla/config/glance/ceph.conf:

auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
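
For illustration, a minimal /etc/kolla/config/glance/ceph.conf carrying those lines might look like the sketch below (the fsid and mon host values are placeholders, not values from any real cluster):

[global]
fsid = <cluster fsid>
mon host = <comma-separated mon addresses>
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx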

I didn't have those lines, so I added them and then redeployed, but I still can't build VMs. I tried adding the same lines to all copies of ceph.conf and redeployed again, but that didn't help. I see some cryptic talk about Ceph in the release notes, but it's not obvious what I'm being asked to change:

https://docs.openstack.org/releasenotes/kolla-ansible/ussuri.html#relnotes-10-0-0-stable-ussuri-upgrade-notes

I read the bug that it refers to:

https://bugs.launchpad.net/kolla-ansible/+bug/1904062

But I already have "backend_host=rbd:volumes" so I don't think I'm hitting that.

I also read these sections, but I don't see anything obvious that needs to be changed. My config files are in the standard locations.

* For cinder (cinder-volume and cinder-backup), glance-api and manila keyrings behavior has changed and Kolla Ansible deployment will not copy those keys using wildcards (ceph.*), instead will use newly introduced variables. Your environment may render unusable after an upgrade if your keys in /etc/kolla/config do not match default values for introduced variables.

* The default behavior for generating the cinder.conf template has changed. An rbd-1 section will be generated when external Ceph functionality is used, i.e. cinder_backend_ceph is set to true. Previously it was only included when Kolla Ansible internal Ceph deployment mechanism was used.

* The rbd section of nova.conf for nova-compute is now generated when nova_backend is set to "rbd". Previously it was only generated when both enable_ceph was "yes" and nova_backend was set to "rbd".

My Ceph keys have the default names and are in the default locations. I have cinder_backend_ceph: "yes". I don't have a nova_backend setting, but I have nova_backend_ceph: "yes".
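
For reference, in Ussuri-era kolla-ansible the nova_backend value appears to be derived from nova_backend_ceph, roughly like this line from ansible/group_vars/all.yml (a sketch from memory, worth checking against the installed copy):

nova_backend: "{{ 'rbd' if nova_backend_ceph | bool else 'default' }}"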

I added nova_backend: "rbd" and redeployed, and now I get a different error, rados.ObjectNotFound raised from rados.Rados.connect:

http://paste.openstack.org/show/805425/

What I expected to happen: VMs build without errors after upgrade

How to reproduce it:

Install kolla-ansible Train on CentOS 7. Upgrade CentOS 7 to 8, upgrade Train to Ussuri, then build a VM.

Environment:

[root@chrnc-void-testupgrade-control-1-replace ~]# cat /etc/*rele*
CentOS Linux release 8.2.2004 (Core)
Derived from Red Hat Enterprise Linux 8.2 (Source)
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

CentOS Linux release 8.2.2004 (Core)
CentOS Linux release 8.2.2004 (Core)
cpe:/o:centos:centos:8

[root@chrnc-void-testupgrade-control-1-replace ~]# uname -a
Linux chrnc-void-testupgrade-control-1-replace.dev.chtrse.com 4.18.0-193.6.3.el8_2.x86_64 #1 SMP Wed Jun 10 11:09:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

[root@chrnc-void-testupgrade-control-1-replace ~]# docker version
Client: Docker Engine - Community
 Version: 20.10.6
 API version: 1.41
 Go version: go1.13.15
 Git commit: 370c289
 Built: Fri Apr 9 22:45:33 2021
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.6
  API version: 1.41 (minimum version 1.12)
  Go version: go1.13.15
  Git commit: 8728dd2
  Built: Fri Apr 9 22:43:57 2021
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.4.4
  GitCommit: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version: 1.0.0-rc93
  GitCommit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

Kolla-Ansible version: Ussuri

kolla_install_type: "source"

Official images

(openstack) [root@chrnc-void-testupgrade-build ~]# grep -v ^# /etc/kolla/globals.yml|grep -v ^$
---
config_strategy: "COPY_ALWAYS"
virtualenv: /opt/kolla/venv
virtualenv_site_packages: yes
kolla_base_distro: "centos"
kolla_install_type: "source"
openstack_release: "train"
node_custom_config: "/etc/kolla/config"
kolla_internal_vip_address: "172.16.0.100"
network_interface: "eth0"
kolla_external_vip_interface: "eth0"
neutron_external_interface: "eth1"
neutron_plugin_agent: "openvswitch"
keepalived_virtual_router_id: "51"
kolla_enable_tls_internal: "no"
openstack_region_name: "chrnc-void-testupgrade"
multiple_regions_names: ["{{ openstack_region_name }}"]
enable_openstack_core: "yes"
rabbitmq_use_3_7_24_on_centos7: true
elasticsearch_use_v6: true
kibana_use_v6: true
enable_central_logging: "yes"
enable_ceph: "no"
enable_chrony: "no"
enable_cinder: "yes"
enable_cinder_backup: "yes"
enable_fluentd: "yes"
enable_grafana: "no"
enable_mariabackup: "yes"
enable_masakari: "yes"
enable_neutron_agent_ha: "no"
enable_neutron_bgp_dragent: "yes"
enable_neutron_provider_networks: "yes"
enable_prometheus: "yes"
rabbitmq_server_additional_erl_args: "+S 1:1"
external_ceph_cephx_enabled: "yes"
glance_backend_ceph: "yes"
glance_backend_file: "no"
glance_backend_swift: "no"
glance_backend_vmware: "no"
cinder_backend_ceph: "yes"
cinder_backup_driver: "ceph"
nova_backend_ceph: "yes"
nova_compute_virt_type: "qemu"
nova_safety_upgrade: "yes"

(openstack) [root@chrnc-void-testupgrade-build glance]# cat /etc/kolla/config/glance/ceph.conf
# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
cluster network = 172.16.0.0/22
fsid = 30c52736-0f41-4bfc-a5a6-90657bb3315d
mon host = [v2:172.16.3.111:3300,v1:172.16.3.111:6789],[v2:172.16.1.89:3300,v1:172.16.1.89:6789],[v2:172.16.0.213:3300,v1:172.16.0.213:6789]
mon initial members = chrnc-void-testupgrade-ceph-1,chrnc-void-testupgrade-ceph-2,chrnc-void-testupgrade-ceph-0
osd pool default crush rule = -1
public network = 172.16.0.0/22

[osd]
osd memory target = 11037206118

(openstack) [root@chrnc-void-testupgrade-build glance]# ssh ceph0 "ceph status"
  cluster:
    id: 30c52736-0f41-4bfc-a5a6-90657bb3315d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum chrnc-void-testupgrade-ceph-0,chrnc-void-testupgrade-ceph-2,chrnc-void-testupgrade-ceph-1 (age 4w)
    mgr: chrnc-void-testupgrade-ceph-1(active, since 4w), standbys: chrnc-void-testupgrade-ceph-0, chrnc-void-testupgrade-ceph-2
    osd: 3 osds: 3 up (since 4w), 3 in (since 4w)

  data:
    pools: 3 pools, 48 pgs
    objects: 2.05k objects, 2.9 GiB
    usage: 6.1 GiB used, 24 GiB / 30 GiB avail
    pgs: 48 active+clean

Mark Goddard (mgoddard) wrote:

Hi Albert. To verify that the keys in the containers are valid, you could docker exec into nova_compute and cinder_volume and try running some ceph commands using each of the keys.
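
Something along these lines (a sketch; container names and keyring paths assume the defaults seen elsewhere in this report):

# on a compute host: test the nova key from inside nova_compute
docker exec -it nova_compute ceph --id nova --keyring /etc/ceph/ceph.client.nova.keyring health
# on a host running cinder_volume: test the cinder key
docker exec -it cinder_volume ceph --id cinder --keyring /etc/ceph/ceph.client.cinder.keyring health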

Albert Braden (ozzzo) wrote:

I tried this in nova_compute:

(nova-compute)[root@chrnc-void-testupgrade-compute-1-replace /]# ceph -n client.admin --keyring /var/lib/kolla/config_files/ceph.client.nova.keyring health
[errno 13] error connecting to the cluster

But I get the same result on a working cluster:

(nova-compute)[root@anaca-os-compute-01 /]# ceph -n client.admin --keyring /var/lib/kolla/config_files/ceph.client.nova.keyring health
[errno 13] error connecting to the cluster

I checked perms:

client.nova
        key: AQACHe1eAAAAABAAEytR6/2A/5y7O7/C/mjldw==
        caps: [mon] profile rbd
        caps: [osd] profile rbd pool=vms

It looks like the nova client can only run rbd commands, but rbd commands also fail on the working cluster:

(nova-compute)[root@anaca-os-compute-01 /]# rbd --keyring /var/lib/kolla/config_files/ceph.client.nova.keyring ls
2021-05-19 16:52:44.364 7f9f929fa700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2021-05-19 16:52:44.364 7f9f899f8700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2021-05-19 16:52:44.364 7f9f921f9700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
rbd: couldn't connect to the cluster!
rbd: listing images failed: (13) Permission denied

What am I missing?

Mark Goddard (mgoddard) wrote:

You're using a name of client.admin with the nova keyring.

To use ceph.client.nova.keyring from /etc/ceph, I normally run:

ceph --id nova health
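
(With --id nova, the ceph client authenticates as client.nova and by default looks for /etc/ceph/ceph.client.nova.keyring, so the client name and the keyring file name have to agree.)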

Albert Braden (ozzzo) wrote:

That works on my working cluster:

(nova-compute)[root@chrnc-dev-os-compute-01 /]# ceph --id nova health
HEALTH_OK

Fails on the upgraded one:

(nova-compute)[nova@chrnc-void-testupgrade-compute-1-replace /]$ ceph --id nova health
[errno 2] error connecting to the cluster

Albert Braden (ozzzo) wrote:

Should I have done something with the keys during the upgrade? They were working before.

Mark Goddard (mgoddard) wrote:

I don't believe any change to the keys is required. I would suggest taking the ceph.conf and keyring from your nova container, and comparing them with a working config file and keyring, then converging the two until you find the issue.

In the past I know people have had issues with Windows line endings on the keyring file. Perhaps you could check perms of those files too?
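
Two quick checks along those lines (a sketch; paths assume the default locations inside the container):

# cat -A prints carriage returns as ^M, which exposes Windows line endings
docker exec nova_compute cat -A /etc/ceph/ceph.conf
docker exec nova_compute cat -A /etc/ceph/ceph.client.cinder.keyring
# check ownership and permissions on the ceph files
docker exec nova_compute ls -lA /etc/ceph/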

Albert Braden (ozzzo) wrote:

In a working cluster I have /etc/ceph/ceph.client.nova.keyring in the nova_compute container:

(nova-compute)[nova@chrnc-dev-os-compute-01 /etc/ceph]$ ll
total 16
-rwx------. 1 nova nova 176 Feb 18 05:44 ceph.client.cinder.keyring
-rwx------. 1 nova nova 123 Feb 18 05:44 ceph.client.nova.keyring
-rwx------. 1 nova nova 483 Feb 18 05:44 ceph.conf
-rw-r--r--. 1 root root 92 Jun 8 2020 rbdmap

In the upgraded cluster that file doesn't exist:

(nova-compute)[nova@chrnc-area51-os-compute-01 /etc/ceph]$ ls -lA /etc/ceph/
total 12
-rw-------. 1 nova nova 176 May 28 14:31 ceph.client.cinder.keyring
-rw-------. 1 nova nova 383 May 28 14:31 ceph.conf
-rw-r--r--. 1 root root 92 May 14 13:35 rbdmap

I checked the build server and the nova keyring is there in the correct place:

[root@chnrc-area51-os-build nova]# ls -lA /etc/kolla/config/nova/
total 16
-rw------- 1 root root 176 Mar 9 16:38 ceph.client.cinder.keyring
-rw------- 1 root root 123 Mar 9 16:38 ceph.client.nova.keyring
-rw-r--r-- 1 root root 383 Feb 24 17:44 ceph.conf
-rw-r--r-- 1 root root 101 Nov 11 2020 nova-compute.conf

I'll dig through my upgrade output and see if I can figure out why it isn't copying into the container.

Albert Braden (ozzzo) wrote:

Ansible appears to be copying the nova key to all 3 computes:

TASK [nova-cell : Check nova keyring file] **************************************************************************************************************************************************************************************************
ok: [192.168.0.51]

TASK [nova-cell : Check cinder keyring file] ************************************************************************************************************************************************************************************************
ok: [192.168.0.51]

TASK [nova-cell : Copy over ceph nova keyring file] *****************************************************************************************************************************************************************************************
ok: [192.168.0.51] => (item=nova-compute)
ok: [192.168.0.52] => (item=nova-compute)
ok: [192.168.0.53] => (item=nova-compute)

TASK [nova-cell : Copy over ceph cinder keyring file] ***************************************************************************************************************************************************************************************
ok: [192.168.0.51] => (item=nova-compute)
ok: [192.168.0.52] => (item=nova-compute)
ok: [192.168.0.53] => (item=nova-compute)

This is the Ansible code in /opt/openstack/share/kolla-ansible/ansible/roles/nova-cell/tasks/external_ceph.yml:

- name: Check nova keyring file
  stat:
    path: "{{ node_custom_config }}/nova/{{ ceph_nova_keyring }}"
  delegate_to: localhost
  run_once: True
  register: nova_cephx_keyring_file
  failed_when: not nova_cephx_keyring_file.stat.exists
  when:
    - nova_backend == "rbd"
    - external_ceph_cephx_enabled | bool

- name: Check cinder keyring file
  stat:
    path: "{{ node_custom_config }}/nova/{{ ceph_cinder_keyring }}"
  delegate_to: localhost
  run_once: True
  register: cinder_cephx_keyring_file
  failed_when: not cinder_cephx_keyring_file.stat.exists
  when:
    - cinder_backend_ceph | bool
    - external_ceph_cephx_enabled | bool

- name: Copy over ceph nova keyring file
  copy:
    src: "{{ nova_cephx_keyring_file.stat.path }}"
    dest: "{{ node_config_directory }}/{{ item }}/"
    mode: "0660"
  become: true
  with_items:
    - nova-compute
  when:
    - inventory_hostname in groups[nova_cell_compute_group]
    - nova_backend == "rbd"
    - external_ceph_cephx_enabled | bool
  notify:
    - Restart {{ item }} container

Variables:
share/kolla-ansible/ansible/group_vars/all.yml:node_custom_config: "/etc/kolla/config"
share/kolla-ansible/ansible/group_vars/all.yml:ceph_nova_keyring: "{{ ceph_cinder_keyring }}"
share/kolla-ansible/ansible/group_vars/all.yml:ceph_cinder_keyring: "ceph.client.cinder.keyring"
share/kolla-ansible/ansible/group_vars/all.yml:node_config_directory: "/etc/kolla"

So it looks like it is copying from /etc/kolla/config/nova/ceph.client.cinder.keyring to /etc/kolla/nova-compute/. Since ceph_nova_keyring defaults to ceph_cinder_keyring, both tasks end up copying the cinder keyring.

If I look on the hypervisor I see both keyrings in the correct place:
[root@chrnc-area51-os-compute-01 nova-compute]# ll /etc/kolla/nova-compute/
total 20
-rw-rw----. 1 root root 176 May 24 18:47 ceph.client.cinder.keyring
-rw-rw----. 1 root r...


Albert Braden (ozzzo) wrote:

I tried manually copying /etc/kolla/nova-compute/ceph.client.nova.keyring into /etc/ceph/ in the container, and that fixes the problem, but I would like to figure out why it doesn't get copied during the upgrade.
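
For the record, the manual copy was along these lines (a sketch run on the compute host; the chown is an assumption based on the file ownership shown earlier):

# copy the staged keyring from the host into the running container
docker cp /etc/kolla/nova-compute/ceph.client.nova.keyring nova_compute:/etc/ceph/
# make sure the file is readable by the nova user inside the container
docker exec -u root nova_compute chown nova:nova /etc/ceph/ceph.client.nova.keyring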

Albert Braden (ozzzo) wrote:

Manually copying the keyring into the container fixes the problem in the sense that ceph commands now work in the container:

(nova-compute)[root@chrnc-area51-os-compute-03 /]# ceph --id nova health
HEALTH_OK

But I get a new error when I try to build a VM now:

2021-05-28 17:00:36.141 20 ERROR nova.scheduler.utils [req-1763e349-bf76-4112-a914-fe790d57d924 f1cc3cd2fe734a93ab1ed6ad8143decc b7c712174e1c41c9bd5bde4721a7458d - default default] [instance: 475f2fd9-947b-4300-abdc-aab7bdf80fb3] Error from last host: chrnc-area51-os-compute-02.chtrse.com (node chrnc-area51-os-compute-02.chtrse.com): ['Traceback (most recent call last):\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 2385, in _build_and_run_instance\n accel_info=accel_info)\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3690, in spawn\n cleanup_instance_disks=created_disks)\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6627, in _create_domain_and_network\n cleanup_instance_disks=cleanup_instance_disks)\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise\n raise value\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6596, in _create_domain_and_network\n post_xml_callback=post_xml_callback)\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6532, in _create_domain\n guest.launch(pause=pause)\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 142, in launch\n self._encoded_xml, errors=\'ignore\')\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n self.force_reraise()\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n six.reraise(self.type_, self.value, self.tb)\n', ' File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise\n raise value\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 137, in launch\n return self._domain.createWithFlags(flags)\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit\n result = proxy_call(self._autowrap, f, *args, **kwargs)\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call\n rv = execute(f, *args, **kwargs)\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute\n six.reraise(c, e, tb)\n', ' File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise\n raise value\n', ' File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker\n rv = meth(*args, **kwargs)...


Mark Goddard (mgoddard) wrote:

From https://docs.openstack.org/kolla-ansible/ussuri/reference/storage/external-ceph-guide.html:

  ceph_nova_keyring (by default it’s the same as ceph_cinder_keyring)

In Ussuri we switched nova to use the same keyring name as Cinder by default, which is ceph.client.cinder.keyring. Perhaps that keyring does not have the necessary permissions to access the nova pool, or perhaps you have configuration overrides in nova.conf that use the old keyring name?

I would suggest removing any unnecessary configuration overrides, and either setting ceph_nova_keyring to the old value of ceph.client.nova.keyring, or updating the permissions of the cinder keyring to allow access to the nova pool.
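
Concretely, those two options might look like this (both sketches; the globals.yml variable is the one shown in the grep output above, and the caps are the values from this cluster's cinder key):

# option A, in /etc/kolla/globals.yml: keep the old nova keyring name
ceph_nova_keyring: "ceph.client.nova.keyring"

# option B, on the Ceph side: make sure the cinder key covers the nova (vms) pool
ceph auth caps client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images'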

Mark Goddard (mgoddard) wrote:

Can you think of any way we could prevent this issue from occurring again?

Albert Braden (ozzzo) wrote (last edit):

Yes, this is the problem. I rebuilt my Heat stack, and before I start the Train->Ussuri upgrade the keys are different:

(nova-compute)[root@chrnc-void-testupgrade-compute-1-replace ceph]# cat ceph.client.cinder.keyring
[client.cinder]
        key = AQCnarZgAAAAABAA/fprRD3z8dRzTgi7jtDeYA==
        caps mon = "profile rbd"
        caps osd = "profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images"
(nova-compute)[root@chrnc-void-testupgrade-compute-1-replace ceph]# cat ceph.client.nova.keyring
[client.nova]
        key = AQCqarZgAAAAABAAt0lhY7TXttXIk2Y6HYQxEw==
        caps mon = "profile rbd"
        caps osd = "profile rbd pool=vms"

I don't see anything key-related in nova.conf:

(openstack) [root@chrnc-void-testupgrade-build ~]# cat /etc/kolla/config/nova.conf
[libvirt]
cpu_mode = host-model
disk_cachemodes = network=writeback
hw_disk_discard = unmap

[DEFAULT]
instance_name_template = instance-%(uuid)s
dhcp_domain = dmz.chtrse.com

cpu_allocation_ratio = 1
initial_cpu_allocation_ratio = 1
ram_allocation_ratio = 1
initial_ram_allocation_ratio = 1

reserved_host_cpus = 2
reserved_host_memory_mb = 1024

Here are the uncommented "ceph" settings from /etc/kolla/globals.yml:

(openstack) [root@chrnc-void-testupgrade-build ~]# grep ceph /etc/kolla/globals.yml|grep -v ^#
enable_ceph: "no"
external_ceph_cephx_enabled: "yes"
glance_backend_ceph: "yes"
cinder_backend_ceph: "yes"
cinder_backup_driver: "ceph"
nova_backend_ceph: "yes"

I searched for ceph_cinder and ceph_nova in my config but didn't find anything:
(openstack) [root@chrnc-void-testupgrade-build kolla]# grep -r ceph_cinder /etc/kolla/
(openstack) [root@chrnc-void-testupgrade-build kolla]# grep -r ceph_nova /etc/kolla/
(openstack) [root@chrnc-void-testupgrade-build kolla]#

I'll experiment with changing the perms of the cinder keyring. Is there a document on changing keyring perms?

Albert Braden (ozzzo) wrote:

I've added client.nova to the cinder keyring in my heat cluster. I'll perform the Train->Ussuri upgrade now and report back.

Albert Braden (ozzzo) wrote:

I'm still missing something. On the build server, I added client.nova to /etc/kolla/config/nova/ceph.client.cinder.keyring before the upgrade, and I see the client.nova lines in the container after the upgrade, but nova still can't connect:

(nova-compute)[root@chrnc-void-testupgrade-compute-0-replace /]# cat /etc/ceph/ceph.client.cinder.keyring
[client.cinder]
        key = AQCnarZgAAAAABAA/fprRD3z8dRzTgi7jtDeYA==
        caps mon = "profile rbd"
        caps osd = "profile rbd pool=volumes, profile rbd pool=vms, profile rbd pool=images"
[client.nova]
        key = AQCqarZgAAAAABAAt0lhY7TXttXIk2Y6HYQxEw==
        caps mon = "profile rbd"
        caps osd = "profile rbd pool=vms"
(nova-compute)[root@chrnc-void-testupgrade-compute-0-replace /]# ceph --id nova health
[errno 13] error connecting to the cluster
(nova-compute)[root@chrnc-void-testupgrade-compute-0-replace /]# ceph --id cinder health
HEALTH_OK

If I manually copy the key, then the problem is fixed:
(nova-compute)[root@chrnc-void-testupgrade-compute-0-replace /]# cat /etc/ceph/ceph.client.nova.keyring
[client.nova]
        key = AQCqarZgAAAAABAAt0lhY7TXttXIk2Y6HYQxEw==
        caps mon = "profile rbd"
        caps osd = "profile rbd pool=vms"
(nova-compute)[root@chrnc-void-testupgrade-compute-0-replace /]# ceph --id nova health
HEALTH_OK

Albert Braden (ozzzo) wrote:

After manually copying the keys into the nova_compute containers, I'm back to the "qemu unexpectedly closed the monitor" errors, apparently indicating that my libvirt auth is broken.

Albert Braden (ozzzo) wrote:

In the nova_libvirt container I see 4 files in /etc/libvirt/secrets/:

(nova-libvirt)[root@chrnc-void-testupgrade-compute-0-replace /]# ls -lA /etc/libvirt/secrets
total 16
-rw-------. 1 root root 40 Jun 3 18:09 a51d6c3c-748e-475c-a51e-113b4917e0d1.base64
-rw-------. 1 root root 168 Jun 3 18:09 a51d6c3c-748e-475c-a51e-113b4917e0d1.xml
-rw-------. 1 root root 40 Jun 3 18:09 cf960909-3df5-4856-8ba5-486711b134f5.base64
-rw-------. 1 root root 170 Jun 3 18:09 cf960909-3df5-4856-8ba5-486711b134f5.xml

The first 2 are for nova; the second 2 are for cinder. The cinder key matches ceph.client.cinder.keyring:

(nova-libvirt)[root@chrnc-void-testupgrade-compute-0-replace secrets]# cat a51d6c3c-748e-475c-a51e-113b4917e0d1.base64;echo
AQCnarZgAAAAABAA/fprRD3z8dRzTgi7jtDeYA==

The nova key does not match ceph.client.nova.keyring; it is the same as the cinder key:

(nova-libvirt)[root@chrnc-void-testupgrade-compute-0-replace secrets]# cat a51d6c3c-748e-475c-a51e-113b4917e0d1.base64;echo
AQCnarZgAAAAABAA/fprRD3z8dRzTgi7jtDeYA==

So, it looks like my initial configuration of separate keys for cinder and nova is breaking at least 2 things during the upgrade. Is this an incorrect initial configuration? Do I need to change the keys before the upgrade?

Richard Barrett (rbarrett) wrote:

A question here: we know the keyring is the issue.
How can we go about rebuilding the keyring?

Several questions:
* Do the controller containers nova_libvirt and nova_compute use the host file /etc/kolla/nova-api/nova.conf?
* Do the compute containers use the host files in /etc/kolla/nova-compute and /etc/kolla/nova-libvirt?

Example Compute /etc/kolla/nova-compute/:
[compute-01 nova-compute]# ls
ceph.client.cinder.keyring ceph.client.nova.keyring ceph.conf config.json nova.conf

Example Compute /etc/kolla/nova-libvirt/:
[compute-01 nova-libvirt]# ls
ceph.client.nova.keyring ceph.conf config.json libvirtd.conf qemu.conf secrets

Example Controller Node /etc/kolla/nova-api/:
[control-01 nova-api]# ls
config.json nova.conf

Can we just copy ceph.client.cinder.keyring over into nova-libvirt?
Or do we need to rebuild the entire auth flow from our OpenStack cluster to the in-network Ceph cluster that is currently running? If so, how would we go about this?

Mark Goddard (mgoddard) wrote:

Richard, each container has access to a host directory for its config under /etc/kolla/<container>.
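
You can confirm which host directory a given container mounts with docker inspect, e.g. (a sketch):

docker inspect -f '{{ .HostConfig.Binds }}' nova_compute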

Albert:

"I've added client.nova to the cinder keyring in my heat cluster. I'll perform the Train->Ussuri upgrade now and report back."

I don't really understand what you mean there. I think you need to pick a path:

1. go with the upstream default for Ussuri, and use ceph.client.cinder.keyring for both nova and cinder. Your cinder keyring does have 'profile rbd pool=vms', so I'm surprised it doesn't work. You might want to try using the rbd client with the cinder keyring to check which pools it can access (https://linux.die.net/man/8/rbd; use --id cinder as before, as sketched after this list).

2. stay with separate keys for nova and cinder. Set ceph_nova_keyring in globals.yml to ceph.client.nova.keyring.
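
For example, the rbd check from option 1 would be something like this (a sketch, run inside the nova_compute or cinder_volume container; pool names taken from the caps quoted earlier):

rbd --id cinder -p volumes ls
rbd --id cinder -p vms ls
rbd --id cinder -p images ls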

Albert Braden (ozzzo) wrote:

Before the upgrade, the cinder client can access vms, images and volumes; nova can only access vms. I did not have ceph_nova_keyring set; I'll set that and then try the upgrade again.

Albert Braden (ozzzo) wrote:

It looks like setting ceph_nova_keyring before the upgrade fixes the problem. On the cluster where I had already run the upgrade, I set ceph_nova_keyring in globals.yml to ceph.client.nova.keyring and then redeployed, and that fixed it.

Albert Braden (ozzzo) wrote:

I recommend adding a "Note" to https://docs.openstack.org/kolla-ansible/latest/user/operating-kolla.html#upgrade-procedure to warn about this issue:

Note: If you have separate keys for nova and cinder, please be sure to set "ceph_nova_keyring: ceph.client.nova.keyring" in /etc/kolla/globals.yml

Mark Goddard (mgoddard) wrote:

That's a good suggestion, although it probably needs to go in the Ussuri version of the docs (always use the relevant version).

Could you propose such a change?

Albert Braden (ozzzo) wrote:

Yes, I'm working on that today.

Albert Braden (ozzzo) wrote:

Oops, that's the master branch. I'll do a new one on Ussuri and abandon this one.

Changed in kolla-ansible:
status: New → In Progress
Albert Braden (ozzzo) wrote:

I changed the branch to Ussuri before making the change, but it still says "Repo | Branch openstack/kolla-ansible | master"

What am I doing wrong?

OpenStack Infra (hudson-openstack) wrote: Related fix proposed to kolla-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/798851

Mark Goddard (mgoddard) wrote:

Albert, I found a very relevant bug: https://bugs.launchpad.net/kolla-ansible/+bug/1934145. The fix is here: https://review.opendev.org/c/openstack/kolla-ansible/+/798851. I'm not sure if it's exactly the same issue, but it does affect your workaround, in that you will have to set ceph_nova_user back to nova once the fix merges.
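
With that fix in place, the separate-keys setup would need both overrides in /etc/kolla/globals.yml, something like (a sketch):

ceph_nova_keyring: "ceph.client.nova.keyring"
ceph_nova_user: "nova"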

Albert Braden (ozzzo) wrote:

Thank you! My contract ended today so I will pass on this information to the co-worker who is taking over the upgrade project.

OpenStack Infra (hudson-openstack) wrote: Related fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/798851
Committed: https://opendev.org/openstack/kolla-ansible/commit/c3f9ba835b1da740c9f3056dbda20a7072467757
Submitter: "Zuul (22348)"
Branch: master

commit c3f9ba835b1da740c9f3056dbda20a7072467757
Author: Mark Goddard <email address hidden>
Date: Wed Jun 30 09:30:54 2021 +0100

    nova: Use cinder user for Ceph

    In Ussuri, nova stopped using separate Ceph keys for the volumes and vms
    pools by default. Instead, we set ceph_nova_keyring to the value of
    ceph_cinder_keyring by default, which is ceph.client.cinder.keyring.
    This is in line with the Ceph OpenStack integration guide [1]. However,
    the user used by nova to access the vms pool (ceph_nova_user) defaults
    to nova, meaning that nova will still try to use a
    ceph.client.nova.keyring, which probably does not exist. We did not see
    this issue in CI, because we set ceph_nova_user to cinder.

    This change fixes the issue by setting ceph_nova_user to the value of
    ceph_cinder_user by default, which is cinder.

    Closes-Bug: #1934145
    Related-Bug: #1928690

    [1] https://docs.ceph.com/en/latest/rbd/rbd-openstack/

    Change-Id: I6aa8db2214e07906f1f3e035411fc80ba911a274

OpenStack Infra (hudson-openstack) wrote: Related fix proposed to kolla-ansible (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/803838

OpenStack Infra (hudson-openstack) wrote: Related fix proposed to kolla-ansible (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/803839

OpenStack Infra (hudson-openstack) wrote: Related fix proposed to kolla-ansible (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/803840

OpenStack Infra (hudson-openstack) wrote: Related fix merged to kolla-ansible (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/803838
Committed: https://opendev.org/openstack/kolla-ansible/commit/d22e3e995ca2018dfc504da45a869032ed5cd7d3
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit d22e3e995ca2018dfc504da45a869032ed5cd7d3
Author: Mark Goddard <email address hidden>
Date: Wed Jun 30 09:30:54 2021 +0100

    nova: Use cinder user for Ceph

    In Ussuri, nova stopped using separate Ceph keys for the volumes and vms
    pools by default. Instead, we set ceph_nova_keyring to the value of
    ceph_cinder_keyring by default, which is ceph.client.cinder.keyring.
    This is in line with the Ceph OpenStack integration guide [1]. However,
    the user used by nova to access the vms pool (ceph_nova_user) defaults
    to nova, meaning that nova will still try to use a
    ceph.client.nova.keyring, which probably does not exist. We did not see
    this issue in CI, because we set ceph_nova_user to cinder.

    This change fixes the issue by setting ceph_nova_user to the value of
    ceph_cinder_user by default, which is cinder.

    Closes-Bug: #1934145
    Related-Bug: #1928690

    [1] https://docs.ceph.com/en/latest/rbd/rbd-openstack/

    Change-Id: I6aa8db2214e07906f1f3e035411fc80ba911a274

tags: added: in-stable-wallaby
tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote: Related fix merged to kolla-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/803839
Committed: https://opendev.org/openstack/kolla-ansible/commit/8aa8e617d982222e4b3c8d6a26b7190baca1b199
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 8aa8e617d982222e4b3c8d6a26b7190baca1b199
Author: Mark Goddard <email address hidden>
Date: Wed Jun 30 09:30:54 2021 +0100

    nova: Use cinder user for Ceph

    In Ussuri, nova stopped using separate Ceph keys for the volumes and vms
    pools by default. Instead, we set ceph_nova_keyring to the value of
    ceph_cinder_keyring by default, which is ceph.client.cinder.keyring.
    This is in line with the Ceph OpenStack integration guide [1]. However,
    the user used by nova to access the vms pool (ceph_nova_user) defaults
    to nova, meaning that nova will still try to use a
    ceph.client.nova.keyring, which probably does not exist. We did not see
    this issue in CI, because we set ceph_nova_user to cinder.

    This change fixes the issue by setting ceph_nova_user to the value of
    ceph_cinder_user by default, which is cinder.

    Closes-Bug: #1934145
    Related-Bug: #1928690

    [1] https://docs.ceph.com/en/latest/rbd/rbd-openstack/

    Change-Id: I6aa8db2214e07906f1f3e035411fc80ba911a274

OpenStack Infra (hudson-openstack) wrote: Related fix merged to kolla-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/803840
Committed: https://opendev.org/openstack/kolla-ansible/commit/fcad47657a280949a81053d65cc10a24fc2f242e
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit fcad47657a280949a81053d65cc10a24fc2f242e
Author: Mark Goddard <email address hidden>
Date: Wed Jun 30 09:30:54 2021 +0100

    nova: Use cinder user for Ceph

    In Ussuri, nova stopped using separate Ceph keys for the volumes and vms
    pools by default. Instead, we set ceph_nova_keyring to the value of
    ceph_cinder_keyring by default, which is ceph.client.cinder.keyring.
    This is in line with the Ceph OpenStack integration guide [1]. However,
    the user used by nova to access the vms pool (ceph_nova_user) defaults
    to nova, meaning that nova will still try to use a
    ceph.client.nova.keyring, which probably does not exist. We did not see
    this issue in CI, because we set ceph_nova_user to cinder.

    This change fixes the issue by setting ceph_nova_user to the value of
    ceph_cinder_user by default, which is cinder.

    Closes-Bug: #1934145
    Related-Bug: #1928690

    [1] https://docs.ceph.com/en/latest/rbd/rbd-openstack/

    Change-Id: I6aa8db2214e07906f1f3e035411fc80ba911a274

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote: Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/796948
Committed: https://opendev.org/openstack/kolla-ansible/commit/e0e8ddf7574480aa5b004fad9b42a127145b0c25
Submitter: "Zuul (22348)"
Branch: master

commit e0e8ddf7574480aa5b004fad9b42a127145b0c25
Author: abraden <email address hidden>
Date: Thu Jun 17 20:18:30 2021 +0000

    Added upgrade note for separate nova and cinder keys.

    Closes-Bug: 1928690
    Change-Id: I1bf7c272c782134511e6553a1e2a4b7220556802

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote: Fix proposed to kolla-ansible (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/822623

OpenStack Infra (hudson-openstack) wrote: Fix proposed to kolla-ansible (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/822624

OpenStack Infra (hudson-openstack) wrote: Fix proposed to kolla-ansible (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/822625

OpenStack Infra (hudson-openstack) wrote: Fix proposed to kolla-ansible (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/822626

OpenStack Infra (hudson-openstack) wrote: Fix merged to kolla-ansible (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/822623
Committed: https://opendev.org/openstack/kolla-ansible/commit/832416e5063aacc398b99ccd80c946b2b5ffa75c
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 832416e5063aacc398b99ccd80c946b2b5ffa75c
Author: abraden <email address hidden>
Date: Thu Jun 17 20:18:30 2021 +0000

    Added upgrade note for separate nova and cinder keys.

    Closes-Bug: 1928690
    Change-Id: I1bf7c272c782134511e6553a1e2a4b7220556802

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote: Fix merged to kolla-ansible (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/822624
Committed: https://opendev.org/openstack/kolla-ansible/commit/3a212faef9454cf679576c96654ed5c7e674e5ad
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 3a212faef9454cf679576c96654ed5c7e674e5ad
Author: abraden <email address hidden>
Date: Thu Jun 17 20:18:30 2021 +0000

    Added upgrade note for separate nova and cinder keys.

    Closes-Bug: 1928690
    Change-Id: I1bf7c272c782134511e6553a1e2a4b7220556802

OpenStack Infra (hudson-openstack) wrote: Fix merged to kolla-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/822625
Committed: https://opendev.org/openstack/kolla-ansible/commit/f022d47b932227ae69a73775b12c34793ff7cd35
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit f022d47b932227ae69a73775b12c34793ff7cd35
Author: abraden <email address hidden>
Date: Thu Jun 17 20:18:30 2021 +0000

    Added upgrade note for separate nova and cinder keys.

    Closes-Bug: 1928690
    Change-Id: I1bf7c272c782134511e6553a1e2a4b7220556802

OpenStack Infra (hudson-openstack) wrote: Fix merged to kolla-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/822626
Committed: https://opendev.org/openstack/kolla-ansible/commit/37c45f71bf7b117378153ad7f26667ef7fde4e10
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 37c45f71bf7b117378153ad7f26667ef7fde4e10
Author: abraden <email address hidden>
Date: Thu Jun 17 20:18:30 2021 +0000

    Added upgrade note for separate nova and cinder keys.

    Closes-Bug: 1928690
    Change-Id: I1bf7c272c782134511e6553a1e2a4b7220556802

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/kolla-ansible 11.2.0

This issue was fixed in the openstack/kolla-ansible 11.2.0 release.

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/kolla-ansible 12.3.0

This issue was fixed in the openstack/kolla-ansible 12.3.0 release.

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/kolla-ansible 13.0.1

This issue was fixed in the openstack/kolla-ansible 13.0.1 release.

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/kolla-ansible 14.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 14.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/kolla-ansible ussuri-eol

This issue was fixed in the openstack/kolla-ansible ussuri-eol release.
