cinder ceph backup driver breaking on conf_read_file

Bug #1881022 reported by Daniel Queen
Affects: Cinder
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

OpenStack Ussuri
Ceph Octopus
Cinder 16

When using Ceph as the backend for cinder-volume, initiating a backup with the Cinder Ceph backup driver fails with "rados.InvalidArgumentError: [errno 22] RADOS invalid argument (error calling conf_read_file)".
I've tried a number of different configurations, and all of them result in the same error.
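
As a quick sanity check, the same librados call path can be exercised outside of Cinder. This is a minimal sketch, assuming python3-rados is installed and that /etc/ceph/ceph.conf plus the client.ecbackup keyring are readable by the user running it; the rados_id and conffile values mirror backup_ceph_user and backup_ceph_conf from the cinder.conf below:

import rados

# The Rados constructor already calls conf_read_file() on the given conffile,
# which is the call that fails in the traceback below.
client = rados.Rados(rados_id='ecbackup',             # backup_ceph_user
                     clustername='ceph',
                     conffile='/etc/ceph/ceph.conf')   # backup_ceph_conf
client.connect()
print('connected, fsid:', client.get_fsid())
client.shutdown()

If this succeeds while the backup still fails, the suspect is the temporary config file that os-brick writes for the volume attach (the conffile=self.rbd_conf seen in the traceback) rather than /etc/ceph/ceph.conf itself.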

##### /var/log/cinder/backup.log (debug)

2020-05-27 16:22:38.718 5234 INFO cinder.backup.manager [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] Create backup started, backup: 238a2caa-05d9-4478-9981-a368e173d7f9 volume: 4cca54d9-9230-4227-989a-df311abf4ba0.
2020-05-27 16:22:38.726 5234 DEBUG os_brick.utils [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] ==> get_connector_properties: call "{'root_helper': 'sudo cinder-rootwrap /etc/cinder/rootwrap.conf', 'my_ip': '10.10.10.100', 'multipath': False, 'enforce_multipath': False, 'host': None, 'execute': None}" trace_logging_wrapper /usr/lib/python3.6/site-packages/os_brick/utils.py:146
2020-05-27 16:22:38.729 5234 INFO oslo.privsep.daemon [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] Running privsep helper: ['sudo', 'cinder-rootwrap', '/etc/cinder/rootwrap.conf', 'privsep-helper', '--config-file', '/usr/share/cinder/cinder-dist.conf', '--config-file', '/etc/cinder/cinder.conf', '--privsep_context', 'os_brick.privileged.default', '--privsep_sock_path', '/tmp/tmpzs510u0v/privsep.sock']
2020-05-27 16:22:39.574 5234 DEBUG oslo_db.sqlalchemy.engines [req-2792145b-5bc6-456e-9684-3a4243f32e61 - - - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/engines.py:304
2020-05-27 16:22:40.193 5234 INFO oslo.privsep.daemon [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] Spawned new privsep daemon via rootwrap
2020-05-27 16:22:40.194 5234 DEBUG oslo.privsep.daemon [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] Accepted privsep connection to /tmp/tmpzs510u0v/privsep.sock __init__ /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:335
2020-05-27 16:22:40.070 7137 INFO oslo.privsep.daemon [-] privsep daemon starting
2020-05-27 16:22:40.078 7137 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0
2020-05-27 16:22:40.085 7137 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none
2020-05-27 16:22:40.086 7137 INFO oslo.privsep.daemon [-] privsep daemon running as pid 7137
2020-05-27 16:22:40.200 7137 DEBUG oslo.privsep.daemon [-] privsep: reply[139903332001032]: (2,) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:475
2020-05-27 16:22:40.505 7137 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): cat /etc/iscsi/initiatorname.iscsi execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372
2020-05-27 16:22:40.512 7137 DEBUG oslo_concurrency.processutils [-] CMD "cat /etc/iscsi/initiatorname.iscsi" returned: 0 in 0.007s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409
2020-05-27 16:22:40.513 7137 DEBUG oslo.privsep.daemon [-] privsep: reply[139903332001032]: (4, ('InitiatorName=iqn.1994-05.com.redhat:77d26f985142\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:475
2020-05-27 16:22:40.515 5234 DEBUG os_brick.initiator.linuxfc [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] No Fibre Channel support detected on system. get_fc_hbas /usr/lib/python3.6/site-packages/os_brick/initiator/linuxfc.py:157
2020-05-27 16:22:40.516 5234 DEBUG os_brick.initiator.linuxfc [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] No Fibre Channel support detected on system. get_fc_hbas /usr/lib/python3.6/site-packages/os_brick/initiator/linuxfc.py:157
2020-05-27 16:22:40.518 7137 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): cat /sys/class/dmi/id/product_uuid execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:372
2020-05-27 16:22:40.525 7137 DEBUG oslo_concurrency.processutils [-] CMD "cat /sys/class/dmi/id/product_uuid" returned: 0 in 0.007s execute /usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py:409
2020-05-27 16:22:40.526 7137 DEBUG oslo.privsep.daemon [-] privsep: reply[139903332001032]: (4, ('c1f4ca28-9e84-499a-88c6-9c01545d1a71\n', '')) _call_back /usr/lib/python3.6/site-packages/oslo_privsep/daemon.py:475
2020-05-27 16:22:40.531 5234 DEBUG os_brick.utils [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] <== get_connector_properties: return (1800ms) {'platform': 'x86_64', 'os_type': 'linux', 'ip': '10.10.10.100', 'host': 'controller', 'multipath': False, 'initiator': 'iqn.1994-05.com.redhat:77d26f985142', 'do_local_attach': False, 'system uuid': 'c1f4ca28-9e84-499a-88c6-9c01545d1a71'} trace_logging_wrapper /usr/lib/python3.6/site-packages/os_brick/utils.py:170
2020-05-27 16:22:41.337 5234 DEBUG os_brick.initiator.connector [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] Factory for rbd on None factory /usr/lib/python3.6/site-packages/os_brick/initiator/connector.py:301
2020-05-27 16:22:41.339 5234 DEBUG os_brick.initiator.connectors.rbd [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] ==> connect_volume: call "{'self': <os_brick.initiator.connectors.rbd.RBDConnector object at 0x7f3dc856c860>, 'connection_properties': {'name': 'volumes/volume-4cca54d9-9230-4227-989a-df311abf4ba0', 'hosts': ['10.10.10.110'], 'ports': ['6789'], 'cluster_name': 'ceph', 'auth_enabled': True, 'auth_username': 'volumes', 'secret_type': 'ceph', 'secret_uuid': '***', 'volume_id': '4cca54d9-9230-4227-989a-df311abf4ba0', 'discard': True, 'keyring': None, 'qos_specs': None, 'access_mode': 'rw', 'encrypted': False}}" trace_logging_wrapper /usr/lib/python3.6/site-packages/os_brick/utils.py:146
2020-05-27 16:22:41.341 5234 DEBUG os_brick.initiator.linuxrbd [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] opening connection to ceph cluster (timeout=-1). connect /usr/lib/python3.6/site-packages/os_brick/initiator/linuxrbd.py:70
2020-05-27 16:22:41.347 5234 DEBUG os_brick.initiator.connectors.rbd [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] <== connect_volume: exception (6ms) InvalidArgumentError('RADOS invalid argument (error calling conf_read_file)',) trace_logging_wrapper /usr/lib/python3.6/site-packages/os_brick/utils.py:156
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server [req-7a50dde1-0474-46b9-a8c9-dbe6e8a15929 5573c0ebec4e46f297fc31f04dc1987c 712909e5adea4cbfbcd38d97667313ad - default default] Exception during message handling: rados.InvalidArgumentError: [errno 22] RADOS invalid argument (error calling conf_read_file)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 276, in dispatch
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 196, in _do_dispatch
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 409, in create_backup
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server volume_utils.update_backup_error(backup, six.text_type(err))
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server raise value
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 398, in create_backup
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server updates = self._run_backup(context, backup, volume)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 469, in _run_backup
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server backup_device.is_snapshot)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1006, in _attach_device
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server return self._attach_volume(ctxt, backup_device, properties)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1028, in _attach_volume
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server {'volume_id', volume.id})
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server raise value
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1017, in _attach_volume
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server return self._connect_device(conn)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/cinder/backup/manager.py", line 1059, in _connect_device
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server vol_handle = connector.connect_volume(conn['data'])
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/utils.py", line 150, in trace_logging_wrapper
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server result = f(*args, **kwargs)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/rbd.py", line 204, in connect_volume
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server rbd_handle = self._get_rbd_handle(connection_properties)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/connectors/rbd.py", line 123, in _get_rbd_handle
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server rbd_cluster_name=str(cluster_name))
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxrbd.py", line 60, in __init__
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server self.client, self.ioctx = self.connect()
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/os_brick/initiator/linuxrbd.py", line 73, in connect
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server conffile=self.rbd_conf)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "rados.pyx", line 692, in rados.Rados.__init__
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "rados.pyx", line 578, in rados.requires.wrapper.validate_func
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "rados.pyx", line 743, in rados.Rados.__setup
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "rados.pyx", line 578, in rados.requires.wrapper.validate_func
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server File "rados.pyx", line 826, in rados.Rados.conf_read_file
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server rados.InvalidArgumentError: [errno 22] RADOS invalid argument (error calling conf_read_file)
2020-05-27 16:22:41.445 5234 ERROR oslo_messaging.rpc.server

##### rpm versions on controller node

[root@controller (admin:admin) cinder]$ rpm -qa|grep cinder
python3-cinder-16.0.0-1.el8.noarch
openstack-cinder-16.0.0-1.el8.noarch
python3-cinderclient-7.0.0-1.el8.noarch
[root@controller (admin:admin) cinder]$ rpm -qa|grep ceph
python3-cephfs-15.2.2-0.el8.x86_64
libcephfs2-15.2.2-0.el8.x86_64
ceph-release-1-1.el8.noarch
python3-ceph-argparse-15.2.2-0.el8.x86_64
python3-ceph-common-15.2.2-0.el8.x86_64
ceph-common-15.2.2-0.el8.x86_64
[root@controller (admin:admin) cinder]$ rpm -qa|grep rbd
python3-rbd-15.2.2-0.el8.x86_64
librbd1-15.2.2-0.el8.x86_64
[root@controller (admin:admin) cinder]$ rpm -qa|grep rados
librados2-15.2.2-0.el8.x86_64
libradosstriper1-15.2.2-0.el8.x86_64
python3-rados-15.2.2-0.el8.x86_64

##### /etc/cinder/cinder.conf on controller node

[DEFAULT]
backup_ceph_conf = /etc/ceph/ceph.conf
backup_ceph_pool = ecbackup
backup_ceph_user = ecbackup
backup_driver = cinder.backup.drivers.ceph.CephBackupDriver
debug = true
enabled_backends = rbd,ecrbd
default_volume_type = Standard-CephRBD
my_ip = 10.10.10.100
auth_strategy = keystone
transport_url = rabbit://openstack:password@controller
[backend]
[backend_defaults]
[barbican]
[brcd_fabric_example]
[cisco_fabric_example]
[coordination]
[cors]
[database]
connection = mysql+pymysql://cinder:password@controller/cinder
[fc-zone-manager]
[healthcheck]
[key_manager]
[keystone_authtoken]
password = password
username = cinder
project_name = service
user_domain_name = default
project_domain_name = default
auth_type = password
memcached_servers = controller:11211
auth_url = http://controller:5000
www_authenticate_uri = http://controller:5000
[nova]
[oslo_concurrency]
lock_path = /var/lib/cinder/tmp
[oslo_messaging_amqp]
[oslo_messaging_kafka]
[oslo_messaging_notifications]
[oslo_messaging_rabbit]
[oslo_middleware]
[oslo_policy]
[oslo_reports]
[oslo_versionedobjects]
[privsep]
[profiler]
[sample_castellan_source]
[sample_remote_file_source]
[service_user]
[ssl]
[vault]
[rbd]
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = 717fd0ac-ea8e-4c0f-b380-6179f9dc16da
rbd_user = volumes
rbd_pool = volumes
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = CephRBD
[ecrbd]
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = 6ac87892-d82f-4253-99fd-0a4e17b0d253
rbd_user = ecvolumes
rbd_pool = ecvolumes
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = CephRBD-EC

##### /etc/ceph/ceph.conf on controller node

[global]
debug rbd = 20
debug rgw = 20
debug rados = 20
fsid = 80a99a18-ab0c-4a8f-af13-937626320534
mon initial members = ceph
mon host = 10.10.10.110
public network = 10.10.10.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 2
osd pool default min size = 2
osd pool default pg num = 32
osd pool default pgp num = 32
osd crush chooseleaf type = 0

[client.rgw.ceph]
host = ceph
rgw dns name = ceph
rgw_frontends = "civetweb port=8080"

[client.images]
keyring = /etc/ceph/ceph.client.images.keyring

[client.volumes]
keyring = /etc/ceph/ceph.client.volumes.keyring

[client.ecvolumes]
keyring = /etc/ceph/ceph.client.ecvolumes.keyring
rbd default data pool = ecvolumes_data

[client.ecbackup]
keyring = /etc/ceph/ceph.client.ecbackup.keyring
rbd default data pool = ecbackup_data

##### /etc/ceph/ceph.conf on ceph node

[global]
fsid = 80a99a18-ab0c-4a8f-af13-937626320534
mon initial members = ceph
mon host = 10.10.10.110
public network = 10.10.10.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 2
osd pool default min size = 2
osd pool default pg num = 32
osd pool default pgp num = 32
osd crush chooseleaf type = 0

[client.rgw.ceph]
rgw_swift_account_in_url = True
rgw_keystone_url = http://controller:5000
rgw_keystone_api_version = 3
rgw_keystone_admin_user = swift
rgw_keystone_admin_password = password
rgw_keystone_admin_tenant = service
rgw_keystone_admin_domain = default
rgw_keystone_accepted_roles = admin, member
rgw_keystone_token_cache_size = 10
rgw_keystone_revocation_interval = 300
rgw_keystone_make_new_tenants = true
rgw_keystone_implicit_tenants = true
rgw_rgw_swift_versioning_enabled = true
rgw_s3_auth_use_keystone = true
rgw_enable_apis = swift, s3
rgw_keystone_verify_ssl = false

host = ceph
rgw dns name = ceph
rgw_frontends = "civetweb port=8080"

Revision history for this message
Daniel Queen (crono782) wrote :

Further info: this occurs when adding the repository for Ceph Octopus instead of Nautilus. Something must have changed in the newer rados packages in the Octopus release that ends up breaking the Cinder Ceph backup driver.
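
One way to see the client-side change without involving Cinder at all is to feed librados a conf file shaped roughly like the one the os-brick RBD connector writes for the attach. This is only an illustration under an assumption: the suspicion is that the generated file starts with a bare mon_host line (built from the connection properties in the log above) and has no section header, which the stricter Octopus config parsing rejects while Nautilus tolerated it.

import tempfile
import rados

# Approximation of the temporary conf os-brick writes for the attach: just a
# mon_host line with no [global] header. Values come from the
# connection_properties in the log above; the real file's contents depend on
# the os-brick version, so treat this purely as an illustration.
conf_text = "mon_host = 10.10.10.110:6789\n"

with tempfile.NamedTemporaryFile('w', suffix='.conf') as conf_file:
    conf_file.write(conf_text)
    conf_file.flush()
    try:
        client = rados.Rados(rados_id='volumes', clustername='ceph',
                             conffile=conf_file.name)
        client.shutdown()
        print('conf file parsed')
    except rados.InvalidArgumentError as exc:
        # If the newer parser rejects the file, this is the same errno 22
        # conf_read_file error as in the backup log.
        print('conf file rejected:', exc)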

Revision history for this message
JF Taltavull (jftalta) wrote :

Hello Daniel. I've experienced exactly the same bug, on the same configuration as yours (Ubuntu, Ussuri, Octopus).

Is this issue still present for you, or did you implement a workaround or a code fix?

Revision history for this message
JF Taltavull (jftalta) wrote :

The problem comes from the os-brick component. A fix is included in os-brick 3.0.4 (see https://bugs.launchpad.net/tripleo/+bug/1865754), but Ussuri uses 3.0.3.

I just created this story in the openstack/requirements StoryBoard: https://storyboard.openstack.org/#!/story/2008431
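
For anyone checking their own nodes, the installed os-brick version can be read directly (a small sketch; the fix is said to land in os-brick >= 3.0.4, while Ussuri uses 3.0.3):

import pkg_resources

# Prints the os-brick version that cinder-backup on this node is actually using.
print(pkg_resources.get_distribution('os-brick').version)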
