Race in Cinder backup manager

Bug #1887859 reported by Alex Deiter
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
In Progress
Undecided
Alex Deiter

Bug Description

I found a race condition in the Cinder backup manager:

- when the Cinder backup service starts
- it loads the configured backup driver
- and it calls the setup_backup_backend function two times

  1. from init_host()
  2. and from _report_driver_status()

  at the same time, and this causes the __init__ method from the configured driver to run again.

And this (for example for NFS backup driver) in turn, leads in an attempt to re-mount the NFS resource to the same mount point.

Steps to reproduce:
1. Configure NFS backup driver and enable debug:

cinder.conf
[DEFAULT]
debug = True
backup_driver = cinder.backup.drivers.nfs.NFSBackupDriver
backup_share = 10.3.35.43:/volumes/pool1/backup/ns5_nfs
backup_mount_options = vers=3

2. Restart cinder-backup service and check the service log file:
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61505]: INFO oslo_service.service [None req-b6b67a1d-ac40-46a9-8cfb-5927888c9f08 None None] Starting 1 workers
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): mkdir -p /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:377}}
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] CMD "mkdir -p /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee" returned: 0 in 0.007s {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:416}}
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): mkdir -p /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:377}}
Jul 16 20:37:54 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] CMD "mkdir -p /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee" returned: 0 in 0.019s {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:416}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): mount -t nfs -o vers=3,minorversion=0,timeo=100,nolock 10.3.35.43:/volumes/pool1/backup/ns5_nfs /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:377}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): mount -t nfs -o vers=3,minorversion=0,timeo=100,nolock 10.3.35.43:/volumes/pool1/backup/ns5_nfs /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:377}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] CMD "mount -t nfs -o vers=3,minorversion=0,timeo=100,nolock 10.3.35.43:/volumes/pool1/backup/ns5_nfs /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee" returned: 0 in 0.034s {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:416}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG os_brick.remotefs.remotefs [-] Mounted 10.3.35.43:/volumes/pool1/backup/ns5_nfs using nfs. {{_mount_nfs /opt/stack/os-brick/os_brick/remotefs/remotefs.py:152}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG oslo_concurrency.processutils [-] CMD "mount -t nfs -o vers=3,minorversion=0,timeo=100,nolock 10.3.35.43:/volumes/pool1/backup/ns5_nfs /opt/stack/data/cinder/backup_mount/92b96a49dff33a167398d7d3876ac2ee" returned: 0 in 0.057s {{execute /usr/local/lib/python3.6/dist-packages/oslo_concurrency/processutils.py:416}}
Jul 16 20:37:55 openstack-master-lustre7 cinder-backup[61518]: DEBUG os_brick.remotefs.remotefs [-] Mounted 10.3.35.43:/volumes/pool1/backup/ns5_nfs using nfs. {{_mount_nfs /opt/stack/os-brick/os_brick/remotefs/remotefs.py:152}}

As you can see, the Cinder backup manager ran the __init__ function from the NFS driver two times at the same time.

As a workaround I added coordination lock to the NFS driver:

$ git diff
diff --git a/cinder/backup/drivers/nfs.py b/cinder/backup/drivers/nfs.py
index 886c2cb..0434d50 100644
--- a/cinder/backup/drivers/nfs.py
+++ b/cinder/backup/drivers/nfs.py
@@ -26,6 +26,7 @@ from oslo_config import cfg
 from oslo_log import log as logging

 from cinder.backup.drivers import posix
+from cinder import coordination
 from cinder import exception
 from cinder import interface
 from cinder import utils
@@ -78,6 +79,7 @@ class NFSBackupDriver(posix.PosixBackupDriver):
                 raise exception.InvalidConfigurationValue(option=flag,
                                                           value=val)

+ @coordination.synchronized('check-for-race')
     def _init_backup_repo_path(self):
         if self.backup_share is None:
             LOG.info("_init_backup_repo_path: "

But it would be better to remove second call of setup_backup_backend function from _report_driver_status function:

diff --git a/cinder/backup/manager.py b/cinder/backup/manager.py
index 203ce12..0026f21 100644
--- a/cinder/backup/manager.py
+++ b/cinder/backup/manager.py
@@ -1089,8 +1089,6 @@ class BackupManager(manager.SchedulerDependentManager):
         self._publish_service_capabilities(context)

     def _report_driver_status(self, context):
- if not self.is_working():
- self.setup_backup_backend(context)
         backup_stats = {
             'backend_state': self.is_working(),
             'driver_name': self.driver_name,

Alex Deiter (deiter)
Changed in cinder:
assignee: nobody → Alex Deiter (deiter)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/741543

Changed in cinder:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.0.0.0b1

This issue was fixed in the openstack/cinder 18.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.