Cinder

Unable to create a volume backup within 1 min after restarting the cinder backup service

Bug #2059416 reported by Anton Kurbatov on 2024-03-28

This bug affects 1 person

Affects   Status        Importance   Assigned to   Milestone
Cinder    In Progress   Undecided    Unassigned

Bug Description

I've encountered odd behavior after restarting the cinder-backup process.
Attempts to create a volume backup shortly after the restart fail, even though the cinder-backup service itself reports no errors.

1) Restart the cinder-backup process (around 11:19:20).
2) Wait for 20-30 seconds, then attempt to create a volume backup (around 11:19:40).

If I wait a little more than 1 minute, subsequent backups are created successfully.

In the cinder-scheduler logs, I found the following lines:

Mar 28 11:19:26 ak-devstack0 cinder-scheduler[134179]: DEBUG cinder.scheduler.host_manager [None req-98a5047b-45e1-47db-a2a6-6681ef3c6a80 None None] Received backup service update from ak-devstack0: {'backend_state': False, 'driver_name': 'cinder.backup.drivers.nfs.NFSBackupDriver', 'availability_zone': 'nova'} {{(pid=134179) update_service_capabilities /opt/stack/cinder/cinder/scheduler/host_manager.py:597}}
Mar 28 11:19:43 ak-devstack0 cinder-scheduler[134179]: ERROR cinder.scheduler.manager [None req-960bd56a-974f-4d28-a5e1-458e9257ec46 demo None] Service not found for creating backup.: cinder.exception.ServiceNotFound: Service cinder-backup could not be found.
Mar 28 11:20:43 ak-devstack0 cinder-scheduler[134179]: DEBUG cinder.scheduler.host_manager [None req-fc2aa722-3509-4294-acaa-c22a84312590 None None] Received backup service update from ak-devstack0: {'backend_state': True, 'driver_name': 'cinder.backup.drivers.nfs.NFSBackupDriver', 'availability_zone': 'nova'} {{(pid=134179) update_service_capabilities /opt/stack/cinder/cinder/scheduler/host_manager.py:597}}

I've reviewed the code [1] and noted that the logs contain neither "Backup driver was successfully initialized" nor "Failed to initialize driver."
Additionally, the init_loop.start method catches the LoopingCallDone exception itself, so the exception never propagates to the caller and we never enter the 'except loopingcall.LoopingCallDone:' branch within setup_backup_backend.

[1] https://opendev.org/openstack/cinder/src/commit/54856da91045299537fdb69edf43fb61aba79cc6/cinder/backup/manager.py#L166
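
The control-flow issue described above can be modeled with a small standard-library sketch. This is not the Cinder or oslo.service code: ToyLoopingCall, make_driver_init, setup_expecting_exception and setup_checking_result are hypothetical names, and the loop is synchronous for simplicity. It only assumes, as the report states, that the looping call consumes LoopingCallDone internally and hands its return value back to the caller.

```python
class LoopingCallDone(Exception):
    """Stand-in for oslo_service.loopingcall.LoopingCallDone."""

    def __init__(self, retvalue=True):
        super().__init__()
        self.retvalue = retvalue


class ToyLoopingCall:
    """Hypothetical, synchronous stand-in for a looping call.

    As described in the report, the loop catches LoopingCallDone itself
    and returns the exception's retvalue instead of re-raising it.
    """

    def __init__(self, func):
        self.func = func

    def run(self, max_iterations=10):
        for _ in range(max_iterations):
            try:
                self.func()
            except LoopingCallDone as done:
                return done.retvalue  # consumed here, never re-raised
        return None


def make_driver_init(succeed_on_attempt):
    """Return a callable that signals completion on the given attempt."""
    state = {"attempts": 0}

    def _init():
        state["attempts"] += 1
        if state["attempts"] >= succeed_on_attempt:
            raise LoopingCallDone()

    return _init


def setup_expecting_exception(log):
    """Pattern from the report: success is logged in an except branch."""
    try:
        ToyLoopingCall(make_driver_init(3)).run()
    except LoopingCallDone:
        # Unreachable: the loop already consumed the exception.
        log.append("Backup driver was successfully initialized")


def setup_checking_result(log):
    """One possible alternative: use the loop's return value as the signal."""
    if ToyLoopingCall(make_driver_init(3)).run():
        log.append("Backup driver was successfully initialized")


buggy_log, checked_log = [], []
setup_expecting_exception(buggy_log)
setup_checking_result(checked_log)
print(buggy_log)
print(checked_log)
```

In this model the first variant never logs success, which matches the missing log lines noted above. The second variant is only an illustration of checking the returned value and is not necessarily what the proposed fix does.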

OpenStack Infra (hudson-openstack) wrote on 2024-03-28: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/914641

Changed in cinder:
status:	New → In Progress
