Cinder

Cinder backup appears as down

Bug #2026877 reported by Gorka Eguileor on 2023-07-11

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Cinder	New	High	Unassigned

Bug Description

When running concurrent backup operations, the backup service may appear as down, and its connection to the RabbitMQ broker may be lost.

This is problematic because any monitoring service (Pacemaker, Kubernetes/OpenShift probes) will detect that the service is down and take action.

This action is usually to restart the service, or to stop it and run it somewhere else. In both cases, all ongoing operations are stopped.

Increasing service_down_time is not a good option either, because it also affects cinder-volume, and 60 seconds is hardly a short time anyway.
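To illustrate why a 60-second threshold is already generous, here is a minimal sketch of a heartbeat-based liveness check. It assumes, as a simplification, that a service counts as up only while its last heartbeat timestamp is at most service_down_time seconds old. The function is_service_up and the timestamps are hypothetical; this is not Cinder's actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: threshold of 60 s, the value discussed in this report.
SERVICE_DOWN_TIME = 60


def is_service_up(last_heartbeat: datetime, now: datetime,
                  service_down_time: int = SERVICE_DOWN_TIME) -> bool:
    """Hypothetical liveness check: 'up' only if the last heartbeat
    is no older than service_down_time seconds."""
    elapsed = (now - last_heartbeat).total_seconds()
    return abs(elapsed) <= service_down_time


now = datetime(2023, 7, 11, 11, 25, 29, tzinfo=timezone.utc)
print(is_service_up(now - timedelta(seconds=10), now))  # True
print(is_service_up(now - timedelta(seconds=72), now))  # False
```

Under this model, any stall that keeps the heartbeat from being refreshed for longer than the threshold flips the service to "down", no matter how healthy the process otherwise is.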

Example of the RabbitMQ connection issue:
2023-07-11 11:02:30.117 136067 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer

If we increase service_down_time, we instead see complaints from the backup service about not being able to report its state to the DB in time:
2023-07-11 11:25:29.215 378376 WARNING oslo.service.loopingcall [None req-57cdd23b-77a2-4b92-8075-e7ff971ae80e - - - - - -] Function 'cinder.service.Service.report_state' run outlasted interval by 61.92 sec
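The warning above is logged by oslo.service's looping-call code (logger oslo.service.loopingcall) while running report_state on a fixed interval. A minimal sketch of the kind of logic that produces such a message, assuming the loop just compares the run time against the interval: the hypothetical helper next_delay returns how long to sleep before the next run, or warns and returns 0 when the run overran. The 10-second interval below is an illustrative assumption, not a value taken from this report.

```python
def next_delay(interval: float, elapsed: float,
               func_name: str = "cinder.service.Service.report_state") -> float:
    """Hypothetical fixed-interval scheduling step.

    If the run took longer than the interval, warn about the overrun
    and start the next run immediately; otherwise sleep for the rest
    of the interval.
    """
    overrun = elapsed - interval
    if overrun > 0:
        print(f"Function {func_name!r} run outlasted interval by {overrun:.2f} sec")
        return 0.0
    return interval - elapsed


print(next_delay(10, 4))  # 6
next_delay(10, 71.92)     # warns: ... run outlasted interval by 61.92 sec
```

While a report_state run is stalled like this, the DB heartbeat is not refreshed, so an overrun longer than service_down_time is consistent with the service being reported as down.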


Sofia Enriquez (lsofia-enriquez) on 2023-07-12

Changed in cinder:
importance:	Undecided → High
