Cinder Volume deadlock in quota_reserve and reservation_commit

Bug #1613947 reported by James Dempsey on 2016-08-17
This bug affects 7 people
Affects: Cinder
Importance: Medium
Assigned to: Unassigned

Bug Description

Summary:

Deleting an instance with multiple volumes attached is causing deadlocks in cinder-volume quota_reserve and reservation_commit.

Version: 8.0.0

OS: Ubuntu 14.04

Database: Galera Cluster

Impact:

cinder-volume hangs while deadlocks occur, causing instance creation to fail in production and pre-production environments.

Reproduction:
See attached docs which detail the reproduction and configuration.

Basically, create an instance with a volume-from-image root device and six attached volumes (all backed by Ceph RBD), then delete the instance with 'nova delete'.
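
A rough client-side sketch of those steps, using Mitaka-era python-novaclient and python-cinderclient (the auth URL, image UUID, flavor, volume sizes, and credentials below are placeholders, not values from this report):

import time

from cinderclient import client as cinder_client
from keystoneauth1 import loading, session
from novaclient import client as nova_client

# Build an authenticated session (placeholder credentials).
loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(auth_url='https://API_URL:5000/v2.0',
                                username='admin', password='REDACTED',
                                project_name='admin')
sess = session.Session(auth=auth)

nova = nova_client.Client('2', session=sess)
cinder = cinder_client.Client('2', session=sess)

# Boot with a volume-from-image root device (block device mapping v2).
server = nova.servers.create(
    name='repro-instance',
    image=None,
    flavor='FLAVOR_ID',
    block_device_mapping_v2=[{'source_type': 'image',
                              'destination_type': 'volume',
                              'uuid': 'IMAGE_UUID',
                              'volume_size': 20,
                              'boot_index': 0,
                              'delete_on_termination': True}])
while nova.servers.get(server.id).status != 'ACTIVE':
    time.sleep(5)

# Create and attach six additional RBD-backed volumes.
for i in range(6):
    vol = cinder.volumes.create(size=20, name='repro-vol-%d' % i)
    while cinder.volumes.get(vol.id).status != 'available':
        time.sleep(2)
    nova.volumes.create_server_volume(server.id, vol.id)

# Deleting the instance tears down all the attachments at once; this is the
# point where the quota_reserve / reservation_commit deadlocks were observed.
nova.servers.delete(server)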

Logs: see attached.

Cinder Config:

[DEFAULT]
glance_api_servers = https://API_URL:9292
glance_api_version = 2
enable_v1_api = True
enable_v2_api = True
enable_v3_api = True
storage_availability_zone = TEST-1
default_availability_zone = TEST-1
default_volume_type = b1.standard
volume_usage_audit_period = hour
auth_strategy = keystone
enabled_backends = b1.standard
osapi_volume_listen = 0.0.0.0
osapi_volume_workers = 4
scheduler_max_attempts = 3
volume_backend_name = DEFAULT
rbd_pool = volumes
rbd_user = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = REDACTED
scheduler_default_weighers = CapacityWeigher
scheduler_driver = cinder.scheduler.filter_scheduler.FilterScheduler
nova_catalog_info = compute:Compute Service:publicURL
nova_catalog_admin_info = compute:Compute Service:adminURL
os_region_name = test-1
volume_driver = cinder.volume.drivers.rbd.RBDDriver
debug = False
verbose = True
log_dir = /var/log/cinder
use_syslog = True
syslog_log_facility = LOG_USER
rpc_backend = rabbit
control_exchange = cinder
api_paste_config = /etc/cinder/api-paste.ini
notification_driver=messagingv2
backend_host=rbd:volumes
[BACKEND]
[BRCD_FABRIC_EXAMPLE]
[CISCO_FABRIC_EXAMPLE]
[COORDINATION]
[FC-ZONE-MANAGER]
[KEYMGR]
[cors]
[cors.subdomain]
[database]
connection = mysql://cinder:REDACTED@REDACTED/cinder
idle_timeout = 60
[keystone_authtoken]
auth_uri = https://API_URL:5000/
admin_password=REDACTED
admin_tenant_name=services
identity_uri=https://API_URL:35357/
admin_user=cinder
[matchmaker_redis]
[oslo_concurrency]
lock_path = /var/lock/cinder
[oslo_messaging_amqp]
[oslo_messaging_notifications]
[oslo_messaging_rabbit]
amqp_durable_queues = False
rabbit_hosts = REDACTED:5672,REDACTED:5672
rabbit_use_ssl = False
rabbit_userid = cinder
rabbit_password = REDACTED
rabbit_virtual_host = /
rabbit_ha_queues = True
heartbeat_timeout_threshold = 0
heartbeat_rate = 2
[oslo_middleware]
[oslo_policy]
policy_file = /etc/cinder/policy.json
[oslo_reports]
[oslo_versionedobjects]
[ssl]
ca_file = False
cert_file = /REDACTED
key_file = /REDACTED
[b1.standard]
rbd_user=volumes
volume_backend_name=b1.standard
backend_host=rbd:volumes
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_secret_uuid=REDACTED
rbd_max_clone_depth=5
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_pool=volumes

James Dempsey (jamespd) wrote :

Attaching cinder-volume logs.

Changed in cinder:
importance: Undecided → Medium
haobing1 (haobing1) on 2016-12-06
Changed in cinder:
assignee: nobody → haobing1 (haobing1)
haobing1 (haobing1) on 2016-12-14
Changed in cinder:
assignee: haobing1 (haobing1) → nobody
Arne Wiebalck (arne-wiebalck) wrote :

Just realised: this seems to be the same problem as I reported in https://bugs.launchpad.net/cinder/+bug/1685818.

Gerhard Muntingh (gerhard-1) wrote :

You should use connection = mysql+pymysql://cinder:REDACTED@REDACTED/cinder

Otherwise the greenthreads will block each other while waiting for I/O.

I'm pretty sure this solves this issue.
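
A rough sketch of the mechanism behind that suggestion, assuming the usual eventlet monkey-patching that cinder-volume does at startup (illustrative only, not Cinder code):

import eventlet
eventlet.monkey_patch()  # Cinder services patch the stdlib at startup.

from sqlalchemy import create_engine

# With a plain mysql:// URL, SQLAlchemy loads the C-based MySQLdb driver,
# whose socket I/O is invisible to the eventlet hub, so a slow or deadlocked
# query stalls every greenthread in the process. PyMySQL is pure Python, so
# its socket calls go through the monkey-patched stdlib and other greenthreads
# keep running while a query waits. Each engine needs its driver installed.
blocking = create_engine('mysql://cinder:REDACTED@REDACTED/cinder')
cooperative = create_engine('mysql+pymysql://cinder:REDACTED@REDACTED/cinder')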
