Cinder

Bug #1685818
Comment #17

Alex Walender (awalende) wrote on 2019-12-05:

The given solutions did NOT fix this bug for us.
The mysql+pymysql database driver was already configured, and increasing innodb_lock_wait_timeout did not help either.
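
In Cinder the database driver is selected by the connection option in the [database] section of cinder.conf (an oslo.db option), written as a SQLAlchemy URL of the form dialect+driver://user:password@host/db. The sketch below is one hedged way to confirm which driver a given config actually selects. The helper name database_driver and the sample host and credentials are hypothetical.

```python
import configparser
from urllib.parse import urlsplit


def database_driver(conf_text: str) -> tuple:
    """Return (dialect, driver) parsed from the [database] connection URL.

    Hypothetical helper for illustration; it only inspects the config text.
    """
    # interpolation=None so a '%' inside a password is not treated as a placeholder.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    url = parser.get("database", "connection")
    # SQLAlchemy URLs look like dialect+driver://user:password@host/db
    dialect, _, driver = urlsplit(url).scheme.partition("+")
    # No explicit driver: SQLAlchemy falls back to the dialect's default DBAPI.
    return dialect, driver or None


if __name__ == "__main__":
    sample = """
[database]
connection = mysql+pymysql://cinder:secret@db.example.org/cinder
"""
    print(database_driver(sample))
```

A result of ('mysql', None) would mean no explicit driver is set, so SQLAlchemy uses its default MySQL DBAPI rather than PyMySQL.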

We run a Rally test that creates 100 volumes (20 concurrently) and deletes them afterwards.
During creation about 10% of the requests fail with "Lost connection to mysql during query", leaving those volumes stuck in the creating state.
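
The Rally task file itself is not part of this comment. Assuming the stock CinderVolumes.create_and_delete_volume scenario from rally-openstack with a constant runner, a task with the described load (100 iterations, 20 in parallel) could be generated as below. The 1 GB volume size, the single-tenant context and the helper name build_rally_task are assumptions. Note that this scenario deletes each volume right after creating it, which may differ from the setup described above.

```python
import json


def build_rally_task(times: int = 100, concurrency: int = 20, size_gb: int = 1) -> dict:
    """Return a Rally task (v1 format) that creates and deletes `times` volumes.

    Hypothetical reconstruction; the reporter's real task file may differ.
    """
    return {
        "CinderVolumes.create_and_delete_volume": [
            {
                "args": {"size": size_gb},  # assumed volume size in GB
                # "constant" runner: `times` iterations, `concurrency` at once
                "runner": {"type": "constant", "times": times, "concurrency": concurrency},
                "context": {"users": {"tenants": 1, "users_per_tenant": 1}},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_rally_task(), indent=2))
```

Saved as task.json, it would be started with rally task start task.json.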

We therefore increased connection_timeout in our MariaDB from 10 to 30, with mixed results. Volume creation still slows down after the first 5 volumes have been created.

Deleting the 100 volumes takes even longer and ends with a number of DB lock exceptions. We also sometimes see missed RabbitMQ heartbeats.

We are running:
OpenStack Rocky+Stein
Ceph RBD as backend
MariaDB with Galera
Ceph volume pool with ~350 volumes

We have already increased osapi_volume_workers, tuned InnoDB, checked Ceph, and so on.
My assumption is that there must be a blocking call in the RBD driver that prevents locks from being released in time for the other volume creations. It blocks so hard that even the RabbitMQ connections start timing out.
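
The suspected mechanism is a scheduling one: Cinder services run on eventlet green threads, so a call that blocks the process without yielding also stalls every other green thread in it, including lock releases and heartbeats. The toy model below uses Python's asyncio as a stand-in for eventlet (an assumption for illustration; it is not Cinder or RBD driver code). blocking_driver_call, the heartbeat coroutine and the timing constants are hypothetical. It measures the longest gap between heartbeat ticks when the slow call runs directly on the event loop versus in a worker thread, which is roughly what eventlet's tpool does for native calls.

```python
import asyncio
import time

HEARTBEAT_INTERVAL = 0.05  # hypothetical heartbeat period in seconds
BLOCK_SECONDS = 0.5        # hypothetical duration of one blocking driver call


def blocking_driver_call() -> None:
    """Hypothetical stand-in for a synchronous native call that never yields."""
    time.sleep(BLOCK_SECONDS)


async def heartbeat(gaps: list, stop: asyncio.Event) -> None:
    """Record the elapsed time between consecutive heartbeat ticks."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(HEARTBEAT_INTERVAL)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def run(offload: bool) -> float:
    """Run one driver call next to the heartbeat and return the largest gap."""
    gaps: list = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(gaps, stop))
    await asyncio.sleep(HEARTBEAT_INTERVAL * 2)  # let the heartbeat start ticking
    if offload:
        # Run the blocking call in a worker thread; the loop keeps scheduling tasks.
        await asyncio.get_running_loop().run_in_executor(None, blocking_driver_call)
    else:
        # Call it directly on the loop thread: every other task is frozen meanwhile.
        blocking_driver_call()
    stop.set()
    await hb
    return max(gaps)


if __name__ == "__main__":
    print(f"direct call:    max heartbeat gap {asyncio.run(run(False)):.2f}s")
    print(f"offloaded call: max heartbeat gap {asyncio.run(run(True)):.2f}s")
```

With the direct call the largest gap should be at least BLOCK_SECONDS; with the offloaded call it should stay close to HEARTBEAT_INTERVAL.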