During a high load benchmark on a HA deployment of RDO Newton Ceilometer collector will hit a deadlock killing mysql on the primary controller, the primary will then be moved and the next mysql server will fail with the same error immediately. This goes on continuously effectively killing the cloud.
This issue is more rarely observed in Ocata and Pike, but is 100% reproducible in Newton by running several hundred Neutron create operations back to back, before this bug was first observed the exact same set of operations completed slowly but without issue.
This was first observed after the introduction of Neutron L3 HA by default although no concrete link between that change and this issue has been found. Current theory L3 HA puts more load on the database, database hits open file limit, open file limit causes irresolvable locks to be acquired, when pacemaker tries to fail over the same actions are played back and the problem repeats.
During a high load benchmark on a HA deployment of RDO Newton Ceilometer collector will hit a deadlock killing mysql on the primary controller, the primary will then be moved and the next mysql server will fail with the same error immediately. This goes on continuously effectively killing the cloud.
This issue is more rarely observed in Ocata and Pike, but is 100% reproducible in Newton by running several hundred Neutron create operations back to back, before this bug was first observed the exact same set of operations completed slowly but without issue.
This was first observed after the introduction of Neutron L3 HA by default although no concrete link between that change and this issue has been found. Current theory L3 HA puts more load on the database, database hits open file limit, open file limit causes irresolvable locks to be acquired, when pacemaker tries to fail over the same actions are played back and the problem repeats.
Collector Log pastebin
https:/ /paste. fedoraproject. org/paste/ e-gHTVEqB- gnQKMCAqtt1Q
Full log (warning large)
https:/ /thirdparty. logs.rdoproject .org/jenkins- browbeat- quickstart- ocata-baremetal -mixed- 20/overcloud- controller- 0/var/log/ ceilometer/ collector. log.txt. gz
mariadb log pastebin
https:/ /paste. fedoraproject. org/paste/ a077ZTXXma0kGar bNZ~Aug