Collector does not retry sample create transaction in case of deadlock and gives up, losing data
Bug #1432914 reported by Rohit Jaiswal
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Ceilometer | Fix Released | Medium | Rohit Jaiswal | |
Bug Description
When multiple MySQL database processes are running, e.g. ceilometer-expirer expiring samples and removing orphaned metering data, they acquire locks on the metering tables. An incoming sample create from the collector also tries to acquire locks on the same tables, but because the locks are already held, it either times out or is rolled back (leading to an error in the collector).
The sample insert transaction is not retried, so the sample data is lost.
The sqlalchemy storage driver should implement transaction retries, especially in the case of deadlocks.
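A minimal sketch of the suggested retry behavior. oslo.db provides a ready-made decorator for this (`oslo_db.api.wrap_db_retry` with `retry_on_deadlock=True`); the hand-rolled version below illustrates the same idea, with a hypothetical `DeadlockError` standing in for the driver's real deadlock exception (e.g. MySQL error 1213 surfaced through sqlalchemy):

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the DB driver's deadlock exception."""


def retry_on_deadlock(max_retries=3, delay=0.0):
    """Decorator: re-run the wrapped transaction if it hits a deadlock."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_retries - 1:
                        raise  # retries exhausted; let the caller log it
                    time.sleep(delay)  # optional back-off before retrying
        return wrapper
    return decorator


# Simulate a sample insert that deadlocks twice, then succeeds.
attempts = []


@retry_on_deadlock(max_retries=3)
def record_sample(sample):
    attempts.append(sample)
    if len(attempts) < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "inserted"


result = record_sample({"counter_name": "cpu_util"})
print(result)  # the insert succeeds on the third attempt
```

With this in place a transient deadlock costs only a short delay instead of a lost sample; only after `max_retries` consecutive deadlocks does the error propagate to the collector.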
Changed in ceilometer:
assignee: nobody → Rohit Jaiswal (rohit-jaiswal-3)

Changed in ceilometer:
importance: Undecided → Medium
milestone: none → kilo-rc1

Changed in ceilometer:
status: Fix Committed → Fix Released

Changed in ceilometer:
milestone: kilo-rc1 → 2015.1.0
To avoid data loss, you can try requeue_event_on_dispatcher_error: https://github.com/openstack/ceilometer/blob/3f9e48155a6dd474a7843dc9aaaef378c7f4ca53/ceilometer/collector.py#L45

There are two problems when ceilometer-expirer runs:
1) a potential deadlock, see bug: https://bugs.launchpad.net/bugs/1431986
2) it takes too long, which may cause AMQP to hold too many messages if requeue_event_on_dispatcher_error is enabled in a large-scale environment.
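The requeue option mentioned above is a boolean set in ceilometer.conf. A sketch of enabling it, assuming it is registered under the `[collector]` section (verify against your release's configuration reference):

```ini
[collector]
# Requeue the event back onto the notification queue when the dispatcher
# reports an error, instead of dropping it. Note the comment above: in a
# large-scale environment this can leave many messages queued in AMQP
# while the expirer holds table locks.
requeue_event_on_dispatcher_error = True
```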