Heartbeats stop when time is changed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Undecided | Roman Podoliaka |
Pike | Fix Committed | Undecided | John Smith |
masakari | Fix Released | Undecided | SamP |
oslo.service | Fix Released | Undecided | Unassigned |
Bug Description
Heartbeats stop working when you mess with the system time. If a monotonic clock were used, they would continue to work when the system time was changed.
Steps to reproduce:
1. List the nova services ('nova-manage service list'). Note that the 'State' of each service is a happy face ':-)'.
2. Move the system time ahead (for example, 2 hours into the future), then list the nova services again. Note that heartbeats continue to work and use the future time (see 'Updated_At').
3. Set the system time back to the actual time and list the nova services again. Note that all heartbeats stop and every service has a 'State' of 'XXX'.
4. The heartbeats resume 2 hours later, when the actual time catches up to the future time, or when the services are restarted.
5. You'll see a log message like the following when the heartbeats stop:
2015-10-26 17:14:10.538 DEBUG nova.servicegro
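Why step 3 flips every service to 'XXX' immediately can be sketched with a simplified liveness check. This assumes the check compares the absolute elapsed time since the last heartbeat ('Updated_At') against a down-time threshold, and assumes a 60 second threshold as a stand-in for nova's `service_down_time` option; `service_state` is a hypothetical helper, not nova's code.

```python
import datetime

# Assumed threshold in seconds, standing in for nova's service_down_time option.
SERVICE_DOWN_TIME = 60


def service_state(updated_at, now, service_down_time=SERVICE_DOWN_TIME):
    """Hypothetical liveness check: ':-)' if the last heartbeat is recent, else 'XXX'.

    abs() treats a heartbeat stamped in the future (left over from step 2)
    the same as a stale one, so reverting the clock marks the service down.
    """
    elapsed = (now - updated_at).total_seconds()
    return ":-)" if abs(elapsed) <= service_down_time else "XXX"


now = datetime.datetime(2015, 10, 26, 17, 0, 0)
print(service_state(now - datetime.timedelta(seconds=10), now))  # :-)
print(service_state(now + datetime.timedelta(hours=2), now))     # XXX
```

Under this assumption, the future 'Updated_At' written in step 2 only stops looking stale once heartbeats resume, which matches step 4.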
Here's example output demonstrating the issue:
http://
See bug #1450438 for more context:
https:/
Long story short: the looping call uses the built-in (wall-clock) time rather than a monotonic clock for its sleeps.
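The failure mode can be reproduced with a simplified polling model: the next heartbeat is scheduled as an absolute timestamp on some clock, so if that clock is set back, the deadline stays in the "future". `HeartbeatTimer`, `FakeClock` and `count_heartbeats` are hypothetical names; this is a sketch of the idea, not oslo.service's `loopingcall` implementation.

```python
import time


class HeartbeatTimer:
    """Hypothetical fixed-interval timer that reads time from an injectable clock."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        # The next run is scheduled as an absolute timestamp on `clock`.
        self.deadline = clock() + interval

    def poll(self):
        """Fire (return True) once `clock` reaches the deadline, then reschedule."""
        now = self.clock()
        if now >= self.deadline:
            self.deadline = now + self.interval
            return True
        return False


class FakeClock:
    """Manually advanced clock, so time jumps can be simulated deterministically."""

    def __init__(self, start):
        self.t = start

    def __call__(self):
        return self.t


def count_heartbeats(clock_name):
    """Simulate 60 s with a 10 s interval; the wall clock jumps +2 h at 20 s and back at 30 s."""
    wall, mono = FakeClock(1_000_000.0), FakeClock(0.0)
    timer = HeartbeatTimer(10, clock=wall if clock_name == "wall" else mono)
    beats = 0
    for second in range(1, 61):
        wall.t += 1
        mono.t += 1
        if second == 20:
            wall.t += 7200  # move the system time 2 hours ahead
        if second == 30:
            wall.t -= 7200  # revert to the actual time
        if timer.poll():
            beats += 1
    return beats


print(count_heartbeats("wall"))       # 2: stalls after the clock is set back
print(count_heartbeats("monotonic"))  # 6: one beat every 10 s
```

With the wall clock, the beat at the 20 s mark schedules the next one 2 hours ahead, so nothing fires after the revert; the monotonic clock never jumps, so the interval is unaffected.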
Oslo Service: version 0.11
Nova: master (commit 2c3f9c339cae245
description: | updated |
description: | updated |
tags: | added: oslo |
description: | updated |
Changed in oslo.service: | |
status: | New → Confirmed |
no longer affects: | nova |
Changed in nova: | |
assignee: | Roman Podoliaka (rpodolyaka) → Stephen Finucane (stephenfinucane) |
Changed in masakari: | |
assignee: | nobody → Dinesh Bhor (dinesh-bhor) |
Changed in masakari: | |
assignee: | Dinesh Bhor (dinesh-bhor) → SamP (sampath-priyankara) |
Changed in nova: | |
assignee: | Stephen Finucane (stephenfinucane) → Roman Podoliaka (rpodolyaka) |
Someone suggested that I try the following approach.
diff --git a/nova/service.py b/nova/service.py
index a09187d..396bf30 100644
--- a/nova/service.py
+++ b/nova/service.py
@@ -21,6 +21,9 @@ import os
 import random
 import sys
 
+import eventlet
+import monotonic
+
 from oslo_concurrency import processutils
 from oslo_config import cfg
 from oslo_log import log as logging
@@ -139,6 +142,10 @@ class Service(service.Service):
                  periodic_interval_max=None, db_allowed=True,
                  *args, **kwargs):
         super(Service, self).__init__()
+
+        hub = eventlet.hubs.get_hub()
+        hub.clock = monotonic.monotonic
+
         self.host = host
         self.binary = binary
         self.topic = topic
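The patch swaps the clock that eventlet's hub uses to schedule timers. The same idea can be illustrated with the standard library's `sched.scheduler`, which accepts the clock as its `timefunc` argument; this is an analogy under that assumption, not eventlet's hub code. On Python 3.3+, `time.monotonic` is built in, so the third-party `monotonic` package is not needed there. `run_heartbeats` is a hypothetical helper.

```python
import sched
import time


def run_heartbeats(count, interval):
    """Run `count` heartbeats `interval` seconds apart; return their monotonic timestamps."""
    # Due times are computed from time.monotonic(), so changing the system
    # time cannot push a pending heartbeat into the future.
    scheduler = sched.scheduler(timefunc=time.monotonic, delayfunc=time.sleep)
    stamps = []

    def beat(remaining):
        stamps.append(time.monotonic())
        if remaining > 1:
            # enter() schedules relative to timefunc(), i.e. the monotonic clock.
            scheduler.enter(interval, 1, beat, argument=(remaining - 1,))

    scheduler.enter(interval, 1, beat, argument=(count,))
    scheduler.run()
    return stamps


print(len(run_heartbeats(3, 0.01)))  # 3
```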
After applying the above patch, all services except nova-conductor use the monotonic clock in loopingcall. nova-conductor still doesn't work: it continues to use the built-in time and sleeps in loopingcall until the actual time catches up to the future time. I'm not sure why nova-conductor keeps using the built-in time after the patch. Does anyone know?
http://paste.openstack.org/show/477406/
Thoughts on this approach or other approaches?
Thanks!