services hang when time is jumping forward and backward
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Won't Fix | High | MOS Nova |
10.0.x | Won't Fix | High | MOS Nova |
7.0.x | Won't Fix | High | MOS Maintenance |
8.0.x | Won't Fix | High | MOS Maintenance |
9.x | Won't Fix | High | MOS Maintenance |
Bug Description
MOS 9.0
Consider the following scenario:
The cluster is power-cycled, and some nodes have their system time reset due to hardware.
When the cluster is up, ntp starts synchronizing the time. Meanwhile, services try to connect to rabbitmq and wait for rabbitmq to become available using time.sleep(), which is monkeypatched by eventlet.
For some reason, during the sync the time gets adjusted forward and backward by several years (that much because the default system time is 200x-01-01), which makes monkeypatched time.sleep(
The root cause is that eventlet uses a non-monotonic timer.
Related patch that fixes the issue for oslo: https:/
That approach should be extended to all services that use oslo_service.
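To illustrate the failure mode described above, here is a minimal sketch, not eventlet's actual code. It assumes a sleeper that computes an absolute deadline from a clock and then waits until the clock passes it. `SteppableClock` and `seconds_until` are hypothetical names made up for this example, and the clock values are simulated rather than read from the system.

```python
class SteppableClock:
    """Hypothetical fake clock whose value can be stepped, as an NTP sync steps the wall clock."""

    def __init__(self, now):
        self.now = now

    def __call__(self):
        return self.now

    def step(self, delta):
        self.now += delta


def seconds_until(deadline, clock):
    """How long a deadline-based sleeper still has to wait according to `clock`."""
    return max(0.0, deadline - clock())


TEN_YEARS = 10 * 365 * 24 * 3600

wall = SteppableClock(1480000000.0)  # wall clock, epoch seconds (late 2016)
mono = SteppableClock(5000.0)        # monotonic clock, arbitrary origin

# Both sleepers ask to be woken up one second from now.
wall_deadline = wall() + 1.0
mono_deadline = mono() + 1.0

# One real second passes on both clocks...
wall.step(1.0)
mono.step(1.0)
# ...and then only the wall clock is stepped back by about ten years.
wall.step(-TEN_YEARS)

wall_wait = seconds_until(wall_deadline, wall)  # about ten years of extra waiting
mono_wait = seconds_until(mono_deadline, mono)  # nothing left to wait
```

In this model the wall-clock sleeper appears hung even though its one second has elapsed, while the monotonic sleeper is unaffected by the step.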
description: updated
Changed in mos:
importance: Undecided → High
tags: added: sla1
tags: added: area-linux
Changed in mos:
milestone: 9.2 → 10.0
Changed in mos:
status: Confirmed → Won't Fix
>>> That approach should be extended to all services that utilize oslo_service
My understanding is that the oslo_service part of the problem is actually fixed, and now we need to make eventlet use a monotonic clock everywhere. The main thing we are interested in is probably the Hub class: https://github.com/eventlet/eventlet/blob/master/eventlet/hubs/hub.py#L116
But it's initialized implicitly when used in OpenStack:
http://paste.openstack.org/show/589381/
thus, we'll probably need to patch eventlet directly. I took a quick look at the https://github.com/eventlet/eventlet/blob/master/eventlet/hubs/hub.py module, and it looks like using a monotonic clock there should be fine, as we are not really interested in the time value itself, but only in the difference between two given points in time.
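A rough sketch of that idea, assuming a hub-like scheduler that takes its clock as a constructor argument. `MiniHub` is a hypothetical toy written for this comment, not eventlet's Hub class. It also assumes Python 3.3+ for `time.monotonic`; on Python 2 a backport would be needed.

```python
import heapq
import itertools
import time


class MiniHub:
    """Hypothetical toy timer scheduler; only differences between clock readings matter."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self._timers = []
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def schedule_after(self, seconds, callback):
        # The deadline is relative to the injected clock, not to calendar time.
        deadline = self.clock() + seconds
        heapq.heappush(self._timers, (deadline, next(self._seq), callback))

    def fire_due(self):
        """Run every callback whose deadline has passed; return how many ran."""
        fired = 0
        now = self.clock()
        while self._timers and self._timers[0][0] <= now:
            _, _, callback = heapq.heappop(self._timers)
            callback()
            fired += 1
        return fired


calls = []
hub = MiniHub()  # defaults to the monotonic clock
hub.schedule_after(0.0, lambda: calls.append("woke"))
fired = hub.fire_due()
```

Since the scheduler only compares readings from the same clock, swapping the default from a wall clock to a monotonic one does not change its logic, which is why the change looks safe for the hub itself.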
But there are more usages of time.time() in eventlet - http://paste.openstack.org/show/589384/ - we'll need to check those as well.
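For checking the remaining usages, something like the following could help as a starting point. It is only a sketch: `find_time_time_calls` is a hypothetical helper, and it matches only calls spelled literally as `time.time(...)`, so aliased imports such as `from time import time` would be missed.

```python
import ast


def find_time_time_calls(source, filename="<string>"):
    """Return sorted line numbers of calls written exactly as time.time(...)."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "time"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "time"
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Made-up sample input, not taken from eventlet.
SAMPLE = (
    "import time\n"
    "start = time.time()\n"
    "tick = time.monotonic()\n"
    "elapsed = time.time() - start\n"
)

lines_with_wall_clock = find_time_time_calls(SAMPLE)
```

Each hit would still need a manual look, since some callers may genuinely want calendar time rather than an interval.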
I don't think this is suitable for a stable release, though. This must be thoroughly tested in our current development branches first.