service_down_time shouldn't be allowed to be less than report_interval

Bug #1255685 reported by Soren Hansen on 2013-11-27
Affects                    Importance  Assigned to
Cinder                     Low         Liyingjun
OpenStack Compute (nova)   Low         Liyingjun

Bug Description

If service_down_time is less than report_interval, services will routinely be considered down, because they report in too rarely.

We should err out somehow if someone tries to do this. (Not that anyone would. *ahem*)
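
To make the failure mode concrete, here is a minimal sketch of the kind of liveness check described above. The helper names `is_up` and `down_seconds_per_cycle` and the sample values are illustrative assumptions, not code taken from nova or cinder.

```python
def is_up(last_report, now, service_down_time):
    """Hypothetical helper: True if the last heartbeat is recent enough."""
    return (now - last_report) <= service_down_time


def down_seconds_per_cycle(report_interval, service_down_time):
    """Count the whole seconds in one report cycle where the service looks down.

    Assumes the service reported at t=0 and will report again at
    t=report_interval.
    """
    last_report = 0
    return sum(
        1
        for now in range(report_interval)
        if not is_up(last_report, now, service_down_time)
    )


# Sane setup: report every 10 s, tolerate 60 s of silence.
print(down_seconds_per_cycle(report_interval=10, service_down_time=60))

# Misconfigured: report every 60 s, tolerate only 10 s of silence.
print(down_seconds_per_cycle(report_interval=60, service_down_time=10))
```

Under these assumptions the first configuration never looks down, while the second looks down for 49 of every 60 seconds, even though the service is healthy the whole time.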

Matt Riedemann (mriedem) wrote :

Makes sense to me. Looks like nova.service.Service.basic_config_check would be a good place to check this. Would you completely keep the service from being created though? Or maybe log a warning and override the values to use the defaults?

Changed in nova:
status: New → Confirmed
Matt Riedemann (mriedem) wrote :

Looks like if basic_config_check fails, it's a system exit for the service, so that's probably good enough for this.
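
A rough sketch of what such a fail-fast check could look like follows. The function below is a hypothetical stand-in that takes both values as plain arguments; it is not the real nova.service.Service.basic_config_check, whose signature and behaviour are not shown in this report. Treating equal values as invalid is also an assumption.

```python
import sys


def basic_config_check(report_interval, service_down_time):
    """Hypothetical startup sanity check (not the real nova method).

    Exits the process if heartbeats would arrive too rarely for the
    service to ever be considered reliably up.
    """
    # Assumption: equal values are rejected too, since a heartbeat arriving
    # exactly at the deadline leaves no margin for delay.
    if service_down_time <= report_interval:
        sys.exit(
            "service_down_time (%s) must be greater than report_interval (%s)"
            % (service_down_time, report_interval)
        )
```

With this shape, `basic_config_check(10, 60)` returns quietly, while `basic_config_check(60, 10)` raises `SystemExit` and the service never starts.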

Changed in nova:
importance: Undecided → Low
Liyingjun (liyingjun) wrote :

I think this is also needed by cinder and neutron.

Fix proposed to branch: master
Review: https://review.openstack.org/60760

Changed in cinder:
assignee: nobody → Liyingjun (liyingjun)
status: New → In Progress
Changed in cinder:
importance: Undecided → Low
Liyingjun (liyingjun) on 2013-12-09
Changed in nova:
assignee: nobody → Liyingjun (liyingjun)
Changed in nova:
status: Confirmed → In Progress
Matt Riedemann (mriedem) wrote :

Reviewed: https://review.openstack.org/60760
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=18b14203d8b30c9792d8c819f7993961b2ec8fc5
Submitter: Jenkins
Branch: master

commit 18b14203d8b30c9792d8c819f7993961b2ec8fc5
Author: liyingjun <email address hidden>
Date: Sat Nov 23 22:06:19 2013 +0800

    Make sure report_interval is less than service_down_time

    If service_down_time is less than report_interval, services will
    routinely be considered down, because they report in too rarely.
    Add check for service_down_time vs report_interval, if report_interval
    is larger than service_down_time, automatically change
    service_down_time to be report_interval * 2.5

    DocImpact

    Closes bug #1255685

    Change-Id: I9b291669ea201321b03a48d2a132810f3bace2dc
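
The adjustment described in this commit message can be sketched as below. This is an illustration only, not the committed code: the function name, the logger setup, the truncation to an integer, and the treatment of equal values as misconfigured are all assumptions.

```python
import logging

LOG = logging.getLogger(__name__)


def effective_service_down_time(report_interval, service_down_time):
    """Hypothetical helper: return a service_down_time that is safe to use.

    If the configured value does not exceed report_interval, fall back to
    report_interval * 2.5, as the commit message describes.
    """
    # Assumption: equality is treated as misconfigured as well.
    if service_down_time <= report_interval:
        # Assumption: the result is truncated to whole seconds.
        new_down_time = int(report_interval * 2.5)
        LOG.warning(
            "report_interval (%s) is not less than service_down_time (%s); "
            "using service_down_time = %s instead",
            report_interval, service_down_time, new_down_time,
        )
        return new_down_time
    return service_down_time
```

For example, `effective_service_down_time(60, 10)` returns 150, and a valid pair such as `(10, 60)` is returned unchanged as 60. The 2.5 multiplier leaves room for two missed reports before a service is marked down.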

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2014-01-22
Changed in cinder:
milestone: none → icehouse-2
status: Fix Committed → Fix Released

Reviewed: https://review.openstack.org/69288
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=80096b6fb62cec4056efd1aba070623a789571dc
Submitter: Jenkins
Branch: master

commit 80096b6fb62cec4056efd1aba070623a789571dc
Author: Zhiteng Huang <email address hidden>
Date: Thu Jan 9 14:54:22 2014 +0800

    Make sure report_interval is less than service_down_time

    Services that inherit service.py/Service class would register
    themselves to DB and then update stats periodically (every
    report_interval second). The consumer of this kind of information,
    like scheduler or 'os-service' API extension, will consider a service
    is 'up' (active) if last update from that service is not longer than
    'service_down_time' ago.

    The problem is if 'report_interval' was configured/provided greater
    than 'service_down_time' by mistake, services would then be always
    considered in 'down' state, which can result in unsuccessful placement
    of volume create request for example. This is what Bug #1255685 is
    about.

    In previous fix: https://review.openstack.org/#/c/60760/, a
    configuration check helper function basic_config_check() was added
    *wrongly* to WSGIService class instead of Service class. This patch
    moves the configuration check helper function and the check to the
    right place to make sure 'report_interval' is less than
    'service_down_time'.

    Closes-bug #1255685

    Change-Id: I14bd8c54e5ce20719844f437808ad98a011820de

Reviewed: https://review.openstack.org/60748
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=94e2cbd6dea374b7e355b1272de9ba6e1d9f7b0a
Submitter: Jenkins
Branch: master

commit 94e2cbd6dea374b7e355b1272de9ba6e1d9f7b0a
Author: liyingjun <email address hidden>
Date: Sat Nov 23 20:17:42 2013 +0800

    Make sure report_interval is less than service_down_time

    If service_down_time is less than report_interval, services will
    routinely be considered down, because they report in too rarely.
    Add check for service_down_time vs report_interval, if report_interval
    is larger than service_down_time, automatically change
    service_down_time to be report_interval * 2.5

    DocImpact

    Closes bug #1255685

    Change-Id: I1be42e1826142b8f3f2c39f3734bef713a12a693

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → icehouse-3
Thierry Carrez (ttx) on 2014-03-05
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-04-17
Changed in nova:
milestone: icehouse-3 → 2014.1
Thierry Carrez (ttx) on 2014-04-17
Changed in cinder:
milestone: icehouse-2 → 2014.1