Scheduler logs "Re-scheduling is disabled" on initial server create scheduling even though max_attempts>0

Bug #1738876 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Low
jichenjc

Bug Description

Looking at these scheduler logs during the initial select_destinations flow for a server create:

http://logs.openstack.org/89/527289/1/check/ironic-tempest-dsvm-ipa-wholedisk-agent_ipmitool-tinyipa-multinode/406d4ab/logs/screen-n-sch.txt.gz#_Dec_12_22_23_47_783670

There are 12 candidate hosts and the RetryFilter logs that re-scheduling is disabled on all 12 of them:

Dec 12 22:23:47.783670 ubuntu-xenial-ovh-bhs1-0001397778 nova-scheduler[19797]: DEBUG nova.filters [None req-bd0816cf-243e-4ceb-90cb-053b15dbcbc7 tempest-TestServerBasicOps-1888713420 tempest-TestServerBasicOps-1888713420] Starting with 12 host(s) {{(pid=19797) get_filtered_objects /opt/stack/new/nova/nova/filters.py:70}}
Dec 12 22:23:47.783964 ubuntu-xenial-ovh-bhs1-0001397778 nova-scheduler[19797]: DEBUG nova.scheduler.filters.retry_filter [None req-bd0816cf-243e-4ceb-90cb-053b15dbcbc7 tempest-TestServerBasicOps-1888713420 tempest-TestServerBasicOps-1888713420] Re-scheduling is disabled {{(pid=19797) host_passes /opt/stack/new/nova/nova/scheduler/filters/retry_filter.py:38}}
...

This is confusing because CONF.scheduler.max_attempts is the default value of 3:

Dec 12 22:01:58.784033 ubuntu-xenial-ovh-bhs1-0001397778 nova-scheduler[19797]: DEBUG oslo_service.service [None req-f45f1271-8d1d-4bb7-8bea-4d3a39be493c None None] scheduler.max_attempts = 3 {{(pid=19797) log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2888}}

So retries are not disabled. The problem appears to be that we don't set the RequestSpec.retry field before calling select_destinations.

The schedule_and_build_instance method in conductor calls the scheduler here:

https://github.com/openstack/nova/blob/07c925a5321e379293bbf0e55bf3c40798eaf21b/nova/conductor/manager.py#L1016

And doesn't populate filter properties (request spec) until later:

https://github.com/openstack/nova/blob/07c925a5321e379293bbf0e55bf3c40798eaf21b/nova/conductor/manager.py#L1108

Compare that to the old build_instances method where retry is populated before calling the scheduler:

https://github.com/openstack/nova/blob/07c925a5321e379293bbf0e55bf3c40798eaf21b/nova/conductor/manager.py#L543

https://github.com/openstack/nova/blob/07c925a5321e379293bbf0e55bf3c40798eaf21b/nova/conductor/manager.py#L548

This is mostly just a serviceability bug because we'll continue to support reschedules in the chosen compute host fails, but it's definitely confusing when looking at the scheduler logs on the initial create.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/528835

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

The patch in comment 1 is for another bug.

Changed in nova:
assignee: Matt Riedemann (mriedem) → nobody
status: In Progress → Triaged
Matt Riedemann (mriedem)
Changed in nova:
importance: Medium → Low
jichenjc (jichenjc)
Changed in nova:
assignee: nobody → jichenjc (jichenjc)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/541655

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/541655
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=128c179e29fec9741a887a23a808b18feb10a551
Submitter: Zuul
Branch: master

commit 128c179e29fec9741a887a23a808b18feb10a551
Author: jichenjc <email address hidden>
Date: Sat Feb 3 07:27:16 2018 +0800

    Only log during pop retry phase

    Retry filter reports confusing log 'Re-scheduling is disabled'
    now even if the retry attempts > 0 , this is because we populate
    retry after calling scheduling function in conductor, this patch
    only log the info when needed in populate function.

    Change-Id: I67b3650564503061480b439d6265a3b4f45ac16c
    Closes-Bug: 1738876

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.