Instance affinity filters do not work in a heterogeneous cloud with Ironic computes

Bug #1606496 reported by Roman Podoliaka
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Roman Podoliaka

Bug Description

Description
===========

In a heterogeneous cloud with both libvirt and ironic compute nodes, instance affinity filters such as DifferentHostFilter or SameHostFilter do not filter out hosts when scheduling a subsequent instance.

Steps to reproduce
==================

Make sure you have at least two libvirt compute nodes and one ironic node.

Make sure DifferentHostFilter and SameHostFilter are enabled as nova-scheduler filters in nova.conf and that the filter scheduler is used.
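For reference, a minimal nova.conf fragment matching this setup might look like the following. Option names and values are approximate for the Mitaka timeframe and may differ between releases, so check the configuration reference for your deployment:

```ini
[DEFAULT]
# Use the filter scheduler and enable the affinity filters
scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,DifferentHostFilter,SameHostFilter
# Host manager typically configured when ironic computes are present
scheduler_host_manager = nova.scheduler.ironic_host_manager.IronicHostManager
```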

1. Boot a libvirt instance A.
2. Check the host name of the compute node instance A is running on (nova show from an admin user).
3. Boot a libvirt instance B passing a different_host=$A.uuid hint for nova-scheduler.
4. Check the host name of the compute node instance B is running on (nova show from an admin user).

Expected result
===============

Instances A and B are running on two different compute nodes.

Actual result
=============

Instances A and B are running on the same compute node.

The nova-scheduler log shows that DifferentHostFilter ran but did not filter out one of the hosts: "Filter DifferentHostFilter returned 2 host(s) get_filtered_objects"

Environment
===========

OpenStack Mitaka

2 libvirt compute nodes
1 ironic compute node
FilterScheduler is used
DifferentHostFilter and SameHostFilter filters are enabled in nova.conf

Root cause analysis
===================

Debugging showed that IronicHostManager is used by nova-scheduler instead of the default host manager when ironic compute nodes are deployed in the same cloud together with libvirt compute nodes.

IronicHostManager overrides the _get_instance_info() method and unconditionally returns an empty instance dict, even when this method is called for non-ironic computes in the same cloud. DifferentHostFilter and similar filters later use this info to compute the intersection of the set of instances running on a libvirt compute node (currently always empty) and the set of instance uuids passed as a hint to nova-scheduler; thus compute nodes are never filtered out and the hint is effectively ignored.
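The failure mode described above can be sketched in a few lines. This is a simplified, self-contained model, not nova's actual code: the class and method names follow the bug description, but the surrounding HostManager machinery and the filter helper are hypothetical stand-ins.

```python
IRONIC = "ironic"

class HostManager:
    """Simplified base manager: knows which instances run on each node."""
    def __init__(self, instances_by_host):
        self._instances_by_host = instances_by_host

    def _get_instance_info(self, context, compute):
        return self._instances_by_host.get(compute["host"], {})

class BuggyIronicHostManager(HostManager):
    def _get_instance_info(self, context, compute):
        # The bug: unconditionally empty, even for libvirt nodes.
        return {}

class FixedIronicHostManager(HostManager):
    def _get_instance_info(self, context, compute):
        if compute["hypervisor_type"] == IRONIC:
            return {}
        # Fall back to the base implementation for non-ironic nodes.
        return super()._get_instance_info(context, compute)

def different_host_passes(host_instances, hint_uuids):
    """DifferentHostFilter logic: the host passes only if it runs
    none of the instances named in the different_host hint."""
    return not (set(host_instances) & set(hint_uuids))

# A libvirt node running instance A, and a boot request hinting
# different_host=<uuid of A>.
libvirt_node = {"host": "compute-1", "hypervisor_type": "QEMU"}
instances = {"compute-1": {"uuid-A": object()}}

buggy = BuggyIronicHostManager(instances)
fixed = FixedIronicHostManager(instances)

# Buggy manager reports no instances, so the host is NOT filtered
# out and the hint is ignored (returns True).
print(different_host_passes(
    buggy._get_instance_info(None, libvirt_node), ["uuid-A"]))

# Fixed manager delegates to the base class, the intersection is
# non-empty, and the host IS filtered out (returns False).
print(different_host_passes(
    fixed._get_instance_info(None, libvirt_node), ["uuid-A"]))
```

This mirrors the shape of the eventual fix: dispatch on the node's hypervisor type and defer to the base class for everything that is not ironic.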

Changed in nova:
assignee: nobody → Roman Podoliaka (rpodolyaka)
description: updated
tags: added: ironic
tags: added: scheduler
Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/347948

Michael Davies (mrda)
Changed in nova:
importance: High → Medium
tags: added: tempest
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

I've described the same bug in MOS project because we need to make a backport:
https://bugs.launchpad.net/mos/+bug/1618406

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/346966
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=af218caba4f532c7d182071ab4304d49d02de08f
Submitter: Jenkins
Branch: master

commit af218caba4f532c7d182071ab4304d49d02de08f
Author: Roman Podoliaka <email address hidden>
Date: Mon Jul 25 20:35:29 2016 +0300

    ironic_host_manager: fix population of instances info on schedule

    IronicHostManager currently overrides the _get_instance_info() method
    of the base class and unconditionally returns an empty dict of
    instances for a given compute node.

    The problem with that is that in a heterogeneous cloud with both
    libvirt and ironic compute nodes this will always return {} for the
    former too, which is incorrect and will effectively break instance
    affinity filters like DifferentHostFilter or SameHostFilter, that
    check set intersections of instances running on a particular host and
    the ones passed as a hint for nova-scheduler in a boot request.

    IronicHostManager should use the method implementation of the base
    class for non-ironic compute nodes.

    This is a partial fix which only modifies _get_instance_info() called
    down the select_destinations() stack. A following change will modify
    _init_instance_info() that pre-populates node instances info on start
    of a nova-scheduler process.

    Partial-Bug: #1606496

    Change-Id: Ib1ddb44d71f7b085512c1f3fc0544f7b00c754fe

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/347948
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cc64a45d98d7576a78a853cc3da8109c31f4b75d
Submitter: Jenkins
Branch: master

commit cc64a45d98d7576a78a853cc3da8109c31f4b75d
Author: Roman Podoliaka <email address hidden>
Date: Wed Jul 27 19:46:16 2016 +0300

    ironic_host_manager: fix population of instances info on start

    IronicHostManager currently overrides the _init_instance_info()
    method of the base class and unconditionally skips population of
    instances information for all compute nodes, even if they are not
    Ironic ones.

    If there are compute nodes with the hypervisor_type different from
    Ironic in the same cloud, the instances info will be missing in
    nova-scheduler (if IronicHostManager is configured as a host manager
    impl in nova.conf), which will effectively break instance affinity
    filters like DifferentHostFilter or SameHostFilter, that check set
    intersections of instances running on a particular host and the ones
    passed as a hint for nova-scheduler in a boot request.

    IronicHostManager should use the method implementation of the base
    class for non-ironic compute nodes.

    Ib1ddb44d71f7b085512c1f3fc0544f7b00c754fe fixed the problem with
    scheduling, this change is needed to make sure we also populate the
    instances info on start of nova-scheduler.

    Closes-Bug: #1606496

    Co-Authored-By: Timofei Durakov <email address hidden>

    Change-Id: I9d8d2dc99773df4097c178d924d182a0d1971bcc

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/367402

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/367403

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/367402
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6f1151378b24f134bd7bf41fcfa0e4c65aca86e7
Submitter: Jenkins
Branch: stable/mitaka

commit 6f1151378b24f134bd7bf41fcfa0e4c65aca86e7
Author: Roman Podoliaka <email address hidden>
Date: Mon Jul 25 20:35:29 2016 +0300

    ironic_host_manager: fix population of instances info on schedule

    IronicHostManager currently overrides the _get_instance_info() method
    of the base class and unconditionally returns an empty dict of
    instances for a given compute node.

    The problem with that is that in a heterogeneous cloud with both
    libvirt and ironic compute nodes this will always return {} for the
    former too, which is incorrect and will effectively break instance
    affinity filters like DifferentHostFilter or SameHostFilter, that
    check set intersections of instances running on a particular host and
    the ones passed as a hint for nova-scheduler in a boot request.

    IronicHostManager should use the method implementation of the base
    class for non-ironic compute nodes.

    This is a partial fix which only modifies _get_instance_info() called
    down the select_destinations() stack. A following change will modify
    _init_instance_info() that pre-populates node instances info on start
    of a nova-scheduler process.

    Partial-Bug: #1606496

    (cherry-picked from af218caba4f532c7d182071ab4304d49d02de08f)
    Change-Id: Ib1ddb44d71f7b085512c1f3fc0544f7b00c754fe

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367403
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=13513e6232366a82a8a81c01f9ec8210f6da3a30
Submitter: Jenkins
Branch: stable/mitaka

commit 13513e6232366a82a8a81c01f9ec8210f6da3a30
Author: Roman Podoliaka <email address hidden>
Date: Wed Jul 27 19:46:16 2016 +0300

    ironic_host_manager: fix population of instances info on start

    IronicHostManager currently overrides the _init_instance_info()
    method of the base class and unconditionally skips population of
    instances information for all compute nodes, even if they are not
    Ironic ones.

    If there are compute nodes with the hypervisor_type different from
    Ironic in the same cloud, the instances info will be missing in
    nova-scheduler (if IronicHostManager is configured as a host manager
    impl in nova.conf), which will effectively break instance affinity
    filters like DifferentHostFilter or SameHostFilter, that check set
    intersections of instances running on a particular host and the ones
    passed as a hint for nova-scheduler in a boot request.

    IronicHostManager should use the method implementation of the base
    class for non-ironic compute nodes.

    Ib1ddb44d71f7b085512c1f3fc0544f7b00c754fe fixed the problem with
    scheduling, this change is needed to make sure we also populate the
    instances info on start of nova-scheduler.

    Closes-Bug: #1606496

    Co-Authored-By: Timofei Durakov <email address hidden>

    (cherry-picked from cc64a45d98d7576a78a853cc3da8109c31f4b75d)
    Change-Id: I9d8d2dc99773df4097c178d924d182a0d1971bcc

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.0.0rc1

This issue was fixed in the openstack/nova 14.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 13.1.2

This issue was fixed in the openstack/nova 13.1.2 release.

