Debug logging in scheduler kills performance

Bug #1620692 reported by Ryan Rossiter
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Low
Radoslav Gerganov

Bug Description

When using the filter_scheduler with 400 hosts (not a very large number) and debug logging turned on, scheduling times start taking a very long time. With debug logging on, select_destinations() can swing anywhere between 3 and 18 seconds. With debug logging off, select_destinations() takes 0-4 seconds (http://paste.openstack.org/show/566153/).

The main problem is https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L162-L178 because in a 400 host environment, it's trying to log debug 1600 times on every instance boot.

Ryan Rossiter (rlrossit)
tags: added: scheduler
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/366289

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

Not exactly sold on the idea that DEBUG log is needed for a 400-node production cloud. Sure we could just add a new TRACE level, but why not try to make sure that when operators need DEBUG, they should rather have INFO or AUDIT levels ?

Changed in nova:
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Ryan Rossiter (<email address hidden>) on branch: master
Review: https://review.openstack.org/366289

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Ryan Rossiter (<email address hidden>) on branch: master
Review: https://review.openstack.org/366283

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/598611
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4fd7c93726eff4cc0b010741ea1772cf19c314fc
Submitter: Zuul
Branch: master

commit 4fd7c93726eff4cc0b010741ea1772cf19c314fc
Author: Radoslav Gerganov <email address hidden>
Date: Fri Aug 31 13:53:19 2018 +0300

    Do not dump all instances in the scheduler

    There is a log message in the scheduler which dumps all instances
    running on a compute host. There are at least 2 problems with this:
    1. it generates huge amount of logs which are not really useful
    2. it crashes when there is an instance with non-ascii name

    Instead of fixing 2), just print instance UUIDs.

    Closes-Bug: #1790126
    Related-Bug: #1620692
    Change-Id: I0eda1c58a7eb54121230c880818b4b1d0fdf4893

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/601285

Changed in nova:
assignee: Matt Riedemann (mriedem) → s10 (vlad-esten)
s10 (vlad-esten)
Changed in nova:
assignee: s10 (vlad-esten) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/601285
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1a22c456b4c1ccd41ffa1bae5a37ba88652b2081
Submitter: Zuul
Branch: stable/rocky

commit 1a22c456b4c1ccd41ffa1bae5a37ba88652b2081
Author: Radoslav Gerganov <email address hidden>
Date: Fri Aug 31 13:53:19 2018 +0300

    Do not dump all instances in the scheduler

    There is a log message in the scheduler which dumps all instances
    running on a compute host. There are at least 2 problems with this:
    1. it generates huge amount of logs which are not really useful
    2. it crashes when there is an instance with non-ascii name

    Instead of fixing 2), just print instance UUIDs.

    Closes-Bug: #1790126
    Related-Bug: #1620692
    Change-Id: I0eda1c58a7eb54121230c880818b4b1d0fdf4893
    (cherry picked from commit 4fd7c93726eff4cc0b010741ea1772cf19c314fc)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/629822

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/629822
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=16b2a56005cc44973ec87f71037214db73264580
Submitter: Zuul
Branch: stable/queens

commit 16b2a56005cc44973ec87f71037214db73264580
Author: Radoslav Gerganov <email address hidden>
Date: Fri Aug 31 13:53:19 2018 +0300

    Do not dump all instances in the scheduler

    There is a log message in the scheduler which dumps all instances
    running on a compute host. There are at least 2 problems with this:
    1. it generates huge amount of logs which are not really useful
    2. it crashes when there is an instance with non-ascii name

    Instead of fixing 2), just print instance UUIDs.

    Closes-Bug: #1790126
    Related-Bug: #1620692
    Change-Id: I0eda1c58a7eb54121230c880818b4b1d0fdf4893
    (cherry picked from commit 4fd7c93726eff4cc0b010741ea1772cf19c314fc)
    (cherry picked from commit 1a22c456b4c1ccd41ffa1bae5a37ba88652b2081)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/366283
Reason: I0eda1c58a7eb54121230c880818b4b1d0fdf4893 more or less fixed this.

Matt Riedemann (mriedem)
Changed in nova:
status: In Progress → Fix Released
assignee: nobody → Radoslav Gerganov (rgerganov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.opendev.org/659832

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.opendev.org/659833

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.opendev.org/659832
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fc8cd8f3b346c8f53e2dfc8e3de9fdcaedb0d35d
Submitter: Zuul
Branch: stable/pike

commit fc8cd8f3b346c8f53e2dfc8e3de9fdcaedb0d35d
Author: Radoslav Gerganov <email address hidden>
Date: Fri Aug 31 13:53:19 2018 +0300

    Do not dump all instances in the scheduler

    There is a log message in the scheduler which dumps all instances
    running on a compute host. There are at least 2 problems with this:
    1. it generates huge amount of logs which are not really useful
    2. it crashes when there is an instance with non-ascii name

    Instead of fixing 2), just print instance UUIDs.

    Closes-Bug: #1790126
    Related-Bug: #1620692
    Change-Id: I0eda1c58a7eb54121230c880818b4b1d0fdf4893
    (cherry picked from commit 4fd7c93726eff4cc0b010741ea1772cf19c314fc)
    (cherry picked from commit 1a22c456b4c1ccd41ffa1bae5a37ba88652b2081)
    (cherry picked from commit 16b2a56005cc44973ec87f71037214db73264580)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ocata)

Reviewed: https://review.opendev.org/659833
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a8b8524e724141a7abc9e9bc73c324300851f956
Submitter: Zuul
Branch: stable/ocata

commit a8b8524e724141a7abc9e9bc73c324300851f956
Author: Radoslav Gerganov <email address hidden>
Date: Fri Aug 31 13:53:19 2018 +0300

    Do not dump all instances in the scheduler

    There is a log message in the scheduler which dumps all instances
    running on a compute host. There are at least 2 problems with this:
    1. it generates huge amount of logs which are not really useful
    2. it crashes when there is an instance with non-ascii name

    Instead of fixing 2), just print instance UUIDs.

    Closes-Bug: #1790126
    Related-Bug: #1620692
    Change-Id: I0eda1c58a7eb54121230c880818b4b1d0fdf4893
    (cherry picked from commit 4fd7c93726eff4cc0b010741ea1772cf19c314fc)
    (cherry picked from commit 1a22c456b4c1ccd41ffa1bae5a37ba88652b2081)
    (cherry picked from commit 16b2a56005cc44973ec87f71037214db73264580)
    (cherry picked from commit fc8cd8f3b346c8f53e2dfc8e3de9fdcaedb0d35d)

tags: added: in-stable-ocata
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.