masakari fails if hypervisor name does not match nova service name

Bug #1839715 reported by Liam Young
This bug affects 2 people
Affects            Status         Importance  Assigned to        Milestone
masakari           Fix Released   High        Radosław Piliszek
  Stein            Won't Fix      High        Unassigned
  Train            Fix Committed  High        Unassigned
  Ussuri           Fix Committed  High        Unassigned
  Victoria         Fix Released   High        Radosław Piliszek
masakari (Ubuntu)  Fix Released   High        James Page
  Eoan             Won't Fix      High        James Page
  Focal            Fix Released   High        James Page

Bug Description

When adding a host to a segment, masakari validates that the provided hostname is a valid hypervisor name *1. But when masakari maps nova services *2 and servers *3, it uses the host attribute, which does not necessarily match the hypervisor name. It is common for the hypervisor entry to be an FQDN while the service entry is a bare hostname. In this situation both the service and the server lookups fail and throw an IndexError.

$ openstack hypervisor list -c ID -c "Hypervisor Hostname"
+----+------------------------------------------------------+
| ID | Hypervisor Hostname                                  |
+----+------------------------------------------------------+
|  1 | juju-f4bd71-zaza-9db566d782a2-18.project.serverstack |
|  2 | juju-f4bd71-zaza-9db566d782a2-16.project.serverstack |
|  3 | juju-f4bd71-zaza-9db566d782a2-17.project.serverstack |
+----+------------------------------------------------------+

$ openstack compute service list -c ID -c Binary -c Host
+----+----------------+----------------------------------+
| ID | Binary         | Host                             |
+----+----------------+----------------------------------+
|  1 | nova-conductor | juju-f4bd71-zaza-9db566d782a2-15 |
|  2 | nova-scheduler | juju-f4bd71-zaza-9db566d782a2-15 |
|  4 | nova-compute   | juju-f4bd71-zaza-9db566d782a2-18 |
|  5 | nova-compute   | juju-f4bd71-zaza-9db566d782a2-16 |
|  6 | nova-compute   | juju-f4bd71-zaza-9db566d782a2-17 |
+----+----------------+----------------------------------+

$ openstack server show 65419346-6509-4c73-b8f4-22c5aa7f0d77 -c 'OS-EXT-SRV-ATTR:host'
+----------------------+----------------------------------+
| Field                | Value                            |
+----------------------+----------------------------------+
| OS-EXT-SRV-ATTR:host | juju-f4bd71-zaza-9db566d782a2-18 |
+----------------------+----------------------------------+

*1 https://github.com/openstack/masakari/blob/master/masakari/compute/nova.py#L248
*2 https://github.com/openstack/masakari/blob/master/masakari/compute/nova.py#L147
*3 https://github.com/openstack/masakari/blob/master/masakari/compute/nova.py#L154
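
The mismatch described above can be reproduced without a running cloud. The following is a minimal sketch, assuming the lookup takes the first element of a filtered list (which is consistent with the reported IndexError). The function names and data structures are hypothetical stand-ins built from the CLI output above, not masakari's actual code.

```python
# Illustrative stand-ins for the CLI output above; NOT masakari's actual code.
HYPERVISOR_HOSTNAMES = [
    "juju-f4bd71-zaza-9db566d782a2-18.project.serverstack",
    "juju-f4bd71-zaza-9db566d782a2-16.project.serverstack",
    "juju-f4bd71-zaza-9db566d782a2-17.project.serverstack",
]
COMPUTE_SERVICE_HOSTS = [
    "juju-f4bd71-zaza-9db566d782a2-18",
    "juju-f4bd71-zaza-9db566d782a2-16",
    "juju-f4bd71-zaza-9db566d782a2-17",
]


def validate_segment_host(name):
    """Hypothetical validation step: accept only known hypervisor names."""
    return name in HYPERVISOR_HOSTNAMES


def find_service_host(name):
    """Hypothetical lookup step: take the first service whose host matches."""
    matches = [host for host in COMPUTE_SERVICE_HOSTS if host == name]
    return matches[0]  # raises IndexError when nothing matched


def try_lookup(name):
    """Return the matched service host, or the IndexError that was raised."""
    try:
        return find_service_host(name)
    except IndexError as exc:
        return exc


fqdn = HYPERVISOR_HOSTNAMES[0]
validated = validate_segment_host(fqdn)  # the FQDN passes validation
result = try_lookup(fqdn)                # but no service host equals the FQDN
print(validated, type(result).__name__)
```

The name that passes validation is exactly the name that later fails the lookup, so the error only surfaces after the host has already been added to the segment.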

Liam Young (gnuoy)
Changed in masakari:
assignee: nobody → Liam Young (gnuoy)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to masakari (master)

Fix proposed to branch: master
Review: https://review.opendev.org/675734

Changed in masakari:
status: New → In Progress
Tushar Patil (tpatil)
Changed in masakari:
importance: Undecided → High
Tushar Patil (tpatil) wrote :

gnuoy,

Have you set the 'host' config option in /etc/nova/nova-cpu.conf on the compute nodes?

On my compute node machine:

$ hostname
compute_01.xyz

$ openstack compute service list --service nova-compute
+----+--------------+--------------------+------+---------+-------+----------------------------+
| ID | Binary       | Host               | Zone | Status  | State | Updated At                 |
+----+--------------+--------------------+------+---------+-------+----------------------------+
| 12 | nova-compute | compute_01.xyz     | nova | enabled | up    | 2019-11-12T08:09:55.000000 |
+----+--------------+--------------------+------+---------+-------+----------------------------+

$ openstack hypervisor list

It returns an empty list. I can see the error below in n-api.log.

n-api.log
=================

Nov 12 08:11:09 open <email address hidden>[37310]: DEBUG nova.api.openstack.compute.hypervisors [None req-2257554a-735d-4e0e-a660-798d5203aed2 admin admin] Unable to find service for compute node compute_01.xyz. The service may be deleted and compute nodes need to be manually cleaned up. {{(pid=37312) _get_hypervisors /opt/stack/nova/nova/api/openstack/compute/hypervisors.py:187}}

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in masakari (Ubuntu Eoan):
status: New → Confirmed
Changed in masakari (Ubuntu):
status: New → Confirmed
James Page (james-page)
Changed in masakari (Ubuntu Focal):
status: Confirmed → Triaged
Changed in masakari (Ubuntu Eoan):
status: Confirmed → Triaged
importance: Undecided → High
Changed in masakari (Ubuntu Focal):
importance: Undecided → High
Changed in masakari (Ubuntu Eoan):
assignee: nobody → James Page (james-page)
Changed in masakari (Ubuntu Focal):
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package masakari - 9.0.0~b2~git2020020609.8b122a8-0ubuntu2

---------------
masakari (9.0.0~b2~git2020020609.8b122a8-0ubuntu2) focal; urgency=medium

  * d/p/allow-bare-hostnames.patch: Cherry pick inflight fix to allow use of
    bare hostnames when adding hosts to segments (LP: #1839715).
  * d/p/python3.8-compat.patch,skip-py38-failures.patch: Drop skip patch and
    pick proposed fixes for Python 3.8 compatibility (LP: #1860265).
  * d/rules: Ensure Python doctree's are not included in binary packages.
  * d/control: Tidy lintian warnings with regards to package long
    descriptions.

 -- James Page <email address hidden> Mon, 17 Feb 2020 16:22:28 +0000

Changed in masakari (Ubuntu Focal):
status: In Progress → Fix Released
Ryan Beisner (1chb1n) wrote :

For the record, this bug should not affect a Charmed OpenStack deployment, as it is handled by the following fix:

https://opendev.org/openstack/charm-nova-compute/commit/1869bfbc9711eac157821f8a4702409822c0842e
https://bugs.launchpad.net/charm-nova-compute/+bug/1839300

Brian Murray (brian-murray) wrote :

The Eoan Ermine has reached end of life, so this bug will not be fixed for that release.

Changed in masakari (Ubuntu Eoan):
status: Triaged → Won't Fix
Changed in masakari:
assignee: Liam Young (gnuoy) → Dmitriy Rabotyagov (noonedeadpunk)
Changed in masakari:
assignee: Dmitriy Rabotyagov (noonedeadpunk) → Radosław Piliszek (yoctozepto)
Changed in masakari:
assignee: Radosław Piliszek (yoctozepto) → Dmitriy Rabotyagov (noonedeadpunk)
Changed in masakari:
assignee: Dmitriy Rabotyagov (noonedeadpunk) → Radosław Piliszek (yoctozepto)
Changed in masakari:
assignee: Radosław Piliszek (yoctozepto) → Dmitriy Rabotyagov (noonedeadpunk)
Changed in masakari:
assignee: Dmitriy Rabotyagov (noonedeadpunk) → Radosław Piliszek (yoctozepto)
OpenStack Infra (hudson-openstack) wrote : Fix merged to masakari (master)

Reviewed: https://review.opendev.org/728629
Committed: https://git.openstack.org/cgit/openstack/masakari/commit/?id=4322968b893b242f229912c2b70e3895f0227402
Submitter: Zuul
Branch: master

commit 4322968b893b242f229912c2b70e3895f0227402
Author: Dmitriy Rabotyagov <email address hidden>
Date: Sat May 16 12:22:13 2020 +0300

    Search in nova services instead of hypervisors

    Nova services and hypervisor naming can differ, as they retrieve node
    names in different ways.
    In the meanwhile we operate with nova.services while enabling/disabling
    nodes during the incident. So we're supposed to have in database record
    matching to what we have in service list, but not in hypervisor list.

    Closes-Bug: #1839715
    Change-Id: I9c591d33f17a8d5950bdb1fc2d686e2301fc6d95
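
The idea in this commit message, validating a segment host against the nova service list rather than the hypervisor list, can be sketched as follows. This is an illustration only, not the merged change: the helper name, the dictionary layout, and the restriction to the nova-compute binary are assumptions, and the host names are taken from the bug description.

```python
# Hypothetical sketch of "search in nova services instead of hypervisors";
# NOT the code merged in commit 4322968b.
COMPUTE_SERVICES = [
    {"binary": "nova-conductor", "host": "juju-f4bd71-zaza-9db566d782a2-15"},
    {"binary": "nova-scheduler", "host": "juju-f4bd71-zaza-9db566d782a2-15"},
    {"binary": "nova-compute", "host": "juju-f4bd71-zaza-9db566d782a2-18"},
    {"binary": "nova-compute", "host": "juju-f4bd71-zaza-9db566d782a2-16"},
    {"binary": "nova-compute", "host": "juju-f4bd71-zaza-9db566d782a2-17"},
]


def is_known_compute_host(name, services=COMPUTE_SERVICES):
    """Accept a segment host only if a nova-compute service runs on it.

    Filtering on the nova-compute binary is an assumption of this sketch.
    """
    return any(
        svc["binary"] == "nova-compute" and svc["host"] == name
        for svc in services
    )


bare_ok = is_known_compute_host("juju-f4bd71-zaza-9db566d782a2-18")
fqdn_ok = is_known_compute_host(
    "juju-f4bd71-zaza-9db566d782a2-18.project.serverstack"
)
print(bare_ok, fqdn_ok)
```

With validation and later lookups both keyed on the service host, a name accepted at segment creation is the same name used when enabling or disabling the service during an incident.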

Changed in masakari:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on masakari (master)

Change abandoned by Radosław Piliszek (<email address hidden>) on branch: master
Review: https://review.opendev.org/675734
Reason: we merged the alternative as a better solution

OpenStack Infra (hudson-openstack) wrote : Fix proposed to masakari (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/masakari/+/787899

OpenStack Infra (hudson-openstack) wrote : Fix proposed to masakari (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/masakari/+/787900

OpenStack Infra (hudson-openstack) wrote : Fix merged to masakari (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/masakari/+/787899
Committed: https://opendev.org/openstack/masakari/commit/35519c0ce02092aaef6b802817b74d017a84b08b
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 35519c0ce02092aaef6b802817b74d017a84b08b
Author: Dmitriy Rabotyagov <email address hidden>
Date: Sat May 16 12:22:13 2020 +0300

    Search in nova services instead of hypervisors

    Nova services and hypervisor naming can differ, as they retrieve node
    names in different ways.
    In the meanwhile we operate with nova.services while enabling/disabling
    nodes during the incident. So we're supposed to have in database record
    matching to what we have in service list, but not in hypervisor list.

    Closes-Bug: #1839715
    Change-Id: I9c591d33f17a8d5950bdb1fc2d686e2301fc6d95
    (cherry picked from commit 4322968b893b242f229912c2b70e3895f0227402)

OpenStack Infra (hudson-openstack) wrote : Fix merged to masakari (stable/train)

Reviewed: https://review.opendev.org/c/openstack/masakari/+/787900
Committed: https://opendev.org/openstack/masakari/commit/9eff08c7a8041c28a5f9c5ecb2b9915e7f42ce8c
Submitter: "Zuul (22348)"
Branch: stable/train

commit 9eff08c7a8041c28a5f9c5ecb2b9915e7f42ce8c
Author: Dmitriy Rabotyagov <email address hidden>
Date: Sat May 16 12:22:13 2020 +0300

    Search in nova services instead of hypervisors

    Nova services and hypervisor naming can differ, as they retrieve node
    names in different ways.
    In the meanwhile we operate with nova.services while enabling/disabling
    nodes during the incident. So we're supposed to have in database record
    matching to what we have in service list, but not in hypervisor list.

    Closes-Bug: #1839715
    Change-Id: I9c591d33f17a8d5950bdb1fc2d686e2301fc6d95
    (cherry picked from commit 4322968b893b242f229912c2b70e3895f0227402)
    (cherry picked from commit 35519c0ce02092aaef6b802817b74d017a84b08b)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/masakari 8.1.2

This issue was fixed in the openstack/masakari 8.1.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/masakari 9.1.2

This issue was fixed in the openstack/masakari 9.1.2 release.
