Actions instance-count, remove-from-cloud and register-to-cloud are failing

Bug #1955164 reported by Martin Kalcok
This bug affects 1 person
Affects: OpenStack Nova Compute Charm
Status: Fix Released
Importance: High
Assigned to: Martin Kalcok

Bug Description

The root cause is a change in the behavior of Python's `socket.getfqdn()`. Previously this function returned the hostname including the full domain, but now it returns only the hostname without the domain. The full domain name is required to identify the hypervisor in the hypervisor list [1]. I don't know yet why this function changed its behavior, but it prompted me to trace how nova itself determines the hostname when registering a hypervisor with nova-cloud-controller. Here's my trace:
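For reference, CPython's `socket.getfqdn()` resolves the local name with `gethostbyaddr()`, checks the primary name first and then its aliases, and picks the first name containing a period; if none does, it returns the (short) primary name. The sketch below mirrors that selection loop so it can be compared on a unit against what `socket.gethostname()` and `socket.getfqdn()` report. `first_dotted_name` and `compare_local_names` are hypothetical helpers for debugging, not part of the charm or of nova.

```python
import socket
from typing import List, Tuple


def first_dotted_name(primary: str, aliases: List[str]) -> str:
    """Mirror the selection loop in CPython's socket.getfqdn().

    The primary name from gethostbyaddr() is checked first, then the
    aliases; the first name containing a period wins. If none does,
    the primary name is returned, which may be a short hostname.
    """
    for name in [primary, *aliases]:
        if "." in name:
            return name
    return primary


def compare_local_names() -> Tuple[str, str, str]:
    """Return (gethostname, selection rule result, getfqdn) for this host."""
    short = socket.gethostname()
    try:
        primary, aliases, _addresses = socket.gethostbyaddr(short)
    except OSError:
        # The name does not resolve at all; getfqdn() also falls back here.
        return short, short, socket.getfqdn()
    return short, first_dotted_name(primary, aliases), socket.getfqdn()
```

If the rule yields a short name while `hostname -f` prints the full one, the resolver data the unit sees (for example the order of names in /etc/hosts) is a plausible place to look; this is a guess, not a diagnosis established in this report.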

* Here's the final call on nova-compute that registers the hypervisor [2]

Working backwards from there, tracing the origin of the `host` variable:

* `ResourceTracker` class receives it as an argument [3]

* The `ResourceTracker` instance is created by `ComputeManager` [4], which also receives it as an argument (in `**kwargs`)

* `ComputeManager` is instantiated in the `Service` class [5], which gets `host` either from its constructor or, if it's not supplied, from config

* Finally, the `Service` class is instantiated in the `nova.cmd.compute:main` [6] function, which does NOT pass the `host` argument explicitly.

So the bottom line is that the `host` should be sourced from config.
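The lookup order traced above (explicit constructor argument first, otherwise the configured value) can be modelled in a few lines. This is a simplified sketch, not nova's code: `resolve_compute_host` is a hypothetical helper, it reads `host` from the `[DEFAULT]` section of nova.conf with Python's `configparser`, and it assumes that an unset option ends up as the short hostname, which matches what `CONF.host` showed on bionic-train.

```python
import configparser
import socket
from typing import Optional


def resolve_compute_host(explicit_host: Optional[str], nova_conf_text: str) -> str:
    """Hypothetical model of how the compute service ends up with `host`.

    1. An explicitly passed host wins (the Service constructor argument).
    2. Otherwise use `host` from the [DEFAULT] section of nova.conf.
    3. Otherwise fall back to the short hostname (assumed default).
    """
    if explicit_host:
        return explicit_host
    # interpolation=None: config values may contain '%' characters.
    # strict=False: tolerate repeated keys used by multi-valued options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    configured = parser.get("DEFAULT", "host", fallback="").strip()
    return configured or socket.gethostname()
```

Usage: `resolve_compute_host(None, open("/etc/nova/nova.conf").read())` returns the configured `host` if one is set, else the short hostname, which is exactly the value that lacks the domain in the bionic-train case described here.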

This works out nicely, because it gives us a good place to source this variable ourselves. However, on the bionic-train distribution there's no `host` variable in the configs. I tried it, and `nova.conf.CONF.host` has the value of the hostname (without the domain). Regardless of this, hypervisors on this distribution are still registered with the full hostname+domain.
That's as far as I got for now, but the bottom line, I think, is that our nova-compute charm needs to reliably reproduce the behavior nova uses when registering hypervisors.
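To illustrate why the mismatch matters: looking a hypervisor up by exact name finds nothing when the charm computes a short hostname but nova registered hostname+domain. The sketch assumes hypervisors are plain dicts shaped like entries of Nova's `os-hypervisors` API response (which uses the `hypervisor_hostname` field); `find_hypervisor` is hypothetical, and the charm's actual matching in [1] may differ.

```python
from typing import Iterable, Mapping, Optional


def find_hypervisor(
    hypervisors: Iterable[Mapping[str, str]], name: str
) -> Optional[Mapping[str, str]]:
    """Return the hypervisor whose hypervisor_hostname equals `name`.

    Matching is exact, so a short hostname will not match an entry
    registered as hostname+domain.
    """
    for hv in hypervisors:
        if hv.get("hypervisor_hostname") == name:
            return hv
    return None


hypervisors = [{"id": "1", "hypervisor_hostname": "compute-1.maas"}]
find_hypervisor(hypervisors, "compute-1.maas")  # the registered entry
find_hypervisor(hypervisors, "compute-1")       # None: the failure mode
```

Loosening the match (for example comparing only the part before the first dot) would hide the symptom but could pick the wrong hypervisor when the same short name exists in two domains, which is why producing the same name nova uses is the safer fix.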

mka.

---

[1] https://github.com/openstack/charm-nova-compute/blob/1d132e68ccabc44bec51d884724c5d2b96700fb3/lib/nova_compute/cloud_utils.py#L113

[2] https://github.com/openstack/nova/blob/6667fcb92bfaf03a8a274dc26806c137aace6b49/nova/compute/resource_tracker.py#L693

[3] https://github.com/openstack/nova/blob/6667fcb92bfaf03a8a274dc26806c137aace6b49/nova/compute/resource_tracker.py#L90

[4] https://github.com/openstack/nova/blob/6667fcb92bfaf03a8a274dc26806c137aace6b49/nova/compute/manager.py#L610

[5] https://github.com/openstack/nova/blob/6667fcb92bfaf03a8a274dc26806c137aace6b49/nova/service.py#L128

[6] https://github.com/openstack/nova/blob/6667fcb92bfaf03a8a274dc26806c137aace6b49/nova/service.py#L128

Changed in charm-nova-compute:
assignee: nobody → Martin Kalcok (martin-kalcok)
Felipe Reyes (freyes) wrote : Re: [Bug 1955164] [NEW] Actions instance-count, remove-from-cloud and register-to-cloud are failing

Hi Martin,

Thanks for reporting this issue. It may not be a regression after all,
but rather a problem related to the tests and infrastructure.

The CI jobs run on top of a focal-ussuri cloud that was migrated from
OVS to OVN-21.09 as SDN on December 15th.

We were discussing the symptoms described in this bug earlier, and they
seem to be caused by our version of neutron missing this patch
https://review.opendev.org/c/openstack/neutron/+/822294, which updates
OVN Northbound (NB) database entries.

Best,

On Fri, 2021-12-17 at 15:39 +0000, Martin Kalcok wrote:
> Public bug reported:

Felipe Reyes (freyes)
tags: added: unstable-test
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (master)
Changed in charm-nova-compute:
status: New → In Progress
Changed in charm-nova-compute:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/823506
Committed: https://opendev.org/openstack/charm-nova-compute/commit/d23d25b3a0a705dc1357b33ef11577f05625aef9
Submitter: "Zuul (22348)"
Branch: master

commit d23d25b3a0a705dc1357b33ef11577f05625aef9
Author: Martin Kalcok <email address hidden>
Date: Wed Jan 5 13:46:10 2022 +0100

    Fix socket.fqdn() not returning full hostname

    This change aims to make resolving of the unit's
    FQDN more consistent. Python's standard
    `socket.getfqdn()` can "fail" in some conditions
    and return only hostname, without the domain
    part, even in cases when `hostname -f` would
    return correct fqdn.

    This new approach provides behavior consistent
    with executing `hostname -f`

    Closes-Bug: #1955164
    Change-Id: Icc39b32b3e471c1960402dfcba61bed5ce309a6f
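The commit message describes the new behavior as consistent with `hostname -f`. One common way to approximate that from Python is to resolve the local hostname with `socket.getaddrinfo()` and the `AI_CANONNAME` flag and return the canonical name; that this matches `hostname -f` on Ubuntu is an assumption here. The sketch illustrates the technique and is not a copy of the merged change; `get_fqdn` is a hypothetical name.

```python
import socket
from typing import Optional


def get_fqdn(hostname: Optional[str] = None) -> str:
    """Approximate `hostname -f`: canonical name of the (local) hostname.

    Resolves the name with getaddrinfo() and AI_CANONNAME and returns the
    first non-empty canonical name. Falls back to the plain hostname if
    resolution fails or no canonical name is reported.
    """
    name = hostname or socket.gethostname()
    try:
        results = socket.getaddrinfo(
            name, None, 0, socket.SOCK_DGRAM, 0, socket.AI_CANONNAME
        )
    except socket.gaierror:
        return name
    # Only the first result normally carries ai_canonname; the rest are ''.
    for _family, _type, _proto, canonname, _sockaddr in results:
        if canonname:
            return canonname
    return name
```

Unlike `socket.getfqdn()`, this does not depend on which name `gethostbyaddr()` lists first, only on what the resolver reports as the canonical name for the hostname.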

Changed in charm-nova-compute:
status: In Progress → Fix Committed
Felipe Reyes (freyes)
Changed in charm-nova-compute:
milestone: none → 22.04
Changed in charm-nova-compute:
status: Fix Committed → Fix Released