Overcloud service deploy fails when Kolla VIP contains dashes, and changing FQDN does not update all service configs

Bug #2066357 reported by Martin Ananda Boeker
Affects   Status   Importance   Assigned to   Milestone
kayobe    New      Undecided    Unassigned

Bug Description

OpenStack 2023.1, Kayobe 14.1, Ubuntu 22.04

I had the Kolla FQDN set to 'omc_iaas_deploy.aio.local' and things seemed to be working, except Horizon, which complained about an RFC regarding underscores in hostnames. So I changed the underscores to dashes.
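As a rough illustration of why the underscore form is rejected while the dash form is not, here is a minimal Python sketch of the usual letters-digits-hyphen rule for hostname labels. The regular expression and the function name `is_valid_hostname` are assumptions made for this example; this is not the check Horizon itself runs.

```python
import re

# One DNS label under the letters-digits-hyphen rule: 1 to 63 characters,
# alphanumeric at both ends, hyphens allowed only in the middle.
# This pattern is an illustrative assumption, not Horizon's own validator.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label follows the rule above."""
    if not name or len(name) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) is not None for label in name.split("."))


print(is_valid_hostname("omc_iaas_deploy.aio.local"))  # underscore form
print(is_valid_hostname("omc-iaas-deploy.aio.local"))  # dash form
```

Under this rule the dashed FQDN is a well-formed hostname, so the dashes on their own should not be invalid as far as naming goes.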

In /etc/kayobe/kolla.yml:
kolla_internal_fqdn: "omc-iaas-deploy.aio.local"
kolla_external_fqdn: "omc-iaas-deploy.aio.local"

When I ran overcloud service deploy, it failed at this part:

TASK [nova-cell : Waiting for nova-compute services to register themselves] ****************************************
Wednesday 22 May 2024 09:50:43 +0000 (0:00:00.092) 0:07:14.591 *********
skipping: [AIOTEST02]
skipping: [AIOTEST03]
fatal: [AIOTEST01]: FAILED! =>
  msg: 'The conditional check ''(nova_compute_services.stdout | from_json | map(attribute=''Host'') | list) is superset(expected_compute_service_hosts)'' failed. The error was: Expecting value: line 1 column 1 (char 0)'

TASK [nova-cell : Fail if nova-compute service failed to register] ****************************************
Wednesday 22 May 2024 09:51:15 +0000 (0:00:32.123) 0:07:46.714 *********
fatal: [AIOTEST02]: FAILED! =>
  msg: |-
    The conditional check 'any_failed_services or (nova_compute_registration_fatal | bool and
     failed_compute_service_hosts | length > 0)' failed. The error was: error while evaluating conditional (any_failed_services or (nova_compute_registration_fatal | bool and
     failed_compute_service_hosts | length > 0)): {{ ansible_facts.nodename in failed_compute_service_hosts or
       (ansible_facts.hostname ~ "-ironic") in failed_compute_service_hosts }}: {{ expected_compute_service_hosts | difference(nova_compute_service_hosts) | list }}: {{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
       from_json |
       map(attribute='Host') |
       list }}: Unable to look up a name or access an attribute in template string ({{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
       from_json |
       map(attribute='Host') |
       list }}).
    Make sure your variable name does not contain invalid characters like '-': the JSON object must be str, bytes or bytearray, not AnsibleUndefined. the JSON object must be str, bytes or bytearray, not AnsibleUndefined. Unable to look up a name or access an attribute in template string ({{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
       from_json |
       map(attribute='Host') |
       list }}).
    Make sure your variable name does not contain invalid characters like '-': the JSON object must be str, bytes or bytearray, not AnsibleUndefined. the JSON object must be str, bytes or bytearray, not AnsibleUndefined. {{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
       from_json |
       map(attribute='Host') |
       list }}: Unable to look up a name or access an attribute in template string ({{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
       from_json |
       map(attribute='Host') |
       list }}).
    Make sure your variable name does not contain invalid characters like '-': the JSON object must be str, bytes or bytearray, not AnsibleUndefined. the JSON object must be str, bytes or bytearray, not AnsibleUndefined. Unable to look up a name or access an attribute in template string ({{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
       from_json |
       map(attribute='Host') |
       list }}).
    Make sure your variable name does not contain invalid characters like '-': the JSON object must be str, bytes or bytearray, not AnsibleUndefined. the JSON object must be str, bytes or bytearray, not AnsibleUndefined. {{ expected_compute_service_hosts | difference(nova_compute_service_hosts) | list }}: {{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
       from_json |
       map(attribute='Host') |
       list }}: Unable to look up a name or access an attribute in template string ({{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
       from_json |
       map(attribute='Host') |
       list }}).

There are a dozen more blocks of that message, all much the same. This looks like an Ansible issue rather than an OpenStack issue, since it complains about dashes in variable names.
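For reference, the two underlying messages in the output above ("Expecting value: line 1 column 1 (char 0)" and "the JSON object must be str, bytes or bytearray, not ...") are what Python's standard `json` module raises for empty input and for non-string input respectively. The sentence about invalid characters like '-' is a generic hint that Ansible attaches to template lookup failures, so it may not point at the FQDN at all. A minimal sketch that reproduces both messages outside Ansible; the `Undefined` class is a hypothetical stand-in for an undefined template variable, not Ansible's own type:

```python
import json


class Undefined:
    """Hypothetical stand-in for an undefined template variable."""


def try_parse(value):
    """Return the message json.loads raises for value, or None on success."""
    try:
        json.loads(value)
    except (TypeError, ValueError) as exc:
        return str(exc)
    return None


# Empty stdout, e.g. when the command that lists compute services printed nothing.
empty_msg = try_parse("")
# A value that is not a string at all, e.g. a variable that was never set.
undefined_msg = try_parse(Undefined())

print(empty_msg)
print(undefined_msg)
```

If that reading is right, the first failure would correspond to the service listing returning no output on AIOTEST01, and the second to the other hosts reading a result that was never registered.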

The thing is that many components seem to be fine. Horizon, Grafana, Prometheus, they are all fine with that FQDN. But nova and neutron containers stop working.

During troubleshooting I tried changing the FQDN to "omciaasdeploy.aio.local", but it failed in the same place. When I looked in /etc/kolla on the servers, I saw that many of the configuration files had not been updated:

root@AIOTEST01:~# grep -Irn omc-iaas-deploy /etc/kolla/ | wc -l
86
root@AIOTEST01:~# grep -Irn omciaasdeploy /etc/kolla/ | wc -l
95

After fixing those with `find exec sed` so that they matched the FQDN I had set, the deployment still failed in the same place.
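The manual fix described above can be sketched in Python as follows. The function name `replace_in_tree` and its `dry_run` flag are assumptions made for this illustration, not part of Kayobe or Kolla Ansible. Note also that files under /etc/kolla are generated, so hand edits are likely to be overwritten the next time the configuration is regenerated.

```python
from pathlib import Path


def replace_in_tree(root, old, new, dry_run=True):
    """Return the text files under root that contain old.

    With dry_run=False the files are also rewritten with old replaced by new.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Skip binary or unreadable files, similar in spirit to grep -I.
            continue
        if old not in text:
            continue
        changed.append(path)
        if not dry_run:
            path.write_text(text.replace(old, new), encoding="utf-8")
    return changed
```

Calling it first with the default `dry_run=True` lists the files that still carry the stale FQDN, which plays roughly the role of the grep commands above; passing `dry_run=False` then performs the substitution.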

So there are two issues:
1. Dashes seem to make the nova and neutron containers fail.
2. Changing the FQDN doesn't push the config everywhere, probably because the deploy fails before it gets around to that.

Currently all containers are running; only nova_api complains that the nodes can't resolve themselves. Service deploy still fails with the error above.

Martin Ananda Boeker (mboeker) wrote:

More details from that error:

    Make sure your variable name does not contain invalid characters like '-': the JSON object must be str, bytes or bytearray, not AnsibleUndefined. the JSON object must be str, bytes or bytearray, not AnsibleUndefined

    The error appears to be in '/home/ubuntu/venvs/kolla-ansible/share/kolla-ansible/ansible/roles/nova-cell/tasks/wait_discover_computes.yml': line 47, column 7, but may
    be elsewhere in the file depending on the exact syntax problem.

    The offending line appears to be:

        # that failed to register.
        - name: Fail if nova-compute service failed to register
          ^ here

This is the task at line 47:

    # NOTE(mgoddard): Use a separate fail task to ensure we fail only those hosts
    # that failed to register.
    - name: Fail if nova-compute service failed to register
      vars:
        # 'Host' field of all registered compute services.
        nova_compute_service_hosts: >-
          {{ hostvars[all_computes_in_batch[0]].nova_compute_services.stdout |
             from_json |
             map(attribute='Host') |
             list }}
        # 'Host' field of failed compute services.
        failed_compute_service_hosts: >-
          {{ expected_compute_service_hosts | difference(nova_compute_service_hosts) | list }}
        # Whether any compute services failed on this host.
        any_failed_services: >-
          {{ ansible_facts.nodename in failed_compute_service_hosts or
             (ansible_facts.hostname ~ "-ironic") in failed_compute_service_hosts }}
      fail:
        msg: >-
          The Nova compute service failed to register itself on the following
          hosts: {{ failed_compute_service_hosts | join(',') }}
      when: >-
        any_failed_services or
        (nova_compute_registration_fatal | bool and
         failed_compute_service_hosts | length > 0)
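To make the logic of this task easier to follow, here is a rough Python equivalent of its three variables and its `when` condition. The function names are assumptions made for this sketch, and the host lists are illustrative sample data rather than output from the failing deployment.

```python
def failed_compute_hosts(expected, registered):
    """Hosts that were expected to register but are missing from the service list."""
    registered_set = set(registered)
    return [host for host in expected if host not in registered_set]


def host_has_failed(nodename, hostname, failed):
    """Mirror of any_failed_services: this host, or its '-ironic' alias, is missing."""
    return nodename in failed or f"{hostname}-ironic" in failed


def should_fail(any_failed, registration_fatal, failed):
    """Mirror of the task's when condition."""
    return any_failed or (registration_fatal and len(failed) > 0)


# Illustrative data only: three expected hosts, one of which registered.
expected = ["AIOTEST01", "AIOTEST02", "AIOTEST03"]
registered = ["AIOTEST02"]

failed = failed_compute_hosts(expected, registered)
print(failed)
print(should_fail(host_has_failed("AIOTEST02", "AIOTEST02", failed), False, failed))
```

Every step depends on the list of registered hosts, which the task derives by parsing `nova_compute_services.stdout` as JSON. If that value is empty or undefined, none of the three variables can be evaluated, which is consistent with the nested errors shown earlier.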

Martin Ananda Boeker (mboeker) wrote:

I could not get service deploy to pass at all after hitting this error, even after reverting the config to what it was before I set an FQDN. The only solution was to reprovision. I assume service destroy followed by deploy would have worked too.

Will Szumski (willjs) wrote:

Very odd. Could the OpenStack API have been broken by your FQDN change? That could have caused nova_compute_services.stdout to be undefined. If so, we could probably give the user a better error message.
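One possible shape for such a message, sketched in Python rather than as an actual change to the nova-cell role: check the captured output before parsing it and say plainly what was missing. The exception class, function name, and message wording are all hypothetical.

```python
import json


class ComputeServiceListError(Exception):
    """Hypothetical error raised when the compute service list is unusable."""


def parse_compute_service_hosts(stdout):
    """Return the 'Host' field of each service, with explicit errors for bad input."""
    if stdout is None:
        raise ComputeServiceListError(
            "No compute service list was captured; the listing command may not have run."
        )
    if not stdout.strip():
        raise ComputeServiceListError(
            "The compute service list was empty; the API may be unreachable."
        )
    try:
        services = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise ComputeServiceListError(
            f"The compute service list was not valid JSON: {exc}"
        ) from exc
    return [service["Host"] for service in services]


print(parse_compute_service_hosts('[{"Host": "AIOTEST01"}, {"Host": "AIOTEST02"}]'))
```

In the role itself the same idea would presumably be expressed as an extra check before the `from_json` filter is applied, but that is left open here.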
