Unable to add ARM compute to existing x86 deployment

Bug #2057847 reported by Sam Schmitt
This bug affects 1 person
Affects: kolla-ansible
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have a 2023.2-based deployment with three x86 controllers and several x86-based computes. After deploying that stack, I am now trying to add an aarch64-based compute to it using `kolla-ansible -i inventory/ --config config/ --passwords config/passwords.yml deploy --limit arm-compute`.

The server OS for all of these systems is Ubuntu 22.04, and the Kolla container base for the x86 hosts is Ubuntu. For this new compute, however, I am trying to use the published Debian aarch64 images, so I set a host var for the Arm node: `openstack_tag=2023.2-debian-bookworm-aarch64`.

Setting `kolla_base_distro` at the host level seems to be ignored: I tried setting it to "debian" and `openstack_tag_suffix` to "-aarch64" for this host, but the deploy still tried to pull ubuntu-aarch64 images, which aren't published on Quay. Setting `openstack_tag` does pull the correct images and starts deploying them, but I eventually hit the following error (debug output):
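
The tag used above appears to follow the pattern `<release>-<base distro>-<distro codename><suffix>`. The sketch below assumes that pattern (inferred from the tag itself, not copied from kolla-ansible's Jinja expression), and the `DISTRO_VERSIONS` mapping and `compose_tag` helper are hypothetical. It illustrates why a host-level `openstack_tag_suffix` alone still produces an `ubuntu` tag when `kolla_base_distro` resolves to the global value.

```python
# Minimal sketch of how an image tag such as "2023.2-debian-bookworm-aarch64"
# could be composed. The pattern and the version map are assumptions inferred
# from the tags in this report, not kolla-ansible's actual code.

# Assumed mapping from base distro to the codename used in 2023.2 tags.
DISTRO_VERSIONS = {
    "debian": "bookworm",
    "ubuntu": "jammy",
}


def compose_tag(release: str, base_distro: str, suffix: str = "") -> str:
    """Build '<release>-<distro>-<codename><suffix>' (hypothetical helper)."""
    codename = DISTRO_VERSIONS[base_distro]
    return f"{release}-{base_distro}-{codename}{suffix}"


if __name__ == "__main__":
    # What the Arm host needs.
    print(compose_tag("2023.2", "debian", "-aarch64"))
    # What it gets if kolla_base_distro falls back to the global "ubuntu";
    # per the report, ubuntu aarch64 images are not published on Quay.
    print(compose_tag("2023.2", "ubuntu", "-aarch64"))
```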

```
TASK [nova-cell : Get a list of existing cells]
 fatal: [arm-compute -> infra-prod-controller-01(infra-prod-controller-01)]: FAILED! => {
    "changed": false,
    "failed_when_result": true,
    "invocation": {
        "module_args": {
            "action": "start_container",
            "api_version": "auto",
            "auth_email": null,
            "auth_password": null,
            "auth_registry": "quay.io",
            "auth_username": null,
            "cap_add": [],
            "client_timeout": 120,
            "command": "bash -c 'sudo -E kolla_set_configs && nova-manage cell_v2 list_cells --verbose'",
            "container_engine": "docker",
            "detach": false,
            "dimensions": {},
            "environment": {
                "KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS",
                "KOLLA_SERVICE_NAME": "nova-list-cells"
            },
            "graceful_timeout": 60,
            "ignore_missing": false,
            "image": "quay.io/openstack.kolla/nova-conductor:2023.2-debian-bookworm-aarch64",
            "labels": {
                "BOOTSTRAP": null
            },
            "name": "nova_list_cells",
            "privileged": false,
            "remove_on_exit": true,
            "restart_policy": "oneshot",
            "restart_retries": 10,
            "security_opt": [],
            "state": "running",
            "tls_verify": false,
            "tty": false,
            "volumes": [
                "/etc/kolla/nova-cell-bootstrap/:/var/lib/kolla/config_files/:ro",
                "/etc/localtime:/etc/localtime:ro",
                "/etc/timezone:/etc/timezone:ro",
                "kolla_logs:/var/log/kolla/",
                "",
                ""
            ]
        }
    },
    "msg": "Container exited with non-zero return code 1",
    "rc": 1,
    "stderr": "exec /usr/bin/dumb-init: exec format error\n",
    "stderr_lines": [
        "exec /usr/bin/dumb-init: exec format error"
    ],
    "stdout": "",
    "stdout_lines": []
}
```

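It looks like this task is delegated to one of the controllers, where it starts a one-shot container from the nova-conductor image. I assume it is failing because the image tag is the aarch64 one rather than the image actually used on the x86 controller; the `exec format error` from `dumb-init` is consistent with an architecture mismatch.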
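
`exec format error` is what the kernel reports when asked to run a binary it cannot execute, for example one built for a different CPU architecture. One way to confirm the mismatch is to read the `e_machine` field of the ELF header of `/usr/bin/dumb-init` (for example after copying it out of the image) and compare it with the host's architecture. The following is a minimal sketch; the helper names are hypothetical and only the x86-64 (62) and AArch64 (183) machine codes are mapped.

```python
import platform
import struct

# ELF e_machine values for the two architectures in this deployment.
ELF_MACHINES = {62: "x86_64", 183: "aarch64"}


def elf_arch(header: bytes) -> str:
    """Return the architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA) is 1 for little-endian, 2 for big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit ELF.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown({machine})")


def can_run_natively(binary_path: str) -> bool:
    """True if the binary's ELF architecture matches this host's CPU."""
    with open(binary_path, "rb") as f:
        arch = elf_arch(f.read(20))
    # On Linux, platform.machine() reports "x86_64" or "aarch64".
    return arch == platform.machine()
```

Run on the x86 controller against a `dumb-init` copied from the aarch64 image, `can_run_natively` should return `False`, matching the failure above.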

I understand that this may be an unintended consequence of setting `openstack_tag` directly. However, as mentioned above, I cannot find a way to specify a Debian base for this host and Ubuntu for the rest. Even if I set `openstack_tag_suffix` for just this host, it looks like it would still end up in `openstack_tag`, based on this:
https://github.com/openstack/kolla-ansible/blob/465f6ce298c6ee175aa94ccfdcee1e569eb3b5ae/ansible/group_vars/all.yml#L723
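
The task header `arm-compute -> infra-prod-controller-01` shows the task running for `arm-compute` but delegated to a controller. In Ansible, a delegated task is still templated with the variables of the host it runs for, so a tag set in `arm-compute`'s host_vars also ends up in the image of the container started on the controller. Below is a simplified Python model of that behaviour, not Ansible's or kolla-ansible's code; the default tag value and the direct use of `openstack_tag` for the conductor image are assumptions.

```python
from typing import Optional, Tuple

# Simplified model: group defaults overridden by host_vars, and a delegated
# task templated with the *original* host's variables, as Ansible does.
GROUP_VARS_ALL = {
    "openstack_tag": "2023.2-ubuntu-jammy",  # assumed default for x86 hosts
}

HOST_VARS = {
    "arm-compute": {"openstack_tag": "2023.2-debian-bookworm-aarch64"},
    "infra-prod-controller-01": {},
}


def resolve_vars(host: str) -> dict:
    """Merge group defaults with host-specific overrides (host_vars win)."""
    merged = dict(GROUP_VARS_ALL)
    merged.update(HOST_VARS.get(host, {}))
    return merged


def conductor_image(task_host: str,
                    delegate_to: Optional[str] = None) -> Tuple[str, str]:
    """Return (host the container runs on, image it uses).

    delegate_to only changes where the container runs; the variables
    used to build the image still come from task_host.
    """
    run_on = delegate_to or task_host
    tag = resolve_vars(task_host)["openstack_tag"]
    return run_on, f"quay.io/openstack.kolla/nova-conductor:{tag}"
```

With these values, `conductor_image("arm-compute", delegate_to="infra-prod-controller-01")` places the container on the controller but with the `-aarch64` image, which matches the failing task above.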

Sam Schmitt (samcat116) wrote:

I was able to work around this for now by setting the per-service `<container>_tag` variables (such as `nova_libvirt_tag`) in host_vars to `2023.2-debian-bookworm-aarch64`, which made the deploy succeed. The downside is that I am setting about 13 extra variables for these hosts.
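
Presumably this works because `openstack_tag` keeps its global value, so delegated tasks such as the cell listing still use the x86 conductor image, while only the containers on the Arm host are pinned to the aarch64 tag. Writing those overrides by hand is repetitive; a minimal sketch that renders them as host_vars YAML lines is below. Apart from `nova_libvirt_tag`, the names in `ARM_HOST_TAG_VARS` are hypothetical examples; the real list depends on which containers the compute host runs and should be checked against the role defaults.

```python
from typing import List

ARM_TAG = "2023.2-debian-bookworm-aarch64"

# nova_libvirt_tag comes from the report; the others are illustrative only.
ARM_HOST_TAG_VARS = [
    "nova_libvirt_tag",
    "nova_compute_tag",  # hypothetical example
    "nova_ssh_tag",      # hypothetical example
]


def render_host_vars(tag_vars: List[str], tag: str) -> str:
    """Return one YAML line of the form `<var>: "<tag>"` per variable."""
    return "\n".join(f'{name}: "{tag}"' for name in tag_vars) + "\n"


if __name__ == "__main__":
    # Output could be redirected into the Arm host's host_vars file
    # (the exact path depends on the inventory layout).
    print(render_host_vars(ARM_HOST_TAG_VARS, ARM_TAG), end="")
```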
