Comment 0 for bug 2057847

Sam Schmitt (samcat116) wrote :

I have a 2023.2-based deployment with three x86 controllers and several more x86-based computes. Now that this stack is deployed, I am trying to add an aarch64-based compute to it using `kolla-ansible -i inventory/ --config config/ --passwords config/passwords.yml deploy --limit arm-compute`.

The server OS for all of these systems is Ubuntu 22.04, and the kolla container base for the x86 hosts is Ubuntu. For this new compute, however, I am trying to use the published Debian aarch64 images, so I am setting a hostvar for the Arm node: openstack_tag=2023.2-debian-bookworm-aarch64. It seems that setting kolla_base_distro at the host level is ignored: I tried setting it and openstack_tag_suffix to "debian" and "-aarch64" respectively, but it still tried to pull ubuntu-aarch64 images, which aren't on Quay. Setting openstack_tag does pull and start deploying the proper images; however, I eventually hit the following error (debug output):
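For reference, the host-level override described above looks roughly like the sketch below. The file path `inventory/host_vars/arm-compute.yml` is an assumption for illustration (the variables could equally be set inline in the inventory); the variable names and values are the ones mentioned in this report.

```yaml
# inventory/host_vars/arm-compute.yml  (hypothetical path, shown for illustration)

# Override that works for pulling images on the Arm node:
openstack_tag: "2023.2-debian-bookworm-aarch64"

# Earlier attempt, which still resulted in ubuntu-aarch64 images being requested:
# kolla_base_distro: "debian"
# openstack_tag_suffix: "-aarch64"
```

With only `openstack_tag` set, the Arm node pulls the Debian aarch64 images, but the same tag also ends up applied to the task that is delegated to the controller, as the output below shows.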

```
TASK [nova-cell : Get a list of existing cells]
 fatal: [arm-compute -> infra-prod-controller-01(infra-prod-controller-01)]: FAILED! => {
    "changed": false,
    "failed_when_result": true,
    "invocation": {
        "module_args": {
            "action": "start_container",
            "api_version": "auto",
            "auth_email": null,
            "auth_password": null,
            "auth_registry": "quay.io",
            "auth_username": null,
            "cap_add": [],
            "client_timeout": 120,
            "command": "bash -c 'sudo -E kolla_set_configs && nova-manage cell_v2 list_cells --verbose'",
            "container_engine": "docker",
            "detach": false,
            "dimensions": {},
            "environment": {
                "KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS",
                "KOLLA_SERVICE_NAME": "nova-list-cells"
            },
            "graceful_timeout": 60,
            "ignore_missing": false,
            "image": "quay.io/openstack.kolla/nova-conductor:2023.2-debian-bookworm-aarch64",
            "labels": {
                "BOOTSTRAP": null
            },
            "name": "nova_list_cells",
            "privileged": false,
            "remove_on_exit": true,
            "restart_policy": "oneshot",
            "restart_retries": 10,
            "security_opt": [],
            "state": "running",
            "tls_verify": false,
            "tty": false,
            "volumes": [
                "/etc/kolla/nova-cell-bootstrap/:/var/lib/kolla/config_files/:ro",
                "/etc/localtime:/etc/localtime:ro",
                "/etc/timezone:/etc/timezone:ro",
                "kolla_logs:/var/log/kolla/",
                "",
                ""
            ]
        }
    },
    "msg": "Container exited with non-zero return code 1",
    "rc": 1,
    "stderr": "exec /usr/bin/dumb-init: exec format error\n",
    "stderr_lines": [
        "exec /usr/bin/dumb-init: exec format error"
    ],
    "stdout": "",
    "stdout_lines": []
}
```

It looks like this command is delegated to a nova-conductor container on one of the controllers. I assume it's failing because the delegated task uses the Arm host's image tag (the aarch64 image), which is not the correct image for the x86 controller.

I understand that this may be an unintended consequence of setting openstack_tag directly. However, as I said, I cannot find a way to specify a Debian base for this host and Ubuntu for the rest. Even if I set openstack_tag_suffix for just this host, it seems like that would still add it to openstack_tag_suffix based on this:
https://github.com/openstack/kolla-ansible/blob/465f6ce298c6ee175aa94ccfdcee1e569eb3b5ae/ansible/group_vars/all.yml#L723