Activity log for bug #2057847

Date Who What changed Old value New value Message
2024-03-13 22:06:55 Sam Schmitt bug added bug
2024-03-13 22:08:08 Sam Schmitt description

Old value:

I have a 2023.2-based deployment with three x86 controllers and several more x86-based computes. After that stack is deployer, I am trying to now add an aarch64-based compute to it using `kolla-ansible -i inventory/ --config config/ --passwords config/passwords.yml deploy --limit arm-compute`.

The server OS for all of these systems is Ubuntu 22.04, and the kolla container base for the x86 hosts is Ubuntu. However, for this new compute I am trying to use the published Debian aarch64 images. I am setting a hostvar for the Arm node for openstack_tag=2023.2-debian-bookworm-aarch64. It seems like setting the kolla_base_distro on a host level is ignored, as I tried to set that and the openstack_tag_suffix to be "debian" and "-aarch64" respectively, but it would try and pull ubuntu-aarch64 images which aren't in Quay.

Setting the openstack_tag works to pull and start deploying the proper images, however I eventually hit the following error (debug output):

```
TASK [nova-cell : Get a list of existing cells]
fatal: [arm-compute -> infra-prod-controller-01(infra-prod-controller-01)]: FAILED! => {
    "changed": false,
    "failed_when_result": true,
    "invocation": {
        "module_args": {
            "action": "start_container",
            "api_version": "auto",
            "auth_email": null,
            "auth_password": null,
            "auth_registry": "quay.io",
            "auth_username": null,
            "cap_add": [],
            "client_timeout": 120,
            "command": "bash -c 'sudo -E kolla_set_configs && nova-manage cell_v2 list_cells --verbose'",
            "container_engine": "docker",
            "detach": false,
            "dimensions": {},
            "environment": {
                "KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS",
                "KOLLA_SERVICE_NAME": "nova-list-cells"
            },
            "graceful_timeout": 60,
            "ignore_missing": false,
            "image": "quay.io/openstack.kolla/nova-conductor:2023.2-debian-bookworm-aarch64",
            "labels": {
                "BOOTSTRAP": null
            },
            "name": "nova_list_cells",
            "privileged": false,
            "remove_on_exit": true,
            "restart_policy": "oneshot",
            "restart_retries": 10,
            "security_opt": [],
            "state": "running",
            "tls_verify": false,
            "tty": false,
            "volumes": [
                "/etc/kolla/nova-cell-bootstrap/:/var/lib/kolla/config_files/:ro",
                "/etc/localtime:/etc/localtime:ro",
                "/etc/timezone:/etc/timezone:ro",
                "kolla_logs:/var/log/kolla/",
                "",
                ""
            ]
        }
    },
    "msg": "Container exited with non-zero return code 1",
    "rc": 1,
    "stderr": "exec /usr/bin/dumb-init: exec format error\n",
    "stderr_lines": [
        "exec /usr/bin/dumb-init: exec format error"
    ],
    "stdout": "",
    "stdout_lines": []
}
```

It looks like this command is delegated to the nova-conductor container on one of the controllers. I assume it's failing because the image tag does not match the image running on the controller. I understand that this may be an unintended consequence of me setting openstack_tag directly. However, like I said, I cannot find a way to specify a Debian base for this host and Ubuntu for the rest.
Even if I set openstack_tag_suffix for just this host, it seems like that would still add it to openstack_tag_suffix based on this: https://github.com/openstack/kolla-ansible/blob/465f6ce298c6ee175aa94ccfdcee1e569eb3b5ae/ansible/group_vars/all.yml#L723

New value:

I have a 2023.2-based deployment with three x86 controllers and several more x86-based computes. After that stack is deployed, I am trying to now add an aarch64-based compute to it using `kolla-ansible -i inventory/ --config config/ --passwords config/passwords.yml deploy --limit arm-compute`.

The server OS for all of these systems is Ubuntu 22.04, and the kolla container base for the x86 hosts is Ubuntu. However, for this new compute I am trying to use the published Debian aarch64 images. I am setting a hostvar for the Arm node for openstack_tag=2023.2-debian-bookworm-aarch64. It seems like setting the kolla_base_distro on a host level is ignored, as I tried to set that and the openstack_tag_suffix to be "debian" and "-aarch64" respectively, but it would try and pull ubuntu-aarch64 images which aren't in Quay.

Setting the openstack_tag works to pull and start deploying the proper images, however I eventually hit the following error (debug output):

```
TASK [nova-cell : Get a list of existing cells]
fatal: [arm-compute -> infra-prod-controller-01(infra-prod-controller-01)]: FAILED! => {
    "changed": false,
    "failed_when_result": true,
    "invocation": {
        "module_args": {
            "action": "start_container",
            "api_version": "auto",
            "auth_email": null,
            "auth_password": null,
            "auth_registry": "quay.io",
            "auth_username": null,
            "cap_add": [],
            "client_timeout": 120,
            "command": "bash -c 'sudo -E kolla_set_configs && nova-manage cell_v2 list_cells --verbose'",
            "container_engine": "docker",
            "detach": false,
            "dimensions": {},
            "environment": {
                "KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS",
                "KOLLA_SERVICE_NAME": "nova-list-cells"
            },
            "graceful_timeout": 60,
            "ignore_missing": false,
            "image": "quay.io/openstack.kolla/nova-conductor:2023.2-debian-bookworm-aarch64",
            "labels": {
                "BOOTSTRAP": null
            },
            "name": "nova_list_cells",
            "privileged": false,
            "remove_on_exit": true,
            "restart_policy": "oneshot",
            "restart_retries": 10,
            "security_opt": [],
            "state": "running",
            "tls_verify": false,
            "tty": false,
            "volumes": [
                "/etc/kolla/nova-cell-bootstrap/:/var/lib/kolla/config_files/:ro",
                "/etc/localtime:/etc/localtime:ro",
                "/etc/timezone:/etc/timezone:ro",
                "kolla_logs:/var/log/kolla/",
                "",
                ""
            ]
        }
    },
    "msg": "Container exited with non-zero return code 1",
    "rc": 1,
    "stderr": "exec /usr/bin/dumb-init: exec format error\n",
    "stderr_lines": [
        "exec /usr/bin/dumb-init: exec format error"
    ],
    "stdout": "",
    "stdout_lines": []
}
```

It looks like this command is delegated to the nova-conductor container on one of the controllers. I assume it's failing because the image tag does not match the image running on the controller. I understand that this may be an unintended consequence of me setting openstack_tag directly. However, like I said, I cannot find a way to specify a Debian base for this host and Ubuntu for the rest.

Even if I set openstack_tag_suffix for just this host, it seems like that would still add it to openstack_tag_suffix based on this: https://github.com/openstack/kolla-ansible/blob/465f6ce298c6ee175aa94ccfdcee1e569eb3b5ae/ansible/group_vars/all.yml#L723
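
The suspected failure mode can be sketched in a few lines of Python. This is a toy model, not kolla-ansible code. It relies on Ansible's general behaviour that a delegated task still reads its variables from the inventory host it was scheduled for, so the controller would be asked to run the image tag set on arm-compute. The names `HOST_ARCH`, `HOSTVARS`, `image_arch` and `run_delegated` are hypothetical, and the controller tag `2023.2-ubuntu-jammy` is an assumed placeholder that does not appear in the report.

```python
# Toy model (hypothetical names throughout), not kolla-ansible code.
# It illustrates why a task delegated from arm-compute to an x86 controller
# could end up starting an aarch64 image there.

# CPU architecture of each host, as described in the report.
HOST_ARCH = {
    "arm-compute": "aarch64",
    "infra-prod-controller-01": "x86_64",
}

# Per-host variables. The controller tag is an assumed placeholder.
HOSTVARS = {
    "arm-compute": {"openstack_tag": "2023.2-debian-bookworm-aarch64"},
    "infra-prod-controller-01": {"openstack_tag": "2023.2-ubuntu-jammy"},
}


def image_arch(tag: str) -> str:
    """Guess the image architecture from the tag suffix (assumption)."""
    return "aarch64" if tag.endswith("-aarch64") else "x86_64"


def run_delegated(inventory_host: str, delegate_host: str) -> tuple[str, str]:
    """Resolve the image from the inventory host's variables, then 'run' it
    on the delegate host and report whether the architectures match."""
    tag = HOSTVARS[inventory_host]["openstack_tag"]  # vars follow the inventory host
    image = f"quay.io/openstack.kolla/nova-conductor:{tag}"
    if image_arch(tag) != HOST_ARCH[delegate_host]:
        return image, "exec format error"
    return image, "ok"


if __name__ == "__main__":
    print(run_delegated("arm-compute", "infra-prod-controller-01"))
    print(run_delegated("infra-prod-controller-01", "infra-prod-controller-01"))
```

In this model, the delegated call from arm-compute resolves the `-aarch64` tag and fails on the x86_64 controller, while a call that originates on the controller resolves its own tag and succeeds. Whether this is what actually happens in the nova-cell role is the assumption stated in the description above, not something this sketch confirms.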