Multi-node sunbeam bootstrap on physical and libvirt machines fails
Affects: OpenStack Snap
Status: New
Importance: Undecided
Assigned to: Unassigned
Bug Description
Hello, for two weeks I have not been able to bootstrap MicroStack successfully, following:
1. Single node installation: https:/
2. Single node guided: https:/
3. Multi-node: https:/
after seeing the Microstack launch demo at https:/
I was trying to set up MicroStack on a newly acquired physical test machine according to:
https:/
My hardware specs are in the attached file phys_hw.txt; they meet the requirements.
The physical host network configuration can be found in netplan.txt.
Disks for OSD are wiped using:
#!/bin/bash
# Clear filesystem/RAID signatures and write a fresh empty GPT on each disk.
disks="sda sdb sdc sdd"
for d in $disks; do
  echo "wipe disk /dev/$d"
  sudo wipefs -af "/dev/$d"
  printf 'g\nw\n' | sudo fdisk "/dev/$d"   # g = new empty GPT, w = write (fdisk reads one command per line)
done
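As a sanity check after wiping (a side note, not part of the procedure): wipefs in no-act mode can confirm the disks really are blank. The commands are printed here rather than executed, since they need the real disks and root.

```shell
# wipefs -n (--no-act) lists any signatures still present without modifying
# the disk; no output for a disk means it is clean.
for d in sda sdb sdc sdd; do
  cmd="sudo wipefs -n /dev/$d"
  echo "$cmd"
done
```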
Using the multi-node procedure with:
== Control plane networking
CIDR: 192.168.0.0/24
Gateway: 192.168.0.1
DHCP addr range: 192.168.
Control plane addr range: 192.168.
Interface: br0
== External networking
CIDR: 192.168.2.0/24
Gateway: 192.168.2.1
DHCP addr range: 192.168.
Floating IP addr range: 192.168.
Interface: br-ext
Steps executed:
1. sudo snap install openstack --channel 2023.1
(no issues)
2. sunbeam prepare-node-script | bash -x && newgrp snap_daemon
(no issues)
3. sunbeam -v cluster bootstrap --role control --role compute --role storage | tee -a multi_bootstrap
Management networks shared by hosts (CIDRs, separated by comma) (192.168.0.0/24): [default selection]
MetalLB address allocation range (supports multiple ranges, comma separated) (10.20.
Disks to attach to MicroCeph
(/dev/disk/
(hours later...)
DEBUG Application monitored for readiness: ['certificate-
[15:15:27] WARNING Timed out while waiting for model 'openstack' to be ready openstack.py:240
DEBUG Finished running step 'Deploying OpenStack Control Plane'. Result: ResultType.FAILED common.py:260
Error: Timed out while waiting for model 'openstack' to be ready
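Since the bootstrap output was tee'd to multi_bootstrap, the failed steps can be pulled back out with grep afterwards. A throwaway check; the two sample lines are the ones quoted above, written as a fallback in case the file is absent:

```shell
# Extract the step-failure and final error lines from the bootstrap log.
log=multi_bootstrap
[ -f "$log" ] || cat > "$log" <<'EOF'
DEBUG Finished running step 'Deploying OpenStack Control Plane'. Result: ResultType.FAILED common.py:260
Error: Timed out while waiting for model 'openstack' to be ready
EOF
grep -nE "ResultType.FAILED|^Error:" "$log"
```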
ubuntu@opstk2464:~$ juju models
Controller: sunbeam-controller
Model Cloud/Region Type Status Machines Cores Units Access Last connection
admin/controller* sunbeam/default manual available 1 12 4 admin just now
openstack sunbeam-
Only the openstack model is reported as created.
juju status for the openstack model:
ubuntu@opstk2464:~$ juju status -m openstack
Model Controller Cloud/Region Version SLA Timestamp
openstack sunbeam-controller sunbeam-
SAAS Status Store URL
microceph active local admin/controlle
App Version Status Scale Charm Channel Rev Address Exposed Message
certificate-
cinder waiting 1 cinder-k8s 2023.1/stable 47 10.152.183.162 no installing agent
cinder-ceph waiting 1 cinder-ceph-k8s 2023.1/stable 38 10.152.183.90 no installing agent
cinder-
cinder-mysql 8.0.34-
cinder-mysql-router 8.0.34-
glance active 1 glance-k8s 2023.1/stable 59 10.152.183.200 no
glance-mysql 8.0.34-
glance-mysql-router 8.0.34-
horizon active 1 horizon-k8s 2023.1/stable 56 10.152.183.46 no http://
horizon-mysql 8.0.34-
horizon-
keystone active 1 keystone-k8s 2023.1/stable 125 10.152.183.216 no
keystone-mysql 8.0.34-
keystone-
neutron active 1 neutron-k8s 2023.1/stable 53 10.152.183.97 no
neutron-mysql 8.0.34-
neutron-
nova waiting 1 nova-k8s 2023.1/stable 48 10.152.183.240 no installing agent
nova-api-
nova-cell-
nova-mysql 8.0.34-
nova-mysql-router 8.0.34-
ovn-central active 1 ovn-central-k8s 23.03/stable 61 10.152.183.50 no
ovn-relay active 1 ovn-relay-k8s 23.03/stable 49 192.168.0.4 no
placement active 1 placement-k8s 2023.1/stable 43 10.152.183.48 no
placement-mysql 8.0.34-
placement-
rabbitmq 3.9.13 active 1 rabbitmq-k8s 3.9/stable 30 192.168.0.3 no
traefik 2.10.4 active 1 traefik-k8s 1.0/candidate 148 192.168.0.2 no
Unit Workload Agent Address Ports Message
certificate-
cinder-
cinder-ceph/0* blocked idle 10.1.94.163 (workload) Error in charm (see logs): cannot perform the following tasks:
- Start service "cinder-volume" (cannot sta...
cinder-
cinder-mysql/0* active idle 10.1.94.154 Primary
cinder/0* blocked idle 10.1.94.174 (workload) Error in charm (see logs): cannot perform the following tasks:
- Start service "cinder-scheduler" (cannot ...
glance-
glance-mysql/0* active idle 10.1.94.140 Primary
glance/0* active idle 10.1.94.173
horizon-
horizon-mysql/0* active idle 10.1.94.151 Primary
horizon/0* active idle 10.1.94.161
keystone-
keystone-mysql/0* active idle 10.1.94.150 Primary
keystone/0* active idle 10.1.94.157
neutron-
neutron-mysql/0* active idle 10.1.94.148 Primary
neutron/0* active idle 10.1.94.172
nova-api-
nova-cell-
nova-mysql-
nova-mysql/0* active idle 10.1.94.155 Primary
nova/0* error idle 10.1.94.176 hook failed: "amqp-relation-
ovn-central/0* active idle 10.1.94.177
ovn-relay/0* active idle 10.1.94.171
placement-
placement-mysql/0* active idle 10.1.94.142 Primary
placement/0* active idle 10.1.94.158
rabbitmq/0* active idle 10.1.94.162
traefik/0* active idle 10.1.94.144
Offer Application Charm Rev Connected Endpoint Interface Role
certificate-
keystone keystone keystone-k8s 125 0/0 identity-
ovn-relay ovn-relay ovn-relay-k8s 49 0/0 ovsdb-cms-relay ovsdb-cms provider
rabbitmq rabbitmq rabbitmq-k8s 30 0/0 amqp rabbitmq provider
ubuntu@opstk2464:~$
It failed with cinder-k8s and cinder-ceph-k8s stuck in waiting / "installing agent"; the affected juju units:
- cinder-ceph/0* blocked: cannot start service "cinder-volume"
- cinder/0* blocked: cannot start service "cinder-scheduler"
- nova/0* error: hook failed: "amqp-relation-
But microceph is running
ubuntu@opstk2464:~$ sudo microceph status
MicroCeph deployment summary:
- opstk2464 (192.168.0.2)
Services: mds, mgr, mon, osd
Disks: 4
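The "see logs" messages point at the charm and workload logs. These are the invocations for the three failing units (printed rather than executed here, since they need the live cluster; unit, pod, and container names are taken from the status output above):

```shell
# Print the log-collection commands for the blocked/errored units.
for cmd in \
  'juju debug-log -m openstack --replay --include cinder/0 --include cinder-ceph/0 --include nova/0' \
  'sudo microk8s.kubectl -n openstack logs cinder-0 -c cinder-scheduler --tail=100' \
  'sudo microk8s.kubectl -n openstack logs cinder-ceph-0 -c cinder-volume --tail=100' \
  'sudo microk8s.kubectl -n openstack logs nova-0 -c nova-conductor --tail=100'
do
  printf '%s\n' "$cmd"
done
```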
According to https:/
Openstack-
ubuntu@opstk2464:~$ juju status -m openstack-
ERROR model sunbeam-
ubuntu@opstk2464:~$
ubuntu@opstk2464:~$ sudo systemctl status snap.openstack-
ubuntu@opstk2464:~$
MicroK8s status:
ubuntu@opstk2464:~$ sudo systemctl status snap.openstack-
ubuntu@opstk2464:~$ sudo microk8s status
microk8s is running
high-availability: no
datastore master nodes: 127.0.0.1:19001
datastore standby nodes: none
addons:
enabled:
dns # (core) CoreDNS
ha-cluster # (core) Configure high availability on the current node
helm # (core) Helm - the package manager for Kubernetes
helm3 # (core) Helm 3 - the package manager for Kubernetes
hostpath-
metallb # (core) Loadbalancer for your Kubernetes cluster
storage # (core) Alias to hostpath-storage add-on, deprecated
disabled:
cert-manager # (core) Cloud native certificate management
community # (core) The community addons repository
dashboard # (core) The Kubernetes dashboard
host-access # (core) Allow Pods connecting to Host services smoothly
ingress # (core) Ingress controller for external access
mayastor # (core) OpenEBS MayaStor
metrics-server # (core) K8s Metrics Server for API access to service metrics
minio # (core) MinIO object storage
observability # (core) A lightweight observability stack for logs, traces and metrics
prometheus # (core) Prometheus operator for monitoring and logging
rbac # (core) Role-Based Access Control for authorisation
registry # (core) Private image registry exposed on localhost:32000
sudo microk8s inspect
(report is attached)
ubuntu@opstk2464:~$ sudo microk8s.kubectl get pods --namespace openstack
NAME READY STATUS RESTARTS AGE
modeloperator-
certificate-
ovn-relay-0 2/2 Running 0 3h33m
keystone-0 2/2 Running 0 3h34m
horizon-
horizon-mysql-0 2/2 Running 0 3h34m
placement-mysql-0 2/2 Running 0 3h34m
cinder-
glance-
glance-mysql-0 2/2 Running 0 3h34m
neutron-
nova-cell-
nova-mysql-router-0 2/2 Running 0 3h33m
keystone-
nova-api-
nova-mysql-0 2/2 Running 0 3h34m
cinder-
placement-
ovn-central-0 4/4 Running 0 3h32m
rabbitmq-0 2/2 Running 0 3h33m
traefik-0 2/2 Running 0 3h34m
horizon-0 2/2 Running 0 3h33m
cinder-mysql-0 2/2 Running 0 3h34m
placement-0 2/2 Running 0 3h33m
keystone-mysql-0 2/2 Running 0 3h34m
glance-0 2/2 Running 0 3h33m
neutron-0 2/2 Running 0 3h33m
nova-0 4/4 Running 0 3h32m
cinder-ceph-0 2/2 Running 0 3h33m
cinder-0 3/3 Running 0 3h32m
neutron-mysql-0 2/2 Running 0 3h34m
ubuntu@opstk2464:~$
ubuntu@opstk2464:~$ sudo microk8s.kubectl get pod --namespace openstack -o jsonpath=
charm cinder-api cinder-scheduler
(cinder-
ubuntu@opstk2464:~$ sudo microk8s.kubectl get pod --namespace openstack -o jsonpath=
charm cinder-volume
(cinder-
ubuntu@opstk2464:~$ sudo microk8s.kubectl get pod --namespace openstack -o jsonpath=
charm nova-api nova-conductor nova-scheduler
(nova-api_
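For reference, the container names behind each pod (what the truncated jsonpath queries above return) come from a standard kubectl jsonpath query; the pod name below is just one example from the list, and the command is printed rather than run:

```shell
# Build the kubectl jsonpath query that lists a pod's container names
# (for cinder-0 this reported: charm cinder-api cinder-scheduler).
pod=cinder-0
cmd="sudo microk8s.kubectl get pod $pod -n openstack -o jsonpath='{.spec.containers[*].name}'"
printf '%s\n' "$cmd"
```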
No locked Terraform plans:
ubuntu@opstk2464:~$ sunbeam inspect plans
┏━━━━━━
┃ Plan ┃ Locked ┃
┡━━━━━━
│ microceph-plan │ │
│ microk8s-plan │ │
│ openstack-plan │ │
│ sunbeam-
└──────
[Note: I also tried again after performing step 12, Teardown (https:/
ubuntu@opstk2464:~$ cat reset_disks.sh
#!/bin/bash
disks="sda sdb sdc sdd"
for d in $disks; do echo wipe disk /dev/$d;sudo wipefs -af /dev/$d; (echo gwq | sudo fdisk /dev/$d); done
### Deploy using edge channel:
sudo snap install openstack --edge
Problems remained (a few hours later...)
Looks like I have the same problem. I tried:
2023.1/stable
2023.1/edge
2023.2/edge
in various configurations; all fail to bootstrap with:
nova/0* error idle 10.1.12.49 hook failed: "cell-database-relation-changed"
where the failing hook varies.
The logsink.log shows several of these:
- Start service "nova-scheduler" (cannot start service: exited quickly with code 1)
----- Logs from task 0 -----
2023-10-24T05:25:37Z INFO Most recent service output:
(...)
2023-10-24 05:25:37.612 63 ERROR nova     result = self._query(query)
2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 310, in _query
2023-10-24 05:25:37.612 63 ERROR nova     conn.query(q)
2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 548, in query
2023-10-24 05:25:37.612 63 ERROR nova     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 775, in _read_query_result
2023-10-24 05:25:37.612 63 ERROR nova     result.read()
2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1156, in read
2023-10-24 05:25:37.612 63 ERROR nova     first_packet = self.connection._read_packet()
2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 725, in _read_packet
2023-10-24 05:25:37.612 63 ERROR nova     packet.raise_for_error()
2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/protocol.py", line 221, in raise_for_error
2023-10-24 05:25:37.612 63 ERROR nova     err.raise_mysql_exception(self._data)
2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/err.py", line 143, in raise_mysql_exception
2023-10-24 05:25:37.612 63 ERROR nova     raise errorclass(errno, errval)
2023-10-24 05:25:37.612 63 ERROR nova sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1146, "Table 'nova_api.cell_mappings' doesn't exist")
2023-10-24 05:25:37.612 63 ERROR nova [SQL: SELECT cell_mappings.created_at AS cell_mappings_created_at, cell_mappings.updated_at AS cell_mappings_updated_at, cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS cell_mappings_uuid, cell_mappings.name AS cell_mappings_name, cell_mappings.transport_url AS cell_mappings_transport_url, cell_mappings.database_connection AS cell_mappings_database_connection, cell_mappings.disabled AS cell_mappings_disabled
2023-10-24 05:25:37.612 63 ERROR nova FROM cell_mappings ORDER BY cell_mappings.id ASC]
2023-10-24 05:25:37.612 63 ERROR nova (Background on this error at: https://sqlalche.me/e/14/f405)
2023-10-24T05:25:37Z ERROR cannot start service: exited quickly with code 1
-----
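If I read the traceback right, the nova_api schema was never populated: cell_mappings is one of the tables created by "nova-manage api_db sync", which the charm presumably runs during setup. Two ways to check from the host (pod and container names from the pod list earlier in this report; the MySQL side needs whatever credentials the charm configured, so treat these as a sketch, printed rather than executed):

```shell
# Print the two checks: "nova-manage api_db version" reports the API database
# schema version; the mysql query lists whatever tables exist in nova_api.
check1='sudo microk8s.kubectl -n openstack exec nova-0 -c nova-conductor -- nova-manage api_db version'
check2='sudo microk8s.kubectl -n openstack exec nova-mysql-0 -c mysql -- mysql -e "SHOW TABLES FROM nova_api;"'
printf '%s\n' "$check1" "$check2"
```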
Any ideas what's wrong?