OpenStack Snap

Multi-node sunbeam bootstrap on physical and libvirt machines fail

Bug #2038566 reported by Manuel Eurico Paula on 2023-10-05

This bug affects 4 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Snap	New	Undecided	Unassigned

Bug Description

Hello, it's been 2 weeks not able to have success to bootstrp microstack by following:
1. Single node installation: https://microstack.run/docs/single-node
2. Single node guided: https://microstack.run/docs/single-node-guided
3. Multi-node: https://microstack.run/docs/multi-node

after seeing the Microstack launch demo at https://www.youtube.com/watch?v=ifDtBM_EHPE

I was trying to set the microstack in a newly acquired test physical machine according to:
https://microstack.run/docs/enterprise-reqs

my hardware specs is on attached file: phys_hw.txt, meets the requirements.

Physical host network configuration can be found on netplan.txt
Disks for OSD are wiped using:
#!/bin/bash

disks="sda sdb sdc sdd"
for d in $disks; do echo wipe disk /dev/$d;sudo wipefs -af /dev/$d; (echo gwq | sudo fdisk /dev/$d); done

Using multi-node procedure for:
== Control plane networking i have:
CIDR: 192.168.0.0/24
Gateway: 192.168.0.1
DHCP addr range: 192.168.0.30-192.168.0.199
Control plane addrr range: 192.168.0.2-192.168.0.29
Interface: br0

==External networking
CIDR: 192.168.2.0/24
Gateway: 192.168.2.1
DHCP addr range: 192.168.2.2-192.168.2.29
Floating IP addrr range: 192.168.2.30-192.168.0.199
Interface: br-ext

Steps execution:
1. sudo snap install openstack --channel 2023.1
(no issues)

2. sunbeam prepare-node-script | bash -x && newgrp snap_daemon
(no issues)

3. sunbeam -v cluster bootstrap --role control --role compute --role storage | tee -a multi_bootstrap_log.txt
Management networks shared by hosts (CIDRs, separated by comma) (192.168.0.0/24): [default selection]
MetalLB address allocation range (supports multiple ranges, comma separated) (10.20.21.10-10.20.21.20): 192.168.0.2-192.168.0.29
Disks to attach to MicroCeph
(/dev/disk/by-id/wwn-0x58ce38ec01e26c31,/dev/disk/by-id/wwn-0x58ce38ec01e274ab,/dev/disk/by-id/wwn-0x58ce38ec01e26abc,/dev/disk/by-id/wwn-0x58ce38ec01ea6bb6): [default selection]

(hours later...)

                    ca-offer-url = "opstk2464/openstack.certificate-authority"
                    keystone-offer-url = "opstk2464/openstack.keystone"
                    ovn-relay-offer-url = "opstk2464/openstack.ovn-relay"
                    rabbitmq-offer-url = "opstk2464/openstack.rabbitmq"
                    , stderr=
           DEBUG Application monitored for readiness: ['certificate-authority', openstack.py:229
                    'keystone-mysql-router', 'glance-mysql', 'traefik', 'placement-mysql',
                    'neutron-mysql', 'keystone-mysql', 'cinder-mysql', 'horizon-mysql', 'nova-mysql',
                    'horizon-mysql-router', 'keystone', 'placement-mysql-router',
                    'cinder-ceph-mysql-router', 'placement', 'rabbitmq', 'horizon', 'cinder-ceph',
                    'glance', 'ovn-central', 'glance-mysql-router', 'nova-cell-mysql-router',
                    'ovn-relay', 'nova-mysql-router', 'neutron-mysql-router', 'nova-api-mysql-router',
                    'cinder-mysql-router', 'neutron', 'nova', 'cinder']
[15:15:27] WARNING Timed out while waiting for model 'openstack' to be ready openstack.py:240
           DEBUG Finished running step 'Deploying OpenStack Control Plane'. Result: ResultType.FAILED common.py:260
Error: Timed out while waiting for model 'openstack' to be ready

ubuntu@opstk2464:~$ juju models
Controller: sunbeam-controller

Model Cloud/Region Type Status Machines Cores Units Access Last connection
admin/controller* sunbeam/default manual available 1 12 4 admin just now
openstack sunbeam-microk8s/localhost kubernetes available 0 - 30 admin 3 hours ago

reports only openstack model is created.

juju status model openstack model:

ubuntu@opstk2464:~$ juju status -m openstack
Model Controller Cloud/Region Version SLA Timestamp
openstack sunbeam-controller sunbeam-microk8s/localhost 3.2.0 unsupported 17:38:19Z

SAAS Status Store URL
microceph active local admin/controller.microceph

App Version Status Scale Charm Channel Rev Address Exposed Message
certificate-authority active 1 tls-certificates-operator latest/stable 22 10.152.183.23 no
cinder waiting 1 cinder-k8s 2023.1/stable 47 10.152.183.162 no installing agent
cinder-ceph waiting 1 cinder-ceph-k8s 2023.1/stable 38 10.152.183.90 no installing agent
cinder-ceph-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.196 no
cinder-mysql 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/candidate 99 10.152.183.83 no
cinder-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.19 no
glance active 1 glance-k8s 2023.1/stable 59 10.152.183.200 no
glance-mysql 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/candidate 99 10.152.183.64 no
glance-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.84 no
horizon active 1 horizon-k8s 2023.1/stable 56 10.152.183.46 no http://192.168.0.2/openstack-horizon
horizon-mysql 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/candidate 99 10.152.183.30 no
horizon-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.18 no
keystone active 1 keystone-k8s 2023.1/stable 125 10.152.183.216 no
keystone-mysql 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/candidate 99 10.152.183.164 no
keystone-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.168 no
neutron active 1 neutron-k8s 2023.1/stable 53 10.152.183.97 no
neutron-mysql 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/candidate 99 10.152.183.134 no
neutron-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.161 no
nova waiting 1 nova-k8s 2023.1/stable 48 10.152.183.240 no installing agent
nova-api-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.52 no
nova-cell-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.157 no
nova-mysql 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/candidate 99 10.152.183.34 no
nova-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.73 no
ovn-central active 1 ovn-central-k8s 23.03/stable 61 10.152.183.50 no
ovn-relay active 1 ovn-relay-k8s 23.03/stable 49 192.168.0.4 no
placement active 1 placement-k8s 2023.1/stable 43 10.152.183.48 no
placement-mysql 8.0.34-0ubuntu0.22.04.1 active 1 mysql-k8s 8.0/candidate 99 10.152.183.242 no
placement-mysql-router 8.0.34-0ubuntu0.22.04.1 active 1 mysql-router-k8s 8.0/candidate 64 10.152.183.68 no
rabbitmq 3.9.13 active 1 rabbitmq-k8s 3.9/stable 30 192.168.0.3 no
traefik 2.10.4 active 1 traefik-k8s 1.0/candidate 148 192.168.0.2 no

Unit Workload Agent Address Ports Message
certificate-authority/0* active idle 10.1.94.135
cinder-ceph-mysql-router/0* active idle 10.1.94.159
cinder-ceph/0* blocked idle 10.1.94.163 (workload) Error in charm (see logs): cannot perform the following tasks:
- Start service "cinder-volume" (cannot sta...
cinder-mysql-router/0* active idle 10.1.94.169
cinder-mysql/0* active idle 10.1.94.154 Primary
cinder/0* blocked idle 10.1.94.174 (workload) Error in charm (see logs): cannot perform the following tasks:
- Start service "cinder-scheduler" (cannot ...
glance-mysql-router/0* active idle 10.1.94.164
glance-mysql/0* active idle 10.1.94.140 Primary
glance/0* active idle 10.1.94.173
horizon-mysql-router/0* active idle 10.1.94.149
horizon-mysql/0* active idle 10.1.94.151 Primary
horizon/0* active idle 10.1.94.161
keystone-mysql-router/0* active idle 10.1.94.136
keystone-mysql/0* active idle 10.1.94.150 Primary
keystone/0* active idle 10.1.94.157
neutron-mysql-router/0* active idle 10.1.94.167
neutron-mysql/0* active idle 10.1.94.148 Primary
neutron/0* active idle 10.1.94.172
nova-api-mysql-router/0* active idle 10.1.94.168
nova-cell-mysql-router/0* active idle 10.1.94.165
nova-mysql-router/0* active idle 10.1.94.166
nova-mysql/0* active idle 10.1.94.155 Primary
nova/0* error idle 10.1.94.176 hook failed: "amqp-relation-changed"
ovn-central/0* active idle 10.1.94.177
ovn-relay/0* active idle 10.1.94.171
placement-mysql-router/0* active idle 10.1.94.156
placement-mysql/0* active idle 10.1.94.142 Primary
placement/0* active idle 10.1.94.158
rabbitmq/0* active idle 10.1.94.162
traefik/0* active idle 10.1.94.144

Offer Application Charm Rev Connected Endpoint Interface Role
certificate-authority certificate-authority tls-certificates-operator 22 0/0 certificates tls-certificates provider
keystone keystone keystone-k8s 125 0/0 identity-credentials keystone-credentials provider
ovn-relay ovn-relay ovn-relay-k8s 49 0/0 ovsdb-cms-relay ovsdb-cms provider
rabbitmq rabbitmq rabbitmq-k8s 30 0/0 amqp rabbitmq provider
ubuntu@opstk2464:~$

failed on cinder-k8s installing agent waiting
failed on cinder-ceph installing agent waiting
on juju applications

cinder-ceph/0* blocked idle 10.1.94.163 (workload) Error in charm (see logs): cannot perform the following tasks: - Start service "cinder-volume" (cannot sta...
cinder/0* blocked idle 10.1.94.174 (workload) Error in charm (see logs): cannot perform the following tasks: - Start service "cinder-scheduler" (cannot ...
nova/0* error idle 10.1.94.176 hook failed: "amqp-relation-changed"

error on cinder-ceph/0 w/ cannot start cinder-volume
error on cinder/0 w/ cannot start cinder-scheduler
error on nova/0 w/ hook failed: "amqp-relation-changed" (varies)

But microceph is running

ubuntu@opstk2464:~$ sudo microceph status
MicroCeph deployment summary:
- opstk2464 (192.168.0.2)
Services: mds, mgr, mon, osd
Disks: 4

Accordidng to https://microstack.run/docs/inspect

Openstack-hypervisor is not bootstraped:
ubuntu@opstk2464:~$ juju status -m openstack-hypervisor
ERROR model sunbeam-controller:opstk2464/openstack-hypervisor not found
ubuntu@opstk2464:~$

ubuntu@opstk2464:~$ sudo systemctl status snap.openstack-hypervisor.*
ubuntu@opstk2464:~$

Microk8s set:
ubuntu@opstk2464:~$ sudo ubuntu@opstk2464:~$ sudo microk8s is running
high-availability: no
  datastore master   datastore standby addons:
  enabled:
    dns     ha-cluster     helm     helm3     hostpath-     cert-manager     community     dashboard     host-access     ingress     mayastor     metrics-server     minio     observability     prometheus     rbac     registry systemctl status snap.openstack-hypervisor.*
microk8s status
nodes: 127.0.0.1:19001
nodes: none
# (core) CoreDNS
# (core) Configure high availability on the current node
# (core) Helm - the package manager for Kubernetes
# (core) Helm 3 - the package manager for Kubernetes
/>storage # (core) Storage class; allocates storage from host directory
# (core) Loadbalancer for your Kubernetes cluster
# (core) Alias to hostpath-storage add-on, deprecated
# (core) Cloud native certificate management
# (core) The community addons repository
# (core) The Kubernetes dashboard
# (core) Allow Pods connecting to Host services smoothly
# (core) Ingress controller for external access
# (core) OpenEBS MayaStor
# (core) K8s Metrics Server for API access to service metrics
# (core) MinIO object storage
# (core) A lightweight observability stack for logs, traces and metrics
# (core) Prometheus operator for monitoring and logging
# (core) Role-Based Access Control for authorisation
# (core) Private image registry exposed on localhost:32000

sudo microk8s inspect
(report is attached)

ubuntu@opstk2464:~$ sudo microk8s.kubectl get pods --namespace openstack
NAME READY STATUS RESTARTS AGE
modeloperator-797bd6575b-jplvt 1/1 Running 0 3h36m
certificate-authority-0 1/1 Running 0 3h35m
ovn-relay-0 2/2 Running 0 3h33m
keystone-0 2/2 Running 0 3h34m
horizon-mysql-router-0 2/2 Running 0 3h34m
horizon-mysql-0 2/2 Running 0 3h34m
placement-mysql-0 2/2 Running 0 3h34m
cinder-mysql-router-0 2/2 Running 0 3h33m
glance-mysql-router-0 2/2 Running 0 3h33m
glance-mysql-0 2/2 Running 0 3h34m
neutron-mysql-router-0 2/2 Running 0 3h33m
nova-cell-mysql-router-0 2/2 Running 0 3h33m
nova-mysql-router-0 2/2 Running 0 3h33m
keystone-mysql-router-0 2/2 Running 0 3h34m
nova-api-mysql-router-0 2/2 Running 0 3h33m
nova-mysql-0 2/2 Running 0 3h34m
cinder-ceph-mysql-router-0 2/2 Running 0 3h33m
placement-mysql-router-0 2/2 Running 0 3h34m
ovn-central-0 4/4 Running 0 3h32m
rabbitmq-0 2/2 Running 0 3h33m
traefik-0 2/2 Running 0 3h34m
horizon-0 2/2 Running 0 3h33m
cinder-mysql-0 2/2 Running 0 3h34m
placement-0 2/2 Running 0 3h33m
keystone-mysql-0 2/2 Running 0 3h34m
glance-0 2/2 Running 0 3h33m
neutron-0 2/2 Running 0 3h33m
nova-0 4/4 Running 0 3h32m
cinder-ceph-0 2/2 Running 0 3h33m
cinder-0 3/3 Running 0 3h32m
neutron-mysql-0 2/2 Running 0 3h34m
ubuntu@opstk2464:~$

ubuntu@opstk2464:~$ sudo microk8s.kubectl get pod --namespace openstack -o jsonpath="{.spec.containers[*].name}" cinder-0
charm cinder-api cinder-scheduler
(cinder-api-cinder0.log & cinder-scheduler_cinder0.log attached)

ubuntu@opstk2464:~$ sudo microk8s.kubectl get pod --namespace openstack -o jsonpath="{.spec.containers[*].name}" cinder-ceph-0
charm cinder-volume
(cinder-volume_cinder-ceph-0.log attached)

ubuntu@opstk2464:~$ sudo microk8s.kubectl get pod --namespace openstack -o jsonpath="{.spec.containers[*].name}" nova-0
charm nova-api nova-conductor nova-scheduler
(nova-api_nova-0.log + nova-conductor_nova-0.log + nova-scheduler_nova-0.log attached)

No locked terraform plans:
ubuntu@opstk2464:~$ sunbeam inspect plans
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Plan ┃ Locked ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ microceph-plan │ │
│ microk8s-plan │ │
│ openstack-plan │ │
│ sunbeam-machine-plan │ │
└──────────────────────┴────────┘

[note also tried with after 12 Teardown (https://ubuntu.com/openstack/tutorials) and wipe all osd SATA disks with:

ubuntu@opstk2464:~$ cat reset_disks.sh
#!/bin/bash

disks="sda sdb sdc sdd"

for d in $disks; do echo wipe disk /dev/$d;sudo wipefs -af /dev/$d; (echo gwq | sudo fdisk /dev/$d); done

### Deploy using edge channel:
sudo snap openstack --edge

Problems remained (after a few hours later...)

Revision history for this message

Manuel Eurico Paula (manuel-paula) wrote on 2023-10-05:

all_logs_errors bundle Edit (1.1 MiB, application/x-tar)

Revision history for this message

Stephen Tressalk (stekl) wrote on 2023-10-24 (last edit on 2023-10-24):

Looks like I have the same problem. I tried:

2023.1/stable
2023.1/edge
2023.2/edge

in various configurations, all fail to bootstrap with:

nova/0* error idle 10.1.12.49 hook failed: "cell-database-relation-changed"

where the 'hook failed' might change.

The logsink.log shows several of these:

- Start service "nova-scheduler" (cannot start service: exited quickly with code 1)
----- Logs from task 0 -----
2023-10-24T05:25:37Z INFO Most recent service output:
    (...)
    2023-10-24 05:25:37.612 63 ERROR nova result = self._query(query)
    2023-10-24 05:25:37.612 63 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 310, in _query
    2023-10-24 05:25:37.612 63 ERROR nova conn.query(q)
    2023-10-24 05:25:37.612 63 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 548, in qu
ery
    2023-10-24 05:25:37.612 63 ERROR nova self._affected_rows = self._read_query_result(unbuffered=unbuffered)
    2023-10-24 05:25:37.612 63 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 775, in _r
ead_query_result
    2023-10-24 05:25:37.612 63 ERROR nova result.read()
    2023-10-24 05:25:37.612 63 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1156, in r
ead
    2023-10-24 05:25:37.612 63 ERROR nova first_packet = self.connection._read_packet()
    2023-10-24 05:25:37.612 63 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 725, in _r
ead_packet
    2023-10-24 05:25:37.612 63 ERROR nova packet.raise_for_error()
    2023-10-24 05:25:37.612 63 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/protocol.py", line 221, in raise
_for_error
    2023-10-24 05:25:37.612 63 ERROR nova err.raise_mysql_exception(self._data)
    2023-10-24 05:25:37.612 63 ERROR nova File "/usr/lib/python3/dist-packages/pymysql/err.py", line 143, in raise_mysq
l_exception
    2023-10-24 05:25:37.612 63 ERROR nova raise errorclass(errno, errval)
    2023-10-24 05:25:37.612 63 ERROR nova sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1146, "Table '
nova_api.cell_mappings' doesn't exist")
    2023-10-24 05:25:37.612 63 ERROR nova [SQL: SELECT cell_mappings.created_at AS cell_mappings_created_at, cell_mapping
s.updated_at AS cell_mappings_updated_at, cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS cell_mappings_uuid,
cell_mappings.name AS cell_mappings_name, cell_mappings.transport_url AS cell_mappings_transport_url, cell_mappings.data
base_connection AS cell_mappings_database_connection, cell_mappings.disabled AS cell_mappings_disabled
    2023-10-24 05:25:37.612 63 ERROR nova FROM cell_mappings ORDER BY cell_mappings.id ASC]
    2023-10-24 05:25:37.612 63 ERROR nova (Background on this error at: https://sqlalche.me/e/14/f405)
    2023-10-24 05:25:37.612 63 ERROR nova ESC[00m
2023-10-24T05:25:37Z ERROR cannot start service: exited quickly with code 1
-----

Any ideas what's wrong?

Looks like I have the same problem. I tried:

2023.1/stable
2023.1/edge
2023.2/edge

in various configurations, all fail to bootstrap with:

nova/0*                    error   idle  10.1.12.49    hook failed: "cell-database-relation-changed"

where the 'hook failed' might change.

The logsink.log shows several of these:

- Start service "nova-scheduler" (cannot start service: exited quickly with code 1)
----- Logs from task 0 -----
2023-10-24T05:25:37Z INFO Most recent service output:
    (...)
    2023-10-24 05:25:37.612 63 ERROR nova     result = self._query(query)
    2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 310, in _query
    2023-10-24 05:25:37.612 63 ERROR nova     conn.query(q)
    2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 548, in qu
ery
    2023-10-24 05:25:37.612 63 ERROR nova     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
    2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 775, in _r
ead_query_result
    2023-10-24 05:25:37.612 63 ERROR nova     result.read()
    2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1156, in r
ead
    2023-10-24 05:25:37.612 63 ERROR nova     first_packet = self.connection._read_packet()
    2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 725, in _r
ead_packet
    2023-10-24 05:25:37.612 63 ERROR nova     packet.raise_for_error()
    2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/protocol.py", line 221, in raise
_for_error
    2023-10-24 05:25:37.612 63 ERROR nova     err.raise_mysql_exception(self._data)
    2023-10-24 05:25:37.612 63 ERROR nova   File "/usr/lib/python3/dist-packages/pymysql/err.py", line 143, in raise_mysq
l_exception
    2023-10-24 05:25:37.612 63 ERROR nova     raise errorclass(errno, errval)
    2023-10-24 05:25:37.612 63 ERROR nova sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1146, "Table '
nova_api.cell_mappings' doesn't exist")
    2023-10-24 05:25:37.612 63 ERROR nova [SQL: SELECT cell_mappings.created_at AS cell_mappings_created_at, cell_mapping
s.updated_at AS cell_mappings_updated_at, cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS cell_mappings_uuid,
 cell_mappings.name AS cell_mappings_name, cell_mappings.transport_url AS cell_mappings_transport_url, cell_mappings.data
base_connection AS cell_mappings_database_connection, cell_mappings.disabled AS cell_mappings_disabled 
    2023-10-24 05:25:37.612 63 ERROR nova FROM cell_mappings ORDER BY cell_mappings.id ASC]
    2023-10-24 05:25:37.612 63 ERROR nova (Background on this error at: https://sqlalche.me/e/14/f405)
    2023-10-24 05:25:37.612 63 ERROR nova ESC[00m
2023-10-24T05:25:37Z ERROR cannot start service: exited quickly with code 1
-----

Any ideas what's wrong?

Revision history for this message

Hemanth Nakkina (hemanth-n) wrote on 2023-10-25:

Can you provide the following logs.

juju debug-log -m openstack -i unit-cinder-0 --replay
juju debug-log -m openstack -i unit-cinder-ceph-0 --replay
juju debug-log -m openstack -i unit-nova-0 --replay

(I see you have attached juju_debug_log.txt but it is empty)

Revision history for this message

Stephen Tressalk (stekl) wrote on 2023-10-25:

unit-cinder-0.log Edit (20.8 KiB, text/plain)

Hi,

here are my logs. Note that I replaced my public IP address by a random one in the logs.

Revision history for this message

Stephen Tressalk (stekl) wrote on 2023-10-25:

unit-cinder-ceph-0.log Edit (20.0 KiB, text/plain)

Revision history for this message

Stephen Tressalk (stekl) wrote on 2023-10-25:

unit-nova-0.log Edit (91.0 KiB, text/plain)

Revision history for this message

Stephen Tressalk (stekl) wrote on 2023-10-27:

I’ve got it managed to bootstrap one time yesterday, with the same configuration that failed about 50 times before. Unfortunately this did not survive a reboot. It was unusable then. The next bootstrap failed the same way all the other ones failed. I have 20 CPU cores on my machine, so maybe there is a race in the bootstrap.

Revision history for this message

James Page (james-page) wrote on 2023-10-27:

#10

Hi Stephen

Thanks for the logs - looking through the nova log it looks like we might have a bug in the charm code - the mandatory relations for the service are logging as incomplete, however the nova-scheduler service is still being started by the charm which is not right (and needs further debug and resolution).

You detailed that the machine you are using has lots of cores - what type of storage does it have? This smells like another issue I was helping a different user.

Revision history for this message

Stephen Tressalk (stekl) wrote on 2023-10-27:

#11

Hi James,

I have two 500GB nvme devices on a raid 1 for the system and an empty 2TB nvme device for Ceph.
Let me know if you need additional information.

Revision history for this message

Manuel Eurico Paula (manuel-paula) wrote on 2023-11-02:

#12

Download full text (3.9 KiB)

Hello everybody,

Sorry to respond late. Just returned to work from a recent event.

In my case I have 5 disks (is listed in the logs bundle i previously submited phys_hw.txt):
1x nvme 1T for OS
4x kioxia ssd (1x reserved (destinated) for cinder and 3x for micro.ceph).

nova-scheduler_nova-0.log shows:
multiple tranches of:
2023-10-05T17:58:53.673Z [nova-scheduler] Modules with known eventlet monkey patching issues were imported prior to eventlet monkey patching: urllib3. This warning can usually be ignored if the caller is only importing and not executing nova code.
2023-10-05T17:58:54.084Z [nova-scheduler] 2023-10-05 17:58:54.082 143 CRITICAL nova [None req-6da8ed32-f895-40d7-bd40-fd6a001429a8 - - - - - -] Unhandled error: sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1146, "Table 'nova_api.cell_mappings' doesn't exist")

with referece of nova_api.cell_mappings table does not exists.

cinder-scheduler_cinder-0.log shows:
2023-10-05T14:24:51.110Z [pebble] Service "cinder-scheduler" starting: cinder-scheduler --use-syslog
2023-10-05T14:24:51.838Z [cinder-scheduler] 2023-10-05 14:24:51.835 15 CRITICAL cinder [None req-a9a9208a-5dd5-458a-9f4f-71d418e60405 - - - - - -] Unhandled error: sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1146, "Table 'cinder.services' doesn't exist")
2023-10-05T14:24:51.838Z [cinder-scheduler] [SQL: SELECT services.created_at AS services_created_at, services.deleted_at AS services_deleted_at, services.deleted AS services_deleted, services.id AS services_id, services.uuid AS services_uuid, services.cluster_name AS services_cluster_name, services.host AS services_host, services.`binary` AS services_binary, services.updated_at AS services_updated_at, services.topic AS services_topic, services.report_count AS services_report_count, services.disabled AS services_disabled, services.availability_zone AS services_availability_zone, services.disabled_reason AS services_disabled_reason, services.modified_at AS services_modified_at, services.rpc_current_version AS services_rpc_current_version, services.object_current_version AS services_object_current_version, services.replication_status AS services_replication_status, services.active_backend_id AS services_active_backend_id, services.frozen AS services_frozen

Other bug subscribers

Bug attachments

Add attachment

Remote bug watches

Bug watches keep track of this bug in other bug trackers.