microstack bootstrap fails on timeout

Bug #2062993 reported by Pokkihju
This bug affects 2 people
Affects: OpenStack Snap
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

Hello,

While trying to deploy a microstack cluster with storage on a single node, I encountered the following issue: the bootstrap fails with a timeout.
Here is the juju status of cinder-ceph and glance in the openstack model:

cinder-ceph/0* waiting idle 10.1.110.152 (workload) Not all relations are ready
glance/0* waiting idle 10.1.110.159 (ceph) integration incomplete

All other units are active and OK.
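To pick out stuck units like the two above from a longer status listing, a small filter can help. This is only a sketch: the function name `non_active_units` is made up for illustration, and it assumes unit lines have the same column layout as the excerpt above (unit name, workload status, agent status, address, message).

```shell
# Print the unit name and workload status of every unit that is not "active".
# Assumption: unit lines look like the excerpt above, i.e.
#   <app>/<n>[*] <workload> <agent> <address> <message...>
# Lines whose first field has no "/" (headers, application rows) are skipped.
non_active_units() {
  awk '$1 ~ /\// && $2 != "active" { print $1, $2 }'
}

# Example (needs a bootstrapped deployment):
#   juju status -m openstack | non_active_units
```

The filter reads from standard input, so it can also be run against a saved copy of the status output.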

I have tried multiple times using openstack channels 2023.1 and 2023.2. I have yet to test 2024.1.

Is there anything I can do, or are there logs I can add, that would help? (I have tried `juju debug-log -m admin/controller --replay | grep microceph | grep Error`, and it does not show the same error as bug https://bugs.launchpad.net/snap-openstack/+bug/2023664.)

Thanks in advance

Guillaume Boutry (gboutry) wrote :

Hi, can you be more specific about the versions you used?

Can you try with:

```
snap install openstack --channel 2023.2/candidate
sunbeam -v cluster bootstrap --manifest /snap/openstack/current/etc/manifests/candidate.yml
```

Can you please provide the logs from `juju debug-log -m admin/controller --replay` and `juju debug-log -m openstack --replay`?
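For attaching the logs to a bug report, it may be easier to save each one to a file. The sketch below assumes the `--no-tail` option of `juju debug-log`, which makes the command exit after printing the existing messages instead of following the log. The function name, the output file names, and the `JUJU` override variable are all hypothetical and exist only for illustration.

```shell
# Save a finite copy of both model logs into a directory (default: current dir).
# JUJU can be overridden for a dry run, e.g. JUJU=echo; by default it is "juju".
# The file names controller.log and openstack.log are arbitrary choices.
save_juju_logs() {
  outdir=${1:-.}
  mkdir -p "$outdir"
  "${JUJU:-juju}" debug-log -m admin/controller --replay --no-tail > "$outdir/controller.log"
  "${JUJU:-juju}" debug-log -m openstack --replay --no-tail > "$outdir/openstack.log"
}

# Example (needs a bootstrapped deployment):
#   save_juju_logs ./sunbeam-logs
```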

Changed in snap-openstack:
status: New → Incomplete

Pokkihju (pokkihju) wrote :

Hello,

Sorry for the delay. Here are the logs you asked for. I will update you once I have had time to test your commands.

Pokkihju (pokkihju) wrote :

I could not find how to attach two log files to the same comment, so here is the second part. The files are in the same order as your commands: the admin/controller log is in the previous comment and the openstack log is in this one.

Guillaume Boutry (gboutry) wrote :

The current stable release of sunbeam installs Microceph from `latest/edge`, and rev 43, the one you have installed, introduced new behavior that seems to fail in your environment.
You can see many occurrences of:
unit-microceph-0: 15:10:06 INFO unit.microceph/0.juju-log _on_relation_changed event
unit-microceph-0: 15:10:07 INFO unit.microceph/0.juju-log Storage not available, deferring event.
unit-microceph-0: 15:10:07 INFO unit.microceph/0.juju-log _on_relation_changed event

The candidate channel installs Microceph from `reef/candidate`, which should be more stable.
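To check how often the deferral shown above occurs in a saved log, a simple count is enough. This is a minimal sketch; the function name `count_storage_deferrals` is invented for illustration, and the matched string is copied from the log excerpt above.

```shell
# Count log lines reporting that microceph deferred an event because
# storage was not available. Reads the log from standard input.
# "|| true" keeps the exit status at 0 when grep finds no match
# (grep -c still prints 0 in that case).
count_storage_deferrals() {
  grep -c 'Storage not available, deferring event' || true
}

# Example, assuming the log was saved to a file first:
#   count_storage_deferrals < controller.log
```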

Guillaume Boutry (gboutry) wrote :

It looks like microceph failed to add a disk:

unit-microceph-0: 20:51:08 ERROR unit.microceph/0.juju-log Failed executing cmd: ['microceph', 'disk', 'add', '/dev/sdi1'], error: Error: failed to bootstrap OSD: Failed to run: ceph-osd --mkfs --no-mon-config -i 1: exit status 250 (2024-04-20T20:51:08.503+0000 7f4c637be8c0 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (13) Permission denied

Did you wipe your block device before bootstrapping sunbeam again?
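When the log is long, it can help to list which devices the failed `microceph disk add` calls referred to. The sketch below assumes the error lines have the same shape as the one quoted above; the function name `failed_osd_devices` is hypothetical.

```shell
# List the distinct device paths that appear in failed
# "microceph disk add" commands. Reads the log from standard input.
# Assumption: error lines contain the command rendered exactly as
#   ['microceph', 'disk', 'add', '<device>']
failed_osd_devices() {
  sed -n "s/.*Failed executing cmd: \['microceph', 'disk', 'add', '\([^']*\)'.*/\1/p" \
    | sort -u
}

# Example, assuming the log was saved to a file first:
#   failed_osd_devices < controller.log
```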

Pokkihju (pokkihju) wrote :

Hello, I am currently testing your commands. On my first attempts I ran into issues when trying to use a virtual disk for microceph, which correspond to the logs you mentioned in your last comment.
After looking around, I found this issue:
https://github.com/canonical/microceph/issues/251

I applied the fix mentioned there, restarted microceph, and re-ran the bootstrap command, which then worked.

Pokkihju (pokkihju) wrote :

Update: I have run your bootstrap command, and all the logs are in the attached file.
The status at the end of the command is at the bottom.

It seems that microceph was not installed correctly by your command. Do I need to install it separately?

Pokkihju (pokkihju) wrote :

New update: running with all roles enabled installs microceph properly, and the cluster seems to be up and running. I will play with it a bit, but it seems to have fixed my issue, so thanks.
Command run:
sunbeam -v cluster bootstrap --manifest /snap/openstack/current/etc/manifests/candidate.yml --role control --role compute --role storage

Pokkihju (pokkihju) wrote :

Yet another update: the deployed microceph did not work. Maybe I gave it wrong parameters, or something else I don't know about. However, the following commands gave me a working setup:

# Remove relations of microceph
juju remove-relation -m openstack microceph:ceph cinder-ceph:ceph
juju remove-relation -m openstack microceph:ceph glance:ceph

# remove microceph offer
juju remove-offer microceph

# remove microceph saas
juju remove-saas -m openstack microceph

# remove microceph application
juju remove-application microceph --force --no-wait

# remove microceph snap
sudo snap remove --purge microceph

# deploy microceph
juju deploy microceph --channel reef/candidate --to 0

# Add storage to microceph manually through juju storage
juju add-storage microceph/X osd-standalone='loop,200G,3'

# recreate microceph offer
juju offer microceph:ceph

# re integrate microceph to openstack
juju integrate -m k8s glance:ceph admin/controller.microceph
juju integrate -m k8s cinder-ceph:ceph admin/controller.microceph

After that, you should have a working microceph backed by local loop devices.
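For readers adapting the `juju add-storage` step above: the argument `osd-standalone='loop,200G,3'` is a Juju storage directive of the form name=pool,size,count, so it asks for three 200G volumes from the `loop` pool. A small helper can make the parts explicit. This is only a sketch, and the function name `storage_directive` is made up for illustration.

```shell
# Build a Juju storage directive: <storage-name>=<pool>,<size>,<count>
# Arguments: storage name, pool, size per volume, number of volumes.
storage_directive() {
  name=$1 pool=$2 size=$3 count=$4
  printf '%s=%s,%s,%s\n' "$name" "$pool" "$size" "$count"
}

# Example matching the command above (microceph/X is the same placeholder
# unit name; replace X with the real unit number):
#   juju add-storage microceph/X "$(storage_directive osd-standalone loop 200G 3)"
```

Changing the size or count arguments adjusts how much loop-backed space microceph gets.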
