OpenStack Snap

microstack bootstrap fails on timeout

Bug #2062993 reported by Pokkihju on 2024-04-20

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Snap	Incomplete	Undecided	Unassigned

Bug Description

Hello,

While trying to deploy a microstack cluster with storage on a single node, I encountered the following issue: the bootstrap fails with a timeout.
Here is the juju status of cinder-ceph and glance in the openstack model:

cinder-ceph/0* waiting idle 10.1.110.152 (workload) Not all relations are ready
glance/0* waiting idle 10.1.110.159 (ceph) integration incomplete

All other units are active and OK.
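
To pick units like these out of a longer status listing, a small filter can help. The sketch below is a hypothetical helper (`non_active_units` is not part of juju or sunbeam). It assumes unit lines use the column layout shown above (unit name, then workload status) and reads them from standard input.

```shell
# Hypothetical helper: print units whose workload status is not "active".
# Assumes each unit line starts with "<app>/<n>" (optionally followed by "*")
# and carries the workload status in the second column, as in the lines above.
non_active_units() {
    awk '$1 ~ /^[a-z0-9-]+\/[0-9]+\*?$/ && $2 != "active" { print $1, $2 }'
}

# Example use (assumes the model is named "openstack"):
#   juju status -m openstack | non_active_units
```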

I have tried multiple times using openstack channels 2023.1 and 2023.2. I have yet to test 2024.1.

Is there anything I can do, or any logs I can add, that would help? (I have tried `juju debug-log -m admin/controller --replay | grep microceph | grep Error`, and it does not show the same error as bug https://bugs.launchpad.net/snap-openstack/+bug/2023664.)

Thanks in advance

Guillaume Boutry (gboutry) wrote on 2024-04-22:

Hi, can you be more explicit about the versions you've used?

Can you try with:

```
snap install openstack --channel 2023.2/candidate
sunbeam -v cluster bootstrap --manifest /snap/openstack/current/etc/manifests/candidate.yml
```

Can you please provide the logs from `juju debug-log -m admin/controller --replay` and `juju debug-log -m openstack --replay`?
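
A minimal sketch for saving both logs to files so they can be attached. The helper name `collect_logs` and the output file names are hypothetical. It also adds `--no-tail`, on the assumption that `--replay` alone keeps streaming new lines instead of exiting.

```shell
# Hypothetical helper: save the controller and openstack model logs to files.
collect_logs() {
    outdir="${1:-.}"
    mkdir -p "$outdir" || return 1
    # --no-tail is assumed to make debug-log exit after replaying the log.
    juju debug-log -m admin/controller --replay --no-tail > "$outdir/controller.log"
    juju debug-log -m openstack --replay --no-tail > "$outdir/openstack.log"
}

# Example use:
#   collect_logs ./sunbeam-logs
```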

Changed in snap-openstack:
status:	New → Incomplete

Pokkihju (pokkihju) wrote on 2024-04-25:

Logs part 1 (3.5 MiB, text/plain)

Hello,

Sorry for the delay. Here are the logs you asked for; I will update you once I have had time to test your commands.

Pokkihju (pokkihju) wrote on 2024-04-25:

Logs part 2 (14.0 MiB, text/plain)

I could not find how to attach two log files to the same comment, so here is the second part. They are in the same order as your commands: the admin/controller log is in the previous comment and the openstack model log is in this one.

Guillaume Boutry (gboutry) wrote on 2024-04-26:

The current stable release of sunbeam installs MicroCeph from `latest/edge`, and rev 43, the one you have installed, introduced new behavior that seems to fail in your environment.
You can see a lot of these lines in the logs:
unit-microceph-0: 15:10:06 INFO unit.microceph/0.juju-log _on_relation_changed event
unit-microceph-0: 15:10:07 INFO unit.microceph/0.juju-log Storage not available, deferring event.
unit-microceph-0: 15:10:07 INFO unit.microceph/0.juju-log _on_relation_changed event

The candidate channel installs MicroCeph from `reef/candidate`, which should be more stable.
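
To gauge how often this deferral shows up, the matching lines in a saved log can be counted. A minimal sketch, where the helper name `count_storage_deferrals` is hypothetical and the log is assumed to have been saved to a file first:

```shell
# Hypothetical helper: count "Storage not available" deferrals in a saved log.
# $1 is the path to a file holding the output of juju debug-log.
count_storage_deferrals() {
    grep -c 'Storage not available, deferring event' "$1"
}

# Example use (file name is an assumption):
#   count_storage_deferrals controller.log
```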

Guillaume Boutry (gboutry) wrote on 2024-04-26:

It looks like microceph failed to add a disk:

unit-microceph-0: 20:51:08 ERROR unit.microceph/0.juju-log Failed executing cmd: ['microceph', 'disk', 'add', '/dev/sdi1'], error: Error: failed to bootstrap OSD: Failed to run: ceph-osd --mkfs --no-mon-config -i 1: exit status 250 (2024-04-20T20:51:08.503+0000 7f4c637be8c0 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (13) Permission denied

Did you wipe your block device before bootstrapping sunbeam again?
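
Errors like the one above can be pulled out of a replayed log with a short filter. This is only a sketch: the helper name `microceph_errors` is hypothetical, and it assumes log lines follow the `unit-microceph-N: HH:MM:SS LEVEL ...` layout seen above.

```shell
# Hypothetical helper: keep only ERROR-level lines from microceph units.
# Reads log lines on standard input.
microceph_errors() {
    grep '^unit-microceph-' | grep ' ERROR '
}

# Example use (controller.log is an assumed file holding the saved log):
#   microceph_errors < controller.log
```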

Pokkihju (pokkihju) wrote on 2024-04-26:

Hello, I am currently testing your commands. On my first tries I encountered issues when trying to use a virtual disk for microceph, which correspond to the logs you mentioned in your last comment.
After looking around I found this issue:
https://github.com/canonical/microceph/issues/251

I applied the fix mentioned there, then restarted microceph and re-ran the bootstrap command, which then worked.

Pokkihju (pokkihju) wrote on 2024-04-26:

bootstrap2.log (376.9 KiB, text/plain)

Update: I have run your bootstrap command; you will find all the logs in the attached file.
At the bottom is the status at the end of the command.

It seems that microceph was not installed correctly using your command. Do I need to install it separately?

Pokkihju (pokkihju) wrote on 2024-05-01 (last edit on 2024-05-01):

New update: running with all roles enabled installs microceph properly, and the cluster seems to be up and running. I will play with it a bit, but it seems to have fixed my issue, so thanks.
Command run:
sunbeam -v cluster bootstrap --manifest /snap/openstack/current/etc/manifests/candidate.yml --role control --role compute --role storage

Pokkihju (pokkihju) wrote on 2024-05-01:

Yet another update: the microceph that was deployed did not work. Maybe I gave it wrong parameters, or something else I don't know about. However, the following commands worked for me:

# Remove relations of microceph
juju remove-relation -m openstack microceph:ceph cinder-ceph:ceph
juju remove-relation -m openstack microceph:ceph glance:ceph

# remove microceph offer
juju remove-offer microceph

# remove microceph saas
juju remove-saas -m openstack microceph

# remove microceph application
juju remove-application microceph --force --no-wait

# remove microceph snap
sudo snap remove --purge microceph

# deploy microceph
juju deploy microceph --channel reef/candidate --to 0

# Add storage to microceph manually through juju storage
juju add-storage microceph/X osd-standalone='loop,200G,3'

# recreate microceph offer
juju offer microceph:ceph

# re integrate microceph to openstack
juju integrate -m k8s glance:ceph admin/controller.microceph
juju integrate -m k8s cinder-ceph:ceph admin/controller.microceph

After that, you should have a working microceph backed by local loop devices.

Remote bug watches

auto-github-canonical-microceph #251
[open enhancement workaround-available]