Error: Timed out while waiting for units microceph/1 to be ready

Bug #2056218 reported by Andre Ruiz
Affects: OpenStack Snap
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm seeing a lot of errors like this lately:

14:32:21 > Adding MicroK8S unit to machine ...
14:52:13 > Adding MicroCeph unit to machine ...
14:52:13 > Adding MicroCeph unit to machine ... Timed out while waiting for units microceph/1 to be ready
14:52:13 > Adding MicroCeph unit to machine ...
14:52:13 Error: Timed out while waiting for units microceph/1 to be ready

I'm using 2023.2/edge release 429.

The first node installs fine: microceph and all the other services (both for the control plane and for the hypervisor) are installed, all services start, and the installation finishes without any problem. Then a second node is added, and it times out when adding microceph to it.

This is happening in about 40% of all multi-node installs in the last week or so.

Andre Ruiz (andre-ruiz) wrote:

Logs from the CI installation.

Andre Ruiz (andre-ruiz) wrote (last edit):

I understand this is a strange network config, but it is a common one for bare-metal machines in public clouds (in this case, Equinix, which is a valid target for someone to install sunbeam on).

The real goal is to completely ignore the bond0 interface (which carries both a private and a public address with small netmasks) and to create other real layer 2 networks that exist across all nodes.

There is a detailed installation description in that other bug (mentioned above), but here are the interesting details for a real case:

In /etc/hosts of ALL nodes:

10.0.1.11 sunbeam11.mydomain sunbeam11
10.0.1.12 sunbeam12.mydomain sunbeam12
10.0.1.13 sunbeam13.mydomain sunbeam13
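
As a quick sanity check (just a sketch, not sunbeam tooling), resolving those names on each node should return the management addresses above:

import socket

for host in ("sunbeam11.mydomain", "sunbeam12.mydomain", "sunbeam13.mydomain"):
    print(host, socket.gethostbyname(host))
# Expected on every node: 10.0.1.11, 10.0.1.12 and 10.0.1.13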

Machine 1: hostname "sunbeam11.mydomain", which resolves to "10.0.1.11"; default route via the public IP

5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 139.178.81.209/31 brd 255.255.255.255 scope global bond0
    inet 10.65.5.133/31 brd 255.255.255.255 scope global bond0:0
6: bond0.1002@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.0.1.11/24 brd 10.0.1.255 scope global bond0.1002
7: bond0.1003@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    <no-ips, to be used with OVN>

Machine 2: hostname "sunbeam12.mydomain", which resolves to "10.0.1.12"; default route via the public IP

5: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 147.28.187.65/31 brd 255.255.255.255 scope global bond0
    inet 10.65.5.135/31 brd 255.255.255.255 scope global bond0:0
6: bond0.1002@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.0.1.12/24 brd 10.0.1.255 scope global bond0.1002
7: bond0.1003@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    <no-ips, to be used with OVN>

In the manifest file, I state that I want the "10.0.1.0/24" network as the sunbeam management network:

deployment:
  bootstrap:
    management_cidr: 10.0.1.0/24
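
To make the expectation concrete, here is a minimal sketch (plain Python, not sunbeam code) of the selection I'd expect: of all the local addresses from the ip output above, exactly one falls inside management_cidr, and that is the one that should be advertised:

import ipaddress

management_cidr = ipaddress.ip_network("10.0.1.0/24")
# Addresses present on machine 1 (bond0, bond0:0 and bond0.1002):
local_addresses = ["139.178.81.209", "10.65.5.133", "10.0.1.11"]

candidates = [a for a in local_addresses
              if ipaddress.ip_address(a) in management_cidr]
print(candidates)  # ['10.0.1.11'] -- the only address inside the CIDR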

And yet, when adding a second node, the generated token contains IPs from the other interfaces that I'm trying to ignore. On the first node (which is already deployed successfully):

sunbeam cluster add --format yaml --name sunbeam12.mydomain
17:50:54 DEBUG: Got token: eyJuYW1lIjoic3VuYmVhbTEyLm15ZG9tYWluIiwic2VjcmV0IjoiZGZlNGMyMTgxZTIyM2QzMWQ1NDBlZGRhMWJmYjFhODQ3YzJkNTU0Y2YyYzc4NjVkODJiMDQzNWYwYTAxMzczNiIsImZpbmdlcnByaW50IjoiNjJiOTVhOThlNjE2OTc5OTAxNmRlYzdiNzJlZjUxYjA4MDEzYTE1NjYzNDVjNWQ1MzczZDM5M2YwMmY5NjM4YyIsImpvaW5fYWRkcmVzc2VzIjpbIjEzOS4xNzguODEuMjA5OjcwMDAiXX0=
17:50:54 DEBUG: Decoded token: b'{"name":"sunbeam12.mydomain","secret":"dfe4c2181e223d31d540edda1bfb1a847c2d554cf2c7865d82b0435f0a013736","fingerprint":"62b95a98e6169799016dec7b72ef51b08013a1566345c5d5373d393f02f9638c","join_addresses":["139.178.81.209:7000"]}'

I expected to see an IP from the 10.0.1.0/24 network in the "join_addresses".
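
To make that easy to verify, here is a small sketch (not part of sunbeam) that decodes the token above and flags any join address that falls outside the management CIDR:

import base64
import ipaddress
import json

# Token printed by "sunbeam cluster add" (copied from the output above):
token = "eyJuYW1lIjoic3VuYmVhbTEyLm15ZG9tYWluIiwic2VjcmV0IjoiZGZlNGMyMTgxZTIyM2QzMWQ1NDBlZGRhMWJmYjFhODQ3YzJkNTU0Y2YyYzc4NjVkODJiMDQzNWYwYTAxMzczNiIsImZpbmdlcnByaW50IjoiNjJiOTVhOThlNjE2OTc5OTAxNmRlYzdiNzJlZjUxYjA4MDEzYTE1NjYzNDVjNWQ1MzczZDM5M2YwMmY5NjM4YyIsImpvaW5fYWRkcmVzc2VzIjpbIjEzOS4xNzguODEuMjA5OjcwMDAiXX0="

data = json.loads(base64.b64decode(token))
management_cidr = ipaddress.ip_network("10.0.1.0/24")

for addr in data["join_addresses"]:
    host = addr.rsplit(":", 1)[0]  # drop the ":7000" port
    ok = ipaddress.ip_address(host) in management_cidr
    print(addr, "inside" if ok else "OUTSIDE", "management_cidr")
# Prints: 139.178.81.209:7000 OUTSIDE management_cidr

With the behaviour I expect, this would instead report an address inside 10.0.1.0/24.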

But then, this is only used initially to form the sunbeam cluster. Later, when adding MicroK8s to this machine, the same thing will happen again: a token will ...

