Validate corosync cluster member IPs

Bug #1850129 reported by Andrea Ieri
Affects: OpenStack HA Cluster Charm
Status: Won't Fix
Importance: Wishlist
Assigned to: Aurelien Lourot

Bug Description

For the cluster to function correctly, all cluster members must be able to reach each other and obtain quorum. If the hanode endpoint is bound correctly, this is generally not an issue, because all the IPs received over the peer relation lie in the same subnet. If the operator does not set an explicit binding for the hanode endpoint, however, the hacluster charm may write a corosync configuration in which peers have IPs in different subnets. This is particularly problematic when docker or lxd bridges are present on the cluster member machines, as the bridge IPs are not reachable from other machines at all.

To provide a more robust experience, the hacluster charm should enter a blocked state unless all cluster member IPs are in the same subnet.
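
As a rough illustration of the requested behaviour (not the charm's actual code), the check could look something like the Python sketch below. The function name check_peer_subnets, the returned (state, message) pair and the default /24 prefix length are all assumptions made for the example; a real implementation would have to obtain the netmask from the unit's network configuration.

```python
import ipaddress


def check_peer_subnets(peer_addresses, prefix_len=24):
    """Return a (state, message) pair for a list of peer IP strings.

    Hypothetical helper: prefix_len is an assumed netmask length, not
    something the peer relation provides.
    """
    networks = {
        # strict=False derives the enclosing network from a host address.
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in peer_addresses
    }
    if len(networks) > 1:
        found = ", ".join(sorted(str(net) for net in networks))
        return "blocked", f"cluster member IPs span several subnets: {found}"
    return "active", ""


if __name__ == "__main__":
    # A docker bridge address mixed in with two addresses from another range.
    print(check_peer_subnets(["172.17.0.1", "100.100.185.70", "100.100.185.123"]))
    # Three peers that share one /24.
    print(check_peer_subnets(["10.5.0.4", "10.5.0.3", "10.5.0.8"]))
```

The weak point of this approach is the prefix length: a peer relation only carries host addresses, so "same subnet" cannot be decided from the addresses alone.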

Changed in charm-hacluster:
status: New → Triaged
importance: Undecided → Wishlist
Andrea Ieri (aieri) wrote :

For reference, this is the kind of situation that should be avoided:

$ juju run -u hacluster-neutron/3 -- relation-get -r24 - hacluster-neutron/5
egress-subnets: 172.17.0.1/32
ingress-address: 172.17.0.1
member_ready: "True"
private-address: 172.17.0.1
ready: "True"

$ juju run -u hacluster-neutron/5 -- relation-get -r24 - hacluster-neutron/3
egress-subnets: 100.100.185.70/32
ingress-address: 100.100.185.70
member_ready: "True"
private-address: 100.100.185.70
ready: "True"

$ juju run -u hacluster-neutron/5 -- relation-get -r24 - hacluster-neutron/4
egress-subnets: 100.100.185.123/32
ingress-address: 100.100.185.123
member_ready: "True"
private-address: 100.100.185.123
ready: "True"

Aurelien Lourot (aurelien-lourot) wrote :

Note that in a normal/working situation you end up with:

$ for i in 0 1 2; do u=neutron-hacluster/$i; echo "$u:"; juju run -u $u -- relation-get -r $(juju run -u $u -- relation-ids ha) - $u; done
neutron-hacluster/0:
clustered: "yes"
egress-subnets: 10.5.0.4/32
ingress-address: 10.5.0.4
private-address: 10.5.0.4
neutron-hacluster/1:
clustered: "yes"
egress-subnets: 10.5.0.3/32
ingress-address: 10.5.0.3
private-address: 10.5.0.3
neutron-hacluster/2:
clustered: "yes"
egress-subnets: 10.5.0.8/32
ingress-address: 10.5.0.8
private-address: 10.5.0.8

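Every egress-subnets value is a /32, so each one is a distinct single-host subnet even in a working cluster. egress-subnets alone therefore can't be used to tell whether all 3 peers are in the same subnet. It can, however, be used in combination with the cidr values reported by network-get: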

$ for i in 0 1 2; do u=neutron-hacluster/$i; echo "$u:"; juju run -u $u -- network-get ha | grep "\saddress:\|\scidr:"; done
neutron-hacluster/0:
    address: 10.5.0.4
    cidr: 10.5.0.0/16
    address: 10.5.0.4
    cidr: 10.5.0.0/16
    address: 252.0.4.1
    cidr: 252.0.0.0/8
neutron-hacluster/1:
    address: 10.5.0.3
    cidr: 10.5.0.0/16
    address: 10.5.0.3
    cidr: 10.5.0.0/16
    address: 252.0.3.1
    cidr: 252.0.0.0/8
neutron-hacluster/2:
    address: 10.5.0.8
    cidr: 10.5.0.0/16
    address: 10.5.0.8
    cidr: 10.5.0.0/16
    address: 252.0.8.1
    cidr: 252.0.0.0/8
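
To make the combination concrete, here is a minimal Python sketch that checks peer addresses against the CIDRs a unit sees locally. It is only an illustration under stated assumptions: the function name peers_outside_local_cidrs is made up, and the inputs are assumed to have already been extracted from network-get (the cidr values) and from the peer relation (the ingress-address values).

```python
import ipaddress


def peers_outside_local_cidrs(local_cidrs, peer_addresses):
    """Return the peer addresses that fall outside every local CIDR.

    Hypothetical helper: local_cidrs would come from network-get on this
    unit, peer_addresses from ingress-address on the peer relation.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in local_cidrs]
    return [
        addr
        for addr in peer_addresses
        if not any(ipaddress.ip_address(addr) in net for net in networks)
    ]


if __name__ == "__main__":
    local_cidrs = ["10.5.0.0/16", "252.0.0.0/8"]
    # Peers from the working deployment: nothing is flagged.
    print(peers_outside_local_cidrs(local_cidrs, ["10.5.0.3", "10.5.0.8"]))
    # A peer advertising a docker bridge address is flagged.
    print(peers_outside_local_cidrs(local_cidrs, ["10.5.0.3", "172.17.0.1"]))
```

An address outside the local CIDRs is not necessarily unreachable (it could be routed), so a result like this is a hint that the binding is wrong rather than proof.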

Changed in charm-hacluster:
status: Triaged → In Progress
assignee: nobody → Aurelien Lourot (aurelien-lourot)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-hacluster (master)

Fix proposed to branch: master
Review: https://review.opendev.org/702680

Aurelien Lourot (aurelien-lourot) wrote :

We decided with @ajkavanagh and @thedac not to proceed on this one. The user is expected to bind endpoints to network spaces when several spaces are eligible.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-hacluster (master)

Change abandoned by Aurelien Lourot on branch: master
Review: https://review.opendev.org/702680

Changed in charm-hacluster:
status: In Progress → Won't Fix