Without cluster_count set to NUM_UNITS, a race occurs: the relation to the last hacluster node is not yet set when configuration runs, so corosync and pacemaker are started with only n-1 of n nodes.
The last node is aware of only one relation when there should be 2:
relation-list -r hanode:0
hacluster/0
corosync.conf looks like the following when there should be 3 nodes:
nodelist {
    node {
        ring0_addr: 10.5.35.235
        nodeid: 1000
    }
    node {
        ring0_addr: 10.5.35.237
        nodeid: 1001
    }
}
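One way to spot this condition on a unit is to count the node entries in the rendered corosync.conf and compare the result with the expected number of units. The following is a minimal sketch, not part of the charm: the function name count_corosync_nodes and the SAMPLE text are illustrative assumptions, and the count simply relies on each node entry carrying one nodeid line.

```python
import re

# Hypothetical sample mirroring the two-node nodelist shown above.
SAMPLE = """nodelist {
    node {
        ring0_addr: 10.5.35.235
        nodeid: 1000
    }
    node {
        ring0_addr: 10.5.35.237
        nodeid: 1001
    }
}"""


def count_corosync_nodes(conf_text):
    """Count node entries by counting their 'nodeid: <number>' lines."""
    return len(re.findall(r"\bnodeid:\s*\d+", conf_text))


expected_units = 3  # assumption: NUM_UNITS for this deployment
found = count_corosync_nodes(SAMPLE)
missing = expected_units - found
```

A non-zero value of missing means corosync was configured before every peer had joined.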
The services themselves (not the charm) fail:
corosync logs thousands of RETRANSMIT errors.
pacemaker eventually times out after waiting on corosync.
This change adds documentation urging that cluster_count be set, and updates the amulet tests to set it.
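The reason cluster_count avoids the race can be sketched as a simple guard: defer configuring corosync until the number of units seen matches the expected count. This is an illustrative sketch, not the charm's actual code; ready_to_configure and its parameters are hypothetical names, and it assumes that relation-list reports only remote peers, so the local unit is added to the total.

```python
def ready_to_configure(cluster_count, peer_units):
    """Return True once this unit can see every other expected unit.

    cluster_count: expected total number of hacluster units (NUM_UNITS).
    peer_units: unit names reported by relation-list on the hanode
                relation (remote peers only, local unit excluded).
    """
    # Add 1 for the local unit, which relation-list does not report.
    return len(peer_units) + 1 >= cluster_count


# The failing case above: 3 units expected, only one peer visible so far.
early = ready_to_configure(3, ["hacluster/0"])
# Once the second peer relation is set, configuration can proceed.
complete = ready_to_configure(3, ["hacluster/0", "hacluster/1"])
```

With a guard of this kind, the last node would wait for the missing relation instead of starting corosync with an incomplete nodelist.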