Comment 6 for bug 1654403

David Ames (thedac) wrote :

Additional information from the charm:

Without cluster_count set to NUM_UNITS, a race occurs: the relation to the last hacluster node is not yet established, which leads to an attempt to start corosync and pacemaker with only n-1 of n nodes.
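
As a rough illustration of the kind of gate cluster_count provides, the following is a minimal sketch and not the charm's actual code. The function name ready_to_configure and its arguments are hypothetical; it assumes the unit can list its peers on the hanode relation and that the peer list excludes the unit itself.

```python
def ready_to_configure(peer_units, cluster_count):
    """Return True once this unit can see every expected cluster member.

    peer_units: unit names visible on the hanode relation (assumed to
        exclude this unit itself).
    cluster_count: expected total number of hacluster units (hypothetical
        stand-in for the charm's cluster_count option).
    """
    # This unit counts as one member, so add 1 to the visible peers.
    return len(peer_units) + 1 >= cluster_count


# With 3 expected units, a unit that sees only one peer should keep waiting.
print(ready_to_configure(["hacluster/0"], 3))
print(ready_to_configure(["hacluster/0", "hacluster/1"], 3))
```

With such a gate, a unit that sees fewer peers than expected would defer writing corosync.conf instead of starting the services with a partial node list.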

The last node is so far aware of only one relation when there should be 2:
relation-list -r hanode:0
hacluster/0

corosync.conf looks like the following when there should be 3 nodes:

nodelist {

        node {
                ring0_addr: 10.5.35.235
                nodeid: 1000
        }

        node {
                ring0_addr: 10.5.35.237
                nodeid: 1001
        }

}
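
A quick way to spot this condition is to count the node entries in the rendered file and compare against the expected unit count. This is only a sketch for diagnosis: count_nodes is a hypothetical helper, the sample text mirrors the nodelist above, and it assumes each node has exactly one ring0_addr line.

```python
import re


def count_nodes(conf_text):
    """Count nodelist entries in corosync.conf text via their ring0_addr lines."""
    return len(re.findall(r"^\s*ring0_addr:\s*\S+", conf_text, flags=re.MULTILINE))


# Sample mirroring the nodelist reported in this comment.
SAMPLE = """
nodelist {
        node {
                ring0_addr: 10.5.35.235
                nodeid: 1000
        }
        node {
                ring0_addr: 10.5.35.237
                nodeid: 1001
        }
}
"""

expected_units = 3  # assumed deployment size
found = count_nodes(SAMPLE)
if found < expected_units:
    print("corosync.conf lists %d of %d expected nodes" % (found, expected_units))
```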

The services themselves (not the charm) fail:
corosync logs thousands of RETRANSMIT errors.
pacemaker eventually times out after waiting on corosync.

I am adding more documentation to encourage setting cluster_count, and updating the amulet tests to set it.