During a fast forward upgrade (FFU) to Train, the pacemaker cluster is recreated node by node. puppet-pacemaker handles the logic that detects whether a new node is being added to the pacemaker cluster. A safeguard check is meant to keep the scale-up idempotent:
exec {"Adding Cluster node: ${node_to_add} to Cluster ${cluster_name}":
  unless    => "${::pacemaker::pcs_bin} status 2>&1 | grep -e \"^Online:.* ${node_name} .*\"",
  command   => "${::pacemaker::pcs_bin} cluster node add ${node_to_add} ${node_add_start_part} --wait",
  timeout   => $cluster_start_timeout,
  tries     => $cluster_start_tries,
  try_sleep => $cluster_start_try_sleep,
  notify    => Exec["node-cluster-start-${node_name}"],
  tag       => 'pacemaker-scaleup',
}
However, the regex used in the "unless" attribute can fail to match the node because of an unnecessary beginning-of-line anchor ("^"). When the match fails, the "unless" guard does not suppress the command, so under specific circumstances puppet-pacemaker tries to add a node to the cluster even though it is already present:
<13>Oct 20 23:44:22 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Authenticating new cluster node: overcloud-controller-2 addr=10.1.0.22]/returns: executed successfully
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: Node name 'overcloud-controller-2' is already used by existing nodes; please, use other name
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: Node address '10.1.0.22' is already used by existing nodes; please, use other address
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: overcloud-controller-2: Running cluster services: 'corosync', 'pacemaker', the host seems to be in a cluster already, use --force to override
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: overcloud-controller-2: Cluster configuration files found, the host seems to be in a cluster already, use --force to override
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: Some nodes are already in a cluster. Enforcing this will destroy existing cluster on those nodes. You should remove the nodes from their clusters instead to keep the clusters working properly, use --force to override
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: Errors have occurred, therefore pcs is unable to continue
<13>Oct 20 23:48:23 puppet-user: Error: '/sbin/pcs cluster node add overcloud-controller-2 addr=10.1.0.22 --start --wait' returned 1 instead of one of [0]
<13>Oct 20 23:48:23 puppet-user: Error: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: change from 'notrun' to ['0'] failed: '/sbin/pcs cluster node add overcloud-controller-2 addr=10.1.0.22 --start --wait' returned 1 instead of one of [0]
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[node-cluster-start-overcloud-controller-2]: Dependency Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster] has failures: true
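The anchoring problem described above can be illustrated with a minimal Python sketch. It mimics the line-by-line behaviour of the grep call in the "unless" attribute, with and without the leading "^". The two sample status lines and the helper name node_listed are assumptions made up for this illustration. They are not captured pcs output and not part of puppet-pacemaker. Dropping the anchor is shown only as one possible way to make the check match, not necessarily the change that was merged.

```python
import re


def node_listed(status_output: str, node_name: str, anchored: bool) -> bool:
    """Return True if any status line matches the 'Online:' pattern.

    Hypothetical helper: approximates `grep -e "<pattern>"`, which tests
    each input line separately and succeeds on the first match.
    """
    prefix = "^" if anchored else ""
    pattern = re.compile(prefix + r"Online:.* " + re.escape(node_name) + r" .*")
    return any(pattern.search(line) for line in status_output.splitlines())


# Assumed sample lines, for illustration only (not real pcs output).
# One starts at column 0, the other is preceded by indentation and a bullet.
FLUSH_LEFT = "Online: [ overcloud-controller-0 overcloud-controller-2 ]"
INDENTED = "  * Online: [ overcloud-controller-0 overcloud-controller-2 ]"

NODE = "overcloud-controller-2"

# The anchored pattern only matches when 'Online:' starts the line.
print(node_listed(FLUSH_LEFT, NODE, anchored=True))
print(node_listed(INDENTED, NODE, anchored=True))

# Without the anchor, both layouts match.
print(node_listed(FLUSH_LEFT, NODE, anchored=False))
print(node_listed(INDENTED, NODE, anchored=False))
```

If the status line is not flush left, the anchored check reports the node as absent. In the Puppet resource, that lets the "cluster node add" command run against a node that is already a cluster member.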
This issue was fixed in the openstack/puppet-pacemaker 1.4.0 release.