FFU: HA controller scale-up logic fails to match regex

Bug #1950294 reported by Damien Ciabrini
This bug affects 1 person
Affects   Status        Importance   Assigned to       Milestone
tripleo   In Progress   Low          Damien Ciabrini

Bug Description

During FFU to Train, a new pacemaker cluster is recreated node by node. puppet-pacemaker handles the logic that detects whether a new node is being added to the pacemaker cluster, and there is a safeguard check to make sure the scale-up is idempotent:

          exec {"Adding Cluster node: ${node_to_add} to Cluster ${cluster_name}":
            unless => "${::pacemaker::pcs_bin} status 2>&1 | grep -e \"^Online:.* ${node_name} .*\"",
            command => "${::pacemaker::pcs_bin} cluster node add ${node_to_add} ${node_add_start_part} --wait",
            timeout => $cluster_start_timeout,
            tries => $cluster_start_tries,
            try_sleep => $cluster_start_try_sleep,
            notify => Exec["node-cluster-start-${node_name}"],
            tag => 'pacemaker-scaleup',
          }
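
For illustration, on a controller the guard expands to a shell pipeline along the lines of the sketch below. The sample "pcs status" output is an assumption about the pacemaker 2 formatting found on Train, not output captured from this deployment:

          # Hypothetical "pcs status" excerpt on pacemaker 2: the "Online:" line
          # is indented, so it no longer starts at the beginning of the line.
          #     * Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

          # Guard as currently generated: the "^" anchor prevents a match against
          # the indented line, so the exec runs even though the node is online.
          /sbin/pcs status 2>&1 | grep -e "^Online:.* overcloud-controller-2 .*"

          # Without the anchor the indented line matches and the exec is skipped.
          /sbin/pcs status 2>&1 | grep -e "Online:.* overcloud-controller-2 .*"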

However, the regex used in the "unless" attribute fails to match the node name due to an unnecessary beginning-of-line anchor. Consequently, under specific circumstances, puppet-pacemaker may try to add a node to the cluster even though it is already present (a relaxed guard is sketched after the log excerpt below):

<13>Oct 20 23:44:22 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Authenticating new cluster node: overcloud-controller-2 addr=10.1.0.22]/returns: executed successfully
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: Node name 'overcloud-controller-2' is already used by existing nodes; please, use other name
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: Node address '10.1.0.22' is already used by existing nodes; please, use other address
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: overcloud-controller-2: Running cluster services: 'corosync', 'pacemaker', the host seems to be in a cluster already, use --force to override
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: overcloud-controller-2: Cluster configuration files found, the host seems to be in a cluster already, use --force to override
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: Some nodes are already in a cluster. Enforcing this will destroy existing cluster on those nodes. You should remove the nodes from their clusters instead to keep the clusters working properly, use --force to override
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: Error: Errors have occurred, therefore pcs is unable to continue
<13>Oct 20 23:48:23 puppet-user: Error: '/sbin/pcs cluster node add overcloud-controller-2 addr=10.1.0.22 --start --wait' returned 1 instead of one of [0]
<13>Oct 20 23:48:23 puppet-user: Error: /Stage[main]/Pacemaker::Corosync/Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster]/returns: change from 'notrun' to ['0'] failed: '/sbin/pcs cluster node add overcloud-controller-2 addr=10.1.0.22 --start --wait' returned 1 instead of one of [0]
<13>Oct 20 23:48:23 puppet-user: Notice: /Stage[main]/Pacemaker::Corosync/Exec[node-cluster-start-overcloud-controller-2]: Dependency Exec[Adding Cluster node: overcloud-controller-2 addr=10.1.0.22 to Cluster tripleo_cluster] has failures: true
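
A minimal sketch of a relaxed guard, with the beginning-of-line anchor dropped; this is an illustration only and not necessarily the exact change released in openstack/puppet-pacemaker 1.4.0:

          exec {"Adding Cluster node: ${node_to_add} to Cluster ${cluster_name}":
            # Same resource as above, but the "unless" regex no longer anchors
            # "Online:" to the start of the line, so an indented status line
            # still marks the node as already present and the add is skipped.
            unless => "${::pacemaker::pcs_bin} status 2>&1 | grep -e \"Online:.* ${node_name} .*\"",
            command => "${::pacemaker::pcs_bin} cluster node add ${node_to_add} ${node_add_start_part} --wait",
            timeout => $cluster_start_timeout,
            tries => $cluster_start_tries,
            try_sleep => $cluster_start_try_sleep,
            notify => Exec["node-cluster-start-${node_name}"],
            tag => 'pacemaker-scaleup',
          }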

Changed in tripleo:
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/puppet-pacemaker 1.4.0

This issue was fixed in the openstack/puppet-pacemaker 1.4.0 release.
