tripleo

Comment 2 for bug 1973460

Brendan Shephard (bshephar) wrote on 2022-05-16:

Looks like it failed when Puppet tried to configure pacemaker:
416460 2022-05-15 16:10:08,766 p=453290 u=stack n=ansible | 2022-05-15 16:10:08.766562 | 002590f0-0f45-829a-6014-000000002ebe | FATAL | Wait for puppet host configuration to finish | controller.arda | error={

2022-05-15 13:24:12 +0000 Exec[wait-for-settle](provider=posix) (debug): Executing check '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:12 +0000 Puppet (debug): Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:12 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless (debug): Error: error running crm_mon, is pacemaker running?
2022-05-15 13:24:12 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless (debug): crm_mon: Error: cluster is not available on this node
2022-05-15 13:24:12 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (debug): Exec try 1/360
2022-05-15 13:24:12 +0000 Exec[wait-for-settle](provider=posix) (debug): Executing '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:12 +0000 Puppet (debug): Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:13 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (debug): Sleeping for 10.0 seconds between tries
2022-05-15 13:24:23 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (debug): Exec try 2/360
2022-05-15 13:24:23 +0000 Exec[wait-for-settle](provider=posix) (debug): Executing '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:23 +0000 Puppet (debug): Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:23 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (debug): Sleeping for 10.0 seconds between tries
2022-05-15 13:24:28 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (err): change from 'notrun' to ['0'] failed: exit

I assume the pacemaker service is failing to start? The output of systemctl status pcsd should confirm that.

Was this node used for a previous deployment and not cleaned? That is, could it still be configured with a hacluster username and password from that earlier deployment?

Let's check what is preventing pcsd from starting on the Controller:
systemctl status pcsd
journalctl -u pcsd -e
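
If it helps, the check Puppet is looping on can be reproduced by hand. This is only a rough sketch based on the log above: the pcs | grep command is copied from the Exec[wait-for-settle] debug lines, while wait_for_settle and quorum_check are hypothetical helper names I made up for illustration. They are not part of pcs or puppet-pacemaker.

```shell
#!/bin/sh
# Retry a check command until it succeeds, mirroring the "Exec try N/360" and
# "Sleeping for 10.0 seconds between tries" lines in the Puppet log.
# wait_for_settle is a hypothetical helper, not something shipped by pcs or Puppet.
wait_for_settle() {
    tries=$1    # maximum number of attempts (the log shows 360)
    delay=$2    # seconds to sleep between attempts (the log shows 10)
    shift 2     # the remaining arguments are the check command to run
    i=1
    while [ "$i" -le "$tries" ]; do
        echo "try $i/$tries"
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# The same test Puppet runs, taken from the debug output above.
# quorum_check is likewise a hypothetical name.
quorum_check() {
    /sbin/pcs status | grep -q 'partition with quorum'
}
```

On the Controller, running something like "wait_for_settle 3 10 quorum_check || journalctl -u pcsd -e" would show fairly quickly whether the cluster ever reaches quorum, without waiting for the whole deployment to time out.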