Pacemaker being required for non-HA deployment
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| tripleo | New | Undecided | Unassigned | |
Bug Description
I have been trying to deploy the simplest TripleO stack for weeks, and the latest issue is that the deploy is stuck in a loop performing this command on the controller node:
```
Debug: Exec[wait-for-settle] ...
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
Debug: Exec[wait-for-settle] ...
Debug: Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
Debug: /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
```
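For context on what this loop is doing: judging from the `Exec try N/360` and `Sleeping for 10.0 seconds between tries` lines in the attached log, the `wait-for-settle` exec is a retry loop around the quorum check. A minimal sketch of that logic (an assumption based on the log output, not the actual puppet-pacemaker source; `check_quorum` is a hypothetical stand-in for `/sbin/pcs status | grep -q 'partition with quorum'`):

```shell
#!/bin/sh
# Hypothetical stand-in for: /sbin/pcs status | grep -q 'partition with quorum'
check_quorum() {
    return 1   # simulate a node where pacemaker never forms a quorum
}

wait_for_settle() {
    tries=1
    max_tries=3      # the real exec allows 360 tries
    until check_quorum; do
        echo "Exec try $tries/$max_tries"
        if [ "$tries" -ge "$max_tries" ]; then
            echo "wait-for-settle: giving up"
            return 1
        fi
        sleep 0      # the real exec sleeps 10 seconds between tries
        tries=$((tries + 1))
    done
}

wait_for_settle || echo "deploy step would fail here"
```

On a node where pacemaker is not running at all, the check can never succeed, so the deploy sits in this loop until the retries are exhausted.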
But I am not trying to deploy an HA cluster, and even when I include `docker-ha.yaml`, the issue persists. For reference, this is the answers file I tried to use:
```
templates: /usr/share/
environments:
- /usr/share/
- ./overcloud-
- ./overcloud-
- ./overcloud-
- ./containers-
```
And the baremetal_
```
- name: Controller
  count: 1
  defaults:
    networks:
    - network: ctlplane
      vif: true
    - network: internalapi
    - network: storagemgmt
    - network: storage
    network_config:
      template: templates/
      default_
      - ctlplane
  instances:
  - hostname: controller.arda
    name: controller
- name: Compute
  count: 1
  defaults:
    networks:
    - network: ctlplane
      vif: true
    - network: internalapi
    - network: storage
    network_config:
      template: templates/
      default_
      - ctlplane
  instances:
  - hostname: compute.arda
    name: compute
```
I do not see anything special here; it is a very basic example according to the deployment guide, yet I cannot get it to work. This is running on the master branch with CentOS Stream 9. I am attaching a more detailed archive of this setup:
no longer affects: puppet-pacemaker
Looks like it failed when Puppet tried to configure pacemaker:
```
416460 2022-05-15 16:10:08,766 p=453290 u=stack n=ansible | 2022-05-15 16:10:08.766562 | 002590f0-0f45-829a-6014-000000002ebe | FATAL | Wait for puppet host configuration to finish | controller.arda | error={
```
And in the puppet log on the controller, the `wait-for-settle` exec never finds quorum and exhausts its retries:
```
2022-05-15 13:24:12 +0000 Exec[wait-for-settle] (provider=posix) (debug): Executing check '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
/Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless (debug): Error: error running crm_mon, is pacemaker running?
/Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/unless (debug): crm_mon: Error: cluster is not available on this node
/Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (debug): Exec try 1/360
Exec[wait-for-settle] (provider=posix) (debug): Executing '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
/Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (debug): Sleeping for 10.0 seconds between tries
/Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (debug): Exec try 2/360
Exec[wait-for-settle] (provider=posix) (debug): Executing '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
/Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (debug): Sleeping for 10.0 seconds between tries
/Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle]/returns (err): change from 'notrun' to ['0'] failed: exit
2022-05-15 13:24:12 +0000 Puppet (debug): Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:12 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
2022-05-15 13:24:12 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
2022-05-15 13:24:12 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
2022-05-15 13:24:12 +0000 Exec[wait-for-settle] ...
2022-05-15 13:24:12 +0000 Puppet (debug): Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:13 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
2022-05-15 13:24:23 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
2022-05-15 13:24:23 +0000 Exec[wait-for-settle] ...
2022-05-15 13:24:23 +0000 Puppet (debug): Executing: '/sbin/pcs status | grep -q 'partition with quorum' > /dev/null 2>&1'
2022-05-15 13:24:23 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
2022-05-15 13:24:28 +0000 /Stage[main]/Pacemaker::Corosync/Exec[wait-for-settle] ...
```
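Note that from the `Exec try 1/360` and `Sleeping for 10.0 seconds between tries` lines, a node that never gets quorum blocks the deploy for about an hour before this step fails:

```shell
# 360 tries with a 10 second sleep between them:
tries=360
sleep_secs=10
echo "$((tries * sleep_secs)) seconds = $((tries * sleep_secs / 60)) minutes"
# prints: 3600 seconds = 60 minutes
```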
I assume the pacemaker service is failing to start? What does `systemctl status pcsd` show?
Was this node used for a previous deployment and not cleaned? I.e., could this node already be configured from a previous deployment, with a hacluster username and password?
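One quick way to check for leftovers on the node (a sketch; these are the standard pacemaker/corosync state locations, not paths confirmed from the attached logs):

```shell
#!/bin/sh
# Look for state a previous cluster deployment could have left behind.
for p in /etc/corosync/corosync.conf /var/lib/pacemaker/cib/cib.xml; do
    if [ -e "$p" ]; then
        echo "leftover state: $p"
    fi
done
# A prior cluster setup would also have created the hacluster user:
if getent passwd hacluster > /dev/null 2>&1; then
    echo "hacluster user exists"
else
    echo "no hacluster user"
fi
```

If any of these exist on a supposedly fresh node, cleaning the node before redeploying would be worth trying.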
Let's check what is preventing pcsd from starting on the Controller:
```
systemctl status pcsd
journalctl -u pcsd -e
```