Comment 0 for bug 1875238

Reza (reza-b2008) wrote :

Description
===========
After a fresh installation of TripleO Stable/Stein on 5 nodes (3 HA controllers and 2 computes),
the rabbitmq bundle and some other resources are in a failed state in Pacemaker.

Steps to reproduce
==================
1- Install the undercloud.
2- Install the overcloud with this command:

openstack overcloud deploy \
--control-flavor control \
--compute-flavor compute \
--templates ~/openstack-tripleo-heat-templates \
-r /home/stack/roles_data.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e environment.yaml \
-e ~/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
-e ~/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e ~/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e ~/openstack-tripleo-heat-templates/environments/network-environment.yaml \
--timeout 360 \
--ntp-server pool.ntp.org

I got the same result without network isolation and the custom network environment, using completely default settings.

Expected result
===============
Fresh healthy HA OpenStack.

Actual result
=============
pcs status output is as follows:

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.24.1:8787/tripleostein/centos-binary-rabbitmq:pcmklatest]
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): FAILED overcloud-controller-0 (Monitoring)
   rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-1
   rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-2
 Docker container set: galera-bundle [192.168.24.1:8787/tripleostein/centos-binary-mariadb:pcmklatest]
   galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-0
   galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-1
   galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-controller-2
 Docker container set: redis-bundle [192.168.24.1:8787/tripleostein/centos-binary-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Master overcloud-controller-0
   redis-bundle-1 (ocf::heartbeat:redis): Slave overcloud-controller-1
   redis-bundle-2 (ocf::heartbeat:redis): Slave overcloud-controller-2
 ip-192.168.24.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 ip-X.X.X.X (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.16.2.175 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 ip-172.16.2.41 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.16.1.166 (ocf::heartbeat:IPaddr2): Stopped
 ip-172.16.3.10 (ocf::heartbeat:IPaddr2): Stopped
 Docker container set: haproxy-bundle [192.168.24.1:8787/tripleostein/centos-binary-haproxy:pcmklatest]
   haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-1
   haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started overcloud-controller-2
   haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started overcloud-controller-0
 Docker container set: ovn-dbs-bundle [192.168.24.1:8787/tripleostein/centos-binary-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master overcloud-controller-1
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave overcloud-controller-2
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave overcloud-controller-0
 Docker container: openstack-cinder-volume [192.168.24.1:8787/tripleostein/centos-binary-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0

Failed Resource Actions:
* ip-172.16.1.166_start_0 on overcloud-controller-0 'unknown error' (1): call=89, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:01 2020', queued=0ms, exec=111ms
* ip-172.16.3.10_start_0 on overcloud-controller-0 'unknown error' (1): call=95, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:42 2020', queued=0ms, exec=103ms
* ip-172.16.1.166_start_0 on overcloud-controller-1 'unknown error' (1): call=87, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:00 2020', queued=1ms, exec=147ms
* ip-172.16.3.10_start_0 on overcloud-controller-1 'unknown error' (1): call=93, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:41 2020', queued=0ms, exec=99ms
* ip-172.16.1.166_start_0 on overcloud-controller-2 'unknown error' (1): call=87, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:00 2020', queued=0ms, exec=105ms
* ip-172.16.3.10_start_0 on overcloud-controller-2 'unknown error' (1): call=93, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:42 2020', queued=0ms, exec=93ms
* rabbitmq_start_0 on rabbitmq-bundle-1 'unknown error' (1): call=2121, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:18:42 2020', queued=0ms, exec=200049ms
* rabbitmq_start_0 on rabbitmq-bundle-2 'unknown error' (1): call=1979, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:04:59 2020', queued=0ms, exec=200031ms
* rabbitmq_monitor_10000 on rabbitmq-bundle-0 'unknown error' (1): call=2298, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:36:59 2020', queued=0ms, exec=40036ms
* ovndb_servers_monitor_30000 on ovn-dbs-bundle-2 'not running' (7): call=23, status=complete, exitreason='',
    last-rc-change='Sun Apr 26 17:33:03 2020', queued=1ms, exec=1806ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The cluster seems to be completely unhealthy. Even running these commands does not help:
pcs resource restart rabbitmq-bundle
pcs resource cleanup rabbitmq-bundle

Neither does restarting the whole cluster or all nodes, together with deleting the mnesia directory.
All requests to the overcloud are extremely slow; Horizon takes one minute for each refresh.
Adding additional services such as Octavia causes the overcloud installation to fail with a 504 Gateway Timeout.
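
For triage, the Failed Resource Actions section of the pcs status output can be condensed to one line per failure. The following is only a minimal sketch: the function name list_failed_actions is hypothetical (it is not part of pcs), and the parsing assumes the text layout shown in this report, which other pcs versions may not share.

```shell
#!/bin/sh
# Hypothetical helper, not part of pcs. Reads `pcs status` text on stdin and
# prints one tab-separated line per failed action:
#   <operation> <node> <status> <exitreason, or "-" when empty>
# Assumes failure entries start with "* " and the section ends at a blank line,
# as in the output quoted in this report.
list_failed_actions() {
    awk -v q="'" '
        /^Failed Resource Actions:/ { in_section = 1; next }
        # A blank line closes the section.
        in_section && /^[[:space:]]*$/ { in_section = 0 }
        in_section && /^\* / {
            op = $2; node = $4
            status = "-"; reason = "-"
            # "status=" is 7 characters; the value runs up to the next comma.
            if (match($0, /status=[^,]*/))
                status = substr($0, RSTART + 7, RLENGTH - 7)
            # exitreason is single-quoted; an empty value leaves the "-" default.
            if (match($0, "exitreason=" q "[^" q "]+" q))
                reason = substr($0, RSTART + 12, RLENGTH - 13)
            printf "%s\t%s\t%s\t%s\n", op, node, status, reason
        }
    '
}
```

It could be used as `pcs status | list_failed_actions`, or with `| cut -f3,4 | sort | uniq -c` appended to count failures per status and exit reason.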

Environment
===========
TripleO OpenStack Stable/Stein

Logs & Configs
==============

I can provide any required logs or configs.