Activity log for bug #1875238

Date Who What changed Old value New value Message
2020-04-26 20:02:00 Reza bug added bug
2020-04-26 20:19:30 Reza description

Old value:

Description
===========
After fresh installation of TripleO Stain/Stable on 5 nodes (3 HA Controllers and 2 Computes), rabbitmq bundle and some other resources Failed in Pacemaker.

Steps to reproduce
==================
1- installing undercloud
2- installing overcloud with this command:

openstack overcloud deploy \
--control-flavor control \
--compute-flavor compute \
--templates ~/openstack-tripleo-heat-templates \
-r /home/stack/roles_data.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e environment.yaml \
-e ~/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
-e ~/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e ~/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e ~/openstack-tripleo-heat-templates/environments/network-environment.yaml \
--timeout 360 \
--ntp-server pool.ntp.org

I got same result without network isolation and custom network environment, and completely default settings.

Expected result
===============
Fresh healthy HA OpenStack.

Actual result
=============
pcs status output is as follows:

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.24.1:8787/tripleostein/centos-binary-rabbitmq:pcmklatest]
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): FAILED overcloud-controller-0 (Monitoring)
   rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-1
   rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-2
 Docker container set: galera-bundle [192.168.24.1:8787/tripleostein/centos-binary-mariadb:pcmklatest]
   galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-0
   galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-1
   galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-controller-2
 Docker container set: redis-bundle [192.168.24.1:8787/tripleostein/centos-binary-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Master overcloud-controller-0
   redis-bundle-1 (ocf::heartbeat:redis): Slave overcloud-controller-1
   redis-bundle-2 (ocf::heartbeat:redis): Slave overcloud-controller-2
 ip-192.168.24.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 ip-X.X.X.X (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.16.2.175 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 ip-172.16.2.41 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.16.1.166 (ocf::heartbeat:IPaddr2): Stopped
 ip-172.16.3.10 (ocf::heartbeat:IPaddr2): Stopped
 Docker container set: haproxy-bundle [192.168.24.1:8787/tripleostein/centos-binary-haproxy:pcmklatest]
   haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-1
   haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started overcloud-controller-2
   haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started overcloud-controller-0
 Docker container set: ovn-dbs-bundle [192.168.24.1:8787/tripleostein/centos-binary-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master overcloud-controller-1
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave overcloud-controller-2
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave overcloud-controller-0
 Docker container: openstack-cinder-volume [192.168.24.1:8787/tripleostein/centos-binary-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0

Failed Resource Actions:
* ip-172.16.1.166_start_0 on overcloud-controller-0 'unknown error' (1): call=89, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:01 2020', queued=0ms, exec=111ms
* ip-172.16.3.10_start_0 on overcloud-controller-0 'unknown error' (1): call=95, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:42 2020', queued=0ms, exec=103ms
* ip-172.16.1.166_start_0 on overcloud-controller-1 'unknown error' (1): call=87, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:00 2020', queued=1ms, exec=147ms
* ip-172.16.3.10_start_0 on overcloud-controller-1 'unknown error' (1): call=93, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:41 2020', queued=0ms, exec=99ms
* ip-172.16.1.166_start_0 on overcloud-controller-2 'unknown error' (1): call=87, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:00 2020', queued=0ms, exec=105ms
* ip-172.16.3.10_start_0 on overcloud-controller-2 'unknown error' (1): call=93, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:42 2020', queued=0ms, exec=93ms
* rabbitmq_start_0 on rabbitmq-bundle-1 'unknown error' (1): call=2121, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:18:42 2020', queued=0ms, exec=200049ms
* rabbitmq_start_0 on rabbitmq-bundle-2 'unknown error' (1): call=1979, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:04:59 2020', queued=0ms, exec=200031ms
* rabbitmq_monitor_10000 on rabbitmq-bundle-0 'unknown error' (1): call=2298, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:36:59 2020', queued=0ms, exec=40036ms
* ovndb_servers_monitor_30000 on ovn-dbs-bundle-2 'not running' (7): call=23, status=complete, exitreason='',
    last-rc-change='Sun Apr 26 17:33:03 2020', queued=1ms, exec=1806ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

It seems the cluster is completely unhealthy. Even running these commands doesn't help:

pcs resource restart rabbitmq-bundle
pcs resource cleanup rabbitmq-bundle

Restarting the whole cluster or all nodes, along with deleting the mnesia directory, doesn't help either.

All requests on the overcloud are extremely slow; Horizon takes one minute for each refresh.
Adding additional services like Octavia causes the overcloud installation to fail due to a 504 Gateway Timeout.

Environment
===========
TripleO OpenStack Stable/Stein

Logs & Configs
==============
I can provide any required log or config.

New value:

Description
===========
After fresh installation of TripleO Stein/Stable on 5 nodes (3 HA Controllers and 2 Computes), rabbitmq bundle and some other resources Failed in Pacemaker.

Steps to reproduce
==================
1- installing undercloud
2- installing overcloud with this command:

openstack overcloud deploy \
--control-flavor control \
--compute-flavor compute \
--templates ~/openstack-tripleo-heat-templates \
-r /home/stack/roles_data.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e environment.yaml \
-e ~/openstack-tripleo-heat-templates/environments/services/neutron-ovn-dvr-ha.yaml \
-e ~/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e ~/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e ~/openstack-tripleo-heat-templates/environments/network-environment.yaml \
--timeout 360 \
--ntp-server pool.ntp.org

I got same result without network isolation and custom network environment, and completely default settings.

Expected result
===============
Fresh healthy HA OpenStack.

Actual result
=============
pcs status output is as follows:

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.24.1:8787/tripleostein/centos-binary-rabbitmq:pcmklatest]
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): FAILED overcloud-controller-0 (Monitoring)
   rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-1
   rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped overcloud-controller-2
 Docker container set: galera-bundle [192.168.24.1:8787/tripleostein/centos-binary-mariadb:pcmklatest]
   galera-bundle-0 (ocf::heartbeat:galera): Master overcloud-controller-0
   galera-bundle-1 (ocf::heartbeat:galera): Master overcloud-controller-1
   galera-bundle-2 (ocf::heartbeat:galera): Master overcloud-controller-2
 Docker container set: redis-bundle [192.168.24.1:8787/tripleostein/centos-binary-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Master overcloud-controller-0
   redis-bundle-1 (ocf::heartbeat:redis): Slave overcloud-controller-1
   redis-bundle-2 (ocf::heartbeat:redis): Slave overcloud-controller-2
 ip-192.168.24.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 ip-X.X.X.X (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.16.2.175 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 ip-172.16.2.41 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.16.1.166 (ocf::heartbeat:IPaddr2): Stopped
 ip-172.16.3.10 (ocf::heartbeat:IPaddr2): Stopped
 Docker container set: haproxy-bundle [192.168.24.1:8787/tripleostein/centos-binary-haproxy:pcmklatest]
   haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-1
   haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started overcloud-controller-2
   haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started overcloud-controller-0
 Docker container set: ovn-dbs-bundle [192.168.24.1:8787/tripleostein/centos-binary-ovn-northd:pcmklatest]
   ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Master overcloud-controller-1
   ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Slave overcloud-controller-2
   ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave overcloud-controller-0
 Docker container: openstack-cinder-volume [192.168.24.1:8787/tripleostein/centos-binary-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started overcloud-controller-0

Failed Resource Actions:
* ip-172.16.1.166_start_0 on overcloud-controller-0 'unknown error' (1): call=89, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:01 2020', queued=0ms, exec=111ms
* ip-172.16.3.10_start_0 on overcloud-controller-0 'unknown error' (1): call=95, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:42 2020', queued=0ms, exec=103ms
* ip-172.16.1.166_start_0 on overcloud-controller-1 'unknown error' (1): call=87, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:00 2020', queued=1ms, exec=147ms
* ip-172.16.3.10_start_0 on overcloud-controller-1 'unknown error' (1): call=93, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:41 2020', queued=0ms, exec=99ms
* ip-172.16.1.166_start_0 on overcloud-controller-2 'unknown error' (1): call=87, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:00 2020', queued=0ms, exec=105ms
* ip-172.16.3.10_start_0 on overcloud-controller-2 'unknown error' (1): call=93, status=complete, exitreason='[findif] failed',
    last-rc-change='Sun Apr 26 17:19:42 2020', queued=0ms, exec=93ms
* rabbitmq_start_0 on rabbitmq-bundle-1 'unknown error' (1): call=2121, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:18:42 2020', queued=0ms, exec=200049ms
* rabbitmq_start_0 on rabbitmq-bundle-2 'unknown error' (1): call=1979, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:04:59 2020', queued=0ms, exec=200031ms
* rabbitmq_monitor_10000 on rabbitmq-bundle-0 'unknown error' (1): call=2298, status=Timed Out, exitreason='',
    last-rc-change='Sun Apr 26 18:36:59 2020', queued=0ms, exec=40036ms
* ovndb_servers_monitor_30000 on ovn-dbs-bundle-2 'not running' (7): call=23, status=complete, exitreason='',
    last-rc-change='Sun Apr 26 17:33:03 2020', queued=1ms, exec=1806ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

It seems the cluster is completely unhealthy. Even running these commands doesn't help:

pcs resource restart rabbitmq-bundle
pcs resource cleanup rabbitmq-bundle

Restarting the whole cluster or all nodes, along with deleting the mnesia directory, doesn't help either.

All requests on the overcloud are extremely slow; Horizon takes one minute for each refresh.
Adding additional services like Octavia causes the overcloud installation to fail due to a 504 Gateway Timeout.

Environment
===========
TripleO OpenStack Stable/Stein

Logs & Configs
==============
I can provide any required log or config.
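
For triaging a long "Failed Resource Actions" list like the one in this report, a small filter can reduce each entry to its operation, node, and status. The following is only a sketch: `summarize_failed_actions` is a hypothetical helper name, not part of pcs, and it assumes each entry starts with `* <operation> on <node>` and carries a `status=...` field on that same line, as in the output quoted here.

```shell
#!/bin/sh
# summarize_failed_actions: hypothetical helper, not part of pcs.
# Reads `pcs status` text on stdin and prints one line per failed action:
#   <operation> <node> <status>
# Assumes the "* <operation> on <node> ... status=<value>," entry layout
# shown in this bug report; other pcs versions may format entries differently.
summarize_failed_actions() {
    awk '
        # Entries begin with "* <operation> on <node>"
        /^[[:space:]]*\* .* on / {
            op = $2
            node = $4
            status = "unknown"
            # Extract the text between "status=" and the next comma
            if (match($0, /status=[^,]*/)) {
                status = substr($0, RSTART + 7, RLENGTH - 7)
            }
            print op, node, status
        }
    '
}

# Usage on a controller node (assumes pcs is installed there):
#   pcs status | summarize_failed_actions
```

Applied to the output above, this would separate the `findif` start failures on the controllers (status `complete`) from the rabbitmq start and monitor operations that ended with `Timed Out`.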