FFU: deploy_steps_playbook.yaml playbook fails because rabbitmq_init_bundle container is unable to successfully run Executing: 'rabbitmqctl status | grep -F "{rabbit,"'

Bug #1765552 reported by Marius Cornea on 2018-04-19
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Unassigned

Bug Description

Description of problem:
During a fast-forward upgrade (FFU), the deploy_steps_playbook.yaml playbook fails because the rabbitmq_init_bundle container never gets a successful result from 'rabbitmqctl status | grep -F "{rabbit,"'. The container keeps retrying the command:

[root@controller-0 ~]# docker logs --tail 10 rabbitmq_init_bundle
Debug: Executing: 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Sleeping for 10 seconds between tries
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Exec try 47/180
Debug: Exec[rabbitmq-ready](provider=posix): Executing 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: Executing: 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Sleeping for 10 seconds between tries
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Exec try 48/180
Debug: Exec[rabbitmq-ready](provider=posix): Executing 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: Executing: 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Sleeping for 10 seconds between tries

This step eventually times out. When I run the rabbitmqctl status command manually, I get:

[root@controller-0 ~]# docker exec -it rabbitmq_init_bundle rabbitmqctl status
Error: Failed to initialize erlang distribution: {{shutdown,
                                                   {failed_to_start_child,
                                                    net_kernel,
                                                    {'EXIT',nodistribution}}},
                                                  {child,undefined,
                                                   net_sup_dynamic,
                                                   {erl_distribution,
                                                    start_link,
                                                    [['rabbitmq-cli-85',
                                                      shortnames]]},
                                                   permanent,1000,supervisor,
                                                   [erl_distribution]}}.
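
For reference, the wait loop visible in the log can be sketched in Python. This is only an illustration of the retry logic, not the Puppet implementation: the 180 tries and the 10-second sleep are taken from the log above, while the function names (rabbit_app_running, wait_for_rabbit, rabbitmqctl_status) are hypothetical.

```python
import subprocess
import time
from typing import Callable


def rabbit_app_running(status_output: str) -> bool:
    """Mimic `grep -F "{rabbit,"`: succeed if the fixed string is present."""
    return "{rabbit," in status_output


def rabbitmqctl_status() -> str:
    """Return stdout of `rabbitmqctl status` (only stdout reaches the pipe)."""
    try:
        result = subprocess.run(
            ["rabbitmqctl", "status"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        # Binary missing or not executable: treat as "not ready".
        return ""
    return result.stdout


def wait_for_rabbit(
    run_status: Callable[[], str] = rabbitmqctl_status,
    tries: int = 180,          # "Exec try 47/180" in the log
    try_sleep: float = 10.0,   # "Sleeping for 10 seconds between tries"
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the rabbit application is listed or the tries run out."""
    for attempt in range(1, tries + 1):
        if rabbit_app_running(run_status()):
            return True
        if attempt < tries:
            sleep(try_sleep)
    return False
```

With these values the loop waits roughly 30 minutes (180 tries, 10 seconds apart) before giving up. As long as rabbitmqctl status fails as shown above, its output never contains "{rabbit," and every try fails.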

Version-Release number of selected component (if applicable):
rhosp13/openstack-rabbitmq:2018-03-02.2
pacemaker-1.1.18-11.el7.x86_64
resource-agents-3.9.5-124.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Deploy Newton with 1 controller + 1 compute
2. Run through the FFU steps
3. Run deploy_steps_playbook.yaml

Actual results:
Playbook eventually times out because rabbitmq_init_bundle cannot exit successfully.

Expected results:
deploy_steps_playbook.yaml finishes successfully.

Additional info:

Marius Cornea (mcornea) wrote :

According to the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=1551397#c6:

"Ok so reason is that when the init_bundle runs the cluster is in maintenance mode:
4 nodes configured
16 resources configured

              *** Resource management is DISABLED ***
  The cluster will not attempt to start, stop or recover services

Online: [ controller-0 ]

Full list of resources:

 ip-172.17.3.17 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
 ip-172.17.4.11 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
 ip-172.17.1.15 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
 ip-10.0.0.110 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
 ip-192.168.24.9 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
 ip-172.17.1.12 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
 Docker container: rabbitmq-bundle [rhos-qe-mirror-brq.usersys.redhat.com:5000/rhosp13/openstack-rabbitmq:pcmklatest] (unmanaged)
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped (unmanaged)
 Docker container: galera-bundle [rhos-qe-mirror-brq.usersys.redhat.com:5000/rhosp13/openstack-mariadb:pcmklatest] (unmanaged)
   galera-bundle-0 (ocf::heartbeat:galera): Stopped (unmanaged)
 Docker container: redis-bundle [rhos-qe-mirror-brq.usersys.redhat.com:5000/rhosp13/openstack-redis:pcmklatest] (unmanaged)
   redis-bundle-0 (ocf::heartbeat:redis): Stopped (unmanaged)
 Docker container: haproxy-bundle [rhos-qe-mirror-brq.usersys.redhat.com:5000/rhosp13/openstack-haproxy:pcmklatest] (unmanaged)
   haproxy-bundle-docker-0 (ocf::heartbeat:docker): Stopped (unmanaged)

So we're actually waiting for rabbitmq bundle to come up but it never will because cluster is in maintenance mode."
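
The condition described in the quoted comment can be detected from the cluster status text before starting the wait. The sketch below is a minimal illustration that assumes the status output has the same layout as the excerpt above; the helper names (cluster_in_maintenance, stopped_unmanaged) are hypothetical and it only parses text, it does not call any cluster tool.

```python
import re
from typing import List

# Banner printed in the status output quoted above.
MAINTENANCE_BANNER = "Resource management is DISABLED"

# Matches lines such as:
#   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped (unmanaged)
STOPPED_UNMANAGED = re.compile(
    r"^[ \t]*(\S+)[ \t]+\(ocf::[^)]*\):[ \t]+Stopped \(unmanaged\)",
    re.MULTILINE,
)


def cluster_in_maintenance(status_text: str) -> bool:
    """True if the status output carries the maintenance banner."""
    return MAINTENANCE_BANNER in status_text


def stopped_unmanaged(status_text: str) -> List[str]:
    """Names of resources that are stopped and will not be started."""
    return STOPPED_UNMANAGED.findall(status_text)
```

If cluster_in_maintenance returns True and rabbitmq-bundle-0 appears in stopped_unmanaged, waiting for rabbitmqctl status to succeed cannot help, because the cluster will not start the bundle.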

Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3

Is this still an issue?

Changed in tripleo:
status: Triaged → Invalid