tripleo

FFU: deploy_steps_playbook.yaml playbook fails because the rabbitmq_init_bundle container cannot successfully run 'rabbitmqctl status | grep -F "{rabbit,"'

Bug #1765552 reported by Marius Cornea on 2018-04-19


Affects    Status     Importance    Assigned to    Milestone
tripleo    Invalid    High          Unassigned     tripleo stein-3

Bug Description

Description of problem:
During FFU, the deploy_steps_playbook.yaml playbook fails because the rabbitmq_init_bundle container keeps retrying 'rabbitmqctl status | grep -F "{rabbit,"' and the command never succeeds:

[root@controller-0 ~]# docker logs --tail 10 rabbitmq_init_bundle
Debug: Executing: 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Sleeping for 10 seconds between tries
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Exec try 47/180
Debug: Exec[rabbitmq-ready](provider=posix): Executing 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: Executing: 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Sleeping for 10 seconds between tries
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Exec try 48/180
Debug: Exec[rabbitmq-ready](provider=posix): Executing 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: Executing: 'rabbitmqctl status | grep -F "{rabbit,"'
Debug: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: Sleeping for 10 seconds between tries

This step eventually times out. When trying to run the rabbitmqctl status command manually I get:

[root@controller-0 ~]# docker exec -it rabbitmq_init_bundle rabbitmqctl status
Error: Failed to initialize erlang distribution: {{shutdown,
                                                   {failed_to_start_child,
                                                    net_kernel,
                                                    {'EXIT',nodistribution}}},
                                                  {child,undefined,
                                                   net_sup_dynamic,
                                                   {erl_distribution,
                                                    start_link,
                                                    [['rabbitmq-cli-85',
                                                      shortnames]]},
                                                   permanent,1000,supervisor,
                                                   [erl_distribution]}}.
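
For readers unfamiliar with the retry behaviour visible in the log (try N/180, 10 seconds between tries), here is a minimal Python sketch of the same pattern. It is not the Puppet implementation; `wait_until_ready` and `rabbit_is_running` are hypothetical names used only for illustration, and the defaults are taken from the values shown in the log. With those values the check can block for roughly 30 minutes before giving up.

```python
import subprocess
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool],
                     tries: int = 180,
                     try_sleep: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run check() up to `tries` times, sleeping `try_sleep` seconds between tries.

    Returns True as soon as a check succeeds, False once every try has failed.
    """
    for attempt in range(1, tries + 1):
        if check():
            return True
        if attempt < tries:
            sleep(try_sleep)
    return False


def rabbit_is_running() -> bool:
    """Rough equivalent of: rabbitmqctl status | grep -F "{rabbit,"

    Only stdout is searched, because only stdout goes through the pipe.
    """
    try:
        result = subprocess.run(["rabbitmqctl", "status"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL,
                                universal_newlines=True)
    except OSError:
        # rabbitmqctl is not installed or cannot be executed.
        return False
    return "{rabbit," in result.stdout
```

Calling `wait_until_ready(rabbit_is_running)` inside the container would reproduce the wait seen above. When `rabbitmqctl status` itself errors out, as in the manual run, nothing containing `{rabbit,` reaches stdout, so every try fails.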

Version-Release number of selected component (if applicable):
rhosp13/openstack-rabbitmq:2018-03-02.2
pacemaker-1.1.18-11.el7.x86_64
resource-agents-3.9.5-124.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Deploy Newton with 1 controller + 1 compute
2. Run through the FFU steps
3. Run deploy_steps_playbook.yaml

Actual results:
Playbook eventually times out because rabbitmq_init_bundle cannot exit successfully.

Expected results:
deploy_steps_playbook.yaml finishes successfully.

Additional info:

Tags:


Marius Cornea (mcornea) wrote on 2018-04-19:

According to the discussion in https://bugzilla.redhat.com/show_bug.cgi?id=1551397#c6:

"Ok so reason is that when the init_bundle runs the cluster is in maintenance mode:
4 nodes configured
16 resources configured

*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services

Online: [ controller-0 ]

Full list of resources:

ip-172.17.3.17 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
ip-172.17.4.11 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
ip-172.17.1.15 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
ip-10.0.0.110 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
ip-192.168.24.9 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
ip-172.17.1.12 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
Docker container: rabbitmq-bundle [rhos-qe-mirror-brq.usersys.redhat.com:5000/rhosp13/openstack-rabbitmq:pcmklatest] (unmanaged)
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped (unmanaged)
Docker container: galera-bundle [rhos-qe-mirror-brq.usersys.redhat.com:5000/rhosp13/openstack-mariadb:pcmklatest] (unmanaged)
   galera-bundle-0 (ocf::heartbeat:galera): Stopped (unmanaged)
Docker container: redis-bundle [rhos-qe-mirror-brq.usersys.redhat.com:5000/rhosp13/openstack-redis:pcmklatest] (unmanaged)
   redis-bundle-0 (ocf::heartbeat:redis): Stopped (unmanaged)
Docker container: haproxy-bundle [rhos-qe-mirror-brq.usersys.redhat.com:5000/rhosp13/openstack-haproxy:pcmklatest] (unmanaged)
   haproxy-bundle-docker-0 (ocf::heartbeat:docker): Stopped (unmanaged)

So we're actually waiting for rabbitmq bundle to come up but it never will because cluster is in maintenance mode."
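
The condition described in the quoted comment can be detected from the status text before waiting on the bundle. The sketch below is only an illustration: it assumes the cluster status has already been captured as a string (how it is captured is not shown and is an assumption), and the names `cluster_in_maintenance`, `stopped_unmanaged` and `SAMPLE` are hypothetical. `SAMPLE` is a shortened copy of the output quoted above.

```python
from typing import List

# Shortened copy of the status output quoted in this comment.
SAMPLE = """\
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services

Online: [ controller-0 ]

ip-172.17.3.17 (ocf::heartbeat:IPaddr2): Started controller-0 (unmanaged)
Docker container: rabbitmq-bundle [rhosp13/openstack-rabbitmq:pcmklatest] (unmanaged)
   rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped (unmanaged)
Docker container: galera-bundle [rhosp13/openstack-mariadb:pcmklatest] (unmanaged)
   galera-bundle-0 (ocf::heartbeat:galera): Stopped (unmanaged)
"""


def cluster_in_maintenance(status_text: str) -> bool:
    """True if the status banner says resource management is disabled."""
    return "Resource management is DISABLED" in status_text


def stopped_unmanaged(status_text: str) -> List[str]:
    """Names of resources reported as both Stopped and unmanaged."""
    names = []
    for line in status_text.splitlines():
        if "Stopped" in line and "(unmanaged)" in line:
            # The resource name is the first whitespace-separated token.
            names.append(line.split()[0])
    return names


if __name__ == "__main__":
    if cluster_in_maintenance(SAMPLE):
        print("Cluster is in maintenance mode; these resources will not be started:")
        for name in stopped_unmanaged(SAMPLE):
            print("  " + name)
```

A check like this, run before the rabbitmq-ready wait, would let the step fail fast with a clear message instead of retrying until the timeout.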

Alex Schultz (alex-schultz) on 2018-04-20

Changed in tripleo:
milestone:	rocky-1 → rocky-2

Emilien Macchi (emilienm) on 2018-06-05

Changed in tripleo:
milestone:	rocky-2 → rocky-3

Emilien Macchi (emilienm) on 2018-07-26

Changed in tripleo:
milestone:	rocky-3 → rocky-rc1

Alex Schultz (alex-schultz) on 2018-08-14

Changed in tripleo:
milestone:	rocky-rc1 → stein-1

Juan Antonio Osorio Robles (juan-osorio-robles) on 2018-10-30

Changed in tripleo:
milestone:	stein-1 → stein-2

Emilien Macchi (emilienm) on 2019-01-13

Changed in tripleo:
milestone:	stein-2 → stein-3


Juan Antonio Osorio Robles (juan-osorio-robles) wrote on 2019-02-26:

Is this still an issue?

Juan Antonio Osorio Robles (juan-osorio-robles) on 2019-02-26

Changed in tripleo:
status:	Triaged → Invalid

Remote bug watches

redhat-bugs #1551397 [ASSIGNED]
