Failed to execute hook 'dump_rabbitmq_definitions' Puppet run failed.

Bug #1529861 reported by Alexander Kurenyshev
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Fuel Library (Deprecated)

Bug Description

Found on CI https://product-ci.infra.mirantis.net/job/8.0.system_test.ubuntu.huge_ha_neutron/90/console

The test "Deploy cluster with separate roles in HA mode" failed with: Task 'deploy' has incorrect status. error != ready, 'Deployment has failed. Method granular_deploy. Failed to execute hook 'dump_rabbitmq_definitions' Puppet run failed. Check puppet logs for details

Steps from system test:
Deploy cluster with separate roles in HA mode with Neutron VLAN, RadosGW
Scenario:
            1. Create cluster
            2. Add 3 nodes with controller
            3. Add 3 nodes with compute
            4. Add 1 node with mongo roles
            5. Add 2 nodes as ceph
            6. Verify network
            7. Deploy the cluster

Traceback (most recent call last):
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/helpers/decorators.py", line 81, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/tests/tests_strength/test_huge_environments.py", line 303, in huge_ha_neutron_vlan_ceph_ceilometer_rados
    interval=30)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/helpers/decorators.py", line 430, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/helpers/decorators.py", line 415, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/helpers/decorators.py", line 466, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/helpers/decorators.py", line 476, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/helpers/decorators.py", line 357, in wrapper
    return func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/models/fuel_web_client.py", line 710, in deploy_cluster_wait
    self.assert_task_success(task, interval=interval, timeout=timeout)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/__init__.py", line 57, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/8.0.system_test.ubuntu.huge_ha_neutron/fuelweb_test/models/fuel_web_client.py", line 319, in assert_task_success
    task["name"], task['status'], 'ready', _message(task)
AssertionError: Task 'deploy' has incorrect status. error != ready, 'Deployment has failed. Method granular_deploy. Failed to execute hook 'dump_rabbitmq_definitions' Puppet run failed. Check puppet logs for details

Alexander Kurenyshev (akurenyshev) wrote :
Changed in fuel:
importance: Undecided → High
status: New → Confirmed
tags: added: area-library
Matthew Mosesohn (raytrac3r) wrote :

The cluster ended up with inconsistent_database and running_partitioned_network errors. This is possibly related to the dev env, and I wasn't able to recover the cluster in any way; it seemed to break itself worse and worse. We need to reproduce this again because Egor can't keep this env up. Moving back to Incomplete until we get another reproduction.

Changed in fuel:
status: Confirmed → Incomplete
tags: added: ha rabbitmq team-bugfix tricky
Bogdan Dobrelya (bogdando) wrote :

Yes, according to the rabbitmqctl report output at the moment the diagnostic logs were taken, the rabbit cluster had ended up partitioned and had yet to recover. This cannot be investigated unless all partitions have healed.
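A check like the one described above could be scripted roughly as follows. This is only a sketch: it assumes the Erlang-term style output that `rabbitmqctl cluster_status` prints on RabbitMQ 3.x, where an empty `{partitions,[]}` term means no partitions were detected. The function name and the sample strings are hypothetical and are not taken from this bug's diagnostic snapshot.

```python
import re

def has_partitions(cluster_status_text):
    """Inspect the text of a `rabbitmqctl cluster_status` report.

    Returns True if the partitions term is non-empty, False if it is
    empty, and None if no partitions term is found at all.
    Assumes the Erlang-term output format (an assumption, see lead-in).
    """
    # Capture an immediately closing bracket, if any, after "{partitions,[".
    match = re.search(r"\{partitions,\s*\[\s*(\]?)", cluster_status_text)
    if match is None:
        return None
    # An empty capture means the list has content, i.e. partitions exist.
    return match.group(1) == ""

# Hypothetical sample reports, for illustration only.
healthy = "[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},\n {partitions,[]}]"
broken = "[{nodes,[{disc,['rabbit@node-1','rabbit@node-2']}]},\n {partitions,[{'rabbit@node-1',['rabbit@node-2']}]}]"

print(has_partitions(healthy))
print(has_partitions(broken))
```

Returning None for a report without a partitions term keeps "no data" distinct from "no partitions", which matters when the report was cut short.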

As for node-5, it failed the curl check and the dump_rabbitmq_definitions task because it ended up in the unmanaged state after the following event:

2015-12-29T02:05:44.770620+00:00 notice: notice: process_lrm_event: Operation p_rabbitmq-server_stop_0: unknown error (node=node-5.test.domain.local, call=201, rc=1, cib-update=201, confirmed=true)

This is expected behavior for Pacemaker: it moves a resource to the unmanaged state if the stop action fails and STONITH is not enabled. Nothing can be done then except to heal the cluster and re-manage the resource manually.
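To spot this condition in the logs, one could scan for stop operations that returned a non-zero rc. The snippet below is a minimal sketch: the regular expression is inferred only from the single process_lrm_event line quoted above, not from a Pacemaker log specification, and the names `LRM_EVENT` and `failed_stops` are hypothetical.

```python
import re

# Field layout inferred from the one log line quoted in this comment.
LRM_EVENT = re.compile(
    r"process_lrm_event: Operation (?P<op>\S+): (?P<result>.+?) "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), rc=(?P<rc>\d+)"
)

def failed_stops(lines):
    """Yield (node, resource, rc) for each stop operation with non-zero rc."""
    for line in lines:
        match = LRM_EVENT.search(line)
        if match is None:
            continue
        # Operation ids are assumed to look like <resource>_<action>_<interval>.
        parts = match.group("op").rsplit("_", 2)
        if len(parts) != 3:
            continue
        resource, action, _interval = parts
        rc = int(match.group("rc"))
        if action == "stop" and rc != 0:
            yield match.group("node"), resource, rc

sample = (
    "2015-12-29T02:05:44.770620+00:00 notice: notice: process_lrm_event: "
    "Operation p_rabbitmq-server_stop_0: unknown error "
    "(node=node-5.test.domain.local, call=201, rc=1, cib-update=201, confirmed=true)"
)

for node, resource, rc in failed_stops([sample]):
    print(node, resource, rc)
```

Any resource reported this way is a candidate for having been left unmanaged when STONITH is disabled, so it is worth checking its state before looking at later task failures on the same node.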

Changed in fuel:
status: Incomplete → Invalid
Bogdan Dobrelya (bogdando) wrote :

At least one thing looks incorrect: the stop operation should not normally fail. It seems we have a flaw in the rabbit OCF logic.

Changed in fuel:
status: Invalid → Confirmed
Bogdan Dobrelya (bogdando) wrote :

This issue should be addressed separately in https://bugs.launchpad.net/fuel/+bug/1529897. This one is invalid because the network partitions had not ended by the time of submission.

Changed in fuel:
status: Confirmed → Invalid