Make OCF scripts tolerate rabbitmqctl timeouts to a certain degree. UX: rabbitmq node was stopped by pacemaker

Bug #1479815 reported by Leontiy Istomin on 2015-07-30
Affects: Fuel for OpenStack
  Importance: High, Assigned to: Dmitry Mescheryakov
  6.0.x: High, MOS Maintenance
  6.1.x: High, Michal Rostecki
  7.0.x: High, Dmitry Mescheryakov
  8.0.x: High, Dmitry Mescheryakov
  Future: High, Mikhail Chernik
  Mitaka: High, Dmitry Mescheryakov

Bug Description

During the following rally scenarios rabbitmq was stopped on some controller nodes:
node-33:
29 08:16:45 was running boot_and_list_server_batch_x10 rally scenario
node-39:
29 10:24:13 was running boot_and_delete_server_with_secgroups rally scenario
29 22:07:23 was running boot_and_delete_server_with_neutron_secgroups rally scenario

from pacemaker.log on node-33: http://paste.openstack.org/show/406411/
from pacemaker.log on node-39: http://paste.openstack.org/show/406413/

atop on node-33 at 08:16:41: http://paste.openstack.org/show/406416/
atop on node-39 at 10:24:01: http://paste.openstack.org/show/406418/
atop on node-39 at 22:07:21: http://paste.openstack.org/show/406420/

Cluster configuration:
Baremetal,Ubuntu,IBP,HA, Neutron-vlan,Ceph-all,Nova-debug,Nova-quotas,Sahara,Murano,7.0-98
Controllers:3 Computes+Ceph:47

api: '1.0'
astute_sha: 34e0493afa22999c4a07d3198ceb945116ab7932
auth_required: true
build_id: 2015-07-27_09-24-22
build_number: '98'
feature_groups:
- mirantis
fuel-agent_sha: 2a65f11c10b0aeb5184247635a19740fc3edde21
fuel-library_sha: 39c3162ee2e2ff6e3af82f703998f95ff4cc2b7a
fuel-ostf_sha: 94a483c8aba639be3b96616c1396ef290dcc00cd
fuelmain_sha: 921918a3bd3d278431f35ad917989e46b0c24100
nailgun_sha: d5c19f6afc66b5efe3c61ecb49025c1002ccbdc6
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 58c411d87a7eaf0fd6892eae2b5cb1eff4190c98
release: '7.0'

Diagnostic Snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-07-30_09-14-44.tar.xz

Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Dmitry Mescheryakov (dmitrymex)
milestone: none → 7.0
Vladimir Kuklin (vkuklin) wrote :

Folks

The pacemaker log shows a load average of 25.x; you cannot expect the cluster to handle such load.

Leontiy Istomin (listomin) wrote :

I have reproduced the issue with rabbitmq-3.5.4:
<30>Aug 2 16:02:56 node-1 lrmd: INFO: p_rabbitmq-server: su_rabbit_cmd(): the invoked command exited 2: /usr/sbin/rabbitmqctl -t 30 -q list_queues memory messages consumer_utilisation
<30>Aug 2 16:02:56 node-1 lrmd: INFO: p_rabbitmq-server: get_monitor(): get_monitor function ready to return 0
Aug 02 16:02:56 [8897] node-1.domain.tld pacemaker_remoted: notice: operation_finished: p_rabbitmq-server_monitor_103000:18822:stderr [ Error: operation list_queues on node 'rabbit@node-1' timed out ]
Aug 02 16:02:57 [8900] node-1.domain.tld crmd: info: throttle_handle_load: Moderate CPU load detected: 17.080000

http://paste.openstack.org/show/406748/

Cluster configuration:
Baremetal,Ubuntu,IBP,HA, Neutron-vxlan,Ceph-all,Nova-debug,Nova-quotas,Sahara,Murano,7.0-custom-764
Controllers:3 Computes+Ceph:47

api: '1.0'
astute_sha: 34e0493afa22999c4a07d3198ceb945116ab7932
auth_required: true
build_id: 2015-07-27_09-24-22
build_number: '98'
feature_groups:
- mirantis
fuel-agent_sha: 2a65f11c10b0aeb5184247635a19740fc3edde21
fuel-library_sha: 39c3162ee2e2ff6e3af82f703998f95ff4cc2b7a
fuel-ostf_sha: 94a483c8aba639be3b96616c1396ef290dcc00cd
fuelmain_sha: 921918a3bd3d278431f35ad917989e46b0c24100
nailgun_sha: d5c19f6afc66b5efe3c61ecb49025c1002ccbdc6
openstack_version: 2015.1.0-7.0
production: docker
python-fuelclient_sha: 58c411d87a7eaf0fd6892eae2b5cb1eff4190c98
release: '7.0'

Diagnostic Snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-08-03_10-03-53.tar.xz

Leontiy Istomin (listomin) wrote :

I have found that rabbitmq consumes all CPU: http://paste.openstack.org/show/412864/
at the time when rabbitmq isn't reachable: http://paste.openstack.org/show/412867/
I monitored the sockets in use at the time:
Thu Aug 13 11:34:03 UTC 2015
rabbitmq with pid=33083 opened 583 files out of a limit of 102300
rabbitmq with pid=33083 used 581 sockets out of a limit of 92068

So socket limits are not the cause of the issue.
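The per-process file descriptor and socket counts above can be gathered with a quick sketch like the following. This assumes a Linux /proc filesystem; the example uses the current shell's pid ($$) so it is self-contained, where in practice you would substitute the RabbitMQ beam pid (33083 in the comment above):

```shell
# Sketch: count open file descriptors for a process via /proc (Linux only).
# In practice set pid to the RabbitMQ beam process id instead of $$.
pid=$$
open_fds=$(ls /proc/"$pid"/fd | wc -l)
soft_limit=$(ulimit -n)
echo "pid=$pid opened $open_fds files from soft limit $soft_limit"
```

Sockets are a subset of these descriptors; comparing the count against the soft limit is how one rules out fd/socket exhaustion, as done above.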

Leontiy Istomin (listomin) wrote :

I can easily reproduce the issue with the NeutronSecGroupPlugin.create_and_delete_secgroups rally scenario. The test ran from 13 15:05:19 to 13 15:58:40 UTC.

During the test rabbitmq was stopped on node-196: http://paste.openstack.org/show/412901/
dstat at the time on node-196: http://paste.openstack.org/show/412902/
from pacemaker on node-196: http://paste.openstack.org/show/412904/
A screenshot of the management console at the time RabbitMQ was stopped on node-196 (taken from node-172) is attached (rabbit_management.png).
node-197 was stopped by pacemaker about a minute after node-196 was stopped.
node-196 and node-197 were not recovered. node-172 worked successfully for the whole test, and the test passed. The result from the management console is attached (rabbit_management_after_test.png).

Diagnostic Snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2015-08-13_15-56-36.tar.xz

Bogdan Dobrelya (bogdando) wrote :

What is the impact? Please elaborate. Were all of the rabbit nodes down or not? Was the control plane affected by complete downtime or not? Was the data plane functioning or not? Setting to Medium unless details are provided.

tags: added: rabbitmq
Changed in fuel:
status: Confirmed → Incomplete
importance: High → Medium
Dmitry Mescheryakov (dmitrymex) wrote :

A RabbitMQ reboot on a single machine causes a several-minute outage on the OpenStack side, because right now oslo.messaging cannot seamlessly fail over to live controllers. As discussed with Bogdan, we will make the OCF scripts tolerate rabbitmqctl timeouts to a certain degree by introducing a fail count. This will help us avoid unneeded RabbitMQ reboots and, as a result, OpenStack outages.
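The fail-count idea can be sketched as follows. This is a minimal illustration of the counting logic only, not the actual OCF script: the function names are invented for the example, and the state file uses mktemp where the real script would use a persistent path.

```shell
# Sketch: tolerate up to MAX_TIMEOUTS consecutive rabbitmqctl timeouts
# before declaring the resource failed (names and state file hypothetical).
MAX_TIMEOUTS=3
COUNT_FILE=$(mktemp)        # real script would persist this across monitor runs
echo 0 > "$COUNT_FILE"

on_rabbitmqctl_timeout() {
    local count
    count=$(( $(cat "$COUNT_FILE") + 1 ))
    echo "$count" > "$COUNT_FILE"
    if [ "$count" -lt "$MAX_TIMEOUTS" ]; then
        echo "timed out $count of max. $MAX_TIMEOUTS time(s) in a row. Doing nothing for now."
        return 0            # report success to pacemaker, skip the restart
    fi
    echo 0 > "$COUNT_FILE"
    return 1                # report failure; pacemaker restarts the resource
}

on_rabbitmqctl_success() {
    echo 0 > "$COUNT_FILE"  # any successful check resets the counter
}
```

With MAX_TIMEOUTS=3, two consecutive timeouts are tolerated and the third one fails the resource, which matches the N-1 tolerated / Nth fails semantics documented in the committed change.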

Changed in fuel:
status: Incomplete → In Progress
importance: Medium → High
Bogdan Dobrelya (bogdando) wrote :
summary: + Make OCF scripts tolerate rabbitmqctl timeouts to a certain degree. UX:
rabbitmq node was stopped by pacemaker
Changed in fuel:
assignee: Dmitry Mescheryakov (dmitrymex) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Dmitry Mescheryakov (dmitrymex)

Reviewed: https://review.openstack.org/217738
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=2707a5ebbff7012a94de77b60fd594f5bcb29e05
Submitter: Jenkins
Branch: master

commit 2707a5ebbff7012a94de77b60fd594f5bcb29e05
Author: Dmitry Mescheryakov <email address hidden>
Date: Tue Aug 25 17:38:44 2015 +0300

    Make RabbitMQ OCF script tolerate rabbitmqctl timeouts

    The change makes OCF script ignore small number of timeouts of rabbitmqctl
    for 'heavy' operations: list_channels, get_alarms and list_queues.
    Number of tolerated timeouts in a row is configured through a new variable
    'max_rabbitmqctl_timeouts'. By default it is set to 1, i.e. rabbitmqctl
    timeouts are not tolerated at all.

    Bug #1487517 is fixed by extracting declaration of local variables
    'rc_alarms' and 'rc_queues' from assignment operations.

    Text for Operations Guide:

    If on node where RabbitMQ is deployed
    other processes consume significant part of CPU, RabbitMQ starts
    responding slow to queries by 'rabbitmqctl' utility. The utility is
    used by RabbitMQ's OCF script to monitor state of the RabbitMQ.
    When utility fails to return in pre-defined timeout, OCF script
    considers RabbitMQ to be down and restarts it, which might lead to
    a limited (several minutes) OpenStack downtime. Such restarts
    are undesirable as they cause downtime without benefit. To
    mitigate the issue, the OCF script might be told to tolerate
    certain amount of rabbitmqctl timeouts in a row using the following
    command:
      crm_resource --resource p_rabbitmq-server --set-parameter \
          max_rabbitmqctl_timeouts --parameter-value N

    Here N should be replaced with the number of timeouts. For instance,
    if it is set to 3, the OCF script will tolerate two rabbitmqctl
    timeouts in a row, but fail if the third one occurs.

    By default the parameter is set to 1, i.e. rabbitmqctl timeout is not
    tolerated at all. The downside of increasing the parameter is that
    if a real issue occurs which causes rabbitmqctl timeout, OCF script
    will detect that only after N monitor runs and so the restart, which
    might fix the issue, will be delayed.

    To understand that RabbitMQ's restart was caused by rabbitmqctl timeout
    you should examine lrmd.log of the corresponding controller on Fuel
    master node in /var/log/docker-logs/remote/ directory. Here lines like
    "the invoked command exited 137: /usr/sbin/rabbitmqctl list_channels ..."

    indicate rabbitmqctl timeout. The next line will explain if it
    caused restart or not. For example:
    "rabbitmqctl timed out 2 of max. 3 time(s) in a row. Doing nothing for now."

    DocImpact: user-guide, operations-guide

    Closes-Bug: #1479815
    Closes-Bug: #1487517
    Change-Id: I9dec06fc08dbeefbc67249b9e9633c8aab5e09ca

Changed in fuel:
status: In Progress → Fix Committed
tags: added: release-notes-done

Reviewed: https://review.openstack.org/221680
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=ad7999d3f8288d56c2167fc99c5371acb030fdb3
Submitter: Jenkins
Branch: master

commit ad7999d3f8288d56c2167fc99c5371acb030fdb3
Author: Alexander Adamov <email address hidden>
Date: Wed Sep 9 13:01:39 2015 +0300

    [OpsGuide]Make RabbitMQ OCF script tolerate timeouts

    Adds resolved issue LP1479815 to Operations Guide.

    Change-Id: I3ebd6d510b1d300a895d5d7905aabeaf4c96c1e4
    Related-Bug: #1479815

Leontiy Istomin (listomin) wrote :

Changing status to Confirmed for 7.0 because the patch has been reverted.

Changed in fuel:
status: Fix Committed → Confirmed
Polina Petriuk (ppetriuk) wrote :

Leontiy, what was the reason for reverting the patch for MOS7.0?

tags: added: customer-found

One more repro on the Scale lab, env 10. lrmd.log from node-100:

2015-09-19T10:28:31.956130+00:00 info: INFO: p_rabbitmq-server: su_rabbit_cmd(): the invoked command exited 137: /usr/sbin/rabbitmqctl list_channels 2>&1 > /dev/null
2015-09-19T10:28:31.989742+00:00 err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 1 of max. 1 time(s) in a row and is not responding. The resource is failed.

Polina, the patch itself is not reverted. But due to the bug https://bugs.launchpad.net/fuel/+bug/1496386, enabling the fix for the current issue breaks the RabbitMQ cluster.

Right now we are working on a fix for #1496386. If we do not make it into the 7.0 release, we will revert the documentation change that suggests enabling the fix. Here is the revert: https://review.openstack.org/#/c/224556/ . Leontiy was referring to it.
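To spot these events in collected logs, a grep along these lines works. This is a sketch: on a Fuel master the input would typically be the lrmd.log files under /var/log/docker-logs/remote/ as described in the committed fix; here it filters sample lines modeled on the ones quoted above so the example is self-contained.

```shell
# Sketch: filter lrmd log lines that indicate a rabbitmqctl timeout (exit 137)
# and the OCF script's reaction to it. Replace the here-document with e.g.
# /var/log/docker-logs/remote/<node>/lrmd.log on a real Fuel master.
grep -E "invoked command exited 137|timed out .* time\(s\) in a row" <<'EOF'
2015-09-19T10:28:31 info: INFO: p_rabbitmq-server: su_rabbit_cmd(): the invoked command exited 137: /usr/sbin/rabbitmqctl list_channels 2>&1 > /dev/null
2015-09-19T10:28:31 err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 1 of max. 1 time(s) in a row and is not responding. The resource is failed.
2015-09-19T10:28:32 info: INFO: p_rabbitmq-server: some unrelated monitor line
EOF
```

The first matched line shows the timeout itself; the second shows whether the fail count led to a restart or was tolerated.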

Vitaly Sedelnik (vsedelnik) wrote :

This bug will be fixed by merging https://review.openstack.org/#/c/225120, which fixes a Critical 7.0 issue, so leaving this issue targeted to the 7.0 milestone.

The fix for the issue that blocked enabling the fix for the current bug has been merged. Hence we are not going to revert our documentation and can consider the current bug closed.

Changed in fuel:
status: Confirmed → Fix Committed
Leontiy Istomin (listomin) wrote :

I've applied https://review.openstack.org/#/c/225120 and performed
crm_resource --resource p_rabbitmq-server --set-parameter max_rabbitmqctl_timeouts --parameter-value 5
then I restarted rabbitmq by running:
pcs resource disable master_p_rabbitmq-server
waiting for rabbitmq to stop, then:
pcs resource enable master_p_rabbitmq-server
During the boot_and_delete_server_with_neutron_secgroups rally scenario, RabbitMQ broke even in this case due to a "list_queues timeout".

Changed in fuel:
status: Fix Committed → Confirmed

As Leontiy pointed out, the fix does not fully help in the Scale lab's case. After discussing the issue with Leontiy, we decided to continue tracking the Scale lab problem in this bug.

Reviewed: https://review.openstack.org/222614
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=a304fac9bf1ee4e98cfc355e3058b9664c2768c2
Submitter: Jenkins
Branch: stable/6.1

commit a304fac9bf1ee4e98cfc355e3058b9664c2768c2
Author: Dmitry Mescheryakov <email address hidden>
Date: Tue Aug 25 17:38:44 2015 +0300

    Make RabbitMQ OCF script tolerate rabbitmqctl timeouts

    The change makes OCF script ignore small number of timeouts of rabbitmqctl
    for 'heavy' operations: list_channels, get_alarms and list_queues.
    Number of tolerated timeouts in a row is configured through a new variable
    'max_rabbitmqctl_timeouts'. By default it is set to 1, i.e. rabbitmqctl
    timeouts are not tolerated at all.

    Bug #1487517 is fixed by extracting declaration of local variables
    'rc_alarms' and 'rc_queues' from assignment operations.

    Text for Operations Guide:

    If on node where RabbitMQ is deployed
    other processes consume significant part of CPU, RabbitMQ starts
    responding slow to queries by 'rabbitmqctl' utility. The utility is
    used by RabbitMQ's OCF script to monitor state of the RabbitMQ.
    When utility fails to return in pre-defined timeout, OCF script
    considers RabbitMQ to be down and restarts it, which might lead to
    a limited (several minutes) OpenStack downtime. Such restarts
    are undesirable as they cause downtime without benefit. To
    mitigate the issue, the OCF script might be told to tolerate
    certain amount of rabbitmqctl timeouts in a row using the following
    command:
      crm_resource --resource p_rabbitmq-server --set-parameter \
          max_rabbitmqctl_timeouts --parameter-value N

    Here N should be replaced with the number of timeouts. For instance,
    if it is set to 3, the OCF script will tolerate two rabbitmqctl
    timeouts in a row, but fail if the third one occurs.

    By default the parameter is set to 1, i.e. rabbitmqctl timeout is not
    tolerated at all. The downside of increasing the parameter is that
    if a real issue occurs which causes rabbitmqctl timeout, OCF script
    will detect that only after N monitor runs and so the restart, which
    might fix the issue, will be delayed.

    To understand that RabbitMQ's restart was caused by rabbitmqctl timeout
    you should examine lrmd.log of the corresponding controller on Fuel
    master node in /var/log/docker-logs/remote/ directory. Here lines like
    "the invoked command exited 137: /usr/sbin/rabbitmqctl list_channels ..."

    indicate rabbitmqctl timeout. The next line will explain if it
    caused restart or not. For example:
    "rabbitmqctl timed out 2 of max. 3 time(s) in a row. Doing nothing for now."

    DocImpact: user-guide, operations-guide

    Closes-Bug: #1479815
    Closes-Bug: #1487517
    Change-Id: I9dec06fc08dbeefbc67249b9e9633c8aab5e09ca
    (cherry picked from commit 2707a5ebbff7012a94de77b60fd594f5bcb29e05)

Dmitry Pyzhov (dpyzhov) on 2015-10-12
no longer affects: fuel/8.0.x
Vitaly Sedelnik (vsedelnik) wrote :

Moved from 7.0-mu-1 to 7.0-updates as the root cause is still unknown and we don't expect the fix to be available soon.

tags: added: support
Polina Petriuk (ppetriuk) wrote :

This OCF script is affected by the issue described in https://bugs.launchpad.net/mos/+bug/1503331.

Dmitry Pyzhov (dpyzhov) on 2015-10-22
tags: added: area-mos

We agreed with Mikhail Chernik that he will investigate whether a 200-node environment is still affected by this issue.

Changed in fuel:
assignee: Dmitry Mescheryakov (dmitrymex) → Mikhail Chernik (mchernik)
Changed in fuel:
milestone: 8.0 → 9.0
status: Confirmed → New
Artem Roma (aroma-x) on 2016-01-04
Changed in fuel:
status: New → Confirmed
Roman Podoliaka (rpodolyaka) wrote :

Moving to Incomplete until we hear from Mikhail

Changed in fuel:
status: Confirmed → Incomplete
Mikhail Chernik (mchernik) wrote :

Ran the NeutronSecGroupPlugin.create_and_delete_secgroups Rally scenario on MOS 8.0, ISO 482.

Cluster configuration: Neutron+VLAN with DVR. 200 computes, 3 controllers, Cinder.

No failed connections to RabbitMQ were detected during the test run.

Mikhail Chernik (mchernik) wrote :

Update: the scenario mentioned in the previous comment was executed in 5 threads. When running the same scenario in 20 threads, RabbitMQ began to restart; however, the test completed without errors and at least one RabbitMQ instance remained online.

Roman Podoliaka (rpodolyaka) wrote :

Per discussion with Dmitry: he is going to continue the investigation of this problem on Mikhail's environment. Still, this must not be a blocker for us; thus, tentatively moving this to 8.0-updates.

Changed in fuel:
assignee: Mikhail Chernik (mchernik) → Dmitry Mescheryakov (dmitrymex)
status: Incomplete → Confirmed
tags: added: move-to-mu

(This check is performed automatically.)
Please make sure that the bug description contains the following sections, filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Changed in fuel:
milestone: 9.0 → 10.0
Roman Rufanov (rrufanov) wrote :

Please back-port to 6.0; it will be delivered to customers as a patch or as instructions with steps.

Roman, the backport to 6.x can be found in this CR: https://review.openstack.org/#/c/316053 (it is a cumulative set of changes between 6.x and 9.0)

Guys, after further thinking I am closing this bug. The scope of this issue was to cover intermittent 'rabbitmqctl list_channels' timeouts due to high load. We addressed that with the change https://review.openstack.org/217738

What people observe after that are RabbitMQ bugs which cause list_channels to time out until the RabbitMQ cluster is restarted. That new issue will be tracked in bug https://bugs.launchpad.net/mos/+bug/1566816

Changed in fuel:
status: Confirmed → Invalid
Alexey Stupnikov (astupnikov) wrote :

We no longer support MOS5.1, MOS6.0, MOS6.1
We deliver only Critical/Security fixes to MOS7.0, MOS8.0.
We deliver only High/Critical/Security fixes to MOS9.2.
