Reset RabbitMQ's net_ticktime to default 60 seconds

Bug #1598154 reported by Dmitry Mescheryakov
This bug affects 1 person
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  High        Maksim Malchuk
6.1.x               Won't Fix      High        MOS Maintenance
7.0.x               Fix Released   High        Anton Chevychalov
8.0.x               Fix Released   High        Anton Chevychalov
Mitaka              Fix Released   High        Maksim Malchuk

Bug Description

Right now net_ticktime is set to 10 seconds in an attempt to detect network partitions quickly. The downside of that decision is that spurious network partitions and brief outages now trigger Erlang's partition handling logic.

The problem is that the partition handling logic is not bug-free, so the cluster might need to be fixed after a short network hiccup. With net_ticktime set to 60 seconds, by contrast, short outages would go unnoticed.

When choosing between a net_ticktime of 10 and 60 seconds, the 50-second difference in partition detection time is negligible. At the same time, 10 seconds raises the risk of downtime due to bugs in the partition handling logic. Hence the default of 60 seconds is preferable.
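
To make the trade-off concrete, here is a rough sketch of the detection window for each setting. It assumes the behaviour described in the Erlang kernel documentation with the default tick intensity of 4: ticks are sent every net_ticktime / 4 seconds, and an unresponsive peer is considered down within net_ticktime ± 25%. The function name `detection_window` is made up for illustration and is not part of RabbitMQ or Fuel.

```python
def detection_window(net_ticktime):
    """Return (min, max) seconds of silence before a peer is considered down.

    Assumption: the default tick intensity of 4, which gives the
    net_ticktime +/- 25% window described in the Erlang kernel docs.
    """
    margin = net_ticktime / 4  # one tick interval
    return net_ticktime - margin, net_ticktime + margin


for ticktime in (10, 60):
    low, high = detection_window(ticktime)
    print(f"net_ticktime={ticktime}: peer considered down after "
          f"{low:.1f} to {high:.1f} s of silence")
```

Under these assumptions, a hiccup of roughly 8 to 12 seconds can already be treated as a partition with the 10-second setting, while with 60 seconds the outage has to last at least about 45 seconds.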

Tags: area-library
Changed in fuel:
status: New → Confirmed
tags: added: area-mos
tags: added: area-library
removed: area-mos
no longer affects: fuel/newton
Changed in fuel:
assignee: MOS Oslo (mos-oslo) → Maksim Malchuk (mmalchuk)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/337127

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/337127
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=7f7214abf3e174e9ef84dc86fae2f5b2ca4a237d
Submitter: Jenkins
Branch: master

commit 7f7214abf3e174e9ef84dc86fae2f5b2ca4a237d
Author: Maksim Malchuk <email address hidden>
Date: Mon Jul 4 13:35:26 2016 +0300

    Reset rabbitmq default for net_ticktime to 60 sec

    This parameter should be increased to eliminate the triggering
    partition handling logic on spurious network partitions/outages.

    Change-Id: Iaadf50bb499d27a5859a4335764a4d7ae49d4a5e
    Closes-Bug: #1598154
    Signed-off-by: Maksim Malchuk <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/337581

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/337581
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=e34bee546b9523f6802e4c590a955b6d1b9f346d
Submitter: Jenkins
Branch: stable/mitaka

commit e34bee546b9523f6802e4c590a955b6d1b9f346d
Author: Maksim Malchuk <email address hidden>
Date: Mon Jul 4 13:35:26 2016 +0300

    Reset rabbitmq default for net_ticktime to 60 sec

    This parameter should be increased to eliminate the triggering
    partition handling logic on spurious network partitions/outages.

    Change-Id: Iaadf50bb499d27a5859a4335764a4d7ae49d4a5e
    Closes-Bug: #1598154
    Signed-off-by: Maksim Malchuk <email address hidden>
    (cherry picked from commit 7f7214abf3e174e9ef84dc86fae2f5b2ca4a237d)

tags: added: on-verification
Andrey Lavrentyev (alavrentyev) wrote :

Verified on 9.1 snapshot #64

cl output: http://paste.openstack.org/show/543891/

[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-nailgun-9.0.0-1.mos8748.noarch
 fuel-utils-9.0.0-1.mos8495.noarch
 fuel-release-9.0.0-1.mos6349.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-library9.0-9.0.0-1.mos8495.noarch
 shotgun-9.0.0-1.mos90.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-mirror-9.0.0-1.mos142.noarch
 fuel-ostf-9.0.0-1.mos938.noarch
 rubygem-astute-9.0.0-1.mos753.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 python-packetary-9.0.0-1.mos142.noarch
 nailgun-mcagents-9.0.0-1.mos753.noarch
 fuel-migrate-9.0.0-1.mos8495.noarch
 fuel-misc-9.0.0-1.mos8495.noarch
 fuel-notify-9.0.0-1.mos8495.noarch
 fuelmenu-9.0.0-1.mos274.noarch
 fuel-openstack-metadata-9.0.0-1.mos8748.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8748.noarch

MOS_CENTOS_OS_MIRROR_ID: os-2016-06-23-135731
MOS_CENTOS_PROPOSED_MIRROR_ID: proposed-2016-07-28-170322
MOS_CENTOS_UPDATES_MIRROR_ID: updates-2016-06-23-135916
MOS_CENTOS_SECURITY_MIRROR_ID: security-2016-06-23-140002
MOS_CENTOS_HOLDBACK_MIRROR_ID: holdback-2016-06-23-140047
MOS_CENTOS_HOTFIX_MIRROR_ID: hotfix-2016-07-18-162958
MOS_UBUNTU_MIRROR_ID: 9.0-2016-07-28-170322
UBUNTU_MIRROR_ID: ubuntu-2016-07-27-174626
CENTOS_MIRROR_ID: centos-7.2.1511-2016-05-31-083834

tags: removed: on-verification
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.1)

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/321799
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/356502

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/365642

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/365642
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=5d760a4536dae8745495a97a43f636d5afb75d5b
Submitter: Jenkins
Branch: stable/7.0

commit 5d760a4536dae8745495a97a43f636d5afb75d5b
Author: Anton Chevychalov <email address hidden>
Date: Wed Aug 17 17:08:27 2016 +0300

    Reset rabbitmq default for net_ticktime to 60 sec

    This parameter should be increased to eliminate the triggering partition
    handling logic on spurious network partitions/outages.

    Closes-Bug: #1598154
    Change-Id: I1759ee27ddeae9334d7d9653d87127a914b79a37
    (backported 7f7214abf3e174e9ef84dc86fae2f5b2ca4a237d)

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/356502
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=6eb815b615758b67f1443709d4ce6f4e4d3c06b6
Submitter: Jenkins
Branch: stable/8.0

commit 6eb815b615758b67f1443709d4ce6f4e4d3c06b6
Author: Anton Chevychalov <email address hidden>
Date: Wed Aug 17 17:08:27 2016 +0300

    Reset rabbitmq default for net_ticktime to 60 sec

    This parameter should be increased to eliminate the triggering
    partition handling logic on spurious network partitions/outages.

    Closes-Bug: #1598154
    Change-Id: I1759ee27ddeae9334d7d9653d87127a914b79a37
    (backported 7f7214abf3e174e9ef84dc86fae2f5b2ca4a237d)

tags: added: on-verification
TatyanaGladysheva (tgladysheva) wrote :

Verified on MOS 7.0 + MU6 updates.

After updates net_ticktime was changed from 10 to 60.
On controller nodes:
root@node-5:~# cat /etc/rabbitmq/rabbitmq.config | grep net_ticktime
    {net_ticktime, 60}
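
The same check can be scripted when several controllers need verifying. The following is a minimal sketch, not part of the fix itself: the helper name `read_net_ticktime` and the surrounding layout of the sample config are assumptions, and only the `{net_ticktime, 60}` entry is taken from the output above.

```python
import re


def read_net_ticktime(config_text):
    """Return net_ticktime as an int, or None if the setting is absent."""
    match = re.search(r"\{\s*net_ticktime\s*,\s*(\d+)\s*\}", config_text)
    return int(match.group(1)) if match else None


# Hypothetical excerpt in Erlang-term config syntax; on a controller the
# text would instead be read from /etc/rabbitmq/rabbitmq.config.
sample = """
[
  {kernel, [
    {net_ticktime, 60}
  ]}
].
"""

print(read_net_ticktime(sample))
```

A result of `None` means the setting is not present in the file, in which case Erlang falls back to its own default of 60 seconds.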

tags: removed: on-verification
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 10.0.0rc1

This issue was fixed in the openstack/fuel-library 10.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 10.0.0

This issue was fixed in the openstack/fuel-library 10.0.0 release.

tags: added: on-verification
TatyanaGladysheva (tgladysheva) wrote :

Verified on MOS 8.0 + MU4 updates.

After updates net_ticktime was changed from 10 to 60.
On controller nodes:
root@node-1:~# cat /etc/rabbitmq/rabbitmq.config | grep net_ticktime
    {net_ticktime, 60}

tags: removed: on-verification
Alexey Stupnikov (astupnikov) wrote :

We no longer support MOS5.1, MOS6.0, MOS6.1.
We deliver only Critical/Security fixes to MOS7.0, MOS8.0.
We deliver only High/Critical/Security fixes to MOS9.2.
