"Heartbeat read failed from 'stomp://mcollective@10.108.0.2:61613'" error on controller node

Bug #1298262 reported by Andrey Sledzinskiy
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: Medium
Assigned to: Maksim Malchuk

Bug Description

The bug is reproduced on build {"build_id": "2014-03-26_13-22-07", "mirantis": "yes", "build_number": "51", "nailgun_sha": "cb27e483815aa5ee09fcb69fe25e45b0e5fd6a7e", "ostf_sha": "b54c48a50ea815d4e4aee1ed97c68601a2f458e8", "fuelmain_sha": "27a04526eb1a0596b03dc9805435d2a3147d216e", "astute_sha": "d7c6c4d00ffd6e2fa74da442f573e6f39049961e", "release": "5.0", "fuellib_sha": "d7fb2fe73788cae5241f321b11a90235d297b493"}

Steps:
1. Create the following cluster: CentOS, HA, KVM, Nova-network VLAN, no tagging, Cinder LVM, Ceph for images.
2. Add 3 controller, 1 compute, 1 Cinder and 3 Ceph nodes.
3. Deploy the cluster; deployment succeeds.
4. Open the Logs tab, select the third controller and select the 'mcollective' log.
The following error appears:
ERROR -- : rabbitmq.rb:50:in `on_hbread_fail' Heartbeat read failed from 'stomp://mcollective@10.108.0.2:61613': {"read_fail_count"=>0, "ticker_interval"=>29.5, "lock_fail"=>true, "lock_fail_count"=>1}

A diagnostic snapshot is attached.

Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
assignee: nobody → Fuel Astute Team (fuel-astute)
status: New → Confirmed
Vladimir Sharshov (vsharshov) wrote :

Could not reproduce.

{
    "mirantis": "no",
    "nailgun_sha": "3e955a06aab9baf03210fef5d606971ef462be6c",
    "production": "prod",
    "ostf_sha": "134765fcb5a07dce0cd1bb399b2290c988c3c63b",
    "fuelmain_sha": "f7094a4100600a089019b826dd5c4061434331d3",
    "astute_sha": "6e8fa4cc12968d7b468fc590b2f06bb59bf74511",
    "release": "5.0",
    "fuellib_sha": "ec6986f39ba105d14c1461c30af16613d093e303"
}

The most interesting detail: the 'mcollective' log in the web interface only shows logs from the bootstrap stage. If the nodes deploy successfully, everything works fine, so the problem appears to be transient.

Changed in fuel:
status: Confirmed → Incomplete
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: 5.0 → 5.1
Curtis Hovey (sinzui)
Changed in fuel:
assignee: Registry Administrators (registry) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-agent (master)

Fix proposed to branch: master
Review: https://review.openstack.org/369344

Changed in fuel:
assignee: nobody → Georgy Kibardin (gkibardin)
status: Incomplete → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/369546

Changed in fuel:
assignee: Georgy Kibardin (gkibardin) → Maksim Malchuk (mmalchuk)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/370015

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-agent (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/370025

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (master)

Reviewed: https://review.openstack.org/369344
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=b50241a7b243f553cc35e521ab99bb7f94d8b54a
Submitter: Jenkins
Branch: master

commit b50241a7b243f553cc35e521ab99bb7f94d8b54a
Author: Georgy Kibardin <email address hidden>
Date: Tue Sep 13 14:08:34 2016 +0300

    Ignore heartbeats lock fails

    Stomp heartbeat handling is quite poorly designed. It happens in a
    separate thread which sleeps and then tries to read a heartbeat; if the
    read mutex is held by the message-receiving thread, the attempt fails
    and the lock failure count is increased. Upon reaching the limit (in our
    packages it is 2 by default), it forcibly closes the connection, causing
    a reconnect. Setting the value to 0 turns the feature off.

    Change-Id: I2187ce69508c530073582c542c963014acc5123a
    Closes-Bug: #1613246
    Closes-Bug: #1298262
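
The commit message describes the failure mode only in prose. The following Ruby sketch is a hypothetical, simplified model of that logic, not the stomp gem's actual implementation: `HeartbeatReaderModel`, `tick`, `receiving_message` and `max_lock_fails` are illustrative names, and we assume `max_lock_fails` plays the role of the stomp gem's `:max_hbrlck_fails` connection option. It shows why a busy receiving thread alone can trip the limit: every heartbeat tick that finds the read mutex taken counts as a lock failure. With a limit of 2, two such ticks close the connection, while a limit of 0 never does. The logged `"lock_fail"=>true, "lock_fail_count"=>1` with `"read_fail_count"=>0` appears consistent with a single such tick.

```ruby
# Hypothetical, simplified model of the heartbeat read-lock check described in
# the commit message. It is NOT the stomp gem's code; names are illustrative.
class HeartbeatReaderModel
  attr_reader :lock_fail_count, :closed

  # max_lock_fails: failures tolerated before closing; 0 disables the check.
  def initialize(max_lock_fails)
    @read_lock = Mutex.new
    @max_lock_fails = max_lock_fails
    @lock_fail_count = 0
    @closed = false
  end

  # Stands in for the message-receiving thread, which holds the read mutex
  # while it waits for and reads a frame.
  def receiving_message
    @read_lock.synchronize { yield }
  end

  # One wake-up of the heartbeat thread (the real thread sleeps between ticks).
  def tick
    if @read_lock.try_lock
      begin
        @lock_fail_count = 0 # in this model, a successful lock resets the count
      ensure
        @read_lock.unlock
      end
    else
      # The receiving thread owns the mutex: count a lock failure.
      @lock_fail_count += 1
      if @max_lock_fails > 0 && @lock_fail_count >= @max_lock_fails
        @closed = true # the real client forcibly closes and reconnects here
      end
    end
  end
end

# Two heartbeat ticks fire, each from another thread, while a message is received.
strict = HeartbeatReaderModel.new(2)
strict.receiving_message { 2.times { Thread.new { strict.tick }.join } }
puts strict.closed           # true: limit of 2 reached, connection dropped

relaxed = HeartbeatReaderModel.new(0)
relaxed.receiving_message { 2.times { Thread.new { relaxed.tick }.join } }
puts relaxed.closed          # false: check disabled
puts relaxed.lock_fail_count # 2
```

The `relaxed` case mirrors the choice made in the fix: lock failures are still counted, but with the limit set to 0 they no longer force a reconnect.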

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/370015
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=8318d7056556337f17f596edad9d7eed48ec3ca5
Submitter: Jenkins
Branch: stable/mitaka

commit 8318d7056556337f17f596edad9d7eed48ec3ca5
Author: Georgy Kibardin <email address hidden>
Date: Tue Sep 13 18:43:45 2016 +0300

    Ignore heartbeats lock fails

    Stomp heartbeat handling is quite poorly designed. It happens in a
    separate thread which sleeps and then tries to read a heartbeat; if the
    read mutex is held by the message-receiving thread, the attempt fails
    and the lock failure count is increased. Upon reaching the limit (in our
    packages it is 2 by default), it forcibly closes the connection, causing
    a reconnect. Setting the value to 0 turns the feature off.

    Change-Id: Ieec889828d1dd2654ee760e7d5676efd14c7c348
    Closes-Bug: #1613246
    Closes-Bug: #1298262

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-agent (stable/mitaka)

Reviewed: https://review.openstack.org/370025
Committed: https://git.openstack.org/cgit/openstack/fuel-agent/commit/?id=898bcca75224ad82fa98a85b77651faaf554e2b6
Submitter: Jenkins
Branch: stable/mitaka

commit 898bcca75224ad82fa98a85b77651faaf554e2b6
Author: Georgy Kibardin <email address hidden>
Date: Tue Sep 13 14:08:34 2016 +0300

    Ignore heartbeats lock fails

    Stomp heartbeat handling is quite poorly designed. It happens in a
    separate thread which sleeps and then tries to read a heartbeat; if the
    read mutex is held by the message-receiving thread, the attempt fails
    and the lock failure count is increased. Upon reaching the limit (in our
    packages it is 2 by default), it forcibly closes the connection, causing
    a reconnect. Setting the value to 0 turns the feature off.

    Change-Id: I2187ce69508c530073582c542c963014acc5123a
    Closes-Bug: #1613246
    Closes-Bug: #1298262
    (cherry picked from commit b50241a7b243f553cc35e521ab99bb7f94d8b54a)

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/369546
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=22116add36b830e1418d2dd8345d633f194e2f27
Submitter: Jenkins
Branch: master

commit 22116add36b830e1418d2dd8345d633f194e2f27
Author: Georgy Kibardin <email address hidden>
Date: Tue Sep 13 18:43:45 2016 +0300

    Ignore heartbeats lock fails

    Stomp heartbeat handling is quite poorly designed. It happens in a
    separate thread which sleeps and then tries to read a heartbeat; if the
    read mutex is held by the message-receiving thread, the attempt fails
    and the lock failure count is increased. Upon reaching the limit (in our
    packages it is 2 by default), it forcibly closes the connection, causing
    a reconnect. Setting the value to 0 turns the feature off.

    Change-Id: Ieec889828d1dd2654ee760e7d5676efd14c7c348
    Closes-Bug: #1613246
    Closes-Bug: #1298262

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-agent 10.0.0rc1

This issue was fixed in the openstack/fuel-agent 10.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 10.0.0rc1

This issue was fixed in the openstack/fuel-library 10.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-agent 10.0.0

This issue was fixed in the openstack/fuel-agent 10.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 10.0.0

This issue was fixed in the openstack/fuel-library 10.0.0 release.

