RabbitMQ OCF node uptime is not always reset in the CIB

Bug #1530150 reported by Bogdan Dobrelya
This bug affects 1 person
Affects             Status        Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Released  High        Bogdan Dobrelya
7.0.x               Confirmed     High        Denis Puchkin
8.0.x               Fix Released  High        Bogdan Dobrelya
Mitaka              Fix Released  High        Bogdan Dobrelya

Bug Description

Source bug https://bugs.launchpad.net/fuel/+bug/1529875

The uptime value for node-3 was not reset after the node was rebooted. This is wrong and badly affects the election of new masters:

lrmd log on node-4:
Before node-3 reset:
2015-12-29T13:34:44.242420+00:00 info: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (4490) with node-9.domain.tld (4488)
2015-12-29T13:34:44.317439+00:00 info: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (4490) with node-3.domain.tld (7142)
After node-3 reset:
2015-12-29T13:38:23.253014+00:00 info: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (4709) with node-9.domain.tld (4707)
2015-12-29T13:38:23.359804+00:00 info: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (4709) with node-3.domain.tld (7361)

The output of crm_mon -fotAW -1 also shows that the node's rabbit-start-time attribute is not reset after a non-graceful shutdown.
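
The log excerpt above can be checked numerically. The following is a minimal sketch, not part of the OCF script: the names LINE_RE and uptime_deltas are hypothetical helpers introduced here for illustration. It parses the four quoted lrmd lines and compares how much each peer's reported uptime grew with how much wall-clock time passed between the two samples. Had the reboot reset node-3's start time, its reported uptime would have dropped between the samples.

```python
import re
from datetime import datetime

# The four lrmd lines quoted in the bug description, copied verbatim.
LOG = """\
2015-12-29T13:34:44.242420+00:00 info: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (4490) with node-9.domain.tld (4488)
2015-12-29T13:34:44.317439+00:00 info: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (4490) with node-3.domain.tld (7142)
2015-12-29T13:38:23.253014+00:00 info: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (4709) with node-9.domain.tld (4707)
2015-12-29T13:38:23.359804+00:00 info: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (4709) with node-3.domain.tld (7361)
"""

# Hypothetical pattern for the "comparing our uptime" message format.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) .*comparing our uptime \(\d+\) "
    r"with (?P<peer>\S+) \((?P<uptime>\d+)\)$"
)


def uptime_deltas(log):
    """Return {peer: (uptime growth, wall-clock seconds elapsed)}."""
    samples = {}
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        timestamp = datetime.fromisoformat(match["ts"])
        samples.setdefault(match["peer"], []).append(
            (timestamp, int(match["uptime"]))
        )

    deltas = {}
    for peer, points in samples.items():
        (t0, u0), (t1, u1) = points[0], points[-1]
        deltas[peer] = (u1 - u0, round((t1 - t0).total_seconds()))
    return deltas


for peer, (grew, elapsed) in uptime_deltas(LOG).items():
    print(f"{peer}: uptime grew by {grew}s over {elapsed}s of wall-clock time")
```

For this excerpt both peers show an uptime growth of 219 seconds over 219 seconds of wall-clock time. That is expected for node-9, which was not restarted, but for node-3 it means the start time recorded before the reboot was still in use.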

Changed in fuel:
importance: Undecided → High
milestone: none → 8.0
assignee: nobody → Bogdan Dobrelya (bogdando)
status: New → In Progress
tags: added: area-build
tags: added: area-library ha rabbitmq
removed: area-build
summary: - RabbitMQ OCF node uptime is not reset in the CIB
+ RabbitMQ OCF node uptime is not always reset in the CIB
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/262572

tags: added: team-bugfix
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/262572
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=d833b3ac710a950affedd7cd32c0abaef81944e0
Submitter: Jenkins
Branch: master

commit d833b3ac710a950affedd7cd32c0abaef81944e0
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Dec 30 18:08:46 2015 +0100

    Ensure rabbit node uptime is reset in the CIB for OCF resource

    * Add ocf_run wrappers and info log messages for CIB attribute events
    * Move "fast" CIB attribute updates before "heavy" operations like
      start/stop/wait to ensure CIB consistent even if the timeouts
      exceeded for the ops
    * Delete master and start time attributes from CIB on action_start
      to ensure the correct rabbit nodes uptime evaluation for new
      master elections for corresponding pacemaker resources
    * For post-demote notify and action_demote() delete the master
      attribute from CIB as well.
    * For post-start notify, update the start time in the CIB even when
      the node is already clustered. Otherwise it would remain running
      in cluster w/o the start time registered, which affects the new
      master elections badly.

    Upstream RR https://github.com/rabbitmq/rabbitmq-server/pull/524
    Closes-bug: #1530150

    Change-Id: I9db3c819031cef620377b4fee08ea92e90b11c70
    Signed-off-by: Bogdan Dobrelya <email address hidden>
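
The reordering described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual OCF script: the CIB is modelled as a plain dict of node name to start time, all function names are hypothetical, a missing start time is assumed to count as zero uptime, and the start time is registered inside the start action for brevity (the commit message describes updating it in the post-start notify). The numbers are illustrative, chosen to mirror the node-3 values from the log.

```python
class HeavyOpTimeout(Exception):
    """Stands in for a start/stop/wait operation exceeding its timeout."""


def action_start_heavy_first(cib, node, start_app, now):
    """Assumed pre-fix ordering: run the heavy operation, then touch the CIB."""
    start_app()          # may raise HeavyOpTimeout
    cib[node] = now      # never reached on timeout, so a stale value survives


def action_start_fast_first(cib, node, start_app, now):
    """Ordering described in the commit: fast CIB update before the heavy op."""
    cib.pop(node, None)  # drop any stale start time first
    start_app()          # may raise HeavyOpTimeout
    cib[node] = now      # register the new start time once the node is up


def uptime(cib, node, now):
    """Uptime derived from the stored start time (assumed 0 when missing)."""
    start = cib.get(node)
    return 0 if start is None else now - start


def timed_out_start():
    """Simulates a start operation that exceeds its timeout."""
    raise HeavyOpTimeout()


def uptime_after_failed_start(action_start):
    """Run one start action that times out and report the resulting uptime."""
    cib = {"node-3": 1000}  # start time left over from before the reboot
    try:
        action_start(cib, "node-3", timed_out_start, now=8142)
    except HeavyOpTimeout:
        pass
    return uptime(cib, "node-3", now=8361)


print("heavy op first:", uptime_after_failed_start(action_start_heavy_first))
print("fast CIB update first:", uptime_after_failed_start(action_start_fast_first))
```

In this model the heavy-first ordering leaves the old start time in place when the operation times out, so the node still reports an uptime of 7361 seconds. With the attribute deleted first, the same failure leaves no start time behind and the node reports zero uptime.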

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/267507

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/267507
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=51b4072521684a19c2ba7d394cf8d42b191ced93
Submitter: Jenkins
Branch: stable/8.0

commit 51b4072521684a19c2ba7d394cf8d42b191ced93
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Dec 30 18:08:46 2015 +0100

    Ensure rabbit node uptime is reset in the CIB for OCF resource

    * Add ocf_run wrappers and info log messages for CIB attribute events
    * Move "fast" CIB attribute updates before "heavy" operations like
      start/stop/wait to ensure CIB consistent even if the timeouts
      exceeded for the ops
    * Delete master and start time attributes from CIB on action_start
      to ensure the correct rabbit nodes uptime evaluation for new
      master elections for corresponding pacemaker resources
    * For post-demote notify and action_demote() delete the master
      attribute from CIB as well.
    * For post-start notify, update the start time in the CIB even when
      the node is already clustered. Otherwise it would remain running
      in cluster w/o the start time registered, which affects the new
      master elections badly.

    Upstream RR https://github.com/rabbitmq/rabbitmq-server/pull/524
    Closes-bug: #1530150

    Change-Id: I9db3c819031cef620377b4fee08ea92e90b11c70
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit d833b3ac710a950affedd7cd32c0abaef81944e0)

tags: added: on-verification
Andrey Sledzinskiy (asledzinskiy) wrote :

verified on 8.0-506

tags: removed: on-verification
Vladimir (vushakov)
tags: added: on-verification
Vladimir (vushakov) wrote :

Verified on:
    Fuel 9.0 ISO #142

tags: removed: on-verification