hostmonitor can not monitor pacemaker_remote node via cibadmin query

Bug #1728527 reported by Hieu LE on 2017-10-30
This bug affects 3 people
Affects                      Importance   Assigned to
Ubuntu Cloud Archive         Undecided    Unassigned
Rocky                        Undecided    Unassigned
Stein                        Undecided    Unassigned
masakari-monitors            Undecided    Liam Young
masakari-monitors (Ubuntu)   High         James Page
Disco                        High         James Page

Bug Description

Currently, the Masakari hostmonitor only reads the `crmd` status of real (non-remote) nodes from the output of the `cibadmin -Q` command.
For pacemaker_remote nodes the `crmd` attribute does not exist, so a remote node is always reported with status `None`.

Below is an example of the XML status of remote nodes:
<node_state remote_node="true" id="cpu1" uname="cpu1" crm-debug-origin="remote_node_init_status" node_fenced="0">
  <transient_attributes id="cpu1">
    <instance_attributes id="status-cpu1"/>
  </transient_attributes>
</node_state>
<node_state remote_node="true" id="cpu2" uname="cpu2" crm-debug-origin="remote_node_init_status" node_fenced="0"/>
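The effect of the missing attribute can be reproduced with a minimal sketch. This is not the hostmonitor code itself: the function name `crmd_status_by_host`, the wrapping `<status>` element, and the `ctl1` member entry (with its attributes) are assumptions added for contrast; only the two remote `node_state` entries come from the report.

```python
import xml.etree.ElementTree as ET

# The two remote entries mirror the report (child elements trimmed).
# "ctl1" is a hypothetical full cluster member added for comparison.
STATUS_XML = """
<status>
  <node_state id="1" uname="ctl1" in_ccm="true" crmd="online" join="member" expected="member"/>
  <node_state remote_node="true" id="cpu1" uname="cpu1" crm-debug-origin="remote_node_init_status" node_fenced="0"/>
  <node_state remote_node="true" id="cpu2" uname="cpu2" crm-debug-origin="remote_node_init_status" node_fenced="0"/>
</status>
"""


def crmd_status_by_host(status_xml):
    """Map each node's uname to its crmd attribute (None when absent)."""
    root = ET.fromstring(status_xml.strip())
    return {node.get("uname"): node.get("crmd")
            for node in root.iter("node_state")}


statuses = crmd_status_by_host(STATUS_XML)
for host, status in statuses.items():
    print(host, status)
```

Because `Element.get` returns `None` for a missing attribute, any lookup keyed on `crmd` yields `None` for the remote nodes, which matches the status seen in the hostmonitor log.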

And the log from masakari hostmonitor:
2017-10-30 14:15:44.679 1813 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] Recognized 'cpu1' as a new member of cluster. Host status is 'None'.

Changed in masakari-monitors:
assignee: nobody → takahara.kengo (takahara.kengo)
Changed in masakari-monitors:
assignee: takahara.kengo (takahara.kengo) → Hieu LE (hieulq)
status: New → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/647756

Changed in masakari-monitors:
assignee: Hieu LE (hieulq) → Liam Young (gnuoy)
Adam Spiers (adam.spiers) wrote :

Pacemaker is a cluster manager rather than monitoring software, so IIUC its (host) state was not really designed to be polled via this kind of "pull" model - instead its state machine was designed to "push" events and initiate actions via resource agents. Therefore long term I think we need to implement https://storyboard.openstack.org/#!/story/2002124 which replaces this hostmonitor with the nova-host-alerter OCF RA.

Changed in masakari-monitors (Ubuntu Disco):
status: New → Triaged
status: Triaged → New
James Page (james-page) on 2019-04-09
Changed in masakari-monitors (Ubuntu Disco):
status: New → In Progress
importance: Undecided → High
assignee: nobody → James Page (james-page)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package masakari-monitors - 7.0.0~rc1-0ubuntu2

---------------
masakari-monitors (7.0.0~rc1-0ubuntu2) disco; urgency=medium

  [ Corey Bryant ]
  * d/control: Set source package Section to net, fixing
    binary-control-field-duplicates-source lintian tag.

  [ James Page ]
  * d/p/bug1728527.patch: Cherry pick fix to resolve issues with use of
    pacemaker-remote for remote management of hypervisors (LP: #1728527).

 -- Corey Bryant <email address hidden> Tue, 09 Apr 2019 16:22:18 +0100

Changed in masakari-monitors (Ubuntu Disco):
status: In Progress → Fix Released
Changed in cloud-archive:
status: New → Fix Committed
James Page (james-page) wrote :

This bug was fixed in the package masakari-monitors - 7.0.0~rc1-0ubuntu2~cloud0
---------------

 masakari-monitors (7.0.0~rc1-0ubuntu2~cloud0) bionic-stein; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 masakari-monitors (7.0.0~rc1-0ubuntu2) disco; urgency=medium
 .
   [ Corey Bryant ]
   * d/control: Set source package Section to net, fixing
     binary-control-field-duplicates-source lintian tag.
 .
   [ James Page ]
   * d/p/bug1728527.patch: Cherry pick fix to resolve issues with use of
     pacemaker-remote for remote management of hypervisors (LP: #1728527).

Changed in cloud-archive:
status: Fix Committed → Fix Released

Reviewed: https://review.opendev.org/647756
Committed: https://git.openstack.org/cgit/openstack/masakari-monitors/commit/?id=dc9b77772417c99368a4bbe243bb8e0e7c0bca47
Submitter: Zuul
Branch: master

commit dc9b77772417c99368a4bbe243bb8e0e7c0bca47
Author: Liam Young <email address hidden>
Date: Tue Mar 19 20:05:22 2019 +0000

    Use crm_mon for pacemaker-remote deployments

    As described in bug #1728527 cibadmin does not expose the state of
    the pacemaker-remote nodes which means hostmonitor cannot track
    them. This change switches to use crm_mon to check the status of
    remote nodes if the new config option host.restrict_to_remotes
    is set to True. This will trigger host monitor to use crm_mon
    to monitor nodes and will only monitor nodes that are marked
    as remotes (not members).

    Change-Id: I3f2026805413504c875ea5f39eb036d44b26dd43
    Depends-On: Iaa2251708616e9c69817bf5b346d795ea7a4d21b
    Closes-Bug: #1728527
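The approach described in the commit message can be illustrated with a rough sketch, assuming the node list is read from crm_mon's XML output. This is not the merged patch: the function name `remote_node_status`, the "online"/"offline" strings, and the hand-written sample document are assumptions for illustration, and real crm_mon output carries many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of crm_mon XML output;
# not captured from a real cluster.
SAMPLE_CRM_MON_XML = """
<crm_mon>
  <nodes>
    <node name="ctl1" id="1" online="true" type="member"/>
    <node name="cpu1" id="cpu1" online="true" type="remote"/>
    <node name="cpu2" id="cpu2" online="false" type="remote"/>
  </nodes>
</crm_mon>
"""


def remote_node_status(crm_mon_xml, restrict_to_remotes=True):
    """Return {node name: 'online' | 'offline'} from crm_mon-style XML.

    With restrict_to_remotes=True only nodes whose type is "remote"
    are reported; full cluster members are skipped.
    """
    root = ET.fromstring(crm_mon_xml.strip())
    result = {}
    # Only look at the top-level node list, not nodes nested elsewhere.
    for node in root.findall("./nodes/node"):
        if restrict_to_remotes and node.get("type") != "remote":
            continue
        is_online = node.get("online") == "true"
        result[node.get("name")] = "online" if is_online else "offline"
    return result


print(remote_node_status(SAMPLE_CRM_MON_XML))
```

Filtering on the node type keeps the controller-side cluster members out of the result, which corresponds to the "remotes (not members)" behaviour the commit describes.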

Changed in masakari-monitors:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/688021
Committed: https://git.openstack.org/cgit/openstack/masakari-monitors/commit/?id=b02c6b6931c0256f4ce6d7167c97ebb849ff3453
Submitter: Zuul
Branch: stable/train

commit b02c6b6931c0256f4ce6d7167c97ebb849ff3453
Author: Liam Young <email address hidden>
Date: Tue Mar 19 20:05:22 2019 +0000

    Use crm_mon for pacemaker-remote deployments

    As described in bug #1728527 cibadmin does not expose the state of
    the pacemaker-remote nodes which means hostmonitor cannot track
    them. This change switches to use crm_mon to check the status of
    remote nodes if the new config option host.restrict_to_remotes
    is set to True. This will trigger host monitor to use crm_mon
    to monitor nodes and will only monitor nodes that are marked
    as remotes (not members).

    Change-Id: I3f2026805413504c875ea5f39eb036d44b26dd43
    Depends-On: Iaa2251708616e9c69817bf5b346d795ea7a4d21b
    Closes-Bug: #1728527
    (cherry picked from commit dc9b77772417c99368a4bbe243bb8e0e7c0bca47)

tags: added: in-stable-train
Larry Lile (llile) wrote :

Can this be merged back to Train, please?

Thanks.

James Page (james-page) wrote :

@llile This fix is already in the packaging for OpenStack Train for Ubuntu

Larry Lile (llile) wrote :

@james-page I'm working with CentOS 7, the patch doesn't appear in the latest (8.0.0) masakari-monitors release.
