Resource: res_masakari_4cc6744_vip not running

Bug #1935984 reported by Hybrid512
This bug affects 1 person
Affects: OpenStack Masakari Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I have been deploying an OpenStack bundle including Masakari for months, but since Charmed OpenStack 21.04, every time I deploy Masakari I get this message: Resource: res_masakari_4cc6744_vip not running
and the VIP does not come up (the service itself seems to work properly, though; curling the haproxy target endpoint returns an answer).

Here is a snippet of my Masakari configuration:

========================================================================
series: focal
applications:

  masakari:
    charm: cs:masakari-11
    channel: stable
    num_units: 3
    to:
    - lxd:3
    - lxd:4
    - lxd:5
    options:
      debug: false
      evacuation-delay: 10
      openstack-origin: distro
      use-internal-endpoints: false
      use-syslog: false
      verbose: false
      vip: 192.168.111.225
      worker-multiplier: 0.25

  masakari-hacluster:
    charm: cs:hacluster-76
    channel: stable
    options:
      cluster_count: 3
      maas_credentials: <MAAS ADMIN CREDENTIALS>
      maas_url: http://<MAAS-IP>:5240/MAAS

  masakari-monitors:
    charm: cs:masakari-monitors-9
    channel: stable

  masakari-mysql-router:
    charm: cs:mysql-router-10
    channel: stable

  masakari-pacemaker-remote:
    charm: cs:pacemaker-remote-9
    channel: stable
    options:
      enable-resources: false
      enable-stonith: true

relations:
- - masakari:ha
  - masakari-hacluster:ha
- - nova-compute:juju-info
  - masakari-pacemaker-remote:juju-info
- - masakari-hacluster:pacemaker-remote
  - masakari-pacemaker-remote:pacemaker-remote
- - nova-compute:juju-info
  - masakari-monitors:container
- - keystone:identity-credentials
  - masakari-monitors:identity-credentials
- - masakari-monitors:certificates
  - vault:certificates
- - masakari:identity-service
  - keystone:identity-service
- - masakari:shared-db
  - masakari-mysql-router:shared-db
- - masakari-mysql-router:db-router
  - mysql-innodb-cluster:db-router
- - masakari:amqp
  - rabbitmq-server:amqp
- - masakari:certificates
  - vault:certificates
========================================================================

From what I can tell, pacemaker is stuck not only on the masakari-hacluster units but also on the nova-compute units.

Here is a snippet of /var/log/pacemaker/pacemaker.log on a masakari-hacluster unit:

=========================================================================
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (crm_timer_popped) info: Cluster Recheck Timer just popped | input=I_PE_CALC time=60000ms
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (do_state_transition) notice: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (do_state_transition) info: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (process_pe_message) info: Input has not changed since last time, not saving to disk
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_online_status_fencing) info: Node juju-5073fb-4-lxd-7 is active
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_online_status) info: Node juju-5073fb-4-lxd-7 is online
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_online_status_fencing) info: Node juju-5073fb-5-lxd-7 is active
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_online_status) info: Node juju-5073fb-5-lxd-7 is online
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_online_status_fencing) info: Node juju-5073fb-3-lxd-7 is active
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_online_status) info: Node juju-5073fb-3-lxd-7 is online
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_op_status) info: Probe found res_masakari_haproxy:0 active on juju-5073fb-4-lxd-7 at Jul 13 15:06:53 2021
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_op_status) info: Probe found res_masakari_haproxy:0 active on juju-5073fb-4-lxd-7 at Jul 13 15:06:53 2021
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_op_status) info: Probe found res_masakari_haproxy:1 active on juju-5073fb-3-lxd-7 at Jul 13 14:57:56 2021
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (determine_op_status) info: Probe found res_masakari_haproxy:1 active on juju-5073fb-3-lxd-7 at Jul 13 14:57:56 2021
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (unpack_node_loop) info: Node 1000 is already processed
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (unpack_node_loop) info: Node 1001 is already processed
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (unpack_node_loop) info: Node 1002 is already processed
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (unpack_node_loop) info: Node 1000 is already processed
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (unpack_node_loop) info: Node 1001 is already processed
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (unpack_node_loop) info: Node 1002 is already processed
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (group_print) info: Resource Group: grp_masakari_vips
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (common_print) info: res_masakari_4cc6744_vip (ocf::heartbeat:IPaddr2): Stopped
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (clone_print) info: Clone Set: cl_res_masakari_haproxy [res_masakari_haproxy]
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (short_print) info: Started: [ juju-5073fb-3-lxd-7 juju-5073fb-4-lxd-7 ]
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (short_print) info: Stopped: [ juju-5073fb-5-lxd-7 ]
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (common_print) info: lab-vit-worker2.maas (ocf::pacemaker:remote): Stopped
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (common_print) info: lab-vit-worker3.maas (ocf::pacemaker:remote): Stopped
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (common_print) info: lab-vit-worker4.maas (ocf::pacemaker:remote): Stopped
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (common_print) info: st-maas (stonith:external/maas): Stopped
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (common_print) info: st-null (stonith:null): Stopped
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (common_print) info: lab-vit-worker1.maas (ocf::pacemaker:remote): Stopped
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (native_color) info: Resource res_masakari_haproxy:3 cannot run anywhere
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (RecurringOp) info: Start recurring monitor (10s) for res_masakari_4cc6744_vip on juju-5073fb-4-lxd-7
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (RecurringOp) info: Start recurring monitor (5s) for res_masakari_haproxy:2 on juju-5073fb-5-lxd-7
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (RecurringOp) info: Start recurring monitor (30s) for lab-vit-worker2.maas on juju-5073fb-4-lxd-7
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (RecurringOp) info: Start recurring monitor (30s) for lab-vit-worker3.maas on juju-5073fb-5-lxd-7
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (RecurringOp) info: Start recurring monitor (30s) for lab-vit-worker4.maas on juju-5073fb-3-lxd-7
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (RecurringOp) info: Start recurring monitor (25s) for st-maas on juju-5073fb-5-lxd-7
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (RecurringOp) info: Start recurring monitor (25s) for st-null on juju-5073fb-3-lxd-7
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (RecurringOp) info: Start recurring monitor (30s) for lab-vit-worker1.maas on juju-5073fb-3-lxd-7
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (stage6) warning: Scheduling Node node1 for STONITH
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogNodeActions) notice: * Fence (reboot) node1 'node is unclean'
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogAction) notice: * Start res_masakari_4cc6744_vip ( juju-5073fb-4-lxd-7 )
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogActions) info: Leave res_masakari_haproxy:0 (Started juju-5073fb-4-lxd-7)
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogActions) info: Leave res_masakari_haproxy:1 (Started juju-5073fb-3-lxd-7)
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogAction) notice: * Start res_masakari_haproxy:2 ( juju-5073fb-5-lxd-7 )
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogActions) info: Leave res_masakari_haproxy:3 (Stopped)
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogAction) notice: * Start lab-vit-worker2.maas ( juju-5073fb-4-lxd-7 )
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogAction) notice: * Start lab-vit-worker3.maas ( juju-5073fb-5-lxd-7 )
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogAction) notice: * Start lab-vit-worker4.maas ( juju-5073fb-3-lxd-7 )
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogAction) notice: * Start st-maas ( juju-5073fb-5-lxd-7 )
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogAction) notice: * Start st-null ( juju-5073fb-3-lxd-7 )
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (LogAction) notice: * Start lab-vit-worker1.maas ( juju-5073fb-3-lxd-7 )
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-schedulerd[551] (pcmk__log_transition_summary) warning: Calculated transition 21 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-65.bz2
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (do_state_transition) info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (do_te_invoke) info: Processing graph 21 (ref=pe_calc-dc-1626189853-54) derived from /var/lib/pacemaker/pengine/pe-warn-65.bz2
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (te_fence_node) notice: Requesting fencing (reboot) of node node1 | action=39 timeout=60000
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (handle_request) notice: Client pacemaker-controld.553.1301b018 wants to fence (reboot) 'node1' with device '(any)'
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (initiate_remote_stonith_op) notice: Requesting peer fencing (reboot) targeting node1 | id=6c9f15ed-e5ab-46fd-b783-fab0d50fdd5a state=0
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (can_fence_host_with_device) notice: st-maas is not eligible to fence (reboot) node1: dynamic-list
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (process_remote_stonith_query) info: Query result 1 of 3 from juju-5073fb-4-lxd-7 for node1/reboot (0 devices) 6c9f15ed-e5ab-46fd-b783-fab0d50fdd5a
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (process_remote_stonith_query) info: Query result 2 of 3 from juju-5073fb-5-lxd-7 for node1/reboot (0 devices) 6c9f15ed-e5ab-46fd-b783-fab0d50fdd5a
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (dynamic_list_search_cb) info: Refreshing port list for st-null
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (process_remote_stonith_query) info: Query result 3 of 3 from juju-5073fb-3-lxd-7 for node1/reboot (0 devices) 6c9f15ed-e5ab-46fd-b783-fab0d50fdd5a
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (process_remote_stonith_query) info: All query replies have arrived, continuing (3 expected/3 received)
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (stonith_choose_peer) notice: Couldn't find anyone to fence (reboot) node1 with any device
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (call_remote_stonith) info: Total timeout set to 60 for peer's fencing targeting node1 for pacemaker-controld.553|id=6c9f15ed-e5ab-46fd-b783-fab0d50fdd5a
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (call_remote_stonith) info: No peers (out of 3) have devices capable of fencing (reboot) node1 for client pacemaker-controld.553 | state=0
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-fenced [548] (remote_op_done) error: Operation 'reboot' targeting node1 on <no-one> for <email address hidden>: No such device
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (tengine_stonith_callback) notice: Stonith operation 23/39:21:0:880f869b-b0f6-4019-b8e3-2e5a1ac888c7: No such device (-19)
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (tengine_stonith_callback) notice: Stonith operation 23 for node1 failed (No such device): aborting transition.
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (tengine_stonith_callback) warning: No devices found in cluster to fence node1, giving up
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (abort_transition_graph) notice: Transition 21 aborted: Stonith failed | source=abort_for_stonith_failure:257 complete=false
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (tengine_stonith_notify) notice: Peer node1 was not terminated (reboot) by <anyone> on behalf of pacemaker-controld.553: No such device | initiator=juju-5073fb-3-lxd-7 ref=6c9f15ed-e5ab-46fd-b783-fab0d50fdd5a
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (run_graph) notice: Transition 21 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=27, Source=/var/lib/pacemaker/pengine/pe-warn-65.bz2): Complete
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (do_log) info: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Jul 13 15:24:13 juju-5073fb-3-lxd-7 pacemaker-controld [553] (do_state_transition) notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
=========================================================================
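The decisive lines above are the fencing ones: the scheduler wants to reboot node1, but every peer answers the fencing query with 0 devices ("No peers (out of 3) have devices capable of fencing (reboot) node1"), so the transition aborts before the VIP is started. Note also that st-maas and st-null are themselves reported Stopped. For comparison, a registered external/maas fencing device in `crm configure show` would look roughly like the fragment below; the parameter names and values are assumptions based on the plugin's documentation, not taken from this deployment:

```
primitive st-maas stonith:external/maas \
    params url="http://<MAAS-IP>:5240/MAAS" apikey="<MAAS-API-KEY>" \
           hostnames="lab-vit-worker1.maas lab-vit-worker2.maas lab-vit-worker3.maas" \
    op monitor interval="25s"
```

If `crm configure show` on a masakari-hacluster unit shows no such primitive, or one missing the MAAS parameters, then the hacluster charm never finished configuring fencing, which would also explain why the remote nodes stay Stopped.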

And here is a snippet of pacemaker.log on the masakari-pacemaker-remote units:

=========================================================================
Set r/w permissions for uid=122, gid=127 on /var/log/pacemaker/pacemaker.log
Jul 12 16:03:58 lab-vit-worker2 pacemaker-remoted [210996] (crm_log_init) info: Changed active directory to /var/lib/pacemaker/cores
Jul 12 16:03:58 lab-vit-worker2 pacemaker-remoted [210996] (main) notice: Starting Pacemaker remote executor
Jul 12 16:03:58 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_publish) info: server name: lrmd
Jul 12 16:03:58 lab-vit-worker2 pacemaker-remoted [210996] (pcmk__init_tls_dh) info: Generating Diffie-Hellman parameters with 2048-bit prime for TLS
Jul 12 16:03:59 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_publish) info: server name: cib_ro
Jul 12 16:03:59 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_publish) info: server name: cib_rw
Jul 12 16:03:59 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_publish) info: server name: cib_shm
Jul 12 16:03:59 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_publish) info: server name: attrd
Jul 12 16:03:59 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_publish) info: server name: stonith-ng
Jul 12 16:03:59 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_publish) info: server name: crmd
Jul 12 16:03:59 lab-vit-worker2 pacemaker-remoted [210996] (main) notice: Pacemaker remote executor successfully started and accepting connections
Jul 12 16:04:38 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_rw because not connected to cluster
Jul 12 16:04:38 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-213747-14-5bQ1KK/qb): Remote I/O error (121)
Jul 12 16:05:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:05:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-222623-14-CeXHRK/qb): Remote I/O error (121)
Jul 12 16:05:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:05:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-222623-14-WO75bL/qb): Remote I/O error (121)
Jul 12 16:06:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:06:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-222869-14-4h0iQK/qb): Remote I/O error (121)
Jul 12 16:06:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:06:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-222869-14-DvzM1H/qb): Remote I/O error (121)
Jul 12 16:07:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:07:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-224050-14-1wmLkL/qb): Remote I/O error (121)
Jul 12 16:07:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:07:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-224050-14-F2HYTG/qb): Remote I/O error (121)
Jul 12 16:08:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:08:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-224529-14-82UXkI/qb): Remote I/O error (121)
Jul 12 16:08:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:08:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-224529-14-DfhHpL/qb): Remote I/O error (121)
Jul 12 16:09:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:09:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-225132-14-iV9MfL/qb): Remote I/O error (121)
Jul 12 16:09:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:09:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-225132-14-29E0pK/qb): Remote I/O error (121)
Jul 12 16:10:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:10:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-226419-14-GfRdGI/qb): Remote I/O error (121)
Jul 12 16:10:40 lab-vit-worker2 pacemaker-remoted [210996] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:10:40 lab-vit-worker2 pacemaker-remoted [210996] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-210996-226419-14-96ZQWH/qb): Remote I/O error (121)
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (crm_signal_dispatch) notice: Caught 'Terminated' signal | 15 (invoking handler)
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (lrmd_exit) info: Terminating with 0 clients
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (qb_ipcs_us_withdraw) info: withdrawing server sockets
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [210996] (crm_xml_cleanup) info: Cleaning up memory from libxml2
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (crm_log_init) info: Changed active directory to /var/lib/pacemaker/cores
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (main) notice: Starting Pacemaker remote executor
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (qb_ipcs_us_publish) info: server name: lrmd
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (pcmk__init_tls_dh) info: Generating Diffie-Hellman parameters with 2048-bit prime for TLS
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (qb_ipcs_us_publish) info: server name: cib_ro
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (qb_ipcs_us_publish) info: server name: cib_rw
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (qb_ipcs_us_publish) info: server name: cib_shm
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (qb_ipcs_us_publish) info: server name: attrd
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (qb_ipcs_us_publish) info: server name: stonith-ng
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (qb_ipcs_us_publish) info: server name: crmd
Jul 12 16:11:23 lab-vit-worker2 pacemaker-remoted [226614] (main) notice: Pacemaker remote executor successfully started and accepting connections
Jul 12 16:11:40 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:11:40 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-226875-14-ATvyrI/qb): Remote I/O error (121)
Jul 12 16:11:40 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:11:40 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-226875-14-1xTLTF/qb): Remote I/O error (121)
Jul 12 16:12:40 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:12:40 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-228508-14-rZsKtG/qb): Remote I/O error (121)
Jul 12 16:12:40 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:12:40 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-228508-14-j4PyBI/qb): Remote I/O error (121)
Jul 12 16:13:41 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:13:41 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-228595-14-cvl7MH/qb): Remote I/O error (121)
Jul 12 16:13:41 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:13:41 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-228595-14-V0ZJWI/qb): Remote I/O error (121)
Jul 12 16:14:41 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 12 16:14:41 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-229218-14-ObmfwH/qb): Remote I/O error (121)
Jul 12 16:14:41 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 12 16:14:41 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-229218-14-I30PKE/qb): Remote I/O error (121)
Jul 12 16:14:58 lab-vit-worker2 pacemaker-remoted [226614] (crm_remote_accept) info: New remote connection from ::ffff:192.168.111.66
Jul 12 16:14:58 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_listen) info: Remote client pending authentication | 0x55f9497f64f0 id: 379b517e-add1-49d8-b3d9-6ac2301e763d
Jul 12 16:14:59 lab-vit-worker2 pacemaker-remoted [226614] (remoted__read_handshake_data) notice: Remote client connection accepted
Jul 12 16:14:59 lab-vit-worker2 pacemaker-remoted [226614] (process_lrmd_get_rsc_info) info: Agent information for 'res_masakari_4cc6744_vip' not in cache
Jul 12 16:14:59 lab-vit-worker2 pacemaker-remoted [226614] (process_lrmd_rsc_register) info: Cached agent information for 'res_masakari_4cc6744_vip'
Jul 12 16:14:59 IPaddr2(res_masakari_4cc6744_vip)[229220]: ERROR: Unable to find nic or netmask.
Jul 12 16:14:59 IPaddr2(res_masakari_4cc6744_vip)[229220]: INFO: [findif] failed
Jul 12 16:14:59 lab-vit-worker2 pacemaker-remoted [226614] (process_lrmd_get_rsc_info) info: Agent information for 'res_masakari_haproxy' not in cache
Jul 12 16:14:59 lab-vit-worker2 pacemaker-remoted [226614] (process_lrmd_get_rsc_info) info: Agent information for 'res_masakari_haproxy:0' not in cache
Jul 12 16:14:59 lab-vit-worker2 pacemaker-remoted [226614] (process_lrmd_rsc_register) info: Cached agent information for 'res_masakari_haproxy'
Jul 12 16:14:59 lab-vit-worker2 pacemaker-remoted [226614] (log_execute) info: executing - rsc:res_masakari_haproxy action:stop call_id:12
Jul 12 16:14:59 lab-vit-worker2 pacemaker-remoted [226614] (log_finished) info: finished - rsc:res_masakari_haproxy action:stop call_id:12 pid:229260 exit-code:0 exec-time:36ms queue-time:0ms
Jul 12 16:18:08 lab-vit-worker2 pacemaker-remoted [226614] (crm_remote_accept) info: New remote connection from ::ffff:192.168.111.67
Jul 12 16:18:08 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_listen) info: Remote client pending authentication | 0x55f9497f5de0 id: 03dc5746-eeda-44a7-9f70-eccbab18692e
Jul 12 16:18:08 lab-vit-worker2 pacemaker-remoted [226614] (remoted__read_handshake_data) notice: Remote client connection accepted
Jul 12 16:18:10 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_client_msg) info: Remote client disconnected while reading from it
Jul 12 16:18:10 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_client_destroy) notice: Cleaning up after remote client pacemaker-remote-192.168.112.20:3121 disconnected | id=379b517e-add1-49d8-b3d9-6ac2301e763d
Jul 13 07:45:57 lab-vit-worker2 pacemaker-remoted [226614] (crm_remote_accept) info: New remote connection from ::ffff:192.168.111.66
Jul 13 07:45:57 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_listen) info: Remote client pending authentication | 0x55f9498b4750 id: 9c1d3ef6-8d16-4757-87ab-887f721e17e1
Jul 13 07:45:58 lab-vit-worker2 pacemaker-remoted [226614] (remoted__read_handshake_data) notice: Remote client connection accepted
Jul 13 07:46:36 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_client_msg) info: Remote client disconnected while reading from it
Jul 13 07:46:36 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_client_destroy) notice: Cleaning up after remote client pacemaker-remote-192.168.112.20:3121 disconnected | id=03dc5746-eeda-44a7-9f70-eccbab18692e
Jul 13 07:46:36 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_client_msg) info: Remote client disconnected while reading from it
Jul 13 07:46:36 lab-vit-worker2 pacemaker-remoted [226614] (lrmd_remote_client_destroy) notice: Cleaning up after remote client pacemaker-remote-192.168.112.20:3121 disconnected | id=9c1d3ef6-8d16-4757-87ab-887f721e17e1
Jul 13 07:47:24 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 13 07:47:24 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-821390-14-7JkKXH/qb): Remote I/O error (121)
Jul 13 07:47:24 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 13 07:47:24 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-821390-14-eCftQG/qb): Remote I/O error (121)
Jul 13 07:48:25 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 13 07:48:25 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-822693-14-BvBkCG/qb): Remote I/O error (121)
Jul 13 07:48:25 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 13 07:48:25 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-822693-14-7h7VPG/qb): Remote I/O error (121)
Jul 13 07:49:25 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 13 07:49:25 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-823228-14-mb51zF/qb): Remote I/O error (121)
Jul 13 07:49:25 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 13 07:49:25 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-823228-14-tpP7NG/qb): Remote I/O error (121)
Jul 13 07:50:25 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
Jul 13 07:50:25 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-823340-14-APqjEG/qb): Remote I/O error (121)
Jul 13 07:50:25 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to cib_ro because not connected to cluster
Jul 13 07:50:25 lab-vit-worker2 pacemaker-remoted [226614] (handle_new_connection) error: Error in connection setup (/dev/shm/qb-226614-823340-14-1u5ikH/qb): Remote I/O error (121)
Jul 13 07:51:25 lab-vit-worker2 pacemaker-remoted [226614] (ipc_proxy_accept) warning: Cannot proxy IPC connection from uid 0 gid 0 to stonith-ng because not connected to cluster
=========================================================================

I'm no corosync/pacemaker expert, but there seems to be an issue somewhere in that area.
Since MAAS is used as the fencing device, I checked that MAAS is reachable from every pacemaker unit, and it is.
I'm out of ideas on what to check or try next.
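One concrete clue in the remote log is the IPaddr2 failure ("ERROR: Unable to find nic or netmask." / "[findif] failed"): the agent infers the interface and netmask for the VIP by searching for a local subnet that contains it. A minimal check, assuming the VIP 192.168.111.225 from the bundle above, would be:

```shell
# List local IPv4 addresses; IPaddr2's findif needs some interface whose
# subnet contains the VIP (192.168.111.225), otherwise it fails exactly
# as in the log above.
ip -o -4 addr show
```

If no interface on the unit where the probe ran sits in the VIP's subnet, that unit cannot host the VIP; if every masakari unit does have an address in that subnet, the findif failure may be a red herring and the fencing failure above is the more likely blocker.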

Is this due to some misconfiguration in my deployment?

Best regards
