The issue is related to the package upgrade procedure: after the upgrade, pacemaker starts dying constantly.
Linux team, can you check?
[16:07:44] Oleksiy Molchanov: Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: notice: check_active_before_startup_processes: Process lrmd terminated (pid=25845)
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: notice: pcmk_process_exit: Respawning failed child process: lrmd
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: info: start_child: Forked child 6582 for process lrmd
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: notice: check_active_before_startup_processes: Process pengine terminated (pid=25847)
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: notice: pcmk_process_exit: Respawning failed child process: pengine
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: info: start_child: Using uid=107 and group=114 for process pengine
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: info: start_child: Forked child 6583 for process pengine
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: info: mcp_cpg_deliver: Ignoring process list sent by peer for local node
Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores/root
Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: info: qb_ipcs_us_publish: server name: lrmd
Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: error: qb_ipcs_us_publish: Could not bind AF_UNIX (): Address already in use (98)
Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Address already in use (-98)
Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: error: main: Failed to create IPC server: shutting down and inhibiting respawn
Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: warning: pcmk_child_exit: The lrmd process (6582) can no longer be respawned, shutting the cluster down.
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: notice: pcmk_shutdown_worker: Shutting down Pacemaker
Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: notice: stop_child: Stopping crmd: Sent -15 to process 5986
Nov 16 18:07:54 [5986] node-3.test.domain.local crmd: notice: crm_signal_dispatch: Invoking handler for signal 15: Terminated
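To make the failure easier to spot in a longer log, the error and warning lines can be grouped per daemon and pid. The following is only a minimal sketch: the regular expression assumes the line layout seen in the excerpt above (timestamp, [pid], host, daemon, level, function, message), and the helper name `errors_by_daemon` is made up for illustration. It does not show what still holds the lrmd IPC socket; that is not visible in this excerpt.

```python
import re
from collections import defaultdict

# Layout assumed from the excerpt above:
#   <timestamp> [<pid>] <host> <daemon>: <level>: <function>: <message>
LINE_RE = re.compile(
    r"(?P<ts>\w{3} +\d+ [\d:]+) \[(?P<pid>\d+)\] (?P<host>\S+) +"
    r"(?P<daemon>\w+): +(?P<level>\w+): +(?P<func>\w+): +(?P<msg>.*)"
)


def errors_by_daemon(lines):
    """Group error/warning messages by (daemon, pid); hypothetical helper."""
    found = defaultdict(list)
    for line in lines:
        # search() rather than match(), so a chat prefix before the
        # timestamp does not prevent the line from being parsed.
        m = LINE_RE.search(line)
        if m and m["level"] in ("error", "warning"):
            found[(m["daemon"], int(m["pid"]))].append(m["msg"])
    return dict(found)


# A few lines copied from the log above.
SAMPLE = [
    "Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: info: start_child: Forked child 6582 for process lrmd",
    "Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: error: qb_ipcs_us_publish: Could not bind AF_UNIX (): Address already in use (98)",
    "Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Address already in use (-98)",
    "Nov 16 18:07:54 [6582] node-3.test.domain.local lrmd: error: main: Failed to create IPC server: shutting down and inhibiting respawn",
    "Nov 16 18:07:54 [5968] node-3.test.domain.local pacemakerd: warning: pcmk_child_exit: The lrmd process (6582) can no longer be respawned, shutting the cluster down.",
]

if __name__ == "__main__":
    for (daemon, pid), messages in errors_by_daemon(SAMPLE).items():
        print(f"{daemon} (pid {pid}):")
        for msg in messages:
            print(f"  {msg}")
```

On the sample lines this groups three errors under the respawned lrmd (pid 6582), all pointing at the IPC socket already being in use, and one warning under pacemakerd (pid 5968) for the resulting shutdown.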