Pacemaker_pcsd_auth[setup] fails when a node is down

Bug #1723939 reported by Martins Innus
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
puppet-pacemaker
New
Undecided
Unassigned

Bug Description

Running on:

###############
[root@srv-p24-26:minnus]# pcs --version
0.9.158

[root@srv-p24-26:minnus]# more /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)

[minnus@srv-m14-34:pacemaker]$ git log -1
commit c44548c6e51cc4991ef3a1acd325e89122db0a69
Author: ZhongShengping <chdzsp@163.com>
Date: Sun Jul 2 14:25:18 2017 +0800

    Fix openstackdocstheme

    See the docs[0] how to use it. Also add the variables for the bug
    log feature.

    [0]https://docs.openstack.org/openstackdocstheme/latest/

    Change-Id: I3f6d80920ce2ef8277d8fa1668827d5f5adfad21
#################

When a node in a previously configured cluster goes down, subsequent puppet runs fail with:

Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) Could not evaluate: Could not parse the result of the cluster auth command: '{
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "status": "ok",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "data": {
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "auth_responses": {
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "srv-p24-26": {
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "status": "already_authorized"
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) },
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "srv-p24-28": {
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "status": "already_authorized"
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) },
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "srv-p24-30": {
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "status": "noresponse"
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) }
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) },
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "sync_successful": true,
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "sync_nodes_err": [
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup])
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) ],
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "sync_responses": {
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) }
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) },
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "log": [
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "I, [2017-10-16T08:47:56.921998 #20802] INFO -- : PCSD Debugging enabled\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "D, [2017-10-16T08:47:56.922244 #20802] DEBUG -- : Did not detect RHEL 6\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "I, [2017-10-16T08:47:56.922282 #20802] INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "I, [2017-10-16T08:47:56.922304 #20802] INFO -- : CIB USER: hacluster, groups: \n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "D, [2017-10-16T08:47:56.928135 #20802] DEBUG -- : [\"totem.cluster_name (str) = ccr-os-prod\\n\"]\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "D, [2017-10-16T08:47:56.928202 #20802] DEBUG -- : []\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "D, [2017-10-16T08:47:56.928231 #20802] DEBUG -- : Duration: 0.005810529s\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "I, [2017-10-16T08:47:56.928282 #20802] INFO -- : Return Value: 0\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "I, [2017-10-16T08:47:56.928854 #20802] INFO -- : SRWT Node: srv-p24-26 Request: check_auth\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "I, [2017-10-16T08:47:56.929688 #20802] INFO -- : SRWT Node: srv-p24-28 Request: check_auth\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "I, [2017-10-16T08:47:56.930210 #20802]Error: Unable to communicate with srv-p24-30
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) INFO -- : SRWT Node: srv-p24-30 Request: check_auth\n",
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) "I, [2017-10-16T08:47:56.935436 #20802] INFO -- : No response from: srv-p24-30 request: check_auth, error: couldnt_connect\n"
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) ]
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) }
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Pacemaker_pcsd_auth[setup]) '
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Exec[create-cluster]) Dependency Pacemaker_pcsd_auth[setup] has failures: true
Oct 16 08:47:57 srv-p24-26 puppet-agent[17028]: (/Stage[main]/Pacemaker::New::Setup::Pcsd/Exec[create-cluster]) Skipping because of failed dependencies

I assume because the stderr message of:

Error: Unable to communicate with srv-p24-30

gets interleaved with the debug output and therefore can't be parsed as json. Not sure if this is strictly a problem in the puppet module or should be fixed in the pcs code, but it causes the puppet run to fail.

Running the auth command from the command line produces no such interleaving, I guess due to different buffering characteristics in the terminal:

--Debug Stdout Start--
{
  "status": "ok",
  "data": {
    "auth_responses": {
      "srv-p24-26": {
        "status": "already_authorized"
      },
      "srv-p24-28": {
        "status": "already_authorized"
      },
      "srv-p24-30": {
        "status": "noresponse"
      }
    },
    "sync_successful": true,
    "sync_nodes_err": [

    ],
    "sync_responses": {
    }
  },
  "log": [
    "I, [2017-10-16T08:51:35.556119 #24992] INFO -- : PCSD Debugging enabled\n",
    "D, [2017-10-16T08:51:35.556383 #24992] DEBUG -- : Did not detect RHEL 6\n",
    "I, [2017-10-16T08:51:35.556441 #24992] INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name\n",
    "I, [2017-10-16T08:51:35.556465 #24992] INFO -- : CIB USER: hacluster, groups: \n",
    "D, [2017-10-16T08:51:35.562941 #24992] DEBUG -- : [\"totem.cluster_name (str) = ccr-os-prod\\n\"]\n",
    "D, [2017-10-16T08:51:35.563013 #24992] DEBUG -- : []\n",
    "D, [2017-10-16T08:51:35.563044 #24992] DEBUG -- : Duration: 0.00645078s\n",
    "I, [2017-10-16T08:51:35.563100 #24992] INFO -- : Return Value: 0\n",
    "I, [2017-10-16T08:51:35.563608 #24992] INFO -- : SRWT Node: srv-p24-30 Request: check_auth\n",
    "I, [2017-10-16T08:51:35.564393 #24992] INFO -- : SRWT Node: srv-p24-28 Request: check_auth\n",
    "I, [2017-10-16T08:51:35.564939 #24992] INFO -- : SRWT Node: srv-p24-26 Request: check_auth\n",
    "I, [2017-10-16T08:51:35.568750 #24992] INFO -- : No response from: srv-p24-30 request: check_auth, error: couldnt_connect\n"
  ]
}

--Debug Stdout End--
--Debug Stderr Start--

--Debug Stderr End--

srv-p24-28: Already authorized
Error: Unable to communicate with srv-p24-30
srv-p24-26: Already authorized

Let me know if you need anymore information

Thanks

Martins

Revision history for this message
Michele Baldessari (michele) wrote :

I think you need commit a2e9675cda2e9152569ce4ab0f69484c03fc7c93 (I32a7a9fa4b896871929fcae847bba3481df5b517) i.e LP https://bugs.launchpad.net/puppet-pacemaker/+bug/1718186

Revision history for this message
Martins Innus (minnus) wrote :

Thanks for the follow up. I've got that one applied already just as a local patch. I believe this issue is different and occurs even with that commit applied.

If all the nodes are up, everything works great. Its only with a down node, and that extra stderr output that there is a problem.

Thanks

Martins

Revision history for this message
Martins Innus (minnus) wrote :

Hi,
   Any other thoughts on this? I haven't upgraded since submitting this report, but I looked through the commit history and didn't see anything that would address this. Currently, if one node goes completely offline, puppet runs fail on the other nodes.

Thanks

Martins

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.