Bug #799711 “o2cb[11796]: ERROR: ocfs2_controld.pcmk did not com...” : Bugs : ocfs2-tools package : Ubuntu

Andres Rodriguez (andreserl) wrote on 2011-06-21:

#1

Hi There!

Thank you for taking the time to report bugs and trying to make Ubuntu better.

Now, I have a few questions and suggestions that will help determine your issue:

1. Where did you install the tools from? From the Ubuntu Archive or from the PPA given in the HowTo? (Tools in the Ubuntu archive do not support OCFS2/Pacemaker clusters, and that's why we were pointing to the ones in the PPA)
2. Did you install OpenAIS? If not, please do so. If yes, list what's in /etc/corosync/service.d/
3. When you ran dpkg-reconfigure on the ocfs2-tools package, and after the output had finished showing, did you disable o2cb as shown in the HowTo? "sudo update-rc.d o2cb disable"
4. When you use OCFS2 with pacemaker you *don't* have to create /etc/ocfs2/cluster.conf. Please drop that file.

Please also attach Pacemaker's and Corosync's config files, as well as the contents of /etc/corosync/service.d/. Additionally, is there any other step you followed that is not listed in the Ubuntu HowTo? That way I can try to reproduce this report by following the HowTo.

For now I'm marking this bug report as incomplete until more information is provided.

Thank you again for filing bug reports!

Changed in ocfs2-tools (Ubuntu):
status:	New → Incomplete

HenningMalzahn (malzahn) wrote on 2011-06-23:

#2

Hello Andres,

I have to apologize for having filed this as a bug. It was definitely a simple configuration issue on my side: the openais package was not installed.

1. Where did you install the tools from? From the Ubuntu Archive or from the PPA given at the HowTo? (Tools in the Ubuntu archive do not support OCFS2/Pacemaker clusters, and that's why we were pointing to the ones on PPA)
- Yes, packages were installed/ upgraded using the PPA.

2. Did you install OpenAIS? If not please do so. If yes, List what's in /etc/corosync/service.d/
- No, it was not. After installing it all services came up fine.

3. When you dpkg-reconfigure ocfs2-tools package, and after the output has finished showing, did you disable o2cb as showed in the HowTo? "sudo update-rc.d o2cb disable"
- Yes, did that. Enabled the services to be loaded at boot time and answered all other questions accepting the defaults.

4. When you use OCFS2 with pacemaker you *don't* have to create /etc/ocfs2/cluster.conf. Please drop that file.
- Did that too.

As mentioned above, simply following the steps of the HowTo led to a properly working dual master configuration.

Thank you for your help!

So long

Henning

HenningMalzahn (malzahn) wrote on 2011-06-23:

#3

Hi there,

one more question:

Even though everything seems to work fine I get the following message on the second node

Jun 23 11:21:22 node2 ocfs2_controld[3986]: Unable to open checkpoint "ocfs2:controld": Object does not exist
Jun 23 11:21:22 node2 ocfs2_controld[3986]: last message repeated 17 times

Any idea what that might be related to?

Thanks in advance

Henning

HenningMalzahn (malzahn) wrote on 2011-06-23:

#4

Hello again,

after a while the setup stopped working completely. Even though the status (cat /proc/drbd) showed that everything was ok on both nodes the following message was issued when trying to mount the device on the first node

root@node1:[/tmp] # mount -t ocfs2 /dev/drbd2 /var/www/
mount.ocfs2: Unable to access cluster service while trying to join the group

After rebooting both nodes things still did not work out. The attempt to mount the device manually ended up with the message

root@node1:[~] # mount -t ocfs2 /dev/drbd2 /var/www/
mount.ocfs2: Device or resource busy while mounting /dev/drbd2 on /var/www/. Check 'dmesg' for more information on this error.

I was not able to see anything related in dmesg.

I suspect that the problems might have to do with the fact that I used the device in a master/ slave setup before and the device was formatted with ext4. After getting the master/master setup to work I simply reformatted the device using ocfs2. I reinitialized the device completely using the commands

Both nodes
drbdadm create-md r2

drbdadm attach r2

drbdadm syncer r2

Second node:
drbdadm -- --discard-my-data connect r2

First node:
drbdadm -- --overwrite-data-of-peer primary r2

drbdadm connect r2

Afterwards I formatted the device again using ocfs2.

Let's see whether things are working reliably now...

So long

Henning

Ante Karamatić (ivoks) wrote on 2011-06-29: Re: [Ubuntu-ha] [Bug 799711] Re: o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up

#5

On Thu, 23 Jun 2011 at 07:13 +0000, HenningMalzahn wrote:

> 3. When you dpkg-reconfigure ocfs2-tools package, and after the output has finished showing, did you disable o2cb as showed in the HowTo? "sudo update-rc.d o2cb disable"
> - Yes, did that. Enabled the services to be loaded at boot time and answered all other questions accepting the defaults.
>
> 4. When you use OCFS2 with pacemaker you *don't* have to create /etc/ocfs2/cluster.conf. Please drop that file.
> - Did that too.

That's why it doesn't work. OCFS2 supports two cluster modes. One is
OCFS2 native, for which you have to enable the o2cb service and
set up /etc/ocfs2/cluster.conf. For this setup you don't need pacemaker.

The other mode is when you integrate OCFS2 with pacemaker. For that you have
to disable the o2cb service in upstart, remove /etc/ocfs2/cluster.conf and
set up OCFS2 within pacemaker.

If you removed /etc/ocfs2/cluster.conf, but didn't integrate OCFS2 with
pacemaker, it won't work.
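
A quick way to see which of the two modes a node is currently set up for is to check for the file mentioned above. This is only a minimal sketch: the function name ocfs2_stack_hint and its optional root-prefix argument are hypothetical and are not part of ocfs2-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools): report which OCFS2 cluster
# mode the on-disk configuration points to.
# $1 is an optional root prefix, an assumption that lets the check run
# against a scratch directory instead of a live node.
ocfs2_stack_hint() {
    root="${1:-}"
    if [ -f "$root/etc/ocfs2/cluster.conf" ]; then
        echo "native: cluster.conf present, o2cb init service expected"
    else
        echo "pacemaker: no cluster.conf, o2cb must run as a Pacemaker resource"
    fi
}

# Check the local node.
ocfs2_stack_hint
```

A missing cluster.conf only shows that native mode is not configured. It does not prove that the Pacemaker resources exist, so the cluster configuration still has to be checked separately.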

--
Ante Karamatic
OEM Server Engineer, Canonical Ltd
<email address hidden>

HenningMalzahn (malzahn) wrote on 2011-07-07:

#6

Hi there,

sorry for getting back to this issue so late, but I had to work on something else for the past few days.

I reverted both virtual machines again, and here is the exact sequence of commands I used in the attempt to get the Pacemaker-integrated dual master setup to work:

- apt-get install python-software-properties && \
  add-apt-repository ppa:ubuntu-ha/lucid-cluster && \
  apt-get update

- apt-get install pacemaker libdlm3-pacemaker ocfs2-tools drbd8-utils openais

- Rebooted

- shred -n 1 -v /dev/mapper/sde1_crypt

- Created the following configuration file for the DRBD device (/etc/drbd.d/r2.res)

resource r2 {

  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
  }

  startup {
    degr-wfc-timeout 120;
    become-primary-on both;
  }

  disk {
    on-io-error detach;
  }

  net {
    cram-hmac-alg sha1;
    shared-secret "SECRET";

    data-integrity-alg sha1;
    allow-two-primaries;

    after-sb-0pri disconnect;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 60M;
  }

  on janus {
    device /dev/drbd2;
    disk /dev/mapper/sde1_crypt;
    address 10.10.1.2:7882;
    meta-disk internal;
  }

  on mimas {
    device /dev/drbd2;
    disk /dev/mapper/sde1_crypt;
    address 10.10.1.3:7882;
    meta-disk internal;
  }
}

- drbdadm create-md r2

md_offset 26836983808
al_offset 26836951040
bm_offset 26836131840

Found some data

==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

Both nodes:
- drbdadm create-md r2

- drbdadm attach r2

- drbdadm syncer r2

Second node:
- drbdadm -- --discard-my-data connect r2

First node:
- drbdadm -- --overwrite-data-of-peer primary r2

- drbdadm connect r2

- dpkg-reconfigure ocfs2-tools

- update-rc.d o2cb disable

- Created the following cib objects

primitive resDrbd2 ocf:linbit:drbd \
    params drbd_resource="r2" \
    operations $id="resDrbd2-operations" \
    op monitor interval="20s" role="Master" timeout="20s" \
    op monitor interval="30s" role="Slave" timeout="20s"

ms msDrbd2 resDrbd2 \
          meta resource-stickiness="100" \
          master-max="2" master-node-max="1" \
          clone-max="2" clone-node-max="1" \
          notify="true" globally-unique="false"

location locDrbd2AllowedNodes msDrbd2 rule 200: #uname eq node1 or #uname eq node2

location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1

primitive resDlm ocf:pacemaker:controld \
          op monitor interval="120s" \
          op start interval="0" timeout="90" \
          op stop interval="0" timeout="100"

clone cloneDlm resDlm \
      meta globally-unique="false" interleave="true"

colocation colDlm-on-msDrb2dMaster inf: cloneDlm msDrbd2:Master

order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm

location locCloneDlmAllowedNodes cloneDlm rule 200: #uname eq node1 or #uname eq node2

primitive resO2CB ocf:pacemaker:o2cb \
          op monitor interval="120s" \
          op start interval="0" timeout="90" \
          op stop interval="0" timeout="100"

clone cloneO2CB resO2CB \
      meta globally-unique="false" interleave="true"

colocation colO2CB-on-Dlm inf: cloneO2CB cloneDlm

order ordO2CB-after-Dlm 0: cloneDlm cloneO2CB

location locCloneO2CBAllowedNodes cloneO2CB rule 100: #uname eq node1 or #uname eq node2

- Rebooted both nodes

- After the reboot the required Pacemaker services were up and running (output of crm_mon -1f):

 Master/Slave Set: msDrbd2
     Masters: [ node1 node2 ]
 Clone Set: cloneDlm
     Started: [ node1 node2 ]
     Stopped: [ resDlm:2 ]
 Clone Set: cloneO2CB
     Started: [ node1 node2 ]
     Stopped: [ resO2CB:2 ]

- Created the filesystem afterwards using the command: mkfs.ocfs2 -L r2 /dev/drbd2
mkfs.ocfs2 1.4.3
Cluster stack: pcmk
Cluster name: pacemaker
NOTE: Selecting extended slot map for userspace cluster stack
Filesystem label=r2
Block size=4096 (bits=12)
Cluster size=4096 (bits=12)
Volume size=26836131840 (6551790 clusters) (6551790 blocks)
204 cluster groups (tail covers 3822 clusters, rest cover 32256 clusters)
Journal size=167723008
Initial number of node slots: 8
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 3 block(s)
Formatting Journals: done
Formatting slot map: done
Writing lost+found: done
mkfs.ocfs2 successful

- Created the following CIB objects for the filesystem:

primitive resFs2 ocf:heartbeat:Filesystem \
          params device="/dev/drbd2" fstype="ocfs2" directory="/var/www" \
          op monitor interval="120s" \
          op start interval="0" timeout="90" \
          op stop interval="0" timeout="100" \
          meta target-role="stopped"

clone cloneFs2 resFs2 \
      meta globally-unique="false" interleave="true"

colocation colFs2-on-CloneO2CB inf: cloneFs2 cloneO2CB

order ordFs2-after-cloneO2CB inf: cloneO2CB cloneFs2

location locCloneFs0AllowedNodes cloneFs2 rule 100: #uname eq node1 or #uname eq node2

- and started the file system by executing the command

crm resource start cloneFs2

The file system comes up fine on the first node (crm_mon -1f)

Clone Set: cloneFs2
     Started: [ node1 ]
     Stopped: [ resFs2:0 ]

but fails on the second node with the following messages in the system log:

ocfs2_controld[3483]: Unable to open checkpoint "ocfs2:controld": Object does not exist

As requested here's the content of /etc/corosync/service.d/
root@node1:[~] # la /etc/corosync/service.d/
total 12
drwxr-xr-x 2 root root 4096 2011-07-05 10:56 .
drwxr-xr-x 4 root root 4096 2011-05-31 16:28 ..
-rw-r--r-- 1 root root   59 2010-02-18 11:09 ckpt-service

HenningMalzahn (malzahn) wrote on 2011-07-07:

#7

corosync.conf (1.7 KiB, text/plain)

The requested corosync configuration file

HenningMalzahn (malzahn) wrote on 2011-07-07:

#8

PacemakerConfiguration.txt (11.4 KiB, text/plain)

The full Pacemaker configuration

Jacob Smith (jsmith-argotecinc) wrote on 2011-07-07:

#9

You have a location constraint for msDrbd2 that requires it to be master on node 1 only. I would assume it's leftover from your master/slave setup:

location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1

I would remove that and see what happens. Also a few other things:

In ordering statements, the action given for the first item is applied to all the others unless one is explicitly defined for them. In the statement below, if msDrbd2 is not master it will be promoted, and once that has finished the same "promote" action will be attempted on cloneDlm as well. If every item only needed a "start" action, you could leave the action out entirely or define it only on the first item and it would be fine, but with mixed actions each one must be defined. The first statement should be changed to match the second.

order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm

order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm:start

In this ordering statement I think you meant to put before not after... :-)

order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm

Also, to follow up on Ante's comment: you were following the DRBD user's guide for the legacy non-Pacemaker setup. I don't know if you found it, but here is the direct link to the Pacemaker configuration in the DRBD user's guide:

http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html

Last: once you have it functioning well, you can group the primitives for Dlm and o2cb before you clone them and then eliminate a couple of order and colocation statements. Then you can combine the colocation and ordering statements for Drbd, the ocfs control group clone and the file system into one of each. For example:

group g_ocfs2control p_controld p_o2cb

clone cl_ocfs2control g_ocfs2control \
      meta globally-unique="false" interleave="true" target-role="Started"

colocation c_fs_srv_on_ocfs2control_on_drbd_srv_master inf: cl_fs_srv cl_ocfs2control ms_drbd_srv:Master

order o_drbd_srv_master_before_ocfs2control_before_fs_srv 0: ms_drbd_srv:promote cl_ocfs2control:start cl_fs_srv:start

Hope that helps!

Jacob Smith (jsmith-argotecinc) wrote on 2011-07-07:

#10

Whoops! I put them backwards too!

"In this ordering statement I think you meant to put before not after... :-)

order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm"

I meant it should be "after" not "before"!

HenningMalzahn (malzahn) wrote on 2011-07-13:

#11

Hello Jacob,

thank you for your reply. I'm not sure whether the Pacemaker resource definitions are the real problem as I keep getting the following message

Jul 13 16:18:01 node2 ocfs2_controld[29698]: Unable to open checkpoint "ocfs2:controld": Object does not exist
Jul 13 16:18:53 node2 ocfs2_controld[29698]: last message repeated 102178 times
Jul 13 16:18:53 node2 ocfs2_controld[29698]: Unable to open checkpoint "ocfs2:controld": Object does not exist

What I do not understand is that the command ps aux | grep -i controld yields

root 3757 0.0 0.0 7624 1012 pts/0 S+ 16:21 0:00 grep --color=auto -i controld
root 29115 0.0 0.0 131712 2332 ? Ssl 16:16 0:00 dlm_controld.pcmk -q 0
root 29698 7.1 0.0 84668 2872 ? Ss 15:55 1:53 /usr/sbin/ocfs2_controld.pcmk

Thank you for your help.

Henning

Jacob Smith (jsmith-argotecinc) wrote on 2011-07-13:

#12

I know it might not look like a Pacemaker problem, but I believe it is. Pacemaker is in charge of everything. As I wrote in my earlier comment, I believe this line prevents Drbd2 from being master on node 2:

location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1

Since everything else relies on Drbd2 being promoted to master (o2cb, dlmcontrol and the filesystem, via order and colocation constraints), none of those services will be started on node 2. Therefore you cannot have ocfs2_controld running on node 2, because Drbd2 is never master there, which means it could not be contacted. This would generate the errors in your logs... at least that's what I think! Though I'm not sure why dlm_controld.pcmk is running...
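
To check whether a leftover constraint of this kind is still in the configuration, the crm shell output can be filtered for location rules that mention the master role. This is a rough sketch only: the function name find_master_pins is made up for illustration, and the sample input is copied from the configuration posted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print location constraints that carry a role=master
# rule. Reads crm shell configuration text on stdin.
find_master_pins() {
    grep -E '^location .*rule .*role=[Mm]aster'
}

# Sample input copied from the configuration in this thread.
sample='location locDrbd2AllowedNodes msDrbd2 rule 200: #uname eq node1 or #uname eq node2
location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1'

# Prints only the locDrbd2Master line.
printf '%s\n' "$sample" | find_master_pins
```

On a live node the same filter could be fed from the real configuration with: crm configure show | find_master_pins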

Hopefully it helps!

Jake

HenningMalzahn (malzahn) wrote on 2011-08-31:

#13

Hello Jacob,

you were right. Of course the leftover from the Master/Slave setup was utterly wrong, and I removed it.
Other than that I changed the following Pacemaker objects:

- Object that defines the multi state resource:
  - ms msDrbd2 resDrbd2 meta resource-stickiness="100" master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" interleave="true"
  - Previously defined without the interleave="true" option

- Object that defines the primitive for the DLM
  - primitive resDlm ocf:pacemaker:controld op monitor interval="120s"
  - Now added WITHOUT the previously used options: op start interval="0" timeout="90" op stop interval="0" timeout="100"

- Object that defines the primitive for the O2CB service
  - primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
  - Now added WITHOUT the previously used options: op start interval="0" timeout="90" op stop interval="0" timeout="100"

- Object that defines the primitive for the filesystem object
  - primitive resFs2 ocf:heartbeat:Filesystem params device="/dev/drbd2" fstype="ocfs2" directory="/var/www" op monitor interval="120s"
  - Now added WITHOUT the previously used options: op start interval="0" timeout="90" op stop interval="0" timeout="100" meta target-role="stopped"

So the overall configuration that works is the following:

primitive resDrbd2 ocf:linbit:drbd params drbd_resource="r2" operations $id="resDrbd2-operations" op monitor interval="20s" role="Master" timeout="20s" op monitor interval="30s" role="Slave" timeout="20s"

ms msDrbd2 resDrbd2 meta resource-stickiness="100" master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" interleave="true"
location locDrbd2AllowedNodes msDrbd2 rule 200: #uname eq node1 or #uname eq node2

primitive resDlm ocf:pacemaker:controld op monitor interval="120s"
clone cloneDlm resDlm meta globally-unique="false" interleave="true"
colocation colDlmDrbd inf: cloneDlm msDrbd2:Master
order ordDrbdDlm 0: msDrbd2:promote cloneDlm
location locCloneDlmAllowedNodes cloneDlm rule 200: #uname eq node1 or #uname eq node2

primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
colocation colO2CBDlm inf: cloneO2CB cloneDlm
order ordDlmO2CB 0: cloneDlm cloneO2CB
location locCloneO2CBAllowedNodes cloneO2CB rule 200: #uname eq node1 or #uname eq node2

primitive resFs2 ocf:heartbeat:Filesystem params device="/dev/drbd2" fstype="ocfs2" directory="/var/www" op monitor interval="120s"
clone cloneFs2 resFs2 meta globally-unique="false" interleave="true"
colocation colFs2-on-CloneO2CB inf: cloneFs2 cloneO2CB
order ordFs2-after-cloneO2CB inf: cloneO2CB cloneFs2
location locFs2AllowedNodes cloneFs2 rule 200: #uname eq node1 or #uname eq node2

I monitored this configuration for the last several weeks, and the only thing left to figure out is why the network connection between the nodes is dropped from time to time. The "uptime" varies before the following is logged on the first node:

Aug 26 11:27:42 node1 kernel: [93305.714992] block drbd2: sock was shut down by peer
Aug 26 11:27:42 node1 kernel: [93305.714998] block drbd2: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Aug 26 11:27:42 node1 kernel: [93305.715031] block drbd2: short read expecting header on sock: r=0
Aug 26 11:27:42 node1 kernel: [93305.717147] block drbd2: meta connection shut down by peer.
Aug 26 11:27:42 node1 kernel: [93305.717150] block drbd2: asender terminated
Aug 26 11:27:42 node1 kernel: [93305.717154] block drbd2: Terminating asender thread
Aug 26 11:27:42 node1 kernel: [93305.717180] block drbd2: Creating new current UUID
Aug 26 11:27:42 node1 kernel: [93305.749481] block drbd2: Connection closed
Aug 26 11:27:42 node1 kernel: [93305.749486] block drbd2: conn( BrokenPipe -> Unconnected )
Aug 26 11:27:42 node1 kernel: [93305.749489] block drbd2: receiver terminated
Aug 26 11:27:42 node1 kernel: [93305.749491] block drbd2: Restarting receiver thread
Aug 26 11:27:42 node1 kernel: [93305.749493] block drbd2: receiver (re)started
Aug 26 11:27:42 node1 kernel: [93305.749496] block drbd2: conn( Unconnected -> WFConnection )
Aug 26 11:27:42 node1 kernel: [93306.044319] block drbd2: Handshake successful: Agreed network protocol version 91
Aug 26 11:27:42 node1 kernel: [93306.045138] block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
Aug 26 11:27:42 node1 kernel: [93306.045143] block drbd2: conn( WFConnection -> WFReportParams )
Aug 26 11:27:42 node1 kernel: [93306.045155] block drbd2: Starting asender thread (from drbd2_receiver [3210])
Aug 26 11:27:42 node1 kernel: [93306.045296] block drbd2: data-integrity-alg: sha1
Aug 26 11:27:42 node1 kernel: [93306.045575] block drbd2: drbd_sync_handshake:
Aug 26 11:27:42 node1 kernel: [93306.045577] block drbd2: self FC552462665D4783:BDD65C832186D577:9F7DA9B62FDA6714:2AF446A4F5219D1C bits:0 flags:0
Aug 26 11:27:42 node1 kernel: [93306.045579] block drbd2: peer 8CA84B52E8B891A7:BDD65C832186D577:9F7DA9B62FDA6714:2AF446A4F5219D1C bits:0 flags:0
Aug 26 11:27:42 node1 kernel: [93306.045581] block drbd2: uuid_compare()=100 by rule 90
Aug 26 11:27:42 node1 kernel: [93306.045583] block drbd2: Split-Brain detected, dropping connection!
Aug 26 11:27:42 node1 kernel: [93306.062952] block drbd2: helper command: /sbin/drbdadm split-brain minor-2
Aug 26 11:27:42 node1 kernel: [93306.064522] block drbd2: helper command: /sbin/drbdadm split-brain minor-2 exit code 0 (0x0)
Aug 26 11:27:42 node1 kernel: [93306.064526] block drbd2: conn( WFReportParams -> Disconnecting )
Aug 26 11:27:42 node1 kernel: [93306.064530] block drbd2: error receiving ReportState, l: 4!
Aug 26 11:27:42 node1 kernel: [93306.078690] block drbd2: meta connection shut down by peer.
Aug 26 11:27:42 node1 kernel: [93306.078693] block drbd2: asender terminated
Aug 26 11:27:42 node1 kernel: [93306.078694] block drbd2: Terminating asender thread
Aug 26 11:27:42 node1 kernel: [93306.098559] block drbd2: Connection closed
Aug 26 11:27:42 node1 kernel: [93306.098567] block drbd2: conn( Disconnecting -> StandAlone )
Aug 26 11:27:42 node1 kernel: [93306.098597] block drbd2: receiver terminated
Aug 26 11:27:42 node1 kernel: [93306.098598] block drbd2: Terminating receiver thread

and on the second node:

Aug 26 07:40:59 node2 lrmd: [1782]: info: rsc:resDrbd2:1:39: monitor
Aug 26 08:41:05 node2 lrmd: [1782]: info: rsc:resDrbd2:1:39: monitor
Aug 26 09:41:11 node2 lrmd: [1782]: info: rsc:resDrbd2:1:39: monitor
Aug 26 10:41:17 node2 lrmd: [1782]: info: rsc:resDrbd2:1:39: monitor
Aug 26 11:27:42 node2 kernel: [92955.613719] block drbd2: PingAck did not arrive in time.
Aug 26 11:27:42 node2 kernel: [92955.629774] block drbd2: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug 26 11:27:42 node2 kernel: [92955.629785] block drbd2: asender terminated
Aug 26 11:27:42 node2 kernel: [92955.629787] block drbd2: Terminating asender thread
Aug 26 11:27:42 node2 kernel: [92955.629812] block drbd2: short read expecting header on sock: r=-512
Aug 26 11:27:42 node2 kernel: [92955.630751] block drbd2: Creating new current UUID
Aug 26 11:27:42 node2 kernel: [92955.645302] block drbd2: Connection closed
Aug 26 11:27:42 node2 kernel: [92955.645306] block drbd2: conn( NetworkFailure -> Unconnected )
Aug 26 11:27:42 node2 kernel: [92955.645315] block drbd2: receiver terminated
Aug 26 11:27:42 node2 kernel: [92955.645317] block drbd2: Restarting receiver thread
Aug 26 11:27:42 node2 kernel: [92955.645318] block drbd2: receiver (re)started
Aug 26 11:27:42 node2 kernel: [92955.645321] block drbd2: conn( Unconnected -> WFConnection )
Aug 26 11:27:42 node2 kernel: [92955.974572] block drbd2: Handshake successful: Agreed network protocol version 91
Aug 26 11:27:42 node2 kernel: [92955.975297] block drbd2: Peer authenticated using 20 bytes of 'sha1' HMAC
Aug 26 11:27:42 node2 kernel: [92955.975302] block drbd2: conn( WFConnection -> WFReportParams )
Aug 26 11:27:42 node2 kernel: [92955.975346] block drbd2: Starting asender thread (from drbd2_receiver [3100])
Aug 26 11:27:42 node2 kernel: [92955.976133] block drbd2: data-integrity-alg: sha1
Aug 26 11:27:42 node2 kernel: [92955.976246] block drbd2: drbd_sync_handshake:
Aug 26 11:27:42 node2 kernel: [92955.976249] block drbd2: self 8CA84B52E8B891A7:BDD65C832186D577:9F7DA9B62FDA6714:2AF446A4F5219D1C bits:0 flags:0
Aug 26 11:27:42 node2 kernel: [92955.976251] block drbd2: peer FC552462665D4783:BDD65C832186D577:9F7DA9B62FDA6714:2AF446A4F5219D1C bits:0 flags:0
Aug 26 11:27:42 node2 kernel: [92955.976253] block drbd2: uuid_compare()=100 by rule 90
Aug 26 11:27:42 node2 kernel: [92955.976254] block drbd2: Split-Brain detected, dropping connection!
Aug 26 11:27:42 node2 kernel: [92955.991397] block drbd2: helper command: /sbin/drbdadm split-brain minor-2
Aug 26 11:27:42 node2 kernel: [92955.992758] block drbd2: helper command: /sbin/drbdadm split-brain minor-2 exit code 0 (0x0)
Aug 26 11:27:42 node2 kernel: [92955.992761] block drbd2: conn( WFReportParams -> Disconnecting )
Aug 26 11:27:42 node2 kernel: [92955.992764] block drbd2: error receiving ReportState, l: 4!
Aug 26 11:27:42 node2 kernel: [92956.008897] block drbd2: asender terminated
Aug 26 11:27:42 node2 kernel: [92956.008906] block drbd2: Terminating asender thread
Aug 26 11:27:42 node2 kernel: [92956.008986] block drbd2: Connection closed
Aug 26 11:27:42 node2 kernel: [92956.008994] block drbd2: conn( Disconnecting -> StandAlone )
Aug 26 11:27:42 node2 kernel: [92956.008998] block drbd2: receiver terminated
Aug 26 11:27:42 node2 kernel: [92956.008999] block drbd2: Terminating receiver thread
Aug 26 11:41:24 node2 lrmd: [1782]: info: rsc:resDrbd2:1:39: monitor

This causes a split brain every time this happens even though there are no writes on the devices yet.
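
To make the two drbd_sync_handshake lines easier to read, here is a minimal Python sketch that parses the "self" and "peer" lines from the node1 log above and reports which UUID slots differ. The helper names (parse_handshake, differing_slots) are made up for illustration. The reading that the first slot is the current UUID and the second the bitmap UUID is my assumption about how DRBD 8.3 prints them, not something the log itself states.

```python
import re

# Two handshake lines copied from the node1 log above (timestamps trimmed).
LOG = """\
block drbd2: self FC552462665D4783:BDD65C832186D577:9F7DA9B62FDA6714:2AF446A4F5219D1C bits:0 flags:0
block drbd2: peer 8CA84B52E8B891A7:BDD65C832186D577:9F7DA9B62FDA6714:2AF446A4F5219D1C bits:0 flags:0
"""

# Matches "self"/"peer", four colon-separated 16-digit hex UUIDs, and the bits counter.
UUID_LINE = re.compile(
    r"block drbd\d+: (self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16}) bits:(\d+)"
)


def parse_handshake(text):
    """Return {'self': (uuid_list, bits), 'peer': (uuid_list, bits)}."""
    result = {}
    for match in UUID_LINE.finditer(text):
        role, uuids, bits = match.groups()
        result[role] = (uuids.split(":"), int(bits))
    return result


def differing_slots(parsed):
    """Indexes of the UUID slots where self and peer disagree."""
    mine, theirs = parsed["self"][0], parsed["peer"][0]
    return [i for i, (a, b) in enumerate(zip(mine, theirs)) if a != b]


parsed = parse_handshake(LOG)
print(differing_slots(parsed))                   # [0]
print(parsed["self"][1], parsed["peer"][1])      # 0 0
```

Only the first slot differs and both sides report bits:0. If the assumption about slot order holds, each node generated its own new current UUID after the connection dropped (node2 logs "Creating new current UUID"), which would be enough for DRBD to report a split brain even with no data written.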

Stopping the Apache clone that uses the resource and the multi-state resource itself, then running the following commands on the second node:
- drbdadm attach r2
- drbdadm -- --discard-my-data connect r2

and the following on the first node:
- drbdadm attach r2
- drbdadm connect r2

brings the resources up again. Starting the multi-state resource afterwards also succeeds without any problem, and the setup works again, sometimes for several hours and sometimes for 2-3 days, until the manual resync is needed again.
I don't know if this is still the place to discuss this, as it may no longer have anything to do with the original issue.

Thanks again to everyone who helped me out!

Henning

Revision history for this message

Jacob Smith (jsmith-argotecinc) wrote on 2011-08-31: Re: [Ubuntu-ha] [Bug 799711] Re: o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up

#14

Henning

You are probably right that this isn't the right place to continue with non-bug questions, though Andres or others could answer that more accurately.

I would recommend the drbd-users mailing list, as there are many experts there for config and troubleshooting. Also, the pacemaker mailing list is a good one.

*snip*

I don't have any idea why you're getting the broken pipe... but do you have STONITH/fencing configured?

Normally, when a comm link breaks like that, you would want Pacemaker to STONITH the disconnected node. Before the STONITH, whichever DRBD node is going to survive should fence the resource, preventing the peer from becoming primary until it is UpToDate.

I use the fence-peer handler in DRBD with fencing set to resource-only and then have STONITH configured in Pacemaker. This way I can have a break in DRBD that doesn't automatically STONITH the node (but prevents the borked DRBD from coming up as Master and causing split brain) unless the cluster communications are also dead, at which point Pacemaker will shoot the node.

I think the lack of fencing/STONITH is causing the split brain, because both nodes do their own thing when not communicating, which causes the data sets to diverge.

> This causes a split brain every time this happens even though there are
> no writes on the devices yet.

You have your split brain handling configured like this still?:
    after-sb-0pri disconnect;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;

You are telling it to disconnect regardless of changes with these split brain lines (I believe it's part of the cause). Have you considered using some more aggressive split brain handling if you're going with dual primary? It can be a controversial topic due to data loss but...

The users guide lists, at the bottom of this page, split brain behaviors that are considered OK for dual primary/clustered filesystem setups:

http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html

    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;

You would likely hit sb-1pri if you had fencing. It would go something like this:

1. Break in comms.
2. Node 1 is fenced, preventing it from becoming master.
3. Pacemaker shoots node 1.
4. Node 1 reboots and (if set up to auto-start the cluster) Pacemaker accepts the node back into the cluster.
5. DRBD links up and finds that Node 1 has diverged.
6. Node 1 is fenced, so it is not master right now.
7. DRBD considers the after-sb-1pri rule. This assumes that, since it's dual primary, if you have one primary then the dataset on that primary is always good and there is no need to preserve the secondary's data, so it is just overwritten.
8. This basically executes the commands you ran manually and discards the Node 1 data.
9. Once Node 1 is UpToDate, DRBD removes the fencing and allows Node 1 to become master.
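
As a rough illustration of which after-sb rule gets consulted in the sequence above, here is a toy Python model. It is not DRBD code: the AFTER_SB table simply restates the three policies suggested in this comment, and policy_for is a hypothetical helper that counts how many nodes are Primary at reconnect time.

```python
# Policies as suggested in this comment; keyed by the number of Primary nodes
# at the moment the split brain is detected.
AFTER_SB = {
    0: "discard-zero-changes",
    1: "discard-secondary",
    2: "disconnect",
}


def policy_for(roles):
    """Return (rule name, policy) for a mapping of node name to DRBD role."""
    primaries = sum(1 for role in roles.values() if role == "Primary")
    return f"after-sb-{primaries}pri", AFTER_SB[primaries]


# With fencing: node1 was shot and is held back as Secondary when it reconnects.
print(policy_for({"node1": "Secondary", "node2": "Primary"}))

# Without fencing: both nodes are still Primary, so only "disconnect" applies
# and the manual recovery is needed.
print(policy_for({"node1": "Primary", "node2": "Primary"}))
```

The point of the sketch is only that fencing changes which rule is hit: with one node held back, the automatic after-sb-1pri policy can resolve the split brain, whereas two primaries always end in a disconnect.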

I hope all of that wasn't too confusing!

Jake

Revision history for this message

HenningMalzahn (malzahn) wrote on 2011-08-31:

#15

Hi Jacob,

> You have your split brain handling configured like this still?:
> after-sb-0pri disconnect;
> after-sb-1pri consensus;
> after-sb-2pri disconnect;
Yes, I still have this configuration active. I'll try your suggestions.
No, I have not configured STONITH properly yet.
> I hope all of that wasn't too confusing!
Nope. Again, thanks for all of your help and the help of everyone else here!

As I'm still in the early stages of learning how to set up Pacemaker clusters, I think it's really best to close
this, as this was truly not a bug.

> I would recommend the drbd-users mailing list, as there are many experts there for config and troubleshooting.
> Also, the pacemaker mailing list is a good one.
Yepp. Will use those resources for further questions.

So long

Henning

Revision history for this message

Launchpad Janitor (janitor) wrote on 2011-10-31:

#16

[Expired for ocfs2-tools (Ubuntu) because there has been no activity for 60 days.]

Changed in ocfs2-tools (Ubuntu):
status:	Incomplete → Expired
