o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up

Bug #799711 reported by HenningMalzahn
This bug affects 2 people
Affects: ocfs2-tools (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: ocfs2-tools

The system is an asymmetric (opt-in) cluster, set up by following this
HowTo: https://wiki.ubuntu.com/ClusterStack/LucidTesting#Pacemaker.2C_drbd8_and_OCFS2_or_GFS2

I'm able to start the DRBD resource and the DLM, but when I try to start the resource
cloneO2CB, the following is logged in syslog:

o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up
ocfs2_controld[12870]: Unable to connect to CKPT: Object does not exist

The same thing happens when testing the o2cb resource agent manually with:
/usr/sbin/ocf-tester -n oc2b /usr/lib/ocf/resource.d/pacemaker/o2cb

I found the following document while trying to gain more information about the error:
http://www.drbd.org/users-guide/s-ocfs2-enable.html

After creating the configuration file mentioned in that document and executing

  dpkg-reconfigure ocfs2-tools

the following is logged on the console:

  Loading stack plugin "o2cb": OK
  Setting cluster stack "o2cb": OK
  Starting O2CB cluster ocfs2: OK

Whether I leave the file /etc/ocfs2/cluster.conf mentioned in the LinBit document in place
or delete it, the o2cb resource still fails to start as a Pacemaker resource.

Any help would be greatly appreciated.

So long

Henning Malzahn

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: ocfs2-tools (not installed)
ProcVersionSignature: Ubuntu 2.6.32-32.62-generic 2.6.32.38+drm33.16
Uname: Linux 2.6.32-32-generic x86_64
NonfreeKernelModules: vmnet vsock vmci vmmon nvidia
Architecture: amd64
Date: Mon Jun 20 14:42:09 2011
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100429)
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: ocfs2-tools

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Hi There!

Thank you for taking the time to report bugs and trying to make Ubuntu better.

Now, I have a few questions and suggestions that will help determine your issue:

1. Where did you install the tools from? From the Ubuntu Archive or from the PPA given in the HowTo? (Tools in the Ubuntu archive do not support OCFS2/Pacemaker clusters, and that's why we were pointing to the ones in the PPA.)
2. Did you install OpenAIS? If not, please do so. If yes, list what's in /etc/corosync/service.d/
3. When you ran dpkg-reconfigure on the ocfs2-tools package, after the output finished, did you disable o2cb as shown in the HowTo? "sudo update-rc.d o2cb disable"
4. When you use OCFS2 with pacemaker you *don't* have to create /etc/ocfs2/cluster.conf. Please drop that file.

Please also attach Pacemaker's and Corosync's config files, as well as what's inside /etc/corosync/service.d/. Additionally, is there any other step you followed that is not listed in the Ubuntu HowTo? That way I can try to reproduce this report by following the HowTo.

For now I'm marking this bug report as incomplete until more information is provided.

Thank you again for filing bug reports!

Changed in ocfs2-tools (Ubuntu):
status: New → Incomplete
Revision history for this message
HenningMalzahn (malzahn) wrote :

Hello Andres,

I have to apologize for having filed this as a bug. It definitely was a simple configuration issue on my side: the
openais package was not installed.
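Since the missing openais package turned out to be the cause, a quick check for it may save others the same trouble. The following is a minimal sketch, assuming a Debian/Ubuntu system with dpkg-query; the function name check_ckpt_provider is hypothetical, and the idea that openais supplies the CKPT service ocfs2_controld.pcmk was trying to reach is inferred from the error above and from the fix reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report whether the openais package is installed.
# In this report, "Unable to connect to CKPT" from ocfs2_controld went
# away once openais (assumed to supply the checkpoint service) was installed.
check_ckpt_provider() {
    # dpkg-query prints "install ok installed" for an installed package;
    # errors (unknown package, dpkg-query absent) are silenced and count as missing.
    if dpkg-query -W -f='${Status}' openais 2>/dev/null \
        | grep -q 'install ok installed'; then
        echo "openais installed"
    else
        echo "openais missing: apt-get install openais"
    fi
}

check_ckpt_provider
```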

1. Where did you install the tools from? From the Ubuntu Archive or from the PPA given in the HowTo? (Tools in the Ubuntu archive do not support OCFS2/Pacemaker clusters, and that's why we were pointing to the ones in the PPA.)
- Yes, packages were installed/upgraded from the PPA.

2. Did you install OpenAIS? If not, please do so. If yes, list what's in /etc/corosync/service.d/
- No, it was not. After installing it all services came up fine.

3. When you ran dpkg-reconfigure on the ocfs2-tools package, after the output finished, did you disable o2cb as shown in the HowTo? "sudo update-rc.d o2cb disable"
- Yes, did that. Enabled the services to be loaded at boot time and answered all other questions accepting the defaults.

4. When you use OCFS2 with pacemaker you *don't* have to create /etc/ocfs2/cluster.conf. Please drop that file.
- Did that too.

As mentioned above, simply following the steps of the HowTo led to a properly working dual-master configuration.

Thank you for your help!

So long

Henning

Revision history for this message
HenningMalzahn (malzahn) wrote :

Hi there,

one more question:

Even though everything seems to work fine, I get the following message on the second node:

Jun 23 11:21:22 node2 ocfs2_controld[3986]: Unable to open checkpoint "ocfs2:controld": Object does not exist
Jun 23 11:21:22 node2 ocfs2_controld[3986]: last message repeated 17 times

Any idea what that might be related to?

Thanks in advance

Henning

Revision history for this message
HenningMalzahn (malzahn) wrote :

Hello again,

After a while the setup stopped working completely. Even though the status (cat /proc/drbd) showed that everything was OK on both nodes, the following message was issued when trying to mount the device on the first node:

root@node1:[/tmp] # mount -t ocfs2 /dev/drbd2 /var/www/
mount.ocfs2: Unable to access cluster service while trying to join the group

After rebooting both nodes, things still did not work. The attempt to mount the device manually ended with the message:

root@node1:[~] # mount -t ocfs2 /dev/drbd2 /var/www/
mount.ocfs2: Device or resource busy while mounting /dev/drbd2 on /var/www/. Check 'dmesg' for more information on this error.

I was not able to see anything related in dmesg.

I suspect that the problems might have to do with the fact that I used the device in a master/slave setup before, and the device was formatted with ext4. After getting the master/master setup to work, I simply reformatted the device with OCFS2. I reinitialized the device completely using these commands:

Both nodes:
  drbdadm create-md r2
  drbdadm attach r2
  drbdadm syncer r2

Second node:
  drbdadm -- --discard-my-data connect r2

First node:
  drbdadm -- --overwrite-data-of-peer primary r2
  drbdadm connect r2
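The manual re-initialization above can be wrapped in a dry-run script. This is a sketch under stated assumptions: the resource name defaults to r2 as in this report, the function names run and reinit_drbd are hypothetical, and by default the commands are only printed (set DRY_RUN=0 to execute them). Note that drbdadm create-md still asks for confirmation when it finds existing data.

```shell
#!/bin/sh
# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reinit_drbd ROLE [RESOURCE]
# ROLE "first":  node whose data survives and becomes primary.
# ROLE "second": node whose data is discarded.
reinit_drbd() {
    role=$1
    res=${2:-r2}
    # Validate the role before touching anything.
    case $role in
        first|second) ;;
        *) echo "usage: reinit_drbd first|second [resource]" >&2
           return 1 ;;
    esac
    # Steps run on both nodes: new metadata, attach disk, apply syncer settings.
    run drbdadm create-md "$res"
    run drbdadm attach "$res"
    run drbdadm syncer "$res"
    if [ "$role" = second ]; then
        run drbdadm -- --discard-my-data connect "$res"
    else
        run drbdadm -- --overwrite-data-of-peer primary "$res"
        run drbdadm connect "$res"
    fi
}
```

Run reinit_drbd second on the node whose data should be thrown away, then reinit_drbd first on the other node, mirroring the order used above.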

Afterwards I formatted the device again using ocfs2.

Let's see whether things are working reliably now...

So long

Henning

Revision history for this message
Ante Karamatić (ivoks) wrote : Re: [Ubuntu-ha] [Bug 799711] Re: o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up

On Thu, 23.06.2011 at 07:13 +0000, HenningMalzahn wrote:

> 3. When you ran dpkg-reconfigure on the ocfs2-tools package, after the output finished, did you disable o2cb as shown in the HowTo? "sudo update-rc.d o2cb disable"
> - Yes, did that. Enabled the services to be loaded at boot time and answered all other questions accepting the defaults.
>
> 4. When you use OCFS2 with pacemaker you *don't* have to create /etc/ocfs2/cluster.conf. Please drop that file.
> - Did that too.

That's why it doesn't work. OCFS2 supports two cluster modes. One is
native OCFS2, for which you have to enable the o2cb service and
set up /etc/ocfs2/cluster.conf. For this setup you don't need Pacemaker.

The other mode is when you integrate OCFS2 with Pacemaker. For that you have
to disable the o2cb service in upstart, remove /etc/ocfs2/cluster.conf and
set up OCFS2 within Pacemaker.

If you removed /etc/ocfs2/cluster.conf but didn't integrate OCFS2 with
Pacemaker, it won't work.
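The node-level part of the Pacemaker-integrated mode can be sketched as a dry-run script. This is a hedged illustration, not an official procedure: the function names are hypothetical, the commands are only printed unless DRY_RUN=0, and the CIB part (the DLM and O2CB clones) is left to the crm configuration shown elsewhere in this report.

```shell
#!/bin/sh
# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0, as root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

use_pacemaker_mode() {
    # Stop the init script from starting the native O2CB stack at boot.
    run update-rc.d o2cb disable
    # The native cluster description must not be present in this mode.
    run rm -f /etc/ocfs2/cluster.conf
    # Remaining step (not automated here): define the DLM
    # (ocf:pacemaker:controld) and O2CB (ocf:pacemaker:o2cb) clones in the CIB.
}
```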

--
Ante Karamatic
OEM Server Engineer, Canonical Ltd
<email address hidden>

Revision history for this message
HenningMalzahn (malzahn) wrote :

Hi there,

Sorry for getting back to this so late, but I had to work on something else for the past few days.

I reverted both virtual machines again, and here is the exact sequence of commands I used to try to get the Pacemaker-integrated dual-master setup to work:

- apt-get install python-software-properties && \
  add-apt-repository ppa:ubuntu-ha/lucid-cluster && \
  apt-get update

- apt-get install pacemaker libdlm3-pacemaker ocfs2-tools drbd8-utils openais

- Rebooted

- shred -n 1 -v /dev/mapper/sde1_crypt

- Created the following configuration file for the DRBD device (/etc/drbd.d/r2.res)

resource r2 {

  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
  }

  startup {
    degr-wfc-timeout 120;
    become-primary-on both;
  }

  disk {
    on-io-error detach;
  }

  net {
    cram-hmac-alg sha1;
    shared-secret "SECRET";

    data-integrity-alg sha1;
    allow-two-primaries;

    after-sb-0pri disconnect;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 60M;
  }

  on janus {
    device /dev/drbd2;
    disk /dev/mapper/sde1_crypt;
    address 10.10.1.2:7882;
    meta-disk internal;
  }

  on mimas {
    device /dev/drbd2;
    disk /dev/mapper/sde1_crypt;
    address 10.10.1.3:7882;
    meta-disk internal;
  }
}

- drbdadm create-md r2

md_offset 26836983808
al_offset 26836951040
bm_offset 26836131840

Found some data

 ==> This might destroy existing data! <==

Do you want to proceed?
[need to type 'yes' to confirm] yes

Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

Both nodes:
- drbdadm create-md r2

- drbdadm attach r2

- drbdadm syncer r2

Second node:
- drbdadm -- --discard-my-data connect r2

First node:
- drbdadm -- --overwrite-data-of-peer primary r2

- drbdadm connect r2

- dpkg-reconfigure ocfs2-tools

- update-rc.d o2cb disable

- Created the following cib objects

primitive resDrbd2 ocf:linbit:drbd \
    params drbd_resource="r2" \
    operations $id="resDrbd2-operations" \
    op monitor interval="20s" role="Master" timeout="20s" \
    op monitor interval="30s" role="Slave" timeout="20s"

ms msDrbd2 resDrbd2 \
          meta resource-stickiness="100" \
          master-max="2" master-node-max="1" \
          clone-max="2" clone-node-max="1" \
          notify="true" globally-unique="false"

location locDrbd2AllowedNodes msDrbd2 rule 200: #uname eq node1 or #uname eq node2

location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1

primitive resDlm ocf:pacemaker:controld \
          op monitor interval="120s" \
          op start interval="0" timeout="90" \
          op stop interval="0" timeout="100"

clone cloneDlm resDlm \
      meta globally-unique="false" interleave="true"

colocation colDlm-on-msDrb2dMaster inf: cloneDlm msDrbd2:Master

order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm

location locCloneDlmAllowedNodes cloneDlm rule 200: #uname eq node1 or #uname eq node2

primitive ...


Revision history for this message
HenningMalzahn (malzahn) wrote :

The requested corosync configuration file

Revision history for this message
HenningMalzahn (malzahn) wrote :

The full Pacemaker configuration

Revision history for this message
Jacob Smith (jsmith-argotecinc) wrote :

You have a location constraint for msDrbd2 that requires it to be master on node1 only. I assume it's a leftover from your master/slave setup:

   location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1

I would remove that and see what happens. Also a few other things:

In ordering statements, the action given for the first item is applied to all the others unless they define their own. In the statement below, if msDrbd2 is not master, Pacemaker will promote it, and once that has finished it will try to apply the same "promote" action to cloneDlm as well. If every item needs a "start" action, you can leave the action out entirely or define it only on the first item and it works; but if the actions are mixed, every one must be stated explicitly. The first statement should be changed to match the second:

   order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm

   order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm:start
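The defaulting rule can be illustrated with a small model. This is a sketch, not crmsh code: resolve_order is hypothetical, it only handles the two-resource form, and it assumes Pacemaker's documented defaults for order constraints (first-action is "start"; then-action defaults to the value of first-action).

```shell
#!/bin/sh
# resolve_order FIRST[:ACTION] THEN[:ACTION]
# Prints the actions Pacemaker would use for a two-resource order constraint.
resolve_order() {
    case $1 in
        *:*) first_rsc=${1%%:*}; first_action=${1#*:} ;;
        *)   first_rsc=$1;       first_action=start ;;
    esac
    case $2 in
        *:*) then_rsc=${2%%:*}; then_action=${2#*:} ;;
        # No explicit action: inherit the first resource's action.
        *)   then_rsc=$2;       then_action=$first_action ;;
    esac
    echo "$first_rsc:$first_action -> $then_rsc:$then_action"
}

resolve_order msDrbd2:promote cloneDlm        # prints: msDrbd2:promote -> cloneDlm:promote
resolve_order msDrbd2:promote cloneDlm:start  # prints: msDrbd2:promote -> cloneDlm:start
```

Since cloneDlm is a plain clone, it is never promoted, so the first form can never be satisfied as intended.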

In this ordering statement I think you meant to put before not after... :-)

   order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm

Also, to follow up on Ante's comment: you were following the DRBD User's Guide for the legacy, non-Pacemaker setup. I don't know if you found it, but here is the direct link to the Pacemaker configuration in the DRBD User's Guide:

   http://www.drbd.org/users-guide/s-ocfs2-pacemaker.html

Lastly, once you have it working well, you can group the primitives for DLM and o2cb before you clone them, which eliminates a couple of order and colocation statements. Then you can combine the colocation and ordering statements for DRBD, the OCFS2 control group clone and the file system into one of each, e.g.:

   group g_ocfs2control p_controld p_o2cb

   clone cl_ocfs2control g_ocfs2control \
           meta globally-unique="false" interleave="true" target-role="Started"

   colocation c_fs_srv_on_ocfs2control_on_drbd_srv_master inf: cl_fs_srv cl_ocfs2control ms_drbd_srv:Master

   order o_drbd_srv_master_before_ocfs2control_before_fs_srv 0: ms_drbd_srv:promote cl_ocfs2control:start cl_fs_srv:start

Hope that helps!

Revision history for this message
Jacob Smith (jsmith-argotecinc) wrote :

Whoops! I put them backwards too!

"In this ordering statement I think you meant to put before not after... :-)

   order ordDlm-before-msDrbdMaster2 0: msDrbd2:promote cloneDlm"

I meant it should be "after" not "before"!

Revision history for this message
HenningMalzahn (malzahn) wrote :

Hello Jacob,

Thank you for your reply. I'm not sure whether the Pacemaker resource definitions are the real problem, as I keep getting the following message:

Jul 13 16:18:01 node2 ocfs2_controld[29698]: Unable to open checkpoint "ocfs2:controld": Object does not exist
Jul 13 16:18:53 node2 ocfs2_controld[29698]: last message repeated 102178 times
Jul 13 16:18:53 node2 ocfs2_controld[29698]: Unable to open checkpoint "ocfs2:controld": Object does not exist

What I do not understand is that the command ps aux | grep -i controld yields:

root 3757 0.0 0.0 7624 1012 pts/0 S+ 16:21 0:00 grep --color=auto -i controld
root 29115 0.0 0.0 131712 2332 ? Ssl 16:16 0:00 dlm_controld.pcmk -q 0
root 29698 7.1 0.0 84668 2872 ? Ss 15:55 1:53 /usr/sbin/ocfs2_controld.pcmk

Thank you for your help.

Henning

Revision history for this message
Jacob Smith (jsmith-argotecinc) wrote :

I know it might not look like a Pacemaker problem, but I believe it is: Pacemaker is in charge of everything. As I said in my previous comment, I believe this line prevents Drbd2 from being master on node 2:

location locDrbd2Master msDrbd2 rule role=master inf: #uname eq node1

Since everything else relies on Drbd2 being promoted to master (o2cb, dlmcontrol and the filesystem, via order and colocation constraints), none of those services will be started on node 2. Therefore you cannot have ocfs2_controld running on node 2, because Drbd2 is never master there, which means it could not be contacted. That would generate the errors in your logs... at least that's what I think! Though I'm not sure why dlm_controld.pcmk is running...

Hopefully it helps!

Jake

Revision history for this message
HenningMalzahn (malzahn) wrote :

Hello Jacob,

You were right. Of course the leftover from the master/slave setup was utterly wrong, and I removed it.
Other than that I changed the following Pacemaker objects:

- Object that defines the multi state resource:
  - ms msDrbd2 resDrbd2 meta resource-stickiness="100" master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" interleave="true"
  - Previously defined without the interleave="true" option

- Object that defines the primitive for the DLM
  - primitive resDlm ocf:pacemaker:controld op monitor interval="120s"
  - Now added WITHOUT the previously used options: op start interval="0" timeout="90" op stop interval="0" timeout="100"

- Object that defines the primitive for the O2CB service
  - primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
  - Now added WITHOUT the previously used options: op start interval="0" timeout="90" op stop interval="0" timeout="100"

- Object that defines the primitive for the filesystem object
  - primitive resFs2 ocf:heartbeat:Filesystem params device="/dev/drbd2" fstype="ocfs2" directory="/var/www" op monitor interval="120s"
  - Now added WITHOUT the previously used options: op start interval="0" timeout="90" op stop interval="0" timeout="100" meta target-role="stopped"

So the overall configuration that works is the following:

primitive resDrbd2 ocf:linbit:drbd params drbd_resource="r2" operations $id="resDrbd2-operations" op monitor interval="20s" role="Master" timeout="20s" op monitor interval="30s" role="Slave" timeout="20s"

ms msDrbd2 resDrbd2 meta resource-stickiness="100" master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" interleave="true"
location locDrbd2AllowedNodes msDrbd2 rule 200: #uname eq node1 or #uname eq node2

primitive resDlm ocf:pacemaker:controld op monitor interval="120s"
clone cloneDlm resDlm meta globally-unique="false" interleave="true"
colocation colDlmDrbd inf: cloneDlm msDrbd2:Master
order ordDrbdDlm 0: msDrbd2:promote cloneDlm
location locCloneDlmAllowedNodes cloneDlm rule 200: #uname eq node1 or #uname eq node2

primitive resO2CB ocf:pacemaker:o2cb op monitor interval="120s"
clone cloneO2CB resO2CB meta globally-unique="false" interleave="true"
colocation colO2CBDlm inf: cloneO2CB cloneDlm
order ordDlmO2CB 0: cloneDlm cloneO2CB
location locCloneO2CBAllowedNodes cloneO2CB rule 200: #uname eq node1 or #uname eq node2

primitive resFs2 ocf:heartbeat:Filesystem params device="/dev/drbd2" fstype="ocfs2" directory="/var/www" op monitor interval="120s"
clone cloneFs2 resFs2 meta globally-unique="false" interleave="true"
colocation colFs2-on-CloneO2CB inf: cloneFs2 cloneO2CB
order ordFs2-after-cloneO2CB inf: cloneO2CB cloneFs2
location locFs2AllowedNodes cloneFs2 rule 200: #uname eq node1 or #uname eq node2

I have monitored this configuration for the last several weeks, and the only thing left to figure out is why the network connection between the nodes drops from time to time. The time until this happens varies; eventually the following is logged on the first node:

Aug 26 11:27:42 node1 kernel: [93305.714992] block drbd2: sock was shut down by peer
Aug 26 11:27:42 node1 ...

Revision history for this message
Jacob Smith (jsmith-argotecinc) wrote : Re: [Ubuntu-ha] [Bug 799711] Re: o2cb[11796]: ERROR: ocfs2_controld.pcmk did not come up

Henning

You are probably right that this isn't the right place to continue with non-bug questions, though Andres or others could answer that more accurately.

I would recommend the drbd-users mailing list, as there are many experts there for configuration and troubleshooting. The Pacemaker mailing list is also a good one.

*snip*

I don't have any idea why you're getting the broken pipe... but do you have STONITH/fencing configured?!

Normally, when the communication link breaks like that, you would want Pacemaker to STONITH the disconnected node. Before the STONITH, whichever DRBD node is going to survive should fence the resource, preventing the peer from becoming primary until it is UpToDate.

I use the fence-peer handler in DRBD set to resource-only and have STONITH configured in Pacemaker. This way a break in DRBD doesn't automatically STONITH the node (but does prevent the broken DRBD node from coming up as Master and causing split brain), unless the cluster communication is also dead, at which point Pacemaker will shoot the node.

I think the lack of fencing/STONITH is causing the split brain: when the nodes are not communicating, each one does its own thing, and the data sets diverge.
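The fencing behaviour described above can be summarized as a small decision table. The sketch below is a simplified, hypothetical model (fencing_action is not a real tool) that assumes two links, DRBD replication and cluster messaging, each either up or down; the drbd.conf lines in the comments are the resource-level fencing setup the DRBD User's Guide describes for Pacemaker clusters.

```shell
#!/bin/sh
# Hypothetical model of "DRBD resource-level fencing plus STONITH in Pacemaker".
# In drbd.conf this is typically configured as:
#   disk     { fencing resource-only; }
#   handlers { fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
#              after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh"; }
# fencing_action DRBD_LINK CLUSTER_LINK   (each "up" or "down")
fencing_action() {
    drbd_link=$1
    cluster_link=$2
    if [ "$cluster_link" = down ]; then
        # Cluster communication lost: Pacemaker shoots the unreachable node.
        echo stonith
    elif [ "$drbd_link" = down ]; then
        # Only replication broke: the surviving DRBD node fences the resource
        # so the peer cannot be promoted until it is UpToDate again.
        echo resource-fence
    else
        echo none
    fi
}
```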

> This causes a split brain every time this happens, even though there
> are no writes on the devices yet.

You have your split brain handling configured like this still?:
    after-sb-0pri disconnect;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;

You are telling it to disconnect regardless of changes with these split-brain settings (I believe that's part of the cause). Have you considered more aggressive split-brain handling, since you're going with dual primary? It can be a controversial topic due to potential data loss, but...

At the bottom of this page, the User's Guide lists split-brain behaviors that are considered OK for dual-primary/clustered filesystem setups:

http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html

    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;

You would likely hit sb-1pri if you had fencing. It would go something like this:

1. Break in comms
2. Node 1 is fenced, preventing it from becoming master
3. Pacemaker shoots node 1
4. Node 1 reboots and (if set up to auto-start the cluster) Pacemaker accepts the node back into the cluster
5. DRBD links up and finds that node 1 has diverged
6. Node 1 is fenced, so it is not master right now
7. DRBD considers the after-sb-1pri rule: since this is dual primary and only one node is primary, the dataset on that primary is assumed to be good and there is no need to preserve the secondary's data, so it is simply overwritten
8. This basically executes the commands you ran manually and discards node 1's data
9. Once node 1 is UpToDate, DRBD removes the fencing and allows node 1 to become master
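The recommended dual-primary policy and the walkthrough above can be condensed into a small model. This is a hypothetical sketch (sb_victim is not part of DRBD): it assumes two nodes, reduces the state at reconnect to each node's role and whether it wrote data while disconnected, and covers only the three after-sb settings quoted above.

```shell
#!/bin/sh
# Model of:  after-sb-0pri discard-zero-changes;
#            after-sb-1pri discard-secondary;
#            after-sb-2pri disconnect;
# sb_victim ROLE1 ROLE2 CHANGED1 CHANGED2
#   ROLEn: primary|secondary   CHANGEDn: yes|no   (for node1 and node2)
# Prints the node whose data is discarded, "none", or "disconnect".
sb_victim() {
    primaries=0
    if [ "$1" = primary ]; then primaries=$((primaries + 1)); fi
    if [ "$2" = primary ]; then primaries=$((primaries + 1)); fi
    case $primaries in
        0)  # discard-zero-changes: keep whichever side has changes.
            if [ "$3" = yes ] && [ "$4" = yes ]; then
                echo disconnect   # both wrote: manual recovery needed
            elif [ "$3" = yes ]; then
                echo node2
            elif [ "$4" = yes ]; then
                echo node1
            else
                echo none         # nothing written on either side
            fi ;;
        1)  # discard-secondary: the secondary loses its changes.
            if [ "$1" = secondary ]; then echo node1; else echo node2; fi ;;
        *)  # Two primaries are never resolved automatically.
            echo disconnect ;;
    esac
}

sb_victim secondary primary yes yes   # the walkthrough above: prints node1
```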

I hope all of that wasn't too confusing!

Jake

Revision history for this message
HenningMalzahn (malzahn) wrote :

Hi Jacob,

> You have your split brain handling configured like this still?:
> after-sb-0pri disconnect;
> after-sb-1pri consensus;
> after-sb-2pri disconnect;
Yes, I still have this configuration active. I'll try your suggestions.
No, I have not configured STONITH properly yet.
> I hope all of that wasn't too confusing!
Nope. Again, thanks for all of your help and the help of everyone else here!

As I'm still in the early stages of learning how to set up Pacemaker clusters, I think it's best to close
this, as it truly was not a bug.

> I would recommend drbd-users mailing list as there are many experts there for config and troubleshooting.
> Also the pacemaker mailing lists is a good one.
Yepp. Will use those resources for further questions.

So long

Henning

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for ocfs2-tools (Ubuntu) because there has been no activity for 60 days.]

Changed in ocfs2-tools (Ubuntu):
status: Incomplete → Expired