drbd8-utils not compatible with linux-lts-raring kernel in 12.04

Bug #1185756 reported by Raybuntu
This bug affects 30 people
Affects          Status         Importance   Assigned to    Milestone
drbd8 (Ubuntu)   Invalid        Undecided    Unassigned
Precise          Fix Released   High         Stefan Bader

Bug Description

Request for SRU:
[Impact]
DRBD will not work (it hangs) on a fresh install from Ubuntu 12.04.3 media, and will stop working on sites where the Raring Enablement Stack is manually installed, as the API between older and newer drbd kernel modules has changed.

[Fix]
The current version of drbd8 utils in Saucy/Raring can be compiled with legacy utils enabled (basically drbdadm and drbdsetup) and automatically switches to use the legacy version when an older kernel module is found. Comparing the code of those two legacy tools showed them to be mostly the same (except some things that actually look like bug fixes).
I only found two small issues. One was the init.d script, which was changed to use a new drbdadm command to activate resources. This would fail if drbdadm fell back to the legacy version. So I picked the shell function that the current util uses and verified that this still works with the new binary.
The other problem was the default config file, which contained a new option that would cause the legacy util to fail. It does not seem to be required by the new tools, so commenting it out by default seems to work in both cases, too.
Lastly (this did not seem to be a real issue), the legacy tools claimed to be version 8.3.10 while the code really looked like the 8.3.11 version we have in Precise. Since that also matches the version number of the drbd module in Precise, I modified the legacy tools version to be 8.3.11.

[Test Case]
To test compatibility with the Precise 3.2-based kernels, either install the prepared package before installing any HWE kernel and verify that everything still works as before, or, if an HWE kernel is already installed and the issue is present, boot into the 3.2 kernel before installing the proposed package (or follow the downgrade instructions before booting back).

To test functionality with HWE kernels, install the Raring kernel in Precise and install/configure DRBD. Without the fix you get "No response from the DRBD driver! Is the module loaded?". With the proposed backport the mirror continues to work. Only switching back to an older kernel requires a special procedure (see comment #21):

http://www.drbd.org/users-guide/s-downgrading-drbd84.html

---

I've just installed linux-generic-lts-raring on 12.04.2 and my drbd device stopped working.
It seems that drbd8-utils is not compatible with DRBD 8.4 in kernel 3.8.
I see that we can't upgrade the package since this would break compatibility with the older kernels in Precise.

But given the new plans for the LTS Enablement Stack [1], there should be a package like drbd8-utils-lts-raring, and the dependencies should be resolved automatically by apt.

kind regards

[1] https://wiki.ubuntu.com/Kernel/LTSEnablementStack

The upload caused regression bug 1314289.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.8.0-22-generic 3.8.0-22.33~precise1
ProcVersionSignature: Ubuntu 3.8.0-22.33~precise1-generic 3.8.11
Uname: Linux 3.8.0-22-generic x86_64
ApportVersion: 2.0.1-0ubuntu17.2
Architecture: amd64
Date: Thu May 30 11:53:13 2013
InstallationMedia: Ubuntu-Server 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120817.3)
MarkForUpload: True
SourcePackage: linux-lts-raring
UpgradeStatus: No upgrade log present (probably fresh install)

Raybuntu (raybuntu) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-raring (Ubuntu):
status: New → Confirmed
Rocco (rocco) wrote :

Same problem. Installed 2 servers, both 12.04.3. DRBD is complaining in the cluster configuration.

ERROR: drbd config error: DRBD module version: 8.4.2
   userland version: 8.3.11
you should upgrade your drbd tools!

Tenho Tuhkala (muppis) wrote :

This bug should affect drbd8-utils as well, because that is the package that should be upgraded/fixed.

affects: linux-lts-raring (Ubuntu) → drbd8 (Ubuntu)
description: updated
Numérigraphe (numerigraphe) wrote :

Would someone from the Bug Control Team please consider this as a request for SRU?

If this cannot be fixed as an SRU, at least it should be documented in the Release Notes for 12.04.3 server, and affected users should be advised to downgrade to the Quantal stack.

Lionel Sausin.

Changed in drbd8 (Ubuntu Precise):
status: New → Confirmed
Changed in drbd8 (Ubuntu Precise):
importance: Undecided → High
Stefan Bader (smb) wrote :

The newer version of the drbd8 utils offers a way to include backwards-compatible versions of drbdadm and drbdsetup. So it seemed possible to backport that version. There were a few details to add, but I think I got a version that seems to work OK within 12.04/Precise with a 3.2 kernel and after installing a 3.8 kernel.
One WARNING: booting into a 3.8 kernel and having a resource connected seems to modify the meta-data in a way that is not recognized when booting back into an older kernel. It is possible to force a create-md but then a full resync happens.

I will also attach binary packages of the backport I did. Maybe affected parties could give those a more thorough validation than I did. I am not sure whether this changes much more than an SRU should, but on the other hand, if it works with all kernels it would be much better than adding release notes which might be read too late.

tags: added: patch
Numérigraphe (numerigraphe) wrote :

Great news.
I have 2 Precise clusters, one has the Raring stack and the other has the original Precise stack.
I'll give the package a try and let you know.
Please bear with me if that takes a little time.
Lionel

Numérigraphe (numerigraphe) wrote :

Dear Stefan Bader (smb),
I installed the 64-bit package you posted on 1 node of each cluster and it did indeed work mostly as expected.

The first cluster has kernel 3.8.
It had tools 8.4.2 installed from the sources. I uninstalled them (make uninstall was a bit buggy in that version, but no matter) and replaced them with your package.
drbdadm --version now shows :
DRBDADM_BUILDTAG=GIT-hash:\ 89a294209144b68adb3ee85a73221f964d3ee515\ build\ by\ phil@fat-tyre\,\ 2013-02-05\ 15:35:49
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080402
DRBDADM_VERSION_CODE=0x080403
DRBDADM_VERSION=8.4.3

The second cluster has kernel 3.2 (backported by your humble servant).
It had tools and module v8.3.13 installed from a custom package backported by your humble servant.
I upgraded to your package and drbdadm now shows :
DRBDADM_BUILDTAG=GIT-hash:\ 89a294209144b68adb3ee85a73221f964d3ee515\ build\ by\ phil@fat-tyre\,\ 2013-02-05\ 15:35:49
DRBDADM_API_VERSION=88
DRBD_KERNEL_VERSION_CODE=0x08030d
DRBDADM_VERSION_CODE=0x08030a
DRBDADM_VERSION=8.3.10

On the second cluster, 2 out of 6 resources failed to reconnect gracefully, citing a mismatch in verify-alg. drbdadm --adjust all on the upgraded node brought it back in order.

In both clusters, the drbd config was left untouched (I refused the packaged version of the config file).
The module was not unloaded and primary resources remained in production.
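
The `*_VERSION_CODE` values in the two outputs above appear to pack the major, minor, and patch numbers into one byte each (0x080403 sits next to DRBDADM_VERSION=8.4.3, and 0x08030a next to 8.3.10). A minimal Python sketch for decoding them by hand, assuming that layout holds; `decode_version_code` is a hypothetical helper, not part of the DRBD tools:

```python
def decode_version_code(code: str) -> str:
    """Turn a hex version code such as '0x080403' into '8.4.3'.

    Assumes one byte each for major, minor, and patch, as the
    outputs quoted in this report suggest.
    """
    value = int(code, 16)
    major = (value >> 16) & 0xFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return f"{major}.{minor}.{patch}"


if __name__ == "__main__":
    # Codes copied from the two drbdadm --version outputs above.
    for code in ("0x080402", "0x080403", "0x08030d", "0x08030a"):
        print(code, "->", decode_version_code(code))
```

Read this way, the first cluster pairs a kernel module at 8.4.2 with tools at 8.4.3, and the second pairs a module at 8.3.13 with tools reporting 8.3.10.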

Lionel Sausin.

Numérigraphe (numerigraphe) wrote :

Please note that when using kernel 3.2, the tools have actually been DOWNgraded from 8.3.11 to 8.3.13 when the "drbd8-utils_8.4.3..." package was installed.
Lionel

Numérigraphe (numerigraphe) wrote :

I plan to upgrade our DRBD 8.3 cluster to 8.4 some time this autumn : should I expect difficulties using your package, or will the 8.4 tools automatically become available ?
Lionel

Stefan Bader (smb) wrote :

@Numérigraphe,

thanks for the quick testing. To answer the question from #12 first: The version used by that package depends on the version of the kernel. So just upgrading kernel to 3.8 (or later) and booting with that kernel will cause the 8.4 utils part to be used. This unfortunately does some automatic change to the on-disk meta-data (as mentioned above). So while upgrading the kernel causes no issues, trying to boot back into an old kernel will cause a failure to bring up the drbd device. It looked to be possible to force it back but then caused a complete re-sync.

> Please note that when using kernel 3.2, the tools have actually been DOWNgraded
> from 8.3.11 to 8.3.13

I think you might have meant to say 8.3.10. Yes, it seems kind of odd, as I would have expected the latest version in Raring to also contain the latest legacy version. I need to check whether there was a smaller update in Precise and whether that can easily be merged.

Numérigraphe (numerigraphe) wrote : Re: [Bug 1185756] Re: drbd8-utils not compatible with linux-lts-raring kernel in 12.04

> I think you might have meant to say to 8.3.10.
Yes, sorry I mixed it up.
Lionel.

Stefan Bader (smb) wrote :

So it seems the version displayed by the tools is coded to be that older version. The code itself rather looks to be more recent. So I would rather just tweak the version numbers.

Stefan Bader (smb) wrote :

I re-uploaded the Raring debdiff and packages (sorry a bit unclean and using the same version number again). The only change to the previous version is the legacy tools version number being bumped.

Numérigraphe (numerigraphe) wrote :

Thanks for your care.
I've had a problem on the cluster with drbd 8.3, on the host where your
package was installed.
The power was lost and when it was restored, the drbd resources did not
connect. It may be related to your package or not, I'm unsure.
I reverted to the standard package and brought the resources down and up
for a quick fix, but that may be worth investigating.
If time permits, I'll try to reproduce that on the other cluster and let
you know - unless someone else can confirm whether this works or not.
Lionel Sausin.

Numérigraphe (numerigraphe) wrote :

On 27/09/2013 09:34, Stefan Bader wrote:
> So while upgrading the kernel causes no issues, trying to boot back into an old
> kernel will cause a failure to bring up the drbd device. It looked to be
> possible to force it back but then caused a complete re-sync.
The instructions for downgrading from 8.4 to 8.3 are here, maybe it's
worth adding them to the release notes?
     http://www.drbd.org/users-guide/s-downgrading-drbd84.html
Lionel Sausin.

Numérigraphe (numerigraphe) wrote :

I just did some more testing on the second cluster and I met no problem at all.

With your package installed, I did the drbd down/apply-al procedure and downgraded to kernel 3.2. After fixing the config syntax, the resources came back cleanly without a full-sync. The drbd utils just worked.
At that point I noticed that DRBD 8.3.11 connects normally with the other node still using DRBD 8.4.2. That's just great!

I upgraded again to kernel 3.8 and reverted the config files: the utils worked fine and the resources connected without problem.

So I'd say this is just fine. Thanks again.

Lionel Sausin

Numérigraphe (numerigraphe) wrote :

As to the problem I met on the other cluster, kern.log has no trace of a DRBD error, so I'm pretty confident it was due to a race condition in my custom startup plumbing.
Lionel.

Stefan Bader (smb) wrote :

Thanks Lionel, I will update the SRU description and we will see. Backporting a much newer version of anything into an older release always is met with a lot of (justified by experience) reservations. It probably will need more volunteers to make a case for it (at least convincing the SRU team that there isn't any serious regression).

Numérigraphe (numerigraphe) wrote :

I see your point.
You still have the option to ship the new version as drbd8-utils-lts-raring. The 8.3 compatibility would still be a bonus for those who downgrade.
Otherwise, maybe you could push it to precise-backports ?
Lionel.

Stefan Bader (smb)
description: updated
Changed in drbd8 (Ubuntu Precise):
assignee: nobody → Stefan Bader (smb)
Changed in drbd8 (Ubuntu):
status: Confirmed → Invalid
Stefan Bader (smb) wrote :

Backports might be an option, too. Though that would require users to be aware of the problem and manually pick the backports version when upgrading to the LTS Raring kernel. Usability looks to be simpler with an SRU, but I would defer that decision to the SRU team.

Scott Kitterman (kitterman) wrote :

Generically, I think that an SRU to get it working with the HWE stack is reasonable. It will need to be tested with both the original and HWE stacks, so please update the test procedure to include testing both.

Numérigraphe (numerigraphe) wrote :

There is a package with a proposed fix in comment #17 and #18 and we're waiting for affected users to test it.
So please do test it and report either success or failure here.
Lionel Sausin.

Numérigraphe (numerigraphe) wrote :

Sorry, I'm replying to the wrong report.

Numérigraphe (numerigraphe) wrote :

@flickerfly (josiah-ritchie) : that's strange as I don't see anything in the changelog for the kernel that might fix this. Which version of drbd8-utils are you using please ?
Lionel

flickerfly (josiah-ritchie) wrote :

Yeah, the problem is back again. I'm not sure why it worked briefly after a kernel upgrade and now it is impacting my boot up.

flickerfly (josiah-ritchie) wrote :

Okay, so my immediate problem was entirely unrelated to this. I didn't have /etc/hosts set up on them, so they simply couldn't find each other. I'm sorry for cluttering this up. Thanks for indulging a drbd newb.

flickerfly (josiah-ritchie) wrote :

So I did run into the problem mentioned here and installing drbd8-utils_8.4.3-0ubuntu0.12.04.1_amd64.deb fixed that so I validate. (Glad to in the end be able to contribute something.)

I'm now finally watching /proc/drbd show syncing on this new drbd setup.

Stefan Bader (smb)
description: updated
Tenho Tuhkala (muppis) wrote :

Sorry for delay, had another rush to work on. Fix posted in #17 works fine in my set. Thank you.

mailtest-1:~$ uname -a
Linux mailtest-1 3.8.0-31-generic #46~precise1-Ubuntu SMP Wed Sep 11 18:21:16 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

mailtest-1:~$ drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 89a294209144b68adb3ee85a73221f964d3ee515\ build\ by\ phil@fat-tyre\,\ 2013-02-05\ 15:35:49
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080402
DRBDADM_VERSION_CODE=0x080403
DRBDADM_VERSION=8.4.3

Doug Goldstein (cardoe) wrote :

Any word on seeing this package published? This ticket hasn't seen an update in 4 months. I can confirm that Ubuntu 12.04.3 Server out of the box on amd64 and i386 will not work with DRBD and requires smb's fixed debs from #17 and #18 (respectively).

Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Raybuntu, or anyone else affected,

Accepted drbd8 into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/drbd8/2:8.4.3-0ubuntu0.12.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in drbd8 (Ubuntu Precise):
status: Confirmed → Fix Committed
tags: added: verification-needed
tags: added: verification-done
removed: verification-needed
Lionel Sausin - Initiatives/Numérigraphe (ls-initiatives) wrote :

Verified to fix the problem on 1 node in each cluster at our end:

First cluster, Ubuntu 12.04 with Raring kernel, all packages up-to-date except lxc 0.8.
Uninstalled drbd8-utils and re-installed from ...-proposed, rebooted.
drbd-overview, drbdadm up|down|disconnect|connect|verify still functional.
# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 89a294209144b68adb3ee85a73221f964d3ee515\ build\ by\ phil@fat-tyre\,\ 2013-02-05\ 15:35:49
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080402
DRBDADM_VERSION_CODE=0x080403
DRBDADM_VERSION=8.4.3

Second cluster, Ubuntu 12.04 with original kernel, many out-of-date packages (can't upgrade now, sorry).
Simply upgraded drbd8-utils from ...-proposed. Can't reboot now, sorry.
drbd-overview, drbdadm up|down|disconnect|connect|verify still functional.
# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 89a294209144b68adb3ee85a73221f964d3ee515\ build\ by\ phil@fat-tyre\,\ 2013-02-05\ 15:35:49
DRBDADM_API_VERSION=88
DRBD_KERNEL_VERSION_CODE=0x08030d
DRBDADM_VERSION_CODE=0x08030b
DRBDADM_VERSION=8.3.11

Colin Watson (cjwatson) wrote : Update Released

The verification of the Stable Release Update for drbd8 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package drbd8 - 2:8.4.3-0ubuntu0.12.04.1

---------------
drbd8 (2:8.4.3-0ubuntu0.12.04.1) precise; urgency=low

  * Backport version from 13.04 so that the utils will handle the
    changed in-kernel drbd module from the LTS Raring kernel
    (LP: #1185756).
  * Activate legacy command support by adding --with-legacy_utils to
    debian/rules.
  * Comment out "options" section in global_common.conf as testing
    showed problems in the backport.
  * Forward port the adjust_with_progress shell function from older
    versions of the utils and add the path to legacy utils to PATH.
  * Adjust legacy version to 8.3.11 as the code looks to be at least
    that version or later.
 -- Stefan Bader <email address hidden> Sat, 21 Sep 2013 18:40:51 -0500

Changed in drbd8 (Ubuntu Precise):
status: Fix Committed → Fix Released
Rocco (rocco) wrote :

@smb I'm afraid I can't. The problem with the upgrade of drbd8-utils caused downtime on our main (production) DB cluster today.

I can look through the logs tomorrow, but it was the drbd ocf master/slave in pacemaker that was failing, causing drbd to get "unconfigured" on both nodes.

Thomas Jagoditsch (tja) wrote :

Got a maintenance window today and tried drbd8-utils_8.4.3-0ubuntu0.12.04.2~rc1_amd64.deb, which doesn't help the cluster cause.
attached a cleanup result of one of the drbd resources.
service drbd start still works.

Stefan Bader (smb) wrote :

It is a bit odd. Ok, some warnings seem to be coming from pacemaker commands. Like the
WARN: decode_transition_key: Bad UUID (crm-resource-3048) in sscanf result (3) for 0:0:crm-resource-3048

I guess the problem is this:
Apr 22 21:55:43 storage0 lrmd: [1242]: info: RA output: (p_drbd_bigpool:0:start:stderr) drbdadm: Unknown command 'syncer'
Apr 22 21:55:43 storage0 drbd[3079]: ERROR: bigpool: Called drbdadm -c /etc/drbd.conf syncer bigpool

This would mean that drbdadm did not pass control on to drbdadm-83. And that would only happen if the current version did not seem to be 8.3. That version number looks to be read from /proc/drbd and would reflect the version of the kernel module, at least as long as reading from /proc/drbd is not failing. If it fails, drbdadm would fall back to a version number defined in the command (which would be wrong). It would have been interesting to find out whether calling "drbdadm syncer bigpool" manually would fail the same way, but that would need another maintenance window. Maybe try to see what /proc/drbd is showing.

I probably would suggest that we open a new bug report for the cluster case, so this one can be closed with the fix to drbdsetup. Then at least the direct usage seems to work.
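
The selection logic described in this comment can be sketched roughly as follows. This is an illustration in Python, not the actual drbdadm source: the function names and the built-in fallback value are assumptions, and only the `/proc/drbd` version line format and the `drbdadm-83` name come from this report.

```python
import re
from typing import Optional

# Assumed built-in default, used when /proc/drbd cannot be read.
BUILTIN_VERSION = (8, 4, 3)


def module_version(proc_drbd_text: Optional[str]) -> tuple:
    """Return (major, minor, patch) from /proc/drbd content, else the built-in default."""
    if proc_drbd_text:
        match = re.search(
            r"^version:\s*(\d+)\.(\d+)\.(\d+)", proc_drbd_text, re.MULTILINE
        )
        if match:
            return tuple(int(part) for part in match.groups())
    return BUILTIN_VERSION


def select_tool(proc_drbd_text: Optional[str]) -> str:
    """Pick the legacy binary for an 8.3 module, the regular one otherwise."""
    major, minor, _patch = module_version(proc_drbd_text)
    return "drbdadm-83" if (major, minor) == (8, 3) else "drbdadm"


if __name__ == "__main__":
    loaded = "version: 8.3.11 (api:88/proto:86-96)\nsrcversion: 93CE421BB73A731BDC72D8E\n"
    print(select_tool(loaded))  # 8.3 module loaded
    print(select_tool(None))    # /proc/drbd missing or unreadable
```

In this sketch a missing `/proc/drbd` selects the 8.4 tool, for which `syncer` is not a valid command; that would match the error in the log above if the module was not loaded when the resource agent ran.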

Rocco (rocco) wrote : Re: [Bug 1185756] Re: drbd8-utils not compatible with linux-lts-raring kernel in 12.04

On a side note, I'm not sure why a new package was released for the
raring stack? Is there a really brilliant reason for not having two
different packages?

Stefan Bader (smb) wrote :

This was done because later point releases of 12.04 come with newer kernels and the drbd module in later kernels needed the newer userspace.

Stefan Bader (smb) wrote :

@Rocco, that said (and not remembering), could it be that that server uses a kernel version >3.2.x? What does /proc/drbd say?

Rocco (rocco) wrote :

~# cat /proc/drbd
version: 8.3.11 (api:88/proto:86-96)
srcversion: 93CE421BB73A731BDC72D8E

~# uname -a
Linux zed1 3.2.0-60-generic #91-Ubuntu

@smb I mean like it states in the header:
But given the new plans for the LTS Enablement Stack [1], there should be a package like drbd8-utils-lts-raring, and the dependencies should be resolved automatically by apt.

Why weren't the new drbd userspace tools needed for the lts-raring stack released as drbd8-utils-lts-raring? Then those of us who installed 12.04 at the beginning wouldn't be affected, and people using the newer 12.04.x could still use drbd.

The state now is that people who have upgraded to the raring stack are fine (one of my clusters is like that, but there I had to compile the drbd tools myself to get drbd working since the old ones provided didn't work, AKA what this bug was about from the start), and everyone else is borked... So same problem, just the other way around...

Andreas Braun (camouflagex) wrote :

I also had a problem starting DRBD with my Corosync/Pacemaker cluster after upgrading drbd8-utils to 8.4.3.

This is a line from the log file where it is failing:
Apr 24 08:21:02 www6a lrmd: [1183]: info: RA output: (p_drbd_r0:0:start:stderr) Could not connect to 'drbd' generic netlink family
Command 'drbdsetup new-resource r0' terminated with exit code 20

I am running Linux kernel 3.5.0-48 on Ubuntu 12.04.4 LTS
# uname -a
Linux www6a 3.5.0-48-generic #72~precise1-Ubuntu SMP Tue Mar 11 20:09:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

DRBD module was always at version 8.3.13
# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3

I was able to fix it by downgrading drbd8-utils again:
# apt-get install drbd8-utils=2:8.3.11-0ubuntu1

Also removing startup scripts:
# update-rc.d -f drbd remove

Stefan Bader (smb) wrote :

@Rocco, the problem is that there did not seem to be any sane way to do a separate package and dependencies. You cannot depend on it in the lts kernel package because drbd is not a required package. I don't know of any way in Debian packaging to declare complex relationships like "if package a gets installed and package b is installed then replace b with c". And even if there were, installing the lts-kernel package does not automatically remove the non-hwe kernels (for people that installed with the original release or .1 update images). So since the kernels can be installed in parallel, the two drbd packages would need to be there in parallel, too.

So backporting the newer drbd userspace and enabling the compat binaries looked to be the best way. And at least testing with a resource defined in drbd, I could install the new package and get the drbd disk up with old-kernel/old package, old kernel/new package and newer kernel/newer package. And only because of that we went ahead with the backport.

So right now it seems like people who "only" use a drbd resource are OK, but clusters using drbd (pacemaker only, or others?) are unfortunately broken. Hopefully we can figure out what goes wrong quickly. Unfortunately, setting up a test cluster is not the most trivial thing to do when starting from zero. I have started, but right now I have the general config OK and no drbd resources defined.

So far, the only way with a 3.2 kernel and the new package where I can get the drbdadm syncer command to fail as reported is to remove the drbd module. Then there is no /proc/drbd and drbdadm --version reports the 8.4.x version. And that does not cause the drbdadm-83 binary to be chain-executed. So syncer is not a valid command.
The drbd package ships with a drbd.ocf file (from reading guides on the web it feels like that could be related to use drbd from pacemaker). That script also does call drbdadm --version to decide whether to use the syncer command or not.
But maybe there are places in pacemaker code that run drbdadm commands directly. Still, if those execute the drbdadm command that comes with the new drbd package that should automatically execute the compat binary as long as the old kernel is running and the drbd module is loaded.
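
A rough sketch of the kind of check described above: parse the `KEY=value` lines printed by `drbdadm --version` and decide whether the 8.3-only `syncer` command applies. The 8.4.0 cut-off and the helper names are assumptions made for illustration; the real logic lives in the shipped drbd.ocf shell script and may differ.

```python
def parse_version_output(text: str) -> dict:
    """Parse KEY=value lines as printed by `drbdadm --version`."""
    fields = {}
    for line in text.splitlines():
        # Split on the first '=' only; values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def uses_syncer(version_output: str) -> bool:
    """True when the reported tools version is older than 8.4.0 (assumed cut-off)."""
    fields = parse_version_output(version_output)
    code = int(fields["DRBDADM_VERSION_CODE"], 16)
    return code < 0x080400


if __name__ == "__main__":
    # Field values copied from outputs quoted earlier in this report.
    legacy = "DRBDADM_API_VERSION=88\nDRBDADM_VERSION_CODE=0x08030b\nDRBDADM_VERSION=8.3.11\n"
    current = "DRBDADM_API_VERSION=1\nDRBDADM_VERSION_CODE=0x080403\nDRBDADM_VERSION=8.4.3\n"
    print(uses_syncer(legacy), uses_syncer(current))
```

A check of this shape gives the wrong answer whenever `drbdadm --version` reports 8.4.x while an 8.3 module is expected, which is the situation described above when the module is not loaded.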

Stefan Bader (smb) wrote :

@Andreas, that particular error is fixed by the test packages (see comment #60). There could be one in conjunction with pacemaker which reports an error about "drbdadm syncer" not being a valid command. Unfortunately I don't think anyone who is running a pacemaker cluster really can afford to help debugging as that causes disruptions.

Christoph Mitasch (cmitasch) wrote :

Hello,

the Linux Cluster Management Console (LCMC) makes it quite easy to set up a DRBD/Pacemaker cluster.
http://lcmc.sourceforge.net/

Christoph


Revision history for this message
Rocco (rocco) wrote :

My clusters are configured to use Linbit:drbd which is:
/usr/lib/ocf/resource.d/linbit/drbd

and that file is in drbd8-utils

Revision history for this message
Brian Candler (b-candler) wrote :

I'm not sure if anyone has mentioned it yet, but the syntax of the "drbdsetup" command changed completely between drbd8-utils 8.3 and 8.4.

e.g. if you are using drbd8-utils 8.3 then the command is
drbdsetup /dev/drbd0 show

if you are using drbd8-utils 8.4 then the command is
drbdsetup show /dev/drbd0

Therefore if you upgrade your kernel to the raring enablement one, your pacemaker (etc) scripts will break too.
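
A script that has to cope with both tool versions could choose the argument order from the userland version. The following is only a minimal sketch based on the two "show" examples above: `drbdsetup_show_cmd` is a hypothetical helper, it treats anything that is not 8.3.x as using the 8.4 order (an assumption), and it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the "show" command line that matches a given
# drbd8-utils version string (e.g. "8.3.11" or "8.4.3").
drbdsetup_show_cmd() {
    version=$1
    device=$2
    case "$version" in
        8.3.*) echo "drbdsetup $device show" ;;   # old order: device, then command
        *)     echo "drbdsetup show $device" ;;   # assumed 8.4 order: command, then device
    esac
}

drbdsetup_show_cmd 8.3.11 /dev/drbd0
drbdsetup_show_cmd 8.4.3 /dev/drbd0
```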

Revision history for this message
Brian Candler (b-candler) wrote :

Hmm, just to clarify: when using the drbd8-utils 8.4 package and the kernel module 8.3, then drbdsetup execs the drbdsetup-83 compatibility module which implements the old syntax, and will also if necessary swap the first two arguments for "forward compatibility"

However there are some commands that drbdsetup from drbd8-utils 8.3.10 accepts but drbdsetup-83 doesn't. For example, ganeti issues the following command:

drbdsetup /dev/drbd2 disk /dev/xenvg/a614fd3b-d016-44a1-a6d6-c51334190757.disk0_data /dev/xenvg/a614fd3b-d016-44a1-a6d6-c51334190757.disk0_meta 0 -e detach --create-device -d 1024m

This works with drbd8-utils 8.3.10, but gives
USAGE: drbdsetup command device arguments options
with 8.4

Revision history for this message
Andreas Braun (camouflagex) wrote :

I tried the fixed package in comment 60. Unfortunately, the fix did not completely resolve the problems I had with my Corosync/Pacemaker cluster.

I also noticed that I had problems with the parameters in the new global_common.conf that is installed with drbd-utils 8.4.3:
# service drbd start
 * Starting DRBD resources
drbd.d/global_common.conf:47: Parse error: 'an option keyword' expected,
 but got 'resync-rate'

So then I restored the old global_common.conf again. Now I could start DRBD services manually, but the cluster still did not work:
Apr 25 08:37:32 www6a lrmd: [1183]: info: RA output: (p_drbd_r0:0:start:stderr) drbdadm: Unknown command 'syncer'

I can't really say what the problem is, but it seems the service scripts use drbd-utils version 8.3 and the cluster uses drbd-utils 8.4 or something like that. Any ideas?

Revision history for this message
Stefan Bader (smb) wrote :

Well yes, there are already plenty of hints in this thread. The problem is that while there is a config option to enable those compat binaries, they do not seem to be very well tested. I am working on it but it's a bit of a mess. I think I have the RA script in a state now where it seems to work in the normal case, but trying to use migrate from LCMC ended in a split-brain state.
Also LCMC seems to decide whether it allows setting up a drbd mirror based on some magic I have not yet figured out. So that gets deactivated as soon as one side installs the new package.

Revision history for this message
Stefan Bader (smb) wrote :

Updated http://people.canonical.com/~smb/lp1185756/ with packages that seem to fix normal corosync startup at least. Still not perfect. Migrate through the LCMC gui might still be a problem. I need to look into that next. Also if someone knows off the top of their head what the LCMC gui does to decide whether one can create a new drbd mirror on top of an existing volume, please let me know. That gets disabled when one side runs the new package and I can't figure out why.

Revision history for this message
Stefan Bader (smb) wrote :

Finally figured out the problem with LCMC, and the fix may also resolve other drbdsetup oddities. The problem is that, unlike drbdadm, drbdsetup did not use the kernel drbd module version to decide whether it should run the legacy binary. Instead it waited for some socket connection to fail. But that happened relatively late, so at least the xml help would only be processed by the main command. And of course the arguments for that have been changed by upstream as well (help-xml instead of xml).

I added some code to additionally check /proc/drbd and, in case of a module version < 8.4.0, run the legacy drbdsetup immediately. So far this has allowed me to use LCMC to define a new drbd resource, and selecting migrate for the resource displays it as migrated instead of "exploding".
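
The check described above lives inside drbdsetup itself; the following is only a shell sketch of the same idea, not the actual patch. `drbd_module_is_legacy` is a hypothetical name, and the file argument exists so the function can be tried without a loaded module.

```shell
#!/bin/sh
# Sketch only. Exit status 0: module version is below 8.4.0 (legacy tools
# needed). Exit status 1: 8.4.0 or newer. Exit status 2: version unknown,
# for example because the module is not loaded.
drbd_module_is_legacy() {
    proc_file=${1:-/proc/drbd}
    [ -r "$proc_file" ] || return 2
    # The first line looks like: "version: 8.3.13 (api:88/proto:86-96)"
    version=$(sed -n 's/^version: \([0-9][0-9.]*\).*/\1/p' "$proc_file" | head -n 1)
    [ -n "$version" ] || return 2
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    [ "$major" -lt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -lt 4 ]; }
}

if drbd_module_is_legacy; then
    echo "module is older than 8.4.0: the legacy drbdsetup would be used"
fi
```

Note that the "unknown" case has to be handled by the caller; treating it as "8.4" is exactly the kind of assumption that can go wrong when the module is loaded later.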

This should also be tested after installing a later kernel. But if the version as it is now unbreaks at least the users of 3.2 kernels, it's probably best to go with that and make fixes for the lts-kernel case later on top. Better than breaking current installs.

Revision history for this message
Stefan Bader (smb) wrote :

*sigh* Final update, at least this week. With rc4 my code actually does the right thing. Updated the kernel on one side to a 3.11 one and then the other side to the new package (keeping the 3.2 kernel there). From what I can tell, the cluster seems happy with that.

Revision history for this message
Simon Déziel (sdeziel) wrote :

Stefan, your rc4 update seems to have nailed it! My cluster running the 3.2 kernel is now happy. The new fix you used also pleases Ganeti. Thanks!

Revision history for this message
Jan Kellermann (jan-kellermann) wrote :

Stefan, thank you for the fixes! Will your version find its way to the update-repo for 12.04lts?

Revision history for this message
Stefan Bader (smb) wrote :

I hope so, if I can get someone to sponsor the upload. Of course every bit of positive feedback about the last version helps my argument. :)

Revision history for this message
Thilo Uttendorfer (t-lo) wrote :

I still have problems after upgrading to drbd8-utils 2:8.4.3-0ubuntu0.12.04.2~rc4.

Especially the pacemaker drbd agent does not work (/usr/lib/ocf/resource.d/linbit/drbd). For example, there is a line "do_drbdadm $DRBD_TO_PEER -v adjust $DRBD_RESOURCE" which translates to "drbdadm -c /etc/drbd.conf --peer host01.example.com -v adjust res1". This results in an error:

--------------------
drbdsetup new-resource res1
USAGE: drbdsetup device command arguments options

Device is usually /dev/drbdX or /dev/drbd/X.
General options: --create-device, --set-defaults

Commands are:
[lots of commands...]

To get more details about a command issue 'drbdsetup help cmd'.

invalid command
Command 'drbdsetup new-resource res1' terminated with exit code 20
--------------------

$ uname -a
Linux coreboso-vsb02.coreboso.de 3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

drbd version 8.3.11

I really think that 2:8.4.3-0ubuntu0.12.04.1 should be removed from updates until this issue is resolved. It breaks working pacemaker clusters.

Many thanks for all your work and efforts!

Revision history for this message
Andreas Braun (camouflagex) wrote :

I can confirm that drbd8-utils 2:8.4.3-0ubuntu0.12.04.2~rc4 works fine, at least for me.

I put one node to standby, installed drbd8-utils 2:8.4.3-0ubuntu0.12.04.2~rc4, did not change drbd.conf, brought node up again and migrated all resources to that node. Everything is working as expected:

 Master/Slave Set: ms_drbd_r0 [p_drbd_r0]
     Masters: [ www6b ]
     Slaves: [ www6a ]

# uname -a
Linux www6b 3.5.0-48-generic #72~precise1-Ubuntu SMP Tue Mar 11 20:09:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/drbd
version: 8.3.13 (api:88/proto:86-96)
srcversion: C0F510A918B92928FB51EE3
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:72 nr:788 dw:860 dr:7561 al:0 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Revision history for this message
Stefan Bader (smb) wrote :

@Thilo,

this is odd. Locally I see the same adjust command but it gets translated into a drbdadm-83 call, which works. Would it be possible for you to create '/tmp/drbd.ocf.ra.debug', set it to chmod 700 for root and touch /tmp/drbd.ocf.ra.debug/log as root? Then "service corosync restart" will restart the cluster and execution of the resource agent gets logged into /tmp. That file would help to figure out what is going wrong.
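
The preparation steps above can be sketched as a small function. `enable_ra_debug` is a hypothetical name; the default path is the one named above, and the optional argument is only there so the steps can be tried in a scratch location.

```shell
#!/bin/sh
# Sketch of the steps above: create the debug directory, restrict it to
# its owner, and create the empty log file inside it.
enable_ra_debug() {
    debug_dir=${1:-/tmp/drbd.ocf.ra.debug}
    mkdir -p "$debug_dir" &&
    chmod 700 "$debug_dir" &&
    touch "$debug_dir/log"
}
```

Run `enable_ra_debug` as root so the directory is owned by root, then restart the cluster with "service corosync restart" as described.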

Revision history for this message
Stefan Bader (smb) wrote :

Thilo, I looked at the debug data you sent me and somehow it seems that using heartbeat runs a different resource agent. So drbd itself ships a resource agent in /usr/lib/ocf/resource.d/linbit/drbd but heartbeat uses another one (.../resource.d/heartbeat/drbd) that comes with a different package (resource-agents). I could not yet compare the scripts, but glancing at the output they look pretty similar. If you create a backup of the drbd file in the heartbeat directory and then copy the drbd file from linbit over to the heartbeat directory... would that work?

Revision history for this message
Stefan Bader (smb) wrote :

Must admit the two scripts as they are installed on my test system differ more. Weirdly in your output there was a version check of

++ DRBDADM_VERSION_CODE=0x08030b
++ DRBDADM_VERSION=8.3.11
+ (( 0x08030b >= 0x080302 ))

and that check I cannot find in the heartbeat/drbd script. And the linbit/drbd script in rc4 would compare against 0x080400. So I am not sure what script is actually running.
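
For reference, the hex codes in the trace encode the version one byte per component, so 8.3.11 becomes 0x08030b. A minimal sketch of that encoding and of a comparison against 0x080400 follows. `version_code` and `pick_drbdadm` are hypothetical helpers, and which version variable the real agent compares is not shown in this thread.

```shell
#!/bin/sh
# Convert "major.minor.patch" into the hex form seen in the trace above.
version_code() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the version string on dots
    IFS=$old_ifs
    printf '0x%02x%02x%02x\n' "$1" "$2" "$3"
}

# Hypothetical helper: choose the binary name for a given version,
# using the -83 compat variant below 8.4.0.
pick_drbdadm() {
    if [ "$(( $(version_code "$1") ))" -lt "$(( 0x080400 ))" ]; then
        echo drbdadm-83
    else
        echo drbdadm
    fi
}

version_code 8.3.11
pick_drbdadm 8.3.11
pick_drbdadm 8.4.3
```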

Revision history for this message
Martin Gerhard Loschwitz (martin-loschwitz) wrote :

That really depends on the Pacemaker configuration. If the primitive DRBD resource uses ocf::heartbeat::drbd, it's the resource agent from resource-agents, /usr/lib/ocf/resource.d/heartbeat/drbd, which is ancient and shouldn't be used.

If the primitive, however, uses ocf::linbit::drbd, it's the one in /usr/lib/ocf/resource.d/linbit/drbd, which should be fine.
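
To see which of the two agents a cluster actually references, the configuration text can be searched for the provider name. This is only a sketch: `list_drbd_agents` is a hypothetical helper that reads configuration text on stdin, and feeding it from `crm configure show` assumes the crm shell is installed.

```shell
#!/bin/sh
# Print each distinct DRBD resource agent reference found on stdin.
# Accepts both the "ocf:linbit:drbd" and "ocf::linbit::drbd" spellings.
list_drbd_agents() {
    grep -Eo 'ocf::?(linbit|heartbeat)::?drbd' | sort -u
}

# Example (not run here):
#   crm configure show | list_drbd_agents
```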

Revision history for this message
Thilo Uttendorfer (t-lo) wrote :

Thanks, Martin, for the clarification. The ocf::linbit::drbd agent is (and was) configured, so that's not the source of the problem.

Revision history for this message
Stefan Bader (smb) wrote :

Right, thanks Martin. So Thilo, the mysterious question is what is actually executed. If you manually unpack the rc4 deb:

dpkg -x drbd8-utils_8.4.3-0ubuntu0.12.04.2~rc4_amd64.deb unpack

and compare unpack/usr/lib/ocf/resource.d/linbit/drbd with what is in /usr/lib/... they should be the same. And if you look at the script you should find the version check (look for 080400) and if the version is lower the binary names are replaced by -83 compat variants...
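
The comparison suggested above could be scripted roughly like this. `check_agent` is a hypothetical helper; it only reports whether the two files are byte-identical and whether the string 080400 appears in the installed copy.

```shell
#!/bin/sh
# Sketch: compare the unpacked agent with the installed one and look for
# the 080400 version check in the installed file.
check_agent() {
    unpacked=$1
    installed=$2
    if cmp -s "$unpacked" "$installed"; then
        echo "identical"
    else
        echo "different"
    fi
    if grep -q 080400 "$installed"; then
        echo "version check present"
    else
        echo "version check missing"
    fi
}

# Example (not run here), after the dpkg -x step above:
#   check_agent unpack/usr/lib/ocf/resource.d/linbit/drbd \
#               /usr/lib/ocf/resource.d/linbit/drbd
```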

Revision history for this message
Scott Moser (smoser) wrote :

regression is tracked under bug 1314289

description: updated
Revision history for this message
Thilo Uttendorfer (t-lo) wrote :

FTR, my reported bug was triggered by the environment variable DRBD_DONT_WARN_ON_VERSION_MISMATCH, which was set on my systems.

To work around this bug:
unset DRBD_DONT_WARN_ON_VERSION_MISMATCH
restart pacemaker
(do this before installing drbd8-utils 8.4.3-0ubuntu0.12.04.x)
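
A quick way to check whether the variable is set before upgrading is sketched below. `check_version_mismatch_var` is a hypothetical helper, and it only inspects the shell it runs in; the environment that pacemaker itself was started with may differ.

```shell
#!/bin/sh
# Sketch: report whether DRBD_DONT_WARN_ON_VERSION_MISMATCH is set
# (even to an empty value) in the current shell.
check_version_mismatch_var() {
    if [ -n "${DRBD_DONT_WARN_ON_VERSION_MISMATCH+set}" ]; then
        echo "DRBD_DONT_WARN_ON_VERSION_MISMATCH is set; unset it before upgrading"
        return 1
    fi
    echo "DRBD_DONT_WARN_ON_VERSION_MISMATCH is not set"
}
```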

See also: https://bugs.launchpad.net/ubuntu/+source/drbd8/+bug/1314598

Revision history for this message
Frantisek Sklenar (frantisek-sklenar) wrote :

I was able to fix this by downgrading drbd8-utils as explained in #71, Pacemaker/heartbeat.
CRM is again UP!

# uname -a
Linux flexfive 3.5.0-48-generic #72~precise1-Ubuntu SMP Tue Mar 11 20:09:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# drbdadm --version
DRBD module version: 8.3.13
   userland version: 8.3.11
you should upgrade your drbd tools!
DRBDADM_BUILDTAG=GIT-hash:\ 0de839cee13a4160eed6037c4bddd066645e23c5\ build\ by\ buildd@allspice\,\ 2011-07-05\ 19:51:07
DRBDADM_API_VERSION=88
DRBD_KERNEL_VERSION_CODE=0x08030d
DRBDADM_VERSION_CODE=0x08030b
DRBDADM_VERSION=8.3.11

Revision history for this message
sm8ps (sm8ps) wrote :

I can confirm that solution in #97 (i.e. #71) worked for me as well (also Pacemaker/Heartbeat).

# uname -a
Linux server 3.5.0-49-generic #74~precise1-Ubuntu SMP Fri May 2 21:32:31 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# drbdadm --version : identical

# echo drbd8-utils "hold" | dpkg --set-selections
To make sure the package won't get upgraded without my noticing.

Revision history for this message
KAMI (kami911) wrote :

Hi,

I had the same problem during a regular update. The package from http://people.canonical.com/~smb/lp1185756/ works for me with an individual DRBD config and in a Pacemaker/Heartbeat environment. Thank you!

Revision history for this message
Jacob van der Meulen (jacob-vd-meulen) wrote :

I had the same problem and found out that the drbd agent behaves differently depending on whether you just stop and start the drbd resource or reboot the system. When you stop drbd, the drbd module stays loaded, so when you start drbd again, the validity check of the agent correctly detects an 8.3 module. However, when you reboot, no module is loaded yet, so the validity check thinks we have an 8.4 module. Then the agent loads the module but does not run the validity check again. So it uses the 8.4 tools to create the drbd devices. This fails! Adding a call to the validity check function just after the module is loaded fixes this problem. So no need to revert.

I just found this by trial and error so it could have side effects. Maybe Stefan can confirm that this is really fixing something.

Revision history for this message
Stefan Bader (smb) wrote :

@Jacob, sorry for the long delay. I tried to confirm your report but could not see any problem (drbd8 version 2:8.4.3-0ubuntu0.12.04.2). Not sure whether you refer to "eval $(drbdadm --version ...)" in /etc/init.d/drbd. That could indeed be wrong on start, though the values are actually only evaluated for "stop". It should probably be moved there to make that more obvious. If you still see the issue (there were a lot of compat mode issues fixed between 12.04.1 and .2) it would be best to open a new bug report to find out what the cause is.

Revision history for this message
Jacob van der Meulen (jacob-vd-meulen) wrote :

@Stefan, thanks for the reply. It seems my information was not that clear. Here are some more details to make clearer what my problem was.

The problem is with the pacemaker drbd agent:
/usr/lib/ocf/resource.d/linbit/drbd

Here is a snippet of this agent:
-----------------
drbd_start() {
        local rc
        local status
        local first_try=true

        rc=$OCF_ERR_GENERIC

        if ! is_drbd_enabled; then
                do_cmd modprobe -s drbd `$DRBDADM sh-mod-parms` || {
                        ocf_log err "Cannot load the drbd module.";
                        return $OCF_ERR_INSTALLED
                }
                drbd_validate_all
                ocf_log debug "$DRBD_RESOURCE start: Module loaded."
        fi
------------------

The line with "drbd_validate_all" is the one I added. Without this line, drbd will not work after reboot.
I hope this clarifies things a bit more.
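
The ordering problem can be illustrated with a small simulation. Nothing here touches a real module or the real agent: a temporary file stands in for /proc/drbd, writing to it stands in for "modprobe drbd", and `select_tool` is a hypothetical stand-in for the agent's validity check that assumes 8.4 when no module is visible.

```shell
#!/bin/sh
# Simulation only: choose a tool name from a stand-in for /proc/drbd.
select_tool() {
    proc_file=$1
    if [ -r "$proc_file" ] && grep -q '^version: 8\.3\.' "$proc_file"; then
        echo drbdadm-83      # 8.3 module detected
    else
        echo drbdadm         # no module visible: assumes 8.4
    fi
}

sim_dir=$(mktemp -d)
proc_file=$sim_dir/drbd

tool=$(select_tool "$proc_file")          # validity check after a reboot
echo "before module load: $tool"

# Stand-in for loading an 8.3 module.
echo 'version: 8.3.13 (api:88/proto:86-96)' > "$proc_file"
echo "without re-check:   $tool"          # stale choice from before the load

tool=$(select_tool "$proc_file")          # the added re-validation
echo "with re-check:      $tool"

rm -rf "$sim_dir"
```

The stale value in the middle step corresponds to the failure after reboot; repeating the check after the module load corresponds to the added drbd_validate_all call.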

Revision history for this message
Stefan Bader (smb) wrote :

Ah yes, that helps a lot. So somehow (not sure this is because I am using the Linux Cluster Management Console for setting up my test environment or because timing is just in my favour) I seem to avoid this because the drbd init script already loads the module. So by the time the pacemaker agent runs it does not have to. But I can see how it might happen. In order to get this into a Stable Release Update this should go into a new bug report, though. I believe "ubuntu-bug drbd8" should be the simplest way to do this. The description should point out why the module does not get loaded before in your case (if you know). Subscribe me to the new report, then I can spot it more easily.
