upgrade process does not upgrade underlying BDB format from 4.7 to 4.8 (so slapd aborts with "Program version 4.8 doesn't match environment version 4.7" error message)

Bug #658227 reported by Andrew Schulman on 2010-10-11
Affects             Importance  Assigned to
openldap (Ubuntu)   High        Unassigned
Lucid               High        James Page
Maverick            High        James Page
Natty               High        Unassigned

Bug Description

IMPACT: Currently all configured openldap installations being upgraded from Lucid -> Maverick will fail due to an underlying database version incompatibility.

FIX: The slapd install/upgrade scripts already have functionality to dump existing ldap databases and reload them post upgrade. These have been amended to trigger when upgrading to 2.4.23+, as this is the point where the BDB upgrade was introduced. The fix has been tested using a PPA and by modifying a lucid install to point at maverick repositories (do-release-upgrade disables PPAs, so it could not be tested that way). Data loaded into ldap was verified pre and post upgrade.

PATCH: See proposed linked branch

TEST CASE: Steps to reproduce on a new Lucid install:
 sudo apt-get install slapd ldap-utils
Follow the Lucid Server guide through the section entitled populating LDAP to the point where data is loaded into ldap for the backend configuration and frontend data.
Perform a Lucid->Maverick upgrade:
 sudo do-release-upgrade -d
slapd fails to start with the following error message:
Oct 11 16:19:33 lucid-clone-01 slapd[773]: bdb(dc=example,dc=com): Program version 4.8 doesn't match environment version 4.7

REGRESSION POTENTIAL:
Limited; impacts upgrades to the latest version of slapd during the lucid->maverick upgrade, and changes are in installation scripts only.

ORIGINAL BUG REPORT:

I just upgraded from Lucid to Maverick, and now slapd won't start. From syslog:

Oct 11 06:10:31 helium slapd[12130]: @(#) $OpenLDAP: slapd 2.4.23 (Aug 7 2010 01:39:36) $#012#011buildd@yellow:/build/buildd/openldap-2.4.23/debian/build/servers/slapd
Oct 11 06:10:32 helium slapd[12131]: bdb(dc=5200-glenwood,dc=net): Program version 4.8 doesn't match environment version 4.7
Oct 11 06:10:32 helium slapd[12131]: hdb_db_open: database "dc=5200-glenwood,dc=net" cannot be opened, err -30971. Restore from backup!
Oct 11 06:10:32 helium slapd[12131]: bdb(dc=5200-glenwood,dc=net): txn_checkpoint interface requires an environment configured for the transaction subsystem
Oct 11 06:10:32 helium slapd[12131]: bdb_db_close: database "dc=5200-glenwood,dc=net": txn_checkpoint failed: Invalid argument (22).
Oct 11 06:10:32 helium slapd[12131]: backend_startup_one (type=hdb, suffix="dc=5200-glenwood,dc=net"): bi_db_open failed! (-30971)
Oct 11 06:10:32 helium slapd[12131]: bdb_db_close: database "dc=5200-glenwood,dc=net": alock_close failed
Oct 11 06:10:32 helium slapd[12131]: slapd stopped.

Information on the web says that I need to run some kind of manual database upgrade to BDB 4.8, but I have no familiarity with the BDB tools, and the recipes that are suggested are varied and uncertain.

Please suggest a workaround or manual upgrade, as some of my network services are at a dead stop due to this problem.

Thanks,
Andrew.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: slapd 2.4.23-0ubuntu3
ProcVersionSignature: Ubuntu 2.6.35-22.33-generic 2.6.35.4
Uname: Linux 2.6.35-22-generic x86_64
NonfreeKernelModules: nvidia
Architecture: amd64
Date: Mon Oct 11 06:07:49 2010
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: openldap

WORKAROUND:
Install db4.7-util and recover slapd databases:
apt-get install db4.7-util
cd /var/lib
cp -a ldap ldap.bak
cd ldap
db4.7_checkpoint -1
db4.7_recover

slapd will then start correctly.

Andrew Schulman (andrex) wrote :

This appears to be the same as http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=595672. A fix for that appears to have been released, but again I don't understand the cause or solution, or what I should do to recover at this point.

Andrew Schulman (andrex) wrote :

OK, per https://bbs.archlinux.org/viewtopic.php?id=84077 and then http://dbaspot.com/forums/berkeley-db/265933-how-upgrade-4-2-4-3-a.html, I installed db4.7-util and ran

cd /var/lib
cp -a ldap ldap.bak
cd ldap
db4.7_checkpoint -1
db4.7_recover

After that, slapd started normally.

Can you look through the /var/log/dist-upgrade/apt-term.log and post the lines that come from the upgrade of the slapd package?

(I don't know offhand if any of the discussion there applies in the Lucid-to-Maverick upgrade case, but in case it's helpful I'll point you to LP #536958, which covers the similar bug I ran into when upgrading from Hardy to Lucid.)

Ubuntu devs,

I took a quick look at the slapd.postinst/slapd.scripts-common files in the lp:ubuntu/maverick/openldap branch, and also at the Bazaar change summary for revision 26 (which is the one that includes the note "Use libdb4.8-dev (LP: #572489)"), but I don't see any edits to the postinst script that would cause the "database_format_changed()" function to get triggered when upgrading across that db4.7-to-db4.8 switch.

Is this step no longer expected to be needed when doing the BDB upgrade, or does the postinst need to be updated to recognize this situation?

James Page (james-page) wrote :

Confirmed; steps to reproduce on a new Lucid install:

 sudo apt-get install slapd ldap-utils

Follow the Lucid Server guide through the section entitled populating LDAP to the point where data is loaded into ldap for the backend configuration.

Perform a Lucid->Maverick update:

 sudo do-release-upgrade -d

slapd fails to start with the following error message

Oct 11 16:19:33 lucid-clone-01 slapd[773]: bdb(dc=example,dc=com): Program version 4.8 doesn't match environment version 4.7
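
For anyone checking their own logs, the two BDB versions can be pulled out of a syslog line like the one above with sed. This is only a rough sketch: the function name bdb_mismatch_versions is made up for illustration, and the pattern assumes the message wording is exactly as it appears in this report.

```shell
# Hypothetical helper (name invented for this sketch): print
# "<program version> <environment version>" if the line carries the
# BDB mismatch message, and nothing otherwise.
bdb_mismatch_versions() {
    printf '%s\n' "$1" |
        sed -n "s/.*Program version \([0-9.]*\) doesn't match environment version \([0-9.]*\).*/\1 \2/p"
}

# Sample line copied from this report.
line="Oct 11 16:19:33 lucid-clone-01 slapd[773]: bdb(dc=example,dc=com): Program version 4.8 doesn't match environment version 4.7"
bdb_mismatch_versions "$line"   # prints: 4.8 4.7
```

The first number is the libdb version slapd is linked against; the second is the version that created the files in the database directory.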

Changed in openldap (Ubuntu):
status: New → Confirmed
James Page (james-page) on 2010-10-11
description: updated
James Page (james-page) wrote :

Note that the pre- and post-installation scripts are supposed to upgrade the database as follows: the data is dumped from the pre-upgrade database using slapcat, the existing database files are wiped out, and the post-installation step then reloads the data that was dumped to disk.

However, as @Nathan points out, this is not happening.
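
As a rough illustration of that export/import cycle, the sketch below prints the commands such a cycle would involve. It is not the code from the slapd maintainer scripts: the function names, the LDIF path, and the choice to move the old directory aside instead of deleting it are all assumptions made for this example. By default it only echoes the commands (DRY_RUN=1).

```shell
# Sketch only; NOT the logic from the slapd package scripts.
# DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Export step: has to use the slapcat from the OLD package, i.e. it must
# run before the new (libdb4.8) binaries are unpacked.
dump_database() {
    suffix="$1"; ldif="$2"
    run slapcat -b "$suffix" -l "$ldif"
}

# Import step: run with the NEW binaries so the files are created in the
# new BDB format. The old directory is moved aside rather than deleted
# (an assumption of this sketch); a DB_CONFIG file, if used, would need
# to be copied back by hand.
reload_database() {
    suffix="$1"; ldif="$2"; dbdir="$3"
    run mv "$dbdir" "$dbdir.old"
    run mkdir -p "$dbdir"
    run slapadd -b "$suffix" -l "$ldif"
    run chown -R openldap:openldap "$dbdir"
}

dump_database "dc=example,dc=com" /var/backups/example.ldif
reload_database "dc=example,dc=com" /var/backups/example.ldif /var/lib/ldap
```

The ordering is the important part: the dump happens with the old slapcat and the reload with the new slapadd, which is why the cycle cannot be started once the new package is already installed.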

Andrew Schulman (andrex) wrote :

Here are excerpts from apt/term.log. Note that this is in a different apt log because my dist-upgrade was a little unusual - interrupted, then completed in a second session. I'm attaching the complete apt log file in case it's of interest.

Log started: 2010-10-11 03:59:04
<snip>
Restarting services possibly affected by the upgrade:
  slapd: restarting...done.
<snip>
Preparing to replace slapd 2.4.21-0ubuntu5.3 (using .../slapd_2.4.23-0ubuntu3_amd64.deb) ...
Stopping OpenLDAP: slapd.
Unpacking replacement slapd ...
<snip>
Setting up slapd (2.4.23-0ubuntu3) ...
Installing new version of config file /etc/ldap/schema/README ...
Installing new version of config file /etc/ldap/schema/cosine.ldif ...
Installing new version of config file /etc/ldap/schema/inetorgperson.ldif ...
Installing new version of config file /etc/ldap/schema/nis.ldif ...
Installing new version of config file /etc/ldap/schema/openldap.ldif ...
Installing new version of config file /etc/ldap/schema/duaconf.schema ...
Installing new version of config file /etc/ldap/schema/dyngroup.schema ...
Installing new version of config file /etc/ldap/schema/inetorgperson.schema ...
Installing new version of config file /etc/ldap/schema/misc.schema ...
Installing new version of config file /etc/ldap/schema/nis.schema ...
Installing new version of config file /etc/ldap/schema/openldap.schema ...
Installing new version of config file /etc/ldap/schema/pmi.schema ...
Installing new version of config file /etc/apparmor.d/usr.sbin.slapd ...
  Backing up /etc/ldap/slapd.d/ in /var/backups/slapd-2.4.21-0ubuntu5.3... done.
Starting OpenLDAP: slapd.

James Page (james-page) on 2010-10-11
Changed in openldap (Ubuntu):
importance: Undecided → Low

Andrew,
As we expected, this shows that the slapd scripts made no attempt to do an export/import cycle on your database. (When that happened during my Hardy->Lucid upgrade, I had a "Dumping..." line, like this:

  Preparing to replace slapd 2.4.9-0ubuntu0.8.04.3 (using .../slapd_2.4.21-0ubuntu3_amd64.deb) ...
  Stopping OpenLDAP: slapd.
    Dumping to /var/backups/slapd-2.4.9-0ubuntu0.8.04.3:
    - directory dc=example,dc=com... done.
  Unpacking replacement slapd ...

And the postinst script should generate the corresponding "Loading from ...." lines, as well.)

However, from the log you posted, it looks like slapd was restarted successfully during the upgrade process. Is that true? In other words, did the upgrade process complete successfully, and the slapd daemon only begin failing when you tried to restart it sometime later? (When I ran into this problem for the Hardy->Lucid upgrade, the error caused the "Starting OpenLDAP" command to fail, which in turn caused dpkg to abort with a "subprocess installed post-installation script returned error exit status 1" error message.)

I just remembered that the postinst failure I mentioned in my previous post wasn't triggered by the restart of the slapd daemon, but rather by another step that the postinst script was attempting to do at that time.

So, in your case, did the apt upgrade/configure cycle appear to complete normally, and it wasn't until afterwards you discovered that the slapd daemon wasn't actually running?

(Similarly, James, did your "do-release-upgrade -d" process appear to finish normally, or did you get an error during the slapd package install that caused the release-upgrade to abort?)

James Page (james-page) wrote :

do-release-upgrade -d completed successfully with no errors; this message from the log above is a bit misleading:

 Starting OpenLDAP: slapd.

slapd starts and then fails when it tries to attach to the database (with the associated 4.7/4.8 error message).

James Page (james-page) wrote :

Thanks for reporting this bug and any supporting documentation. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as confirmed and let them handle it from here. Thanks for taking the time to make Ubuntu better!

Changed in openldap (Ubuntu):
status: Confirmed → Triaged
James Page (james-page) on 2010-10-12
description: updated
James Page (james-page) on 2010-10-12
summary: - won't start after Maverick upgrade; bdb "Program version 4.8 doesn't
- match environment version 4.7"
+ upgrade process fails to upgrade underlying BDB format from 4.7 to 4.8
James Page (james-page) on 2010-10-12
summary: - upgrade process fails to upgrade underlying BDB format from 4.7 to 4.8
+ upgrade process does not upgrade underlying BDB format from 4.7 to 4.8

Given that this seems to affect any system upgrading slapd from Lucid to Maverick, I wonder if it's worth trying to get it added to the Maverick release notes?

description: updated
summary: upgrade process does not upgrade underlying BDB format from 4.7 to 4.8
+ (so slapd aborts with "Program version 4.8 doesn't match environment
+ version 4.7" error message)
Thierry Carrez (ttx) on 2010-10-12
Changed in openldap (Ubuntu):
importance: Low → High
James Page (james-page) on 2010-10-13
Changed in openldap (Ubuntu):
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
James Page (james-page) on 2010-10-14
description: updated
description: updated
description: updated

I think this proposed patch will not do anything to help people who've already upgraded from Lucid to Maverick, but who have just left the package half-configured because of this problem (since it won't trigger the upgrade if the $OLD_VERSION is "2.4.23-0ubuntu3", for example).

I guess that's unavoidable, since on a system in that situation the slapd.preinst wouldn't be able to run the database export, anyway (since the "slapcat" binary would also give the "Program version 4.8 doesn't match" error on such a system).

Still, I wonder if it makes sense to have the changelog say something like "If you previously upgraded from a Lucid version of slapd to 2.4.23-0ubuntuX and haven't already manually upgraded the slapd database version, you will need to follow the steps described in that bug." or whatever?

Steve Langasek (vorlon) wrote :

That's not unavoidable; just bump the minimum version check to the maverick release version instead of the lucid version. New installations of maverick will get an excess database dump/restore, but the upgrade will be clean for everyone.

(This is what was done in the Debian package, fwiw.)

Changed in openldap (Ubuntu Lucid):
status: New → In Progress
importance: Undecided → High
assignee: nobody → James Page (james-page)
Changed in openldap (Ubuntu Natty):
status: In Progress → Triaged
assignee: James Page (james-page) → nobody
Steve Langasek (vorlon) wrote :

For natty, this should be addressed by merging the latest version of the Debian package.

On Thu, Oct 14, 2010 at 16:31:20 -0000, Steve Langasek wrote:
> That's not unavoidable; just bump the minimum version check to the
> maverick release version instead of the lucid version. New
> installations of maverick will get an excess database dump/restore, but
> the upgrade will be clean for everyone.

Yeah, that's what I was going to suggest at first, but I don't think it
will cover all the cases.

Say someone had a working Lucid version of slapd they've already tried
upgrading to the Maverick release 2.4.23-0ubuntu3, and they haven't ever
done the manual "db4.7_XXXX" steps to convert the database. In that
case, they'll still have a v4.7 BDB environment in their /var/lib/ldap
directory... but the version of slapcat installed on their machine will
be from 2.4.23-0ubuntu3.

That is, they'll already have a slapcat linked against libdb4.8, and so
the slapd.pre/postinst scripts won't be able to export the "old"
database in preparation for importing it using the "new" version.

I don't have an environment available where I can actually test this
myself, but as far as I can see once someone has installed slapd
2.4.23-0ubuntu3 it's too late to try fixing this problem with the usual
export/import cycle....

       Nathan

Steve Langasek (vorlon) wrote :

On Thu, Oct 14, 2010 at 05:32:03PM -0000, Nathan Stratton Treadway wrote:
> On Thu, Oct 14, 2010 at 16:31:20 -0000, Steve Langasek wrote:
> > That's not unavoidable; just bump the minimum version check to the
> > maverick release version instead of the lucid version. New
> > installations of maverick will get an excess database dump/restore, but
> > the upgrade will be clean for everyone.

> Yeah, that's what I was going to suggest at first, but I don't think it
> will cover all the cases.

> Say someone had a working Lucid version of slapd they've already tried
> upgrading to the Maverick release 2.4.23-0ubuntu3, and they haven't ever
> done the manual "db4.7_XXXX" steps to convert the database. In that
> case, they'll still have a v4.7 BDB environment in their /var/lib/ldap
> directory... but the version of slapcat installed on their machine will
> be from 2.4.23-0ubuntu3.

Ah, you're probably right then and I'm just misremembering how this was
handled in Debian.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
<email address hidden> <email address hidden>

James Page (james-page) wrote :

On Thu, 2010-10-14 at 16:31 +0000, Steve Langasek wrote:
>
> ** Also affects: openldap (Ubuntu Lucid)
> Importance: Undecided
> Status: New
>
> ** Also affects: openldap (Ubuntu Natty)
> Importance: High
> Assignee: James Page (james-page)
> Status: In Progress
>
> ** Changed in: openldap (Ubuntu Lucid)
> Status: New => In Progress
>
> ** Changed in: openldap (Ubuntu Lucid)
> Importance: Undecided => High
>
> ** Changed in: openldap (Ubuntu Lucid)
> Assignee: (unassigned) => James Page (james-page)
>
> ** Changed in: openldap (Ubuntu Natty)
> Status: In Progress => Triaged
>
> ** Changed in: openldap (Ubuntu Natty)
> Assignee: James Page (james-page) => (unassigned)

Steve

This bug impacts Maverick+ (not Lucid as this uses an earlier version of
openldap so remains un-impacted by this issue).

--
James Page
Software Engineer, Ubuntu Server Team

Mathias Gug (mathiaz) wrote :

Marking the lucid task invalid as this needs to be fixed in maverick.

Changed in openldap (Ubuntu Maverick):
assignee: nobody → James Page (james-page)
Changed in openldap (Ubuntu Lucid):
status: In Progress → Invalid
Changed in openldap (Ubuntu Maverick):
importance: Undecided → High
status: New → In Progress
Mathias Gug (mathiaz) wrote :

I've uploaded a fix to maverick-update:

--- openldap-2.4.23/debian/slapd.scripts-common
+++ openldap-2.4.23/debian/slapd.scripts-common
@@ -46,7 +46,7 @@
 # Return success if yes.
 # Usage: if database_format_changed; then

- if dpkg --compare-versions "$OLD_VERSION" lt-nl 2.4.14; then
+ if dpkg --compare-versions "$OLD_VERSION" lt-nl 2.4.23-0ubuntu3.1; then
   return 0
  else

That will force a database dump for every upgrade to maverick. This is the same fix as in Debian (modulo the package revision).

It won't address the use case outlined in comment 17. For users that have already upgraded to maverick the workaround outlined in the bug description should be followed.
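
The comparison in the patch can be tried by hand with dpkg --compare-versions. The sketch below wraps it in a small function; the name format_changed and the sample version strings are illustrative only, and dpkg is assumed to be on the PATH. Note that lt-nl treats an empty version as later than any version, so an empty old version never compares as "less than" the cutoff.

```shell
# Illustrative wrapper around the comparison used in the patch above.
# "format_changed" is a made-up name for this sketch; requires dpkg.
format_changed() {
    old_version="$1"
    cutoff="$2"
    dpkg --compare-versions "$old_version" lt-nl "$cutoff"
}

# Lucid version, Maverick release version, and an empty old version.
for old in 2.4.21-0ubuntu5.3 2.4.23-0ubuntu3 ""; do
    if format_changed "$old" 2.4.23-0ubuntu3.1; then
        echo "old='$old': dump/reload would be triggered"
    else
        echo "old='$old': no dump/reload"
    fi
done
```

With this cutoff both the Lucid version and the Maverick release version compare as older, which is what forces the dump for every upgrade to the fixed package.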

On Thu, Oct 14, 2010 at 17:47:19 -0000, Steve Langasek wrote:
> Ah, you're probably right then and I'm just
> misremembering how this was
> handled in Debian.

Looking through the Debian changelog, it appears that
there was a similar problem between 2.4.23-1 and 2.4.23-4.
The switch to libdb4.8 was made in 2.4.23-1, but the
change to slapd.scripts-common made at that time (SVN
revision 1275) used "lt-nl 2.4.21" as the conversion
cutoff, so the export/import cycle was missed for systems
upgrading from 2.4.21-1.

This was corrected in 2.4.23-4 (SVN 1307) after being
reported in debbugs #593550. I see that the correction
does use "2.4.23-4" as the cutoff version number, so I
think that in Debian there would be errors attempting to
upgrade from a pre-2.4.23 database to one of the earlier
2.4.23 packages (but without doing any manual fixes) and
then up to the -4 version. But it also looks like 2.4.23-4
was the first of the 2.4.23 versions to make it into
Testing, so presumably there weren't too many people
affected by that gap.

In Ubuntu we have the situation that the "gap" includes
the version released in Maverick, so it seems likely to
affect more people....

     Nathan

On Thu, Oct 14, 2010 at 19:07:47 -0000, Mathias Gug wrote:
> + if dpkg --compare-versions "$OLD_VERSION" lt-nl 2.4.23-0ubuntu3.1; then
> return 0
> else
>
>
> That will force a database dump for every upgrade to
> maverick. This is the same fix as in Debian (modulo the
> package revision).
>
> It won't address the use case outlined in comment 17.
> For users that have already upgraded to maverick the
> workaround outlined in the bug description should be
> followed.

Again, I don't have a way to actually test this, but I
believe that putting the "0ubuntu3.1" onto the comparison
means that in the case described in comment 17, the
upgrade script will attempt to do the export/import but
will not be able to do so (because the existing slapcat
will not be compatible with the existing db files), and
the package installation will abort.

If, on the other hand, the user had done the manual
repair steps already, the export/import will succeed --
but in that situation the db files have already been
converted, so the export/import cycle would not actually
be needed....

If I'm correct, then it's probably better to leave the
comparison as "lt-nl 2.4.23" (or perhaps "lt-nl
2.4.23-0ubuntu1"). That should guarantee that the export
is being run using a pre-2.4.23 version of slapcat, and
avoid touching the database on systems coming up from
post-2.4.23 versions (since the script can't do anything
for those systems anyway).

Nathan
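
To make the difference between the two candidate cutoffs concrete, the following sketch tabulates which old versions each one would send through the export/import cycle. The function name would_dump is invented for this example, the version strings are the ones mentioned in this thread, and dpkg is assumed to be available.

```shell
# Sketch: which upgrades each candidate cutoff would send through the
# export/import cycle. "would_dump" is a hypothetical name; requires dpkg.
would_dump() {
    dpkg --compare-versions "$1" lt-nl "$2"
}

for cutoff in 2.4.23-0ubuntu1 2.4.23-0ubuntu3.1; do
    for old in 2.4.21-0ubuntu5.3 2.4.23-0ubuntu3; do
        if would_dump "$old" "$cutoff"; then result=dump; else result=skip; fi
        printf 'cutoff=%-18s old=%-18s %s\n' "$cutoff" "$old" "$result"
    done
done
```

Both cutoffs catch the Lucid version; only the 0ubuntu3.1 cutoff also catches systems already on 2.4.23-0ubuntu3, which is the case where the installed slapcat may no longer be able to read the database.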

On Thu, Oct 14, 2010 at 19:07:47 -0000, Mathias Gug wrote:
> I've uploaded a fix to maverick-update:

How long before this new version will be available by
default for a user upgrading to Maverick?

Would it make sense to add a Maverick Release Note
mentioning this error and advising users with the slapd
package installed to wait until the new package is in
place (in order to avoid having to go through the manual
steps to recover from installing an intermediate package)?

Nathan

Mathias Gug (mathiaz) wrote :

Thanks Nathan and James for the comments on ubuntu3.1.

I've uploaded a new version ubuntu3.2 to maverick-proposed.

I haven't been able to thoroughly test the ubuntu3.2 packages as my testing environment is currently broken. Please test the ubuntu3.2 packages and report any failure in package new installs and upgrades.

James Page (james-page) wrote :

I have the capability to easily test both a clean upgrade scenario and one where an existing upgrade to ubuntu3 has already failed.

As soon as the package is in maverick-proposed I will pick this up.

James, would you also be able to re-try an upgrade from Lucid to the current Maverick version (slapd 2.4.23-0ubuntu3), and then confirm that the "slapcat" command does fail at that point (i.e. without having done the manual recovery steps)?

(I'd just like to be sure that once 2.4.23-0ubuntu3 is installed it really is too late for the slapd.preinst script dump_databases() function to be able to export the database....)

Thanks.

Nathan

James Page (james-page) wrote :

On Fri, 2010-10-15 at 13:18 +0000, Nathan Stratton Treadway wrote:
> James, would you also be able to re-try an upgrade from Lucid to the
> current Maverick version (slapd 2.4.23-0ubuntu3), and then confirm that
> the "slapcat" command does fail at that point (i.e. without having done
> the manual recovery steps)?
>
> (I'd just like to be sure that once 2.4.23-0ubuntu3 is installed it
> really is too late for the slapd.preinst script dump_databases()
> function to be able to export the database....)

Was on my list of test scenarios so no problem.

--
James Page
Software Engineer, Ubuntu Server Team

Accepted openldap into maverick-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in openldap (Ubuntu Maverick):
status: In Progress → Fix Committed
tags: added: verification-needed
James Page (james-page) wrote :

Tested successfully when upgrading slapd from Lucid (slapd-2.4.21-0ubuntu5.3) -> Maverick (2.4.23-0ubuntu3.2). Database dumped, backed-up and reloaded post upgrade OK.

James Page (james-page) wrote :

Tested upgrade from Lucid (slapd-2.4.21-0ubuntu5.3) -> Maverick (2.4.23-0ubuntu3) -> Maverick (2.4.23-0ubuntu3.2). Manual recovery steps as detailed in this bug are still required, as the first step of the upgrade process left the underlying database in an inaccessible state. This is the expected behaviour in this scenario.

Martin Pitt (pitti) on 2010-10-18
tags: added: verification-done
removed: verification-needed
James Page (james-page) wrote :

Tested successfully on full upgrade from Lucid to Maverick using do-release-upgrade -d.

Martin Pitt (pitti) wrote :

Copied to natty as well.

Changed in openldap (Ubuntu Natty):
status: Triaged → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openldap - 2.4.23-0ubuntu3.2

---------------
openldap (2.4.23-0ubuntu3.2) maverick-proposed; urgency=low

  * debian/slapd.templates: re-add slapd/move_old_database template as it's
    used during the package upgrade. Thanks to James Page for pointing it.
  * debian/slapd.config: restore debconf question slapd/move_old_database.

openldap (2.4.23-0ubuntu3.1) maverick-proposed; urgency=low

  [ James Page ]
  * Fixed install/upgrade process to dump/restore databases due
    to uplift to libdb4.8-dev (LP: #658227)
 -- Mathias Gug <email address hidden> Thu, 14 Oct 2010 16:56:38 -0400

Changed in openldap (Ubuntu Maverick):
status: Fix Committed → Fix Released
Andrew Schulman (andrex) wrote :

Yay!

Thanks guys, that was fast work.

tags: added: testcase