postgrey fails regularly : "fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223."

Bug #135038 reported by loafer on 2007-08-27
38
Affects Status Importance Assigned to Milestone
db4.4 (Debian)
Fix Released
Unknown
db4.4 (Ubuntu)
Medium
Unassigned
postgrey (Ubuntu)
Undecided
Unassigned

Bug Description

Binary package hint: postgrey

Every night postgrey dies unexpectedly in gutsy. (Works fine in 6.06LTS)

Aug 27 02:24:43 localhost postgrey[18295]: delayed 780 seconds: client=41.232.9.170, <email address hidden>, <email address hidden>
Aug 27 02:38:22 localhost postgrey[18295]: cleaning up old logs...
Aug 27 02:38:22 localhost postgrey[18295]: cleaning up old entries...
...
Aug 27 02:38:30 localhost postgrey[18295]: cleaning main database finished. before: 61360, after: 49426
Aug 27 02:38:30 localhost postgrey[18295]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
... ... error above .... .....
Aug 27 02:38:31 localhost postfix/smtpd[29844]: warning: problem talking to server 127.0.0.1:60000: Connection refused
Aug 27 02:38:31 localhost postfix/smtpd[29844]: NOQUEUE: reject: RCPT from ns2.rim.edu.bt[202.144.157.210]: 451 4.3.5 Server configuration problem; from=<email address hidden> to=<email address hidden> proto=SMTP helo=<bppzftw>
...
Aug 27 02:38:32 localhost postfix/smtpd[29765]: warning: connect to 127.0.0.1:60000: Connection refused
Aug 27 02:38:32 localhost postfix/smtpd[29765]: warning: problem talking to server 127.0.0.1:60000: Connection refused
Aug 27 02:38:33 localhost postfix/smtpd[29765]: warning: connect to 127.0.0.1:60000: Connection refused
Aug 27 02:38:33 localhost postfix/smtpd[29765]: warning: problem talking to server 127.0.0.1:60000: Connection refused

A postgrey restart fixes the issue. I suspect this is a logrotate foo-bar or some such.

description: updated

Oh and I've got Net::Server up to date too. ;)

ii libnet-server-perl 0.94-1 An extensible, general perl server engine

Thanks for taking time to report bug in Ubuntu. With the same configuration I can not reproduce it. The time you give does not appear to be logrotate.
Do you use a out of the box configuration for postgrey or a personalised one?
Could you launch logrotate by hand (" /usr/sbin/logrotate /etc/logrotate.conf") and see if postgrey is still running after.
Thanks

Changed in postgrey:
status: New → Incomplete
loafer (launchpad-net-iloaf) wrote :
Download full text (5.8 KiB)

This is now a busy mailserver with about 100spam/min at busy times. It looks like it's not syslog related now.

The postgrey config is:
POSTGREY_OPTS="--inet=127.0.0.1:60000 --delay=40"
POSTGREY_TEXT="Server is not keen on you, see http://greylist.example.com/"

Here are the syslog entries from when it ides.

Sep 3 06:52:04 localhost postgrey[10558]: cleaning main database finished. before: 274048, after: 63815
Sep 3 06:52:04 localhost postgrey[10558]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 4 02:04:10 localhost postgrey[12464]: cleaning main database finished. before: 289400, after: 49192
Sep 4 02:04:10 localhost postgrey[12464]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 4 03:11:01 localhost postgrey[8868]: cleaning main database finished. before: 290126, after: 48884
Sep 4 03:11:01 localhost postgrey[8868]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 4 04:21:39 localhost postgrey[10136]: cleaning main database finished. before: 290793, after: 48493
Sep 4 04:21:39 localhost postgrey[10136]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 4 05:31:11 localhost postgrey[11328]: cleaning main database finished. before: 291073, after: 47722
Sep 4 05:31:11 localhost postgrey[11328]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 3 02:13:29 localhost postgrey[14053]: cleaning main database finished. before: 271150, after: 65691
Sep 3 02:13:29 localhost postgrey[14053]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 3 03:20:52 localhost postgrey[6832]: cleaning main database finished. before: 271584, after: 64917
Sep 3 03:20:52 localhost postgrey[6832]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 3 04:31:01 localhost postgrey[7976]: cleaning main database finished. before: 272450, after: 64504
Sep 3 04:31:01 localhost postgrey[7976]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 3 05:40:56 localhost postgrey[9311]: cleaning main database finished. before: 273488, after: 64383
Sep 3 05:40:56 localhost postgrey[9311]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 2 02:30:27 localhost postgrey[3926]: cleaning main database finished. before: 242519, after: 69659
Sep 2 02:30:27 localhost postgrey[3926]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 2 03:40:58 localhost postgrey[10336]: cleaning main database finished. before: 243533, after: 69331
Sep 2 03:40:58 localhost postgrey[10336]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 2 04:50:44 localhost postgrey[11620]: cleaning main database finished. before: 244632, after: 69420
Sep 2 04:50:44 localhost postgrey[11620]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.
Sep 2 06:00:5...

Read more...

Changed in postgrey:
status: Incomplete → New
Daniel Mueller (ubuntu-danm) wrote :

I can confirm this behavior. It appears on my (brand-new!) Ubuntu "gutsy" server as well. (The server struggles with about 10 SMTP connections per minute). Every night between 1:00am and 7:00am postgrey dies:

Oct 1 02:44:04 x postgrey[x]: cleaning up old logs...
Oct 1 02:44:04 x postgrey[x]: cleaning up old entries...
Oct 1 02:44:04 x postgrey[x]: cleaning main database finished. before: x, after: x
Oct 1 02:44:04 x postgrey[x]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.

My server does not run any cron-daemon/logrotate at all. After a short look in postgrey's source I found this at line 160:

    [..]
    # remove old keys
    # this is very expensive: We might refuse to speak to postfix for too
    # long, after which clients will start getting "450 Server configuration
    # problem" errors... do it only during the night and only if at least one
    # day has passed
    my $hour = (localtime($now))[2];
    if($hour > 1 and $hour < 7 and
        $now - $self->{postgrey}{last_maint_keys} >= 82800)
    [..]

I would say it is an error in postgrey's do_maintenance function. The newest version of postgrey (1.31) runs very stable on my other mailhosts (OpenSuSE). I have just replaced the exiting /usr/sbin/postgrey with v.1.31 (and copied the needed configuration files to /etc/postfix). If the error occurs again it has probably something to do with the perl-db modules. I'll report any results here.

bye,
Daniel

Daniel Mueller (ubuntu-danm) wrote :

Hmm, the same error with the newest version of postgrey:

Oct 2 02:11:32 x postgrey[x]: cleaning up old logs...
Oct 2 02:11:32 x postgrey[x]: cleaning up old entries...
Oct 2 02:11:32 x postgrey[x]: cleaning main database finished. before: x, after: x
Oct 2 02:11:32 x postgrey[x]: FATAL: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 289.

bye,
Daniel

podslatt (garymcm) wrote :

Just wanted to say I am seeing this same error at postgrey 1.27, gutsy.

Oct 2 02:53:16 bluejay postgrey[4121]: fatal: Can't call method "txn_commit" on an undefined value at /usr/sbin/postgrey line 223.

loafer (launchpad-net-iloaf) wrote :

Thanks for your input guys. I think the importance needs raising on this bug.

Very lame-ass workaround:
*/2 * * * * sleep 10; pidof postgrey >/dev/null || /etc/init.d/postgrey start

podslatt (garymcm) wrote :

I just installed monit and turned monitoring postfix and postgrey, because it takes down postfix. I agree this needs to be fixed for this reason alone.

Daniel Mueller (ubuntu-danm) wrote :

Hi,

yesterday I have installed debian's libdb4.4 version 4.4.20-10 from the 'unstable' branch [1]. I read the debian bugtracker and found out that they have raised a ticket[2] against postgrey as well. Let's hope that the updated library solves the nightly postgrey-crash. I will report my results tomorrow - it's about 0:12am here and I don't want to wait till 2:00am :-)

[1] http://packages.debian.org/sid/libdb4.4
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=441152
[3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=441069 (with reference to Ubuntu #135038 :-)

bye,
Daniel

Daniel Mueller (ubuntu-danm) wrote :

With libdb4.4 version 4.4.20-10 from debian 'unstable' postgrey is working stable for two days now.

Oct 4 02:15:55 x postgrey[x]: cleaning up old logs...
Oct 4 02:15:55 x postgrey[x]: cleaning up old entries...
Oct 4 02:15:55 x postgrey[x]: cleaning main database finished. before: x, after: x
Oct 4 02:15:55 x postgrey[x]: cleaning clients database finished. before: x, after: x
[..]
Oct 4 02:22:36 x postgrey[x]: action=pass, reason=triplet found, delay=x, client_name=x, client_address=x.x.x.x, recipient=info@xxx
[..]

I think updating Ubuntu's libdb4.4 would solve the issue.

bye,
Daniel

Matt LaPlante (cybrmatt) on 2007-10-18
Changed in postgrey:
status: New → Confirmed
Matt LaPlante (cybrmatt) wrote :

Why is it taking so long to get a response on this? postgrey is very much broken in gutsy, and Debian already addressed the problem.

Martijn Heemels (yggdrasil) wrote :

I have to second Matt's remark that this is taking ages. This is a imho a critical bug since it brings down the server's mail delivery completely and regularly, yet has been open for 2 months now.

Is Daniel Mueller's suggestion of updating libdb4.4 working? If so, please release it.

Regards, Martijn

Changed in postgrey:
importance: Undecided → Medium
Scott Kitterman (kitterman) wrote :

Ironically enough, my reading of the Debian bug is that postrgrey isn't broken, it's libdb4.4. Adding libdb4.4 task.

Changed in postgrey:
importance: Medium → Undecided
status: Confirmed → Invalid
Changed in db4.4:
importance: Undecided → Medium

My postgrey dies 4 times each night. a random minute of each 3am, 4am, 5am and 6am.

I've been running a cronjob every minute that restarts postgrey if its down and mails me for half a week.

Machine is gutsy upgraded from feisty (and it was stable on feisty). I have little time to investigate the causes, as i'm in the process of moving.

Changed in db4.4:
status: Unknown → Fix Released
Gustavo A. Díaz (gdiaz) wrote :

Same for me, and running Gutsy as well...
I was wondering why my email server stops to send emails once per day almost... and now i discovered this bug report.. which is the same problem happening to me... and since I've upgraded to gutsy... with feisty, zero problems.

Brendan Cully (bcully) wrote :

I'm me-tooing this bug to get updates.

Raoul Merki (raoul) wrote :

Same with me. Brand new installed and updated Gutsy.

Implemented the "Cron Fix" described above.

Installed the updated libdb4.4 package from debian/lenny. Fixes the problem (no more mails from my cronscript). Can we expect an updated package from Ubuntu any time soon?

(People following this 'advise': at your risk off course)

Daniel Hahler (blueyed) on 2007-11-05
Changed in db4.4:
assignee: nobody → blueyed
status: New → In Progress
Daniel Mueller (ubuntu-danm) wrote :

My system runs stable since 2007-10-02 (after replacing Ubuntu's libdb4.4 package with Debian's 4.4.20-10). I am really disappointed that nobody at Canonical Ltd. feels responsible for this bug. I mean - I've installed the Gutsy server version and postfix/postgrey is not an exotic combination, is it?

If you have in mind to replace the libdb4.4 package make sure that there's no other application using that particular Berkeley DB version. Otherwise you should backup your database files, update libdb4.4, install db4.4-util, run db4.4_recover -h <your db env> and cross the fingers.

Daniel Hahler (blueyed) wrote :

I'll make this a dupe of bug 153996 (which seems better for a meta-bug).
This affects subversion (with --fs-type bdb), too.

Changed in db4.4:
assignee: blueyed → nobody
status: In Progress → Confirmed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.