[SRU] connection problems under load with hardy dovecot

Bug #189616 reported by James Troup on 2008-02-06
Affects            Importance   Assigned to
dovecot (Ubuntu)   Medium       Mathias Gug
Hardy              Medium       Mathias Gug
Intrepid           Medium       Mathias Gug

Bug Description

Binary package hint: dovecot

Hi,

We recently upgraded our mail server to a (backported-to-dapper) hardy
dovecot because we wanted SSL certificate chaining support but we had
to revert back to dapper-dovecot the next day because the hardy
version was failing under load. Unfortunately there was no obvious
pattern to the failures and nothing helpful in the server-side logs.
The symptoms on the client side were MUAs timing out or simply not
picking up new mail. Turning on IMAP level debugging on the client
side wasn't useful either. Just before the revert, it got so bad that
even a simple openssl s_client -connect to the dovecot server was
hanging.

Even more unfortunately, I can't easily reproduce this problem. I
can't afford to break our production mail server again, and this
problem wasn't evident in the smaller scale testing we did prior to
the upgrade. I tried battering a test hardy-dovecot instance with
groups of openssl s_client's in a for loop, but that wasn't sufficient
to reproduce the problem.
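
For reference, a minimal sketch of the kind of handshake loop described above, written in Python rather than as a shell loop around openssl s_client. It is not the script that was actually used. The host name, worker count and timeout are placeholders, certificate verification is switched off on the assumption that the test instance uses a self-signed certificate, and a timeout is only a rough stand-in for the hang seen in production.

```python
import socket
import ssl
from concurrent.futures import ThreadPoolExecutor


def tls_handshake(host, port, timeout):
    """Open one TLS connection, close it, and report the outcome."""
    ctx = ssl.create_default_context()
    # Assumption: a test server with a self-signed certificate.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                return "ok"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


def hammer(host, port, total, workers, timeout=10.0, probe=tls_handshake):
    """Run `total` handshakes, `workers` at a time, and tally the outcomes."""
    tally = {"ok": 0, "timeout": 0, "error": 0}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(probe, host, port, timeout) for _ in range(total)]
        for fut in futures:
            tally[fut.result()] += 1
    return tally
```

A call such as hammer("mail.test.example", 993, total=200, workers=20) would run 200 handshakes against a hypothetical test host, 20 at a time. Each connection is closed straight after the handshake, so this only exercises connection setup, not long-lived sessions.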

--
James

Kees Cook (kees) wrote :

While using the qa-regression-testing scripts, I saw an SSL connection hang once while doing a batch of test runs. When it hung, I had to kill -9 the server to get it to drop the port. Before seeing this bug, I assumed I had just done something goofy, especially since I never saw the issue again after the first failure.

Mathias Gug (mathiaz) on 2008-04-07
Changed in dovecot:
assignee: nobody → mathiaz
Nick Barcet (nijaba) on 2008-04-08
Changed in dovecot:
milestone: none → ubuntu-8.04
Nicolas Valcarcel (nvalcarcel) wrote :

Which back end did you use for dovecot? passwd, mysql, ldap? Can you post the config file, with the sensitive information removed, please?

Nicolas Valcarcel (nvalcarcel) wrote :

Also, I would like to know how you backported postfix: by changing the sources to hardy ones and installing postfix using apt-get, or by just downloading the packages and installing them using dpkg?

James Troup (elmo) wrote :

> Which back end did you use for dovecot? passwd, mysql, ldap?

passwd

> Can you post the config file, with the sensitive information removed, please?

I'll do this tomorrow.

> Also, I would like to know how you backported postfix: by changing the
> sources to hardy ones and installing postfix using apt-get, or by just
> downloading the packages and installing them using dpkg?

Assuming you mean dovecot, we backported it in the normal way,
i.e. downloading the dovecot source package from hardy, and
recompiling it for dapper and then installing the resulting binaries
with apt/dpkg.

Nicolas Valcarcel (nvalcarcel) wrote :

Yes, sorry, I meant dovecot. I don't know where my head is.

James Troup (elmo) wrote :
Mathias Gug (mathiaz) wrote :

I've spent some time trying to reproduce this problem but all my attempts have been unsuccessful so far.

Using a hardy dovecot server, I've tried to reproduce the connection failures. I ran tests with about 200 clients connecting concurrently and hit the default login_max_processes_count limit (set to 128). Once the limit was raised to 256, all clients were able to log in successfully. I've attached the scripts used for the test.
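
The attached scripts are not reproduced on this page. As a rough illustration of the approach only, and not the attached scripts themselves, the sketch below logs a batch of users in over IMAPS from separate threads and keeps every session open until all of them have made their attempt. The host, user names and password are hypothetical, and the timeout argument to imaplib.IMAP4_SSL needs Python 3.9 or later.

```python
import imaplib
import threading


def imaps_login(host, user, password, timeout=30):
    """Log in over IMAPS (port 993) and return the open connection."""
    conn = imaplib.IMAP4_SSL(host, timeout=timeout)  # timeout: Python 3.9+
    conn.login(user, password)
    return conn


def concurrent_logins(host, users, password, login=imaps_login):
    """Log each user in from its own thread, holding every session open
    until all threads have made their attempt, then log out.

    Returns a dict mapping user name to "ok" or the error text.
    """
    results = {}
    lock = threading.Lock()
    attempted = threading.Semaphore(0)
    release = threading.Event()

    def worker(user):
        conn = None
        try:
            conn = login(host, user, password)
            outcome = "ok"
        except Exception as exc:  # refused, timed out, bad password, ...
            outcome = str(exc) or type(exc).__name__
        with lock:
            results[user] = outcome
        attempted.release()
        release.wait()  # keep the session open while the others connect
        if conn is not None:
            try:
                conn.logout()
            except Exception:
                pass  # the server may already have dropped the session

    threads = [threading.Thread(target=worker, args=(u,)) for u in users]
    for t in threads:
        t.start()
    for _ in threads:
        attempted.acquire()  # wait until every thread has tried to log in
    release.set()
    for t in threads:
        t.join()
    return results
```

Called with something like users = ["user%d" % i for i in range(200)], the number of "ok" entries in the result can then be compared against the configured login_max_processes_count.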

Changed in dovecot:
status: New → Incomplete
Mathias Gug (mathiaz) wrote :
Nicolas Valcarcel (nvalcarcel) wrote :

I think it may be a compatibility issue in the config files, so for the test case we should:
a) install dapper
b) paste the configuration James posted here
c) test if it works ok
d) compile hardy's dovecot from source on dapper, and upgrade it
e) reproduce the bug
I have been a little busy these days, but I will give it a try this week if everything goes as it should.

Nicolas Valcarcel (nvalcarcel) wrote :

I can't build hardy's dovecot on dapper out of the box. Can you please post the exact changes that were made to dovecot for it to build? (If you can upload the .deb, that would be even better.)

Nicolas Valcarcel (nvalcarcel) wrote :

I have got the debdiff, uploading it for the record.

Nicolas Valcarcel (nvalcarcel) wrote :

I have run 12 instances of the script (the python one) in parallel and nothing went wrong. I think it was a packaging issue.

James Troup (elmo) wrote :

So we just upgraded the mail server to hardy, and unfortunately we ran
into the same problem again.

This takes the dapper backport and any possible interaction with
dapper libraries/kernel etc. out of the picture as we're now running a
vanilla hardy server (with hardy kernel and hardy dovecot) and
experiencing the same issue.

I've had to downgrade to the dapper dovecot (running on hardy) so that
our users can access their email.

Steve Langasek (vorlon) on 2008-05-01
Changed in dovecot:
milestone: ubuntu-8.04 → ubuntu-8.04.1
Nicolas Valcarcel (nvalcarcel) wrote :

Can you provide us with some of the logs showing the errors you are experiencing?

Mathias Gug (mathiaz) wrote :

After some discussions, it seems that the problem is related to the login_max_processes_count option. The dapper version of dovecot wasn't enforcing this limit, which would explain why the dapper dovecot was working smoothly on hardy. The hardy version properly enforces the login_max_processes_count option.

Moreover, most of the clients are using imaps. According to http://wiki.dovecot.org/LoginProcess, the imap-login process acts as an SSL/TLS proxy for SSL connections, and so stays around for the lifetime of the connection. Combined with the fact that IMAP clients such as Thunderbird are known to open more than one IMAP connection, the maximum number of login processes can be hit with fewer than 50 users connected at the same time.
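
To make the arithmetic concrete, here is a back-of-the-envelope sketch. It assumes that each SSL connection occupies one login process for as long as it is open, and that every user runs one client holding a fixed number of connections. The per-client connection counts are illustrative, not measured.

```python
def max_concurrent_users(login_max_processes_count, connections_per_client):
    """Users who can stay connected over SSL before the login process
    limit is reached, assuming each connection pins one login process."""
    if connections_per_client < 1:
        raise ValueError("connections_per_client must be at least 1")
    return login_max_processes_count // connections_per_client


# Default limit of 128 with 1, 3 and 5 connections per client.
for per_client in (1, 3, 5):
    print(per_client, max_concurrent_users(128, per_client))
```

With the default limit of 128, three connections per client allows 42 users and five connections per client allows 25, both below 50.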

I've attached a debdiff that logs a warning message when login_max_processes_count is hit and no -login processes can be freed. It also adds a note to the dovecot configuration.

Changed in dovecot:
importance: Undecided → Medium
status: Incomplete → In Progress
Mathias Gug (mathiaz) wrote :

An updated version of dovecot is available in my PPA - https://launchpad.net/~mathiaz/+archive/.

Nicolas Valcarcel (nvalcarcel) wrote :

Shouldn't it be included in hardy?

Steve Langasek (vorlon) on 2008-05-14
Changed in dovecot:
importance: Undecided → Medium
milestone: none → ubuntu-8.04.1
status: New → In Progress
assignee: nobody → mathiaz
Chuck Short (zulcss) wrote :

Yes, it's being targeted for 8.04.1.

Chuck Short (zulcss) wrote :

Due to a change in SSL handling in dovecot between dapper and hardy, the default configuration in dovecot can prevent server loads to rise and become non-responsive when using dovecot. In dapper, login_max_processes_count was not being enforced; in hardy it is. The attached patch logs a message to the error log when the maximum login process count has been reached, and also puts a comment in the dovecot example configuration file.

Testcase (Should be done in a virtual machine)

1. Install dovecot.
2. Configure SSL with dovecot.
3. Run the create users script.
4. Set the login_max_processes_count option to 4.
5. Run the client login script.
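
Step 4 can be scripted when the test case is run repeatedly in a virtual machine. The helper below is a sketch only: it assumes the option appears in the dovecot configuration as a top-level "name = value" line, possibly commented out with "#", and the set_option name is made up for this example.

```python
import re


def set_option(conf_text, name, value):
    """Return conf_text with `name = value` set.

    The first line that assigns `name`, commented out or not, is
    replaced; if there is no such line, the assignment is appended.
    """
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*" + re.escape(name) + r"[ \t]*=.*$", re.MULTILINE
    )
    line = "%s = %s" % (name, value)
    new_text, count = pattern.subn(lambda m: line, conf_text, count=1)
    if count == 0:
        if conf_text and not conf_text.endswith("\n"):
            conf_text += "\n"
        new_text = conf_text + line + "\n"
    return new_text
```

The function only transforms text. Reading and writing the configuration file (assumed here to be /etc/dovecot/dovecot.conf) and restarting dovecot afterwards are left to the caller.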

Chuck Short (zulcss) wrote :

Sorry, that should be "cause the server load to rise and become non-responsive".

Martin Pitt (pitti) wrote :

Accepted into -proposed; please test and give feedback here.

Changed in dovecot:
milestone: ubuntu-8.04.1 → none
status: In Progress → Fix Committed
Steve Langasek (vorlon) on 2008-05-15
Changed in dovecot:
milestone: none → ubuntu-8.04.1
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dovecot - 1:1.0.13-4ubuntu1

---------------
dovecot (1:1.0.13-4ubuntu1) intrepid; urgency=low

  * Merge from debian unstable, remaining changes:
    - DebianMaintainerField
    - Use Snakeoil SSL certificate by default.
      + debian/control: Depend on ssl-cert
      + debian/patches/ssl-cert-snakeoil.dpatch: Change default SSL cert paths
        to snakeoil.
      + debian/dovecot-common.postinst: Relax grep for SSL_* a bit.
    - Fast TearDown:
      + debian/rules: Call dh_installinit in 'multiuser' mode.
      + debian/control: Depend on newer sysv-rc for this.
      + debian/dovecot-common.postinst: Remove stop script symlinks from rc0 and
        rc6 on upgrades. Need to be kept until next LTS release.
    - Add autopkgtest in debian/tests/*.
    - Don't fail in postinst if dovecot-{sql,ldap} is missing. (LP: #153161)
    - Dropped upstream-mail-group-fixes.dpatch. No longer needed.
    - Dropped upstream-invalid-password-fixes.dpatch. No longer needed.
    - debian/dovecot-common.init: Check to see if there is an /etc/inetd.conf. (LP: #208411)
    - debian/patches/login-max-processes-count-warning.dpatch: Tell the user
      that they have reached the maximum number of processes count. (LP: #189616)

dovecot (1:1.0.13-4) unstable; urgency=low

  * debian/patches/dovecot-MANAGESIEVE-9.3.dpatch: updated managesieve to
    version 9.3.
  * debian/dovecot-common.README.Debian: added a note about how to configure
    dovecot to log to file instead of using syslog.
  * debian/dovecot.1: added a SIGNALS section. (Closes: #479059)
  * dovecot-sieve: updated to the last hg release (5c3ba11994cb).
    (Closes: #479104)

dovecot (1:1.0.13-3) unstable; urgency=low

  * debian/rules: do not install anymore the ldap and sql example
    configuration files under /etc. (Closes: #472674)
  * debian/dovecot-common.postinst: really chmod
    /etc/dovecot/dovecot-{ldap,sql}.conf files to 0600.
  * debian/dovecot-common.init: do not start the service if dovecot.conf
    doesn't exist. (Closes: #475888)

dovecot (1:1.0.13-2) unstable; urgency=low

  * debian/rules: use aclocal-1.9 instead of aclocal. (Closes: #473754)

dovecot (1:1.0.13-1) unstable; urgency=high

  * New upstream release, fixes a security issue:
    http://www.dovecot.org/list/dovecot-news/2008-March/000064.html

dovecot (1:1.0.12-1) unstable; urgency=high

  * New upstream release. (Closes: #469457)
  * debian/patches/dovecot-MANAGESIEVE-9.2.dpatch: updated, thanks to Marco
    Nenciarini for the patch.

dovecot (1:1.0.10-4) unstable; urgency=low

  * debian/patches/autocreate.dpatch: added, thanks to Walter Reiner.
  * debian/rules: use --with-ioloop=best instead of --with-ioloop=epoll, as
    suggested by Timo. (Closes: #466296)

dovecot (1:1.0.10-3) unstable; urgency=low

  * debian/patches/dovecot-MANAGESIEVE-9.1.dpatch: added, thanks to Aleksey
    Midenkov for providing a patch. (Closes: #416166)
  * debian/dovecot-common.init: added $time to Should-Start. (Closes: #461543)
  * debian/dovecot-common.postinst: do not add the dovecot user to the mail
    group, it is not required by upstream. (Closes: #457123)
  * debian/control: updated Standards-Vers...


Changed in dovecot:
status: In Progress → Fix Released
Chuck Short (zulcss) on 2008-05-26
Changed in dovecot:
status: Fix Committed → Confirmed
Martin Pitt (pitti) wrote :

Any testers?

Chuck Short (zulcss) wrote :

I'll test it today.

Chuck Short (zulcss) wrote :

This works for me.

chuck

Martin Pitt (pitti) wrote :

Copied to hardy-updates.

Changed in dovecot:
status: Confirmed → Fix Committed
status: Fix Committed → Fix Released
Adam Sommer (asommer) wrote :

Just an FYI: it worked fine for me as well.
