Activity log for bug #997217

Date Who What changed Old value New value Message
2012-05-09 17:05:33 David Farrar bug added bug
2012-05-10 18:45:27 Launchpad Janitor cyrus-sasl2 (Ubuntu): status New Confirmed
2012-05-12 08:54:09 Mathias Lindberg bug added subscriber Mathias Gustavsson
2013-05-16 16:51:35 Fufroma bug watch added http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=708552
2013-06-07 11:40:29 Robie Basak bug added subscriber Robie Basak
2013-06-07 11:41:43 Robie Basak cyrus-sasl2 (Ubuntu): status Confirmed Fix Released
2013-11-16 00:31:19 Daniel bug added subscriber Daniel Parthey
2013-11-16 05:14:51 Adolfo Jayme Barrientos nominated for series Ubuntu Precise
2013-11-17 20:10:14 Daniel tags precise
2013-11-18 16:20:42 Brian Murray bug task added cyrus-sasl2 (Ubuntu Precise)
2013-11-18 16:21:47 Brian Murray cyrus-sasl2 (Ubuntu Precise): status New Triaged
2013-11-18 16:21:49 Brian Murray cyrus-sasl2 (Ubuntu Precise): importance Undecided High
2018-10-23 20:12:51 Andreas Hasenack nominated for series Ubuntu Xenial
2018-10-23 20:12:51 Andreas Hasenack bug task added cyrus-sasl2 (Ubuntu Xenial)
2018-10-23 20:12:59 Andreas Hasenack cyrus-sasl2 (Ubuntu Precise): status Triaged Won't Fix
2018-10-23 20:14:39 Andreas Hasenack nominated for series Ubuntu Trusty
2018-10-23 20:14:39 Andreas Hasenack bug task added cyrus-sasl2 (Ubuntu Trusty)
2018-10-23 20:15:09 Andreas Hasenack tags precise server-next
2018-10-23 20:15:20 Andreas Hasenack bug added subscriber Ubuntu Server
2018-10-23 20:15:31 Andreas Hasenack cyrus-sasl2 (Ubuntu Trusty): status New Triaged
2018-10-23 20:15:34 Andreas Hasenack cyrus-sasl2 (Ubuntu Xenial): status New Triaged
2018-10-23 21:01:44 Andreas Hasenack cyrus-sasl2 (Ubuntu Xenial): importance Undecided High
2018-10-23 21:01:44 Andreas Hasenack cyrus-sasl2 (Ubuntu Xenial): assignee Andreas Hasenack (ahasenack)
2018-10-23 21:01:54 Andreas Hasenack cyrus-sasl2 (Ubuntu Trusty): importance Undecided High
2018-10-23 21:02:00 Andreas Hasenack cyrus-sasl2 (Ubuntu Trusty): assignee Andreas Hasenack (ahasenack)
2018-10-24 18:23:57 Andreas Hasenack description

Old value:

sasl2-bin version 2.1.24~rc1.dfsg1+cvs2011-05-23-4ubuntu contains a bug that causes heavy CPU utilization, impacting normal operation of one of our mail servers following an upgrade to Ubuntu 12.04. We are running the daemon with the following options:

    /usr/sbin/saslauthd -a rimap -O our.imap.server -r -m /var/spool/postfix/var/run/saslauthd -n 5

We noticed that users were unable to send mail and that the saslauthd processes were each using approximately 100% of a CPU core. An strace of one of the runaway processes showed that it was stuck in the following behaviour:

    select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
    read(8, "", 940) = 0
    select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
    read(8, "", 940) = 0
    select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
    read(8, "", 940) = 0
    select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
    read(8, "", 940) = 0
    .....

Further inspection showed that the file descriptor in question was a socket connected to our IMAP server, in CLOSE_WAIT. Browsing saslauthd/auth_rimap.c in the source package for sasl2-bin, we came across the following code, repeated in two locations:

    while( select (fds, &perm, NULL, NULL, &timeout ) > 0 ) {
        if ( FD_ISSET(s, &perm) ) {
            ret = read(s, rbuf+rc, sizeof(rbuf)-rc);
            if ( ret < 0 ) {
                rc = ret;
                break;
            } else {
                rc += ret;
            }
        }
    }

It looks like this loop is expected to run until a read error is encountered or the timeout of 1 second is reached. There is no check for a read of 0 bytes, which would indicate that the connection was closed by the remote peer. Since select() will immediately return the size of the set containing the partially closed descriptor (1, which is > 0), and calls to read() will always yield 0 bytes, execution can get stuck in this non-blocking loop, and I'm presuming that is what is happening here. We have not performed any further analysis to prove that this is really what is happening, but if my intuition is correct, our IMAP server (an nginx IMAP proxy) most likely closes the connection at an unexpected time, under as yet undetermined conditions.

New value:

[Impact]
 * An explanation of the effects of the bug on users and
 * justification for backporting the fix to the stable release.
 * In addition, it is helpful, but not required, to include an explanation of how the upload fixes this bug.

[Test Case]
 * detailed instructions how to reproduce the bug
 * these should allow someone who is not familiar with the affected package to reproduce the bug and verify that the updated package fixes the problem.

[Regression Potential]
 * discussion of how regressions are most likely to manifest as a result of this change.
 * It is assumed that any SRU candidate patch is well-tested before upload and has a low overall risk of regression, but it's important to make the effort to think about what ''could'' happen in the event of a regression.
 * This both shows the SRU team that the risks have been considered, and provides guidance to testers in regression-testing the SRU.

[Other Info]
 * Anything else you think is useful to include
 * Anticipate questions from users, SRU, +1 maintenance, security teams and the Technical Board
 * and address these questions in advance

[Original Description]
(the original description, unchanged from the old value above)
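The analysis quoted above rests on two pieces of standard POSIX behaviour: select() reports a socket whose peer has closed the connection as readable, and read() on such a socket returns 0 (end-of-file) indefinitely. The following minimal C program is an illustration added here, not part of the bug report; it uses a socketpair() to stand in for the saslauthd-to-IMAP connection and demonstrates the combination:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int main(void)
    {
        int sv[2];

        /* A connected pair of sockets standing in for the
           saslauthd <-> IMAP server connection. */
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
            return 1;

        close(sv[1]); /* the "IMAP server" side closes the connection */

        for (int i = 0; i < 3; i++) {
            fd_set readfds;
            struct timeval timeout = { 1, 0 };

            FD_ZERO(&readfds);
            FD_SET(sv[0], &readfds);

            /* select() returns 1 immediately: end-of-file counts as
               "readable", so the one-second timeout never applies. */
            int nready = select(sv[0] + 1, &readfds, NULL, NULL, &timeout);

            char buf[64];
            ssize_t n = read(sv[0], buf, sizeof(buf));

            /* prints "select() = 1, read() = 0" on every iteration */
            printf("select() = %d, read() = %zd\n", nready, n);
        }

        close(sv[0]);
        return 0;
    }

A loop that treats only read() < 0 as a stop condition therefore never terminates once the peer closes the socket, which matches the strace output quoted in the description.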
2018-10-24 18:59:08 Andreas Hasenack description

Old value: (the previous description, as shown above)

New value:

[Impact]
The rimap authentication mechanism in saslauthd can hit a condition where it starts spinning, using all available CPU. This condition is easily encountered when an authentication is in progress while the IMAP service is being restarted. Once the saslauthd daemon enters this loop, it stops responding to authentication requests.

[Test Case]
This test can be performed in a LXD container or a VM.

* Install the needed packages. mail-stack-delivery is used to provide an IMAP server on localhost that needs no further configuration. Accept the defaults for all debconf prompts:

    sudo apt update
    sudo apt install sasl2-bin mail-stack-delivery

* Set the password "ubuntu" for the ubuntu user:

    echo ubuntu:ubuntu | sudo chpasswd

* Start saslauthd like this:

    sudo /usr/sbin/saslauthd -a rimap -O localhost -r -n 5

* Restart dovecot:

    sudo service dovecot restart

* Test saslauthd authentication:

    $ sudo testsaslauthd -u ubuntu -p ubuntu
    0: OK "Success."

* Now let's break it. In one terminal, watch the output of top:

    top

* In another terminal, run the following:

    sudo testsaslauthd -u ubuntu -p ubuntu & sleep 1; sudo service dovecot stop

* Observe in the "top" terminal that saslauthd is consuming a lot of CPU. If that's not happening, try starting dovecot again and adjusting the sleep value in the previous test command.

* Try authenticating again while saslauthd is spinning. It should fail:

    $ sudo testsaslauthd -u ubuntu -p ubuntu
    0: NO "authentication failed"

[Regression Potential]
(template boilerplate, unchanged)

[Other Info]
(template boilerplate, unchanged)

[Original Description]
(unchanged, as shown above)
2018-10-24 19:07:28 Andreas Hasenack description

Old value: (the previous description, as shown above)

New value: the same description, except that the sentence "Once the saslauthd daemon enters this loop, it stops responding to authentication requests." is removed from [Impact], and the final "Try authenticating again" step is removed from [Test Case].
2018-10-24 19:10:16 Andreas Hasenack description

Old value: (the previous description, as shown above)

New value: the same description, with the following changes.

[Impact] gains a closing paragraph:

Furthermore, the saslauthd child process that picked up that authentication request, and that is now spinning, won't be reaped, nor can it service further requests. If all children are left in this state, the authentication service as a whole will no longer work.

[Test Case] now starts saslauthd with just one child:

    sudo /usr/sbin/saslauthd -a rimap -O localhost -r -n 1

and adds a final step:

* Repeat the authentication request. Since the only saslauthd child is now spinning, this will block:

    $ sudo testsaslauthd -u ubuntu -p ubuntu
    <blocks>
2018-10-24 20:36:52 Andreas Hasenack cyrus-sasl2 (Ubuntu Trusty): status Triaged Invalid
2018-10-24 20:50:55 Launchpad Janitor merge proposal linked https://code.launchpad.net/~ahasenack/ubuntu/+source/cyrus-sasl2/+git/cyrus-sasl2/+merge/357779
2018-10-24 20:55:17 Andreas Hasenack description

Old value: (the previous description, as shown above)

New value: the same description, with two adjustments to [Test Case]: the "adjusting the sleep value" note now ends with ", but 1s was enough in all my runs", and the final step now restarts dovecot first:

* Start dovecot and repeat the authentication request. Since the only saslauthd child is now spinning, this will block:

    sudo service dovecot start
    $ sudo testsaslauthd -u ubuntu -p ubuntu
    <blocks>
2018-10-24 21:05:22 Andreas Hasenack description

Old value: (the previous description, as shown above)

New value:

[Impact]
The rimap authentication mechanism in saslauthd can hit a condition where it starts spinning, using all available CPU. This condition is easily encountered when an authentication is in progress while the IMAP service is being restarted. Furthermore, the saslauthd child process that picked up that authentication request, and that is now spinning, won't be reaped, nor can it service further requests. If all children are left in this state, the authentication service as a whole will no longer work.

[Test Case]
This test can be performed in a LXD container or a VM.

* Install the needed packages. mail-stack-delivery is used to provide an IMAP server on localhost that needs no further configuration. Accept the defaults for all debconf prompts:

    sudo apt update
    sudo apt install sasl2-bin mail-stack-delivery

* Set the password "ubuntu" for the ubuntu user:

    echo ubuntu:ubuntu | sudo chpasswd

* Start saslauthd like this, with just one child:

    sudo /usr/sbin/saslauthd -a rimap -O localhost -r -n 1

* Restart dovecot:

    sudo service dovecot restart

* Test saslauthd authentication:

    $ sudo testsaslauthd -u ubuntu -p ubuntu
    0: OK "Success."

* Now let's break it. In one terminal, watch the output of top:

    top

* In another terminal, run the following:

    sudo testsaslauthd -u ubuntu -p ubuntu & sleep 1; sudo service dovecot stop

* Observe in the "top" terminal that saslauthd is consuming a lot of CPU. If that's not happening, try starting dovecot again and adjusting the sleep value in the previous test command, but 1s was enough in all my runs.

* Start dovecot and repeat the authentication request. Since the only saslauthd child is now spinning, this will block:

    sudo service dovecot start
    $ sudo testsaslauthd -u ubuntu -p ubuntu
    <blocks>

[Regression Potential]
This fix relies on read(2) returning zero bytes when the connection is dropped, which is clearly documented in its manpage: "On success, the number of bytes read is returned (zero indicates end of file)". The select(2) manpage also documents this as a valid case for a descriptor being ready for a read that won't block: "The file descriptors listed in readfds will be watched to see if characters become available for reading (more precisely, to see if a read will not block; in particular, a file descriptor is also ready on end-of-file)". This patch is what was used upstream, and it is also present in bionic. I can't think of regressions specific to this change, other than the usual risk of rebuilding a widely used library (sasl2) in an environment different from the one in which xenial was released, i.e., different libraries available, maybe different system-wide build options, etc.

[Other Info]
Trusty still does not account for read() returning zero as an end-of-file case, but the loop there has a counter and eventually exits, so it does not lead to a perpetual spin or high CPU usage (see comment #17 for a brief history of how this fix was dropped in the xenial package). The fix is simple and could be applied there as well, if the SRU team prefers.

[Original Description]
(unchanged, as shown above)
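Given the [Regression Potential] notes above, the shape of the fix is to treat a zero-byte read() as end-of-file and leave the loop. A minimal sketch of the corrected loop, added here for illustration only and based on the snippet quoted in the description (the actual upstream patch to auth_rimap.c may differ in detail):

    while( select (fds, &perm, NULL, NULL, &timeout ) > 0 ) {
        if ( FD_ISSET(s, &perm) ) {
            ret = read(s, rbuf+rc, sizeof(rbuf)-rc);
            if ( ret < 0 ) {
                /* read error */
                rc = ret;
                break;
            } else if ( ret == 0 ) {
                /* EOF: the remote peer closed the connection; stop
                   polling instead of spinning on a dead socket */
                break;
            } else {
                rc += ret;
            }
        }
    }

With the extra branch, a peer-closed socket ends the loop on the first zero-byte read instead of making select() and read() succeed forever, which removes the busy-spin described in this bug.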
2018-10-24 21:05:32 Andreas Hasenack cyrus-sasl2 (Ubuntu Xenial): status Triaged In Progress
2018-10-29 22:11:05 Brian Murray cyrus-sasl2 (Ubuntu Xenial): status In Progress Fix Committed
2018-10-29 22:11:07 Brian Murray bug added subscriber Ubuntu Stable Release Updates Team
2018-10-29 22:11:11 Brian Murray bug added subscriber SRU Verification
2018-10-29 22:11:16 Brian Murray tags server-next server-next verification-needed verification-needed-xenial
2018-11-14 17:50:24 Andreas Hasenack tags server-next verification-needed verification-needed-xenial server-next verification-done-xenial verification-needed
2018-11-19 14:15:32 Launchpad Janitor cyrus-sasl2 (Ubuntu Xenial): status Fix Committed Fix Released
2018-11-19 14:15:39 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team