pam update causes cron to stop working with "Module is unknown" error

Reported by Patric on 2011-05-31
This bug affects 76 people
Affects        Importance  Assigned to
pam (Ubuntu)   Critical    Marc Deslauriers
Hardy          Critical    Unassigned
Lucid          Critical    Unassigned
Maverick       Critical    Unassigned
Natty          Critical    Unassigned

Bug Description

After upgrading libpam-modules from 1.1.1-4ubuntu1 to 1.1.1-4ubuntu2.2, cron stopped working and only logs the message "Module is unknown". This happened during unattended-upgrades last night, so a lot of people may not have noticed yet.
Downgrading to 1.1.1-4ubuntu1 fixes this.
Ubuntu 10.10 amd64, almost vanilla, fresh, minimum install + java (ppa) + postgresql9 (ppa).

Patric (patric-bechtel) on 2011-05-31
description: updated
sun (uwe-dormann) wrote :

I get the same message on 10.04 (Lucid) after running apt-get upgrade today. It pulled

bind9 bind9-host bind9utils dnsutils libbind9-60 libdns64 libisc60 libisccc60 libisccfg60 liblwres60 libpam-cracklib libpam-modules libpam-runtime libpam0g linux-libc-dev

The "Module is unknown" error starts to show up the minute the update has been completed.

Oliver (oliver341) wrote :

Same here. Also get this in auth.log:

May 31 10:26:01 server CRON[18212]: PAM unable to dlopen(/lib/security/pam_env.so): /lib/libpam.so.0: version `LIBPAM_MODUTIL_1.1.3' not found (required by /lib/security/pam_env.so)
May 31 10:26:01 server CRON[18212]: PAM adding faulty module: /lib/security/pam_env.so
May 31 10:26:01 server CRON[18212]: pam_unix(cron:session): session opened for user root by (uid=0)
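
To check whether a machine is hitting the same failure, a log like the one above can be searched for the "adding faulty module" message. The following is only a minimal sketch, not an official tool: the function name pam_faulty_modules is made up for illustration, and the log location (/var/log/auth.log in the usage note) is an assumption that may differ between systems.

```shell
#!/bin/sh
# Hypothetical helper: list the PAM modules that failed to load,
# one path per line, de-duplicated.
# $1: log file to scan (the path is supplied by the caller).
pam_faulty_modules() {
    # Keep only the module path that follows "PAM adding faulty module: ".
    sed -n 's/.*PAM adding faulty module: \(.*\)$/\1/p' "$1" | sort -u
}
```

Run as `pam_faulty_modules /var/log/auth.log`; for the excerpt above it would print /lib/security/pam_env.so.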

sun (uwe-dormann) wrote :

As a stopgap I ran

apt-cache policy libpam-modules

which showed which older versions are still available on my system, so that I could then run

apt-get install libpam-modules=1.1.1-2ubuntu2

to roll back from 1.1.1-2ubuntu5.2 to 1.1.1-2ubuntu2 on my Lucid system.

Marc Deslauriers (mdeslaur) wrote :

Simply restarting the cron daemon should be sufficient.

I am preparing updated pam packages to fix this issue.

Changed in pam (Ubuntu):
status: New → Confirmed
assignee: nobody → Marc Deslauriers (mdeslaur)
importance: Undecided → High
Darren Worrall (dazworrall) wrote :

Just to confirm that restarting cron was enough for us as well, affects at least 10.04 and 10.10.

Oliver (oliver341) wrote :

Restarting cron solved the issue for me too (10.04).

Patric (patric-bechtel) wrote :

Yup. That does the trick. Thanks a lot.

This hit us too on 10.04.2. Restarting cron seems to have fixed things...

Hark (ubuntu-komkommerkom) wrote :

Restarting the cron daemon also worked for me. By the way I also had to restart atd where I use it.

This must be one of the worst bugs in years on an LTS version. Just imagine how many backup scripts won't run due to this bug. And it won't be fixed automatically, as unattended-upgrades won't be started either...

Martin (martin00) wrote :

Again a bug that shouldn't happen. I had very bad bugs with apparmor and kvm and now this. I'm very sad about the quality of Ubuntu and I'm now thinking of switching back to Debian (60 machines).

Colin Watson (cjwatson) on 2011-05-31
Changed in pam (Ubuntu):
importance: High → Critical
Colin Watson (cjwatson) wrote :

I have removed the affected versions from {hardy,lucid,maverick,natty}-{security,updates} (pending the next publisher run). Our sysadmins are going to block downloads of those versions to try to minimise further problems.

summary: - cron gives "Module is unknown" in syslog, stops working
+ pam update causes cron to stop working with "Module is unknown" error

Issue occurred upgrading to 1.1.1-2ubuntu5.2 on lucid. Restarting cron solves the problem.

Colin, that seems like a horrible solution - you now have everybody's apt claiming they need to get a package that 404s, causing apt-get upgrade to fail. I understand a buggy package was released, but once that has been done, you can't just remove the package itself (while leaving reference to it as "here's the latest version of libpam")...

If a fixed version isn't quickly on the way, then if nothing else you should release 1.1.1-2ubuntu2 as 1.1.1-2ubuntu5.3 or whatever, so that people who have already installed 1.1.1-2ubuntu5.2 can upgrade to a working version, and people with 1.1.1-2ubuntu2 aren't prompted to install a 1.1.1-2ubuntu5.2 that doesn't exist...

I'll add the caveat that I'm not intimately familiar with the release process for Ubuntu, but it seems that once a release is publicly accessible, you can't just roll back - you have to move forward to keep in line with every possible "path" (in this case, people who already installed the buggy 1.1.1-2ubuntu5.2 and people with 1.1.1-2ubuntu2 who are now being prompted to upgrade to 1.1.1-2ubuntu5.2).

Marc Deslauriers (mdeslaur) wrote :

Fixed pam packages are being prepared, will go through QA, and will be released in the next few hours.

Cliffm (c2mcatee) wrote :

Natty update manager says 'Failed to download package files' 'check Internet connection'
these are the details

Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/pam/libpam0g_1.1.2-2ubuntu8.2_amd64.deb 404 Not Found [IP: 91.189.92.167 80]
Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/pam/libpam-modules-bin_1.1.2-2ubuntu8.2_amd64.deb 404 Not Found [IP: 91.189.92.167 80]
Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/pam/libpam-modules_1.1.2-2ubuntu8.2_amd64.deb 404 Not Found [IP: 91.189.92.167 80]
Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/pam/libpam-runtime_1.1.2-2ubuntu8.2_all.deb 404 Not Found [IP: 91.189.92.167 80]

Chris Siebenmann (cks) wrote :

This PAM issue also affects xdm, which no longer allows people to log in
(it syslogs the same error message). This has caused us serious problems
on our multiuser login servers, because of course we cannot simply
reboot the machines and restarting xdm has the pleasant side effect of
instantly logging off all xdm-based users.

It has probably broken other daemons as well, some of them depending
on details of their configuration.

Marc Deslauriers (mdeslaur) wrote :

@Chris: in your situation, downgrading the package to the previous version should work around the issue until we publish updated pam packages.

Colin Watson (cjwatson) wrote :

Vijay: we've gone over this at enormous length in the past, and settled on this as standard process for sufficiently bad problems, on the grounds that even though it does cause 404s it's better than the breakage.

Colin Watson (cjwatson) wrote :

Vijay: and obviously this is only a temporary solution until a proper fix is published, which indeed will go forward in version numbers.

Chris Siebenmann (cks) wrote :

@Marc: the previous libpam version (1.1.1-2ubuntu5 for 10.04 LTS)
doesn't seem to be available any more, or at least 'apt-get' can't
find it, which makes downgrading hard. We would have to roll all the
way back to 1.1.1-2ubuntu2 ... which is missing a root escalation CVE
(CVE-2010-0832, root priv escalation via symlink following). This is
not something we are in a position to do on multiuser systems.

(Nor is it in /var/cache/apt/archives on our machines.)

I am trying 'apt-get -u install "libpam-modules=1.1.1-2ubuntu5"'.
Possibly this is the wrong thing.

Colin Watson (cjwatson) wrote :

Chris: If necessary, previous versions can always be downloaded manually from Launchpad: https://launchpad.net/ubuntu/lucid/+source/pam/1.1.1-2ubuntu5.1 and follow the appropriate link under "Builds". (Obviously not ideal but may help as a stopgap.)

Kate Stewart (kate.stewart) wrote :

Adding in the set of affected series, for tracking purposes.

Changed in pam (Ubuntu Hardy):
status: New → Confirmed
Changed in pam (Ubuntu Lucid):
status: New → Confirmed
Changed in pam (Ubuntu Maverick):
status: New → Confirmed
Changed in pam (Ubuntu Natty):
status: New → Confirmed
Changed in pam (Ubuntu Hardy):
importance: Undecided → Critical
Changed in pam (Ubuntu Lucid):
importance: Undecided → Critical
Changed in pam (Ubuntu Maverick):
importance: Undecided → Critical
Changed in pam (Ubuntu Natty):
importance: Undecided → Critical
Paul Boven (p-boven) wrote :

Just for the record: pulling the pam packages like this completely breaks the network-based install I'm doing from an official Ubuntu mirror, because of the 404-errors.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pam - 1.1.2-2ubuntu8.3

---------------
pam (1.1.2-2ubuntu8.3) natty-security; urgency=low

  * SECURITY REGRESSION:
    - debian/patches/security-dropprivs.patch: updated patch to preserve
      ABI and prevent daemons from needing to be restarted. (LP: #790538)
    - debian/patches/autoconf.patch: refreshed
 -- Marc Deslauriers <email address hidden> Tue, 31 May 2011 05:48:25 -0400

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pam - 1.1.1-4ubuntu2.3

---------------
pam (1.1.1-4ubuntu2.3) maverick-security; urgency=low

  * SECURITY REGRESSION:
    - debian/patches/security-dropprivs.patch: updated patch to preserve
      ABI and prevent daemons from needing to be restarted. (LP: #790538)
    - debian/patches/autoconf.patch: refreshed
 -- Marc Deslauriers <email address hidden> Tue, 31 May 2011 06:48:32 -0400

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pam - 1.1.1-2ubuntu5.3

---------------
pam (1.1.1-2ubuntu5.3) lucid-security; urgency=low

  * SECURITY REGRESSION:
    - debian/patches/security-dropprivs.patch: updated patch to preserve
      ABI and prevent daemons from needing to be restarted. (LP: #790538)
    - debian/patches/autoconf.patch: refreshed
 -- Marc Deslauriers <email address hidden> Tue, 31 May 2011 07:07:44 -0400

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pam - 0.99.7.1-5ubuntu6.4

---------------
pam (0.99.7.1-5ubuntu6.4) hardy-security; urgency=low

  * SECURITY REGRESSION:
    - debian/patches/security-dropprivs.patch: updated patch to preserve
      ABI and prevent daemons from needing to be restarted. (LP: #790538)
    - debian/patches/autoconf.patch: refreshed
 -- Marc Deslauriers <email address hidden> Tue, 31 May 2011 07:32:03 -0400

Changed in pam (Ubuntu Hardy):
status: Confirmed → Fix Released
Changed in pam (Ubuntu Lucid):
status: Confirmed → Fix Released
Changed in pam (Ubuntu Maverick):
status: Confirmed → Fix Released
Changed in pam (Ubuntu Natty):
status: Confirmed → Fix Released
Colin Watson (cjwatson) wrote :

Paul: understood, and we knew it was a trade-off. It will be fixed very soon.

Changed in pam (Ubuntu):
status: Confirmed → Fix Released
Michael Hanson (mhanson) wrote :

@Marc: Thanks. All is well. People need to RELAX. S*#@ happens sometimes and for this to pop up and be fixed so quickly is brilliant work from the Ubuntu maintainers. It's a bit juvenile to expect perfection when it exists nowhere else and especially because no one here paid for Ubuntu.

Thanks again for all the hard work.

Cheers

Mike

Jacob Winski (winski) wrote :

I can confirm the bug.

I can confirm that restarting cron works.

I can confirm that after applying the newest pam related packages updated 19 minutes ago cron is working as expected.

Not the best confirmation since I restarted cron prior to applying the fix, but there you go.

Thank you everyone for fixing the problem so quickly!

Chris Siebenmann (cks) wrote :

We have an Ubuntu 10.04 machine with the original update applied where
xdm had not been restarted and was thus rejecting login attempts.
I can confirm that applying the just-released PAM update made xdm
accept logins again. The update also did not break cron, which had been
restarted and so was using all of the updated PAM stuff.

Gabriel Galibourg (ggali66) wrote :

For those who haven't restarted cron:
As expected, since cron is dead, unattended-upgrade does not start, so the fix is not installed automatically.
apt-get update + upgrade does work, and cron does not need a restart afterwards.
But it still means manual intervention to fix the problem, great ... not :-(
May I dare suggest an improvement in QA so that bugs like this don't slip through again.
Thanks to the team for quickly resolving the issue.

Steve Sutton (sutton) wrote :

update + upgrade works (i.e. fixed cron) on 10.04 server LTS, on a cron that had been broken, but NOT manually restarted.

yakupm (lvu486c02) wrote :

I agree with Mike - relax and appreciate the fact that the solution was quickly found and made available.

Yak

Peter Odding (peterodding) wrote :

I appreciate that the problem has already been solved (I'm glad) but am still considering whether to nominate this as the biggest fuck-up since the Debian SSH/SSL key fiasco. If this truly affects every server Ubuntu install out there using unattended upgrades (it affected my three servers running Ubuntu 10.04 with unattended upgrades), this bug just broke thousands of backup cron jobs all over the world and the worst thing is it requires manual intervention from a sysadmin to get it going again :-S.

On Wed, Jun 01, 2011 at 05:11:15PM -0000, Peter Odding wrote:
> I appreciate that the problem has already been solved (I'm glad) but am
> still considering whether to nominate this as the biggest fuck-up since
> the Debian SSH/SSL key fiasco.

This is a bigger fuq-up than the SSH key fiasco. The SSH key fiasco
did not actually break anything, and it also had nothing to do with
Ubuntu, as such.

> If this truly affects every server Ubuntu install out there using
> unattended upgrades (it affected my three servers running Ubuntu
> 10.04 with unattended upgrades), this bug just broke thousands of
> backup cron jobs all over the world and the worst thing is it
> requires manual intervention from a sysadmin to get it going again
> :-S.

It is worse than you are describing. Many servers, mine included, run
everything, application-wise, from cron jobs.

I personally run five distinct categories of things:

1) My application stuff that makes money for me
2) System updates
3) Reboots
4) Backups
5) Problem detection and notification

from cron jobs. When cron stopped working, my apps would not run, the
automatic updates stopped, reboots stopped, and my notification system
stopped as well. So, problems that are self repairing normally, would
not self repair.

What this means is that not only my stuff stopped working, but also
self repair stopped, and I did not know about this problem since my
notification and problem detection system ceased to work, as well.

IOW, I was completely and totally screwed.

The only thing where I lucked out, is that my main webserver that
serves algebra.com, only does system updates on Sundays. If that one
went down, I would have lost hundreds of $$$.

I am aware that "stuff happens", and I appreciate a quick fix, but
this was a truly momentous error.

i

Peter Odding (peterodding) wrote :

@ubuntu devs: Since this has the potential to break lots of servers in various nasty ways it might (?) be wise to post a heads up to a mailing list that's (hopefully) followed by lots of sysadmins, like ubuntu-security-announce. I'm guessing there's a whole policy about what should and should not be submitted to ubuntu-security-announce but, well, it was just a suggestion :-).

PS. If such a message has already been sent but I didn't see it because I'm only following ubuntu-security-announce please ignore my suggestion.

Oliver (oliver-assarbad) wrote :

Thanks for the fix. Much appreciated! :)

I noticed it because some of my machines (I run Ubuntu 10.04, 11.04 and Debian 6) stopped sending logcheck reports. I only had time to look into it today, though. It wasn't as critical for me, though the staging area of the server that shovels backups off-site is now a bit fuller than usual ;)

Thanks again.

Jamie Strandboge (jdstrand) wrote :

I would like to briefly follow up to let people know that regressions are treated very seriously in Ubuntu. Regressions are closely examined to identify areas of improvement going forward, and as such, we have created a public incident report in:
https://wiki.ubuntu.com/IncidentReports/2011-05-31-pam-security-update-breaks-cron

Full details can be seen in that report, but here is a quick summary of what happened after the regression was found:
 * mirroring was stopped
 * the regressed packages were removed from the Ubuntu archive
 * the cause of the regression was identified and updates prepared
 * the fixed packages were built and verified to correct the issue and then were published to the archive
 * once it was established that mirroring could be safely re-enabled, it was
 * an email was sent to ubuntu-security-announce (https://lists.ubuntu.com/archives/ubuntu-security-announce/2011-May/001341.html)
 * the Ubuntu website was updated (http://www.ubuntu.com/usn/usn-1140-2/)

We are still conducting a post-mortem of the incident and identifying areas of improvement so this does not happen again. One improvement that has already been made is we have adjusted our pam test scripts to catch this problem in the future.

We apologize for the inconvenience.

Mike (mike-n) wrote :

Anyone doing unattended upgrades in production opens themselves up to this sort of risk. Sure it was a screwup by Ubuntu, but it would have been easily resolved if you'd done upgrades in a scheduled maintenance window. All aboard the failboat.

Chris Siebenmann (cks) wrote :

Not all aspects of this PAM failure can be fixed easily, since daemons
other than cron were also affected. Cron can be restarted without user
impact, but something like xdm cannot be.

(Since we got hurt by the xdm issue, not by the cron issue, I am
rather sensitive about this.)

I personally think that we were quite lucky that the Ubuntu sshd was not
affected by this, although it uses PAM. (I am not sure why it was unaffected,
since sshd has libpam loaded and requires pam_env.so in a normal Ubuntu
config.)

I also hope that Ubuntu will conduct a real root cause analysis on this.
'Cron locked up after a PAM upgrade' is only the surface problem and
addressing and detecting just it would be only addressing the most
obvious symptom of the real problem. The real problem was 'a PAM
update was not ABI compatible'; the broken processes were simply
a symptom of this. I would like Ubuntu to change things so that they
detect this if it happens again, not merely instances of the root
cause where cron helpfully locks up.
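
A check along these lines could compare the symbol-version tags that the new PAM modules require with the tags that the previously shipped libpam provides; the log earlier in this report shows exactly such a gap (LIBPAM_MODUTIL_1.1.3). The sketch below is an assumption about how that could be done, not the test Ubuntu actually added. The function name missing_version_tags is hypothetical, and it expects two plain-text lists (one tag per line) that the caller has extracted beforehand, for example from the "Version definitions" and "Version References" sections of `objdump -p` output.

```shell
#!/bin/sh
# Hypothetical helper: print every version tag that is required but not provided.
# $1: file listing the tags provided by the OLD libpam, one per line
# $2: file listing the tags required by the NEW modules, one per line
# Exit status is 0 when at least one required tag is missing, 1 when none is.
missing_version_tags() {
    # -F: fixed strings, -x: whole-line match, -v: invert the match,
    # so only required tags with no exact counterpart in $1 are printed.
    grep -Fxv -f "$1" "$2"
}
```

Any output means that a process still running with the old library could fail to dlopen the new modules until it is restarted.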

Jamie - thanks for the report - and thanks for the great product, and thanks to all who take the heat when this happens - really appreciated.

KAMI (kami911) wrote :

Update applied, seems okay.

Cam Hutchison (camh) wrote :

Chris - sshd kept working because it re-execs itself when doing auth. This means it got the new libpam.so and was able to dlopen pam_env.so without the symbol version mismatch.
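
Following that explanation, long-running daemons that loaded libpam before the upgrade are the ones at risk. One rough way to find them is to look for a libpam mapping marked "(deleted)" in /proc/<pid>/maps. This is a sketch under assumptions: it is Linux-specific, it assumes the upgrade replaced the library file on disk so that the old copy shows up as deleted, and both function names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a maps-style file contains a
# libpam mapping whose backing file has been deleted.
# $1: path to a file in /proc/<pid>/maps format
maps_has_stale_libpam() {
    grep -Eq 'libpam\.so[^ ]* \(deleted\)$' "$1"
}

# Hypothetical helper: print the PID of every process that still maps a
# deleted libpam. Reading other users' maps files usually needs root.
list_stale_libpam_pids() {
    for maps in /proc/[0-9]*/maps; do
        [ -r "$maps" ] || continue
        if maps_has_stale_libpam "$maps" 2>/dev/null; then
            pid=${maps#/proc/}
            echo "${pid%/maps}"
        fi
    done
}
```

Calling `list_stale_libpam_pids` as root gives a list of candidates to restart; whether a given daemon actually misbehaves still depends on its PAM configuration.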

Morten Kjeldgaard (mok0) wrote :

Any system has its weak spots. This incident is an illustration that cron is such a weak spot and breakage can be devastating, and as Chris wrote in #41, it is lucky that sshd did not break, or things would have been a lot worse.

Offhand, I see two possible solutions:

  * cron should be built with its own PAM security system, not relying on other system components
  * a backup system could register if cron is not running, and take appropriate action, maintaining vital tasks until cron is working again.
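
The second idea can be approximated with a heartbeat: a cron job records the current time, and an independent checker raises an alarm when the record grows too old. The sketch below is one possible shape for it; the function name, the heartbeat path and the crontab entry are all hypothetical.

```shell
#!/bin/sh
# Cron side (hypothetical crontab entry, runs every minute):
#   * * * * * date +\%s > /var/run/cron-heartbeat

# Hypothetical helper: succeed (exit 0) if the heartbeat is stale.
# $1: heartbeat file containing a Unix timestamp written by the cron job
# $2: maximum allowed age in seconds
# $3: current time as a Unix timestamp (optional, defaults to now)
heartbeat_is_stale() {
    hb_file=$1
    max_age=$2
    now=${3:-$(date +%s)}
    # A missing or unreadable heartbeat counts as stale.
    [ -r "$hb_file" ] || return 0
    last=$(cat "$hb_file")
    [ $((now - last)) -gt "$max_age" ]
}
```

The checker must not itself be started by cron (a separate long-running loop or a remote host would do), because, as described earlier in this thread, notification jobs that run from cron stop together with it.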

Tomofumi (tomofumi) wrote :

Apart from crond, anacron seems to have been hung since May 31 too, and there is a defunct apt process. What should I do, or does it require a reboot?

root 22114 1 0 May31 ? 00:00:00 /bin/sh -c test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
root 22115 22114 0 May31 ? 00:00:00 run-parts --report /etc/cron.daily
root 22118 22115 0 May31 ? 00:00:00 [apt] <defunct>

Marc Deslauriers (mdeslaur) wrote :

I'm not sure why those jobs would have hung. I would kill them and restart the anacron daemon.

This seems to have reared its head again on Ubuntu 8.04 (and Gentoo!) as part of an update on 11 June.
