cron daemon caches user-non-existent lookup results, causing "ORPHAN" message and skipping jobs for all LDAP/NIS-defined users

Bug #27520 reported by Tessa
This bug affects 14 people
Affects          Status        Importance  Assigned to   Milestone
cron (Debian)    Fix Released  Undecided   Unassigned
cron (Ubuntu)    Fix Released  Medium      Unassigned
  Lucid          Fix Released  Medium      Barry Warsaw
  Maverick       Won't Fix     Medium      Unassigned
  Natty          Won't Fix     Medium      Unassigned

Bug Description

== SRU Justification ==

* Impact: users defined in remote user databases such as LDAP cannot run their cron jobs: their crontabs are marked as orphaned unless cron is restarted. The impact is severe for users who rely on cron and use LDAP.

* Fix:
  The fix was implemented in Fedora's cronie. It keeps a list of orphaned crontabs, i.e. crontabs whose owner could not be found, and re-checks them so that they are adopted again once the owner can be resolved.

* Test case:
  How to reproduce:
     1. Setup an LDAP remote directory and add a user to test.
     2. Create a crontab for this user with some jobs.
     3. After a reboot, this user's jobs are marked as orphaned.
  Expected results:
     - the crontab is read and its jobs run when scheduled.
  Actual results:
     - the crontab and its jobs are marked as orphaned.

* Regression potential: very minimal. The fix only adds a list of orphaned crontabs, which are re-checked when necessary and adopted again once their owner is found.
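The orphan-list approach described under Fix and Regression potential can be modelled in a few lines of shell. This is a hypothetical sketch, not cronie's C code: `load_crontab`, `check_orphans`, `ACTIVE` and `ORPHANS` are made-up names, and `getent passwd` stands in for cron's getpwnam() call.

```sh
#!/bin/sh
# Hypothetical model of the cronie-style orphan list. Crontabs whose owner
# cannot be resolved are parked instead of being forgotten, and are
# re-checked later so they can be adopted once NSS knows the owner.

ACTIVE=""   # owners whose crontabs are loaded (space-separated)
ORPHANS=""  # owners whose crontabs are parked as orphans

# Stand-in for getpwnam(): succeeds if NSS resolves the user name.
user_exists() {
    getent passwd "$1" >/dev/null 2>&1
}

# Called when a crontab is read, e.g. at daemon startup.
load_crontab() {
    if user_exists "$1"; then
        ACTIVE="$ACTIVE $1"
    else
        echo "($1) ORPHAN (no passwd entry)"
        ORPHANS="$ORPHANS $1"
    fi
}

# Called periodically (e.g. once a minute): retry every orphan and move
# the owners that now resolve back into the active set.
check_orphans() {
    still_orphaned=""
    for u in $ORPHANS; do
        if user_exists "$u"; then
            ACTIVE="$ACTIVE $u"
        else
            still_orphaned="$still_orphaned $u"
        fi
    done
    ORPHANS="$still_orphaned"
}
```

If nslcd is not answering yet at boot, `load_crontab ldapuser` (a placeholder name) parks that crontab; once LDAP lookups work, the next `check_orphans` call adopts it without a cron restart, while a user who really was deleted simply stays in the orphan list.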

* Original bug description:

We had a server which was happily running Hoary. It authenticated to our AD2003
domain using winbind, and winbind was in the nsswitch.conf. However, after
upgrading to Breezy, cron no longer works properly, in that it doesn't respect
accounts from winbind as being valid accounts. My logs are filling up with
messages like:

Dec 22 09:52:01 thorin /usr/sbin/cron[28207]: (user1) ORPHAN (no passwd entry)
Dec 22 09:52:01 thorin /usr/sbin/cron[28207]: (user2) ORPHAN (no passwd entry)
Dec 22 09:55:01 thorin /usr/sbin/cron[28207]: (user3) ORPHAN (no passwd entry)
Dec 22 09:55:01 thorin /usr/sbin/cron[28207]: (user4) ORPHAN (no passwd entry)

If you do "id user1", their information shows up perfectly fine, so it seems
like cron has been changed to not respect this source of information.
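To see which owners cron has orphaned, and whether NSS can resolve them by now, one can scan syslog for ORPHAN lines like the ones above and re-check each name. A minimal sketch, assuming the Debian/Ubuntu syslog format shown here and that `getent passwd` goes through the same nsswitch.conf sources as `id`; `orphaned_users` and `report_orphans` are hypothetical helper names.

```sh
# Print the user names from "(user) ORPHAN (no passwd entry)" cron lines,
# one per line, without duplicates.
orphaned_users() {
    sed -n 's/.*cron\[[0-9]*]: (\([^)]*\)) ORPHAN (no passwd entry).*/\1/p' | sort -u
}

# For each orphaned user, report whether NSS resolves the name right now.
report_orphans() {
    orphaned_users | while read -r u; do
        if getent passwd "$u" >/dev/null 2>&1; then
            echo "$u resolvable"
        else
            echo "$u missing"
        fi
    done
}
```

For example, `report_orphans < /var/log/syslog`. A user reported as resolvable was most likely orphaned only because the directory service was not ready when cron looked it up.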

Matt Zimmerman (mdz)
Changed in cron:
assignee: nobody → adconrad
Tessa (unit3) wrote :

Just to get an update, is anyone looking into this for Dapper?

Tessa (unit3) wrote :

Has anyone looked into this at all, ever, since I filed the bug report? To me, this is a rather serious issue for those not using a default nsswitch config, and I'm sort of surprised that it's gone this long without any comments from anyone.

Adam Conrad (adconrad)
Changed in cron:
assignee: adconrad → nobody
marx (xgrac) wrote :

The bug is still there, but we have found a workaround: install nscd. This will cache your results and the problem will be 'solved' :)

Changed in cron:
status: New → Confirmed
Tessa (unit3) wrote :

Just found this problem on a new server, this time using ldap as a user source instead of winbind, on Debian/etch. This server is running nscd, and it's still having the same issue.

So: I don't think your workaround actually works, marx, at least not on my systems, and this problem appears to exist upstream as well.

It would be *really* nice if this could get fixed, so that cron was actually usable with centralized user management.

Tessa (unit3) wrote :

In the meantime, it looks like fcron doesn't exhibit this bug. It's not a drop-in replacement for vixie crond, but it looks like my users can just use "fcrontab -e" and it'll work.

Jean-Baptiste Lallement (jibel) wrote :

Thanks for your report. This is not a problem with cron (nor ldap or nsswitch) but with the pam module. I can't remember yet how we had solved it but nscd was part of the solution.

I'll let you know if I'm able to find out something.

Mark Sethi (mark-sethi) wrote :

We have the same symptom with jaunty (our mirror was last synced ~June 2009) with LDAP providing the non-local users. Restarting cron resolved the issue for us.

After some poking and prodding I found that cron was calling getpwnam for all the user crontabs and getting back NULL for all non-local users. If I understand how the resolver works correctly, when an application first makes a call to resolve something (name, remote host, etc), it checks /etc/nsswitch.conf and attempts to connect to the resolving mechanisms (file, nis/ldap, etc) and then you're stuck with whatever it found then. This means that if you change /etc/nsswitch.conf after you've started a program that needs to resolve hosts, users, groups, services, or whatnot, and it has attempted its first resolution, your changes will be ignored for the remainder of that process's life cycle.

My guess is that cron is likely coming up before networking is available on the machine and so LDAP (in our case) and winbind (in OP's) is not available initially and is thus ignored by cron until you restart it. I would imagine that nscd may resolve the issue for some if it is already running when cron starts since it will protect cron from this "feature" of libnss.

We've dealt with the issue by adding a cron job to /etc/crontab (or to a snippet in /etc/cron.d) like so:
@reboot root /bin/sleep 45 && /etc/init.d/cron restart

Pretty hackish but it appears to do the trick. Hopefully this is fixed with the move to upstart in karmic and beyond.
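A somewhat less timing-dependent variant of this workaround is to poll until NSS can actually resolve a directory-backed account, and only then restart cron. A minimal sketch, assuming `getent` is available; `wait_for_user` is a hypothetical helper and `ldapuser` a placeholder for an account that exists only in LDAP (or NIS/winbind).

```sh
# Hypothetical helper: poll NSS until USER resolves. Makes at most TRIES
# attempts (default 45, always at least one), sleeping 1 s between them.
# Returns 0 as soon as the user resolves, 1 if it never does.
wait_for_user() {
    user=$1
    tries=${2:-45}
    while :; do
        # getent consults the sources listed for passwd in nsswitch.conf
        getent passwd "$user" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] || return 1
        sleep 1
    done
}
```

A script run from the @reboot entry could then do `wait_for_user ldapuser 45; /etc/init.d/cron restart`, which restarts cron as soon as the account resolves instead of always sleeping 45 seconds. It is still a workaround: if the directory comes up after the retry budget, the crontab stays orphaned.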

Stephane Chazelas (stephane-chazelas) wrote :

Also affects Ubuntu lucid 10.04.

crontabs for LDAP users are ignored on boot. You need to edit the user's crontab or run "reload cron" for the LDAP users' crontabs to be loaded.

A work around could be to do a "reload cron" every time a network interface goes up or down.

I suppose the fix would be to load all the crontabs but to check for the presence of the user only upon running the jobs.

Christian Kastner (ckk) wrote :

This appears to be a variant of Debian bug #512757 against cron. Can somebody confirm this for me?

Stephane Chazelas (stephane-chazelas) wrote : Re: [Bug 27520] Re: cron no longer respects nsswitch.conf

> This appears to be a variant of Debian bug #512757 against cron. Can
> somebody confirm this for me?
[...]

Pretty much, though for me reordering the init scripts wasn't enough as
there was some delay between the time nslcd (the LDAP cache daemon) was
started and it being operational.

In any case we need a mechanism for cron to refresh its idea of which
users are valid at a given time. It already checks upon executing a cron
job whether the user is still there, it would also need to check regularly
(or upon notification of the NSS system) when new users come into life.

Or because it checks before executing the jobs anyway, cron could skip the
check for orphan crontabs at startup altogether (or maybe just issue a
warning that some crontabs could be cleaned up).
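Stephane's proposal, checking the owner when a job is due rather than when the crontab is loaded, could look roughly like the following. This is a hypothetical sketch: `job_due` and `run_due_jobs` are not cron functions, the "owner command" input format is made up for illustration, and a real implementation would run the command as the user instead of printing it.

```sh
# Hypothetical helper: decide, at the moment a job is due, whether its
# owner still resolves. Nothing is cached between runs.
job_due() {
    u=$1
    if getent passwd "$u" >/dev/null 2>&1; then
        return 0
    fi
    echo "($u) ORPHAN (no passwd entry), skipping this run" >&2
    return 1
}

# Hypothetical dispatch loop over due jobs, one "owner command..." per line.
# A real cron would run the command as the owner; this only prints it.
run_due_jobs() {
    while read -r owner cmd; do
        [ -n "$owner" ] || continue
        if job_due "$owner"; then
            echo "would run as $owner: $cmd"
        fi
    done
}
```

Because nothing is cached, a user who was missing at boot is picked up at the next due run. One trade-off: a user who really has been deleted now produces an ORPHAN line on every run rather than once.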

Christian Kastner (ckk) wrote :

On 09/02/2010 05:09 PM, Stephane Chazelas wrote:
>> This appears to be a variant of Debian bug #512757 against cron. Can
>> somebody confirm this for me?
> [...]
>
> Pretty much, though for me reordering the init scripts wasn't enough as
> there was some delay between the time nslcd (the LDAP cache daemon) was
> started and it being operational.

I'm just guessing here, but this may be a bug in the nslcd init script,
see the following discussion on debian-devel@:

http://lists.debian.org/debian-devel/2010/05/msg00130.html

ie, services not yet being fully operational when their init script
terminates.

> Or because it checks before executing the jobs anyway, cron could skip the
> check for orphan crontabs at startup altogether (or maybe just issue a
> warning that some crontabs could be cleaned up).

Once cron considers a crontab "broken", it is ignored until it is
modified again (checked by mtime).

One way this could be solved is similar to the solution in Debian bug
#433609. The fix there was to keep re-checking a dangling symlink until
it was fixed, even though the symlink's mtime never changed.

In that case, however, it was clearly cron's fault. Here, cron just
calls getpwnam(); I hesitate to add such a "fix" when the underlying
cause does not lie with cron.

What I will do is add nslcd to Required-Start in Debian cron's LSB init
headers (which would be sync'ed back to Ubuntu eventually), but for any
further changes I'd like to see all other possibilities ruled out (ie,
nslcd-signalling-ready-but-not-yet-operational).

Regards,
Christian

Launchpad Janitor (janitor) wrote : Re: cron no longer respects nsswitch.conf

This bug was fixed in the package cron - 3.0pl1-116ubuntu1

---------------
cron (3.0pl1-116ubuntu1) natty; urgency=low

  * Merge from debian unstable (LP: #696953), remaining changes:
    - debian/control:
      + Requires debhelper >= 7.3.15ubuntu2 (for Upstart).
      + Move MTA,lockfile-progs to Suggests field.
    - debian/cron.upstart: Add Upstart script.
    - debian/{prerm,postinst,postrm}:
      + Don't call update-rc.d,invoke-rc.d and
        /etc/init.d/cron.
    - debian/rules: Call dh_installinit to install Upstart job properly.

cron (3.0pl1-116) unstable; urgency=high

  * Upload with approval from Release Team to get RC bug fixes in Squeeze
    (see http://lists.debian.org/debian-release/2010/12/msg00719.html)
  * do_command.c, popen.c:
    - Use fork() instead of vfork().
  * do_command.c:
    - Close an unused stream in the fork()ed child prior to exec'ing the
      user's command, thereby avoiding an fd leak. Closes: #604181, LP: #665912
      Previously to this, in conjunction with LVM, the fd leak may have the
      effect of the user being spammed by warnings every time a cron job was
      executed.
  * crontab.5:
    - Fixed the example demonstrating how to run a job on a certain weekday of
      the month (date range was off-by-one). Also, the same example contained
      a superfluous escape, resulting in wrong output. Closes: #606325
  * cron.init:
    - Added $named to Should-Start, in case @reboot jobs need DNS resolution.
      Closes: #602903
    - Added nslcd to Should-Start. LP: #27520
 -- Lorenzo De Liso <email address hidden> Mon, 03 Jan 2011 20:32:01 +0100

Changed in cron (Ubuntu):
status: Confirmed → Fix Released
Matthieu Herrb (matthieu-herrb) wrote :

A pullup to 10.04 LTS would be welcome.

summary: - cron no longer respects nsswitch.conf
+ cron daemon starts before LDAP client, causing "ORPHAN" message for all
+ LDAP users
summary: cron daemon starts before LDAP client, causing "ORPHAN" message for all
- LDAP users
+ LDAP-defined users
Nathan Stratton Treadway (nathanst) wrote : Re: cron daemon starts before LDAP client, causing "ORPHAN" message for all LDAP-defined users

We're also running into this in Lucid -- each time (or at least most times) we reboot, we see messages like this in our syslog (with messages generated by other applications removed):

May 6 22:59:26 vm-76 cron[606]: (CRON) INFO (pidfile fd = 3)
May 6 22:59:26 vm-76 cron[617]: (CRON) STARTUP (fork ok)
May 6 22:59:26 vm-76 cron[617]: (LdapUser) ORPHAN (no passwd entry)
May 6 22:59:26 vm-76 cron[617]: (CRON) INFO (Running @reboot jobs)
May 6 22:59:28 vm-76 nslcd[968]: version 0.7.2 starting
May 6 22:59:28 vm-76 nslcd[968]: accepting connections
May 6 22:59:30 vm-76 nslcd[968]: [8b4567] connected to LDAP server ldap://ldap....

The "LdapUser"'s crontab is then disabled until we do a "restart cron" (or, presumably, the user does a "crontab -e" to touch his/her crontab file).

(/etc/nsswitch.conf contains the line "passwd: compat ldap", and the "libnss-ldapd" package is installed.)

Unfortunately, in Lucid cron has been switched to an Upstart job, but nslcd still uses an rc.d script, so I am not sure of the proper way to ensure that nslcd is started before the cron daemon....

Nathan Stratton Treadway (nathanst) wrote :

I see that LP: #605123 describes a similar situation when cron is started before the likewise-open daemons, while the Debian BTS 512757 mentioned above relates to NIS users, etc.

And even if the startup script order is fixed in all these cases, it's always possible that the LDAP (or whatever) server will be unreachable when a particular machine is started, which presumably could result in the same ORPHANing process happening for cron session....

All of which makes me think that perhaps Stephane Chazelas's idea (comment #10) about having cron check the validity of the user continually, rather than only at startup, makes more sense in modern network-based environments....

Christian Kastner (ckk) wrote : Re: [Bug 27520] Re: cron daemon starts before LDAP client, causing "ORPHAN" message for all LDAP-defined users

On 05/11/2011 12:16 AM, Nathan Stratton Treadway wrote:
> I see that LP: #605123 describes a similar situation when cron is
> started before the likewise-open daemons, while the Debian BTS 512757
> mentioned above relates to NIS users, etc.

I discussed this problem recently with one of the upstart developers.
Unfortunately we didn't get a fix in time for 11.04, but as I recall to
properly fix this we require features in upstart planned for 11.10.

> And even if the startup script order is fixed in all these cases, it's
> always possible that the LDAP (or whatever) server will be unreachable
> when a particular machine is started, which presumably could result in
> the same ORPHANing process happening for cron session....

Only when the crontab changes, thereby triggering a rescan by the daemon.

>
> All of which makes me think that perhaps Stephane Chazelas's idea
> (comment #10) about having cron check the validity of the user
> continually, rather than only at startup, makes more sense in modern
> network-based environments....

cron 3.0pl1-117, which is currently pending upload in Debian (after
which it will be sync'ed to Ubuntu), adds detection and recovery for
certain kinds of errors we were missing so far. Theoretically, this
could easily be extended to the ORPHAN case, but I'd have to give this
some more thought (eg: what if ORPHAN is justified, ie the user really
does not exist).

Matthias Andree (matthias-andree) wrote : Re: cron daemon starts before LDAP client, causing "ORPHAN" message for all LDAP-defined users

Please backport the fix to all affected and supported releases.

Changed in cron (Debian):
status: New → Confirmed
status: Confirmed → New
summary: - cron daemon starts before LDAP client, causing "ORPHAN" message for all
- LDAP-defined users
+ cron daemon starts before LDAP/NIS client, causing "ORPHAN" message for
+ all LDAP/NIS-defined users
Matthias Andree (matthias-andree) wrote : Re: cron daemon starts before LDAP/NIS client, causing "ORPHAN" message for all LDAP/NIS-defined users

Same problem applies to NIS configurations. cron starts early and checks users' existence before dhclient has even acquired the IP. Same symptoms on Lucid.

Matthias Andree (matthias-andree) wrote :

Note that while the trigger is cron starting before the respective user database (NIS, LDAP, whatever), cron should really re-check each time to be resilient to temporary network hiccups.

summary: - cron daemon starts before LDAP/NIS client, causing "ORPHAN" message for
- all LDAP/NIS-defined users
+ cron daemon caches user-non-existent lookup results, causing "ORPHAN"
+ message and skipping jobs for all LDAP/NIS-defined users
Nathan Stratton Treadway (nathanst) wrote : Re: [Bug 27520] Re: cron daemon starts before LDAP client, causing "ORPHAN" message for all LDAP-defined users

On Sat, May 14, 2011 at 23:48:56 -0000, Christian Kastner wrote:
> cron 3.0pl1-117, which is currently pending upload
> in Debian (after which it will be sync'ed to
> Ubuntu), adds detection and recovery for certain
> kinds of errors we were missing so far.
> Theoretically, this could easily be extended to the
> ORPHAN case, but I'd have to give this some more
> thought (eg: what if ORPHAN is justified, ie the
> user really does not exist).

I think the point here is that cron can't assume that
a crontab is really, permanently ORPHAN just because
the user can't be validated when cron first starts up.

Instead, cron needs to re-check the status of the
user each time it "considers" running a particular
crontab, in case the user has come into "existence"
since the last time it checked.

(If I have followed the program logic correctly [I
just took a quick look through the source for the
Lucid package], I think the opposite situation
can also cause problems.

That is, right now it appears that if an
LDAP/NIS/whatever user is deleted after cron has
already started up, cron will continue to try to run
the defined jobs for that user until it has some other
reason to reload the database. Presumably there will
be a PAM failure when trying to spawn the jobs as the
user in question, but it seems like it would be
"cleaner" to write an explicit log message saying that
the crontab's user was not found, and then completely
skip that user's crontab for that run....)

     Nathan

Nathan Stratton Treadway (nathanst) wrote :

Note that there may be two separate fixes involved here (for Lucid, at least; not sure about Maverick).

In the short term, how can we once again tweak the startup scripts so that cron is always started after nslcd/nis/likewise-open/whatever has already started (especially if the other daemon in question is still using an /etc/init.d/-style startup script)?

In the longer term, obviously it would be great for cron to better integrate with network-based user databases, but it would be nice to hear from some Ubuntu developers whether that sort of change is likely to get backported to Lucid?

Tessa (unit3) wrote :

Nathan: I think that since this bug sat untouched from 2005-2010, you can guess that the answer is going to be "No, don't care". I'm just happy if a fix eventually makes it into some Ubuntu release at some point in the /next/ 5 years.

Christian Kastner (ckk) wrote : Re: [Bug 27520] Re: cron daemon caches user-non-existent lookup results, causing "ORPHAN" message and skipping jobs for all LDAP/NIS-defined users

On 06/08/2011 12:53 AM, Graeme wrote:
> Nathan: I think that since this bug sat untouched from 2005-2010, you
> can guess that the answer is going to be "No, don't care".

If you look at the changelog for recent versions of cron, you can see
that somebody does care for bug reports.

> I'm just happy if a fix eventually makes it into some Ubuntu release at some point
> in the /next/ 5 years.

cron -117 (uploaded to Debian, will be sync'ed to Ubuntu soon) adds a
feature which greatly aids in the recovery of errors which were
previously fatal.

Theoretically, this could easily be extended to #27520 -- just a few
lines of code, actually -- but it's just not that simple, because there
are cases where ORPHAN is completely valid, ie the user really doesn't
exist.

All cron does is call getpwnam(), so it cannot differentiate between the
two cases. Were we to simply re-check the ORPHANS every time, we'd create
a bug-like situation for that other use case.

Anyway, I'll give -117 some time to settle, and will then revisit this
issue.

Tessa (unit3) wrote :

Christian: Sorry, I was being a little facetious. Yes, the recent traffic on this bug indicates things have changed with regards to these kinds of issues. That being said, it did take 5 years for anyone to bother looking into it. Forgive me if I'm a little pessimistic about the speed with which things will get resolved.

Regardless, it's been my experience that Ubuntu doesn't backport fixes of this nature to releases that have already shipped. I've found that the bugs must constitute a clear, widespread problem for a majority of users before a backport is even considered. That's more what I was speaking to than just a fix getting applied. If that policy's changed, please correct me.

Matthias Andree (matthias-andree) wrote :

You can achieve such a workaround, but it's prone to races and
unreliable. Better to fix cron and stop it from caching which user
exists or not.

My Lucid /etc/init/cron.conf for NIS currently looks as given below (tested
on an old-fashioned Pentium 4 HT computer that suffered heavily), with two
major changes:

A - I've added a pre-start that queries ypwhich up to 30 times to see
whether ypbind has started and bound to the server yet, sleeping 1 s
after failure. Since ypwhich itself employs timeouts, the actual wait
time is longer.

B - I've gotten rid of the "expect fork" and use cron's "don't fork
mode"; this is an unrelated cleanup and not needed for this particular
fix, and the -L2 also bumps up the cron log level.

--------------------------------------------------------------------------
# cron - regular background program processing daemon
#
# cron is a standard UNIX program that runs user-specified programs at
# periodic scheduled times

description "regular background program processing daemon"

start on runlevel [2345]
stop on runlevel [!2345]

# ma 2011-06-07 hack alert: defer start until ypwhich
# returns success, so that NIS can bind to the domain
# see https://bugs.launchpad.net/ubuntu/+source/cron/+bug/27520
pre-start script
   count=30
   until ypwhich ; do
     count=$(( $count - 1 ))
     if [ $count = 0 ] ; then break ; fi
     sleep 1
   done
end script

respawn

exec cron -L2 -f
---------------------------------------------------------------------

--
Matthias Andree

Matthias Andree (matthias-andree) wrote : Re: [Bug 27520] Re: cron daemon caches user-non-existent lookup results, causing "ORPHAN" message and skipping jobs for all LDAP/NIS-defined users

On 08.06.2011 01:17, Christian Kastner wrote:

> Theoretically, this could easily be extended to #27520 -- just a few
> lines of code, actually -- but it's just not that simple, because there
> are cases where ORPHAN is completely valid, ie the user really doesn't
> exist.

Fine, but cron should recheck.

> All cron does is call getpwnam(), so it cannot differentiate between the
> two cases. Were we to simply re-check the ORPHANS every time, we'd create
> a bug-like situation for that other use case.

Yes, and the underlying database can legitimately change over time, so
cron should recheck.

Now, this particular bug reveals once again that the GNU libc
implementation of Name Service Switch is insufficient and has design
flaws. On the affected systems I've checked, nsswitch has "passwd:
files nis". The tryagain/unavail default reactions are "continue", but
the getpw*() functions cannot return temporary failure, so cron cannot
distinguish a "don't know yet, ask again later" from a "user does not
exist" condition.
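The same ambiguity shows up at the shell level. The glibc `getent` tool documents exit status 0 when the key is found, 2 when it is not found, 1 for missing arguments or an unknown database and 3 when enumeration is unsupported. The minimal sketch below (`lookup_status` is a hypothetical helper) classifies a passwd lookup accordingly; its "not-found" result has to be read as "absent or not known yet", because with `passwd: files nis` and an unreachable NIS server the lookup typically ends the same way as for a nonexistent user.

```sh
# Hypothetical helper: classify a passwd lookup by getent's exit status.
lookup_status() {
    rc=0
    getent passwd "$1" >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo found ;;
        2) echo not-found ;;  # user absent, OR source not reachable yet
        *) echo error ;;      # 1: bad arguments/unknown database, 3: no enumeration
    esac
}
```

Right after boot, `lookup_status ypuser` (a placeholder NIS account) may therefore print `not-found` and, a few seconds later, `found`; a cron that caches the first answer ends up with exactly the stale ORPHAN decision described in this bug.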

I'd filed an upstream bug against glibc 7 years ago to port Solaris's
"tryagain=forever" reaction - which is the default for most sources BTW.
http://sources.redhat.com/bugzilla/show_bug.cgi?id=430

--
Matthias Andree

Matthias Andree (matthias-andree) wrote : Re: [Bug 27520] Re: cron daemon caches user-non-existent lookup results, causing "ORPHAN" message and skipping jobs for all LDAP/NIS-defined users

I'd concur with Graeme's observations:

- it often takes longer than a year for a bug report to be triaged at
all, not even thinking of "looked at by the packagers" -- even for
things that are clearly upstream bugs and need to be delegated outside
Ubuntu, which doesn't require packager attention but can happen in
triaging already.

- there is a certain "fixed in future-release" habit that leaves LTS
users standing in the rain with critical bugs.

--
Matthias Andree

Christian Kastner (ckk)
Changed in cron (Debian):
status: New → Fix Released
Christian Kastner (ckk) wrote : Re: [Bug 27520] Re: cron daemon starts before LDAP/NIS client, causing "ORPHAN" message for all LDAP/NIS-defined users

On 06/07/2011 06:13 PM, Matthias Andree wrote:
> Note that while the trigger is cron starting before the respective user
> database (NIS, LDAP, whatever), cron should really re-check each time to
> be resilient to temporary network hiccups.

In general, cron does not contain a single line of networking code, so
temporary network hiccups really aren't its problem. Please don't take
this the wrong way; I'm merely trying to point out the difficulties I face
as a Maintainer.

In this particular case, though, temporary hiccups would only matter if
a crontab were to be changed and saved during such a hiccup, so the risk
is only minor.

Christian Kastner (ckk) wrote : Re: [Bug 27520] Re: cron daemon caches user-non-existent lookup results, causing "ORPHAN" message and skipping jobs for all LDAP/NIS-defined users

On 06/08/2011 12:30 AM, Nathan Stratton Treadway wrote:
> In the longer term, obviously it would be great for cron to better
> integrate with network-based user databases, but it would be nice to
> hear from some Ubuntu developers whether that sort of change is likely
> to get backported to Lucid?

This is the key point. I talked to an Ubuntu Dev for the 11.04 release,
but we didn't make it in time and upstart was also missing a feature we
needed to emulate the LSB "Should-Start" header (I'm grossly simplifying
here).

Christian Kastner (ckk) wrote :

On 06/08/2011 01:52 AM, Graeme wrote:
> Regardless, it's been my experience that Ubuntu doesn't backport fixes
> of this nature to releases that have already shipped. I've found that
> the bugs must constitute a clear, widespread problem for a majority of
> users before a backport is even considered. That's more what I was
> speaking to than just a fix getting applied. If that policy's changed,
> please correct me.

I'm afraid I'm not familiar enough with Ubuntu's release policy to
comment on this matter...

Christian Kastner (ckk) wrote :

On 06/08/2011 11:12 AM, Matthias Andree wrote:
> On 08.06.2011 01:17, Christian Kastner wrote:
>
>> Theoretically, this could easily be extended to #27520 -- just a few
>> lines of code, actually -- but it's just not that simple, because there
>> are cases where ORPHAN is completely valid, ie the user really doesn't
>> exist.
>
> Fine, but cron should recheck.

As I mentioned earlier, I have a fix in mind for the next release which
should resolve the issue in a manner not conflicting with the other goals.

Christian Kastner (ckk) wrote : Re: [Bug 27520] Re: cron daemon caches user-non-existent lookup results, causing "ORPHAN" message and skipping jobs for all LDAP/NIS-defined users

On 06/08/2011 11:14 AM, Matthias Andree wrote:
> - it often takes longer than a year for a bug report to be triaged at
> all, not even thinking of "looked at by the packagers" -- even for
> things that are clearly upstream bugs and need to be delegated outside
> Ubuntu, which doesn't require packager attention but can happen in
> triaging already.

Well, most of us are volunteers with limited resources. I'm not even
involved in Ubuntu, I'm one of the Maintainers of the Debian package
(which gets sync'ed here from time to time) and just try to occasionally
help out our derivative distros.

Matthias Andree (matthias-andree) wrote : Re: [Bug 27520] Re: cron daemon caches user-non-existent lookup results, causing "ORPHAN" message and skipping jobs for all LDAP/NIS-defined users

On 18.06.2011 03:39, Christian Kastner wrote:
> As I mentioned earlier, I have a fix in mind for the next release which
> should resolve the issue in a manner not conflicting with the other goals.

"next release" doesn't help those running LTS...

--
Matthias Andree

Matthias Andree (matthias-andree) wrote : Re: [Bug 27520] Re: cron daemon starts before LDAP/NIS client, causing "ORPHAN" message for all LDAP/NIS-defined users

On 18.06.2011 03:19, Christian Kastner wrote:
> On 06/07/2011 06:13 PM, Matthias Andree wrote:
>> Note that while the trigger is cron starting before the respective user
>> database (NIS, LDAP, whatever), cron should really re-check each time to
>> be resilient to temporary network hiccups.
>
> In general, cron does not contain a single line of networking code, so
> temporary network hiccups really aren't its problem. Please don't take
> this the wrong way; I'm merely trying to point out the difficulties I face
> as a Maintainer.

I understand that. I've been trying to point out that the libc getpw*()
interfaces have no means to reliably/portably return error conditions to
distinguish between "temporary issue" and "user not present", and
nsswitch can't help here either: (a) it doesn't properly map "NIS domain
not bound yet" to a temporary condition, and (b) even if it did, there would
be no means to make it retry.

My conclusion is that cron needs to check for user existence with
getpwnam() or similar each and every time it tries to *run a job*, not
just when reading the crontabs.

--
Matthias Andree

Christian Kastner (ckk) wrote : Re: [Bug 27520] Re: cron daemon caches user-non-existent lookup results, causing "ORPHAN" message and skipping jobs for all LDAP/NIS-defined users

On 06/18/2011 11:29 AM, Matthias Andree wrote:
> On 18.06.2011 03:39, Christian Kastner wrote:
>> As I mentioned earlier, I have a fix in mind for the next release which
>> should resolve the issue in a manner not conflicting with the other goals.
>
> "next release" doesn't help those running LTS...

As stated previously, I am not involved in Ubuntu and therefore have
neither the privileges nor the insight required to effect a bugfix
release for LTS, or any other release for that matter.

And -- judging from the subscriber list -- unless you bring this bug to
the attention of someone who *does* meet the above requirements, I
believe any further comments regarding such a release will remain fruitless.

Changed in cron (Ubuntu Lucid):
status: New → Confirmed
Christian Kastner (ckk) wrote :

cron 3.0pl1-119, recently uploaded to Debian unstable, contains a patch that forces a rescan of ORPHANed crontabs. Credit for the fix goes to Fedora cronie, from where it was taken.

I'll try to get this sync'ed over the weekend, together with some other bugfixes for cron in Ubuntu.

RoyK (roysk) wrote :

Any chance for a fix in Lucid on this one?

Steve Langasek (vorlon) wrote :

This bug has been incorrectly marked as fixed for the most recent development release due to the addition of Should-Start: nslcd to the init script. A more complete fix is still needed here.

Changed in cron (Ubuntu):
status: Fix Released → Triaged
Steve Langasek (vorlon) wrote :

Uploaded the Debian fix to the freeze queue for oneiric. Once accepted, this can be backported to lucid.

Changed in cron (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cron - 3.0pl1-116ubuntu3

---------------
cron (3.0pl1-116ubuntu3) oneiric; urgency=low

  * Cherry-pick fix from Debian: database.c, cron.c, cron.h,
    debian/copyright:
    - Check orphaned crontabs for adoption. Fix taken from Fedora cronie.
      Closes: #634926, LP: #27520.
 -- Steve Langasek <email address hidden> Mon, 19 Sep 2011 10:21:01 -0700

Changed in cron (Ubuntu):
status: Fix Committed → Fix Released
Steve Langasek (vorlon)
Changed in cron (Ubuntu Maverick):
status: New → Triaged
status: Triaged → In Progress
Changed in cron (Ubuntu Natty):
status: New → Triaged
Changed in cron (Ubuntu Maverick):
status: In Progress → Triaged
Changed in cron (Ubuntu Lucid):
status: Confirmed → Triaged
importance: Undecided → Medium
Changed in cron (Ubuntu Maverick):
importance: Undecided → Medium
Changed in cron (Ubuntu Natty):
importance: Undecided → Medium
Jose Plans (jplans) wrote :

Lucid's debdiff

description: updated
Jose Plans (jplans) wrote :

Maverick's debdiff

Jose Plans (jplans) wrote :

Natty's debdiff

Adam Stokes (adam-stokes) wrote :
Adam Stokes (adam-stokes) wrote :
Adam Stokes (adam-stokes) wrote :
Steve Langasek (vorlon)
Changed in cron (Ubuntu Lucid):
assignee: nobody → Barry Warsaw (barry)
Changed in cron (Ubuntu Maverick):
assignee: nobody → Barry Warsaw (barry)
Changed in cron (Ubuntu Natty):
assignee: nobody → Barry Warsaw (barry)
Barry Warsaw (barry)
Changed in cron (Ubuntu Lucid):
status: Triaged → In Progress
Changed in cron (Ubuntu Maverick):
status: Triaged → In Progress
Changed in cron (Ubuntu Natty):
status: Triaged → In Progress
Barry Warsaw (barry)
tags: added: verification-needed
Barry Warsaw (barry) wrote :

I'm changing the version number for the natty SRU to 3.0pl1-116ubuntu1.1. Otherwise, should another upload to natty be required, its version number could run into oneiric's. If that happens, people upgrading from natty to oneiric could miss the oneiric version of the package, which already appears to contain the fix (but still).
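This reasoning can be checked with dpkg's own version comparison. The sketch below assumes a Debian/Ubuntu system with dpkg installed; `sorts_before` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: Debian version ordering as dpkg sees it (requires dpkg).
# sorts_before A B: succeeds if version A sorts strictly below version B.
sorts_before() {
    dpkg --compare-versions "$1" lt "$2"
}

# The natty SRU number sorts below oneiric's fixed 3.0pl1-116ubuntu3 ...
sorts_before 3.0pl1-116ubuntu1.1 3.0pl1-116ubuntu3 && echo "116ubuntu1.1 < 116ubuntu3"
# ... and later natty updates numbered 116ubuntu1.2, 1.3, ... stay below
# 116ubuntu2, so they can never reach an oneiric version number.
sorts_before 3.0pl1-116ubuntu1.9 3.0pl1-116ubuntu2 && echo "116ubuntu1.9 < 116ubuntu2"
```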

Barry Warsaw (barry) wrote :

All three uploads are sponsored now.

Changed in cron (Ubuntu Lucid):
status: In Progress → Fix Committed
Changed in cron (Ubuntu Maverick):
status: In Progress → Fix Committed
Changed in cron (Ubuntu Natty):
status: In Progress → Fix Committed
Changed in cron (Ubuntu Lucid):
milestone: none → lucid-updates
Changed in cron (Ubuntu Maverick):
milestone: none → maverick-updates
Changed in cron (Ubuntu Natty):
milestone: none → natty-updates
Martin Pitt (pitti) wrote :

I don't consider it appropriate to fix this in maverick or natty. maverick is two months away from its EOL, and given the regression potential of such a central package and service, I don't see us ever getting enough testing for natty either.

The SRU upload for lucid changes debian/copyright, why is it doing this?

Changed in cron (Ubuntu Maverick):
status: Fix Committed → Won't Fix
assignee: Barry Warsaw (barry) → nobody
milestone: maverick-updates → none
Changed in cron (Ubuntu Natty):
milestone: natty-updates → none
status: Fix Committed → Won't Fix
assignee: Barry Warsaw (barry) → nobody
Jose Plans (jplans) wrote :

Hi Martin, the debian commit including the patch from Tomas Mraz (cronie fedora) included the copyright changes. Perhaps a cleanup. We decided to keep it consistent, should we remove these? thanks

Martin Pitt (pitti) wrote :

Jose Plans [2012-02-28 14:02 -0000]:
> Hi Martin, the debian commit including the patch from Tomas Mraz (cronie
> fedora) included the copyright changes. Perhaps a cleanup. We decided to
> keep it consistent, should we remove these? thanks

They don't hurt much either way, but it's very unusual to change it in
a stable release, and I was wondering if it was intentional.

Matthias Andree (matthias-andree) wrote :

Well... I don't seriously mind not fixing maverick, but if you reject
changes for natty more than half a year before its EOL, don't expect
that to raise the credibility of Ubuntu "support".

Clint Byrum (clint-fewbar) wrote :

Matthias, to Martin's point, the level of testing we see in each release goes down significantly as newer stable releases become available. This is evident in the number of untested SRUs that have been pushed into natty-proposed and stayed there because not even the original reporters return to verify the fix.

So while I understand the frustration, please understand that we have to prioritize the testing resources and development resources we have. Since natty users can upgrade directly to oneiric and have this solved, and the subset of users affected is limited, it's in the greater interest of Ubuntu users to pass on backporting the fix to natty.

Matthias Andree (matthias-andree) wrote :

On 28.02.2012 20:55, Clint Byrum wrote:
> Matthias, to Martin's point, the level of testing we see in each release
> goes down significantly as newer stable releases become available. This is
> evident in the number of untested SRUs that have been pushed into
> natty-proposed and stayed there because not even the original reporters
> return to verify the fix.

The key point is "reasonably quick turnaround". Bugs take ages even to
be triaged (and, when they are clearly upstream bugs, to be forwarded
upstream), let alone fixed -- I've more than once been unable to verify
fixes because the hardware I reported bugs against was no longer
available, or because the systems had been upgraded, or moved to
different distributions that do fix bugs on a quicker schedule.

This bug has lived for nearly six years!

> So while I understand the frustration, please understand that we have to
> prioritize the testing resources and development resources we have.
> Since natty users can upgrade directly to oneiric and have this solved,
> and the subset of users affected is limited, it's in the greater interest
> of Ubuntu users to pass on backporting the fix to natty.

I do understand the resource constraints, but please understand that the
massive deliberate desktop disruption with the GNOME->Unity move and the
still feature-limited KDE 4 makes the upgrade much more difficult than
the upgrades of the pre-Unity era used to be.

There's a reason why I still have maverick and natty systems around...
and given responses like this it's unlikely those will upgrade. Much
more likely they'll move on to Mint, Fedora, or possibly even FreeBSD.

Martin Pitt (pitti) wrote : Please test proposed package

Hello Graeme, or anyone else affected,

Accepted cron into lucid-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Tessa (unit3) wrote :

Sorry, as previous comments have mentioned, the timeline to get this fixed means all of my affected systems have been migrated to other distros / authentication methods / workarounds in the intervening 6 years. I've actually switched jobs twice since this bug was opened, and the place I work now is a CentOS shop, so I have no ability to reproduce or test this bug.

In other words: please use this bug as an example for your managers of why people are reluctant to use Ubuntu on servers and in the data center.

Nathan Stratton Treadway (nathanst) wrote :

On Mon, Mar 05, 2012 at 16:38:16 -0000, Martin Pitt wrote:
> Accepted cron into lucid-proposed. The package will
> build now and be available in a few hours. Please test
> and give feedback here. See https://wiki.ubuntu.
> com/Testing/EnableProposed for documentation how to
> enable and use -proposed. Thank you in advance!

I performed the following testing on an Ubuntu Lucid x86_64
system running cron 3.0pl1-106ubuntu5 and using
libpam-ldapd/libnss-ldapd/nslcd for user
authentication.

I started out by creating a cron job for "ldapuser" that
runs every minute, and made sure that "nscd" wasn't
running. (Before these tests I didn't have any
user-level crontabs on this system.)

I confirmed the expected behavior with the existing cron
version:

  # /etc/init.d/nslcd stop
   * Stopping LDAP connection daemon nslcd
  # service cron restart
  cron start/running, process 5259

The expected syslog message "cron[5259]: (ldapuser)
ORPHAN (no passwd entry)" appeared, and as expected the
cron job never started firing, even after I started up
nslcd again a few seconds later.

Then I restarted cron:
  # service cron restart
  cron start/running, process 5280

As expected, no "ORPHAN" message appeared, and the
cron job fired at the start of the next minute.

Then I stopped nslcd again, configured lucid-proposed
and installed cron 3.0pl1-106ubuntu6.

  # /etc/init.d/nslcd stop
  # aptitude -u
  [...]
  Setting up cron (3.0pl1-106ubuntu6) ...
  cron start/running, process 5465

Syslog still showed the ORPHAN entry in the startup
messages for that instance of cron, and the cron job
didn't fire over the next two minutes. But then I
started nslcd again... and sure enough at the start of
the next minute the cron job did fire, as desired.

After letting the cron job run for a couple minutes, I
rebooted the system, and as it started up cron gave me
the ORPHAN message as usual... but by the start of the
next minute nslcd had already started up, and my cron
job began to fire as expected without any further
effort.

I left the new version of cron running for two nights,
and both mornings the cron.daily jobs ran as expected;
the cron.hourly job shows up in syslog every hour
(though it doesn't actually have anything to do on this
system).

So as far as I can see on this lightly-used system, cron
3.0pl1-106ubuntu6 does fix the ORPHAN problem, and
otherwise continues to work as before.

The only thing that's a little strange/surprising is
that no syslog message is generated when a user's
crontab is moved out of orphan status -- you still get
the ORPHAN message during startup, but no later message
to indicate that the crontab has been "activated"....

Nathan
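The patched cron logs nothing when a crontab leaves orphan status. One indirect way to confirm adoption is to look for the user's job runs after the last ORPHAN line. The sketch below makes two assumptions, both of which may need adjusting: Debian-style syslog lines, where each job run appears as `(user) CMD (...)`, and a log at /var/log/syslog. `adopted_after_orphan` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: did USER's crontab get adopted after an ORPHAN message?
# Assumes Debian-style cron syslog lines such as
#   cron[5465]: (ldapuser) ORPHAN (no passwd entry)
#   CRON[5501]: (ldapuser) CMD (/path/to/job)
# adopted_after_orphan is a hypothetical helper; the log path is an assumption.
adopted_after_orphan() {
    user=$1
    log=${2:-/var/log/syslog}
    # Succeed only if an ORPHAN line exists and at least one CMD line for the
    # same user follows the last ORPHAN line (index() is a literal match).
    awk -v u="($user)" '
        index($0, u " ORPHAN") { orphan = 1; ran = 0 }
        orphan && index($0, u " CMD") { ran = 1 }
        END { exit !(orphan && ran) }
    ' "$log"
}

# Example: adopted_after_orphan ldapuser && echo "ldapuser's jobs are running"
```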

Peter Matulis (petermatulis) wrote :

So what's the status on this one? Are we good to go?

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cron - 3.0pl1-106ubuntu6

---------------
cron (3.0pl1-106ubuntu6) lucid-proposed; urgency=low

  * Cherry-pick fix from Debian: database.c, cron.c, cron.h:
    - Check orphaned crontabs for adoption. Fix taken from Fedora cronie.
      Closes: #634926, LP: #27520.
 -- Adam Stokes <email address hidden> Thu, 19 Jan 2012 08:26:59 -0500

Changed in cron (Ubuntu Lucid):
status: Fix Committed → Fix Released
arbuntu (arb) wrote :

This problem is still *not* fixed.

(I am running cron 3.0pl1-116ubuntu1 on 11.04 with users in NIS.)

Harald Hannelius (harald-arcada) wrote :

I noticed this error on Ubuntu 20.10 with a local OpenLDAP server and nslcd: an LDAP user's cron jobs aren't run because the log reports the crontab as orphaned.
