landscape-common.postinst stuck with defunct who

Bug #277038 reported by Martin von Gagern on 2008-10-02
This bug affects 1 person
Affects: landscape-client (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: landscape-common

During an update of Intrepid (I think from Alpha 6 to Beta), Adept was updating landscape-common for several minutes without any progress. Investigating the issue with ps got me this process call stack:

/usr/bin/dpkg --status-fd 3 --configure <lots of packages...>
/usr/bin/perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/landscape-common.postinst configure
/bin/sh /var/lib/dpkg/info/landscape-common.postinst configure
/bin/sh -e /usr/sbin/update-motd
run-parts --lsbsysinit /etc/update-motd.d
/bin/sh /etc/update-motd.d/50-landscape-sysinfo
/usr/bin/python /usr/bin/landscape-sysinfo
[who] <defunct>

Looks like who terminated, but landscape-sysinfo wasn't ready for dead children. Killing who gave no result, neither with SIGTERM nor with SIGKILL. SIGKILL to its parent, landscape-sysinfo, however resulted in Adept resuming its operations. landscape-common wasn't listed in the output of "dpkg --audit" after all of this, but I don't know whether the update actually worked as expected despite this problem.
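
The behaviour described above (signals having no effect on the defunct `who`, and the hang clearing once its parent went away) matches how zombie processes work in general: a zombie has already exited, so only its parent can remove it by waiting on it. A minimal Linux-only sketch of that, assuming `/proc` is mounted; the helper names `proc_state` and `demo` are made up for illustration and are not part of landscape-client:

```python
import os
import signal
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo():
    # Start a child that exits at once, and deliberately do not wait on it yet.
    child = subprocess.Popen([sys.executable, "-c", "pass"])

    # Poll until the kernel reports the exited child as a zombie ("Z").
    deadline = time.time() + 5
    while proc_state(child.pid) != "Z" and time.time() < deadline:
        time.sleep(0.05)
    before = proc_state(child.pid)

    # Signalling a zombie is accepted but changes nothing: it is already dead.
    os.kill(child.pid, signal.SIGKILL)
    time.sleep(0.1)
    after_kill = proc_state(child.pid)

    # Only the parent reaping the child removes the process table entry.
    child.wait()
    gone = not os.path.exists(f"/proc/{child.pid}")
    return before, after_kill, gone
```

Calling `demo()` returns the child's state before and after the SIGKILL, plus whether its `/proc` entry disappeared once the parent waited on it.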

Andreas Hasenack (ahasenack) wrote :

I'm not sure I understand the lines you posted. Was [who] a leaf of the process tree starting at dpkg --configure?

Martin von Gagern (gagern) wrote :

Yes. The "tree" starting at dpkg was linear, with one child for each process, except for who which was a leaf.

I just had a look at the code. /usr/lib/python2.5/site-packages/landscape/lib/sysstats.py uses twisted.internet.utils.getProcessOutputAndValue to call "who -q". Therefore the actual cause of this bug might also lie in python-twisted-core, which might for some reason have failed to reap this dead child. Should this bug here therefore be marked as affecting twisted as well as landscape-client?

I cannot reproduce the issue by calling landscape-sysinfo manually. If I hadn't been in such a hurry, I might have thought of stracing landscape-sysinfo, but it's too late for that now.
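
For comparison, here is a standard-library sketch of running a helper command such as `who -q` with an upper bound on how long the caller will wait. This is not how landscape-sysinfo does it (it goes through Twisted, as noted above), and the root cause here was never established; `run_with_timeout` is a hypothetical helper whose return shape only loosely mirrors the `(out, err, code)` result of `getProcessOutputAndValue`:

```python
import subprocess
import sys


def run_with_timeout(argv, timeout=10.0):
    """Run argv and return (stdout, stderr, returncode), or None on timeout.

    subprocess.run() kills the child and waits for it when the timeout
    expires, so no zombie is left behind in either case.
    """
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout, done.stderr, done.returncode
```

A caller would invoke it as `run_with_timeout(["who", "-q"])` and treat `None` as "no answer", so that a stuck helper cannot block a package's postinst indefinitely.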

Andreas Hasenack (ahasenack) wrote :

It could also be stuck somewhere else. We had a problem before with the whole process being stopped because dpkg was asking a configuration question, and this was unexpected. I don't know Adept, so I don't know how it would handle that situation.

Martin von Gagern (gagern) wrote :

I'm not sure enough to completely rule that out, but it sounds unlikely to me, for the following reasons. There are two possibilities. Either landscape-sysinfo itself would be asking a question: that doesn't seem to be in its job description, especially when called without parameters, so I think this is unlikely. Or some other process would be asking a question: in that case there would be nothing to prevent landscape-sysinfo from reaping its defunct child. Ergo, no questions involved.

Andreas Hasenack (ahasenack) wrote :

Marking it as confirmed as we have a few duplicates already, albeit not all dupes have enough details to tell if it's "who" that is stuck.

Changed in landscape-client:
status: New → Confirmed

Andreas Hasenack wrote:
> not all dupes have enough details to tell if it's "who" that is stuck.

If the error lies in how landscape-sysinfo executes other processes via python-twisted-core, then other commands invoked through this mechanism might turn into zombies as well.

Andreas Hasenack (ahasenack) wrote :

For a moment I thought this was fixed in bug #257346, but the two other bug reports we got (#291282 and #293598) seem to have versions with the fix.

Mike Pontillo (mpontillo) wrote :

Another data point: this is happening in the service monitor in MAAS when using `getProcessOutputAndValue`. (See also bug #1793448)
