pstree crashes on fclose(NULL)

Bug #1755681 reported by Thomas Snider
This bug report is a duplicate of: Bug #1837444: pstree seg fault.
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
psmisc (Ubuntu)  Fix Released  Undecided   Unassigned
Xenial           Confirmed     Medium      Unassigned

Bug Description

I am on 16.04.4 LTS. I am seeing pstree crash intermittently, and I tracked it down to an fclose(NULL):

https://gitlab.com/psmisc/psmisc/blob/28005b99fedef566b06286bd6ca72a7a4d673f20/src/pstree.c#L822

I am guessing it is a race condition where the procfs entry disappears between dir listing and the fopen() call several lines above.
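To illustrate the suspected race, here is a minimal sketch. It is not the pstree.c source: the helper name read_comm and its return convention are hypothetical. It opens /proc/<pid>/stat for a PID that was presumably taken from an earlier directory listing, and returns early when fopen() fails, so fclose() is never reached with a NULL pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from pstree.c.
 * Copies the command name of `pid` from /proc/<pid>/stat into `comm`.
 * Returns 0 on success, -1 if the entry could not be opened or parsed,
 * for example because the process exited after its directory was listed. */
int read_comm(long pid, char *comm, size_t len)
{
    char path[64];
    char line[512];
    FILE *file;
    int ok = -1;

    assert(comm != NULL && len > 0);

    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    file = fopen(path, "r");
    if (file == NULL)
        return -1;              /* entry vanished: there is nothing to close */

    if (fgets(line, sizeof line, file) != NULL) {
        /* The stat line looks like "pid (comm) state ...". */
        char *open = strchr(line, '(');
        char *close = strrchr(line, ')');
        if (open != NULL && close != NULL && close > open) {
            size_t n = (size_t)(close - open - 1);
            if (n >= len)
                n = len - 1;    /* truncate to fit the caller's buffer */
            memcpy(comm, open + 1, n);
            comm[n] = '\0';
            ok = 0;
        }
    }

    fclose(file);               /* only reached when fopen() succeeded */
    return ok;
}
```

A caller that walks /proc would treat a -1 result as "process already gone" and move on to the next entry.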

This is fixed in the latest release (v23.0, released 9 months ago) but exists in the current version of psmisc for 16.04.4 (v22.21).

Would this call for a bump of the upstream version, or an Ubuntu-specific patch to remove that fclose() call?

Tags: server-next
Thomas Snider (tjps) wrote :

After two weeks of no response, is this the wrong venue?

Christian Ehrhardt  (paelzer) wrote :

I think you are right here; I'm not sure what happened to the triager of March 14th.

Version 23 is in Artful and Bionic already.
So the question here is whether to backport the fix to Xenial via an SRU.

Changed in psmisc (Ubuntu):
status: New → Fix Released
Christian Ehrhardt  (paelzer) wrote :

The offending line you identified was not fixed by a standalone bug fix that one could backport directly, but as part of [1].

For an SRU we would instead need a more surgical approach, maybe just a check on the pointer like:
if (file != NULL)
    fclose(file);
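
As a self-contained variant of that check (a sketch only; the helper name close_if_open is made up here and is not part of psmisc), the guard can be wrapped in a function that also clears the pointer, so a later call cannot close the same stream twice:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper, not part of psmisc.
 * Closes *fp only if it refers to an open stream, then resets it to NULL.
 * Returns 0 when there was nothing to close, otherwise fclose()'s result. */
int close_if_open(FILE **fp)
{
    int rc = 0;

    assert(fp != NULL);         /* the pointer-to-pointer itself must be valid */
    if (*fp != NULL) {
        rc = fclose(*fp);
        *fp = NULL;             /* guard against a second close */
    }
    return rc;
}
```

Calling it with a stream pointer that is still NULL after a failed fopen() is then a no-op rather than a crash.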

I'm not so sure about the priority of this; how regularly are you hitting that on 16.04.4?
I had no rapid process spawn/kill workload, but in 10/10 tries I didn't hit that.
Then I ran:
$ stress-ng --fork 4 --vfork 4 --exec 4 --metrics-brief
But still 10/10 pstree runs worked (and that is more than 100k forks/exits per sec).

OTOH the change seems trivial; nevertheless I'd be interested in how (reliably) you hit that.

[1]: https://gitlab.com/psmisc/psmisc/commit/265fa43ee48898001130b90d3656d48af5d241aa

tags: added: server-next
Changed in psmisc (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → Medium
Christian Ehrhardt  (paelzer) wrote :

Note: I'm asking because an SRU [1] usually needs some sort of reproduction steps to verify the fix - and since I fail to recreate ...

[1]: https://wiki.ubuntu.com/StableReleaseUpdates

Thomas Snider (tjps) wrote :

My reliable repro was actually a much less intensive `watch 'pstree -aup' | grep "..."`

With the default watch interval of 2 seconds, I would reliably hit this 1-2 times per day with it running continuously in the background. Process churn isn't crazy, but I am regularly compiling various things.

I actually debugged the issue to make sure it was consistent and not RAM failing (this is in a desktop with non-ECC RAM). I hate to say it's not critical at the risk of this ticket being closed WONT_FIX but it is not actually a critical issue.

Would be nice to have fixed, even if it is just an if()-guard added to the existing version. How involved is that process? Is it something I could submit a patch for?

Christian Ehrhardt  (paelzer) wrote :

Lucas identified this as being the same as one that he is working on.
Marking as duplicate.
