unable to find user root: no matching entries in passwd file

Bug #1803544 reported by Cédric Jeanneret
This bug affects 3 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

We sometimes hit the following issue while deploying an undercloud:

2018-11-13 18:16:19 | "2018-11-13 18:13:39,017 ERROR: 4514 -- Failed running docker-puppet.py for heat_api",
2018-11-13 18:16:19 | "2018-11-13 18:13:39,018 ERROR: 4514 -- error mounting image volumes: unable to find user root: no matching entries in passwd file",

The problem is that it does not happen every time, and not always with the same container. So far I have seen it with swift-related and heat-related containers.

It has been happening at random for a while:
http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22no%20matching%20entries%20in%20passwd%20file%5C%22

tags: removed: tripleo-heat-templates
tags: added: ci
Cédric Jeanneret (cjeanner) wrote :
Rabi Mishra (rabi) wrote :

Possibly a runc issue, as there is a discussion about something similar in a Docker forum thread[1], which ends with:

"Currently there is no resolution to this issue. It is being tracked via an internal bug."

And that's not good:/

[1] https://forums.docker.com/t/unable-to-find-user-root-no-matching-entries-in-passwd-file/26545

Harald Jensås (harald-jensas) wrote :
Marios Andreou (marios-b) wrote :

Just saw this on the master tht (tripleo-heat-templates) gate check, CentOS standalone job: http://logs.openstack.org/98/604298/105/check/tripleo-ci-centos-7-standalone/5b9e669/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-11-29_20_50_50

 2018-11-29 20:50:50 | "2018-11-29 20:48:29,700 ERROR: 20487 -- error mounting image volumes: unable to find user root: no matching entries in passwd file",

Sorin Sbarnea (ssbarnea) wrote :

We are seeing more and more of these errors, and they seem to be caused by a "classified" Docker bug.

http://status.openstack.org/elastic-recheck/index.html#1803544

https://forums.docker.com/t/unable-to-find-user-root-no-matching-entries-in-passwd-file/26545/7

What can we do about it?

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/624420

wes hayutin (weshayutin) wrote :
tags: added: alert
wes hayutin (weshayutin) wrote :

This is happening a lot and failing the gate. I see it's being worked on, but I'm setting the alert tag for visibility.

Changed in tripleo:
importance: High → Critical
Emilien Macchi (emilienm) wrote :
wes hayutin (weshayutin)
tags: removed: alert
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/624420

Rabi Mishra (rabi) wrote :
Sorin Sbarnea (ssbarnea) wrote :

The fix is supposed to land before the end of the week. I am sure Emilien will post a comment when it lands.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/624420

wes hayutin (weshayutin) wrote :

Bug 1803544 - unable to find user root: no matching entries in passwd file
0 fails in 24 hrs / 0 fails in 10 days
Projects: (tripleo - Triaged)
No matches

wes hayutin (weshayutin) wrote :

Closing.

Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/696120
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=393e96b5b9e8affd35b48fb45f0a75b253629074
Submitter: Zuul
Branch: master

commit 393e96b5b9e8affd35b48fb45f0a75b253629074
Author: Michele Baldessari <email address hidden>
Date: Tue Nov 26 17:18:03 2019 +0100

    Use '0' instead of root in container-puppet.py

    Even though the number of user lookups has been reduced from two to one
    via https://github.com/containers/libpod/pull/1978, we still see the
    following error from time to time:
    time="2019-11-22T19:19:33Z" level=debug msg="ExitCode msg: \"unable to find user root: no matching entries in passwd file\""
    time="2019-11-22T19:19:33Z" level=error msg="unable to find user root: no matching entries in passwd file"

    The TL;DR is that podman/docker, when passed a --user=<name> parameter,
    will parse the /etc/passwd file inside the container and detect the
    uid/gid to switch to. The problem seems to be that sometimes this
    /etc/passwd is either read as empty or non-existent when we try to
    parse it (the root cause of which is the real underlying bug).

    Since it seems that root-causing this will take a rather large amount of
    time, we can just pass the UID directly, which will not fail when
    the parsing code cannot find the specified user in /etc/passwd, as it
    simply uses the provided UID:
    https://github.com/containers/libpod/blob/master/vendor/github.com/opencontainers/runc/libcontainer/user/user.go#L333

    Tested this by running a reproducer on three machines for a total
    of ~800 runs and had 0 occurrences of this error. Previously I could
    reproduce this issue in about 30 to 60 runs at most.

    Related rhbz: 1776766
    Related-Bug: #1803544

    Change-Id: Ia9860107c35e543a05775596076873ea950b7400

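As a rough sanity check on the reproducer numbers in the commit message (~800 clean runs, versus a failure previously within about 30 to 60 runs), the Go snippet below computes how likely 800 clean runs would be if the bug were still present. It assumes the runs fail independently. It also reads "30 to 60 runs" as a per-run failure probability between 1/30 and 1/60. Both readings are assumptions, not measurements from the report.

```go
package main

import (
	"fmt"
	"math"
)

// probNoFailures returns the chance of seeing zero failures in n independent
// runs when each run fails with probability p, i.e. (1-p)^n.
func probNoFailures(p float64, n int) float64 {
	return math.Pow(1-p, float64(n))
}

func main() {
	const runs = 800 // "~800 runs" from the commit message
	// Assumed per-run failure rates: one failure per 30 runs and one per 60 runs.
	for _, every := range []int{30, 60} {
		p := 1.0 / float64(every)
		fmt.Printf("fail once per %d runs: P(0 failures in %d runs) = %.2e\n",
			every, runs, probNoFailures(p, runs))
	}
}
```

Under these assumptions, the chance of 800 clean runs is about 1.4e-6 at the 1-in-60 rate and far lower at 1-in-30. That is consistent with the UID workaround being effective rather than lucky.
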
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/696641

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/696641
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=871c1a30326c3c5b0e3703a0ab64deedc1ded5a6
Submitter: Zuul
Branch: stable/train

commit 871c1a30326c3c5b0e3703a0ab64deedc1ded5a6
Author: Michele Baldessari <email address hidden>
Date: Tue Nov 26 17:18:03 2019 +0100

    Use '0' instead of root in container-puppet.py

    Even though the number of user lookups has been reduced from two to one
    via https://github.com/containers/libpod/pull/1978, we still see the
    following error from time to time:
    time="2019-11-22T19:19:33Z" level=debug msg="ExitCode msg: \"unable to find user root: no matching entries in passwd file\""
    time="2019-11-22T19:19:33Z" level=error msg="unable to find user root: no matching entries in passwd file"

    The TL;DR is that podman/docker, when passed a --user=<name> parameter,
    will parse the /etc/passwd file inside the container and detect the
    uid/gid to switch to. The problem seems to be that sometimes this
    /etc/passwd is either read as empty or non-existent when we try to
    parse it (the root cause of which is the real underlying bug).

    Since it seems that root-causing this will take a rather large amount of
    time, we can just pass the UID directly, which will not fail when
    the parsing code cannot find the specified user in /etc/passwd, as it
    simply uses the provided UID:
    https://github.com/containers/libpod/blob/master/vendor/github.com/opencontainers/runc/libcontainer/user/user.go#L333

    Tested this by running a reproducer on three machines for a total
    of ~800 runs and had 0 occurrences of this error. Previously I could
    reproduce this issue in about 30 to 60 runs at most.

    Related rhbz: 1776766
    Related-Bug: #1803544

    NB: Cherry-pick not 100% clean

    Change-Id: Ia9860107c35e543a05775596076873ea950b7400
    (cherry picked from commit 393e96b5b9e8affd35b48fb45f0a75b253629074)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/720588

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/720588
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=cea234d40ef70a740421759771b97c5b20aac8d5
Submitter: Zuul
Branch: stable/stein

commit cea234d40ef70a740421759771b97c5b20aac8d5
Author: Michele Baldessari <email address hidden>
Date: Tue Nov 26 17:18:03 2019 +0100

    Use '0' instead of root in container-puppet.py

    Even though the number of user lookups has been reduced from two to one
    via https://github.com/containers/libpod/pull/1978, we still see the
    following error from time to time:
    time="2019-11-22T19:19:33Z" level=debug msg="ExitCode msg: \"unable to find user root: no matching entries in passwd file\""
    time="2019-11-22T19:19:33Z" level=error msg="unable to find user root: no matching entries in passwd file"

    The TL;DR is that podman/docker, when passed a --user=<name> parameter,
    will parse the /etc/passwd file inside the container and detect the
    uid/gid to switch to. The problem seems to be that sometimes this
    /etc/passwd is either read as empty or non-existent when we try to
    parse it (the root cause of which is the real underlying bug).

    Since it seems that root-causing this will take a rather large amount of
    time, we can just pass the UID directly, which will not fail when
    the parsing code cannot find the specified user in /etc/passwd, as it
    simply uses the provided UID:
    https://github.com/containers/libpod/blob/master/vendor/github.com/opencontainers/runc/libcontainer/user/user.go#L333

    Tested this by running a reproducer on three machines for a total
    of ~800 runs and had 0 occurrences of this error. Previously I could
    reproduce this issue in about 30 to 60 runs at most.

    Related rhbz: 1776766
    Related-Bug: #1803544

    NB: Cherry-pick not 100% clean

    Change-Id: Ia9860107c35e543a05775596076873ea950b7400
    (cherry picked from commit 393e96b5b9e8affd35b48fb45f0a75b253629074)
    (cherry picked from commit 871c1a30326c3c5b0e3703a0ab64deedc1ded5a6)

tags: added: in-stable-stein