running skopeo in podman fails with "Error inspecting image"

Bug #1797114 reported by Cédric Jeanneret on 2018-10-10
This bug affects 1 person
Affects: tripleo
Importance: Medium
Assigned to: Emilien Macchi

Bug Description

Hello guys,

While deploying an overcloud using podman as container_cli, we hit a situation where "skopeo inspect" doesn't work when launched from within a container, but does work when launched from the host.

For example:
http://logs.openstack.org/52/608452/7/check/tripleo-ci-centos-7-containers-multinode/5ec1089/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-10-08_16_19_44

It can be reproduced using the quickstart reproducer script located here for example:
http://logs.openstack.org/52/608452/7/check/tripleo-ci-centos-7-containers-multinode/5ec1089/logs/reproducer-quickstart.sh

If you're using a libvirt node, you can run it like this:
bash reproducer-quickstart.sh -w $(pwd)/work -v true -a -l

Changed in tripleo:
importance: Undecided → Medium
Cédric Jeanneret (cjeanner) wrote :

Fun fact: running "podman exec <running container> skopeo inspect <image>" does actually work....

Cédric Jeanneret (cjeanner) wrote :

More details:

Two containers, two different behaviours. One works as expected, while the second one does NOT.

Skopeo versions are the same:
skopeo-0.1.31-1.gitf9baaa6.el7.x86_64
skopeo-containers-0.1.31-1.gitf9baaa6.el7.x86_64

Failing container:
mistral_engine (192.168.24.1:8787/tripleomaster/centos-binary-mistral-engine:965941f1e62cef16967e7a7cd6d98263e52acb62_0989b280-updated-20181010112921)

Working container:
mistral_api (192.168.24.1:8787/tripleomaster/centos-binary-mistral-api:965941f1e62cef16967e7a7cd6d98263e52acb62_0989b280-updated-20181010112921)

The full error is:
2018-10-10 12:53:48 | Error inspecting image: docker://docker.io/tripleomaster/centos-binary-cron:965941f1e62cef16967e7a7cd6d98263e52acb62_0989b280
2018-10-10 12:53:48 | time="2018-10-10T12:53:46Z" level=fatal msg="error getting username and password: error reading JSON file "/run/containers/42430/auth.json": error unmarshaling JSON at "/run/containers/42430/auth.json": unexpected end of JSON input"

The /run/containers directory does not exist in the working container, while it DOES exist in the failing container.

Main difference: /run is bind-mounted in the failing container, while NOT in the working one.

Question: is the bind-mount required? :)

Cédric Jeanneret (cjeanner) wrote :

Still digging.

Some other containers have skopeo installed, same version as well:
- mistral_event_engine (192.168.24.1:8787/tripleomaster/centos-binary-mistral-event-engine:965941f1e62cef16967e7a7cd6d98263e52acb62_0989b280-updated-20181010112921)
Also has the /run mounted

- nova_scheduler (192.168.24.1:8787/tripleomaster/centos-binary-nova-scheduler:965941f1e62cef16967e7a7cd6d98263e52acb62_0989b280-updated-20181010112921)
Also has the /run mounted

- mistral_executor (192.168.24.1:8787/tripleomaster/centos-binary-mistral-executor:965941f1e62cef16967e7a7cd6d98263e52acb62_0989b280-updated-20181010112921)
Also has the /run mounted

Small note: access to /run/containers is root-only.

If we run the following command with any of the listed "failing" containers:
podman exec --user root 1d01350439f7 skopeo inspect docker://docker.io/tripleomaster/centos-binary-cron:965941f1e62cef16967e7a7cd6d98263e52acb62_0989b280

Tadaaa, it works.

So, real error is a "permission denied".

Cédric Jeanneret (cjeanner) wrote :

Still digging, and confirmation:
- running the exact same image with the exact same mounts, except the /run:/run one, allows skopeo to work as expected.

I suspect the binary does some path checking, for instance "is there /run/containers, if so fetch the auth.json thingy", and it fails with an uncaught access error.

Meaning: skopeo/libpod bug.
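
To make the suspicion above concrete, here is a minimal Python sketch of the failure mode. It is not skopeo's code (skopeo is written in Go), and the helper name probe_auth_file is hypothetical. It only shows how reading an auth file can separate "file missing", "permission denied" and "invalid JSON", so that an access problem is not reported as a parsing problem.

```python
import json


def probe_auth_file(path):
    """Classify the outcome of reading a JSON auth file at `path`.

    Hypothetical helper, for illustration only. Returns one of:
    "missing", "permission-denied", "invalid-json", "ok".
    """
    try:
        with open(path, "r") as handle:
            raw = handle.read()
    except FileNotFoundError:
        # Nothing to read: a caller could safely fall back to anonymous access.
        return "missing"
    except PermissionError:
        # The path exists but this user may not read it (EACCES/EPERM).
        return "permission-denied"

    try:
        json.loads(raw)
    except ValueError:
        # Empty or malformed content; json.JSONDecodeError subclasses ValueError.
        return "invalid-json"
    return "ok"
```

Running probe_auth_file("/run/containers/42430/auth.json") as the service user inside a failing container, and again as root, would be one way to compare the two cases; the path is the one taken from the error message above.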

Bogdan Dobrelya (bogdando) wrote :

Let's just fix the mistral thingy that runs skopeo, buildah, podman et al. If we want it to behave exactly as when launched from the host, we should use nsenter. I have an example [0] to use.

[0] https://review.openstack.org/#/q/topic:podman_rootwrap+(status:open+OR+status:merged)

Cédric Jeanneret (cjeanner) wrote :

So, as per Miloslav's comment on the GitHub issue, a way to work around this issue is to pass a new env variable to the containers:

-e "XDG_RUNTIME_DIR=/tmp/"

It seems this XDG_RUNTIME_DIR is used in order to create some temporary file for the authentication or something like that. Passing a user writable location is enough to make skopeo inspect work (just tested on my reproducer).
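
A rough Python sketch of why the variable helps, under assumptions: the fallback path is the one seen in the error message above (/run/containers/<uid>/auth.json), while the "containers/auth.json" location under XDG_RUNTIME_DIR and the precedence rule are my assumptions about the lookup, not something confirmed in this report. The function name auth_file_path is hypothetical.

```python
import os


def auth_file_path(environ=None, uid=None):
    """Return the auth.json location assumed by this sketch.

    Assumption: a non-empty XDG_RUNTIME_DIR takes precedence over the
    per-UID directory under /run/containers.
    """
    environ = os.environ if environ is None else environ
    uid = os.getuid() if uid is None else uid

    runtime_dir = environ.get("XDG_RUNTIME_DIR", "")
    if runtime_dir:
        # Assumed layout below the runtime directory.
        return os.path.join(runtime_dir, "containers", "auth.json")
    # Fallback path, matching the one shown in the reported error.
    return "/run/containers/{}/auth.json".format(uid)
```

With XDG_RUNTIME_DIR=/tmp/ the lookup in this sketch moves to a location the container user can access, instead of the root-only directory under the bind-mounted /run.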

In parallel, the "Permission Denied" error is now correctly caught in https://github.com/containers/image/pull/515 - so we will get a better understanding next time we hit this kind of issue :).

Fix proposed to branch: master
Review: https://review.openstack.org/609586

Changed in tripleo:
assignee: Emilien Macchi (emilienm) → Steve Baker (steve-stevebaker)
status: Triaged → In Progress
Cédric Jeanneret (cjeanner) wrote :

@Bogdan: you have a fine hammer, but that doesn't mean everything is a nail ;).

Bogdan Dobrelya (bogdando) wrote :

Bind-mounting /run and/or specific paths like /run/containers to make things appear as if executed on the host *does* look like a nail to me. We should use nsenter to make things work as if executed on the host, avoiding "special" bind mounts, IMO.

Changed in tripleo:
assignee: Steve Baker (steve-stevebaker) → Cédric Jeanneret (cjeanner)
Changed in tripleo:
assignee: Cédric Jeanneret (cjeanner) → Steve Baker (steve-stevebaker)
Changed in tripleo:
assignee: Steve Baker (steve-stevebaker) → Emilien Macchi (emilienm)

Reviewed: https://review.openstack.org/609586
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=28a5ba0a01383f69b3a8880286830d4c00901fc4
Submitter: Zuul
Branch: master

commit 28a5ba0a01383f69b3a8880286830d4c00901fc4
Author: Steve Baker <email address hidden>
Date: Thu Oct 11 17:38:50 2018 +1300

    Replace skopeo inspect with python

    This replaces the skopeo inspect calls with python equivalent. It is
    faster than the skopeo inspect for two reasons:
    - the auth token is shared for all requests
    - the tags list request is made concurrently

    This should also help with running the dry-run prepare in the mistral
    podman container since /run is not involved at all in this
    implementation.

    Change-Id: Ia898d0acfdeac1699e7e08e2935a2a4eaf578531
    Closes-Bug: #1797114
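
The two speed-ups named in the commit message can be sketched as follows. This is not the tripleo-common implementation: inspect_images, get_token and fetch_tags are hypothetical names, and the two callables stand in for whatever registry HTTP requests the real code makes.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect_images(image_names, get_token, fetch_tags, max_workers=4):
    """Fetch tag lists for several images concurrently with one shared token.

    get_token() and fetch_tags(name, token) are hypothetical callables
    supplied by the caller in place of real registry requests.
    """
    names = list(image_names)
    token = get_token()  # requested once, then reused for every image

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `names`.
        results = pool.map(lambda name: fetch_tags(name, token), names)
        return dict(zip(names, results))
```

Nothing in this approach reads credentials from /run, which is consistent with the commit's note that /run is not involved.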

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/tripleo-common 10.1.0 release.
