podman error with nova container: stderr: standard_init_linux.go:203: exec user process caused \"no such file or directory\"

Bug #1804434 reported by Sorin Sbarnea on 2018-11-21
This bug affects 5 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Emilien Macchi
Milestone: stein-3

Bug Description

Based on the log, the deploy fails to find some images in local storage while others are present.

2018-11-19 21:42:01 | TASK [Debug output for task: Start containers for step 3] **********************
2018-11-19 21:42:01 | fatal: [fedora-28-vexxhost-sjc1-0000573726]: FAILED! => {
2018-11-19 21:42:01 | "failed_when_result": true,
2018-11-19 21:42:01 | "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
2018-11-19 21:42:01 | "$ podman inspect --type image --format exists docker.io/tripleomaster/centos-binary-cinder-api:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b",
2018-11-19 21:42:01 | "b'exists'",
2018-11-19 21:42:01 | "b''",
2018-11-19 21:42:01 | "$ podman inspect --type image --format exists docker.io/tripleomaster/centos-binary-cinder-volume:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b",
2018-11-19 21:42:01 | "b'error getting image \"docker.io/tripleomaster/centos-binary-cinder-volume:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b\": unable to find \\'docker.io/tripleomaster/centos-binary-cinder-volume:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b\\' in local storage\\n'",

Full log: http://logs.openstack.org/56/618056/8/check/tripleo-ci-fedora-28-standalone/a3755dd/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-11-19_21_42_01

Logstash confirms that this is a recurring issue:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22error%20getting%20image%5C%22%20AND%20message%3A%5C%22in%20local%20storage%5C%22%20AND%20tags%3Aconsole
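
For context: the b'exists' / b'' pairs in the output above suggest the deploy tooling drives podman from Python via subprocess. A minimal sketch of that kind of existence check (the helper name and the pull fallback are illustrative assumptions, not paunch's actual API):

    import subprocess

    def image_exists(image):
        """Return True if podman finds the image in local storage."""
        result = subprocess.run(
            ['podman', 'inspect', '--type', 'image', '--format', 'exists',
             image],
            capture_output=True,
        )
        # On a hit, podman prints the literal template string 'exists'
        # (the b'exists' above); on a miss it exits non-zero and prints
        # "error getting image ... in local storage" on stderr.
        return result.returncode == 0 and result.stdout.strip() == b'exists'

    image = ('docker.io/tripleomaster/centos-binary-cinder-volume:'
             '3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b')
    if not image_exists(image):
        subprocess.run(['podman', 'pull', image], check=True)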

Sorin Sbarnea (ssbarnea) on 2018-11-21
Changed in tripleo:
assignee: nobody → Gabriele Cerami (gcerami)
Emilien Macchi (emilienm) wrote :

The bug report is wrong; the job mentioned in the link failed because of this error:
stderr: standard_init_linux.go:203: exec user process caused \"no such file or directory\"

The logstash URL is wrong as well. I have to admit it is confusing, but the "error getting image" message isn't an error that is critical to us; I agree we should fix the operator experience.

Let me investigate the "exec user process caused \"no such file or directory\"" issue, which is the real problem here.

Changed in tripleo:
assignee: Gabriele Cerami (gcerami) → Emilien Macchi (emilienm)
tags: added: containers
removed: alert
tags: added: ci
Changed in tripleo:
milestone: none → stein-2
status: New → In Progress
Emilien Macchi (emilienm) wrote :

The container that causes the problem:
2018-11-19 21:42:01 | "Error running ['podman', 'run', '--name', 'nova_wait_for_db_sync', '--label', 'config_id=tripleo_step3', '--label', 'container_name=nova_wait_for_db_sync', '--label', 'managed_by=paunch', '--label', 'config_data={\"command\": \"/docker-config-scripts/nova_wait_for_db_sync.py\", \"detach\": false, \"image\": \"docker.io/tripleomaster/centos-binary-nova-placement-api:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b\", \"net\": \"host\", \"privileged\": false, \"start_order\": 1, \"user\": \"root\", \"volumes\": [\"/var/lib/nova:/var/lib/nova:shared\", \"/var/lib/docker-config-scripts/:/docker-config-scripts/\", \"/var/lib/config-data/puppet-generated/nova_placement/etc/nova:/etc/nova:ro\"]}', '--net=host', '--privileged=false', '--user=root', '--volume=/var/lib/nova:/var/lib/nova:shared', '--volume=/var/lib/docker-config-scripts/:/docker-config-scripts/', '--volume=/var/lib/config-data/puppet-generated/nova_placement/etc/nova:/etc/nova:ro', 'docker.io/tripleomaster/centos-binary-nova-placement-api:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b', '/docker-config-scripts/nova_wait_for_db_sync.py']. [1]",
2018-11-19 21:42:01 | "stderr: standard_init_linux.go:203: exec user process caused \"no such file or directory\"",

Cédric Jeanneret (cjeanner) wrote :

So, two upstream bugs have been created:
- https://github.com/containers/libpod/issues/1845 for the "unclear message" when we test the existence of the image
- https://github.com/containers/libpod/issues/1844 for the "real" error we are hitting in this LP

Guess we can rename it, btw.

summary: - podman fails to find image: error getting image
+ podman error with nova container: stderr: standard_init_linux.go:203:
+ exec user process caused \"no such file or directory\"
Emilien Macchi (emilienm) wrote :

The bug is probably a race in podman, but it really started to become visible with https://review.openstack.org/#/c/610966/.

Looking at the logstash query, there are quite a lot of hits in CI right now, and it's always the same containers, so I'm proposing a revert for now.

Emilien Macchi (emilienm) wrote :

Note that we have this problem with Docker as well:
http://logs.openstack.org/98/619598/3/check/tripleo-ci-fedora-28-standalone-docker/b3b6940/logs/undercloud/var/log/extra/errors.txt

It might be a race on our side as well.
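
If the race is on our side (a one-shot container starting before its config script lands on disk), one way to paper over it would be retrying the start on this specific error. A purely hypothetical sketch, not the fix that was merged (the changes below revert and disable instead):

    import subprocess
    import time

    def run_container_with_retries(cmd, attempts=3, delay=5):
        """Retry a one-shot container start on the transient exec ENOENT."""
        for _ in range(attempts):
            result = subprocess.run(cmd, capture_output=True)
            if result.returncode == 0:
                return result
            # Only retry the "no such file or directory" exec error.
            if b'no such file or directory' not in result.stderr:
                break
            time.sleep(delay)
        result.check_returncode()  # raise with the final failure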

Reviewed: https://review.openstack.org/619607
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=ba30607ec618ad8913aff05cf97849e764c7ccd2
Submitter: Zuul
Branch: master

commit ba30607ec618ad8913aff05cf97849e764c7ccd2
Author: Emilien Macchi <email address hidden>
Date: Thu Nov 22 16:22:05 2018 +0000

    Revert "Verify nova api migration finished before start placement"

    This reverts commit c19b58a9f312bbe2ef0183f08e6773431eba6fe6.
    Related-Bug: #1804434

    Change-Id: I801a53e1cf2ec923b8294824f6738bedbc30bdf7

wes hayutin (weshayutin) on 2018-12-21
tags: added: promotion-blocker

Related fix proposed to branch: master
Review: https://review.openstack.org/626955

Reviewed: https://review.openstack.org/626583
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=6822ba9a7e264f0fbc454ce830483fd5e29358db
Submitter: Zuul
Branch: master

commit 6822ba9a7e264f0fbc454ce830483fd5e29358db
Author: Wes Hayutin <email address hidden>
Date: Thu Dec 20 06:52:10 2018 -0700

    temporarily turn off podman

    We're hitting several issues in the upstream
    ci related to podman :(

    https://bugs.launchpad.net/tripleo/+bug/1804434
    https://github.com/containers/libpod/issues/1844

    Related-Bug: #1804434
    Related-Bug: #1809218
    Change-Id: I19aa04382ba159768a1748d44412bbc670facaf3

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/626953

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/626955

Changed in tripleo:
milestone: stein-2 → stein-3
wes hayutin (weshayutin)
Changed in tripleo:
status: In Progress → Fix Released