podman error with nova container: stderr: standard_init_linux.go:203: exec user process caused \"no such file or directory\"

Bug #1804434 reported by Sorin Sbarnea
This bug affects 5 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Emilien Macchi

Bug Description

Based on the log, podman fails to find some images in local storage while others are present.

2018-11-19 21:42:01 | TASK [Debug output for task: Start containers for step 3] **********************
2018-11-19 21:42:01 | fatal: [fedora-28-vexxhost-sjc1-0000573726]: FAILED! => {
2018-11-19 21:42:01 | "failed_when_result": true,
2018-11-19 21:42:01 | "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
2018-11-19 21:42:01 | "$ podman inspect --type image --format exists docker.io/tripleomaster/centos-binary-cinder-api:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b",
2018-11-19 21:42:01 | "b'exists'",
2018-11-19 21:42:01 | "b''",
2018-11-19 21:42:01 | "$ podman inspect --type image --format exists docker.io/tripleomaster/centos-binary-cinder-volume:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b",
2018-11-19 21:42:01 | "b'error getting image \"docker.io/tripleomaster/centos-binary-cinder-volume:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b\": unable to find \\'docker.io/tripleomaster/centos-binary-cinder-volume:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b\\' in local storage\\n'",
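
The existence check shown in the log can be sketched in Python as follows. This is a minimal illustration, not the code the deployment tooling actually runs: the function name `image_exists` and the injectable `run` parameter are hypothetical, and it assumes that `podman inspect` exits non-zero when the image is absent from local storage.

```python
import subprocess
from typing import Callable


def image_exists(
    image: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if the image is present in local storage.

    Uses the same command as the log excerpt. A missing image is
    treated as a normal "False" answer, not as a deployment failure.
    `run` is injectable only so the sketch can be exercised without podman.
    """
    cmd = ["podman", "inspect", "--type", "image", "--format", "exists", image]
    result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Assumption: exit status 0 plus the literal "exists" on stdout means found.
    return result.returncode == 0 and result.stdout.strip() == b"exists"
```

A caller would pull the image only when this returns False, so the "unable to find ... in local storage" text on stderr never needs to be surfaced as an error.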

Full log: http://logs.openstack.org/56/618056/8/check/tripleo-ci-fedora-28-standalone/a3755dd/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-11-19_21_42_01

Logstash confirms that this is a recurring issue:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22error%20getting%20image%5C%22%20AND%20message%3A%5C%22in%20local%20storage%5C%22%20AND%20tags%3Aconsole

Sorin Sbarnea (ssbarnea)
Changed in tripleo:
assignee: nobody → Gabriele Cerami (gcerami)
Emilien Macchi (emilienm) wrote :

The bug report is wrong. The job mentioned in the link failed because of this error:
stderr: standard_init_linux.go:203: exec user process caused \"no such file or directory\"

The logstash URL is wrong as well. I admit it is confusing, but the "error getting image" message is not critical for us. I agree that we should fix the operator experience.

Let me investigate the "exec user process caused \"no such file or directory\"" issue, which is the real problem here.
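
To make the distinction concrete, here is a small sketch that separates the two messages when scanning a deploy log. The function name `classify` and the labels it returns are made up for illustration; the patterns are taken from the log lines quoted in this report.

```python
import re

# Emitted by the image existence probe; noisy but not a deployment failure.
BENIGN = re.compile(r"error getting image .* in local storage")
# Emitted when the container's entry process could not be executed.
FATAL = re.compile(r"standard_init_linux\.go:\d+: exec user process caused")


def classify(line: str) -> str:
    """Label a log line as 'fatal', 'benign', or 'other'."""
    if FATAL.search(line):
        return "fatal"
    if BENIGN.search(line):
        return "benign"
    return "other"
```

Filtering a log for lines where `classify(line) == "fatal"` skips the image probe noise and lands on the container that actually failed.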

Changed in tripleo:
assignee: Gabriele Cerami (gcerami) → Emilien Macchi (emilienm)
tags: added: containers
removed: alert
tags: added: ci
Changed in tripleo:
milestone: none → stein-2
status: New → In Progress
Emilien Macchi (emilienm) wrote :

The container that causes the problem:
2018-11-19 21:42:01 | "Error running ['podman', 'run', '--name', 'nova_wait_for_db_sync', '--label', 'config_id=tripleo_step3', '--label', 'container_name=nova_wait_for_db_sync', '--label', 'managed_by=paunch', '--label', 'config_data={\"command\": \"/docker-config-scripts/nova_wait_for_db_sync.py\", \"detach\": false, \"image\": \"docker.io/tripleomaster/centos-binary-nova-placement-api:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b\", \"net\": \"host\", \"privileged\": false, \"start_order\": 1, \"user\": \"root\", \"volumes\": [\"/var/lib/nova:/var/lib/nova:shared\", \"/var/lib/docker-config-scripts/:/docker-config-scripts/\", \"/var/lib/config-data/puppet-generated/nova_placement/etc/nova:/etc/nova:ro\"]}', '--net=host', '--privileged=false', '--user=root', '--volume=/var/lib/nova:/var/lib/nova:shared', '--volume=/var/lib/docker-config-scripts/:/docker-config-scripts/', '--volume=/var/lib/config-data/puppet-generated/nova_placement/etc/nova:/etc/nova:ro', 'docker.io/tripleomaster/centos-binary-nova-placement-api:3ed8ac0e93367a02ad53d9fa93467057724b6621_fd8eb74b', '/docker-config-scripts/nova_wait_for_db_sync.py']. [1]",
2018-11-19 21:42:01 | "stderr: standard_init_linux.go:203: exec user process caused \"no such file or directory\"",
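
As a general debugging aid: when exec reports "no such file or directory" for a script, the missing path can be either the script itself or the interpreter named in its shebang line. The sketch below checks both. It is only a generic diagnostic with hypothetical function names, it would have to run against the container's view of the filesystem, and nothing in this report confirms that either case is the cause here.

```python
import os
from typing import List, Optional


def shebang_interpreter(script_path: str) -> Optional[str]:
    """Return the interpreter path from the script's '#!' line, if any."""
    with open(script_path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None
    parts = first_line[2:].strip().split()
    return parts[0].decode() if parts else None


def exec_enoent_suspects(script_path: str) -> List[str]:
    """List paths whose absence would make exec fail with ENOENT."""
    if not os.path.exists(script_path):
        # The script itself is not visible (for example, a volume not mounted yet).
        return [script_path]
    interpreter = shebang_interpreter(script_path)
    if interpreter and not os.path.exists(interpreter):
        return [interpreter]
    return []
```

An empty result from `exec_enoent_suspects("/docker-config-scripts/nova_wait_for_db_sync.py")` inside the container would suggest the paths are fine at the time of the check, which is consistent with a timing-dependent failure.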

Cédric Jeanneret (cjeanner) wrote :

So, two upstream bugs have been created:
- https://github.com/containers/libpod/issues/1845 for the "unclear message" when we test the existence of the image
- https://github.com/containers/libpod/issues/1844 for the "real" error we are hitting in this LP

Guess we can rename it, btw.

summary: - podman fails to find image: error getting image
+ podman error with nova container: stderr: standard_init_linux.go:203:
+ exec user process caused \"no such file or directory\"
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/619607

Emilien Macchi (emilienm) wrote :

The bug is probably a race in podman, but it only became clearly visible with https://review.openstack.org/#/c/610966/.

There are quite a lot of hits in CI right now according to the logstash query, and it's always the same containers, so I'm proposing a revert for now.

Emilien Macchi (emilienm) wrote :

Also note that we have this problem with Docker as well:
http://logs.openstack.org/98/619598/3/check/tripleo-ci-fedora-28-standalone-docker/b3b6940/logs/undercloud/var/log/extra/errors.txt

It might be a race on our side as well.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/619607
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=ba30607ec618ad8913aff05cf97849e764c7ccd2
Submitter: Zuul
Branch: master

commit ba30607ec618ad8913aff05cf97849e764c7ccd2
Author: Emilien Macchi <email address hidden>
Date: Thu Nov 22 16:22:05 2018 +0000

    Revert "Verify nova api migration finished before start placement"

    This reverts commit c19b58a9f312bbe2ef0183f08e6773431eba6fe6.
    Related-Bug: #1804434

    Change-Id: I801a53e1cf2ec923b8294824f6738bedbc30bdf7

wes hayutin (weshayutin)
tags: added: promotion-blocker
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/626953

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/626955

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart (master)

Reviewed: https://review.openstack.org/626583
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=6822ba9a7e264f0fbc454ce830483fd5e29358db
Submitter: Zuul
Branch: master

commit 6822ba9a7e264f0fbc454ce830483fd5e29358db
Author: Wes Hayutin <email address hidden>
Date: Thu Dec 20 06:52:10 2018 -0700

    temporarily turn off podman

    We're hitting several issues in the upstream
    ci related to podman :(

    https://bugs.launchpad.net/tripleo/+bug/1804434
    https://github.com/containers/libpod/issues/1844

    Related-Bug: #1804434
    Related-Bug: #1809218
    Change-Id: I19aa04382ba159768a1748d44412bbc670facaf3

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart (master)

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/626953

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by wes hayutin (<email address hidden>) on branch: master
Review: https://review.openstack.org/626955

Changed in tripleo:
milestone: stein-2 → stein-3
wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: In Progress → Fix Released