containers-multinode unable to pull images

Bug #1819175 reported by Rafael Folco on 2019-03-08
This bug affects 2 people
Affects: tripleo
Importance: Critical
Assigned to: Unassigned

Bug Description

http://logs.openstack.org/09/640709/2/gate/tripleo-ci-centos-7-containers-multinode/47506e1/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

2019-03-08 10:58:00 | "Copying blob 7eca0a6a6ec4: 28.17 MiB / 28.17 MiB 2s",
2019-03-08 10:58:00 | "Failed",
2019-03-08 10:58:00 | "error pulling image \"192.168.24.1:8787/tripleomaster/centos-binary-neutron-server-ovn:current-tripleo-updated-20190308094023\": unable to pull 192.168.24.1:8787/tripleomaster/centos-binary-neutron-server-ovn:current-tripleo-updated-20190308094023: unable to pull image: Error writing blob: error storing blob to file \"/var/tmp/storage478289386/6\": Digest did not match, expected sha256:312452cd6838bfb9c50a5c62840d8f44a056423428390462aa13c5ac741c2296, got sha256:33dd1b7baf25530426549981bb39f42d71a0c83e4c3ec0ef120c1828c3f4f3ff",

http://logs.openstack.org/09/640709/2/gate/tripleo-ci-centos-7-containers-multinode/47506e1/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-03-08_10_58_06

2019-03-08 10:58:06 | Traceback (most recent call last):
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/tripleoclient/command.py", line 29, in run
2019-03-08 10:58:06 | super(Command, self).run(parsed_args)
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
2019-03-08 10:58:06 | return super(Command, self).run(parsed_args)
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/cliff/command.py", line 184, in run
2019-03-08 10:58:06 | return_code = self.take_action(parsed_args) or 0
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
2019-03-08 10:58:06 | verbosity=self.app_args.verbose_level)
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/deployment.py", line 323, in config_download
2019-03-08 10:58:06 | raise exceptions.DeploymentError("Overcloud configuration failed.")
2019-03-08 10:58:06 | DeploymentError: Overcloud configuration failed.
2019-03-08 10:58:06 | Overcloud configuration failed.
2019-03-08 10:58:06 | END return value: 1

Changed in tripleo:
milestone: stein-3 → stein-rc1
wes hayutin (weshayutin) wrote :

no longer seeing this issue

Changed in tripleo:
status: Triaged → Invalid
Martin Mágr (mmagr) wrote :

I'm hitting this bug too. Basically any release (stein, master, master-tripleo-ci) deployed by quickstart fails with the following for me:

"Trying to pull 192.168.24.1:8787/tripleomaster/centos-binary-nova-libvirt:current-tripleo...",
        "Copying blob sha256:aa6f1da0ce62d77237d5b4ba56c92cc8c50e224276c12878da7ff0be38a37b98",
        "Copying blob sha256:e5f193924581b7ed972f249f3bcfad70325d847e0e6edfff704b687e0fc34f15",
        "Copying blob sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908",
        "Copying blob sha256:bec77c7a2e0d98b90e1abec7c2b50b3e73afa0e03de630af41aa8a3bd9418ce2",
        "Copying blob sha256:029c9bd93a422fc31a4bf6e3a88d9f7fe00d3d52292e9df7a7f63a3f720d02bd",
        " Digest did not match, expected sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908, got sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "Error: unable to pull 192.168.24.1:8787/tripleomaster/centos-binary-nova-libvirt:current-tripleo: unable to pull image: Error reading blob sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908: Digest did not match, expected sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908, got sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

Changed in tripleo:
status: Invalid → Confirmed
wes hayutin (weshayutin) on 2019-10-10
tags: added: alrt
tags: added: alert
removed: alrt
Changed in tripleo:
assignee: Gabriele Cerami (gcerami) → nobody
wes hayutin (weshayutin) wrote :

seeing if we can reproduce outside of CI w/
./quickstart.sh -w /var/tmp/RECREATE_tq/ --clean --tags all -R tripleo-ci/CentOS-7/master whayutin-testbox

wes hayutin (weshayutin) wrote :

./quickstart.sh -w /var/tmp/RECREATE_tq/ --clean --tags all -R tripleo-ci/CentOS-7/master whayutin-testbox

https://pasted.tech/pastes/6170336054fac0bbe4926fc469ff87823984df07
https://paste.fedoraproject.org/paste/TyEixPe0z2shWxVb8kKAhw/
https://paste.fedoraproject.org/paste/wGEG0aKPiG8G8q-DjRR9qw/

must be some proxy getting in your way.

Changed in tripleo:
status: Confirmed → Incomplete
wes hayutin (weshayutin) wrote :

Any more details you can provide, Martin?

Martin Mágr (mmagr) wrote :

Running the image prepare fails on the nova-libvirt image:
(undercloud) [stack@undercloud ~]$ sudo /usr/bin/tripleo-container-image-prepare --roles-file /tmp/ansible.uJ8hgm-role-data --environment-file /home/stack/containers-prepare-parameter.yaml --cleanup partial --log-file fuckin.log --debug
<snip>
2019-10-11 13:29:44 740504 ERROR tripleo_common.image.image_export [ ] [tripleomaster/centos-binary-nova-libvirt] Write Failure:
2019-10-11 13:29:44 740504 ERROR tripleo_common.image.image_export [ ] [tripleomaster/centos-binary-nova-libvirt] Broken layer found and removed /var/lib/image-serve/v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908.gz
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Unlocking layer sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Starting acquire for lock sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Acquired for unlock sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Updated lock info sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Released lock on layer sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:45 740504 WARNING tripleo_common.image.image_uploader [ ] No lock information provided for layer sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:45 740504 DEBUG urllib3.connectionpool [ ] Resetting dropped connection: 192.168.24.1
2019-10-11 13:29:45 740504 DEBUG urllib3.connectionpool [ ] http://192.168.24.1:8787 "HEAD /v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908 HTTP/1.1" 404 0
2019-10-11 13:29:45 740504 DEBUG tripleo_common.image.image_uploader [ ] [sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908] Uploading layer
2019-10-11 13:29:45 740504 DEBUG tripleo_common.image.image_export [ ] [tripleomaster/centos-binary-nova-libvirt] Export layer to /var/lib/image-serve/v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908.gz
2019-10-11 13:29:45 740504 INFO tripleo_common.image.image_uploader [ ] [/tripleomaster/centos-binary-nova-libvirt] Fetching layer sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908 from https://registry-1.docker.io/v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:45 740504 DEBUG urllib3.connectionpool [ ] https://registry-1.docker.io:443 "GET /v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908 HTTP/1.1" 307 0
2019-10-11 13:29:45 740504 DEBUG urllib3.connectionpool [ ] https://production.cloudflar...

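To rule out the upstream layer itself being broken, the fetch that image_uploader performs here can be replayed by hand: request an anonymous pull token from auth.docker.io and GET the blob from registry-1.docker.io (requests follows the 307 redirect to the CDN automatically). A sketch, assuming anonymous pull access is allowed for this repository:

import hashlib
import requests

repo = "tripleomaster/centos-binary-nova-libvirt"
layer = "sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908"

# Anonymous pull token for this repository.
token = requests.get(
    "https://auth.docker.io/token",
    params={"service": "registry.docker.io", "scope": "repository:%s:pull" % repo},
).json()["token"]

# Fetch the blob; the 307 redirect to the CDN is followed automatically.
resp = requests.get(
    "https://registry-1.docker.io/v2/%s/blobs/%s" % (repo, layer),
    headers={"Authorization": "Bearer " + token},
    stream=True,
)
resp.raise_for_status()
digest = hashlib.sha256()
for chunk in resp.iter_content(chunk_size=1024 * 1024):
    digest.update(chunk)
print("upstream digest:", "sha256:" + digest.hexdigest(), "expected:", layer)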

Martin Mágr (mmagr) wrote :

There is not much to say. The reproducer is:

1. Clone tripleo-quickstart
2. Apply the following patch; without it, quickstart fails on CentOS 7 with a syntax error:
[root@dell-t5810ws-rdo-13 tripleo-quickstart]$ git show bfbc1d00e2fc82e44e3fdd92382b0ec2d8d13b4c
commit bfbc1d00e2fc82e44e3fdd92382b0ec2d8d13b4c
Author: root <email address hidden>
Date: Mon Oct 7 16:35:11 2019 +0200

    para fix

diff --git a/requirements.txt b/requirements.txt
index 2740482..9206b46 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,6 +1,6 @@
 cmd2==0.8.5
 ara<1.0
-ansible>=2.8,<2.9
+ansible>=2.7,<2.8
 jmespath
 netaddr>=0.7.18
 os-client-config
[root@dell-t5810ws-rdo-13 tripleo-quickstart]$
3. Deploy the undercloud with:
[root@dell-t5810ws-rdo-13 tripleo-quickstart]$ ./quickstart.sh --clean --teardown all --no-clone --release master-tripleo-ci --nodes /root/tripleo/tripleo-quickstart/para-standard-nodes.yaml dell-t5810ws-rdo-13.tpb.lab.eng.brq.redhat.com
3a. Node settings:
[root@dell-t5810ws-rdo-13 tripleo-quickstart]$ cat para-standard-nodes.yaml
# Tell tripleo which nodes to deploy.
topology_map:
  Controller:
    scale: 3
  Compute:
    scale: 1

repo_setup_run_update: false
control_memory: 6144
compute_memory: 6144

# undercloud resources
undercloud_disk: 150
#undercloud_memory: 16384
undercloud_memory: 8192
undercloud_vcpu: 4

# opstools resources
opstools_memory: 4096

flavors:
  compute:
    memory: '{{compute_memory|default(default_memory)}}'
    disk: '{{compute_disk|default(default_disk)}}'
    vcpu: '{{compute_vcpu|default(default_vcpu)}}'

  control:
    memory: '{{control_memory|default(default_memory)}}'
    disk: '{{control_disk|default(default_disk)}}'
    vcpu: '{{control_vcpu|default(default_vcpu)}}'

  ceph:
    memory: '{{ceph_memory|default(default_memory)}}'
    disk: '{{ceph_disk|default(default_disk)}}'
    vcpu: '{{ceph_vcpu|default(default_vcpu)}}'
    extradisks: true

  blockstorage:
    memory: '{{block_memory|default(default_memory)}}'
    disk: '{{block_disk|default(default_disk)}}'
    vcpu: '{{block_vcpu|default(default_vcpu)}}'

  objectstorage:
    memory: '{{objectstorage_memory|default(default_memory)}}'
    disk: '{{objectstorage_disk|default(default_disk)}}'
    vcpu: '{{objectstorage_vcpu|default(default_vcpu)}}'
    extradisks: true

  opstools:
    memory: '{{osptools_memory|default(default_memory)}}'
    disk: '{{osptools_disk|default(default_disk)}}'
    vcpu: '{{osptools_vcpu|default(default_vcpu)}}'

  undercloud:
    memory: '{{undercloud_memory|default(default_memory)}}'
    disk: '{{undercloud_disk|default(default_disk)}}'
    vcpu: '{{undercloud_vcpu|default(default_vcpu)}}'

node_count: 7
overcloud_nodes:
  - name: control_0
    flavor: control
    virtualbmc_port: 6230

  - name: control_1
    flavor: control
    virtualbmc_port: 6231

  - name: control_2
    flavor: control
    virtualbmc_port: 6232

  - name: compute_0
    flavor: compute
    virtualbmc_port: 6233

# - name: opstools_0
# flavor: control
# virtualbmc_port: 6234

  - name: ceph_0
    flavor: ceph
    virtualbmc_port: 6235

  - name: ceph_1
    flavor: ceph
    virtualbmc_port: 6236

  - name: ceph_2
    f...


Bogdan Dobrelya (bogdando) wrote :

I think the regression may be https://review.opendev.org/686193, which changed the exit condition for the locks manager when detecting and managing colliding locks.

Fix proposed to branch: master
Review: https://review.opendev.org/688421

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: Incomplete → In Progress

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/688421
Reason: The endless waiting seen in the timed-out OVB jobs was probably caused by that "fixed" exit condition. Apparently "keep retrying until we no longer have collisions" was not the best strategy :)

I'm not sure what the exit condition should be then. I leave that to the authors of the involved changes.
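For context only, a rough sketch (not the tripleo_common implementation) of the trade-off being discussed: proceed despite a colliding per-layer lock and two workers can write the same blob concurrently and corrupt it; keep retrying until the collision disappears and a stale lock makes the uploader wait forever, which is what the timed-out OVB jobs looked like.

import time

def wait_for_layer(locks, digest, timeout=None):
    """Illustrative only; the real lock manager is more involved.

    timeout=None mimics "retry until we no longer have collisions", which spins
    forever on a stale lock; a short timeout that then writes the blob anyway
    would instead let two workers clobber the same layer.
    """
    start = time.time()
    while digest in locks:                 # collision: another worker owns this layer
        if timeout is not None and time.time() - start > timeout:
            return False                   # caller must NOT write the blob in this case
        time.sleep(0.5)
    locks[digest] = "me"                   # claim the layer (must be atomic in reality)
    return True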

Bogdan Dobrelya (bogdando) wrote :

The root cause may be the one described here: https://bugs.launchpad.net/tripleo/+bug/1848532

If so, then https://review.opendev.org/#/c/688660/ should close this issue as well

Martin Mágr (mmagr) wrote :

With the patch applied, the deployment fails on the same step:

TASK [tripleo-container-image-prepare : Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log] ***
fatal: [undercloud]: FAILED! => {
    "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result",
    "changed": true
}

with following error in /var/log/tripleo-container-image-prepare.log:

2019-10-23 09:55:10,583 333650 INFO tripleo_common.image.image_export [ ] [tripleomaster/centos-binary-manila-scheduler] Layer written successfully /var/lib/image-serve/v2/tripleomaster/centos-binary-manila-scheduler/blobs/sha256:5ae90976da713f10598c8328b2dcd5ec2fb4c0dc2a3a83287add9296ce0c3a17.gz
2019-10-23 09:55:10,859 333650 INFO tripleo_common.image.image_uploader [ ] [docker.io/tripleomaster/centos-binary-manila-scheduler:current-tripleo] Completed upload for image
2019-10-23 09:55:13,284 333561 ERROR root [ ] Image prepare failed: [tripleomaster/centos-binary-nova-libvirt] Write Failure:
Traceback (most recent call last):
  File "/usr/bin/tripleo-container-image-prepare", line 138, in <module>
    lock=lock)
  File "/usr/lib/python2.7/site-packages/tripleo_common/image/kolla_builder.py", line 235, in container_images_prepare_multi
    uploader.upload()
  File "/usr/lib/python2.7/site-packages/tripleo_common/image/image_uploader.py", line 269, in upload
    uploader.run_tasks()
  File "/usr/lib/python2.7/site-packages/tripleo_common/image/image_uploader.py", line 2095, in run_tasks
    for result in p.map(upload_task, self.upload_tasks):
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 605, in result_iterator
    yield future.result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 429, in result
    return self.__get_result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 381, in __get_result
    raise exception_type, self._exception, self._traceback
IOError: [tripleomaster/centos-binary-nova-libvirt] Write Failure:
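For what it's worth, only the bare "Write Failure" message surfaces here because image_uploader fans the per-image uploads out through a futures pool; the worker's exception is captured and re-raised from future.result() in the main thread, so (on the Python 2 futures backport in particular) the original worker traceback is lost. A minimal sketch of that propagation pattern, not the tripleo_common code itself:

from concurrent import futures

def upload_task(image):
    # stand-in for the real per-image upload worker
    raise IOError("[%s] Write Failure:" % image)

with futures.ThreadPoolExecutor(max_workers=4) as pool:
    try:
        for _ in pool.map(upload_task, ["tripleomaster/centos-binary-nova-libvirt"]):
            pass
    except IOError as exc:
        # re-raised in the caller: the message survives, the worker traceback does not
        print("Image prepare failed: %s" % exc)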

Martin Mágr (mmagr) wrote :

Not sure what could be causing the IOError; the undercloud storage is not full:
[stack@undercloud ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 84K 3.9G 1% /dev/shm
tmpfs 3.9G 97M 3.9G 3% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda 150G 44G 107G 30% /
tmpfs 799M 0 799M 0% /run/user/1000
tmpfs 500M 35M 466M 7% /var/log/heat-launcher
[stack@undercloud ~]$
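df -h only covers block usage; since the layers are written under /var/lib/image-serve, free inodes on that filesystem are worth checking too before ruling out an ENOSPC-style cause for the IOError. A quick standard-library check:

import os

st = os.statvfs("/var/lib/image-serve")
print("free space : %.1f GiB" % (st.f_bavail * st.f_frsize / 1024.0 ** 3))
print("free inodes: %d / %d" % (st.f_favail, st.f_files))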

Bogdan Dobrelya (bogdando) wrote :

It is technically possible that https://review.opendev.org/#/c/690111/ might fix this bug as well.

Changed in tripleo:
status: In Progress → Triaged
assignee: Bogdan Dobrelya (bogdando) → nobody