containers-multinode unable to pull images

Bug #1819175 reported by Rafael Folco
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

http://logs.openstack.org/09/640709/2/gate/tripleo-ci-centos-7-containers-multinode/47506e1/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

2019-03-08 10:58:00 | "Copying blob 7eca0a6a6ec4: 28.17 MiB / 28.17 MiB 2s",
2019-03-08 10:58:00 | "Failed",
2019-03-08 10:58:00 | "error pulling image \"192.168.24.1:8787/tripleomaster/centos-binary-neutron-server-ovn:current-tripleo-updated-20190308094023\": unable to pull 192.168.24.1:8787/tripleomaster/centos-binary-neutron-server-ovn:current-tripleo-updated-20190308094023: unable to pull image: Error writing blob: error storing blob to file \"/var/tmp/storage478289386/6\": Digest did not match, expected sha256:312452cd6838bfb9c50a5c62840d8f44a056423428390462aa13c5ac741c2296, got sha256:33dd1b7baf25530426549981bb39f42d71a0c83e4c3ec0ef120c1828c3f4f3ff",

http://logs.openstack.org/09/640709/2/gate/tripleo-ci-centos-7-containers-multinode/47506e1/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-03-08_10_58_06

2019-03-08 10:58:06 | Traceback (most recent call last):
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/tripleoclient/command.py", line 29, in run
2019-03-08 10:58:06 | super(Command, self).run(parsed_args)
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
2019-03-08 10:58:06 | return super(Command, self).run(parsed_args)
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/cliff/command.py", line 184, in run
2019-03-08 10:58:06 | return_code = self.take_action(parsed_args) or 0
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
2019-03-08 10:58:06 | verbosity=self.app_args.verbose_level)
2019-03-08 10:58:06 | File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/deployment.py", line 323, in config_download
2019-03-08 10:58:06 | raise exceptions.DeploymentError("Overcloud configuration failed.")
2019-03-08 10:58:06 | DeploymentError: Overcloud configuration failed.
2019-03-08 10:58:06 | Overcloud configuration failed.
2019-03-08 10:58:06 | END return value: 1
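
For context on the "Digest did not match" failure above: the client hashes each layer it downloads and compares the result against the digest listed in the image manifest, and the deploy aborts as soon as one layer does not match. A minimal sketch of that check (illustrative only, not the actual podman/docker pull code path):

import hashlib

def verify_blob(blob_bytes, expected_digest):
    # expected_digest looks like "sha256:312452cd6838..."
    algo, _, expected_hex = expected_digest.partition(':')
    actual_hex = hashlib.new(algo, blob_bytes).hexdigest()
    if actual_hex != expected_hex:
        raise IOError('Digest did not match, expected %s:%s, got %s:%s'
                      % (algo, expected_hex, algo, actual_hex))
    return blob_bytes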

Tags: ci
Changed in tripleo:
milestone: stein-3 → stein-rc1
Revision history for this message
wes hayutin (weshayutin) wrote :

no longer seeing this issue

Changed in tripleo:
status: Triaged → Invalid
Revision history for this message
Martin Mágr (mmagr) wrote :

I'm hitting this bug too. Basically any release (stein, master, master-tripleo-ci) deployed by quickstart fails for me with the following:

"Trying to pull 192.168.24.1:8787/tripleomaster/centos-binary-nova-libvirt:current-tripleo...",
        "Copying blob sha256:aa6f1da0ce62d77237d5b4ba56c92cc8c50e224276c12878da7ff0be38a37b98",
        "Copying blob sha256:e5f193924581b7ed972f249f3bcfad70325d847e0e6edfff704b687e0fc34f15",
        "Copying blob sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908",
        "Copying blob sha256:bec77c7a2e0d98b90e1abec7c2b50b3e73afa0e03de630af41aa8a3bd9418ce2",
        "Copying blob sha256:029c9bd93a422fc31a4bf6e3a88d9f7fe00d3d52292e9df7a7f63a3f720d02bd",
        " Digest did not match, expected sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908, got sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "Error: unable to pull 192.168.24.1:8787/tripleomaster/centos-binary-nova-libvirt:current-tripleo: unable to pull image: Error reading blob sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908: Digest did not match, expected sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908, got sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

Changed in tripleo:
status: Invalid → Confirmed
wes hayutin (weshayutin)
tags: added: alrt
tags: added: alert
removed: alrt
Changed in tripleo:
assignee: Gabriele Cerami (gcerami) → nobody
Revision history for this message
wes hayutin (weshayutin) wrote :

Seeing if we can reproduce outside of CI with:
./quickstart.sh -w /var/tmp/RECREATE_tq/ --clean --tags all -R tripleo-ci/CentOS-7/master whayutin-testbox

Revision history for this message
wes hayutin (weshayutin) wrote :

./quickstart.sh -w /var/tmp/RECREATE_tq/ --clean --tags all -R tripleo-ci/CentOS-7/master whayutin-testbox

https://pasted.tech/pastes/6170336054fac0bbe4926fc469ff87823984df07
https://paste.fedoraproject.org/paste/TyEixPe0z2shWxVb8kKAhw/
https://paste.fedoraproject.org/paste/wGEG0aKPiG8G8q-DjRR9qw/

must be some proxy getting in your way.

Changed in tripleo:
status: Confirmed → Incomplete
Revision history for this message
wes hayutin (weshayutin) wrote :

Any more details you can provide, Martin?

Revision history for this message
Martin Mágr (mmagr) wrote :

Running the image prepare fails on the libvirt image:
(undercloud) [stack@undercloud ~]$ sudo /usr/bin/tripleo-container-image-prepare --roles-file /tmp/ansible.uJ8hgm-role-data --environment-file /home/stack/containers-prepare-parameter.yaml --cleanup partial --log-file fuckin.log --debug
<snip>
2019-10-11 13:29:44 740504 ERROR tripleo_common.image.image_export [ ] [tripleomaster/centos-binary-nova-libvirt] Write Failure:
2019-10-11 13:29:44 740504 ERROR tripleo_common.image.image_export [ ] [tripleomaster/centos-binary-nova-libvirt] Broken layer found and removed /var/lib/image-serve/v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908.gz
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Unlocking layer sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Starting acquire for lock sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Acquired for unlock sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Updated lock info sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:44 740504 DEBUG tripleo_common.image.image_uploader [ ] Released lock on layer sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:45 740504 WARNING tripleo_common.image.image_uploader [ ] No lock information provided for layer sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:45 740504 DEBUG urllib3.connectionpool [ ] Resetting dropped connection: 192.168.24.1
2019-10-11 13:29:45 740504 DEBUG urllib3.connectionpool [ ] http://192.168.24.1:8787 "HEAD /v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908 HTTP/1.1" 404 0
2019-10-11 13:29:45 740504 DEBUG tripleo_common.image.image_uploader [ ] [sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908] Uploading layer
2019-10-11 13:29:45 740504 DEBUG tripleo_common.image.image_export [ ] [tripleomaster/centos-binary-nova-libvirt] Export layer to /var/lib/image-serve/v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908.gz
2019-10-11 13:29:45 740504 INFO tripleo_common.image.image_uploader [ ] [/tripleomaster/centos-binary-nova-libvirt] Fetching layer sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908 from https://registry-1.docker.io/v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908
2019-10-11 13:29:45 740504 DEBUG urllib3.connectionpool [ ] https://registry-1.docker.io:443 "GET /v2/tripleomaster/centos-binary-nova-libvirt/blobs/sha256:ee9a79ad1e1013a5b0b5eb95737934a4efcd46daad5d3327479e355996823908 HTTP/1.1" 307 0
2019-10-11 13:29:45 740504 DEBUG urllib3.connectionpool [ ] https://production.cloudflar...
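
The sequence above (Write Failure, "Broken layer found and removed", HEAD returning 404, then a fresh fetch from registry-1.docker.io) is the exporter detecting a digest mismatch while writing the layer under /var/lib/image-serve, deleting the partial blob, and downloading it again. A simplified sketch of that recover-and-retry flow, assuming a hypothetical fetch_blob() generator that streams chunks from the remote registry (illustrative only, not the tripleo_common implementation):

import hashlib
import os

def export_layer(fetch_blob, layer_digest, dest_path, attempts=3):
    # Stream the layer to dest_path while hashing it; on a mismatch remove
    # the broken file ("Broken layer found and removed ...") and re-fetch.
    expected_hex = layer_digest.split(':', 1)[1]
    for _ in range(attempts):
        checksum = hashlib.sha256()
        with open(dest_path, 'wb') as out:
            for chunk in fetch_blob(layer_digest):  # hypothetical helper
                checksum.update(chunk)
                out.write(chunk)
        if checksum.hexdigest() == expected_hex:
            return dest_path
        os.remove(dest_path)
    raise IOError('[%s] Write Failure:' % layer_digest)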


Revision history for this message
Martin Mágr (mmagr) wrote :

There is not much to say. The reproducer is:

1. Clone tripleo-quickstart
2. Apply the following patch; without it, quickstart fails on CentOS 7 with a syntax error:
[root@dell-t5810ws-rdo-13 tripleo-quickstart]$ git show bfbc1d00e2fc82e44e3fdd92382b0ec2d8d13b4c
commit bfbc1d00e2fc82e44e3fdd92382b0ec2d8d13b4c
Author: root <email address hidden>
Date: Mon Oct 7 16:35:11 2019 +0200

    para fix

diff --git a/requirements.txt b/requirements.txt
index 2740482..9206b46 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,6 +1,6 @@
 cmd2==0.8.5
 ara<1.0
-ansible>=2.8,<2.9
+ansible>=2.7,<2.8
 jmespath
 netaddr>=0.7.18
 os-client-config
[root@dell-t5810ws-rdo-13 tripleo-quickstart]$
3. Deploy the undercloud with:
[root@dell-t5810ws-rdo-13 tripleo-quickstart]$ ./quickstart.sh --clean --teardown all --no-clone --release master-tripleo-ci --nodes /root/tripleo/tripleo-quickstart/para-standard-nodes.yaml dell-t5810ws-rdo-13.tpb.lab.eng.brq.redhat.com
3a. Node settings:
[root@dell-t5810ws-rdo-13 tripleo-quickstart]$ cat para-standard-nodes.yaml
# Tell tripleo which nodes to deploy.
topology_map:
  Controller:
    scale: 3
  Compute:
    scale: 1

repo_setup_run_update: false
control_memory: 6144
compute_memory: 6144

# undercloud resources
undercloud_disk: 150
#undercloud_memory: 16384
undercloud_memory: 8192
undercloud_vcpu: 4

# opstools resources
opstools_memory: 4096

flavors:
  compute:
    memory: '{{compute_memory|default(default_memory)}}'
    disk: '{{compute_disk|default(default_disk)}}'
    vcpu: '{{compute_vcpu|default(default_vcpu)}}'

  control:
    memory: '{{control_memory|default(default_memory)}}'
    disk: '{{control_disk|default(default_disk)}}'
    vcpu: '{{control_vcpu|default(default_vcpu)}}'

  ceph:
    memory: '{{ceph_memory|default(default_memory)}}'
    disk: '{{ceph_disk|default(default_disk)}}'
    vcpu: '{{ceph_vcpu|default(default_vcpu)}}'
    extradisks: true

  blockstorage:
    memory: '{{block_memory|default(default_memory)}}'
    disk: '{{block_disk|default(default_disk)}}'
    vcpu: '{{block_vcpu|default(default_vcpu)}}'

  objectstorage:
    memory: '{{objectstorage_memory|default(default_memory)}}'
    disk: '{{objectstorage_disk|default(default_disk)}}'
    vcpu: '{{objectstorage_vcpu|default(default_vcpu)}}'
    extradisks: true

  opstools:
    memory: '{{osptools_memory|default(default_memory)}}'
    disk: '{{osptools_disk|default(default_disk)}}'
    vcpu: '{{osptools_vcpu|default(default_vcpu)}}'

  undercloud:
    memory: '{{undercloud_memory|default(default_memory)}}'
    disk: '{{undercloud_disk|default(default_disk)}}'
    vcpu: '{{undercloud_vcpu|default(default_vcpu)}}'

node_count: 7
overcloud_nodes:
  - name: control_0
    flavor: control
    virtualbmc_port: 6230

  - name: control_1
    flavor: control
    virtualbmc_port: 6231

  - name: control_2
    flavor: control
    virtualbmc_port: 6232

  - name: compute_0
    flavor: compute
    virtualbmc_port: 6233

# - name: opstools_0
# flavor: control
# virtualbmc_port: 6234

  - name: ceph_0
    flavor: ceph
    virtualbmc_port: 6235

  - name: ceph_1
    flavor: ceph
    virtualbmc_port: 6236

  - name: ceph_2
    f...


Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

I think the regression may be https://review.opendev.org/686193, which changed the exit condition for the locks manager when detecting and managing colliding locks.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.opendev.org/688421

Changed in tripleo:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/688421
Reason: The endless waiting seen in the timed-out OVB jobs was probably caused by that "fixed" exit condition. Apparently "keep retrying until we no longer have collisions" was not the best strategy :)

I'm not sure what the exit condition should be then. I'll leave that to the authors of the involved changes.
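
For readers unfamiliar with this code path: the layer locks coordinate the parallel upload tasks so only one of them fetches a given blob at a time. Changing the collision handling to "retry until no collision" means a task can spin forever if the lock holder never releases, which matches the hung OVB jobs. A deliberately simplified comparison of the two strategies (illustrative only; it ignores the atomicity and cross-process details of the real lock manager):

import time

def acquire_bounded(held_locks, digest, timeout=600, poll=1):
    # Bounded wait: give up after `timeout` seconds instead of hanging.
    deadline = time.time() + timeout
    while digest in held_locks:          # collision: another task owns it
        if time.time() > deadline:
            raise RuntimeError('timed out waiting for layer %s' % digest)
        time.sleep(poll)
    held_locks[digest] = True

def acquire_retry_forever(held_locks, digest, poll=1):
    # "Keep retrying until we no longer have collisions": never returns
    # if the current holder never releases the layer.
    while digest in held_locks:
        time.sleep(poll)
    held_locks[digest] = True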

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

The root cause may be the one described in https://bugs.launchpad.net/tripleo/+bug/1848532

If so, then https://review.opendev.org/#/c/688660/ should close this issue as well.

Revision history for this message
Martin Mágr (mmagr) wrote :

With the patch applied, the deployment fails at the same step:

TASK [tripleo-container-image-prepare : Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log] ***
fatal: [undercloud]: FAILED! => {
    "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result",
    "changed": true
}

with the following error in /var/log/tripleo-container-image-prepare.log:

2019-10-23 09:55:10,583 333650 INFO tripleo_common.image.image_export [ ] [tripleomaster/centos-binary-manila-scheduler] Layer written successfully /var/lib/image-serve/v2/tripleomaster/centos-binary-manila-scheduler/blobs/sha256:5ae90976da713f10598c8328b2dcd5ec2fb4c0dc2a3a83287add9296ce0c3a17.gz
2019-10-23 09:55:10,859 333650 INFO tripleo_common.image.image_uploader [ ] [docker.io/tripleomaster/centos-binary-manila-scheduler:current-tripleo] Completed upload for image
2019-10-23 09:55:13,284 333561 ERROR root [ ] Image prepare failed: [tripleomaster/centos-binary-nova-libvirt] Write Failure:
Traceback (most recent call last):
  File "/usr/bin/tripleo-container-image-prepare", line 138, in <module>
    lock=lock)
  File "/usr/lib/python2.7/site-packages/tripleo_common/image/kolla_builder.py", line 235, in container_images_prepare_multi
    uploader.upload()
  File "/usr/lib/python2.7/site-packages/tripleo_common/image/image_uploader.py", line 269, in upload
    uploader.run_tasks()
  File "/usr/lib/python2.7/site-packages/tripleo_common/image/image_uploader.py", line 2095, in run_tasks
    for result in p.map(upload_task, self.upload_tasks):
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 605, in result_iterator
    yield future.result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 429, in result
    return self.__get_result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 381, in __get_result
    raise exception_type, self._exception, self._traceback
IOError: [tripleomaster/centos-binary-nova-libvirt] Write Failure:
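
The traceback itself is standard concurrent.futures behaviour: iterating the results of p.map() re-raises the first exception any worker raised, so a single layer hitting the Write Failure aborts the whole image prepare run even though other images (e.g. manila-scheduler just above) uploaded fine. A minimal sketch of that propagation:

from concurrent import futures  # the 'futures' backport on Python 2.7

def upload_task(image):
    if image == 'centos-binary-nova-libvirt':
        raise IOError('[tripleomaster/%s] Write Failure:' % image)
    return image

pool = futures.ThreadPoolExecutor(max_workers=4)
try:
    for result in pool.map(upload_task,
                           ['centos-binary-manila-scheduler',
                            'centos-binary-nova-libvirt']):
        print('Completed upload for image %s' % result)
except IOError as exc:
    print('Image prepare failed: %s' % exc)
finally:
    pool.shutdown()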

Revision history for this message
Martin Mágr (mmagr) wrote :

Not sure what could be causing the IOError; the undercloud storage is not full:
[stack@undercloud ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 84K 3.9G 1% /dev/shm
tmpfs 3.9G 97M 3.9G 3% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda 150G 44G 107G 30% /
tmpfs 799M 0 799M 0% /run/user/1000
tmpfs 500M 35M 466M 7% /var/log/heat-launcher
[stack@undercloud ~]$

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

It is technically possible that https://review.opendev.org/#/c/690111/ might fix this bug as well.

Changed in tripleo:
status: In Progress → Triaged
assignee: Bogdan Dobrelya (bogdando) → nobody
Changed in tripleo:
importance: Critical → High
tags: removed: alert
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released