IPv6: System install failed at docker image pull

Bug #1859835 reported by Peng Peng
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Critical    Angie Wang

Bug Description

Brief Description
-----------------
During system installation, every docker image pull failed. The system uses IPv6 and is configured with a proxy.

Severity
--------
Critical

Steps to Reproduce
------------------
Pull docker images during system installation.

Expected Behavior
------------------
Image pull succeeds.

Actual Behavior
----------------
Image pull fails.

Reproducibility
---------------
Reproducible (3/3)

System Configuration
--------------------
Two node system
Multi-node system
IPv6
proxy

Lab-name: WCP_71-75, WCP_78-79 & WP_8-12

Branch/Pull Time/Commit
-----------------------
2020-01-14_00-10-00

Last Pass
---------
2020-01-13_00-10-00

Timestamp/Logs
--------------
E TASK [common/push-docker-images : Download images and push to local registry] ***
E fatal: [localhost]: FAILED! => {"changed": true, "msg": "non-zero return code", "rc": 1, "stderr": "time=\"2020-01-15T07:42:31Z\" level=fatal msg=\"pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.local:9001/k8s.gcr.io/kube-apiserver:v1.16.2\\\": failed to resolve reference \\\"registry.local:9001/k8s.gcr.io/kube-apiserver:v1.16.2\\\": failed to authorize: failed to fetch oauth token: Post https://[face::1]:9002/token/: Forbidden\"\ntime=\"2020-01-15T07:42:52Z\" level=fatal msg=\"pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.local:9001/k8s.gcr.io/kube-apiserver:v1.16.2\\\": failed to resolve reference \\\"registry.local:9001/k8s.gcr.io/kube-apiserver:v1.16.2\\\": failed to authorize: failed to fetch oauth token: Post https://[face::1]:9002/token/: Forbidden\"\ntime=\"2020-01-15T07:43:13Z\" level=fatal msg=\"pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.local:9001/k8s.gcr.io/kube-apiserver:v1.16.2\\\": failed to resolve reference \\\"registry.local:9001/k8s.gcr.io/kube-apiserver:v1.16.2\\\": failed to authorize: failed to fetch oauth token: Post https://[face::1]:9002/token/: Forbidden\"\ntime=\"2020-01-15T07:43:46Z\" level=fatal msg=\"pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.local:9001/k8s.gcr.io/kube-controller-manager:v1.16.2\\\": failed to resolve reference

Test Activity
-------------
Sanity

Revision history for this message
Angie Wang (angiewang) wrote :

This is due to recent KATA container change.

Images are all pulled and pushed to the local registry successfully by docker, but the "crictl pull" command fails.

controller-0:~$ crictl pull --creds admin:Li69nux* registry.local:9001/k8s.gcr.io/coredns:1.6.2
FATA[0000] pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image "registry.local:9001/k8s.gcr.io/coredns:1.6.2": failed to resolve reference "registry.local:9001/k8s.gcr.io/coredns:1.6.2": failed to authorize: failed to fetch oauth token: Post https://[face::1]:9002/token/: Forbidden

KATA container related commits were reverted in LP https://bugs.launchpad.net/starlingx/+bug/1859686, but the reverted changes were not in yesterday's load 2020-01-14_00-10-00.

Revision history for this message
Ghada Khalil (gkhalil) wrote :
tags: added: stx.4.0
tags: added: stx.config
Changed in starlingx:
importance: Undecided → Critical
assignee: nobody → Don Penney (dpenney)
status: New → Fix Released
Revision history for this message
Peng Peng (ppeng) wrote :

Issue reproduced on
Lab: WCP_78_79
Load: 2020-01-23_00-10-00

and
20200122T170940Z

new log attached
https://files.starlingx.kube.cengn.ca/launchpad/1859835

Changed in starlingx:
status: Fix Released → Confirmed
Revision history for this message
Ghada Khalil (gkhalil) wrote :

The kata code was merged back as of 2020-01-21. It seems that the issue described here was re-introduced.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Bruce, This is breaking general IPv6 deployments in master. With the Shanghai team on vacation, can you please find someone to revert the kata commits again?

Changed in starlingx:
assignee: Don Penney (dpenney) → Bruce Jones (brucej)
Revision history for this message
Lin Shuicheng (shuicheng) wrote :

I only have an IPv6 environment without a proxy, and it works well there.
Could you share the localhost.yml configuration with the proxy settings? I can try it in my environment.
I guess it may be related to the no_proxy list setting.

Revision history for this message
Lin Shuicheng (shuicheng) wrote :

Hi all,
The issue is that containerd doesn't support IPv6 addresses in square brackets in the NO_PROXY parameter.
With the settings below, both containerd and docker can access registry.local:9001 successfully.
The only difference is that the IPv6 addresses have "[]" for docker, while they don't for containerd.

controller-0:/etc/systemd/system# cat containerd.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://child-prc.intel.com:913"
Environment="HTTPS_PROXY=http://child-prc.intel.com:913"
Environment="NO_PROXY=localhost,127.0.0.1,registry.local,fd04::1,fd01::1,fd01::2,fd00::2,fd00::3,registry.dcp-dev.intel.com"
controller-0:/etc/systemd/system# cat docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://child-prc.intel.com:913"
Environment="HTTPS_PROXY=http://child-prc.intel.com:913"
Environment="NO_PROXY=localhost,127.0.0.1,registry.local,[fd04::1],[fd01::1],[fd01::2],[fd00::2],[fd00::3],registry.dcp-dev.intel.com"
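
To illustrate why the brackets could matter, here is a minimal Python sketch. It is not containerd's actual code; it assumes a client that strips the brackets from the host in the URL and then looks for an exact match among the NO_PROXY entries. The token URL is hypothetical, modelled on the error in the log but using fd04::1 from the files above, and the function name is made up for this example.

```python
from urllib.parse import urlsplit

# NO_PROXY values copied from the two http-proxy.conf files above.
CONTAINERD_NO_PROXY = ("localhost,127.0.0.1,registry.local,fd04::1,fd01::1,"
                       "fd01::2,fd00::2,fd00::3,registry.dcp-dev.intel.com")
DOCKER_NO_PROXY = ("localhost,127.0.0.1,registry.local,[fd04::1],[fd01::1],"
                   "[fd01::2],[fd00::2],[fd00::3],registry.dcp-dev.intel.com")

# Hypothetical token endpoint, shaped like the one in the error message.
TOKEN_URL = "https://[fd04::1]:9002/token/"


def bypasses_proxy(url: str, no_proxy: str) -> bool:
    """Return True if the URL's host exactly matches a NO_PROXY entry.

    Simplified on purpose: domain suffixes, CIDR ranges and ports are
    not modelled. This is an assumed matching rule, not a real client's.
    """
    # urlsplit().hostname drops the brackets around an IPv6 literal.
    host = urlsplit(url).hostname
    entries = [e.strip().lower() for e in no_proxy.split(",") if e.strip()]
    return host in entries


if __name__ == "__main__":
    print("bare entries:     ", bypasses_proxy(TOKEN_URL, CONTAINERD_NO_PROXY))
    print("bracketed entries:", bypasses_proxy(TOKEN_URL, DOCKER_NO_PROXY))
```

Under that assumed rule, only the bracket-free list matches the host, so a bracketed list would send the token request to the proxy, which would be consistent with the "Forbidden" response seen above.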

For ansible, the "[]" is added by "ipwrap" in the code below:
https://opendev.org/starlingx/ansible-playbooks/src/branch/master/playbookconfig/src/playbooks/roles/bootstrap/validate-config/tasks/main.yml#L438

Puppet also needs to be updated to remove the "[]" for containerd while keeping it for docker.
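
A rough Python sketch of that idea, producing one NO_PROXY string per service from the same list of entries. This is only an illustration, not the ansible or puppet change itself; the helper names are hypothetical, and the bracketing only imitates what "ipwrap" does for IPv6 addresses.

```python
import ipaddress


def is_ipv6(entry: str) -> bool:
    """Return True if entry is a bare IPv6 address literal."""
    try:
        return ipaddress.ip_address(entry).version == 6
    except ValueError:
        # Hostnames and bracketed forms are not valid address literals.
        return False


def strip_brackets(entry: str) -> str:
    """Remove one pair of surrounding square brackets, if present."""
    if entry.startswith("[") and entry.endswith("]"):
        return entry[1:-1]
    return entry


def no_proxy_for(entries, bracket_ipv6: bool) -> str:
    """Join entries into a NO_PROXY value.

    bracket_ipv6=True  -> IPv6 addresses wrapped in "[]" (docker style)
    bracket_ipv6=False -> IPv6 addresses left bare (containerd style)
    """
    result = []
    for entry in entries:
        bare = strip_brackets(entry.strip())
        if bracket_ipv6 and is_ipv6(bare):
            result.append(f"[{bare}]")
        else:
            result.append(bare)
    return ",".join(result)


if __name__ == "__main__":
    entries = ["localhost", "127.0.0.1", "registry.local", "[fd04::1]", "fd01::1"]
    print("docker:    ", no_proxy_for(entries, bracket_ipv6=True))
    print("containerd:", no_proxy_for(entries, bracket_ipv6=False))
```

Normalizing to the bare form first means the input may mix bracketed and bare addresses without changing either output.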

I am still on Chinese New Year holiday, so I am afraid I don't have much time to continue debugging and fixing this. If anyone could take ownership of this issue, it would be really appreciated. Otherwise, I will handle it after I am back in the office on Feb 3.

Ghada Khalil (gkhalil)
tags: added: stx.containers
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Assigning to Angie Wang to help with this while Shuicheng is away

Changed in starlingx:
assignee: Bruce Jones (brucej) → Angie Wang (angiewang)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/704677

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/704679

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/704677
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=393379bd7671aeec5e9852679a69bdc29577426a
Submitter: Zuul
Branch: master

commit 393379bd7671aeec5e9852679a69bdc29577426a
Author: Angie Wang <email address hidden>
Date: Tue Jan 28 14:01:10 2020 -0500

    Fix the image download failure on IPv6 system

    "crictl pull" failed to pull images on IPv6 system with
    proxy setting since Containerd doesn't work with the
    NO_PROXY environment variable that has IPv6 addresses
    with square brackets. This commit updates to strip out
    the square brackets from NO_PROXY environment variable.

    Verified on both IPv4 and IPv6 labs.

    Change-Id: I70bd00439b2cc39d2b25dd62746994a524be4998
    Partial-Bug: 1859835
    Signed-off-by: Angie Wang <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/704679
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=fc7b9b3d8d811fd50427b584dae5b7488947bb03
Submitter: Zuul
Branch: master

commit fc7b9b3d8d811fd50427b584dae5b7488947bb03
Author: Angie Wang <email address hidden>
Date: Tue Jan 28 13:57:52 2020 -0500

    Fix the image download failure on IPv6 system

    "crictl pull" failed to pull images on IPv6 system with
    proxy setting since Containerd doesn't work with the
    NO_PROXY environment variable that has IPv6 addresses
    with square brackets. This commit updates to strip out
    the square brackets from NO_PROXY environment variable.

    Change-Id: I6bb5ad0379f576f66d77a90dfdca94f5e0f28f0c
    Closes-Bug: 1859835
    Signed-off-by: Angie Wang <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/705831

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/705852

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/705831
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=6670caf7ceda5fe0dc46f2f82033b68abf00ed5e
Submitter: Zuul
Branch: f/centos8

commit bf8d081a95a9b1776964960a6d9089b1449f2c58
Author: Angie Wang <email address hidden>
Date: Thu Jan 30 17:57:05 2020 -0500

    Support k8s networking upgrade based on k8s version

    Update to support a set of k8s networking templates
    based on kubernetes release. The kubernetes version
    needs to be passed to the ansible playbook
    k8s-networking-upgrade.yml to determine which set
    of networking manifests should be applied for the
    current kubernetes.

    Story: 2006781
    Task: 37584
    Change-Id: I3a0b9f56608ddb1323b36f9ecedb8a5488c222c9
    Signed-off-by: Angie Wang <email address hidden>

commit 2b0cd43e5fa75628d8eff78be7045ba4fc82d980
Author: Jerry Sun <email address hidden>
Date: Thu Dec 19 13:22:50 2019 -0500

    Add Dex parameters to ansible bootstrap

    Add oidc_groups_claim as a new parameters for ansible
    config. We now have 2 valid configs: the previous 3 parameters
    for a microsoft azure authentication deployment, and the previous
    3 in addition to oidc_groups_claim for a dex authentication
    deployment.

    Story: 2006711
    Task: 37850
    Change-Id: I265d2f7872eb31e2b295eeff6a3165543673497c
    Depends-On: https://review.opendev.org/702798
    Signed-off-by: Jerry Sun <email address hidden>

commit 92ca122652733805b62fc16940861ca4e83e2bb1
Author: David Sullivan <email address hidden>
Date: Wed Jan 22 21:33:19 2020 -0500

    Install secondary controller nodes with kubeadm join

    Kubeadm init is no longer supported for installing secondary nodes in an
    HA kubernetes cluster. kubeadm join with the --controller-plane option
    should be used.

    Change-Id: I64aaf02b09053608c884149d73bc1a3f2b62d98a
    Partial-Bug: 1846829
    Depends-On: https://review.opendev.org/702797
    Signed-off-by: David Sullivan <email address hidden>

commit 393379bd7671aeec5e9852679a69bdc29577426a
Author: Angie Wang <email address hidden>
Date: Tue Jan 28 14:01:10 2020 -0500

    Fix the image download failure on IPv6 system

    "crictl pull" failed to pull images on IPv6 system with
    proxy setting since Containerd doesn't work with the
    NO_PROXY environment variable that has IPv6 addresses
    with square brackets. This commit updates to strip out
    the square brackets from NO_PROXY environment variable.

    Verified on both IPv4 and IPv6 labs.

    Change-Id: I70bd00439b2cc39d2b25dd62746994a524be4998
    Partial-Bug: 1859835
    Signed-off-by: Angie Wang <email address hidden>

commit 792ea357e2b6d2bd23b441aa1657e0dc46f7ef5d
Author: Jim Somerville <email address hidden>
Date: Mon Jan 27 16:08:48 2020 -0500

    Security: Add nospectre_v1 to the default setting

    Most of the v1 mitigation is baked into the kernel and not
    optional. The swapgs barriers are, however, optional.
    They have a negative performance impact so we disable them
    by using the nospectre_v1 kernel bootarg.

    C...


tags: added: in-f-centos8
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (f/centos8)

Reviewed: https://review.opendev.org/705852
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=e1f095eb112f76a133734a17f01afeb9828ebaf2
Submitter: Zuul
Branch: f/centos8

commit fc7b9b3d8d811fd50427b584dae5b7488947bb03
Author: Angie Wang <email address hidden>
Date: Tue Jan 28 13:57:52 2020 -0500

    Fix the image download failure on IPv6 system

    "crictl pull" failed to pull images on IPv6 system with
    proxy setting since Containerd doesn't work with the
    NO_PROXY environment variable that has IPv6 addresses
    with square brackets. This commit updates to strip out
    the square brackets from NO_PROXY environment variable.

    Change-Id: I6bb5ad0379f576f66d77a90dfdca94f5e0f28f0c
    Closes-Bug: 1859835
    Signed-off-by: Angie Wang <email address hidden>

commit 950670ac1f0bfaa43e29eeb3ffda71a94de66520
Author: Jim Somerville <email address hidden>
Date: Mon Jan 27 17:09:52 2020 -0500

    Security: Add nospectre_v1 to the security params

    Most of the v1 mitigation is baked into the kernel and not
    optional. The swapgs barriers are, however, optional.
    They have a negative performance impact so we disable them
    by using the nospectre_v1 kernel bootarg.

    Partial-Bug: 1860193
    Depends-On: https://review.opendev.org/#/c/704406
    Change-Id: Iaa11ba3f430fc064ebda679cf290474d3be413da
    Signed-off-by: Jim Somerville <email address hidden>

commit 83775d38804fb665af518127051b37a1daf31e36
Author: David Sullivan <email address hidden>
Date: Wed Jan 15 23:50:23 2020 -0500

    Install secondary controller nodes with kubeadm join

    Kubeadm init is no longer supported for installing secondary nodes in an
    HA kubernetes cluster. kubeadm join with the --controller-plane option
    should be used.

    Change-Id: I21a30b9e871d05c59a19e33a9d278f0217682da6
    Closes-Bug: 1846829
    Depends-On: https://review.opendev.org/702797
    Signed-off-by: David Sullivan <email address hidden>

commit c94fa4a0174b96e0716d39bbea7e6fbbbee415a9
Author: Shuicheng Lin <email address hidden>
Date: Thu Jan 23 02:45:31 2020 +0800

    Fix duplex system controller-1 fail to boot after unlock

    It is due to controller-1 doesn't have /opt/platform/config folder.
    And cause puppet failure due to using non-exist file as source.
    Restrict the code for worker node only, since controller node
    already has ca cert in the ssl folder.

    Test:
    Pass simplex/duplex/multi node deployment with vm created.

    Closes-Bug: 1860529
    Change-Id: I808ee15e5c78ebead114219d0ec428fb45cc9128
    Signed-off-by: Shuicheng Lin <email address hidden>

commit 27f167eb14a04bc67ecca59af3b617c115522101
Author: Angie Wang <email address hidden>
Date: Wed Jan 15 16:15:26 2020 -0500

    Remove puppet-manifests code made obsolete by ansible

    As a result of switch to Ansible, remove the obsolete erb
    templates and remove the dependency of is_initial_config_primary
    facter.

    Change-Id: I4ca6525f01a37da971dc66a11ee99ea4e115e3ad
    Partial-Bug: 1834218
    Depends-On: https://review.opendev.org/#/c/703517/
 ...

