Bootstrap fails when insecure registries are configured

Bug #1863144 reported by Tee Ngo
This bug affects 1 person
Affects     Status        Importance  Assigned to    Milestone
StarlingX   Fix Released  High        Lin Shuicheng

Bug Description

Brief Description
-----------------
A change related to Kata Containers (containerd) breaks bootstrap when insecure registries are configured.

Severity
--------
Critical

Steps to Reproduce
------------------
In localhost.yml, configure docker_registries as follows:

docker_registries:
  quay.io:
    url: quay.io
  docker.elastic.co:
    url: docker.elastic.co
  gcr.io:
    url: gcr.io
  k8s.gcr.io:
    url: k8s.gcr.io
  docker.io:
    url: docker.io
  defaults:
    type: docker
    secure: false

Then run the bootstrap playbook to bootstrap controller-0.
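To see why every registry in this example ends up insecure, the lookup can be sketched in Python. This is a minimal sketch, not StarlingX code: it assumes that a registry's own `secure` key takes precedence, that the `defaults` entry supplies the value otherwise, and that a registry with neither is treated as secure. `DOCKER_REGISTRIES` and `insecure_registries` are hypothetical names.

```python
from typing import Any

# The docker_registries example above, expressed as a Python dict.
DOCKER_REGISTRIES: dict[str, dict[str, Any]] = {
    "quay.io": {"url": "quay.io"},
    "docker.elastic.co": {"url": "docker.elastic.co"},
    "gcr.io": {"url": "gcr.io"},
    "k8s.gcr.io": {"url": "k8s.gcr.io"},
    "docker.io": {"url": "docker.io"},
    "defaults": {"type": "docker", "secure": False},
}


def insecure_registries(registries: dict[str, dict[str, Any]]) -> list[str]:
    """Return the URLs of registries that resolve to secure: false.

    Assumption (hypothetical precedence rule): a registry's own "secure"
    key wins, otherwise the "defaults" entry applies, otherwise secure.
    """
    defaults = registries.get("defaults", {})
    result = []
    for name, settings in registries.items():
        if name == "defaults":
            continue
        # Per-registry settings override the defaults entry.
        merged = {**defaults, **settings}
        if not merged.get("secure", True):
            result.append(merged.get("url", name))
    return result


print(insecure_registries(DOCKER_REGISTRIES))
# ['quay.io', 'docker.elastic.co', 'gcr.io', 'k8s.gcr.io', 'docker.io']
```

With the configuration above, all five registries, including docker.io, resolve to secure: false.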

Expected Behavior
------------------
Bootstrap completes successfully and the host is ready for provisioning.

Actual Behavior
----------------
Bootstrap fails during execution of the push-docker-images tasks:
2020-02-13 16:08:36,706 p=14546 u=sysadmin | TASK [common/push-docker-images : Set registries information] ******************
2020-02-13 16:08:36,739 p=14546 u=sysadmin | ok: [localhost] => (item={u'replaced_url': u'k8s.gcr.io', u'default_url': u'k8s.gcr.io'})
2020-02-13 16:08:36,751 p=14546 u=sysadmin | ok: [localhost] => (item={u'replaced_url': u'gcr.io', u'default_url': u'gcr.io'})
2020-02-13 16:08:36,764 p=14546 u=sysadmin | ok: [localhost] => (item={u'replaced_url': u'quay.io', u'default_url': u'quay.io'})
2020-02-13 16:08:36,776 p=14546 u=sysadmin | ok: [localhost] => (item={u'replaced_url': u'docker.io', u'default_url': u'docker.io'})
2020-02-13 16:08:36,789 p=14546 u=sysadmin | ok: [localhost] => (item={u'replaced_url': u'docker.elastic.co', u'default_url': u'docker.elastic.co'})
2020-02-13 16:08:36,794 p=14546 u=sysadmin | TASK [common/push-docker-images : Log in k8s, gcr, quay, docker registries if credentials exist] ***
2020-02-13 16:08:36,862 p=14546 u=sysadmin | TASK [common/push-docker-images : Get local registry credentials] **************
2020-02-13 16:08:37,424 p=14546 u=sysadmin | changed: [localhost]
2020-02-13 16:08:37,428 p=14546 u=sysadmin | TASK [common/push-docker-images : set_fact] ************************************
2020-02-13 16:08:37,452 p=14546 u=sysadmin | ok: [localhost]
2020-02-13 16:08:37,456 p=14546 u=sysadmin | TASK [common/push-docker-images : Log in to local registry] ********************
2020-02-13 16:08:37,787 p=14546 u=sysadmin | fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', error(104, 'Connection reset by peer'))"}
2020-02-13 16:08:37,788 p=14546 u=sysadmin | PLAY RECAP *********************************************************************
2020-02-13 16:08:37,788 p=14546 u=sysadmin | localhost : ok=218 changed=105 unreachable=0 failed=1

Relevant entries in daemon.log show that containerd failed to start:
2020-02-13T16:08:31.264 localhost systemd[1]: info Started Docker Application Container Engine.
2020-02-13T16:08:32.596 localhost systemd[1]: warning Cannot add dependency job for unit dev-hugepages.mount, ignoring: Unit is masked.
2020-02-13T16:08:32.596 localhost systemd[1]: info Stopping Docker Application Container Engine...
2020-02-13T16:08:32.596 localhost dockerd[110315]: info time="2020-02-13T16:08:32.596870598Z" level=info msg="Processing signal 'terminated'"
2020-02-13T16:08:32.597 localhost dockerd[110315]: info time="2020-02-13T16:08:32.597752671Z" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
2020-02-13T16:08:32.626 localhost systemd[1]: info Stopped Docker Application Container Engine.
2020-02-13T16:08:32.626 localhost systemd[1]: info Closed Docker Socket for the API.
2020-02-13T16:08:32.627 localhost systemd[1]: info Stopping Docker Socket for the API.
2020-02-13T16:08:32.627 localhost systemd[1]: info Starting Docker Socket for the API.
2020-02-13T16:08:32.627 localhost systemd[1]: info Stopping containerd container runtime...
2020-02-13T16:08:32.627 localhost containerd[105167]: info time="2020-02-13T16:08:32.627637371Z" level=info msg="Stop CRI service"
2020-02-13T16:08:32.628 localhost systemd[1]: info Listening on Docker Socket for the API.
2020-02-13T16:08:32.644 localhost systemd[1]: info Stopped containerd container runtime.
2020-02-13T16:08:32.656 localhost systemd[1]: info Starting containerd container runtime...
2020-02-13T16:08:32.659 localhost systemd[1]: info Started containerd container runtime.
2020-02-13T16:08:32.660 localhost systemd[1]: warning start request repeated too quickly for docker.service
2020-02-13T16:08:32.660 localhost systemd[1]: err Failed to start Docker Application Container Engine.
2020-02-13T16:08:32.660 localhost systemd[1]: notice Unit docker.service entered failed state.
2020-02-13T16:08:32.660 localhost systemd[1]: warning docker.service failed.
2020-02-13T16:08:32.695 localhost containerd[110742]: info containerd: Near line 101 (last key parsed 'plugins.cri.registry.mirrors'): Key 'plugins.cri.registry.mirrors.docker.io' has already been defined.
2020-02-13T16:08:32.698 localhost systemd[1]: notice containerd.service: main process exited, code=exited, status=1/FAILURE
2020-02-13T16:08:32.708 localhost systemd[1]: notice Unit containerd.service entered failed state.

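This is because a duplicate registry mirror entry (plugins.cri.registry.mirrors.docker.io) was written to containerd's config file (config.toml). containerd rejected the file as invalid TOML and exited, so docker.service could not start either.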

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
All configurations

Branch/Pull Time/Commit
-----------------------
Feb 7th 2020 master build

Last Pass
---------
Jan 21st 2020 build

Timestamp/Logs
--------------
See above

Ghada Khalil (gkhalil) wrote :

stx.4.0 / high priority - issue introduced by the Kata Containers feature.

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
tags: added: stx.4.0 stx.containers
Changed in starlingx:
assignee: nobody → Lin Shuicheng (shuicheng)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/708047

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/708047
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=980eae4a86cbc2f414e06e6e8d8d0a13a10fbbad
Submitter: Zuul
Branch: master

commit 980eae4a86cbc2f414e06e6e8d8d0a13a10fbbad
Author: Shuicheng Lin <email address hidden>
Date: Mon Feb 17 10:59:24 2020 +0800

    Remove docker registry default setting in containerd config file

    Docker registry is configured as a secure registry by default in
    config.toml. Users may configure docker registry as an insecure
    registry in localhost.yml, which leads to a configuration conflict.
    Remove the default setting in config.toml to solve it.

    Closes-Bug: 1863144

    Change-Id: I4b3b3e1095f10a9ce2e92a3c84330dab1af3f895
    Signed-off-by: Shuicheng Lin <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716153

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (f/centos8)

Reviewed: https://review.opendev.org/716153
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=da2659f53aa94b9818dc78b28b739abd785e5546
Submitter: Zuul
Branch: f/centos8

commit ed763e6a5db5df4a0005dd57bd11b4c411557ea5
Author: Steven Webster <email address hidden>
Date: Sat Mar 28 17:23:16 2020 -0400

    Fix SR-IOV runtime manifest apply

    When an SR-IOV interface is configured, the platform's
    network runtime manifest is applied in order to apply the virtual
    function (VF) config and restart the interface. This results in
    sysinv being able to determine and populate the puppet hieradata
    with the virtual function PCI addresses.

    A side effect of the network manifest apply is that potentially
    all platform interfaces may be brought down/up if it is determined
    that their configuration has changed. This will likely be the case
    for a system which configures SR-IOV interfaces before initial
    unlock.

    A few issues have been encountered because of this, with some
    services not behaving well when the interface they are communicating
    over suddenly goes down.

    This commit makes the SR-IOV VF configuration much more targeted
    so that only the operation of setting the desired number of VFs
    is performed.

    Closes-Bug: #1868584

    Change-Id: Ic867fccae89fe8bc9173598c3c84c94ba2d7511f
    Signed-off-by: Steven Webster <email address hidden>

commit 1ca6d5914266fc7f424ec88e1a466b9f8ab5da9d
Author: Robert Church <email address hidden>
Date: Wed Mar 18 21:56:09 2020 -0400

    Add kubelet support for volume plugins

    When upversioning Calico from 3.6 to 3.12 the --volume-plugin-dir
    argument needs to be provided to kubelet.

    Specifically, the configuration for Calico 3.8 "Adds a Flex Volume
    Driver that creates a per-pod Unix Domain Socket to allow Dikastes to
    communicate with Felix over the Policy Sync API."

    Change-Id: Ic76baa00de4402cbb65c37fe89835b114d424634
    Story: 2006999
    Task: 39111
    Signed-off-by: Robert Church <email address hidden>

commit 17ce7aa97eb485807a46181b2a7db7e02641e245
Author: Jerry Sun <email address hidden>
Date: Fri Mar 13 12:44:48 2020 -0400

    Remove creation of /etc/kubernetes/kubeadm.yaml

    Now that we are not using /etc/kubernetes/kubeadm.yaml anymore,
    we can remove the creation of the file from puppet. Bootstrap will
    still create it for bootstrap use.

    Change-Id: Id08af049fac3fc68b70a7dae5aec8548865a4784
    Closes-bug: 1866695
    Depends-On: https://review.opendev.org/#/c/713020/
    Signed-off-by: Jerry Sun <email address hidden>

commit 027727470da6dcbf3641ff2a701d0c7561476920
Author: Jerry Sun <email address hidden>
Date: Wed Mar 11 14:18:15 2020 -0400

    Clean up change_apiserver_parameters in kubernetes puppet

    Move excess puppet execs into the template already used in the class

    Story: 2006711
    Task: 38944

    Change-Id: Iad54064fa4056f9f30406646c95623a1e7c25bec
    Signed-off-by: Jerry Sun <email address hidden>

commit b39136dc686549c1c937ba30d885ed6958603dba
Author: J...


tags: added: in-f-centos8