[Containers] Issues on configuration of controller-1 Duplex

Bug #1814968 reported by Jose Perez Carranza
| Affects   | Status       | Importance | Assigned to   | Milestone |
| StarlingX | Fix Released | High       | Erich Cordoba |           |

Bug Description

Title
-----
Issues on configuration of controller-1 Duplex

Brief Description
-----------------
After controller-1 is unlocked, configuration errors appear and the controller remains in a failed state.

Severity
--------
Critical

Steps to Reproduce
------------------
1. Install a Duplex system using proxies
2. Follow the instructions on the wiki:
     https://wiki.openstack.org/wiki/StarlingX/Containers/InstallationOnAIODX
3. Proceed to the step "Unlock Controller-1"

Expected Behavior
------------------
Controller-1 should unlock successfully.

Actual Behavior
----------------
The following errors are displayed:

| task | Configuration Failed, re-enabling |
| task | Rebooting |
| task | Booting |

===

| 200.011 | controller-1 experienced a configuration failure. | host=controller-1 | critical | 2019-02-06T13:44:19 | controller

===

| task | Configuration failure, threshold reached, Lock/Unlock to retry |

Reproducibility
---------------
100%

System Configuration
--------------------
- Duplex
- Configured with --kubernetes

Branch/Pull Time/Commit
-----------------------
Master
ISO: http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/20190204T060000Z/outputs/iso/

Timestamp/Logs
--------------
From the controller-1 puppet.log:

2019-02-06T13:52:15.219 Notice: 2019-02-06 13:52:15 +0000 /Stage[main]/Platform::Helm/Exec[initialize helm]/returns: Error: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Get https://kubernetes-charts.storage.googleapis.com/index.yaml: dial tcp: lookup kubernetes-charts.storage.googleapis.com on 8.8.4.4:53: read udp 10.10.10.4:52260->8.8.4.4:53: i/o timeout
2019-02-06T13:52:15.221 Error: 2019-02-06 13:52:15 +0000 helm init --client-only returned 1 instead of one of [0]
2019-02-06T13:52:15.310 Error: 2019-02-06 13:52:15 +0000 /Stage[main]/Platform::Helm/Exec[initialize helm]/returns: change from notrun to 0 failed: helm init --client-only returned 1 instead of one of [0]
2019-02-06T13:52:40.308 Error: 2019-02-06 13:52:40 +0000 kubeadm init --config=/etc/kubernetes/kubeadm.yaml returned 2 instead of one of [0]
2019-02-06T13:52:40.480 Error: 2019-02-06 13:52:40 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: change from notrun to 0 failed: kubeadm init --config=/etc/kubernetes/kubeadm.yaml returned 2 instead of one of [0]
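
To pull just the failing Puppet resources out of a log like the one above, a small filter can help. This is only a minimal sketch: `failed_puppet_resources` is a hypothetical helper name, and it assumes the line layout seen in this report (a timestamp, then `Error:`, then the resource path ending in `/returns:`).

```shell
# Hypothetical helper: read puppet.log-style lines on stdin and print each
# resource whose Exec failed, one per line, without duplicates.
failed_puppet_resources() {
    # Keep only lines whose log level (second field) is "Error:"; this skips
    # Notice lines that merely quote an error message from the command.
    grep -E '^[^ ]+ Error: ' \
        | grep -o '/Stage\[main\]/.*/returns:' \
        | sed 's|/returns:$||' \
        | sort -u
}
```

Feeding it the controller-1 puppet.log on stdin should list the `Platform::Helm` and `Platform::Kubernetes::Master::Init` resources once each.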

Bart Wensley (bartwensley) wrote :

This should be assigned to Mingyuan Qi who implemented support for the docker proxy.

Bruce Jones (brucej)
Changed in starlingx:
assignee: nobody → Mingyuan Qi (myqi)
status: New → Triaged
importance: Undecided → High
Jose Perez Carranza (jgperezc) wrote :

If I run the commands below, the URL is reachable.

$ https_proxy=<my_proxy> curl https://kubernetes-charts.storage.googleapis.com
$ https_proxy=<my_proxy> helm init

So controller-1 has connectivity, but it seems the proxies are not being picked up.
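
One likely reason the manual commands succeed is that a `VAR=value command` prefix sets the variable for that single process only, so anything started separately never sees it. The sketch below illustrates that shell behaviour; the proxy URL is a made-up placeholder, not a value from this system.

```shell
# Start from a clean state so the comparison is meaningful.
unset https_proxy

# Command run in a child shell: print the proxy it inherited, or "unset".
check='printf "%s\n" "${https_proxy:-unset}"'

# Prefix assignment: visible only to this one child process.
with_prefix=$(https_proxy=http://proxy.example.invalid:3128 sh -c "$check")

# A later child process does not inherit anything.
without_prefix=$(sh -c "$check")

echo "with prefix:    $with_prefix"
echo "without prefix: $without_prefix"
```

A command launched by another service, with no proxy in its environment, would behave like the second case.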

Mingyuan Qi (myqi) wrote :

There are two issues in the log:

1. helm init error.
The docker proxy patch covers docker image pulls only; the setting affects just the docker daemon.
The helm failure is a new issue: helm init relies on the system proxy settings, which have not been set at this point.

Moreover, for those without public internet access, https://kubernetes-charts.storage.googleapis.com is not reachable at all; helm does have a flag for setting the stable repo URL. See https://github.com/helm/chart-testing/issues/33

2. kubeadm error.
From the log, it seems the first-time controller-1 unlock was attempted repeatedly, so the manifests for the k8s components had already been generated. kubeadm stops if it detects that these manifests exist.

2019-02-06T13:52:40.300 Notice: 2019-02-06 13:52:40 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [preflight] Some fatal errors occurred:
2019-02-06T13:52:40.301 Notice: 2019-02-06 13:52:40 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
2019-02-06T13:52:40.303 Notice: 2019-02-06 13:52:40 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
2019-02-06T13:52:40.305 Notice: 2019-02-06 13:52:40 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
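
A quick way to see whether a node is in this state is to check for the three manifests named in the preflight errors. This is a minimal sketch; `existing_manifests` is a hypothetical helper, and the default directory is taken from the paths in the log above.

```shell
# Hypothetical helper: print every static pod manifest from the preflight
# errors that already exists under the given directory.
existing_manifests() {
    dir=${1:-/etc/kubernetes/manifests}
    for name in kube-apiserver kube-controller-manager kube-scheduler; do
        if [ -e "$dir/$name.yaml" ]; then
            printf '%s\n' "$dir/$name.yaml"
        fi
    done
}
```

Empty output means none of those files are present, so this particular preflight check should not be the one failing.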

Mingyuan Qi (myqi) wrote :

These errors are not related to the docker proxy itself, but the helm init issue needs to be addressed as a new issue.

Frank Miller (sensfan22)
tags: added: stx.containers
Erich Cordoba (ericho) wrote :

I ran the following experiment:

I noticed that controller-1 had no /etc/kubernetes/admin.conf file, which seems to be required by `helm init --client-only`; see [0].

Then I ran system host-lock controller-1 followed by system host-unlock controller-1, and controller-1 unlocked successfully.

+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+

Also, [1] shows that for controller-0 the `--skip-refresh` flag is passed so that helm init does not download anything, which may explain why controller-0 doesn't fail this way.

[0] https://github.com/openstack/stx-config/blob/master/puppet-manifests/src/modules/platform/manifests/helm.pp#L80
[1] https://github.com/openstack/stx-config/blob/master/puppet-manifests/src/modules/platform/manifests/helm.pp#L57

Mingyuan Qi (myqi) wrote :

The workaround I used yesterday was to add the --skip-refresh flag to helm init --client-only, which works in both the AIO-DX and Standard configurations.

Frank Miller (sensfan22)
Changed in starlingx:
assignee: Mingyuan Qi (myqi) → Erich Cordoba (ericho)
Changed in starlingx:
status: Triaged → In Progress
Erich Cordoba (ericho) wrote :

With this change: https://review.openstack.org/#/c/637960/

I could get a successful unlock on controller-1

[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/637960
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=ed3c63a06da2cb04b7415cb1b5ba6340c3fa229a
Submitter: Zuul
Branch: master

commit ed3c63a06da2cb04b7415cb1b5ba6340c3fa229a
Author: Erich Cordoba <email address hidden>
Date: Tue Feb 19 12:09:42 2019 -0600

    Add DNS requirement for kubernetes and helm.

    `helm init` is being execute before networking and DNS is properly
    configured in the controller. A dependency was added to kubernetes
    to setup DNS, helm manifest was updated to depend on kubernetes.

    Also, the `--skip-refresh` flag was added to helm init for second
    controller to avoid timeout scenarios on proxy enviroments.

    Closes-Bug: 1814968

    Change-Id: I65759314b3a861e7fdb428889aa5f5c1c7037661
    Suggested-by: Mingyuan Qi <email address hidden>
    Signed-off-by: Erich Cordoba <email address hidden>
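
The merged fix handles ordering inside the Puppet manifests. For manual debugging, the same idea (do not run `helm init` until DNS answers) could be approximated with a retry loop. This is a rough sketch, not the method used in the commit; `wait_for` is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds after each failure. Returns 0 on the first success,
# 1 if every attempt fails.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example use (not executed here): wait until the chart host resolves.
# wait_for 10 3 getent hosts kubernetes-charts.storage.googleapis.com
```

The command being retried is passed as arguments, so any resolver check available on the host can be substituted.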

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (f/stein)

Fix proposed to branch: f/stein
Review: https://review.openstack.org/638217

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (f/stein)

Reviewed: https://review.openstack.org/638217
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=b09d0898b6eaec572be3195ae25ec15413136552
Submitter: Zuul
Branch: f/stein

commit 1c467789c43827321e4319d50065fdbab1be35a2
Author: David Sullivan <email address hidden>
Date: Wed Feb 20 00:49:17 2019 -0500

    Add replica settings for mariadb ingress pod

    There was no mariadb replica override for the ingress pod. On AIO-SX
    this caused two pods to be scheduled. When anti-affinity was added to
    mariadb this broke application-apply on AIO-SX.

    The mariadb ingress pod replication will be set to the number of
    controllers.

    Change-Id: Icf3f1979720629904ca9ddcabf59e8ecfab709e5
    Story: 2004520
    Task: 29570
    Signed-off-by: David Sullivan <email address hidden>

commit ed3c63a06da2cb04b7415cb1b5ba6340c3fa229a
Author: Erich Cordoba <email address hidden>
Date: Tue Feb 19 12:09:42 2019 -0600

    Add DNS requirement for kubernetes and helm.

    `helm init` is being execute before networking and DNS is properly
    configured in the controller. A dependency was added to kubernetes
    to setup DNS, helm manifest was updated to depend on kubernetes.

    Also, the `--skip-refresh` flag was added to helm init for second
    controller to avoid timeout scenarios on proxy enviroments.

    Closes-Bug: 1814968

    Change-Id: I65759314b3a861e7fdb428889aa5f5c1c7037661
    Suggested-by: Mingyuan Qi <email address hidden>
    Signed-off-by: Erich Cordoba <email address hidden>

commit 70ed5b099496c98b37a94b061610d48c9263f554
Author: Alex Kozyrev <email address hidden>
Date: Fri Feb 15 15:46:32 2019 -0500

    Enable Barbican provisioning in SM in kubernetes environment

    Since Barbican is in charge of storing BMC passwords for MTCE now
    we need it to run as a bare-metal service alongside with kubernetes.
    This patch enables SM provisioning for barbican in this case.

    Change-Id: Id51f679738d429e78f388b6dc42e7606ef0c41ab
    Story: 2003108
    Task: 27700
    Signed-off-by: Alex Kozyrev <email address hidden>

commit 0dd4b86526609b86d8c7395a7c9af13e7f769596
Author: David Sullivan <email address hidden>
Date: Tue Feb 12 14:09:10 2019 -0500

    Add replica and anti-affinity settings

    Add anti-affinity settings to openstack pods. Add replication to
    novncproxy, aodh, panko and rbd_provisioner services.

    Change-Id: I8091a54cab98ff295eba6e7dd6fa76827d149b5f
    Story: 2004520
    Task: 29418
    Signed-off-by: David Sullivan <email address hidden>

commit 5b94294002617b18bc0f98b206a24cec38a5b929
Author: Angie Wang <email address hidden>
Date: Thu Feb 7 23:42:25 2019 -0500

    Support stx-openstack app install with the authed local registry

    The functionality of local docker registry authentication will be
    enabled in commit https://review.openstack.org/#/c/626355/.
    However, local docker registry is currently used to pull/push images
    during application apply without authentication and no credentials
    passed to the kubernetes when pulling images ...

tags: added: in-f-stein
Ghada Khalil (gkhalil)
tags: added: stx.2019.05
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05