Bootstrap playbook fails at Initializing Kubernetes master
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Mihnea Saracin | | |
Bug Description
Brief Description
-----------------
A Standard system fails when running the bootstrap playbook at the 'Initializing Kubernetes master' step. The cause is that the etcd endpoint in the kubeadm file differs from the one in the etcd config files.
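For anyone triaging a similar failure, the mismatch described above can be checked mechanically. The following is only a minimal sketch, not part of the playbook: it assumes the kubeadm file lists its external etcd endpoints under an `endpoints:` key and that the etcd config file uses etcd's standard `listen-client-urls` key. The function names are hypothetical, and the file contents are passed in as text because the real file paths are not reproduced here.

```python
import re
from urllib.parse import urlsplit


def kubeadm_etcd_endpoints(kubeadm_text):
    """Collect the list items that follow an 'endpoints:' key in a kubeadm config."""
    endpoints = []
    in_endpoints = False
    for line in kubeadm_text.splitlines():
        stripped = line.strip()
        if stripped == "endpoints:":
            in_endpoints = True
            continue
        if in_endpoints:
            if stripped.startswith("- "):
                endpoints.append(stripped[2:].strip().strip('"'))
            elif stripped:
                # Any other non-empty line ends the endpoints list.
                in_endpoints = False
    return endpoints


def etcd_listen_urls(etcd_text):
    """Return the comma-separated URLs from a 'listen-client-urls' line (assumed key)."""
    match = re.search(r'^listen-client-urls:\s*"?([^"\n]+)"?', etcd_text, re.MULTILINE)
    if not match:
        return []
    return [url.strip() for url in match.group(1).split(",") if url.strip()]


def host_port(url):
    """Reduce a URL to a (host, port) pair so that only the address is compared."""
    parts = urlsplit(url)
    return (parts.hostname, parts.port)


def endpoints_mismatch(kubeadm_text, etcd_text):
    """Return the kubeadm endpoints whose host and port etcd is not listening on."""
    listening = {host_port(url) for url in etcd_listen_urls(etcd_text)}
    return [
        endpoint
        for endpoint in kubeadm_etcd_endpoints(kubeadm_text)
        if host_port(endpoint) not in listening
    ]
```

Calling `endpoints_mismatch` with the contents of the two files returns an empty list when every kubeadm endpoint matches an address etcd listens on. Any returned endpoint is one that `kubeadm init` would try to reach without finding etcd there.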
Steps to Reproduce
------------------
Deploy a Standard system with the 'cluster_
Expected Behavior
------------------
Bootstrap playbook completes successfully
Actual Behavior
----------------
Bootstrap playbook fails
Reproducibility
---------------
9/9
System Configuration
-------
Standard System
Branch/Pull Time/Commit
-------
stx master build on "2021-03-01"
Last Pass
---------
N/A
Timestamp/Logs
--------------
TASK [bootstrap/
E fatal: [localhost]: FAILED! =>
{"changed": true, "cmd": ["kubeadm", "init", "--ignore-
From what I see, the etcd endpoint defined in the kubeadm_file is different from the one that etcd listens on:
controller-0:~$ cat /etc/kubernetes
apiVersion: kubeadm.
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 192.168.206.3
nodeRegistration:
criSocket: "/var/run/
---
apiVersion: kubeadm.
kind: ClusterConfigur
apiServer:
certSANs:
- 192.168.206.2
- 127.0.0.1
- 128.224.150.54
- 128.224.150.219
- 128.224.150.212
extraArgs:
default-
default-
feature-gates: "SCTPSupport=
event-ttl: "24h"
encryption-
extraVolumes:
- name: "encryption-config"
hostPath: /etc/kubernetes
mountPath: /etc/kubernetes
readOnly: true
pathType: File
controllerManager:
extraArgs:
node-monitor-
node-monitor-
pod-eviction-
feature-gates: "TTLAfterFinish
flex-volume-
controlPlaneEn
etcd:
external:
endpoints:
- https:/
caFile: /etc/kubernetes
certFile: /etc/kubernetes
keyFile: /etc/kubernetes
imageRepository: "registry.
kubernetesVersion: v1.18.1
networking:
dnsDomain: cluster.local
podSubnet: 172.16.0.0/16
serviceSubnet: 10.96.0.0/12
---
kind: KubeletConfigur
apiVersion: kubelet.
nodeStatusUpda
featureGates:
HugePageStorag
failSwapOn: false
cgroupRoot: "/k8s-infra"
#######
controller-
#[member]
ETCD_NAME=
ETCD_DATA_
ETCD_SNAPSHOT_
ETCD_HEARTBEAT
ETCD_ELECTION_
ETCD_LISTEN_
ETCD_ADVERTISE
ETCD_MAX_
ETCD_MAX_WALS=5
ETCD_ENABLE_
#
#[proxy]
ETCD_PROXY="off"
ETCD_PROXY_
ETCD_PROXY_
ETCD_PROXY_
ETCD_PROXY_
ETCD_PROXY_
#
#[security]
ETCD_CERT_
ETCD_KEY_
ETCD_CLIENT_
ETCD_TRUSTED_
ETCD_PEER_
#
#[logging]
ETCD_DEBUG=false
#######
controller-
# Managed by Puppet
# Source URL: https:/
# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: "controller"
# Path to the data directory.
data-dir: "/opt/etcd/
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-
# List of comma separated URLs to listen on for client traffic.
listen-
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-
# Accept etcd V2 client requests
enable-v2: true
# Valid values include 'on', 'readonly', 'off'
proxy: "off"
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-
# Time (in milliseconds) of the endpoints refresh interval.
proxy-
# Time (in milliseconds) for a dial to timeout.
proxy-
# Time (in milliseconds) for a write to timeout.
proxy-
# Time (in milliseconds) for a read to timeout.
proxy-
client-
# Path to the client server TLS cert file.
cert-file: "/etc/etcd/
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: true
# Path to the client server TLS trusted CA key file.
trusted-ca-file: "/etc/etcd/ca.crt"
# Enable debug-level logging for etcd.
debug: false
#######
In the kubeadm file we have https:/
I think the commit that introduced this is:
https:/
This change in particular:
#####
The issue reproduces if the 'cluster_
The cluster_
- The platform:
- The ETCD_ENDPOINT: "https://{{ cluster_
We can see in the ansible.log:
The bind_address
2021-03-02 18:34:01,656 p=11972 u=sysadmin | changed: [localhost] => (item=platform:
The ETCD_ENDPOINT
2021-03-02 18:45:10,760 p=11972 u=sysadmin | changed: [localhost] => (item=sed -i -e 's|<%= @etcd_endpoint %>|'$ETCD_
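The ansible.log entries above share a fixed shape (timestamp, `p=` process id, `u=` user, task status, host and a loop `item`), so the values the playbook actually applied can be pulled out with a small parser. This is a rough sketch based only on the two lines quoted here. The regular expression and the function names are my own assumptions, and other log lines may not match it.

```python
import re

# Pattern inferred from the quoted lines; an assumption, not an Ansible-defined format.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"p=(?P<pid>\d+) u=(?P<user>\S+) \| "
    r"(?P<status>\w+): \[(?P<host>[^\]]+)\] => \(item=(?P<item>.*)\)\s*$"
)


def parse_item_line(line):
    """Return the fields of one '(item=...)' log line as a dict, or None if no match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def items_mentioning(log_lines, needle):
    """Return (timestamp, item) pairs for parsed lines whose item contains needle."""
    hits = []
    for line in log_lines:
        fields = parse_item_line(line)
        if fields and needle in fields["item"]:
            hits.append((fields["timestamp"], fields["item"]))
    return hits
```

Running `items_mentioning` over the log once with `"bind_address"` and once with `"ETCD_ENDPOINT"` lists both values with their timestamps. That makes it easy to see whether they were derived from the same address.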
In the auth.log
auth.log:
Test Activity
-------------
Developer Testing
Changed in starlingx: | |
assignee: | nobody → Mihnea Saracin (msaracin) |
description: | updated |
Changed in starlingx: | |
status: | New → Fix Released |
Fix proposed here: https://review.opendev.org/c/starlingx/ansible-playbooks/+/779253
https:/