Canonical Juju

wrong etcd_connection_string on kubernetes-master charm

Bug #1831580 reported by Seyeong Kim on 2019-06-04

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	Canonical Juju	Fix Released	High	Joseph Phillips	Canonical Juju 2.7.7
	2.8	Fix Released	High	Joseph Phillips	Canonical Juju 2.8.1

Bug Description

Hello

This happens in a manual deployment environment.

1. Bootstrap and deploy Juju 2.3.7 and k8s at a specific revision.
2. I hit the issue below, which went away once I upgraded Juju to 2.4.7:
- subprocess.CalledProcessError: Command '['leader-set', 'auto_dns_provider=kube-dns']' returned non-zero exit status 1
3. So I upgraded the Juju controller and model to 2.4.7.
4. After that, a different issue appeared.
- I analyzed it and found a wrong IP in the kube-apiserver args (etcd-servers):
- --etcd-servers="https://127.0.1.1:2379,https://127.0.1.1:2379,https://127.0.1.1:2379" on kubernetes-master
- /var/snap/kube-apiserver/current/args

5. I also needed to change the canal argument manually, like you did before.
- It also points to 127.0.1.1 as the master.
6. After I modified them manually, the k8s cluster worked fine.
7. Then I tried to upgrade the charms (etcd, kubernetes-master) to the latest revision.
8. kubernetes-master's configuration reverted to the wrong IP:
- --etcd-servers="https://127.0.1.1:2379,https://127.0.1.1:2379,https://127.0.1.1:2379"

I checked the code quickly and found that the function below gets this info:

etcd.get_connection_string()

Some more information:
##########################################################
juju run --unit etcd/0 unit-get public-address
node-02.maas
juju run --unit etcd/0 unit-get private-address
node-02.maas
##########################################################
cat /var/snap/kube-apiserver/current/args

--advertise-address="127.0.1.1"
--min-request-timeout="300"
--etcd-cafile="/root/cdk/etcd/client-ca.pem"
--etcd-certfile="/root/cdk/etcd/client-cert.pem"
--etcd-keyfile="/root/cdk/etcd/client-key.pem"
--etcd-servers="https://127.0.1.1:2379,https://127.0.1.1:2379,https://127.0.1.1:2379"
--storage-backend="etcd3"
--tls-cert-file="/root/cdk/server.crt"
--tls-private-key-file="/root/cdk/server.key"
--insecure-bind-address="127.0.0.1"
--insecure-port="8080"
--audit-log-maxbackup="9"
--audit-log-maxsize="100"
--audit-log-path="/root/cdk/audit/audit.log"
--audit-policy-file="/root/cdk/audit/audit-policy.yaml"
--basic-auth-file="/root/cdk/basic_auth.csv"
--client-ca-file="/root/cdk/ca.crt"
--requestheader-allowed-names="system:kube-apiserver"
--requestheader-client-ca-file="/root/cdk/ca.crt"
--requestheader-extra-headers-prefix="X-Remote-Extra-"
--requestheader-group-headers="X-Remote-Group"
--requestheader-username-headers="X-Remote-User"
--service-account-key-file="/root/cdk/serviceaccount.key"
--token-auth-file="/root/cdk/known_tokens.csv"
--authorization-mode="AlwaysAllow"
--admission-control="NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota"
--allow-privileged=false
--enable-aggregator-routing
--kubelet-certificate-authority="/root/cdk/ca.crt"
--kubelet-client-certificate="/root/cdk/client.crt"
--kubelet-client-key="/root/cdk/client.key"
--kubelet-preferred-address-types="[InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP]"
--proxy-client-cert-file="/root/cdk/client.crt"
--proxy-client-key-file="/root/cdk/client.key"
--service-cluster-ip-range="172.18.0.0/16"
--v="4"

#############################################################
juju status

tnode01: Tue Jun 4 18:12:06 2019

Model Controller Cloud/Region Version SLA Timestamp
maas maas maas 2.4.7 unsupported 18:12:07+09:00

App Version Status Scale Charm Store Rev OS Notes
canal 0.10.0/2.6.12 active 5 canal jujucharms 604 ubuntu
easyrsa 3.0.1 active 1 easyrsa jujucharms 231 ubuntu
etcd 3.2.10 active 3 etcd jujucharms 426 ubuntu
kubeapi-load-balancer 1.10.3 active 1 kubeapi-load-balancer jujucharms 58 ubuntu
kubernetes-master 1.12.8 blocked 2 kubernetes-master jujucharms 678 ubuntu
kubernetes-worker 1.12.8 active 3 kubernetes-worker jujucharms 536 ubuntu

Unit Workload Agent Machine Public address Ports Message
easyrsa/0* active idle 0 node-01.maas Certificate Authority connected.
etcd/0 active idle 1 node-02.maas 2379/tcp Healthy with 3 known peers
etcd/1* active idle 2 node-03.maas 2379/tcp Healthy with 3 known peers
etcd/2 active idle 3 node-04.maas 2379/tcp Healthy with 3 known peers
kubeapi-load-balancer/0* active idle 4 node-05.maas 443/tcp Loadbalancer ready.
kubernetes-master/0 maintenance idle 5 node-06.maas 6443/tcp Writing kubeconfig file.
  canal/4 active idle node-06.maas Flannel subnet 172.19.88.1/24
kubernetes-master/1* blocked idle 6 node-07.maas 6443/tcp Stopped services: kube-apiserver
  canal/3 active idle node-07.maas Flannel subnet 172.19.62.1/24
kubernetes-worker/0* active executing 7 node-08.maas Kubernetes worker running.
  canal/0* active idle node-08.maas Flannel subnet 172.19.63.1/24
kubernetes-worker/1 active executing 8 node-09.maas Kubernetes worker running.
  canal/2 active idle node-09.maas Flannel subnet 172.19.19.1/24
kubernetes-worker/2 active executing 9 node-10.maas Kubernetes worker running.
  canal/1 active idle node-10.maas Flannel subnet 172.19.57.1/24

Entity Meter status Message
model amber user verification pending

Machine State DNS Inst id Series AZ Message
0 started node-01.maas manual:node-01.maas xenial Manually provisioned machine
1 started node-02.maas manual:node-02.maas xenial Manually provisioned machine
2 started node-03.maas manual:node-03.maas xenial Manually provisioned machine
3 started node-04.maas manual:node-04.maas xenial Manually provisioned machine
4 started node-05.maas manual:node-05.maas xenial Manually provisioned machine
5 started node-06.maas manual:node-06.maas xenial Manually provisioned machine
6 started node-07.maas manual:node-07.maas xenial Manually provisioned machine
7 started node-08.maas manual:node-08.maas xenial Manually provisioned machine
8 started node-09.maas manual:node-09.maas xenial Manually provisioned machine
9 started node-10.maas manual:node-10.maas xenial Manually provisioned machine

Tags:

Seyeong Kim (seyeongkim) on 2019-06-04

tags:

added: sts

Felipe Reyes (freyes) wrote on 2019-06-04:

The connection string comes from the etcd charm itself, which sets it on the relation under the key "connection_string". The string is built from the information returned by get_ingress_addresses() [0], which relies on network_get() and falls back to unit_private_ip(), so I'm adding a task for the etcd charm as well.

[0] https://github.com/juju-solutions/layer-etcd/blob/master/lib/etcd_lib.py#L4
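
The lookup order described in this comment can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the charm's actual code: `ingress_addresses` and `connection_string` are hypothetical names, and the two callables passed in are stand-ins for network_get() and unit_private_ip().

```python
from typing import Callable, Dict, List


def ingress_addresses(
    endpoint: str,
    network_get: Callable[[str], Dict[str, List[str]]],
    unit_private_ip: Callable[[], str],
) -> List[str]:
    """Prefer the ingress addresses from network-get, else the private address.

    Hypothetical helper: both callables are injected stand-ins for the
    hook-tool wrappers named in the comment above.
    """
    try:
        info = network_get(endpoint)
    except NotImplementedError:
        # Assumption: models a Juju version without network-get support.
        info = {}
    addresses = info.get("ingress-addresses") or []
    return list(addresses) if addresses else [unit_private_ip()]


def connection_string(addresses: List[str], port: int = 2379) -> str:
    """Join addresses into a comma-separated list of https endpoints."""
    return ",".join(f"https://{addr}:{port}" for addr in addresses)


# Whatever network-get reports is passed through unchanged, loopback included.
addrs = ingress_addresses(
    "db",
    network_get=lambda ep: {"ingress-addresses": ["127.0.1.1"]},
    unit_private_ip=lambda: "node-02.maas",
)
print(connection_string(addrs))
```

The point of the sketch is that nothing in this path validates the address, so a loopback value returned by network-get ends up in the relation data as-is.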

Seyeong Kim (seyeongkim) wrote on 2019-06-28:

127.0.1.1 comes from ingress-addresses, as shown below.
I haven't analyzed further yet, but which code sets ingress-addresses?

I think it is the NetworksForRelation func in state/relationunit.go.

Analyzing further...

juju run --unit etcd/2 "network-get --format yaml db"
bind-addresses:
- macaddress: ""
  interfacename: ""
  addresses:
  - hostname: node-04.maas
    address: 127.0.1.1
    cidr: ""
egress-subnets:
- 10.0.0.6/32
ingress-addresses:
- 127.0.1.1
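
A quick way to spot this condition is to check the reported ingress addresses for loopback IPs. The following is a minimal sketch using only the Python standard library; `loopback_ingress` is a hypothetical helper, and the dictionary is the network-get output above transcribed by hand rather than parsed from the hook tool.

```python
import ipaddress
from typing import Any, Dict, List


def loopback_ingress(network_info: Dict[str, Any]) -> List[str]:
    """Return the ingress addresses that are loopback IPs."""
    suspicious = []
    for addr in network_info.get("ingress-addresses", []):
        try:
            if ipaddress.ip_address(addr).is_loopback:
                suspicious.append(addr)
        except ValueError:
            # Not an IP literal (for example a hostname); skip it.
            continue
    return suspicious


# Data transcribed from the network-get output above.
info = {
    "bind-addresses": [
        {
            "macaddress": "",
            "interfacename": "",
            "addresses": [
                {"hostname": "node-04.maas", "address": "127.0.1.1", "cidr": ""}
            ],
        }
    ],
    "egress-subnets": ["10.0.0.6/32"],
    "ingress-addresses": ["127.0.1.1"],
}
print(loopback_ingress(info))
```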

George Kraft (cynerva) wrote on 2019-06-28:

Looks like a Juju bug. Why is network-get returning 127.0.1.1 as the ingress address?

Felipe Reyes (freyes) wrote on 2019-06-28:

@seyeong,

> This is case on manual deployment environment.

does this mean you are using juju's manual provider?

Seyeong Kim (seyeongkim) wrote on 2019-06-28:

@freyes

Right, I added the machines manually first, then deployed units on those machines.

Richard Harding (rharding) wrote on 2019-06-28:

can you post the network (including all devices) setup of the machine please?

Changed in juju:
status:	New → Incomplete

Seyeong Kim (seyeongkim) wrote on 2019-06-29:

@rharding

I pasted the output of ip addr. Do you need anything else?

https://pastebin.ubuntu.com/p/5tcYFtPNGg/

Felipe Reyes (freyes) wrote on 2019-07-04:

Setting the juju task to new as Seyeong provided the info requested.

Changed in juju:
status:	Incomplete → New

Tiago Pasqualini da Silva (tiago.pasqualini) wrote on 2019-07-25:

I was trying to deploy an openstack bundle [1] with the juju manual provider and I noticed the same bug with the percona-cluster charm:

$ juju run --unit neutron-api/0 "relation-get -r shared-db:28 - mysql/0"
allowed_units: neutron-api/0
db_host: 127.0.1.1
egress-subnets: 10.230.56.251/32
ingress-address: z-rotomvm21
password: 7yyftsyMk6ffsmSmzFPy22ZxkcLfV8Y5
private-address: z-rotomvm21

[1] https://pastebin.ubuntu.com/p/RXvT3csvzy/

Seyeong Kim (seyeongkim) wrote on 2019-08-20:

#10

I managed to find out why this is happening in my env.

I found that network-get gets its info from local DNS, and 127.0.1.1 is returned when
running nslookup on node-12.maas (the affected machine).

This is because manage_etc_hosts: true is the default for MAAS-deployed machines.

So I set user_data when deploying the MAAS machine, like below:

maas xtrusia machine deploy MACHINE_NAME distro_series=xenial user_data=I2Nsb3VkLWNvbmZpZwptYW5hZ2VfZXRjX2hvc3RzOiBmYWxzZQo=

The user_data value decodes to the following:

#cloud-config
manage_etc_hosts: false
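
The user_data argument in the command above is the base64 encoding of this cloud-config. A minimal sketch of how such a value can be produced and checked with the Python standard library (the variable names are illustrative only):

```python
import base64

# The cloud-config shown above, including the trailing newline.
cloud_config = "#cloud-config\nmanage_etc_hosts: false\n"

# Encode to the form passed as user_data= on the maas command line.
user_data = base64.b64encode(cloud_config.encode("utf-8")).decode("ascii")
print(user_data)

# Decoding the value gives back the original cloud-config text.
decoded = base64.b64decode(user_data).decode("utf-8")
print(decoded, end="")
```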

After that, the symptom was gone.

Thanks.

Tim Van Steenburgh (tvansteenburgh) on 2019-09-20

no longer affects:	charm-etcd
no longer affects:	charm-kubernetes-master

Richard Harding (rharding) on 2020-02-03

Changed in juju:
status:	New → Triaged
importance:	Undecided → Low
tags:	added: network

Felipe Reyes (freyes) wrote on 2020-05-14:

#11

This bug was hit again, this time during the deployment of a proof of concept. The characteristics of the scenario were the same: the manual provider, where etcd misbehaved because 127.0.0.1 was registered as the ingress address, and from that point on all the services related to etcd tried to use it.

On site the /etc/hosts was fixed, but it looks like etcd had already stored the incorrect address by calling set_db_ingress_address() [0], which internally calls conversation.set_remote() [1], and that ultimately boils down to a relation-set [2].

[0] https://github.com/charmed-kubernetes/layer-etcd/blob/master/reactive/etcd.py#L242
[1] https://github.com/juju-solutions/interface-etcd/blob/master/peers.py#L57
[2] https://github.com/juju-solutions/charms.reactive/blob/master/charms/reactive/relations.py#L773

Nick Niehoff (nniehoff) wrote on 2020-05-15:

#12

To clarify Felipe's comment, this was hit again on a manual cloud with /etc/hosts configured with:

127.0.1.1 hostname.fqdn hostname

For completeness, it was 127.0.1.1 not 127.0.0.1 which matches Seyeong's findings as well.
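
A hosts file with this kind of entry can be detected before deploying units. The sketch below scans hosts-file text for names mapped to a loopback address; `loopback_names` and `LOCAL_NAMES` are hypothetical, and the sample text is an assumption modelled on the line quoted above, not a copy of the affected machine's file.

```python
import ipaddress
from typing import List

# Names that are expected to resolve to loopback and are not a problem.
LOCAL_NAMES = {"localhost", "ip6-localhost", "ip6-loopback"}


def loopback_names(hosts_text: str) -> List[str]:
    """Return non-localhost names that the hosts text maps to loopback."""
    found = []
    for line in hosts_text.splitlines():
        # Drop comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            is_loopback = ipaddress.ip_address(fields[0]).is_loopback
        except ValueError:
            continue
        if is_loopback:
            found.extend(name for name in fields[1:] if name not in LOCAL_NAMES)
    return found


sample = "127.0.0.1 localhost\n127.0.1.1 hostname.fqdn hostname\n"
print(loopback_names(sample))
```

On a real machine the same function could be fed the contents of /etc/hosts; any name it returns would resolve to loopback locally.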

Tim Penhey (thumper) wrote on 2020-05-25:

#13

The latest issues happened with Juju 2.7.

Ian Booth (wallyworld) on 2020-05-26

Changed in juju:
milestone:	none → 2.7.7
importance:	Low → High

Joseph Phillips (manadart) on 2020-05-27

Changed in juju:
status:	Triaged → In Progress
assignee:	nobody → Joseph Phillips (manadart)

Joseph Phillips (manadart) wrote on 2020-05-28:

#14

Can we get the output of "juju show-machine x" where the offending unit is deployed?

Joseph Phillips (manadart) wrote on 2020-05-29:

#15

https://github.com/juju/juju/pull/11638

Joseph Phillips (manadart) on 2020-06-10

Changed in juju:
status:	In Progress → Fix Committed

Canonical Juju QA Bot (juju-qa-bot) on 2020-06-30

Changed in juju:
status:	Fix Committed → Fix Released
