kolla-ansible

zun scenario fails at Running Keystone bootstrap container when etcd is not healthy

Bug #1846531 reported by Radosław Piliszek on 2019-10-03

This bug affects 1 person

Affects         Status    Importance  Assigned to  Milestone
kolla-ansible   Opinion   Low         Unassigned

Bug Description

The example failure below occurred after enabling internal TLS, which broke etcd.
Any etcd failure can trigger this bug.

In Ansible:
Read timed out. (read timeout=60)

In Docker:
Oct 02 18:23:08 primary dockerd[10089]: time="2019-10-02T18:23:08.586295237Z" level=debug msg="could not find network 3f560f255e07334fd1e21230d15cd44fb6d0f4029cf386884d1b096fc3b55107: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.0.2.1:2379: connect: connection refused\n; error #1: dial tcp 192.0.2.2:2379: connect: connection refused\n; error #2: dial tcp 192.0.2.3:2379: connect: connection refused\n"

In etcd:
2019-10-02 18:26:05.742017 I | embed: rejected connection from "192.0.2.2:33296" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.742049 I | embed: rejected connection from "192.0.2.2:33298" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.805035 I | embed: rejected connection from "192.0.2.1:54146" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.806758 I | embed: rejected connection from "192.0.2.1:54144" (error "remote error: tls: bad certificate", ServerName "")
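When triaging a failure like this, it can help to see which peers etcd is rejecting and why. The following is a minimal sketch, not part of kolla-ansible or etcd: the names `REJECTED`, `summarize_rejections`, and `SAMPLE` are hypothetical, and the pattern assumes the exact log line shape quoted above with IPv4 peers (bracketed IPv6 addresses are not handled).

```python
import re
from collections import Counter

# Assumed line shape, taken from the etcd log excerpt in this report:
#   embed: rejected connection from "HOST:PORT" (error "MESSAGE", ServerName "")
REJECTED = re.compile(
    r'embed: rejected connection from "(?P<host>[^":]+):(?P<port>\d+)" '
    r'\(error "(?P<error>[^"]*)"'
)


def summarize_rejections(log_text):
    """Count rejected connections per (peer host, error message) pair."""
    counts = Counter()
    for match in REJECTED.finditer(log_text):
        counts[(match.group("host"), match.group("error"))] += 1
    return counts


# The four log lines quoted in the bug description.
SAMPLE = '''\
2019-10-02 18:26:05.742017 I | embed: rejected connection from "192.0.2.2:33296" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.742049 I | embed: rejected connection from "192.0.2.2:33298" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.805035 I | embed: rejected connection from "192.0.2.1:54146" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.806758 I | embed: rejected connection from "192.0.2.1:54144" (error "remote error: tls: bad certificate", ServerName "")
'''

if __name__ == "__main__":
    for (host, error), n in sorted(summarize_rejections(SAMPLE).items()):
        print(f"{host}: {n} rejected ({error})")
```

Grouping by peer and error makes it easier to tell a certificate problem affecting every client from a single misconfigured node.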

Expected: an etcd failure should break only containers deployed by Zun, not the entire deployment.

Revision history for this message

Radosław Piliszek (yoctozepto) wrote on 2019-10-03:

#1

Attachment: docker-info.txt (134.4 KiB, text/plain)

Mark Goddard (mgoddard) wrote on 2019-10-04:

#2

I doubt it affects Stein - we only merged internal TLS support in Train.

description: updated

Mark Goddard (mgoddard) wrote on 2019-10-04:

#3

Oh also I imagine it's only a problem if you use self-signed certs.

Radosław Piliszek (yoctozepto) wrote on 2019-10-04:

#4

Mark, this bug tracks the fact that an etcd failure fails the whole deployment; TLS is just an example. Etcd could be broken in another way and we would still be affected.

description: updated

Radosław Piliszek (yoctozepto) wrote on 2019-10-06:

#5

I did not mean that to be critical.

OpenStack Infra (hudson-openstack) wrote on 2019-11-18: Related fix proposed to kolla-ansible (master)

#6

Related fix proposed to branch: master
Review: https://review.opendev.org/694778

Radosław Piliszek (yoctozepto) wrote on 2019-11-18:

#7

So it only happens when etcd is rejecting connections. When etcd is down, dockerd ignores it and only zun is b0rken.

OpenStack Infra (hudson-openstack) wrote on 2019-11-21: Change abandoned on kolla-ansible (master)

#8

Change abandoned by Radosław Piliszek (<email address hidden>) on branch: master
Review: https://review.opendev.org/694778
Reason: irrelevant

Radosław Piliszek (yoctozepto) wrote on 2020-05-19:

#9

The problem is just how Docker handles failures. It seems the decision is to reject requests when etcd is known to be available but broken.

no longer affects:	kolla-ansible/stein
no longer affects:	kolla-ansible/train
no longer affects:	kolla-ansible/ussuri
Changed in kolla-ansible:
status:	Triaged → Opinion
assignee:	Radosław Piliszek (yoctozepto) → nobody
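Comments #7 and #9 separate two cases: etcd being down, which dockerd tolerates, and etcd being reachable but broken, which fails the deployment. A first step in telling them apart before deploying is to check whether anything answers on the etcd client port at all. The sketch below is an assumption-laden illustration, not how dockerd or kolla-ansible makes this decision; `probe_tcp` is a hypothetical helper, and a "listening" result says nothing about whether etcd is healthy or accepts the client's certificate.

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Return 'listening' if a TCP connection succeeds, otherwise 'down'.

    Hypothetical helper: it only checks TCP reachability. An endpoint that
    is 'listening' may still reject TLS handshakes, which is the case this
    bug report describes as harmful.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return "down"


# Example usage (endpoint taken from the Docker log in this report):
#   probe_tcp("192.0.2.1", 2379)
```

An endpoint reported as "listening" would still need a TLS-level check to confirm that it is not in the reachable-but-broken state.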
