zun scenario fails at Running Keystone bootstrap container when etcd is not healthy

Bug #1846531 reported by Radosław Piliszek
This bug affects 1 person
Affects: kolla-ansible
Status: Opinion
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

The example failure below occurred after enabling internal TLS, which broke etcd.
Any etcd failure is a valid trigger for this bug.

In Ansible:
Read timed out. (read timeout=60)

In Docker:
Oct 02 18:23:08 primary dockerd[10089]: time="2019-10-02T18:23:08.586295237Z" level=debug msg="could not find network 3f560f255e07334fd1e21230d15cd44fb6d0f4029cf386884d1b096fc3b55107: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.0.2.1:2379: connect: connection refused\n; error #1: dial tcp 192.0.2.2:2379: connect: connection refused\n; error #2: dial tcp 192.0.2.3:2379: connect: connection refused\n"

In etcd:
2019-10-02 18:26:05.742017 I | embed: rejected connection from "192.0.2.2:33296" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.742049 I | embed: rejected connection from "192.0.2.2:33298" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.805035 I | embed: rejected connection from "192.0.2.1:54146" (error "remote error: tls: bad certificate", ServerName "")
2019-10-02 18:26:05.806758 I | embed: rejected connection from "192.0.2.1:54144" (error "remote error: tls: bad certificate", ServerName "")

Expected an etcd failure to break only containers deployed by Zun, not the entire deployment.
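
When triaging a failure like this, it can help to summarize which peers etcd is rejecting and why. The following is a minimal sketch, assuming the etcd log lines have the same shape as the excerpt in this description; the function name `count_rejections` is hypothetical, and the pattern only handles IPv4 peers written as `host:port`.

```python
import re
from collections import Counter

# Matches lines such as:
#   embed: rejected connection from "192.0.2.2:33296" (error "...", ServerName "")
# Assumption: the peer is an IPv4 address followed by ":port".
REJECTED = re.compile(
    r'embed: rejected connection from "(?P<host>[^":]+):(?P<port>\d+)" '
    r'\(error "(?P<error>[^"]*)"'
)


def count_rejections(log_text):
    """Count rejected connections per (peer host, error message) pair."""
    counts = Counter()
    for match in REJECTED.finditer(log_text):
        counts[(match.group("host"), match.group("error"))] += 1
    return counts


if __name__ == "__main__":
    line = (
        '2019-10-02 18:26:05.742017 I | embed: rejected connection from '
        '"192.0.2.2:33296" (error "remote error: tls: bad certificate", '
        'ServerName "")'
    )
    for (host, error), n in count_rejections(line).items():
        print(host, error, n)
```

Because the pattern is applied with `finditer`, it works whether the log entries are on separate lines or run together on one line.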

Radosław Piliszek (yoctozepto) wrote :
Mark Goddard (mgoddard) wrote :

I doubt it affects Stein - we only merged internal TLS support in Train.

description: updated
Mark Goddard (mgoddard) wrote :

Oh also I imagine it's only a problem if you use self-signed certs.

Radosław Piliszek (yoctozepto) wrote :

Mark, this bug tracks the fact that an etcd failure fails the deployment; TLS is just an example. Etcd could be broken in another way and we would still be affected.

description: updated
Radosław Piliszek (yoctozepto) wrote :

I did not mean that to be critical.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/694778

Radosław Piliszek (yoctozepto) wrote :

So it only happens when etcd is rejecting connections. When etcd is down, dockerd ignores it and only Zun is broken.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by Radosław Piliszek on branch: master
Review: https://review.opendev.org/694778
Reason: irrelevant

Radosław Piliszek (yoctozepto) wrote :

The problem is simply how Docker handles failures. It seems the decision is to reject requests when etcd is known to be reachable but broken, rather than to ignore it as happens when etcd is down.
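
The distinction between "down" and "reachable but broken" can be illustrated with a small probe. This is a rough sketch only and is not taken from kolla-ansible or Docker: the function name `probe_endpoint` and its three return values are hypothetical, and it uses only the Python standard library rather than any etcd client. It treats a failed TCP connection as "down" and a failed TLS handshake on an open port as "broken".

```python
import socket
import ssl


def probe_endpoint(host, port, use_tls=False, timeout=3.0):
    """Classify an endpoint as 'down', 'broken', or 'ok'.

    'down'   : the TCP connection could not be established
               (refused, unreachable, timed out, or name lookup failed).
    'broken' : the port accepts connections, but the TLS handshake fails
               (for example, an untrusted certificate or no TLS response).
    'ok'     : the connection (and the handshake, if requested) succeeded.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "down"
    try:
        if use_tls:
            # Default context verifies the server certificate against the
            # system trust store; a self-signed certificate fails here.
            context = ssl.create_default_context()
            # The handshake is performed inside wrap_socket().
            with context.wrap_socket(sock, server_hostname=host):
                pass
        return "ok"
    except OSError:
        # ssl.SSLError and handshake timeouts are OSError subclasses.
        return "broken"
    finally:
        sock.close()


if __name__ == "__main__":
    # 192.0.2.1:2379 is the (documentation-range) address from the logs above.
    print(probe_endpoint("192.0.2.1", 2379, use_tls=True, timeout=1.0))
```

A pre-deployment check along these lines could flag the "broken" case early, which is the one that appears to affect the whole deployment rather than only Zun.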

no longer affects: kolla-ansible/stein
no longer affects: kolla-ansible/train
no longer affects: kolla-ansible/ussuri
Changed in kolla-ansible:
status: Triaged → Opinion
assignee: Radosław Piliszek (yoctozepto) → nobody