kolla_enable_tls_internal breaks etcd and kuryr

Bug #1930109 reported by Buddhika Sanjeewa
This bug affects 6 people
Affects: kolla-ansible
Status: In Progress
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

When we set kolla_enable_tls_internal to true, the internal_protocol becomes https.
So etcd/kuryr begin to listen on a TLS enabled port on a private IP.
This makes assigning certificates to these services a bit tricky.

docker logs etcd shows

2021-05-28 22:36:34.468508 I | pkg/flags: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=https://10.244.0.1:2379
2021-05-28 22:36:34.468541 I | pkg/flags: recognized and used environment variable ETCD_DATA_DIR=/var/lib/etcd
2021-05-28 22:36:34.468552 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=https://10.244.0.1:2380
2021-05-28 22:36:34.468555 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER=ctrl-s1-001=http://10.244.0.1:2380,ctrl-s2-001=http://10.244.0.2:2380
2021-05-28 22:36:34.468558 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=new
2021-05-28 22:36:34.468561 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_TOKEN=eci50iUdhBQMyFOUJh8wYpqKzg9r9OKcmwX2lIkM
2021-05-28 22:36:34.468565 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=https://10.244.0.1:2379
2021-05-28 22:36:34.468568 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_PEER_URLS=https://10.244.0.1:2380

Note that port 2380 ends up with both http (in ETCD_INITIAL_CLUSTER) and https (in the peer URLs), while port 2379 uses https only.
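
The mix of schemes per port can also be checked mechanically. The following is a minimal sketch, assuming the startup log format shown above; the helper name schemes_by_port and the shortened sample are illustrative only and are not part of etcd or kolla-ansible.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Matches the "recognized and used environment variable NAME=VALUE" lines
# that etcd prints at startup (format taken from the log excerpt above).
ENV_LINE = re.compile(r"environment variable (ETCD_[A-Z_]+)=(\S+)")
# Picks individual http/https URLs out of a value such as ETCD_INITIAL_CLUSTER.
URL = re.compile(r"https?://[^,\s]+")


def schemes_by_port(log_text):
    """Map each port to the set of URL schemes seen for it in the etcd log."""
    seen = defaultdict(set)
    for _name, value in ENV_LINE.findall(log_text):
        for url in URL.findall(value):
            parts = urlsplit(url)
            if parts.port is not None:
                seen[parts.port].add(parts.scheme)
    return dict(seen)


# Shortened sample: three of the lines from the log above, timestamps dropped.
sample = """\
I | pkg/flags: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=https://10.244.0.1:2380
I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER=ctrl-s1-001=http://10.244.0.1:2380,ctrl-s2-001=http://10.244.0.2:2380
I | pkg/flags: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=https://10.244.0.1:2379
"""

for port, schemes in sorted(schemes_by_port(sample).items()):
    status = "MIXED" if len(schemes) > 1 else "ok"
    print(port, sorted(schemes), status)
```

A port reported as MIXED is one where some settings expect TLS and others do not, which is the situation on the peer port here.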

But cinder.conf has

[coordination]
backend_url = etcd3+http://10.244.0.1:2379
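
To make the disagreement between cinder and etcd explicit, here is a small sketch that compares the transport of the two URLs. The helper names transport_scheme and same_transport are hypothetical; the URLs are the ones quoted in this report.

```python
from urllib.parse import urlsplit


def transport_scheme(url):
    """Return the transport part of a URL scheme, e.g. 'etcd3+http' -> 'http'."""
    return urlsplit(url).scheme.rsplit("+", 1)[-1]


def same_transport(client_url, server_url):
    """True when both URLs name the same host and port with the same http/https choice."""
    c, s = urlsplit(client_url), urlsplit(server_url)
    return (c.hostname, c.port, transport_scheme(client_url)) == (
        s.hostname, s.port, transport_scheme(server_url))


backend_url = "etcd3+http://10.244.0.1:2379"  # [coordination] backend_url from cinder.conf
listen_url = "https://10.244.0.1:2379"        # ETCD_LISTEN_CLIENT_URLS from the etcd log
print(same_transport(backend_url, listen_url))  # False: client uses http, server expects https
```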

In kuryr, kuryr.conf has
kuryr_uri = https://10.244.0.5:23750

and kuryr.spec has https://10.244.0.5:23750

When trying to start the cloud shell in Horizon, the container fails to load with the error:
Docker internal error: 500 Server Error: Internal Server Error ("legacy plugin: Post https://10.244.0.5:23750/Plugin.Activate: tls: first record does not look like a TLS handshake").

It seems the Docker plugin client used by zun contacts kuryr over HTTPS, while kuryr itself answers in plain text. (I did not provide kolla/certificates/kuryr-*.pem or etcd-*.pem, as the services are listening on private IPs.)
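
One way to read the error text: a TLS client reports it when the first bytes it receives back are not a TLS record, which suggests the peer on port 23750 is answering in plain HTTP. The sketch below is only an illustration of that idea, based on the TLS record header layout (one content-type byte, then a two-byte version); the function name is hypothetical, and this is not the actual check performed by Docker.

```python
TLS_ALERT = 21      # TLS record content type: alert
TLS_HANDSHAKE = 22  # TLS record content type: handshake


def looks_like_tls_record(first_bytes):
    """Heuristic: does a peer's first reply start like a TLS handshake or alert record?"""
    if len(first_bytes) < 2:
        return False
    content_type, major_version = first_bytes[0], first_bytes[1]
    return content_type in (TLS_ALERT, TLS_HANDSHAKE) and major_version == 3


# A plain HTTP server answers a TLS ClientHello with ordinary text.
print(looks_like_tls_record(b"HTTP/1.1 400 Bad Request\r\n"))  # False
# A TLS server answers with a handshake record (type 22, version 3.x).
print(looks_like_tls_record(bytes([22, 3, 3, 0, 80])))          # True
```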

Here 10.244.0.0 is my "cloud" network. All these problems go away if kolla_enable_tls_internal is set to false, but that allows all services to communicate in plain text. Tested with kayobe stable/victoria.

Tags: tls etcd kuryr
description: updated
Radosław Piliszek (yoctozepto) wrote :
Mark Goddard (mgoddard) wrote :

I started trying to get all CI jobs passing with TLS enabled here: https://review.opendev.org/c/openstack/kolla-ansible/+/782387. I made some progress but the zun jobs did not pass.

Mark Goddard (mgoddard) wrote :

Note that etcd TLS is configured separately from the VIP, via etcd_enable_tls. The default value is set to kolla_enable_tls_backend.

Buddhika Sanjeewa (bsanjeewa) wrote :

I don't see any change when I set "etcd_enable_tls: no" in /etc/kayobe/kolla/global.yml.

It seems to me that the issue is in ansible/roles/etcd/defaults/main.yml.

The Endpoints section always sets the protocol to internal_protocol, which is https when kolla_enable_tls_internal is set to yes:

############
# Endpoints
############
etcd_client_internal_endpoint: "{{ internal_protocol }}://{{ api_interface_address | put_address_in_context('url') }}:{{ etcd_client_port }}"
etcd_peer_internal_endpoint: "{{ internal_protocol }}://{{ api_interface_address | put_address_in_context('url') }}:{{ etcd_peer_port }}"

My guess is that {{ internal_protocol }} should be replaced with {{ etcd_protocol }} (only in the Endpoints section of ansible/roles/etcd/defaults/main.yml).
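
To illustrate the difference between the two variables, here is a rough Python model of how the endpoint scheme is chosen. It is a sketch under stated assumptions, not the actual template logic: it assumes etcd_protocol is https only when etcd_enable_tls is on, that etcd_enable_tls falls back to kolla_enable_tls_backend when unset, and it skips the IPv6 handling done by put_address_in_context. The function names are hypothetical.

```python
ETCD_CLIENT_PORT = 2379  # ports as seen in the etcd log in the bug description
ETCD_PEER_PORT = 2380


def protocol(tls_enabled):
    """Mirror the 'https if TLS is on, else http' choice."""
    return "https" if tls_enabled else "http"


def etcd_endpoints(settings, use_etcd_protocol):
    """Build etcd client and peer URLs.

    use_etcd_protocol=False models the current Endpoints section (internal_protocol);
    use_etcd_protocol=True models the proposed change (etcd_protocol).
    """
    internal_protocol = protocol(settings["kolla_enable_tls_internal"])
    # Assumption: etcd_enable_tls defaults to kolla_enable_tls_backend when unset.
    etcd_enable_tls = settings.get("etcd_enable_tls", settings["kolla_enable_tls_backend"])
    etcd_protocol = protocol(etcd_enable_tls)
    scheme = etcd_protocol if use_etcd_protocol else internal_protocol
    host = settings["api_interface_address"]  # IPv6 bracket handling omitted
    return {
        "client": f"{scheme}://{host}:{ETCD_CLIENT_PORT}",
        "peer": f"{scheme}://{host}:{ETCD_PEER_PORT}",
    }


settings = {
    "kolla_enable_tls_internal": True,
    "kolla_enable_tls_backend": False,
    "api_interface_address": "10.244.0.1",
}
print(etcd_endpoints(settings, use_etcd_protocol=False)["client"])  # https://10.244.0.1:2379
print(etcd_endpoints(settings, use_etcd_protocol=True)["client"])   # http://10.244.0.1:2379
```

With internal TLS on and backend TLS off, the first form yields https endpoints for etcd even though etcd TLS itself is not enabled, while the second follows the etcd setting.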

Buddhika Sanjeewa (bsanjeewa) wrote :

Well, now everything is http (after changing the endpoints' protocol in ansible/roles/etcd/defaults/main.yml to {{ etcd_protocol }}).
Now openstack volume list works.
But I'm not sure that etcd listening on http is OK when kolla_enable_tls_internal is enabled.

2021-06-03 18:56:36.170913 I | pkg/flags: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://10.244.0.1:2379
2021-06-03 18:56:36.170946 I | pkg/flags: recognized and used environment variable ETCD_DATA_DIR=/var/lib/etcd
2021-06-03 18:56:36.170956 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=http://10.244.0.1:2380
2021-06-03 18:56:36.170962 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER=ctrl-s1-001=http://10.244.0.1:2380,ctrl-s2-001=http://10.244.0.2:2380
2021-06-03 18:56:36.170965 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=new
2021-06-03 18:56:36.170968 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_TOKEN=eci50iUdhBQMyFOUJh8wYpqKzg9r9OKcmwX2lIkM
2021-06-03 18:56:36.170972 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=http://10.244.0.1:2379
2021-06-03 18:56:36.170978 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_PEER_URLS=http://10.244.0.1:2380
2021-06-03 18:56:36.170989 I | pkg/flags: recognized and used environment variable ETCD_NAME=ctrl-s1-001

Maroš Varchola (marosvarchola) wrote :

Hello! The same problem now occurs with kuryr. When everything is set to use TLS, kuryr still listens on plain HTTP, so when Zun tries to contact it, the TLS handshake fails.

Maroš Varchola (marosvarchola) wrote :

Docker internal error: 500 Server Error for http://192.168.0.200:2375/v1.26/networks/create: Internal Server Error ("legacy plugin: Post "https://192.168.0.200:23750/Plugin.Activate": tls: first record does not look like a TLS handshake")

Matthew Heler (mheler) wrote :
Changed in kolla-ansible:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/875320
Committed: https://opendev.org/openstack/kolla-ansible/commit/ee336ac45c3bc32fa89ddbf6686eb9649c7b7532
Submitter: "Zuul (22348)"
Branch: master

commit ee336ac45c3bc32fa89ddbf6686eb9649c7b7532
Author: Matthew N Heler <email address hidden>
Date: Sun Feb 26 07:46:16 2023 -0600

    etcd: Set the proper peer and client protocol when tls is enabled

    Partial-Bug: #1930109

    Change-Id: I383b2b5a139d24a419145473b66a34c06e32060a

Javier Diaz Jr (javierdiazcharles) wrote :

Any progress on this front?

I am also getting the following error when creating a Zun container:

Docker internal error: 500 Server Error for http://172.17.1.183:2375/v1.26/networks/create: Internal Server Error ("legacy plugin: Post "https://172.17.1.183:23750/Plugin.Activate": tls: first record does not look like a TLS handshake").

Kolla-ansible 16.2.0
OpenStack 2023.1

External and internal TLS are enabled. I tried enabling backend TLS as well, but it did not help.
