Application Apply failing when HTTPS is enabled

Bug #1960354 reported by Heitor Matsui
This bug affects 1 person
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Medium       Lucas

Bug Description

Brief Description
-----------------
When HTTPS is enabled, applying a stx-openstack tarball fails because the Cinder pod does not come up.

Severity
-----------------
Critical: OpenStack with HTTPS is not usable because of this defect.

Steps to Reproduce
-----------------
1. Install all the certificates required for stx-openstack to work with HTTPS on your system.
2. Configure the system to have HTTPS enabled.
3. Apply the required stx-openstack overrides for HTTPS.
4. Apply the stx-openstack application.

Expected Behavior
-----------------
The application finishes the apply procedure and is available for use.

Actual Behavior
-----------------
The application apply procedure fails while the Cinder pod is being created.

Reproducibility
-----------------
Reproducible

System Configuration
-----------------
Observed on Simplex, but may affect any configuration.

Branch/Pull Time/Commit
-----------------
master

Last Pass
-----------------
Aug/2021

Timestamp/Logs
-----------------
Describing the cinder-api pod shows:

  Warning Unhealthy 43m (x2 over 155m) kubelet, controller-0 Readiness probe failed: dial tcp 172.16.192.184:8776: connect: connection refused
The ingress seems to be misconfigured, since requests to the service FQDN (cinder.myhost.com) time out with a 504 response:

[root@bootstrap-xxx tmp]# curl -g -i -k --cacert "/etc/ssl/certs/openstack-helm.crt" -X GET https://cinder.myhost.com/v3 -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: gAAAAABh4Iisa..."
HTTP/1.1 504 Gateway Time-out
Date: Thu, 13 Jan 2022 21:14:00 GMT
Content-Type: text/html
Content-Length: 160
Connection: keep-alive
Strict-Transport-Security: max-age=15724800; includeSubDomains

<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
[root@bootstrap-xxx tmp]# curl -g -i -k --cacert "/etc/ssl/certs/openstack-helm.crt" -X GET https://cinder/v3 -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: gAAAAABh4Iis..."

ingress-5cdd7f8687-rj9lq ingress 2022/01/13 21:21:15 [error] 634#634: *370604 upstream timed out (110: Operation timed out) while reading response header from upstream, client: 172.16.192.183, server: cinder, request: "GET /v3 HTTP/1.1", upstream: "http://172.16.192.182:8776/v3", host: "cinder"
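
The ingress error line above names the upstream that timed out. As a minimal sketch (illustrative Python, not part of the original report; the helper name `upstream_of` is hypothetical), the scheme, address, and port of that upstream can be pulled out of the line to confirm how the ingress talks to the backend:

```python
import re
from urllib.parse import urlsplit

# The nginx error line quoted in this report, copied verbatim.
ERROR_LINE = (
    'ingress-5cdd7f8687-rj9lq ingress 2022/01/13 21:21:15 [error] 634#634: '
    '*370604 upstream timed out (110: Operation timed out) while reading '
    'response header from upstream, client: 172.16.192.183, server: cinder, '
    'request: "GET /v3 HTTP/1.1", upstream: "http://172.16.192.182:8776/v3", '
    'host: "cinder"'
)

# Matches the quoted URL that follows the `upstream:` field.
UPSTREAM_RE = re.compile(r'upstream: "(?P<url>[^"]+)"')


def upstream_of(error_line):
    """Return (scheme, host, port) of the upstream in an nginx error line,
    or None if the line has no `upstream: "..."` field."""
    match = UPSTREAM_RE.search(error_line)
    if match is None:
        return None
    parts = urlsplit(match["url"])
    return parts.scheme, parts.hostname, parts.port


if __name__ == "__main__":
    print(upstream_of(ERROR_LINE))
```

For this line the upstream URL uses the `http` scheme on port 8776, so the ingress was forwarding plain HTTP to the backend. Whether the backend expected TLS on that port at the time is not shown by the log itself; the commit messages later in this report describe disabling HTTPS at the backend.
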
Requests sent directly to the service reach the desired pod, but the response is lost on its way back:

[root@bootstrap-xxx tmp]# curl -g -i -k --cacert "/etc/ssl/certs/openstack-helm.crt" -X GET https://cinder/v3/1bf128d4d4224cd3b3028f770cffba4a/types/ceph-store -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "X-Auth-Token: gAAAAABh4Ii..."
^C

20:21:49.517116 IP (tos 0x0, ttl 63, id 21274, offset 0, flags [DF], proto TCP (6), length 60)
    172.16.192.183.60876 > cinder-api-55b5999cdb-lz9sw.8776: Flags [S], cksum 0xd9bd (incorrect -> 0xcda9), seq 294508843, win 64240, options [mss 1460,sackOK,TS val 300325154 ecr 0,nop,wscale 7], length 0
20:21:49.517135 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    cinder-api-55b5999cdb-lz9sw.8776 > 172.16.192.183.60876: Flags [S.], cksum 0xd9bd (incorrect -> 0x089b), seq 2703150312, ack 294508844, win 65160, options [mss 1460,sackOK,TS val 3482878917 ecr 300325154,nop,wscale 7], length 0
20:21:49.517157 IP (tos 0x0, ttl 63, id 21275, offset 0, flags [DF], proto TCP (6), length 52)
    172.16.192.183.60876 > cinder-api-55b5999cdb-lz9sw.8776: Flags [.], cksum 0xd9b5 (incorrect -> 0x33fa), ack 1, win 502, options [nop,nop,TS val 300325154 ecr 3482878917], length 0
20:21:49.517205 IP (tos 0x0, ttl 63, id 21276, offset 0, flags [DF], proto TCP (6), length 407)
    172.16.192.183.60876 > cinder-api-55b5999cdb-lz9sw.8776: Flags [P.], cksum 0xdb18 (incorrect -> 0x105d), seq 1:356, ack 1, win 502, options [nop,nop,TS val 300325154 ecr 3482878917], length 355
20:21:49.517209 IP (tos 0x0, ttl 64, id 22021, offset 0, flags [DF], proto TCP (6), length 52)
    cinder-api-55b5999cdb-lz9sw.8776 > 172.16.192.183.60876: Flags [.], cksum 0xd9b5 (incorrect -> 0x3292), ack 356, win 507, options [nop,nop,TS val 3482878917 ecr 300325154], length 0
20:22:51.036268 IP (tos 0x0, ttl 63, id 21277, offset 0, flags [DF], proto TCP (6), length 52)
    172.16.192.183.60876 > cinder-api-55b5999cdb-lz9sw.8776: Flags [.], cksum 0xd9b5 (incorrect -> 0x4248), ack 1, win 502, options [nop,nop,TS val 300386673 ecr 3482878917], length 0
20:22:51.036287 IP (tos 0x0, ttl 64, id 22022, offset 0, flags [DF], proto TCP (6), length 52)
    cinder-api-55b5999cdb-lz9sw.8776 > 172.16.192.183.60876: Flags [.], cksum 0xd9b5 (incorrect -> 0x4242), ack 356, win 507, options [nop,nop,TS val 3482940436 ecr 300325154], length 0
20:23:03.863893 IP (tos 0x0, ttl 63, id 21278, offset 0, flags [DF], proto TCP (6), length 52)
    172.16.192.183.60876 > cinder-api-55b5999cdb-lz9sw.8776: Flags [F.], cksum 0xd9b5 (incorrect -> 0x1fdb), seq 356, ack 1, win 502, options [nop,nop,TS val 300399500 ecr 3482940436], length 0
20:23:03.904233 IP (tos 0x0, ttl 64, id 22023, offset 0, flags [DF], proto TCP (6), length 52)
    cinder-api-55b5999cdb-lz9sw.8776 > 172.16.192.183.60876: Flags [.], cksum 0xd9b5 (incorrect -> 0xed91), ack 357, win 507, options [nop,nop,TS val 3482953304 ecr 300399500], length 0
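
One way to read the capture above is to tally TCP payload bytes per sender and see whether cinder-api ever sent data back. The sketch below is illustrative Python, not part of the original report; the function name `payload_bytes_by_sender` is hypothetical, and the sample lines are abridged from the capture (checksums and TCP options removed) to keep the example short.

```python
import re

# Matches the per-packet summary line of verbose tcpdump output, e.g.
#   "A.port > B.port: Flags [P.], ... length 355"
# The IP header lines ("... proto TCP (6), length 60") have no " > " and
# therefore do not match.
PACKET_RE = re.compile(
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\]"
    r".*length (?P<length>\d+)\s*$"
)

# Abridged packet lines from the capture in this report.
SAMPLE = [
    "172.16.192.183.60876 > cinder-api-55b5999cdb-lz9sw.8776: Flags [S], seq 294508843, win 64240, length 0",
    "cinder-api-55b5999cdb-lz9sw.8776 > 172.16.192.183.60876: Flags [S.], seq 2703150312, ack 294508844, win 65160, length 0",
    "172.16.192.183.60876 > cinder-api-55b5999cdb-lz9sw.8776: Flags [P.], seq 1:356, ack 1, win 502, length 355",
    "cinder-api-55b5999cdb-lz9sw.8776 > 172.16.192.183.60876: Flags [.], ack 356, win 507, length 0",
]


def payload_bytes_by_sender(lines):
    """Sum the TCP payload length reported by tcpdump for each sender."""
    totals = {}
    for line in lines:
        match = PACKET_RE.search(line)
        if match is None:
            continue  # skip IP header lines and anything unrecognized
        src = match["src"]
        totals[src] = totals.get(src, 0) + int(match["length"])
    return totals


if __name__ == "__main__":
    for sender, total in payload_bytes_by_sender(SAMPLE).items():
        print(sender, total)
```

In the full capture every packet sent by cinder-api has length 0, while the client sends one 355-byte segment. That suggests the pod acknowledged the request but sent no response data on this connection, at least as seen from the capture point.
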

Test Activity
-----------------
Developer Testing.

Workaround
-----------------
No workaround.

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to helm-charts (master)
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/822833
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/27c4d562c8ade4b4f34ec807ac327334fdd13cb3
Submitter: "Zuul (22348)"
Branch: master

commit 27c4d562c8ade4b4f34ec807ac327334fdd13cb3
Author: Lucas Cavalcante <email address hidden>
Date: Thu Dec 23 11:13:15 2021 -0300

    Fixes Application Apply failing when HTTPS enabled

    Openstack-helm provides the option to terminate TLS at the services.
    However, in StarlingX TLS termination is done at the reverse
    proxy (ingress), so it is unnecessary for OpenStack itself to
    serve HTTPS and terminate TLS a second time. Furthermore, it is not
    possible to have HTTPS enabled on the OpenStack services with the
    current CentOS-based containers that we have; openstack-helm only
    supports TLS using Debian-based containers.

    Manually working around this creates a cumbersome override file, so
    to reduce these overrides, patches 0020 and 0013 (osh-i) in this change
    disable HTTPS at the backend, thus maintaining the same behaviour as stx 5.0.

    MariaDB and RabbitMQ TLS does not seem to work well within
    StarlingX, so we also disable TLS for them. I am not confident that
    current openstack-helm and openstack-helm-infra support production-level
    OpenStack with MariaDB in TLS mode. Furthermore, from the way everything
    is redirected in StarlingX I do see too many performance and stability
    issues using both of them with TLS enabled.

    Disclaimer: I did not test with only MariaDB TLS or only RabbitMQ TLS
    activated, but with both of them on, the system is not usable.

    Test Plan:

    PASS: OpenStack is applied (HTTPS disabled).
    PASS: Enable HTTPS. OpenStack is applied (WITHOUT service.conf
    overrides).

    Signed-off-by: Lucas Cavalcante <email address hidden>
    Change-Id: Ifb7946e9a289234047934b52d200b951a59c1a3f
    Partial-bug: 1960354
    Related-to: https://review.opendev.org/c/starlingx/helm-charts/+/828815

OpenStack Infra (hudson-openstack) wrote : Fix merged to helm-charts (master)

Reviewed: https://review.opendev.org/c/starlingx/helm-charts/+/828815
Committed: https://opendev.org/starlingx/helm-charts/commit/40dc19f1a2dca285093d02b67d94d5c50c7c003c
Submitter: "Zuul (22348)"
Branch: master

commit 40dc19f1a2dca285093d02b67d94d5c50c7c003c
Author: Lucas Cavalcante <email address hidden>
Date: Fri Feb 11 01:14:45 2022 -0300

    Improve stability with https enabled

    Uses public ingress secrets and disables MariaDB and RabbitMQ TLS,
    which were causing connection problems with services.

    PASS: OpenStack is applied (HTTPS disabled).
    PASS: Enable HTTPS. OpenStack is applied (WITHOUT service.conf
    overrides).

    Depends-on: https://review.opendev.org/c/starlingx/openstack-armada-app/+/822833
    Signed-off-by: Lucas Cavalcante <email address hidden>
    Closes-bug: 1960354
    Change-Id: Id41385eea097bdf874290620d2a0be58f9d21e2b

Changed in starlingx:
status: In Progress → Fix Released
Lucas (lcavalca)
Changed in starlingx:
assignee: nobody → Lucas (lcavalca)
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.distro.openstack