Overcloud deployment is failing in C7 train with "stderr": "Get https://192.168.24.1:8787/v1/_ping: http: server gave HTTP response to HTTPS client",

Bug #1909750 reported by Sandeep Yadav
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Alex Schultz

Bug Description

Description:-

Overcloud deployment is failing in C7 train with "stderr": "Get https://192.168.24.1:8787/v1/_ping: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Get https://192.168.24.1:8787/v1/_ping: http: server gave HTTP response to HTTPS client"]

Affected job: periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-train

Build history:-
https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-train

Logs:-
https://logserver.rdoproject.org/17/26217/22/check/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-train/36ab673/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
~~~
2020-12-31 10:49:48 | 2020-12-31 10:47:36.277586 | fa163ebf-e325-d2c8-5203-000000003300 | FATAL | Pull 192.168.24.1:8787/tripleotrain/centos-binary-cinder-volume:aa5cdac62ccf12ead64c317e554bf24fe7b312e3_9fe37254-updated-20201231085211 image | overcloud-controller-2 | error={"changed": true, "cmd": "docker pull 192.168.24.1:8787/tripleotrain/centos-binary-cinder-volume:aa5cdac62ccf12ead64c317e554bf24fe7b312e3_9fe37254-updated-20201231085211", "delta": "0:00:00.100917", "end": "2020-12-31 10:47:36.145534", "msg": "non-zero return code", "rc": 1, "start": "2020-12-31 10:47:36.044617", "stderr": "Get https://192.168.24.1:8787/v1/_ping: http: server gave HTTP response to HTTPS client", "stderr_lines": ["Get https://192.168.24.1:8787/v1/_ping: http: server gave HTTP response to HTTPS client"], "stdout": "Trying to pull repository 192.168.24.1:8787/tripleotrain/centos-binary-cinder-volume ... ", "stdout_lines": ["Trying to pull repository 192.168.24.1:8787/tripleotrain/centos-binary-cinder-volume ... "]}
2020-12-31 10:49:48 | 2020-12-31 10:47:36.278371 | fa163ebf-e325-d2c8-5203-000000003300 | TIMING | tripleo-container-tag : Pull 192.168.24.1:8787/tripleotrain/centos-binary-cinder-volume:aa5cdac62ccf12ead64c317e554bf24fe7b312e3_9fe37254-updated-20201231085211 image | overcloud-controller-2 | 0:12:34.929103 | 0.49s
~~~

Another example:-

https://logserver.rdoproject.org/14/31414/3/check/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-train/7fb08f6/logs/undercloud/home/zuul/overcloud_deploy.log.gz

Additional information:-

If we restart docker, as we tried with a test patch in tripleo-heat-templates (tht) [1], the deployment passes [2] without error.

With this bug we want to find the root cause of the issue and determine whether there is a better way to solve it than what we tried in [1].

[1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/768231/4/deployment/deprecated/docker/docker-baremetal-ansible.yaml
[2] Testproject
https://review.opendev.org/c/openstack/tripleo-heat-templates/+/768231

~~~
https://logserver.rdoproject.org/14/31414/3/check/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-train/c73d293/logs/undercloud/home/zuul/overcloud_deploy.log.gz
~~~

Tags: alert
summary: Overcloud deployment is failing in C7 train with "stderr": "Get
https://192.168.24.1:8787/v1/_ping: http: server gave HTTP response to
- HTTPS client", "stderr_lines": ["Get https://192.168.24.1:8787/v1/_ping:
- http: server gave HTTP response to HTTPS client"]
+ HTTPS client",
Alex Schultz (alex-schultz) wrote :

JFYI "Get https://192.168.24.1:8787/v1/_ping: http: server gave HTTP response to HTTPS client" is not related. That's always printed for HTTP endpoints. It would only be a problem if the host is not in the insecure registry list.

yatin (yatinkarel) wrote :

@Alex Actually, it is in the insecure registry list, but docker is not getting started after the configuration change, and thus the job fails. A test with a manual restart via https://review.opendev.org/c/openstack/tripleo-heat-templates/+/768231 passes the job in https://review.rdoproject.org/r/#/c/31414/. I didn't understand why docker is not getting started after the config changes made with handlers. It happens randomly on different hosts. Do you have any idea what could cause it?
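
One way to check for the condition described above, where the registry is present in the configuration but the running daemon has not picked it up, is to compare the configured list with what the daemon reports. The following is a minimal Python sketch and is not taken from the bug report. It assumes the configured list lives under the `insecure-registries` key of a `daemon.json`-style file (on CentOS 7 the setting may instead live in `/etc/sysconfig/docker`) and that plain `docker info` output contains an `Insecure Registries:` section with one indented entry per line. All function names are hypothetical.

```python
import json


def configured_insecure_registries(daemon_json_text):
    """Return the 'insecure-registries' list from daemon.json-style text."""
    config = json.loads(daemon_json_text or "{}")
    return list(config.get("insecure-registries", []))


def running_insecure_registries(docker_info_text):
    """Parse the 'Insecure Registries:' section of plain `docker info` output."""
    registries = []
    header_indent = None
    for line in docker_info_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if header_indent is None:
            if stripped == "Insecure Registries:":
                header_indent = indent
            continue
        # Entries are indented deeper than the header; anything else ends the section.
        if stripped and indent > header_indent:
            registries.append(stripped)
        else:
            break
    return registries


def needs_restart(registry, daemon_json_text, docker_info_text):
    """True when the registry is configured but not active in the running daemon."""
    configured = registry in configured_insecure_registries(daemon_json_text)
    running = registry in running_insecure_registries(docker_info_text)
    return configured and not running
```

On an affected host, `needs_restart("192.168.24.1:8787", ...)` would be fed the contents of the config file and the captured output of `docker info`; a `True` result would suggest the daemon was not restarted after the configuration was written.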

Alex Schultz (alex-schultz) wrote :

This is fallout from the strategy change in train: handlers don't work right. We need to switch away from using handlers in the container-registry code. I'll propose a change.
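
The direction proposed above, restarting the service as part of the configuration step instead of relying on a deferred handler, can be illustrated outside Ansible. This Python sketch is only an analogy and is not the code from the eventual fix. `apply_config` and the `restart` callable are hypothetical names, and the default restart command assumes a systemd-managed `docker` service.

```python
import subprocess
from pathlib import Path


def restart_docker():
    """Restart docker via systemd (assumes a systemd-managed 'docker' unit)."""
    subprocess.run(["systemctl", "restart", "docker"], check=True)


def apply_config(path, new_content, restart=restart_docker):
    """Write the config and restart immediately if, and only if, it changed.

    The restart is not queued for later, so no subsequent step can run
    against a daemon that is still using the old configuration.
    """
    config_file = Path(path)
    old_content = config_file.read_text() if config_file.exists() else None
    if old_content == new_content:
        return False  # nothing changed, no restart needed
    config_file.write_text(new_content)
    restart()
    return True
```

Passing the restart action as a parameter keeps the sketch testable without touching a real service; the trade-off compared with a handler is that several changes in a row would each trigger their own restart.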

Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
Alex Schultz (alex-schultz) wrote :
Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
yatin (yatinkarel) wrote :

https://review.opendev.org/c/openstack/ansible-role-container-registry/+/770148 has merged and the job is no longer failing on this issue, so closing it.

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ansible-role-container-registry 1.3.0

This issue was fixed in the openstack/ansible-role-container-registry 1.3.0 release.
