[master][promotion][RDO phase1] Creating overcloud Heat stack failed giving Error([('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')]

Bug #1781541 reported by chandan kumar
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: wes hayutin
Milestone: (none)

Bug Description

The tripleo-quickstart-promote-master-current-tripleo-delorean-minimal job, which runs on RDO Phase 1 in the promotion pipeline, has been failing consistently since yesterday.
Details below:
https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-370/undercloud/home/stack/overcloud_deploy.log.gz

2018-07-12 18:09:49 | Deploying templates in the directory /tmp/tripleoclient-iplAUg/tripleo-heat-templates
2018-07-12 18:09:49 | Initializing overcloud plan deployment
2018-07-12 18:09:49 | Creating overcloud Heat stack
2018-07-12 18:09:49 | Failed to run action [action_ex_id=424fb011-75f7-4391-90f4-5b498128fe69, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 90}']
2018-07-12 18:09:49 | ERROR: SSLError: : resources.Controller<nested_stack>.resources.0<https://192.168.24.2:13808/v1/AUTH_e3b4e244dcdb419782f6778f29efa3a0/overcloud/puppet/controller-role.yaml>.resources.Controller: : SSL exception connecting to https://192.168.24.3:8774/v2.1/: ("bad handshake: Error([('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')],)",)
2018-07-12 18:09:49 | + status_code=1
2018-07-12 18:09:49 | + openstack stack list
2018-07-12 18:09:49 | + grep -q overcloud
2018-07-12 18:09:57 | + echo 'overcloud deployment not started. Check the deploy configurations'
2018-07-12 18:09:57 | overcloud deployment not started. Check the deploy configurations
2018-07-12 18:09:57 | + exit 1

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-370/undercloud/var/log/extra/errors.txt.gz

2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor [req-aafe3f78-270b-4f7d-b53c-91d47f0e626a 769952915fdf4012b15a3bb1c6055308 e3b4e244dcdb419782f6778f29efa3a0 - default default] Failed to run action [action_ex_id=424fb011-75f7-4391-90f4-5b498128fe69, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 90}']
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor Traceback (most recent call last):
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor result = action.run(action_ctx)
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/tripleo_common/actions/deployment.py", line 201, in run
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor return heat.stacks.create(**stack_args)
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 171, in create
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor data=kwargs, headers=headers)
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 289, in post
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor return self.client_request("POST", url, **kwargs)
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 279, in client_request
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor resp, body = self.json_request(method, url, **kwargs)
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 268, in json_request
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor resp = self._http_request(url, method, **kwargs)
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 231, in _http_request
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor raise exc.from_response(resp)
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor HTTPBadRequest: ERROR: SSLError: : resources.Controller<nested_stack>.resources.0<https://192.168.24.2:13808/v1/AUTH_e3b4e244dcdb419782f6778f29efa3a0/overcloud/puppet/controller-role.yaml>.resources.Controller: : SSL exception connecting to https://192.168.24.3:8774/v2.1/: ("bad handshake: Error([('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')],)",)
2018-07-12 18:09:47.520 ERROR /var/log/mistral/executor.log: 21072 ERROR mistral.executors.default_executor
2018-07-12 18:04:37.097 ERROR /var/log/keystone/keystone.log: 23625 ERROR keystone.assignment.core [req-26bdd5d5-ee50-475d-ae7b-12fb69a36792 - - - - -] Circular reference found role inference rules - 871dced7d7e14137a24bc3b327e70803.
2018-07-12 18:04:41.376 ERROR /var/log/keystone/keystone.log: 23626 ERROR keystone.assignment.core [req-4b1b7e0b-b9c0-47e2-943e-8e358420

Logs from other runs:
https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-371/undercloud/home/stack/overcloud_deploy.log.gz

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-372/undercloud/home/stack/overcloud_deploy.log.gz

I think the TLS port for keystone was not passed properly, but I am not 100% sure. This needs investigation.

chandan kumar (chkumar246) wrote :

From IRC discussion related to this bug:
       Tengu │ chkumar|ruck: guess there's no way to debug this with some running instance?
       Tengu │ chkumar|ruck: might be interesting to check what's actually listening on the said ports (13808 and 8774), and if they are TLS, what's the certificate content (if any)
       Tengu │ chkumar|ruck: in fact.... it might even be due the lack of certificates. if a client wants to talk TLS to a non-TLS port, I think this kind of error might appear.
chkumar|ruck │ Tengu: https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-370/undercloud/var/log/extra/netstat.txt.gz
       Tengu │ chkumar|ruck: ok, haproxy, of course. Will see its configuration (it's in containers right?)
      gerrit │ Quique Llorente proposed openstack-infra/tripleo-ci master: [WIP] Use toci scripts as templates for zuulv3 https://review.openstack.org/581331
          -- │ amoralej|off is now known as amoralej
         ◀▬▬ │ jd_ (~<email address hidden>) has quit (Ping timeout: 276 seconds)
       Tengu │ bind 192.168.24.2:13808 transparent ssl crt /etc/pki/tls/certs/undercloud-192.168.24.2.pem according to https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-370/undercloud/etc/haproxy/haproxy.cfg.gz
         ◀▬▬ │ dmacpher (dmacpher@nat/redhat/x-hvdjdyyudrkqikff) has quit (Quit: Leaving)
       Tengu │ chkumar|ruck: certificate is apparently valid, at least on the host. Will check if I find anything in container things.
         ▬▬▶ │ waleedm (~waleedm@37.8.44.46) has joined #tripleo
       Tengu │ oh. hm. no container.
chkumar|ruck │ yes no container,
chkumar|ruck │ Tengu: https://github.com/openstack/tripleo-quickstart/blob/master/config/general_config/minimal.yml is used for this job
         ▬▬▶ │ suuuper (~<email address hidden>) has joined #tripleo ...
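
Tengu's suggestion in the log above (check whether the listener on a given port actually speaks TLS) can be sketched in Python 3. This is only a minimal sketch and was not run as part of the original discussion; the function name `speaks_tls` is hypothetical, and certificate verification is deliberately disabled because only the protocol is being probed.

```python
import socket
import ssl


def speaks_tls(host, port, timeout=5.0):
    """Return True if a TLS handshake succeeds on host:port.

    Returns False when the peer answers with something that is not TLS
    (for example a plain-HTTP listener), which surfaces as ssl.SSLError.
    Connection failures and timeouts are left to propagate.
    """
    ctx = ssl.create_default_context()
    # Assumption: we only want to know whether the port speaks TLS at all,
    # so certificate and hostname checks are switched off for this probe.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        try:
            with ctx.wrap_socket(raw):
                return True
        except ssl.SSLError:
            return False
```

On the undercloud, one would call it as `speaks_tls("192.168.24.2", 13808)` and `speaks_tls("192.168.24.3", 8774)`, using the two endpoints that appear in the error message.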


chandan kumar (chkumar246) wrote :

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-370/undercloud/var/log/mistral/mistral-db-manage.log.gz

2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base [-] Failed to create action: qinling.jobs_list: AttributeError: 'NoneType' object has no attribute 'Client'
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base Traceback (most recent call last):
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/action_generator/base.py", line 143, in create_actions
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base client_method = class_.get_fake_client_method()
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/base.py", line 75, in get_fake_client_method
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base return cls._get_client_method(cls._get_fake_client())
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/actions.py", line 997, in _get_fake_client
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base return cls._get_client_class()(
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base File "/usr/lib/python2.7/site-packages/mistral/actions/openstack/actions.py", line 985, in _get_client_class
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base return qinlingclient.Client
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base AttributeError: 'NoneType' object has no attribute 'Client'
2018-07-12 17:33:41.627 20815 ERROR mistral.actions.openstack.action_generator.base

chandan kumar (chkumar246) wrote :

Related review: https://review.openstack.org/#/c/579777/ ("detect https and act accordingly").

wes hayutin (weshayutin)
tags: removed: alert
Changed in tripleo:
assignee: nobody → wes hayutin (weshayutin)
status: Triaged → In Progress
Alan Pevec (apevec) wrote :

@Wes IIUC those tripleo jobs in RDO Phase 1 are not containerized. Shouldn't they be removed from the promotion criteria?

wes hayutin (weshayutin) wrote :
wes hayutin (weshayutin) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart (master)

Reviewed: https://review.openstack.org/582405
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=b2d5a90fca26e11fd51ca3369c260fd642e7c49e
Submitter: Zuul
Branch: master

commit b2d5a90fca26e11fd51ca3369c260fd642e7c49e
Author: Wes Hayutin <email address hidden>
Date: Thu Jul 12 17:40:33 2018 -0400

    update the minimal config for master

    The minimal configuration for libvirt based
    deployments was slightly out of date

    Closes-Bug: #1781541
    Change-Id: Iee97dcf6968504c1bf9d4d3f2094be2602401500

Changed in tripleo:
status: In Progress → Fix Released
yatin (yatinkarel) wrote :

This no longer happens in the minimal scenario after switching to a containerized undercloud with https://review.openstack.org/582405. However, every scenario that deploys an overcloud from a non-containerized SSL undercloud will still be affected, for example FS021. It would be good to check which others are affected.

It also started happening after the heat change https://review.openstack.org/#/c/556810/.
After this heat change, the following request is made: http://192.168.24.3/v2.1. The response contains a Location: header, and haproxy then rewrites that location from http to https. I think this is wrong, because https is not served at 192.168.24.3.

New request after the heat change:
curl -g -i -X GET http://192.168.24.3:8774/v2.1 -H "Accept: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA1}dddcc2275f0b1d96962d07d62ace6a355302822c" -H "X-OpenStack-Nova-API-Version: 2.1"
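
For anyone reproducing this without curl, the same plain-HTTP request can be sketched with Python 3's standard `http.client`, which does not follow redirects, so the `Location` header is seen exactly as the client receives it. This is a sketch under assumptions: `fetch_location` and its parameters are hypothetical names, and the token is a placeholder the caller has to supply.

```python
import http.client


def fetch_location(host, port, path="/v2.1", token=None, timeout=5.0):
    """GET path over plain HTTP and return (status, Location header or None).

    Redirects are not followed, so the Location value is whatever the
    proxy hands back to the client.
    """
    headers = {
        "Accept": "application/json",
        "User-Agent": "python-novaclient",
        "X-OpenStack-Nova-API-Version": "2.1",
    }
    if token is not None:
        # Placeholder: a real keystone token would be passed in by the caller.
        headers["X-Auth-Token"] = token
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_location("192.168.24.3", 8774, token=...)` on the undercloud should show whether the returned `Location` already carries the https scheme.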

HAProxy configuration for the nova API:
=========================
listen nova_osapi
  bind 192.168.24.2:13774 transparent ssl crt /etc/pki/tls/certs/undercloud-192.168.24.2.pem
  bind 192.168.24.3:8774 transparent
  mode http
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk
  option httplog
  redirect scheme https code 301 if { hdr(host) -i 192.168.24.2 } !{ ssl_fc }
  rsprep ^Location:\ http://(.*) Location:\ https://\1
  server 192.168.24.1 192.168.24.1:8774 check fall 5 inter 2000 rise 2
=========================

Can someone from haproxy and nova look at this behavior and confirm whether it is intended?
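
The effect of the `rsprep` line in the configuration above can be illustrated with a small Python sketch. The backend `Location` value below is an assumption chosen for illustration (the actual response header is not shown in this report), and `rewrite_location` is a hypothetical helper, not part of haproxy or nova.

```python
import re

# Python rendering of the haproxy rule from the listen section above:
#   rsprep ^Location:\ http://(.*) Location:\ https://\1
# The rule has no condition attached, so this sketch applies it to every
# response header line regardless of which bind accepted the request.
RSPREP = re.compile(r"^Location: http://(.*)")


def rewrite_location(header_line):
    """Return the header line as it would look after the rsprep rewrite."""
    return RSPREP.sub(r"Location: https://\1", header_line)


# Assumed redirect from the nova API for a request that arrived on the
# plain-HTTP bind 192.168.24.3:8774 (illustrative value only).
backend_header = "Location: http://192.168.24.3:8774/v2.1/"
print(rewrite_location(backend_header))
# prints: Location: https://192.168.24.3:8774/v2.1/
```

Under that assumption the rewritten URL is the same one that appears in the SSL error, https://192.168.24.3:8774/v2.1/, even though the 192.168.24.3:8774 bind has no `ssl crt` option.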

chandan kumar (chkumar246) wrote :

FS021 is also affected: https://logs.rdoproject.org/39/577039/6/openstack-check/legacy-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset021-master/c1cda0b/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-07-13_04_33_35

/environments/debug.yaml -e /home/zuul/inject-trust-anchor.yaml --validation-errors-nonfatal --compute-scale 1 --control-scale 3 --ntp-server pool.ntp.org
2018-07-13 04:33:35 | Unexpected status FAILED for tripleo.deployment.v1.deploy_plan
2018-07-13 04:33:35 | Removing the current plan files
2018-07-13 04:33:35 | Uploading new plan files
2018-07-13 04:33:35 | Plan updated.
2018-07-13 04:33:35 | Processing templates in the directory /tmp/tripleoclient-XBQrsm/tripleo-heat-templates
2018-07-13 04:33:35 | WARNING: Following parameter(s) are deprecated and still defined. Deprecated parameters will be removed soon!
2018-07-13 04:33:35 | NovaComputeExtraConfig
2018-07-13 04:33:35 | WARNING: Following parameter(s) are defined but not used in plan. Could be possible that parameter is valid but currently not used.
2018-07-13 04:33:35 | DockerClustercheckConfigImage
2018-07-13 04:33:35 | DockerMysqlClientConfigImage
2018-07-13 04:33:35 | DockerQdrouterdImage
2018-07-13 04:33:35 | DockerRsyslogSidecarConfigImage
2018-07-13 04:33:35 | CephPoolDefaultPgNum
2018-07-13 04:33:35 | CephPoolDefaultSize
2018-07-13 04:33:35 | DockerQdrouterdConfigImage
2018-07-13 04:33:35 | DockerRsyslogSidecarImage
2018-07-13 04:33:35 | DockerClustercheckImage
2018-07-13 04:33:35 | SaharaWorkers
2018-07-13 04:33:35 | SSLRootCertificate
2018-07-13 04:33:35 | Deploying templates in the directory /tmp/tripleoclient-XBQrsm/tripleo-heat-templates
2018-07-13 04:33:35 | Initializing overcloud plan deployment
2018-07-13 04:33:35 | Creating overcloud Heat stack
2018-07-13 04:33:35 | Failed to run action [action_ex_id=e1705502-fdab-4b06-a22f-b6b294407fbb, action_cls='<class 'mistral.actions.action_factory.DeployStackAction'>', attributes='{}', params='{u'skip_deploy_identifier': False, u'container': u'overcloud', u'timeout': 211}']
2018-07-13 04:33:35 | ERROR: SSLError: : resources.Compute<nested_stack>.resources.0<https://192.168.24.2:13808/v1/AUTH_7a6168b9590e49ab9aca07679e3c8430/overcloud/puppet/compute-role.yaml>.resources.NovaCompute: : SSL exception connecting to https://192.168.24.3:8774/v2.1/: ("bad handshake: Error([('SSL routines', 'SSL23_GET_SERVER_HELLO', 'unknown protocol')],)",)
2018-07-13 04:33:35 | + status_code=1

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart (master)

Fix proposed to branch: master
Review: https://review.openstack.org/583202

Mike Bayer (zzzeek) wrote :

I have the fix from https://review.openstack.org/#/c/579777/4/mistral/actions/std_actions.py applied on an undercloud here and am still getting this stack trace:

2018-07-23 15:49:29.970 25005 INFO heat.engine.stack [req-4cb285ed-4c3c-4b41-a375-47cb5182ae5a admin admin - default default] Exception in stack validation
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack Traceback (most recent call last):
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/stack.py", line 894, in validate
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack result = res.validate()
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/nova/server.py", line 1498, in validate
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack self._validate_image_flavor(image, flavor)
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/nova/server.py", line 1449, in _validate_image_flavor
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack flavor_obj = self.client_plugin().get_flavor(flavor)
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/clients/os/nova.py", line 275, in get_flavor
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack flavor = self.client().flavors.get(flavor_identifier)
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/clients/microversion_mixin.py", line 27, in client
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack version = self.get_max_microversion()
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/heat/engine/clients/os/nova.py", line 100, in get_max_microversion
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack self.max_microversion = client.versions.get_current().version
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/v2/versions.py", line 70, in get_current
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack return self._get_current()
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/v2/versions.py", line 53, in _get_current
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack return self._get(url, "version")
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/base.py", line 356, in _get
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack resp, body = self.api.client.get(url)
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 328, in get
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack return self.request(url, 'GET', **kwargs)
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/python2.7/site-packages/novaclient/client.py", line 77, in request
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack **kwargs)
2018-07-23 15:49:29.970 25005 ERROR heat.engine.stack File "/usr/lib/pytho...


OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart 2.1.1

This issue was fixed in the openstack/tripleo-quickstart 2.1.1 release.
