CI: Deployment times out because signal back to undercloud fails with a connection timed out

Bug #1731540 reported by Alex Schultz
This bug affects 1 person
Affects      Status        Importance  Assigned to   Milestone
heat-agents  Fix Released  Undecided   Alex Schultz
tripleo      Fix Released  Critical    Alex Schultz

Bug Description

The deployment appeared to hang even though it had actually completed. This seems to be caused by the completion signal back to the undercloud being lost.

http://logs.openstack.org/37/518037/1/gate/legacy-tripleo-ci-centos-7-scenario003-multinode-oooq-puppet/b1a365a/logs/subnode-2/var/log/messages.txt.gz#_Nov_10_17_50_10

Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 journal: Suppressed 15575 messages from /system.slice/os-collect-config.service
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,559] (heat-config) [INFO]
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,560] (heat-config) [ERROR] Error running heat-config-notify. [1]
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,560] (heat-config) [ERROR] [2017-11-10 17:34:38,552] (heat-config-notify) [DEBUG] Signaling to http://192.168.24.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3A18c762306e4c46c69a775b95a80e5351%3Astacks/overcloud-AllNodesDeploySteps-7fmdthg5b4yp-ControllerDeployment_Step5-egizan4r3glg/566f649a-7ea6-4a46-8d2c-e45e2b68efc8/resources/0?Timestamp=2017-11-10T17%3A29%3A29Z&SignatureMethod=HmacSHA256&AWSAccessKeyId=fdcc1521357e4c21a2dd8ce699d68cf2&SignatureVersion=2&Signature=LT6mqktEq90a%2BO9A39Ms%2BJu6PlRYfOlFW4n%2FaYtTEpo%3D via POST
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: Traceback (most recent call last):
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: File "/usr/bin/heat-config-notify", line 163, in <module>
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: sys.exit(main(sys.argv, sys.stdin))
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: File "/usr/bin/heat-config-notify", line 113, in main
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: headers={'content-type': 'application/json'})
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: File "/usr/lib/python2.7/site-packages/requests/api.py", line 112, in post
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: return request('post', url, data=data, json=json, **kwargs)
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: return session.request(method=method, url=url, **kwargs)
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 518, in request
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: resp = self.send(prep, **send_kwargs)
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 639, in send
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: r = adapter.send(request, **kwargs)
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 488, in send
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: raise ConnectionError(err, request=request)
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: requests.exceptions.ConnectionError: ('Connection aborted.', error(110, 'Connection timed out'))
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] Skipping config fb3506e0-4279-4d72-83d3-f4cbb9f16424, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/fb3506e0-4279-4d72-83d3-f4cbb9f16424.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] Skipping config 85e277c1-b359-4cb4-a3dc-9fc7c1990639, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/85e277c1-b359-4cb4-a3dc-9fc7c1990639.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] Skipping config 1fc4dfa0-0a43-48fa-b9d6-1b36c7fbbe4c, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/1fc4dfa0-0a43-48fa-b9d6-1b36c7fbbe4c.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] Skipping config bc1e2b10-3160-46df-a764-0c8b0bb39e0e, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/bc1e2b10-3160-46df-a764-0c8b0bb39e0e.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] Skipping config a287f753-0382-40c7-ae55-a4e104c23294, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,562] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/a287f753-0382-40c7-ae55-a4e104c23294.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] Skipping config 4c54974e-8a4f-4525-a5dc-7be164a246c5, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/4c54974e-8a4f-4525-a5dc-7be164a246c5.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] Skipping config c388a887-51ba-450e-a42f-738ed46b2bff, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/c388a887-51ba-450e-a42f-738ed46b2bff.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] Skipping config 06c33330-2a26-4889-9f21-217b966cc36e, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/06c33330-2a26-4889-9f21-217b966cc36e.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] Skipping config f599a416-8b05-48fd-b003-14268ea76c4a, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/f599a416-8b05-48fd-b003-14268ea76c4a.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] Skipping config 2270e1c7-1f25-482b-8ab5-4d2817f9f61b, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/2270e1c7-1f25-482b-8ab5-4d2817f9f61b.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] Skipping config f96cf4e5-eb20-4c03-ab2a-67a9f0f8c192, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/f96cf4e5-eb20-4c03-ab2a-67a9f0f8c192.json
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] Skipping config 04955a40-1113-49a6-ab8d-d20d2d97cc02, already deployed
Nov 10 17:50:10 centos-7-citycloud-kna1-0000814992 os-collect-config: [2017-11-10 17:50:10,563] (heat-config) [WARNING] To force-deploy, rm /var/lib/heat-config/deployed/04955a40-1113-49a6-ab8d-d20d2d97cc02.json

http://logs.openstack.org/62/515462/1/check/legacy-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet/475c84f/logs/subnode-2/var/log/messages.txt.gz#_Nov_10_09_44_53

Tags: ci
Alex Schultz (alex-schultz) wrote :

Possible fix in heat-templates: https://review.openstack.org/#/c/518984/

Changed in heat:
status: New → In Progress
assignee: nobody → Alex Schultz (alex-schultz)
Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
status: Triaged → In Progress
no longer affects: heat
Changed in heat-templates:
status: New → In Progress
assignee: nobody → Alex Schultz (alex-schultz)
Tristan Cacqueray (tristan-cacqueray) wrote :

Hi, fwiw here are a few things that look odd in the 475c84f run:

logs/undercloud/var/log/keystone/keystone.log.txt.gz:
0125: 2017-11-10 08:01:03.986 29859 ERROR keystone DBError: (pymysql.err.InternalError) (1130, u"Host '192.168.24.1' is not allowed to connect to this MariaDB server")

Alex Schultz (alex-schultz) wrote :
Changed in heat-agents:
status: New → In Progress
assignee: nobody → Alex Schultz (alex-schultz)
no longer affects: heat-templates
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/519756

Alex Schultz (alex-schultz) wrote :

So this appears to be environmental because if you look further into the logs you can see connection timeouts continuing with os-collect-config.

Nov 10 09:45:33 centos-7-inap-mtl01-0000809967 os-collect-config: HTTPConnectionPool(host='192.168.24.1', port=8080): Read timed out. (read timeout=10.0)
Nov 10 09:45:33 centos-7-inap-mtl01-0000809967 os-collect-config: Source [request] Unavailable.

So I think the vxlan connection between the CI nodes failed. I'm going to drop the alert tag from this bug for now unless we see further instances.

tags: removed: alert
OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat-agents (master)

Reviewed: https://review.openstack.org/519417
Committed: https://git.openstack.org/cgit/openstack/heat-agents/commit/?id=756fcafdf025351ea862ae57f29b8eff9beee362
Submitter: Zuul
Branch: master

commit 756fcafdf025351ea862ae57f29b8eff9beee362
Author: Alex Schultz <email address hidden>
Date: Mon Nov 13 09:33:41 2017 -0700

    Retry logic for url request in heat-config-notify

    Adds retry logic for software deployments using the url signals
    to ensure that requests are retried if network connection issues
    occur or a 500, 502, 503, or 504 is returned by the http or https
    endpoint.

    Note: this does not add retry logic to heatclient or zaqarclient
    if they are used for signaling.

    Change-Id: I82dff4a4b9fac05c5ec649db3eb379bdec71e208
    Related-Bug: #1731540
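
For readers who want a concrete picture of the behaviour the commit message describes, here is a minimal sketch, not the merged heat-agents change itself. It retries a POST when the connection fails or when the endpoint answers 500, 502, 503, or 504. The names send_json and signal_with_retry, the attempt count, and the exponential backoff are assumptions made for illustration, and the sketch uses only the Python 3 standard library rather than the requests library seen in the traceback.

```python
import time
import urllib.error
import urllib.request

# Statuses named in the commit message as worth retrying.
RETRY_STATUSES = frozenset({500, 502, 503, 504})


def send_json(url, body, timeout=10.0):
    """POST a JSON body (bytes) and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"content-type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; hand the status back to the caller.
        return exc.code


def signal_with_retry(url, body, send=send_json, attempts=5, delay=1.0,
                      sleep=time.sleep):
    """Call send(url, body) until it yields a non-retryable status.

    Connection-level failures (OSError and its subclasses, which include
    urllib.error.URLError and socket timeouts) and the statuses in
    RETRY_STATUSES trigger another attempt. The wait doubles each time
    (hypothetical policy). If every attempt fails, the last error is
    re-raised, or the last retryable status is returned.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status = None
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            status = send(url, body)
            last_error = None
        except OSError as exc:
            status = None
            last_error = exc
        if status is not None and status not in RETRY_STATUSES:
            return status
        if attempt < attempts:
            sleep(delay * 2 ** (attempt - 1))
    if last_error is not None:
        raise last_error
    return status
```

With a retry loop of this kind, a single "Connection timed out" like the one in the traceback above would lead to another attempt instead of a lost signal. As the commit message notes, signaling through heatclient or zaqarclient is a separate path and would not be covered.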

Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Alex Schultz (alex-schultz) wrote :

There was also an issue in heat, which has been resolved as well.

Changed in tripleo:
status: In Progress → Fix Released
Changed in heat-agents:
status: In Progress → Fix Released