Gate occasionally failed with TLS handshake error

Bug #1521395 reported by hongbin
This bug affects 2 people
Affects: Magnum
Status: Fix Released
Importance: Undecided
Assigned to: hongbin

Bug Description

This error occurs occasionally in the gate. The failure happens when the docker CLI is used to create or delete a container. Here is an example:

http://logs.openstack.org/84/251184/2/check/gate-functional-dsvm-magnum-swarm/7b19343/

According to the logs, the stack was created successfully, and all services on the bay nodes started successfully as well. However, the docker CLI could not connect to the swarm manager. On the client side, it failed with the following error:

requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))

On the server side, the swarm manager logged the following error:

http: TLS handshake error from 192.168.0.3:46565: remote error: unknown certificate authority
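The server-side message comes from Go's net/http TLS server, which swarm is built on. In that message, "remote error" means the TLS alert was sent by the peer. One plausible reading is therefore that the client rejected the swarm manager's certificate because it was not signed by a CA the client trusted. The Python sketch below shows the client-side settings involved: verify the manager against the bay's CA, and optionally present a client certificate. The helper name and its file arguments are hypothetical, and this is not Magnum's or docker-py's actual code.

```python
import ssl
from typing import Optional


def make_swarm_tls_context(ca_file: Optional[str] = None,
                           cert_file: Optional[str] = None,
                           key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context for a swarm manager (hypothetical helper).

    ca_file:   PEM bundle of the CA that signed the manager's certificate.
               If it does not match that CA, the client aborts the handshake
               and the server logs "unknown certificate authority".
    cert_file/key_file: client certificate and key, if the manager requires them.
    """
    # create_default_context() verifies the peer (CERT_REQUIRED) and checks
    # the hostname by default. With ca_file=None it falls back to system CAs.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file is not None:
        # Present a client certificate for mutual TLS.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The resulting context would be handed to whatever HTTP client talks to the manager. The point is that verification succeeds only when `ca_file` is the same CA that issued the manager's certificate.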

hongbin (hongbin034)
description: updated
Eli Qiao (taget-9)
Changed in magnum:
assignee: nobody → Eli Qiao (taget-9)
assignee: Eli Qiao (taget-9) → nobody
assignee: nobody → Eli Qiao (taget-9)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to magnum (master)

Fix proposed to branch: master
Review: https://review.openstack.org/251740

Changed in magnum:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (master)

Reviewed: https://review.openstack.org/251740
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=ea2efb8779a13c7863cda3dde92faac74504b262
Submitter: Jenkins
Branch: master

commit ea2efb8779a13c7863cda3dde92faac74504b262
Author: Eli Qiao <email address hidden>
Date: Tue Dec 1 17:35:37 2015 +0800

    Wait more time after swarm bay creation before doing functional testing

    This patch increases the waiting loop time after swarm bay creation finishes.
    Raise a meaningful exception to indicate that CA initialization failed.

    Closes-Bug: #1521395
    Change-Id: I58c94affe3dc07aa43dc8d7a120b0b62e0cdb47a
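The commit message above describes two ideas: wait longer for the bay to become usable, and fail with a descriptive error instead of an opaque connection error. A minimal sketch of that pattern, assuming a caller-supplied readiness check, follows. `wait_until`, `CAInitError`, and the default timeout and interval are hypothetical and are not taken from the Magnum patch. The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


class CAInitError(RuntimeError):
    """Hypothetical error: the bay's TLS/CA setup never became usable."""


def wait_until(check: Callable[[], bool],
               timeout: float = 600.0,
               interval: float = 10.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll `check` until it returns True, or raise CAInitError after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        if check():
            return
        if clock() >= deadline:
            # Fail with a message that points at the likely cause, rather than
            # letting the test die later with ('Connection aborted.', ...).
            raise CAInitError(
                f"bay not usable after {timeout:.0f}s; CA initialization may have failed")
        sleep(interval)
```

In a functional test, `check` could attempt a TLS connection to the swarm manager and return False on failure.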

Changed in magnum:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to magnum (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/253864

hongbin (hongbin034) wrote :

It seems this issue is not fixed yet. The gate recently failed with this error again. I am going to re-open this issue.

http://logs.openstack.org/81/253781/1/check/gate-functional-dsvm-magnum-swarm/be8590b/

Changed in magnum:
status: Fix Committed → Confirmed
hongbin (hongbin034) wrote :

Some more findings: every failing gate run also contains an error about a missing image when creating the container, like the one below:

msg="Handler for POST /containers/create returned error: No such image: docker.io/cirros:latest (tag: latest)"

Full log: http://logs.openstack.org/84/251184/2/check/gate-functional-dsvm-magnum-swarm/7b19343/logs/bay-nodes/master-172.24.5.6/docker.txt.gz
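To check whether other failing runs show the same symptom, the daemon logs can be scanned for this message. The sketch below matches the log format quoted above. The regex and the function name are assumptions built from that single sample line, so other docker versions may format the message differently.

```python
import re
from typing import List, Tuple

# Built from the daemon log line quoted in this report, e.g.
# msg="Handler for POST /containers/create returned error: No such image: docker.io/cirros:latest (tag: latest)"
NO_SUCH_IMAGE = re.compile(
    r'msg="Handler for [A-Z]+ (?P<path>\S+) returned error: '
    r'No such image: (?P<image>[^\s"]+)')


def find_missing_images(log_text: str) -> List[Tuple[str, str]]:
    """Return (API path, image) pairs for every 'No such image' error in the log."""
    return [(m.group("path"), m.group("image"))
            for m in NO_SUCH_IMAGE.finditer(log_text)]
```

The linked docker.txt.gz is gzip-compressed, so a downloaded copy would be read with `gzip.open(path, "rt")` before being passed to the function.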

Changed in magnum:
assignee: Eli Qiao (taget-9) → hongbin (hongbin034)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to magnum (master)

Reviewed: https://review.openstack.org/253864
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=97a37e8eef0d0ce65529400c8a325001b10ea303
Submitter: Jenkins
Branch: master

commit 97a37e8eef0d0ce65529400c8a325001b10ea303
Author: Hongbin Lu <email address hidden>
Date: Sat Dec 5 23:40:17 2015 -0500

    Always log if disconnect from docker swarm

    Currently, the docker conductor logs the exception only conditionally,
    which can cause a loss of debug information.

    Change-Id: I13d8a15e4731e5a4bae8d6cb80b054baf5a3dd42
    Related-Bug: #1521395
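The change above replaces conditional logging with unconditional logging, so that a dropped swarm connection always leaves a trace. A minimal stdlib sketch of that idea is a decorator that logs every exception with its traceback and then re-raises it. This is not the Magnum code, which uses its own logging setup. The decorator and logger names are hypothetical, and the builtin `ConnectionError` used below is a stand-in for the `requests` exception seen in the gate.

```python
import functools
import logging

LOG = logging.getLogger("docker_conductor_sketch")  # hypothetical logger name


def log_and_reraise(func):
    """Log every exception raised by `func` unconditionally, then re-raise it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # LOG.exception logs at ERROR level and attaches the traceback,
            # so the debug information survives even if the caller swallows the error.
            LOG.exception("Error talking to docker swarm in %s", func.__name__)
            raise
    return wrapper
```

Re-raising keeps the caller's error handling unchanged. The only behavioral difference is that the failure is always recorded.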

OpenStack Infra (hudson-openstack) wrote : Fix merged to magnum (master)

Reviewed: https://review.openstack.org/253961
Committed: https://git.openstack.org/cgit/openstack/magnum/commit/?id=8733cd37fa3b82c53ef5de62e8053377f79eaf6b
Submitter: Jenkins
Branch: master

commit 8733cd37fa3b82c53ef5de62e8053377f79eaf6b
Author: Hongbin Lu <email address hidden>
Date: Sun Dec 6 16:03:09 2015 -0500

    Gate: Fix docker swarm disconnect issue

    The swarm func test occasionally failed with the error below. This
    error cannot be deterministically reproduced. After some experiments,
    it seems that swarm will abort connections during registration of
    a new swarm agent.

    ConnectionError: ('Connection aborted.', BadStatusLine("''",))

    This commit tries to fix the issue by waiting for the completion of
    agent registration. After the swarm agent service starts, it checks
    ETCD to ensure the agent was successfully registered before sending
    signal to Heat to indicate its success.

    Closes-Bug: #1521395
    Change-Id: Iec1772d1df7d85e367676758b1f97a5b604c0eb7
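The fix described above orders two steps: confirm that the agent's registration key exists in etcd, and only then signal success to Heat. The Python sketch below illustrates that ordering against the etcd v2 keys API, where a GET on /v2/keys/<key> returns 200 if the key exists and 404 if it does not. The key path, retry counts, and function names are hypothetical, and the real change is not necessarily implemented this way. The Heat signal is passed in as a callable because the actual wait-condition mechanism is outside the scope of this sketch.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def etcd_v2_key_url(endpoint: str, key: str) -> str:
    """URL of `key` in the etcd v2 keys API."""
    return f"{endpoint.rstrip('/')}/v2/keys/{key.lstrip('/')}"


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET on `url`, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # e.g. 404 when the key does not exist
        return err.code
    except urllib.error.URLError:
        return 0


def wait_for_registration_then_signal(signal_heat: Callable[[], None],
                                      endpoint: str, key: str,
                                      attempts: int = 60, interval: float = 5.0,
                                      fetch_status: Callable[[str], int] = http_status,
                                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Signal Heat only after the agent's registration key appears in etcd."""
    url = etcd_v2_key_url(endpoint, key)
    for attempt in range(attempts):
        if fetch_status(url) == 200:
            signal_heat()  # registration confirmed: report success
            return True
        if attempt < attempts - 1:
            sleep(interval)
    # Never registered: send no success signal, so Heat's wait condition
    # does not report the node as ready.
    return False
```

Because Heat only sees success after registration completes, the functional test does not start while swarm may still be aborting connections for the new agent.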

Changed in magnum:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/magnum 2.0.0

This issue was fixed in the openstack/magnum 2.0.0 release.
