wait_for_introspection_to_finish sometimes fails with a keystoneauth1.exceptions.connection.ConnectFailure

Bug #1836976 reported by Steve Baker
This bug affects 1 person
Affects  Status        Importance  Assigned to  Milestone
tripleo  Fix Released  Medium      Steve Baker

Bug Description

It appears that the inspector service can be too busy to service requests, causing the mistral _introspect workflow to fail.

A retry directive on this action should be enough to handle this situation.

keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://192.168.24.2:13050/v1/introspection/8e845c37-6d89-49c8-99a4-5a287cfe938d: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://192.168.24.2:13050/v1/introspection/8e845c37-6d89-49c8-99a4-5a287cfe938d: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
2019-07-01 15:45:34.479 8 WARNING mistral.executors.default_executor [req-c713b016-66e9-4f6b-8b44-454a7da50d63 b7e89eebf2004c338b2fa27f2c8e5c0a 255ba499e4bf4f2896b34de488035ab2 - default default] The action raised an exception [action_ex_id=9ce38d03-7d0a-4ca3-bccf-14bc7e674e92, action_cls='<class 'mistral.actions.action_factory.BaremetalIntrospectionAction'>', attributes='{'client_method_name': 'wait_for_finish'}', params='{'uuids': ['8e845c37-6d89-49c8-99a4-5a287cfe938d'], 'max_retries': 120, 'retry_interval': 10}']
 BaremetalIntrospectionAction.wait_for_finish failed: Unable to establish connection to https://192.168.24.2:13050/v1/introspection/8e845c37-6d89-49c8-99a4-5a287cfe938d: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)): mistral.exceptions.ActionException: BaremetalIntrospectionAction.wait_for_finish failed: Unable to establish connection to https://192.168.24.2:13050/v1/introspection/8e845c37-6d89-49c8-99a4-5a287cfe938d: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
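The retry idea from the description can be sketched in plain Python. This is only an illustration, not the Mistral retry directive or the change that was eventually merged: `call_with_retry`, its parameters, and the local `ConnectFailure` class (a stand-in for `keystoneauth1.exceptions.connection.ConnectFailure`, so the snippet runs without keystoneauth1 installed) are all hypothetical names.

```python
import time


class ConnectFailure(Exception):
    """Local stand-in for keystoneauth1.exceptions.connection.ConnectFailure."""


def call_with_retry(func, attempts=3, delay=10.0, sleep=time.sleep):
    """Call func(), retrying when it raises ConnectFailure.

    `attempts` is the total number of calls allowed. `sleep` is injectable
    so the waiting can be skipped in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectFailure:
            if attempt == attempts:
                # Out of attempts: let the failure reach the caller.
                raise
            sleep(delay)
```

A status poll that hits a busy inspector once or twice would then succeed on a later attempt, while a persistently unreachable service still surfaces as an error.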

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.opendev.org/671380

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Steve Baker (<email address hidden>) on branch: master
Review: https://review.opendev.org/671380
Reason: Evidence points to inspector being too resource constrained to respond to the request, but there is more going on here.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.opendev.org/672389

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/672392

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/672615

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/672389
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c31faaa2862401c07825af46b08f99647db7ee63
Submitter: Zuul
Branch: master

commit c31faaa2862401c07825af46b08f99647db7ee63
Author: Steve Baker <email address hidden>
Date: Wed Jul 24 00:34:26 2019 +0000

    wait_for_introspection_to_finish_error set status FAILED

    Currently when wait_for_introspection_to_finish_error is reached,
    the _introspect workflow will pass as a success even when it
    actually failed. This means that the retry logic in the calling
    workflow is never triggered.

    wait_for_introspection_to_finish_error is hit in downstream CI
    when ironic-introspector is under too much load to respond
    to status poll requests.

    This change also ensures callers to the
    tripleo.baremetal.v1.introspect can override the default concurrency
    so that this can be changed when required.

    Change-Id: Ifd88ff9175bc6ca583e3826c59787680e25fbea3
    Partial-Bug: #1836976
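The problem this commit describes, an error path that still ends in success so the caller's retry never fires, can be illustrated with a small Python analogy. It is not the Mistral workflow itself: `introspect_before`, `introspect_after`, and `run_with_retries` are hypothetical names, and the built-in `ConnectionError` and `RuntimeError` stand in for the real connection failure and a FAILED workflow status.

```python
def introspect_before(poll_status):
    """Error handler is reached, but the function still completes normally."""
    try:
        return poll_status()
    except ConnectionError:
        # The failure is absorbed here, so the caller sees a normal return.
        return None


def introspect_after(poll_status):
    """Error handler marks the run as failed by raising."""
    try:
        return poll_status()
    except ConnectionError as exc:
        raise RuntimeError(f"introspection failed: {exc}") from exc


def run_with_retries(workflow, poll_status, attempts=3):
    """Calling side: retries only when the workflow reports failure."""
    for attempt in range(1, attempts + 1):
        try:
            return workflow(poll_status)
        except RuntimeError:
            if attempt == attempts:
                raise
```

With `introspect_before`, a single failed poll ends the run with no result and no retry. With `introspect_after`, the same failed poll causes `run_with_retries` to try again.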

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/stein)

Reviewed: https://review.opendev.org/672615
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=bd7d9ba0d65809b48eb33d7d579ce2cea7ee9534
Submitter: Zuul
Branch: stable/stein

commit bd7d9ba0d65809b48eb33d7d579ce2cea7ee9534
Author: Steve Baker <email address hidden>
Date: Wed Jul 24 00:34:26 2019 +0000

    wait_for_introspection_to_finish_error set status FAILED

    Currently when wait_for_introspection_to_finish_error is reached,
    the _introspect workflow will pass as a success even when it
    actually failed. This means that the retry logic in the calling
    workflow is never triggered.

    wait_for_introspection_to_finish_error is hit in downstream CI
    when ironic-introspector is under too much load to respond
    to status poll requests.

    This change also ensures callers to the
    tripleo.baremetal.v1.introspect can override the default concurrency
    so that this can be changed when required.

    Change-Id: Ifd88ff9175bc6ca583e3826c59787680e25fbea3
    Partial-Bug: #1836976

tags: added: in-stable-stein
Changed in tripleo:
milestone: train-2 → train-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/672392
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=3e03e3b78cce8a6babe75bc99c5142232f8a07c2
Submitter: Zuul
Branch: master

commit 3e03e3b78cce8a6babe75bc99c5142232f8a07c2
Author: Steve Baker <email address hidden>
Date: Wed Jul 24 01:12:29 2019 +0000

    Add --concurrency argument to introspect commands

    The default concurrency of 20 may be too high for small underclouds
    (especially CI environments), so this change adds a --concurrency
    argument so callers can control the maximum number of nodes
    to introspect concurrently.

    Depends-On: https://review.opendev.org/#/c/672389/
    Change-Id: I9faee9ab133e34466a79aa1176a16106bda1f15d
    Closes-Bug: #1836976
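As a rough sketch of what a caller-controlled concurrency limit looks like, the snippet below parses a `--concurrency` option and uses it to cap how many nodes are processed at once. It is not the python-tripleoclient implementation (which, per the dependent tripleo-common change, lets callers override the concurrency of the `tripleo.baremetal.v1.introspect` workflow). `build_parser`, `introspect_nodes`, and the `introspect_one` callable are hypothetical, and a thread pool is an assumption made for illustration. Only the option name and the default of 20 come from the commit message.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def build_parser():
    """Parser with a --concurrency option defaulting to 20."""
    parser = argparse.ArgumentParser(prog="introspect-sketch")
    parser.add_argument(
        "--concurrency", type=int, default=20,
        help="Maximum number of nodes to introspect at once.")
    parser.add_argument("node_uuids", nargs="*")
    return parser


def introspect_nodes(node_uuids, introspect_one, concurrency):
    """Run introspect_one over the nodes, at most `concurrency` at a time.

    `concurrency` must be a positive integer. Results are returned in the
    same order as node_uuids.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(introspect_one, node_uuids))
```

On a small undercloud or CI host, passing a lower value such as `--concurrency 2` reduces the number of simultaneous requests the inspector has to serve.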

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 12.2.0

This issue was fixed in the openstack/python-tripleoclient 12.2.0 release.
