introspect command logs introspect failed for successful introspections

Bug #1854399 reported by Steve Baker
This bug affects 1 person
Affects    Status       Importance   Assigned to   Milestone
tripleo    In Progress  Medium       Steve Baker

Bug Description

Introspection reported that one node failed, but the final summary then says that all nodes, including that one, were successfully introspected.

(undercloud) [cloud-user@undercloud ~]$ openstack overcloud node introspect --all-manageable --provide
Waiting for introspection to finish...
Waiting for messages on queue 'tripleo' with no timeout.
Introspection of node 0256d108-ef6d-423f-adf7-78b63e8c795f failed.
Introspection of node 43ac4f9d-0979-4cbf-b63c-3e0b683b0aaa completed. Status:SUCCESS. Errors:None
Introspection of node ba2fdc05-2fa9-41c4-b55e-19f0472da6a6 completed. Status:SUCCESS. Errors:None
Introspection of node 56f75e79-ee1a-4a8c-a1af-e8672712b309 completed. Status:SUCCESS. Errors:None
Successfully introspected 4 node(s).

I think what is happening here is:
1. Introspection starts on the node.
2. A call to get the introspection status fails due to some interaction between ironic-inspector, haproxy and mistral, causing the workflow to fail.
3. The retry logic checks the introspection status, which is now successful, so no retry is attempted and no message is sent saying that introspection completed.
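The suspected sequence above can be modelled with a small Python toy. This is a sketch under stated assumptions, not the real workflow (which is a Mistral workflow in tripleo-common): `TransientError`, `flaky_status` and `run` are hypothetical names, and a status getter that raises once stands in for a request dropped somewhere between mistral, haproxy and ironic-inspector. It reproduces the contradictory output pattern: a "failed" line for a node that is later counted as successful.

```python
# Toy model of the suspected sequence in bug #1854399.
# Hypothetical names throughout; the real logic is a Mistral workflow.


class TransientError(Exception):
    """Stands in for a status request dropped between mistral and ironic-inspector."""


def flaky_status(fail_times):
    """Return a status getter that raises `fail_times` times, then reports 'finished'."""
    state = {"remaining": fail_times}

    def get_status():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientError("connection dropped")
        return "finished"

    return get_status


def run(node_ids, getters):
    messages, failed, succeeded = [], [], []
    # Steps 1-2: introspection runs; a failed status call is reported as a failed node.
    for node in node_ids:
        try:
            if getters[node]() == "finished":
                messages.append(f"Introspection of node {node} completed.")
                succeeded.append(node)
        except TransientError:
            messages.append(f"Introspection of node {node} failed.")
            failed.append(node)
    # Step 3: the retry pass re-checks status before deciding whether to retry.
    for node in failed:
        try:
            status = getters[node]()
        except TransientError:
            continue  # still unreachable; not modelled further here
        if status == "finished":
            succeeded.append(node)  # counted as a success, but no message is emitted
    messages.append(f"Successfully introspected {len(succeeded)} node(s).")
    return messages


if __name__ == "__main__":
    getters = {"node-a": flaky_status(1), "node-b": flaky_status(0)}
    for line in run(["node-a", "node-b"], getters):
        print(line)
```

Running it prints a failure line for `node-a` followed by "Successfully introspected 2 node(s).", the same shape as the console output in the report.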

Proposed fixes:
tripleo-common: Clarify that the introspection failure message describes a failed attempt, not an absolute failure
python-tripleoclient: Print the final introspected_nodes and failed_introspection payloads at the end of introspection
puppet-tripleo: Fix haproxy occasionally disconnecting requests to ironic-inspector
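The python-tripleoclient fix can be illustrated with a minimal sketch of a final summary. The helper `summarise` is hypothetical, and the payload shape is an assumption not confirmed by this report: `introspected_nodes` is taken to map a node UUID to a dict with `finished` and `error` fields (mirroring ironic-inspector's status fields), and `failed_introspection` to be a list of node UUIDs.

```python
# Sketch of a client-side final summary (hypothetical helper, not python-tripleoclient code).
# Assumed payload shape: "introspected_nodes" maps node UUID -> {"finished": bool, "error": str or None};
# "failed_introspection" is a list of node UUIDs.


def summarise(payload):
    """Return one line per node describing its final introspection state."""
    introspected = payload.get("introspected_nodes", {})
    failed = payload.get("failed_introspection", [])
    lines = []
    for uuid in sorted(introspected):
        state = introspected[uuid]
        if state.get("error"):
            lines.append(f"{uuid}: failed ({state['error']})")
        elif state.get("finished"):
            lines.append(f"{uuid}: succeeded")
        else:
            lines.append(f"{uuid}: not finished")
    # Nodes reported as failed that have no recorded status at all.
    for uuid in sorted(set(failed) - set(introspected)):
        lines.append(f"{uuid}: failed")
    return lines


if __name__ == "__main__":
    payload = {
        "introspected_nodes": {
            "node-a": {"finished": True, "error": None},
            "node-b": {"finished": True, "error": "timeout"},
        },
        "failed_introspection": ["node-c"],
    }
    print("\n".join(summarise(payload)))
```

Because each node's line comes from its final recorded status, a node that had a transient failed attempt but finished cleanly is reported as succeeded, so an earlier "failed" message no longer stands as the last word.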

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.opendev.org/696630

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.opendev.org/696633

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.opendev.org/696633
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=71abeb12c4d8106693f0ade7e1252a00cee0ed56
Submitter: Zuul
Branch: master

commit 71abeb12c4d8106693f0ade7e1252a00cee0ed56
Author: Steve Baker <email address hidden>
Date: Fri Nov 29 01:01:41 2019 +0000

    Remove haproxy ironic-inspector http-check workaround

    When werkzeug was used as the WSGI service for ironic-inspector, it
    wrote stack traces to the logs whenever haproxy did a http check; this
    listen_options line was the workaround for that. Reverting that change
    is done for the following reasons:

    - since ironic-inspector now uses oslo-service, the log stack traces
      are no longer written
    - setting listen_options overrides the default 'option httplog', which is
      making diagnosing bug #1854399 harder
    - this http-check override may well be the root cause of bug #1854399
      (any non-200 response will result in other connections to
      ironic-inspector being disconnected?)

    Change-Id: I5c397d31650b248660a39e028c98c779871d07ba
    Partial-Bug: #1854399
    Related-Bug: #1691971

Changed in tripleo:
milestone: ussuri-1 → ussuri-2
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/696630
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=7e1f2ce788a3d569f24b061f768b87db1e1683bb
Submitter: Zuul
Branch: master

commit 7e1f2ce788a3d569f24b061f768b87db1e1683bb
Author: Steve Baker <email address hidden>
Date: Fri Nov 29 11:20:38 2019 +1300

    Clarify introspection failed attempt log message

    This change clarifies that an introspect attempt failed, implying that
    a future attempt might succeed.

    This is part of a fix for transient failures when calling
    ironic-inspector leading to logging indicating introspection failed
    when it actually succeeded.

    The layout of the message is also changed slightly to make Kibana
    searches for failures easier.

    Change-Id: Icef88c3ae33fe4516b3cadd64240cc882434e690
    Partial-Bug: #1854399

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/704340

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/704341

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/train)

Reviewed: https://review.opendev.org/704340
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=8b1b9dbd20709e225a0d3556e57a79e2f0bc6984
Submitter: Zuul
Branch: stable/train

commit 8b1b9dbd20709e225a0d3556e57a79e2f0bc6984
Author: Steve Baker <email address hidden>
Date: Fri Nov 29 11:20:38 2019 +1300

    Clarify introspection failed attempt log message

    This change clarifies that an introspect attempt failed, implying that
    a future attempt might succeed.

    This is part of a fix for transient failures when calling
    ironic-inspector leading to logging indicating introspection failed
    when it actually succeeded.

    The layout of the message is also changed slightly to make Kibana
    searches for failures easier.

    Change-Id: Icef88c3ae33fe4516b3cadd64240cc882434e690
    Partial-Bug: #1854399
    (cherry picked from commit 7e1f2ce788a3d569f24b061f768b87db1e1683bb)

tags: added: in-stable-train
tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/stein)

Reviewed: https://review.opendev.org/704341
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=3c0865de9ce5d338e95c2802b7152eb9be17bf42
Submitter: Zuul
Branch: stable/stein

commit 3c0865de9ce5d338e95c2802b7152eb9be17bf42
Author: Steve Baker <email address hidden>
Date: Fri Nov 29 11:20:38 2019 +1300

    Clarify introspection failed attempt log message

    This change clarifies that an introspect attempt failed, implying that
    a future attempt might succeed.

    This is part of a fix for transient failures when calling
    ironic-inspector leading to logging indicating introspection failed
    when it actually succeeded.

    The layout of the message is also changed slightly to make Kibana
    searches for failures easier.

    Change-Id: Icef88c3ae33fe4516b3cadd64240cc882434e690
    Partial-Bug: #1854399
    (cherry picked from commit 7e1f2ce788a3d569f24b061f768b87db1e1683bb)

wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3