StarlingX

NFV: Failed openstack API calls being quietly ignored in python3

Bug #2007285 reported by Al Bailey on 2023-02-14

Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Al Bailey

Bug Description

Brief Description
-----------------
When NFV (running python3) encounters an OpenStackRestAPI exception, the exception is quietly ignored and the operation eventually reports a timeout.

This affects traceability and error handling for NFV orchestration activities.

The easiest way to reproduce the issue is to attempt a kube upgrade to an 'older' version.

Severity
--------
Major

Steps to Reproduce
------------------
# assume already at a later version than v1.21.8
source /etc/platform/openrc
sw-manager kube-upgrade-strategy create --to-version v1.21.8
sw-manager kube-upgrade-strategy apply

Expected Behavior
------------------
It should quickly report a failure:
sw-manager kube-upgrade-strategy show
Strategy Kubernetes Upgrade Strategy:
  strategy-uuid: e30abd1f-96ab-49d8-8ee0-b551df41adfd
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: abort
  current-phase-completion: 100%
  state: aborted
  apply-result: failed
  apply-reason: the installed kubernetes version v1.24.4 cannot upgrade to version v1.21.8
  abort-result: success
  abort-reason:

Actual Behavior
----------------
It takes a couple of minutes and then reports a failure with the reason 'timed out'.

Reproducibility
---------------
100%

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
Any load running python3

Last Pass
---------
Any load running python2

Timestamp/Logs
--------------
Logs are pointless for this activity. The underlying component that causes the problem quietly discards the action. I spent weeks until I finally found where the 'stall' was occurring.

Test Activity
-------------
Feature Testing

Workaround
----------
None

Al Bailey (albailey1974) on 2023-02-14

Changed in starlingx:
assignee:	nobody → Al Bailey (albailey1974)

OpenStack Infra (hudson-openstack) wrote on 2023-02-14: Fix proposed to nfv (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/nfv/+/873724

Changed in starlingx:
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2023-02-14: Fix merged to nfv (master)

Reviewed: https://review.opendev.org/c/starlingx/nfv/+/873724
Committed: https://opendev.org/starlingx/nfv/commit/94321e9d571922a453917e80cebd0835b9bf7e40
Submitter: "Zuul (22348)"
Branch: master

commit 94321e9d571922a453917e80cebd0835b9bf7e40
Author: Al Bailey <email address hidden>
Date: Tue Feb 14 15:21:13 2023 +0000

    Debian: python3 fix for OpenStackRestAPIExceptions

    When the NFV uses tasks and futures and coroutines to
    interact with openstack APIs, an OpenStackRestAPIException
    can be returned as a task result.

    The exception needs to be 'pickled' when sent across the
    queue/socket for the 'simulated' asyncio workflow.

    However, the pickle code for that exception was broken in
    python3. It was relying on a python2 'message' attribute
    of the base Exception class to exist, which no longer
    exists (in python3)

    This was causing the pickle command to quietly fail and
    the code waiting for the task result would timeout and
    not report back the failure information.

    The fix is to ensure that there is a 'message' property
    on that exception type.

    Unit tests have been added for all the pickleable
    exceptions, to ensure their '__reduce__' and other
    interactions with 'pickle' are not reporting any failures.

    Test Plan:
     PASS: create and apply a kube-upgrade-strategy for an
     older version of kubernetes and observe it reports its
     failure error (rather than a timeout)

    Closes-Bug: #2007285
    Signed-off-by: Al Bailey <email address hidden>
    Change-Id: I3a8776163a78330810ae1097ddd1831b1b26a212
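
The failure mode described in the commit message can be illustrated with a minimal sketch. This is not the NFV code: the class names BrokenApiError and FixedApiError, their constructor arguments, and the helper try_round_trip are hypothetical stand-ins. The sketch assumes the original exception had a '__reduce__' that read 'self.message', and it shows that such a '__reduce__' raises AttributeError under python3, while adding a 'message' property lets the exception survive a pickle round trip.

```python
import pickle


class BrokenApiError(Exception):
    """Hypothetical stand-in for an exception written for python2."""

    def __init__(self, method, url, reason):
        super().__init__(reason)
        self.method = method
        self.url = url
        self.reason = reason

    def __reduce__(self):
        # python2's BaseException had a 'message' attribute; python3 does
        # not, so this line raises AttributeError when pickle calls it.
        return (BrokenApiError, (self.method, self.url, self.message))


class FixedApiError(Exception):
    """Same hypothetical exception, with a 'message' property added."""

    def __init__(self, method, url, reason):
        super().__init__(reason)
        self.method = method
        self.url = url
        self.reason = reason

    @property
    def message(self):
        # Provide the attribute that __reduce__ expects.
        return self.reason

    def __reduce__(self):
        return (FixedApiError, (self.method, self.url, self.message))


def try_round_trip(exc):
    """Pickle and unpickle exc; return the copy, or the AttributeError hit."""
    try:
        return pickle.loads(pickle.dumps(exc))
    except AttributeError as err:
        return err
```

In this sketch, try_round_trip returns an AttributeError for a BrokenApiError instance and an equivalent FixedApiError for the fixed class. A unit test along these lines, one per pickleable exception, is the kind of check the commit message describes.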

Changed in starlingx:
status:	In Progress → Fix Released

Ghada Khalil (gkhalil) on 2023-02-15

Changed in starlingx:
importance:	Undecided → Medium
tags:	added: stx.9.0 stx.debian stx.nfv
