N3000 reset fails during migration of AIO-SX upgrade

Bug #2055049 reported by Joshua Kraitberg
Affects    Status        Importance  Assigned to       Milestone
StarlingX  Fix Released  Medium      Joshua Kraitberg

Bug Description

Brief Description
-----------------
When N3000 is used by an app, an N3000 reset will be triggered during upgrade.

If the system is AIO-SX, the migration fails because the reset requires a container image, but images are not yet available at this point in the upgrade.

Severity
--------
Critical

Steps to Reproduce
------------------
* Run N3000/FEC app
* Upgrade stx6 to stx8

Expected Behavior
------------------
The upgrade migration passes.

Actual Behavior
----------------
The upgrade migration fails: the puppet step Exec[Reset n3000 fpgas] returns 1 (see Timestamp/Logs).

Reproducibility
---------------
100% on certain systems/configs, 0% otherwise

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
stx8 MR2

Last Pass
---------
N/A

Timestamp/Logs
--------------
root@controller-0:/var/log/puppet/latest# grep Error puppet.log
2024-02-12T23:42:53.296 Notice: 2024-02-12 23:42:53 +0000 /Stage[main]/Platform::Devices::Fpga::N3000::Reset/Exec[Reset n3000 fpgas]/returns: sysinv 2024-02-12 23:42:52.137 19260 CRITICAL reset-n3000-fpgas [-] Unhandled error: subprocess.CalledProcessError: Command '['ctr', '-n=k8s.io', 'image', 'list', 'name==registry.local:9001/docker.io/starlingx/n3000-opae:stx.8.0-v1.0.2']' returned non-zero exit status 1.
2024-02-12T23:42:53.321 Notice: 2024-02-12 23:42:53 +0000 /Stage[main]/Platform::Devices::Fpga::N3000::Reset/Exec[Reset n3000 fpgas]/returns: 2024-02-12 23:42:52.137 19260 ERROR reset-n3000-fpgas raise CalledProcessError(retcode, process.args,
2024-02-12T23:42:53.323 Notice: 2024-02-12 23:42:53 +0000 /Stage[main]/Platform::Devices::Fpga::N3000::Reset/Exec[Reset n3000 fpgas]/returns: 2024-02-12 23:42:52.137 19260 ERROR reset-n3000-fpgas subprocess.CalledProcessError: Command '['ctr', '-n=k8s.io', 'image', 'list', 'name==registry.local:9001/docker.io/starlingx/n3000-opae:stx.8.0-v1.0.2']' returned non-zero exit status 1.
2024-02-12T23:42:53.326 Error: 2024-02-12 23:42:53 +0000 'sysinv-reset-n3000-fpgas' returned 1 instead of one of [0]
2024-02-12T23:42:53.426 Error: 2024-02-12 23:42:53 +0000 /Stage[main]/Platform::Devices::Fpga::N3000::Reset/Exec[Reset n3000 fpgas]/returns: change from 'notrun' to ['0'] failed: 'sysinv-reset-n3000-fpgas' returned 1 instead of one of [0]
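
The traceback above shows the reset script letting a CalledProcessError from `ctr` propagate as an unhandled error. As a rough illustration only (this is not the sysinv code, and the function names are hypothetical), a caller could treat a failed image query as "image not available" instead of crashing. The command arguments are copied from the log; the assumption that `ctr` prints a header row before any matching images is mine.

```python
import subprocess


def build_image_list_cmd(image, namespace="k8s.io"):
    """Build the same ctr invocation that appears in the puppet log."""
    return ["ctr", "-n=" + namespace, "image", "list", "name==" + image]


def image_is_listed(image, run=subprocess.run):
    """Return True if ctr reports the image, False if it does not or if ctr fails.

    `run` is injectable so the logic can be exercised without containerd.
    """
    cmd = build_image_list_cmd(image)
    try:
        result = run(cmd, check=True, capture_output=True, text=True)
    except (subprocess.CalledProcessError, OSError):
        # Non-zero exit (as in the log) or ctr missing: report "not available"
        # rather than raising.
        return False
    # Assumption: the first non-empty line is a column header, so a match
    # means at least one more line follows it.
    rows = [line for line in result.stdout.splitlines() if line.strip()]
    return len(rows) > 1


if __name__ == "__main__":
    image = "registry.local:9001/docker.io/starlingx/n3000-opae:stx.8.0-v1.0.2"
    print(image_is_listed(image))
```

Returning False here only avoids the crash; whether the reset should then be skipped or retried is a separate decision.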

Test Activity
-------------
Evaluation

Workaround
----------
Uninstall N3000/FEC app before upgrade

Changed in starlingx:
assignee: nobody → Joshua Kraitberg (jkraitbe-wr)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/910236
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/e3257ebfbab2f16ad4f9f32e81573564ef1a9695
Submitter: "Zuul (22348)"
Branch: master

commit e3257ebfbab2f16ad4f9f32e81573564ef1a9695
Author: Joshua Kraitberg <email address hidden>
Date: Mon Feb 26 10:03:05 2024 -0500

    Skip N3000 reset during AIO-SX upgrade migration

    When N3000 is used by an app, an N3000 reset will be triggered
    during upgrade. If the system is AIO-SX, the migration will fail
    because a certain image is required, but images are not available
    at this point in the upgrade.

    The N3000 reset can be skipped by creating a volatile flag.
    This flag is cleared on reboot so the N3000 reset will be
    executed after unlock.

    TEST PLAN
    * AIO-SX optimized upgrade, stx6 to stx8, with FEC app
    * AIO-SX optimized upgrade, stx6 to stx8, without FEC app

    Closes-Bug: 2055049
    Change-Id: Ia026cc6ed05bd3183555b73eeadc84e3ab180ffa
    Signed-off-by: Joshua Kraitberg <email address hidden>
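
The flag mechanism described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the merged change: the flag file name and the function names are hypothetical, and a temporary directory stands in for the volatile location that is cleared on reboot.

```python
import os
import tempfile


def create_skip_flag(flag_path):
    """Create an empty flag file, making parent directories if needed."""
    os.makedirs(os.path.dirname(flag_path), exist_ok=True)
    with open(flag_path, "a"):
        pass


def reset_unless_flagged(flag_path, reset):
    """Run `reset` unless the skip flag exists; return True if it ran."""
    if os.path.exists(flag_path):
        return False
    reset()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as volatile_dir:
        # Hypothetical flag name; the real path is not given in this report.
        flag = os.path.join(volatile_dir, "skip_n3000_reset")
        calls = []

        # During migration: flag present, reset is skipped.
        create_skip_flag(flag)
        print(reset_unless_flagged(flag, lambda: calls.append("reset")))

        # Removing the file stands in for the reboot clearing volatile storage,
        # after which the reset runs as usual.
        os.remove(flag)
        print(reset_unless_flagged(flag, lambda: calls.append("reset")))
```

Keeping the flag in volatile storage means no explicit cleanup step is needed: once the host reboots and is unlocked, the flag is gone and the reset is no longer suppressed.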

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.update