B&R: Optimized restore more networking related issues

Bug #1999585 reported by Joshua Kraitberg
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Joshua Kraitberg
Milestone: (not set)

Bug Description

Brief Description
-----------------
Three new networking-related issues were observed during optimized restore.

IPv6 can have multiple default routes. The existing regex only
handled zero or one default route. It has been replaced with grep,
which handles zero, one, or multiple entries.

e.g.
$ ip -6 route show default
default via ffff::1234 dev enp22s0f1 proto ra metric 1024 expires 1798sec hoplimit 64 pref medium
default via ffff::8888 dev enp22s0f1 proto ra metric 1024 expires 1642sec hoplimit 64 pref medium
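
For illustration, a minimal sketch of how grep can cope with zero, one, or several default routes. This is not the playbook's actual task: the helper name default_gateways and the use of cut are assumptions made for this example, and the sample text is the output shown above.

```shell
# Hypothetical helper: read `ip -6 route show default` output on stdin and
# print the gateway address of every default route, one per line.
default_gateways() {
    # grep -o prints each match on its own line, so any number of routes is
    # handled. "|| true" hides grep's non-zero exit status when nothing matches.
    { grep -oE '^default via [^ ]+' || true; } | cut -d' ' -f3
}

# Sample taken from the output above.
sample_routes='default via ffff::1234 dev enp22s0f1 proto ra metric 1024 expires 1798sec hoplimit 64 pref medium
default via ffff::8888 dev enp22s0f1 proto ra metric 1024 expires 1642sec hoplimit 64 pref medium'

printf '%s\n' "$sample_routes" | default_gateways
```

On a live system the same helper could be fed directly: ip -6 route show default | default_gateways. With no default route it prints nothing and still exits successfully.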

dcmanager can time out if networking takes too long to restart.
The gateway interfaces are therefore no longer reset, so the interface
that SSH is on is not modified. Since SSH is already working, the
interface it is on is assumed to be sufficient.

e.g.
TASK [subcloud-bnr/restore : Run subcloud222 platform restore playbook] **********
Tuesday 13 December 2022 29:00:91 +0000 (0:00:00.493) 0:00:09.349 ******
fatal: [subcloud222]: FAILED! =>
  msg: 'Failed to connect to the host via ssh: ssh: connect to host ffff::1234 port 22: No route to host'
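
For illustration, a minimal sketch of leaving gateway interfaces out of a reset. The playbook's actual logic may differ: the helper names gateway_devices and interfaces_to_reset are hypothetical, and the interface names other than enp22s0f1 are made up for the example.

```shell
# Hypothetical helper: read `ip route show default` style output on stdin and
# print each device that carries a default route, without duplicates.
gateway_devices() {
    awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") print $(i + 1) }' | sort -u
}

# Hypothetical helper: $1 is a newline-separated list of devices to skip; the
# remaining arguments are candidate interfaces. Prints the candidates that are
# not in the skip list.
interfaces_to_reset() {
    skip_list=$1
    shift
    for iface in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        if ! printf '%s\n' "$skip_list" | grep -qxF -- "$iface"; then
            printf '%s\n' "$iface"
        fi
    done
}

routes='default via ffff::1234 dev enp22s0f1 proto ra metric 1024
default via ffff::8888 dev enp22s0f1 proto ra metric 1024'

gateways=$(printf '%s\n' "$routes" | gateway_devices)
interfaces_to_reset "$gateways" enp22s0f0 enp22s0f1 lo
```

Here enp22s0f1 carries the default routes, so it is skipped, and only the other two interfaces are listed for reset.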

When data is off box, registry restore was being triggered by dcmanager even
when not requested. This has been fixed by updating the condition.

Severity
--------
Any

Steps to Reproduce
------------------
Run optimized restore

Expected Behavior
------------------
Pass

Actual Behavior
----------------
Fail

Reproducibility
---------------
N/A

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
N/A

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Developer Testing

Workaround
----------
N/A

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/867580
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/b202752c4d0c9e48ee86f42d93235518cda1fdd5
Submitter: "Zuul (22348)"
Branch: master

commit b202752c4d0c9e48ee86f42d93235518cda1fdd5
Author: Joshua Kraitberg <email address hidden>
Date: Tue Dec 13 16:28:06 2022 -0500

    Rework bring up networking during optimized restore

    There were three new issues observed.

    It is possible for IPv6 to have multiple default routes.
    The existing regex was created for one or none. Replaced
    regex with grep to handle none, one, or multiple entries.

    e.g.
    $ ip -6 route show default
    default via ffff::1234 dev enp22s0f1 proto ra metric 1024 expires 1798sec hoplimit 64 pref medium
    default via ffff::8888 dev enp22s0f1 proto ra metric 1024 expires 1642sec hoplimit 64 pref medium

    dcmanager could timeout if networking takes too long to restart.
    The gateway interfaces will no longer be reset. This way the interface
    SSH should be on will not be modified. It is then assumed that
    since SSH is already working the interface that it's on is sufficient.

    e.g.
    TASK [subcloud-bnr/restore : Run subcloud222 platform restore playbook] **********
    Tuesday 13 December 2022 29:00:91 +0000 (0:00:00.493) 0:00:09.349 ******
    fatal: [subcloud222]: FAILED! =>
      msg: 'Failed to connect to the host via ssh: ssh: connect to host ffff::1234 port 22: No route to host'

    When data is off box registry restore was being triggered by dcmanager even
    when not requested. This has been fixed via updating condition.

    TEST PLAN
    PASS: Optimize restore is successful (AIO-SX)
      * Test on affected and unaffected systems

    Closes-Bug: 1999585
    Signed-off-by: Joshua Kraitberg <email address hidden>
    Change-Id: I029a7e1d01a3463708b873a66341870b810d59bb

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.8.0 stx.update
Changed in starlingx:
assignee: nobody → Joshua Kraitberg (jkraitbe-wr)
importance: Undecided → Medium