controller-0 fails after lock/unlock when configured to pxeboot from an unexpected interface

Bug #2045656 reported by Andre Kantek
This bug affects 1 person
Affects     Status         Importance   Assigned to    Milestone
StarlingX   Fix Released   Medium       Andre Kantek

Bug Description

Brief Description

Auto-installing an AIO-DX IPv6 lab succeeds, but after swacting to controller-1 and locking/unlocking controller-0, controller-0 fails because the /etc/resolv.conf file changes from

sysadmin@controller-0:~$ cat /etc/resolv.conf
nameserver abcd:204::1
nameserver 2620:10a:a001:a103::2
nameserver 2001:4860:4860::8888
to the following after the reboot:

sysadmin@controller-0:~$ cat /etc/resolv.conf
nameserver 128.224.144.130
nameserver 147.11.57.128
Running stat on resolv.conf shows that it was freshly created at 2023-11-24 01:32:26:

sysadmin@controller-0:~$ stat /etc/resolv.conf
  File: /etc/resolv.conf
Size: 82 Blocks: 8 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 657089 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-11-24 01:32:26.169999371 +0000
Modify: 2023-11-24 01:32:26.169999371 +0000
Change: 2023-11-24 01:32:26.196999369 +0000
 Birth: 2023-11-24 01:32:26.169999371 +0000
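
The symptom above (IPv6 nameservers replaced by IPv4-only entries) can be checked mechanically. The following is a minimal sketch, not part of the original report; the helper names nameserver_versions and lost_ipv6_nameservers are hypothetical, and the sample strings are the two resolv.conf contents quoted above.

```python
import ipaddress


def nameserver_versions(resolv_conf_text):
    """Return the IP version (4 or 6) of each nameserver entry, in file order."""
    versions = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            # Drop a zone suffix such as "%eth0" before parsing the address.
            address = fields[1].split("%", 1)[0]
            versions.append(ipaddress.ip_address(address).version)
    return versions


def lost_ipv6_nameservers(resolv_conf_text):
    """True if the file lists nameservers but none of them is IPv6."""
    versions = nameserver_versions(resolv_conf_text)
    return bool(versions) and 6 not in versions


# resolv.conf contents quoted in this report, before and after the reboot.
before = (
    "nameserver abcd:204::1\n"
    "nameserver 2620:10a:a001:a103::2\n"
    "nameserver 2001:4860:4860::8888\n"
)
after = (
    "nameserver 128.224.144.130\n"
    "nameserver 147.11.57.128\n"
)

print(lost_ipv6_nameservers(before), lost_ipv6_nameservers(after))
```

On an IPv6-only lab, a True result for the current file would point to the overwrite described here rather than to a DNS server outage.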

Several loads have been tried including the latest master branch. This issue seems like a lab infrastructure networking issue.

Severity

Major

Steps to Reproduce

install lab
system host-swact controller-0
(now on controller-1)
system host-lock controller-0
system host-unlock controller-0

Expected Behavior

Controller-0 unlocks fine

Actual Behavior

Controller-0 gets a configuration error: /opt/platform cannot be mounted because DNS name resolution fails with the changed, now invalid /etc/resolv.conf.

Reproducibility

100% reproducible

System Configuration

AIO-DX with IPv6 install

Load info (eg: 2022-03-10_20-00-07)

Any load

Last Pass

Yes, but it is unknown when it started to fail.

Timestamp/Logs

See above

Alarms

N/A

Test Activity

Feature Testing

Workaround

None

Andre Kantek (akantek)
Changed in starlingx:
assignee: nobody → Andre Kantek (akantek)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/902637
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/2996cb456eb5b3dd05cf3ae19712002947e7915e
Submitter: "Zuul (22348)"
Branch: master

commit 2996cb456eb5b3dd05cf3ae19712002947e7915e
Author: Andre Kantek <email address hidden>
Date: Mon Dec 4 15:30:34 2023 -0300

    Remove auto from network config left from pxeboot, if DHCP is used

    Pxeboot can leave an interface script in controller-0 with an IPv4
    dhcp config that can remain if the pxeboot interface isn't part of
    StarlingX managed interfaces. This will start a "rogue" dhclient
    after unlock (controllers use static addressing from DB).

    If a response occurs the file /etc/resolv.conf may be overwritten
    in controller-0. If the system is IPv6, controller-0 will lose
    the valid IPv6 nameservers

    This change searches the config files in /etc/network/interfaces.d/ at
    bootstrap's "prepare-env" phase and removes the auto flag from a
    file if it sets the interface to use dhcp (IPv4 or IPv6).

    Test Plan:
    [PASS] Install AIO-SX (a config file was planted before the bootstrap
           on an interface not used by starlingX), done in IPv4 and IPv6.
    [PASS] Install STANDARD (a config file was planted before the
           bootstrap on an interface not used by starlingX), done in IPv4.

    Closes-Bug: 2045656

    Change-Id: Id6261e2d868939b50c48c68a9e29f9ef80c8a6f0
    Signed-off-by: Andre Kantek <email address hidden>
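
The idea described in the commit message can be illustrated with a short sketch. This is not the merged change (which lives in ansible-playbooks); it is a Python approximation that assumes the leftover file uses ifupdown-style "auto <iface>" and "iface <iface> inet|inet6 dhcp" stanzas. The function name strip_auto_for_dhcp and the interface names are hypothetical, and other directives such as allow-hotplug are not handled.

```python
import re


def strip_auto_for_dhcp(config_text):
    """Remove 'auto <iface>' entries for interfaces that use the dhcp method."""
    dhcp_ifaces = set()
    for line in config_text.splitlines():
        # Matches e.g. "iface enp0s3 inet dhcp" or "iface enp0s3 inet6 dhcp".
        match = re.match(r"\s*iface\s+(\S+)\s+inet6?\s+dhcp\b", line)
        if match:
            dhcp_ifaces.add(match.group(1))

    kept = []
    for line in config_text.splitlines(keepends=True):
        fields = line.split()
        if len(fields) > 1 and fields[0] == "auto":
            remaining = [name for name in fields[1:] if name not in dhcp_ifaces]
            if not remaining:
                continue  # every interface on this line uses dhcp
            if len(remaining) < len(fields) - 1:
                # Keep only the interfaces that are not dhcp-configured.
                line = "auto " + " ".join(remaining) + "\n"
        kept.append(line)
    return "".join(kept)


# Hypothetical leftover from pxeboot on an interface StarlingX does not manage.
leftover = "auto enp0s3\niface enp0s3 inet dhcp\n"
# A statically addressed interface should be left untouched.
static = "auto eth1\niface eth1 inet static\n    address 10.0.0.2\n"

print(strip_auto_for_dhcp(leftover), end="")
```

Without the auto line the interface is no longer brought up at boot, so no dhclient is started for it and /etc/resolv.conf keeps the nameservers written by the platform.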

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.networking