Unable to install a subcloud due to VLAN networking failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Kyle MacLeod |
Bug Description
Brief Description
When installing a subcloud on StarlingX 22.12, the ostree pull step fails because of a networking problem.
NOTE: This is an IPv4 network
Severity
Critical: System/Feature is not usable after the defect
Steps to Reproduce
Install the system controller with OAM and management (mgmt) on different VLANs over the same physical pxeboot interface. The subclouds should use the same configuration.
The system consists of 2 system controllers, 1 worker, and one simplex subcloud whose networking configuration is similar to that of the controllers.
Deploy the subcloud. The installation fails at the ostree step.
Expected Behavior
The subcloud is deployed successfully.
Actual Behavior
The subcloud fails to deploy.
Reproducibility
100%
System Configuration
2 system controllers, 1 worker
1 simplex subcloud
OAM and MGMT both on the pxeboot interface, via VLANs
TOR switch with the VLANs configured on it
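At the Linux level, the configuration above amounts to VLAN sub-interfaces stacked on one physical NIC. The dry-run sketch below only prints equivalent iproute2 commands; StarlingX configures interfaces through its own tooling, and the NIC name `enp0s3` and the VLAN IDs 100 (OAM) and 200 (MGMT) are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) iproute2 commands that would place
# OAM and MGMT on separate VLANs over one physical pxeboot interface.
# Interface name and VLAN IDs used below are hypothetical.
set -euo pipefail

vlan_commands() {
    local phys="$1"; shift
    local vid
    # Each remaining argument is a VLAN ID stacked on the physical interface.
    for vid in "$@"; do
        echo "ip link add link ${phys} name ${phys}.${vid} type vlan id ${vid}"
        echo "ip link set dev ${phys}.${vid} up"
    done
}

# Hypothetical: enp0s3 as the pxeboot NIC, VLAN 100 for OAM, VLAN 200 for MGMT.
vlan_commands enp0s3 100 200
```

The point to take from it is that each VLAN interface has its own name (for example `enp0s3.200`), distinct from the physical device it sits on.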
Load info (eg: 2022-03-
starlingx master
Last Pass
Timestamp/Logs
Alarms
Workaround
The root cause is a bug in miniboot.cfg: it adds the default route on the device named by mgmt_dev instead of the interface named by mgmt_iface.
As a workaround, override the miniboot.cfg file used during the remote installation.
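To make the root cause concrete, this dry-run sketch echoes the default-route command bound to each of the two names. It assumes, hypothetically, that mgmt_iface is the VLAN sub-interface named `<mgmt_dev>.<vlan id>`; the device name, VLAN ID and gateway are made-up values, not taken from miniboot.cfg, and nothing is executed.

```shell
#!/usr/bin/env bash
# Illustration only: echo the default-route command for the physical device
# versus the VLAN interface. Assumption: mgmt_iface = "<mgmt_dev>.<vlan id>".
# All values below are hypothetical, not read from miniboot.cfg.
set -euo pipefail

mgmt_dev="enp0s3"                       # hypothetical physical pxeboot NIC
mgmt_vlan="200"                         # hypothetical management VLAN ID
mgmt_iface="${mgmt_dev}.${mgmt_vlan}"   # VLAN sub-interface name
gateway="192.168.204.1"                 # hypothetical management gateway

route_cmd() {
    # $1 = device the default route is bound to
    echo "ip -4 route add default via ${gateway} dev $1"
}

echo "buggy: $(route_cmd "${mgmt_dev}")"    # route on the untagged physical NIC
echo "fixed: $(route_cmd "${mgmt_iface}")"  # route on the management VLAN
```

A plausible reading of the failure is that, with the route bound to the physical device, packets leave untagged and never reach a gateway that is only reachable on the management VLAN.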
Steps
The following steps are done on the active system controller.
1. Copy miniboot.cfg into /var/miniboot/
1a) If /var/www/
sudo mkdir /var/miniboot/
sudo cp /var/www/
OR
1b) If /var/www/
Assuming you have already done a load-import, you can extract miniboot.cfg from the ISO as follows.
sudo mkdir /mnt/iso
sudo mount -o loop /opt/dc-
sudo cp /mnt/iso/
sudo umount /mnt/iso
sudo rmdir /mnt/iso
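Step 1 can be wrapped in a small helper, sketched below. The source path is passed as an argument because the full paths in the steps are not shown; the destination directory defaults to /var/miniboot, and the function name `stage_miniboot` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of step 1 as a helper (hypothetical name: stage_miniboot).
# The source path is an argument; the destination directory defaults to
# /var/miniboot. Run as root (e.g. via sudo) on the active system controller.
set -euo pipefail

stage_miniboot() {
    local src="$1"
    local dest_dir="${2:-/var/miniboot}"

    # Fail early with a clear message if the source file is missing.
    if [ ! -f "$src" ]; then
        echo "error: ${src} not found" >&2
        return 1
    fi

    # -p makes the step safe to re-run when the directory already exists.
    mkdir -p "$dest_dir"
    cp "$src" "${dest_dir}/miniboot.cfg"
    echo "${dest_dir}/miniboot.cfg"
}
```

Call it as `stage_miniboot <path-to-source-miniboot.cfg>`, using the source path from option 1a or the file on the mounted ISO from option 1b.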
2. Edit the file to change the default route setting
Run the following command, which replaces 'mgmt_dev' with 'mgmt_iface' on lines 1589 and 1590:
sudo sed -i.orig '1589,+1s|dev ${mgmt_dev}|dev ${mgmt_iface}|' /var/miniboot/
Compare the original file with the edited one to confirm that the change succeeded. The output should look like this:
[sysadmin@
1589,1590c1589,1590
< ilog "ip ${BOOTPARAM_IP_VER} route add default ${BOOTPARAM_
< ip ${BOOTPARAM_IP_VER} route add default ${BOOTPARAM_
---
> ilog "ip ${BOOTPARAM_IP_VER} route add default ${BOOTPARAM_
> ip ${BOOTPARAM_IP_VER} route add default ${BOOTPARAM_
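Because the sed expression edits fixed line numbers, it can be worth trying it on a throwaway file first. This sketch builds a synthetic 1590-line file whose last two lines are simplified stand-ins for the route lines (the gateway argument is replaced by a literal `<gateway>` placeholder), applies the same expression as step 2, and prints the diff. It assumes GNU sed, which supports the `1589,+1` address form.

```shell
#!/usr/bin/env bash
# Try the step 2 sed expression on a synthetic file instead of /var/miniboot.
# The two route lines are simplified stand-ins; "<gateway>" is a placeholder.
# Requires GNU sed for the "1589,+1" address form.
set -euo pipefail

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
cfg="$work/miniboot.cfg"

# 1588 filler lines, then two lines shaped like the default-route lines.
for _ in $(seq 1588); do echo '# filler'; done > "$cfg"
cat >> "$cfg" <<'EOF'
ilog "ip ${BOOTPARAM_IP_VER} route add default <gateway> dev ${mgmt_dev}"
ip ${BOOTPARAM_IP_VER} route add default <gateway> dev ${mgmt_dev}
EOF

# Same expression as step 2: edit lines 1589-1590 in place, keep a .orig backup.
sed -i.orig '1589,+1s|dev ${mgmt_dev}|dev ${mgmt_iface}|' "$cfg"

# diff exits 1 when the files differ; do not let set -e abort on that.
diff "$cfg.orig" "$cfg" || true
```

The diff should contain a single `1589,1590c1589,1590` hunk, matching the shape of the output in step 2. On the real file, if the route lines are not at 1589 and 1590, the expression typically leaves the file unchanged without reporting an error, so checking the diff remains important.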
3. Run the remote installation
tags: added: stx.9.0 stx.me
tags: added: stx.metal, removed: stx.me
Changed in starlingx:
importance: Undecided → High
assignee: nobody → Kyle MacLeod (kmacleod)
Fix proposed to branch: master review.opendev.org/c/starlingx/metal/+/879068
Review: https:/