Subcloud with admin network doesn't go online due to missing rule in L3 firewall

Bug #2027827 reported by Andre Kantek
Affects    Status        Importance  Assigned to   Milestone
StarlingX  Fix Released  High        Andre Kantek

Bug Description

Brief Description

When installing a subcloud using the admin network, the subcloud install completes, routes are added as expected, and the subcloud comes online. After some time (20 minutes or so), however, the subcloud goes offline.

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+-------------+---------------+-----------------+
| id | name      | management | availability | deploy status | sync        | backup status | backup datetime |
+----+-----------+------------+--------------+---------------+-------------+---------------+-----------------+
| 2  | subcloud5 | managed    | offline      | complete      | unknown     | None          | None            |
| 3  | subcloud6 | managed    | offline      | complete      | unknown     | None          | None            |
+----+-----------+------------+--------------+---------------+-------------+---------------+-----------------+

Initial investigation points to a missing rule/config in the L3 firewall. Even though the root cause is in the L3 firewall, this issue also impacts TCPG-1093, as it prevents installation of subclouds with an admin network.
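
To spot the affected subclouds without reading the table by eye, the listing above can be parsed with a few lines of Python. This is only a sketch: the function name `offline_subclouds` is hypothetical, and it assumes the output of `dcmanager subcloud list` has been captured as text in the pipe-delimited layout shown in this report.

```python
def offline_subclouds(table_text):
    """Return the names of subclouds whose availability column is 'offline'."""
    header = None
    offline = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited row holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("availability") == "offline":
            offline.append(row["name"])
    return offline


# Sample text copied from the listing in this report.
SAMPLE = """\
| id | name | management | availability | deploy status | sync | backup status | backup datetime |
| 2 | subcloud5 | managed | offline | complete | unknown | None | None |
| 3 | subcloud6 | managed | offline | complete | unknown | None | None |
"""

print(offline_subclouds(SAMPLE))
```

Keying each row by the header names rather than by column position keeps the check working if columns are added or reordered in a later release.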

Severity

<Major: System/Feature is usable but degraded>

Steps to Reproduce

1 - Install subcloud using admin network (comment out mgmt network gateway in the subcloud bootstrap file before running "dcmanager subcloud add")

2 - Wait for the subcloud to complete installation

Expected Behavior

Subcloud install is completed and the subcloud stays online

Actual Behavior

Subcloud install is completed but status is offline

Reproducibility

Reproducible

System Configuration

WRCP-DC3-1 SC1 (AIO-SX subcloud)

system controller: 2620:10a:a001:d41::1008

Official lab files used.

Load info (eg: 2022-03-10_20-00-07)

BUILD_ID="2023-07-05_18-00-07"

Last Pass

Tried and worked on the master load from May 25th

Alarms

NA

Test Activity

Workaround

The workaround is to add an extra Calico global network policy that allows ICMPv6 from the admin network and the link-local network:

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: workaround-admin-if-gnp-icmpv6-extra
spec:
  applyOnForward: true
  ingress:
  - action: Allow
    ipVersion: 6
    metadata:
      annotations:
        name: stx-ingr-workaround-subcloud-icmpv66
    protocol: ICMPv6
    source:
      nets:
      - fd00:8:24::/64 # <=== adjust to the subcloud admin network address
      - fe80::/64
  order: 100
  selector: has(nodetype) && nodetype == 'controller' && has(iftype) && iftype contains 'admin'
  types:
  - Ingress
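
Since the admin network address has to be adjusted per subcloud, the policy above can be rendered from a template. The sketch below is not part of the fix; `render_workaround_policy` and `POLICY_TEMPLATE` are hypothetical names, and the check that the subnet is a valid IPv6 network is an added assumption about what counts as usable input.

```python
import ipaddress

# Template mirroring the workaround policy in this report; only the
# admin network entry is parameterized.
POLICY_TEMPLATE = """\
apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: workaround-admin-if-gnp-icmpv6-extra
spec:
  applyOnForward: true
  ingress:
  - action: Allow
    ipVersion: 6
    metadata:
      annotations:
        name: stx-ingr-workaround-subcloud-icmpv66
    protocol: ICMPv6
    source:
      nets:
      - {admin_net}
      - fe80::/64
  order: 100
  selector: has(nodetype) && nodetype == 'controller' && has(iftype) && iftype contains 'admin'
  types:
  - Ingress
"""


def render_workaround_policy(admin_subnet):
    """Return the policy YAML with the given IPv6 admin subnet filled in."""
    # strict=True rejects values with host bits set, e.g. fd00:8:24::1/64
    net = ipaddress.ip_network(admin_subnet, strict=True)
    if net.version != 6:
        raise ValueError("the workaround targets IPv6 admin networks")
    return POLICY_TEMPLATE.format(admin_net=str(net))


print(render_workaround_policy("fd00:8:24::/64"))
```

The rendered text can be saved to a file and loaded with `kubectl apply -f <file>`, which is the usual way to create a `crd.projectcalico.org/v1` resource; the report itself does not state which command was used.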

Andre Kantek (akantek)
Changed in starlingx:
assignee: nobody → Andre Kantek (akantek)
Ghada Khalil (gkhalil) wrote :

Related to recent code changes for: https://storyboard.openstack.org/#!/story/2010591

Changed in starlingx:
importance: Undecided → High
tags: added: stx.9.0 stx.networking
Ghada Khalil (gkhalil) wrote :
Changed in starlingx:
status: New → Fix Released