Backup & Restore: Network range checking is not enforced during initial bootstrap

Bug #1845215 reported by Senthil Mukundakumar
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: David Sullivan

Bug Description

Brief Description
-----------------
 In the default lab setup, both the management and multicast-subnet networks are configured as /28 subnets, and the initial bootstrap passes (I think this is because these subnets were not defined in the override file at that point). When backing up the system, we generate the two subnet entries listed after the code below in the override file (e.g. localhost.yml). When this override file is then used to restore the platform, the address validation done by the bootstrap fails due to this check:

     elif (("{{ network }}" == 'pxeboot' or "{{ network }}" == 'multicast' or "{{ network }}" == 'management') and
                range.size < {{ min_16_addresses|int }}):
            raise Exception("Failed validation, {{ network }} address range must contain at least %d addresses." %
                            int("{{ min_16_addresses }}"))

management_subnet: 192.168.204.0/28
management_multicast_subnet: 239.1.1.0/28

If a /28 subnet is not allowed for pxeboot, management and multicast, this should be checked during the initial bootstrap.
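The /28 arithmetic behind this failure can be sketched with Python's standard ipaddress module. This is an illustration only: the actual check is the Jinja-templated snippet quoted above (its `range.size` suggests a different library), and the value 16 for min_16_addresses is an assumption taken from the variable's name.

```python
import ipaddress

# Assumption: min_16_addresses evaluates to 16, as its name suggests.
MIN_16_ADDRESSES = 16

subnet = ipaddress.ip_network("192.168.204.0/28")

# A /28 holds 16 addresses in total, including network and broadcast.
total = subnet.num_addresses

# Only 14 of them are usable as a start/end range.
usable = len(list(subnet.hosts()))

# The management pool listed by "system addrpool-list" is even smaller.
pool_start = ipaddress.ip_address("192.168.204.2")
pool_end = ipaddress.ip_address("192.168.204.14")
pool_size = int(pool_end) - int(pool_start) + 1

print(total)                         # 16
print(usable)                        # 14
print(pool_size)                     # 13
print(usable < MIN_16_ADDRESSES)     # True: a range check against 16 rejects it
```

So a /28 subnet satisfies a 16-address minimum, but no start/end range carved out of that subnet can.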

Also, in the “system addrpool-list” output, the IP range for cluster-service-subnet is ['10.96.0.1-10.96.0.1']. This sets cluster_service_start_address and cluster_service_end_address to 10.96.0.1. During restore, the address validation done by the bootstrap will fail due to:

        if not start < end:
            raise Exception("Failed validation, {{ network }} start address must be less than end address.")

or
       if (("{{ network }}" == 'cluster_pod' or "{{ network }}" == 'cluster_service') and
              range.size < {{ min_pod_service_num_addresses|int }}):
            raise Exception("Failed validation, {{ network }} address range must contain at least %d addresses." %

cluster_service_subnet: 10.96.0.0/12
cluster_service_start_address: 10.96.0.1
cluster_service_end_address: 10.96.0.1
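A minimal sketch of why these three entries cannot pass the first check, again using Python's standard ipaddress module rather than the playbook's own code. The helper name start_before_end is hypothetical. The first and last host addresses of the /12 are printed only for comparison with the single-address pool.

```python
import ipaddress

def start_before_end(start, end):
    """Mirror of the quoted 'start < end' test (hypothetical helper)."""
    return ipaddress.ip_address(start) < ipaddress.ip_address(end)

# Values taken from the override entries above.
subnet = ipaddress.ip_network("10.96.0.0/12")
start = "10.96.0.1"
end = "10.96.0.1"

# Start equals end, so the strict comparison is False and validation raises.
print(start_before_end(start, end))   # False

# For comparison: first and last host address of the /12 subnet.
print(subnet[1])                      # 10.96.0.1
print(subnet[-2])                     # 10.111.255.254
```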

Here is the output of “system addrpool-list”

[sysadmin@controller-0 ~(keystone_admin)]$ system addrpool-list --nowrap
+--------------------------------------+------------------------+---------------+--------+--------+-----------------------------------+------------------+---------------------+---------------------+-----------------+
| uuid                                 | name                   | network       | prefix | order  | ranges                            | floating_address | controller0_address | controller1_address | gateway_address |
+--------------------------------------+------------------------+---------------+--------+--------+-----------------------------------+------------------+---------------------+---------------------+-----------------+
| f1ab47a6-c25f-457f-a0a4-f5825b175188 | cluster-host-subnet    | 192.168.206.0 | 24     | random | ['192.168.206.2-192.168.206.254'] | 192.168.206.2    | 192.168.206.3       | 192.168.206.4       | None            |
| 73fbf749-6f9d-4ebd-aec2-2ebefdbd97d5 | cluster-pod-subnet     | 172.16.0.0    | 16     | random | ['172.16.0.1-172.16.255.254']     | None             | None                | None                | None            |
| 57f12119-51d8-408f-9f90-840c9b52cace | cluster-service-subnet | 10.96.0.0     | 12     | random | ['10.96.0.1-10.96.0.1']           | None             | None                | None                | None            |
| b54ac8b2-b90b-46d7-a287-ec8cc509c9ac | management             | 192.168.204.0 | 28     | random | ['192.168.204.2-192.168.204.14']  | 192.168.204.2    | 192.168.204.3       | 192.168.204.4       | None            |
| d076ff59-8001-486c-9987-b75d1616736f | multicast-subnet       | 239.1.1.0     | 28     | random | ['239.1.1.1-239.1.1.14']          | None             | None                | None                | None            |
| a210f34e-f7ac-41b5-8d3a-f99ba9cfb70a | oam                    | 128.224.150.0 | 23     | random | ['128.224.150.1-128.224.151.254'] | 128.224.150.81   | None                | None                | 128.224.150.1   |
| 7491a66e-205f-4255-b2a3-53f97bf7dfec | pxeboot                | 169.254.202.0 | 24     | random | ['169.254.202.2-169.254.202.254'] | 169.254.202.2    | 169.254.202.3       | 169.254.202.4       | None            |
+--------------------------------------+------------------------+---------------+--------+--------+-----------------------------------+------------------+---------------------+---------------------+-----------------+

Severity
--------

Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
1. Create an environment for ansible remote host
2. Bring up the AIO-DX system
3. Backup the system using ansible remotely
4. Re-install the controller with the same load
5. Restore the system using ansible remotely.
6. Unlock the active controller
7. Power on and PXE boot controller-1. Ceph OSDs on controller-1 will remain intact. Unlock controller-1

Expected Behavior
------------------
The active controller is successfully restored

Actual Behavior
----------------
Active controller failed to restore

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any IPv4 system

Branch/Pull Time/Commit
-----------------------
 BUILD_ID="2019-09-12_20-00-00"

Test Activity
-------------
Feature Testing

Frank Miller (sensfan22) wrote :

The next step is to confirm if /28 is default for VBox and /24 for hardware.

Wei Zhou (wzhou007) wrote :

Please also verify whether /16 is a valid default for cluster_pod_subnet.

With the following config for cluster_pod_subnet:

cluster_pod_subnet: 172.16.0.0/16
cluster_pod_start_address: 172.16.0.1
cluster_pod_end_address: 172.16.255.254

Got this error:

TASK [bootstrap/validate-config : Validate cluster_pod start and end address format] ***********************************************************************************************
ok: [localhost] => {
    "msg": "cluster_pod: 172.16.0.1 172.16.255.254"
}

TASK [bootstrap/validate-config : Validate cluster_pod start and end range] ********************************************************************************************************
changed: [localhost]

TASK [bootstrap/validate-config : Fail if address range did not meet required criteria] ********************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Exception: Failed validation, cluster_pod address range must contain at least 65536 addresses."}
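The numbers behind this error can be checked with a short sketch using Python's standard ipaddress module (not the playbook's own code). It counts the addresses in the configured start/end range and compares them with the size of the /16 subnet.

```python
import ipaddress

subnet = ipaddress.ip_network("172.16.0.0/16")
start = ipaddress.ip_address("172.16.0.1")
end = ipaddress.ip_address("172.16.255.254")

# Inclusive count of addresses from start to end.
range_size = int(end) - int(start) + 1

print(subnet.num_addresses)               # 65536
print(range_size)                         # 65534
print(subnet.num_addresses - range_size)  # 2 (network and broadcast)
```

The range covers every host address in the subnet and is still two short of the 65536 named in the error message.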

Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 gating / found during B&R, but not related to the feature itself.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Tee Ngo (teewrs)
tags: added: stx.3.0 stx.config
Yang Liu (yliu12)
tags: added: stx.retestneeded
Wei Zhou (wzhou007) wrote :

Although this is a medium-priority LP, restore will fail without a fix that properly sets the default values for all the networks, because of the network validation that bootstrap enforces during platform restore.

Frank Miller (sensfan22)
Changed in starlingx:
assignee: Tee Ngo (teewrs) → David Sullivan (dsullivanwr)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/686506

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/686506
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=339070babbd4ad5c278e1b5bd5af965eed68379c
Submitter: Zuul
Branch: master

commit 339070babbd4ad5c278e1b5bd5af965eed68379c
Author: David Sullivan <email address hidden>
Date: Thu Oct 10 16:39:36 2019 -0400

    Accurately enforce minimum network ranges

    The minimum network range (when a start/end was specified) was
    incorrectly set to the size of the minimum network subnet. The
    minimum network ranges should be minimum subnet size - 2.

    Also move the subnet size checks into validate_address_range.yml for
    consistency.

    Change-Id: I1665a0dd67d5e23e43e658e8e6c9eae1a1068b26
    Signed-off-by: David Sullivan <email address hidden>
    Closes-Bug: 1845215
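
The rule stated in the commit message (minimum range = minimum subnet size - 2) can be sketched as follows. This is not the code from the commit: validate_range and its parameters are hypothetical names, the minimum subnet sizes passed in are assumptions based on the error messages in this report, and the standard ipaddress module stands in for whatever the playbook uses.

```python
import ipaddress

def validate_range(network, start, end, min_subnet_size):
    """Hypothetical helper: accept a start/end range only if it holds at
    least min_subnet_size - 2 addresses (subnet minus network/broadcast)."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    if not first < last:
        raise ValueError(
            "%s start address must be less than end address." % network)
    min_range = min_subnet_size - 2
    size = int(last) - int(first) + 1
    if size < min_range:
        raise ValueError(
            "%s address range must contain at least %d addresses."
            % (network, min_range))
    return size

# Ranges reported in this bug, checked against the relaxed minimum.
print(validate_range("cluster_pod", "172.16.0.1", "172.16.255.254", 65536))  # 65534
print(validate_range("multicast", "239.1.1.1", "239.1.1.14", 16))            # 14

# A range that is genuinely too small is still rejected.
try:
    validate_range("multicast", "239.1.1.1", "239.1.1.8", 16)
except ValueError as exc:
    print(exc)  # multicast address range must contain at least 14 addresses.
```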

Changed in starlingx:
status: In Progress → Fix Released
Senthil Mukundakumar (smukunda) wrote :

Verified using build 2019-10-20_20-00-00

tags: removed: stx.retestneeded