Controller-1 reboot loop due to mismatched volume group sizes

Bug #1797108 reported by Stefan Dinescu
This bug affects 1 person
Affects     Status         Importance   Assigned to      Milestone
StarlingX   Fix Released   Medium       Stefan Dinescu

Bug Description

Brief Description
-----------------
When installing a system with two controller nodes, the standby controller goes into a reboot loop the first time it is unlocked if the filesystems on the active controller were resized beyond the space provisioned on the standby controller.

Severity
--------
Major

Steps to Reproduce
------------------
- begin installing a system with two controllers (two-node system or multi-node system)
- controller-0 should have an additional disk to assign additional space to cgts-vg
- after unlocking controller-0, create a partition on the additional disk and assign it to cgts-vg (the partition should be large enough that the used space on controller-0 exceeds the default provisioned volume group size on controller-1)
- resize the filesystems so that they use almost all the available space in the volume group
- install controller-1 and leave the default partition/volume-group configuration as is
- unlock controller-1

Expected Behavior
------------------
- the unlock should be rejected as controller-1 doesn't have enough space to fit all the filesystems

Actual Behavior
----------------
- the unlock is accepted, the puppet manifests fail and controller-1 goes into a reboot loop

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two node system, Multi-node system, Dedicated storage

Branch/Pull Time/Commit
-----------------------
Issue seen on loads built Oct 1

Timestamp/Logs
--------------

Ghada Khalil (gkhalil) wrote :

Targeting stx.2019.03 as this is a configuration issue. There is no issue if the user enters a valid configuration. This bug will be used to add semantic checks to prevent this from occurring. Not required for stx.2018.10

tags: added: stx.2019.03
Changed in starlingx:
assignee: nobody → Stefan Dinescu (stefandinescu)
importance: Undecided → Medium
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/609654

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/609654
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=604b4a5ea0c2f8fa78f204032acfbf0a72f8a50c
Submitter: Zuul
Branch: master

commit 604b4a5ea0c2f8fa78f204032acfbf0a72f8a50c
Author: Stefan Dinescu <email address hidden>
Date: Thu Oct 11 12:43:55 2018 +0300

    Standby controller filesystem sizes check

    While installing a system with two controllers, you can assign new
    PVs to the cgts-vg volume group and resize the filesystem on
    controller-0, before provisioning controller-1.

    If these new sizes are above the default provisioned space for
    cgts-vg on controller-1, the unlock is allowed, but the node
    goes into a reboot loop due to not having enough space to
    assign to all the partitions.

    Now, before unlocking the standby controller for the first time
    we check if the provisioned space on the node is equal or larger
    than the used space on the active controller and if it is not
    reject the unlock.

    Change-Id: I3fce3430abbb81d08272f35915cc50c761754733
    Closes-bug: 1797108
    Signed-off-by: Stefan Dinescu <email address hidden>
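
The check described in the commit message can be illustrated with a minimal sketch. This is not the code merged in stx-config; the names VolumeGroup, UnlockRejected and check_standby_cgts_vg are hypothetical, and sizes are assumed to be in MiB. It only shows the comparison the commit describes: the space provisioned for cgts-vg on the standby controller must be equal to or larger than the space used in cgts-vg on the active controller, otherwise the first unlock is rejected.

```python
from dataclasses import dataclass


class UnlockRejected(Exception):
    """Hypothetical error raised when the semantic check refuses the unlock."""


@dataclass
class VolumeGroup:
    """Hypothetical view of a volume group on one controller (sizes in MiB)."""
    name: str
    size_mib: int  # total space provisioned for the volume group
    free_mib: int  # space not yet allocated to any filesystem

    @property
    def used_mib(self) -> int:
        # Space already allocated to filesystems in this volume group.
        return self.size_mib - self.free_mib


def check_standby_cgts_vg(active: VolumeGroup, standby: VolumeGroup) -> None:
    """Reject the unlock if the standby cannot hold the active's filesystems.

    The standby's provisioned size must be equal to or larger than the
    space used on the active controller.
    """
    if standby.size_mib < active.used_mib:
        raise UnlockRejected(
            f"{standby.name} on the standby controller has "
            f"{standby.size_mib} MiB provisioned, but the active controller "
            f"uses {active.used_mib} MiB"
        )


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the bug report.
    active = VolumeGroup("cgts-vg", size_mib=400_000, free_mib=10_000)
    standby = VolumeGroup("cgts-vg", size_mib=200_000, free_mib=200_000)
    try:
        check_standby_cgts_vg(active, standby)
    except UnlockRejected as err:
        print(f"Unlock rejected: {err}")
```

Raising an error before the unlock is accepted matches the expected behavior in the report: the operator gets a clear rejection instead of a node that fails its puppet manifests and reboots repeatedly.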

Changed in starlingx:
status: In Progress → Fix Released
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05