Subcloud installation fails for an inactive load that previously worked

Bug #2066411 reported by Kyle MacLeod
This bug affects 1 person
Affects    Status        Importance  Assigned to   Milestone
StarlingX  Fix Released  Medium      Kyle MacLeod

Bug Description

Brief Description

Subcloud installation fails for an inactive load that previously installed successfully.

Severity

Major

Steps to Reproduce

1) Import the inactive load

[sysadmin@controller-0 ~(keystone_admin)]$ system load-list
+----+----------+------------------+
| id | state    | software_version |
+----+----------+------------------+
| 1  | active   | 24.03            |
| 8  | inactive | 22.12            |
+----+----------+------------------+

2) Install the subcloud with the inactive load

dcmanager --os-endpoint-type internalURL subcloud add --bootstrap-address 2620:10a:a001:d41::211 --bootstrap-values /home/sysadmin/subcloud-3/subcloud3_ipv6-bootstrap-values.yaml --deploy-config /home/sysadmin/subcloud-3/subcloud3-deploy-standard.yaml --sysadmin-password Li69nux* --bmc-password administrator --install-values /home/sysadmin/subcloud-3/subcloud3-install-values.yaml --release 22.12

3) Validate the subcloud status

[sysadmin@controller-0 ~(keystone_admin)]$ dcmanager subcloud list
+----+-----------+------------+--------------+---------------+-------------+---------------+-----------------+
| id | name      | management | availability | deploy status | sync        | backup status | prestage status |
+----+-----------+------------+--------------+---------------+-------------+---------------+-----------------+
| 1  | subcloud1 | managed    | online       | complete      | in-sync     | None          | None            |
| 3  | subcloud3 | managed    | online       | complete      | out-of-sync | None          | None            |
+----+-----------+------------+--------------+---------------+-------------+---------------+-----------------+

4) Delete the load and import the un-upgradable loads

5) Delete the subcloud

7) Import the upgradable load that was used in step 1

8) Install the subcloud

dcmanager --os-endpoint-type internalURL subcloud add --bootstrap-address 2620:10a:a001:d41::211 --bootstrap-values /home/sysadmin/subcloud-3/subcloud3_ipv6-bootstrap-values.yaml --deploy-config /home/sysadmin/subcloud-3/subcloud3-deploy-standard.yaml --sysadmin-password Li69nux* --bmc-password administrator --install-values /home/sysadmin/subcloud-3/subcloud3-install-values.yaml --release 22.12

9) Subcloud fails at pre-install
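
+----+-----------+------------+--------------+--------------------+---------+---------------+-----------------+
| id | name      | management | availability | deploy status      | sync    | backup status | prestage status |
+----+-----------+------------+--------------+--------------------+---------+---------------+-----------------+
| 7  | subcloud3 | unmanaged  | offline      | pre-install-failed | unknown | None          | None            |
+----+-----------+------------+--------------+--------------------+---------+---------------+-----------------+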

Expected Behavior

The subcloud should install successfully with the inactive load.

Actual Behavior

The subcloud fails to install and ends in the pre-install-failed deploy status.

Reproducibility

Yes

Load info (eg: 2022-03-10_20-00-07)

SW_VERSION="24.03"
BUILD_ID="2024-05-06_19-00-07"
BUILD_DATE="2024-05-06 23:00:07 +0000"

Last Pass

Yes

Timestamp/Logs

2024-05-09 17:33:47.861 44587 INFO dccommon.subcloud_install [-] Prepare for subcloud3 remote install
2024-05-09 17:33:47.861 44587 INFO dccommon.subcloud_install [-] Mounting ostree_repo at /opt/platform/iso/22.12/ostree_repo
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager [-] Command '['mount', '--bind', '/var/www/pages/feed/rel-22.12/ostree_repo', '/opt/platform/iso/22.12/ostree_repo']' returned non-zero exit status 32.: subprocess.CalledProcessError: Command '['mount', '--bind', '/var/www/pages/feed/rel-22.12/ostree_repo', '/opt/platform/iso/22.12/ostree_repo']' returned non-zero exit status 32.
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager Traceback (most recent call last):
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager File "/usr/lib/python3/dist-packages/dcmanager/manager/subcloud_manager.py", line 2289, in _run_subcloud_install
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager install.prep(dccommon_consts.ANSIBLE_OVERRIDES_PATH, payload)
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager File "/usr/lib/python3/dist-packages/dccommon/subcloud_install.py", line 544, in prep
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager self.check_ostree_mount(feed_path_rel_version)
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager File "/usr/lib/python3/dist-packages/dccommon/subcloud_install.py", line 488, in check_ostree_mount
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager self._do_ostree_mount(ostree_mount_dir, check_path, source_path)
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager File "/usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py", line 360, in inner
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager return f(*args, **kwargs)
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager File "/usr/lib/python3/dist-packages/dccommon/subcloud_install.py", line 499, in _do_ostree_mount
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager subprocess.check_call( # pylint: disable=not-callable
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager raise CalledProcessError(retcode, cmd)
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager subprocess.CalledProcessError: Command '['mount', '--bind', '/var/www/pages/feed/rel-22.12/ostree_repo', '/opt/platform/iso/22.12/ostree_repo']' returned non-zero exit status 32.
2024-05-09 17:33:47.876 44587 ERROR dcmanager.manager.subcloud_manager
2024-05-09 17:33:47.887 44587 INFO dccommon.subcloud_install [-] Running install cleanup: subcloud3
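
The traceback above shows subprocess.check_call raising CalledProcessError because the mount --bind command exited with status 32, which mount(8) documents as a mount failure. check_call reports only the exit status, so the error text printed by mount does not appear in the log. The following is a minimal Python sketch of capturing both the status and stderr. run_capture is a hypothetical helper, not part of dccommon, and the child process only simulates a failing command.

```python
import subprocess
import sys


def run_capture(cmd):
    """Run cmd and return (exit_status, stderr_text) instead of raising."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    # Stand-in for a failing command: writes to stderr and exits with 32.
    # It is NOT a real mount invocation.
    fake_failure = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('simulated failure\\n'); sys.exit(32)",
    ]
    status, err = run_capture(fake_failure)
    print(status, err)
```

Logging the captured stderr next to the exit status would make failures like this one easier to diagnose from the log alone.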

Alarms

Test Activity

Feature Testing

Workaround

Manually clean up and remove the original bind mount at /var/www/pages/iso/24.03/ostree_repo.
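
Before unmounting by hand, it helps to confirm that the path is actually a mount point. Python's os.path.ismount cannot reliably detect a bind mount whose source is on the same filesystem, so this sketch reads the mount point field (the fifth whitespace-separated field) of /proc/self/mountinfo instead. The helper names are hypothetical, and the umount-then-rmdir sequence is an assumption based on the fix description in this report, not a documented procedure. The sketch only prints the commands; it does not run them.

```python
def mount_points(mountinfo_text):
    """Return the set of mount points listed in mountinfo-formatted text."""
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            # Field 5 is the mount point; spaces in paths are escaped as \040.
            points.add(fields[4].replace("\\040", " "))
    return points


def cleanup_commands(target, mountinfo_text):
    """Build the commands to remove a bind mount (assumed sequence)."""
    commands = []
    if target in mount_points(mountinfo_text):
        commands.append(["umount", target])
    commands.append(["rmdir", target])
    return commands


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as handle:
        text = handle.read()
    for cmd in cleanup_commands("/var/www/pages/iso/24.03/ostree_repo", text):
        print(" ".join(cmd))
```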

Kyle MacLeod (kmacleod)
Changed in starlingx:
assignee: nobody → Kyle MacLeod (kmacleod)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/920194

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/920194
Committed: https://opendev.org/starlingx/distcloud/commit/a60ce81f2600dcefe988a92a1c92e450a7e0ba7f
Submitter: "Zuul (22348)"
Branch: master

commit a60ce81f2600dcefe988a92a1c92e450a7e0ba7f
Author: Kyle MacLeod <email address hidden>
Date: Wed May 22 12:13:14 2024 -0400

    Fix stale or missing ostree_repo bind mount in iso dir

    This commit addresses the odd cases where the
    /var/www/pages/iso/<rel>/ostree_repo bind mount
    becomes missing or stale.

    We add detection of missing content, and also detect a stale bind mount.

    A stale bind mount is detected by comparing the inode numbers
    of the bind-mounted /var/www/pages/iso/<rel>/ostree_repo and original
    /var/www/pages/feed/rel-<rel>/ostree_repo directory.

    NOTES:
    - The self.www_root variable is changed to self.www_iso_root to make
      it more obvious that this is the /var/www/pages/iso path, not the feed
      path.
    - Now using the 'sh' python library for the mount commands, which is
      much more convenient and straight-forward than the subprocess library

    Test Plan:
    PASS:
    - Unmount (but do not delete) the /var/www/pages/iso/<rel>/ostree_repo
      directory. When a subcloud add or deploy operation is done, the bind
      mount is recreated.
    - Stale mount:
        # Replace the original
        sudo cp -a /var/www/pages/feed/rel-24.09/ostree_repo \
          /var/www/pages/feed/rel-24.09/ostree_repo.orig
        sudo rm -rf /var/www/pages/feed/rel-24.09/ostree_repo
        sudo cp -a /var/www/pages/feed/rel-24.09/ostree_repo.orig \
          /var/www/pages/feed/rel-24.09/ostree_repo
      When a subcloud add or deploy operation is done, the stale bind
      mount is detected. The /var/www/pages/iso/24.09/ostree_repo is
      unmounted, and the directory is removed.
      When a subcloud add or deploy operation is done, the bind
      mount is recreated.

    Closes-Bug: 2066411

    Change-Id: I25911722b1e333cd352f142664526d7dfa73e9e8
    Signed-off-by: Kyle MacLeod <email address hidden>
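
The stale-mount check described in this commit can be illustrated with a short Python sketch. A bind mount target reports the same inode as its source directory. If the source is deleted and recreated, the target keeps pointing at the old inode and the two no longer match. The function name is hypothetical, the device-number comparison is an addition beyond the inode comparison the commit mentions, and this is not the code merged in the review.

```python
import os
import tempfile


def is_stale_bind_mount(mount_target, source):
    """Return True when mount_target and source are different directories."""
    target_stat = os.stat(mount_target)
    source_stat = os.stat(source)
    # Compare (device, inode) pairs; a live bind mount shares both with its source.
    return (target_stat.st_dev, target_stat.st_ino) != (
        source_stat.st_dev,
        source_stat.st_ino,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        source = os.path.join(root, "feed")
        other = os.path.join(root, "iso")
        os.mkdir(source)
        os.mkdir(other)
        # Same directory on both sides: not stale.
        print(is_stale_bind_mount(source, source))
        # Two unrelated directories stand in for a stale target and its source.
        print(is_stale_bind_mount(other, source))
```

The demo uses plain directories because creating a real bind mount requires root privileges.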

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.10.0 stx.distcloud
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/921316

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/921316
Committed: https://opendev.org/starlingx/distcloud/commit/42b72fd08a03f5bd03def0e2d8c3107b3a82ef65
Submitter: "Zuul (22348)"
Branch: master

commit 42b72fd08a03f5bd03def0e2d8c3107b3a82ef65
Author: Kyle MacLeod <email address hidden>
Date: Tue Jun 4 13:49:02 2024 -0400

    Fix incorrect stale ostree_repo mount check

    This commit fixes a regression introduced in
    https://review.opendev.org/c/starlingx/distcloud/+/920194

    The incorrect path was being used for comparing the inodes
    of the target and source mount paths, which results in incorrectly
    detected stale mounts, followed by a failure to properly clean up the
    mount.

    Test Plan:
    PASS:
    - Unmount (but do not delete) the /var/www/pages/iso/<rel>/ostree_repo
      directory. When a subcloud add or deploy operation is done, the bind
      mount is recreated.
    - Stale mount:
        # Replace the original
        sudo cp -a /var/www/pages/feed/rel-24.09/ostree_repo \
          /var/www/pages/feed/rel-24.09/ostree_repo.orig
        sudo rm -rf /var/www/pages/feed/rel-24.09/ostree_repo
        sudo cp -a /var/www/pages/feed/rel-24.09/ostree_repo.orig \
          /var/www/pages/feed/rel-24.09/ostree_repo
      When a subcloud add or deploy operation is done, the stale bind
      mount is detected. The /var/www/pages/iso/24.09/ostree_repo is
      unmounted, and the directory is removed.
      When a subcloud add or deploy operation is done, the bind
      mount is recreated.

    Closes-Bug: 2066411

    Change-Id: I14332ef008f781f742c69fcb3987037e871ba19d
    Signed-off-by: Kyle MacLeod <email address hidden>
