AIO-SX migration to AIO-DX failed on standalone system

Bug #1927224 reported by Pedro Henrique Linhares
Affects    Status        Importance  Assigned to              Milestone
StarlingX  Fix Released  High        Pedro Henrique Linhares

Bug Description

Brief Description
-----------------
AIO-SX migration to AIO-DX failed on standalone system

Severity
--------
Critical

Steps to Reproduce
------------------
1) After installing a simplex system, follow the AIO-SX to AIO-DX migration procedure, then unlock the host so that the migration takes effect during the unlock and the subsequent puppet run:
   [sysadmin@controller-0 ~(keystone_admin)]$ system host-unlock controller-0

2) Once the system is unlocked, access to system services is refused:

controller-0:~$ source /etc/platform/openrc

Openstack Admin credentials can only be loaded from the active controller.

controller-0:~$ export PS1='[\u@\h \W(keystone_$OS_USERNAME)]\$ '

[sysadmin@controller-0 ~(keystone_admin)]$ export OS_PASSWORD="Li69nux*"

[sysadmin@controller-0 ~(keystone_admin)]$ system application-list

Authorization failed: Unable to establish connection to http://192.168.204.1:5000/v3/auth/tokens

Expected Behavior
------------------
The system should be stable and usable, with all services running properly.

Actual Behavior
----------------
System services fail to start after the unlock.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
simplex

Branch/Pull Time/Commit
-----------------------
Branch master, stx-puppet. bca8197 - Merge "Fix zuul errors due to changes in dependencies"

Last Pass
---------
Did this test scenario pass previously? If so, please indicate the load/pull time info of the last pass.
Use this section to also indicate if this is a new test scenario.

Timestamp/Logs
--------------
/Stage[main]/Platform::Drbd::Cephmon/Platform::Drbd::Filesystem[drbd-cephmon]/Drbd::Resource[drbd-cephmon]/Drbd::Resource::Enable[drbd-cephmon]/Drbd::Resource::Up[drbd-cephmon]/Mount[/var/lib/ceph/mon]: Scheduling refresh of Mount[/var/lib/ceph/mon]
Mount[/var/lib/ceph/mon](provider=parsed): Remounting
Executing: '/usr/bin/mount -o remount /var/lib/ceph/mon'
/Stage[main]/Platform::Drbd::Cephmon/Platform::Drbd::Filesystem[drbd-cephmon]/Drbd::Resource[drbd-cephmon]/Drbd::Resource::Enable[drbd-cephmon]/Drbd::Resource::Up[drbd-cephmon]/Mount[/var/lib/ceph/mon]: Failed to call refresh: Execution of '/usr/bin/mount -o remount /var/lib/ceph/mon' returned 32: mount: /var/lib/ceph/mon not mounted or bad option
------------
2021-04-28 16:19:20 +0000 /Stage[post]/Platform::Ceph::Migration::Sx_to_dx::Update_pvcs/Exec[Update monitor IP in existing K8s PersistentVolumes]/returns: Sleeping for 10 seconds between tries
2021-04-28T16:19:30.447 Debug: 2021-04-28 16:19:30 +0000 /Stage[post]/Platform::Ceph::Migration::Sx_to_dx::Update_pvcs/Exec[Update monitor IP in existing K8s PersistentVolumes]/returns: Exec try 6/6
Notice: 2021-04-28 16:19:40 +0000 /Stage[post]/Platform::Ceph::Migration::Sx_to_dx::Update_pvcs/Exec[Update monitor IP in existing K8s PersistentVolumes]/returns: The connection to the server 192.168.206.1:6443 was refused - did you specify the right host or port?
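The second log excerpt shows the puppet Exec retrying the PV update six times ("Exec try 6/6"), sleeping 10 seconds between tries, with every attempt refused by the API server at 192.168.206.1:6443. The minimal Python sketch below models that fixed-interval retry to show why retrying cannot succeed when the API server is unavailable for the whole puppet run. `retry_with_fixed_delay` and the injectable `sleep` callback are hypothetical illustrations, not StarlingX or Puppet code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_fixed_delay(
    action: Callable[[], T],
    tries: int = 6,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` up to `tries` times, sleeping `delay_seconds`
    between failed attempts, and re-raise the last connection error."""
    last_error: Optional[ConnectionRefusedError] = None
    for attempt in range(1, tries + 1):
        try:
            return action()
        except ConnectionRefusedError as err:
            last_error = err
            print(f"Exec try {attempt}/{tries} failed: {err}")
            if attempt < tries:
                print(f"Sleeping for {delay_seconds:g} seconds between tries")
                sleep(delay_seconds)
    # Every attempt failed: the API server never became reachable.
    assert last_error is not None
    raise last_error
```

If `action` always raises `ConnectionRefusedError`, all six attempts fail regardless of the delay, which is consistent with the fix moving the PV update out of the puppet run rather than tuning its retries.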

Test Activity
-------------
Feature Testing

Workaround
----------
Not available.

Ghada Khalil (gkhalil)
tags: added: stx.6.0 stx.config
Ghada Khalil (gkhalil) wrote :

screening: stx.6.0 / this is a feature for the upcoming release. A fix is only required in the stx master branch.

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Pedro Henrique Linhares (linharesp)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/789844

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/789851

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fault (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/fault/+/790183

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/789844
Committed: https://opendev.org/starlingx/stx-puppet/commit/cb7858c65982c250f07a5022719d4f2b6d547d64
Submitter: "Zuul (22348)"
Branch: master

commit cb7858c65982c250f07a5022719d4f2b6d547d64
Author: Pedro Henrique Linhares <email address hidden>
Date: Wed May 5 11:11:27 2021 -0300

    Fix for failure during AIO-SX to AIO-DX migration on standalone system

    Fix drbd-cephmon mount error by manually remounting monitor DRBD after
    DRBD::Resource creation. Removed patching of Kubernetes Persistent
    Volumes from puppet manifest since Kubelet and kube-api are no longer
    available during puppet run.

    Partial-Bug: 1927224
    Signed-off-by: Pedro Henrique Linhares <email address hidden>
    Change-Id: Id5565ac734499b617b470499cfc2aa1ae2972da3
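The log above fails because `mount -o remount` only works on a filesystem that is already mounted; /var/lib/ceph/mon was not mounted at that point, so mount returned 32. As a hedged illustration of the underlying check (not the actual puppet change, which remounts the monitor DRBD explicitly after the DRBD::Resource is created), this Python sketch chooses between a fresh mount and a remount by looking the mount point up in /proc/mounts-style text. `is_mounted`, `mount_command` and the device name used in the usage note are hypothetical.

```python
from typing import List


def is_mounted(mount_point: str, proc_mounts: str) -> bool:
    """Return True if `mount_point` is the target (second field) of any
    line in /proc/mounts-formatted text."""
    for line in proc_mounts.splitlines():
        fields = line.split()
        # /proc/mounts escapes spaces in paths as \040.
        if len(fields) >= 2 and fields[1].replace("\\040", " ") == mount_point:
            return True
    return False


def mount_command(device: str, mount_point: str, proc_mounts: str) -> List[str]:
    """Remount only when the path is already mounted; otherwise mount
    `device` fresh, since remounting an unmounted path fails."""
    if is_mounted(mount_point, proc_mounts):
        return ["mount", "-o", "remount", mount_point]
    return ["mount", device, mount_point]
```

For example, passing the contents of /proc/mounts with a hypothetical device such as /dev/drbd0 yields `["mount", "/dev/drbd0", "/var/lib/ceph/mon"]` when the monitor filesystem is not yet mounted.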

OpenStack Infra (hudson-openstack) wrote : Fix merged to fault (master)

Reviewed: https://review.opendev.org/c/starlingx/fault/+/790183
Committed: https://opendev.org/starlingx/fault/commit/3280e6cd5b28809b51ea45e369c069f76f165c44
Submitter: "Zuul (22348)"
Branch: master

commit 3280e6cd5b28809b51ea45e369c069f76f165c44
Author: Pedro Henrique Linhares <email address hidden>
Date: Thu May 6 18:41:57 2021 -0300

    Adding Kubernetes alarm type for PV migration errors during AIO-SX to AIO-DX

    This commit adds a new alarm type for Kubernetes Persistent Volume
    patching errors during AIO-SX to AIO-DX migration.

    Partial-Bug: 1927224
    Signed-off-by: Pedro Henrique Linhares <email address hidden>
    Change-Id: I8f64280394999249c829372d1748a9c26fdb9ced

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/789851
Committed: https://opendev.org/starlingx/config/commit/6df2034a4e9e2a25cd0ba39af7074fec5a26466d
Submitter: "Zuul (22348)"
Branch: master

commit 6df2034a4e9e2a25cd0ba39af7074fec5a26466d
Author: Pedro Henrique Linhares <email address hidden>
Date: Wed May 5 11:34:47 2021 -0300

    Adding AIO-SX to AIO-DX migration steps patching existing PVs

    Kubelet and kube-api are no longer available during puppet
    manifest run during unlock. Therefore, we moved the patching
    of Persistent Volumes from puppet to sysinv-conductor
    as a post-migration step during its start-up.

    Closes-Bug: 1927224
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/789844
    Depends-On: https://review.opendev.org/c/starlingx/fault/+/790183
    Change-Id: I9745b7f8547c82485353130156011650f2655317
    Signed-off-by: Pedro Henrique Linhares <email address hidden>
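This change moves the PV update out of puppet and into sysinv-conductor, which runs it as a post-migration step at start-up. A hedged sketch of the data transformation such a step performs, assuming the PVs are in-tree RBD volumes whose `spec.rbd.monitors` holds `ip:port` strings (the Kubernetes RBDPersistentVolumeSource shape) and that only the old monitor address must change. The function name, the plain-dict representation (instead of a Kubernetes client) and the 192.0.2.x example addresses are illustrative assumptions; the actual sysinv code is not reproduced here.

```python
import copy
from typing import Any, Dict, List


def patch_pv_monitors(pv: Dict[str, Any], old_ip: str, new_ip: str) -> Dict[str, Any]:
    """Return a copy of a PersistentVolume dict whose spec.rbd.monitors
    entries ("ip:port") that point at old_ip are rewritten to new_ip.
    PVs without an rbd section are returned unchanged."""
    patched = copy.deepcopy(pv)
    rbd = patched.get("spec", {}).get("rbd")
    if not rbd:
        return patched
    new_monitors: List[str] = []
    for monitor in rbd.get("monitors", []):
        host, sep, port = monitor.rpartition(":")
        if sep and host == old_ip:
            new_monitors.append(f"{new_ip}:{port}")
        else:
            new_monitors.append(monitor)
    rbd["monitors"] = new_monitors
    return patched
```

Working on a deep copy keeps the original object intact, so a caller can compare both versions and only send a patch when something actually changed.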

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792009

OpenStack Infra (hudson-openstack) wrote : Change abandoned on stx-puppet (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792009

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792013

OpenStack Infra (hudson-openstack) wrote : Change abandoned on stx-puppet (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792013

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792018

OpenStack Infra (hudson-openstack) wrote : Change abandoned on stx-puppet (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792018

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792029

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fault (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/fault/+/792254

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/fault/+/793428

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fault (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/fault/+/792254

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793460

OpenStack Infra (hudson-openstack) wrote : Fix merged to fault (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/fault/+/793428
Committed: https://opendev.org/starlingx/fault/commit/d17dd2a196d07500797895ebba4adb020b8a3498
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 3280e6cd5b28809b51ea45e369c069f76f165c44
Author: Pedro Henrique Linhares <email address hidden>
Date: Thu May 6 18:41:57 2021 -0300

    Adding Kubernetes alarm type for PV migration errors during AIO-SX to AIO-DX

    This commit adds a new alarm type for Kubernetes Persistent Volume
    patching errors during AIO-SX to AIO-DX migration.

    Partial-Bug: 1927224
    Signed-off-by: Pedro Henrique Linhares <email address hidden>
    Change-Id: I8f64280394999249c829372d1748a9c26fdb9ced

commit a64e88bf43012d5558826442b98b26847370eeb3
Author: Jerry Sun <email address hidden>
Date: Tue May 4 15:46:52 2021 -0400

    Better repair action for alarm 100.104

    This commit adds a better proposed repair action for filesystem
    threshold alarm 100.104.

    Closes-Bug: 1927155
    Signed-off-by: Jerry Sun <email address hidden>
    Change-Id: Id2d1d4c23d343455d1f0c2e359cf380cc23229cd

commit 03090ca2bb77edb8a01c9a08a716aa3d1a5f4595
Author: Charles Short <email address hidden>
Date: Mon Apr 26 10:50:20 2021 -0400

    Fix pep8 gate failures

    Set hacking to < 4.0.1 in test-requirements.txt so that
    the pep8 gate passes again.

    Test:
    Ran tox -e pep8 command to validate the flake8 job and result.

    Related-Bug: 1926172

    Signed-off-by: Charles Short <email address hidden>
    Change-Id: I5b27a89d0e078912814ca2999bf28e6602980fd0

commit 581495082a5a0a9456065b3d3bb8b5f015747fd8
Author: Eric MacDonald <email address hidden>
Date: Tue Apr 6 09:02:39 2021 -0400

    Make small modification to fm's logrotation configuration file

    This update makes the following changes to the fm logrotation config file

     - add 'create' with permissions to each tuple
     - add 'delaycompress' as a local setting to each log entry
     - remove 'nodateext' global and local setting

    Test Plan:

    PASS: Verify fm logs rotation behavior
    PASS: Verify fm logs delaycompress setting behavior
    PASS: Verify log permissions after rotate

    Change-Id: Ibe8bd8107501df947b5091e928de202378ef4ea8
    Partial-Bug: 1918979
    Depends-On: https://review.opendev.org/c/starlingx/config-files/+/784943
    Signed-off-by: Eric MacDonald <email address hidden>

commit 63fcc33bbca0bc07719c070a8fa7c2a3d3f084b9
Author: Enzo Candotti <email address hidden>
Date: Thu Apr 1 11:37:45 2021 -0300

    Update events.yaml with DM-Monitor alarms

    Add a new alarm definition under the 260.001 id,
    raised when the reconciled status of resources is false.

    Closes-Bug: 1922238

    Signed-off-by: Enzo Candotti <email address hidden>
    Change-Id: I96c05aaaf914bb253f7a71a7bfc79924c8da7857

commit 4639f7dfff972f2b3e2cd61df11ebaf31afc89ee
Author: albailey <email address hidden>
Date: Wed Nov 18 13:36:04 2020 -0600

    Add log and alarm support for vim orchestrated kube-upgrade

    A...

tags: added: in-f-centos8
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793696

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794611

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/792029
Committed: https://opendev.org/starlingx/stx-puppet/commit/2b026190a3cb6d561b6ec4a46dfb3add67f1fa69
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 3e3940824dfb830ebd39fd93265b983c6a22fc51
Author: Dan Voiculeasa <email address hidden>
Date: Thu May 13 18:03:45 2021 +0300

    Enable kubelet support for pod pid limit

    Enable limiting the number of pids inside of pods.

    Add a default value to protect against a missing value.
    Default to 750 pids limit to align with service parameter default
    value for most resource consuming StarlingX optional app (openstack).
    In fact any value above service parameter minimum value is good for the
    default.

    Closes-Bug: 1928353
    Signed-off-by: Dan Voiculeasa <email address hidden>
    Change-Id: I10c1684fe3145e0a46b011f8e87f7a23557ddd4a
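The kubelet change adds a default so that a missing service-parameter value does not leave the pod pid limit unset. A minimal sketch of that fallback, assuming the value is read from a mapping of service parameters under a hypothetical key `pod_max_pids` and rendered into the kubelet configuration field `podPidsLimit`; only the 750 default comes from the commit message.

```python
from typing import Any, Dict, Mapping

# Default taken from the commit message above.
DEFAULT_POD_PIDS_LIMIT = 750


def kubelet_pid_settings(service_params: Mapping[str, Any]) -> Dict[str, int]:
    """Build the kubelet podPidsLimit setting, falling back to the
    default when the (hypothetically named) pod_max_pids parameter is
    missing or empty."""
    raw = service_params.get("pod_max_pids")
    limit = int(raw) if raw not in (None, "") else DEFAULT_POD_PIDS_LIMIT
    return {"podPidsLimit": limit}
```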

commit 0c16d288fbc483103b7ba5dad7782e97f59f4e17
Author: Jessica Castelino <email address hidden>
Date: Tue May 11 10:21:57 2021 -0400

    Safe restart of the etcd SM service in etcd upgrade runtime class

    While upgrading the central cloud of a DC system, activation failed
    because there was an unexpected SWACT to controller-1. This was due
    to the etcd upgrade script. Part of this script runs the etcd
    manifest. This triggers a reload/restart of the etcd service. As this
    is done outside of the sm, sm saw the process failure and triggered
    the SWACT.

    This commit modifies platform::etcd::upgrade::runtime puppet class
    to do a safe restart of the etcd SM service and thus, solve the
    issue.

    Change-Id: I3381b6976114c77ee96028d7d96a00302ad865ec
    Signed-off-by: Jessica Castelino <email address hidden>
    Closes-Bug: 1928135

commit eec3008f600aeeb69a42338ed44332228a862d11
Author: Mihnea Saracin <email address hidden>
Date: Mon May 10 13:09:52 2021 +0300

    Serialize updates to global_filter in the AIO manifest

    Right now, looking at the aio manifest:
    https://review.opendev.org/c/starlingx/stx-puppet/+/780600/15/puppet-manifests/src/manifests/aio.pp
    there are 3 classes that update the lvm global_filter
    in parallel:
    - include ::platform::lvm::controller
    - include ::platform::worker::storage
    - include ::platform::lvm::compute
    and this generates errors.

    We fix this by adding dependencies between the above classes
    in order to update the global_filter in a serial mode.

    Closes-Bug: 1927762
    Signed-off-by: Mihnea Saracin <email address hidden>
    Change-Id: If6971e520454cdef41138b2f29998c036d8307ff
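Assuming each of the three classes performs a read-modify-write of global_filter, running them in parallel can lose updates: each writer starts from the same original value and the last write wins. The Python sketch below is a hypothetical model of that effect, not the puppet code; the filter entries and device names are made up. Applying the updaters in a fixed chain, which is what adding dependencies between the classes achieves, lets every addition survive.

```python
from typing import Callable, Dict, List

Updater = Callable[[List[str]], List[str]]
Config = Dict[str, List[str]]


def make_updater(device: str) -> Updater:
    """Hypothetical updater: returns a new filter that also accepts `device`."""
    def update(current: List[str]) -> List[str]:
        entry = f"a|{device}|"
        return current if entry in current else current + [entry]
    return update


def apply_serially(config: Config, updaters: List[Updater]) -> Config:
    """Each updater reads the result of the previous one (serialized)."""
    for update in updaters:
        config["global_filter"] = update(config["global_filter"])
    return config


def apply_interleaved(config: Config, updaters: List[Updater]) -> Config:
    """Model parallel updaters: all read the same snapshot, then write
    back one after another, so only the last write survives."""
    snapshot = list(config["global_filter"])
    results = [update(snapshot) for update in updaters]
    for result in results:
        config["global_filter"] = result
    return config
```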

commit 97371409b9b2ae3f0db6a6a0acaeabd74927160e
Author: Steven Webster <email address hidden>
Date: Fri May 7 15:33:43 2021 -0400

    Add SR-IOV rate-limit dependency

    Currently, the binding of an SR-IOV virtual function (VF) to a
    driver has a dependency on platform::networking. This is needed
    to ensure that SR-IOV is enabled (VFs created) before actually
    doing the bind.

    This dependency does not exist for configuring the VF rate-limits
    however. There is a cha...

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794906

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794611

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/config/+/794906
Committed: https://opendev.org/starlingx/config/commit/75758b37a5a23c8811355b67e2a430a1713cd85b
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 9e420d9513e5fafb1df4d29567bc299a9e04d58d
Author: Bin Qian <email address hidden>
Date: Mon May 31 14:45:52 2021 -0400

    Add more logging to run docker login

    Add error log for running docker login. The new log could
    help identify docker login failure.

    Closes-Bug: 1930310
    Change-Id: I8a709fb6665de8301fbe3022563499a92b2a0211
    Signed-off-by: Bin Qian <email address hidden>

commit 31c77439d2cea590dfcca13cfa646522665f8686
Author: albailey <email address hidden>
Date: Fri May 28 13:42:42 2021 -0500

    Fix controller-0 downgrade failing to kill ceph

    kill_ceph_storage_monitor tried to manipulate a pmon
    file that does not exist in an AIO-DX environment.

    We no longer invoke kill_ceph_storage_monitor in an
    AIO SX or DX env.

    This allows: "system host-downgrade controller-0"
    to proceed in an AIO-DX environment where that second
    controller (controller-0) was upgraded.

    Partial-Bug: 1929884
    Signed-off-by: albailey <email address hidden>
    Change-Id: I633853f75317736084feae96b5b849c601204c13

commit 0dc99eee608336fe01b58821ea404286371f1408
Author: albailey <email address hidden>
Date: Fri May 28 11:05:43 2021 -0500

    Fix file permissions failure during duplex upgrade abort

    When issuing a downgrade for controller-0 in a duplex upgrade
    abort and rollback scenario, the downgrade command was failing
    because the sysinv API does not have root permissions to set
    a file flag.
    The fix is to use RPC so the conductor can create the flag
    and allow the downgrade for controller-0 to get further.

    Partial-Bug: 1929884
    Signed-off-by: albailey <email address hidden>
    Change-Id: I913bcad73309fe887a12cbb016a518da93327947

commit 7ef3724dad173754e40b45538b1cc726a458cc1c
Author: Chen, Haochuan Z <email address hidden>
Date: Tue May 25 16:16:29 2021 +0800

    Fix bug rook-ceph provision with multi osd on one host

    Test case:
    1, deploy simplex system
    2, apply rook-ceph with below override value
    value.yaml
    cluster:
      storage:
        nodes:
        - name: controller-0
          devices:
          - name: sdb
          - name: sdc
    3, reboot

    Without this fix, only one OSD pod could launch successfully after reboot,
    because volume groups whose names start with "ceph" were not correctly
    added to the sysinv database.

    Closes-bug: 1929511

    Change-Id: Ia5be599cd168d13d2aab7b5e5890376c3c8a0019
    Signed-off-by: Chen, Haochuan Z <email address hidden>

commit 23505ba77d76114cf8a0bf833f9a5bcd05bc1dd1
Author: Angie Wang <email address hidden>
Date: Tue May 25 18:49:21 2021 -0400

    Fix issue in partition data migration script

    The created partition dictionary partition_map is not
    an ordered dict so we need to sort it by its key -
    device node when iterating it to adjust the device
    nodes/paths for user created extra partitions to ensure
    the number of device node...
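The partition migration fix relies on iterating partition_map in a deterministic order by its key, the device node, rather than in whatever order the dictionary happens to yield. Even where dictionary iteration follows insertion order, that order need not match device-node order. A minimal sketch of the idea, with a hypothetical `iter_partitions_by_device_node` helper and made-up partition data; note that plain `sorted()` compares device-node strings lexically, so /dev/sda10 sorts before /dev/sda2.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_partitions_by_device_node(
    partition_map: Dict[str, Dict[str, Any]],
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (device_node, partition) pairs sorted by device node,
    independent of the order in which the map was built."""
    for device_node in sorted(partition_map):
        yield device_node, partition_map[device_node]
```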

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793696

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793460
