Possibly incorrect waiting-for-partition loop in AIO duplex installation guide

Bug #1844656 reported by Pratik M.
This bug affects 2 people
Affects      Status         Importance   Assigned to
StarlingX    Fix Released   Medium       zhao.shuai

Bug Description

The guide at:
https://docs.starlingx.io/deploy_install_guides/current/bare_metal_aio_duplex.html

contains this command for controller-1:
while true; do system host-disk-partition-list $COMPUTE --nowrap | grep $NOVA_PARTITION_UUID | grep Ready; if [ $? -eq 0 ]; then break; fi; sleep 1; done

This loop never exits, because the partition status does not change to Ready until controller-1 is unlocked. Until then it stays at "Creating (on unlock)":
system host-disk-partition-list controller-1

+--------------------------------------+--------------------------------------------------+-------------+--------------------------------------+-----------+----------+----------------------+
| uuid                                 | device_path                                      | device_node | type_guid                            | type_name | size_gib | status               |
+--------------------------------------+--------------------------------------------------+-------------+--------------------------------------+-----------+----------+----------------------+
| 291e0cc1-4636-4249-963d-75a90e3d8d95 | /dev/disk/by-path/pci-0000:00:1f.2-ata-1.0-part5 | /dev/sda5   | ba5eba11-0000-1111-2222-000000000001 | None      | 34.0     | Creating (on unlock) |
+--------------------------------------+--------------------------------------------------+-------------+--------------------------------------+-----------+----------+----------------------+
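
For illustration only, and not taken from the guide or from any proposed fix: if a wait step were kept at all, it could be bounded and could stop as soon as the partition reports "Creating (on unlock)". The sketch below assumes only the `system host-disk-partition-list <host> --nowrap` command and the two status strings quoted in this report. The helper names `partition_status` and `wait_for_partition`, the 300-second default timeout and the return codes are hypothetical.

```shell
# Hypothetical helper: read the partition table on stdin and print the
# status column of the row whose uuid column contains the given UUID.
partition_status() {
    awk -F'|' -v uuid="$1" '
        index($2, uuid) { gsub(/^ +| +$/, "", $8); print $8; exit }
    '
}

# Hypothetical bounded wait (return codes are an assumption of this sketch):
#   0 = partition is Ready
#   1 = timed out
#   2 = partition will only be created when the host is unlocked
wait_for_partition() {
    host=$1
    uuid=$2
    timeout=${3:-300}   # seconds; arbitrary default
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        status=$(system host-disk-partition-list "$host" --nowrap | partition_status "$uuid")
        case "$status" in
            Ready*)
                return 0 ;;
            *"on unlock"*)
                echo "partition $uuid is created when $host is unlocked" >&2
                return 2 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

With the variables used in the guide, this might be called as `wait_for_partition "$COMPUTE" "$NOVA_PARTITION_UUID"`. Unlike the original `grep Ready` loop, it returns immediately for the row shown above instead of spinning until the host is unlocked.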

Thanks

Ghada Khalil (gkhalil) wrote :

Assigning to the storage team to review first and make recommendations for the doc update as needed.

tags: added: stx.storage
tags: added: stx.2.0
tags: removed: stx.2.0
Changed in starlingx:
assignee: nobody → Cindy Xie (xxie1)
importance: Undecided → Medium
Cindy Xie (xxie1)
Changed in starlingx:
assignee: Cindy Xie (xxie1) → zhao.shuai (zhao.shuai)
Long.Li (long.li) wrote :

We investigated this issue by deploying bare metal AIO-Simplex according to the guide,
and we were able to reproduce it.
Command for controller-0:
while true; do system host-disk-partition-list $COMPUTE --nowrap | grep $NOVA_PARTITION_UUID | grep Ready; if [ $? -eq 0 ]; then break; fi; sleep 1; done

The status never changes to Ready, so the loop is stuck and never breaks.

We then checked the install guide, https://docs.starlingx.io/deploy_install_guides/current/bare_metal_aio_simplex.html

and found the key issue in it:

Install Software on Controller-0
Installer Menu Selections:
    First Menu
      Select "Standard Controller Configuration"

The selection should be "All-in-one Controller Configuration", not "Standard Controller Configuration".

With that selection the issue is fixed; otherwise it reproduces.

We tested this, and the deployment passed after selecting "All-in-one Controller Configuration".

We advise the documentation team to fix the simplex and duplex bare metal guides.

Pratik M. (pvmpublic) wrote : Re: [Bug 1844656] Re: Possibly incorrect waiting-for-partition loop in AIO duplex installation guide

Thank you. Just so that we are on the same page: I had chosen AIO on
controller-0, and then I installed controller-1. I see this issue on
controller-1. My uneducated guess is that it happens because the partitioning
is only performed on controller-1 after it is unlocked?

Thanks


Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → Triaged
Kristal Dale (kdale) wrote :

Addressing the comment from Long.Li:

Both the bare metal AIO-Simplex and bare metal AIO-Duplex installation guides have been updated with the correction to the first menu selection in the section "Install Software on Controller-0."

https://docs.starlingx.io/deploy_install_guides/current/bare_metal/aio_simplex_install_kubernetes.html#install-software-on-controller-0

https://docs.starlingx.io/deploy_install_guides/current/bare_metal/aio_duplex_install_kubernetes.html#install-software-on-controller-0

Changed in starlingx:
status: Triaged → Fix Released
Kristal Dale (kdale) wrote :

Docs review that resolved this issue: https://review.opendev.org/#/c/682967/

Pratik M. (pvmpublic) wrote :

Unfortunately, my issue was NOT about what the documentation says when the Standard Controller option is chosen (as interpreted in comment #2). That is a different problem. I had already chosen AIO in the menu, so a documentation update for the correct menu choice will not fix this.

My issue is that either the documentation or the code is incorrect regarding the while loop for controller-1 in AIO-Duplex, here:
https://docs.starlingx.io/deploy_install_guides/current/bare_metal/aio_duplex_install_kubernetes.html#install-software-on-controller-1-node

Let me know whether I need to re-open this, or whether I can just submit a patch to remove that line from the document.

Thanks

Long Li (lilong-neu) wrote :

We should change two points in the bare metal AIO-Duplex deployment doc:

https://docs.starlingx.io/deploy_install_guides/r2_release/bare_metal/aio_duplex_install_kubernetes.html#configure-controller-1

1. Where we configure the OAM and MGMT interfaces of controller-1,
we should change the command from
"system interface-network-assign controller-1 $MGMT_IF cluster-host"
to
"system interface-network-assign controller-1 mgmt0 cluster-host"

2. We should delete the while loop for controller-1 in AIO-Duplex, because it can never break:
"while true; do system host-disk-partition-list $COMPUTE --nowrap | grep $NOVA_PARTITION_UUID | grep Ready; if [ $? -eq 0 ]; then break; fi; sleep 1; done"

With these changes, the bare metal AIO-Duplex deployment succeeds;
I have tested the deployment successfully with both changes applied.

We need to confirm whether these changes to the guide are correct.

Cindy Xie (xxie1) wrote :

Re-opening the bug, as the reporter doesn't agree with the fix.

Changed in starlingx:
status: Fix Released → Triaged
Long.Li (long.li) wrote :

We have proposed a solution in comment #7,
and we suggest changing the owner to move the schedule forward.

Ghada Khalil (gkhalil)
tags: added: stx.3.0
Ghada Khalil (gkhalil)
tags: removed: stx.docs
tags: added: stx.docs
Changed in starlingx:
status: Triaged → Fix Released
Michael Tullis (mltullis) wrote :

Sorry, I should have read the comments that were added above after Kris set the status to "Fix Released." This bug should remain as Triaged, but I don't have rights to reset the status.

Michael Tullis (mltullis) wrote :

We'll be opening another review shortly to address feedback in comment #7.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to docs (master)

Fix proposed to branch: master
Review: https://review.opendev.org/686465

OpenStack Infra (hudson-openstack) wrote : Fix merged to docs (master)

Reviewed: https://review.opendev.org/686465
Committed: https://git.openstack.org/cgit/starlingx/docs/commit/?id=9ccd43c789804e3082657a6c4b335688d32b8958
Submitter: Zuul
Branch: master

commit 9ccd43c789804e3082657a6c4b335688d32b8958
Author: Kristal Dale <email address hidden>
Date: Thu Oct 3 12:10:11 2019 -0700

    Fix commands in section Configure controller-1

    - Correct command at line 309 per fix #1 described in comment
      #7 of issue #1844656
    - Remove while loop at lines 416-417 per fix #2 described in
      comment #7 of issue #1844656

    Closes-Bug: #1844656

    Change-Id: I447332d1134ac19164e7a95c5c7ac50562b00951
    Signed-off-by: Kristal Dale <email address hidden>
