[ironic] pod ironic-manage-cleaning-network failing after helm override

Bug #1855319 reported by Jose Perez Carranza
This bug affects 1 person
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   High         Mingyuan Qi

Bug Description

Brief Description
-----------------
After applying the Helm overrides for the ironic pods as described in the documentation [1], the application apply fails because the ironic-manage-cleaning-network pod repeatedly goes into CrashLoopBackOff.

[1] https://docs.starlingx.io/deploy_install_guides/r3_release/bare_metal/ironic_install.html#generate-user-helm-overrides

Note: According to the logs, the pod is trying to act on a subnet that does not exist (see the failing openstack subnet show '[]' call in the job log below).

Severity
--------
Critical

Steps to Reproduce
------------------
Follow the steps on the documentation page referenced above until you reach system application-apply.

Expected Behavior
------------------
Application should be applied correctly

Actual Behavior
----------------
The application apply fails: the ironic-manage-cleaning-network pod goes into CrashLoopBackOff.

Reproducibility
---------------
100%

System Configuration
--------------------
Multi-node system
Ironic

Branch/Pull Time/Commit
-----------------------
###
### StarlingX
### Release 19.12
###

OS="centos"
SW_VERSION="19.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="r/stx.3.0"

JOB="STX_BUILD_3.0"
<email address hidden>"
BUILD_NUMBER="8"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-12-03 02:11:36 +0000"

Last Pass
---------
PASS on regression stage with BUILD_ID="20191006T230000Z"

Timestamp/Logs
--------------

************************************************************
- kubectl -n openstack logs ironic-conductor-0 init

Entrypoint WARNING: 2019/12/05 13:26:00 entrypoint.go:71: Resolving dependency Service ironic-api in namespace openstack failed: Service ironic-api has no endpoints .
Entrypoint WARNING: 2019/12/05 13:26:01 entrypoint.go:71: Resolving dependency Job ironic-manage-cleaning-network in namespace openstack failed: Job Job ironic-manage-cleaning-network in namespace openstack is not completed yet .
Entrypoint WARNING: 2019/12/05 13:26:02 entrypoint.go:71: Resolving dependency Service ironic-api in namespace openstack failed: Service ironic-api has no endpoints .
Entrypoint WARNING: 2019/12/05 13:26:03 entrypoint.go:71: Resolving dependency Job ironic-manage-cleaning-network in namespace openstack failed: Job Job ironic-manage-cleaning-network in namespace openstack is not completed yet .
Entrypoint WARNING: 2019/12/05 13:26:04 entrypoint.go:71: Resolving dependency Service ironic-api in namespace openstack failed: Service ironic-api has no endpoints .
Entrypoint WARNING: 2019/12/05 13:26:05 entrypoint.go:71: Resolving dependency Job ironic-manage-cleaning-network in namespace openstack failed: Job Job ironic-manage-cleaning-network in namespace openstack is not completed yet .
Entrypoint WARNING: 2019/12/05 13:26:06 entrypoint.go:71: Resolving dependency Service ironic-api in namespace openstack failed: Service ironic-api has no endpoints .
Entrypoint WARNING: 2019/12/05 13:26:07 entrypoint.go:71: Resolving dependency Job ironic-manage-cleaning-network in namespace openstack failed: Job Job ironic-manage-cleaning-network in namespace openstack is not completed yet .

*************************************************************

- kubectl -n openstack logs ironic-manage-cleaning-network-rt9ls

++ openstack network show baremetal -f value -c id
+ IRONIC_NEUTRON_CLEANING_NET_ID=cd21a947-c735-4737-ab37-50f0b2cf932e
++ openstack network show cd21a947-c735-4737-ab37-50f0b2cf932e -f value -c subnets
+ for SUBNET in '$(openstack network show $IRONIC_NEUTRON_CLEANING_NET_ID -f value -c subnets)'
++ openstack subnet show '[]' -f value -c name
No Subnet found for []
+ CURRENT_SUBNET=

*************************************************************

Test Activity
-------------
Final Regression Testing

Ghada Khalil (gkhalil) wrote :

Marking as high priority for stx.3.0 given this feature was working for stx.2.0.
Needs further investigation to understand whether this is related to Train (the OpenStack release) or to something else.

Changed in starlingx:
assignee: nobody → Mingyuan Qi (myqi)
importance: Undecided → High
status: New → Triaged
tags: added: stx.3.0 stx.containers stx.distro.openstack
Mingyuan Qi (myqi) wrote :

Jose, besides the issue itself, I ran into an identity authentication problem: the current ironic chart requires a default_domain_id for the ironic user that is not the literal string "default"; otherwise authentication to keystone fails. Have you ever faced that issue? My workaround is to add an override: --set endpoints.identity.auth.ironic.default_domain_id=a33bf06164464b9c8b73cf78e835e0fb, where the default_domain_id is obtained from 'openstack domain list'.
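The workaround above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names domain_id_by_name and ironic_domain_override are hypothetical, the wanted domain is assumed to be the one named "default", and the input is assumed to have the "<ID> <Name>" shape printed by 'openstack domain list -f value -c ID -c Name'. The second sample row is made-up data, not output from a real system.

```shell
#!/bin/sh
# Print the ID of the domain with the given name.
# Reads lines of the form "<ID> <Name>" on stdin (assumed output shape of
# `openstack domain list -f value -c ID -c Name`).
domain_id_by_name() {
    awk -v name="$1" '$2 == name { print $1; exit }'
}

# Build the override argument quoted in the comment above.
ironic_domain_override() {
    echo "endpoints.identity.auth.ironic.default_domain_id=$1"
}

# Sample data standing in for real CLI output. The first ID is the one quoted
# in the comment; the second row is invented for illustration only.
sample_domains='a33bf06164464b9c8b73cf78e835e0fb default
0123456789abcdef0123456789abcdef heat'

# ASSUMPTION: the domain to use is the one named "default".
domain_id=$(printf '%s\n' "$sample_domains" | domain_id_by_name default)
echo "--set $(ironic_domain_override "$domain_id")"
```

On a real system the sample data would be replaced by the live 'openstack domain list' output, and the printed flag would presumably be passed to the same helm override command used in the documentation [1].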

Mingyuan Qi (myqi) wrote :

This issue is caused by an update of the openstack client, neutron client, and neutron_lib; the manage-cleaning-network.sh script in the ironic-manage-cleaning-network job was not updated accordingly.

The ironic-manage-cleaning-network job runs in the openstack heat container.
In starlingx/stx-heat:rc-2.0-centos-stable-20191005T150558Z.0, the client versions are:
openstack client: 3.18.0
neutron client: 6.12.0
neutron lib: 1.25.0
'openstack network show baremetal -f value -c subnets' returns an empty string if there are no subnets.

In starlingx/stx-heat:master-centos-stable-20191119T000000Z.0, the client versions are:
openstack client: 4.0.0
neutron client: 6.14.0
neutron lib: 1.29.1
'openstack network show baremetal -f value -c subnets' returns an empty array '[]', but the shell script does not treat '[]' as empty. As a result, the loop body runs once with '[]' as the subnet, and the subsequent command (openstack subnet show '[]', as seen in the job log above) fails.
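The difference between the two client behaviours can be reproduced without an OpenStack deployment. The sketch below is illustrative only: count_loop_iterations is a hypothetical helper, and the client output is passed in as a plain string instead of being read from 'openstack network show'. It mirrors the unquoted 'for SUBNET in $(...)' pattern visible in the job trace.

```shell
#!/bin/sh
# Count how many times a `for` loop iterates over the unquoted client output.
# The output of `openstack network show <net> -f value -c subnets` is passed
# in as a string, so no OpenStack CLI is needed to run this.
count_loop_iterations() {
    subnets_output="$1"
    count=0
    # Unquoted on purpose: mirrors `for SUBNET in $(...)` in the job trace.
    for subnet in $subnets_output; do
        count=$((count + 1))
    done
    echo "$count"
}

echo "older client, no subnets (''):   $(count_loop_iterations '')"
echo "newer client, no subnets ('[]'): $(count_loop_iterations '[]')"
```

With an empty string the loop body never runs, while with '[]' it runs once and '[]' ends up being used as a subnet name, which matches the "No Subnet found for []" line in the log.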

Jose Perez Carranza (jgperezc) wrote :

Hi Mingyuan

The last time I ran the tests following the steps in the documentation, everything passed (with BUILD_ID="20191006T230000Z").

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-armada-app (master)

Fix proposed to branch: master
Review: https://review.opendev.org/698619

Changed in starlingx:
status: Triaged → In Progress
Mingyuan Qi (myqi) wrote :

@Jose, please cherry-pick this patch and build/deploy the stx-openstack app to confirm the issue is resolved. It passed on my cluster.

Jose Perez Carranza (jgperezc) wrote :

@Mingyuan

I applied the patch and the error no longer occurs!

zhipeng liu (zhipengs)
Changed in starlingx:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/698619
Committed: https://git.openstack.org/cgit/starlingx/openstack-armada-app/commit/?id=a4503a28acd30ca640821ec884100c614cab8cb4
Submitter: Zuul
Branch: master

commit a4503a28acd30ca640821ec884100c614cab8cb4
Author: Mingyuan Qi <email address hidden>
Date: Thu Dec 12 02:41:34 2019 +0000

    Check return value of get subnets before iterate for ironic

    With the update of openstack clients within heat image:
    openstack client >= 4.0.0
    neutron client >= 6.14.0
    neutron lib >= 1.29.1

    The command 'openstack network show ${network} -f value -c subnets'
    returns '[]' instead of null string if no subnets found in the
    specific network. This commit adds a check logic to avoid subsequent
    command returns error by using '[]' as subnet input.

    Change-Id: I695e504518e1c884c7d66ecc94c9fa8787ce9752
    Closes-Bug: 1855319
    Signed-off-by: Mingyuan Qi <email address hidden>
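The kind of check the commit message describes might look roughly like the sketch below. It is not the merged change: has_subnets is a hypothetical helper, and the sample values stand in for the output of 'openstack network show ${network} -f value -c subnets'.

```shell
#!/bin/sh
# Hypothetical guard (not taken from the merged change): succeeds only when
# the value printed by the client names at least one subnet.
has_subnets() {
    case "$1" in
        ''|'[]') return 1 ;;  # older clients print nothing, newer ones print []
        *)       return 0 ;;
    esac
}

# Stand-in values for the client output; 'subnet-a' is a placeholder name.
for sample in '' '[]' 'subnet-a'; do
    if has_subnets "$sample"; then
        echo "'$sample': iterate over subnets"
    else
        echo "'$sample': no subnets, skip the loop"
    fi
done
```

Handling both the empty string and '[]' keeps a script of this shape working with the older and the newer client versions listed above.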

Changed in starlingx:
status: Fix Committed → Fix Released
Ghada Khalil (gkhalil) wrote :

Zhipeng, This LP is marked as gating for stx.3.0. Please cherry-pick the code changes to the stx.3.0 branch if applicable or add a note explaining why it shouldn't be cherry-picked.

tags: added: stx.4.0
Bill Zvonar (billzvonar) wrote :

Zhipeng - reminder: This LP is marked as gating for stx.3.0. Please cherry-pick the code changes to the stx.3.0 branch if applicable or add a note explaining why it shouldn't be cherry-picked.

tags: added: stx.cherrypickneeded
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-armada-app (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/747604

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (r/stx.3.0)

Reviewed: https://review.opendev.org/747604
Committed: https://git.openstack.org/cgit/starlingx/openstack-armada-app/commit/?id=7a859890f77560a1c6436b0967fa49c2a249501d
Submitter: Zuul
Branch: r/stx.3.0

commit 7a859890f77560a1c6436b0967fa49c2a249501d
Author: Mingyuan Qi <email address hidden>
Date: Thu Dec 12 02:41:34 2019 +0000

    Check return value of get subnets before iterate for ironic

    With the update of openstack clients within heat image:
    openstack client >= 4.0.0
    neutron client >= 6.14.0
    neutron lib >= 1.29.1

    The command 'openstack network show ${network} -f value -c subnets'
    returns '[]' instead of null string if no subnets found in the
    specific network. This commit adds a check logic to avoid subsequent
    command returns error by using '[]' as subnet input.

    Change-Id: I695e504518e1c884c7d66ecc94c9fa8787ce9752
    Closes-Bug: 1855319
    Signed-off-by: Mingyuan Qi <email address hidden>

Bill Zvonar (billzvonar)
tags: removed: stx.cherrypickneeded
Ghada Khalil (gkhalil)
tags: added: in-r-stx30