[ironic] pod ironic-manage-cleaning-network failing after helm override

Bug #1855319 reported by Jose Perez Carranza on 2019-12-05
This bug affects 1 person
Affects: StarlingX
Importance: High
Assigned to: Mingyuan Qi

Bug Description

Brief Description
-----------------
After doing a helm override on the ironic pods as described in the documentation [1], application apply fails because the pod ironic-manage-cleaning-network repeatedly goes into CrashLoopBackOff.

[1] https://docs.starlingx.io/deploy_install_guides/r3_release/bare_metal/ironic_install.html#generate-user-helm-overrides

Note: According to the logs, the pod is trying to delete a subnet that does not exist.

Severity
--------
Critical

Steps to Reproduce
------------------
Follow the steps on the documentation page referenced above until you reach system application-apply

Expected Behavior
------------------
Application should be applied correctly

Actual Behavior
----------------
Application is failing

Reproducibility
---------------
100%

System Configuration
--------------------
Multi-node system
Ironic

Branch/Pull Time/Commit
-----------------------
###
### StarlingX
### Release 19.12
###

OS="centos"
SW_VERSION="19.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="r/stx.3.0"

JOB="STX_BUILD_3.0"
<email address hidden>"
BUILD_NUMBER="8"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-12-03 02:11:36 +0000"

Last Pass
---------
PASS on regression stage with BUILD_ID="20191006T230000Z"

Timestamp/Logs
--------------

************************************************************
- kubectl -n openstack logs ironic-conductor-0 init

Entrypoint WARNING: 2019/12/05 13:26:00 entrypoint.go:71: Resolving dependency Service ironic-api in namespace openstack failed: Service ironic-api has no endpoints .
Entrypoint WARNING: 2019/12/05 13:26:01 entrypoint.go:71: Resolving dependency Job ironic-manage-cleaning-network in namespace openstack failed: Job Job ironic-manage-cleaning-network in namespace openstack is not completed yet .
Entrypoint WARNING: 2019/12/05 13:26:02 entrypoint.go:71: Resolving dependency Service ironic-api in namespace openstack failed: Service ironic-api has no endpoints .
Entrypoint WARNING: 2019/12/05 13:26:03 entrypoint.go:71: Resolving dependency Job ironic-manage-cleaning-network in namespace openstack failed: Job Job ironic-manage-cleaning-network in namespace openstack is not completed yet .
Entrypoint WARNING: 2019/12/05 13:26:04 entrypoint.go:71: Resolving dependency Service ironic-api in namespace openstack failed: Service ironic-api has no endpoints .
Entrypoint WARNING: 2019/12/05 13:26:05 entrypoint.go:71: Resolving dependency Job ironic-manage-cleaning-network in namespace openstack failed: Job Job ironic-manage-cleaning-network in namespace openstack is not completed yet .
Entrypoint WARNING: 2019/12/05 13:26:06 entrypoint.go:71: Resolving dependency Service ironic-api in namespace openstack failed: Service ironic-api has no endpoints .
Entrypoint WARNING: 2019/12/05 13:26:07 entrypoint.go:71: Resolving dependency Job ironic-manage-cleaning-network in namespace openstack failed: Job Job ironic-manage-cleaning-network in namespace openstack is not completed yet .

*************************************************************

- kubectl -n openstack logs ironic-manage-cleaning-network-rt9ls

++ openstack network show baremetal -f value -c id
+ IRONIC_NEUTRON_CLEANING_NET_ID=cd21a947-c735-4737-ab37-50f0b2cf932e
++ openstack network show cd21a947-c735-4737-ab37-50f0b2cf932e -f value -c subnets
+ for SUBNET in '$(openstack network show $IRONIC_NEUTRON_CLEANING_NET_ID -f value -c subnets)'
++ openstack subnet show '[]' -f value -c name
No Subnet found for []
+ CURRENT_SUBNET=

*************************************************************

Test Activity
-------------
Final Regression Testing

Ghada Khalil (gkhalil) wrote :

Marking as high priority for stx.3.0 given this feature was working for stx.2.0.
Needs further investigation to understand if this is related to Train or something else.

Changed in starlingx:
assignee: nobody → Mingyuan Qi (myqi)
importance: Undecided → High
status: New → Triaged
tags: added: stx.3.0 stx.containers stx.distro.openstack
Mingyuan Qi (myqi) wrote :

Jose, besides the issue itself, I ran into an identity authentication problem: the current ironic chart requires a default_domain_id for the ironic user that is not "default"; otherwise authentication to keystone fails. Have you ever faced that issue? My workaround is to add an override: --set endpoints.identity.auth.ironic.default_domain_id=a33bf06164464b9c8b73cf78e835e0fb, where the default_domain_id value is obtained from 'openstack domain list'.

Mingyuan Qi (myqi) wrote :

This issue is caused by an update of the openstack client, neutron client and neutron_lib, while the manage-cleaning-network.sh script in the ironic-manage-cleaning-network job was not updated accordingly.

The ironic-manage-cleaning-network job runs in the openstack heat container.
In starlingx/stx-heat:rc-2.0-centos-stable-20191005T150558Z.0, the client versions are:
openstack client: 3.18.0
neutron client: 6.12.0
neutron lib: 1.25.0
'openstack network show baremetal -f value -c subnets' returns an empty string if there are no subnets.

In starlingx/stx-heat:master-centos-stable-20191119T000000Z.0, the client versions are:
openstack client: 4.0.0
neutron client: 6.14.0
neutron lib: 1.29.1
'openstack network show baremetal -f value -c subnets' returns an empty array '[]', but the shell script does not treat '[]' as empty. As a result, the script passes '[]' to 'openstack subnet show' as if it were a subnet, which fails with "No Subnet found for []".
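
The behaviour difference described above can be guarded against with a small normalization step. The following is only a sketch, not the actual patch under review: the function name subnet_ids is hypothetical, and it assumes that the newer client prints a Python-style list such as ['id1', 'id2'] when subnets exist, which is not confirmed in this report.

```shell
#!/bin/sh
# Hypothetical helper (not taken from manage-cleaning-network.sh).
# Normalizes the value of the "subnets" column so that both the old
# client output (empty string or bare IDs) and the new client output
# ('[]' or an assumed "['id1', 'id2']" form) become a space-separated
# list of IDs, or an empty string when there are no subnets.
subnet_ids() {
  # Drop list brackets and quotes, turn commas into spaces, then rely on
  # word splitting to collapse any leftover whitespace.
  set -- $(printf '%s' "$1" | tr -d '[]' | tr -d "'\"" | tr ',' ' ')
  echo "$*"
}

# Intended use inside the job script (requires the openstack CLI):
#   RAW=$(openstack network show "$IRONIC_NEUTRON_CLEANING_NET_ID" -f value -c subnets)
#   for SUBNET in $(subnet_ids "$RAW"); do
#     openstack subnet show "$SUBNET" -f value -c name
#   done
```

With this in place the loop body is skipped entirely when the network has no subnets, regardless of whether the client reports that as an empty string or as '[]'.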

Jose Perez Carranza (jgperezc) wrote :

Hi Mingyuan

The last time I ran the test, everything passed (with BUILD_ID="20191006T230000Z") following the steps in the documentation.

Fix proposed to branch: master
Review: https://review.opendev.org/698619

Changed in starlingx:
status: Triaged → In Progress
Mingyuan Qi (myqi) wrote :

@Jose, please cherry-pick this patch and build/deploy the stx-openstack app to confirm the issue is resolved. It passed in my cluster.

Jose Perez Carranza (jgperezc) wrote :

@Mingyuan

I applied the patch changes and the error no longer occurs!

zhipeng liu (zhipengs) on 2020-01-02
Changed in starlingx:
status: In Progress → Fix Committed