stx-openstack apply takes longer than usual

Bug #1958399 reported by Alexandru Dimofte
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Austin Sun

Bug Description

Brief Description
-----------------
stx-openstack apply takes longer than usual

Severity
--------
Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
Try to apply stx-openstack.

Expected Behavior
------------------
Should apply in less than 10 minutes.

Actual Behavior
----------------
Now it takes ~15 minutes, which is why some of our tests are failing.
[2022-01-18 07:49:52,959] 290 WARNING MainThread container_helper.wait_for_apps_status:: ['stx-openstack'] did not reach status applied within 600s
...
> raise exceptions.ContainerError(msg)
E utils.exceptions.ContainerError: Container error.
E Details: ['stx-openstack'] did not reach status applied within 600s
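For context, the sanity framework's check behaves roughly like the polling loop below. This is only a sketch: get_app_status() is a hypothetical callable standing in for whatever the framework uses to read the application status (e.g. wrapping "system application-show stx-openstack"); it is not the actual container_helper code.

import time

def wait_for_apps_applied(get_app_status, apps=("stx-openstack",),
                          timeout=600, interval=30):
    # Poll each app until all report "applied" or the timeout budget expires.
    deadline = time.monotonic() + timeout
    pending = set(apps)
    while pending and time.monotonic() < deadline:
        pending = {app for app in pending if get_app_status(app) != "applied"}
        if pending:
            time.sleep(interval)
    if pending:
        raise TimeoutError(
            "%s did not reach status applied within %ss" % (sorted(pending), timeout))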

Reproducibility
---------------
100% reproducible on bare-metal standard and standard ext configurations.

System Configuration
--------------------
Multi-node system, Dedicated storage

Branch/Pull Time/Commit
-----------------------
master

Last Pass
---------
20220108T034842Z

Timestamp/Logs
--------------
will be attached

Test Activity
-------------
Sanity

Workaround
----------
-

Revision history for this message
Alexandru Dimofte (adimofte) wrote :

The collected logs are ready here: https://files.starlingx.cengn.ca/download_file/5

Revision history for this message
Alexandru Dimofte (adimofte) wrote :

This is the Sanity execution log for standard bare-metal: https://files.starlingx.cengn.ca/download_file/6

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Austin Sun (sunausti)
Changed in starlingx:
assignee: nobody → Austin Sun (sunausti)
Revision history for this message
Austin Sun (sunausti) wrote :

from controller-0 sysinv log:
sysinv 2022-01-18 06:00:03.430 347402 INFO sysinv.conductor.kube_app [-] Application stx-openstack (1.0-155-centos-stable-versioned) apply started.
sysinv 2022-01-18 06:04:49.814 347402 INFO sysinv.conductor.kube_app [-] All docker images for application stx-openstack were successfully downloaded in 278 seconds
sysinv 2022-01-18 06:27:47.218 347402 INFO sysinv.conductor.kube_app [-] Application stx-openstack (1.0-155-centos-stable-versioned) apply completed..
The first-time apply took ~27 minutes; this timeframe does not match the 2022-01-18 07:49:52,959 failure described above.
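For reference, the ~27 minute figure can be read straight off the two sysinv timestamps above; a small sketch of that arithmetic:

from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-01-18 06:00:03.430", FMT)
completed = datetime.strptime("2022-01-18 06:27:47.218", FMT)
print(completed - started)  # 0:27:43.788000, i.e. roughly 27.7 minutes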

from controller-1 sysinv log:
sysinv 2022-01-18 07:23:40.734 528871 INFO k8sapp_openstack.helm.neutron [-] _get_neutron_ml2_config={'ml2': {'physical_network_mtus': 'physnet0:1500,physnet1:1500'}, 'ml2_type_flat': {'flat_networks': ''}}
sysinv 2022-01-18 07:23:45.784 528871 INFO sysinv.conductor.manager [-] There has been an overrides change, setting up reapply of stx-openstack
sysinv 2022-01-18 07:27:34.053 528871 INFO sysinv.conductor.manager [-] stx-openstack requires re-apply but platform-integ-apps apply is in progress. Will retry on next audit

The re-apply failed:
sysinv 2022-01-18 07:59:49.162 528871 ERROR sysinv.conductor.kube_app [-] Failed to apply application manifest /manifests/stx-openstack/1.0-155-centos-stable-versioned/stx-openstack-stx-openstack.yaml with exit code 1. See /var/log/armada/stx-openstack-apply_2022-01-18-07-29-43.log for details.

2022-01-18 07:59:48.720 178 ERROR armada.handlers.wait [-] [chart=openstack-neutron]: Timed out waiting for pods (namespace=openstack, labels=(release_group=osh-openstack-neutron)). These pods were not ready=['neutron-ovs-agent-compute-0-5621f953-8rbkg', 'neutron-ovs-agent-compute-1-532206f8-x4nw5']
2022-01-18 07:59:48.721 178 ERROR armada.handlers.armada [-] Chart deploy [openstack-neutron] failed: armada.exceptions.k8s_exceptions.KubernetesWatchTimeoutException: Timed out waiting for pods (namespace=openstack, labels=(release_group=osh-openstack-neutron)). These pods were not ready=['neutron-ovs-agent-compute-0-5621f953-8rbkg', 'neutron-ovs-agent-compute-1-532206f8-x4nw5']
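To see which of the listed pods are stuck, something like the following can be run against the cluster. This is only an illustrative sketch using the Python kubernetes client; the namespace and label selector are taken from the armada message above, and it is not part of sysinv or armada itself.

from kubernetes import client, config

# Assumes a reachable kubeconfig on the controller.
config.load_kube_config()
v1 = client.CoreV1Api()
pods = v1.list_namespaced_pod(
    namespace="openstack",
    label_selector="release_group=osh-openstack-neutron")
for pod in pods.items:
    ready = any(c.type == "Ready" and c.status == "True"
                for c in (pod.status.conditions or []))
    print(pod.metadata.name, pod.status.phase, "Ready" if ready else "NotReady")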

The neutron-ovs-agent pods are not ready; this is suspected to be related to https://bugs.launchpad.net/starlingx/+bug/1958073 as well.

Waiting for the LP#1958073 fix to merge, then will monitor.

Revision history for this message
Austin Sun (sunausti) wrote :

Meanwhile, the contents of /etc/platform/platform.conf:
nodetype=controller
subfunction=controller
system_type=Standard
security_profile=standard
INSTALL_UUID=667847cd-89e5-4432-b2d2-8506bec03a19
http_port=8080
management_interface=eno2
UUID=b4e52408-9fc6-4abc-b62c-45e94b572a8d
oam_interface=eno1
sdn_enabled=no
region_config=no
system_mode=duplex
sw_version=22.02
security_feature="nopti nospectre_v2 nospectre_v1"
cluster_host_interface=eno2
vswitch_type=ovs-dpdk

This shows ovs-dpdk is enabled; the issue only impacts bare-metal configurations.
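A quick way to confirm the vswitch setting on an affected node is to parse the key=value pairs in platform.conf; a minimal sketch (illustrative only, not how sysinv reads the file):

def read_platform_conf(path="/etc/platform/platform.conf"):
    conf = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and "=" in line and not line.startswith("#"):
                key, _, value = line.partition("=")
                conf[key] = value.strip('"')
    return conf

print(read_platform_conf().get("vswitch_type"))  # expected: ovs-dpdk on the affected labs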

Changed in starlingx:
status: New → In Progress
Revision history for this message
Ghada Khalil (gkhalil) wrote :

screening: stx.7.0 / high - issue is contributing to red sanities
http://lists.starlingx.io/pipermail/starlingx-discuss/2022-January/012682.html

Changed in starlingx:
importance: Undecided → High
tags: added: stx.7.0
Revision history for this message
Thiago Paiva Brito (outbrito) wrote :

This issue is the same as https://bugs.launchpad.net/starlingx/+bug/1958073.

The issue described in 1958073 is a side effect of the neutron-ovs-agent pods being constantly restarted due to this issue. At some point they begin to conflict with other pods that are trying to mount neutron-etc, to the point that the API starts to throttle the requests.

Will probably be fixed by the same patch.

Ghada Khalil (gkhalil)
Changed in starlingx:
status: In Progress → Fix Released