Stx-openstack application apply with FQDN fails

Bug #1891163 reported by Akshay
This bug affects 2 people
Affects: StarlingX
Status: Invalid
Importance: Low
Assigned to: Unassigned
Milestone:

Bug Description

Brief Description
-----------------
Setup: I have a bare metal StarlingX R3.0 duplex setup (the OpenStack Telemetry service is enabled).

Issue: Applying stx-openstack (stx-openstack-1.0-19-centos-stable-versioned.tgz) without FQDN succeeds, but applying it with FQDN fails.

Can you please help me solve this issue?

Attached: all collected logs.

Severity
--------
Critical

Steps to Reproduce
------------------
1. Deploy a bare metal StarlingX R3 duplex system.
2. Enable the OpenStack telemetry services.
3. Enable FQDN (using stx-openstack-1.0-19-centos-stable-versioned.tgz).
4. Apply the stx-openstack application from http://mirror.starlingx.cengn.ca/mirror/starlingx/release/3.0.0/centos/outputs/helm-charts/stx-openstack-1.0-19-centos-stable-versioned.tgz (see the command sketch after this list).
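
For reference, a rough sketch of the commands behind steps 3 and 4, assuming the tarball has been downloaded to the active controller; <domain_name> is a placeholder, and the endpoint_domain parameter is the one discussed later in this thread:

   system service-parameter-add openstack helm endpoint_domain=<domain_name>
   system application-upload stx-openstack-1.0-19-centos-stable-versioned.tgz
   system application-apply stx-openstack
   system application-list    # wait until stx-openstack reaches "applied" status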

Expected Behavior
------------------
Stx-openstack application should get applied.

Actual Behavior
----------------
Stx-openstack application apply fails.

Reproducibility
---------------
Yes

System Configuration
--------------------
Two-node (duplex) system

Last Pass
---------
No

Revision history for this message
Amit (mahajanamit) wrote :

To add to the description Akshay has shared, the logs indicate that on re-applying the stx-openstack application with the FQDN configuration, the heat chart fails during the apply. Looking at the heat-specific Kubernetes pods, the dependency jobs (heat-ks-endpoints, heat-trustee-ks-user, heat-ks-service, heat-domain-ks-user, heat-ks-user and heat-bootstrap) in the openstack namespace are failing.
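
For anyone reproducing this, the failing jobs can be inspected with something like the following (a sketch, assuming kubectl access on the active controller; heat-ks-endpoints is just one of the job names listed above):

   kubectl -n openstack get jobs | grep heat
   kubectl -n openstack get pods | grep heat
   kubectl -n openstack logs job/heat-ks-endpoints    # repeat for the other failing jobs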

This issue is also reproducible at our end without enabling the OpenStack telemetry services (Ceilometer, Aodh, Gnocchi, Panko).

By the way, has anyone else faced a similar issue?

Revision history for this message
Amit (mahajanamit) wrote :

To add to the above observations, we also tried stx-openstack-1.0-19-centos-stable-latest.tgz for deploying the application; deployment failed with that latest version as well. From the stx apply logs, we found that the libvirt pod deployment in Kubernetes is failing; libvirt appears to depend on neutron's OVS agent pod. On debugging further, we found that the neutron-ovs-agent-init container is failing with the following error:

+ OVS_SOCKET=/run/openvswitch/db.sock
+ chown neutron: /run/openvswitch/db.sock
+ DPDK_CONFIG_FILE=/tmp/dpdk.conf
+ DPDK_CONFIG=
+ DPDK_ENABLED=false
+ '[' -f /tmp/dpdk.conf ']'
+ neutron-sanity-check --version
+ timeout 3m neutron-sanity-check --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --ovsdb_native --nokeepalived_ipv6_support
2020-08-01 10:02:04.362 1064 INFO neutron.common.config [-] Logging enabled!
2020-08-01 10:02:04.362 1064 INFO neutron.common.config [-] /var/lib/openstack/bin/neutron-sanity-check version 15.1.1.dev39
2020-08-01 10:02:04.611 1064 CRITICAL neutron [-] Unhandled error: NoSuchOptError: no such option vf_management in group [DEFAULT]
2020-08-01 10:02:04.611 1064 ERROR neutron Traceback (most recent call last):
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/bin/neutron-sanity-check", line 8, in <module>
2020-08-01 10:02:04.611 1064 ERROR neutron sys.exit(main())
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity_check.py", line 404, in main
2020-08-01 10:02:04.611 1064 ERROR neutron enable_tests_from_config()
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity_check.py", line 347, in enable_tests_from_config
2020-08-01 10:02:04.611 1064 ERROR neutron cfg.CONF.set_default('vf_management', True)
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/oslo_config/cfg.py", line 2051, in __inner
2020-08-01 10:02:04.611 1064 ERROR neutron result = f(self, *args, **kwargs)
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/oslo_config/cfg.py", line 2455, in set_default
2020-08-01 10:02:04.611 1064 ERROR neutron opt_info = self._get_opt_info(name, group)
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/oslo_config/cfg.py", line 2845, in _get_opt_info
2020-08-01 10:02:04.611 1064 ERROR neutron raise NoSuchOptError(opt_name, group)
2020-08-01 10:02:04.611 1064 ERROR neutron NoSuchOptError: no such option vf_management in group [DEFAULT]

This issue is consistently reproducible. It appears to be caused by changes in upstream, specifically a neutron version upgrade, and seems related to https://bugs.launchpad.net/neutron/+bug/1888920.
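
For reference, the init-container log above can be pulled on a running setup roughly as follows (a sketch; the pod name is a placeholder, and the container name neutron-ovs-agent-init is taken from the failure described above):

   kubectl -n openstack get pods -o wide | grep neutron-ovs-agent
   kubectl -n openstack logs <neutron-ovs-agent-pod> -c neutron-ovs-agent-init
   kubectl -n openstack describe pod <neutron-ovs-agent-pod>    # shows init container state and restart counts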

Are you folks facing this error as well, and if so, is there any plan on your side to analyze these issues?

Revision history for this message
yong hu (yhu6) wrote :

@Austin will make comments on this.

Revision history for this message
yong hu (yhu6) wrote :

@Amit to post the URL used for downloading the "stx-openstack" armada app tarball.

Amit (mahajanamit)
description: updated
description: updated
Revision history for this message
Amit (mahajanamit) wrote :

For the observations mentioned in comment #3, we used the latest tarball from http://mirror.starlingx.cengn.ca/mirror/starlingx/release/3.0.0/centos/outputs/helm-charts/stx-openstack-1.0-19-centos-stable-latest.tgz.

I have updated the path of the stx-openstack application (versioned) in the bug description.

Revision history for this message
Austin Sun (sunausti) wrote :

Hi Amit:
   1) What is the override file for enabling FQDN?
   2) Before applying openstack, were any other pods deployed besides the platform itself?

Revision history for this message
Amit (mahajanamit) wrote :

Hi Austin,

For point #1 in your question, we used the following command to add the service parameter required for configuring the helm endpoint domain:
 - system service-parameter-add openstack helm endpoint_domain=<domain_name>

The above command was used while following the documentation at the following page:
https://docs.starlingx.io/deploy_install_guides/r3_release/openstack/access.html

For point #2 in your question, we only followed the standard documentation to install the bare metal duplex setup. We had not deployed any pods ourselves before deploying the stx-openstack application.
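
For completeness, the configured domain can be verified before re-applying; a rough sketch (command output and columns may differ between releases):

   system service-parameter-list | grep endpoint_domain
   system application-apply stx-openstack    # re-apply so the OpenStack endpoints pick up the FQDN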

Revision history for this message
Amit (mahajanamit) wrote :

Hi All,

On further analysis of our setup, we found that the OpenStack user account was getting locked while re-applying the stx-openstack application with FQDN. It was being locked because our orchestrator solution was configured with an old password for the OpenStack user account and was periodically trying to get a token from Keystone with the wrong password. As a result, the stx-openstack re-apply with FQDN was failing with an account-locked error. So, this issue can be closed/cancelled.
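
For anyone hitting the same symptom: the lockout behaviour is driven by Keystone's [security_compliance] options (lockout_failure_attempts, lockout_duration). One way to inspect them is sketched below, assuming the keystone-api pod name pattern and the /etc/keystone/keystone.conf path, both of which may differ on a given deployment:

   kubectl -n openstack get pods | grep keystone-api
   kubectl -n openstack exec <keystone-api-pod> -- grep -A4 '^\[security_compliance\]' /etc/keystone/keystone.conf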

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Closing as per the note above.

Changed in starlingx:
importance: Undecided → Low
status: New → Invalid