Stx-openstack application apply with FQDN fails

Bug #1891163 reported by Akshay
This bug affects 2 people
Affects: StarlingX
Status: Invalid
Importance: Low
Assigned to: Unassigned
Milestone:

Bug Description

Brief Description
-----------------
Setup: I have a bare metal StarlingX R3.0 duplex setup (the OpenStack Telemetry service is enabled).

Issue: Applying stx-openstack (stx-openstack-1.0-19-centos-stable-versioned.tgz) without FQDN succeeds, but applying it with FQDN fails.

Can you please help me solve this issue?

Attached: all collected logs.

Severity
--------
Critical

Steps to Reproduce
------------------
1. Deploy a bare metal StarlingX R3 duplex system.
2. Enable the OpenStack telemetry services.
3. Enable FQDN (using stx-openstack-1.0-19-centos-stable-versioned.tgz).
4. Apply the stx-openstack application from http://mirror.starlingx.cengn.ca/mirror/starlingx/release/3.0.0/centos/outputs/helm-charts/stx-openstack-1.0-19-centos-stable-versioned.tgz (see the command sketch after this list).
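
For reference, a rough sketch of the commands behind steps 3 and 4, assuming the tarball has been downloaded to the active controller; <domain_name> is a placeholder, and the endpoint_domain parameter is the one discussed later in this thread:

   system service-parameter-add openstack helm endpoint_domain=<domain_name>
   system application-upload stx-openstack-1.0-19-centos-stable-versioned.tgz
   system application-apply stx-openstack
   system application-list    # wait until stx-openstack reaches "applied" status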

Expected Behavior
------------------
Stx-openstack application should get applied.

Actual Behavior
----------------
Stx-openstack application apply fails.

Reproducibility
---------------
Yes

System Configuration
--------------------
Two-node (duplex) system

Last Pass
---------
No

Revision history for this message
Amit (mahajanamit) wrote :

To add to the description Akshay has shared, the logs indicate that on re-applying the stx-openstack application with the FQDN configuration, the heat chart fails during the apply. Looking at the heat-specific Kubernetes pods, the dependency jobs (heat-ks-endpoints, heat-trustee-ks-user, heat-ks-service, heat-domain-ks-user, heat-ks-user and heat-bootstrap) in the openstack namespace are failing.
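
For anyone reproducing this, the failing jobs can be inspected with something like the following (a sketch, assuming kubectl access on the active controller; heat-ks-endpoints is just one of the job names listed above):

   kubectl -n openstack get jobs | grep heat
   kubectl -n openstack get pods | grep heat
   kubectl -n openstack logs job/heat-ks-endpoints    # repeat for the other failing jobs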

This issue is also reproducible at our end without enabling the OpenStack telemetry services (Ceilometer, Aodh, Gnocchi, Panko).

By the way, has anyone else faced a similar issue?

Revision history for this message
Amit (mahajanamit) wrote :

To add to the above observations, we also tried stx-openstack-1.0-19-centos-stable-latest.tgz for deploying the application; deployment failed with that latest version as well. From the stx apply logs, we found that the libvirt pod deployment in Kubernetes is failing; libvirt appears to depend on neutron's OVS agent pod. On debugging further, we found that the neutron-ovs-agent-init container is failing with the following error:

+ OVS_SOCKET=/run/openvswitch/db.sock
+ chown neutron: /run/openvswitch/db.sock
+ DPDK_CONFIG_FILE=/tmp/dpdk.conf
+ DPDK_CONFIG=
+ DPDK_ENABLED=false
+ '[' -f /tmp/dpdk.conf ']'
+ neutron-sanity-check --version
+ timeout 3m neutron-sanity-check --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --ovsdb_native --nokeepalived_ipv6_support
2020-08-01 10:02:04.362 1064 INFO neutron.common.config [-] Logging enabled!
2020-08-01 10:02:04.362 1064 INFO neutron.common.config [-] /var/lib/openstack/bin/neutron-sanity-check version 15.1.1.dev39
2020-08-01 10:02:04.611 1064 CRITICAL neutron [-] Unhandled error: NoSuchOptError: no such option vf_management in group [DEFAULT]
2020-08-01 10:02:04.611 1064 ERROR neutron Traceback (most recent call last):
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/bin/neutron-sanity-check", line 8, in <module>
2020-08-01 10:02:04.611 1064 ERROR neutron sys.exit(main())
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity_check.py", line 404, in main
2020-08-01 10:02:04.611 1064 ERROR neutron enable_tests_from_config()
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity_check.py", line 347, in enable_tests_from_config
2020-08-01 10:02:04.611 1064 ERROR neutron cfg.CONF.set_default('vf_management', True)
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/oslo_config/cfg.py", line 2051, in __inner
2020-08-01 10:02:04.611 1064 ERROR neutron result = f(self, *args, **kwargs)
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/oslo_config/cfg.py", line 2455, in set_default
2020-08-01 10:02:04.611 1064 ERROR neutron opt_info = self._get_opt_info(name, group)
2020-08-01 10:02:04.611 1064 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/oslo_config/cfg.py", line 2845, in _get_opt_info
2020-08-01 10:02:04.611 1064 ERROR neutron raise NoSuchOptError(opt_name, group)
2020-08-01 10:02:04.611 1064 ERROR neutron NoSuchOptError: no such option vf_management in group [DEFAULT]

This issue is consistently reproducible. It appears to be caused by changes in upstream, specifically a neutron version upgrade, and seems related to https://bugs.launchpad.net/neutron/+bug/1888920.
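
For reference, the init-container log above can be pulled on a running setup roughly as follows (a sketch; the pod name is a placeholder, and the container name neutron-ovs-agent-init is taken from the failure described above):

   kubectl -n openstack get pods -o wide | grep neutron-ovs-agent
   kubectl -n openstack logs <neutron-ovs-agent-pod> -c neutron-ovs-agent-init
   kubectl -n openstack describe pod <neutron-ovs-agent-pod>    # shows init container state and restart counts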

Are you folks facing this error as well, and if so, is there any plan on your side to analyze these issues?

Revision history for this message
yong hu (yhu6) wrote :

@Austin will make comments on this.

Revision history for this message
yong hu (yhu6) wrote :

@Amit to post the URL used for downloading the "stx-openstack" armada app tarball.

Amit (mahajanamit)
description: updated
description: updated
Revision history for this message
Amit (mahajanamit) wrote :

For the observations mentioned in comment #3, we used the latest tarball from http://mirror.starlingx.cengn.ca/mirror/starlingx/release/3.0.0/centos/outputs/helm-charts/stx-openstack-1.0-19-centos-stable-latest.tgz.

I have updated the path of the stx-openstack application (versioned) in the bug description.

Revision history for this message
Austin Sun (sunausti) wrote :

Hi Amit:
   1) What is the override file for enabling FQDN?
   2) Before applying openstack, were any other pods deployed besides the platform itself?

Revision history for this message
Amit (mahajanamit) wrote :

Hi Austin,

For point #1 in your question, we used the following command to add the service parameter required for configuring the helm endpoint domain:
 - system service-parameter-add openstack helm endpoint_domain=<domain_name>

The above command was used while following the documentation at the following page:
https://docs.starlingx.io/deploy_install_guides/r3_release/openstack/access.html

For point #2 in your question, we only followed the standard documentation to install the bare metal duplex setup. We had not deployed any pods ourselves before deploying the stx-openstack application.
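
For completeness, the configured domain can be verified before re-applying; a rough sketch (command output and columns may differ between releases):

   system service-parameter-list | grep endpoint_domain
   system application-apply stx-openstack    # re-apply so the OpenStack endpoints pick up the FQDN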

Revision history for this message
Amit (mahajanamit) wrote :

Hi All,

On further analysis of our setup, we found that the OpenStack user account was getting locked while re-applying the stx-openstack application with FQDN. It was being locked because our orchestrator solution was configured with an old password for the OpenStack user account and was periodically trying to get a token from Keystone with the wrong password. As a result, the stx-openstack re-apply with FQDN was failing with an account-locked error. So, this issue can be closed/cancelled.
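
For anyone hitting the same symptom: the lockout behaviour is driven by Keystone's [security_compliance] options (lockout_failure_attempts, lockout_duration). One way to inspect them is sketched below, assuming the keystone-api pod name pattern and the /etc/keystone/keystone.conf path, both of which may differ on a given deployment:

   kubectl -n openstack get pods | grep keystone-api
   kubectl -n openstack exec <keystone-api-pod> -- grep -A4 '^\[security_compliance\]' /etc/keystone/keystone.conf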

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Closing as per the note above.

Changed in starlingx:
importance: Undecided → Low
status: New → Invalid