application-apply stx-openstack failed due to neutron pods failure - high ovs-dpdk cpu usage

Bug #1825045 reported by Peng Peng
Affects: StarlingX
Status: Invalid
Importance: High
Assigned to: Matt Peters

Bug Description

Brief Description
-----------------
Run " system application-apply stx-openstack" during lab_setup. apply failed and also found that neutron pods failed

Severity
--------
Major

Steps to Reproduce
------------------
As described above.
....
TC-name: installation

Expected Behavior
------------------
"system application-apply stx-openstack" completes successfully and the openstack pods reach Running/Completed state.

Actual Behavior
----------------
The apply fails with status "apply-failed", and the neutron-ovs-agent pods on compute-0 and compute-1 are stuck in Init:CrashLoopBackOff.

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Multi-node system

Lab-name: IP_7-12

Branch/Pull Time/Commit
-----------------------
stx master as of "20190415T233001Z"

Last Pass
---------
20190410T013000Z

Timestamp/Logs
--------------
2019-04-16 08:27:29 [admin@admin]> RUNNING: system application-apply stx-openstack
+---------------+----------------------------------+
| Property | Value |
+---------------+----------------------------------+
| created_at | 2019-04-16T08:26:53.977627+00:00 |
| manifest_file | manifest.yaml |
| manifest_name | armada-manifest |
| name | stx-openstack |
| progress | None |
| status | applying |
| updated_at | 2019-04-16T08:27:21.056044+00:00 |
+---------------+----------------------------------+

[wrsroot@controller-0 ~(keystone_admin)]$ system application-show stx-openstack
+---------------+------------------------------------------+
| Property | Value |
+---------------+------------------------------------------+
| created_at | 2019-04-16T08:26:53.977627+00:00 |
| manifest_file | manifest.yaml |
| manifest_name | armada-manifest |
| name | stx-openstack |
| progress | operation aborted, check logs for detail |
| status | apply-failed |
| updated_at | 2019-04-16T09:13:08.313222+00:00 |
+---------------+------------------------------------------+

neutron pods failed:

neutron-ovs-agent-compute-0-75ea0372-hw22d 0/1 Init:CrashLoopBackOff 123 10h
neutron-ovs-agent-compute-1-eae26dba-dcv5r 0/1 Init:CrashLoopBackOff 123 10h

Here are the logs:
   kubectl logs neutron-ovs-agent-compute-0-75ea0372-hw22d -n openstack -c neutron-ovs-agent-init

2019-04-16 19:30:13.618 3643 CRITICAL neutron [-] Unhandled error: TimeoutException: Commands [<ovsdbapp.schema.open_vswitch.commands.AddBridgeCommand object at 0x7f680984ca50>, <ovsdbapp.backend.ovs_idl.command.DbAddCommand object at 0x7f680984cc50>, <ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7f680984cc90>] exceeded timeout 10 seconds
2019-04-16 19:30:13.618 3643 ERROR neutron Traceback (most recent call last):
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/bin/neutron-sanity-check", line 10, in <module>
2019-04-16 19:30:13.618 3643 ERROR neutron sys.exit(main())
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity_check.py", line 417, in main
2019-04-16 19:30:13.618 3643 ERROR neutron return 0 if all_tests_passed() else 1
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity_check.py", line 404, in all_tests_passed
2019-04-16 19:30:13.618 3643 ERROR neutron return all(opt.callback() for opt in OPTS if cfg.CONF.get(opt.name))
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity_check.py", line 404, in <genexpr>
2019-04-16 19:30:13.618 3643 ERROR neutron return all(opt.callback() for opt in OPTS if cfg.CONF.get(opt.name))
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity_check.py", line 53, in check_ovs_vxlan
2019-04-16 19:30:13.618 3643 ERROR neutron result = checks.ovs_vxlan_supported()
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/cmd/sanity/checks.py", line 50, in ovs_vxlan_supported
2019-04-16 19:30:13.618 3643 ERROR neutron with ovs_lib.OVSBridge(name) as br:
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/agent/common/ovs_lib.py", line 1125, in __enter__
2019-04-16 19:30:13.618 3643 ERROR neutron self.create()
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/neutron/agent/common/ovs_lib.py", line 287, in create
2019-04-16 19:30:13.618 3643 ERROR neutron FAILMODE_SECURE))
2019-04-16 19:30:13.618 3643 ERROR neutron File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2019-04-16 19:30:13.618 3643 ERROR neutron self.gen.next()
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/ovsdbapp/api.py", line 112, in transaction
2019-04-16 19:30:13.618 3643 ERROR neutron del self._nested_txns_map[cur_thread_id]
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/ovsdbapp/api.py", line 69, in __exit__
2019-04-16 19:30:13.618 3643 ERROR neutron self.result = self.commit()
2019-04-16 19:30:13.618 3643 ERROR neutron File "/var/lib/openstack/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
2019-04-16 19:30:13.618 3643 ERROR neutron timeout=self.timeout)
2019-04-16 19:30:13.618 3643 ERROR neutron TimeoutException: Commands [<ovsdbapp.schema.open_vswitch.commands.AddBridgeCommand object at 0x7f680984ca50>, <ovsdbapp.backend.ovs_idl.command.DbAddCommand object at 0x7f680984cc50>, <ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7f680984cc90>] exceeded timeout 10 seconds
2019-04-16 19:30:13.618 3643 ERROR neutron
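
The TimeoutException above means the OVSDB transaction (AddBridge/DbAdd/DbSet) did not complete within the 10-second timeout, which usually points at ovsdb-server/ovs-vswitchd being too busy or unresponsive rather than at neutron itself. A quick responsiveness check, as a sketch, run directly on the affected compute host (with OVS-DPDK on StarlingX, OVS runs on the host):

   # does OVSDB answer within the same 10s budget the sanity check uses?
   ovs-vsctl --timeout=10 show
   # bridge/datapath summary from ovs-vswitchd
   ovs-appctl dpif/show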

Test Activity
-------------
installation

Revision history for this message
Erich Cordoba (ericho) wrote :

We are seeing this issue as well; in fact, three charts are failing:

$ helm list
...
osh-openstack-libvirt 1 Tue Apr 16 16:37:33 2019 FAILED libvirt-0.1.0 openstack
osh-openstack-mariadb 1 Tue Apr 16 16:28:16 2019 DEPLOYED mariadb-0.1.0 openstack
osh-openstack-memcached 1 Tue Apr 16 16:30:56 2019 DEPLOYED memcached-0.1.0 openstack
osh-openstack-neutron 1 Tue Apr 16 16:37:33 2019 FAILED neutron-0.1.0 openstack
osh-openstack-nova 1 Tue Apr 16 16:37:33 2019 FAILED nova-0.1.0 openstack
...

This seems to affect all configurations.
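
If it helps with triage, a hedged way to dig into the failed releases (release and pod names are the ones shown above; substitute your own):

   # chart-level status and notes for one of the FAILED releases
   helm status osh-openstack-neutron
   # events for the crashing init container
   kubectl -n openstack describe pod neutron-ovs-agent-compute-0-75ea0372-hw22d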

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; high priority as the issue is causing the openstack application to fail

tags: added: stx.2.0 stx.containers
tags: added: stx.retestneeded
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Matt Peters (mpeters-wrs)
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Matt Peters, Gerry Kopec and Al Bailey are currently investigating

Revision history for this message
Matt Peters (mpeters-wrs) wrote :

Hello Folks,
The high OVS-DPDK CPU usage is due to bonding and the side effect of polling the link state of the slave interfaces. Since these NICs are serviced by the DPDK data path, OVS uses the DPDK interface to busy-poll the link state, which results in high CPU usage on the master/platform core. It looks like the data ports are down, which would likely contribute to the excessive polling of the bond. Can you confirm that the switch side is up for these interfaces?

 3(bond0.1): addr:90:e2:ba:47:de:48
     config: 0
     state: LINK_DOWN
     current: 100MB-HD 100MB-FD 1GB-HD 1GB-FD FIBER AUTO_NEG AUTO_PAUSE
     speed: 1000 Mbps now, 0 Mbps max
 4(bond0.0): addr:90:e2:ba:4e:0e:e0
     config: 0
     state: LINK_DOWN
     current: 100MB-HD 100MB-FD 1GB-HD 1GB-FD FIBER AUTO_NEG AUTO_PAUSE
     speed: 1000 Mbps now, 0 Mbps max
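
For correlating that LINK_DOWN state with the OVS/DPDK view on the host, a minimal sketch (the bridge name br-phy0 is illustrative; use the one from your own config):

   # port/link state as seen by OVS (source of the LINK_DOWN output above)
   ovs-ofctl show br-phy0
   # member/link status of the OVS bond over the DPDK ports
   ovs-appctl bond/show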

Revision history for this message
Frank Miller (sensfan22) wrote :

This particular bug shows a failure signature where ovs-dpdk consumes the platform CPU while trying to access the link devices, which appear to be down in this system.

Peng, please repeat the test case with the data links up and collect new logs. We want to see the CPU usage of ovs-dpdk in that case.
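
When re-running with the links up, per-thread CPU numbers for ovs-vswitchd are the most useful thing to capture; a minimal sketch (thread layout differs per system):

   # per-thread CPU usage: compare the main/master thread against the PMD threads
   top -H -b -n 1 -p $(pidof ovs-vswitchd)
   ps -eLo pid,lwp,psr,pcpu,comm | grep ovs-vswitchd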

summary: - application-apply stx-openstack failed due to neutron pods failure
+ application-apply stx-openstack failed due to neutron pods failure -
+ high ovs-dpdk cpu usage
tags: added: stx.networking
Changed in starlingx:
status: Triaged → Incomplete
Revision history for this message
Peng Peng (ppeng) wrote :

After investigation, it is confirmed that this is a switch issue.

Changed in starlingx:
status: Incomplete → Invalid
Peng Peng (ppeng)
tags: removed: stx.retestneeded