epoll_wait busy loop in neutron-openvswitch-agent
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Incomplete | Undecided | Unassigned |
Bug Description
I'm installing a demo openstack environment using TripleO Quickstart using the Queens release, and after deploying the undercloud node, neutron-
Whatever settings Neutron has are the defaults configured by TripleO quickstart with
./quickstart.sh -R queens -E config/
and ./quickstart.sh -T none -I -R queens -E config/
The host node is a freshly installed HP Gen8 blade server running an updated CentOS 7 with the default repositories plus whatever the quickstart Ansible scripts set up. The undercloud node is whatever CentOS 7 image is used by TripleO Quickstart commit 505a0c5df551c45
I have not configured any Neutron settings myself. This is 100% reproducible on my host if I delete all the virtual machines and run TripleO Quickstart again.
I do not know what exactly triggers this behaviour, but the wait(0) call is in the run method in /usr/lib/
I added a line of code to throw an exception when wait(0) happens and this is the stacktrace I get:
2018-04-09 08:14:56.840 23151 CRITICAL neutron [-] Unhandled error: Exception: Eventlet waited for 0
2018-04-09 08:14:56.840 23151 ERROR neutron Traceback (most recent call last):
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/bin/
2018-04-09 08:14:56.840 23151 ERROR neutron sys.exit(main())
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron agent_main.main()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron mod.main()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron 'neutron.
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron app_mgr.close()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron self.uninstanti
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron app.stop()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron hub.joinall(
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron t.wait()
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron return self._exit_
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron return hubs.get_
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron return self.greenlet.
2018-04-09 08:14:56.840 23151 ERROR neutron File "/usr/lib/
2018-04-09 08:14:56.840 23151 ERROR neutron raise Exception("Eventlet waited for 0")
2018-04-09 08:14:56.840 23151 ERROR neutron Exception: Eventlet waited for 0
2018-04-09 08:14:56.840 23151 ERROR neutron
Just changing this wait to a non-zero value drops CPU usage to almost nothing. I can't tell whether this affects functionality in any way, but it doesn't seem to.
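To illustrate why a zero timeout matters, here is a minimal standalone sketch. It is not eventlet's hub code; it uses Python's standard `selectors` module, and the `count_wakeups` helper and the 0.05 second timeout are made up for the example. It counts how often a poll loop wakes up when nothing is ready to read:

```python
import selectors
import socket
import time


def count_wakeups(timeout, duration=0.2):
    """Count how many times a poll loop wakes up within `duration` seconds.

    Hypothetical helper for illustration only; not part of eventlet or neutron.
    """
    sel = selectors.DefaultSelector()  # epoll-based on Linux
    reader, writer = socket.socketpair()
    sel.register(reader, selectors.EVENT_READ)
    wakeups = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # Nothing is ever written to `writer`, so no event is ever ready.
            # timeout=0 returns immediately; a positive timeout blocks.
            sel.select(timeout)
            wakeups += 1
    finally:
        sel.unregister(reader)
        sel.close()
        reader.close()
        writer.close()
    return wakeups


busy = count_wakeups(timeout=0)      # spins: the poll call never blocks
idle = count_wakeups(timeout=0.05)   # sleeps in the kernel between wakeups
print("timeout=0 wakeups:", busy)
print("timeout=0.05 wakeups:", idle)
```

With a zero timeout the loop spins as fast as the CPU allows, which matches the 100% CPU symptom; with a non-zero timeout the process spends almost all of its time blocked in the kernel.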
The installed neutron package versions are:
[stack@undercloud hubs]$ rpm -qa | grep neutron
python2-
openstack-
openstack-
openstack-
openstack-
python2-
puppet-
openstack-
python-
openstack-
openstack-
python-
openstack-
python2-
openstack-
also
python2-
python2-
openvswitch-
python2-
Can you see if you have this change:
https://review.openstack.org/#/c/545612/
but not this one:
https://review.openstack.org/#/c/554258/
Having one but not the other could cause the agent to consume 100% of a cpu.