multi-threaded nova-conductor performance

Bug #1224628 reported by Joe Gordon
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

The large-ops test is failing for multi-threaded nova-conductor. It is unclear if this is an actual performance regression or something else.

Test Patch:
https://review.openstack.org/#/c/45766/

Failing Jobs:

http://logs.openstack.org/66/45766/1/silent/gate-tempest-devstack-vm-large-ops/bac4fa9/
http://logs.openstack.org/66/45766/1/silent/gate-tempest-devstack-vm-large-ops/c9a6ab1/

Feature: https://review.openstack.org/#/c/42342/

Joe Gordon (jogo)
Changed in nova:
milestone: none → havana-rc1
Matt Riedemann (mriedem)
tags: added: conductor testing
Joe Gordon (jogo)
description: updated
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Also, I don't see 1224628 on the rechecks page.

Revision history for this message
Joe Gordon (jogo) wrote :

Dims, that is because large-ops is in the silent pipeline: it runs on every patch set but doesn't report back. And this does appear to be a repeatable test failure, but only for the unmerged devstack patch linked above.

The log URLs above show the two failed runs. This can be reproduced by running a recheck and watching Zuul / Jenkins for the test to fail, then finding the logs URL.
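
For example, assuming the standard Gerrit recheck comment syntax used by the gate at the time (an assumption, not something recorded in this bug), posting a review comment such as:

    recheck bug 1224628

re-runs the check jobs; the large-ops job can then be followed on the Zuul status page, and its log URL picked up from the Jenkins results comment once it fails.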

tags: added: havana-rc-proposed
Changed in nova:
milestone: havana-rc1 → none
tags: added: havana-rc-potential
removed: havana-rc-proposed
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Looking at logs from https://review.openstack.org/#/c/45766/ at
http://logs.openstack.org/66/45766/1/check/gate-tempest-devstack-vm-large-ops/a9795a0/

The test error:
BuildErrorException: Server %(server_id)s failed to build and is in ERROR status
Details: <Server: scenario-server-185564612-b5044da7-0f18-476b-a064-2014b2ab391e> failed to get to expected status. In ERROR state.

seems to be caused by:
screen-n-cpu.txt.gz:2013-10-08 18:51:31.843 ERROR nova.compute.manager [req-4314b9da-9257-4935-9660-044e13f70d93 tempest.scenario.manager165361106-user tempest.scenario.manager165361106-tenant] [instance: b5044da7-0f18-476b-a064-2014b2ab391e] Error: Timeout while waiting on RPC response - topic: "network", RPC method: "allocate_for_instance" info: "<unknown>"
screen-n-sch.txt.gz:2013-10-08 18:51:31.889 ERROR nova.scheduler.filter_scheduler [req-4314b9da-9257-4935-9660-044e13f70d93 tempest.scenario.manager165361106-user tempest.scenario.manager165361106-tenant] [instance: b5044da7-0f18-476b-a064-2014b2ab391e] Error from last host: devstack-precise-hpcloud-az3-561338 (node devstack-precise-hpcloud-az3-561338): [u'Traceback (most recent call last):\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1039, in _build_instance\n set_access_ip=set_access_ip)\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1448, in _spawn\n network_info.wait(do_raise=True)\n', u' File "/opt/stack/new/nova/nova/network/model.py", line 398, in wait\n self[:] = self._gt.wait()\n', u' File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 168, in wait\n return self._exit_event.wait()\n', u' File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait\n return hubs.get_hub().switch()\n', u' File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch\n return self.greenlet.switch()\n', u' File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main\n result = function(*args, **kwargs)\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1230, in _allocate_network_async\n dhcp_options=dhcp_options)\n', u' File "/opt/stack/new/nova/nova/network/api.py", line 93, in wrapped\n return func(self, context, *args, **kwargs)\n', u' File "/opt/stack/new/nova/nova/network/api.py", line 49, in wrapper\n res = f(self, context, *args, **kwargs)\n', u' File "/opt/stack/new/nova/nova/network/api.py", line 300, in allocate_for_instance\n nw_info = self.network_rpcapi.allocate_for_instance(context, **args)\n', u' File "/opt/stack/new/nova/nova/network/rpcapi.py", line 184, in allocate_for_instance\n macs=jsonutils.to_primitive(macs))\n', u' File "/opt/stack/new/nova/nova/rpcclient.py", line 85, in call\n return self._invoke(self.proxy.call, ctxt, method, **kwargs)\n', u' File "/opt/stack/new/nova/nova/rpcclient.py", line 63, in _invoke\n return cast_or_call(ctxt, msg, **self.kwargs)\n', u' File "/opt/stack/new/nova/nova/openstack/common/rpc/proxy.py", line 130, in call\n exc.info, real_topic, msg.get(\'method\'))\n', u'Timeout: Timeout while waiting on RPC response - topic:...


Revision history for this message
Joe Gordon (jogo) wrote :

So it appears that using multi-threaded conductor is somehow making nova-network slower. One possible explanation is that this test runs on a fairly small box, and multi-threaded conductor adds too much overhead, slowing down all the other processes.
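
For context, the multi-threaded conductor being tested is enabled through the workers option in the [conductor] section of nova.conf. A minimal sketch (the value 4 is only illustrative; the failing job's actual setting is not recorded in this bug):

    [conductor]
    # Number of nova-conductor worker processes to fork; leaving this unset
    # keeps the old single-process behaviour. 4 is an illustrative value only.
    workers = 4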

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

should we try bumping up rpc_response_timeout?
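
For reference, that would be a one-line nova.conf change along these lines (a sketch; rpc_response_timeout defaults to 60 seconds, and 180 is only an illustrative bump, not a recommendation):

    [DEFAULT]
    # Seconds to wait for an RPC response before raising a Timeout;
    # the stock default is 60. 180 is an illustrative value only.
    rpc_response_timeout = 180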

Thierry Carrez (ttx)
tags: added: havana-backport-potential
removed: havana-rc-potential
Chuck Short (zulcss)
Changed in nova:
status: New → Confirmed
Revision history for this message
Joe Gordon (jogo) wrote :

I am not sure how you confirmed this one. We are running multi-threaded conductor in the gate and it is working fine. The original issues seen here were partially due to us overloading the test VMs in general, causing all kinds of random failures.

Changed in nova:
status: Confirmed → Invalid