multi-threaded nova-conductor performance

Bug #1224628 reported by Joe Gordon
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

The large-ops test is failing for multi-threaded nova-conductor. It is unclear if this is an actual performance regression or something else.

Test Patch:
https://review.openstack.org/#/c/45766/

Failing Jobs:

http://logs.openstack.org/66/45766/1/silent/gate-tempest-devstack-vm-large-ops/bac4fa9/
http://logs.openstack.org/66/45766/1/silent/gate-tempest-devstack-vm-large-ops/c9a6ab1/

Feature: https://review.openstack.org/#/c/42342/

Joe Gordon (jogo)
Changed in nova:
milestone: none → havana-rc1
Matt Riedemann (mriedem)
tags: added: conductor testing
Joe Gordon (jogo)
description: updated
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Also, I don't see 1224628 on the rechecks page.

Revision history for this message
Joe Gordon (jogo) wrote :

Dims, that is because large-ops is in the silent pipeline: it runs on every patch set but doesn't report back. And this does appear to be a repeatable test failure, but only for the unmerged devstack patch linked above.

The log URLs above show the two failed runs. This can be reproduced by running a recheck and watching Zuul / Jenkins for the test to fail, then finding the logs URL.
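
For example, assuming the standard Gerrit recheck comment syntax used by the gate at the time (an assumption, not something recorded in this bug), posting a review comment such as:

    recheck bug 1224628

re-runs the check jobs; the large-ops job can then be followed on the Zuul status page, and its log URL picked up from the Jenkins results comment once it fails.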

tags: added: havana-rc-proposed
Changed in nova:
milestone: havana-rc1 → none
tags: added: havana-rc-potential
removed: havana-rc-proposed
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Looking at logs from https://review.openstack.org/#/c/45766/ at
http://logs.openstack.org/66/45766/1/check/gate-tempest-devstack-vm-large-ops/a9795a0/

The test error:
BuildErrorException: Server %(server_id)s failed to build and is in ERROR status
Details: <Server: scenario-server-185564612-b5044da7-0f18-476b-a064-2014b2ab391e> failed to get to expected status. In ERROR state.

seems to be caused by:
screen-n-cpu.txt.gz:2013-10-08 18:51:31.843 ERROR nova.compute.manager [req-4314b9da-9257-4935-9660-044e13f70d93 tempest.scenario.manager165361106-user tempest.scenario.manager165361106-tenant] [instance: b5044da7-0f18-476b-a064-2014b2ab391e] Error: Timeout while waiting on RPC response - topic: "network", RPC method: "allocate_for_instance" info: "<unknown>"
screen-n-sch.txt.gz:2013-10-08 18:51:31.889 ERROR nova.scheduler.filter_scheduler [req-4314b9da-9257-4935-9660-044e13f70d93 tempest.scenario.manager165361106-user tempest.scenario.manager165361106-tenant] [instance: b5044da7-0f18-476b-a064-2014b2ab391e] Error from last host: devstack-precise-hpcloud-az3-561338 (node devstack-precise-hpcloud-az3-561338): [u'Traceback (most recent call last):\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1039, in _build_instance\n set_access_ip=set_access_ip)\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1448, in _spawn\n network_info.wait(do_raise=True)\n', u' File "/opt/stack/new/nova/nova/network/model.py", line 398, in wait\n self[:] = self._gt.wait()\n', u' File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 168, in wait\n return self._exit_event.wait()\n', u' File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait\n return hubs.get_hub().switch()\n', u' File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch\n return self.greenlet.switch()\n', u' File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main\n result = function(*args, **kwargs)\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 1230, in _allocate_network_async\n dhcp_options=dhcp_options)\n', u' File "/opt/stack/new/nova/nova/network/api.py", line 93, in wrapped\n return func(self, context, *args, **kwargs)\n', u' File "/opt/stack/new/nova/nova/network/api.py", line 49, in wrapper\n res = f(self, context, *args, **kwargs)\n', u' File "/opt/stack/new/nova/nova/network/api.py", line 300, in allocate_for_instance\n nw_info = self.network_rpcapi.allocate_for_instance(context, **args)\n', u' File "/opt/stack/new/nova/nova/network/rpcapi.py", line 184, in allocate_for_instance\n macs=jsonutils.to_primitive(macs))\n', u' File "/opt/stack/new/nova/nova/rpcclient.py", line 85, in call\n return self._invoke(self.proxy.call, ctxt, method, **kwargs)\n', u' File "/opt/stack/new/nova/nova/rpcclient.py", line 63, in _invoke\n return cast_or_call(ctxt, msg, **self.kwargs)\n', u' File "/opt/stack/new/nova/nova/openstack/common/rpc/proxy.py", line 130, in call\n exc.info, real_topic, msg.get(\'method\'))\n', u'Timeout: Timeout while waiting on RPC response - topic:...


Revision history for this message
Joe Gordon (jogo) wrote :

So it appears that using multi-threaded conductor is somehow making nova-network slower. One possible explanation is that this test runs on a fairly small box, and multi-threaded conductor adds too much overhead, slowing down all the other processes.
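
For context, the multi-threaded conductor being tested is enabled through the workers option in the [conductor] section of nova.conf. A minimal sketch (the value 4 is only illustrative; the failing job's actual setting is not recorded in this bug):

    [conductor]
    # Number of nova-conductor worker processes to fork; leaving this unset
    # keeps the old single-process behaviour. 4 is an illustrative value only.
    workers = 4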

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

should we try bumping up rpc_response_timeout?
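
For reference, that would be a one-line nova.conf change along these lines (a sketch; rpc_response_timeout defaults to 60 seconds, and 180 is only an illustrative bump, not a recommendation):

    [DEFAULT]
    # Seconds to wait for an RPC response before raising a Timeout;
    # the stock default is 60. 180 is an illustrative value only.
    rpc_response_timeout = 180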

Thierry Carrez (ttx)
tags: added: havana-backport-potential
removed: havana-rc-potential
Chuck Short (zulcss)
Changed in nova:
status: New → Confirmed
Revision history for this message
Joe Gordon (jogo) wrote :

I am not sure how you confirmed this one. We are running multi-threaded conductor in the gate and it is working fine. The original issues seen here were partially due to us overloading the test VMs in general, causing all kinds of random failures.

Changed in nova:
status: Confirmed → Invalid