Timeout while waiting on RPC response - topic: "network", RPC method: "allocate_for_instance" info: "<unknown>"

Bug #1257626 reported by Joe Gordon
This bug affects 12 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

http://logs.openstack.org/21/59121/6/check/gate-tempest-dsvm-large-ops/fdd1002/logs/screen-n-cpu.txt.gz?level=TRACE#_2013-12-04_06_20_16_658

2013-12-04 06:20:16.658 21854 ERROR nova.compute.manager [-] Instance failed network setup after 1 attempt(s)
<...>
2013-12-04 06:20:16.658 21854 TRACE nova.compute.manager Timeout: Timeout while waiting on RPC response - topic: "network", RPC method: "allocate_for_instance" info: "<unknown>"

It appears there has been a performance regression, and gate-tempest-dsvm-large-ops is now failing because of RPC timeouts to allocate_for_instance.

logstash query: message:"nova.compute.manager Timeout: Timeout while waiting on RPC response - topic: \"network\", RPC method: \"allocate_for_instance\""

There seems to have been a major rise in this bug on December 3rd.
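
For anyone unfamiliar with the failure mode, here is a minimal sketch of the RPC pattern that produces this error, written against the oslo.messaging API; the names, topic handling and timeout value are illustrative rather than nova's actual code:

    # Illustrative sketch only -- not nova's code. nova-compute makes a
    # blocking call() on the "network" topic; if no reply arrives within
    # the RPC response timeout, a timeout exception is raised, which is
    # what shows up in the compute log as the trace quoted above.
    import oslo_messaging as messaging
    from oslo_config import cfg

    transport = messaging.get_transport(cfg.CONF)
    target = messaging.Target(topic='network')
    client = messaging.RPCClient(transport, target)

    def allocate_for_instance(ctxt, instance_id, **kwargs):
        # prepare(timeout=...) bounds how long call() waits for the reply.
        cctxt = client.prepare(timeout=60)  # assumed timeout value
        try:
            return cctxt.call(ctxt, 'allocate_for_instance',
                              instance_id=instance_id, **kwargs)
        except messaging.MessagingTimeout:
            # Under load (e.g. large-ops booting many instances at once)
            # nova-network may simply not answer in time.
            raise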

Revision history for this message
Joe Gordon (jogo) wrote :

Marking as critical since this is hitting us in the gate.

Changed in nova:
milestone: none → icehouse-2
importance: Undecided → Critical
Revision history for this message
Joe Gordon (jogo) wrote :

elastic-recheck query: https://review.openstack.org/59919
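
For reference, a rough sketch of how one can check hit counts for this signature against the logstash Elasticsearch backend; the endpoint URL and index pattern below are assumptions, not taken from this report:

    # Rough sketch: count recent hits for the failure signature.
    from elasticsearch import Elasticsearch

    QUERY = ('message:"nova.compute.manager Timeout: Timeout while waiting on '
             'RPC response - topic: \\"network\\", RPC method: '
             '\\"allocate_for_instance\\""')

    # Assumed endpoint; adjust to the actual logstash deployment.
    es = Elasticsearch(['http://logstash.openstack.org/elasticsearch'])
    result = es.search(index='logstash-*', q=QUERY, size=0)
    print('hits:', result['hits']['total'])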

Changed in nova:
status: New → Triaged
Revision history for this message
Matt Riedemann (mriedem) wrote :

The e-r query for this isn't hitting, so I opened bug 1267271 against elastic-recheck for that.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Never mind, it looks like it is hitting; it reported on this patch today: https://review.openstack.org/#/c/57358/

tags: added: gate-failure network testing
Revision history for this message
Joe Gordon (jogo) wrote :

It looks like the most recent spike in this bug is due to the introduction of RAX high-performance nodes in the gate: https://review.openstack.org/#/c/65236/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/65784

Revision history for this message
Joe Gordon (jogo) wrote :

Looks like https://review.openstack.org/#/c/65760/ helped. This hasn't been seen outside of https://review.openstack.org/#/c/65989/.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/65784
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=831da3df616c2340f914d56c96c60b0f07cfa496
Submitter: Jenkins
Branch: master

commit 831da3df616c2340f914d56c96c60b0f07cfa496
Author: Dan Smith <email address hidden>
Date: Thu Jan 9 09:24:08 2014 -0800

    Avoid unnecessary use of rootwrap for some network commands

    Every time we run something as root with rootwrap, it takes about
    ten times longer (setup-wise anyway). For things that don't need
    to be run as root, we should avoid this hit. Nova network does
    this a lot and is also slow enough to cause trouble, so this
    patch attempts to address that for a few situations.

    Related-bug: #1257626

    Change-Id: Idc26776bf96ccfd9f50383e9d40aa47397d4e2cf
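
The idea in the commit message, as a hedged sketch (illustrative helpers, not the merged diff): only pass run_as_root=True, and therefore pay the rootwrap setup cost, for commands that genuinely need privileges.

    # Illustrative sketch of the approach described above, not the actual patch.
    # nova.utils.execute goes through rootwrap only when run_as_root=True, so
    # read-only queries should skip it and avoid the setup overhead.
    from nova import utils

    def device_exists(device):
        # Checking whether a link exists needs no privileges: run directly.
        _out, err = utils.execute('ip', 'link', 'show', 'dev', device,
                                  check_exit_code=False)
        return not err

    def set_device_mtu(device, mtu):
        # Changing the MTU does require root, so rootwrap is still used here.
        utils.execute('ip', 'link', 'set', device, 'mtu', str(mtu),
                      run_as_root=True)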

Revision history for this message
Russell Bryant (russellb) wrote :

I believe turning large-ops down to 50 from 100 instances was the solution for this. We were just maxing out the test nodes.
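
For context, a hedged illustration of the kind of load the large-ops job generates; this is not tempest's actual test code, and the credentials, image and flavor below are placeholders. A single boot request for TEMPEST_LARGE_OPS_NUMBER instances drives one allocate_for_instance call per instance to nova-network:

    # Placeholder illustration of the large-ops load pattern, not tempest code.
    from novaclient import client as nova_client

    LARGE_OPS_NUMBER = 50  # turned down from 100, per the comment above

    nova = nova_client.Client('2', 'demo', 'secret', 'demo',
                              'http://127.0.0.1:5000/v2.0')  # placeholders
    nova.servers.create(name='large-ops',
                        image='<image-id>',    # placeholder
                        flavor='<flavor-id>',  # placeholder
                        min_count=LARGE_OPS_NUMBER,
                        max_count=LARGE_OPS_NUMBER)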

Changed in nova:
status: Triaged → Invalid
milestone: icehouse-2 → icehouse-3
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → none
Revision history for this message
Christopher Yeoh (cyeoh-0) wrote :

Looks like this has come back again. TEMPEST_LARGE_OPS_NUMBER has not changed from 50, so something else is triggering it.

Revision history for this message
Ryan Hsu (rhsu) wrote :

VMware Minesweeper CI has been experiencing 100% build failures since around 6 PM PST yesterday due to this error message. Logs from one of the afflicted runs are here: http://10.148.255.241/logs/nova/67581/5/.

Revision history for this message
Ryan Hsu (rhsu) wrote :

Sorry, wrong URL. This is the correct link: http://208.91.1.172/logs/nova/67581/5/

Revision history for this message
Joe Gordon (jogo) wrote :

Christopher, it appeared to come back, but all the hits were in the check queue.

Revision history for this message
Alan Pevec (apevec) wrote :

Hit in the gate queue: https://review.openstack.org/71230

Revision history for this message
Attila Fazekas (afazekas) wrote :
Changed in nova:
status: Invalid → Confirmed
Revision history for this message
Joe Gordon (jogo) wrote :

In your example it looks like nova-network didn't start up.

Changed in nova:
status: Confirmed → Invalid
Revision history for this message
jazeltq (jazeltq-k) wrote :

Can someone also fix this on the Havana release?
