OpenStack Compute (nova)

Floating IP takes too long to update in nova and even longer for multiple VMs

Bug #1262529 reported by Yair Fried on 2013-12-19

This bug affects 9 people

Affects	Status	Importance	Assigned to
OpenStack Compute (nova)	Fix Released	Undecided	Unassigned
devstack	Invalid	Undecided	Unassigned
neutron	Fix Released	Medium	Unassigned
tempest	Invalid	Undecided	Unassigned

Bug Description

After a floating IP is associated through neutron, it takes too long for the address to show up in the VM's details ('nova show' or 'compute_client.servers.get()'), and even longer when more than one VM is involved.

When launching 2 VMs with floating IPs, you can see in the log that the check passes once:
"unchecked floating IPs: {}"
and then fails:
"Timed out while waiting for the floating IP assignments to propagate"

http://logs.openstack.org/01/55101/28/check/check-tempest-dsvm-neutron/0541dff/console.html
http://logs.openstack.org/01/55101/28/check/check-tempest-dsvm-neutron/f383f4b/console.html
http://logs.openstack.org/01/55101/31/check/check-tempest-dsvm-neutron/321413a/console.html
http://logs.openstack.org/97/62697/5/check/check-tempest-dsvm-neutron/960c6ad/console.html

Also, the floating IP is reachable long before it is updated in the nova DB.

How to reproduce:
https://review.openstack.org/#/c/62697/

So the problem is both:
1. the time it takes for nova to get the update
and
2. the timeout defined in the tempest neutron-gate

Since I don't see this in my local setup (rhos-4.0), I don't know whether it is caused by load on neutron or nova, or whether it's a devstack issue.
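The wait that times out in the logs can be pictured as a polling loop over the servers whose floating IPs have not yet shown up in nova. The following is a minimal sketch, not the tempest code: `wait_for_floating_ips` and the `get_nova_ips` callable are hypothetical names, and the clock and sleep functions are injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Dict, Set


def wait_for_floating_ips(
    get_nova_ips: Callable[[str], Set[str]],
    expected: Dict[str, str],
    timeout: float = 120.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until every server reports its floating IP or the timeout expires.

    `get_nova_ips` is a hypothetical callable returning the addresses nova
    currently shows for a server. The return value is the server_id ->
    floating_ip mapping that was still unchecked when polling stopped;
    an empty dict means every assignment propagated.
    """
    unchecked = dict(expected)
    deadline = clock() + timeout
    while unchecked:
        # Drop every server whose floating IP is now visible in nova.
        for server_id, floating_ip in list(unchecked.items()):
            if floating_ip in get_nova_ips(server_id):
                del unchecked[server_id]
        if not unchecked or clock() >= deadline:
            break
        sleep(interval)
    return unchecked
```

With more servers in `expected`, the loop only finishes when the slowest one has propagated, which matches the observation that the problem gets worse with more than one VM.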

Yair Fried (yfried) wrote on 2013-12-19:

Jenkins log files (1.4 MiB, application/x-tar)

summary:

- Floating IP takes too long to update in nova and even longer
+ Floating IP takes too long to update in nova and even longer for
+ multiple VMs

Yair Fried (yfried) on 2013-12-19

description:

updated

Adalberto Medeiros (adalbas) wrote on 2013-12-19:

Does not seem to be a tempest issue.

Changed in tempest:
status:	New → Invalid

Yair Fried (yfried) wrote on 2013-12-19:

@adalbas this is a tempest issue because the test needs to change, as agreed on the mailing list

Changed in tempest:
status:	Invalid → Incomplete

Feng Ju (jufeng) wrote on 2013-12-22:

Hi Yair,
As far as I know, this is what happens when you use quantum/neutron to associate a floating IP:
1. First, neutron-server updates the FloatingIP table in the database (so the `neutron floatingip-show` command already shows the floating IP as associated with the port, although the association has not actually taken effect yet), then notifies the l3-agent to update the qrouter device and NAT rules. (https://github.com/openstack/neutron/blob/master/neutron/db/l3_db.py#L659)
2. The l3-agent periodically updates routers (associates the floating IP with the qrouter device and applies the NAT rules) every 60 seconds, which is the default value (https://github.com/openstack/neutron/blob/master/neutron/agent/l3_agent.py#L733)
After this step, the floating IP can be pinged.
3. nova-compute also periodically updates the InstanceInfoCache table, which is what the `nova show <instance>` command queries, every 60 seconds, which is the default value. (https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4391)
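The arithmetic behind these steps can be written out as follows. This is only a sketch under a simplifying assumption that is not confirmed anywhere in this report: each periodic task has just run when the association is made, and the nova cache refresh only picks up the change after the l3-agent pass, so the two intervals add up. The function name is hypothetical; the 60-second values are the defaults quoted in steps 2 and 3.

```python
# Default periodic intervals quoted in steps 2 and 3, in seconds.
L3_AGENT_INTERVAL = 60
NOVA_CACHE_INTERVAL = 60


def worst_case_delays(l3_interval=L3_AGENT_INTERVAL,
                      nova_interval=NOVA_CACHE_INTERVAL):
    """Return (seconds until pingable, seconds until visible in nova).

    Assumes the two periodic tasks run one after the other, which is a
    simplification made for this sketch.
    """
    pingable = l3_interval
    visible_in_nova = l3_interval + nova_interval
    return pingable, visible_in_nova


pingable, visible_in_nova = worst_case_delays()
print(f"pingable after at most {pingable}s, "
      f"shown by nova after at most {visible_in_nova}s")
```

Under this assumption, any test timeout shorter than the sum of the two intervals can expire even when both services behave normally.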

My guess is that the l3-agent or nova-compute runs into trouble when running its periodic tasks:
maybe l3-agent/nova-compute is too busy to run the periodic tasks;
maybe the periodic task raises exceptions.

As for the timeout value tempest uses to test the floating IP, I think 120 seconds is the minimum.
If we want to find out which component keeps the floating IP from appearing in the `nova show` output,
we can first check step 2 (this step is not easy to check; perhaps check floating IP connectivity with a timeout of 60 seconds, the default interval),
then check step 3.
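The two-stage check suggested above could be sketched like this. It is an illustration only: `wait_until`, `diagnose`, `is_pingable` and `shown_in_nova` are hypothetical names, the two probes are left to the caller, and the 60-second per-step timeout is the default interval mentioned in steps 2 and 3.

```python
import time
from typing import Callable, Optional


def wait_until(check: Callable[[], bool], timeout: float,
               interval: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Return the seconds elapsed until check() was true, or None on timeout."""
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


def diagnose(is_pingable: Callable[[], bool],
             shown_in_nova: Callable[[], bool],
             step_timeout: float = 60.0, **wait_kwargs) -> str:
    """Name the stage that failed to catch up within step_timeout."""
    # Step 2: the l3-agent must make the floating IP reachable.
    if wait_until(is_pingable, step_timeout, **wait_kwargs) is None:
        return "l3-agent"
    # Step 3: nova-compute must refresh what `nova show` reports.
    if wait_until(shown_in_nova, step_timeout, **wait_kwargs) is None:
        return "nova-compute"
    return "ok"
```

Checking connectivity first keeps the two delays apart, so a slow nova refresh is not mistaken for a networking failure.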

Maybe my understanding of floating IPs is not completely correct, so what do you think?

Yair Fried (yfried) wrote on 2013-12-22:

I'm thinking:

Tempest solution:
1. Removing "wait_for_floating_ip" from network_basic_ops should be the first step, as it doesn't check network connectivity and fails the test for no reason
2. Create a scenario that will test both DBs (neutron and nova) and try to stress them (as described on the mailing list)

Analyzing the new scenario should tell us how to solve the issue in neutron/nova
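The core comparison such a scenario would need can be sketched as below. This is an assumption-laden illustration rather than the proposed test: `unpropagated_floating_ips` is a hypothetical helper, and both arguments are plain mappings from server ID to floating IPs that the caller would fill from neutron and from nova respectively.

```python
from typing import Dict, Iterable, Set


def unpropagated_floating_ips(
    neutron_view: Dict[str, Iterable[str]],
    nova_view: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Return floating IPs neutron reports for a server that nova does not show yet."""
    missing: Dict[str, Set[str]] = {}
    for server_id, ips in neutron_view.items():
        # A server absent from the nova view counts as showing no addresses.
        diff = set(ips) - set(nova_view.get(server_id, ()))
        if diff:
            missing[server_id] = diff
    return missing


# Example data invented for this sketch.
neutron_view = {"vm-1": {"172.24.4.10"}, "vm-2": {"172.24.4.11"}}
nova_view = {"vm-1": {"172.24.4.10"}, "vm-2": set()}
print(unpropagated_floating_ips(neutron_view, nova_view))
```

Recording this difference over time while adding more VMs would show whether the lag grows with load.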

Yair Fried (yfried) wrote on 2013-12-22:

This patch removes this test from network_basic_ops

Yair Fried (yfried) wrote on 2013-12-22:

^ https://review.openstack.org/#/c/63627/ ^

Salvatore Orlando (salvatore-orlando) wrote on 2013-12-29:

Hi Yair,

I agree with your suggestion about removing the check in network basic ops.
In principle I also agree with your advice about creating a new scenario for 'association' (in my opinion it's not actually an association, but let's stick to the point!). However, this effort should be coupled with a parallel effort to solve the underlying problem in nova, which is perhaps connected to the way neutron floating IP updates are propagated back to nova.

Alexander Ignatov (aignatov) wrote on 2014-02-07:

The Savanna team faced the same issue during integration testing when using novaclient, as compared with neutronclient:
https://bugs.launchpad.net/savanna/+bug/1277501

Mark McClain (markmcclain) on 2014-03-12

Changed in neutron:
status:	New → Triaged
tags:	added: nova-neutron
Changed in neutron:
importance:	Undecided → Medium

Yair Fried (yfried) wrote on 2014-03-13:

#10

I think the Tempest portion can be closed - it remains a neutron-nova issue

Phil Hopkins (phil-hopkins-a) wrote on 2014-03-19:

#11

I too have been having a problem: it may take up to 15 minutes for a floating IP to appear on an instance when I run a nova list command after associating the floating IP with the instance.

No errors appear in any of the log files.

I am running Havana using Neutron ML2/Open vSwitch on Ubuntu 12.04.

My environment is a controller node, a network node and 7 compute/hypervisor nodes.

The controller, network and 5 hypervisor nodes are dual proc 12 core Xeon processors with 72G of RAM.
The other two hypervisor nodes only have one 12 core proc with 72G RAM.

I have a script that I run on the controller node which creates a project, two users in the project, 4 networks, modifies the security group rules, starts 4 instances that attach to some or all of the networks and then assigns a floating IP to one interface on one VM.

This script may loop to create up to 25 of these environments.

Watching nova list --all-tenants, it can take 20 to 30 minutes for all of the VMs to have their floating IPs appear. The floating IPs seem to work right away, but the delay before they show up in nova list is strange.

Craig Anderson (canderso) wrote on 2014-03-19:

#12

Can confirm that this also affects me, using Neutron OVS on Ubuntu 12.04 (Havana). Had the same problem on Grizzly as well (using quantum linuxbridge on Ubuntu 12.04).

Mauro S M Rodrigues (maurorodrigues) wrote on 2014-03-19:

#13

As per yfried's comment, closing the tempest portion.

Mauro S M Rodrigues (maurorodrigues) on 2014-03-19

Changed in tempest:
status:	Incomplete → Invalid

Aaron Rosen (arosen) wrote on 2014-06-03:

#14

Fixed in Icehouse with the addition of nova-neutron events
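With such events, neutron can tell nova that an instance's networking changed instead of nova waiting for its periodic cache refresh. The sketch below only builds a notification body and sends nothing; it assumes the request shape of nova's os-server-external-events API (a list of events, each with a `name` and a `server_uuid`) as I understand it, and the helper name and the UUID are made up for illustration.

```python
import json


def network_changed_event(server_uuid: str) -> dict:
    """Build a request body announcing a network change for one server.

    The field names follow nova's os-server-external-events API as assumed
    in this sketch; check the API reference before relying on them.
    """
    return {
        "events": [
            {"name": "network-changed", "server_uuid": server_uuid},
        ]
    }


# Placeholder UUID used only for this example.
body = network_changed_event("00000000-0000-0000-0000-000000000001")
print(json.dumps(body, indent=2))
```

On receiving an event like this, nova can refresh the instance's network info right away rather than on the next periodic run.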

Changed in nova:
status:	New → Invalid
status:	Invalid → Fix Released
Changed in devstack:
status:	New → Invalid

Eugene Nikanorov (enikanorov) wrote on 2014-11-21:

#15

Changing to Fix Released per Aaron's comment.

Changed in neutron:
status:	Triaged → Fix Released

Duplicates of this bug

Bug #1262234