Nova throws 400 when attempting to add floating ip (instance.info_cache.network_info is empty)

Bug #1249065 reported by Luis A. Garcia
This bug affects 36 people
Affects                    Status        Importance  Assigned to      Milestone
OpenStack Compute (nova)   Fix Released  Medium      Matt Riedemann
  Kilo                     Fix Released  Undecided   Unassigned

Bug Description

Ran into this problem in check-tempest-devstack-vm-neutron

 Traceback (most recent call last):
   File "tempest/scenario/test_snapshot_pattern.py", line 74, in test_snapshot_pattern
     self._set_floating_ip_to_server(server, fip_for_server)
   File "tempest/scenario/test_snapshot_pattern.py", line 62, in _set_floating_ip_to_server
     server.add_floating_ip(floating_ip)
   File "/opt/stack/new/python-novaclient/novaclient/v1_1/servers.py", line 108, in add_floating_ip
     self.manager.add_floating_ip(self, address, fixed_address)
   File "/opt/stack/new/python-novaclient/novaclient/v1_1/servers.py", line 465, in add_floating_ip
     self._action('addFloatingIp', server, {'address': address})
   File "/opt/stack/new/python-novaclient/novaclient/v1_1/servers.py", line 993, in _action
     return self.api.client.post(url, body=body)
   File "/opt/stack/new/python-novaclient/novaclient/client.py", line 234, in post
     return self._cs_request(url, 'POST', **kwargs)
   File "/opt/stack/new/python-novaclient/novaclient/client.py", line 213, in _cs_request
     **kwargs)
   File "/opt/stack/new/python-novaclient/novaclient/client.py", line 195, in _time_request
     resp, body = self.request(url, method, **kwargs)
   File "/opt/stack/new/python-novaclient/novaclient/client.py", line 189, in request
     raise exceptions.from_response(resp, body, url, method)
 BadRequest: No nw_info cache associated with instance (HTTP 400) (Request-ID: req-9fea0363-4532-4ad1-af89-114cff68bd89)

Full console logs here: http://logs.openstack.org/27/55327/3/check/check-tempest-devstack-vm-neutron/8d26d3c/console.html

Luis A. Garcia (luisg-8)
affects: oslo → nova
description: updated
Revision history for this message
Luis A. Garcia (luisg-8) wrote :

Ran into another stack trace in the same test case:

2013-11-11 18:44:12.913 | Traceback (most recent call last):
2013-11-11 18:44:12.914 | File "tempest/scenario/test_snapshot_pattern.py", line 90, in test_snapshot_pattern
2013-11-11 18:44:12.914 | self._check_timestamp(fip_for_snapshot.ip)
2013-11-11 18:44:12.914 | File "tempest/scenario/test_snapshot_pattern.py", line 52, in _check_timestamp
2013-11-11 18:44:12.915 | ssh_client = self._ssh_to_server(server_or_ip)
2013-11-11 18:44:12.915 | File "tempest/scenario/test_snapshot_pattern.py", line 43, in _ssh_to_server
2013-11-11 18:44:12.915 | linux_client = self.get_remote_client(server_or_ip)
2013-11-11 18:44:12.915 | File "tempest/scenario/manager.py", line 466, in get_remote_client
2013-11-11 18:44:12.916 | return RemoteClient(ip, username, pkey=private_key)
2013-11-11 18:44:12.916 | File "tempest/common/utils/linux/remote_client.py", line 47, in __init__
2013-11-11 18:44:12.917 | if not self.ssh_client.test_connection_auth():
2013-11-11 18:44:12.917 | File "tempest/common/ssh.py", line 148, in test_connection_auth
2013-11-11 18:44:12.918 | connection = self._get_ssh_connection()
2013-11-11 18:44:12.918 | File "tempest/common/ssh.py", line 76, in _get_ssh_connection
2013-11-11 18:44:12.918 | password=self.password)
2013-11-11 18:44:12.918 | SSHTimeout: Connection to the 172.24.4.229 via SSH timed out.
2013-11-11 18:44:12.919 | User: cirros, Password: None

Revision history for this message
Joe Gordon (jogo) wrote :

logstash query: filename:"logs/screen-n-api.txt" AND message:"No nw_info cache associated with instance"

Changed in nova:
status: New → Confirmed
Revision history for this message
Joe Gordon (jogo) wrote :

This only happens when using neutron

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

This can be an issue within nova/network/neutronv2/api.py

I'll take this on myself for triaging.

Changed in neutron:
assignee: nobody → Salvatore Orlando (salvatore-orlando)
Revision history for this message
Anita Kuno (anteaya) wrote :
Changed in neutron:
assignee: Salvatore Orlando (salvatore-orlando) → Aaron Rosen (arosen)
Revision history for this message
Aaron Rosen (arosen) wrote :

This is caused by the cache not being updated here: http://codepad.org/RhoXf936

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/57711

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/57921

Revision history for this message
Joe Gordon (jogo) wrote : Re: Tempest failure: tempest/scenario/test_snapshot_pattern.py

Setting to critical because this is destabilizing the gate

Changed in nova:
importance: Undecided → Critical
milestone: none → icehouse-1
Revision history for this message
Anita Kuno (anteaya) wrote :

https://review.openstack.org/#/c/57711/ is still open for review/approval, 57921 has been abandoned.

Aaron Rosen (arosen)
Changed in nova:
assignee: nobody → Aaron Rosen (arosen)
no longer affects: neutron
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/57711
Committed: http://github.com/openstack/nova/commit/4c03383f21bc13caf3fed4db5aa1317d37758d5c
Submitter: Jenkins
Branch: master

commit 4c03383f21bc13caf3fed4db5aa1317d37758d5c
Author: Aaron Rosen <email address hidden>
Date: Thu Nov 21 07:57:44 2013 -0800

    Do not hide exception in update_instance_cache_with_nw_info

    From time to time an exception is raised in this method causing
    the nw_info cache not to be saved. If this occurs we should raise
    as this error will cause later errors to occur. For example, one
    won't be able to associate a floatingip with the instance as there
    is no nw_info found in this table. In addition, the fixed_ips on
    the instance won't be returned via the api.

    This patch also stubs out update_instance_cache_with_nw_info in a
    few tests where an exception was being raised previously but went
    unnoticed as it was not reraised but now is.

    Related-Bug: #1252849
    Related-Bug: #1249065

    Change-Id: Ic860f72210ba736e11c10df21c4cb7625e9c0928
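
For illustration, a minimal sketch (not nova's actual code) of the pattern this commit describes: the old behaviour logged and swallowed failures while refreshing the cache, the new behaviour logs and re-raises. The refresh/save callables are hypothetical stand-ins for the real network API and DB calls.

    import logging

    LOG = logging.getLogger(__name__)

    def update_cache_hiding_errors(refresh, save):
        """Old behaviour: a failure is swallowed and the nw_info cache stays empty."""
        try:
            save(refresh())
        except Exception:
            LOG.exception("Failed storing info cache")  # hidden from the caller

    def update_cache_reraising(refresh, save):
        """New behaviour: log and re-raise so the caller sees the failure
        instead of hitting 'No nw_info cache' much later."""
        try:
            save(refresh())
        except Exception:
            LOG.exception("Failed storing info cache")
            raise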

Changed in nova:
milestone: icehouse-1 → icehouse-2
Revision history for this message
Alan Pevec (apevec) wrote : Re: Tempest failure: tempest/scenario/test_snapshot_pattern.py

SSH timeout from comment 1 is tracked in bug 1249116

Sean Dague (sdague)
summary: - Tempest failure: tempest/scenario/test_snapshot_pattern.py
+ Nova throws 400 when attempting to add floating ip (No nw_info cache
+ associated with instance)
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → icehouse-3
Revision history for this message
Attila Fazekas (afazekas) wrote : Re: Nova throws 400 when attempting to add floating ip (No nw_info cache associated with instance)

test_rescued_vm_associate_dissociate_floating_ip test case also triggers this issue.

Revision history for this message
Joe Gordon (jogo) wrote :

Only 4 hits in the last 14 days; dropping down from critical status.

Changed in nova:
importance: Critical → High
Revision history for this message
Jeremy Stanley (fungi) wrote :
Revision history for this message
Aaron Rosen (arosen) wrote :

Whoops, here's the backport that I believe fixes this issue: https://review.openstack.org/73202

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/havana)

Related fix proposed to branch: stable/havana
Review: https://review.openstack.org/73202

Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → icehouse-rc1
Changed in nova:
milestone: icehouse-rc1 → none
importance: High → Medium
tags: added: icehouse-rc-potential network
Revision history for this message
Joe Gordon (jogo) wrote : Re: Nova throws 400 when attempting to add floating ip (No nw_info cache associated with instance)

Re-adding neutron as this only happens with nova+neutron. If this is really just a case of nova being silly, please feel free to mark it as invalid/remove neutron.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/80427

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/80427
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a798282be785a673ef200063ba114c91b9cad683
Submitter: Jenkins
Branch: master

commit a798282be785a673ef200063ba114c91b9cad683
Author: Dan Smith <email address hidden>
Date: Thu Mar 13 17:14:53 2014 -0700

    Make sure instance saves network_info when we go ACTIVE

    This makes sure that when we do a save() on our instance during
    creation, that we don't save the empty network_info we created
    earlier over a potentially-full one built during the process.
    This may happen since we wrap the networkinfo in the async helper.
    This also moves the network_info.wait() out from under a conditional
    which could cover up other issues leading to a case where it is
    None (which shouldn't be possible). If we didn't call wait(),
    then saving it wouldn't be productive or correct.

    Change-Id: I9bd8dd0dce9e17cd128c7495acd52aa82119c57c
    Related-bug: 1249065
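
As a rough illustration of what the commit describes (simplified, not the actual nova flow; the allocate callable is a hypothetical stand-in for the async network allocation):

    from concurrent.futures import ThreadPoolExecutor

    def build_and_save(instance, allocate):
        instance["network_info"] = []                   # empty placeholder
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(allocate)              # async allocation
            # Unconditional wait: allocation errors surface here, and the
            # empty placeholder is never saved over the real network_info.
            instance["network_info"] = future.result()
        return instance

    # e.g. build_and_save({"uuid": "fake"}, lambda: [{"address": "10.1.0.5"}])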

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/73202
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=53acc09fb9b3ffe7c69bdc484c8cf56032182e28
Submitter: Jenkins
Branch: stable/havana

commit 53acc09fb9b3ffe7c69bdc484c8cf56032182e28
Author: Aaron Rosen <email address hidden>
Date: Thu Nov 21 07:57:44 2013 -0800

    Do not hide exception in update_instance_cache_with_nw_info

    From time to time an exception is raised in this method causing
    the nw_info cache not to be saved. If this occurs we should raise
    as this error will cause later errors to occur. For example, one
    won't be able to associate a floatingip with the instance as there
    is no nw_info found in this table. In addition, the fixed_ips on
    the instance won't be returned via the api.

    This patch also stubs out update_instance_cache_with_nw_info in a
    few tests where an exception was being raised previously but went
    unnoticed as it was not reraised but now is.

    Related-Bug: #1252849
    Related-Bug: #1249065

    Change-Id: Ic860f72210ba736e11c10df21c4cb7625e9c0928
    (cherry picked from commit 4c03383f21bc13caf3fed4db5aa1317d37758d5c)

tags: added: in-stable-havana
Alan Pevec (apevec)
tags: removed: in-stable-havana
Thierry Carrez (ttx)
tags: added: icehouse-backport-potential
removed: icehouse-rc-potential
Revision history for this message
Eugene Nikanorov (enikanorov) wrote : Re: Nova throws 400 when attempting to add floating ip (No nw_info cache associated with instance)

Looks like neutron can be removed from the bug, as everything seems to be fixed on the nova side.

no longer affects: neutron
Brent Eagles (beagles)
tags: added: neutron
David Britton (dpb)
tags: added: landscape
Revision history for this message
rushi.ns (murthy-nsm) wrote :

Hello Support,

I have the same issue while associating an IP address to an instance.

Could you please tell me what the fix is? It's quite urgent and I am stuck.

I followed the link http://docs.openstack.org/icehouse/install-guide/install/zypper/content/launch-instance-neutron.html to install OpenStack Icehouse with nova. Can you help?

BR.
Rushi.

Revision history for this message
Attila Fazekas (afazekas) wrote :

2015-01-19 14:44:56.738 DEBUG nova.api.openstack.wsgi [req-e07c9571-e408-4cb0-b17f-74da537029c0 TestSnapshotPattern-1814883997 TestSnapshotPattern-1557289366] Returning 400 to user: No nw_info cache associated with instance __call__ /opt/stack/new/nova/nova/api/openstack/wsgi.py:1159

I saw the above message in a nova-network setup with the master branch installed at '2015-01-16 03:17:09.406'.
The log does not have a stack trace.
I do not see a similar message with nova-network in the past week on logstash.

Revision history for this message
George Silvis, III (george-iii-silvis) wrote :

I'm hitting this too, intermittently, on Icehouse

/var/log/nova/nova-api.log:2015-02-23 14:51:05.866 5072 INFO nova.api.openstack.wsgi [req-bccfea1b-1bba-4e79-ab11-2906d9cd0b8a b3c2566d88a44dc0948c569147e861a2 f88f2cc107914b90a86820252a71c270] HTTP exception thrown: No nw_info cache associated with instance

Revision history for this message
Tom Fifield (fifieldt) wrote :

Just hit this bug as well with Icehouse:

Exception: 400 Bad Request No nw_info cache associated with instance

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in nova:
assignee: Aaron Rosen (arosen) → nobody
no longer affects: nova/havana
Matt Riedemann (mriedem)
tags: removed: icehouse-backport-potential landscape neutron
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/171390

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/171391

Revision history for this message
Alex Xu (xuhj) wrote : Re: Nova throws 400 when attempting to add floating ip (No nw_info cache associated with instance)

I dug into this case: http://logs.openstack.org/18/170118/5/check/check-tempest-dsvm-multinode-full/5f871c2/logs/screen-n-net.txt.gz#_2015-04-24_09_30_55_207

I think there is a race between allocating a fixed IP for the instance and the dhcpbridge script.

The fixed IP allocation begins here:

2015-04-24 09:30:55.055 DEBUG nova.network.manager [req-543f9791-de80-40e3-90af-90a8ff30bb40 TestShelveInstance-2072004359 TestShelveInstance-310454229] [instance: c0d42ce0-c38d-4ed3-ad05-352c57ae96e5] Networks retrieved for instance: |[{'bridge': u'br100', 'multi_host': True, 'should_create_bridge': True, 'bridge_interface': u'br_flat', 'tenant_id': None, 'label': u'private', 'id': 'f17001ac-a843-4bc3-923f-44f9800f7aaf'}]| allocate_for_instance /opt/stack/new/nova/nova/network/manager.py:486

And it ends here:
2015-04-24 09:30:55.396 DEBUG nova.network.manager [req-543f9791-de80-40e3-90af-90a8ff30bb40 TestShelveInstance-2072004359 TestShelveInstance-310454229] Built network info: |[]| get_instance_nw_info /opt/stack/new/nova/nova/network/manager.py:650

The end shows the network info is empty, but there isn't any error here. So the problem is that the fixed IP was allocated, yet by the end it has disappeared.

And there is a log entry between the beginning and the end:
2015-04-24 09:30:55.207 DEBUG nova.network.manager [req-36fe2852-902d-48b0-935f-f75f7ca68b37 None None] Released IP |10.1.0.7| release_fixed_ip

This 'release_fixed_ip' can only be triggered by the dhcpbridge script:
https://github.com/openstack/nova/blob/master/nova/cmd/dhcpbridge.py#L67

The dhcpbridge script is called by dnsmasq.

So the issue is that, in the middle of allocating the IP, the dhcpbridge script releases that same IP, which was used by a previous instance, at the same time.
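
To make that callback path concrete, here is a heavily simplified sketch (the network_manager object and the exact action handling are assumptions for illustration; see nova/cmd/dhcpbridge.py for the real code). dnsmasq invokes its --dhcp-script helper with an action ('add', 'old' or 'del'), a MAC and an IP:

    def dhcpbridge_main(argv, network_manager):
        action, mac, ip = argv[1], argv[2], argv[3]
        if action == "del":
            # Lease gone for the *previous* instance, but this call can race
            # with allocate_fixed_ip for a new instance that was just handed
            # the same address.
            network_manager.release_fixed_ip(ip)
        elif action in ("add", "old"):
            network_manager.lease_fixed_ip(ip)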

Revision history for this message
jichenjc (jichenjc) wrote :

There is a log above:

2015-04-24 09:30:50.695 WARNING nova.network.manager [req-946bbdde-ecbd-4a32-9bd8-454d947d0ccf None None] IP |10.1.0.7| leased that isn't allocated

It seems we didn't check whether allocated = True in the release function triggered by dhcp_release?

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
jichenjc (jichenjc) wrote :

OK, then this 'WARNING' is really confusing...
The same IP was released right above the log line that contains the error info, as Alex said,
and it's a warning, so I thought it might be related.

Let me read more of the logs you added in #32.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/181035
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c13f42e36cddd820b475ffecf1f3a9e894d0cc04
Submitter: Jenkins
Branch: master

commit c13f42e36cddd820b475ffecf1f3a9e894d0cc04
Author: Dan Smith <email address hidden>
Date: Thu May 7 08:42:49 2015 -0700

    Log the actual instance.info_cache when empty in floating ip associate

    This attempts to add some enlightenment to the case where we try to
    associate a floating ip for an instance and hit the case where there
    is no info_cache. We need to know if it's actually None, or empty.

    Related-Bug: #1249065

    Change-Id: Ic0051d1d520f800d845b7006b5b95be5d8edf7c9

Revision history for this message
Matt Riedemann (mriedem) wrote : Re: Nova throws 400 when attempting to add floating ip (No nw_info cache associated with instance)

Got 1 hit in 7 days with the new logging:

http://logs.openstack.org/69/190169/3/check/check-tempest-dsvm-full-glusterfs-nv/eecdc4e/logs/screen-n-api.txt.gz#_2015-06-11_14_40_45_576

The instance.info_cache is not None, but its network_info attribute is empty:

2015-06-11 14:40:45.576 WARNING nova.api.openstack.compute.contrib.floating_ips [req-b21b23d2-fc5b-4520-84e8-596f994f7873 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Info cache is InstanceInfoCache(created_at=2015-06-11T14:39:49Z,deleted=False,deleted_at=None,instance_uuid=a12ee0d4-5747-4e40-9d22-d956afcbe490,network_info=NetworkModel(),updated_at=None) during associate
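
For context, a minimal sketch of the kind of check behind that 400 (simplified, not the exact nova API code): an info cache that exists but whose network_info contains no VIFs is treated the same as a missing cache.

    def cached_nw_info_or_400(instance):
        cached = (instance.get("info_cache") or {}).get("network_info") or []
        if not cached:
            # surfaces to the client as HTTP 400
            raise ValueError("No nw_info cache associated with instance")
        return cached

    # cached_nw_info_or_400({"info_cache": {"network_info": []}}) raises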

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/171391
Reason: The instance.info_cache is not None, but the network_info attribute inside it is empty:

http://logs.openstack.org/69/190169/3/check/check-tempest-dsvm-full-glusterfs-nv/eecdc4e/logs/screen-n-api.txt.gz#_2015-06-11_14_40_45_576

The instance.info_cache is not None, but its network_info attribute is empty:

2015-06-11 14:40:45.576 WARNING nova.api.openstack.compute.contrib.floating_ips [req-b21b23d2-fc5b-4520-84e8-596f994f7873 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Info cache is InstanceInfoCache(created_at=2015-06-11T14:39:49Z,deleted=False,deleted_at=None,instance_uuid=a12ee0d4-5747-4e40-9d22-d956afcbe490,network_info=NetworkModel(),updated_at=None) during associate

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/171390
Reason: The instance.info_cache is not None, but the network_info attribute inside it is empty:
http://logs.openstack.org/69/190169/3/check/check-tempest-dsvm-full-glusterfs-nv/eecdc4e/logs/screen-n-api.txt.gz#_2015-06-11_14_40_45_576
The instance.info_cache is not None, but its network_info attribute is empty:
2015-06-11 14:40:45.576 WARNING nova.api.openstack.compute.contrib.floating_ips [req-b21b23d2-fc5b-4520-84e8-596f994f7873 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Info cache is InstanceInfoCache(created_at=2015-06-11T14:39:49Z,deleted=False,deleted_at=None,instance_uuid=a12ee0d4-5747-4e40-9d22-d956afcbe490,network_info=NetworkModel(),updated_at=None) during associate

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → nobody
status: In Progress → Confirmed
summary: - Nova throws 400 when attempting to add floating ip (No nw_info cache
- associated with instance)
+ Nova throws 400 when attempting to add floating ip
+ (instance.info_cache.network_info is empty)
Revision history for this message
Matt Riedemann (mriedem) wrote :

I see this in the compute logs for the instance that failed:

http://logs.openstack.org/69/190169/3/check/check-tempest-dsvm-full-glusterfs-nv/eecdc4e/logs/screen-n-cpu.txt.gz#_2015-06-11_14_39_50_212

2015-06-11 14:39:50.212 INFO nova.virt.libvirt.driver [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Using config drive
2015-06-11 14:39:50.482 16662 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "refresh_cache-a12ee0d4-5747-4e40-9d22-d956afcbe490" lock /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:391
2015-06-11 14:39:50.482 16662 DEBUG nova.network.base_api [-] Updating cache with info: [] update_instance_cache_with_nw_info /opt/stack/new/nova/nova/network/base_api.py:42
2015-06-11 14:39:50.494 16662 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "refresh_cache-a12ee0d4-5747-4e40-9d22-d956afcbe490" lock /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:404

The instance uuid is a12ee0d4-5747-4e40-9d22-d956afcbe490.

This is a nova-network job so something is coming back as [] from _get_instance_nw_info.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Looking in the nova-network logs, a network is never allocated/built properly for the instance:

http://logs.openstack.org/69/190169/3/check/check-tempest-dsvm-full-glusterfs-nv/eecdc4e/logs/screen-n-net.txt.gz#_2015-06-11_14_39_50_479

There are 9 calls to refresh the nw_info and they are always empty, e.g.:

2015-06-11 14:39:50.443 DEBUG nova.network.manager [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Get instance network info get_instance_nw_info /opt/stack/new/nova/nova/network/manager.py:588
2015-06-11 14:39:50.478 DEBUG nova.network.manager [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] Built network info: |[]| get_instance_nw_info /opt/stack/new/nova/nova/network/manager.py:650

Revision history for this message
Matt Riedemann (mriedem) wrote :

We see that a fixed_ip is allocated for the instance here:

http://logs.openstack.org/69/190169/3/check/check-tempest-dsvm-full-glusterfs-nv/eecdc4e/logs/screen-n-net.txt.gz#_2015-06-11_14_39_50_442

2015-06-11 14:39:50.442 DEBUG nova.network.manager [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Allocated fixed ip 10.1.0.2 on network f816e349-5d47-4229-af2b-357301adf78e allocate_fixed_ip /opt/stack/new/nova/nova/network/manager.py:954

I'm wondering if we're hitting some DB race and getting an exception but it's ignored?

http://git.openstack.org/cgit/openstack/nova/tree/nova/network/manager.py#n590

Revision history for this message
Matt Riedemann (mriedem) wrote :

The fixed IP is being associated with the instance here:

2015-06-11 14:39:49.991 DEBUG nova.network.manager [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Associating instance with fixed IP from pool in network 1 on subnet 10.1.0.0/20. allocate_fixed_ip /opt/stack/new/nova/nova/network/manager.py:895
2015-06-11 14:39:50.002 DEBUG nova.network.manager [req-47d26fed-b6cd-4971-8f01-eff886f4b728 None None] Released IP |10.1.0.2| release_fixed_ip /opt/stack/new/nova/nova/network/manager.py:1089
2015-06-11 14:39:50.027 WARNING nova.network.manager [req-47d26fed-b6cd-4971-8f01-eff886f4b728 None None] IP 10.1.0.2 released that was not leased

This is logged before we leave the allocate_fixed_ip method:

2015-06-11 14:39:50.442 DEBUG nova.network.manager [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Allocated fixed ip 10.1.0.2 on network f816e349-5d47-4229-af2b-357301adf78e allocate_fixed_ip /opt/stack/new/nova/nova/network/manager.py:954
2015-06-11 14:39:50.443 DEBUG nova.network.manager [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Get instance network info get_instance_nw_info /opt/stack/new/nova/nova/network/manager.py:588
2015-06-11 14:39:50.478 DEBUG nova.network.manager [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] Built network info: |[]| get_instance_nw_info /opt/stack/new/nova/nova/network/manager.py:650
2015-06-11 14:39:50.479 INFO nova.network.manager [req-3a2c66ac-1348-46b4-9add-8d5d3ef10a33 ServerRescueTestJSON-360939495 ServerRescueTestJSON-1539944143] [instance: a12ee0d4-5747-4e40-9d22-d956afcbe490] Allocated network: '[]' for instance

So the fixed IP was 10.1.0.2; note the warning that it was released before it was leased. That was for a different request, though:

req-47d26fed-b6cd-4971-8f01-eff886f4b728

That's for instance: f493b6ed-65b6-4d55-a32c-a92b549309f2

The network for that instance is being torn down at the same time that the failed instance network allocation is coming up:

2015-06-11 14:39:41.594 INFO nova.network.manager [req-dbbefbd1-9d0b-453c-ad82-8a5de66be512 ServerMetadataNegativeTestJSON-592398845 ServerMetadataNegativeTestJSON-211084830] [instance: f493b6ed-65b6-4d55-a32c-a92b549309f2] Network deallocated for instance (fixed ips: '['10.1.0.2']')
2015-06-11 14:39:42.806 DEBUG nova.network.manager [req-59280c49-b125-474a-beed-da3d5024c427 None None] Released IP |10.1.0.2| release_fixed_ip /opt/stack/new/nova/nova/network/manager.py:1089
2015-06-11 14:39:47.565 DEBUG nova.network.manager [req-c2f3e01f-1992-48e6-9010-306ea4333cef None None] Leased IP |10.1.0.2| lease_fixed_ip /opt/stack/new/nova/nova/network/manager.py:1074
2015-06-11 14:39:47.577 WARNING nova.network.manager [req-c2f3e01f-1992-48e6-9010-306ea4333cef None None] IP 10.1.0.2 leased that is not associated

So it seems that when we get here:

http://logs.openstack.org/69/...

Read more...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/192348

Revision history for this message
Matt Riedemann (mriedem) wrote :

With the logging added in https://review.openstack.org/#/c/192348/ we see the race.

1. The fixed IP is associated to an instance here (leased and allocated are False at this point):

http://logs.openstack.org/42/175742/10/check/check-tempest-dsvm-nova-v21-full/2850b02/logs/screen-n-net.txt.gz#_2015-06-23_05_10_55_382

2015-06-23 05:10:55.382 DEBUG nova.network.manager [req-8dc15314-f79e-489a-a454-7167007017e0 ServerRescueTestJSON-1209960414 ServerRescueTestJSON-118037235] [instance: 75316103-db05-4a9f-8967-c2a1f0325e2d] Associated instance with fixed IP: FixedIP(address=10.1.0.5,allocated=False,created_at=2015-06-23T04:58:34Z,default_route=<?>,deleted=False,deleted_at=None,floating_ips=<?>,host=None,id=6,instance=<?>,instance_uuid=75316103-db05-4a9f-8967-c2a1f0325e2d,leased=False,network=<?>,network_id=1,reserved=False,updated_at=2015-06-23T05:09:51Z,virtual_interface=<?>,virtual_interface_id=None) allocate_fixed_ip /opt/stack/new/nova/nova/network/manager.py:908

2. Then via the dhcp-agent callback the fixed IP is released from a previous instance:

http://logs.openstack.org/42/175742/10/check/check-tempest-dsvm-nova-v21-full/2850b02/logs/screen-n-net.txt.gz#_2015-06-23_05_10_55_459

2015-06-23 05:10:55.459 WARNING nova.network.manager [req-23a38a9f-b465-4b39-bb95-4044e1639636 None None] [instance: 75316103-db05-4a9f-8967-c2a1f0325e2d] IP 10.1.0.5 released that was not leased

Since the fixed_ip.allocated value is False at that point, it thinks it's OK to disassociate the fixed IP, which leads to:

3. get_instance_nw_info() doesn't find any associated fixed IP for the new instance:

http://logs.openstack.org/42/175742/10/check/check-tempest-dsvm-nova-v21-full/2850b02/logs/screen-n-net.txt.gz#_2015-06-23_05_10_55_731

2015-06-23 05:10:55.731 DEBUG nova.network.manager [req-8dc15314-f79e-489a-a454-7167007017e0 ServerRescueTestJSON-1209960414 ServerRescueTestJSON-118037235] [instance: 75316103-db05-4a9f-8967-c2a1f0325e2d] Found 0 fixed IPs associated to the instance in the database. get_instance_nw_info /opt/stack/new/nova/nova/network/manager.py:598

And the nw info cache isn't built and we fail.
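
A small self-contained sketch of that failure mode (simplified; the fields only loosely follow nova's FixedIP object): because allocated is still False when the stale release arrives, the release path disassociates the address from the instance that was just given it.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class FixedIP:
        address: str
        instance_uuid: Optional[str] = None
        allocated: bool = False
        leased: bool = False

    def release_fixed_ip(fip):
        if not fip.leased:
            print("IP %s released that was not leased" % fip.address)
        fip.leased = False
        if not fip.allocated:
            fip.instance_uuid = None   # the new instance loses its fixed IP

    # Instance B was just associated, but allocated is still False when the
    # dhcpbridge callback for instance A's old lease fires:
    fip = FixedIP("10.1.0.5", instance_uuid="instance-B")
    release_fixed_ip(fip)
    assert fip.instance_uuid is None   # -> empty nw_info cache, later 400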

Changed in nova:
status: Confirmed → Triaged
assignee: nobody → Matt Riedemann (mriedem)
Matt Riedemann (mriedem)
tags: added: kilo-backport-potential
Changed in nova:
status: Triaged → In Progress
Revision history for this message
Matt Riedemann (mriedem) wrote :

There is a tactical fix here which doesn't completely solve the race but might help, and is backportable:

https://review.openstack.org/#/c/194815/

This should be the long term fix for the race:

https://review.openstack.org/#/c/194843/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/195190

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/195190
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2b5fe5db0da7356527f076a2db9822506299ecfe
Submitter: Jenkins
Branch: stable/kilo

commit 2b5fe5db0da7356527f076a2db9822506299ecfe
Author: Matt Riedemann <email address hidden>
Date: Tue Jun 23 13:49:42 2015 -0700

    Reduce window for allocate_fixed_ip / release_fixed_ip race in nova-net

    There is a race during allocate_fixed_ip where a new instance, B, is
    associated with a fixed IP, X, which was previously associated with
    instance A that is being deallocated.

    Between the time that instance A is associated with fixed IP X and the
    time that it's VIF is allocated, and fip.allocated = True in the DB, the
    dhcpagent callback hits release_fixed_ip for the fixed IP X from when
    instance A was deallocating.

    release_fixed_ip checks to see if the instance is allocated and if not
    it disassociates the instance, which was associated with new instance B.

    This leads to get_instance_nw_info() not building anything since there
    are no fixed IPs associated with instance A in the database, so
    eventually anything needing to do networking with instance A, like
    assocating a floating IP, fails.

    To narrow the race, we do the VIF allocation before associating the
    fixed IP to the new instance. This does not completely fix the bug, but
    it's a tactical change that we can backport to stable branches while
    working on the longer-term fix which is going to involve network RPC API
    changes to release_fixed_ip().

    Note that test_vpn_allocate_fixed_ip_no_network_id is removed since it
    no longer works and arguably was testing the DB API in the wrong place,
    so a new test is added to test_db_api for the same coverage.

    Partial-Bug: #1249065

    Change-Id: I8cf5483982085da57ee470fa2753b0d0aebc12b3
    (cherry picked from commit 3736c120cbeacbc4349efbda99e279cddfb5f09e)
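
The reordering the commit describes, as a rough pseudo-flow (the db calls are simplified stand-ins, not nova's actual DB API signatures):

    def allocate_fixed_ip_before(db, instance, network):
        fip = db.associate_from_pool(network, instance)        # race window opens
        vif = db.get_vif(instance, network)                    # ...still open...
        db.update_fixed_ip(fip, vif_id=vif, allocated=True)    # window closes

    def allocate_fixed_ip_after(db, instance, network):
        vif = db.get_vif(instance, network)                    # do this first
        fip = db.associate_from_pool(network, instance)        # race window opens
        db.update_fixed_ip(fip, vif_id=vif, allocated=True)    # and closes sooner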

tags: added: in-stable-kilo
Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Matt Riedemann (mriedem) wrote :

Another related fix toward the complete race fix, so that the DB is updated atomically and the fixed IP is marked as allocated at the same time the instance is associated:

https://review.openstack.org/#/c/197135/

Revision history for this message
Matt Riedemann (mriedem) wrote :

I just hit this again on master:

http://logs.openstack.org/23/197223/1/check/check-tempest-dsvm-full-ceph/bd59580/logs/screen-n-net.txt.gz#_2015-06-30_20_41_13_227

We need some more debug logging in the release_fixed_ip method.

Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-2
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/197608
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fd471b2fc1fee6e859ee3657b53548a61c0f3a1e
Submitter: Jenkins
Branch: master

commit fd471b2fc1fee6e859ee3657b53548a61c0f3a1e
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 1 06:56:32 2015 -0700

    n-net: add more debug logging to release_fixed_ip

    Despite commit 8c7690221294d6d7f2db84491fec8d81ef358cac we're still
    seeing this race in the gate so add more debug logging in the
    release_fixed_ip method to get a better idea of what's going on.

    Related-Bug: #1249065

    Change-Id: I9eda6cadf8aa31576bedf10c71c41fe1a3d4d629

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/197135
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=73a68abb28b9ad5f4e9dee2781cdc5203bcb5937
Submitter: Jenkins
Branch: master

commit 73a68abb28b9ad5f4e9dee2781cdc5203bcb5937
Author: Ryan Rossiter <email address hidden>
Date: Mon Jun 29 13:19:07 2015 +0000

    Set vif and allocated when associating fixed ip

    Follow on to changes I84fa2f0926f719bff33ac43b39e161a4eb412f44 and
    I8cf5483982085da57ee470fa2753b0d0aebc12b3.

    This change removes a race when associating fixed IPs and pools.
    Originally, the fixed IPs were associated, and
    "virtual_interface_id" was updated and saved to the DB at a later time.
    This value is now sent into associate(_pool) and are tunneled down
    to the DB on associate, so now the value is saved to the DB at
    association.

    Change-Id: Ic37d96deba3a5ef51594733cc16091173a9655a4
    Closes-Bug: #1249065
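
A sketch of the "associate and mark allocated in one write" idea (simplified SQL against a hypothetical fixed_ips table, not nova's actual DB API):

    def associate_fixed_ip_atomically(conn, address, instance_uuid, vif_id, network_id):
        # Single UPDATE: the release callback can never observe a row that is
        # associated with the new instance but not yet marked allocated.
        conn.execute(
            "UPDATE fixed_ips "
            "SET instance_uuid = ?, virtual_interface_id = ?, allocated = 1 "
            "WHERE address = ? AND network_id = ? AND instance_uuid IS NULL",
            (instance_uuid, vif_id, address, network_id),
        )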

Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-2 → 12.0.0
Revision history for this message
Martin Gerhard Loschwitz (martin-loschwitz) wrote :

While testing our current Kilo cloud, we're seeing a lot of occurrences of this in the following tempest tests:

tempest.api.compute.floating_ips.test_floating_ips_actions_negative.FloatingIPsNegativeTestJSON.test_associate_nonexistent_floating_ip[id-595fa616-1a71-4670-9614-46564ac49a4c,negative,network]

tempest.api.compute.floating_ips.test_floating_ips_actions.FloatingIPsTestJSON.test_associate_already_associated_floating_ip[id-6edef4b2-aaf1-4abc-bbe3-993e2561e0fe,network]

tempest.api.compute.floating_ips.test_floating_ips_actions.FloatingIPsTestJSON.test_associate_disassociate_floating_ip[id-307efa27-dc6f-48a0-8cd2-162ce3ef0b52,network]

tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescued_vm_associate_dissociate_floating_ip[id-4842e0cf-e87d-4d9d-b61f-f4791da3cacc]

tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance[compute,id-1164e700-0af0-4a4c-8792-35909a88743c,image,network]
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance[compute,id-c1b6318c-b9da-490b-9c67-9339b627271f,image,network,volume]

tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario[compute,id-bdbb5441-9204-419d-a225-b4fdbfb1a1a8,image,network,volume]

tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern[compute,id-608e604b-1d63-4a82-8e3e-91bc665c90b4,image,network]

tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern[compute,id-557cd2c2-4eb8-4dce-98be-f86765ff311b,image,smoke,volume]

tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern[compute,id-557cd2c2-4eb8-4dce-98be-f86765ff311b,image,smoke,volume]

The error messages are always very similar:

    Traceback (most recent call last):
      File "/home/ubuntu/virtualenv/local/lib/python2.7/site-packages/tempest/test.py", line 113, in wrapper
        return f(self, *func_args, **func_kwargs)
      File "/home/ubuntu/virtualenv/local/lib/python2.7/site-packages/tempest/scenario/test_volume_boot_pattern.py", line 117, in test_volume_boot_pattern
        ip_instance_1st = self.get_server_or_ip(instance_1st)
      File "/home/ubuntu/virtualenv/local/lib/python2.7/site-packages/tempest/scenario/manager.py", line 657, in get_server_or_ip
        ip = self.create_floating_ip(server)['ip']
      File "/home/ubuntu/virtualenv/local/lib/python2.7/site-packages/tempest/scenario/manager.py", line 625, in create_floating_ip
        floating_ip['ip'], thing['id'])
      File "/home/ubuntu/virtualenv/local/lib/python2.7/site-packages/tempest_lib/services/compute/floating_ips_client.py", line 75, in associate_floating_ip_to_server
        resp, body = self.post(url, post_body)
      File "/home/ubuntu/virtualenv/local/lib/python2.7/site-packages/tempest_lib/common/rest_client.py", line 259, in post
        return self.request('POST', url, extra_headers, headers, body)
      File "/home/ubuntu/virtualenv/local/lib/python2.7/site-packages/tempest_lib/common/rest_client.py", line 640, in request
        resp, resp_body)
      File "/home/ubuntu/virtualenv/local/lib/python2.7/site-packages/tempest_lib/common/r...

Read more...
