OpenStack Compute (nova)

[SRU]_heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

Bug #1751923 reported by Maciej Jozefczyk on 2018-02-26

This bug affects 12 people

Affects	Status	Importance	Assigned to
OpenStack Compute (nova)	Fix Released	Medium	Maciej Jozefczyk
Ubuntu Cloud Archive	Fix Released	Undecided	Unassigned
Queens	Fix Released	Undecided	Jorge Niedbalski
Rocky	Fix Released	Undecided	Unassigned
Stein	Fix Released	Undecided	Unassigned
nova (Ubuntu)	Fix Released	Medium	Unassigned
Bionic	Fix Released	Medium	Jorge Niedbalski
Disco	Fix Released	Medium	Unassigned

Bug Description

[Impact]

* During the periodic task _heal_instance_info_cache, the instance_info_caches are rebuilt from the instance port_ids stored in the nova db, not from the ports reported by neutron.
* As a result, existing VMs can lose their network interfaces after a reboot.

[Test Plan]

* This bug is reproducible on Bionic/Queens clouds.

1) Deploy the following Juju bundle: https://paste.ubuntu.com/p/HgsqZfsDGh/
2) Run the following script: https://paste.ubuntu.com/p/c4VDkqyR2z/
3) If the script finishes with "Port not found" , the bug is still present.

[Where problems could occur]

Instances created prior to the OpenStack Newton release that have more than one interface will not have the associated information in the virtual_interfaces table that is required to repopulate the cache with interfaces in the same order in which they were originally attached. In the unlikely event that this occurs and you are using OpenStack Queens or Rocky, it will be necessary to populate this table manually. OpenStack Stein has a patch that adds support for generating this data. As things stand, the guest is unable to identify its network information at all if the cache gets purged, and the likelihood that a VM was created prior to Newton is hopefully low, so we expect the potential for this regression to be very low.

[Discussion]
SRU team, please review the most recent version of nova 2:17.0.13-0ubuntu3 in the unapproved queue. The older version can be rejected.

------------------------------------------------------------------------------

Description
===========

During the periodic task _heal_instance_info_cache, the
instance_info_caches are not updated using instance port_ids taken
from neutron, but from the nova db.

Sometimes, perhaps because of a race condition, it's possible to
lose some ports from instance_info_caches. The periodic task
_heal_instance_info_cache should clean this up (add the missing records),
but in fact it does not work this way.

How does it look now?
=====================

_heal_instance_info_cache, when run as a periodic task:

https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/compute/manager.py#L6525

uses network_api to get instance_nw_info (instance_info_caches):

          try:
              # Call to network API to get instance info.. this will
              # force an update to the instance's info_cache
              self.network_api.get_instance_nw_info(context, instance)

self.network_api.get_instance_nw_info() is defined here:

https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1377

It calls _build_network_info_model() without the networks and port_ids
parameters (because we're not adding any new interface to the instance):

https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2356

Next, _gather_port_ids_and_networks() generates the list of instance
networks and port_ids:

          networks, port_ids = self._gather_port_ids_and_networks(
              context, instance, networks, port_ids, client)

https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L2389-L2390

https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/network/neutronv2/api.py#L1393

As we can see, _gather_port_ids_and_networks() takes the port list
from the DB:

https://github.com/openstack/nova/blob/ef4000a0d326deb004843ee51d18030224c5630f/nova/objects/instance.py#L1173-L1176

And that's it. When we lose a port, it's not possible to add it again with this periodic task.
The only way is to clear the device_id field in the neutron port object and re-attach the interface using `nova interface-attach`.

When the interface is missing from the cache and no port is configured on the
compute host (for example after a compute reboot), the interface is not
added to the instance, and from neutron's point of view the port state is DOWN.

When the interface is missing from the cache and we hard reboot the instance,
it's not added as a tap interface in the XML file, so we don't have the
network on the host.
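
The behaviour described in this section can be illustrated with a small, self-contained model. This is only a sketch and not nova code: `heal_from_cached_ids` and the dict-based port records are hypothetical names invented for illustration. The only thing taken from the report is that the refresh starts from the port IDs already stored on the nova side. The port IDs are the ones from the reproduced example in this report.

```python
# Simplified model of the reported behaviour; this is NOT nova code.
# All names below are hypothetical and exist only for illustration.

def heal_from_cached_ids(cached_port_ids, neutron_ports):
    """Rebuild the cache using only port IDs the cache already holds."""
    ports_by_id = {port["id"]: port for port in neutron_ports}
    return [pid for pid in cached_port_ids if pid in ports_by_id]


# Neutron still knows about all four ports of the instance.
neutron_ports = [
    {"id": "6c230305-43f8-42ec-9936-61fe67551168"},
    {"id": "71e6c6ad-8016-450f-93f2-75e7e014084d"},
    {"id": "a74c9ee8-c426-48ef-890f-3988ecbe95ff"},
    {"id": "b89d6863-fb4c-405c-89f9-698bd9773ad6"},
]

# The cache has lost the first port.
cached_port_ids = [
    "b89d6863-fb4c-405c-89f9-698bd9773ad6",
    "a74c9ee8-c426-48ef-890f-3988ecbe95ff",
    "71e6c6ad-8016-450f-93f2-75e7e014084d",
]

healed = heal_from_cached_ids(cached_port_ids, neutron_ports)

# Ports neutron reports but the "healed" cache still does not contain.
lost = [p["id"] for p in neutron_ports if p["id"] not in healed]
print(lost)
```

Because the loop only ever visits IDs that are already cached, a port that dropped out of the cache never comes back, no matter how often the task runs.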

Steps to reproduce
==================
1. Spawn devstack
2. Spawn a VM inside devstack with multiple ports (for example, from 2 different networks)
3. Update the instance_info_caches DB row, dropping one interface from the network_info list
4. Hard reboot the instance
5. See that nova list shows the instance without one address, but nova interface-list shows all addresses
6. See that one port is missing in the instance XML file
7. In theory _heal_instance_info_cache should fix this, but it relies on the data stored in the nova db, not on a fresh list of instance ports taken from neutron.

Reproduced Example
==================
1. Spawn VM with 1 private network port
nova boot --flavor m1.small --image cirros-0.3.5-x86_64-disk --nic net-name=private test-2
2. Attach ports to have 2 private and 2 public interfaces
nova list:
| a64ed18d-9868-4bf0-90d3-d710d278922d | test-2 | ACTIVE | - | Running | public=2001:db8::e, 172.24.4.15, 2001:db8::c, 172.24.4.16; private=fdda:5d77:e18e:0:f816:3eff:fee8:3333, 10.0.0.3, fdda:5d77:e18e:0:f816:3eff:fe53:231c, 10.0.0.5 |

So we see 4 ports:
stack@mjozefcz-devstack-ptg:~$ nova interface-list a64ed18d-9868-4bf0-90d3-d710d278922d
+------------+--------------------------------------+--------------------------------------+-----------------------------------------------+-------------------+
| Port State | Port ID | Net ID | IP addresses | MAC Addr |
+------------+--------------------------------------+--------------------------------------+-----------------------------------------------+-------------------+
| ACTIVE | 6c230305-43f8-42ec-9936-61fe67551168 | 96343d33-5dd2-4289-b0cc-e6c664c2ddd9 | 10.0.0.3,fdda:5d77:e18e:0:f816:3eff:fee8:3333 | fa:16:3e:e8:33:33 |
| ACTIVE | 71e6c6ad-8016-450f-93f2-75e7e014084d | 9e702a96-2744-40a2-a649-33f935d83ad3 | 172.24.4.16,2001:db8::c | fa:16:3e:6d:dc:85 |
| ACTIVE | a74c9ee8-c426-48ef-890f-3988ecbe95ff | 9e702a96-2744-40a2-a649-33f935d83ad3 | 172.24.4.15,2001:db8::e | fa:16:3e:cf:0c:e0 |
| ACTIVE | b89d6863-fb4c-405c-89f9-698bd9773ad6 | 96343d33-5dd2-4289-b0cc-e6c664c2ddd9 | 10.0.0.5,fdda:5d77:e18e:0:f816:3eff:fe53:231c | fa:16:3e:53:23:1c |
+------------+--------------------------------------+--------------------------------------+-----------------------------------------------+-------------------+
stack@mjozefcz-devstack-ptg:~$

We can also see 4 tap interfaces in xml file:

stack@mjozefcz-devstack-ptg:~$ sudo virsh dumpxml instance-00000002 | grep -i tap
    <target dev='tap6c230305-43'/>
    <target dev='tapb89d6863-fb'/>
    <target dev='tapa74c9ee8-c4'/>
    <target dev='tap71e6c6ad-80'/>
stack@mjozefcz-devstack-ptg:~$

3. Now let's 'corrupt' the instance_info_caches for this specific VM.
We also noticed a race condition that causes the same problem, but
we're unable to reproduce it in a development environment.

Original one:

---
mysql> select * from instance_info_caches where instance_uuid="a64ed18d-9868-4bf0-90d3-d710d278922d"\G;
*************************** 1. row ***************************
created_at: 2018-02-26 21:25:31
updated_at: 2018-02-26 21:29:17
deleted_at: NULL
id: 2
network_info: [{"profile": {}, "ovs_interfaceid": "6c230305-43f8-42ec-9936-61fe67551168", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], "address": "fdda:5d77:e18e:0:f816:3eff:fee8:3333"}], "version": 6, "meta": {"ipv6_address_mode": "slaac", "dhcp_server": "fdda:5d77:e18e:0:f816:3eff:fee7:b04"}, "dns": [], "routes": [], "cidr": "fdda:5d77:e18e::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", "address": "fdda:5d77:e18e::1"}}, {"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.0.0.3"}], "version": 4, "meta": {"dhcp_server": "10.0.0.2"}, "dns": [], "routes": [], "cidr": "10.0.0.0/26", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.0.0.1"}}], "meta": {"injected": false, "tenant_id": "0314943f52014a5b9bc56b73bec475e6", "mtu": 1450}, "id": "96343d33-5dd2-4289-b0cc-e6c664c2ddd9", "label": "private"}, "devname": "tap6c230305-43", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:e8:33:33", "active": true, "type": "ovs", "id": "6c230305-43f8-42ec-9936-61fe67551168", "qbg_params": null}, {"profile": {}, "ovs_interfaceid": "b89d6863-fb4c-405c-89f9-698bd9773ad6", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], "address": "fdda:5d77:e18e:0:f816:3eff:fe53:231c"}], "version": 6, "meta": {"ipv6_address_mode": "slaac", "dhcp_server": "fdda:5d77:e18e:0:f816:3eff:fee7:b04"}, "dns": [], "routes": [], "cidr": "fdda:5d77:e18e::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", "address": "fdda:5d77:e18e::1"}}, {"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.0.0.5"}], "version": 4, "meta": {"dhcp_server": "10.0.0.2"}, "dns": [], "routes": [], "cidr": 
"10.0.0.0/26", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.0.0.1"}}], "meta": {"injected": false, "tenant_id": "0314943f52014a5b9bc56b73bec475e6", "mtu": 1450}, "id": "96343d33-5dd2-4289-b0cc-e6c664c2ddd9", "label": "private"}, "devname": "tapb89d6863-fb", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:53:23:1c", "active": true, "type": "ovs", "id": "b89d6863-fb4c-405c-89f9-698bd9773ad6", "qbg_params": null}, {"profile": {}, "ovs_interfaceid": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], "address": "2001:db8::e"}], "version": 6, "meta": {}, "dns": [], "routes": [], "cidr": "2001:db8::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", "address": "2001:db8::2"}}, {"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "172.24.4.15"}], "version": 4, "meta": {}, "dns": [], "routes": [], "cidr": "172.24.4.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "172.24.4.1"}}], "meta": {"injected": false, "tenant_id": "9c6f74dab29f4c738e82320075fa1f57", "mtu": 1500}, "id": "9e702a96-2744-40a2-a649-33f935d83ad3", "label": "public"}, "devname": "tapa74c9ee8-c4", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:cf:0c:e0", "active": true, "type": "ovs", "id": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", "qbg_params": null}, {"profile": {}, "ovs_interfaceid": "71e6c6ad-8016-450f-93f2-75e7e014084d", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], "address": "2001:db8::c"}], "version": 6, "meta": {}, "dns": [], "routes": [], "cidr": 
"2001:db8::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", "address": "2001:db8::2"}}, {"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "172.24.4.16"}], "version": 4, "meta": {}, "dns": [], "routes": [], "cidr": "172.24.4.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "172.24.4.1"}}], "meta": {"injected": false, "tenant_id": "9c6f74dab29f4c738e82320075fa1f57", "mtu": 1500}, "id": "9e702a96-2744-40a2-a649-33f935d83ad3", "label": "public"}, "devname": "tap71e6c6ad-80", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:6d:dc:85", "active": true, "type": "ovs", "id": "71e6c6ad-8016-450f-93f2-75e7e014084d", "qbg_params": null}]
instance_uuid: a64ed18d-9868-4bf0-90d3-d710d278922d
deleted: 0
1 row in set (0.00 sec)
----

Modified one (I removed the first port from the list, tap6c230305-43):

----
mysql> select * from instance_info_caches where instance_uuid="a64ed18d-9868-4bf0-90d3-d710d278922d"\G;
*************************** 1. row ***************************
created_at: 2018-02-26 21:25:31
updated_at: 2018-02-26 21:29:17
deleted_at: NULL
id: 2
network_info: [{"profile": {}, "ovs_interfaceid": "b89d6863-fb4c-405c-89f9-698bd9773ad6", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], "address": "fdda:5d77:e18e:0:f816:3eff:fe53:231c"}], "version": 6, "meta": {"ipv6_address_mode": "slaac", "dhcp_server": "fdda:5d77:e18e:0:f816:3eff:fee7:b04"}, "dns": [], "routes": [], "cidr": "fdda:5d77:e18e::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", "address": "fdda:5d77:e18e::1"}}, {"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "10.0.0.5"}], "version": 4, "meta": {"dhcp_server": "10.0.0.2"}, "dns": [], "routes": [], "cidr": "10.0.0.0/26", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "10.0.0.1"}}], "meta": {"injected": false, "tenant_id": "0314943f52014a5b9bc56b73bec475e6", "mtu": 1450}, "id": "96343d33-5dd2-4289-b0cc-e6c664c2ddd9", "label": "private"}, "devname": "tapb89d6863-fb", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:53:23:1c", "active": true, "type": "ovs", "id": "b89d6863-fb4c-405c-89f9-698bd9773ad6", "qbg_params": null}, {"profile": {}, "ovs_interfaceid": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], "address": "2001:db8::e"}], "version": 6, "meta": {}, "dns": [], "routes": [], "cidr": "2001:db8::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", "address": "2001:db8::2"}}, {"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "172.24.4.15"}], "version": 4, "meta": {}, "dns": [], "routes": [], "cidr": "172.24.4.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "172.24.4.1"}}], "meta": {"injected": false, "tenant_id": 
"9c6f74dab29f4c738e82320075fa1f57", "mtu": 1500}, "id": "9e702a96-2744-40a2-a649-33f935d83ad3", "label": "public"}, "devname": "tapa74c9ee8-c4", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:cf:0c:e0", "active": true, "type": "ovs", "id": "a74c9ee8-c426-48ef-890f-3988ecbe95ff", "qbg_params": null}, {"profile": {}, "ovs_interfaceid": "71e6c6ad-8016-450f-93f2-75e7e014084d", "preserve_on_delete": false, "network": {"bridge": "br-int", "subnets": [{"ips": [{"meta": {}, "version": 6, "type": "fixed", "floating_ips": [], "address": "2001:db8::c"}], "version": 6, "meta": {}, "dns": [], "routes": [], "cidr": "2001:db8::/64", "gateway": {"meta": {}, "version": 6, "type": "gateway", "address": "2001:db8::2"}}, {"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "172.24.4.16"}], "version": 4, "meta": {}, "dns": [], "routes": [], "cidr": "172.24.4.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "172.24.4.1"}}], "meta": {"injected": false, "tenant_id": "9c6f74dab29f4c738e82320075fa1f57", "mtu": 1500}, "id": "9e702a96-2744-40a2-a649-33f935d83ad3", "label": "public"}, "devname": "tap71e6c6ad-80", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:6d:dc:85", "active": true, "type": "ovs", "id": "71e6c6ad-8016-450f-93f2-75e7e014084d", "qbg_params": null}]
instance_uuid: a64ed18d-9868-4bf0-90d3-d710d278922d
deleted: 0
----
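
The report does not say how the row was edited, so the following is only one possible way to produce the modified network_info value. It is a sketch under assumptions: `drop_vif` is a hypothetical helper, the sample data is a heavily trimmed stand-in for the real column value, and writing the result back to the database is left out.

```python
import json


def drop_vif(network_info_json, devname):
    """Return network_info JSON with the VIF named `devname` removed."""
    vifs = json.loads(network_info_json)
    kept = [vif for vif in vifs if vif.get("devname") != devname]
    return json.dumps(kept)


# Trimmed stand-in for the real network_info column (two of the four VIFs).
original = json.dumps([
    {"id": "6c230305-43f8-42ec-9936-61fe67551168", "devname": "tap6c230305-43"},
    {"id": "b89d6863-fb4c-405c-89f9-698bd9773ad6", "devname": "tapb89d6863-fb"},
])

modified = drop_vif(original, "tap6c230305-43")
remaining = [vif["devname"] for vif in json.loads(modified)]
print(remaining)  # ['tapb89d6863-fb']
```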

4. Now let's take a look at `nova list`:

As you can see, we are missing one interface (private).

nova interface-list still shows it (because it calls neutron instead of
using nova's own data):

stack@mjozefcz-devstack-ptg:~$ nova interface-list a64ed18d-9868-4bf0-90d3-d710d278922d
+------------+--------------------------------------+--------------------------------------+-----------------------------------------------+-------------------+
| Port State | Port ID | Net ID | IP addresses | MAC Addr |
+------------+--------------------------------------+--------------------------------------+-----------------------------------------------+-------------------+
| ACTIVE | 6c230305-43f8-42ec-9936-61fe67551168 | 96343d33-5dd2-4289-b0cc-e6c664c2ddd9 | 10.0.0.3,fdda:5d77:e18e:0:f816:3eff:fee8:3333 | fa:16:3e:e8:33:33 |
| ACTIVE | 71e6c6ad-8016-450f-93f2-75e7e014084d | 9e702a96-2744-40a2-a649-33f935d83ad3 | 172.24.4.16,2001:db8::c | fa:16:3e:6d:dc:85 |
| ACTIVE | a74c9ee8-c426-48ef-890f-3988ecbe95ff | 9e702a96-2744-40a2-a649-33f935d83ad3 | 172.24.4.15,2001:db8::e | fa:16:3e:cf:0c:e0 |
| ACTIVE | b89d6863-fb4c-405c-89f9-698bd9773ad6 | 96343d33-5dd2-4289-b0cc-e6c664c2ddd9 | 10.0.0.5,fdda:5d77:e18e:0:f816:3eff:fe53:231c | fa:16:3e:53:23:1c |
+------------+--------------------------------------+--------------------------------------+-----------------------------------------------+-------------------+
stack@mjozefcz-devstack-ptg:~$

5. Meanwhile, check the logs: yes, _heal_instance_info_cache has been
running for a while, but without success. The port is still missing
from the instance_info_caches table:

Feb 26 22:12:03 mjozefcz-devstack-ptg nova-compute[27459]: DEBUG oslo_service.periodic_task [None req-ac707da5-3413-412c-b314-ab38db2134bc service nova] Running periodic task ComputeManager._heal_instance_info_cache {{(pid=27459) run_periodic_tasks /usr/local/lib/python2.7/dist-packages/oslo_service/periodic_task.py:215}}
Feb 26 22:12:03 mjozefcz-devstack-ptg nova-compute[27459]: DEBUG nova.compute.manager [None req-ac707da5-3413-412c-b314-ab38db2134bc service nova] Starting heal instance info cache {{(pid=27459) _heal_instance_info_cache /opt/stack/nova/nova/compute/manager.py:6541}}
Feb 26 22:12:04 mjozefcz-devstack-ptg nova-compute[27459]: DEBUG nova.compute.manager [None req-ac707da5-3413-412c-b314-ab38db2134bc service nova] [instance: a64ed18d-9868-4bf0-90d3-d710d278922d] Updated the network info_cache for instance {{(pid=27459) _heal_instance_info_cache /opt/stack/nova/nova/compute/manager.py:6603}}

6. OK, so let's pretend that the customer restarts the VM.
stack@mjozefcz-devstack-ptg:~$ nova reboot a64ed18d-9868-4bf0-90d3-d710d278922d --hard
Request to reboot server <Server: test-2> has been accepted.

7. And now check the connected interfaces - whoops, there is no
`tap6c230305-43` on the list ;(

stack@mjozefcz-devstack-ptg:~$ sudo virsh dumpxml instance-00000002 | grep -i tap
    <target dev='tapb89d6863-fb'/>
    <target dev='tapa74c9ee8-c4'/>
    <target dev='tap71e6c6ad-80'/>
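
The comparison done by eye in this example (port IDs from `nova interface-list` against tap devices from `virsh dumpxml`) can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, and the naming rule ("tap" followed by the first 11 characters of the port ID) is an assumption read off the listings above, not something stated in the report.

```python
def expected_tap_name(port_id):
    # Assumption based on the listings above: "tap" + first 11 chars of the port ID.
    return "tap" + port_id[:11]


def missing_taps(neutron_port_ids, domain_tap_names):
    """Return expected tap devices that are absent from the domain XML."""
    present = set(domain_tap_names)
    expected = [expected_tap_name(pid) for pid in neutron_port_ids]
    return [name for name in expected if name not in present]


# Port IDs as reported by `nova interface-list`.
port_ids = [
    "6c230305-43f8-42ec-9936-61fe67551168",
    "71e6c6ad-8016-450f-93f2-75e7e014084d",
    "a74c9ee8-c426-48ef-890f-3988ecbe95ff",
    "b89d6863-fb4c-405c-89f9-698bd9773ad6",
]

# Tap devices found by `virsh dumpxml ... | grep -i tap` after the hard reboot.
taps_after_reboot = ["tapb89d6863-fb", "tapa74c9ee8-c4", "tap71e6c6ad-80"]

missing = missing_taps(port_ids, taps_after_reboot)
print(missing)  # ['tap6c230305-43']
```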

Environment
===========
Nova master branch, devstack

Maciej Jozefczyk (maciejjozefczyk) on 2018-02-26

Changed in nova:
assignee:	nobody → Maciej Jozefczyk (maciej.jozefczyk)
summary:

- _heal_instance_info_cache base on cache not on ports from neutron side
+ _heal_instance_info_cache periodic task bases on port list from memory,
+ not from neutron server

Maciej Jozefczyk (maciejjozefczyk) on 2018-02-27

summary:

- _heal_instance_info_cache periodic task bases on port list from memory,
+ _heal_instance_info_cache periodic task bases on port list from nova db,
not from neutron server

Maciej Jozefczyk (maciejjozefczyk) on 2018-02-27

description:

updated

Matt Riedemann (mriedem) wrote on 2018-03-03: Re: _heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

Can we pass some kind of force_refresh parameter (which defaults to existing behavior) which will do the full refresh and then the _heal_instance_info_cache would be the only thing that passes True for that?

I worry about all of the spaghetti code in there and existing usage by different callers, but the heal instance info cache periodic task is meant to be a full refresh based on latest information from neutron, so it seems reasonable to do it in that case.

Maciej Jozefczyk (maciejjozefczyk) wrote on 2018-03-05:

For me it's okay to add force_refresh=False and use it only in _heal_instance_info_cache. I'll propose a fix doing that. Thanks!

Maciej Jozefczyk (maciejjozefczyk) wrote on 2018-03-05:

This fix conflicts with the bugfix in commit 46922068ac167f492dd303efb359d0c649d69118.
We need to think carefully about how to fix it.

commit 46922068ac167f492dd303efb359d0c649d69118
Author: Aaron Rosen <email address hidden>
Date: Thu Dec 5 17:28:17 2013 -0800

Make network_cache more robust with neutron

    Currently, nova treats neutron as the source of truth for which ports are
    attached to an instance which is a false assumption. Because of this
    if someone creates a port in neutron with a device_id that matches one
    of their existing instance_ids that port will eventually show up in
    nova list (through the periodic heal task).

    This problem usually manifests it's self when nova-compute
    calls to neutron to create a port and the request times out (though
    the port is actually created in neutron). When this occurs the instance
    can be rescheduled on another compute node which it will call out to
    neutron again to create a port. In this case two ports will show
    up in the network_cache table (since they have the same instance_id) though
    only one port is attached to the instance.

    This patch addresses this issue by only adding ports to network_cache
    if nova successfully allocated the port (or it was passed in). This
    way these ghost ports are avoided. A follow up patch will come later
    that garbage collects these ports.

Closes-bug: #1258620
Closes-bug: #1272195

Change-Id: I961c224d95291727c8614174de07805a0d0a9e46

melanie witt (melwitt) on 2018-04-05

Changed in nova:
importance:	Undecided → Medium
status:	New → Confirmed
tags:	added: compute neutron

Boris Bobrov (bbobrov) wrote on 2018-06-08:

We ran into this issue too.

We tried to fix bug https://bugs.launchpad.net/keystone/+bug/968696 by changing policy.json, and at some point our service users had incorrect permissions. We noticed that network information disappeared from "openstack server list". But even after we fixed the service user permissions, the network information was not restored.

Investigation revealed that the periodic job queries the list of ports from neutron. Neutron returned an empty list because of the bad service user permissions, so nova deleted the ports from info_cache. We noticed that and restored the permissions. Neutron started to return a non-empty list again, but nova did not consume it.

We managed to fix it by changing the code and forcing nova to record the ports from the list.

s10 (vlad-esten) wrote on 2018-07-27:

This bug might be caused by commit https://github.com/openstack/nova/commit/8694c1619d774bb8a6c23ed4c0f33df2084849bc
Nova never repopulates instance_info_cache if it is empty.

Matt Riedemann (mriedem) wrote on 2018-08-14:

That commit from arosen is from 2013, and I think this has been fixed since then:

"""
This problem usually manifests it's self when nova-compute
     calls to neutron to create a port and the request times out (though
     the port is actually created in neutron). When this occurs the instance
     can be rescheduled on another compute node which it will call out to
     neutron again to create a port. In this case two ports will show
     up in the network_cache table (since they have the same instance_id) though
     only one port is attached to the instance.
"""

via this change:

https://review.openstack.org/#/c/520248/

So nova will clean up ports created during a failed build prior to rescheduling.

So I think we should add a force_refresh flag to the _heal_instance_info_cache flow so that we refresh from neutron rather than the nova db.

Matt Riedemann (mriedem) wrote on 2018-08-14:

FWIW, our public cloud team (Huawei) reported the exact same issue as from comment 4 where the policy changed on the neutron side which resulted in returning no ports for the instance, so nova wiped out the entries from the cache and the heal periodic task didn't fix it.

OpenStack Infra (hudson-openstack) wrote on 2018-08-14: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/591607

Changed in nova:
assignee:	Maciej Jozefczyk (maciej.jozefczyk) → Matt Riedemann (mriedem)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) on 2018-10-25

Changed in nova:
assignee:	Matt Riedemann (mriedem) → Maciej Jozefczyk (maciej.jozefczyk)

OpenStack Infra (hudson-openstack) wrote on 2018-10-30: Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/614167

sean mooney (sean-k-mooney) wrote on 2018-11-20: Re: _heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

#10

Hi, I have a customer that filed a downstream bug for this on Newton.

I have been able to reproduce this without any DB surgery; however, I did have to use
curl to send a raw command to the neutron API.

I could try to create a functional test for this if that helps.

Currently the only workaround I have found is to delete the neutron ports
and create new ones. When the instance is in this broken state you cannot use
add/remove port to fix it.

Pastebin log of the script execution:
http://paste.openstack.org/show/735818/

Repro script below.
------------------------------------------------------------------------
#!/bin/bash

set -x

IMAGE="cirros-0.3.5-x86_64-disk"
NETWORK="private"
FLAVOR="m1.nano"
TEMP_TOKEN=$(openstack token issue -c id -f value)
SERVER=$(openstack server create --image ${IMAGE} --flavor ${FLAVOR} --network ${NETWORK} -c id -f value --wait repro-bug)
openstack server show ${SERVER}
PORT_ID=$(openstack port list --device-id ${SERVER} -f value -c id)
openstack port show ${PORT_ID}

#wait for it to be fully up
sleep 10

NEUTRON_ENDPOINT=$(openstack endpoint list --service network -f value -c URL)
curl -X PUT -H "X-Auth-Token:${TEMP_TOKEN}" -d '{ "port":{"device_id":"","device_owner":"", "binding:host_id":"" }}' "${NEUTRON_ENDPOINT}v2.0/ports/${PORT_ID}" | python -mjson.tool
# after this curl command nova and neutron will disagree as to the state of the port.

openstack server reboot --hard --wait ${SERVER}
# after the vm is rebooted the vm will not have an interface attached
openstack server show ${SERVER}
openstack port show ${PORT_ID}

#try to fix the issue by attaching the port again
openstack server add port ${SERVER} ${PORT_ID}
# note this will result in fixing the port in neutron but it will be broken on the nova side
# as a result the vm will still not have an interface attached but neutron will say it does.
openstack server show ${SERVER}
openstack port show ${PORT_ID}

# wait for nova to have time to try and attach the interface
sleep 30
openstack server reboot --hard --wait ${SERVER}

set +x
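
For readers who prefer not to shell out to curl, the request sent in the script can be sketched in Python. This only builds the request object and does not send it. The function name, the example endpoint and the placeholder token are hypothetical, and the explicit Content-Type header is an assumption added here; the original curl command does not set one.

```python
import json
import urllib.request


def build_port_reset_request(neutron_endpoint, port_id, token):
    """Build (but do not send) the PUT that clears the port's device fields."""
    body = {"port": {"device_id": "", "device_owner": "", "binding:host_id": ""}}
    # Same URL layout as the script: the endpoint is expected to end with "/".
    url = f"{neutron_endpoint}v2.0/ports/{port_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Content-Type is an assumption; the curl command above omits it.
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical endpoint, port ID and token, for illustration only.
req = build_port_reset_request("http://controller:9696/", "PORT_ID", "TOKEN")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, which is deliberately left out here because it modifies the port.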

Matt Riedemann (mriedem) wrote on 2018-11-30:

#11

I saw this over IRC last night:

(5:12:51 PM) pacharya_: Hi need some help with nova instance info cache table. Due to some network connectivity issues nova received empty list during the heal instance info cache periodic task and instance cache table got updated with same.
(5:13:12 PM) pacharya_: now the list and get API for that instance does not have any IPs listed
(5:13:20 PM) pacharya_: Does anyone know how to fix this?

OpenStack Infra (hudson-openstack) wrote on 2019-01-31: Related fix merged to nova (master)

#12

Reviewed: https://review.openstack.org/614167
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3534471c578eda6236e79f43153788c4725a5634
Submitter: Zuul
Branch: master

commit 3534471c578eda6236e79f43153788c4725a5634
Author: Maciej Jozefczyk <email address hidden>
Date: Tue Oct 30 09:58:30 2018 +0000

Add fill_virtual_interface_list online_data_migration script

    In change [1] we modified _heal_instance_info_cache periodic task
    to use Neutron point of view while rebuilding InstanceInfoCache
    objects.
    The crucial point was how we know the previous order of ports, if
    the cache was broken. We decided to use VirtualInterfaceList objects
    as source of port order.
    For instances older than Newton VirtualInterface objects doesn't
    exist, so we need to introduce a way of creating it.
    This script should be executed while upgrading to Stein release.

[1] https://review.openstack.org/#/c/591607

Change-Id: Ic26d4ce3d071691a621d3c925dc5cd436b2005f1
Related-Bug: 1751923

OpenStack Infra (hudson-openstack) wrote on 2019-01-31: Fix merged to nova (master)

#13

Reviewed: https://review.openstack.org/591607
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ba44c155ce1dcefede9741722a0525820d6da2b8
Submitter: Zuul
Branch: master

commit ba44c155ce1dcefede9741722a0525820d6da2b8
Author: Matt Riedemann <email address hidden>
Date: Tue Aug 14 17:57:53 2018 +0800

Force refresh instance info_cache during heal

    If the instance info_cache is corrupted somehow, like during
    a host reboot and the ports aren't wired up properly or
    a mistaken policy change in neutron results in nova resetting
    the info_cache to an empty list, the _heal_instance_info_cache
    is meant to fix it (once the current state of the ports for
    the instance in neutron is corrected). However, the task is
    currently only refreshing the cache *based* on the current contents
    of the cache, which defeats the purpose of neutron being the source
    of truth for the ports attached to the instance.

    This change makes the _heal_instance_info_cache periodic task
    pass a "force_refresh" kwarg, which defaults to False for backward
    compatibility with other methods that refresh the cache after
    operations like attach/detach interface, and if True will make
    nova get the current state of the ports for the instance from neutron
    and fully rebuild the info_cache.

    To not lose port order in info_cache this change takes original order
    from nova historical data that are stored as VirtualInterfaceList
    objects. For ports that are not registered as VirtualInterfaces
    objects it will add them at the end of port_order list. Due to this
    for instances older than Newton another patch was introduced to fill
    missing VirtualInterface objects in the DB [1].

    Long-term we should be able to refactor some of the older refresh
    code which leverages the cache to instead use the refresh_vif_id
    kwarg so that we do targeted cache updates when we do things like
    attach and detach ports, but that's a change for another day.

[1] https://review.openstack.org/#/c/614167

    Co-Authored-By: Maciej Jozefczyk <email address hidden>
    Change-Id: I629415236b2447128ae9a980d4ebe730a082c461
    Closes-Bug: #1751923
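The ordering rule in this commit message can be illustrated with a short sketch. It is a simplified model rather than nova's code: port IDs are plain strings, ports_to_refresh and order_ports are hypothetical helper names, and the non-forced branch only approximates the old behaviour of refreshing whatever is already in the cache. Only the force_refresh name and the "recorded order first, unregistered ports at the end" rule come from the text above.

```python
from typing import Iterable, List


def order_ports(neutron_port_ids: Iterable[str],
                vif_port_ids: Iterable[str]) -> List[str]:
    """Order neutron's ports by recorded VIF order; unknown ports go last."""
    current = list(neutron_port_ids)
    current_set = set(current)
    # Recorded ports that neutron still reports, in their recorded order.
    ordered = [p for p in dict.fromkeys(vif_port_ids) if p in current_set]
    seen = set(ordered)
    # Ports without a VirtualInterface record are appended at the end.
    ordered.extend(p for p in current if p not in seen)
    return ordered


def ports_to_refresh(cached_port_ids: Iterable[str],
                     neutron_port_ids: Iterable[str],
                     vif_port_ids: Iterable[str],
                     force_refresh: bool = False) -> List[str]:
    """Pick the ports used to rebuild an instance's info_cache."""
    neutron_ports = list(neutron_port_ids)
    if not force_refresh:
        # Simplified old behaviour: only ports already in the cache are
        # considered, so an emptied cache can never recover.
        current = set(neutron_ports)
        return [p for p in cached_port_ids if p in current]
    # Forced refresh: neutron is the source of truth for membership,
    # the VIF records are the source of truth for order.
    return order_ports(neutron_ports, vif_port_ids)


neutron = ["c", "a", "b"]
vifs = ["a", "b"]
print(ports_to_refresh([], neutron, vifs))                      # []
print(ports_to_refresh([], neutron, vifs, force_refresh=True))  # ['a', 'b', 'c']
```

With an emptied cache, the first call returns nothing to refresh, which is the failure mode reported in this bug; the forced call recovers all three ports and keeps the recorded order for the two that have VIF records.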

Changed in nova:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2019-03-01: Related fix proposed to nova (master)

#14

Related fix proposed to branch: master
Review: https://review.openstack.org/640516

OpenStack Infra (hudson-openstack) wrote on 2019-03-22: Fix included in openstack/nova 19.0.0.0rc1

#15

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote on 2019-04-16: Fix proposed to nova (stable/rocky)

#16

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/653040

OpenStack Infra (hudson-openstack) wrote on 2019-04-16: Change abandoned on nova (stable/rocky)

#17

Change abandoned by Mohammed Naser (<email address hidden>) on branch: stable/rocky
Review: https://review.openstack.org/653040

Edward Hope-Morley (hopem) wrote on 2019-08-29: Re: _heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

#18

I also hit this issue on our Queens cloud, where 232 VMs were affected. I created local backports for Q and R, tested the Q backport in our cloud, and can confirm that it automatically resolved the problem, i.e. the periodic task eventually re-healed all the caches correctly. Therefore I would like to propose this for backport to Q & R.

OpenStack Infra (hudson-openstack) wrote on 2019-08-29: Fix proposed to nova (stable/rocky)

#19

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/679271

OpenStack Infra (hudson-openstack) wrote on 2019-08-29: Fix proposed to nova (stable/queens)

#20

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/679274

Edward Hope-Morley (hopem) on 2019-08-29

Changed in nova (Ubuntu Disco):
status:	New → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2019-09-05: Change abandoned on nova (master)

#21

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/640516

Launchpad Janitor (janitor) wrote on 2019-11-05: Re: _heal_instance_info_cache periodic task bases on port list from nova db, not from neutron server

#22

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nova (Ubuntu Bionic):
status:	New → Confirmed
Changed in nova (Ubuntu):
status:	New → Confirmed

Jason Davidson (djason2018) wrote on 2020-03-25:

#24

We just had ~220 instances lose their IP addresses on one of our Queens clouds. Is anyone still working on a fix for this issue?

sean mooney (sean-k-mooney) wrote on 2020-10-06:

#25

For what it's worth, this has been partially backported downstream in Red Hat OSP.

We backported only the self-healing and not the online data migration, which had a bug in it.

So https://review.opendev.org/#/c/591607/ can be safely backported, but https://review.opendev.org/#/c/614167/20 has a bug and should not be backported.

Jorge Niedbalski (niedbalski) on 2021-05-15

Changed in nova (Ubuntu):
status:	Confirmed → Fix Released
Changed in nova (Ubuntu Bionic):
assignee:	nobody → Jorge Niedbalski (niedbalski)
summary:
- _heal_instance_info_cache periodic task bases on port list from nova db,
- not from neutron server
+ [SRU]_heal_instance_info_cache periodic task bases on port list from
+ nova db, not from neutron server

Jorge Niedbalski (niedbalski) on 2021-05-15

description:

updated

Jorge Niedbalski (niedbalski) wrote on 2021-05-17:

#26

Hello,

I've prepared a PPA for testing the proposed patch on B/Queens
https://launchpad.net/~niedbalski/+archive/ubuntu/lp1751923/+packages

Attached is the debdiff for bionic.

Jorge Niedbalski (niedbalski) wrote on 2021-05-17:

#27

lp1751923_bionic.debdiff (21.7 KiB, text/plain)

description:

updated

Jorge Niedbalski (niedbalski) on 2021-05-17

description:

updated

Edward Hope-Morley (hopem) wrote on 2021-05-20:

#28

Since Queens is populating the virtual_interfaces table as standard I think we should proceed with this SRU - https://pastebin.ubuntu.com/p/BdCPsVKGk5/ - since it will provide a clean fix for Queens clouds.

Jorge Niedbalski (niedbalski) wrote on 2021-06-01:

#29

@corey is there anything specific you need from my end to get this SRU reviewed?

Edward Hope-Morley (hopem) wrote on 2021-06-28:

#30

@coreycb I think we have everything we need to proceed with this SRU now. Since Queens is the oldest release currently supported on Ubuntu and support for populating vif attach ordering required to rebuild the cache has been available since Newton I think the risk of anyone being impacted is very small. VMs created prior to Newton would need the patch [1] and eventually [2] backported from Stein but I don't see them as essential and given the impact of not having this fix asap I think it supersedes those which we can handle separately.

[1] https://github.com/openstack/nova/commit/3534471c578eda6236e79f43153788c4725a5634
[2] https://bugs.launchpad.net/nova/+bug/1825034
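The risk described in this comment can be made concrete with a small check. This is a sketch under assumptions: InstanceRecord and needs_vif_backfill are hypothetical names, and the port lists would in practice come from neutron and from the virtual_interfaces table. An instance is flagged only when it has more than one port and at least one of them has no recorded VIF, since that is the case where the rebuilt order could differ from the original one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceRecord:
    uuid: str
    neutron_port_ids: List[str]  # ports neutron reports for the instance
    vif_port_ids: List[str]      # ports recorded in virtual_interfaces


def needs_vif_backfill(inst: InstanceRecord) -> bool:
    """True if a forced cache rebuild could reorder this instance's ports."""
    recorded = set(inst.vif_port_ids)
    unrecorded = [p for p in inst.neutron_port_ids if p not in recorded]
    # A single port cannot be misordered; several ports can be if any
    # of them lacks a recorded position.
    return len(inst.neutron_port_ids) > 1 and bool(unrecorded)


instances = [
    InstanceRecord("pre-newton-multi", ["p1", "p2"], []),
    InstanceRecord("pre-newton-single", ["p3"], []),
    InstanceRecord("post-newton-multi", ["p4", "p5"], ["p5", "p4"]),
]
print([i.uuid for i in instances if needs_vif_backfill(i)])  # ['pre-newton-multi']
```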

Edward Hope-Morley (hopem) wrote on 2021-06-28:

#31

Restored the bug description to its original format and updated SRU info.

description:

updated

Corey Bryant (corey.bryant) wrote on 2021-06-28:

#32

Thanks Jorge. Let's patch rocky as well for upgrade purposes.

Corey Bryant (corey.bryant) wrote on 2021-06-28:

#33

New nova packages including this fix have been uploaded to rocky-staging and the bionic unapproved queue:
https://launchpad.net/~ubuntu-cloud-archive/+archive/ubuntu/rocky-staging/+packages?field.name_filter=nova
https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=nova

Changed in nova (Ubuntu Bionic):
status:	Confirmed → Triaged
status:	Triaged → In Progress

Corey Bryant (corey.bryant) on 2021-06-28

description:

updated

Corey Bryant (corey.bryant) wrote on 2021-06-29: Please test proposed package

#34

Hello Maciej, or anyone else affected,

Accepted nova into rocky-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

sudo add-apt-repository cloud-archive:rocky-proposed
sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-rocky-needed to verification-rocky-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-rocky-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags:

added: verification-rocky-needed

Łukasz Zemczak (sil2100) wrote on 2021-07-05:

#35

I'm a bit worried about the aforementioned regression potential here, but I'll accept it seeing that this was accepted by the OpenStack team. I'd prefer SRUs to be as safe as possible, offering fallback functionality in case the system is old. I assume this would require the additional commits to be cherry-picked?

Anyway, let's proceed for now.

Changed in nova (Ubuntu Bionic):
status:	In Progress → Fix Committed
tags:	added: verification-needed verification-needed-bionic

Łukasz Zemczak (sil2100) wrote on 2021-07-05:

#36

Hello Maciej, or anyone else affected,

Accepted nova into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:17.0.13-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Corey Bryant (corey.bryant) wrote on 2021-07-06:

#37

Hello Maciej, or anyone else affected,

Accepted nova into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

sudo add-apt-repository cloud-archive:queens-proposed
sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags:

added: verification-queens-needed

Mathew Hodson (mhodson) on 2021-07-23

Changed in nova (Ubuntu):
importance:	Undecided → Medium
Changed in nova (Ubuntu Bionic):
importance:	Undecided → Medium
Changed in nova (Ubuntu Disco):
importance:	Undecided → Medium

Jorge Niedbalski (niedbalski) wrote on 2021-07-28:

#38

I am in the process of verifying the bionic/rocky/queens releases.

Jorge Niedbalski (niedbalski) wrote on 2021-08-27:

#39

Hello,

I've verified that this problem doesn't reproduce with the package contained in proposed.

1) Deployed this bundle of bionic-queens

2) Upgraded to the following version:

root@juju-51d6ad-1751923-6:/home/ubuntu# systemctl status nova*|grep -i active
   Active: active (running) since Fri 2021-08-27 22:02:25 UTC; 1h 7min ago
   Active: active (running) since Fri 2021-08-27 22:02:12 UTC; 1h 8min ago
   Active: active (running) since Fri 2021-08-27 22:02:25 UTC; 1h 7min ago

root@juju-51d6ad-1751923-6:/home/ubuntu# dpkg -l | grep nova
ii  nova-api-os-compute              2:17.0.13-0ubuntu3                        all          OpenStack Compute - OpenStack Compute API frontend
ii  nova-common                      2:17.0.13-0ubuntu3                        all          OpenStack Compute - common files
ii  nova-conductor                   2:17.0.13-0ubuntu3                        all          OpenStack Compute - conductor service
ii  nova-placement-api               2:17.0.13-0ubuntu3                        all          OpenStack Compute - placement API frontend
ii  nova-scheduler                   2:17.0.13-0ubuntu3                        all          OpenStack Compute - virtual machine scheduler
ii  python-nova                      2:17.0.13-0ubuntu3                        all          OpenStack Compute Python libraries

root@juju-51d6ad-1751923-7:/home/ubuntu# dpkg -l | grep nova
ii  nova-api-metadata                    2:17.0.13-0ubuntu3                        all          OpenStack Compute - metadata API frontend
ii  nova-common                          2:17.0.13-0ubuntu3                        all          OpenStack Compute - common files
ii  nova-compute                         2:17.0.13-0ubuntu3                        all          OpenStack Compute - compute node base
ii  nova-compute-kvm                     2:17.0.13-0ubuntu3                        all          OpenStack Compute - compute node (KVM)
ii  nova-compute-libvirt                 2:17.0.13-0ubuntu3                        all          OpenStack Compute - compute node libvirt support
ii  python-nova                          2:17.0.13-0ubuntu3                        all          OpenStack Compute Python libraries
ii  python-novaclient                    2:9.1.1-0ubuntu1                          all          client library for OpenStack Compute API - Python 2.7
ii  python3-novaclient                   2:9.1.1-0ubuntu1                          all          client library for OpenStack Compute API - 3.x

3) Created a server with 4 private ports, 1 public one.

ubuntu@niedbalski-bastion:~/stsstack-bundles/openstack$ openstack server list
+--------------------------------------+---------------+--------+-------------------------------------------------------------------------------+--------+-----------+
| ID                                   | Name          | Status | Networks                                                                      | Image  | Flavor    |
+--------------------------------------+---------------+--------+-------------------------------------------------------------------------------+--------+-----------+
| 5843e6b5-e1a7-4208-9f19-1d051c032afb | cirros-232302 | ACTIVE | private=192.168.21.22, 192.168.21.6, 192.168.21.10, 192.168.21.13, 10.5.150.1 | cirros | m1.cirros |
+--------------------------------------+---------------+--------+-------------------------------------------------------------------------------+--------+-----------+

4) I can see the 4 tap devices.

root@juju-51d6ad-1751923-7:/home/ubuntu# virsh dumpxml instance-00000001|grep -i tap
      <target dev='tapb11d1c8e-d4'/>
      <target dev='tap5865a40e-36'/>
      <target dev='tap5f400107-d9'/>
      <target dev='tap1680b164-14'/>

5) I modified the instance info caches removing one of the interfaces.

Database changed
mysql>  update instance_info_caches set network_info='[{"profile":{},"ovs_interfaceid":"5865a40e-36fa-4cf9-bd40-85a1e78031f5","preserve_on_delete":true,"network":{"bridge":"br-int","subnets":[{"ips":[{"meta":{},"version":4,"type":"fixed","floating_ips":[],"address":"192.168.21.6"}],"version":4,"meta":{"dhcp_server":"192.168.21.2"},"dns":[],"routes":[],"cidr":"192.168.21.0/24","gateway":{"meta":{},"version":4,"type":"gateway","address":"192.168.21.1"}}],"meta":{"injected":false,"tenant_id":"991d5e66c5f64485a8b6c49db60cfe99","mtu":1500},"id":"8d91e266-0925-4c29-8039-0d71862df4fc","label":"private"},"devname":"tap5865a40e-36","vnic_type":"normal","qbh_params":null,"meta":{},"details":{"port_filter":true,"datapath_type":"system","ovs_hybrid_plug":false},"address":"fa:16:3e:eb:73:b1","active":true,"type":"ovs","id":"5865a40e-36fa-4cf9-bd40-85a1e78031f5","qbg_params":null},{"profile":{},"ovs_interfaceid":"5f400107-d9eb-4a1b-a37b-3bd034d8f995","preserve_on_delete":true,"network":{"bridge":"br-int","subnets":[{"ips":[{"meta":{},"version":4,"type":"fixed","floating_ips":[],"address":"192.168.21.10"}],"version":4,"meta":{"dhcp_server":"192.168.21.2"},"dns":[],"routes":[],"cidr":"192.168.21.0/24","gateway":{"meta":{},"version":4,"type":"gateway","address":"192.168.21.1"}}],"meta":{"injected":false,"tenant_id":"991d5e66c5f64485a8b6c49db60cfe99","mtu":1500},"id":"8d91e266-0925-4c29-8039-0d71862df4fc","label":"private"},"devname":"tap5f400107-d9","vnic_type":"normal","qbh_params":null,"meta":{},"details":{"port_filter":true,"datapath_type":"system","ovs_hybrid_plug":false},"address":"fa:16:3e:95:9a:78","active":true,"type":"ovs","id":"5f400107-d9eb-4a1b-a37b-3bd034d8f995","qbg_params":null},{"profile":{},"ovs_interfaceid":"1680b164-14d7-4d6e-b085-94292ece82cf","preserve_on_delete":true,"network":{"bridge":"br-int","subnets":[{"ips":[{"meta":{},"version":4,"type":"fixed","floating_ips":[],"address":"192.168.21.13"}],"version":4,"meta":{"dhcp_server":"192.168.21.2"},"dns":[],"routes":[],"cidr":"192.168.21.0/24","gateway":{"meta":{},"version":4,"type":"gateway","address":"192.168.21.1"}}],"meta":{"injected":false,"tenant_id":"991d5e66c5f64485a8b6c49db60cfe99","mtu":1500},"id":"8d91e266-0925-4c29-8039-0d71862df4fc","label":"private"},"devname":"tap1680b164-14","vnic_type":"normal","qbh_params":null,"meta":{},"details":{"port_filter":true,"datapath_type":"system","ovs_hybrid_plug":false},"address":"fa:16:3e:cf:f8:c8","active":true,"type":"ovs","id":"1680b164-14d7-4d6e-b085-94292ece82cf","qbg_params":null}]';
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0

6) Rebooted the server

ubuntu@niedbalski-bastion:~/stsstack-bundles/openstack$ openstack server reboot 5843e6b5-e1a7-4208-9f19-1d051c032afb

7) Listed the interfaces and those are available.
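The comparison behind steps 4 to 7 can be sketched as a small script. It assumes the network_info column holds a JSON list whose entries carry a devname key, as in the UPDATE statement above; the entries here are trimmed to id and devname for brevity, and missing_from_cache is a hypothetical helper name. The tap names are the ones reported by virsh in step 4.

```python
import json
from typing import List, Set


def cached_devnames(network_info_json: str) -> Set[str]:
    """Return the tap device names present in a network_info JSON blob."""
    return {vif["devname"] for vif in json.loads(network_info_json)}


def missing_from_cache(network_info_json: str,
                       hypervisor_taps: List[str]) -> List[str]:
    """List taps the hypervisor has that the info cache does not know about."""
    cached = cached_devnames(network_info_json)
    return [tap for tap in hypervisor_taps if tap not in cached]


# Trimmed copy of the cache written in step 5 (one interface removed).
network_info = json.dumps([
    {"id": "5865a40e-36fa-4cf9-bd40-85a1e78031f5", "devname": "tap5865a40e-36"},
    {"id": "5f400107-d9eb-4a1b-a37b-3bd034d8f995", "devname": "tap5f400107-d9"},
    {"id": "1680b164-14d7-4d6e-b085-94292ece82cf", "devname": "tap1680b164-14"},
])
taps = ["tapb11d1c8e-d4", "tap5865a40e-36", "tap5f400107-d9", "tap1680b164-14"]

print(missing_from_cache(network_info, taps))  # ['tapb11d1c8e-d4']
```

With the fixed package, the expectation is that the periodic heal task restores the missing entry, so the same check run against the healed cache would return an empty list.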

tags:

added: verification-done-bionic verification-queens-done
removed: verification-needed-bionic verification-queens-needed

Launchpad Janitor (janitor) wrote on 2021-08-30:

#41

This bug was fixed in the package nova - 2:17.0.13-0ubuntu3

---------------
nova (2:17.0.13-0ubuntu3) bionic; urgency=medium

  * Force refresh instance info_cache during heal (LP: #1751923):
    - d/p/0001-Force-refresh-instance-info_cache-during-heal.patch
    - d/p/0002-remove-deprecated-test_list_vifs_neutron_notimplemented.patch

-- Jorge Niedbalski <email address hidden> Mon, 17 May 2021 14:25:43 -0400

Changed in nova (Ubuntu Bionic):
status:	Fix Committed → Fix Released

Łukasz Zemczak (sil2100) wrote on 2021-08-30: Update Released

#40

The verification of the Stable Release Update for nova has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Edward Hope-Morley (hopem) wrote on 2021-09-01:

#42

Verified rocky-proposed using [Test Plan] with output as follows:

# apt-cache policy nova-common
nova-common:
  Installed: 2:18.3.0-0ubuntu1~cloud3
  Candidate: 2:18.3.0-0ubuntu1~cloud3
  Version table:
*** 2:18.3.0-0ubuntu1~cloud3 500
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-proposed/rocky/main amd64 Packages
        100 /var/lib/dpkg/status
     2:17.0.13-0ubuntu3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
     2:17.0.10-0ubuntu2.1 500
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     2:17.0.1-0ubuntu1 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

I also tested by manually deleting the network_info for a vm then waiting for the periodic task to run - https://pastebin.ubuntu.com/p/7gmZQsvC8H/

tags:

added: verification-rocky-done
removed: verification-rocky-needed

Corey Bryant (corey.bryant) wrote on 2021-09-01:

#43

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote on 2021-09-01:

#44

This bug was fixed in the package nova - 2:18.3.0-0ubuntu1~cloud3
---------------

nova (2:18.3.0-0ubuntu1~cloud3) bionic-rocky; urgency=medium
.
* Force refresh instance info_cache during heal (LP: #1751923):
- d/p/0001-Force-refresh-instance-info_cache-during-heal.patch

Corey Bryant (corey.bryant) wrote on 2021-09-01:

#45

Corey Bryant (corey.bryant) wrote on 2021-09-01:

#46

This bug was fixed in the package nova - 2:17.0.13-0ubuntu3~cloud0
---------------

nova (2:17.0.13-0ubuntu3~cloud0) xenial-queens; urgency=medium
.
   * New update for the Ubuntu Cloud Archive.
.
nova (2:17.0.13-0ubuntu3) bionic; urgency=medium
.
   * Force refresh instance info_cache during heal (LP: #1751923):
     - d/p/0001-Force-refresh-instance-info_cache-during-heal.patch
     - d/p/0002-remove-deprecated-test_list_vifs_neutron_notimplemented.patch

OpenStack Infra (hudson-openstack) wrote on 2022-11-11: Change abandoned on nova (stable/queens)

#47

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/queens
Review: https://review.opendev.org/c/openstack/nova/+/679274
Reason: This branch transitioned to End of Life for this project; open patches need to be closed to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote on 2022-11-11: Change abandoned on nova (stable/rocky)

#48

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/rocky
Review: https://review.opendev.org/c/openstack/nova/+/679271
Reason: This branch transitioned to End of Life for this project; open patches need to be closed to be able to delete the branch.