Nova does not delete or update neutron ports for failed VMs

Bug #1594604 reported by Arvind Somya
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Undecided
Assigned to: Mohammed Naser
Milestone: (none)

Bug Description

Environment:
Stable/Liberty with neutron ML2 (the mechanism driver is a custom asynchronous driver based on the OpenDaylight V2 driver)
No agents or OVS bridges in use.
One controller/network node and two compute nodes.

When a VM fails to start on a compute node, nova removes the host binding from the VM but doesn't send a port update to neutron to notify it of the host-binding change.

When the same error-state VM is deleted, nova doesn't send any events to neutron. As a result, the driver thinks the port is still active, owned by an existing VM, and properly bound to the host.

NOTE: This behavior is only seen with VMs in the ERROR state; ports for VMs in the ACTIVE state are deleted properly.
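A driver-side workaround (not part of nova itself) is to look the leaked ports up by `device_id` instead of relying on nova's network info cache. A minimal sketch using the python-neutronclient `list_ports`/`delete_port` interface; `cleanup_ports_for_instance` is a hypothetical helper name, and `neutron` is assumed to be an already-authenticated client:

```python
def cleanup_ports_for_instance(neutron, instance_uuid):
    """Delete any neutron ports still bound to a deleted or ERROR instance.

    `neutron` is assumed to expose the neutronclient.v2_0.client.Client
    interface (list_ports/delete_port).
    """
    # Ports leaked by an ERROR-state VM still carry the stale device_id,
    # so a device_id filter finds them even though nova's cache is empty.
    ports = neutron.list_ports(device_id=instance_uuid)["ports"]
    for port in ports:
        neutron.delete_port(port["id"])
    return [p["id"] for p in ports]
```

Running this against the instance UUID shown below (`5d37f0b1-...`) would remove the orphaned port `5efcfa1b-...` that nova leaves behind.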

$ nova show vm1
+--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | AUTO |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | vm1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-0000077a |
| OS-EXT-SRV-ATTR:kernel_id | 72d053dc-aae8-4559-a7c4-107c980bd674 |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | 3d25fd54-ffbc-4370-b971-2196a7963c24 |
| OS-EXT-SRV-ATTR:reservation_id | r-pujfofv9 |
| OS-EXT-SRV-ATTR:root_device_name | /dev/vda |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2016-06-20T22:51:13Z |
| fault | {"message": "No valid host was found. There are not enough hosts available.", "code": 500, "details": " File \"/opt/stack/nova/nova/conductor/manager.py\", line 739, in build_instances |
| flavor | m1.small (2) |
| hostId | |
| id | 5d37f0b1-d3ac-4246-badc-b1a2aa6c3137 |
| image | cirros-0.3.4-x86_64-uec (50976ce1-c00e-4a80-9d72-6bf5d1abe54a) |
| key_name | - |
| metadata | {} |
| name | vm1 |
| net1 network | 30.0.0.231 |
| os-extended-volumes:volumes_attached | [] |
| security_groups | default |
| status | ERROR |
| tenant_id | 4c3ddd212d4f4af9b4d0d55e89b43125 |
| updated | 2016-06-20T22:51:23Z |
| user_id | 1e3e7e3addc14861afe177f285e43b40 |
+--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

$ neutron port-show 5efcfa1b-1300-4194-8229-11863f85d0a9
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | bxb-ds-50 |
| binding:profile | {} |
| binding:vif_details | {"port_filter": false, "vhostuser_ovs_plug": false, "vhostuser_socket": "/tmp/sock-fa163eaba0d2", "vhostuser_mode": "server"} |
| binding:vif_type | vhostuser |
| binding:vnic_type | normal |
| device_id | 5d37f0b1-d3ac-4246-badc-b1a2aa6c3137 |
| device_owner | compute:nova |
| dns_assignment | {"hostname": "host-30-0-0-231", "ip_address": "30.0.0.231", "fqdn": "host-30-0-0-231.openstacklocal."} |
| dns_name | |
| extra_dhcp_opts | |
| fixed_ips | {"subnet_id": "880aac1b-662e-4a4f-b640-7929d69a2b3d", "ip_address": "30.0.0.231"} |
| id | 5efcfa1b-1300-4194-8229-11863f85d0a9 |
| mac_address | fa:16:3e:ab:a0:d2 |
| name | |
| network_id | cf31ce68-4c5b-4e22-b725-dff0b6511a3d |
| port_security_enabled | True |
| security_groups | ae59a53e-38f2-471d-9de9-b8aea815dc27 |
| status | ACTIVE |
| tenant_id | 4c3ddd212d4f4af9b4d0d55e89b43125 |
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------+

$ nova delete vm1
Request to delete server vm1 has been accepted.

$ nova show vm1
ERROR (CommandError): No server with a name or ID of 'vm1' exists.

$ neutron port-show 5efcfa1b-1300-4194-8229-11863f85d0a9
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | bxb-ds-50 |
| binding:profile | {} |
| binding:vif_details | {"port_filter": false, "vhostuser_ovs_plug": false, "vhostuser_socket": "/tmp/sock-fa163eaba0d2", "vhostuser_mode": "server"} |
| binding:vif_type | vhostuser |
| binding:vnic_type | normal |
| device_id | 5d37f0b1-d3ac-4246-badc-b1a2aa6c3137 |
| device_owner | compute:nova |
| dns_assignment | {"hostname": "host-30-0-0-231", "ip_address": "30.0.0.231", "fqdn": "host-30-0-0-231.openstacklocal."} |
| dns_name | |
| extra_dhcp_opts | |
| fixed_ips | {"subnet_id": "880aac1b-662e-4a4f-b640-7929d69a2b3d", "ip_address": "30.0.0.231"} |
| id | 5efcfa1b-1300-4194-8229-11863f85d0a9 |
| mac_address | fa:16:3e:ab:a0:d2 |
| name | |
| network_id | cf31ce68-4c5b-4e22-b725-dff0b6511a3d |
| port_security_enabled | True |
| security_groups | ae59a53e-38f2-471d-9de9-b8aea815dc27 |
| status | ACTIVE |
| tenant_id | 4c3ddd212d4f4af9b4d0d55e89b43125 |
+-----------------------+-------------------------------------------------------------------------------------------------------------------------------+

Arvind Somya (asomya) wrote :

On further investigation, it seems that the network_info cache in the DB is wiped out on VM error:

mysql> select * from instances where id=1964\G
*************************** 1. row ***************************
              created_at: 2016-06-21 14:44:17
              updated_at: 2016-06-21 14:44:19
              deleted_at: NULL
                      id: 1964
             internal_id: NULL
                 user_id: 1e3e7e3addc14861afe177f285e43b40
              project_id: 4c3ddd212d4f4af9b4d0d55e89b43125
               image_ref: 50976ce1-c00e-4a80-9d72-6bf5d1abe54a
               kernel_id: 72d053dc-aae8-4559-a7c4-107c980bd674
              ramdisk_id: 3d25fd54-ffbc-4370-b971-2196a7963c24
            launch_index: 9
                key_name: NULL
                key_data: NULL
             power_state: 0
                vm_state: error
               memory_mb: 2048
                   vcpus: 1
                hostname: vm1-10
                    host: NULL
               user_data: NULL
          reservation_id: r-xr60h8pj
            scheduled_at: NULL
             launched_at: NULL
           terminated_at: NULL
            display_name: vm1-10
     display_description: vm1
       availability_zone: nova
                  locked: 0
                 os_type: NULL
             launched_on: NULL
        instance_type_id: 5
                 vm_mode: NULL
                    uuid: 9e191fd1-7a5d-430f-973b-625190f525b9
            architecture: NULL
        root_device_name: NULL
            access_ip_v4: NULL
            access_ip_v6: NULL
            config_drive:
              task_state: NULL
default_ephemeral_device: NULL
     default_swap_device: NULL
                progress: 0
        auto_disk_config: 1
      shutdown_terminate: 0
       disable_terminate: 0
                 root_gb: 20
            ephemeral_gb: 0
               cell_name: NULL
                    node: NULL
                 deleted: 0
               locked_by: NULL
                 cleaned: 0
      ephemeral_key_uuid: NULL
1 row in set (0.00 sec)

mysql> select * from instance_info_caches where id='1964'\G
*************************** 1. row ***************************
   created_at: 2016-06-21 14:44:17
   updated_at: NULL
   deleted_at: NULL
           id: 1964
 network_info: []
instance_uuid: 9e191fd1-7a5d-430f-973b-625190f525b9
      deleted: 0
1 row in set (0.00 sec)
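The empty `network_info: []` row explains the leak: nova's cache-driven teardown walks the cached VIF list, so an empty cache yields nothing to unbind or delete, and the still-bound neutron port is never touched. A simplified illustration (not actual nova code) of that cache-driven behavior:

```python
import json


def ports_to_unbind(network_info_cache_json):
    """Simplified view of cache-driven teardown: only ports present in the
    instance's cached VIF list are candidates for unbinding/deletion."""
    vifs = json.loads(network_info_cache_json)
    return [vif["id"] for vif in vifs]


# For the ERROR instance above the cache is "[]", so there is nothing
# to tear down, even though neutron still holds a bound ACTIVE port.
```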

Changed in nova:
assignee: nobody → Baodong (Robert) Li (baoli)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/333579

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Baodong (Robert) Li (<email address hidden>) on branch: master
Review: https://review.openstack.org/333579
Reason: https://review.openstack.org/#/c/340614/
Has a similar fix that in addition includes fix for volume detach. So abandon this one.

Changed in nova:
assignee: Baodong (Robert) Li (baoli) → melanie witt (melwitt)
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → New
assignee: melanie witt (melwitt) → nobody
Sean Dague (sdague) wrote :

Automatically discovered version liberty in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.liberty
Sean Dague (sdague)
Changed in nova:
assignee: nobody → melanie witt (melwitt)
status: New → In Progress
Changed in nova:
assignee: melanie witt (melwitt) → Charlotte Han (hanrong)
Changed in nova:
assignee: Charlotte Han (hanrong) → Mohammed Naser (mnaser)
melanie witt (melwitt) wrote :

This bug was reported in liberty, where an instance that failed with NoValidHost did not have its ports cleaned up.

This is the exception handling block for liberty where we'd handle NoValidHost:

https://github.com/openstack/nova/blob/liberty-eol/nova/conductor/manager.py#L740-L746

and we weren't cleaning up the network in that case.

In newer code, I find that we are cleaning up networks upon NoValidHost or MaxRetriesExceeded since this change:

https://review.openstack.org/#/c/243477/

so, I think this bug has been fixed since then.
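The fixed behavior described above can be sketched as follows. This is a simplified shape of the post-Liberty conductor flow, not nova's actual code: the function signature, the `schedule` and `cleanup_networks` callables, and the locally defined exception classes are illustrative stand-ins for nova's own `NoValidHost`/`MaxRetriesExceeded` handling:

```python
class NoValidHost(Exception):
    pass


class MaxRetriesExceeded(Exception):
    pass


def build_instances(schedule, cleanup_networks, instance):
    """Simplified post-Liberty conductor behavior: network allocations are
    torn down when scheduling fails, instead of leaving the port bound
    (the bug reported here). See https://review.openstack.org/#/c/243477/."""
    try:
        return schedule(instance)
    except (NoValidHost, MaxRetriesExceeded):
        # Deallocate networks before putting the instance into ERROR,
        # so no stale bound port is left behind in neutron.
        cleanup_networks(instance)
        raise
```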
