Hyper-V neutron agent hangs on nova boot (with enable_security_group=true)

Bug #1583541 reported by Vinod Kumar
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
| networking-hyperv | Invalid | Undecided | Unassigned | |

Bug Description

The issue is easily reproducible when the compute node receives a provider rule update before the VM boots.

Steps to reproduce.

1. Create Network
2. Create Subnet
3. Create a provider rule under the default security group.
4. Create a VM that ends up in ERROR state (i.e. ensure nova boot fails).
5. Bring up the Hyper-V agent and ensure it is shown as :-) in neutron agent-list.
6. Delete the failed VM and verify in the Hyper-V neutron agent log file that "Provider rule updated" is received and refresh firewall is called.
7. Boot a new VM.
8. Check the neutron agent status. It shows the Hyper-V agent as xxx.
9. On the compute node, the neutron agent log stops logging any messages, which suggests that the agent has hung.
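Step 8 can also be checked by script rather than by eye. The following is only a minimal sketch: it assumes the pipe-delimited table layout that `neutron agent-list` prints in the CLI trace in this report, and the function name `dead_agents` is made up for illustration. It is not part of neutron or networking-hyperv.

```python
def dead_agents(table_text):
    """Return (agent_type, host) pairs whose 'alive' column reads 'xxx'.

    Assumes the table layout printed by `neutron agent-list` in this report.
    """
    header = None
    dead = []
    for line in table_text.splitlines():
        line = line.strip()
        # Skip borders, shell prompts and anything that is not a table row.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            # The first table row carries the column names.
            header = cells
            continue
        if len(cells) != len(header):
            continue
        row = dict(zip(header, cells))
        # Continuation lines of a wrapped row have an empty agent_type.
        if row.get("agent_type") and row.get("alive") == "xxx":
            dead.append((row["agent_type"], row["host"]))
    return dead
```

One possible use is to pass it the captured output of `neutron agent-list` after step 7 and treat a non-empty result as a reproduction of the hang.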

CLI trace:
sdn@osc:~/devstack$ neutron agent-list
An auth plugin is required to fetch a token
sdn@osc:~/devstack$ source openrc admin admin
WARNING: setting legacy OS_TENANT_NAME to support cli tools.
sdn@osc:~/devstack$
sdn@osc:~/devstack$ neutron net-list
+--------------------------------------+---------+----------------------------------------------------------+
| id | name | subnets |
+--------------------------------------+---------+----------------------------------------------------------+
| 70dda170-d455-4cac-aec7-1e7c6e454204 | private | e235035c-285a-455f-af14-1850792e6d9b 10.0.0.0/24 |
| | | 5666b860-6112-492b-8c8f-5efa3aefc7ed fddf:dac3:a016::/64 |
| 749b045a-cc3b-4d88-b802-f12d7db79865 | n2 | e8873aca-aa30-411e-a10c-7c926ba38069 2.2.2.0/24 |
| 46185e06-460c-4909-a2ef-458d0f06c2bd | n1 | db8cdac2-9610-4c5b-9c4a-5f49e00cb47d 1.1.1.0/24 |
+--------------------------------------+---------+----------------------------------------------------------+
sdn@osc:~/devstack$ neutron subnet-list
+--------------------------------------+---------------------+---------------------+------------------------------------------+
| id | name | cidr | allocation_pools |
+--------------------------------------+---------------------+---------------------+------------------------------------------+
| e235035c-285a-455f-af14-1850792e6d9b | private-subnet | 10.0.0.0/24 | {"start": "10.0.0.2", "end": |
| | | | "10.0.0.254"} |
| db8cdac2-9610-4c5b-9c4a-5f49e00cb47d | sub1 | 1.1.1.0/24 | {"start": "1.1.1.2", "end": "1.1.1.254"} |
| e8873aca-aa30-411e-a10c-7c926ba38069 | sub2 | 2.2.2.0/24 | {"start": "2.2.2.2", "end": "2.2.2.254"} |
| 5666b860-6112-492b-8c8f-5efa3aefc7ed | ipv6-private-subnet | fddf:dac3:a016::/64 | {"start": "fddf:dac3:a016::2", "end": |
| | | | "fddf:dac3:a016:0:ffff:ffff:ffff:ffff"} |
+--------------------------------------+---------------------+---------------------+------------------------------------------+
sdn@osc:~/devstack$
sdn@osc:~/devstack$ neutron security-group-list
+--------------------------------------+---------+----------------------------------------------------------------------+
| id | name | security_group_rules |
+--------------------------------------+---------+----------------------------------------------------------------------+
| 4ed72860-6cb4-406d-a224-471959ae95c7 | default | egress, IPv4 |
| | | egress, IPv6 |
| | | ingress, IPv4, 635/tcp, remote_ip_prefix: 2.2.2.0/24 |
| | | ingress, IPv4, remote_group_id: 4ed72860-6cb4-406d-a224-471959ae95c7 |
| | | ingress, IPv6, remote_group_id: 4ed72860-6cb4-406d-a224-471959ae95c7 |
| 86cdf254-6d29-42b9-bf54-7613aaedd868 | default | egress, IPv4 |
| | | egress, IPv6 |
| | | ingress, IPv4, remote_group_id: 86cdf254-6d29-42b9-bf54-7613aaedd868 |
| | | ingress, IPv6, remote_group_id: 86cdf254-6d29-42b9-bf54-7613aaedd868 |
| cf39eaa2-e0d3-45b9-86e7-0b49f98be724 | default | egress, IPv4 |
| | | egress, IPv6 |
| | | ingress, IPv4, remote_group_id: cf39eaa2-e0d3-45b9-86e7-0b49f98be724 |
| | | ingress, IPv6, remote_group_id: cf39eaa2-e0d3-45b9-86e7-0b49f98be724 |
+--------------------------------------+---------+----------------------------------------------------------------------+
sdn@osc:~/devstack$ neutron agent-list
+--------------------+--------------------+-----------------+-------------------+-------+----------------+-----------------------+
| id | agent_type | host | availability_zone | alive | admin_state_up | binary |
+--------------------+--------------------+-----------------+-------------------+-------+----------------+-----------------------+
| 3a9bcd08-d228-4ef2 | L3 agent | osc | nova | :-) | True | neutron-l3-agent |
| -bce9-8836a8d1f454 | | | | | | |
| 3d778a93-1d6f-4460 | Metadata agent | osc | | :-) | True | neutron-metadata- |
| -967f-ec99aa6759e8 | | | | | | agent |
| 402261af-c8cf-41b9 | DHCP agent | osc | nova | :-) | True | neutron-dhcp-agent |
| -b7f4-0c92e60c9dba | | | | | | |
| 88fcb5fb-fb5d-46f0 | Open vSwitch agent | osc | | :-) | True | neutron-openvswitch- |
| -b25c-b9165488a145 | | | | | | agent |
| b6398de6-4c6d-4e85 | HyperV agent | WIN-7PSEDT471HE | | :-) | True | neutron-hyperv-agent |
| -837b-93566cc49648 | | | | | | |
+--------------------+--------------------+-----------------+-------------------+-------+----------------+-----------------------+
sdn@osc:~/devstack$ nova service-list
+----+------------------+-----------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+-----------------+----------+---------+-------+----------------------------+-----------------+
| 5 | nova-conductor | osc | internal | enabled | up | 2016-05-19T08:40:54.000000 | - |
| 7 | nova-scheduler | osc | internal | enabled | up | 2016-05-19T08:40:51.000000 | - |
| 8 | nova-consoleauth | osc | internal | enabled | up | 2016-05-19T08:40:54.000000 | - |
| 9 | nova-compute | WIN-7PSEDT471HE | nova | enabled | up | 2016-05-19T08:40:47.000000 | - |
+----+------------------+-----------------+----------+---------+-------+----------------------------+-----------------+
sdn@osc:~/devstack$ nova list
+--------------------------------------+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+----------+
| 33d5f2da-2fbe-494b-bea0-ac61d5a923dd | vm2 | ERROR | - | NOSTATE | |
+--------------------------------------+------+--------+------------+-------------+----------+
sdn@osc:~/devstack$ nova delete vm2
Request to delete server vm2 has been accepted.
sdn@osc:~/devstack$
sdn@osc:~/devstack$ nova boot --image Fedora --flavor 256 --nic net-id=46185e06-460c-4909-a2ef-458d0f06c2bd vm1
+--------------------------------------+-----------------------------------------------+
| Property | Value |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | vm1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-00000003 |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-4i7qppn6 |
| OS-EXT-SRV-ATTR:root_device_name | - |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | a8hsnUuFjVqQ |
| config_drive | |
| created | 2016-05-19T08:48:49Z |
| description | - |
| flavor | 256 (256) |
| hostId | |
| host_status | |
| id | 9d9629be-6f9d-468c-afeb-57cee6037715 |
| image | Fedora (37560002-8068-4226-8a9b-ea4cd02691b8) |
| key_name | - |
| locked | False |
| metadata | {} |
| name | vm1 |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| status | BUILD |
| tenant_id | 1e6f6a59ee6e48378c9890589164ae08 |
| updated | 2016-05-19T08:48:50Z |
| user_id | d5fa17a7caa34d1981d555a6c0d50bce |
+--------------------------------------+-----------------------------------------------+
sdn@osc:~/devstack$
sdn@osc:~/devstack$
sdn@osc:~/devstack$
sdn@osc:~/devstack$
sdn@osc:~/devstack$ nova list
+--------------------------------------+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+----------+
| 9d9629be-6f9d-468c-afeb-57cee6037715 | vm1 | ERROR | - | NOSTATE | |
+--------------------------------------+------+--------+------------+-------------+----------+
sdn@osc:~/devstack$ nova show 9d9629be-6f9d-468c-afeb-57cee6037715
+--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | vm1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-00000003 |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-4i7qppn6 |
| OS-EXT-SRV-ATTR:root_device_name | /dev/sda |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| config_drive | |
| created | 2016-05-19T08:48:49Z |
| description | - |
| fault | {"message": "No valid host was found. There are not enough hosts available.", "code": 500, "details": " File \"/opt/stack/nova/nova/conductor/manager.py\", line 418, in build_instances |
| | context, request_spec, filter_properties) |
| | File \"/opt/stack/nova/nova/conductor/manager.py\", line 464, in _schedule_instances |
| | hosts = self.scheduler_client.select_destinations(context, spec_obj) |
| | File \"/opt/stack/nova/nova/scheduler/utils.py\", line 370, in wrapped |
| | return func(*args, **kwargs) |
| | File \"/opt/stack/nova/nova/scheduler/client/__init__.py\", line 51, in select_destinations |
| | return self.queryclient.select_destinations(context, spec_obj) |
| | File \"/opt/stack/nova/nova/scheduler/client/__init__.py\", line 37, in __run_method |
| | return getattr(self.instance, __name)(*args, **kwargs) |
| | File \"/opt/stack/nova/nova/scheduler/client/query.py\", line 32, in select_destinations |
| | return self.scheduler_rpcapi.select_destinations(context, spec_obj) |
| | File \"/opt/stack/nova/nova/scheduler/rpcapi.py\", line 126, in select_destinations |
| | return cctxt.call(ctxt, 'select_destinations', **msg_args) |
| | File \"/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py\", line 157, in call |
| | retry=self.retry) |
| | File \"/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py\", line 91, in _send |
| | timeout=timeout, retry=retry) |
| | File \"/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py\", line 466, in send |
| | retry=retry) |
| | File \"/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py\", line 457, in _send |
| | raise result |
| | ", "created": "2016-05-19T08:49:05Z"} |
| flavor | 256 (256) |
| hostId | |
| host_status | |
| id | 9d9629be-6f9d-468c-afeb-57cee6037715 |
| image | Fedora (37560002-8068-4226-8a9b-ea4cd02691b8) |
| key_name | - |
| locked | False |
| metadata | {} |
| name | vm1 |
| os-extended-volumes:volumes_attached | [] |
| status | ERROR |
| tenant_id | 1e6f6a59ee6e48378c9890589164ae08 |
| updated | 2016-05-19T08:49:07Z |
| user_id | d5fa17a7caa34d1981d555a6c0d50bce |
+--------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
sdn@osc:~/devstack$
sdn@osc:~/devstack$
sdn@osc:~/devstack$ neutron agent-list
+--------------------+--------------------+-----------------+-------------------+-------+----------------+-----------------------+
| id | agent_type | host | availability_zone | alive | admin_state_up | binary |
+--------------------+--------------------+-----------------+-------------------+-------+----------------+-----------------------+
| 3a9bcd08-d228-4ef2 | L3 agent | osc | nova | :-) | True | neutron-l3-agent |
| -bce9-8836a8d1f454 | | | | | | |
| 3d778a93-1d6f-4460 | Metadata agent | osc | | :-) | True | neutron-metadata- |
| -967f-ec99aa6759e8 | | | | | | agent |
| 402261af-c8cf-41b9 | DHCP agent | osc | nova | :-) | True | neutron-dhcp-agent |
| -b7f4-0c92e60c9dba | | | | | | |
| 88fcb5fb-fb5d-46f0 | Open vSwitch agent | osc | | :-) | True | neutron-openvswitch- |
| -b25c-b9165488a145 | | | | | | agent |
| b6398de6-4c6d-4e85 | HyperV agent | WIN-7PSEDT471HE | | xxx | True | neutron-hyperv-agent |
| -837b-93566cc49648 | | | | | | |
+--------------------+--------------------+-----------------+-------------------+-------+----------------+-----------------------+
sdn@osc:~/devstack$ nova list
+--------------------------------------+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+----------+
| 9d9629be-6f9d-468c-afeb-57cee6037715 | vm1 | ERROR | - | NOSTATE | |
+--------------------------------------+------+--------+------------+-------------+----------+

Vinod Kumar (vinod-kumar5) wrote :
description: updated
summary: - Hyper-V agent hangs on nova boot (with enable_security_group=true)
+ Hyper-V neutron agent hangs on nova boot (with
+ enable_security_group=true)
Claudiu Belu (cbelu) wrote :

Hello. What branch are you using?

Vinod Kumar (vinod-kumar5) wrote :

The issue is reproducible with the following combinations:
Controller on Liberty and Hyper-V on Liberty.
Controller on Mitaka and Hyper-V on Mitaka.
Controller on master and Hyper-V on Mitaka.
It should also be reproducible on the latest Hyper-V.

Vinod Kumar (vinod-kumar5) wrote :

Further analysis showed that the main thread, which is supposed to send the agent status, is blocked along with the RPC threads.
We suspected that the oslo.messaging version might be causing the issue, so we ran the controller and the Hyper-V neutron agent with the same oslo.messaging version, but the issue was still seen.
We then took a stack dump of the Hyper-V neutron agent threads from Windows Task Manager and analysed it. Two threads were waiting on an object (probably a lock).
Due to a lack of tools/options, we could not determine which object the threads were waiting on.
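When a native stack dump does not show what the threads are waiting on, a Python-level dump taken from inside the process can sometimes narrow it down. The sketch below is only an illustration built on the standard library (`sys._current_frames`, `threading.enumerate`, `traceback.format_stack`). The function name `dump_thread_stacks` is hypothetical, how it would be triggered inside the agent is left open, and it only reports OS-level threads, so green threads would not appear individually.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report with the current Python stack of every thread."""
    # Map thread identifiers to their names for a more readable report.
    names = {t.ident: t.name for t in threading.enumerate()}
    report = []
    for ident, frame in sys._current_frames().items():
        report.append("Thread %s (%s)" % (ident, names.get(ident, "unknown")))
        # format_stack lists the frames from the outermost call to the
        # line the thread is currently executing or blocked on.
        report.extend(line.rstrip() for line in traceback.format_stack(frame))
    return "\n".join(report)
```

A thread blocked on a lock shows the acquiring line as the last frame of its stack, which points at the code path that is waiting even though it does not name the lock's current owner.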

Vinod Kumar (vinod-kumar5) wrote :
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-hyperv (master)

Change abandoned by Claudiu Belu (<email address hidden>) on branch: master
Review: https://review.openstack.org/321452
Reason: Threading is now monkey patched.

Claudiu Belu (cbelu) wrote :

As far as I understand, this issue is no longer encountered.

Changed in networking-hyperv:
status: New → Incomplete
Claudiu Belu (cbelu)
Changed in networking-hyperv:
status: Incomplete → Invalid