binding_failed because of l2 agent assumed down

Bug #1244255 reported by Stanislaw Pitucha
This bug affects 7 people
Affects                   Status        Importance  Assigned to
OpenStack Compute (nova)  Invalid       Undecided   Unassigned
neutron                   Fix Released  High        Unassigned
neutron (Havana)          Fix Released  High        Armando Migliaccio

Bug Description

Tempest test ServerAddressesTestXML failed on a change that does not involve any code modification.

https://review.openstack.org/53633

2013-10-24 14:04:29.188 | ======================================================================
2013-10-24 14:04:29.189 | FAIL: setUpClass (tempest.api.compute.servers.test_server_addresses.ServerAddressesTestXML)
2013-10-24 14:04:29.189 | setUpClass (tempest.api.compute.servers.test_server_addresses.ServerAddressesTestXML)
2013-10-24 14:04:29.189 | ----------------------------------------------------------------------
2013-10-24 14:04:29.189 | _StringException: Traceback (most recent call last):
2013-10-24 14:04:29.189 | File "tempest/api/compute/servers/test_server_addresses.py", line 31, in setUpClass
2013-10-24 14:04:29.189 | resp, cls.server = cls.create_server(wait_until='ACTIVE')
2013-10-24 14:04:29.189 | File "tempest/api/compute/base.py", line 143, in create_server
2013-10-24 14:04:29.190 | server['id'], kwargs['wait_until'])
2013-10-24 14:04:29.190 | File "tempest/services/compute/xml/servers_client.py", line 356, in wait_for_server_status
2013-10-24 14:04:29.190 | return waiters.wait_for_server_status(self, server_id, status)
2013-10-24 14:04:29.190 | File "tempest/common/waiters.py", line 71, in wait_for_server_status
2013-10-24 14:04:29.190 | raise exceptions.BuildErrorException(server_id=server_id)
2013-10-24 14:04:29.190 | BuildErrorException: Server e21d695e-4f15-4215-bc62-8ea645645a26 failed to build and is in ERROR status

From n-cpu.log (http://logs.openstack.org/33/53633/1/check/check-tempest-devstack-vm-neutron/4dd98e5/logs/screen-n-cpu.txt.gz#_2013-10-24_13_58_07_532):

 Error: Unexpected vif_type=binding_failed
 Traceback (most recent call last):
     set_access_ip=set_access_ip)
   File "/opt/stack/new/nova/nova/compute/manager.py", line 1413, in _spawn
     LOG.exception(_('Instance failed to spawn'), instance=instance)
   File "/opt/stack/new/nova/nova/compute/manager.py", line 1410, in _spawn
     block_device_info)
   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2084, in spawn
     write_to_disk=True)
   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 3064, in to_xml
     disk_info, rescue, block_device_info)
   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2951, in get_guest_config
     inst_type)
   File "/opt/stack/new/nova/nova/virt/libvirt/vif.py", line 380, in get_config
     _("Unexpected vif_type=%s") % vif_type)
 NovaException: Unexpected vif_type=binding_failed
 TRACE nova.compute.manager [instance: e21d695e-4f15-4215-bc62-8ea645645a26]

Tags: ovs
Revision history for this message
Joe Gordon (jogo) wrote :
description: updated
summary: - Tempest test ServerAddressesTestXML fails
+ NovaException: Unexpected vif_type=binding_failed
Revision history for this message
Joe Gordon (jogo) wrote : Re: NovaException: Unexpected vif_type=binding_failed

logstash query:

message:"NovaException: Unexpected vif_type=binding_failed" AND filename:"logs/screen-n-cpu.txt"

Revision history for this message
Gary Kotton (garyk) wrote :

ML2 returns an invalid VIF type:

2013-10-24 13:54:39.818 2507 DEBUG amqp [-] using channel_id: 1 __init__ /usr/local/lib/python2.7/dist-packages/amqp/channel.py:71
2013-10-24 13:54:39.818 2507 DEBUG amqp [-] Channel open _open_ok /usr/local/lib/python2.7/dist-packages/amqp/channel.py:429
2013-10-24 13:54:40.465 2507 DEBUG neutron.openstack.common.rpc.amqp [-] received {u'_context_roles': [u'admin'], u'_msg_id': u'dae0d2c2c71c4e0c96f029ea4cd61431', u'_context_read_deleted': u'no', u'_reply_q': u'reply_5b9f8ea5199a44b09b37f73f7d526862', u'_context_tenant_id': None, u'args': {u'host': u'devstack-precise-hpcloud-az2-604954', u'router_ids': None}, u'namespace': None, u'_unique_id': u'5b0bc46ec657447ea6e3ad6929023dda', u'_context_is_admin': True, u'version': u'1.0', u'_context_project_id': None, u'_context_timestamp': u'2013-10-24 13:54:40.462146', u'_context_user_id': None, u'method': u'sync_routers'} _safe_log /opt/stack/new/neutron/neutron/openstack/common/rpc/common.py:276
2013-10-24 13:54:40.466 2507 DEBUG neutron.openstack.common.rpc.amqp [-] unpacked context: {'user_id': None, 'roles': [u'admin'], 'tenant_id': None, 'is_admin': True, 'timestamp': u'2013-10-24 13:54:40.462146', 'project_id': None, 'read_deleted': u'no'} _safe_log /opt/stack/new/neutron/neutron/openstack/common/rpc/common.py:276
2013-10-24 13:54:40.514 2507 DEBUG neutron.openstack.common.rpc.amqp [-] received {u'_context_roles': [u'admin'], u'_msg_id': u'5c9a0a48c86843e78aaf4d933a84bfe2', u'_context_read_deleted': u'no', u'_reply_q': u'reply_31ffaf94bfb8479193dc79ba87a2e593', u'_context_tenant_id': None, u'args': {u'device': u'a50f879f-daf8-4fe1-896f-01fa50e44631', u'agent_id': u'ovse6aec7996c40'}, u'namespace': None, u'_unique_id': u'6e8309527d65440d99664222ee7dd14b', u'_context_is_admin': True, u'version': u'1.1', u'_context_project_id': None, u'_context_timestamp': u'2013-10-24 13:54:37.631879', u'_context_user_id': None, u'method': u'get_device_details'} _safe_log /opt/stack/new/neutron/neutron/openstack/common/rpc/common.py:276
2013-10-24 13:54:40.514 2507 DEBUG neutron.openstack.common.rpc.amqp [-] unpacked context: {'user_id': None, 'roles': [u'admin'], 'tenant_id': None, 'is_admin': True, 'timestamp': u'2013-10-24 13:54:37.631879', 'project_id': None, 'read_deleted': u'no'} _safe_log /opt/stack/new/neutron/neutron/openstack/common/rpc/common.py:276
2013-10-24 13:54:40.515 2507 DEBUG neutron.db.l3_rpc_base [-] Checking router: a5ba89d8-af0e-4de1-af94-2f299cdd6200 for host: devstack-precise-hpcloud-az2-604954 _ensure_host_set_on_ports /opt/stack/new/neutron/neutron/db/l3_rpc_base.py:70
2013-10-24 13:54:40.534 2507 DEBUG neutron.plugins.ml2.managers [-] Attempting to bind port db786ea8-ef93-4f5a-8e0e-862b2299f605 on host devstack-precise-hpcloud-az2-604954 bind_port /opt/stack/new/neutron/neutron/plugins/ml2/managers.py:440
2013-10-24 13:54:40.534 2507 DEBUG neutron.plugins.ml2.drivers.mech_agent [-] Attempting to bind port db786ea8-ef93-4f5a-8e0e-862b2299f605 on network c117c209-dc08-46ef-b1ff-9a907572a31f bind_port /opt/stack/new/neutron/neutron/plugins/ml2/drivers/mech_agent.py:57
2013-10-24 13:54:40.537 2507 DEBUG neutron.plugins.ml2.drivers.mech_agent [-] Checking a...


Revision history for this message
Yaguang Tang (heut2008) wrote :

Changed in neutron:
assignee: nobody → Armando Migliaccio (armando-migliaccio)
Changed in neutron:
status: New → Confirmed
Changed in nova:
status: New → Confirmed
Changed in neutron:
assignee: Armando Migliaccio (armando-migliaccio) → nobody
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This is caused by the server erroneously assuming the l2 agent is down. See the traces: binding fails because the agent is marked down, but the agent then comes back online shortly afterwards.

This is due to poorly tuned report and downtime intervals.
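The liveness check at the heart of this is a simple timestamp comparison. The sketch below is an illustration of that check, not neutron's actual implementation; the option names (report_interval, agent_down_time) mirror the neutron config options, but the numeric values are hypothetical:

```python
from datetime import datetime, timedelta

def is_agent_alive(heartbeat_timestamp, agent_down_time, now=None):
    """An agent counts as alive only if its last heartbeat is newer
    than agent_down_time seconds ago (illustrative sketch)."""
    now = now or datetime.utcnow()
    return (now - heartbeat_timestamp) <= timedelta(seconds=agent_down_time)

# With a tight agent_down_time (say 5s), a heartbeat that arrives only a
# couple of seconds late already flips the agent to "dead" -- exactly the
# transient failure seen in the trace above.
heartbeat = datetime(2013, 11, 1, 21, 13, 30)
print(is_agent_alive(heartbeat, 5, now=datetime(2013, 11, 1, 21, 13, 34)))  # True: 4s <= 5s
print(is_agent_alive(heartbeat, 5, now=datetime(2013, 11, 1, 21, 13, 36)))  # False: 6s > 5s
```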

2013-11-01 21:13:35.261 2769 DEBUG neutron.plugins.ml2.drivers.mech_agent [-] Checking agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2013, 11, 1, 21, 13, 30), 'alive': False, 'topic': u'N/A', 'host': u'devstack-precise-hpcloud-az1-631654', 'agent_type': u'Open vSwitch agent', 'created_at': datetime.datetime(2013, 11, 1, 21, 5, 42), 'started_at': datetime.datetime(2013, 11, 1, 21, 5, 42), 'id': u'3acf236a-589a-4f51-be43-9b6f19d0ba70', 'configurations': {u'tunnel_types': [], u'tunneling_ip': u'10.5.153.104', u'bridge_mappings': {}, u'l2_population': False, u'devices': 14}} bind_port /opt/stack/new/neutron/neutron/plugins/ml2/drivers/mech_agent.py:59
2013-11-01 21:13:35.268 2769 WARNING neutron.plugins.ml2.drivers.mech_agent [-] Attempting to bind with dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2013, 11, 1, 21, 13, 30), 'alive': False, 'topic': u'N/A', 'host': u'devstack-precise-hpcloud-az1-631654', 'agent_type': u'Open vSwitch agent', 'created_at': datetime.datetime(2013, 11, 1, 21, 5, 42), 'started_at': datetime.datetime(2013, 11, 1, 21, 5, 42), 'id': u'3acf236a-589a-4f51-be43-9b6f19d0ba70', 'configurations': {u'tunnel_types': [], u'tunneling_ip': u'10.5.153.104', u'bridge_mappings': {}, u'l2_population': False, u'devices': 14}}
2013-11-01 21:13:35.268 2769 DEBUG neutron.plugins.ml2.drivers.mech_agent [-] Attempting to bind port f5129205-db1b-46e5-bef5-e9a1abb114c5 on network 692d1116-107a-4c04-be6a-7fff7b47334e bind_port /opt/stack/new/neutron/neutron/plugins/ml2/drivers/mech_agent.py:57
2013-11-01 21:13:35.271 2769 WARNING neutron.plugins.ml2.managers [-] Failed to bind port f5129205-db1b-46e5-bef5-e9a1abb114c5 on host devstack-precise-hpcloud-az1-631654
2013-11-01 21:13:35.287 2769 DEBUG neutron.openstack.common.rpc.amqp [-] received {u'_context_roles': [u'admin'], u'_context_read_deleted': u'no', u'_context_tenant_id': None, u'args': {u'agent_state': {u'agent_state': {u'topic': u'l3_agent', u'binary': u'neutron-l3-agent', u'host': u'devstack-precise-hpcloud-az1-631654', u'agent_type': u'L3 agent', u'configurations': {u'router_id': u'', u'gateway_external_network_id': u'', u'handle_internal_only_routers': True, u'use_namespaces': True, u'routers': 4, u'interfaces': 4, u'floating_ips': 0, u'interface_driver': u'neutron.agent.linux.interface.OVSInterfaceDriver', u'ex_gw_ports': 4}}}, u'time': u'2013-11-01T21:13:34.989761'}, u'namespace': None, u'_unique_id': u'989343937ab34d1190b68be196187860', u'_context_is_admin': True, u'version': u'1.0', u'_context_project_id': None, u'_context_timestamp': u'2013-11-01 21:05:42.202105', u'_context_user_id': None, u'method': u'report_state'} _safe_log /opt/stack/new/neutron/neutron/openstack/common/rpc/common.py:276
2013-11-01 21:13:35.287 2769 DEBUG neutron.openstack.common.rpc.amqp [-] un...

Changed in neutron:
assignee: nobody → Armando Migliaccio (armando-migliaccio)
no longer affects: nova
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/55000

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
Dazhao Yu (dzyu) wrote : Re: NovaException: Unexpected vif_type=binding_failed

Hi Armando Migliaccio,

I found your code patch did not fix this error. I changed "agent_down_time" to 2000 on the controller node and compute node, but booting an instance still gets this error message. Please see my env log:

http://paste.openstack.org/show/50468/

Revision history for this message
Dazhao Yu (dzyu) wrote :

Hi Armando,

Please see the whole log message; I have synced the time.

Revision history for this message
Dazhao Yu (dzyu) wrote :

I just ran "nova boot" to get the new log message, not the previous log message.

Jeremy Stanley (fungi)
no longer affects: openstack-ci
Revision history for this message
Robert Kukura (rkukura) wrote :

Hi Dazhao,

The reason binding failed in your most recent neutron server log is not that the L2 agent appears dead, as before. Instead, it's because the L2 agent does not have a mapping for the physical_network 'physnet1'. See the log:

2013-11-05 23:51:55.750 16413 DEBUG neutron.plugins.ml2.managers [-] NT-19C6DA8 Attempting to bind port 51904944-ab70-4cd4-893a-7a78b056523b on host wanghliu-10-7-0-153.sce.cn.ibm.com bind_port /usr/lib/python2.6/site-packages/neutron/plugins/ml2/managers.py:440
2013-11-05 23:51:55.751 16413 DEBUG neutron.plugins.ml2.drivers.mech_agent [-] NT-53163AF Attempting to bind port 51904944-ab70-4cd4-893a-7a78b056523b on network 2b5d5943-f201-4dda-922a-679dcb4aa503 bind_port /usr/lib/python2.6/site-packages/neutron/plugins/ml2/drivers/mech_agent.py:57
2013-11-05 23:51:55.754 16413 DEBUG neutron.plugins.ml2.drivers.mech_agent [-] NT-C9C8378 Checking agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2013, 11, 6, 5, 51, 52), 'alive': True, 'topic': u'N/A', 'host': u'wanghliu-10-7-0-153.sce.cn.ibm.com', 'agent_type': u'Open vSwitch agent', 'created_at': datetime.datetime(2013, 11, 4, 5, 29, 54), 'started_at': datetime.datetime(2013, 11, 6, 5, 48, 56), 'id': u'b7045650-d7e4-43cb-b10d-f725877c73e9', 'configurations': {u'tunnel_types': [], u'tunneling_ip': u'', u'bridge_mappings': {}, u'l2_population': False, u'devices': 0}} bind_port /usr/lib/python2.6/site-packages/neutron/plugins/ml2/drivers/mech_agent.py:59

<<<Note that bridge_mappings in the line above is an empty list.>>>

2013-11-05 23:51:55.755 16413 DEBUG neutron.plugins.ml2.drivers.mech_openvswitch [-] NT-9EF951A Checking segment: {'segmentation_id': 1001L, 'physical_network': u'physnet1', 'id': u'b04b63b4-80db-4a22-80db-463ff4262897', 'network_type': u'vlan'} for mappings: {} with tunnel_types: [] check_segment_for_agent /usr/lib/python2.6/site-packages/neutron/plugins/ml2/drivers/mech_openvswitch.py:48

<<<And that network_type is 'vlan' and physical_network is 'physnet1' in the line above, and this is the only network segment available to bind.>>>

2013-11-05 23:51:55.755 16413 WARNING neutron.plugins.ml2.managers [-] NT-C2C97DA Failed to bind port 51904944-ab70-4cd4-893a-7a78b056523b on host wanghliu-10-7-0-153.sce.cn.ibm.com

The port cannot be bound because the L2 agent on the node does not have connectivity to the needed physical network. Most likely, bridge_mappings needs to be properly configured for the L2 agent on the compute node.

If so, I think Armando's tuning did address the issue, although I agree with him that a better liveness algorithm really is needed.
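The segment check Robert describes can be condensed to a few lines: an OVS agent can bind a flat/vlan segment only if its bridge_mappings contain the segment's physical_network. This is a simplified, hypothetical sketch of neutron's mech_openvswitch logic, not its exact API:

```python
def check_segment_for_agent(segment, agent_config):
    """Return True if an OVS agent with this config can bind the segment
    (simplified sketch of the logic described above)."""
    mappings = agent_config.get('bridge_mappings', {})
    tunnel_types = agent_config.get('tunnel_types', [])
    network_type = segment['network_type']
    if network_type in ('flat', 'vlan'):
        # VLAN/flat segments need a bridge mapping for the physical network.
        return segment['physical_network'] in mappings
    # Tunnel segments (gre, vxlan, ...) need the matching tunnel type enabled.
    return network_type in tunnel_types

# The failing case from the log: a 'vlan' segment on 'physnet1' checked
# against an agent whose bridge_mappings is empty.
segment = {'network_type': 'vlan', 'physical_network': 'physnet1'}
print(check_segment_for_agent(segment, {'bridge_mappings': {}, 'tunnel_types': []}))  # False
print(check_segment_for_agent(segment, {'bridge_mappings': {'physnet1': 'br-eth1'}, 'tunnel_types': []}))  # True
```

So configuring bridge_mappings (e.g. physnet1 mapped to a real bridge) on the compute node's agent is what makes the first case succeed.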

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/55000
Committed: http://github.com/openstack/neutron/commit/291048aba2fcaec8afee3011fb4c12c6ca0ebbba
Submitter: Jenkins
Branch: master

commit 291048aba2fcaec8afee3011fb4c12c6ca0ebbba
Author: armando-migliaccio <email address hidden>
Date: Fri Nov 1 15:47:22 2013 -0700

    Tune up report and downtime intervals for l2 agent

    If the neutron server erroneously thinks that the l2 agent is down
    it will fail to bind a port, which can lead to VM's spawn errors.
    However, the issue is only transient because the agent effectively
    is only 'late' in reporting back.

    Best solution would be an alpha-count algorithm (so that we can detect
    persistent failures more reliably), but for now let's be more tolerant
    assuming that the agent is down by waiting at least twice the report
    interval plus a tiny teeny bit.

    Change-Id: I544135ce1f6b7eaefb34ac44af8f5844d92ddd95
    Close-bug: #1244255
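The tuning rule in the commit message ("at least twice the report interval plus a tiny teeny bit") is simple arithmetic: one missed report must not be enough to mark the agent down. A hedged sketch, with an illustrative margin value (the actual values shipped by the fix are in the review linked above):

```python
def minimum_down_time(report_interval, margin=0.5):
    """Smallest agent_down_time that tolerates one missed report:
    two full report intervals plus a small margin (illustrative)."""
    return 2 * report_interval + margin

# e.g. with a 30s report interval, anything below ~60s risks flagging a
# merely late agent as dead.
print(minimum_down_time(30))  # 60.5
```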

Changed in neutron:
status: In Progress → Fix Committed
summary: - NovaException: Unexpected vif_type=binding_failed
+ binding_failed because of l2 agent assumed down
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/56506

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/57478

tags: added: havana-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/havana)

Reviewed: https://review.openstack.org/57478
Committed: http://github.com/openstack/neutron/commit/ea25bc0051621d61f321735b5e8c5a771fe449af
Submitter: Jenkins
Branch: stable/havana

commit ea25bc0051621d61f321735b5e8c5a771fe449af
Author: armando-migliaccio <email address hidden>
Date: Fri Nov 1 15:47:22 2013 -0700

    Tune up report and downtime intervals for l2 agent

    If the neutron server erroneously thinks that the l2 agent is down
    it will fail to bind a port, which can lead to VM's spawn errors.
    However, the issue is only transient because the agent effectively
    is only 'late' in reporting back.

    Best solution would be an alpha-count algorithm (so that we can detect
    persistent failures more reliably), but for now let's be more tolerant
    assuming that the agent is down by waiting at least twice the report
    interval plus a tiny teeny bit.

    (cherry picked from commit 291048aba2fcaec8afee3011fb4c12c6ca0ebbba)

    Close-bug: #1244255

    Change-Id: I544135ce1f6b7eaefb34ac44af8f5844d92ddd95

tags: added: in-stable-havana
Changed in neutron:
importance: Undecided → High
milestone: none → icehouse-1
tags: added: ovs
removed: havana-backport-potential
Revision history for this message
Joe Gordon (jogo) wrote :

This doesn't appear to be fixed according to http://status.openstack.org/elastic-recheck/

Changed in neutron:
status: Fix Committed → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Hitting this outside the gate on TripleO, I think. Instances are failing to come up with 'binding_failed' early on, but later on work fine, leading me to believe that there is just a race between the l2 agent and compute trying to consume the l2 agent... but I'm still wrapping my head around the l2 agent.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Can you confirm that the bind fails because the agent is considered transiently down?

Binding failures may occur for all sorts of reasons, without knowing when/where this occurs it's difficult to see whether we need to track a different bug.

Changed in neutron:
status: Confirmed → Incomplete
assignee: Armando Migliaccio (armando-migliaccio) → nobody
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-1 → icehouse-2
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

So, I don't know why the vif_type=binding_failed happens. I don't get any WARNING about trying to bind with a dead agent, so it is not reaching that part of the code. So my best guess is that this is happening because the host has no agents just yet:

https://git.openstack.org/cgit/openstack/neutron/tree/neutron/plugins/ml2/drivers/mech_agent.py#n59

A few minutes later all works fine, so I'm guessing we're just scheduling things to nova-compute before it definitely has a neutron agent to bind things to.
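The race Clint describes would look roughly like this: binding iterates over the agents registered for the host, so if no L2 agent has checked in for that host yet, the loop body never runs and the port ends up with vif_type='binding_failed'. A hypothetical sketch (names and return values are illustrative, not neutron's actual API):

```python
def bind_port(host, agents_by_host):
    """Try to bind a port on `host` against its registered L2 agents;
    an empty agent list (host just booted) means binding fails outright."""
    for agent in agents_by_host.get(host, []):  # empty right after boot
        if agent.get('alive'):
            return 'ovs'  # bound: a normal vif_type
    return 'binding_failed'  # no (live) agent -> the failure in this bug

# Compute node scheduled to before its neutron agent registered:
print(bind_port('fresh-compute-node', {}))  # binding_failed
```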

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

I've added nova as I'm having a hard time reading through nova and figuring out where it signals to the scheduler that it is available to have things scheduled to it. That should probably be _after_ it has an l2 agent. _OR_ the scheduler needs to become neutron-aware so that the scheduler can retry.

Changed in neutron:
assignee: nobody → Eugene Nikanorov (enikanorov)
Alan Pevec (apevec)
tags: removed: in-stable-havana
Revision history for this message
Anita Kuno (anteaya) wrote :

From Dec. 05 until now the only patch to hit this bug in logstash is https://review.openstack.org/#/c/59787/.

All 522 hits during that time period (logstash's current complete history) occur when running tests on this patch.

While the source of the bug still needs to be addressed, patch 59787 might yield some clues. I don't think this is a gate-blocking bug for any patch other than 59787, which it seems to have blocked quite effectively.

Revision history for this message
Joe Gordon (jogo) wrote :

This isn't a gate issue anymore, but this bug is hit when setting api_workers to 4. That shouldn't cause things to break so easily, so I think high priority is still correct: https://review.openstack.org/#/c/59787/.

Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → nobody
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-2 → none
milestone: none → icehouse-3
Changed in nova:
status: New → Invalid
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-3 → icehouse-rc1
Changed in neutron:
importance: High → Undecided
milestone: icehouse-rc1 → none
status: Incomplete → Fix Committed
milestone: none → icehouse-rc1
assignee: nobody → Armando Migliaccio (armando-migliaccio)
importance: Undecided → High
Changed in neutron:
status: Fix Committed → In Progress
Changed in neutron:
assignee: Armando Migliaccio (armando-migliaccio) → nobody
Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Is there a commit hash or gerrit change-id we can look for to ensure we have this fix?

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Eugene Nikanorov pointed to the Change-Id of the fix:
https://review.openstack.org/#/q/I544135ce1f6b7eaefb34ac44af8f5844d92ddd95,n,z

The fix was merged 3 months ago, but we observe the same symptom right now in our CI devstack. We have increased agent_down_time to 60 for the moment. We'll have to wait and see if that helps.

Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: icehouse-rc1 → 2014.1