[10.0 swarm] Launch of instance fails due to PortBindingFailed

Bug #1674381 reported by Sergey Novikov
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Confirmed
Importance: High
Assigned to: Fuel Sustaining
Milestone: 10.1

Bug Description

Detailed bug description:

The issue was found by https://product-ci.infra.mirantis.net/job/10.0.system_test.ubuntu.ha_neutron/213/testReport/fuel_tests.tests.test_neutron/TestNeutronTunHa/test_deploy_neutron_gre_ha/

Steps to reproduce:
            1. Create cluster
            2. Add 3 nodes with controller role
            3. Add 2 nodes with compute role
            4. Deploy the cluster
            5. Run network verification
            6. Run OSTF

Actual result:
AssertionError: Failed 1 OSTF tests; should fail 0 tests.
Names of failed tests:
  - Launch instance, create snapshot, launch instance from snapshot (failure):
    Failed to get to expected status. In error state. Please refer to OpenStack logs for more details.

Some details from the logs:
http://paste.openstack.org/show/603441/
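
(Illustrative, not from the original report.) One way to confirm the binding failure on the failed instance's port is to look at its binding:vif_type, which is set to 'binding_failed' when ML2 cannot bind the port. A minimal sketch using python-neutronclient, assuming admin credentials in the usual OS_* environment variables; the instance UUID is a placeholder:

import os

from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(
    username=os.environ['OS_USERNAME'],
    password=os.environ['OS_PASSWORD'],
    tenant_name=os.environ['OS_TENANT_NAME'],
    auth_url=os.environ['OS_AUTH_URL'])

# ports of the instance that failed to boot; '<instance-uuid>' is a placeholder
for port in neutron.list_ports(device_id='<instance-uuid>')['ports']:
    print(port['id'], port.get('binding:vif_type'), port.get('binding:host_id'))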

Sergey Novikov (snovikov) wrote :
Changed in fuel:
milestone: none → 10.1
Changed in fuel:
assignee: nobody → MOS Neutron (mos-neutron)
Changed in fuel:
status: New → Confirmed
tags: added: area-neutron
Changed in fuel:
assignee: MOS Neutron (mos-neutron) → Oleg Bondarev (obondarev)
Oleg Bondarev (obondarev) wrote :

From neutron logs:

 Refusing to bind port c0ebb7e1-ffb9-4cfc-a418-46b1a8dc3dbe to dead agent

 2017-03-20 01:32:25.458 17933 WARNING neutron.db.agents_db [req-faf604d7-11a7-4e32-a727-65171800240d - - - - -] Agent healthcheck: found 10 dead agents out of 14:
                Type Last heartbeat host
  Open vSwitch agent 2017-03-20 01:31:34 node-5.test.domain.local
            L3 agent 2017-03-20 01:31:39 node-2.test.domain.local
      Metadata agent 2017-03-20 01:31:34 node-2.test.domain.local
      Metadata agent 2017-03-20 01:31:41 node-1.test.domain.local
      Metadata agent 2017-03-20 01:31:32 node-4.test.domain.local
  Open vSwitch agent 2017-03-20 01:31:34 node-1.test.domain.local
  Open vSwitch agent 2017-03-20 01:31:32 node-3.test.domain.local
            L3 agent 2017-03-20 01:31:40 node-1.test.domain.local
  Open vSwitch agent 2017-03-20 01:31:38 node-4.test.domain.local
          DHCP agent 2017-03-20 01:31:32 node-2.test.domain.local

Possibly there are some problems with RabbitMQ/messaging.
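
(For context, not from the original comment.) A simplified sketch of the liveness check neutron-server applies: the real logic lives in neutron.db.agents_db and compares an agent's last heartbeat against the configured agent_down_time (75 seconds by default) using the server's own clock, which is why a forward clock jump on the server can mark otherwise healthy agents as dead; ML2 then refuses to bind ports to hosts whose OVS agent looks dead, as in the warning above:

import datetime

AGENT_DOWN_TIME = 75  # seconds, the neutron.conf default

def is_agent_down(heartbeat, now=None):
    # an agent is considered dead if its last recorded heartbeat is older
    # than agent_down_time according to the *server's* clock
    now = now or datetime.datetime.utcnow()
    return (now - heartbeat) > datetime.timedelta(seconds=AGENT_DOWN_TIME)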

Oleg Bondarev (obondarev) wrote :

Node-1 started reporting dead agents at 2017-03-20 01:29:40, while the other controllers did not: "found 14 active agents" on both node-2 and node-4.

However, there is a strange time jump on the other controllers as well; node-2:

2017-03-20 01:32:39.586 5600 INFO neutron.wsgi [req-82c09c56-2b96-4a98-b209-1558903e3d37 6be4335fc7594520a6644ff2bc24da68 b6f9819e09e145eb8793ded57ad6f407 - - -] 10.109.54.8 - - [20/Mar/2017 01:32:39] "GET /v2.0/subnets.json?fields=id&fields=cidr&id=9a04f9ae-8b59-4551-88c1-831d24afa596&id=48a1bc6d-7c96-415c-bc6c-bdc6f84a9c85 HTTP/1.1" 200 370 0.133562
2017-03-20 01:33:24.949 5601 DEBUG neutron.wsgi [-] (5601) accepted ('10.109.54.2', 55042) server /usr/lib/python2.7/dist-packages/eventlet/wsgi.py:867

and node-4:

2017-03-20 01:32:43.967 23558 DEBUG neutron.api.rpc.handlers.resources_rpc [req-490fc077-a8fc-4ca5-927b-181425bf04ca - - - - -] neutron.api.rpc.handlers.resources_rpc.ResourcesPushToServerRpcCallback method report_agent_resource_versions called with arguments (<neutron.context.Context object at 0x7fa479994bd0>,) {u'version_map': {u'SubPort': u'1.0', u'QosPolicy': u'1.3', u'Trunk': u'1.0'}, u'agent_type': u'Open vSwitch agent', u'agent_host': u'node-2.test.domain.local'} wrapper /usr/lib/python2.7/dist-packages/oslo_log/helpers.py:47
2017-03-20 01:33:28.589 23559 DEBUG oslo_messaging._drivers.amqpdriver [-] received message with unique_id: 0015f4278ffc49b29e378035d5b7c10d __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:196

All service logs on nodes 2 and 4 show this jump (~40 sec).

Oleg Bondarev (obondarev) wrote :

Node-1 has the same time jump (~40 sec) a bit earlier, at 01:28:41.161:

2017-03-20 01:28:41.161 17933 DEBUG neutron.db.agents_db [req-faf604d7-11a7-4e32-a727-65171800240d - - - - -] Agent healthcheck: found 14 active agents agent_health_check /usr/lib/python2.7/dist-packages/neutron/db/agents_db.py:310
2017-03-20 01:29:26.181 17922 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: c6ff2fb951264277ad4fe3a907d0644b reply to reply_0f40b8804c2d4211bfc601d4462faf2b __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:194

And after this node-1 starts reporting dead agents.

Currently it looks like a time synchronization issue...

Oleg Bondarev (obondarev) wrote :

Looking at the Fuel master node logs, the time jump is also clearly visible; node-1 nova-api log:

2017-03-20T01:28:39.120193+00:00 info: 2017-03-20 01:28:39.465 24878 INFO nova.osapi_compute.wsgi.server [req-f4dbc9f9-31ee-45ca-9ea2-298ca47db84b 6be4335fc7594520a6644ff2bc24da68 b6f9819e09e145eb8793ded57ad6f407 - default default] 10.109.54.8 "POST /v2.1/os-floating-ips HTTP/1.1" status: 200 len: 483 time: 0.8551819
2017-03-20T01:28:41.369852+00:00 debug: 2017-03-20 01:29:26.009 24902 DEBUG nova.metadata.wsgi.server [req-25a6d42c-6337-4d15-b152-a0e77d253bd7 - - - - -] (24902) accepted ('10.109.54.8', 60288) server /usr/lib/python2.7/dist-packages/eventlet/wsgi.py:867

node-2 nova-api log:

2017-03-20T01:32:39.496879+00:00 info: 2017-03-20 01:32:39.840 24128 INFO nova.metadata.wsgi.server [-] 10.109.54.2 "OPTIONS / HTTP/1.0" status: 200 len: 234 time: 0.0004840
2017-03-20T01:32:40.334387+00:00 debug: 2017-03-20 01:33:24.980 24128 DEBUG nova.metadata.wsgi.server [req-0efcb845-b371-440c-a3e4-e08e71a7866d - - - - -] (24128) accepted ('10.109.54.8', 52580) server /usr/lib/python2.7/dist-packages/eventlet/wsgi.py:867

node-4:

2017-03-20T01:32:41.424439+00:00 info: 2017-03-20 01:32:41.776 4887 INFO nova.metadata.wsgi.server [-] 240.0.0.2 "OPTIONS / HTTP/1.0" status: 200 len: 234 time: 0.0005469
2017-03-20T01:32:44.256129+00:00 debug: 2017-03-20 01:33:28.903 4884 DEBUG nova.metadata.wsgi.server [req-4b076af4-a646-4feb-babc-6b611db632f5 - - - - -] (4884) accepted ('10.109.54.8', 35338) server /usr/lib/python2.7/dist-packages/eventlet/wsgi.py:867
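
(Illustrative, not from the original comment.) The skew can be read off a single line of these aggregated logs by comparing the syslog header timestamp with the service's own in-message timestamp; for the second node-1 nova-api line above it comes out to roughly 45 seconds:

import datetime as dt

# the second node-1 nova-api line quoted above, truncated for brevity
LINE = ('2017-03-20T01:28:41.369852+00:00 debug: 2017-03-20 01:29:26.009 '
        '24902 DEBUG nova.metadata.wsgi.server ...')

syslog_ts, rest = LINE.split(' ', 1)
service_ts = ' '.join(rest.split()[1:3])

header = dt.datetime.strptime(syslog_ts[:26], '%Y-%m-%dT%H:%M:%S.%f')
message = dt.datetime.strptime(service_ts, '%Y-%m-%d %H:%M:%S.%f')
print('in-message clock ahead of syslog header by %.1f s'
      % (message - header).total_seconds())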

Oleg Bondarev (obondarev) wrote :

Given the above, it seems to be a time desynchronisation issue on the nodes.
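
(Not part of the original comment.) A quick way to confirm clock drift is to compare each controller's clock against the Fuel master, assuming passwordless SSH from the master to the nodes as in a standard Fuel deployment; node names are taken from the logs above:

import subprocess
import time

NODES = ['node-1.test.domain.local', 'node-2.test.domain.local',
         'node-3.test.domain.local', 'node-4.test.domain.local',
         'node-5.test.domain.local']

for node in NODES:
    # ask the remote node for its epoch time and compare with the local clock
    out = subprocess.check_output(['ssh', node, 'date', '+%s.%N'])
    skew = float(out.decode()) - time.time()
    print('%-28s skew vs master: %+.1f s' % (node, skew))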

Changed in fuel:
assignee: Oleg Bondarev (obondarev) → Fuel Sustaining (fuel-sustaining-team)