Floating IP status failed to transition to DOWN in neutron-tempest-plugin-scenario jobs

Bug #1815585 reported by Slawek Kaplonski
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Confirmed
High
Unassigned

Bug Description

From time to time in neutron-tempest-plugin-scenario-linuxbridge job test neutron_tempest_plugin.scenario.test_floatingip.FloatingIPPortDetailsTest.test_floatingip_port_details is failing because of error like:

2019-02-12 09:19:02.673764 | controller | Captured traceback:
2019-02-12 09:19:02.673899 | controller | ~~~~~~~~~~~~~~~~~~~
2019-02-12 09:19:02.674023 | controller | b'Traceback (most recent call last):'
2019-02-12 09:19:02.674267 | controller | b' File "/opt/stack/neutron-tempest-plugin/neutron_tempest_plugin/scenario/test_floatingip.py", line 247, in test_floatingip_port_details'
2019-02-12 09:19:02.674429 | controller | b" fip = self._wait_for_fip_port_down(fip['id'])"
2019-02-12 09:19:02.674664 | controller | b' File "/opt/stack/neutron-tempest-plugin/neutron_tempest_plugin/scenario/test_floatingip.py", line 317, in _wait_for_fip_port_down'
2019-02-12 09:19:02.674814 | controller | b' raise exceptions.TimeoutException(message)'
2019-02-12 09:19:02.674975 | controller | b'tempest.lib.exceptions.TimeoutException: Request timed out'
2019-02-12 09:19:02.675289 | controller | b'Details: Floating IP 5547d655-01e0-4f14-a375-fd805a689299 attached port status failed to transition to DOWN (current status ACTIVE) within the required time (120 s).'
2019-02-12 09:19:02.675390 | controller | b''

Example of failure: http://logs.openstack.org/91/635691/1/gate/neutron-tempest-plugin-scenario-linuxbridge/3cc34c4/job-output.txt.gz#_2019-02-12_09_19_02_673764

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22attached%20port%20status%20failed%20to%20transition%20to%20DOWN%20(current%20status%20ACTIVE)%20within%20the%20required%20time%5C%22

Revision history for this message
Brian Haley (brian-haley) wrote :

The change in https://review.openstack.org/#/c/602694/ made this a little better by waiting, but I can confirm that this test still does fail occasionally.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I did some initial investigation on this. It looks that nova-compute received information to detach interface and told Libvirt to detach it:
http://logs.openstack.org/91/635691/1/gate/neutron-tempest-plugin-scenario-linuxbridge/3cc34c4/controller/logs/screen-n-cpu.txt.gz#_Feb_12_09_16_39_511365

So neutron-linuxbridge-agent found this event and removed port here: http://logs.openstack.org/91/635691/1/gate/neutron-tempest-plugin-scenario-linuxbridge/3cc34c4/controller/logs/screen-q-agt.txt.gz#_Feb_12_09_16_39_951803

And there is also matched even in neutron-server to switch port to DOWN:
http://logs.openstack.org/91/635691/1/gate/neutron-tempest-plugin-scenario-linuxbridge/3cc34c4/controller/logs/screen-q-svc.txt.gz#_Feb_12_09_16_39_973353

According to that it looks that all was fine and port should be set to DOWN but for some reason port status in response from neutron looked like:

{"port":{"id":"7b6ccd25-cc09-41a3-ba4b-b23169819c11","name":"","network_id":"0976b2ae-c8c4-4d3c-a1df-bbbded993f28","tenant_id":"5ecab117503f46b7af1a7e6c3707c2e7","mac_address":"fa:16:3e:60:df:15","admin_state_up":true,"status":"ACTIVE","fixed_ips":[{"subnet_id":"ad2dc4a4-1b02-45f9-9042-60b5788cd2d9","ip_address":"10.1.0.12"}],"device_id":"","device_owner":"","allowed_address_pairs":[],"extra_dhcp_opts":[],"security_groups":["fd9de87a-7f94-4def-ae63-871b5f220abc"],"description":"","binding:vnic_type":"normal","port_security_enabled":true,"propagate_uplink_status":false,"qos_policy_id":null,"dns_name":"","dns_assignment":[{"ip_address":"10.1.0.12","hostname":"host-10-1-0-12","fqdn":"host-10-1-0-12.openstackgate.local."}],"dns_domain":"","ip_allocation":"immediate","tags":[],"created_at":"2019-02-12T09:15:16Z","updated_at":"2019-02-12T09:16:47Z","revision_number":9,"project_id":"5ecab117503f46b7af1a7e6c3707c2e7"}}

It is here:
http://logs.openstack.org/91/635691/1/gate/neutron-tempest-plugin-scenario-linuxbridge/3cc34c4/controller/logs/tempest_log.txt.gz#_2019-02-12_09_16_50_122

Please note that port has empty device_id but status ACTIVE

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

According to last comment. I also checked that port status was actually changed from "ACTIVE to DOWN" because there was event to nova send here:
http://logs.openstack.org/91/635691/1/gate/neutron-tempest-plugin-scenario-linuxbridge/3cc34c4/controller/logs/screen-q-svc.txt.gz#_Feb_12_09_16_41_263557

And such event can be send only if port status was changed from ACTIVE to DOWN: https://github.com/openstack/neutron/blob/master/neutron/notifiers/nova.py#L194

So now question is, what switched this port to ACTIVE again?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-tempest-plugin (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/639067

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-tempest-plugin (master)

Reviewed: https://review.openstack.org/639067
Committed: https://git.openstack.org/cgit/openstack/neutron-tempest-plugin/commit/?id=7451ad7fe0fede24c15e39ecd15e46dd6fbf4d86
Submitter: Zuul
Branch: master

commit 7451ad7fe0fede24c15e39ecd15e46dd6fbf4d86
Author: Slawek Kaplonski <email address hidden>
Date: Mon Feb 25 12:02:49 2019 +0100

    Log port details when FloatingIP port details test fails

    In case when port status is not transitioned to DOWN in
    test_floatingip.FloatingIPPortDetailsTest tests, log of
    all port info can be useful for further debugging.

    Change-Id: I0ba1efea68e6e53bd7df9bf84ecb3d73e95925ca
    Related-bug: #1815585

Revision history for this message
Slawek Kaplonski (slaweq) wrote : Re: Floating IP status failed to transition to DOWN in neutron-tempest-plugin-scenario-linuxbridge

It happened again. And indeed it confirmed what I wrote in my previous comments here. Port is ACTIVE but it's "unbound" and device_id is empty.

Now we need to find out why it's active in such case.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Maybe it is somehow related to https://bugs.launchpad.net/neutron/+bug/1819446 but I will have to investigate that.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-tempest-plugin (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/647816

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-tempest-plugin (master)

Reviewed: https://review.openstack.org/647816
Committed: https://git.openstack.org/cgit/openstack/neutron-tempest-plugin/commit/?id=5ddcf33322f1e257322369fff6416aa48c4214b9
Submitter: Zuul
Branch: master

commit 5ddcf33322f1e257322369fff6416aa48c4214b9
Author: Slawek Kaplonski <email address hidden>
Date: Tue Mar 26 18:00:58 2019 +0100

    Mark test_floatingip_port_details test as unstable

    Change-Id: I52f5552807c889c63a8583b2bfc124e9c89fca6f
    Related-Bug: #1815585

Revision history for this message
Slawek Kaplonski (slaweq) wrote : Re: Floating IP status failed to transition to DOWN in neutron-tempest-plugin-scenario-linuxbridge
summary: Floating IP status failed to transition to DOWN in neutron-tempest-
- plugin-scenario-linuxbridge
+ plugin-scenario jobs
tags: removed: linuxbridge
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.