neutron-tempest-iptables_hybrid job failing with internal server error while listing ports

Bug #1810504 reported by Slawek Kaplonski on 2019-01-04
This bug affects 2 people
Affects: neutron
Importance: High
Assigned to: Slawek Kaplonski
Slawek Kaplonski (slaweq) wrote :

This is a very strange issue to me. It looks like it fails in https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L102, where there is code like:

    net = network_object.Network.get_object(
        context.get_admin_context(), id=port_res['network_id'])
    if net.qos_policy_id:
        qos_policy = policy_object.QosPolicy.get_network_policy(
            context.get_admin_context(), net.id)

But how is it even possible that the network configured on the port does not exist?
From the logs it doesn't look like it could have been removed in parallel :/

Slawek Kaplonski (slaweq) wrote :

It always fails on the same test:

tempest.api.network.admin.test_ports.PortsAdminExtendedAttrsTestJSON.test_list_ports_binding_ext_attr
or
tempest.api.network.admin.test_ports.PortsAdminExtendedAttrsIpV6TestJSON.test_list_ports_binding_ext_attr

Slawek Kaplonski (slaweq) wrote :

It looks like this started failing when we switched this job to Python 3.

Fix proposed to branch: master
Review: https://review.openstack.org/628492

Changed in neutron:
assignee: Slawek Kaplonski (slaweq) → Nate Johnston (nate-johnston)
status: Confirmed → In Progress
Slawek Kaplonski (slaweq) wrote :

I was able to reproduce this issue locally by running the tempest.api.network tests a couple of times on a VM with Devstack. It happens once in every 7 to 10 runs.

Here is what I found. The problem comes from a network used in a different test. The GET /ports call in the failing test is made by the admin user, so it returns all ports from all tenants. While preparing the list of ports, the API worker tries to fetch the QoS policy for each port. Sometimes that list contains a port whose network was concurrently removed by another API worker (the network belonged to a different test). In that case the net object in https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L102 will be None, as the network was already deleted by the other thread.
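The interleaving described above can be sketched deterministically, without real threads. This is a minimal illustration with hypothetical names (a dict standing in for the network table, `get_network` standing in for `Network.get_object`), not neutron's actual code:

```python
# Worker A lists ports and later resolves each port's network;
# worker B deletes the network in between. The lookup then returns None.

networks = {"net-1": {"id": "net-1", "qos_policy_id": None}}
ports = [{"id": "port-1", "network_id": "net-1"}]

def get_network(network_id):
    # Mimics Network.get_object(): returns None when the row is gone.
    return networks.get(network_id)

# Worker A takes its snapshot of ports for the GET /ports response.
snapshot = list(ports)

# Worker B deletes the network (and its ports) concurrently.
del networks["net-1"]

# Worker A now resolves the network for a port from its stale snapshot.
net = get_network(snapshot[0]["network_id"])
# net is None here, so accessing net.qos_policy_id (as the real code
# did) raises AttributeError, surfacing as an HTTP 500 to tempest.
```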

So I think this is not a very common case in real life, and a simple fix that checks whether the network is None in https://github.com/openstack/neutron/blob/master/neutron/services/qos/qos_plugin.py#L102 should be enough to fix this issue.
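The proposed check could look roughly like the sketch below. This is a simplified, hypothetical version of the extension hook (plain dicts instead of neutron objects, `get_network` injected as a parameter); the actual change landed in review 628492:

```python
def extend_port_with_qos(port_res, get_network):
    """Attach the network's QoS policy id to a port dict, tolerating a
    network that was deleted by a concurrent API request."""
    net = get_network(port_res["network_id"])
    if net is None:
        # Network vanished between listing the port and this lookup;
        # skip the QoS extension instead of raising AttributeError.
        return port_res
    if net.get("qos_policy_id"):
        port_res["qos_policy_id"] = net["qos_policy_id"]
    return port_res
```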

Changed in neutron:
assignee: Nate Johnston (nate-johnston) → Slawek Kaplonski (slaweq)
Slawek Kaplonski (slaweq) wrote :

Patch https://review.openstack.org/628439 was linked to this bug by mistake; it's not related to this one.

Changed in neutron:
assignee: Slawek Kaplonski (slaweq) → Nate Johnston (nate-johnston)
Changed in neutron:
assignee: Nate Johnston (nate-johnston) → Slawek Kaplonski (slaweq)

Reviewed: https://review.openstack.org/628492
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=65a2f86aafa7637ec07aa0a313ea0fae39758608
Submitter: Zuul
Branch: master

commit 65a2f86aafa7637ec07aa0a313ea0fae39758608
Author: Nate Johnston <email address hidden>
Date: Fri Jan 4 10:28:01 2019 -0500

    Gracefully handle fetch network fail in qos extend port

    When the qos plugin is handling a port resource request through its
    port resource request extension, sometimes the network a port is
    attached to is looked up and returns None. This can happen if the
    network is deleted in a concurrent API request.

    Change-Id: Ide4acdf4c373713968f9d43274fb0c7550283c11
    Closes-Bug: #1810504

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential

This issue was fixed in the openstack/neutron 14.0.0.0b2 development milestone.
