tempest scenario test_qos fails intermittently

Bug #1662109 reported by Jakub Libosvar on 2017-02-06
This bug affects 1 person
Affects: neutron
Importance: High
Assigned to: Slawek Kaplonski
Miguel Angel Ajo (mangelajo) wrote :

Ok, I'm taking this one. The test seems to progress, but the progress is slow and it seems to exceed the total timeout.

Changed in neutron:
assignee: nobody → Miguel Angel Ajo (mangelajo)
importance: Undecided → High

Fix proposed to branch: master
Review: https://review.openstack.org/430309

Changed in neutron:
status: New → In Progress
Miguel Angel Ajo (mangelajo) wrote :

2017-02-05 16:46:08,109 12021 DEBUG [neutron.tests.tempest.scenario.test_qos] time_elapsed = 8,total_bytes_read = 86508,cycle_data_read = 86508
2017-02-05 16:46:08,550 12021 DEBUG [neutron.tests.tempest.scenario.test_qos] time_elapsed = 0,total_bytes_read = 1048576,cycle_data_read = 962068

From the logs it looks like it received 86508 bytes in the first 8 seconds, and then 962068 bytes in the cycle logged as "0" seconds.
If we take the mean: (962068 + 86508) / 8 = 131072 B/s ≈ 131 kB/s ≈ 1 Mbps.

Such a trace should have been emitted once at time_elapsed = 5,
and then on completion or at time_elapsed = 10,
but the system was nuts (high load on the gate).
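
The mean in the comment above can be re-derived with a few lines of Python. This is only a restatement of that arithmetic using the byte counts from the two DEBUG lines; the variable names are illustrative and do not come from the test code.

```python
# Byte counts copied from the two DEBUG lines quoted above.
first_cycle_bytes = 86508     # read during the first 8 seconds
second_cycle_bytes = 962068   # read in the cycle logged as time_elapsed = 0
elapsed_seconds = 8           # elapsed time used in the comment's estimate

total_bytes = first_cycle_bytes + second_cycle_bytes
bytes_per_second = total_bytes / elapsed_seconds
megabits_per_second = bytes_per_second * 8 / 1_000_000

print(total_bytes)                    # 1048576
print(bytes_per_second)               # 131072.0
print(round(megabits_per_second, 2))  # 1.05
```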

Reviewed: https://review.openstack.org/430309
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b7213030fd59c6169b9a2d4cb9680cf5adbfe6d8
Submitter: Jenkins
Branch: master

commit b7213030fd59c6169b9a2d4cb9680cf5adbfe6d8
Author: Miguel Angel Ajo <email address hidden>
Date: Tue Feb 7 16:21:16 2017 +0100

    Simplify the QoS bandwidth test to increase reliability

    The initial implementation was measuring the bandwidth stability
    over segments of time. But recent failures on high gate pressure
    has shown that such stability can't be expected on the edge.

    The test now checks the average bandwidth during the whole transmission.

    Change-Id: Ic6a00f20ce76aba319ecdada79f68599c891cf29
    Closes-Bug: #1662109
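
The difference between the two approaches named in the commit message can be sketched as follows. This is not the code from the change: the helper names, the sample timestamps, and the limit value are all assumptions chosen for illustration, with byte counts loosely modelled on the log lines quoted earlier.

```python
def segment_rates(samples):
    """Bytes per second for each consecutive pair of (seconds, total_bytes) samples."""
    return [
        (b1 - b0) / (t1 - t0)
        for (t0, b0), (t1, b1) in zip(samples, samples[1:])
    ]


def overall_rate(samples):
    """Average bytes per second between the first and the last sample."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    return (b1 - b0) / (t1 - t0)


# Hypothetical samples: a slow first segment followed by a short burst.
samples = [(0.0, 0), (8.0, 86508), (8.5, 1048576)]
limit = 131072  # hypothetical limit in bytes per second

# A per-segment check trips on the burst...
print(max(segment_rates(samples)) <= limit)  # False
# ...while the average over the whole transmission stays under the limit.
print(overall_rate(samples) <= limit)        # True
```

With bursty delivery on a loaded host, individual segments can look far too slow or far too fast even when the transfer as a whole respects the limit, which is the instability the commit message describes.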

Changed in neutron:
status: In Progress → Fix Released
Jakub Libosvar (libosvar) wrote :
Changed in neutron:
status: Fix Released → Confirmed
Miguel Angel Ajo (mangelajo) wrote :

hmm, there's still something wrong down there
2017-02-16 12:10:11,685 12786 DEBUG [neutron.tests.tempest.scenario.test_qos] time_elapsed = 0, total_bytes_read = 1048576, bytes_per_second = 1650318
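
For scale, the figures in that trace convert as follows. This is only a unit conversion of the logged values; the variable names mirror the log fields and are otherwise illustrative.

```python
# Values copied from the DEBUG line above.
bytes_per_second = 1650318
total_bytes_read = 1048576

megabits_per_second = bytes_per_second * 8 / 1_000_000
transfer_seconds = total_bytes_read / bytes_per_second

print(round(megabits_per_second, 1))  # 13.2
print(round(transfer_seconds, 2))     # 0.64
```

That is well above the roughly 1 Mbps mean worked out earlier in this thread, and a transfer finishing in under a second is consistent with time_elapsed = 0 in the trace.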

Miguel Angel Ajo (mangelajo) wrote :

Looking at the logs of the agent here:

http://logs.openstack.org/07/436307/1/check/gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv/342948d/logs/screen-q-agt.txt.gz

I can see updates to the specific port (a5b6388d-6d1a-41d7-aa21-4cf65c5abcd) but nothing related to the QoS policy 9be9a9a5-ba61-4ae5-ae28-689edd. So it seems like we have a genuine bug, where somehow the agent is not getting the qos_policy_id on the port when the port is updated.

So the question is: what is preventing that port update from reaching the agent via RPC properly? I'll keep looking at this.

Miguel Angel Ajo (mangelajo) wrote :

The server request for QoS policy association is: http://logs.openstack.org/07/436307/1/check/gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv/342948d/logs/screen-q-svc.txt.gz#_2017-02-22_06_31_58_211

I see the ovo_rpc code triggering, but somehow not the ML2 notification.

Maybe we should fix the QoS extension to consume the ovo Port object updates instead of the ML2 notifications?

Reviewed: https://review.openstack.org/437011
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e674034aa1d33ae5aa65ac7b809ca5dd4c2867c5
Submitter: Jenkins
Branch: master

commit e674034aa1d33ae5aa65ac7b809ca5dd4c2867c5
Author: Jakub Libosvar <email address hidden>
Date: Wed Feb 22 10:51:35 2017 -0500

    tempest: Skip QoS test until fixed

    The test is failing intermittently. In order to reach a better stability
    of the job running in-tree tempest tests, this patch skips the test
    until we come up with a proper fix.

    Change-Id: I37f1488db258f6a4d383fb472cb5433c65371ac5
    Related-bug: 1662109

Reviewed: https://review.openstack.org/448020
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=61183804bf81225f2177277639ed3e8b1d15c27e
Submitter: Jenkins
Branch: stable/ocata

commit 61183804bf81225f2177277639ed3e8b1d15c27e
Author: Jakub Libosvar <email address hidden>
Date: Wed Feb 22 10:51:35 2017 -0500

    tempest: Skip QoS test until fixed

    The test is failing intermittently. In order to reach a better stability
    of the job running in-tree tempest tests, this patch skips the test
    until we come up with a proper fix.

    Change-Id: I37f1488db258f6a4d383fb472cb5433c65371ac5
    Related-bug: 1662109
    (cherry picked from commit e674034aa1d33ae5aa65ac7b809ca5dd4c2867c5)

tags: added: in-stable-ocata

This issue was fixed in the openstack/neutron 11.0.0.0b1 development milestone.

Fix proposed to branch: master
Review: https://review.openstack.org/463268

Changed in neutron:
assignee: Miguel Angel Ajo (mangelajo) → YAMAMOTO Takashi (yamamoto)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/463268
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8528ea2b0e45379ce29ba6e1cfe17c5ae4197592
Submitter: Jenkins
Branch: master

commit 8528ea2b0e45379ce29ba6e1cfe17c5ae4197592
Author: YAMAMOTO Takashi <email address hidden>
Date: Mon May 8 16:49:06 2017 +0900

    Disable QoS scenario tests differently

    The test was disabled due to some issues in the reference implementation.
    CIs for other implementations might not want to disable it.

    Closes-Bug: #1689238
    Related-Bug: #1662109
    Change-Id: I36357e2ef967db3a73c2341903cd18f5109a006b

tags: added: neutron-proactive-backport-potential

Reviewed: https://review.openstack.org/471620
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1df1508d6b4abb27feda64c5401d5d986f88afd7
Submitter: Jenkins
Branch: stable/ocata

commit 1df1508d6b4abb27feda64c5401d5d986f88afd7
Author: YAMAMOTO Takashi <email address hidden>
Date: Mon May 8 16:49:06 2017 +0900

    Disable QoS scenario tests differently

    The test was disabled due to some issues in the reference implementation.
    CIs for other implementations might not want to disable it.

    Closes-Bug: #1689238
    Related-Bug: #1662109
    Change-Id: I36357e2ef967db3a73c2341903cd18f5109a006b
    (cherry picked from commit 8528ea2b0e45379ce29ba6e1cfe17c5ae4197592)
    Conflicts:
     neutron/tests/contrib/gate_hook.sh

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/488855
Reason: It's duplicate of https://review.openstack.org/#/c/468326

Reviewed: https://review.openstack.org/468326
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fe0bb07113a00a4d8a56dc08b004bf758c505e4b
Submitter: Jenkins
Branch: master

commit fe0bb07113a00a4d8a56dc08b004bf758c505e4b
Author: YAMAMOTO Takashi <email address hidden>
Date: Fri May 26 18:16:21 2017 +0900

    Enable QoS scenario tests

    This test has been disabled since [1].
    But it seems the failure ratio is not so high with
    the current code. Let's re-enable it now and see how it goes.

    Note: This same test has been enabled on the gate jobs for
    other implementation for a while. It doesn't fail much there
    either. (networking-midonet)

    [1] I37f1488db258f6a4d383fb472cb5433c65371ac5

    Related-Bug: #1662109
    Change-Id: Ia39c73189ad8a3331c1911989fe69428f064f7a6

Fix proposed to branch: master
Review: https://review.openstack.org/491244

Changed in neutron:
assignee: YAMAMOTO Takashi (yamamoto) → Slawek Kaplonski (slaweq)

Reviewed: https://review.openstack.org/497928
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9f88a3ab05f472f760b228a0bd7549f38e414e43
Submitter: Jenkins
Branch: stable/ocata

commit 9f88a3ab05f472f760b228a0bd7549f38e414e43
Author: YAMAMOTO Takashi <email address hidden>
Date: Fri May 26 18:16:21 2017 +0900

    Enable QoS scenario tests

    This test has been disabled since [1].
    But it seems the failure ratio is not so high with
    the current code. Let's re-enable it now and see how it goes.

    Note: This same test has been enabled on the gate jobs for
    other implementation for a while. It doesn't fail much there
    either. (networking-midonet)

    [1] I37f1488db258f6a4d383fb472cb5433c65371ac5

    Related-Bug: #1662109
    Change-Id: Ia39c73189ad8a3331c1911989fe69428f064f7a6
    (cherry picked from commit fe0bb07113a00a4d8a56dc08b004bf758c505e4b)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/491244
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Slawek Kaplonski (slaweq) wrote :

I haven't been able to find such issues in the gate for quite a long time.
If it happens again, we can reopen this bug or report a new one and work on it again.

Changed in neutron:
status: In Progress → Incomplete