Fix DVR multinode upstream CI testing

Bug #1450604 reported by Armando Migliaccio
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
neutron   Invalid   Medium       Unassigned

Bug Description

This bug should capture any change required to get the DVR multi-node job to run successfully and reliably.

Changed in neutron:
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
tags: added: l3-dvr-backlog
Changed in neutron:
status: New → Confirmed
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

On existing Neutron patches, a 'check experimental' comment triggers the multi-node CI job for DVR. As this graph shows:

http://graphite.openstack.org/render/?from=-10days&height=500&until=now&width=1200&bgcolor=ffffff&fgcolor=000000&yMax=100&yMin=0&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.check-tempest-dsvm-neutron-dvr.FAILURE,sum(stats.zuul.pipeline.check.job.check-tempest-dsvm-neutron-dvr.{SUCCESS,FAILURE})),%2736hours%27),%20%27check-tempest-dsvm-neutron-dvr%27),%27orange%27)&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.experimental.job.check-tempest-dsvm-neutron-multinode-full.FAILURE,sum(stats.zuul.pipeline.experimental.job.check-tempest-dsvm-neutron-multinode-full.{SUCCESS,FAILURE})),%2736hours%27),%20%27check-tempest-dsvm-neutron-multinode-full%27),%27blue%27)

The failure rate for the multi-node job is pretty lousy.
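
For readers who cannot load the graph, the query in the URL above plots FAILURE as a percentage of SUCCESS plus FAILURE for each job, smoothed with a 36-hour moving average. Below is a minimal sketch of that arithmetic in Python. It assumes evenly spaced samples and a count-based window rather than Graphite's time-based one, and the sample counts are made up for illustration, not real CI data.

```python
from collections import deque


def failure_percent(failures, successes):
    """FAILURE as a percentage of SUCCESS + FAILURE; None when there were no runs."""
    total = failures + successes
    return None if total == 0 else 100.0 * failures / total


def moving_average(values, window):
    """Trailing average over the last `window` samples, ignoring None samples."""
    recent = deque(maxlen=window)
    smoothed = []
    for value in values:
        recent.append(value)
        known = [v for v in recent if v is not None]
        smoothed.append(sum(known) / len(known) if known else None)
    return smoothed


# Hypothetical (failures, successes) counts per sample for one job.
counts = [(1, 3), (0, 0), (2, 2), (3, 1)]
rates = [failure_percent(f, s) for f, s in counts]
smoothed = moving_average(rates, window=3)

print(rates)     # [25.0, None, 50.0, 75.0]
print(smoothed)  # [25.0, 25.0, 37.5, 62.5]
```

The percentage is computed per sample first and averaged afterwards, which mirrors the nesting order of asPercent inside movingAverage in the query. How Graphite itself treats empty samples may differ from this sketch.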

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Will look into it.

Ryan Moats (rmoats)
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Ryan Moats (rmoats)
Changed in neutron:
importance: Undecided → Medium
Ryan Moats (rmoats) wrote :

The multinode DVR job is now in the pipeline as non-voting; a comparison with the single-node job can be found at
http://goo.gl/EAugSi

Armando Migliaccio (armando-migliaccio) wrote :

Good stuff...it'd be good if we could also add the failure-rate trends of the multi-node and single-node jobs without DVR to the graph, if it's not too much of a hassle, but don't mind me ;)

Ryan Moats (rmoats) wrote :

ask and ye shall receive: http://goo.gl/Y7LIYX

Armando Migliaccio (armando-migliaccio) wrote :

aren't you a rock star? Thanks!

Ryan Moats (rmoats)
Changed in neutron:
status: Confirmed → In Progress
Oleg Bondarev (obondarev) wrote :

This race might be directly related to dvr multinode gate failures: https://bugs.launchpad.net/neutron/+bug/1521524

Oleg Bondarev (obondarev) wrote :

Another culprit for dvr multinode job instability: https://bugs.launchpad.net/neutron/+bug/1522824

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Oleg Bondarev (<email address hidden>) on branch: master
Review: https://review.openstack.org/253569
Reason: in favor of https://review.openstack.org/#/c/215467/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/215467
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c5fa665de3173f3ad82cc3e7624b5968bc52c08d
Submitter: Jenkins
Branch: master

commit c5fa665de3173f3ad82cc3e7624b5968bc52c08d
Author: shihanzhang <email address hidden>
Date: Fri Aug 21 09:51:59 2015 +0800

    ML2: update port's status to DOWN if its binding info has changed

    This fixes the problem that when two or more ports in a network
    are migrated to a host that did not previously have any ports in
    the same network, the new host is sometimes not told about the
    IP/MAC addresses of all the other ports in the network. In other
    words, initial L2population does not work, for the new host.

    This is because the l2pop mechanism driver only sends catch-up
    information to the host when it thinks it is dealing with the first
    active port on that host; and currently, when multiple ports are
    migrated to a new host, there is always more than one active port so
    the condition above is never triggered.

    The fix is for the ML2 plugin to set a port's status to DOWN when
    its binding info changes.

    This patch also fixes the bug when nova thinks it should not wait
    for any events from neutron because all ports are already active.

    Closes-bug: #1483601
    Closes-bug: #1443421
    Closes-Bug: #1522824
    Related-Bug: #1450604

    Change-Id: I342ad910360b21085316c25df2154854fd1001b2
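
The race described in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, method names, host names and MAC placeholders are hypothetical, and none of it is neutron's actual l2pop or ML2 code. It models just the one rule from the commit message, that catch-up information is sent only for the first active port on a host, and shows how resetting the status to DOWN on a binding change lets that rule fire.

```python
# Toy model of the behaviour described in the commit message.
# All names here are hypothetical; this is not neutron's l2pop code.

class ToyL2pop:
    def __init__(self, ports):
        # port_id -> {"host": ..., "status": ..., "mac": ...}
        self.ports = ports
        # Catch-up messages sent so far, as (host, [MACs of remote ports]).
        self.sent = []

    def active_on(self, host):
        """Return the ports that are ACTIVE on the given host."""
        return [p for p in self.ports.values()
                if p["host"] == host and p["status"] == "ACTIVE"]

    def migrate(self, port_id, new_host, reset_status):
        """Rebind a port to a new host, optionally resetting its status."""
        port = self.ports[port_id]
        port["host"] = new_host
        if reset_status:
            # The fix: a binding change puts the port back to DOWN.
            port["status"] = "DOWN"

    def port_up(self, port_id):
        """The agent reports the port as wired; mark it ACTIVE."""
        port = self.ports[port_id]
        port["status"] = "ACTIVE"
        host = port["host"]
        # Catch-up is only sent for the first active port on a host.
        if len(self.active_on(host)) == 1:
            remote = sorted(p["mac"] for p in self.ports.values()
                            if p["host"] != host)
            self.sent.append((host, remote))


def run(reset_status):
    ports = {
        name: {"host": "host-a", "status": "ACTIVE", "mac": mac}
        for name, mac in [("p1", "mac-1"), ("p2", "mac-2"), ("p3", "mac-3")]
    }
    model = ToyL2pop(ports)
    # Migrate two ports to a host with no ports in this network yet.
    for port_id in ("p1", "p2"):
        model.migrate(port_id, "host-b", reset_status)
    # The agent on host-b then reports each port as up.
    for port_id in ("p1", "p2"):
        model.port_up(port_id)
    return model.sent


print(run(reset_status=False))  # old behaviour: []
print(run(reset_status=True))   # with the reset: [('host-b', ['mac-3'])]
```

Without the reset, both migrated ports are already ACTIVE when the first one is reported up, so the "first active port" condition never holds and host-b learns nothing about p3. With the reset, the first port to come up is the only active one on host-b and the catch-up is sent once.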

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/liberty)

Related fix proposed to branch: stable/liberty
Review: https://review.openstack.org/300539

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/kilo)

Related fix proposed to branch: stable/kilo
Review: https://review.openstack.org/300559

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/300539
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a38cb93dde1633005e9e66e6b7ecec9e726304bb
Submitter: Jenkins
Branch: stable/liberty

commit a38cb93dde1633005e9e66e6b7ecec9e726304bb
Author: venkata anil <email address hidden>
Date: Fri Apr 1 14:52:01 2016 +0000

    ML2: update port's status to DOWN if its binding info has changed

    This fixes the problem that when two or more ports in a network
    are migrated to a host that did not previously have any ports in
    the same network, the new host is sometimes not told about the
    IP/MAC addresses of all the other ports in the network. In other
    words, initial L2population does not work, for the new host.

    This is because the l2pop mechanism driver only sends catch-up
    information to the host when it thinks it is dealing with the first
    active port on that host; and currently, when multiple ports are
    migrated to a new host, there is always more than one active port so
    the condition above is never triggered.

    The fix is for the ML2 plugin to set a port's status to DOWN when
    its binding info changes.

    This patch also fixes the bug when nova thinks it should not wait
    for any events from neutron because all ports are already active.

    Closes-bug: #1483601
    Closes-bug: #1443421
    Closes-Bug: #1522824
    Related-Bug: #1450604
    (cherry picked from commit c5fa665de3173f3ad82cc3e7624b5968bc52c08d)

    Conflicts: neutron/plugins/ml2/drivers/l2pop/mech_driver.py

    Change-Id: I342ad910360b21085316c25df2154854fd1001b2

tags: added: in-stable-liberty
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/kilo)

Related fix proposed to branch: stable/kilo
Review: https://review.openstack.org/306300

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/kilo)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/300559

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/kilo)

Change abandoned by Dave Walker (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/306300
Reason:
stable/kilo closed for 2015.1.4

This release is now pending its final release and no freeze exception has
been seen for this changeset. Therefore, I am now abandoning this change.

If this is not correct, please urgently raise a thread on openstack-dev.

More details at: https://wiki.openstack.org/wiki/StableBranch

Armando Migliaccio (armando-migliaccio) wrote :

This bug has had no activity for more than 180 days. We are unsetting the assignee and milestone and setting the status to Incomplete so that it can expire in 60 days.

If the bug is still valid, then update the bug status.

Changed in neutron:
assignee: Ryan Moats (rmoats) → nobody
status: In Progress → Incomplete
Armando Migliaccio (armando-migliaccio) wrote :

We should probably file more tailored and up-to-date bug reports for more recent failures.

Changed in neutron:
status: Incomplete → Invalid
Armando Migliaccio (armando-migliaccio) wrote :

This report has exhausted its useful life.
