timestamp mechanism in linux bridge false positives

Bug #1622833 reported by Kevin Benton
This bug affects 1 person
Affects  Status        Importance  Assigned to   Milestone
neutron  Fix Released  High        Kevin Benton

Bug Description

The Linux bridge agent is picking up too many false positives in its mechanism for detecting devices that have been modified locally. In the following log excerpt, the four tap devices attached to a particular bridge had timestamps that jumped forward even though none of the interfaces actually changed:

2016-09-13 00:13:38.744 14179 DEBUG neutron.plugins.ml2.drivers.agent._common_agent [req-82c02245-80fd-4712-baa6-cdd4033315d1 - -] Adding locally changed devices to updated set: set(['tap422b85d9-95', 'tap9b365584-34', 'tapee2684f8-51', 'tap66ef2d8e-3b']) scan_devices /opt/stack/new/neutron/neutron/plugins/ml2/drivers/agent/_common_agent.py:397
2016-09-13 00:13:38.744 14179 DEBUG neutron.plugins.ml2.drivers.agent._common_agent [req-82c02245-80fd-4712-baa6-cdd4033315d1 - -] Agent loop found changes! {'current': set(['tap422b85d9-95', 'tapee2684f8-51', 'tap6028e7a2-c0', 'tap9b365584-34', 'tap0960ffac-f9', 'tap7ba5f865-54', 'tap66ef2d8e-3b', 'tapfe427ba3-63', 'tap475f33ef-c3']), 'timestamps': {'tap422b85d9-95': 1473725618.73996, 'tapee2684f8-51': 1473725618.73996, 'tap6028e7a2-c0': None, 'tap9b365584-34': 1473725618.73996, 'tap0960ffac-f9': 1473725618.73996, 'tap7ba5f865-54': 1473725616.7399597, 'tap66ef2d8e-3b': 1473725618.73996, 'tapfe427ba3-63': 1473725616.7399597, 'tap475f33ef-c3': None}, 'removed': set([]), 'added': set([]), 'updated': set(['tap422b85d9-95', 'tap9b365584-34', 'tapee2684f8-51', 'tap66ef2d8e-3b'])} daemon_loop /opt/stack/new/neutron/neutron/plugins/ml2/drivers/agent/_common_agent.py:448

As a result, the agent refetches the device details, which moves the port to BUILD and then back to ACTIVE. This causes sporadic failures in tempest tests that assert a port is in the ACTIVE status.
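The detection described above amounts to comparing two successive polling scans. The sketch below is a minimal illustration, assuming each scan is a dict mapping device name to a timestamp (None when unknown, as for some devices in the log); `diff_scans` is a hypothetical helper, not the code in neutron's `_common_agent.py`. It shows why a kernel-side timestamp jump on unchanged devices is indistinguishable from a real local change.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical scan result: device name -> "timestamp" (None when unknown).
Scan = Dict[str, Optional[float]]


def diff_scans(previous: Scan, current: Scan) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, updated) between two polling scans.

    Illustrative sketch only, not neutron's actual implementation.
    """
    prev_names, cur_names = set(previous), set(current)
    added = cur_names - prev_names
    removed = prev_names - cur_names
    # A device present in both scans counts as locally changed when its
    # timestamp moved. Devices without a timestamp are skipped (an
    # assumption of this sketch).
    updated = {
        dev
        for dev in prev_names & cur_names
        if previous[dev] is not None
        and current[dev] is not None
        and previous[dev] != current[dev]
    }
    return added, removed, updated
```

If every timestamp on a bridge jumps forward together, for example from 100.0 to 102.0 with nothing else changing, `diff_scans` reports all of those devices as updated, which matches the false positives in the log above.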

Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
milestone: none → newton-rc1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/369179

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/369179
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a2bd0b4b53db8468681eb2905e2fbc2f9073869a
Submitter: Jenkins
Branch: master

commit a2bd0b4b53db8468681eb2905e2fbc2f9073869a
Author: Kevin Benton <email address hidden>
Date: Mon Sep 12 22:27:33 2016 -0700

    LinuxBridge: Use ifindex for logical 'timestamp'

    With Xenial (and maybe older versions), the modified timestamps
    in /sys/class/net/(device_name) are not stable. They appear to
    work for a period of time, and then when some kind of cache clears
    on the kernel side, all of the timestamps are reset to the latest
    access time.

    This was causing the Linux Bridge agent to think that the interfaces
    were experiencing local changes much more frequently than they actually
    were, resulting in more polling to the Neutron server and subsequently
    more BUILD->ACTIVE->BUILD->ACTIVE transitions in the logical model.

    The purpose of the timestamp patch was to catch rapid server REBUILD
    operations where the interface would be deleted and re-added within
    a polling interval. Without it, these would be stuck in the BUILD
    state since the agent wouldn't realize it needed to wire the ports.

    This patch switches to looking at the IFINDEX of the interfaces to
    use as a sort of logical timestamp. If an interface gets removed
    and re-added, it will get a different index, so the original timestamp
    comparison logic will still work.

    In the future, the agent should undergo a larger refactor to just
    watch 'ip monitor' for netlink events to replace the polling of the
    interface listing and the timestamp logic entirely. However, this
    approach was taken due to the near term release and the ability to
    back-port it to older releases.

    This was verified with both Nova rebuild actions and Nova interface
    attach/detach actions.

    Change-Id: I016019885446bff6806268ab49cd5476d93ec61f
    Closes-Bug: #1622833
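To make the difference concrete, here is a minimal sketch contrasting the two per-device values the commit message discusses. It assumes the old mechanism amounted to stat()ing /sys/class/net/<device> and reading its modification time, and it reads the interface index with Python's standard `socket.if_nametoindex`; the commit does not say how neutron itself reads the index, and the helper names `sysfs_mtime` and `ifindex_timestamp` are hypothetical.

```python
import os
import socket
from typing import Optional


def sysfs_mtime(device: str, sysfs_root: str = "/sys/class/net") -> Optional[float]:
    """Old-style "timestamp": mtime of the device's sysfs entry.

    Per the commit message, this value can be reset to a recent time when a
    kernel-side cache is cleared, even if the device never changed.
    """
    try:
        return os.stat(os.path.join(sysfs_root, device)).st_mtime
    except OSError:
        # Device disappeared between listing and stat, or sysfs is unavailable.
        return None


def ifindex_timestamp(device: str) -> Optional[int]:
    """Logical "timestamp": the kernel interface index.

    The index does not change while the interface exists; a deleted and
    re-added interface gets a different index, so a changed value still
    signals that the port must be rewired.
    """
    try:
        return socket.if_nametoindex(device)
    except OSError:
        # No interface with this name (for example, it was just deleted).
        return None
```

Either function can feed the same per-scan comparison, which is the point of the patch: only the source of the value changes, not the comparison logic.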

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0rc1

This issue was fixed in the openstack/neutron 9.0.0.0rc1 release candidate.
