agent_alive(): KeyError race problem

Bug #1796878 reported by Lucas Alvares Gomes
This bug affects 1 person
Affects: networking-ovn
Status: Fix Released
Importance: Medium
Assigned to: Lucas Alvares Gomes
Milestone: (not set)

Bug Description

The agent_alive() method may fail with an unhandled KeyError when it tries to determine whether an agent that has already been deleted from the AgentStats tracker is alive:

Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/usr/local/lib/python2.7/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron LOG.debug("Retry wrapper got retriable exception: %s", e)
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron self.force_reraise()
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron six.reraise(self.type_, self.value, self.tb)
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/usr/local/lib/python2.7/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron return f(*dup_args, **dup_kwargs)
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/opt/stack/neutron/neutron/db/agents_db.py", line 311, in agent_health_check
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron filters={'admin_state_up': [True]})
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/opt/stack/networking-ovn/networking_ovn/ml2/mech_driver.py", line 938, in fn
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron return op(results, new_method(*args, _driver=self, **kwargs))
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/opt/stack/networking-ovn/networking_ovn/ml2/mech_driver.py", line 965, in get_agents
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron for agent in _driver.agents_from_chassis(ch).values():
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/opt/stack/networking-ovn/networking_ovn/ml2/mech_driver.py", line 922, in agents_from_chassis
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron alive = self.agent_alive(chassis, agent_type)
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/opt/stack/networking-ovn/networking_ovn/ml2/mech_driver.py", line 876, in agent_alive
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron updated_at = stats.AgentStats.get_stat(id_).updated_at
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron File "/opt/stack/networking-ovn/networking_ovn/agent/stats.py", line 30, in get_stat
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron return self._agents[id_]
Oct 05 15:50:18 ubuntu neutron-server[30081]: ERROR neutron KeyError: 'metadata-29af7c8a-88f9-4d46-8e54-35aa040f199a'
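To make the failure mode concrete, here is a minimal sketch of the race under simplifying assumptions. The `Stat` record, `add_stat()`, `del_agent()`, the `agent_alive(tracker, id_, now, down_time)` signature and the 75-second threshold are all hypothetical stand-ins. Only the plain `self._agents[id_]` lookup in `get_stat()` and the `updated_at` attribute come from the traceback. The caller decides to check an agent, the entry is removed before the check runs, and the lookup raises.

```python
import collections
import datetime

# Hypothetical record type; the report only shows that entries have updated_at.
Stat = collections.namedtuple("Stat", ["nb_cfg", "updated_at"])


class AgentStats(object):
    """Hypothetical, simplified stand-in for the tracker in stats.py."""

    def __init__(self):
        self._agents = {}

    def add_stat(self, id_, nb_cfg, updated_at):
        self._agents[id_] = Stat(nb_cfg, updated_at)

    def del_agent(self, id_):
        # Drop the entry, e.g. when its chassis goes away (assumed trigger).
        self._agents.pop(id_, None)

    def get_stat(self, id_):
        # Same plain dict lookup as in the traceback: a missing id raises KeyError.
        return self._agents[id_]


def agent_alive(tracker, id_, now, down_time=75):
    """Unpatched check (hypothetical signature): assumes id_ is still tracked."""
    updated_at = tracker.get_stat(id_).updated_at
    return (now - updated_at).total_seconds() < down_time


tracker = AgentStats()
now = datetime.datetime(2018, 10, 5, 15, 50, 18)
agent_id = "metadata-29af7c8a-88f9-4d46-8e54-35aa040f199a"
tracker.add_stat(agent_id, nb_cfg=1, updated_at=now)

ids_to_check = [agent_id]    # the caller lists the agent to check...
tracker.del_agent(agent_id)  # ...but the entry is removed before the check runs

try:
    for id_ in ids_to_check:
        agent_alive(tracker, id_, now)
except KeyError as exc:
    # str() of a KeyError is the repr of the missing key.
    print("KeyError: %s" % exc)
```

The printed message has the same form as the last line of the log above: the exception carries the missing agent id, prefixed with `metadata-` for a metadata agent.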

Changed in networking-ovn:
importance: Undecided → Medium
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/608936

Changed in networking-ovn:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/611915

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/608936
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=062a05ade43def985dcd735f13460a973a594939
Submitter: Zuul
Branch: master

commit 062a05ade43def985dcd735f13460a973a594939
Author: Lucas Alvares Gomes <email address hidden>
Date: Tue Oct 9 13:14:22 2018 +0100

    Fix: agent_alive() KeyError problem

    The agent_alive() method may fail with a unhandled KeyError when it
    tries to determine whether an agent that has been already deleted from
    the AgentStats tracker is alive or not.

    This patch handles the problem and in those cases it will just report
    the agent as "dead".

    Closes-Bug: #1796878
    Change-Id: Ieee97721af09474fbbd4f03ab74da409f1e12833
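The behaviour described in the commit message (an agent that is no longer tracked is reported as "dead") can be sketched as follows. This is a self-contained example with a hypothetical, simplified tracker. Catching `KeyError` around `get_stat()` is one plausible way to implement it, not necessarily the code that was merged. `Stat`, `add_stat()`, `del_agent()`, the function signature and the 75-second threshold are assumptions.

```python
import collections
import datetime

# Hypothetical record type; only updated_at is known from the report.
Stat = collections.namedtuple("Stat", ["nb_cfg", "updated_at"])


class AgentStats(object):
    """Hypothetical, simplified tracker (not the real networking-ovn class)."""

    def __init__(self):
        self._agents = {}

    def add_stat(self, id_, nb_cfg, updated_at):
        self._agents[id_] = Stat(nb_cfg, updated_at)

    def del_agent(self, id_):
        self._agents.pop(id_, None)

    def get_stat(self, id_):
        return self._agents[id_]  # still raises KeyError for unknown ids


def agent_alive(tracker, id_, now, down_time=75):
    """Report liveness; an agent missing from the tracker is reported dead."""
    try:
        updated_at = tracker.get_stat(id_).updated_at
    except KeyError:
        # The entry was removed between listing and checking: report the
        # agent as dead instead of letting the exception reach the caller.
        return False
    return (now - updated_at).total_seconds() < down_time


tracker = AgentStats()
now = datetime.datetime(2018, 10, 5, 15, 50, 18)
agent_id = "metadata-29af7c8a-88f9-4d46-8e54-35aa040f199a"
tracker.add_stat(agent_id, nb_cfg=1, updated_at=now)
print(agent_alive(tracker, agent_id, now))  # True: just updated
tracker.del_agent(agent_id)
print(agent_alive(tracker, agent_id, now))  # False: no longer tracked
```

Catching the exception, rather than first testing `id_ in tracker._agents`, avoids a second race: the entry could still disappear between the membership test and the lookup.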

Changed in networking-ovn:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/rocky)

Reviewed: https://review.openstack.org/611915
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=781411c7b3084039889d5c6e64f45473ba20cc52
Submitter: Zuul
Branch: stable/rocky

commit 781411c7b3084039889d5c6e64f45473ba20cc52
Author: Lucas Alvares Gomes <email address hidden>
Date: Tue Oct 9 13:14:22 2018 +0100

    Fix: agent_alive() KeyError problem

    The agent_alive() method may fail with a unhandled KeyError when it
    tries to determine whether an agent that has been already deleted from
    the AgentStats tracker is alive or not.

    This patch handles the problem and in those cases it will just report
    the agent as "dead".

    Closes-Bug: #1796878
    Change-Id: Ieee97721af09474fbbd4f03ab74da409f1e12833
    (cherry picked from commit 062a05ade43def985dcd735f13460a973a594939)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 6.0.0.0b1

This issue was fixed in the openstack/networking-ovn 6.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.1.0

This issue was fixed in the openstack/networking-ovn 5.1.0 release.
