[Backport 1417267] neutron-ovs-agent on compute couldn't create interface

Bug #1417693 reported by Alexander Ignatov
Affects             Status     Importance  Assigned to       Milestone
Mirantis OpenStack  Invalid    High        Eugene Nikanorov
4.1.x               Won't Fix  High        Unassigned
5.0.x               Won't Fix  High        Unassigned
5.1.x               Won't Fix  High        Unassigned
6.0.x               Invalid    High        Eugene Nikanorov
6.1.x               Invalid    High        Eugene Nikanorov

Bug Description

This is a backport of patch https://bugs.launchpad.net/neutron/+bug/1417267

================
Original description
================

Steps to reproduce:

1. Create an environment with two powerful compute nodes and start provisioning instances.
2. After 130 instances, one of the compute nodes starts failing to provision because it is unable to create interfaces. Here is the trace from the logs: http://paste.openstack.org/show/164308/
3. Restarting the agent restores the functionality of the compute node.
4. After some more VMs are provisioned, the ovs-agent on the other node stops working with the same symptoms.
5. Restarting the agent restores the functionality of that compute node.

Tags: neutron ovs scale
tags: added: scale
Dmitry Borodaenko (angdraug) wrote :

Scalability to 100 nodes has only been certified since 6.0, so this is marked as Won't Fix for preceding releases.

Dina Belova (dbelova) wrote :

Dmitry, as you can see from the description, you need only two powerful compute nodes, not 100, to reproduce the bug :) But anyway :)

Eugene Nikanorov (enikanorov) wrote :

Right now the evidence shows that the issue is outside of neutron: the ovs agent times out while waiting for replies to various RPC requests to the neutron server.

Some of those timeouts lead to VIF creation failure.
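
For illustration only, here is a minimal sketch of how an RPC call that gets no reply can surface as a VIF failure. This is not the actual agent code: `rpc_call`, `plug_vif`, `RpcTimeout`, and the reply queue are hypothetical stand-ins, and the message broker is simulated with an in-process queue.

```python
import queue


class RpcTimeout(Exception):
    """Raised when no reply arrives in time (hypothetical name)."""


def rpc_call(reply_queue, timeout):
    """Wait for a reply from the server; raise RpcTimeout if none arrives."""
    try:
        return reply_queue.get(timeout=timeout)
    except queue.Empty:
        raise RpcTimeout("no reply within %.2fs" % timeout)


def plug_vif(port_id, reply_queue, timeout=0.1):
    """Hypothetical VIF setup step that needs port details from the server."""
    try:
        details = rpc_call(reply_queue, timeout)
    except RpcTimeout as exc:
        # Without the port details the interface cannot be set up.
        return "port %s: FAILED (%s)" % (port_id, exc)
    return "port %s: ACTIVE on %s" % (port_id, details["network"])


# Broker healthy: a reply is already waiting for the agent.
healthy = queue.Queue()
healthy.put({"network": "net-1"})
print(plug_vif("p1", healthy))

# Broker outage: the reply never arrives, so the call times out.
outage = queue.Queue()
print(plug_vif("p2", outage))
```

In this sketch the failure depends only on whether a reply arrives in time, which matches the symptom of timeouts on the agent side without a defect in the agent itself.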

Eugene Nikanorov (enikanorov) wrote :

I'm moving the state to Incomplete and lowering the priority so the bug does not weigh on the mos-neutron team.

The repro cases we've seen show that the ovs agent was unable to receive information from neutron-server during a rabbitmq outage caused by Fuel monitoring.
So right now https://bugs.launchpad.net/mos/+bug/1423404 is considered the root cause of this issue.
There's not much that can be done on the neutron (ovs agent) side.

We will reopen the bug if this analysis is wrong.
