Infinite router update in neutron L3 agent (HA)

Bug #1666549 reported by Roman Klimenko on 2017-02-21
This bug affects 2 people
Affects: Mirantis OpenStack (status tracked in 10.0.x)
Series 10.0.x: Importance: High, Assigned to: MOS Neutron

Bug Description

After a fresh deployment of the environment and a run of the OSTF tests (or Rally), the neutron L3 agent logs on the nodes fill up with traces like the following, with timestamps roughly 0.003 seconds apart:
http://paste.openstack.org/show/599851/
This will bring the cluster down once the log partition fills up.
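To put a number on how fast the log grows, something like the following could be run against a copy of l3-agent.log. This is only a rough sketch: it assumes the default oslo.log line prefix (`YYYY-MM-DD HH:MM:SS.mmm`), and the function names `parse_ts` and `lines_per_second` are made up for illustration.

```python
from datetime import datetime


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none.

    Assumes the line starts with 'YYYY-MM-DD HH:MM:SS.mmm' (23 characters).
    Traceback continuation lines do not match and are skipped.
    """
    try:
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None


def lines_per_second(lines):
    """Average rate of timestamped log lines between the first and last one."""
    stamps = [ts for ts in map(parse_ts, lines) if ts is not None]
    if len(stamps) < 2:
        return 0.0
    span = (stamps[-1] - stamps[0]).total_seconds()
    return (len(stamps) - 1) / span if span > 0 else float("inf")


if __name__ == "__main__":
    import sys

    # Usage: python log_rate.py /var/log/neutron/l3-agent.log
    with open(sys.argv[1], errors="replace") as fh:
        print("%.1f timestamped lines/s" % lines_per_second(fh))
```

Multiplying the rate by the average line size gives an estimate of how long the log partition will last.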

Environment: Fuel 9.0 upgraded to 9.2, fresh install
3 controllers/kafka + 3 computes + 4 storage ceph-osd + 1 LMA nodes

neutron agents 8.3.0:
neutron-dhcp-agent 2:8.3.0-1~u14.04+mos30 all OpenStack virtual network service - DHCP agent
neutron-l3-agent 2:8.3.0-1~u14.04+mos30 all OpenStack virtual network service - l3 agent
neutron-lbaasv2-agent 2:8.3.0-2~u14.04+mos1 all Neutron is a virtual network service for Openstack - LBaaSv2 agent
neutron-metadata-agent 2:8.3.0-1~u14.04+mos30 all OpenStack virtual network service - metadata agent
neutron-openvswitch-agent 2:8.3.0-1~u14.04+mos30 all OpenStack virtual network service - Open vSwitch agent

Steps to reproduce:
1. Deploy OpenStack with Fuel 9.2
2. Create rally venv and run scenario
    NeutronNetworks.create_and_delete_routers (concurrency 100 and times 100, or more)
3. Check /var/log/neutron/l3-agent.log: it is full of these traces.
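For step 2, a Rally task file along these lines should work. This is a sketch, not the exact task used for this report: only the scenario name, `times` and `concurrency` come from the steps above; the `args` and `context` values are assumptions modeled on Rally's sample tasks, and `build_task` is a hypothetical helper name.

```python
import json


def build_task(times=100, concurrency=100):
    """Build a Rally task dict for NeutronNetworks.create_and_delete_routers."""
    return {
        "NeutronNetworks.create_and_delete_routers": [
            {
                # Assumed arguments, modeled on Rally's sample tasks.
                "args": {
                    "network_create_args": {},
                    "subnet_create_args": {},
                    "subnet_cidr_start": "1.1.0.0/30",
                    "subnets_per_network": 1,
                    "router_create_args": {},
                },
                # From the report: concurrency 100 and times 100 (or more).
                "runner": {
                    "type": "constant",
                    "times": times,
                    "concurrency": concurrency,
                },
                # Assumed context: one tenant with unlimited neutron quotas.
                "context": {
                    "users": {"tenants": 1, "users_per_tenant": 1},
                    "quotas": {
                        "neutron": {"network": -1, "subnet": -1, "router": -1}
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_task(), indent=2))
```

Save the output to a file (for example task.json) and run it from the Rally venv with `rally task start task.json`.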

affects: neutron → mos
tags: added: area-neutron
Ann Taraday (akamyshnikova) wrote :

Please attach the neutron-server and neutron-l3-agent logs.

Changed in mos:
status: New → Incomplete
assignee: nobody → Ann Taraday (akamyshnikova)
Roman Klimenko (rklimenko) wrote :

attached logs

Related fix proposed to branch: 9.0/mitaka
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/31330

Changed in mos:
status: Incomplete → In Progress
Changed in mos:
importance: Undecided → High
Changed in mos:
milestone: none → 9.x-updates
Ann Taraday (akamyshnikova) wrote :

In the logs (http://paste.openstack.org/show/601001/) for router 2fcdef4e-83fe-48b5-be0f-f45a631c1482, we first see the notification for router deletion, then the network cleanup and the port removal. After that a loop starts in which the agent keeps trying to delete the router, and each attempt fails because the HA port is already None.
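The failure mode described above can be illustrated with a toy model. This is not neutron code: `ToyRouter`, `process` and the `tolerate_missing_port` flag are all hypothetical, and the assumption that a failed update is simply queued again is a simplification. It only shows why a delete that raises on an already-removed HA port never leaves the queue, while an idempotent delete does.

```python
from collections import deque


class ToyRouter:
    """Stand-in for an HA router whose HA port was already cleaned up."""

    def __init__(self):
        self.ha_port = None  # removed by the first cleanup pass

    def delete(self, tolerate_missing_port=False):
        if self.ha_port is None and not tolerate_missing_port:
            # Mimics the repeated failure: cleanup dereferences a None port.
            raise AttributeError("HA port is None")
        self.ha_port = None


def process(queue, routers, max_iterations, tolerate_missing_port=False):
    """Process delete updates; return the number of failed attempts.

    A failed update is put back on the queue (assumed resync behaviour),
    so max_iterations bounds what would otherwise be an endless loop.
    """
    failures = 0
    for _ in range(max_iterations):
        if not queue:
            break
        router_id = queue.popleft()
        try:
            routers[router_id].delete(tolerate_missing_port)
            del routers[router_id]
        except AttributeError:
            failures += 1
            queue.append(router_id)
    return failures


if __name__ == "__main__":
    stuck = process(deque(["r1"]), {"r1": ToyRouter()}, max_iterations=1000)
    print("failed attempts without tolerance:", stuck)
    ok = process(deque(["r1"]), {"r1": ToyRouter()}, 1000, True)
    print("failed attempts with tolerance:", ok)
```

Each failed attempt in the real agent also writes a trace to the log, which matches the steady stream of identical traces in the paste.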

Oleg Bondarev (obondarev) wrote :

Based on logs analysis I believe this might help: https://review.openstack.org/#/c/365653 (not in stable/mitaka)

Denis Meltsaykin (dmeltsaykin) wrote :

The fix is merged upstream and will be picked up with a sync.

Change abandoned by Oleg Bondarev <email address hidden> on branch: 9.0/mitaka
Review: https://review.fuel-infra.org/31330
Reason: Mitaka fix will be synced: https://review.openstack.org/#/c/440799/

Roman Klimenko (rklimenko) wrote :

https://review.openstack.org/#/c/365653 fixed the problem, thanks.

Ann Taraday (akamyshnikova) wrote :

The change https://review.openstack.org/#/c/365653 is included in stable/newton, so this should be fixed in MOS 10.
