FIP disassociation takes longer in non DVR test scenario

Bug #1505571 reported by Sonu on 2015-10-13
This bug affects 2 people

Bug Description

Problem description:
During a series of VM delete operations in OpenStack (4000 VMs) with KVM compute nodes, VM instances go into the ERROR state.
The error shown in the Horizon UI is:
"ConnectionFailed: Connection to neutron failed: HTTPConnectionPool(host='', port=9696): Read timed out. (read timeout=30)"

This happens because neutron takes more than 30 seconds (around 80 seconds in practice) to delete one port, and nova sets the instance to the ERROR state because the default timeout for all neutron API calls is 30 seconds in nova.
This can be worked around by increasing the timeout to 120 in nova.conf, but that cannot be recommended as the solution.

grep url_timeout /etc/nova/nova.conf
url_timeout = 120
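
To make the arithmetic behind the workaround concrete, here is a minimal sketch that reads `url_timeout` from nova.conf text and checks whether an 80-second port delete would exceed it. The `[neutron]` section name and both helper names (`neutron_url_timeout`, `would_time_out`) are assumptions for illustration and are not taken from this report. The 30-second default and the 80-second figure are the values quoted above.

```python
import configparser

# Default read timeout nova applies to neutron calls, per the report above.
DEFAULT_URL_TIMEOUT = 30


def neutron_url_timeout(conf_text, section="neutron"):
    """Return the effective url_timeout (seconds) from nova.conf text.

    The section name is an assumption; adjust it to wherever your
    release keeps the option. Falls back to the 30 s default when the
    section or the option is absent.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint(section, "url_timeout", fallback=DEFAULT_URL_TIMEOUT)


def would_time_out(operation_seconds, timeout_seconds):
    """True when a neutron call of this duration exceeds the read timeout."""
    return operation_seconds > timeout_seconds


sample = "[neutron]\nurl_timeout = 120\n"
print(neutron_url_timeout(sample))                      # 120
print(would_time_out(80, DEFAULT_URL_TIMEOUT))          # True
print(would_time_out(80, neutron_url_timeout(sample)))  # False
```

Raising the timeout only hides the slow port delete; the 80-second operation itself is unchanged, which is why the report does not recommend this as the fix.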

Sonu (sonu-sudhakaran) on 2015-10-13
Changed in neutron:
assignee: nobody → Sonu (sonu-sudhakaran)
tags: added: delete
tags: added: read timeout
The review attempts to solve the problem for non-DVR cases in Kilo.
The DVR code was refactored for better performance in the Liberty release, so there is no need to fix anything in Liberty.

Please describe your setup, which types of routers you use, how many of them you have, etc. Please provide debug logs and config files.

tags: removed: delete read timeout
Changed in neutron:
status: New → Incomplete
Sonu (sonu-sudhakaran) wrote :

Neutron release tested : Juno/stable
Compute environment: KVM (100 Hosts)
Networking configuration: centralized router, no DVR in picture
Number of tenant networks: 25
Number of Floating IPs: 4000

Sonu (sonu-sudhakaran) wrote :

horizon error

Sonu (sonu-sudhakaran) wrote :

stack trace

Sonu (sonu-sudhakaran) wrote :

stack trace

Sonu, the code proposed, what makes you think it's the culprit? Have you tested it locally?

tags: added: l3-dvr-backlog
Ryan Moats (rmoats) wrote :

It is still unclear from the description and comment stream whether the original test was performed for both DVR and non-DVR cases, and whether the problem still exists with liberty/master for the non-DVR case. The code commit addresses the non-DVR code path in kilo, but the comment about liberty/master only discusses DVR code path refactoring without making clear that the non-DVR code path is also fixed.

Sonu (sonu-sudhakaran) wrote :

We have tested this fix for both the DVR and non-DVR cases in Juno. The DVR code was substantially refactored in the Liberty release, and this function was removed. The fix addresses the performance degradation under non-DVR configurations in Juno and Kilo. For DVR cases, customers will have to move to Liberty to resolve the performance issue.

This is really a catch-all bug, in the sense that a connection timeout during heavy load can occur for all sorts of reasons. Please consider being more specific about the issue you're facing and adjust the description accordingly.

Sonu (sonu-sudhakaran) on 2015-11-04
summary: - VM delete operation fails with 'Connection to neutron failed - Read
- timeout' error
+ FIP disassociation takes longer in non DVR test scenario

Change abandoned by Sonu (<email address hidden>) on branch: stable/kilo
Reason: Not applicable in liberty due to major refactoring in DVR.

This bug is > 180 days without activity. We are unsetting assignee and milestone and setting status to Incomplete in order to allow its expiry in 60 days.

If the bug is still valid, then update the bug status.

Changed in neutron:
assignee: Sonu (sonu-sudhakaran) → nobody
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired