centos7 train vm live migration stops network on vm for some minutes

Bug #1928299 reported by ignazio
This bug affects 1 person
Affects   Status    Importance  Assigned to  Milestone
neutron   Expired   Undecided   Unassigned
Train     Expired   Medium      Unassigned

Bug Description

Hello, I have upgraded my CentOS 7 OpenStack installation from Stein to Train.
On Train I am facing an issue with live migration:
when a VM is migrated from one KVM node to another, it stops responding to ping requests for some minutes.
I had the same issue on Stein and resolved it with a workaround suggested by Sean Mooney, in which legacy port binding was used.

On Train there seem to be no backported patches that solve the issue.

I enabled the debug option in neutron; here is the dhcp-agent.log from the exact time when the live migration started:
http://paste.openstack.org/show/805325/

Here is the openvswitch-agent log from the source KVM node:

http://paste.openstack.org/show/805327/

Here is the openvswitch-agent log from the destination KVM node:

http://paste.openstack.org/show/805329/

I am using the openvswitch mechanism driver and the iptables_hybrid firewall driver.

Any help will be appreciated.
Ignazio

Tags: ovs
tags: added: ovs
removed: train-ignazio
Oleg Bondarev (obondarev) wrote :

Can you please clarify the following:
 - does live migration succeed according to Nova?
 - do VM and VM port have ACTIVE state before/after the migration?
 - are pings OK right after migration and fail some minutes after, or pings fail initially and start working after some minutes?
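
The third question can be answered from a timestamped ping log recorded across the migration. The following is a minimal sketch, assuming the samples are collected as (seconds, replied) pairs; the function name `find_outages` and the sample format are hypothetical and not part of any OpenStack tool.

```python
from typing import List, Tuple


def find_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (start, end) windows during which pings went unanswered.

    `samples` is a time-ordered list of (timestamp_seconds, replied) pairs.
    A window starts at the first lost ping and ends at the next reply; an
    outage still open at the end of the log ends at the last timestamp.
    """
    outages = []
    start = None
    for ts, replied in samples:
        if not replied and start is None:
            start = ts                    # first lost ping opens a window
        elif replied and start is not None:
            outages.append((start, ts))   # first reply closes the window
            start = None
    if start is not None:
        outages.append((start, samples[-1][0]))
    return outages
```

A window that opens at the time of the migration and closes minutes later corresponds to "fail initially, then recover"; a window that opens well after the migration corresponds to the other case.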

Changed in neutron:
status: New → Incomplete
ignazio (cassano) wrote :

Hello Oleg, the live migration works fine as far as nova is concerned.

Pings fail initially and start working after some minutes.
When the VM has been migrated and stops responding to ping, if I connect to the VM's serial console and ping the VM's default gateway from inside the VM, it starts responding again.

Let me check whether the VM port is active before/after the migration.

I will respond asap.
Ignazio

ignazio (cassano) wrote :

The port is in status ACTIVE with admin_state UP both before and after the live migration.
Ignazio
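
The port check described above can be sketched against a port record as returned by the Neutron API (`GET /v2.0/ports/{port_id}`). The field names `status`, `admin_state_up` and `binding:host_id` are Neutron port attributes; the helper name, the host name and the sample record are hypothetical. Note that `binding:host_id` is normally visible only to admin users.

```python
def port_ready_on_host(port: dict, expected_host: str) -> bool:
    """True if the port is ACTIVE, administratively up, and bound to expected_host."""
    return (
        port.get("status") == "ACTIVE"
        and port.get("admin_state_up") is True
        and port.get("binding:host_id") == expected_host
    )


# Hypothetical record, shaped like the "port" object in a Neutron API response.
example_port = {
    "status": "ACTIVE",
    "admin_state_up": True,
    "binding:host_id": "kvm-destination",
}
```

As this report shows, a port can pass such a check while the VM is still unreachable, so the check rules out a binding problem but does not prove connectivity.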

Oleg Bondarev (obondarev) wrote :

OK, it seems the reason is dropped RARP packets; the current bug seems to be a duplicate of https://bugs.launchpad.net/neutron/+bug/1815989 - please check it and mark this one as a duplicate if you agree, or specify here why your case differs. Thanks.
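
For readers unfamiliar with the RARP packets mentioned above: after a live migration the hypervisor typically broadcasts a few frames carrying the VM's MAC address so that switches relearn where that MAC now lives. The following is a minimal sketch of such a frame, assuming the RFC 903 layout (EtherType 0x8035, opcode 3) and padding to the 60-byte Ethernet minimum; the function name and the MAC address are hypothetical, and the code only builds the bytes without sending anything.

```python
import struct

ETH_P_RARP = 0x8035        # EtherType for Reverse ARP
ARP_HTYPE_ETHERNET = 1     # hardware type: Ethernet
ARP_PTYPE_IPV4 = 0x0800    # protocol type: IPv4
RARP_OP_REQUEST = 3        # "request reverse" opcode (RFC 903)


def build_rarp_announce(mac: str) -> bytes:
    """Build a broadcast RARP request announcing `mac` (illustrative only)."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 6-byte MAC address")
    broadcast = b"\xff" * 6
    zero_ip = b"\x00" * 4
    # Ethernet header: destination, source, EtherType.
    eth_header = broadcast + mac_bytes + struct.pack("!H", ETH_P_RARP)
    # ARP-style payload: htype, ptype, hlen, plen, opcode, then addresses.
    payload = struct.pack(
        "!HHBBH", ARP_HTYPE_ETHERNET, ARP_PTYPE_IPV4, 6, 4, RARP_OP_REQUEST
    )
    payload += mac_bytes + zero_ip + mac_bytes + zero_ip
    # Pad to the minimum Ethernet frame size (without the FCS).
    return (eth_header + payload).ljust(60, b"\x00")


frame = build_rarp_announce("fa:16:3e:00:00:01")  # hypothetical MAC
```

If these frames are dropped on the destination node, the network keeps forwarding traffic for that MAC towards the source node until the VM itself sends something. That would be consistent with the observation in this report that pinging the gateway from inside the VM restores connectivity.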

ignazio (cassano) wrote :

I know this is the same as 1815989; in fact, on Stein I solved it with some suggestions related to that bug.
On Train there are no proposed solutions or backported patches, if I understood correctly.
I think someone should suggest a solution like legacy port binding, as was suggested for Stein.
I talked with Rodolfo Alonso on IRC two days ago, and he suggested that I file a new bug.
Ignazio

Oleg Bondarev (obondarev) wrote :

Still, I'm not sure why we need a separate bug to track the same issue; leaving as Incomplete until the difference is clarified.

Launchpad Janitor (janitor) wrote :

[Expired for neutron train because there has been no activity for 60 days.]

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired