OVS port fails after hyper-v cluster live migration (Invalid argument)

Bug #1644122 reported by semen
This bug affects 1 person
Affects: compute-hyperv
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi Team, I'm using the Mitaka cluster driver for Hyper-V with the OVS 2.5 Windows port.
When I perform a live migration, the instance migrates successfully to the other host but loses its network connection.
There was a bug in Liberty where the OVS port was not migrated to the target host. That now seems to be fixed: the OVS port is present on the target host, but it is added with errors.

vswitchd.log (migration time) https://paste.ubuntu.com/23510786/

nova-compute.log (migration time) https://paste.ubuntu.com/23510782/

nova.conf https://paste.ubuntu.com/23510815/

neutron-ovs-agent.conf https://paste.ubuntu.com/23510816/

If I manually run the command logged by nova (ovs-vsctl --timeout=120 -- --if-exists del-port 4b26fd63-e779-49c9-b419-ac2c53ef8c9a ....), the network connection comes back, even without restarting the services.

Restarting OVS on the Hyper-V host also restores the network connection.

I've also captured the OVS debug RPC log (from another migration attempt), and it contains "protocols":"OpenFlow10","datapath_version":"<unknown>"}

2016-11-21T13:10:55.017Z|04642|netdev_windows|DBG|construct device 4b26fd63-e779-49c9-b419-ac2c53ef8c9a, ovstype: 0.
2016-11-21T13:10:55.017Z|04643|dpif|WARN|system@ovs-system: failed to add 4b26fd63-e779-49c9-b419-ac2c53ef8c9a as port: Invalid argument

Another try:
2016-11-21T12:48:21.789Z|00444|dpif|DBG|system@ovs-system: device br-tun is on port 5
2016-11-21T12:48:21.789Z|00445|netlink_socket|DBG|received NAK error=0 (No such device)
2016-11-21T12:48:21.789Z|00446|netdev_windows|DBG|construct device 4b26fd63-e779-49c9-b419-ac2c53ef8c9a, ovstype: 0.
2016-11-21T12:48:21.789Z|00447|netlink_socket|DBG|received NAK error=0 (No such device)
2016-11-21T12:48:21.789Z|00448|netlink_socket|DBG|received NAK error=0 (Invalid argument)
2016-11-21T12:48:21.789Z|00449|dpif|WARN|system@ovs-system: failed to add 4b26fd63-e779-49c9-b419-ac2c53ef8c9a as port: Invalid argument

https://paste.ubuntu.com/23511010/
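For anyone triaging similar logs, here is a minimal sketch (not part of the original report) that pulls the "failed to add ... as port" warnings out of an ovs-vswitchd.log. It assumes the pipe-separated line layout seen in the excerpts above; the function name failed_port_adds is hypothetical.

```python
import re

# Assumed layout, taken from the excerpts above:
#   <timestamp>|<sequence>|<module>|<level>|<message>
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z)\|(?P<seq>\d+)\|"
    r"(?P<module>[^|]+)\|(?P<level>[A-Z]+)\|(?P<msg>.*)"
)
FAIL_RE = re.compile(r"failed to add (?P<port>\S+) as port: (?P<reason>.+)")


def failed_port_adds(lines):
    """Return (timestamp, port, reason) for each WARN about a failed port add."""
    results = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed or parsed.group("level") != "WARN":
            continue
        failure = FAIL_RE.search(parsed.group("msg"))
        if failure:
            results.append(
                (parsed.group("ts"), failure.group("port"),
                 failure.group("reason").strip())
            )
    return results


# Two lines copied from the log excerpts in this report.
sample = [
    "2016-11-21T12:48:21.789Z|00444|dpif|DBG|system@ovs-system: "
    "device br-tun is on port 5",
    "2016-11-21T13:10:55.017Z|04643|dpif|WARN|system@ovs-system: "
    "failed to add 4b26fd63-e779-49c9-b419-ac2c53ef8c9a as port: Invalid argument",
]

for ts, port, reason in failed_port_adds(sample):
    print(ts, port, reason)
```

To scan a whole file, pass an open file handle instead of the sample list.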

Claudiu Belu (cbelu)
no longer affects: nova
Claudiu Belu (cbelu) wrote :

Hello,

There are a few logs that I would like to see:

I see that you've attached the nova-compute.log from the source. Can you also include the nova-compute.log from the destination node? The live-migration operation is a 2-node operation, and it is logged on both nodes.

Also, the neutron-ovs-agent.log files from both nodes, during the live migration.

I will also try to replicate the issue.

Thanks,

Belu Claudiu

semen (smesilov) wrote :

Hello,
Thanks for the quick reply. All logs are below.

11(14):56 - 11(14):57 - restarted nova-compute, neutron-ovs-agent, openvswitch service.
11(14):59 - 12(15):00 - created new instance 'hypervinstance'. Source host: bc11. Network: vxlan.

Instance ID: 1a7da68c-6c24-40fe-9d96-883e26808936
OVS Port: 9a515dfc-85ad-444f-9fd9-39f6a796c56e
ovs-vsctl show output on bc11 (source): https://paste.ubuntu.com/23521848/
The VM gets an IP address via DHCP and the connection is OK.

12(15):07 - migrating instance 1a7da68c-6c24-40fe-9d96-883e26808936 to host bc12 (target).
ovs-vsctl show output on bc12 (target): https://paste.ubuntu.com/23521863/
The VM can't get an IP address and has no connection.
The instance's OVS port on the target node has no "tag" option.
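A rough way to spot the missing "tag" is to scan saved `ovs-vsctl show` output for ports that have no tag line. The sketch below is only an illustration: the sample text is hypothetical (not the content of the pastes above), the function name ports_without_tag is made up, and it assumes the usual indented Bridge / Port / tag layout of `ovs-vsctl show`.

```python
def ports_without_tag(show_output):
    """Return (bridge, port) pairs that have no 'tag:' line in `ovs-vsctl show` text."""
    tagged = {}      # (bridge, port) -> True once a tag line is seen
    bridge = None
    port = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            bridge = line[len("Bridge "):].strip('"')
            port = None
        elif line.startswith("Port "):
            port = line[len("Port "):].strip('"')
            tagged[(bridge, port)] = False
        elif line.startswith("tag:") and port is not None:
            tagged[(bridge, port)] = True
    return [key for key, has_tag in tagged.items() if not has_tag]


# Hypothetical sample: the first port mirrors the symptom described above,
# the second is an invented port that does carry a tag.
SAMPLE = """\
    Bridge br-int
        Port "9a515dfc-85ad-444f-9fd9-39f6a796c56e"
            Interface "9a515dfc-85ad-444f-9fd9-39f6a796c56e"
        Port "hypothetical-tagged-port"
            tag: 1
            Interface "hypothetical-tagged-port"
"""

for bridge_name, port_name in ports_without_tag(SAMPLE):
    print(f"{bridge_name}: port {port_name} has no tag")
```

Internal and patch ports normally have no tag either, so only untagged instance ports are of interest here.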

Zip attached:

23.11.2016 15:15 5 370 627 BC11_neutron-ovs-agent.log
21.11.2016 17:45 623 BC11_neutron_ovs_agent.conf
23.11.2016 15:15 561 876 BC11_nova-compute.log
23.11.2016 10:08 3 089 BC11_nova.conf
23.11.2016 15:14 5 811 990 BC11_ovs-vswitchd.log
23.11.2016 15:12 5 790 091 BC12_neutron-ovs-agent.log
23.11.2016 14:53 622 BC12_neutron_ovs_agent.conf
23.11.2016 15:12 495 012 BC12_nova-compute.log
23.11.2016 14:53 3 013 BC12_nova.conf
23.11.2016 15:13 6 608 400 BC12_ovs-vswitchd.log

Claudiu Belu (cbelu)
Changed in compute-hyperv:
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Claudiu Belu (cbelu)
importance: High → Undecided
assignee: Claudiu Belu (cbelu) → nobody
status: Fix Committed → New
Claudiu Belu (cbelu)
Changed in compute-hyperv:
status: New → Incomplete
Claudiu Belu (cbelu) wrote :

Cannot reproduce the error.

You should upgrade to a new Windows OVS version, as well as a new OpenStack version (currently only Newton and Ocata are the supported stable releases).

Make sure that the neutron-ovs-agent is configured to have:

[OVS]
of_interface = ovs-ofctl
ovsdb_interface = vsctl
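As a quick sanity check, the two settings above can be verified against a copy of the agent configuration. This is a minimal sketch using Python's standard configparser rather than oslo.config, so it assumes the file is plain INI; the function name check_ovs_section is hypothetical.

```python
import configparser

# The values recommended in the comment above.
EXPECTED = {"of_interface": "ovs-ofctl", "ovsdb_interface": "vsctl"}


def check_ovs_section(conf_text):
    """Return {option: actual_value} for every option that differs from EXPECTED."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # Accept either [OVS] or [ovs] as the section name.
    section = next((s for s in parser.sections() if s.lower() == "ovs"), None)
    problems = {}
    for option, wanted in EXPECTED.items():
        actual = parser.get(section, option, fallback=None) if section else None
        if actual != wanted:
            problems[option] = actual
    return problems


good = "[OVS]\nof_interface = ovs-ofctl\novsdb_interface = vsctl\n"
print(check_ovs_section(good))
```

An empty result means both options match; any entry in the result shows the value actually found (None if the option is absent).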

Launchpad Janitor (janitor) wrote :

[Expired for compute-hyperv because there has been no activity for 60 days.]

Changed in compute-hyperv:
status: Incomplete → Expired