live_migration failing due to network filter not found

Bug #746821 reported by Tushar Patil
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Kei Masumoto
Milestone: 2011.2

Bug Description

Steps to reproduce:

nova.conf settings
use_cow_images=False

1) First I created an instance, which is running on host ubuntu-openstack-compute-02:
root@ubuntu-openstack-network-api-server:/home/tpatil# euca-describe-instances
RESERVATION r-z0ks6nqu admin default
INSTANCE i-00000030 ami-0000000b 10.0.0.5 10.0.0.5 running flat (admin, ubuntu-openstack-compute-02) 0 m1.tiny 2011-03-31T22:33:25Z nova

2) Migrate the instance running on ubuntu-openstack-compute-02 to another host, ubuntu-openstack-compute-01, using the nova-manage command:
$nova/bin/nova-manage --flagfile=/etc/nova/nova.conf vm live_migration i-00000030 ubuntu-openstack-compute-01
Migration of i-00000030 initiated.Check its progress using euca-describe-instances.

3) Error in nova-compute.log on the destination host:
2011-03-31 15:42:34,081 DEBUG nova.rpc [-] received {'_msg_id': '1e11547d29734cdf8419a518268622b1', '_context_read_deleted': False, '_context_request_id': 'VPPW4LMUT3B8XDVOO92Y', '_context_timestamp': '2011-03-31T22:42:05Z', '_context_is_admin': True, '_context_user': None, 'method': 'create_shared_storage_test_file', '_context_project': None, '_context_remote_address': None} from (pid=12463) _receive /home/tpatil/nova/nova/rpc.py:169
2011-03-31 15:42:34,082 DEBUG nova.rpc [-] unpacked context: {'timestamp': '2011-03-31T22:42:05Z', 'remote_address': None, 'project': None, 'is_admin': True, 'user': None, 'request_id': 'VPPW4LMUT3B8XDVOO92Y', 'read_deleted': False} from (pid=12463) _unpack_context /home/tpatil/nova/nova/rpc.py:333
2011-03-31 15:42:34,206 DEBUG nova.compute.manager [-] Creating tmpfile /nova/instances/tmp2FvNq8 to notify to other compute nodes that they should mount the same storage. from (pid=12463) create_shared_storage_test_file /home/tpatil/nova/nova/compute/manager.py:812
2011-03-31 15:42:34,814 DEBUG nova.rpc [-] received {'_msg_id': '0664ca60960a4882841a5b2eb313cf33', '_context_read_deleted': False, '_context_request_id': 'VPPW4LMUT3B8XDVOO92Y', 'args': {'filename': 'tmp2FvNq8'}, '_context_is_admin': True, '_context_timestamp': '2011-03-31T22:42:05Z', '_context_user': None, 'method': 'cleanup_shared_storage_test_file', '_context_project': None, '_context_remote_address': None} from (pid=12463) _receive /home/tpatil/nova/nova/rpc.py:169
2011-03-31 15:42:34,815 DEBUG nova.rpc [-] unpacked context: {'timestamp': '2011-03-31T22:42:05Z', 'remote_address': None, 'project': None, 'is_admin': True, 'user': None, 'request_id': 'VPPW4LMUT3B8XDVOO92Y', 'read_deleted': False} from (pid=12463) _unpack_context /home/tpatil/nova/nova/rpc.py:333
2011-03-31 15:42:35,118 DEBUG nova.rpc [-] received {'_msg_id': 'eb6b2604e0634b18b2f12b8b1477cf64', '_context_read_deleted': False, '_context_request_id': 'VPPW4LMUT3B8XDVOO92Y', 'args': {'cpu_info': '{"vendor": "Intel", "model": "Nehalem", "arch": "x86_64", "features": ["rdtscp", "xtpr", "tm2", "est", "vmx", "ds_cpl", "monitor", "pbe", "tm", "ht", "ss", "acpi", "ds", "vme"], "topology": {"cores": "4", "threads": "2", "sockets": "1"}}'}, '_context_is_admin': True, '_context_timestamp': '2011-03-31T22:42:05Z', '_context_user': None, 'method': 'compare_cpu', '_context_project': None, '_context_remote_address': None} from (pid=12463) _receive /home/tpatil/nova/nova/rpc.py:169
2011-03-31 15:42:35,118 DEBUG nova.rpc [-] unpacked context: {'timestamp': '2011-03-31T22:42:05Z', 'remote_address': None, 'project': None, 'is_admin': True, 'user': None, 'request_id': 'VPPW4LMUT3B8XDVOO92Y', 'read_deleted': False} from (pid=12463) _unpack_context /home/tpatil/nova/nova/rpc.py:333
2011-03-31 15:42:35,119 INFO nova.virt.libvirt_conn [-] Instance launched has CPU info:
{"vendor": "Intel", "model": "Nehalem", "arch": "x86_64", "features": ["rdtscp", "xtpr", "tm2", "est", "vmx", "ds_cpl", "monitor", "pbe", "tm", "ht", "ss", "acpi", "ds", "vme"], "topology": {"cores": "4", "threads": "2", "sockets": "1"}}
2011-03-31 15:42:35,148 INFO nova.virt.libvirt_conn [-] to xml...
:<cpu>
    <arch>x86_64</arch>
    <model>Nehalem</model>
    <vendor>Intel</vendor>
    <topology sockets="1" cores="4" threads="2"/>
    <features name="rdtscp" />
    <features name="xtpr" />
    <features name="tm2" />
    <features name="est" />
    <features name="vmx" />
    <features name="ds_cpl" />
    <features name="monitor" />
    <features name="pbe" />
    <features name="tm" />
    <features name="ht" />
    <features name="ss" />
    <features name="acpi" />
    <features name="ds" />
    <features name="vme" />
</cpu>
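
For context on this step: compare_cpu turns the source host's cpu_info JSON into the <cpu> XML shown above and asks the destination's libvirt whether its own CPU can run the instance. A minimal sketch of that check, assuming the libvirt-python bindings (the helper name and the simplified XML rendering are mine, not nova's):

    import json

    import libvirt


    def check_cpu_compatible(conn, cpu_info_json):
        """Render source cpu_info JSON to <cpu> XML and compare via libvirt."""
        info = json.loads(cpu_info_json)
        topo = info['topology']
        features = ''.join('<feature name="%s"/>' % f
                           for f in info.get('features', []))
        xml = ('<cpu><arch>%s</arch><model>%s</model><vendor>%s</vendor>'
               '<topology sockets="%s" cores="%s" threads="%s"/>%s</cpu>'
               % (info['arch'], info['model'], info['vendor'],
                  topo['sockets'], topo['cores'], topo['threads'], features))
        # compareCPU returns VIR_CPU_COMPARE_ERROR (-1), _INCOMPATIBLE (0),
        # _IDENTICAL (1) or _SUPERSET (2); anything above INCOMPATIBLE is OK.
        return conn.compareCPU(xml, 0) > libvirt.VIR_CPU_COMPARE_INCOMPATIBLE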

2011-03-31 15:42:35,730 DEBUG nova.rpc [-] received {'_msg_id': 'b6fca21f60c34221aa3491702fe13253', '_context_read_deleted': False, '_context_request_id': 'VPPW4LMUT3B8XDVOO92Y', 'args': {'instance_id': 48}, '_context_is_admin': True, '_context_timestamp': '2011-03-31T22:42:05Z', '_context_user': None, 'method': 'pre_live_migration', '_context_project': None, '_context_remote_address': None} from (pid=12463) _receive /home/tpatil/nova/nova/rpc.py:169
2011-03-31 15:42:35,730 DEBUG nova.rpc [-] unpacked context: {'timestamp': '2011-03-31T22:42:05Z', 'remote_address': None, 'project': None, 'is_admin': True, 'user': None, 'request_id': 'VPPW4LMUT3B8XDVOO92Y', 'read_deleted': False} from (pid=12463) _unpack_context /home/tpatil/nova/nova/rpc.py:333
2011-03-31 15:42:35,891 INFO nova.compute.manager [-] i-00000030 has no volume.
2011-03-31 15:42:35,919 DEBUG nova.utils [-] Running cmd (subprocess): ip link show dev vlan100 from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:35,926 DEBUG nova.utils [-] Running cmd (subprocess): ip link show dev br100 from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:35,933 DEBUG nova.utils [-] Running cmd (subprocess): sudo route -n from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:35,944 DEBUG nova.utils [-] Running cmd (subprocess): sudo ip addr show dev vlan100 scope global from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:35,954 DEBUG nova.utils [-] Running cmd (subprocess): sudo brctl addif br100 vlan100 from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:35,963 DEBUG nova.utils [-] Result was 1 from (pid=12463) execute /home/tpatil/nova/nova/utils.py:166
2011-03-31 15:42:36,004 INFO nova [-] called setup_basic_filtering in nwfilter
2011-03-31 15:42:36,005 INFO nova [-] ensuring static filters
2011-03-31 15:42:36,225 INFO nova [-] <nova.db.sqlalchemy.models.SecurityGroupIngressRule object at 0x44e64d0>
2011-03-31 15:42:36,225 INFO nova [-] <nova.db.sqlalchemy.models.SecurityGroupIngressRule object at 0x44e6850>
2011-03-31 15:42:36,247 DEBUG nova.utils [-] Attempting to grab semaphore "iptables" for method "apply"... from (pid=12463) inner /home/tpatil/nova/nova/utils.py:594
2011-03-31 15:42:36,247 DEBUG nova.utils [-] Attempting to grab file lock "iptables" for method "apply"... from (pid=12463) inner /home/tpatil/nova/nova/utils.py:599
2011-03-31 15:42:36,269 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-save -t filter from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:36,281 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-restore from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:36,301 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-save -t nat from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:36,312 DEBUG nova.utils [-] Running cmd (subprocess): sudo iptables-restore from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:36,323 DEBUG nova.utils [-] Running cmd (subprocess): sudo ip6tables-save -t filter from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:42:36,335 DEBUG nova.utils [-] Running cmd (subprocess): sudo ip6tables-restore from (pid=12463) execute /home/tpatil/nova/nova/utils.py:150
2011-03-31 15:43:05,624 ERROR nova [-] Exception during message handling
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE: File "/home/tpatil/nova/nova/rpc.py", line 190, in _receive
(nova): TRACE: rval = node_func(context=ctxt, **node_args)
(nova): TRACE: File "/home/tpatil/nova/nova/compute/manager.py", line 908, in pre_live_migration
(nova): TRACE: self.driver.ensure_filtering_rules_for_instance(instance_ref)
(nova): TRACE: File "/home/tpatil/nova/nova/virt/libvirt_conn.py", line 1414, in ensure_filtering_rules_for_instance
(nova): TRACE: raise exception.Error(msg % locals())
(nova): TRACE: Error: Timeout migrating for i-00000030(instance-00000030)
(nova): TRACE:
2011-03-31 15:43:05,625 ERROR nova.rpc [-] Returning exception Timeout migrating for i-00000030(instance-00000030) to caller
2011-03-31 15:43:05,626 ERROR nova.rpc [-] ['Traceback (most recent call last):\n', ' File "/home/tpatil/nova/nova/rpc.py", line 190, in _receive\n rval = node_func(context=ctxt, **node_args)\n', ' File "/home/tpatil/nova/nova/compute/manager.py", line 908, in pre_live_migration\n self.driver.ensure_filtering_rules_for_instance(instance_ref)\n', ' File "/home/tpatil/nova/nova/virt/libvirt_conn.py", line 1414, in ensure_filtering_rules_for_instance\n raise exception.Error(msg % locals())\n', 'Error: Timeout migrating for i-00000030(instance-00000030)\n']
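
The timeout above comes from a polling loop: pre_live_migration on the destination sets up the filtering rules and then repeatedly looks up the instance's nwfilter by name until it appears or the retries run out. A minimal sketch of that loop, assuming the libvirt-python bindings (the retry count, sleep interval, and exact exception raised are assumptions, not the exact nova code):

    import time

    import libvirt


    def ensure_filtering_rules_for_instance(conn, instance_name,
                                            max_retry=30, wait=1):
        # The flat, pre-multi-NIC filter name; this is exactly the name
        # libvirt reports as missing in the console messages in step 4.
        filter_name = 'nova-instance-%s' % instance_name
        for _ in range(max_retry):
            try:
                conn.nwfilterLookupByName(filter_name)
                return  # filter is defined; safe to migrate
            except libvirt.libvirtError:
                time.sleep(wait)  # not defined yet; poll again
        raise RuntimeError('Timeout migrating for %s' % instance_name)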

4) I could also see the following console messages on the destination host (on the terminal running the nova-compute service):
libvir: Network Filter error : Network filter not found: no nwfilter with matching name 'nova-instance-instance-00000030'
libvir: Network Filter error : Network filter not found: no nwfilter with matching name 'nova-instance-instance-00000030'
libvir: Network Filter error : Network filter not found: no nwfilter with matching name 'nova-instance-instance-00000030'
libvir: Network Filter error : Network filter not found: no nwfilter with matching name 'nova-instance-instance-00000030'
libvir: Network Filter error : Network filter not found: no nwfilter with matching name 'nova-instance-instance-00000030'
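
Each of those messages is one failed nwfilterLookupByName call from the loop sketched above; libvirt prints the error to the console as well as raising it. To see which filters libvirt actually has defined on the destination, something like this works (a hypothetical diagnostic, not part of nova):

    import libvirt

    conn = libvirt.open('qemu:///system')
    # With the multi-NIC naming, per-NIC filters such as
    # nova-instance-instance-00000030-<nic> may be defined even though
    # the flat name being polled for never appears.
    for name in conn.listNWFilters():
        if name.startswith('nova-instance'):
            print(name)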

Tushar Patil (tpatil) wrote:

I am getting this problem in revision no. 925.

Kei Masumoto (masumotok) wrote:

Hold on, I'm reproducing...

Changed in nova:
assignee: nobody → Kei Masumoto (masumotok)
Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → High
status: New → Confirmed
Kei Masumoto (masumotok) wrote:

In the case where an instance has only one NIC, this bug never occurs. According to the error log, live migration needs to follow the multi-NIC changes.

Kei Masumoto (masumotok) wrote:

I can reproduce this issue. Here is a detailed explanation.

The procedure for live migration is:
1. The destination compute node prepares volume/security-group/etc. config for the incoming instance.
2. The source compute node migrates the instance to the destination compute node.
3. The source compute node updates the DB records.

Regarding step 1 above, the destination compute node creates the nwfilter nova-instance-instance-<id> through libvirt.
But because of the multi-NIC changes, nova-instance-instance-<id>-<nic> is necessary, so some changes to nova.virt.libvirt_conn.LibvirtConnection are needed.

Detailed changes are shown at lp:~nttdata/nova/lp746821; a rough sketch of the naming direction follows below.
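
A minimal sketch of that direction (only the per-NIC naming scheme is taken from this comment; the helper names are mine, not the branch's):

    import libvirt


    def instance_filter_name(instance_name, nic_id=None):
        # Pre-multi-NIC: one flat filter per instance.
        if nic_id is None:
            return 'nova-instance-%s' % instance_name
        # Post-multi-NIC: one filter per (instance, NIC) pair.
        return 'nova-instance-%s-%s' % (instance_name, nic_id)


    def filters_defined_for_instance(conn, instance_name, nic_ids):
        # The destination-side wait has to check every per-NIC filter
        # instead of the flat name, which is never created any more.
        for nic_id in nic_ids:
            try:
                conn.nwfilterLookupByName(
                    instance_filter_name(instance_name, nic_id))
            except libvirt.libvirtError:
                return False
        return True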

Changed in nova:
status: Confirmed → In Progress
Thierry Carrez (ttx)
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released