[regression] Nova Live migration doesn't work as expected: Failed 1 OSTF test: Instance live migration (failure)

Bug #1576688 reported by EduardFazliev
This bug affects 2 people
Affects              Status        Importance  Assigned to       Milestone
Fuel for OpenStack   Fix Released  High        Albert Syriy
8.0.x                Fix Released  High        Denis Meltsaykin
Mitaka               Fix Released  High        Albert Syriy
Newton               Fix Released  High        Albert Syriy

Bug Description

Detailed bug description:

Traceback (most recent call last):
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-devops-venv/local/lib/python2.7/site-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-devops-venv/local/lib/python2.7/site-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-devops-venv/local/lib/python2.7/site-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-qa/system_test/core/factory.py", line 37, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-qa/system_test/helpers/decorators.py", line 40, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-qa/system_test/actions/ostf_actions.py", line 46, in health_check
    failed_test_name=getattr(self, 'failed_test_name', None))
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-qa/fuelweb_test/__init__.py", line 59, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-qa/fuelweb_test/models/fuel_web_client.py", line 1209, in run_ostf
    failed_test_name=failed_test_name, test_sets=test_sets)
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-qa/fuelweb_test/__init__.py", line 59, in wrapped
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0_create_param_environment/fuel-qa/fuelweb_test/models/fuel_web_client.py", line 273, in assert_ostf_run
    indent=1)))
AssertionError: Failed 1 OSTF tests; should fail 0 tests. Names of failed tests:
  - Instance live migration (failure) Actual value - node-2.test.domain.local,

Steps to reproduce:
Deploy ISO 258 9.0_all (http://srv41-bud.infra.mirantis.net/fuelweb-iso/fuel-9.0-258-2016-04-28_14-00-00.iso) with fuel-qa and fuel-devops 2.9.20 with configuration:
    - 3 controllers, 2 compute
    - VLAN
    - DVR
    - Sahara
    - Ceilometer with Mongo
    - Ceph, Rados

Expected results:
All OSTF tests pass.

Actual result:
Failed 1 OSTF test:
  - Instance live migration
Target component: Nova
Scenario:
1. Create a new security group.
2. Create an instance using the new security group.
3. Assign a floating IP.
4. Check instance connectivity via the floating IP.
5. Find a host to migrate to.
6. Migrate the instance.
7. Check the instance host (NB: this step fails during the health check).
8. Check connectivity to the migrated instance via the floating IP.
9. Remove the floating IP.
10. Delete the instance.

Reproducibility:
This error persists across many deployments.

Workaround:
None at the moment

Impact:
Deployment cannot be completed.

Description of the environment:
{"release": "9.0", "auth_required": true, "api": "1", "openstack_version": "mitaka-9.0", "feature_groups": []}

Operating system: Linux nailgun.test.domain.local 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Reference architecture: HA

Network architecture: Neutron with VLAN segmentation

Additional information:
diagnostic-snapshot is attached

Revision history for this message
EduardFazliev (efazliev) wrote :
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
milestone: none → 9.0
tags: added: blocker-for-qa
Revision history for this message
Yury Tregubov (ytregubov) wrote :

Similar nova live migration tests have been failing on CI since about 9.0 mitaka iso #262:
https://mirantis.testrail.com/index.php?/tests/view/5187950&group_by=cases:section_id&group_order=asc&group_id=23085

The tests fail while waiting for the VM's hypervisor to change.
The actual wait code is https://github.com/Mirantis/mos-integration-tests/blob/master/mos_tests/nova/nova_test.py#L530

The tests passed on 9.0 mitaka iso #254: https://mirantis.testrail.com/index.php?/tests/view/4938575

summary: - Failed 1 OSTF test: Instance live migration (failure)
+ [regression] Nova Live migration doesn't work as expected: Failed 1 OSTF
+ test: Instance live migration (failure)
Revision history for this message
Oleksiy Butenko (obutenko) wrote :

This bug affects Tempest test results; marked as a blocker for the QA team.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Migration fails because one libvirtd daemon can't connect to another:

2016-04-29 12:10:51.162 16241 ERROR nova.virt.libvirt.driver [req-01a6cad4-5d70-4086-8554-907d48777914 b58b04ab93924d3fa382e1ee0f17ae1f 0d91ed8b75bc4f648368639732ba8c64 - - -] [instance: 8091d537-b975-4781-8680-e0abdd834255] Live Migration failure: operation failed: Failed to connect to remote libvirt URI qemu+tcp://node-3.test.domain.local/system: unable to connect to server at 'node-3.test.domain.local:16509': Connection refused

We are trying to find an environment to check why that happens.

Revision history for this message
Nastya Urlapova (aurlapova) wrote :
tags: added: swarm-blocker
Revision history for this message
Nastya Urlapova (aurlapova) wrote :

@Roma, you can ask Aleksey Stepanov, he can provide env for debug.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

I've checked Aleksey's environment and see that libvirtd *does not* listen on a TCP socket at all:

http://paste.openstack.org/show/496057/

At the same time *it is* configured to listen for TCP connections in /etc/libvirt/libvirtd.conf and /etc/default/libvirtd:

root@node-2:~# cat /etc/default/libvirtd
# Defaults for libvirtd initscript (/etc/init.d/libvirtd)
# This is a POSIX shell fragment

# Start libvirtd to handle qemu/kvm:
start_libvirtd="yes"

# options passed to libvirtd, add "-l" to listen on tcp
libvirtd_opts="-d -l"

# pass in location of kerberos keytab
#export KRB5_KTNAME=/etc/libvirt/libvirt.keytab

# Whether to mount a systemd like cgroup layout (only
# useful when not running systemd)
#mount_cgroups=yes
# Which cgroups to mount
#cgroups="memory devices"

root@node-2:~# cat /etc/libvirt/libvirtd.conf | grep listen
# Flag listening for secure TLS connections on the public TCP/IP port.
# NB, must pass the --listen flag to the libvirtd process for this to
listen_tls = 0
# NB, must pass the --listen flag to the libvirtd process for this to
listen_tcp = 1
#listen_addr = "192.168.0.1"

Based on the ps output, the -l and -d options were ignored on start, and -f (config file path) was not passed to libvirtd either. So it runs with default settings.

My current understanding is that https://review.fuel-infra.org/gitweb?p=packages/trusty/libvirt.git;a=commit;h=b109ae4033b0eb02adfee7c55d163081975e7551 changed the way libvirtd is started: options specified in /etc/default/libvirtd are ignored, and /etc/libvirt/libvirtd.conf is not passed to the libvirtd executable on start.

MOS Linux team, please take a look at this. Our puppet manifests specifically configure libvirtd to start with -l (listen) and -d (daemon) options.
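The expected init behaviour described above can be sketched in shell: the defaults file is meant to be sourced and its libvirtd_opts passed through to the daemon. This is a minimal sketch, assuming the file format quoted above; the /tmp path and the final echo are illustrative stand-ins for the real init script.

```shell
# Recreate a minimal /etc/default/libvirtd in a scratch location
# (assumption: /tmp path is illustrative; contents match the file quoted above).
cat > /tmp/libvirtd.default <<'EOF'
start_libvirtd="yes"
libvirtd_opts="-d -l"
EOF

# An init script is expected to source the defaults file...
. /tmp/libvirtd.default

# ...and hand the options to the daemon; echo stands in for the real exec.
echo "would start: /usr/sbin/libvirtd $libvirtd_opts"
```

If the sourcing step is skipped, $libvirtd_opts is empty and libvirtd starts without -l, which matches the missing TCP listener observed in the paste above.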

tags: added: area-linux
Revision history for this message
Timofey Durakov (tdurakov) wrote :

Further investigation shows that /etc/puppet/modules/nova/manifests/migration/libvirt.pp doesn't work and these lines aren't applied to the env:

file_line { '/etc/sysconfig/libvirtd libvirtd args':
  path  => '/etc/sysconfig/libvirtd',
  line  => 'LIBVIRTD_ARGS="--listen"',
  match => 'LIBVIRTD_ARGS=',
  tag   => 'libvirt-file_line',
}

which causes /etc/init/libvirtd.conf to fail to import the '--listen' flag. There are 2 options to fix this:
1. Make the puppet script work properly.
2. Switch to using /etc/default/libvirtd instead of /etc/sysconfig/libvirtd. In this case upstart should source it instead, and also switch to using the libvirtd_opts variable instead of LIBVIRTD_ARGS, since /etc/default/libvirtd defines libvirtd_opts.
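For illustration, the effect the file_line resource above should have can be reproduced with sed against a scratch copy (assumption: the /tmp path and the seed value are illustrative; the real target is /etc/sysconfig/libvirtd).

```shell
# Seed a scratch copy with an empty LIBVIRTD_ARGS line
# (assumption: /tmp path and seed value are illustrative).
printf 'LIBVIRTD_ARGS=""\n' > /tmp/sysconfig-libvirtd

# file_line with a `match` replaces the matching line in place,
# which is what this sed does:
sed -i 's/^LIBVIRTD_ARGS=.*/LIBVIRTD_ARGS="--listen"/' /tmp/sysconfig-libvirtd

cat /tmp/sysconfig-libvirtd
```

When the resource is never applied, as in this bug, the matched line keeps its old value and upstart has no --listen flag to import.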

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/trusty/libvirt (master)

Fix proposed to branch: master
Change author: Dmitry Teselkin <email address hidden>
Review: https://review.fuel-infra.org/20345

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

We've applied a temporary workaround for the tests, so it is no longer a blocker for QA, but it is still a critical issue for customers and we need to fix it in MOS 9.0.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Reproduced the same issue (http://paste.openstack.org/show/496861/) on an env with the configuration:
Nodes:
Controller 1
Compute 2
Cinder 1 - LVM
Ceph OSD 1 - Ceph for Ephemeral Volumes

Network settings: Neutron VLAN, TLS off.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

My observation was done on iso 278:
# shotgun2 short-report
cat /etc/fuel_build_id:
 278
cat /etc/fuel_build_number:
 278
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6343.noarch
 fuel-misc-9.0.0-1.mos8332.noarch
 fuel-mirror-9.0.0-1.mos133.noarch
 shotgun-9.0.0-1.mos88.noarch
 fuel-openstack-metadata-9.0.0-1.mos8677.noarch
 fuel-notify-9.0.0-1.mos8332.noarch
 fuel-ostf-9.0.0-1.mos930.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8677.noarch
 python-fuelclient-9.0.0-1.mos313.noarch
 fuel-9.0.0-1.mos6343.noarch
 fuel-utils-9.0.0-1.mos8332.noarch
 fuel-nailgun-9.0.0-1.mos8677.noarch
 rubygem-astute-9.0.0-1.mos742.noarch
 fuel-library9.0-9.0.0-1.mos8332.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-agent-9.0.0-1.mos274.noarch
 fuel-ui-9.0.0-1.mos2676.noarch
 fuel-setup-9.0.0-1.mos6343.noarch
 nailgun-mcagents-9.0.0-1.mos742.noarch
 python-packetary-9.0.0-1.mos133.noarch
 fuelmenu-9.0.0-1.mos269.noarch
 fuel-bootstrap-cli-9.0.0-1.mos274.noarch
 fuel-migrate-9.0.0-1.mos8332.noarch

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

This issue blocks acceptance testing of MOS 9.0 (more than 30 test cases with live migration).
Marked as blocker-for-qa.

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

This defect blocks 30+ acceptance tests for Nova with live migration.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/trusty/libvirt (master)

Reviewed: https://review.fuel-infra.org/20345
Submitter: Igor Yozhikov <email address hidden>
Branch: master

Commit: eb63fba049c18cd2ea860b8c9ef05c04c0f10dd5
Author: Dmitry Teselkin <email address hidden>
Date: Fri May 13 10:34:41 2016

Fix reading libvirt default config on start

On debian systems /etc/default/libvirtd should be used
for setting daemon configuration options. The important
option is -l [listen]. The second config file
/etc/libvirt/libvirtd.conf is used as well.
The root cause of the LP#1576688 was missing of reading
config parameters from config files when migrated to the
Upstart

While at it make sure /var/run/libvirt directory exists
before starting libvirtd (the content of /var/run is not
preserved across reboots) so the daemon can start after
rebooting the host.

Change-Id: I1cd150a9285244d19d64e44737caf89d38703f23
Closes-bug: #1576688
Closes-bug: #1543951

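The merged change can be pictured as an Upstart job along these lines. This is a hypothetical config sketch reconstructed from the commit message above, not the exact merged packaging code:

```
# /etc/init/libvirtd.conf -- hypothetical sketch, not the merged code
description "libvirt daemon"
start on runlevel [2345]
stop on runlevel [!2345]

pre-start script
    # /var/run is not preserved across reboots; recreate the run dir
    mkdir -p /var/run/libvirt
end script

script
    # read daemon options (e.g. libvirtd_opts="-d -l") from the defaults file
    if [ -f /etc/default/libvirtd ]; then
        . /etc/default/libvirtd
    fi
    exec /usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf $libvirtd_opts
end script
```

This matches the verified ps output later in the thread, where libvirtd runs with --config /etc/libvirt/libvirtd.conf -l.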
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/trusty/libvirt (9.0)

Fix proposed to branch: 9.0
Change author: Dmitry Teselkin <email address hidden>
Review: https://review.fuel-infra.org/20674

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/trusty/libvirt (9.0)

Reviewed: https://review.fuel-infra.org/20674
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0

Commit: 5639d6ba7d73febec521ca8548edb8670d4c6b8f
Author: Dmitry Teselkin <email address hidden>
Date: Mon May 16 07:44:07 2016

Fix reading libvirt default config on start

On debian systems /etc/default/libvirtd should be used
for setting daemon configuration options. The important
option is -l [listen]. The second config file
/etc/libvirt/libvirtd.conf is used as well.
The root cause of the LP#1576688 was missing of reading
config parameters from config files when migrated to the
Upstart

While at it make sure /var/run/libvirt directory exists
before starting libvirtd (the content of /var/run is not
preserved across reboots) so the daemon can start after
rebooting the host.

Change-Id: I1cd150a9285244d19d64e44737caf89d38703f23
Closes-bug: #1576688
Closes-bug: #1543951
(cherry picked from commit eb63fba049c18cd2ea860b8c9ef05c04c0f10dd5)

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Fix merged ^

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/trusty/libvirt (8.0)

Fix proposed to branch: 8.0
Change author: Denis V. Meltsaykin <email address hidden>
Review: https://review.fuel-infra.org/20853

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :
Revision history for this message
Alexander Gubanov (ogubanov) wrote :

I've verified live-migration on MOS 9.0 (build 402) - fixed!
Proof:

[root@nailgun ~]# shotgun2 short-report | head -n 6
cat /etc/fuel_build_id:
 402
cat /etc/fuel_build_number:
 402
cat /etc/fuel_release:
 9.0

compute:

root@node-9:~# netstat -antp | grep libvirtd
tcp 0 0 0.0.0.0:16509 0.0.0.0:* LISTEN 8501/libvirtd
tcp6 0 0 :::16509 :::* LISTEN 8501/libvirtd
root@node-9:~# ps aux | grep '[l]ibvirtd'
root 8501 0.0 0.3 880004 12336 ? Ssl 12:55 0:01 /usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf -l

controller:

root@node-6:~# nova hypervisor-list
+----+---------------------------+-------+---------+
| ID | Hypervisor hostname       | State | Status  |
+----+---------------------------+-------+---------+
| 1  | node-9.test.domain.local  | up    | enabled |
| 4  | node-10.test.domain.local | up    | enabled |
+----+---------------------------+-------+---------+

root@node-6:~# nova show test1 | grep hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-10.test.domain.local
root@node-6:~# nova live-migration --block-migrate test1 node-9.test.domain.local
root@node-6:~# nova show test1 | grep hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-9.test.domain.local

tags: added: on-verification
Revision history for this message
Ekaterina Shutova (eshutova) wrote :

Verified on MOS 8.0 + mu2 updates.
Env: 3 controllers + 2 computes + 3 Ceph OSD nodes

Checked live migration before and after compute restart: works fine.
root@node-7:~# nova hypervisor-list
+----+---------------------------+-------+---------+
| ID | Hypervisor hostname       | State | Status  |
+----+---------------------------+-------+---------+
| 2  | node-13.test.domain.local | up    | enabled |
| 5  | node-14.test.domain.local | up    | enabled |
+----+---------------------------+-------+---------+
root@node-7:~# nova show vm1 | grep hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-13.test.domain.local |
root@node-7:~# nova live-migration --block-migrate vm1 node-14.test.domain.local
ERROR (BadRequest): node-13.test.domain.local is not on local storage: Block migration can not be used with shared storage. (HTTP 400) (Request-ID: req-41edf6c9-d07e-433f-bfff-1a7d4e3c3b19)
root@node-7:~# nova live-migration vm1 node-14.test.domain.local
root@node-7:~# nova show vm1 | grep hyper
| OS-EXT-SRV-ATTR:hypervisor_hostname | node-14.test.domain.local |

tags: removed: on-verification