migrate_vm_backed_with_ceph system test fails

Bug #1317548 reported by Vladimir Kuklin
This bug affects 3 people
Affects: Fuel for OpenStack
Status: In Progress
Importance: Critical
Assigned to: Fuel Library (Deprecated)

Bug Description

Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/usr/lib/python2.7/unittest/case.py", line 1043, in runTest
    self._testFunc()
  File "/usr/lib/python2.7/dist-packages/proboscis/case.py", line 296, in testng_method_mistake_capture_func
    compatability.capture_type_error(s_func)
  File "/usr/lib/python2.7/dist-packages/proboscis/compatability/exceptions_2_6.py", line 27, in capture_type_error
    func()
  File "/usr/lib/python2.7/dist-packages/proboscis/case.py", line 350, in func
    func(test_case.state.get_state())
  File "/home/jenkins/workspace/master_fuelmain.system_test.centos.thread_1/fuelweb_test/helpers/decorators.py", line 49, in wrapper
    return func(*args, **kwagrs)
  File "/home/jenkins/workspace/master_fuelmain.system_test.centos.thread_1/fuelweb_test/tests/test_ceph.py", line 335, in migrate_vm_backed_with_ceph
    scenario='./fuelweb_test/helpers/instance_initial_scenario')
  File "/home/jenkins/workspace/master_fuelmain.system_test.centos.thread_1/fuelweb_test/helpers/os_actions.py", line 69, in create_server_for_migration
    timeout=100)
  File "/home/jenkins/venv-nailgun-tests/local/lib/python2.7/site-packages/devops/helpers/helpers.py", line 95, in wait
    raise TimeoutError("Waiting timed out")
TimeoutError: Waiting timed out

ostf-stdout.log

2014-05-08 07:57:40 ERROR (hooks) Pecan state <thread._local object at 0x1bfbdb0>
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/pecan/core.py", line 570, in __call__
    self.handle_request(req, resp)
  File "/usr/lib/python2.6/site-packages/pecan/core.py", line 421, in handle_request
    controller, remainder = self.route(req, self.root, path)
  File "/usr/lib/python2.6/site-packages/pecan/core.py", line 257, in route
    node, remainder = lookup_controller(node, path)
  File "/usr/lib/python2.6/site-packages/pecan/routing.py", line 60, in lookup_controller
    raise exc.HTTPNotFound
HTTPNotFound: The resource could not be found.

Vladimir Kuklin (vkuklin) wrote :
Tatyanka (tatyana-leontovich) wrote :

There is a separate method in the system test that creates the instance for migration, and as far as I can see the instance does not become Active within 100 s. OSTF is not invoked yet.
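The timeout described here can be illustrated with a small polling helper. This is only a sketch: `wait_for` and its parameters are hypothetical names chosen for illustration, not the actual `devops.helpers` code that appears in the traceback. It shows how a caller-supplied message could make a bare "Waiting timed out" easier to diagnose.

```python
import time


def wait_for(predicate, timeout, interval=1.0, message="Waiting timed out"):
    """Poll predicate() until it returns a truthy value or timeout seconds pass.

    Hypothetical helper for illustration; not the devops.helpers implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            # Include the caller's context so the log says what was awaited.
            raise TimeoutError("%s (after %s s)" % (message, timeout))
        time.sleep(interval)
```

A caller would pass something like `message="instance did not become ACTIVE"` so that the failure names the condition that was never met.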

Tatyanka (tatyana-leontovich) wrote :

It seems the issue here is that the instance does not become active within 100 seconds. Downgrading the priority to Medium and moving the bug to 5.1.

Changed in fuel:
importance: High → Medium
milestone: 5.0 → 5.1
Tatyanka (tatyana-leontovich) wrote :

The test failed by timeout because the instance failed to migrate to another compute node, with the following traces:
http://paste.openstack.org/show/80887/

Changed in fuel:
importance: Medium → High
assignee: Fuel QA Team (fuel-qa) → Fuel Library Team (fuel-library)
milestone: 5.1 → 5.0
Tatyanka (tatyana-leontovich) wrote :
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-main (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/94190

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/94190
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=c2026435eb727df73ecf4a2a118c5a848abfc429
Submitter: Jenkins
Branch: master

commit c2026435eb727df73ecf4a2a118c5a848abfc429
Author: Tatyana Leontovich <email address hidden>
Date: Mon May 19 16:13:00 2014 +0000

    Add security group creation in service tests

    add creation of separate security group
    for service tests. Also Increase timeout for
    migration vm functionality and add more logging
    to improve debug of this tests

    Change-Id: Ia310f57a98c4d4aa11138f76cd9e0913accfbebc
    Closes-Bug: #1320181
    Related-Bug: #1317548

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Borodaenko (dborodaenko)
Andrew Woodward (xarses) wrote :

Does not reproduce by hand in 212 (CentOS | Ubuntu, Neutron VLAN, 3 controllers, 2 computes + ceph-osd).

I was able to deploy the above environment, spawn an m1.tiny instance, and migrate it without problems.

Mike Scherbakov (mihgen) wrote :

The issue still persists in system tests:
 File "/home/jenkins/workspace/master_fuelmain.system_test.ubuntu.thread_1/fuelweb_test/tests/test_ceph.py", line 356, in migrate_vm_backed_with_ceph
    new_srv = os.migrate_server(srv, avail_hosts[0], timeout=120)

Could the test simply have failed to complete the migration within the allowed timeout because it takes a bit longer, because the server is under load, or for some other reason? How can we investigate it?

Changed in fuel:
assignee: Dmitry Borodaenko (dborodaenko) → Nastya Urlapova (aurlapova)
Tatyanka (tatyana-leontovich) wrote :

Reproduced manually. Mike, it cannot be that: 120 seconds should be enough for the migration. Moreover, during manual testing the instance stayed in the migrating state, with the migration task running, for more than 20 minutes. If we look at the compute hosts, we can see that a new qemu process has already started on the new host (before live migration the instance was placed on node 2 and should migrate to node 1).
On node 2 we can see a lot of problems with the migration of the network, and we can see that the migration is attempted (http://paste.openstack.org/show/81045/).
Also, the entry is still present on node 2:
[root@node-2 ~]# cat /var/lib/nova/networks/nova-br100.conf
fa:16:3e:f4:8a:99,test-serv469132.novalocal,10.0.0.2
[root@node-1 ~]# ls /var/lib/nova/instances/

[root@node-2 ~]# ls -all /var/lib/nova/instances/
total 4
drwxr-xr-x 5 nova nova 93 May 21 10:30 .
drwxr-xr-x 9 nova nova 96 May 21 09:50 ..
drwxr-xr-x 2 nova nova 42 May 21 09:51 5d55b352-6621-49dc-8406-7039a3eb0e56
drwxr-xr-x 2 nova nova 53 May 21 09:51 _base
-rw-r--r-- 1 nova nova 48 May 21 10:30 compute_nodes
drwxr-xr-x 2 nova nova 91 May 21 10:30 locks

On node 1 the instance's qemu process starts:
http://paste.openstack.org/show/81046/
Also, we cannot see any instance disk there:
[root@node-1 ~]# ls /var/lib/nova/instances/
5d55b352-6621-49dc-8406-7039a3eb0e56
[root@node-1 ~]# ls -all /var/lib/nova/instances/5d55b352-6621-49dc-8406-7039a3eb0e56/
total 4
drwxr-xr-x 2 nova nova 24 May 21 09:53 .
drwxr-xr-x 3 nova nova 49 May 21 09:53 ..
-rw-r--r-- 1 qemu qemu 66 May 21 09:59 console.log
And the same nova-br100.conf file is empty on the new node:
[root@node-1 ~]# cat /var/lib/nova/networks/nova-br100.conf
[root@node-1 ~]#
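
The comparison of the two nova-br100.conf files above can be sketched in code. This is a minimal illustration that assumes the three-field `mac,hostname,ip` layout seen in the transcript; `parse_dhcp_hosts` is a hypothetical helper, not part of nova. It lists the entries that exist on the source node but are missing on the destination.

```python
def parse_dhcp_hosts(text):
    """Parse 'mac,hostname,ip' lines into a list of dicts.

    Assumes the three-field layout seen in this report; blank lines are skipped.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mac, hostname, ip = line.split(",")
        entries.append({"mac": mac, "hostname": hostname, "ip": ip})
    return entries


# File contents copied from the node-2 (source) and node-1 (destination) output.
source = parse_dhcp_hosts("fa:16:3e:f4:8a:99,test-serv469132.novalocal,10.0.0.2")
destination = parse_dhcp_hosts("")

# Entries present on the source node but absent on the destination.
missing = [entry for entry in source if entry not in destination]
```

With the data from this comment, `missing` holds the single entry for 10.0.0.2, which matches the observation that the network configuration was never moved to the new host.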

Łukasz Oleś (loles) wrote :

I've done some testing as well. It happens only when a VirtualIP is assigned to the instance and only with nova-network.

When neutron is used, migration works without any problems.

Łukasz Oleś (loles) wrote :

Also, deleting such an instance is not possible, at least not via Horizon.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-main (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/94583

Vladimir Kuklin (vkuklin) wrote :

We need the conntrack binary installed.

For Ubuntu the package is called conntrack and is in the main repo.
For CentOS it is conntrack-tools and is available here: http://centos.alt.ru/pub/repository/centos/6/x86_64/

We should test whether everything works with these packages and add them to the ISO.
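
A check along these lines could confirm the prerequisite on each compute node. This is a sketch only: the function names are hypothetical, the package mapping covers just the two distributions named in this comment, and `shutil.which` searches the current user's PATH, which may not include the directory where conntrack is installed.

```python
import shutil

# Package names as given in this report; other distributions are not covered.
CONNTRACK_PACKAGES = {"ubuntu": "conntrack", "centos": "conntrack-tools"}


def conntrack_package(distro):
    """Return the package that provides the conntrack binary for distro."""
    try:
        return CONNTRACK_PACKAGES[distro.lower()]
    except KeyError:
        raise ValueError("no known conntrack package for %r" % distro)


def conntrack_installed():
    """Return True if a conntrack executable is found on the current PATH."""
    return shutil.which("conntrack") is not None
```

The check would need to run on every nova-compute node, source and destination alike.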

Changed in fuel:
assignee: Nastya Urlapova (aurlapova) → Matthew Mosesohn (raytrac3r)
Vladimir Kuklin (vkuklin) wrote :
Vladimir Kuklin (vkuklin) wrote :

You need to install the packages on the nova-compute nodes and test again.

Matthew Mosesohn (raytrac3r) wrote :

Confirmed fix works for CentOS. Testing Ubuntu now.

Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → Fuel OSCI Team (fuel-osci)
Matthew Mosesohn (raytrac3r) wrote :

Ubuntu works, but there is a new issue for instances with a floating IP:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 597, in _object_dispatch
    return getattr(target, method)(context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/objects/base.py", line 151, in wrapper
    return fn(self, ctxt, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/objects/floating_ip.py", line 128, in save
    updates)
  File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 386, in floating_ip_update
    return IMPL.floating_ip_update(context, address, values)
  File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 164, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 1024, in floating_ip_update
    float_ip_ref.update(values)
  File "/usr/lib/python2.7/dist-packages/nova/openstack/common/db/sqlalchemy/models.py", line 88, in
    setattr(self, k, v)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 303, in __set__
    instance_dict(instance), value, None)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 804, in set
    value = self.fire_replace_event(state, dict_, value, old, initiator)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 824, in fire_replace_ev
    value = fn(state, value, previous, initiator or self)
  File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/unitofwork.py", line 85, in set_
    newvalue_state = attributes.instance_state(newvalue)
AttributeError: 'FixedIP' object has no attribute '_sa_instance_state'

Matthew Mosesohn (raytrac3r) wrote :

Cancel the last concern. I missed conntrack on the destination compute node. It looks like this solves the issue 100%.

OSCI Robot (oscirobot) wrote :
Changed in fuel:
status: Confirmed → In Progress
OSCI Robot (oscirobot) wrote :

Package nova has been built from changeset: http://gerrit.mirantis.com/15776
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.0-stable/centos

Changed in fuel:
status: In Progress → Fix Committed
OSCI Robot (oscirobot) wrote :

Package nova has been built from changeset: http://gerrit.mirantis.com/15776
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.0-stable/ubuntu

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/94583
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=e5f90d115594b0c37c66d84507d8aaa6cf6ef970
Submitter: Jenkins
Branch: master

commit e5f90d115594b0c37c66d84507d8aaa6cf6ef970
Author: Tatyana Leontovich <email address hidden>
Date: Wed May 21 12:15:32 2014 +0000

    Add error message to the ceph migration test

    In case if assertion fails it to hard to understand
    from logs what can be caused "waiting timeout error"
    Also revert new security group creation dor service
    tests

    Change-Id: I85cfa22f476f463de4a38ac52133b01c67cd4b99
    Related-Bug: #1317548

Artem Panchenko (apanchenko-8) wrote :

The issue was reproduced on ISO #216 during system tests:

http://jenkins-product.srt.mirantis.net:8080/view/0_0_swarm/job/master_fuelmain.system_test.centos.thread_1/63/testReport/(root)/migrate_vm_backed_with_ceph/migrate_vm_backed_with_ceph/

The ISO still contains old nova packages without the 'conntrack-tools' requirement, so we need to update them in the repository mirrors.

Changed in fuel:
status: Fix Committed → Confirmed
Roman Vyalov (r0mikiam)
Changed in fuel:
status: Confirmed → Fix Committed
Roman Vyalov (r0mikiam)
Changed in fuel:
status: Fix Committed → In Progress
Tatyanka (tatyana-leontovich) wrote :

On rc2 the conntrack-tools package is installed on the compute node, but the migrating instance is still stuck in migration with the above-mentioned FixedIP error.

Ivan Kolodyazhny (e0ne) wrote :

System tests fail, but manual migration works for me if I create an instance with a volume and migrate it via the CLI (nova live-migration).

Mike Scherbakov (mihgen) wrote :

Ivan, did you have a floating IP assigned to the instance?

Changed in fuel:
importance: High → Critical
assignee: Fuel OSCI Team (fuel-osci) → Fuel Library Team (fuel-library)
status: In Progress → Confirmed
Ivan Kolodyazhny (e0ne) wrote :

Mike,
No, only a fixed IP.

Vladimir Kuklin (vkuklin) wrote :

The traceback is here: http://paste.openstack.org/show/81339/
It looks like an SQLAlchemy issue.

Vladimir Kuklin (vkuklin) wrote :

Maybe this bug is more relevant: https://bugs.launchpad.net/cinder/+bug/1290468

Matthew Mosesohn (raytrac3r) wrote :

Can you confirm that conntrack (Ubuntu) or conntrack-tools (CentOS) is installed and that the same traceback is still thrown? For me, debug logs showed that rootwrap was trying to call conntrack and failing, but that part should be fixed now.

OSCI Robot (oscirobot) wrote :
Changed in fuel:
status: Confirmed → In Progress