Instance snapshot creation - snapshot is deleted right after the create command

Bug #1610261 reported by Alexey. Kalashnikov
This bug affects 2 people
Affects              Status        Importance  Assigned to  Milestone
Fuel for OpenStack   Fix Released  High        MOS Linux
  Mitaka             Fix Released  High        Ivan Suzdal
  Newton             Fix Released  High        Ivan Suzdal

Bug Description

See update below;
original bug body:
==========================================================

Detailed bug description:
 During a swarm test run, one of the tests failed because the OSTF check
 'Launch instance, create snapshot, launch instance from snapshot' failed.

Steps to reproduce:
 """Launch instance, create snapshot, launch instance from snapshot
    Target component: Glance
    Scenario:
     1. Get an existing image by name.
     2. Launch an instance using the default image.
     3. Make a snapshot of the created instance.
     4. Delete the instance created in step 2.
     5. Wait until the instance is deleted.
     6. Launch another instance from the snapshot created in step 3.
     7. Delete the server.

Expected results:
 All steps pass.
Actual result:
 Step 3 failed: timeout exceeded.
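
For manual verification outside OSTF, the scenario can be scripted. Below is a
minimal sketch using openstacksdk; the cloud name, image/flavor names, server
names, and timeouts are illustrative assumptions rather than OSTF values, and
exact proxy methods vary slightly across SDK releases:

    import time

    import openstack  # pip install openstacksdk

    conn = openstack.connect(cloud="fuel-env")  # assumed clouds.yaml entry

    image = conn.image.find_image("TestVM")        # 1. get existing image by name
    flavor = conn.compute.find_flavor("m1.micro")  # assumed flavor

    server = conn.compute.create_server(           # 2. launch an instance
        name="ost1_test-server", image_id=image.id, flavor_id=flavor.id)
    server = conn.compute.wait_for_server(server)

    snap = conn.compute.create_server_image(server, "ost1_test-snapshot")  # 3.

    # Poll the image status like the fuel_health waiter does; on affected
    # environments it goes 'saving' -> 'deleted' instead of 'saving' ->
    # 'active', so this loop times out.
    for _ in range(30):
        status = conn.image.get_image(snap.id).status
        print("snapshot status:", status)
        if status == "active":
            break
        time.sleep(10)
    else:
        raise RuntimeError("snapshot never became active")

    conn.compute.delete_server(server)             # 4. delete original instance
    conn.compute.wait_for_delete(server)           # 5. wait for deletion

    new = conn.compute.create_server(              # 6. boot from the snapshot
        name="ost1_test-from-snap", image_id=snap.id, flavor_id=flavor.id)
    conn.compute.wait_for_server(new)
    conn.compute.delete_server(new)                # 7. delete the server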

ostf_log_cut:
http://paste.openstack.org/show/550601/

diagnostic snapshot:
https://drive.google.com/a/mirantis.com/file/d/0B0EB6QSDWt2vVTJ6MjgwZlU2eXc/view?usp=sharing

Impact:
 The swarm test failed.
Description of the environment:
9.1 snapshot #93

console output:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_dpdk/17/console

job parameters:
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_dpdk/17/parameters/

=========================================================
update (vkhlyunev)
https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.support_dpdk/120/testReport/(root)/deploy_cluster_with_dpdk/deploy_cluster_with_dpdk/

From the OSTF log:
fuel_health.common.test_mixins: INFO: STEP:3, verify action: 'snapshotting an instance'
fuel_health.config: INFO: INSTANCE {22176: <fuel_health.config.NailgunConfig object at 0x469fe10>}
fuel_health.config: INFO: INSTANCE {22176: <fuel_health.config.NailgunConfig object at 0x469fe10>}
fuel_health.test: DEBUG: Waiting for <Image: ost1_test-snapshot-846528610> to get to ACTIVE status. Currently in saving status
fuel_health.test: DEBUG: Sleeping for 10 seconds
fuel_health.test: DEBUG: Waiting for <Image: ost1_test-snapshot-846528610> to get to ACTIVE status. Currently in deleted status
fuel_health.test: DEBUG: Sleeping for 10 seconds

The snapshot falls to "deleted" status right after the "saving" stage, and it has 0 size.
This looks like an issue with the configured hugepages (without DPDK/hugepages everything works fine).

Changed in nova:
assignee: nobody → Oleksiy Molchanov (omolchanov)
assignee: Oleksiy Molchanov (omolchanov) → nobody
affects: nova → fuel
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

QA team, can you check? I get:

[10 of 11] [error] 'Launch instance, create snapshot, launch instance from snapshot' (191.3 s)

But I can do this manually:

root@node-1:~# glance image-show 342e9ba9-1bbb-4383-aaa2-924e9c4491c1
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| base_image_ref   | 4794e609-e0e8-4338-9ef3-f71128d9b404 |
| checksum         | 5f2f2f25956f179395f9e3f5e36801b8     |
| container_format | bare                                 |
| created_at       | 2016-08-05T14:09:00Z                 |
| disk_format      | qcow2                                |
| id               | 342e9ba9-1bbb-4383-aaa2-924e9c4491c1 |
| image_location   | snapshot                             |
| image_state      | available                            |
| image_type       | snapshot                             |
| instance_uuid    | 8c4435d9-141e-4296-85c5-36963b55d6c5 |
| kernel_id        | None                                 |
| min_disk         | 20                                   |
| min_ram          | 64                                   |
| name             | snapshot 2                           |
| owner            | 5cb15129e80e40cfb46b6004d8c31028     |
| owner_id         | 5cb15129e80e40cfb46b6004d8c31028     |
| protected        | False                                |
| ramdisk_id       | None                                 |
| size             | 40108032                             |
| status           | active                               |
| tags             | []                                   |
| updated_at       | 2016-08-05T14:09:03Z                 |
| user_id          | af28678291f94083bb4423e0c4aeaa09     |
| virtual_size     | None                                 |
| visibility       | private                              |
+------------------+--------------------------------------+

Changed in fuel:
milestone: none → 9.1
tags: added: area-qa
description: updated
tags: added: swarm-fail
tags: added: area-ostf
removed: area-qa
tags: added: swarm-blocker
removed: swarm-fail
summary: - Test 'Launch instance, create snapshot, launch instance from snapshot'
- failed
+ Instance snapshot creation - snapshot is deleted right after the create
+ command
description: updated
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

Raising to High because it reproduces on a cluster with NFV features, which are a key feature of 9.2

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel Sustaining (fuel-sustaining-team)
importance: Medium → High
description: updated
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :

Also, this bug does not reproduce in 100% of cases, but in more than 80% (I reproduced it manually 6 out of 7 times on snapshot #465), without OSTF

Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

@Vladimir, please provide me with an environment.

Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Vladimir Khlyunev (vkhlyunev)
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Investigation is blocked by bug #1641698

Changed in fuel:
assignee: Vladimir Khlyunev (vkhlyunev) → Oleksiy Molchanov (omolchanov)
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Instance snapshotting fails with the following error:

2016-11-17 11:31:26.587 13780 ERROR nova.compute.manager [instance: 7b9b169e-da6d-4f2d-ba4d-437cee1a05db] if ret == -1: raise libvirtError ('virDomainManagedSave() failed', dom=self)
2016-11-17 11:31:26.587 13780 ERROR nova.compute.manager [instance: 7b9b169e-da6d-4f2d-ba4d-437cee1a05db] libvirtError: internal error: unable to execute QEMU command 'migrate': Migration disabled: vhost-user backend lacks VHOST_USER_PROTOCOL_F_LOG_SHMFD feature.

The VM is booted with the `vhostuser` network interface type:

    <interface type='vhostuser'>
      <mac address='fa:16:3e:54:9c:aa'/>
      <source type='unix' path='/var/run/openvswitch/vhuafc16d24-8d' mode='client'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
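
To check which interface type a given instance uses, its domain XML can be
inspected directly. A short sketch with libvirt-python follows; the domain
name is a placeholder:

    import xml.etree.ElementTree as ET

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000001")  # placeholder domain name

    root = ET.fromstring(dom.XMLDesc())
    for iface in root.findall("./devices/interface"):
        src = iface.find("source")
        # affected guests show type='vhostuser' with a unix-socket source
        print(iface.get("type"), src.attrib if src is not None else {})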

This looks very much like https://bugzilla.redhat.com/show_bug.cgi?id=1321299 - it seems that the DPDK version we use:

root@node-3:~# dpkg -l | grep dpdk
ii  dpdk                     2.2.0-0ubuntu8~u1404+mos4  amd64  Data Plane Development Kit (runtime)
ii  dpdk-dkms                2.2.0-0ubuntu8~u1404+mos4  all    Data Plane Development Kit (dkms module)
ii  libdpdk0:amd64           2.2.0-0ubuntu8~u1404+mos4  amd64  Data Plane Development Kit (runtime libraries)
ii  openvswitch-switch-dpdk  2.5.0-0ubuntu1~u1404+mos3  amd64  DPDK enabled Open vSwitch switch implementation

does not support live migration.

My current understanding is that support for live migration was added by the http://dpdk.org/ml/archives/dev/2015-December/030489.html patch series, which was first released in v16.04:

rpodolyaka@rpodolyaka-pc:~/src/dpdk$ git log --grep "vhost-user live migration support"
commit d639996a74fa71a9553bcef7cb2b2e9bb0fd5203
Author: Yuanhan Liu <email address hidden>
Date: Fri Jan 29 12:58:02 2016 +0800

    vhost: enable log_shmfd protocol feature

    To claim that we support vhost-user live migration support:
    SET_LOG_BASE request will be send only when this feature flag
    is set.

    Besides this flag, we actually need another feature flag set
    to make vhost-user live migration work: VHOST_F_LOG_ALL.
    Which, however, has been enabled long time ago.

    Signed-off-by: Yuanhan Liu <email address hidden>
    Tested-by: Pavel Fedin <email address hidden>
rpodolyaka@rpodolyaka-pc:~/src/dpdk$ git tag --contains d639996a74fa71a9553bcef7cb2b2e9bb0fd5203
v16.04
...

Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → MOS Linux (mos-linux)
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

^ the live migration functionality is reused to implement live snapshotting: virDomainManagedSave() writes guest RAM to a file through QEMU's 'migrate' command, which is why the missing vhost-user migration feature breaks snapshotting as well
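
Since managed save goes through QEMU's 'migrate' command, the failure can
presumably be reproduced directly via libvirt, bypassing nova entirely (a
sketch; the domain name is a placeholder):

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000001")  # placeholder domain name
    try:
        # same migrate-to-file machinery that nova's snapshot path uses
        dom.managedSave(0)
    except libvirt.libvirtError as exc:
        # on affected hosts this should fail with "Migration disabled:
        # vhost-user backend lacks VHOST_USER_PROTOCOL_F_LOG_SHMFD feature."
        print(exc)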

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to packages/trusty/dpdk (9.0)

Fix proposed to branch: 9.0
Change author: Ivan Suzdal <email address hidden>
Review: https://review.fuel-infra.org/28634

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to packages/trusty/dpdk (9.0)

Reviewed: https://review.fuel-infra.org/28634
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0

Commit: 5df385febd27d693fcdcbbd6e7f2ad4bf839070e
Author: Ivan Suzdal <email address hidden>
Date: Fri Nov 18 10:41:12 2016

vhost: enable log_shmfd protocol feature

Add add-vhost-user-live-migration-support.patch sourced from [0]

[0] http://dpdk.org/browse/dpdk/commit/?id=d639996a74fa71a9553bcef7cb2b2e9bb0fd5203

Change-Id: I2e3fa7db9905ecc42112a5c8b4b7f1698177b479
Closes-Bug: #1610261

Ivan Suzdal (isuzdal)
Changed in fuel:
status: Confirmed → Fix Committed
tags: added: on-verification
tags: removed: on-verification
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :

Verified on 10.0 build #1566.

'Launch instance, create snapshot, launch instance from snapshot' OSTF test passes in all of the latest DPDK runs.

Changed in fuel:
status: Fix Committed → Fix Released