snapshot stays in saving state if the vm base image is deleted

Bug #921774 reported by Satya Sanjibani Routray on 2012-01-25
This bug affects 2 people
Affects / Status / Importance / Assigned to / Milestone:
- OpenStack Compute (nova) — Importance: High — Assigned to: Boris Filippov
- nova (Essex) — Importance: Undecided — Assigned to: Boris Filippov
- nova (Ubuntu) — Importance: Undecided — Unassigned
- nova (Ubuntu Precise) — Importance: Undecided — Unassigned

Bug Description

1. Create a VM from standard image
2. Wait till VM comes active
3. Take a snapshot of the VM and wait till snapshot becomes active
4. Delete the VM
5. Create a VM from the Snapshot, wait till VM is active
6. Delete the snapshot
7. Create a snapshot of the VM

Observation:
Snapshot never comes to active state

Brian Waldon (bcwaldon) wrote :

Is it possible that you deleted an instance before the snapshot was done being created? Could you paste some nova-compute logs?

Changed in nova:
status: New → Incomplete

No, I have not deleted the VM before the snapshot completed.
To simplify the issue:

Create a VM from an image
When VM is active and running
Delete the image
when the image is deleted try to take a snapshot of the VM

Observation: the snapshot stays in the saving state and won't become active.
Compute Log
<snip>
er': u'10.13.0.1', u'gateway': u'10.13.0.1'}]], 35449L: [[{u'bridge': u'brx2', u'multi_host': True, u'bridge_interface': None, u'cidr_v6': None, u'vlan': None, u'injected': False, u'cidr': u'10.13.0.0/17', u'id': 2}, {u'rxtx_cap': 0, u'should_create_bridge': True, u'should_create_vlan': False, u'label': u'private', u'broadcast': u'10.13.127.255', u'ips': [{u'ip': u'10.13.3.173', u'netmask': u'255.255.128.0', u'enabled': u'1'}], u'mac': u'02:16:3e:51:66:2c', u'vif_uuid': u'5986d88e-7613-4e93-bb0e-f310ac8caf8c', u'dns': [u'8.8.4.4'], u'dhcp_server': u'10.13.0.1', u'gateway': u'10.13.0.1'}]], 33962L: [[{u'bridge': u'brx2', u'multi_host': True, u'bridge_interface': None, u'cidr_v6': None, u'vlan': None, u'injected': False, u'cidr': u'10.13.0.0/17', u'id': 2}, {u'rxtx_cap': 0, u'should_create_bridge': True, u'should_create_vlan': False, u'label': u'private', u'broadcast': u'10.13.127.255', u'ips': [{u'ip': u'10.13.0.179', u'netmask': u'255.255.128.0', u'enabled': u'1'}], u'mac': u'02:16:3e:08:8d:a4', u'vif_uuid': u'fbb150f9-ec62-49e6-ade0-30c854864bba', u'dns': [u'8.8.4.4'], u'dhcp_server': u'10.13.0.1', u'gateway': u'10.13.0.1'}]], 25342L: []}}, '_wrapped_conn': {'_o': <capsule object "virConnectPtr" at 0x3d3b420>}, 'vif_driver': {}, 'firewall_driver': {'iptables': '?', 'instances': {34824L: {'vm_state': 'building', 'availability_zone': None, 'terminated_at': None, 'ramdisk_id': '', 'instance_type_id': 7L, 'user_data': '', 'vm_mode': None, 'deleted_at': None, 'fixed_ips': [], 'id': 34824L, 'security_groups': [{'project_id': '42688733288874', 'user_id': '44478785933097', 'name': 'default', 'deleted': False, 'created_at': '2012-01-28 13:29:54', 'updated_at': None, 'rules': [{'deleted_at': None, 'from_port': 22L, 'protocol': 'tcp', 'deleted': False, 'created_at': '2012-01-28 13:29:54', 'updated_at': None, 'to_port': 22L, 'parent_group_id': 2285L, 'cidr': '0.0.0.0/0', 'group_id': None, 'id': 4604L}, {'deleted_at': None, 'from_port': 80L, 'protocol': 'tcp', 'deleted': 
False, 'created_at': '2012-01-28 13:29:54', 'updated_at': None, 'to_port': 80L, 'parent_group_id': 2285L, 'cidr': '0.0.0.0/0', 'group_id': None, 'id': 4605L}, {'deleted_at': None, 'from_port': 443L, 'protocol': 'tcp', 'deleted': False, 'created_at': '2012-01-28 13:29:54', 'updated_at': None, 'to_port': 443L, 'parent_group_id': 2285L, 'cidr': '0.0.0.0/0', 'group_id': None, 'id': 4606L}, {'deleted_at': None, 'from_port': -1L, 'protocol': 'icmp', 'deleted': False, 'created_at': '2012-01-28 13:29:54', 'updated_at': None, 'to_port': -1L, 'parent_group_id': 2285L, 'cidr': '0.0.0.0/0', 'group_id': None, 'id': 4607L}, {'deleted_at': None, 'from_port': None, 'protocol': None, 'deleted': False, 'created_at': '2012-01-28 13:29:54', 'updated_at': None, 'to_port': None, 'parent_group_id': 2285L, 'cidr': None, 'group_id': 2285L, 'id': 4608L}], 'deleted_at': None, 'id': 2285L, 'description': ...

Brian Waldon (bcwaldon) wrote :

It appears that we depend on the image the instance was originally created from to seed the new snapshot image with certain attributes (disk_format, container_format, architecture). So at this time, it is a requirement to have the original image available in order to take a snapshot. I do think we should redesign snapshotting so we don't have to depend on the original image, but I don't think that is feasible for the essex release timeframe.
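The dependency Brian describes can be sketched as follows. This is a hedged illustration, not nova's actual code: the function name and dict shapes are hypothetical, but it shows why a deleted base image leaves every seeded attribute as None.

```python
# Illustrative sketch (not nova's actual implementation) of how a snapshot
# image's attributes are seeded from the instance's original base image.

def seed_snapshot_metadata(base_image):
    """Build metadata for a new snapshot image.

    base_image is the Glance record the instance was booted from,
    or None if that image has since been deleted.
    """
    base = base_image or {}
    return {
        'disk_format': base.get('disk_format'),            # e.g. 'qcow2'
        'container_format': base.get('container_format'),  # e.g. 'bare'
        'architecture': base.get('architecture'),          # e.g. 'x86_64'
    }

# With the base image still in glance, the snapshot inherits its formats:
meta = seed_snapshot_metadata({'disk_format': 'qcow2',
                               'container_format': 'bare',
                               'architecture': 'x86_64'})

# Once the base image is deleted there is nothing to inherit from, so every
# seeded attribute comes back None -- including container_format, which
# glance will later reject:
orphan_meta = seed_snapshot_metadata(None)
```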

Changed in nova:
status: Incomplete → Confirmed
importance: Undecided → High
Mike Scherbakov (mihgen) on 2012-03-02
Changed in nova:
assignee: nobody → Mike Scherbakov (mihgen)

Fix proposed to branch: master
Review: https://review.openstack.org/12390

Changed in nova:
assignee: Mike Scherbakov (mihgen) → Boris Filippov (bfilippov)
status: Confirmed → In Progress
tags: added: essex-backport
Ghe Rivero (ghe.rivero) wrote :

I can't reproduce this bug in either essex or folsom, but the fix looks good.

Boris Filippov (bfilippov) wrote :

It's easy to reproduce on current master using devstack:

vagrant@precise:~/devstack$ nova image-list
+--------------------------------------+---------------------------------+--------+--------+
| ID | Name | Status | Server |
+--------------------------------------+---------------------------------+--------+--------+
| 684486b3-f77e-46b0-b709-12a8e0a9f0c7 | cirros-0.3.0-x86_64-uec | ACTIVE | |
| 973351dd-fd04-4382-9ee9-6ca216ea4f44 | cirros-0.3.0-x86_64-uec-kernel | ACTIVE | |
| 5a5ea70a-4e91-4af2-a639-30e86ac7aff6 | cirros-0.3.0-x86_64-uec-ramdisk | ACTIVE | |
+--------------------------------------+---------------------------------+--------+--------+
vagrant@precise:~/devstack$ nova boot --flavor=1 --image=684486b3-f77e-46b0-b709-12a8e0a9f0c7 base_vm
+------------------------+----------------------------------------------------------+
| Property | Value |
+------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | 9bb4MKy4UjXf |
| config_drive | |
| created | 2012-09-05T10:42:10Z |
| flavor | m1.tiny |
| hostId | 475ec55bd30b4320134f868bef40489f7b3cf000d951a319cf000ced |
| id | 3f3abae0-c8dc-4334-a742-8998302a13f0 |
| image | cirros-0.3.0-x86_64-uec |
| key_name | None |
| metadata | {} |
| name | base_vm |
| progress | 0 |
| security_groups | [{u'name': u'default'}] |
| status | BUILD |
| tenant_id | ee9a7e206b454b61a66e6386d596f796 |
| updated | 2012-09-05T10:42:10Z |
| user_id | f21327bf650349f1af4c744399617b14 |
+------------------------+----------------------------------------------------------+

vagrant@precise:~/devstack$ nova list
+--------------------------------------+---------+--------+---------...

Boris Filippov (bfilippov) wrote :

In the glance-api log you can find the source of this error:

Registry request PUT /images/72479ffc-34cd-42e3-b6f2-d8f13784b6d6 Exception
2012-09-05 11:02:04 TRACE glance.registry.client Traceback (most recent call last):
2012-09-05 11:02:04 TRACE glance.registry.client File "/opt/stack/glance/glance/registry/client.py", line 89, in do_request
2012-09-05 11:02:04 TRACE glance.registry.client action, **kwargs)
2012-09-05 11:02:04 TRACE glance.registry.client File "/opt/stack/glance/glance/common/client.py", line 63, in wrapped
2012-09-05 11:02:04 TRACE glance.registry.client return func(self, *args, **kwargs)
2012-09-05 11:02:04 TRACE glance.registry.client File "/opt/stack/glance/glance/common/client.py", line 444, in do_request
2012-09-05 11:02:04 TRACE glance.registry.client headers=headers)
2012-09-05 11:02:04 TRACE glance.registry.client File "/opt/stack/glance/glance/common/client.py", line 80, in wrapped
2012-09-05 11:02:04 TRACE glance.registry.client return func(self, method, url, body, headers)
2012-09-05 11:02:04 TRACE glance.registry.client File "/opt/stack/glance/glance/common/client.py", line 574, in _do_request
2012-09-05 11:02:04 TRACE glance.registry.client raise exception.Invalid(res.read())
2012-09-05 11:02:04 TRACE glance.registry.client Invalid: 400 Bad Request
2012-09-05 11:02:04 TRACE glance.registry.client
2012-09-05 11:02:04 TRACE glance.registry.client The server could not comply with the request since it is either malformed or otherwise incorrect.
2012-09-05 11:02:04 TRACE glance.registry.client
2012-09-05 11:02:04 TRACE glance.registry.client Failed to update image metadata. Got error: Invalid container format 'None' for image.
2012-09-05 11:02:04 TRACE glance.registry.client
2012-09-05 11:02:04 ERROR glance.api.v1.images [bfbcbd08-4431-46f6-8b45-f1efa292cd20 f21327bf650349f1af4c744399617b14 ee9a7e206b454b61a66e6386d596f796] Failed to activate image. Got error: 400 Bad Request
Failed to activate image. Got error: 400 Bad Request
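The 400 above comes from registry-side validation of the image metadata. A minimal sketch of that check (the constant and function names here are illustrative, not Glance's real identifiers):

```python
# Hedged sketch of glance-style container_format validation; names are
# illustrative, not Glance's actual code.

VALID_CONTAINER_FORMATS = ('ami', 'ari', 'aki', 'bare', 'ovf')

class Invalid(Exception):
    """Stand-in for the 400 Bad Request the registry client raises."""

def validate_container_format(image_meta):
    fmt = image_meta.get('container_format')
    if fmt not in VALID_CONTAINER_FORMATS:
        raise Invalid("Invalid container format %r for image." % fmt)

validate_container_format({'container_format': 'bare'})   # accepted

try:
    # A snapshot seeded from a deleted base image arrives with
    # container_format=None and is rejected, leaving the snapshot
    # stuck in the saving state.
    validate_container_format({'container_format': None})
except Invalid:
    rejected = True
```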

Changed in nova:
milestone: none → folsom-rc1

Reviewed: https://review.openstack.org/12390
Committed: http://github.com/openstack/nova/commit/804732d0c5bdc78c20200e7ed51b2f43bb5e936d
Submitter: Jenkins
Branch: master

commit 804732d0c5bdc78c20200e7ed51b2f43bb5e936d
Author: Boris Filippov <email address hidden>
Date: Wed Sep 5 05:40:05 2012 +0400

    Use bare container format by default

    Set container_format to bare during libvirt snapshot when the VM image
    in glance has been deleted. Currently, if the VM image in glance was
    already deleted before the snapshot, nova will attempt to create the
    snapshot image with container_format: None. This causes glance to
    return an error on the attempt to upload the snapshot. According to
    the glance docs, container_format is not used anywhere in glance or
    nova explicitly, and it is safe to set it to bare when you are unsure
    which container_format you need to use. The current snapshot logic
    sets the snapshot disk_format to the currently used image_format in
    the absence of the base image in glance.

    This resolves bug 921774 without the need to redesign the snapshot
    mechanism.

    Change-Id: I7beea35120aaeac0837daecdf58f38f62e24454c
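The shape of the fix can be sketched like this. This is an illustration of the approach described in the commit message, not the actual nova diff; the names are hypothetical:

```python
# Sketch of the fix: fall back to 'bare' for container_format (and to the
# instance's on-disk image format for disk_format) when the base image is
# no longer available in glance.

DEFAULT_CONTAINER_FORMAT = 'bare'

def snapshot_metadata(base_image, instance_disk_format):
    base = base_image or {}
    return {
        # disk_format falls back to the format the instance actually uses
        'disk_format': base.get('disk_format') or instance_disk_format,
        # container_format falls back to 'bare', which glance accepts when
        # the real container format is unknown
        'container_format': (base.get('container_format')
                             or DEFAULT_CONTAINER_FORMAT),
    }

# Base image deleted: the snapshot still gets valid metadata.
fixed_meta = snapshot_metadata(None, 'qcow2')
```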

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-09-19
Changed in nova:
status: Fix Committed → Fix Released

Reviewed: https://review.openstack.org/12395
Committed: http://github.com/openstack/nova/commit/75f69225c5be5e5817a4008792e974ba4014c9e1
Submitter: Jenkins
Branch: stable/essex

commit 75f69225c5be5e5817a4008792e974ba4014c9e1
Author: Boris Filippov <email address hidden>
Date: Wed Sep 5 05:40:05 2012 +0400

    Use bare container format by default

    Set container_format to bare during libvirt snapshot when the VM image
    in glance has been deleted. Currently, if the VM image in glance was
    already deleted before the snapshot, nova will attempt to create the
    snapshot image with container_format: None. This causes glance to
    return an error on the attempt to upload the snapshot. According to
    the glance docs, container_format is not used anywhere in glance or
    nova explicitly, and it is safe to set it to bare when you are unsure
    which container_format you need to use. The current snapshot logic
    sets the snapshot disk_format to the currently used image_format in
    the absence of the base image in glance.

    This resolves bug 921774 without the need to redesign the snapshot
    mechanism.

    Change-Id: I7beea35120aaeac0837daecdf58f38f62e24454c

Thierry Carrez (ttx) on 2012-09-27
Changed in nova:
milestone: folsom-rc1 → 2012.2
Changed in nova (Ubuntu):
status: New → Fix Released

Hello Satya, or anyone else affected,

Accepted nova into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/nova/2012.1.3+stable-20130423-e52e6912-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Precise):
status: New → Fix Committed
tags: added: verification-needed

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, Nova has been deployed and configured across multiple nodes using precise-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests were invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

Please note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/12390
Stable review: https://review.openstack.org/12395

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Yolanda Robla (yolanda.robla) wrote :

Test coverage log.


tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.1.3+stable-20130423-e52e6912-0ubuntu1

---------------
nova (2012.1.3+stable-20130423-e52e6912-0ubuntu1) precise-proposed; urgency=low

  * Resynchronize with stable/essex (e52e6912) (LP: #1089488):
    - [48e81f1] VNC proxy can be made to connect to wrong VM LP: 1125378
    - [3bf5a58] snat rule too broad for some network configurations LP: 1048765
    - [efaacda] DOS by allocating all fixed ips LP: 1125468
    - [b683ced] Add nosehtmloutput as a test dependency.
    - [45274c8] Nova unit tests not running, but still passing for stable/essex
      LP: 1132835
    - [e02b459] vnc unit-test fixes
    - [87361d3] Jenkins jobs fail because of incompatibility between sqlalchemy-
      migrate and the newest sqlalchemy-0.8.0b1 (LP: #1073569)
    - [e98928c] VNC proxy can be made to connect to wrong VM LP: 1125378
    - [c0a10db] DoS through XML entity expansion (CVE-2013-1664) LP: 1100282
    - [243d516] No authentication on block device used for os-volume_boot
      LP: 1069904
    - [80fefe5] use_single_default_gateway does not function correctly
      (LP: #1075859)
    - [bd10241] Essex 2012.1.3 : Error deleting instance with 2 Nova Volumes
      attached (LP: #1079745)
    - [86a5937] do_refresh_security_group_rules in nova.virt.firewall is very
      slow (LP: #1062314)
    - [ae9c5f4] deallocate_fixed_ip attempts to update an already deleted
      fixed_ip (LP: #1017633)
    - [20f98c5] failed to allocate fixed ip because old deleted one exists
      (LP: #996482)
    - [75f6922] snapshot stays in saving state if the vm base image is deleted
      (LP: #921774)
    - [1076699] lock files may be removed in error due to permissions issues
      (LP: #1051924)
    - [40c5e94] ensure_default_security_group() does not call sgh (LP: #1050982)
    - [4eebe76] At termination, LXC rootfs is not always unmounted before
      rmtree() is called (LP: #1046313)
    - [47dabb3] Heavily loaded nova-compute instances don't sent reports
      frequently enough (LP: #1045152)
    - [b375b4f] When attach volume lost attach when node restart (LP: #1004791)
    - [4ac2dcc] nova usage-list returns wrong usage (LP: #1043999)
    - [014fcbc] Bridge port's hairpin mode not set after resuming a machine
      (LP: #1040537)
    - [2f35f8e] Nova flavor ephemeral space size reported incorrectly
      (LP: #1026210)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2013-0335.patch: [48e81f1]
    - debian/patches/CVE-2013-1838.patch: [efaacda]
    - debian/patches/CVE-2013-1664.patch: [c0a10db]
    - debian/patches/CVE-2013-0208.patch: [243d516]
 -- Yolanda <email address hidden> Mon, 22 Apr 2013 12:37:08 +0200

Changed in nova (Ubuntu Precise):
status: Fix Committed → Fix Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
