[vcenter] After failover nova-compute looks to new cache directory

Bug #1482121 reported by Serg Lystopad
This bug affects 1 person
Affects              Status        Importance  Assigned to       Milestone
Mirantis OpenStack   Fix Released  High        Roman Podoliaka
  6.0.x              Fix Released  High        Denis Meltsaykin
  6.1.x              Fix Released  High        Denis Meltsaykin
  7.0.x              Fix Released  High        Roman Podoliaka

Bug Description

When vCenter is configured as the hypervisor, nova-compute uses vCenter datastores for the image cache.
By default, the image cache directory is set in nova.conf as:
image_cache_subdirectory_name = $my_ip_base

When nova-compute is moved to another controller (a manual service restart with `pcs resource disable ..`, or a failover), a new, empty cache directory appears on the datastore. VMs then boot much more slowly, because nova must first download the image into the cache before it can actually start the VM.

The workaround is to configure
image_cache_subdirectory_name = _base
so that nova uses the same cache directory regardless of which controller nova-compute is running on.

Environment:
HA with vCenter hypervisor
nova-network VlanManager
cinder VMwareVcVmdkDriver

Nodes run CentOS 6.5

api: '1.0'
astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
auth_required: true
build_id: 2014-12-26_14-25-46
build_number: '58'
feature_groups:
- mirantis
fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
production: docker
release: '6.0'
release_versions:
  2014.2-6.0:
    VERSION:
      api: '1.0'
      astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
      build_id: 2014-12-26_14-25-46
      build_number: '58'
      feature_groups:
      - mirantis
      fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
      fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
      nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
      ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
      production: docker
      release: '6.0'

Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.0-updates because of Medium importance

Changed in mos:
assignee: nobody → MOS Nova (mos-nova)
Timur Nurlygayanov (tnurlygayanov) wrote :

This is a customer-found bug; we need to fix it in the MOS 7.0 release.

Andrew Woodward (xarses) wrote :

There is a question as to whether nova-compute for vCenter was HA in 6.0; if not, it won't be fixed in 6.0.

tags: added: vmware
Roman Podoliaka (rpodolyaka) wrote :

So this is similar to https://bugs.launchpad.net/fuel/+bug/1463977 .

The purpose of $my_ip in the VMware driver implementation in Nova is to prevent race conditions between different nova-compute instances (https://github.com/openstack/nova/blob/stable/juno/nova/virt/vmwareapi/vmops.py#L151-L157), since datastores are shared among them (http://docs.openstack.org/kilo/config-reference/content/vmware.html#VMwareVCDriver_details).

Thus, the suggested workaround is only applicable to MOS deployments that deploy *at most* one nova-compute instance and ensure its HA by means of Pacemaker in active/passive mode. If a customer ran another nova-compute with the suggested setting, there would be a race condition on image aging.
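The image-aging race comes from the timestamp-based aging described in the Nova commit message quoted later in this report. The Python sketch below is a hypothetical in-memory model of that algorithm, not Nova code: the `CacheFolder` class and its methods are invented for illustration, and a `threading.Lock` stands in for Nova's lock utilities. The model is only safe because every caller shares one lock; per that commit message, separate nova-compute processes share a lock only when they run on the same host or have a shared file system, which is why each node normally gets its own cache folder.

```python
import threading
from typing import Dict, Optional, Set


class CacheFolder:
    """Toy model of one image cache folder on a VMware datastore.

    Hypothetical: maps image ID -> the time (in seconds) at which the image
    was first found unused, or None while it is in use.
    """

    def __init__(self, max_age: float, lock: threading.Lock) -> None:
        self.max_age = max_age  # configured aging time
        self.lock = lock        # stand-in for Nova's lock utilities
        self.images: Dict[str, Optional[float]] = {}

    def spawn(self, image_id: str) -> None:
        # Spawning takes the lock and purges any timestamp, so an aging
        # pass holding the same lock cannot delete the image meanwhile.
        with self.lock:
            self.images[image_id] = None

    def age(self, unused: Set[str], now: float) -> None:
        # One aging iteration over every cached image.
        with self.lock:
            for image_id in list(self.images):
                if image_id not in unused:
                    # Image is in use: drop any timestamp.
                    self.images[image_id] = None
                    continue
                stamp = self.images[image_id]
                if stamp is None:
                    # First time seen unused: add a timestamp.
                    self.images[image_id] = now
                elif now - stamp > self.max_age:
                    # Timestamp older than the aging time: delete the
                    # image ID folder, i.e. age out the cached image.
                    del self.images[image_id]


# Usage: one node, aging time of 100 seconds.
cache = CacheFolder(max_age=100.0, lock=threading.Lock())
cache.spawn("image-a")
cache.age(unused={"image-a"}, now=0.0)    # timestamp added
cache.age(unused={"image-a"}, now=150.0)  # older than 100 s: deleted
print(cache.images)  # {}
```

Two nova-compute processes that point at the same folder but each hold a *different* lock would break the guarantee that `spawn` and `age` never interleave, which is the race Roman refers to.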

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Roman Podoliaka <email address hidden>
Review: https://review.fuel-infra.org/10243

Roman Podoliaka (rpodolyaka) wrote :
Serg Lystopad (slystopad) wrote :

FYI:
Roman, maybe it is related to your comment https://bugs.launchpad.net/mos/+bug/1482121/comments/4 (I'm not sure).

As far as I remember, we also configured the following in nova.conf:
remove_unused_base_images=false

Roman Podoliaka (rpodolyaka) wrote :

Right, but in that case you wouldn't need to change `image_cache_subdirectory_name` at all ;)

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-7.0/2015.1.0
Review: https://review.fuel-infra.org/10243
Reason: abandoned in favor of https://review.openstack.org/213071

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Gary Kotton <email address hidden>
Review: https://review.fuel-infra.org/10447

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-6.0-updates/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.0-updates/2014.2
Change author: Gary Kotton <email address hidden>
Review: https://review.fuel-infra.org/10856

Roman Podoliaka (rpodolyaka) wrote :

Cherry-pick to stable/6.1 branch of fuel-library: https://review.openstack.org/#/c/220219/

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.0-updates/2014.2)

Reviewed: https://review.fuel-infra.org/10856
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-6.0-updates/2014.2

Commit: f9982fb8a0a33c2b58e8c49d7d0e24e66b61eb66
Author: Gary Kotton <email address hidden>
Date: Thu Sep 3 14:08:18 2015

VMware: enable a cache prefix configuration parameter

Background:
Images that are stored in the cache folder will be stored in a folder whose
name is the image ID. In the event that an image is discovered to be no longer
used then a timestamp will be added to the image folder.
At each aging iteration we check if the image can be aged.
This is done by comparing the current nova compute time to the time embedded
in the timestamp. If the time exceeds the configured aging time then
the parent folder, that is the image ID folder, will be deleted.
That effectively ages the cached image.
If an image is used then the timestamps will be deleted.

When accessing a timestamp we make use of locking. This ensures that aging
will not delete an image during the spawn operation. When spawning,
the timestamp folder will be locked and the timestamps will be purged.
This will ensure that an image is not deleted during the spawn.

In order to ensure that there is no race between compute nodes, each
compute node will have its own cache directory on the VMware datastore.

This is terribly costly when using more than one compute node (which is
a MUST for HA).

Because we use the nova lock utils, if the compute nodes have a shared
file system then the locking is valid for all compute nodes. To give
administrators this option, we provide a new configuration parameter that
enables multiple compute nodes to use the same cache folder on the backend
datastore. NOTE that this can only be done when the compute nodes are
running on the same host or the compute nodes have a shared file system.

DocImpact
   New variable - cache_prefix
   This is in the vmware section

Conflicts:
 nova/tests/virt/vmwareapi/test_vmops.py

Partial-Bug: #1482121
(cherry-picked from 536e99041d67b7f9beff873c10dbb000744e84ee)
Change-Id: I02e758af19cf3a652a5c39d02904e73a1088fe60
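The resulting choice of cache folder can be summarised in a short Python sketch. It is a simplified reading of the VMware driver's logic after this change, not the Nova source: the function `base_folder` is hypothetical, and its parameters are named after the nova.conf options discussed in this report (`my_ip`, `image_cache_subdirectory_name`, `remove_unused_base_images` and the new `[vmware]` `cache_prefix`). The branch order is an assumption based on the commit message above.

```python
from typing import Optional


def base_folder(my_ip: str,
                image_cache_subdirectory_name: str = "_base",
                remove_unused_base_images: bool = True,
                cache_prefix: Optional[str] = None) -> str:
    """Return the datastore cache folder a nova-compute would use (sketch)."""
    if cache_prefix:
        # New [vmware] cache_prefix option: every node configured with the
        # same prefix shares one cache folder.
        return cache_prefix + image_cache_subdirectory_name
    if remove_unused_base_images:
        # Aging enabled: one folder per node, keyed by its IP address.
        return my_ip + image_cache_subdirectory_name
    # Aging disabled: the plain subdirectory name, shared by all nodes.
    return image_cache_subdirectory_name


# Failover between two controllers (example addresses):
print(base_folder("192.168.0.3"))  # 192.168.0.3_base
print(base_folder("192.168.0.4"))  # 192.168.0.4_base
print(base_folder("192.168.0.4", remove_unused_base_images=False))  # _base
print(base_folder("192.168.0.4", cache_prefix="vcenter"))  # vcenter_base
```

With aging enabled and no prefix, the folder follows the node's IP address, so a failover to another controller starts from an empty cache. With aging disabled (Roman's earlier remark) or with a common `cache_prefix`, all nodes resolve to the same folder.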

tags: added: on-verification
tags: added: 6.0 release-notes-done
tags: removed: on-verification
Ilya Bumarskov (ibumarskov) wrote :

Checked on Fuel 7.0 build #301.

Default settings:
image_cache_subdirectory_name=_base
remove_unused_base_images=true

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/10447
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 62f87bd371112d03fa9230e63234ecfb5c6bcfcc
Author: Gary Kotton <email address hidden>
Date: Thu Sep 17 16:55:40 2015

VMware: enable a cache prefix configuration parameter

Changed in mos:
status: Fix Committed → Fix Released