Boot-from-volume instance fails because the volume is deleted during reschedule

Bug #1427179 reported by YaoZheng_ZTE
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Low
Assigned to: Unassigned

Bug Description

1. Create a volume with "nova volume-create --display-name test_volume 1":
[root@controller51 nova(keystone_admin)]# nova volume-list
+--------------------------------------+-----------+-------------------------+------+-------------+---------------------------------------------------------------------------+
| ID | Status | Display Name | Size | Volume Type | Attached to |
+--------------------------------------+-----------+-------------------------+------+-------------+---------------------------------------------------------------------------+
| a740ca7b-6881-4e28-9fdb-eb0d80336757 | available | test_volume | 1 | None | |
| 1f1c19c7-a5f9-4683-a1f6-e339f02e1410 | in-use | NFVO_system_disk2 | 30 | None | 6fa391f8-bd8b-483d-9286-3cebc9a93d55 |
| d868710e-30d4-4095-bd8f-fea9f16fe8ea | in-use | NFVO_data_software_disk | 30 | None | a07abdd5-07a6-4b41-a285-9b825f7b5623;6fa391f8-bd8b-483d-9286-3cebc9a93d55 |
| b03a39ca-ebc1-4472-9a04-58014e67b37c | in-use | NFVO_system_disk1 | 30 | None | a07abdd5-07a6-4b41-a285-9b825f7b5623 |
+--------------------------------------+-----------+-------------------------+------+-------------+---------------------------------------------------------------------------+
2. Use the following command to boot a new instance and attach the volume at the same time:
[root@controller51 nova(keystone_admin)]# nova boot --flavor 1 --image 1736471c-3530-49f2-ad34-6ef7da285050 --block-device-mapping vdb=a740ca7b-6881-4e28-9fdb-eb0d80336757:blank:1:1 --nic net-id=31fce69e-16b9-4114-9fa9-589763e58fb0 test
+--------------------------------------+-----------------------------------------------------------------------------------+
| Property | Value |
+--------------------------------------+-----------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-00000082 |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | sWTuKqzrpS32 |
| config_drive | |
| created | 2015-03-02T11:34:29Z |
| flavor | m1.tiny (1) |
| hostId | |
| id | 868cfd12-eb36-4140-b7b3-98cfcec627cd |
| image | VMB_X86_64_LX_2.6.32_64_REL_2014_12_26.img (1736471c-3530-49f2-ad34-6ef7da285050) |
| key_name | - |
| metadata | {} |
| name | test |
| os-extended-volumes:volumes_attached | [{"id": "547aae0e-455e-4d18-9c3c-e86bdc6c62e7"}] |
| progress | 0 |
| security_groups | default |
| serial_type | file |
| status | BUILD |
| tenant_id | df86efb4c5264f3c9bbe3df6717f8654 |
| updated | 2015-03-02T11:34:30Z |
| user_id | 7d376e69fc5d4697a1edb2600815de3f |
+--------------------------------------+-----------------------------------------------------------------------------------+
3. The instance is scheduled to host1, but host1's network service is inactive, so Nova reschedules the build to another host.
    Before the reschedule, because the boot command set delete-on-termination to 1 (the trailing ":1" in the --block-device-mapping argument), the cleanup path deletes the volume.
    The problem is that after the reschedule to the other host the volume is already deleted, so the instance cannot build successfully.
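
The reported sequence can be boiled down to a minimal sketch (the function and argument names below are illustrative only, not the actual Nova code):

def cleanup_before_reschedule(volume_api, context, bdms):
    # Cleanup as described in the report: every block device mapping
    # marked delete_on_termination is deleted, even though the build is
    # only being rescheduled, not terminated.
    for bdm in bdms:
        if bdm.volume_id and bdm.delete_on_termination:
            volume_api.delete(context, bdm.volume_id)
            # The rescheduled build on the next host can no longer attach
            # this volume, so the instance cannot build successfully.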

Tags: volumes
Revision history for this message
YaoZheng_ZTE (zheng-yao1) wrote :

The core of this bug is the delete-on-termination parameter being 1: when the first scheduled host fails to allocate resources and the instance is rescheduled to another host, the volume should not be deleted.

Revision history for this message
YaoZheng_ZTE (zheng-yao1) wrote :

The parameter delete-on-termination is documented as: "A boolean to indicate whether the volume should be deleted when the instance is terminated. True can be specified as True or 1. False can be specified as False or 0." So the volume should only be deleted when the instance is terminated. Currently, however, when instance creation fails on the first host and the instance is rescheduled to another host, the volume is still needed, so it should not be deleted.
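
As a side note, the accepted spellings of the flag can be checked with the oslo string helper (a quick illustration only; oslo_utils is assumed to be available, and this is not the Nova BDM parsing code itself):

from oslo_utils import strutils

for raw in ("True", "1", "False", "0"):
    print(raw, "->", strutils.bool_from_string(raw))
# Prints: True -> True, 1 -> True, False -> False, 0 -> False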

Revision history for this message
YaoZheng_ZTE (zheng-yao1) wrote :

I am using the Icehouse 2014.1.3 release, but I reviewed the code in the K (Kilo) version and the issue is present there as well.

Revision history for this message
jichenjc (jichenjc) wrote :

I am wondering whether the K release already fixes this, because the code that deletes the volume is:

def _cleanup_volumes(self, context, instance_uuid, bdms, raise_exc=True):
    exc_info = None

    for bdm in bdms:
        LOG.debug("terminating bdm %s", bdm,
                  instance_uuid=instance_uuid)
        if bdm.volume_id and bdm.delete_on_termination:
            try:
                self.volume_api.delete(context, bdm.volume_id)

This function is called either when the instance is deleted or when building the instance fails; it is used to clean up the environment.
I think the function _reschedule_or_error is not used any more, so do you have a stack trace for Kilo,
or can you point to where you think the volume is deleted? Thanks

Changed in nova:
status: New → Incomplete
status: Incomplete → New
Revision history for this message
YaoZheng_ZTE (zheng-yao1) wrote :

Hi jichenjc:
 I reviewed the latest version, kilo-2. The call relationship is:
 _build_instance() -> _reschedule_or_error() -> _cleanup_volumes()
Note: whether a reschedule is attempted depends on the configuration item scheduler_max_attempts (default value 3).

def _build_instance(self, context, request_spec, filter_properties,
        requested_networks, injected_files, admin_password, is_first_time,
        node, instance, image_meta, legacy_bdm_in_spec):
    original_context = context
    context = context.elevated()

    try:
        ...  # instance build steps elided in this quote
    except Exception:
        exc_info = sys.exc_info()
        # try to re-schedule instance:
        # Make sure the async call finishes
        if network_info is not None:
            network_info.wait(do_raise=False)
        rescheduled = self._reschedule_or_error(original_context, instance,
                exc_info, requested_networks, admin_password,
                injected_files_orig, is_first_time, request_spec,
                filter_properties, bdms, legacy_bdm_in_spec)

def _reschedule_or_error(self, context, instance, exc_info,
        requested_networks, admin_password, injected_files, is_first_time,
        request_spec, filter_properties, bdms=None,
        legacy_bdm_in_spec=True):
    """Try to re-schedule the build or re-raise the original build error to
    error out the instance.
    """
    original_context = context
    context = context.elevated()

    try:
        LOG.debug("Clean up resource before rescheduling.",
                  instance=instance)
        if bdms is None:
            bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
                    context, instance.uuid)

        self._shutdown_instance(context, instance,
                                bdms, requested_networks)
        self._cleanup_volumes(context, instance.uuid, bdms)
    except Exception:
        # do not attempt retry if clean up failed:
        with excutils.save_and_reraise_exception():
            self._log_original_error(exc_info, instance_uuid)
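
One possible shape of a change in this path (a sketch only, reusing the names from the snippet above; this is not the actual proposal from review 169097): run the volume cleanup only when the build is being given up on, not when a reschedule will follow.

def _reschedule_or_error(self, context, instance, exc_info,
        requested_networks, bdms, will_retry):
    # Sketch only: "will_retry" is a hypothetical flag; real code would
    # derive it from the retry information in filter_properties.
    context = context.elevated()
    try:
        self._shutdown_instance(context, instance, bdms, requested_networks)
        if not will_retry:
            # Honour delete_on_termination only when the build is being
            # abandoned; keep the volumes for the rescheduled attempt.
            self._cleanup_volumes(context, instance.uuid, bdms)
    except Exception:
        with excutils.save_and_reraise_exception():
            self._log_original_error(exc_info, instance.uuid)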

Revision history for this message
Sean Dague (sdague) wrote :

Looks like a race on the delete of the volume getting queued but not executed until the guest is started.

tags: added: volumes
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
jichenjc (jichenjc) wrote :

OK, I got it. Thanks YaoZheng_ZTE for your explanation; bdm.delete_on_termination can't simply be honoured here.

Sean, I think it's not a race: we can't delete the volume if the instance is going to be rescheduled, otherwise the next reschedule attempt will not be able to use the volume.

jichenjc (jichenjc)
Changed in nova:
assignee: nobody → jichenjc (jichenjc)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/169097

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
YaoZheng_ZTE (zheng-yao1) wrote :

Hi Sean:

    It's not a race: whenever a reschedule happens and bdm.delete_on_termination is True, this problem is certain to occur.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by jichenjc (<email address hidden>) on branch: master
Review: https://review.openstack.org/169097
Reason: this whole function is gone, no need to update it now

Revision history for this message
jichenjc (jichenjc) wrote :

The whole function discussed above is gone now; should I submit a patch for Kilo instead?

Changed in nova:
assignee: jichenjc (jichenjc) → nobody
status: In Progress → Confirmed
Revision history for this message
Dan Smith (danms) wrote :

Can someone confirm if this is still a bug in master (rocky)? If not, we should close this.

Changed in nova:
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired