Detached VIF may be restored during node cleaning

Bug #1750785 reported by Hironori Shiina on 2018-02-21
Affects: Ironic
Status: Fix Released
Importance: High
Assigned to: Dmitry Tantsur

Bug Description

When a node is undeployed from Nova, Nova also detaches the VIFs from the node.
The detached VIF UUIDs may then be restored to port.internal_info during node cleaning.
Currently, VIFs are detached without locking the node. The cleaning task may write the stale VIF UUIDs back to the port when it stores the cleaning network's port UUID, because the tenant VIF UUID still remains in the port object it holds.
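The lost-update race can be illustrated with a minimal, generic sketch (plain Python dicts standing in for the port object and the database row; the field names mirror this report, but this is not Ironic's actual code):

```python
import copy

# "Database" row for the port; the tenant VIF is still attached.
db_port = {"internal_info": {"tenant_vif_port_id": "894340b0"}}

# 1. The cleaning task (holding the node lock) loads the port object.
cleaning_task_port = copy.deepcopy(db_port)

# 2. Meanwhile, Nova's detach request runs WITHOUT the node lock
#    and removes the tenant VIF from the database.
db_port["internal_info"].pop("tenant_vif_port_id")

# 3. The cleaning task records the cleaning VIF on its stale in-memory
#    copy and saves the whole object back, clobbering the detach.
cleaning_task_port["internal_info"]["cleaning_vif_port_id"] = "9baf377a"
db_port = cleaning_task_port

# The detached tenant VIF has been silently restored.
print(db_port["internal_info"])
```

The stale copy wins the write, so the tenant VIF UUID reappears in port.internal_info exactly as the logs below show.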

This happens as follows:

1. node was locked for tear down

Feb 21 17:25:10 localhost ironic-conductor[3618]: DEBUG ironic.conductor.task_manager [None req-d8dddab5-562c-437d-8fbd-b0f559b570c2 None None] Node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a successfully reserved for node tear down (took 0.04 seconds) {{(pid=3618) reserve_node /opt/stack/ironic/ironic/conductor/task_manager.py:273}}

2. started adding cleaning network
Feb 21 17:25:21 localhost ironic-conductor[3618]: INFO ironic.drivers.modules.network.flat [None req-d8dddab5-562c-437d-8fbd-b0f559b570c2 None None] Adding cleaning network to node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a

3. VIF was detached by a request from Nova

Feb 21 17:25:22 localhost ironic-conductor[3618]: INFO ironic.conductor.manager [None req-bf75c88d-8dc0-487a-bba3-8bf0d6fd042f None None] VIF 894340b0-42f7-4673-a7f4-47ff8f6fc84b successfully detached from node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a

4. port for cleaning network was created
Feb 21 17:25:29 localhost ironic-conductor[3618]: INFO ironic.common.neutron [None req-d8dddab5-562c-437d-8fbd-b0f559b570c2 None None] Successfully created ports for node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a in network 4918b028-d39c-4bd1-9465-1c626cd85e5c.

5. Then, cleaning_vif_port_id was saved to port.internal_info.
   I suspect the tenant_vif_port_id that remained in the port object was saved back along with it.

6. lock was released.
Feb 21 17:26:20 localhost ironic-conductor[3618]: DEBUG ironic.conductor.task_manager [None req-d8dddab5-562c-437d-8fbd-b0f559b570c2 None None] Successfully released exclusive lock for node tear down on node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a (lock was held 69.98 sec) {{(pid=3618) release_resources /opt/stack/ironic/ironic/conductor/task_manager.py:352}}

7. during cleaning, tenant_vif_port_id was found in port.internal_info.

Feb 21 17:33:48 localhost ironic-conductor[3618]: DEBUG ironic.drivers.modules.agent_client Status of agent commands for node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a: get_clean_steps: result "{u'clean_steps': {u'GenericHardwareManager': [{u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, {u'priority': 10, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices'}]}, u'hardware_manager_version': {u'generic_hardware_manager': u'1.1'}}", error "None"; execute_clean_step: result "{u'clean_step': {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, u'clean_result': None}", error "None"{{(pid=3618) get_commands_status /opt/stack/ironic/ironic/drivers/modules/agent_client.py:118}}[00m
Feb 21 17:33:48 localhost ironic-conductor[3618]: DEBUG ironic.drivers.modules.agent_base_vendor Cleaning command status for node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a on step {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}: {u'command_error': None, u'command_status': u'SUCCEEDED', u'command_params': {u'node': {u'target_power_state': None, u'inspect_interface': u'irmc', u'raid_interface': u'no-raid', u'target_provision_state': u'available', u'last_error': None, u'storage_interface': u'cinder', u'updated_at': u'2018-02-21T08:33:48.143602', u'boot_interface': u'irmc-virtual-media', u'traits': [], u'chassis_id': None, u'provision_state': u'cleaning', u'clean_step': {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, u'id': 1, u'vendor_interface': u'no-vendor', u'uuid': u'2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a', u'console_enabled': False, u'extra': {}, u'raid_config': {}, u'provision_updated_at': u'2018-02-21T08:33:48.000000', u'maintenance': False, u'target_raid_config': {}, u'network_interface': u'flat', u'conductor_affinity': 1, u'inspection_started_at': None, u'inspection_finished_at': None, u'power_state': u'power on', u'driver': u'irmc', u'power_interface': u'irmc', u'rescue_interface': u'no-rescue', u'maintenance_reason': None, u'reservation': u'shiina-dev-real-ironic', u'management_interface': u'irmc', u'properties': {u'memory_mb': 1024, u'cpu_arch': u'x86_64', u'local_gb': 20, u'cpus': 1, u'capabilities': u'iscsi_boot:True'}, u'instance_uuid': None, u'name': u'mybm', u'driver_info': {u'irmc_address': u'192.0.2.254', u'irmc_username': u'admin', u'ipmi_terminal_port': 9999, u'irmc_deploy_iso': u'9beb1848-b1d0-49eb-bffa-0618ff8d9304', u'ipmi_username': u'admin', u'deploy_kernel': u'2f976ce5-fc47-41e8-bc45-d527dc3ad8b6', u'ipmi_address': u'192.0.2.254', u'deploy_ramdisk': u'badd21d6-7f23-459b-a734-0ff2e2304d1f', 
u'ipmi_password': u'******', u'irmc_password': u'******'}
Feb 21 17:33:48 localhost ironic-conductor[3618]: , u'resource_class': u'baremetal', u'created_at': u'2018-02-20T08:37:06.000000', u'deploy_interface': u'iscsi', u'console_interface': u'ipmitool-socat', u'driver_internal_info': {u'clean_step_index': 0, u'agent_cached_clean_steps_refreshed': u'2018-02-21 08:33:47.808626', u'agent_cached_clean_steps': {u'deploy': [{u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, {u'priority': 10, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices'}]}, u'clean_steps': [{u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}], u'hardware_manager_version': {u'generic_hardware_manager': u'1.1'}, u'is_whole_disk_image': False, u'agent_continue_if_ata_erase_failed': False, u'agent_erase_devices_iterations': 1, u'agent_erase_devices_zeroize': True, u'agent_url': u'http://192.0.3.14:9999', u'agent_version': u'3.0.0'}, u'instance_info': {}}, u'step': {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, u'ports': [{u'local_link_connection': {}, u'uuid': u'dcf5c46d-369e-4920-a136-ab84fd7ebc5e', u'extra': {}, u'pxe_enabled': True, u'created_at': u'2018-02-20T08:37:10.000000', u'portgroup_id': None, u'updated_at': u'2018-02-21T08:25:29.000000', u'node_id': 1, u'physical_network': None, u'address': u'90:1b:0e:0e:e9:d1', u'internal_info': {u'cleaning_vif_port_id': u'9baf377a-1352-47d3-b340-9b11fc5949d7', u'tenant_vif_port_id': u'894340b0-42f7-4673-a7f4-47ff8f6fc84b'}, u'id': 1}], u'clean_version': {u'generic_hardware_manager': u'1.1'}}, u'command_result': {u'clean_step': {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, u'clean_result': None}, u'id': u'2b6d159e-0811-4855-bd68-4b799ea34a8a', 
u'command_name': u'execute_clean_step'}{{(pid=3618) continue_cleaning /opt/stack/ironic/ironic/driv

Changed in ironic:
assignee: nobody → Hironori Shiina (shiina-hironori)

Fix proposed to branch: master
Review: https://review.openstack.org/546584

Changed in ironic:
status: New → In Progress
Dmitry Tantsur (divius) on 2018-02-21
Changed in ironic:
importance: Undecided → High
tags: added: cleaning queens-backport-potential

Fix proposed to branch: master
Review: https://review.openstack.org/546705

Changed in ironic:
assignee: Hironori Shiina (shiina-hironori) → Dmitry Tantsur (divius)
Ruby Loo (rloo) wrote :

Hironori, thanks for reporting this! After discussing, we feel that without locks, there will be a race condition. As such, we decided to revert the original patch that detaches VIFs without a lock: https://review.openstack.org/546705
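Why locking closes the race: if the detach path also has to acquire the node's exclusive lock, the two read-modify-write sequences are serialized, so the cleaning task can no longer save a stale copy of the port. A generic sketch of that idea (a threading.Lock standing in for Ironic's node reservation; not the project's actual code):

```python
import copy
import threading

node_lock = threading.Lock()  # stands in for the exclusive node lock
db_port = {"internal_info": {"tenant_vif_port_id": "894340b0"}}

def detach_vif():
    # With the revert, detach waits for the node lock instead of
    # modifying the port while another task holds it.
    with node_lock:
        db_port["internal_info"].pop("tenant_vif_port_id", None)

def add_cleaning_network():
    global db_port
    with node_lock:
        port = copy.deepcopy(db_port)  # read under the lock
        port["internal_info"]["cleaning_vif_port_id"] = "9baf377a"
        db_port = port                 # write under the same lock

t1 = threading.Thread(target=add_cleaning_network)
t2 = threading.Thread(target=detach_vif)
t1.start(); t2.start()
t1.join(); t2.join()

# Whichever thread runs first, the tenant VIF cannot reappear after
# detach completes, because each read-modify-write is atomic.
```

Either interleaving leaves the same final state: cleaning_vif_port_id present, tenant_vif_port_id gone.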

Reviewed: https://review.openstack.org/546705
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=08ed859ce2c2388e903fb42f7efad9f9b265b7c9
Submitter: Zuul
Branch: master

commit 08ed859ce2c2388e903fb42f7efad9f9b265b7c9
Author: Dmitry Tantsur <email address hidden>
Date: Wed Feb 21 17:21:59 2018 +0000

    Revert "Don't try to lock for vif detach"

    This is causing more serious issues, as there is a race
    between tenant VIF removal and cleaning VIF adding.

    This reverts commit 4f79cb3932f2518ab3f06b86ceea065cbb399e8c.

    The release note is not deleted from it, because the change has
    already been released. A new one is added instead.

    Change-Id: I922f24293645ff6bb79ad753f49dc9548b9f2485
    Closes-Bug: #1750785

Changed in ironic:
status: In Progress → Fix Released

Change abandoned by Hironori Shiina (<email address hidden>) on branch: master
Review: https://review.openstack.org/546584
Reason: As mentioned, this patch doesn't completely solve the problem. I agree to revert the original patch.

Thanks.

Reviewed: https://review.openstack.org/546719
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=37e7eb680990c354091f60862c787bb2ebd13f9d
Submitter: Zuul
Branch: stable/queens

commit 37e7eb680990c354091f60862c787bb2ebd13f9d
Author: Dmitry Tantsur <email address hidden>
Date: Wed Feb 21 17:21:59 2018 +0000

    Revert "Don't try to lock for vif detach"

    This is causing more serious issues, as there is a race
    between tenant VIF removal and cleaning VIF adding.

    This reverts commit 4f79cb3932f2518ab3f06b86ceea065cbb399e8c.

    The release note is not deleted from it, because the change has
    already been released. A new one is added instead.

    Change-Id: I922f24293645ff6bb79ad753f49dc9548b9f2485
    Closes-Bug: #1750785

tags: added: in-stable-queens

This issue was fixed in the openstack/ironic 10.1.1 release.

This issue was fixed in the openstack/ironic 11.0.0 release.
