Detached VIF may be restored during node cleaning

Bug #1750785 reported by Hironori Shiina
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ironic
Fix Released
High
Dmitry Tantsur

Bug Description

When a node is undeployed from nova, nova also detaches VIFs from the node.
The detached VIF UUIDs may be restored to port.internal_info during node cleaning.
Now, VIFs are detached without locking a node. A task for cleaning may rewrite VIF UUIDs to port when it stores port UUID for cleaning network because the tenant VIF UUIDs remains in port object.

This happens as follows:

1. node was locked for tear down

Feb 21 17:25:10 localhost ironic-conductor[3618]: DEBUG ironic.conductor.task_manager [None req-d8dddab5-562c-437d-8fbd-b0f559b570c2 None None] Node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a successfully reserved for node tear down (took 0.04 seconds) {{(pid=3618) reserve_node /opt/stack/ironic/ironic/conductor/task_manager.py:273}}

2. started adding cleaning network
Feb 21 17:25:21 localhost ironic-conductor[3618]: INFO ironic.drivers.modules.network.flat [None req-d8dddab5-562c-437d-8fbd-b0f559b570c2 None None] Adding cleaning network to node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a

3. VIF was detached by a request from Nova

Feb 21 17:25:22 localhost ironic-conductor[3618]: INFO ironic.conductor.manager [None req-bf75c88d-8dc0-487a-bba3-8bf0d6fd042f None None] VIF 894340b0-42f7-4673-a7f4-47ff8f6fc84b successfully detached from node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a

4. port for cleaning network was created
Feb 21 17:25:29 localhost ironic-conductor[3618]: INFO ironic.common.neutron [None req-d8dddab5-562c-437d-8fbd-b0f559b570c2 None None] Successfully created ports for node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a in network 4918b028-d39c-4bd1-9465-1c626cd85e5c.

5. Then, cleaning_vif_port_id was saved to port.internal_info.
   I guess tenant_vif_port_id which remained in port object was saved here.

6. lock was released.
Feb 21 17:26:20 localhost ironic-conductor[3618]: DEBUG ironic.conductor.task_manager [None req-d8dddab5-562c-437d-8fbd-b0f559b570c2 None None] Successfully released exclusive lock for node tear down on node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a (lock was held 69.98 sec) {{(pid=3618) release_resources /opt/stack/ironic/ironic/conductor/task_manager.py:352}}

7. during cleaning, tenant_vif_port_id was found in port.internal info.

Feb 21 17:33:48 localhost ironic-conductor[3618]: DEBUG ironic.drivers.modules.agent_client Status of agent commands for node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a: get_clean_steps: result "{u'clean_steps': {u'GenericHardwareManager': [{u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, {u'priority': 10, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices'}]}, u'hardware_manager_version': {u'generic_hardware_manager': u'1.1'}}", error "None"; execute_clean_step: result "{u'clean_step': {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, u'clean_result': None}", error "None"{{(pid=3618) get_commands_status /opt/stack/ironic/ironic/drivers/modules/agent_client.py:118}}[00m
Feb 21 17:33:48 localhost ironic-conductor[3618]: DEBUG ironic.drivers.modules.agent_base_vendor Cleaning command status for node 2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a on step {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}: {u'command_error': None, u'command_status': u'SUCCEEDED', u'command_params': {u'node': {u'target_power_state': None, u'inspect_interface': u'irmc', u'raid_interface': u'no-raid', u'target_provision_state': u'available', u'last_error': None, u'storage_interface': u'cinder', u'updated_at': u'2018-02-21T08:33:48.143602', u'boot_interface': u'irmc-virtual-media', u'traits': [], u'chassis_id': None, u'provision_state': u'cleaning', u'clean_step': {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, u'id': 1, u'vendor_interface': u'no-vendor', u'uuid': u'2f71bf62-8c8f-4dc3-b9b9-aaeb71ec1c2a', u'console_enabled': False, u'extra': {}, u'raid_config': {}, u'provision_updated_at': u'2018-02-21T08:33:48.000000', u'maintenance': False, u'target_raid_config': {}, u'network_interface': u'flat', u'conductor_affinity': 1, u'inspection_started_at': None, u'inspection_finished_at': None, u'power_state': u'power on', u'driver': u'irmc', u'power_interface': u'irmc', u'rescue_interface': u'no-rescue', u'maintenance_reason': None, u'reservation': u'shiina-dev-real-ironic', u'management_interface': u'irmc', u'properties': {u'memory_mb': 1024, u'cpu_arch': u'x86_64', u'local_gb': 20, u'cpus': 1, u'capabilities': u'iscsi_boot:True'}, u'instance_uuid': None, u'name': u'mybm', u'driver_info': {u'irmc_address': u'192.0.2.254', u'irmc_username': u'admin', u'ipmi_terminal_port': 9999, u'irmc_deploy_iso': u'9beb1848-b1d0-49eb-bffa-0618ff8d9304', u'ipmi_username': u'admin', u'deploy_kernel': u'2f976ce5-fc47-41e8-bc45-d527dc3ad8b6', u'ipmi_address': u'192.0.2.254', u'deploy_ramdisk': u'badd21d6-7f23-459b-a734-0ff2e2304d1f', u'ipmi_password': u'******', u'irmc_password': u'******'}
Feb 21 17:33:48 localhost ironic-conductor[3618]: , u'resource_class': u'baremetal', u'created_at': u'2018-02-20T08:37:06.000000', u'deploy_interface': u'iscsi', u'console_interface': u'ipmitool-socat', u'driver_internal_info': {u'clean_step_index': 0, u'agent_cached_clean_steps_refreshed': u'2018-02-21 08:33:47.808626', u'agent_cached_clean_steps': {u'deploy': [{u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, {u'priority': 10, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices'}]}, u'clean_steps': [{u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}], u'hardware_manager_version': {u'generic_hardware_manager': u'1.1'}, u'is_whole_disk_image': False, u'agent_continue_if_ata_erase_failed': False, u'agent_erase_devices_iterations': 1, u'agent_erase_devices_zeroize': True, u'agent_url': u'http://192.0.3.14:9999', u'agent_version': u'3.0.0'}, u'instance_info': {}}, u'step': {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, u'ports': [{u'local_link_connection': {}, u'uuid': u'dcf5c46d-369e-4920-a136-ab84fd7ebc5e', u'extra': {}, u'pxe_enabled': True, u'created_at': u'2018-02-20T08:37:10.000000', u'portgroup_id': None, u'updated_at': u'2018-02-21T08:25:29.000000', u'node_id': 1, u'physical_network': None, u'address': u'90:1b:0e:0e:e9:d1', u'internal_info': {u'cleaning_vif_port_id': u'9baf377a-1352-47d3-b340-9b11fc5949d7', u'tenant_vif_port_id': u'894340b0-42f7-4673-a7f4-47ff8f6fc84b'}, u'id': 1}], u'clean_version': {u'generic_hardware_manager': u'1.1'}}, u'command_result': {u'clean_step': {u'priority': 99, u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, u'step': u'erase_devices_metadata'}, u'clean_result': None}, u'id': u'2b6d159e-0811-4855-bd68-4b799ea34a8a', u'command_name': u'execute_clean_step'}{{(pid=3618) continue_cleaning /opt/stack/ironic/ironic/driv

Changed in ironic:
assignee: nobody → Hironori Shiina (shiina-hironori)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/546584

Changed in ironic:
status: New → In Progress
Revision history for this message
Hironori Shiina (shiina-hironori) wrote :
Dmitry Tantsur (divius)
Changed in ironic:
importance: Undecided → High
tags: added: cleaning queens-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/546705

Changed in ironic:
assignee: Hironori Shiina (shiina-hironori) → Dmitry Tantsur (divius)
Revision history for this message
Ruby Loo (rloo) wrote :

Hironori, thanks for reporting this! After discussing, we feel that without locks, there will be a race condition. As such, we decided to revert the original patch that detaches VIFs without a lock: https://review.openstack.org/546705

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/546719

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/546705
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=08ed859ce2c2388e903fb42f7efad9f9b265b7c9
Submitter: Zuul
Branch: master

commit 08ed859ce2c2388e903fb42f7efad9f9b265b7c9
Author: Dmitry Tantsur <email address hidden>
Date: Wed Feb 21 17:21:59 2018 +0000

    Revert "Don't try to lock for vif detach"

    This is causing more serious issues, as there is a race
    between tenant VIF removal and cleaning VIF adding.

    This reverts commit 4f79cb3932f2518ab3f06b86ceea065cbb399e8c.

    The release note is not deleted from it, because the change has
    already been released. A new one is added instead.

    Change-Id: I922f24293645ff6bb79ad753f49dc9548b9f2485
    Closes-Bug: #1750785

Changed in ironic:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Hironori Shiina (<email address hidden>) on branch: master
Review: https://review.openstack.org/546584
Reason: As mentioned, this patch doesn't completely solve the problem. I agree to revert the original patch.

Thanks.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (stable/queens)

Reviewed: https://review.openstack.org/546719
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=37e7eb680990c354091f60862c787bb2ebd13f9d
Submitter: Zuul
Branch: stable/queens

commit 37e7eb680990c354091f60862c787bb2ebd13f9d
Author: Dmitry Tantsur <email address hidden>
Date: Wed Feb 21 17:21:59 2018 +0000

    Revert "Don't try to lock for vif detach"

    This is causing more serious issues, as there is a race
    between tenant VIF removal and cleaning VIF adding.

    This reverts commit 4f79cb3932f2518ab3f06b86ceea065cbb399e8c.

    The release note is not deleted from it, because the change has
    already been released. A new one is added instead.

    Change-Id: I922f24293645ff6bb79ad753f49dc9548b9f2485
    Closes-Bug: #1750785

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic 10.1.1

This issue was fixed in the openstack/ironic 10.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic 11.0.0

This issue was fixed in the openstack/ironic 11.0.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.