Stale BDM records remain in the DB after n-api to n-cpu RPC timeouts during reserve_block_device_name

Bug #1844296 reported by Lee Yarwood
This bug affects 1 person
Affects                   Status       Importance  Assigned to  Milestone
OpenStack Compute (nova)  In Progress  Undecided   Lee Yarwood

Bug Description

Description
===========
When attaching a block device, n-api calls out to the n-cpu service hosting the instance to reserve a block device name via reserve_block_device_name. This call also creates the initial BDM record in the database. If the call hits an RPC timeout, which n-api sees as an exception, the BDM record already created by n-cpu can persist in the database as a stale record.

Steps to reproduce
==================

1. Attach a volume to an instance, ensuring any call to reserve_block_device_name takes longer than the configured RPC timeout within the environment.
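
The failure mode can be illustrated outside of nova with a small simulation. This is only a sketch under stated assumptions: the in-memory `bdm_table`, `fake_reserve_block_device_name` and `attach_volume` are hypothetical stand-ins for the database, the n-cpu handler and the n-api caller, and none of them are nova APIs.

```python
import concurrent.futures
import time

# Hypothetical in-memory stand-in for the BDM table in the database.
bdm_table = []


def fake_reserve_block_device_name(instance_uuid, volume_id, delay):
    """Stand-in for the n-cpu side: create the BDM record, then reply late."""
    bdm_table.append({"instance_uuid": instance_uuid, "volume_id": volume_id})
    time.sleep(delay)  # simulates a reply that arrives after the RPC timeout
    return "/dev/vdb"


def attach_volume(instance_uuid, volume_id, rpc_timeout, compute_delay):
    """Stand-in for the n-api side: wait for the reply, give up on timeout."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(
            fake_reserve_block_device_name,
            instance_uuid, volume_id, compute_delay)
        try:
            return future.result(timeout=rpc_timeout)
        except concurrent.futures.TimeoutError:
            # The caller only sees an exception; nothing removes the record.
            return None


if __name__ == "__main__":
    device = attach_volume("inst-1", "vol-1",
                           rpc_timeout=0.05, compute_delay=0.2)
    print("device:", device)
    print("records left behind:", bdm_table)
```

The caller gets no device name back, yet the record written by the simulated compute side is still in the table.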

Expected result
===============
The RPC timeout is hit and any BDM records created by the n-cpu service are removed by n-api.
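
One way the expected behaviour could look is sketched below. It assumes a list of dicts as the BDM table and uses hypothetical names (`RPCTimeout`, `reserve_with_cleanup`); it is not the change proposed for nova, only an illustration of "remove the record on timeout, then re-raise".

```python
class RPCTimeout(Exception):
    """Hypothetical stand-in for the timeout exception seen by n-api."""


def reserve_with_cleanup(reserve, bdm_table, instance_uuid, volume_id):
    """Call `reserve`; on timeout, drop any matching BDM records.

    `reserve` is any callable taking (instance_uuid, volume_id).
    `bdm_table` is a list of dicts standing in for the database table.
    """
    try:
        return reserve(instance_uuid, volume_id)
    except RPCTimeout:
        # The remote side may have written the record before the timeout,
        # so remove records for this instance/volume pair in place.
        bdm_table[:] = [
            bdm for bdm in bdm_table
            if not (bdm["instance_uuid"] == instance_uuid
                    and bdm["volume_id"] == volume_id)
        ]
        raise
```

Note that this sketch does not cover a remote side that writes the record after the cleanup has already run.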

Actual result
=============
The RPC timeout is hit but the BDM records persist.

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

   Master.

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

   N/A

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

   N/A

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

   N/A

Logs & Configs
==============

Invalid bdm record remains when reserve_block_device_name rpc call times out
https://bugzilla.redhat.com/show_bug.cgi?id=1752734

Lee Yarwood (lyarwood)
tags: added: compute
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/682594

Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: New → In Progress
Matt Riedemann (mriedem) wrote :

I could have sworn we already had a really old bug for this same thing that ndipanov opened.

Matt Riedemann (mriedem) wrote :

This looks like a duplicate of bug 1425352 though that wasn't the bug I was thinking of.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/692940

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/693537

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/693537
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1b63c7a83f4cd4e49aa0e6336dcc19d798b09f75
Submitter: Zuul
Branch: master

commit 1b63c7a83f4cd4e49aa0e6336dcc19d798b09f75
Author: Lee Yarwood <email address hidden>
Date: Fri Nov 8 12:26:01 2019 +0000

    compute: Use long_rpc_timeout in reserve_block_device_name

    Given the instance.uuid lock taken on the remote compute, calls to
    reserve_block_device_name can take a large amount of time to complete
    when attaching multiple volumes. To help avoid timeouts during such
    attempts this change switches to using the long_rpc_timeout for the
    overall timeout for each call.

    Related-Bug: #1844296
    Change-Id: I17e0e45117a3312c11d6c7f2762dd416b6067979
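
The reasoning in the commit message can be made concrete with a little arithmetic. The sketch below assumes that all calls are issued at the same moment and are serialised behind one per-instance lock, each taking the same time; the function names and the example figures (10 seconds per call, 60 and 1800 second timeouts) are illustrative assumptions, not values taken from this report.

```python
def completion_times(num_calls, seconds_per_call):
    """Calls serialised behind one lock finish at t, 2t, 3t, ... seconds."""
    return [seconds_per_call * (i + 1) for i in range(num_calls)]


def timed_out(num_calls, seconds_per_call, timeout):
    """Completion times of the calls whose reply arrives after `timeout`."""
    return [t for t in completion_times(num_calls, seconds_per_call)
            if t > timeout]


if __name__ == "__main__":
    # Ten simultaneous attachments, each holding the lock for 10 seconds.
    print(timed_out(10, 10, timeout=60))    # short timeout
    print(timed_out(10, 10, timeout=1800))  # long timeout
```

Under these assumptions the later callers wait for every earlier holder of the lock, so a short per-call timeout is exceeded even though no single call is slow.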

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https://review.opendev.org/682594

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/696953

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/696955

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/696956

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/696953
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f9a1bc71050b41753259b96998a2a4b4dc8ecd79
Submitter: Zuul
Branch: stable/train

commit f9a1bc71050b41753259b96998a2a4b4dc8ecd79
Author: Lee Yarwood <email address hidden>
Date: Fri Nov 8 12:26:01 2019 +0000

    compute: Use long_rpc_timeout in reserve_block_device_name

    Given the instance.uuid lock taken on the remote compute, calls to
    reserve_block_device_name can take a large amount of time to complete
    when attaching multiple volumes. To help avoid timeouts during such
    attempts this change switches to using the long_rpc_timeout for the
    overall timeout for each call.

    Conflicts:
            nova/conf/rpc.py

    NOTE(lyarwood): Conflicts due to the following not being present in
    stable/train. I9115ef6df59844cd6e702f19ba38ffbf9f8b35d3,
    I518ae675b7a67da64a5796e57e87860f0c3ef0db and
    If373fedb8d2e0dfc46b8ac5b018f8216aa5c643c.

    Related-Bug: #1844296
    Change-Id: I17e0e45117a3312c11d6c7f2762dd416b6067979
    (cherry picked from commit 648c05f7bee025087c2f9d8e2f9cda6e2c13e13f)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/stein)

Reviewed: https://review.opendev.org/696955
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e858ab3bbe79bde32cce4bc661c0c4edc04b51d3
Submitter: Zuul
Branch: stable/stein

commit e858ab3bbe79bde32cce4bc661c0c4edc04b51d3
Author: Lee Yarwood <email address hidden>
Date: Fri Nov 8 12:26:01 2019 +0000

    compute: Use long_rpc_timeout in reserve_block_device_name

    Given the instance.uuid lock taken on the remote compute, calls to
    reserve_block_device_name can take a large amount of time to complete
    when attaching multiple volumes. To help avoid timeouts during such
    attempts this change switches to using the long_rpc_timeout for the
    overall timeout for each call.

    Conflicts:
            nova/conf/rpc.py

    NOTE(lyarwood): Conflict due to
    If32bca070185937ef83f689b7163d965a89ec10a not being present in
    stable/stein.

    Related-Bug: #1844296
    Change-Id: I17e0e45117a3312c11d6c7f2762dd416b6067979
    (cherry picked from commit 648c05f7bee025087c2f9d8e2f9cda6e2c13e13f)
    (cherry picked from commit f9a1bc71050b41753259b96998a2a4b4dc8ecd79)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/696956
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6b03c8f589498a44c0713be761951ece534a325f
Submitter: Zuul
Branch: stable/rocky

commit 6b03c8f589498a44c0713be761951ece534a325f
Author: Lee Yarwood <email address hidden>
Date: Fri Nov 8 12:26:01 2019 +0000

    compute: Use long_rpc_timeout in reserve_block_device_name

    Given the instance.uuid lock taken on the remote compute, calls to
    reserve_block_device_name can take a large amount of time to complete
    when attaching multiple volumes. To help avoid timeouts during such
    attempts this change switches to using the long_rpc_timeout for the
    overall timeout for each call.

    Related-Bug: #1844296
    Change-Id: I17e0e45117a3312c11d6c7f2762dd416b6067979
    (cherry picked from commit 648c05f7bee025087c2f9d8e2f9cda6e2c13e13f)
    (cherry picked from commit f9a1bc71050b41753259b96998a2a4b4dc8ecd79)
    (cherry picked from commit e858ab3bbe79bde32cce4bc661c0c4edc04b51d3)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/692940
