Shares cannot be unmounted/destroyed after migration tests

Bug #1903773 reported by Goutham Pacha Ravi
This bug affects 1 person
Affects: OpenStack Shared File Systems Service (Manila)
Status: Fix Released
Importance: High
Assigned to: Goutham Pacha Ravi
Milestone: (none listed)

Bug Description

Description
===========

We've had past issues where deleting shares that have been locally mounted (for example, when the manila-data service mounts a share to migrate its content) has been troublesome. One of the issues identified is that, when the Linux "make-rshared" mount option is set (it is set by default on most modern Linux systems), a mount is propagated to all namespaces (e.g. Linux containers, dnsmasq/ip namespaces). So, whenever we mount a share on the same system that owns the export, we should wait for the mount to be unshared across all the namespaces before we attempt to delete the share.
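As a rough illustration of the propagation state described above, the sketch below checks whether a mount point belongs to a shared peer group by reading the optional fields of a mountinfo-style line. The sample text, mount paths and function names are assumptions made for illustration; they are not taken from Manila.

```python
# Sample lines in /proc/<pid>/mountinfo format (values are illustrative).
SAMPLE_MOUNTINFO = (
    "36 35 98:0 / /mnt/share rw,noatime shared:7 - ext4 /dev/sdb1 rw\n"
    "37 35 98:1 / /mnt/private rw,noatime - ext4 /dev/sdc1 rw\n"
)


def propagation_tags(mountinfo_text):
    """Map each mount point to its optional propagation fields."""
    tags = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # not a well-formed mountinfo line
        separator = fields.index("-")
        # Field 5 is the mount point; optional fields sit between the
        # mount options (field 6) and the "-" separator.
        tags[fields[4]] = fields[6:separator]
    return tags


def is_shared(mountinfo_text, mount_point):
    """Return True if the mount point belongs to a shared peer group."""
    found = propagation_tags(mountinfo_text).get(mount_point, [])
    return any(tag.startswith("shared:") for tag in found)
```

On a live host the same functions could be fed the contents of /proc/self/mountinfo; a "shared:N" tag on the share's mount point suggests that mount and unmount events will be propagated to peer namespaces.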

Currently, when we run migration API tests, the tests create, mount, unmount and delete shares within seconds. This doesn't provide enough time for slower unshare operations to propagate across what may be tens of Linux namespaces.

When we tackled this issue in the past, we added a single retry with a one-second wait around the unmount operation in the ZFSOnLinux driver: https://bugs.launchpad.net/neutron/+bug/1546723

Migration tests have started failing on Focal Fossa (Ubuntu 20.04 LTS) systems because this retry is insufficient; the error observed on manila-share is captured here: http://paste.openstack.org/show/799892/ (Complete log file attached to this bug)
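To make the idea of a more patient retry concrete, here is a minimal sketch of an unmount wrapper that retries several times with a growing wait. It is not the code from the proposed fix: the function names, the attempt count, the wait values and the use of subprocess are all assumptions chosen for illustration.

```python
import subprocess
import time


def unmount(mount_path):
    """Run `umount`; raises CalledProcessError if it exits non-zero."""
    subprocess.run(["umount", mount_path], check=True, capture_output=True)


def unmount_with_retry(mount_path, attempts=5, wait=1.0, backoff=2.0,
                       unmount_fn=unmount, sleep_fn=time.sleep):
    """Try to unmount up to `attempts` times; return the attempt that worked.

    The wait between attempts starts at `wait` seconds and is multiplied
    by `backoff` after every failure. The last failure is re-raised.
    """
    delay = wait
    for attempt in range(1, attempts + 1):
        try:
            unmount_fn(mount_path)
            return attempt
        except subprocess.CalledProcessError:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            sleep_fn(delay)
            delay *= backoff
```

The unmount and sleep callables are injectable only so the behaviour can be exercised without touching real mounts; with the defaults shown, a caller would wait 1, 2, 4 and 8 seconds between the five attempts.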

Steps to reproduce
==================

A chronological list of steps which will help reproduce the issue you hit:
* Set up the ZFSOnLinux driver on Ubuntu 20.04 LTS (Focal Fossa)
* Run the host assisted migration tempest test: manila_tempest_tests.tests.api.admin.test_migration.MigrationTwoPhaseNFSTest.test_migration_2phase.*True

It's possible to simulate this without the tempest test using manual share migration commands, but you've got to type really fast and attempt to delete the share as soon as the share migration has completed :D

Expected result
===============
No share deletion errors

Actual result
=============
Share fails to be deleted (status gets set to "error_deleting")

Environment
===========
1. Exact version of OpenStack Manila you are running: trunk

2. Which storage backend did you use? ZFSOnLinux

3. Which networking type did you use? Neutron/ML2-OVS

Goutham Pacha Ravi (gouthamr) wrote :
Changed in manila:
importance: Undecided → High
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (master)

Fix proposed to branch: master
Review: https://review.opendev.org/762212

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/762468

OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (master)

Reviewed: https://review.opendev.org/762212
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=e3fea14788471cca01c8219e1ef638007e64c97f
Submitter: Zuul
Branch: master

commit e3fea14788471cca01c8219e1ef638007e64c97f
Author: Goutham Pacha Ravi <email address hidden>
Date: Tue Nov 10 13:31:38 2020 -0800

    Retry unmount operation on the ZFSOnLinux driver

    When a share is mounted on the same host as the manila-share
    process, zfs prevents us from destroying the underlying
    dataset until the share has been cleanly unmounted from
    the host. Kernel mounts can take a few seconds to get
    unmounted fully especially when there are a lot of
    linux namespaces that the mountpoint has been shared to.

    Add a retry on these operations to harden the deletion
    process and prevent spurious failures.

    Change-Id: I4aba76b72df274d0a8cb90fe0ab8799523c260ef
    Closes-Bug: #1903773
    Related-Bug: #1896672
    Signed-off-by: Goutham Pacha Ravi <email address hidden>

Changed in manila:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/762468
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=8a691d8631b9fdb18996566e0706e05776fa0ec4
Submitter: Zuul
Branch: master

commit 8a691d8631b9fdb18996566e0706e05776fa0ec4
Author: Goutham Pacha Ravi <email address hidden>
Date: Wed Nov 11 23:25:32 2020 -0800

    Retry unmount operation on the LVM driver

    When a share is mounted on the same host as the manila-share
    process, the kernel prevents us from destroying the
    mount directory until the share has been cleanly unmounted
    from the host. Kernel mounts can take a few seconds to get
    unmounted fully especially when there are a lot of
    linux namespaces that the mountpoint has been shared to.

    Add a retry on these operations to harden the deletion
    process and prevent spurious failures.

    Change-Id: I3c1a2ec19d6bc18638db0875519ce60f2c89f33a
    Closes-Bug: #1903773
    Related-Bug: #1896672
    Signed-off-by: Goutham Pacha Ravi <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Related fix merged to manila-tempest-plugin (master)

Reviewed: https://review.opendev.org/762219
Committed: https://git.openstack.org/cgit/openstack/manila-tempest-plugin/commit/?id=1214b82277f3a1e438231de7094453ce8d9a8cd9
Submitter: Zuul
Branch: master

commit 1214b82277f3a1e438231de7094453ce8d9a8cd9
Author: Goutham Pacha Ravi <email address hidden>
Date: Tue Nov 10 15:26:44 2020 -0800

    [ci] Switch lvm driver job to focal fossa

    Depends-On: https://review.opendev.org/762468/
    Change-Id: I15fe5a95037a31d313450a5a4c1ca394f0c259b6
    Partial-Bug: #1896672
    Related-Bug: #1903773
    Signed-off-by: Goutham Pacha Ravi <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/763060

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/763061

OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (stable/victoria)

Reviewed: https://review.opendev.org/763061
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=bd19642dfd69af3c63dd4d3711d25c353eed4ab6
Submitter: Zuul
Branch: stable/victoria

commit bd19642dfd69af3c63dd4d3711d25c353eed4ab6
Author: Goutham Pacha Ravi <email address hidden>
Date: Wed Nov 11 23:25:32 2020 -0800

    Retry unmount operation on the LVM driver

    When a share is mounted on the same host as the manila-share
    process, the kernel prevents us from destroying the
    mount directory until the share has been cleanly unmounted
    from the host. Kernel mounts can take a few seconds to get
    unmounted fully especially when there are a lot of
    linux namespaces that the mountpoint has been shared to.

    Add a retry on these operations to harden the deletion
    process and prevent spurious failures.

    Change-Id: I3c1a2ec19d6bc18638db0875519ce60f2c89f33a
    Closes-Bug: #1903773
    Related-Bug: #1896672
    Signed-off-by: Goutham Pacha Ravi <email address hidden>
    (cherry picked from commit 8a691d8631b9fdb18996566e0706e05776fa0ec4)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/manila 11.0.1

This issue was fixed in the openstack/manila 11.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/manila 12.0.0.0rc1

This issue was fixed in the openstack/manila 12.0.0.0rc1 release candidate.
