Ensure only one manila-ganesha unit is running services in HA

Bug #1904623 reported by Corey Bryant
This bug affects 3 people
Affects                         Status        Importance  Assigned to   Milestone
OpenStack Manila-Ganesha Charm  Fix Released  High        Felipe Reyes
charm-interface-hacluster       Fix Released  High        Felipe Reyes

Bug Description

The manila-ganesha charm supports active/passive HA. This is done by co-locating manila-share and nfs-ganesha systemd services on the same unit, and ensuring that those services are only running on the master unit with the VIP. Once an HA deployment is complete, pacemaker will ensure that only one unit is running the services.

If a second unit starts the services while the first is connected to CephFS, the first will be evicted and its session state corrupted.

The manila-ganesha charm ensures that the services are disabled/stopped until the HA cluster setup is complete. It also overrides the service_(re)start methods so that they cooperate with pacemaker and leave all service starts under pacemaker's control. The goal is to prevent a second unit from starting services while the first is connected to CephFS.

Once HA setup is complete, pacemaker will ensure that only one unit has running services. However, there is still a risk that a systemd service is restarted manually by a user or other means.

Can the systemd unit file or ExecStart, ExecStartPre, ExecStartPost script be updated to better handle this scenario?

affects: charms.openstack → charm-manila-ganesha
Changed in charm-manila-ganesha:
status: New → Triaged
importance: Undecided → Wishlist
importance: Wishlist → Medium
Nobuto Murata (nobuto) wrote :

The duplicate bug [1] describes a slightly different scenario from the one quoted here:

> However, there is still a risk that a systemd service is restarted manually by a user or other means.

In the duplicate, the services are already running on multiple units right after the initial deployment. A workaround is to run the following command as a post-deployment task:
$ juju run --app=manila-ganesha "systemctl stop manila-share nfs-ganesha"
and double check the pacemaker status after that.

[1] https://bugs.launchpad.net/charm-manila-ganesha/+bug/1936455
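The double check mentioned above can be scripted. What follows is only a minimal sketch in Python: it assumes the `systemctl is-active` result for each service has already been collected per unit by some means (for example with `juju run`), and the function names and sample data are hypothetical rather than part of the charm.

```python
from typing import Dict, List

# The two services the charm co-locates on the active unit.
SERVICES = ("manila-share", "nfs-ganesha")


def active_units(states: Dict[str, Dict[str, str]], service: str) -> List[str]:
    """Return the sorted units on which `service` reports 'active'."""
    return sorted(
        unit for unit, services in states.items()
        if services.get(service) == "active"
    )


def find_violations(states: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each service to its active units when more than one unit runs it."""
    violations = {}
    for service in SERVICES:
        units = active_units(states, service)
        if len(units) > 1:
            violations[service] = units
    return violations


if __name__ == "__main__":
    # Hypothetical data: unit name -> {service: `systemctl is-active` result}.
    sample = {
        "manila-ganesha/0": {"manila-share": "active", "nfs-ganesha": "active"},
        "manila-ganesha/1": {"manila-share": "active", "nfs-ganesha": "inactive"},
        "manila-ganesha/2": {"manila-share": "inactive", "nfs-ganesha": "inactive"},
    }
    for service, units in find_violations(sample).items():
        print(f"{service} is active on {len(units)} units: {', '.join(units)}")
```

An empty result means each service is active on at most one unit, which is the state an active/passive deployment is expected to be in.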

Felipe Reyes (freyes) wrote :

Steps to reproduce:

1) juju deploy ./my-bundle.yaml # bundle -> http://paste.ubuntu.com/p/NDFW3qpKxc/
2) wait until all services are idle.
3) Unseal vault

Expected result: a single manila-share service is running
Actual result: all manila-ganesha units have the manila-share service running.

Detailed output on how to figure this out can be found at: https://pastebin.ubuntu.com/p/myrbqtFX9m/

The resource should be configured as follows so that pacemaker stops the service on the nodes where it shouldn't be running:

primitive res_manila_share_manila_share systemd:manila-share \
        meta migration-threshold=INFINITY failure-timeout=5s \
        op monitor interval=5s role=Started \
        op monitor interval=6s role=Stopped

^ Two monitors, one for the role Started and another for Stopped. They use slightly different intervals to avoid running at the same time when the cluster is brought up from a cold start. More details at https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html
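To make the rule concrete, here is a small Python sketch that renders such a primitive definition. It is not the interface-hacluster code: the function name, its parameters and the choice of "interval plus one second" for the Stopped monitor are assumptions made for illustration, mirroring the 5s/6s values shown above.

```python
def render_systemd_primitive(name: str, service: str,
                             clone: bool = False, interval: int = 5) -> str:
    """Render a crm 'primitive' definition for a systemd-managed service.

    For non-cloned (active/passive) resources a second monitor with
    role=Stopped is added, so the service is also checked on the nodes
    where it is not supposed to run.
    """
    parts = [
        f"primitive {name} systemd:{service}",
        "meta migration-threshold=INFINITY failure-timeout=5s",
        f"op monitor interval={interval}s role=Started",
    ]
    if not clone:
        # Use a slightly different interval so both monitors do not fire
        # at the same moment.
        parts.append(f"op monitor interval={interval + 1}s role=Stopped")
    # Join with a trailing backslash and an indented continuation line.
    return " \\\n        ".join(parts)


if __name__ == "__main__":
    print(render_systemd_primitive("res_manila_share_manila_share",
                                   "manila-share"))
```

Called with `clone=True`, the same function emits only the Started monitor, since a cloned service is expected to run on every node.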

Changed in charm-manila-ganesha:
assignee: nobody → Felipe Reyes (freyes)
Felipe Reyes (freyes) wrote :

Changing importance to 'high' since this can break the sessions.

Changed in charm-manila-ganesha:
importance: Medium → High
Felipe Reyes (freyes) wrote :

interface-hacluster doesn't expose a way to configure multiple monitors; the single monitor is always set up by the SystemdService class [0].

[0] https://opendev.org/openstack/charm-interface-hacluster/src/branch/master/interface_hacluster/common.py#L1004

Changed in charm-interface-hacluster:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Felipe Reyes (freyes)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-interface-hacluster (master)
Changed in charm-interface-hacluster:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-interface-hacluster (master)

Reviewed: https://review.opendev.org/c/openstack/charm-interface-hacluster/+/827912
Committed: https://opendev.org/openstack/charm-interface-hacluster/commit/5fc5216f51dcf98530d45e137d55fd94b39d150a
Submitter: "Zuul (22348)"
Branch: master

commit 5fc5216f51dcf98530d45e137d55fd94b39d150a
Author: Felipe Reyes <email address hidden>
Date: Fri Feb 4 14:49:03 2022 -0300

    Add monitor for stopped services when clone=False

    As the cluster is currently configured for services with clone=False,
    Pacemaker monitors only that the daemon is running on the node where
    it should, but takes no action if the same daemon is running (e.g.
    started manually by a sysadmin) on another node of the cluster. This
    becomes a problem for services that are expected to be configured in
    active/passive (e.g. manila-share).

    This change configures two monitors for services with clone=False, one
    that monitors the daemon is running where it should, and another one
    that monitors the daemon is not running where it shouldn't.

    primitive res_apache systemd:apache2 \
            ...
            op monitor interval=5s role=Started \
            op monitor interval=6s role=Stopped

    https://clusterlabs.org/pacemaker/doc/deprecated/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html#s-resource-monitoring

    Closes-Bug: #1904623
    Change-Id: I9e5383f5ab6b6967aa0f2318764519989a292227

Changed in charm-interface-hacluster:
status: In Progress → Fix Released
Pedro Castillo (peterctl) wrote :

Please backport this fix to the ussuri/edge channel.

Alex Kavanagh (ajkavanagh) wrote :

> Please backport this fix to the ussuri/edge channel.

Note that for the hacluster charm, this would be backported to the git branch stable/jammy (targeting the 2.4 track) first, and then to the git branch stable/focal (targeting the 2.0.3 track on charmhub).

Felipe: how big a change is this to consider/discount for a backport, please?

Felipe Reyes (freyes) wrote : Re: [Bug 1904623] Re: Ensure only one manila-ganesha unit is running services in HA

On Thu, 2022-07-07 at 09:24 +0000, Alex Kavanagh wrote:
> > Please backport this fix to the ussuri/edge channel.
>
> Note for the hacluster charm, this would be being backported to the git
> branch stable/jammy (targeting the 2.4 track) first, and the git branch
> stable/focal (targeting the 2.0.3 track on charmhub).
>
> Felipe: how big a change is this to consider/discount for a backport,
> please?
>
The fix is in the interface [0], which at the moment is branch-less, so we would
"only" need to update the lockfile in the manila-ganesha charm.

These are the commits that would go in if we update the build.lock file.

$ git log --oneline 8125a7baecccf9b0869e515b92300dde3a86f31b..HEAD
5fc5216 (HEAD -> master, origin/master, origin/HEAD, gerrit/master) Add monitor for stopped services when clone=False
2b714e9 Merge "Update relation data even if the new value is empty"
5451d82 Drop six.
56710fd Update relation data even if the new value is empty

[0] https://opendev.org/openstack/charm-interface-hacluster/commit/5fc5216f51dcf98530d45e137d55fd94b39d150a

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-manila-ganesha (stable/yoga)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-manila-ganesha (stable/xena)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-manila-ganesha (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852217

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-manila-ganesha (stable/21.10)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-manila-ganesha (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852219

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-manila-ganesha (stable/21.10)

Change abandoned by "Felipe Reyes <email address hidden>" on branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852218

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-manila-ganesha (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852220

Felipe Reyes (freyes) wrote :

I submitted this set of backports/updates to the build.lock in manila-ganesha:

https://review.opendev.org/q/topic:bug%252F1904623

Billy Olsen (billy-olsen) wrote :

FTR - I've reviewed the three patches that are part of the interface here, and I think all 3 of them are safe/okay to backport as part of this particular patch.

5fc5216 (HEAD -> master, origin/master, origin/HEAD, gerrit/master) Add monitor for stopped services when clone=False

^^ this is the one we really want

2b714e9 Merge "Update relation data even if the new value is empty"

^^ merge commit

5451d82 Drop six.

^^ trivial, and target branches are all python3

56710fd Update relation data even if the new value is empty

^^ is relatively small, though the semantics change: when all pacemaker resources are removed, they are also removed from the relation data so that the hacluster charm can see this. I think this is a perfectly acceptable backport and shouldn't actually trigger anything in this charm anyway, since we don't hit this condition.

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-manila-ganesha (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852215
Committed: https://opendev.org/openstack/charm-manila-ganesha/commit/b48a3630c0c801aeddfa53f797b50455a0f8552e
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit b48a3630c0c801aeddfa53f797b50455a0f8552e
Author: Felipe Reyes <email address hidden>
Date: Thu Aug 4 16:18:02 2022 -0400

    Update build.lock

    This change updates the build.lock to include the following commits from
    interface-hacluster:

    5fc5216 Add monitor for stopped services when clone=False

    Closes-Bug: #1904623
    Change-Id: I22217db6def5a565a54df10f45f9dbdb9b17a6db

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-manila-ganesha (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852217
Committed: https://opendev.org/openstack/charm-manila-ganesha/commit/b1c8cb66938bab7dde46cc4c07e7f221bafb3aba
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit b1c8cb66938bab7dde46cc4c07e7f221bafb3aba
Author: Felipe Reyes <email address hidden>
Date: Thu Aug 4 16:23:51 2022 -0400

    Update build.lock

    This change updates the build.lock to include the following commits from
    interface-hacluster:

    5fc5216 Add monitor for stopped services when clone=False
    5451d82 Drop six.
    56710fd Update relation data even if the new value is empty

    Closes-Bug: #1904623
    Closes-Bug: #1953623
    Change-Id: I7ca83f7732b5b5993eb0d92e604ed48e810437d6

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-manila-ganesha (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852216
Committed: https://opendev.org/openstack/charm-manila-ganesha/commit/23bb05682222c240a94951acc54ffb0d8b3d7304
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 23bb05682222c240a94951acc54ffb0d8b3d7304
Author: Felipe Reyes <email address hidden>
Date: Thu Aug 4 16:21:16 2022 -0400

    Update build.lock

    This change updates the build.lock to include the following commits from
    interface-hacluster:

    5fc5216 Add monitor for stopped services when clone=False
    5451d82 Drop six.
    56710fd Update relation data even if the new value is empty

    Closes-Bug: #1904623
    Closes-Bug: #1953623
    Change-Id: Iad74325687e6608c33eb5490c81112ff8555fc51

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-manila-ganesha (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852219
Committed: https://opendev.org/openstack/charm-manila-ganesha/commit/5d07b236358b030a39a3b52fdda03716197ada6d
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 5d07b236358b030a39a3b52fdda03716197ada6d
Author: Felipe Reyes <email address hidden>
Date: Thu Aug 4 16:26:40 2022 -0400

    Update build.lock

    This change updates the build.lock to include the following commits from
    interface-hacluster:

    5fc5216 Add monitor for stopped services when clone=False
    5451d82 Drop six.
    56710fd Update relation data even if the new value is empty

    Closes-Bug: #1904623
    Closes-Bug: #1953623
    Change-Id: Ifd03b8d81dea52b1dfdc81679d34f366116579ab

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-manila-ganesha (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/charm-manila-ganesha/+/852220
Committed: https://opendev.org/openstack/charm-manila-ganesha/commit/3e6fe07730099dd5005e091b523e0b73a6aa5d8c
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 3e6fe07730099dd5005e091b523e0b73a6aa5d8c
Author: Felipe Reyes <email address hidden>
Date: Thu Aug 4 16:33:02 2022 -0400

    Update build.lock

    This change updates the build.lock to include the following commits from
    interface-hacluster:

    5fc5216 Add monitor for stopped services when clone=False
    5451d82 Drop six.
    56710fd Update relation data even if the new value is empty

    Closes-Bug: #1904623
    Closes-Bug: #1953623
    Change-Id: Ib76c9190ca2c3beff53f3cb87c58a036eba38d6a

tags: added: in-stable-ussuri
Billy Olsen (billy-olsen) wrote :

This has been backported and is now included in the stable branches back to ussuri.

Changed in charm-manila-ganesha:
status: Triaged → Fix Released
status: Fix Released → Fix Committed
Edward Hope-Morley (hopem) wrote :

This is now all fix released.

Changed in charm-manila-ganesha:
status: Fix Committed → Fix Released