Add fence_aws fencing from v4.6.0 agent to Bionic

Bug #1894323 reported by Rafael David Tinoco
This bug affects 1 person
Affects                Status        Importance  Assigned to  Milestone
fence-agents (Ubuntu)  Fix Released  Undecided   Unassigned
Bionic                 Fix Released  Wishlist    Unassigned
Focal                  Won't Fix     Wishlist    Unassigned

Bug Description

SRU reviewer: I initially prepared a complex backport for Focal, with the idea of introducing a better version of fence_aws in Bionic, but I abandoned that idea and created a simple one-patch fence_aws backport to Bionic instead (so Bionic is like Focal, rather than both being like Groovy).

[Impact]

 * Ubuntu Bionic currently does not have fence_aws available, and it is needed for a fully working HA solution in an AWS environment.

 * fence_aws from Focal fence-agents (4.5.2-1) is missing some fixes made between the Focal and Groovy versions. Because of that, I initially opted to bring all the fixes from version 4.6.0 to Focal and backport this same version to Bionic (so that Bionic and Focal would be at the same level).

 * After the MR reviews, and thinking about the SRU review, I agreed to minimize this change: make the Ubuntu Bionic fence_aws agent just like Focal's, and work on any needed Focal fix for fence_aws separately (as long as there is a test case for it).

[Test Case]

 * Provision 3 nodes in AWS with Ubuntu Focal (and Ubuntu Bionic) and configure the cluster, adding the following primitive as a fencing resource:

primitive fence-focal stonith:fence_aws \
    params access_key="xxxx" secret_key="yyyy" region="us-east-1" \
    pcmk_host_map="focal01:i-034dc89cca4310b03;focal02:i-0a160b14b40f1330a;focal03:i-03b6976ab0a7f377c"

and the cluster cib options:

property cib-bootstrap-options: \
    have-watchdog=false \
    cluster-infrastructure=corosync \
    stonith-enabled=on \
    stonith-action=reboot \
    no-quorum-policy=stop \
    cluster-name=bionic

 * After that, you can remove the interconnect of one of the nodes and watch the cluster shut down the node that was disconnected from the cluster ring.
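
The pcmk_host_map value used in the primitive above is a semicolon-separated list of node:instance-id pairs. A minimal Python sketch for assembling it from a mapping follows. The function name build_host_map and the instance-ID pattern are assumptions made for illustration; they are not part of fence-agents or pacemaker.

```python
import re

# Assumption: EC2 instance IDs are "i-" followed by 8 or 17 hex digits.
INSTANCE_ID = re.compile(r"^i-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def build_host_map(nodes):
    """Join {node_name: instance_id} into the 'name:id;name:id' form.

    Hypothetical helper: it only formats a string and validates the IDs,
    it does not talk to AWS or to the cluster.
    """
    for name, instance_id in nodes.items():
        if not INSTANCE_ID.match(instance_id):
            raise ValueError(f"{name}: unexpected instance ID {instance_id!r}")
    return ";".join(f"{name}:{iid}" for name, iid in nodes.items())


if __name__ == "__main__":
    nodes = {
        "focal01": "i-034dc89cca4310b03",
        "focal02": "i-0a160b14b40f1330a",
        "focal03": "i-03b6976ab0a7f377c",
    }
    # The result is what goes inside pcmk_host_map="..." in the primitive.
    print(build_host_map(nodes))
```

Validating the IDs before pasting them into the CIB is only a convenience; a typo in an instance ID would otherwise surface only when a fence action is attempted.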

[Regression Potential]

 * Bionic-wise:

   - Same as Focal, but the situation here is even better since there is no existing fence_aws agent. The biggest problem here could be introducing something that does not fully work (which is technically not a regression).

[Other Info]

This is a request from AWS to backport the existing fence_aws agent into Bionic.

Currently the fence-agents version in Ubuntu Bionic is:

4.0.25-2ubuntu1

and the new fence_aws agent was first introduced in:

$ git tag --contains a3f45322 | head -1
v4.1.0


Changed in fence-agents (Ubuntu Focal):
status: New → Fix Released
Changed in fence-agents (Ubuntu Bionic):
status: New → Confirmed
Changed in fence-agents (Ubuntu):
status: New → Fix Released
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Looks like the ONLY fence_aws commit present in Ubuntu Focal and Groovy is the initial one (a3f45322, the last entry below):

$ for commit in $(git log --no-merges --grep fence_aws --pretty='format:%h'); do echo -n $(git describe $commit); echo " - " $(git tag --contains $commit | head -1) ; done
v4.5.2-50-g50772024 - v4.6.0
v4.5.2-49-gbe206158 - v4.6.0
v4.5.2-26-g9758f8c8 - v4.6.0
v4.5.2-25-g3f5676a7 - v4.6.0
v4.5.2-21-g1c2f791b - v4.6.0
v4.5.2-12-g7ac16fb2 - v4.6.0
v4.0.25-95-ga3f45322 - v4.1.0

Okay, so here we see that Groovy should definitely, together with LP: #1889070, have an FFe for the recently released v4.6.0. Then I have to backport the fence agent fixes to Focal. Then I have to backport v4.0.25-95-ga3f45322 AND the fixes to Bionic.
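
To make the reasoning above explicit, the listing can be filtered by the upstream version a release ships. This is a rough Python sketch under stated assumptions: the function names and the line pattern are made up for illustration, and the input is simply the text printed by the shell loop above.

```python
import re

# Matches lines such as "v4.5.2-50-g50772024 - v4.6.0":
# <nearest tag>-<commits ahead>-g<short hash> - <first tag containing it>
LINE = re.compile(
    r"^(?P<base>v[\d.]+)-(?P<ahead>\d+)-g(?P<sha>[0-9a-f]+) - (?P<first_tag>v[\d.]+)$"
)


def version_key(tag):
    """Turn 'v4.5.2' into (4, 5, 2) so tags compare numerically."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def missing_commits(lines, packaged):
    """Return short hashes of commits first released after `packaged`."""
    missing = []
    for line in lines:
        match = LINE.match(line.strip())
        if match and version_key(match["first_tag"]) > version_key(packaged):
            missing.append(match["sha"])
    return missing


LISTING = """\
v4.5.2-50-g50772024 - v4.6.0
v4.5.2-49-gbe206158 - v4.6.0
v4.5.2-26-g9758f8c8 - v4.6.0
v4.5.2-25-g3f5676a7 - v4.6.0
v4.5.2-21-g1c2f791b - v4.6.0
v4.5.2-12-g7ac16fb2 - v4.6.0
v4.0.25-95-ga3f45322 - v4.1.0
""".splitlines()

if __name__ == "__main__":
    print("Focal (v4.5.2) lacks:", missing_commits(LISTING, "v4.5.2"))
    print("Bionic (v4.0.25) lacks:", missing_commits(LISTING, "v4.0.25"))
```

With the listing above, a v4.5.2-based package lacks the six fixes first released in v4.6.0, and a v4.0.25-based package lacks those plus the commit that introduced the agent.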

Changed in fence-agents (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I'm marking Focal as confirmed because it can catch up with the fixes made in v4.6.0, so that Bionic and Focal can have the same version/features for fence_aws.

summary: - Add fence_aws fencing agent to Bionic (4.0.25-2ubuntu1)
+ Add fence_aws fencing from v4.6.0 agent to Bionic
Changed in fence-agents (Ubuntu Focal):
status: Fix Released → Confirmed
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

In bug:

https://bugs.launchpad.net/ubuntu/+source/fence-agents/+bug/1894325

I have synced v4.6.0-1 recently from Debian, and it includes fence_aws and fence_ibmz:

- fence_aws agent being backported to Bionic (LP: #1894323)
- Add LPAR fence agent to Pacemaker (LP: #1889070)

So in this bug we should backport fence_aws to Focal and Bionic, if possible.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I see that AWS has 2 fencing mechanisms:

- fence_aws using boto3 library (in fence-agents)
- fence_ec2 (in cluster-glue)

Bug:

https://bugs.launchpad.net/ubuntu/+source/cluster-glue/+bug/1895355

has brought fence_ec2 support to our cluster-glue package by backporting needed patches. Perhaps that should also be checked for backport (backporting fence_ec2 to Focal and Bionic if possible).

I opened the following bug for this:

https://bugs.launchpad.net/ubuntu/+source/cluster-glue/+bug/1896696

Changed in fence-agents (Ubuntu Bionic):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
importance: Undecided → Wishlist
Changed in fence-agents (Ubuntu Focal):
importance: Undecided → Wishlist
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I'm providing a PPA containing a full v4.6.0 fence_aws backport to Ubuntu Focal fence-agents package at:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1894323-focal

with source at:

https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/fence-agents/+git/fence-agents/+ref/lp1894323-focal-v4.6.0-backport/

----

Because of the nature of fence-agents, where they're practically isolated scripts in the form of a metadata <-> python script pair, I think it would be okay to SRU a particular agent to the latest Ubuntu version. All regression risk is confined to the agent itself and it would be easy to fix/revert if ever needed (without jeopardizing those not relying on the agent).

For example, Groovy has fence-agents v4.6.0... containing all the patches needed for a good fence_aws support. I'm backporting all fixes from Groovy to Focal in this PPA.

Focal had only the initial fence_aws patch, but missing all the fixes from v4.5.2 to v4.6.0. Bionic is missing all commits, including the fence_aws agent.

I'm going to backport everything, the agent and its fixes, to Bionic. However, it would be weird to have a more up-to-date agent in Bionic (which, lacking the agent, could get it introduced with all fixes at once) than in Focal. That is why this SRU tries to keep both Focal and Bionic at the same code level for fence_aws.

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

I'm also providing a PPA containing a full v4.6.0 fence_aws backport to Ubuntu Bionic fence-agents package at:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1894323-bionic

with source at:

https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/fence-agents/+git/fence-agents/+ref/lp1894323-bionic-v4.6.0-backport/

----

As explained in the previous comment, the Bionic fence-agents package does not have fence_aws, so I have backported not only the fence_aws agent inclusion from Focal (just 1 commit), but also all the fixes up to Groovy (several fixes were made between Focal and Groovy).

I'm going to test both and ask for an SRU exception, considering this an "enablement".

Changed in fence-agents (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in fence-agents (Ubuntu Focal):
status: Confirmed → In Progress
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Everything is good and ready, but I have discovered a small issue in the Bionic backport. It relates to:

https://access.redhat.com/solutions/4642491

The story is this: when declaring the fence_aws primitive, you can either declare it as a single resource and describe the pcmk_host_map... OR you can declare one fence resource PER NODE doing the exact same thing BUT using the "plug/port" resource argument. The thing is... in Focal, both methods work but in Bionic, the second method does not work. It is not a big deal as there are some fence agents designed to work with "pcmk_host_map" only, and some others are designed to work with "plug/port" argument... but I have opened the bug:

https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1900374

to deal with this at a later moment (might require pacemaker bisecting, etc.).
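
For reference, the second method (one fence resource per node, using the "plug" argument) can be generated mechanically. The Python sketch below only renders crm shell text in the same shape as the configurations shown in this bug; the function name per_node_primitives is hypothetical, the keys are placeholders, and the location constraints (keeping each fence resource off the node it fences) are an assumption about how such per-node resources are usually placed.

```python
def per_node_primitives(nodes, region):
    """Render one fence_aws primitive per node, each with its own plug.

    `nodes` maps node names to EC2 instance IDs. The access and secret
    keys are left as "xxxx" placeholders on purpose.
    """
    lines = []
    for name, instance_id in nodes.items():
        lines.append(f"primitive fence-{name} stonith:fence_aws \\")
        lines.append(
            f' params access_key=xxxx secret_key="xxxx" '
            f"region={region} plug={instance_id}"
        )
        # Assumption: a node should not run the resource that fences itself.
        lines.append(f"location l-fence-{name} fence-{name} -inf: {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(per_node_primitives({"focal01": "i-abcdefgh"}, "us-east-1"))
```

As noted above, this per-node form works in Focal but not in Bionic, so the output is only directly usable on Focal.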

For this SRU, once it is complete, the correct way of declaring the fence_aws resource is:

# focal

node 1: focal01
node 2: focal02
node 3: focal03

primitive fence-focal stonith:fence_aws \
 params access_key=xxxxxxxx secret_key="xxxxxxxx" region=us-east-1 pcmk_host_map="focal01:i-abcdefgh;focal02:i-ijlmnop;focal03:i-qrstuvxz"

property cib-bootstrap-options: \
 have-watchdog=false \
 dc-version=2.0.3-4b1f869f0f \
 cluster-infrastructure=corosync \
 stonith-enabled=on \
 stonith-action=reboot \
 no-quorum-policy=stop \
 cluster-name=focal

# bionic

node 1: bionic01
node 2: bionic02
node 3: bionic03

primitive fence-bionic stonith:fence_aws \
 params access_key=xxxxxxxx secret_key="xxxxxxxx" region=us-east-1 pcmk_host_map="bionic01:i-abcdefgh;bionic02:i-ijlmnop;bionic03:i-qrstuvxz"

property cib-bootstrap-options: \
    have-watchdog=false \
    dc-version=1.1.18-2b07d5c5a9 \
    cluster-infrastructure=corosync \
    stonith-enabled=on \
    stonith-action=reboot \
    no-quorum-policy=stop \
    cluster-name=bionic

description: updated
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

To help with the SRU review, I'm adding here the discussion taken from the merge review. Basically, I would like to try backporting all fixes for fence_aws into Focal and adding that same agent version to Bionic (like a minor SRU exception). Instead of relying on the test case results, I would like to rely on my functional/regression tests for a pacemaker cluster configured in AWS with this new agent (if possible).

---- Discussion with @bryce from the Ubuntu Server Team:

> I also read through each of the patches to understand what they do, and make sure the changes look safe, which they indeed do. One thought I had though is that the common theme in the patches is improvements to debug/logging output; the SRU team sometimes demurs over debug/logging changes as less important than actual bug fixes. At least you'll want to include good justification on this in the SRU text.

My justification to that is that Bionic does not have anything and I really would like Focal to be "as good as Focal", instead of adding something better in Bionic just because it did not have anything. Or even adding something not as good as Groovy just because of formal reasons.

> Commit 1c2f791b changes the cli option behavior, which is akin to an API change. I.e. before if you passed --region but not --access-key or --secret-key it would ignore --region and use configured values, with this change you can specify just --region and the keys will come from the config file. This feels more like a behavioral change than a bug fix, so I might anticipate some pushback from the SRU team on this.

Yes, this was per AWS request, and it follows the same idea as the previous justification. This change allows one not to explicitly put the access or secret keys in the cluster CIB file (so it's also more secure).
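
The behavioral change discussed for commit 1c2f791b can be modelled as two small functions. This is NOT the agent's code: it is a Python sketch of the behavior as described in the review comment, and the function names are hypothetical. The returned dictionaries use the keyword names accepted by boto3.session.Session (region_name, aws_access_key_id, aws_secret_access_key); an empty dictionary stands for "let boto3 read everything from its configuration files".

```python
def session_kwargs_before(region=None, access_key=None, secret_key=None):
    """Model of the old behavior, as described in the review.

    Assumption: explicit values are honoured only when all three are
    given; otherwise --region is ignored and configured values are used.
    """
    if region and access_key and secret_key:
        return {
            "region_name": region,
            "aws_access_key_id": access_key,
            "aws_secret_access_key": secret_key,
        }
    return {}


def session_kwargs_after(region=None, access_key=None, secret_key=None):
    """Model of the new behavior: --region alone is honoured.

    The keys are passed only when both are given; otherwise they come
    from the configuration file.
    """
    kwargs = {}
    if region:
        kwargs["region_name"] = region
    if access_key and secret_key:
        kwargs["aws_access_key_id"] = access_key
        kwargs["aws_secret_access_key"] = secret_key
    return kwargs


if __name__ == "__main__":
    print(session_kwargs_before(region="us-east-1"))
    print(session_kwargs_after(region="us-east-1"))
```

Under the second model a primitive can carry only the region, with the credentials kept in a file outside the CIB, which is the security benefit mentioned above.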

> In terms of SRU, I notice there are not (upstream|downstream) bug reports associated with the patches, which may make one wonder if these fix actual defects encountered in the wild, or are more like clarification/refactoring.

> I understand the logic of since the scripts don't exist in bionic to bring the current versions rather so as to have the most up to date code. But as you mention this then leaves a weird situation and having to pull delta into focal that otherwise might not be needed.

> Did you consider pulling the fence_aws from focal rather than the one in groovy? (And then cherrypicking the most relevant bug fixes from groovy, like the encoding fix and/or the race fix?)

Yes, I did. Unfortunately it's an SRU philosophical question. I'm considering fence_aws here as confined code that is mostly supported by AWS themselves. I can go in that direction, but I feel it is not the best for our user base.

> Alternatively, if you definitely do want to backport the whole stack, did you consider filing for an SRU exception for this package? If it really is important to keep the scripts identical on all LTS's that might be a better long term approach.

That would be a no-go because of the agents' metadata and pacemaker. Pacemaker should be able to handle older and newer fence-agents packages, but that is not as good as "compatible with all further versions".

I...


description: updated
Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

fence-agents (4.0.25-2ubuntu1.2) bionic; urgency=medium

  * fence_aws backport from Focal (LP: #1894323):
    + d/p/lp1894323-01-fence_aws-new-agent.patch

 -- Rafael David Tinoco <email address hidden> Thu, 22 Oct 2020 04:47:00 +0000

----

[rafaeldtinoco@bionic fence-agents]$ git ubuntu tag --upload

[rafaeldtinoco@bionic fence-agents]$ git describe
upload/4.0.25-2ubuntu1.2

[rafaeldtinoco@bionic fence-agents]$ git push pkg upload/4.0.25-2ubuntu1.2
Counting objects: 15, done.
Delta compression using up to 24 threads.
Compressing objects: 100% (15/15), done.
Writing objects: 100% (15/15), 4.49 KiB | 460.00 KiB/s, done.
Total 15 (delta 10), reused 0 (delta 0)
To ssh://git.launchpad.net/ubuntu/+source/fence-agents
 * [new tag] upload/4.0.25-2ubuntu1.2 -> upload/4.0.25-2ubuntu1.2

[rafaeldtinoco@bionic ubuntu]$ debdiff fence-agents_4.0.25-2ubuntu1.1.dsc fence-agents_4.0.25-2ubuntu1.2.dsc | diffstat
 changelog | 7 +
 control | 5 -
 patches/lp1894323-01-fence_aws-new-agent.patch | 286 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 patches/series | 1
 4 files changed, 298 insertions(+), 1 deletion(-)

[rafaeldtinoco@bionic ubuntu]$ dput ubuntu fence-agents_4.0.25-2ubuntu1.2_source.changes
Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading fence-agents_4.0.25-2ubuntu1.2.dsc: done.
  Uploading fence-agents_4.0.25-2ubuntu1.2.debian.tar.xz: done.
  Uploading fence-agents_4.0.25-2ubuntu1.2_source.buildinfo: done.
  Uploading fence-agents_4.0.25-2ubuntu1.2_source.changes: done.
Successfully uploaded packages.

----

Note: the fence_aws agent primitive should be declared as:

primitive fence-bionic stonith:fence_aws \
 params access_key=yyyy secret_key="xxxx" region=us-east-1 pcmk_host_map="bionic01:i-068e134de1beddc7f;bionic02:i-0136eddd045ceb7e2;bionic03:i-0de279ab4e6d642c8"

and cluster properties as:

have-watchdog=false \
dc-version=1.1.18-2b07d5c5a9 \
cluster-infrastructure=corosync \
stonith-enabled=on \
stonith-action=reboot \
no-quorum-policy=stop \
cluster-name=bionic

crm configure might complain that you did not specify the "plug" argument. You can safely ignore that, as this fence agent does not require the plug argument (and this pacemaker version has an issue when plug is given, see comment #11 for more information).

Changed in fence-agents (Ubuntu Focal):
status: In Progress → Won't Fix
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Rafael, or anyone else affected,

Accepted fence-agents into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/fence-agents/4.0.25-2ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in fence-agents (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

I verified the test case using the package available in bionic-proposed and I confirm it is working as expected. I set up a 3-node cluster on AWS to test this.

Note: When installing fence-agents also install the Suggested dependencies, otherwise the 'fence_aws' command will not work.

ubuntu@node1:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
ubuntu@node1:~$ dpkg -l | grep fence-agents
ii fence-agents 4.0.25-2ubuntu1.2 amd64 Fence Agents for Red Hat Cluster
ubuntu@node1:~$ sudo crm configure show
node 1: node1
node 2: node2
node 3: node3
primitive fence-node1 stonith:fence_aws \
 params access_key=xxxx secret_key="xxxx" region=us-east-2 plug=i-093f875f9f2ffa1db pcmk_host_map="node1:i-093f875f9f2ffa1db;node2:i-08649fdfb0a74bc9f;node3:i-0394f790feeba28b0"
primitive fence-node2 stonith:fence_aws \
 params access_key=xxxx secret_key="xxxx" region=us-east-2 plug=i-08649fdfb0a74bc9f pcmk_host_map="node1:i-093f875f9f2ffa1db;node2:i-08649fdfb0a74bc9f;node3:i-0394f790feeba28b0"
primitive fence-node3 stonith:fence_aws \
 params access_key=xxxx secret_key="xxxx" region=us-east-2 plug=i-0394f790feeba28b0 pcmk_host_map="node1:i-093f875f9f2ffa1db;node2:i-08649fdfb0a74bc9f;node3:i-0394f790feeba28b0"
location l-fence-node1 fence-node1 -inf: node1
location l-fence-node2 fence-node2 -inf: node2
location l-fence-node3 fence-node3 -inf: node3
property cib-bootstrap-options: \
 have-watchdog=false \
 dc-version=1.1.18-2b07d5c5a9 \
 cluster-infrastructure=corosync \
 cluster-name=clubionic \
 stonith-enabled=on \
 stonith-action=reboot \
 no-quorum-policy=stop

If I go to node2 and run the following command to reject connections on the network interface in use, the node is properly fenced (in this case rebooted):

ubuntu@node2:~$ sudo iptables -A INPUT -i eth0 -j REJECT

After a few minutes, node2 comes back online.

I also tested it without pacemaker in a standalone mode. I ran the following command to do that:

ubuntu@node3:~$ sudo fence_aws --plug=<instance-id> --action=reboot --region=us-east-2 --access-key="xxx" --secret-key="xxx" --verbose
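
The standalone invocation above can also be scripted. Here is a small Python sketch that only assembles the argument list, using exactly the options shown in the command; the function name fence_aws_argv is hypothetical, and actually running the result (for example with subprocess.run, under sudo) will reboot the target instance.

```python
import shlex


def fence_aws_argv(instance_id, region, access_key, secret_key,
                   action="reboot", verbose=False):
    """Build the argv for a standalone fence_aws call.

    Only the options shown in the command above are used. Nothing is
    executed here; the caller decides whether to run it.
    """
    argv = [
        "fence_aws",
        f"--plug={instance_id}",
        f"--action={action}",
        f"--region={region}",
        f"--access-key={access_key}",
        f"--secret-key={secret_key}",
    ]
    if verbose:
        argv.append("--verbose")
    return argv


if __name__ == "__main__":
    argv = fence_aws_argv("i-0394f790feeba28b0", "us-east-2", "xxx", "xxx",
                          verbose=True)
    # Print a shell-quoted form for review instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```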

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package fence-agents - 4.0.25-2ubuntu1.2

---------------
fence-agents (4.0.25-2ubuntu1.2) bionic; urgency=medium

  * fence_aws backport from Focal (LP: #1894323):
    + d/p/lp1894323-01-fence_aws-new-agent.patch

 -- Rafael David Tinoco <email address hidden> Thu, 22 Oct 2020 04:47:00 +0000

Changed in fence-agents (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for fence-agents has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
