ec2-hibinit-agent-ignore-powerkey.conf prevents EC2 Nitro instances from stopping normally

Bug #1840909 reported by Sam Stenvall
This bug affects 3 people
Affects                     Status        Importance  Assigned to    Milestone
ec2-hibinit-agent (Ubuntu)  Fix Released  Undecided   Balint Reczey
  Xenial                    Fix Released  Undecided   Unassigned
  Bionic                    Fix Released  Undecided   Unassigned
  Disco                     Fix Released  Undecided   Unassigned

Bug Description

[Impact]

 * EC2 Nitro instances (e.g. m5.*) don't shut down when stopping is requested via an EC2 interface.

[Test Case]

 * Start a Nitro instance, for example m5.large
 * Make sure that the fixed package is installed
 * Stop the instance from the EC2 web console
 * Observe that the instance stops shortly afterwards.
 * Start the instance
 * Check in the systemd journal that the shutdown was performed without any issue.
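
The journal check in the last step can be sketched in Python. This is only an illustration, not part of the package: the function name is hypothetical, the only message taken from this report is "Power key pressed", and the shutdown marker strings are an assumption the caller must supply for their own system.

```python
from typing import Iterable, Sequence

# Message quoted in this bug report; logged by systemd-logind on a Stop request.
POWER_KEY_MESSAGE = "Power key pressed"


def shutdown_followed_power_key(journal_lines: Iterable[str],
                                shutdown_markers: Sequence[str]) -> bool:
    """Return True if a shutdown marker appears after a power key press.

    shutdown_markers is caller-supplied (an assumption): this report does not
    say which journal message indicates that the shutdown actually started.
    """
    key_seen = False
    for line in journal_lines:
        if POWER_KEY_MESSAGE in line:
            key_seen = True
        elif key_seen and any(marker in line for marker in shutdown_markers):
            return True
    return False
```

On an affected instance the key press would appear with no shutdown marker after it, so the function would return False.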

[Regression Potential]

 * The root cause of the issue is that ec2-hibinit-agent ships a configuration that makes logind ignore the power button so that the sleep button event can be handled by the agent, but the agent does not handle the power button event.
The fix is to handle the power button event as well and request poweroff via D-Bus.
 * The change is very isolated, and I tested that hibernation still works on both Xen-based (c4.large) and Nitro-based (m5.large) instances.
Introducing other regressions with this change is unlikely.
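
The idea behind the fix can be sketched as follows. This is a minimal illustration, not the package's actual handler: the function names are hypothetical, the event prefixes ("button/power", "button/sleep") are an assumption about how acpid names these events, and the use of busctl to reach logind's PowerOff method is one possible way to request poweroff via D-Bus.

```python
from typing import List, Optional


def action_for_acpi_event(event: str) -> Optional[str]:
    """Map an acpid event string to the action a handler should take.

    Assumption: button events start with "button/power" or "button/sleep";
    any trailing fields differ between platforms and are ignored here.
    """
    fields = event.split()
    if not fields:
        return None
    if fields[0].startswith("button/power"):
        return "poweroff"
    if fields[0].startswith("button/sleep"):
        return "hibernate"
    return None


def poweroff_command() -> List[str]:
    """Build a busctl command calling logind's PowerOff(interactive=false)."""
    return [
        "busctl", "call",
        "org.freedesktop.login1",          # service
        "/org/freedesktop/login1",         # object path
        "org.freedesktop.login1.Manager",  # interface
        "PowerOff", "b", "false",          # method, signature, argument
    ]
```

A handler built this way would pass the list from poweroff_command() to subprocess.run() when action_for_acpi_event() returns "poweroff", leaving the existing sleep button path untouched.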

[Original Bug Text]

Recently I've noticed a bunch of related issues with our AWS EC2 instances:

* stopping takes forever
* terminating takes forever (probably because it tries to stop first)
* lots of dangling nodes in our Consul cluster

Today I decided to debug what was going on. At first I thought it was something that we do to our AMIs that was the issue, but after starting a vanilla Ubuntu 18.04 official AMI (0cdab515472ca0bac to be exact) I could replicate the issue.

What happens is that you get "systemd-logind[816]: Power key pressed" in the journal when you issue a Stop action against your EC2 instance. However, after that nothing happens, until 300 seconds have passed and AWS terminates your instance instead. This means nothing exits cleanly, which explains why Consul nodes are left dangling.

At first I thought it was a bug in systemd-logind, until I found /usr/lib/systemd/logind.conf.d/ec2-hibinit-agent-ignore-powerkey.conf, containing:

[Login]
HandlePowerKey=ignore

Removing this file or commenting out the last line fixes the problem.
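
To check an image for this override, something like the following Python sketch could be used. It is an illustration only: the function name is hypothetical, and it inspects just the text of a single drop-in file rather than the full set of logind configuration that systemd merges.

```python
import configparser
from typing import Optional


def power_key_setting(conf_text: str) -> Optional[str]:
    """Return the HandlePowerKey value from a logind drop-in, or None.

    Lines starting with '#' are treated as comments, so a commented-out
    override is reported as absent.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    return parser.get("Login", "HandlePowerKey", fallback=None)


if __name__ == "__main__":
    shipped = "[Login]\nHandlePowerKey=ignore\n"
    print(power_key_setting(shipped))
```

A return value of "ignore" means logind will not act on the power key; None means the drop-in does not set the option.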

So in effect this package completely prevents the normal shutdown mechanism from working correctly. I'm currently working on a workaround for this for our AMI building process but an official fix would be nice.

Just remove the file; it doesn't even come from upstream. Since it has been in this repository since version 1.0.0, I can't find anything in the git history regarding *why* it was added.

Sam Stenvall (negge) wrote :

I spun up some of our older AMIs (built a few months ago) and this package is not installed on them. Either this package should not be included by default (IMO a sane decision since I assume the majority of people don't hibernate their EC2 instances) or the HandlePowerKey override should be removed.

Based on https://github.com/aws/ec2-hibernate-linux-agent/issues/10, what actually happens when you hibernate an instance on AWS is "Suspend key pressed", so I don't know why you're messing with the power button in the first place.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ec2-hibinit-agent (Ubuntu):
status: New → Confirmed
tags: added: id-5d641c52f4f1908be9e021a8
Jake Withecombe (jakew009) wrote :

Can confirm this issue & that commenting out the line fixes it on AWS at least.

Balint Reczey (rbalint)
Changed in ec2-hibinit-agent (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Balint Reczey (rbalint)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ec2-hibinit-agent - 1.0.0-0ubuntu7

---------------
ec2-hibinit-agent (1.0.0-0ubuntu7) eoan; urgency=medium

  * Handle power button ACPI event with acpid (LP: #1840909)

 -- Balint Reczey <email address hidden> Thu, 29 Aug 2019 22:16:15 +0200

Changed in ec2-hibinit-agent (Ubuntu):
status: In Progress → Fix Released
Balint Reczey (rbalint)
description: updated
summary: - ec2-hibinit-agent-ignore-powerkey.conf prevents EC2 instances from
+ ec2-hibinit-agent-ignore-powerkey.conf prevents EC2 Nitro instances from
stopping normally
Balint Reczey (rbalint)
Changed in ec2-hibinit-agent (Ubuntu Xenial):
status: New → In Progress
Changed in ec2-hibinit-agent (Ubuntu Bionic):
status: New → In Progress
Changed in ec2-hibinit-agent (Ubuntu Disco):
status: New → In Progress
Steve Langasek (vorlon) wrote : Please test proposed package

Hello Sam, or anyone else affected,

Accepted ec2-hibinit-agent into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ec2-hibinit-agent/1.0.0-0ubuntu4.19.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ec2-hibinit-agent (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
Changed in ec2-hibinit-agent (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
Steve Langasek (vorlon) wrote :

Hello Sam, or anyone else affected,

Accepted ec2-hibinit-agent into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ec2-hibinit-agent/1.0.0-0ubuntu4~18.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ec2-hibinit-agent (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed-xenial
Steve Langasek (vorlon) wrote :

Hello Sam, or anyone else affected,

Accepted ec2-hibinit-agent into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ec2-hibinit-agent/1.0.0-0ubuntu4~16.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Sam Stenvall (negge) wrote :

I ended up uninstalling the package since we don't use hibernation, but I trust the fix works since it's pretty clear what's going on.

Robert C Jennings (rcj) wrote :

I have created AMIs for testing with the -proposed ec2-hibinit-agent for xenial, bionic, and disco to ensure the VM stop regression has been addressed for each. Instances started on kvm and xen instance types with each AMI did correctly respond to the ACPI event and shut down when the instance was stopped from the AWS EC2 API. Marking this as verification done for each of those releases.

tags: added: verification-done verification-done-bionic verification-done-disco verification-done-xenial
removed: verification-needed verification-needed-bionic verification-needed-disco verification-needed-xenial
tags: added: id-5d6e71f0379c681008b66dc7
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ec2-hibinit-agent - 1.0.0-0ubuntu4.19.04.2

---------------
ec2-hibinit-agent (1.0.0-0ubuntu4.19.04.2) disco; urgency=medium

  * Handle power button ACPI event with acpid (LP: #1840909)

 -- Balint Reczey <email address hidden> Thu, 29 Aug 2019 22:17:07 +0200

Changed in ec2-hibinit-agent (Ubuntu Disco):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for ec2-hibinit-agent has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ec2-hibinit-agent - 1.0.0-0ubuntu4~18.04.3

---------------
ec2-hibinit-agent (1.0.0-0ubuntu4~18.04.3) bionic; urgency=medium

  * Handle power button ACPI event with acpid (LP: #1840909)

 -- Balint Reczey <email address hidden> Thu, 29 Aug 2019 22:17:07 +0200

Changed in ec2-hibinit-agent (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ec2-hibinit-agent - 1.0.0-0ubuntu4~16.04.3

---------------
ec2-hibinit-agent (1.0.0-0ubuntu4~16.04.3) xenial; urgency=medium

  * Handle power button ACPI event with acpid (LP: #1840909)

 -- Balint Reczey <email address hidden> Fri, 30 Aug 2019 15:41:21 +0200

Changed in ec2-hibinit-agent (Ubuntu Xenial):
status: Fix Committed → Fix Released
tags: added: id-5d6ff4bf7da90d142794bc75