ec2-hibinit-agent-ignore-powerkey.conf prevents EC2 Nitro instances from stopping normally
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
ec2-hibinit-agent (Ubuntu) | Fix Released | Undecided | Balint Reczey | |
Xenial | Fix Released | Undecided | Unassigned | |
Bionic | Fix Released | Undecided | Unassigned | |
Disco | Fix Released | Undecided | Unassigned | |
Bug Description
[Impact]
* EC2 Nitro instances (e.g. m5.*) don't shut down when stopping is requested via an EC2 interface.
[Test Case]
* Start a Nitro instance, for example m5.large
* Make sure that the fixed package is installed
* Stop the instance from EC2 web console
* Observe that the instance stops shortly afterwards.
* Start the instance
* Check in the systemd journal that the shutdown was performed without any issue.
[Regression Potential]
* The root cause of the issue is that ec2-hibinit-agent ships configuration that makes logind ignore the power button so that the agent can handle the sleep button event itself, but the agent does not handle the power button event.
The fix makes the agent handle the power button event as well and request poweroff via D-Bus.
* The change is very isolated and I tested that hibernation still works both on Xen based (c4.large) and Nitro based (m5.large) instances.
Introducing other regressions with this change is not likely.
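
As a rough illustration of the fix described above (handle both button events and ask logind to act over D-Bus), here is a minimal Python sketch. It is not the code shipped in the package. The acpid-style event strings, the helper names `logind_call` and `command_for_event`, and the use of `busctl` are assumptions made for the example; the `PowerOff` and `Hibernate` methods on `org.freedesktop.login1.Manager` are part of logind's D-Bus interface.

```python
# Assumed acpid-style event prefixes; the real agent may receive events differently.
POWER_EVENT = "button/power"
SLEEP_EVENT = "button/sleep"


def logind_call(method):
    """Build a busctl command that calls a login1 Manager method non-interactively."""
    return [
        "busctl", "call",
        "org.freedesktop.login1",          # D-Bus service name
        "/org/freedesktop/login1",         # object path
        "org.freedesktop.login1.Manager",  # interface
        method,
        "b", "false",                      # signature and value: interactive=false
    ]


def command_for_event(event):
    """Map an ACPI event line to the command to run, or None if it is not handled."""
    if event.startswith(POWER_EVENT):
        # This is the case that was missing: power button -> poweroff.
        return logind_call("PowerOff")
    if event.startswith(SLEEP_EVENT):
        # Illustrative only; the agent's actual hibernation path may differ.
        return logind_call("Hibernate")
    return None
```

The functions only build the command; a caller would pass the result to something like `subprocess.run(cmd, check=True)` as root. Keeping the mapping separate from execution makes it possible to check the power button path without actually stopping the machine.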
[Original Bug Text]
Recently I've noticed a bunch of related issues with our AWS EC2 instances:
* stopping takes forever
* terminating takes forever (probably because it tries to stop first)
* lots of dangling nodes in our Consul cluster
Today I decided to debug what was going on. At first I thought it was something that we do to our AMIs that was the issue, but after starting a vanilla Ubuntu 18.04 official AMI (0cdab515472ca0bac to be exact) I could replicate the issue.
What happens is that you get "systemd-
At first I thought it was a bug in systemd-logind, until I found /usr/lib/
[Login]
HandlePowerKey=
Removing this file or commenting out the last line fixes the problem.
So in effect this package completely prevents the normal shutdown mechanism from working correctly. I'm currently working on a workaround for this for our AMI building process but an official fix would be nice.
Just remove the file; it doesn't even come from upstream. It has been in this repository since version 1.0.0, so I can't find anything in the git history regarding *why* it was added.
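
For anyone wanting to check an image for this kind of override during an AMI build, a small Python sketch follows. It parses the text of a logind drop-in and reports whether `HandlePowerKey` is set. The function name `power_key_override` is hypothetical, and the `ignore` value in the usage example is an assumption based on the file name, since the listing above is truncated.

```python
import configparser


def power_key_override(conf_text):
    """Return the HandlePowerKey value set in a logind drop-in, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written in the file
    parser.read_string(conf_text)
    if not parser.has_section("Login"):
        return None
    # Lines starting with '#' or ';' are comments, so a commented-out
    # setting is reported as unset.
    return parser.get("Login", "HandlePowerKey", fallback=None)
```

Usage, with hypothetical file contents:

```python
print(power_key_override("[Login]\nHandlePowerKey=ignore\n"))
print(power_key_override("[Login]\n#HandlePowerKey=ignore\n"))
```

The first call reports the override and the second reports nothing set, which matches the observation that commenting out the line restores the default behaviour.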
tags: | added: id-5d641c52f4f1908be9e021a8 |
Changed in ec2-hibinit-agent (Ubuntu): | |
status: | Confirmed → In Progress |
assignee: | nobody → Balint Reczey (rbalint) |
description: | updated |
summary: |
- ec2-hibinit-agent-ignore-powerkey.conf prevents EC2 instances from stopping normally
+ ec2-hibinit-agent-ignore-powerkey.conf prevents EC2 Nitro instances from stopping normally |
Changed in ec2-hibinit-agent (Ubuntu Xenial): | |
status: | New → In Progress |
Changed in ec2-hibinit-agent (Ubuntu Bionic): | |
status: | New → In Progress |
Changed in ec2-hibinit-agent (Ubuntu Disco): | |
status: | New → In Progress |
tags: | added: id-5d6e71f0379c681008b66dc7 |
tags: | added: id-5d6ff4bf7da90d142794bc75 |
I spun up some of our older AMIs (built a few months ago) and this package is not installed on them. Either this package should not be included by default (IMO a sane decision since I assume the majority of people don't hibernate their EC2 instances) or the HandlePowerKey override should be removed.
Based on https://github.com/aws/ec2-hibernate-linux-agent/issues/10, what actually happens when you hibernate an instance on AWS is "Suspend key pressed", so I don't know why you're messing with the power button in the first place.