Activity log for bug #1840909

Date Who What changed Old value New value Message
2019-08-21 11:40:51 Sam Stenvall bug added bug
2019-08-21 11:43:08 Sam Stenvall bug watch added https://github.com/aws/ec2-hibernate-linux-agent/issues/10
2019-08-21 11:46:04 Launchpad Janitor ec2-hibinit-agent (Ubuntu): status New Confirmed
2019-08-23 17:02:03 Brian Murray bug added subscriber Brian Murray
2019-08-26 18:56:25 Balint Reczey bug added subscriber Balint Reczey
2019-08-27 12:25:19 Francis Ginther tags id-5d641c52f4f1908be9e021a8
2019-08-29 20:15:33 Balint Reczey ec2-hibinit-agent (Ubuntu): status Confirmed In Progress
2019-08-29 20:15:36 Balint Reczey ec2-hibinit-agent (Ubuntu): assignee Balint Reczey (rbalint)
2019-08-30 12:41:10 Launchpad Janitor ec2-hibinit-agent (Ubuntu): status In Progress Fix Released
2019-08-30 14:11:56 Balint Reczey description

Old value:

Recently I've noticed a bunch of related issues with our AWS EC2 instances:

* stopping takes forever
* terminating takes forever (probably because it tries to stop first)
* lots of dangling nodes in our Consul cluster

Today I decided to debug what was going on. At first I thought the issue was something we do to our AMIs, but after starting a vanilla official Ubuntu 18.04 AMI (0cdab515472ca0bac, to be exact) I could reproduce it.

What happens is that you get "systemd-logind[816]: Power key pressed" in the journal when you issue a Stop action against your EC2 instance. However, after that nothing happens until 300 seconds have passed and AWS terminates your instance instead. This means nothing exits cleanly, which explains why Consul nodes are left dangling.

At first I thought it was a bug in systemd-logind, until I found /usr/lib/systemd/logind.conf.d/ec2-hibinit-agent-ignore-powerkey.conf, containing:

[Login]
HandlePowerKey=ignore

Removing this file or commenting out the last line fixes the problem. So in effect this package completely prevents the normal shutdown mechanism from working.

I'm currently working on a workaround for our AMI building process, but an official fix would be nice. Just remove the file. It doesn't even come from upstream, and because it has been in this repository since version 1.0.0, I can't find anything in the git history regarding *why* it was added.

New value:

[Impact]

* EC2 Nitro instances (e.g. m5.*) don't shut down when a stop is requested via an EC2 interface.

[Test Case]

* Start a Nitro instance, for example m5.large.
* Make sure that the fixed package is installed.
* Stop the instance from the EC2 web console.
* Observe that the instance stops shortly.
* Start the instance.
* Check in the systemd journal that the shutdown was performed without any issue.

[Regression Potential]

* The root cause of the issue is that ec2-hibinit-agent ships configuration that makes logind ignore the power button in order to handle the sleep button event, but the agent does not handle the power button event itself. The fix makes the agent also handle the power button and request poweroff via D-Bus.
* The change is very isolated, and I tested that hibernation still works on both Xen-based (c4.large) and Nitro-based (m5.large) instances. Other regressions from this change are unlikely.

[Original Bug Text]

Recently I've noticed a bunch of related issues with our AWS EC2 instances:

* stopping takes forever
* terminating takes forever (probably because it tries to stop first)
* lots of dangling nodes in our Consul cluster

Today I decided to debug what was going on. At first I thought the issue was something we do to our AMIs, but after starting a vanilla official Ubuntu 18.04 AMI (0cdab515472ca0bac, to be exact) I could reproduce it.

What happens is that you get "systemd-logind[816]: Power key pressed" in the journal when you issue a Stop action against your EC2 instance. However, after that nothing happens until 300 seconds have passed and AWS terminates your instance instead. This means nothing exits cleanly, which explains why Consul nodes are left dangling.

At first I thought it was a bug in systemd-logind, until I found /usr/lib/systemd/logind.conf.d/ec2-hibinit-agent-ignore-powerkey.conf, containing:

[Login]
HandlePowerKey=ignore

Removing this file or commenting out the last line fixes the problem. So in effect this package completely prevents the normal shutdown mechanism from working.

I'm currently working on a workaround for our AMI building process, but an official fix would be nice. Just remove the file. It doesn't even come from upstream, and because it has been in this repository since version 1.0.0, I can't find anything in the git history regarding *why* it was added.
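The regression-potential note describes the fix as handling the power button too and requesting poweroff via D-Bus. Below is a minimal sketch of that dispatch; it is not the package's actual code. It assumes acpid-style event lines (for example `button/power PBTN 00000080 00000000`), calls `PowerOff` on `org.freedesktop.login1.Manager` through systemd's `busctl`, and uses a hypothetical `systemctl hibernate` for the sleep button, since this log does not say how the agent hibernates. The `run` parameter is an illustrative hook so the command runner can be replaced in a dry run.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Ask systemd-logind to power off over D-Bus:
# busctl call SERVICE OBJECT INTERFACE METHOD SIGNATURE ARGS...
POWEROFF_CMD = [
    "busctl", "call",
    "org.freedesktop.login1",          # bus name
    "/org/freedesktop/login1",         # object path
    "org.freedesktop.login1.Manager",  # interface
    "PowerOff", "b", "false",          # method, signature "b", interactive=false
]

# Hypothetical stand-in for however the agent actually hibernates.
HIBERNATE_CMD = ["systemctl", "hibernate"]


def command_for_event(event: str) -> Optional[List[str]]:
    """Map an acpid-style event line to the command that should handle it."""
    if event.startswith("button/sleep"):
        return HIBERNATE_CMD
    if event.startswith("button/power"):
        # The missing piece in the bug: with HandlePowerKey=ignore, nothing
        # else reacts to this event, so the instance never shuts down.
        return POWEROFF_CMD
    return None


def handle_event(event: str,
                 run: Callable[[Sequence[str]], object] = subprocess.run) -> bool:
    """Run the command for `event`; return False if the event is not handled."""
    cmd = command_for_event(event)
    if cmd is None:
        return False
    run(cmd)
    return True


if __name__ == "__main__":
    import sys
    # Dry run: print each command instead of executing it.
    for line in sys.argv[1:]:
        handle_event(line, run=lambda cmd: print(" ".join(cmd)))
```

Routing both buttons through one handler lets logind keep ignoring the power key (so it does not race the sleep-button handling) while a Stop request from EC2 still ends in a clean poweroff.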
2019-08-30 14:11:59 Balint Reczey summary ec2-hibinit-agent-ignore-powerkey.conf prevents EC2 instances from stopping normally ec2-hibinit-agent-ignore-powerkey.conf prevents EC2 Nitro instances from stopping normally
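The summary change pins the problem on a single logind drop-in. As a minimal sketch (not part of ec2-hibinit-agent), the effective `HandlePowerKey` value can be computed with Python's `configparser`. It assumes logind's default of `poweroff` when nothing sets the key, and that the caller passes fragments in the order logind applies them (main `logind.conf` first, then drop-ins sorted by file name); the helper name `handle_power_key` is hypothetical.

```python
import configparser

# logind's built-in default when no file sets HandlePowerKey.
DEFAULT_HANDLE_POWER_KEY = "poweroff"


def handle_power_key(*config_texts: str) -> str:
    """Return the HandlePowerKey value logind would end up using.

    Hypothetical helper. config_texts are logind.conf-style fragments in the
    order logind applies them: main logind.conf first, then drop-ins sorted
    by file name. A later setting overrides an earlier one.
    """
    value = DEFAULT_HANDLE_POWER_KEY
    for text in config_texts:
        # interpolation=None: '%' has no special meaning in logind.conf.
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep key case ("HandlePowerKey")
        parser.read_string(text)
        # Commented-out lines such as "#HandlePowerKey=ignore" are skipped
        # by configparser, so they leave the current value unchanged.
        if parser.has_option("Login", "HandlePowerKey"):
            value = parser.get("Login", "HandlePowerKey").strip()
    return value


if __name__ == "__main__":
    drop_in = "[Login]\nHandlePowerKey=ignore\n"
    print(handle_power_key(drop_in))                               # ignore
    print(handle_power_key("[Login]\n#HandlePowerKey=ignore\n"))   # poweroff
```

In practice the arguments would be the contents of /etc/systemd/logind.conf followed by the drop-in named in the bug report; a result of `ignore` means logind does nothing when EC2 sends the power-button event for a Stop.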
2019-08-30 14:21:41 Balint Reczey nominated for series Ubuntu Disco
2019-08-30 14:21:41 Balint Reczey bug task added ec2-hibinit-agent (Ubuntu Disco)
2019-08-30 14:21:41 Balint Reczey nominated for series Ubuntu Bionic
2019-08-30 14:21:41 Balint Reczey bug task added ec2-hibinit-agent (Ubuntu Bionic)
2019-08-30 14:21:41 Balint Reczey nominated for series Ubuntu Xenial
2019-08-30 14:21:41 Balint Reczey bug task added ec2-hibinit-agent (Ubuntu Xenial)
2019-08-30 14:21:51 Balint Reczey ec2-hibinit-agent (Ubuntu Xenial): status New In Progress
2019-08-30 14:21:57 Balint Reczey ec2-hibinit-agent (Ubuntu Bionic): status New In Progress
2019-08-30 14:22:00 Balint Reczey ec2-hibinit-agent (Ubuntu Disco): status New In Progress
2019-08-30 14:59:59 Steve Langasek ec2-hibinit-agent (Ubuntu Disco): status In Progress Fix Committed
2019-08-30 15:00:01 Steve Langasek bug added subscriber Ubuntu Stable Release Updates Team
2019-08-30 15:00:04 Steve Langasek bug added subscriber SRU Verification
2019-08-30 15:00:10 Steve Langasek tags id-5d641c52f4f1908be9e021a8 id-5d641c52f4f1908be9e021a8 verification-needed verification-needed-disco
2019-08-30 15:01:55 Steve Langasek ec2-hibinit-agent (Ubuntu Bionic): status In Progress Fix Committed
2019-08-30 15:02:01 Steve Langasek tags id-5d641c52f4f1908be9e021a8 verification-needed verification-needed-disco id-5d641c52f4f1908be9e021a8 verification-needed verification-needed-bionic verification-needed-disco
2019-08-30 15:03:17 Steve Langasek ec2-hibinit-agent (Ubuntu Xenial): status In Progress Fix Committed
2019-08-30 15:03:28 Steve Langasek tags id-5d641c52f4f1908be9e021a8 verification-needed verification-needed-bionic verification-needed-disco id-5d641c52f4f1908be9e021a8 verification-needed verification-needed-bionic verification-needed-disco verification-needed-xenial
2019-09-04 12:14:49 Robert C Jennings tags id-5d641c52f4f1908be9e021a8 verification-needed verification-needed-bionic verification-needed-disco verification-needed-xenial id-5d641c52f4f1908be9e021a8 verification-done verification-done-bionic verification-done-disco verification-done-xenial
2019-09-04 12:22:19 Francis Ginther tags id-5d641c52f4f1908be9e021a8 verification-done verification-done-bionic verification-done-disco verification-done-xenial id-5d641c52f4f1908be9e021a8 id-5d6e71f0379c681008b66dc7 verification-done verification-done-bionic verification-done-disco verification-done-xenial
2019-09-05 11:11:49 Launchpad Janitor ec2-hibinit-agent (Ubuntu Disco): status Fix Committed Fix Released
2019-09-05 11:11:53 Łukasz Zemczak removed subscriber Ubuntu Stable Release Updates Team
2019-09-05 11:12:03 Launchpad Janitor ec2-hibinit-agent (Ubuntu Bionic): status Fix Committed Fix Released
2019-09-05 11:12:15 Launchpad Janitor ec2-hibinit-agent (Ubuntu Xenial): status Fix Committed Fix Released
2019-09-06 12:20:54 Francis Ginther tags id-5d641c52f4f1908be9e021a8 id-5d6e71f0379c681008b66dc7 verification-done verification-done-bionic verification-done-disco verification-done-xenial id-5d641c52f4f1908be9e021a8 id-5d6e71f0379c681008b66dc7 id-5d6ff4bf7da90d142794bc75 verification-done verification-done-bionic verification-done-disco verification-done-xenial