2020-02-20 14:04:32 |
Francis Ginther |
bug |
|
|
added bug |
2020-02-20 14:09:16 |
Francis Ginther |
description |
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
A workaround is to build xen_netfront as a loadable module and, in the resume handler, reload the module and restart networking. For example:
modprobe -r xen_netfront
modprobe xen_netfront
systemctl restart systemd-networkd
With this workaround in place, the unresponsive issue is no longer observed. |
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
A workaround is to build xen_netfront as a loadable module and, in the resume handler, reload the module and restart networking. For example:
modprobe -r xen_netfront
modprobe xen_netfront
systemctl restart systemd-networkd
With this workaround in place, the unresponsive issue is no longer observed.
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working. |
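The workaround above can be packaged as a systemd system-sleep hook. The following is a minimal illustrative sketch, not the code shipped in ec2-hibinit-agent; the hook path and the settle delay are assumptions based on this bug's discussion:

```shell
#!/bin/sh
# Hypothetical /lib/systemd/system-sleep/ hook (illustrative only, not the
# shipped ec2-hibinit-agent implementation). systemd invokes system-sleep
# hooks with two arguments: "pre"/"post" and the operation (e.g. "hibernate").
set -e

case "$1" in
    post)
        # Let the resumed instance settle briefly before touching the driver.
        sleep 2
        # Reload xen_netfront; this requires it to be built as a module.
        modprobe -r xen_netfront
        modprobe xen_netfront
        # Bring networking back up.
        systemctl restart systemd-networkd
        ;;
    *)
        # Nothing to do on "pre".
        ;;
esac
```

The reload must happen in the "post" phase, after the kernel has resumed, since restarting systemd-networkd before the device exists again would fail.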
|
2020-02-20 15:07:07 |
Francis Ginther |
tags |
|
apport-collected bionic ec2-images |
|
2020-02-20 15:07:09 |
Francis Ginther |
description |
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
A workaround is to build xen_netfront as a loadable module and, in the resume handler, reload the module and restart networking. For example:
modprobe -r xen_netfront
modprobe xen_netfront
systemctl restart systemd-networkd
With this workaround in place, the unresponsive issue is no longer observed.
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working. |
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
A workaround is to build xen_netfront as a loadable module and, in the resume handler, reload the module and restart networking. For example:
modprobe -r xen_netfront
modprobe xen_netfront
systemctl restart systemd-networkd
With this workaround in place, the unresponsive issue is no longer observed.
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working.
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0edf3b95e26a682df
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2a
Ec2InstanceType: m4.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-aws 4.15.0.1058.59
PackageArchitecture: amd64
ProcEnviron:
TERM=screen
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: User Name 5.0.0-1025.28-aws 5.0.21
Tags: bionic ec2-images
Uname: Linux 5.0.0-1025-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True |
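The hibernate/resume and console-screenshot steps of the reproduction can be driven from a second machine with the AWS CLI. A rough sketch, with the instance ID as a placeholder (`stop-instances --hibernate`, `wait instance-stopped`, and `get-console-screenshot` are real AWS CLI commands; `get-console-screenshot` returns the image as base64 under `ImageData`):

```shell
#!/bin/sh
# Illustrative helpers for the reproduction steps; the instance ID below is
# a placeholder. AWS can be overridden (e.g. AWS=echo) for a dry run.
: "${AWS:=aws}"

hibernate_and_resume() {
    # Step 4: suspend (hibernate) and resume the instance.
    $AWS ec2 stop-instances --hibernate --instance-ids "$1"
    $AWS ec2 wait instance-stopped --instance-ids "$1"
    $AWS ec2 start-instances --instance-ids "$1"
}

poll_console() {
    # Steps 3-5: grab a console screenshot every 15 s; the serial-console
    # messages should keep advancing even if ssh has stopped working.
    n=0
    while [ "$n" -lt "$2" ]; do
        $AWS ec2 get-console-screenshot --instance-id "$1" \
            --query ImageData --output text | base64 -d > "screenshot-$n.png"
        n=$((n + 1))
        sleep 15
    done
}
```

Typical use would be `hibernate_and_resume i-0123456789abcdef0` followed by `poll_console i-0123456789abcdef0 20`, then inspecting the saved screenshots for continued output.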
|
2020-02-20 15:07:09 |
Francis Ginther |
attachment added |
|
Dependencies.txt https://bugs.launchpad.net/bugs/1864041/+attachment/5329809/+files/Dependencies.txt |
|
2020-02-20 15:07:11 |
Francis Ginther |
attachment added |
|
ProcCpuinfoMinimal.txt https://bugs.launchpad.net/bugs/1864041/+attachment/5329810/+files/ProcCpuinfoMinimal.txt |
|
2020-02-25 18:37:15 |
Balint Reczey |
bug task added |
|
ec2-hibinit-agent (Ubuntu) |
|
2020-02-25 18:38:22 |
Balint Reczey |
nominated for series |
|
Ubuntu Bionic |
|
2020-02-25 18:38:22 |
Balint Reczey |
bug task added |
|
linux-aws (Ubuntu Bionic) |
|
2020-02-25 18:38:22 |
Balint Reczey |
bug task added |
|
ec2-hibinit-agent (Ubuntu Bionic) |
|
2020-02-25 18:38:22 |
Balint Reczey |
nominated for series |
|
Ubuntu Focal |
|
2020-02-25 18:38:22 |
Balint Reczey |
bug task added |
|
linux-aws (Ubuntu Focal) |
|
2020-02-25 18:38:22 |
Balint Reczey |
bug task added |
|
ec2-hibinit-agent (Ubuntu Focal) |
|
2020-02-25 18:38:22 |
Balint Reczey |
nominated for series |
|
Ubuntu Eoan |
|
2020-02-25 18:38:22 |
Balint Reczey |
bug task added |
|
linux-aws (Ubuntu Eoan) |
|
2020-02-25 18:38:22 |
Balint Reczey |
bug task added |
|
ec2-hibinit-agent (Ubuntu Eoan) |
|
2020-02-25 18:38:38 |
Balint Reczey |
ec2-hibinit-agent (Ubuntu Focal): status |
New |
In Progress |
|
2020-02-25 18:50:38 |
Balint Reczey |
bug |
|
|
added subscriber Balint Reczey |
2020-03-12 15:02:15 |
Launchpad Janitor |
ec2-hibinit-agent (Ubuntu Focal): status |
In Progress |
Fix Released |
|
2020-03-23 15:52:08 |
Balint Reczey |
description |
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
A workaround is to build xen_netfront as a loadable module and, in the resume handler, reload the module and restart networking. For example:
modprobe -r xen_netfront
modprobe xen_netfront
systemctl restart systemd-networkd
With this workaround in place, the unresponsive issue is no longer observed.
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working.
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0edf3b95e26a682df
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2a
Ec2InstanceType: m4.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-aws 4.15.0.1058.59
PackageArchitecture: amd64
ProcEnviron:
TERM=screen
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: User Name 5.0.0-1025.28-aws 5.0.21
Tags: bionic ec2-images
Uname: Linux 5.0.0-1025-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True |
[Impact]
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
[Test Case]
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working.
[Regression Potential]
The workaround in ec2-hibinit-agent reloads the xen_netfront kernel module before restarting systemd-networkd. If the kernel module has been removed (for example when hitting LP: #1615381), the module reload fails and the instance cannot restore network connectivity. This is expected to be a very rare situation, and the module reload is the best workaround the Kernel Team found to mitigate the original issue.
[Original Bug Text]
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
A workaround is to build xen_netfront as a loadable module and, in the resume handler, reload the module and restart networking. For example:
modprobe -r xen_netfront
modprobe xen_netfront
systemctl restart systemd-networkd
With this workaround in place, the unresponsive issue is no longer observed.
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working.
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0edf3b95e26a682df
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2a
Ec2InstanceType: m4.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-aws 4.15.0.1058.59
PackageArchitecture: amd64
ProcEnviron:
TERM=screen
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: User Name 5.0.0-1025.28-aws 5.0.21
Tags: bionic ec2-images
Uname: Linux 5.0.0-1025-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True |
|
2020-03-23 16:07:10 |
Balint Reczey |
description |
[Impact]
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
[Test Case]
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working.
[Regression Potential]
The workaround in ec2-hibinit-agent reloads the xen_netfront kernel module before restarting systemd-networkd. If the kernel module has been removed (for example when hitting LP: #1615381), the module reload fails and the instance cannot restore network connectivity. This is expected to be a very rare situation, and the module reload is the best workaround the Kernel Team found to mitigate the original issue.
[Original Bug Text]
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
A workaround is to build xen_netfront as a loadable module and, in the resume handler, reload the module and restart networking. For example:
modprobe -r xen_netfront
modprobe xen_netfront
systemctl restart systemd-networkd
With this workaround in place, the unresponsive issue is no longer observed.
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working.
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0edf3b95e26a682df
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2a
Ec2InstanceType: m4.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-aws 4.15.0.1058.59
PackageArchitecture: amd64
ProcEnviron:
TERM=screen
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: User Name 5.0.0-1025.28-aws 5.0.21
Tags: bionic ec2-images
Uname: Linux 5.0.0-1025-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True |
[Impact]
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
[Test Case]
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working.
[Regression Potential]
The workaround in ec2-hibinit-agent reloads the xen_netfront kernel module before restarting systemd-networkd. If the kernel module has been removed (for example when hitting LP: #1615381), the module reload fails and the instance cannot restore network connectivity. This is expected to be a very rare situation, and the module reload is the best workaround the Kernel Team found to mitigate the original issue.
The workaround also adds a 2-second delay before reloading the module to let things settle after resuming. Two seconds is very short compared to the overall time needed to resume an instance.
[Original Bug Text]
The xen_netfront device is sometimes unresponsive after a hibernate and resume event. This is limited to the c4, c5, m4, m5, r4, r5 instance families, all of which are Xen-based and support hibernation.
When the issue occurs, the instance is inaccessible without a full restart. Debugging by running a process which outputs regularly to the serial console shows that the instance is still running.
A workaround is to build xen_netfront as a loadable module and, in the resume handler, reload the module and restart networking. For example:
modprobe -r xen_netfront
modprobe xen_netfront
systemctl restart systemd-networkd
With this workaround in place, the unresponsive issue is no longer observed.
To reproduce this problem:
1) Launch a c4, c5, m4, m5, r4, or r5 instance with a 5.0 or 5.3 kernel that has on-demand hibernation support enabled.
2) Start a long-running process which generates messages to the serial console
3) Begin observing these messages on the console (using the AWS UI or CLI to grab a screenshot).
4) Suspend and resume the instance, continuing to refresh the console screenshot.
5) The screenshot should continue to show updates even if ssh access is no longer working.
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-0edf3b95e26a682df
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-west-2a
Ec2InstanceType: m4.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: linux-aws 4.15.0.1058.59
PackageArchitecture: amd64
ProcEnviron:
TERM=screen
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=C.UTF-8
SHELL=/bin/bash
ProcVersionSignature: User Name 5.0.0-1025.28-aws 5.0.21
Tags: bionic ec2-images
Uname: Linux 5.0.0-1025-aws x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True |
|
2020-03-23 17:28:27 |
Łukasz Zemczak |
ec2-hibinit-agent (Ubuntu Eoan): status |
New |
Fix Committed |
|
2020-03-23 17:28:33 |
Łukasz Zemczak |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2020-03-23 17:28:35 |
Łukasz Zemczak |
bug |
|
|
added subscriber SRU Verification |
2020-03-23 17:28:41 |
Łukasz Zemczak |
tags |
apport-collected bionic ec2-images |
apport-collected bionic ec2-images verification-needed verification-needed-eoan |
|
2020-03-23 17:29:40 |
Łukasz Zemczak |
ec2-hibinit-agent (Ubuntu Bionic): status |
New |
Fix Committed |
|
2020-03-23 17:29:47 |
Łukasz Zemczak |
tags |
apport-collected bionic ec2-images verification-needed verification-needed-eoan |
apport-collected bionic ec2-images verification-needed verification-needed-bionic verification-needed-eoan |
|
2020-04-02 11:49:36 |
Francis Ginther |
tags |
apport-collected bionic ec2-images verification-needed verification-needed-bionic verification-needed-eoan |
apport-collected bionic ec2-images verification-done-bionic verification-needed verification-needed-eoan |
|
2020-04-07 19:23:56 |
Launchpad Janitor |
ec2-hibinit-agent (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2020-04-07 19:24:06 |
Brian Murray |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2020-04-22 12:51:11 |
Francis Ginther |
tags |
apport-collected bionic ec2-images verification-done-bionic verification-needed verification-needed-eoan |
apport-collected bionic ec2-images id-5e459f823f8a2435d44842eb verification-done-bionic verification-needed verification-needed-eoan |
|
2020-07-02 20:33:04 |
Francis Ginther |
tags |
apport-collected bionic ec2-images id-5e459f823f8a2435d44842eb verification-done-bionic verification-needed verification-needed-eoan |
apport-collected bionic ec2-images id-5e459f823f8a2435d44842eb verification-done-bionic verification-done-eoan verification-needed |
|
2020-07-07 18:24:51 |
Launchpad Janitor |
ec2-hibinit-agent (Ubuntu Eoan): status |
Fix Committed |
Fix Released |
|
2020-08-18 17:01:36 |
Brian Murray |
linux-aws (Ubuntu Eoan): status |
New |
Won't Fix |
|
2020-10-20 16:09:12 |
Balint Reczey |
nominated for series |
|
Ubuntu Xenial |
|
2020-10-20 16:09:12 |
Balint Reczey |
bug task added |
|
linux-aws (Ubuntu Xenial) |
|
2020-10-20 16:09:12 |
Balint Reczey |
bug task added |
|
ec2-hibinit-agent (Ubuntu Xenial) |
|
2020-10-20 16:09:28 |
Balint Reczey |
ec2-hibinit-agent (Ubuntu Xenial): status |
New |
Incomplete |
|
2020-10-20 19:01:30 |
Balint Reczey |
ec2-hibinit-agent (Ubuntu Xenial): status |
Incomplete |
Invalid |
|