Docker container creation causes kernel oops on linux-aws 5.13.0.1028.31~20.04.22

Bug #1977919 reported by Steven Davidovitz
This bug affects 71 people
Affects                            Status        Importance  Assigned to  Milestone
linux-aws-5.13 (Ubuntu)            Confirmed     Undecided   Unassigned
  Focal                            Fix Released  High        Tim Gardner
linux-azure-5.13 (Ubuntu)          Confirmed     Undecided   Unassigned
  Focal                            Fix Released  High        Tim Gardner
linux-gcp-5.13 (Ubuntu)            Confirmed     Undecided   Unassigned
  Focal                            Fix Released  High        Tim Gardner
linux-intel-iotg-5.15 (Ubuntu)     Confirmed     Undecided   Unassigned
  Focal                            Won't Fix     High        Tim Gardner
linux-oracle-5.13 (Ubuntu)         Confirmed     Undecided   Unassigned
  Focal                            Fix Released  High        Tim Gardner

Bug Description

Running the attached script on the latest AWS AMI for Ubuntu 20.04, I get a kernel panic and hard reset of the node.

[ 12.314552] VFS: Close: file count is 0
[ 12.351090] ------------[ cut here ]------------
[ 12.351093] kernel BUG at include/linux/fs.h:3104!
[ 12.355272] invalid opcode: 0000 [#1] SMP PTI
[ 12.358963] CPU: 1 PID: 863 Comm: sed Not tainted 5.13.0-1028-aws #31~20.04.1-Ubuntu
[ 12.366241] Hardware name: Amazon EC2 m5.large/, BIOS 1.0 10/16/2017
[ 12.371130] RIP: 0010:__fput+0x247/0x250
[ 12.374897] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 88 02 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48
[ 12.389075] RSP: 0018:ffffb50280d9fd88 EFLAGS: 00010246
[ 12.393425] RAX: 0000000000000000 RBX: 00000000000a801d RCX: ffff9152e0716000
[ 12.398679] RDX: ffff9152cf075280 RSI: 0000000000000001 RDI: 0000000000000000
[ 12.403879] RBP: ffffb50280d9fdb0 R08: 0000000000000001 R09: ffff9152dfcba2c8
[ 12.409102] R10: ffffb50280d9fd88 R11: ffff9152d04e9d10 R12: ffff9152d04e9d00
[ 12.414333] R13: ffff9152dfcba2c8 R14: ffff9152cf0752a0 R15: ffff9152dfc2e180
[ 12.419533] FS: 0000000000000000(0000) GS:ffff9153ea900000(0000) knlGS:0000000000000000
[ 12.426937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.431506] CR2: 0000556cf30250a8 CR3: 00000000bce10006 CR4: 00000000007706e0
[ 12.436716] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 12.441941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 12.447170] PKRU: 55555554
[ 12.450355] Call Trace:
[ 12.453408] <TASK>
[ 12.456296] ____fput+0xe/0x10
[ 12.459633] task_work_run+0x70/0xb0
[ 12.463157] do_exit+0x37b/0xaf0
[ 12.466570] do_group_exit+0x43/0xb0
[ 12.470142] __x64_sys_exit_group+0x18/0x20
[ 12.473989] do_syscall_64+0x61/0xb0
[ 12.477565] ? exit_to_user_mode_prepare+0x9b/0x1c0
[ 12.481734] ? do_user_addr_fault+0x1d0/0x650
[ 12.485665] ? irqentry_exit_to_user_mode+0x9/0x20
[ 12.489790] ? irqentry_exit+0x19/0x30
[ 12.493443] ? exc_page_fault+0x8f/0x170
[ 12.497199] ? asm_exc_page_fault+0x8/0x30
[ 12.501013] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 12.505289] RIP: 0033:0x7f80d42a1bd6
[ 12.508868] Code: Unable to access opcode bytes at RIP 0x7f80d42a1bac.
[ 12.513783] RSP: 002b:00007ffe924f9ed8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 12.520897] RAX: ffffffffffffffda RBX: 00007f80d45a4740 RCX: 00007f80d42a1bd6
[ 12.526115] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 12.531328] RBP: 0000000000000000 R08: 00000000000000e7 R09: fffffffffffffe98
[ 12.536484] R10: 00007f80d3d422a0 R11: 0000000000000246 R12: 00007f80d45a4740
[ 12.541687] R13: 0000000000000002 R14: 00007f80d45ad708 R15: 0000000000000000
[ 12.546916] </TASK>
[ 12.549829] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul ppdev crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd psmouse cryptd parport_pc input_leds parport ena serio_raw sch_fq_codel ipmi_devintf ipmi_msghandler msr drm ip_tables x_tables autofs4
[ 12.583913] ---[ end trace 77367fed4d782aa4 ]---
[ 12.587963] RIP: 0010:__fput+0x247/0x250
[ 12.591729] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 88 02 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48
[ 12.605796] RSP: 0018:ffffb50280d9fd88 EFLAGS: 00010246
[ 12.610166] RAX: 0000000000000000 RBX: 00000000000a801d RCX: ffff9152e0716000
[ 12.615417] RDX: ffff9152cf075280 RSI: 0000000000000001 RDI: 0000000000000000
[ 12.620635] RBP: ffffb50280d9fdb0 R08: 0000000000000001 R09: ffff9152dfcba2c8
[ 12.625878] R10: ffffb50280d9fd88 R11: ffff9152d04e9d10 R12: ffff9152d04e9d00
[ 12.631121] R13: ffff9152dfcba2c8 R14: ffff9152cf0752a0 R15: ffff9152dfc2e180
[ 12.636358] FS: 0000000000000000(0000) GS:ffff9153ea900000(0000) knlGS:0000000000000000
[ 12.643770] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 12.648355] CR2: 0000556cf30250a8 CR3: 00000000bce10006 CR4: 00000000007706e0
[ 12.653610] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 12.658843] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 12.664076] PKRU: 55555554
[ 12.667279] Fixing recursive fault but reboot is needed!

This error occurs on:

ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20220607 (ami-04f23e7f9aab5eab8)

# dpkg -s linux-aws
Package: linux-aws
Status: install ok installed
Priority: optional
Section: kernel
Installed-Size: 12
Maintainer: Ubuntu Kernel Team <email address hidden>
Architecture: amd64
Source: linux-meta-aws-5.13
Version: 5.13.0.1028.31~20.04.22
Provides: kernel-testing--linux-aws-5.13--full--aws, kernel-testing--linux-aws-5.13--full--preferred
Depends: linux-image-aws (= 5.13.0.1028.31~20.04.22), linux-headers-aws (= 5.13.0.1028.31~20.04.22)
Description: Complete Linux kernel for Amazon Web Services (AWS) systems.
 This package will always depend on the latest complete Linux kernel available
 for Amazon Web Services (AWS) systems.

But it works fine on:

ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20220606 (ami-078c065e38be7296e)

# dpkg -s linux-aws
Package: linux-aws
Status: install ok installed
Priority: optional
Section: kernel
Installed-Size: 12
Maintainer: Ubuntu Kernel Team <email address hidden>
Architecture: amd64
Source: linux-meta-aws-5.13
Version: 5.13.0.1025.27~20.04.20
Provides: kernel-testing--linux-aws-5.13--full--aws, kernel-testing--linux-aws-5.13--full--preferred
Depends: linux-image-aws (= 5.13.0.1025.27~20.04.20), linux-headers-aws (= 5.13.0.1025.27~20.04.20)
Description: Complete Linux kernel for Amazon Web Services (AWS) systems.
 This package will always depend on the latest complete Linux kernel available
 for Amazon Web Services (AWS) systems.

Tags: indeed

Steven Davidovitz (steven.davidovitz-ddl) wrote :

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws (Ubuntu):
status: New → Confirmed

Roger Sikorski (rogersik) wrote :

I can confirm this. Restoring one node to last week's state (03.06.2022) fixed it.

On another node I reinstalled Ubuntu 20.04 and still had the same issue. There I fixed it by reinstalling with Ubuntu 22.04.

Peter (pagelypete) wrote :

Also can confirm - very easy to reproduce.

Alex Thomson (lxgaming) wrote :

I'm also having this issue but on Oracle Cloud (linux-oracle v5.13.0-1033.39)

Johannes Postler (johannespostler) wrote :

Google Compute Engine seems to be affected as well for Ubuntu 20.04. Using kernel 5.13.0-1030-gcp #36~20.04.1-Ubuntu

Alastair McClelland (alastairmcc) wrote :

Also seeing this on AWS Ubuntu 20.04 after an update to linux-image-aws/focal-updates 5.13.0.1028.31~20.04.22

Marvin Beckers (embik) wrote :

Perhaps this is obvious, but the same thing happens when using containerd directly, without Docker as an intermediary.

Nigel (nigel-sim) wrote :

I believe I've got the same issue on Azure 5.13.0-1028-azure.

Samuel Gregorovič (samgre1881) wrote :

Confirmed on AWS AMI ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20211129. We fixed it by reverting to kernel GNU/Linux 5.13.0-1025-aws x86_64, forcing GRUB to load it instead of the broken one.

P.S.: We faced a reboot loop and an unkillable docker process. After the kernel downgrade, everything seems OK.

Roger Sikorski (rogersik) wrote :

I think it has something to do with Docker networking or volumes, because the watchtower container, which doesn't use any open network ports or volumes, doesn't crash the system.

Alastair McClelland (alastairmcc) wrote :

`docker run -it ubuntu bash` is enough to cause it to crash.

Szymon Lubieniecki (antares81) wrote :

Can't even build the image:
kernel:[ 221.374595] Kernel panic - not syncing: Fatal exception in interrupt

Kempsu (kneitola) wrote :

Can confirm: one of my AWS EC2 instances running Ubuntu 20.04 dies during reboot after installing the update. This instance is also running Docker.

John Chittum (jchittum) wrote :

We are actively working on the issue. This also affects more than the `linux-aws` kernel, as we've been able to reproduce on 5.13 versions of:

linux-oracle
linux-azure
linux-gcp
linux-aws

This appears to be confined to the latest 5.13 kernel update. We will provide more updates shortly on all kernels affected and changes

Kevin Keijzer (kkeijzer) wrote :

This broke a lot of our AWS t2 servers running Docker, all of which I had to restore by attaching the root volume to a different instance and then changing /boot/grub/grub.cfg in order to boot 5.13.0-1025-aws again.

So another "I can confirm this" from me.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-gcp (Ubuntu):
status: New → Confirmed
affects: linux-meta-gcp-5.13 (Ubuntu) → linux-gcp (Ubuntu)

Gerard Kok (g-kok) wrote :

This happened to two of our instances in AWS. In the hope that this is helpful to anyone: in an attempt to avoid having to mount the root volumes on another instance, we disabled docker and containerd in the small timeframe between SSH becoming accessible and the kernel panic, by running something like this from a laptop:

while true; do
  ssh <instance> "sudo systemctl disable docker.service; sudo systemctl disable containerd.service"
done

This allowed us to revert to the previous kernel without having to mount the root volume on a different instance.
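
A bounded variant of the loop above could look roughly like the following. This is only a sketch, not the exact commands we ran: retry_until_ok is a hypothetical helper name, and the retry count, delay, and ConnectTimeout value are assumptions. It stops as soon as one attempt succeeds instead of looping forever.

```shell
# retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_TRIES attempts have been made.
# (Hypothetical helper name; the limits are assumptions.)
retry_until_ok() {
  max_tries=$1
  delay=$2
  shift 2
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # Stop retrying once the command succeeds.
    if "$@"; then
      return 0
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1
}

# Example call (replace <instance> with your host; not executed here):
# retry_until_ok 300 1 ssh -o ConnectTimeout=2 <instance> \
#   "sudo systemctl disable docker.service containerd.service"
```

Stopping after the first success avoids repeated logins once both services are disabled.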

James Benkart (benkartjkb) wrote :

I have similar kernel panics launching docker-ce instances on Google Cloud Platform after a recent update of Ubuntu 20.04 LTS. 22.04 is unaffected.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-gcp (Ubuntu):
status: New → Confirmed

Francis Ginther (fginther) wrote :

Work on this issue continues. We have identified the following impacted kernels and versions:

 focal linux-aws-5.13 5.13.0-1028.31~20.04.1
 focal linux-azure-5.13 5.13.0-1028.33~20.04.1
 focal linux-gcp-5.13 5.13.0-1030.36~20.04.1
 focal linux-oracle-5.13 5.13.0-1033.39~20.04.1
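
To check whether a running instance is on one of the four builds listed above, something like the following might help. It assumes the `uname -r` strings that appear in the reports in this bug (for example 5.13.0-1028-aws); is_affected_kernel is a hypothetical helper name, and the check covers only these four 5.13 builds.

```shell
# is_affected_kernel RELEASE
# Returns 0 if RELEASE (as printed by `uname -r`) matches one of the
# four 5.13 builds listed above, 1 otherwise. (Hypothetical helper.)
is_affected_kernel() {
  case "$1" in
    5.13.0-1028-aws|5.13.0-1028-azure|5.13.0-1030-gcp|5.13.0-1033-oracle)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

if is_affected_kernel "$(uname -r)"; then
  echo "running kernel $(uname -r) is on the affected list"
else
  echo "running kernel $(uname -r) is not on the affected list"
fi
```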

Tim Gardner (timg-tpi) wrote :

Fabio Augusto Miranda Martins (fabio.martins) wrote :

Just tested this 5.13.0-1029.32~lp1977919.1 kernel and confirmed that it fixes the issue (doesn't crash when running the same docker container that would crash in the -1028 kernel)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws-5.13 (Ubuntu Focal):
status: New → Confirmed
Tim Gardner (timg-tpi)
affects: linux-aws (Ubuntu) → linux-aws-5.13 (Ubuntu)
Changed in linux-gcp-5.13 (Ubuntu Focal):
status: New → Confirmed
Tim Gardner (timg-tpi)
affects: linux-gcp (Ubuntu) → linux-gcp-5.13 (Ubuntu)
Changed in linux-azure-5.13 (Ubuntu Focal):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → High
status: New → In Progress
Changed in linux-aws-5.13 (Ubuntu Focal):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → High
status: Confirmed → In Progress
Changed in linux-aws-5.13 (Ubuntu):
status: Confirmed → New

Tim Gardner (timg-tpi) wrote :

The fix commit is impish/linux 6a6dd081d512c812a937503d5949e4479340accb ("UBUNTU: SAUCE: overlayfs: prevent dereferencing struct file in ovl_vm_prfile_set()")

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws-5.13 (Ubuntu):
status: New → Confirmed
Changed in linux-azure-5.13 (Ubuntu):
status: New → Confirmed
Changed in linux-oracle-5.13 (Ubuntu Focal):
status: New → Confirmed
Changed in linux-oracle-5.13 (Ubuntu):
status: New → Confirmed

Dave Chiluk (chiluk) wrote :

What are the chances we can remove the affected kernels from the archives so more people don't get bitten by this?

Dave Chiluk (chiluk)
tags: added: indeed

Jake Edwards (jake-edwards-fenwick) wrote :

I believe I'm getting a similar issue on Azure with Linux and Docker (linux-azure) after last night's updates.
It happens while trying to bring up the Docker network interface.

Adding the stack trace for those looking for the Azure-related kernel panic.

[ 37.662249] kernel BUG at include/linux/fs.h:3103!
[ 37.665024] invalid opcode: 0000 [#1] SMP PTI
[ 37.667710] CPU: 1 PID: 3383 Comm: id Not tainted 5.13.0-1028-azure #33~20.04.1-Ubuntu
[ 37.668464] device vethd7a96c6 entered promiscuous mode
[ 37.672439] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
[ 37.672441] RIP: 0010:__fput+0x247/0x250
[ 37.672446] Code: 00 48 85 ff 0f 84 8b fe ff ff f6 c7 40 0f 85 82 fe ff ff e8 ab 38 00 00 e9 78 fe ff ff 4c 89 f7 e8 2e 87 02 00 e9 b5 fe ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31 db 48
[ 37.672448] RSP: 0018:ffffaba0c3cdbde8 EFLAGS: 00010246
[ 37.672450] RAX: 0000000000000000 RBX: 00000000000a801d RCX: ffff99ce838a8000
[ 37.672451] RDX: ffff99ce8acf6b40 RSI: 0000000000000001 RDI: 0000000000000000
[ 37.672452] RBP: ffffaba0c3cdbe10 R08: 00000000000000a9 R09: ffff99ce8cf29d58
[ 37.672453] R10: ffffaba0c3cdbde8 R11: ffff99cea3891b10 R12: ffff99cea3891b00
[ 37.672454] R13: ffff99ce8cf29d58 R14: ffff99ce8acf6b60 R15: ffff99ce8ce95600
[ 37.672455] FS: 0000000000000000(0000) GS:ffff99cff7d00000(0000) knlGS:0000000000000000
[ 37.672456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 37.672458] CR2: 00005597f9210f2e CR3: 0000000230c10002 CR4: 00000000003706e0
[ 37.672459] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 37.672460] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 37.693338] br: port 11(vethd7a96c6) entered blocking state
[ 37.693823] Call Trace:
[ 37.693825] <TASK>
[ 37.693827] ____fput+0xe/0x10
[ 37.696897] br: port 11(vethd7a96c6) entered forwarding state
[ 37.700943] task_work_run+0x6a/0xa0
[ 37.700947] do_exit+0x371/0xad0
[ 37.700950] do_group_exit+0x43/0xb0
[ 37.700952] __x64_sys_exit_group+0x18/0x20
[ 37.700954] do_syscall_64+0x61/0xb0
[ 37.760928] ? irqentry_exit+0x19/0x30
[ 37.763226] ? exc_page_fault+0x83/0x160
[ 37.765461] ? asm_exc_page_fault+0x8/0x30
[ 37.767666] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 37.770647] RIP: 0033:0x7f4bf2ee3f0b
[ 37.772594] Code: Unable to access opcode bytes at RIP 0x7f4bf2ee3ee1.
[ 37.776165] RSP: 002b:00007ffc10383c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 37.780576] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f4bf2ee3f0b
[ 37.784941] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 37.788989] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 37.792803] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[ 37.796666] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 37.800943] </TASK>
[ 37.802280] Modules linked in: veth xt_nat xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo ip6table_nat ip6table_filter ip6_tables xt_addrtype iptable_filter iptable_nat nf_nat br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm...

Rob (robd003) wrote :

Also seeing this on AWS with t4g instances. Kernel panic:

[ 12.489272] kernel BUG at include/linux/fs.h:3104!
[ 12.490111] Internal error: Oops - BUG: 0 [#1] SMP
[ 12.490923] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce efi_pstore ena sch_fq_codel ipmi_devintf ipmi_msghandler drm ip_tables x_tables autofs4
[ 12.498762] CPU: 0 PID: 1349 Comm: id Not tainted 5.13.0-1028-aws #31~20.04.1-Ubuntu
[ 12.500092] Hardware name: Amazon EC2 t4g.micro/, BIOS 1.0 11/1/2018
[ 12.501189] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 12.502226] pc : __fput+0x240/0x248
[ 12.502844] lr : __fput+0xb0/0x248
[ 12.503451] sp : ffff80000a11baf0
[ 12.504039] x29: ffff80000a11baf0 x28: ffff000003de6c80 x27: 0000000000000000
[ 12.505287] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 12.506513] x23: ffff00000447ef00 x22: ffff00000ae0ca20 x21: 00000000000a801d
[ 12.507746] x20: ffff000001ce6020 x19: ffff000002df7500 x18: 0000000000000000
[ 12.508972] x17: 0000000000000000 x16: ffffcab3b1ed6968 x15: 0000000000000000
[ 12.510189] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 12.511411] x11: 0000000000000000 x10: 0000000000000001 x9 : ffffcab3b1ed6298
[ 12.512650] x8 : ffff00003e40b0c0 x7 : 0000000000000808 x6 : 0000000300000000
[ 12.513873] x5 : ffff000001ce6020 x4 : 0000000000008000 x3 : ffff000000f98800
[ 12.515101] x2 : ffffffffffffffff x1 : 0000000000000000 x0 : 0000000000000000
[ 12.516333] Call trace:
[ 12.516771] __fput+0x240/0x248
[ 12.517327] ____fput+0x18/0x28
[ 12.517886] task_work_run+0xc8/0x140
[ 12.518541] do_exit+0x20c/0x8e0
[ 12.519118] do_group_exit+0x4c/0xb0
[ 12.519753] __wake_up_parent+0x0/0x38
[ 12.520437] invoke_syscall+0x74/0xf0
[ 12.521085] el0_svc_common.constprop.0+0x184/0x1a8
[ 12.521939] do_el0_svc+0x2c/0x90
[ 12.522608] el0_svc+0x24/0x38
[ 12.523155] el0_sync_handler+0xb0/0xb8
[ 12.523834] el0_sync+0x19c/0x1c0
[ 12.524426] Code: 91059283 52800020 1400016f 17ffffa0 (d4210000)
[ 12.525503] ---[ end trace 8bd8624b9b8b9618 ]---
[ 12.531116] Fixing recursive fault but reboot is needed!
[ 12.537950] ------------[ cut here ]------------
[ 12.538742] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:638 rcu_eqs_enter.isra.0+0x68/0x70
[ 12.540129] Modules linked in: veth xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter br_netfilter bridge stp llc aufs overlay nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce efi_pstore ena sch_fq_codel ipmi_devintf ipmi_msghandler drm ip_tables x_tabl...

Rob (robd003) wrote :

Just wondering, could we get a "run docker container" test as part of the QA process going forward before new kernels are released?

Aarni Koskela (akx) wrote :

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1967924 seems related.

"This patch is touching overlayfs, so we may see potential regressions in overlayfs." We did indeed... :)

Roger Sikorski (rogersik) wrote :

Is it possible to pull this kernel from the repository before more systems get broken?

indy (cz172638) wrote :

Hit the same problem using rootless podman on linux-image-5.15.0-1008-intel-iotg:
######################################################################
[ 1666.319425] ------------[ cut here ]------------
[ 1666.319433] kernel BUG at include/linux/fs.h:3082!
[ 1666.319443] invalid opcode: 0000 [#3] SMP NOPTI
[ 1666.319449] CPU: 0 PID: 17586 Comm: ls Tainted: G D 5.15.0-1008-intel-iotg #11~20.04.1-Ubuntu
[ 1666.319454] Hardware name: Dell Inc. Precision 5560/XXXXXX, BIOS 1.8.0 02/08/2022
[ 1666.319457] RIP: 0010:__fput+0x265/0x270
[ 1666.319466] Code: 00 48 85 ff 0f 84 6d fe ff ff f6 c7 40 0f 85 64 fe ff ff e8 6d 39 00 00 e9 5a fe ff ff 4c 89 f7 e8 70 96 02 00 e9 97 fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31
[ 1666.319471] RSP: 0018:ffffb3d605127d70 EFLAGS: 00010246
[ 1666.319477] RAX: 0000000000000000 RBX: 00000000000a801d RCX: 0000000000000000
[ 1666.319480] RDX: 0000000000000000 RSI: ffffffff9ffb59f1 RDI: 0000000000000000
[ 1666.319483] RBP: ffffb3d605127d98 R08: ffff942c84c70780 R09: ffff942c8c60b520
[ 1666.319485] R10: 0000000000000010 R11: ffff9433ef5f0c40 R12: ffff942c86b08300
[ 1666.319488] R13: ffff942c8c60b520 R14: ffff942c9079d060 R15: ffff942c8a54ef00
[ 1666.319490] FS: 0000000000000000(0000) GS:ffff9433ef400000(0000) knlGS:0000000000000000
[ 1666.319494] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1666.319497] CR2: 00007ffcbd0a85c9 CR3: 0000000236f96001 CR4: 0000000000770ef0
[ 1666.319500] PKRU: 55555554
[ 1666.319503] Call Trace:
[ 1666.319505] <TASK>
[ 1666.319510] ____fput+0xe/0x10
[ 1666.319515] task_work_run+0x6d/0xb0
[ 1666.319523] exit_to_user_mode_prepare+0x1b2/0x1c0
[ 1666.319529] syscall_exit_to_user_mode+0x27/0x50
[ 1666.319536] do_syscall_64+0x69/0xc0
[ 1666.319543] ? handle_mm_fault+0xd8/0x2b0
[ 1666.319550] ? exit_to_user_mode_prepare+0x3d/0x1c0
[ 1666.319555] ? do_user_addr_fault+0x1dc/0x650
[ 1666.319560] ? irqentry_exit_to_user_mode+0x9/0x20
[ 1666.319565] ? irqentry_exit+0x19/0x30
[ 1666.319569] ? exc_page_fault+0x89/0x160
[ 1666.319573] ? asm_exc_page_fault+0x8/0x30
[ 1666.319580] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1666.319585] RIP: 0033:0x60003530
[ 1666.319592] Code: Unable to access opcode bytes at RIP 0x60003506.
[ 1666.319595] RSP: 002b:00007ffcbd0a83e0 EFLAGS: 00000200 ORIG_RAX: 000000000000003b
[ 1666.319599] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1666.319602] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1666.319604] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1666.319606] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1666.319608] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1666.319612] </TASK>
[ 1666.319614] Modules linked in: overlay uhid rfcomm ccm snd_hda_codec_hdmi cmac algif_hash algif_skcipher af_alg bnep binfmt_misc joydev snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_ctl_led snd_sof_xtensa_dsp snd_sof snd_hda_codec_realtek snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec_generic snd_soc_acpi_intel_match...

indy (cz172638) wrote :

also present in linux-image-5.15.0-1008-intel-iotg:
##################################################
[ 1666.319425] ------------[ cut here ]------------
[ 1666.319433] kernel BUG at include/linux/fs.h:3082!
[ 1666.319443] invalid opcode: 0000 [#3] SMP NOPTI
[ 1666.319449] CPU: 0 PID: 17586 Comm: ls Tainted: G D 5.15.0-1008-intel-iotg #11~20.04.1-Ubuntu
[ 1666.319454] Hardware name: Dell Inc. Precision 5560/XXXXXX, BIOS 1.8.0 02/08/2022
[ 1666.319457] RIP: 0010:__fput+0x265/0x270
[ 1666.319466] Code: 00 48 85 ff 0f 84 6d fe ff ff f6 c7 40 0f 85 64 fe ff ff e8 6d 39 00 00 e9 5a fe ff ff 4c 89 f7 e8 70 96 02 00 e9 97 fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 31
[ 1666.319471] RSP: 0018:ffffb3d605127d70 EFLAGS: 00010246
[ 1666.319477] RAX: 0000000000000000 RBX: 00000000000a801d RCX: 0000000000000000
[ 1666.319480] RDX: 0000000000000000 RSI: ffffffff9ffb59f1 RDI: 0000000000000000
[ 1666.319483] RBP: ffffb3d605127d98 R08: ffff942c84c70780 R09: ffff942c8c60b520
[ 1666.319485] R10: 0000000000000010 R11: ffff9433ef5f0c40 R12: ffff942c86b08300
[ 1666.319488] R13: ffff942c8c60b520 R14: ffff942c9079d060 R15: ffff942c8a54ef00
[ 1666.319490] FS: 0000000000000000(0000) GS:ffff9433ef400000(0000) knlGS:0000000000000000
[ 1666.319494] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1666.319497] CR2: 00007ffcbd0a85c9 CR3: 0000000236f96001 CR4: 0000000000770ef0
[ 1666.319500] PKRU: 55555554
[ 1666.319503] Call Trace:
[ 1666.319505] <TASK>
[ 1666.319510] ____fput+0xe/0x10
[ 1666.319515] task_work_run+0x6d/0xb0
[ 1666.319523] exit_to_user_mode_prepare+0x1b2/0x1c0
[ 1666.319529] syscall_exit_to_user_mode+0x27/0x50
[ 1666.319536] do_syscall_64+0x69/0xc0
[ 1666.319543] ? handle_mm_fault+0xd8/0x2b0
[ 1666.319550] ? exit_to_user_mode_prepare+0x3d/0x1c0
[ 1666.319555] ? do_user_addr_fault+0x1dc/0x650
[ 1666.319560] ? irqentry_exit_to_user_mode+0x9/0x20
[ 1666.319565] ? irqentry_exit+0x19/0x30
[ 1666.319569] ? exc_page_fault+0x89/0x160
[ 1666.319573] ? asm_exc_page_fault+0x8/0x30
[ 1666.319580] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1666.319585] RIP: 0033:0x60003530
[ 1666.319592] Code: Unable to access opcode bytes at RIP 0x60003506.
[ 1666.319595] RSP: 002b:00007ffcbd0a83e0 EFLAGS: 00000200 ORIG_RAX: 000000000000003b
[ 1666.319599] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1666.319602] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1666.319604] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1666.319606] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1666.319608] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1666.319612] </TASK>
[ 1666.319614] Modules linked in: overlay uhid rfcomm ccm snd_hda_codec_hdmi cmac algif_hash algif_skcipher af_alg bnep binfmt_misc joydev snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_ctl_led snd_sof_xtensa_dsp snd_sof snd_hda_codec_realtek snd_soc_hdac_hda snd_hda_ext_core snd_hda_codec_generic snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_comp...

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-intel-iotg-5.15 (Ubuntu Focal):
status: New → Confirmed
Changed in linux-intel-iotg-5.15 (Ubuntu):
status: New → Confirmed

Electric Daemon (electricdaemon) wrote :

The posted test kernel fixes the crash but has another bug: an unkillable, stuck, defunct docker-proxy service that causes more issues. The bug is not solved. Tested on a Linux AWS Lightsail instance.

lilideng (lilideng) wrote :

The kernels below have this issue on Azure. Please hold the new images that contain these kernel releases. Thanks.
focal/linux-azure-5.13: 5.13.0-1026-azure bad
focal/linux-azure-5.13: 5.13.0-1025-azure good

focal/linux-azure-5.15: 5.15.0-1008-azure bad
focal/linux-azure-5.15: 5.15.0-1007-azure good

Tim Gardner (timg-tpi) wrote :

@electricdaemon - please start a new bug report with sufficient detail that someone can diagnose the problem. Is this a regression from previous versions?

Aarni Koskela (akx) wrote :

@timg-tpi Yes, in https://bugs.launchpad.net/bugs/1977973 I found 5.13.0-1027-gcp to work fine.

Northwest Nodes (northwestnodes) wrote :

We can confirm on: 5.13.0-1028-azure

Sebastián García Rojas (sebagr) wrote :

I can confirm going back to 5.13.0-1027-gcp from 5.13.0-1030-gcp fixed it for me.

indy (cz172638) wrote :

linux-intel-iotg-5.15:
 5.15.0-1003 good
 5.15.0-1008 bad

Also, there is a smaller reproducer (using podman):

podman run --rm -it alpine:3.16 ls

knocks the system down, whereas

podman run --rm -it busybox ls

doesn't.

andersonrfs (andersonrfsilva) wrote :

Bug confirmed on Oracle Cloud running Ubuntu 20.04.4 Kernel 5.13.0-1033-oracle.

The ssh workaround by Gerard (g-kok) works. Thanks.

Tim Gardner (timg-tpi)
Changed in linux-oracle-5.13 (Ubuntu Focal):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → High
status: Confirmed → Fix Committed
Changed in linux-aws-5.13 (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux-azure-5.13 (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux-gcp-5.13 (Ubuntu Focal):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → High
status: Confirmed → Fix Committed
Changed in linux-intel-iotg-5.15 (Ubuntu Focal):
assignee: nobody → Tim Gardner (timg-tpi)
importance: Undecided → High
status: Confirmed → Fix Committed

Tim Gardner (timg-tpi) wrote :

@cz172638 - we are aware that the 5.15 Focal backport kernels have this issue as well. However, since 5.15 is not the default -edge kernel yet, it will have to wait until the next SRU cycle, due for release on 20 June 2022.

Changed in linux-intel-iotg-5.15 (Ubuntu Focal):
status: Fix Committed → Won't Fix

Connor Riley (ctriley) wrote :

Sorry for my ignorance of the software development procedure for Ubuntu, but now that this fix has been committed, how long until it is available via apt on the normal release channels?

Bill B. (n1sni) wrote :

This was painful for us: an AWS-hosted server running Ubuntu 20.04.4 LTS. For others, here are the steps we took, thanks to the other comments here:

We had to force a shutdown of the machine from the AWS console and wait. Then we started this script and booted the machine back up:

while true; do
  ssh -o ConnectTimeout=2 -i <.pem file> ubuntu@<host> "sudo systemctl disable docker.service; sudo systemctl disable containerd.service"
done

---
Now ssh into the machine normally. This is what we ran in our case:
sudo -i
cd /boot/grub
grep -Ei 'submenu|menuentry ' /boot/grub/grub.cfg | sed -re "s/(.? )'([^']+)'.*/\1 \2/"
The result on mine was:
---
menuentry Ubuntu
submenu Advanced options for Ubuntu
        menuentry Ubuntu, with Linux 5.13.0-1028-aws # <-- BAD BAD BAD
        menuentry Ubuntu, with Linux 5.13.0-1028-aws (recovery mode)
        menuentry Ubuntu, with Linux 5.13.0-1025-aws # <-- what we want
        menuentry Ubuntu, with Linux 5.13.0-1025-aws (recovery mode)
        menuentry Ubuntu, with Linux 5.4.0-1029-aws
        menuentry Ubuntu, with Linux 5.4.0-1029-aws (recovery mode)
menuentry Ubuntu 20.04.4 LTS (20.04) (on /dev/nvme0n1p1)
submenu Advanced options for Ubuntu 20.04.4 LTS (20.04) (on /dev/nvme0n1p1)
        menuentry Ubuntu (on /dev/nvme0n1p1)
        menuentry Ubuntu, with Linux 5.13.0-1028-aws (on /dev/nvme0n1p1)
        menuentry Ubuntu, with Linux 5.13.0-1028-aws (recovery mode) (on /dev/nvme0n1p1)
        menuentry Ubuntu, with Linux 5.13.0-1025-aws (on /dev/nvme0n1p1)
        menuentry Ubuntu, with Linux 5.13.0-1025-aws (recovery mode) (on /dev/nvme0n1p1)
        menuentry Ubuntu, with Linux 5.4.0-1029-aws (on /dev/nvme0n1p1)
        menuentry Ubuntu, with Linux 5.4.0-1029-aws (recovery mode) (on /dev/nvme0n1p1)
---
So we wanted off 1028, and back to 1025. Edited:
vi /etc/default/grub
changed:
GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.13.0-1025-aws"

NOTE: the first half is the submenu title from the listing above, then ">", and then the menuentry inside that submenu.
Saved
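
A minimal sketch of the same edit done non-interactively, for anyone scripting the rollback. `set_grub_default` is a hypothetical helper name, not a GRUB tool; it assumes a plain `GRUB_DEFAULT=` line (or none at all, in which case one is appended) and takes the file path as an argument so it can be tried on a copy first.

```shell
# Sketch only: set_grub_default is a hypothetical helper, not part of GRUB.
# It rewrites an existing GRUB_DEFAULT= line, or appends one if none exists.
set_grub_default() {
    file=$1
    entry=$2
    tmp=$(mktemp) || return 1
    awk -v line="GRUB_DEFAULT=\"$entry\"" '
        /^GRUB_DEFAULT=/ { print line; found = 1; next }
        { print }
        END { if (!found) print line }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example (as root), followed by regenerating grub.cfg as described here:
#   set_grub_default /etc/default/grub \
#       "Advanced options for Ubuntu>Ubuntu, with Linux 5.13.0-1025-aws"
#   grub-mkconfig -o /boot/grub/grub.cfg
```

Commented-out lines such as `#GRUB_DEFAULT=0` are left alone, so the file keeps its original hints.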

run:
grub-mkconfig -o /boot/grub/grub.cfg
sudo reboot

then, after reboot:

sudo systemctl enable docker.service; sudo systemctl enable containerd.service
sudo reboot

Should be back up! It seems like upgrading to the newest kernel also helps based on the above, but will try that later.

Hope this helps someone like you all helped us!!

BenC (wiq-dev-bc) wrote :

@n1sni - thank you for your post.

With 5.13.0-1028-aws I could only run hello-world without killing the host.

Reverting back to 5.13.0-1025-aws from 5.13.0-1028-aws I can now run our build containers without problems.

Erik Kristensen (unhandledexception) wrote :

I would like to echo earlier comments: all affected kernel packages should be pulled from the APT repositories, and all cloud images built with the bad kernel should be pulled too.

jd (jeff-dyke) wrote :

@n1sni - wanted to extend my thanks as well, but on Ubuntu 20.04 that setting was not present in /etc/default/grub. After adding it, regenerating the GRUB config, and rebooting, the change didn't take effect, so I had to uninstall 1028 and install 1025 instead. Going to make an AMI until this fix is released.

sudo apt remove linux-image-5.13.0-1028-aws linux-image-aws -y
Then
sudo apt install -y linux-image-5.13.0-1025-aws
Then reboot. If you reboot between the two commands you won't have a kernel and won't be able to boot.
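
One way to avoid that window is to install the replacement first and only remove 1028 once another kernel image is confirmed installed. The check below is a rough sketch: `fallback_kernels` is a hypothetical helper that reads `dpkg -l` output on stdin and prints every installed versioned kernel image other than the one being removed.

```shell
# Sketch only: fallback_kernels is a hypothetical helper.
# Reads `dpkg -l` output on stdin; prints installed (state "ii") versioned
# kernel image packages other than linux-image-<bad>.
fallback_kernels() {
    awk -v bad="linux-image-$1" \
        '$1 == "ii" && $2 ~ /^linux-image-[0-9]/ && $2 != bad { print $2 }'
}

# Example:
#   sudo apt install -y linux-image-5.13.0-1025-aws
#   if [ -n "$(dpkg -l 'linux-image-*' | fallback_kernels 5.13.0-1028-aws)" ]; then
#       sudo apt remove -y linux-image-5.13.0-1028-aws linux-image-aws
#   fi
```

Metapackages such as linux-image-aws are deliberately not counted as a fallback, since they do not contain a kernel themselves.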

Adam-morey (adam-morey) wrote :

For Debian and Ubuntu, I used "sudo grub-reboot 2", which forces grub menu 2's kernel on next reboot. Once rebooted, use "dpkg -l | grep 1028" and apt remove each package related to kernel 1028. Apt will also update grub for you.

Don't forget to uninstall or "break" unattended-upgrades, which is how my server got the new kernel in the first place.
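
The package lookup above can be narrowed a little. A plain grep for 1028 also matches version strings of unrelated packages, so this sketch matches on the package name only. `packages_for_abi` is a hypothetical helper that reads `dpkg -l` output on stdin.

```shell
# Sketch only: packages_for_abi is a hypothetical helper.
# Reads `dpkg -l` output on stdin; prints installed packages whose *name*
# contains the given kernel ABI string, e.g. 5.13.0-1028.
packages_for_abi() {
    awk -v abi="$1" '$1 == "ii" && index($2, abi) { print $2 }'
}

# Example (review the list before removing anything):
#   dpkg -l | packages_for_abi 5.13.0-1028
#   dpkg -l | packages_for_abi 5.13.0-1028 | xargs -r sudo apt remove -y
```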

Connor Riley (ctriley) wrote :

On GCP the fix hit apt. So the easiest way to fix now is simply `sudo apt update && sudo apt upgrade`

Francis Ginther (fginther) wrote :

Updated kernels are in flight. The updated kernel packages and versions are:

linux-aws-5.13 - 5.13.0-1029.32~20.04.1
linux-azure-5.13 - 5.13.0-1029.34~20.04.1
linux-gcp-5.13 - 5.13.0-1031.37~20.04.1
linux-oracle-5.13 - 5.13.0-1034.40~20.04.1

The azure and gcp kernels are already in focal-updates. The aws kernel is in focal-proposed and the oracle kernel should be there very soon.

dan the person (dantheperson) wrote (last edit ):

For those who can't update, because the machine starts docker at startup and so crashes before you can get a shell open to upgrade to 1031, here's my method (on gcp)

stop and edit machine to detach disk
attach it to another machine, boot that, and mount the disk somewhere
edit <mountpath>/boot/grub/grub.cfg and add single as a kernel commandline parameter
shutdown temp box and detach disk
reattach disk to original machine and boot
connect via serial console
sudo systemctl disable docker
remove single from grub.cfg and reboot
ssh in and update to latest kernel and reboot
sudo systemctl enable docker

Erik Forsberg (forsberg) wrote (last edit ):

I had limited success with "grub-reboot 2", but the following worked fine for me on an AWS EC2 running Ubuntu 20.04.2

sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 5.13.0-1025-aws"
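
To find the exact string to pass, the entry titles can be read out of grub.cfg. This is a rough sketch, assuming the layout grub-mkconfig normally produces: single-quoted titles, one level of submenu, and nested entries indented. `grub_entry_paths` is a hypothetical helper name.

```shell
# Sketch only: grub_entry_paths is a hypothetical helper.
# Prints one line per boot entry. Entries inside a submenu are printed as
# "submenu title>entry title", the form GRUB_DEFAULT and grub-reboot accept.
# Assumes single-quoted titles and indented entries inside submenus.
grub_entry_paths() {
    awk -F"'" '
        /^submenu /         { parent = $2; next }
        /^menuentry /       { print $2; next }
        /^[ \t]+menuentry / { print parent ">" $2 }
    ' "$1"
}

# Example:
#   grub_entry_paths /boot/grub/grub.cfg
#   sudo grub-reboot "$(grub_entry_paths /boot/grub/grub.cfg | grep -m1 '1025-aws$')"
```

Unlike a numeric index such as 2, the title path does not shift when kernels are added or removed.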

dan the person (dantheperson) wrote :

i'm intrigued, how do you 'sudo grub-reboot' when the machine is crashed?

And if anyone knows how to get the grub boot menu to respond to the keyboard over the serial console on GCP that'd be great, as it would save having to attach the disk to another instance to change the boot kernel or options.

Jason Campanella (atlantis-stargate) wrote :

Not sure if it will work on GCP but in Azure you hold escape to get into Grub while the system is booting.

Erik Forsberg (forsberg) wrote :

The ability to do 'sudo grub-reboot' depends on the use case. In my case, the docker jobs were started via crontab, and the machine didn't crash completely, so I was able to log in.

Bernardo Hugo Signori (bernardos) wrote :

In Oracle Cloud you can start a Cloud Shell console connection, force reboot the instance, press Esc in the console, and select the previous kernel in the GRUB menu. I was able to boot with kernel 5.13.0-1030-oracle without panics.

Francis Ginther (fginther) wrote :

All of the updated 5.13 kernels have now made it to the archive and into both the focal-updates and focal-security pockets. That list of kernels is:

linux-aws-5.13 - 5.13.0-1029.32~20.04.1
linux-azure-5.13 - 5.13.0-1029.34~20.04.1
linux-gcp-5.13 - 5.13.0-1031.37~20.04.1
linux-oracle-5.13 - 5.13.0-1034.40~20.04.1
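
A quick way to check a machine against that list is to compare the ABI number in `uname -r` with the first fixed number for its flavour. This is only a sketch: `first_fixed_abi` and `kernel_has_fix` are hypothetical helper names, and the thresholds are copied from the list above. A release below the threshold is not necessarily affected (5.13.0-1025-aws was reported working earlier in this thread).

```shell
# Sketch only: both helpers are hypothetical. Thresholds come from the
# list of updated 5.13 kernels above.
first_fixed_abi() {
    case $1 in
        aws|azure) echo 1029 ;;
        gcp)       echo 1031 ;;
        oracle)    echo 1034 ;;
        *)         return 1 ;;
    esac
}

# kernel_has_fix RELEASE, where RELEASE looks like 5.13.0-1029-aws.
# Returns 0 if at or above the fixed ABI, 1 if below, 2 if not recognised.
kernel_has_fix() {
    case $1 in
        5.13.0-*-*) ;;
        *) return 2 ;;
    esac
    rest=${1#5.13.0-}     # e.g. 1029-aws
    abi=${rest%%-*}       # e.g. 1029
    flavour=${rest#*-}    # e.g. aws
    case $abi in
        ''|*[!0-9]*) return 2 ;;
    esac
    fixed=$(first_fixed_abi "$flavour") || return 2
    [ "$abi" -ge "$fixed" ]
}

# Example:
#   kernel_has_fix "$(uname -r)" && echo "running a kernel with the fix"
```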

Matthew Lenz (matthew-nocturnal) wrote :

How did people fix this on AWS instances that have no serial console access? Assuming the disk was mounted and grub.cfg was edited, what did you change in grub.cfg?

Podesta (podesta) wrote :

Fixed kernel works like a charm.

@matthew-nocturnal you have to change the default entry that GRUB loads, which is set in /etc/default/grub. There you change GRUB_DEFAULT to another entry, as has been pointed out in the previous messages. But now you can simply run apt update / upgrade and it should get the latest kernel. If you can't access the machine to do this, you can either use a rescue machine and do it with chroot, or try to disable docker before it crashes.

Sebastian Neumann (basti-megamorf+ubuntu-com) wrote (last edit ):

I can confirm that the problem is indeed not fully fixed. @electricdaemon said:

> Test kernel posted fixes crash but has another bug with unkillable stuck defunct docker-proxy service causing more issues. Bug is not solved. Tested on Linux AWS Lightsail instance.

And that's the problem that I'm seeing as well. Still gathering data for a bug report.

What I'm seeing is that docker-compose stacks either don't start at all or only start partially. In both cases the affected containers cannot start due to their host port being already allocated. I can say with absolute certainty that the ports on the host are dedicated to container applications and no other service is actually bound to the affected port numbers.

# uname -a
Linux ip-10-0-69-193 5.13.0-1029-aws #32~20.04.1-Ubuntu SMP Thu Jun 9 13:03:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

# docker-compose --version
docker-compose version 1.29.2, build 5becea4c

root@ip-10-0-69-193:/opt/myapp8/myappserv/int# docker-compose up -d
Creating network "myappserv-int_default" with the default driver
Creating myapp-migrator-int ... done
Creating myapp-dealer-int ...
Creating myapp-offer-int ...
Creating myapp-customer-int ...
Creating myapp-customer-int ... error
Creating myapp-dealer-int ... done
Creating myapp-offer-int ... done
: port is already allocated

ERROR: for customer Cannot start service customer: driver failed programming external connectivity on endpoint myapp8-customer-int (fe4112364528b0e7d192c793929c579e8a81af715118c8f83ad7e65e7397f3be): Bind for 0.0.0.0:9001 failed: port is already allocated
ERROR: Encountered errors while bringing up the project.

root@ip-10-0-69-193:/opt/myapp8/myappserv/int# docker-compose down
Stopping myapp8-offer-int ... done
Stopping myapp8-dealer-int ... done
Removing myapp8-customer-int ... done
Removing myapp8-offer-int ... done
Removing myapp8-dealer-int ... done
Removing myapp8-migrator-int ... done
Removing network myappserv-int_default

root@ip-10-0-69-193:/opt/myapp8/myappserv/int# docker-compose up -d
Creating network "myappserv-int_default" with the default driver
Creating myapp8-migrator-int ... done
Creating myapp8-offer-int ...
Creating myapp8-customer-int ...
Creating myapp8-customer-int ... error
WARNING: Host is already in use by another container
Creating myapp8-offer-int ... done
ERROR: for myapp8-customer-int Cannot start service customer: driver failed programming external connectivity on endpoint myapp8-customer-int (72fc08854cd278e63cd3234e7fb03c08cb045efdcfb9e42075a1250d893645d5): Bind for 0.0.0.0:9001 failed
Creating myapp8-dealer-int ... done

ERROR: for customer Cannot start service customer: driver failed programming external connectivity on endpoint myapp8-customer-int (72fc08854cd278e63cd3234e7fb03c08cb045efdcfb9e42075a1250d893645d5): Bind for 0.0.0.0:9001 failed: port is already allocated
ERROR: Encountered errors while bringing up the project.

# docker-compose config

services:
  customer:
    container_name: myapp8-customer-int
    depends_on:
      migrator:
        condition: service_completed_successfully
    image: reg.mydomain.tld/myapp8/...
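
For anyone collecting data on the same symptom, one thing worth checking is whether a docker-proxy process from an earlier run is stuck, as described in the quoted report. This is a rough sketch and only an assumption about the cause: `stuck_docker_proxies` is a hypothetical helper that reads `ps -eo pid=,stat=,comm=` output on stdin and prints the PIDs of docker-proxy processes in zombie (Z) or uninterruptible (D) state.

```shell
# Sketch only: stuck_docker_proxies is a hypothetical helper.
# Reads "PID STAT COMM" lines on stdin; prints PIDs of docker-proxy
# processes whose state starts with Z (zombie) or D (uninterruptible).
stuck_docker_proxies() {
    awk '$3 == "docker-proxy" && $2 ~ /^[ZD]/ { print $1 }'
}

# Example:
#   ps -eo pid=,stat=,comm= | stuck_docker_proxies
```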

Tim Gardner (timg-tpi) wrote :

Sebastian Neumann (basti-megamorf+ubuntu-com) - please start a new bug report so that we can address your specific problem. It may or may not be related to the patch that fixed this kernel crash.

Changed in linux-aws-5.13 (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in linux-azure-5.13 (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in linux-gcp-5.13 (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in linux-oracle-5.13 (Ubuntu Focal):
status: Fix Committed → Fix Released
Sebastian Neumann (basti-megamorf+ubuntu-com) wrote :

Created a new bug report: https://bugs.launchpad.net/ubuntu/+source/linux-aws-5.13/+bug/1978475

Hopefully @electricdaemon and other affected users can help to provide a reproducible test.

dan the person (dantheperson) wrote :

@matthew-nocturnal

For me, single user mode would stop docker starting and thus avoid the crash. But if you don't have a serial console, how would you then get a shell to fix the machine?

For single user mode, find the first menuentry in grub.cfg and add single after ro

i.e. change
linux /boot/vmlinuz-5.13.0-1030-gcp root=PARTUUID=3c480693-932a-4c3c-8409-1bc45cd64f32 ro console=ttyS0

to
linux /boot/vmlinuz-5.13.0-1030-gcp root=PARTUUID=3c480693-932a-4c3c-8409-1bc45cd64f32 ro single console=ttyS0
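
The same edit can be scripted against the grub.cfg on the mounted disk. A minimal sketch, assuming the first `linux` line in the file belongs to the default entry: `add_kernel_param` is a hypothetical helper that inserts one parameter after the standalone `ro` flag on that line, or appends it if there is no `ro`.

```shell
# Sketch only: add_kernel_param is a hypothetical helper.
# Adds PARAM to the first "linux" line of a grub.cfg, right after the
# standalone "ro" flag (or at the end of the line if "ro" is absent).
add_kernel_param() {
    cfg=$1
    param=$2
    tmp=$(mktemp) || return 1
    awk -v p="$param" '
        !seen && /^[ \t]*linux[ \t]/ {
            seen = 1
            n = match($0, / ro( |$)/)
            if (n) $0 = substr($0, 1, n + 2) " " p substr($0, n + 3)
            else   $0 = $0 " " p
        }
        { print }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example, with <mountpath> as wherever the disk was mounted:
#   add_kernel_param <mountpath>/boot/grub/grub.cfg single
```

Because grub.cfg is regenerated by update-grub, an edit like this is temporary and goes away the next time the configuration is rebuilt, which is what you want for a one-off rescue boot.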

dan the person (dantheperson) wrote (last edit ):

Alternatively you can apparently just stop docker starting: instead of adding 'single', add 'systemd.mask=docker.service'.

That will work better for you if you don't have serial console as then networking will still come up.

https://unix.stackexchange.com/a/176406/64349
