Ubuntu
linux package

EBS volume attachment during boot cause randomly EC2 instance to be stuck

Bug #2022329 reported by Andrea Cristalli on 2023-06-02

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	linux (Ubuntu)	Confirmed	Undecided	Unassigned

Bug Description

We create and deploy custom AMIs based on Ubuntu Jammy and we noticed since jammy-20230428 that randomly all the AMI based on it sometimes fail during the boot process. I can destroy and deploy again to get rid of this. The stack trace is always the same:

```
[ 849.765218] INFO: task swapper/0:1 blocked for more than 727 seconds.
[ 849.774999] Not tainted 5.19.0-1025-aws #26~22.04.1-Ubuntu
[ 849.787081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 849.811223] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000
[ 849.883494] Call Trace:
[ 849.891369] <TASK>
[ 849.899306] __schedule+0x254/0x5a0
[ 849.907878] schedule+0x5d/0x100
[ 849.917136] io_schedule+0x46/0x80
[ 849.970890] blk_mq_get_tag+0x117/0x300
[ 849.976136] ? destroy_sched_domains_rcu+0x40/0x40
[ 849.981442] __blk_mq_alloc_requests+0xc4/0x1e0
[ 849.986750] blk_mq_get_new_requests+0xcc/0x190
[ 849.992185] blk_mq_submit_bio+0x1eb/0x450
[ 850.070689] __submit_bio+0xf6/0x190
[ 850.075545] submit_bio_noacct_nocheck+0xc2/0x120
[ 850.080841] submit_bio_noacct+0x209/0x560
[ 850.085654] submit_bio+0x40/0xf0
[ 850.090361] submit_bh_wbc+0x134/0x170
[ 850.094905] ll_rw_block+0xbc/0xd0
[ 850.175198] do_readahead.isra.0+0x126/0x1e0
[ 850.183531] jread+0xeb/0x100
[ 850.189648] do_one_pass+0xbb/0xb90
[ 850.193917] ? crypto_create_tfm_node+0x9a/0x120
[ 850.207511] ? crc_43+0x1e/0x1e
[ 850.211887] jbd2_journal_recover+0x8d/0x150
[ 850.272927] jbd2_journal_load+0x130/0x1f0
[ 850.280601] ext4_load_journal+0x271/0x5d0
[ 850.288540] __ext4_fill_super+0x2aa1/0x2e10
[ 850.296290] ? pointer+0x36f/0x500
[ 850.304910] ext4_fill_super+0xd3/0x280
[ 850.372470] ? ext4_fill_super+0xd3/0x280
[ 850.380637] get_tree_bdev+0x189/0x280
[ 850.384398] ? __ext4_fill_super+0x2e10/0x2e10
[ 850.388490] ext4_get_tree+0x15/0x20
[ 850.392123] vfs_get_tree+0x2a/0xd0
[ 850.395859] do_new_mount+0x184/0x2e0
[ 850.468151] path_mount+0x1f3/0x890
[ 850.471804] ? putname+0x5f/0x80
[ 850.475341] init_mount+0x5e/0x9f
[ 850.478976] do_mount_root+0x8d/0x124
[ 850.482626] mount_block_root+0xd8/0x1ea
[ 850.486368] mount_root+0x62/0x6e
[ 850.568079] prepare_namespace+0x13f/0x19e
[ 850.571984] kernel_init_freeable+0x120/0x139
[ 850.575930] ? rest_init+0xe0/0xe0
[ 850.579511] kernel_init+0x1b/0x170
[ 850.583084] ? rest_init+0xe0/0xe0
[ 850.586642] ret_from_fork+0x22/0x30
[ 850.668205] </TASK>
```
This happens since 5.19.0-1024-aws, I have now rolled back to 5.19.0-1022-aws.

I can easily reproduce in my dev environment where 15 EC2 instances are deployed at once, 1 or 2 of them randomly fails, next deployment will succeed.

I cannot debug more because of lacking connection, I could copy the trace above thanks to the virtual serial console. If there are some boot options I could add in order to gives more information I will happy to do that.

See original description

Tags:

Andrea Cristalli (esysc) on 2023-06-02

description:

updated

Revision history for this message

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2023-06-02: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2022329

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Revision history for this message

Andrea Cristalli (esysc) wrote on 2023-06-02:

It's not possible to attach logs due to the nature of the issue. the instances are completely isolated when it happens and i can only retrieve logs from AWS virtual serial console.

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntulinux package

EBS volume attachment during boot cause randomly EC2 instance to be stuck

Bug Description

Other bug subscribers

Remote bug watches

Ubuntu
linux package