linux-aws: Xen: Issues with detaching volume
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-aws (Ubuntu) |
Invalid
|
Undecided
|
Unassigned | ||
Impish |
Fix Released
|
Medium
|
Tim Gardner |
Bug Description
SRU Justification
[Impact]
We are observing issue with the secondary volume stuck in detaching. This is observed with the latest Canonical, Ubuntu EKS Node OS (k8s_1.19), 20.04 LTS, amd64 focal image build on 2022-03-08 and Xen instance type( for eg : m4, t2 instance type )
AMI in eu-west-1 : ami-0f4ffbcba23
AMI in us-east-1 : ami-021feb4aa3b
When terminating an instance with a stuck volume the shutdown process is interrupted by a xen task hanging:
[ 847.895334] INFO: task xenwatch:188 blocked for more than 483 seconds.
[ 847.901573] Not tainted 5.13.0-1017-aws #19~20.04.1-Ubuntu
[ 847.907144] "echo 0 > /proc/sys/
[ 847.914462] task:xenwatch state:D stack: 0 pid: 188 ppid: 2 flags:0x00004000
[ 847.914467] Call Trace:
[ 847.914469] <TASK>
[ 847.914472] __schedule+
[ 847.914478] schedule+0x4f/0xc0
[ 847.914479] schedule_
[ 847.914482] __mutex_
[ 847.914486] __mutex_
[ 847.914487] mutex_lock+
[ 847.914489] del_gendisk+
[ 847.914493] xlvbd_release_
[ 847.914499] blkback_
[ 847.914502] xenbus_
[ 847.914507] backend_
[ 847.914510] xenwatch_
[ 847.914513] ? wait_woken+
[ 847.914517] ? test_reply.
[ 847.914520] kthread+0x12b/0x150
[ 847.914523] ? set_kthread_
[ 847.914525] ret_from_
[ 847.914531] </TASK>
this looks like it's waiting on a xen block device to be released.
Following steps used to reproduce the issue:
* Created a m4,t2(xen) instance with the latest ami for latest Canonical, Ubuntu EKS Node OS (k8s_1.19), 20.04 LTS, amd64
* Created a filesystem on volume
* Mounted volume through OS
* Unmounted volume in OS
* Detached volume from AWS console
* Volume gets stuck.
We at the internal team observed this commit upstream: https:/
[...] and a del_gendisk from the block device release
method, which will deadlock.
It has a Fixes: tag referring to a commit from 5.13 so this could be the root-cause. While testing, we observed this commit is fixing the issue.
[Test Plan]
Amazon tested
[Where things could go wrong]
Detaching volumes could fail in new and bizarre ways
[Other Info]
SF: #00331175
Changed in linux-aws (Ubuntu Impish): | |
assignee: | nobody → Tim Gardner (timg-tpi) |
importance: | Undecided → Medium |
status: | New → In Progress |
Changed in linux-aws (Ubuntu): | |
status: | New → Invalid |
Patches submitted: https:/ /lists. ubuntu. com/archives/ kernel- team/2022- March/129029. html