AWS: Add udev rule to set Instance Store device IO timeouts
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init (Ubuntu) | Won't Fix | Medium | Matthew Ruffell |
Xenial | Won't Fix | Medium | Matthew Ruffell |
Bionic | Won't Fix | Medium | Matthew Ruffell |
Disco | Won't Fix | Medium | Matthew Ruffell |
Eoan | Won't Fix | Medium | Matthew Ruffell |
Bug Description
[Impact]
AWS wish to implement per-device IO timeouts in their cloud. NVMe devices currently support only a single global timeout, which cannot suit both kinds of volume: EBS volumes have error recovery built into the back end and require large timeouts, while Instance Store (ephemeral) volumes are to be treated as local disks and require short timeouts.
AWS have proposed backporting the two patches below to the Ubuntu kernels:
commit 65cd1d13b880920
Author: Weiping Zhang <email address hidden>
Date: Thu Nov 29 00:04:39 2018 +0800
subject: block: add io timeout to sysfs
commit 4d25339e32a1b6e
Author: Weiping Zhang <email address hidden>
Date: Tue Apr 2 21:14:30 2019 +0800
subject: block: don't show io_timeout if driver has no timeout handler
This enables a sysfs entry in /sys/block/
Kernel commits are being tracked in LP #1841461
EBS volumes will keep the default timeout set on the kernel command line, 4294966296, while Instance Store volumes need a timeout of 30000 (io_timeout is expressed in milliseconds, so 30 seconds).
AWS have suggested that we deploy the below udev rule to automatically set the io_timeout of all Instance Store volumes to 30000:
KERNEL=
This bug is to add the above udev rule to cloud-init.
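The intended effect of the rule can be sketched as a small POSIX shell function. This is an illustration only, not the rule itself, which udev would apply automatically. It assumes the timeout attribute is `queue/io_timeout` under each disk's `/sys/block` directory (where the upstream commit places it) and that the model string is readable at `device/model`. The function name and the `SYSFS_ROOT` override are hypothetical, introduced so the logic can be tried against a fake tree.

```shell
# Sketch: write a timeout to every NVMe disk whose model identifies it as
# Instance Store. SYSFS_ROOT is a hypothetical override; it defaults to /sys.
set_instance_store_timeouts() {
    root="${SYSFS_ROOT:-/sys}"
    timeout_ms="${1:-30000}"
    for dev in "$root"/block/nvme*; do
        [ -d "$dev" ] || continue
        model_file="$dev/device/model"
        timeout_file="$dev/queue/io_timeout"
        # Skip disks without a model attribute, and kernels that do not
        # expose (or do not let us write) io_timeout.
        [ -r "$model_file" ] && [ -w "$timeout_file" ] || continue
        # The model string may be padded with trailing spaces, so match
        # on its prefix rather than on the whole value.
        case "$(cat "$model_file")" in
            "Amazon EC2 NVMe Instance Storage"*)
                echo "$timeout_ms" > "$timeout_file"
                ;;
        esac
    done
}
```

Called as root with no arguments, it would apply 30000 to Instance Store disks and leave EBS disks untouched. Matching on the model string rather than the device name mirrors the rule's robustness to device reordering.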
[Test Case]
This requires an AWS instance that has Instance Store volumes configured, and I suggest using c5d.large instances.
I have built a test kernel for bionic linux-aws, version 4.15.0-
https:/
Install the kernel with the below:
1) sudo add-apt-repository ppa:mruffell/
2) sudo apt-get update
Modify grub to boot the test kernel. The test kernel is 1043 and the current kernel is 1044, so the entry will likely be "1>2" in the grub config:
3) sudo vim /etc/default/grub
Change GRUB_DEFAULT=0 to GRUB_DEFAULT="1>2"
4) sudo update-grub
5) reboot
Once system is up, check kernel version:
6) uname -rv
4.15.0-1043-aws #45+hf240347v20
Verify that we have two nvme disks, one EBS and one Instance Store:
7) lsblk
Should have two disks, normally nvme0 and nvme1.
Identify which device is which:
8) sudo udevadm info --attribute-walk /dev/nvme0
For me, ATTR{model} is "Amazon Elastic Block Store"
9) sudo udevadm info --attribute-walk /dev/nvme1
For me, ATTR{model} is "Amazon EC2 NVMe Instance Storage"
Look at the two timeouts (Note no udev rule yet):
10) cat /sys/block/
4294966296
4294966296
Now we deploy the udev rule:
Place the following line in /lib/udev/
KERNEL=
Now trigger udev rules:
11) sudo udevadm trigger
Look at the timeouts now:
12) cat /sys/block/
4294966296
30000
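Steps 8 to 12 can be condensed into a single check that prints each NVMe disk with its timeout and model. This is a minimal sketch under the same assumptions as the manual steps: the timeout is assumed to live at `queue/io_timeout` under each disk's `/sys/block` directory and the model at `device/model`. The function name and the `SYSFS_ROOT` override are hypothetical.

```shell
# Print "<disk><TAB><io_timeout><TAB><model>" for each NVMe disk.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
report_nvme_timeouts() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/block/nvme*; do
        [ -d "$dev" ] || continue
        name="${dev##*/}"
        if [ -r "$dev/queue/io_timeout" ]; then
            timeout="$(cat "$dev/queue/io_timeout")"
        else
            # Kernel without the io_timeout attribute
            timeout="unsupported"
        fi
        # Strip trailing padding from the model string; empty if unreadable
        model="$(sed 's/[[:space:]]*$//' "$dev/device/model" 2>/dev/null)"
        printf '%s\t%s\t%s\n' "$name" "$timeout" "${model:-unknown}"
    done
}
```

Running it before and after `sudo udevadm trigger` should show the Instance Store disk's value change while the EBS disk's value stays the same.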
[Regression Potential]
Regression potential is low since we are adding a udev rule which applies only to AWS instances, and only for instances which support Instance Store devices.
Only the device timeout is modified, and the udev rule is robust to device reordering because it matches on the model attribute.
When the udev rule is used with unpatched kernels, nothing happens since the sysfs entry does not exist, and no errors or the like are reported.
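Whether a given kernel carries the backported patches can be checked by testing for the attribute directly. A minimal sketch, assuming the attribute is `queue/io_timeout` under the disk's `/sys/block` directory; the function name, the `SYSFS_ROOT` override, and the disk name used in the usage note are hypothetical.

```shell
# Succeed if the named disk exposes the io_timeout attribute.
# $1 is a disk name as it appears under /sys/block.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
has_io_timeout() {
    [ -e "${SYSFS_ROOT:-/sys}/block/$1/queue/io_timeout" ]
}
```

For example, `has_io_timeout nvme1n1 && echo "io_timeout available"` prints the message only on a kernel that exposes the attribute for that disk.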
[Other Info]
cloud-init appears to carry Azure-specific udev rules, which makes me think that cloud-init is the right place for this requested udev rule to live.
Related branches
- Scott Moser: Needs Information
- Ryan Harper: Pending requested
Diff: 8 lines (+2/-0), 1 file modified: udev/66-aws-io-timeout.rules (+2/-0)
Changed in cloud-init (Ubuntu Xenial):
status: New → In Progress
Changed in cloud-init (Ubuntu Bionic):
status: New → In Progress
Changed in cloud-init (Ubuntu Disco):
status: New → In Progress
Changed in cloud-init (Ubuntu Eoan):
status: New → In Progress
Changed in cloud-init (Ubuntu Xenial):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in cloud-init (Ubuntu Bionic):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in cloud-init (Ubuntu Disco):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in cloud-init (Ubuntu Eoan):
assignee: nobody → Matthew Ruffell (mruffell)
tags: added: sts
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Bionic):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Disco):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Eoan):
importance: Undecided → Medium
Changed in cloud-init (Ubuntu Disco):
status: In Progress → Won't Fix
Changed in cloud-init (Ubuntu):
status: In Progress → Won't Fix
Changed in cloud-init (Ubuntu Xenial):
status: In Progress → Won't Fix
Changed in cloud-init (Ubuntu Bionic):
status: In Progress → Won't Fix
The Eoan Ermine has reached end of life, so this bug will not be fixed for that release.