bionic: default io_timeout for nvme is 255 on AWS

Bug #1758466 reported by Phil Sweeney on 2018-03-24
This bug affects 1 person
Affects                 Status  Importance  Assigned to     Milestone
cloud-images                    Undecided   Unassigned
linux (Ubuntu) Bionic           Medium      Kamal Mostafa

Bug Description

According to the AWS docs, on newer instance types that use the NVMe driver (c5/m5), the I/O timeout should be set to the maximum (ideally 4294967295).
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html
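For context, the recommended value is not a magic number: it is simply the largest unsigned 32-bit integer, which in effect disables the timeout. A quick sanity check:

```shell
# 4294967295 is UINT32_MAX, i.e. 2^32 - 1; passing it as io_timeout
# effectively disables the NVMe I/O timeout.
max=$(( (1 << 32) - 1 ))
echo "$max"   # prints 4294967295
```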

It appears this is done for 16.04, but in 18.04 the stock default of 255 is used instead.

Tested by spinning up an m5.large instance in ap-southeast-2.

Ubuntu 16.04 AMI on AWS (latest AMI - 20180126):

$ cat /sys/module/nvme/parameters/io_timeout
4294967295

$ uname -r
4.4.0-1049-aws

Ubuntu 18.04 AMI on AWS (latest nightly AMI - 20180323):

$ cat /sys/module/nvme_core/parameters/io_timeout
255

$ uname -r
4.15.0-1001-aws

Perhaps this setting got lost as part of the move from nvme to nvme_core.
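As a workaround until fixed images land, the value can be raised persistently with a module option (a sketch; the file name below is my own choice, not taken from the report or the AMI):

```
# /etc/modprobe.d/nvme-timeout.conf  (hypothetical file name)
options nvme_core io_timeout=4294967295
```

Because nvme_core is loaded from the initramfs, the image would need rebuilding afterwards (`sudo update-initramfs -u`) and a reboot; the running value can then be confirmed via /sys/module/nvme_core/parameters/io_timeout.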
---
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.8-0ubuntu10
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: N/A
DistroRelease: Ubuntu 18.04
Ec2AMI: ami-5c5d903e
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: ap-southeast-2b
Ec2InstanceType: m5.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
JournalErrors:
 -- Logs begin at Sat 2018-03-24 01:00:37 UTC, end at Sat 2018-03-24 01:01:37 UTC. --
 Mar 24 01:00:56 ip-10-1-2-86 iscsid[849]: iSCSI daemon with pid=850 started!
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Amazon EC2 m5.large
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-1001-aws root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0 nvme_core.io_timeout=255
ProcVersionSignature: Ubuntu 4.15.0-1001.1-aws 4.15.3
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-1001-aws N/A
 linux-backports-modules-4.15.0-1001-aws N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic ec2-images
Uname: Linux 4.15.0-1001-aws x86_64
UnreportableReason: The report belongs to a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: False
dmi.bios.date: 10/16/2017
dmi.bios.vendor: Amazon EC2
dmi.bios.version: 1.0
dmi.board.asset.tag: i-0d27cecf4877dafa7
dmi.board.vendor: Amazon EC2
dmi.chassis.asset.tag: Amazon EC2
dmi.chassis.type: 1
dmi.chassis.vendor: Amazon EC2
dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:svnAmazonEC2:pnm5.large:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:
dmi.product.name: m5.large
dmi.sys.vendor: Amazon EC2

summary: - default io_timeout for nvme is 255 on AWS
+ bionic: default io_timeout for nvme is 255 on AWS

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1758466

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic

apport information

tags: added: apport-collected ec2-images
description: updated

Phil Sweeney (3-launchpa9-9) wrote:

The difference can be seen in the dmesg output:

16.04:
Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-1049-aws root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0 nvme.io_timeout=4294967295

18.04:
Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-1001-aws root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0 nvme_core.io_timeout=255

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
status: Confirmed → Triaged
tags: added: kernel-da-key
Changed in linux (Ubuntu Bionic):
assignee: nobody → Kamal Mostafa (kamalmostafa)
Robert C Jennings (rcj) on 2018-03-27
no longer affects: linux (Ubuntu)
tags: added: id-5ababf4c8583f22fdc78566d
Phil Sweeney (3-launchpa9-9) wrote:

This appears to have been fixed in:

ubuntu/images-testing/hvm-ssd/ubuntu-bionic-daily-amd64-server-20180415

(confirmed the value was still wrong in the previous AMI before that, 20180411)
