virsh start of virtual guest domain fails with internal error due to low default aio-max-nr sysctl value
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu on IBM z Systems | Fix Released | Medium | Canonical Server |
kvm (Ubuntu) | Won't Fix | Undecided | Skipper Bug Screeners |
linux (Ubuntu) | Fix Released | Medium | Unassigned |
linux (Ubuntu Xenial) | Fix Released | Medium | Unassigned |
linux (Ubuntu Artful) | Fix Released | Medium | Unassigned |
procps (Ubuntu) | Won't Fix | Undecided | Unassigned |
procps (Ubuntu Xenial) | Won't Fix | Undecided | Unassigned |
procps (Ubuntu Artful) | Won't Fix | Undecided | Unassigned |
Bug Description
Starting virtual guests with virsh on Ubuntu 16.04.2 LTS, installed with its KVM hypervisor on an IBM z14 system LPAR, fails on the 18th guest with the following error:
root@zm93k8:
error: Failed to start domain zs93kag70038
error: internal error: process exited while connecting to monitor: 2017-07-
The previous 17 guests started fine:
root@zm93k8# virsh start zs93kag70020
Domain zs93kag70020 started
root@zm93k8# virsh start zs93kag70021
Domain zs93kag70021 started
.
.
root@zm93k8:
Domain zs93kag70036 started
We fixed the issue by adding the following line to /etc/sysctl.conf:
fs.aio-max-nr = 4194304
... and then reloading the sysctl config file:
root@zm93k8:/etc# sysctl -p /etc/sysctl.conf
fs.aio-max-nr = 4194304
Now, we're able to start more guests...
root@zm93k8:/etc# virsh start zs93kag70036
Domain zs93kag70036 started
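As an alternative to editing /etc/sysctl.conf directly, the same setting could live in a drop-in file under /etc/sysctl.d, which `sysctl --system` also loads. The following is only a sketch: the helper name `write_aio_dropin` and the file name `60-aio-max-nr.conf` are assumptions for illustration and do not come from this report.

```shell
#!/bin/sh
# Sketch: persist fs.aio-max-nr in a sysctl drop-in file.
# The helper name and the default file name are hypothetical.
write_aio_dropin() {
    value="$1"
    target="${2:-/etc/sysctl.d/60-aio-max-nr.conf}"
    # Accept only plain decimal digits.
    case "$value" in
        ''|*[!0-9]*) echo "invalid value: $value" >&2; return 1 ;;
    esac
    printf 'fs.aio-max-nr = %s\n' "$value" > "$target"
}

# Example (as root), followed by a reload of all sysctl configuration:
#   write_aio_dropin 4194304 && sysctl --system
```

Keeping the setting in its own file makes it easier to spot and to remove later than a line appended to the main configuration file.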
The default value was originally set to 65536:
root@zm93k8:
65536
Note: we chose 4194304 because that is the default value our KVM on System z hypervisor ships with, e.g. on our zKVM system:
[root@zs93ka ~]# cat /proc/sys/
4194304
ubuntu@zm93k8:/etc$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
ubuntu@zm93k8:/etc$
ubuntu@zm93k8:/etc$ dpkg -s qemu-kvm |grep Version
Version: 1:2.5+dfsg-
Is there existing documentation for Ubuntu KVM users that warns about the low default value and gives guidance on how to select an appropriate one? Also, would you consider increasing the default aio-max-nr value to something much higher, to accommodate significantly more virtual guests?
Thanks!
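One rough way to approach the sizing question is simple arithmetic, sketched below. It assumes every guest reserves about the same number of AIO events and that this per-guest figure has been measured on the host, for example from the change in `fs.aio-nr` after starting one guest. The function name `estimate_aio_max_nr`, the 25% default margin, and the sample numbers are all made up for illustration.

```shell
#!/bin/sh
# Sketch: estimate an fs.aio-max-nr value for a planned number of guests.
# per_guest is an assumed, measured number of AIO events reserved per guest.
estimate_aio_max_nr() {
    per_guest="$1"
    guests="$2"
    headroom_pct="${3:-25}"   # extra margin in percent; arbitrary default
    echo $(( per_guest * guests * (100 + headroom_pct) / 100 ))
}

# Example with made-up numbers: 4096 events per guest, 100 guests.
estimate_aio_max_nr 4096 100
```

This is only a lower-bound style estimate; other AIO users on the host draw from the same system-wide limit.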
---uname output---
ubuntu@zm93k8:/etc$ uname -a
Linux zm93k8 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:12:54 UTC 2017 s390x s390x s390x GNU/Linux
Machine Type = z14
---Debugger---
A debugger is not configured
---Steps to Reproduce---
See Problem Description.
The problem was happening a week ago, so this may not reflect that activity.
This file was collected on Aug 7, one week after we were hitting the problem. If I need to reproduce the problem and get fresh data, please let me know.
/var/log/messages doesn't exist on this system, so I provided syslog output instead.
All of this data was collected more than a week after the problem was observed. If you need me to reproduce the problem and collect new data, please let me know. That's not a problem.
Also, we would have to make special arrangements for login access to these systems. I'm happy to run traces and data collection for you as needed. If that's not sufficient, we'll explore login access for you.
Thanks... - Scott G.
I was able to successfully recreate the problem and captured / attached new debug docs.
Recreate procedure:
# Started out with no virtual guests running.
ubuntu@
Id Name State
-------
# Set fs.aio-max-nr back to original Ubuntu "out of the box" value in /etc/sysctl.conf
ubuntu@zm93k8:~$ tail -1 /etc/sysctl.conf
fs.aio-max-nr = 65536
## sysctl -a shows:
fs.aio-max-nr = 4194304
## Reload sysctl.
ubuntu@zm93k8:~$ sudo sysctl -p /etc/sysctl.conf
fs.aio-max-nr = 65536
ubuntu@zm93k8:~$
ubuntu@zm93k8:~$ sudo sysctl -a |grep fs.aio-max-nr
fs.aio-max-nr = 65536
ubuntu@zm93k8:~$ cat /proc/sys/
65536
# Attempt to start more than 17 qcow2 virtual guests on the Ubuntu host. The 18th guest fails to start.
Script used to start guests:
ubuntu@
Wed Aug 23 13:21:25 EDT 2017
virsh start zs93kag70015
Domain zs93kag70015 started
Started zs93kag70015 succesfully ...
virsh start zs93kag70020
Domain zs93kag70020 started
Started zs93kag70020 succesfully ...
virsh start zs93kag70021
Domain zs93kag70021 started
Started zs93kag70021 succesfully ...
virsh start zs93kag70022
Domain zs93kag70022 started
Started zs93kag70022 succesfully ...
virsh start zs93kag70023
Domain zs93kag70023 started
Started zs93kag70023 succesfully ...
virsh start zs93kag70024
Domain zs93kag70024 started
Started zs93kag70024 succesfully ...
virsh start zs93kag70025
Domain zs93kag70025 started
Started zs93kag70025 succesfully ...
virsh start zs93kag70026
Domain zs93kag70026 started
Started zs93kag70026 succesfully ...
virsh start zs93kag70027
Domain zs93kag70027 started
Started zs93kag70027 succesfully ...
virsh start zs93kag70028
Domain zs93kag70028 started
Started zs93kag70028 succesfully ...
virsh start zs93kag70029
Domain zs93kag70029 started
Started zs93kag70029 succesfully ...
virsh start zs93kag70030
Domain zs93kag70030 started
Started zs93kag70030 succesfully ...
virsh start zs93kag70031
Domain zs93kag70031 started
Started zs93kag70031 succesfully ...
virsh start zs93kag70032
Domain zs93kag70032 started
Started zs93kag70032 succesfully ...
virsh start zs93kag70033
Domain zs93kag70033 started
Started zs93kag70033 succesfully ...
virsh start zs93kag70034
Domain zs93kag70034 started
Started zs93kag70034 succesfully ...
virsh start zs93kag70035
Domain zs93kag70035 started
Started zs93kag70035 succesfully ...
virsh start zs93kag70036
error: Failed to start domain zs93kag70036
error: internal error: process exited while connecting to monitor: 2017-08-
Exiting script ... start zs93kag70036 failed
ubuntu@
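The start script itself is not included in this report. A minimal sketch of a loop that would produce output of the same shape (print the command, start the guest, stop at the first failure) might look like the following. It is not the reporter's actual script; the function name `start_guests` and the `VIRSH` override variable are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of a sequential start loop; not the reporter's actual script.
# VIRSH can be overridden (e.g. for a dry run); it defaults to virsh.
VIRSH="${VIRSH:-virsh}"

start_guests() {
    date
    for guest in "$@"; do
        echo "virsh start $guest"
        if $VIRSH start "$guest"; then
            echo "Started $guest successfully ..."
        else
            # Stop at the first guest that fails to start.
            echo "Exiting script ... start $guest failed"
            return 1
        fi
    done
}

# Example: start_guests zs93kag70015 zs93kag70020 zs93kag70021
```

Stopping at the first failure makes it easy to see how many guests fit under the current limit.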
# Show that there are only 17 running guests.
ubuntu@
17
ubuntu@
Id Name State
-------
25 zs93kag70015 running
26 zs93kag70020 running
27 zs93kag70021 running
28 zs93kag70022 running
29 zs93kag70023 running
30 zs93kag70024 running
31 zs93kag70025 running
32 zs93kag70026 running
33 zs93kag70027 running
34 zs93kag70028 running
35 zs93kag70029 running
36 zs93kag70030 running
37 zs93kag70031 running
38 zs93kag70032 running
39 zs93kag70033 running
40 zs93kag70034 running
41 zs93kag70035 running
# For fun, try starting zs93kag70036 again manually.
ubuntu@
Wed Aug 23 13:27:28 EDT 2017
error: Failed to start domain zs93kag70036
error: internal error: process exited while connecting to monitor: 2017-08-
# Show the XML (they're all basically the same)...
ubuntu@
<domain type='kvm'>
<name>
<memory unit='MiB'
<currentMemory unit='MiB'
<vcpu placement=
<os>
<type arch='s390x' machine=
</os>
<clock offset='utc'/>
<on_poweroff>
<on_reboot>
<on_crash>
<devices>
<emulator>
<disk type='file' device='disk'>
<driver name ='qemu' type='qcow2' cache='none' io='native'/>
<source file='/
<target dev='vda' bus='virtio'/>
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
<boot order='1'/>
</disk>
<interface type='network'>
<source network=
<model type='virtio'/>
<mac address=
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
</interface>
<!--
<disk type='block' device='disk'>
<driver name ='qemu' type='raw' cache='none'/>
<source dev='/dev/
<target dev='vde' bus='virtio'/>
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0005'/>
<readonly/>
</disk>
-->
<disk type='file' device='disk'>
<driver name ='qemu' type='raw' cache='none' io='native'/>
<source file='/
<target dev='vdf' bus='virtio'/>
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0006'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/
<target dev='sda' bus='scsi'/>
<readonly/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<controller type='usb' index='0' model='none'/>
<memballoon model='none'/>
<console type='pty'>
<target type='sclp' port='0'/>
</console>
</devices>
</domain>
This condition is very easy to replicate. However, we may be losing this system in the next day or two, so please let me know ASAP if you need any more data. Thank you...
- Scott G.
== Comment: #11 - Viktor Mihajlovski <email address hidden> - 2017-09-14
In order to support many KVM guests it is advisable to raise the aio-max-nr as suggested in the problem description, see also http://
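Before starting more guests, it can help to see how close the host already is to the limit. The sketch below computes the percentage of `fs.aio-max-nr` currently in use from the two counters. The function name `aio_usage_pct` and the sample numbers are assumptions for illustration; on a live host the counters can be read with `sysctl -n fs.aio-nr` and `sysctl -n fs.aio-max-nr`.

```shell
#!/bin/sh
# Sketch: report how much of fs.aio-max-nr is currently in use.
# Takes the two counters as arguments so it can be tried with sample numbers.
aio_usage_pct() {
    used="$1"
    limit="$2"
    [ "$limit" -gt 0 ] || { echo "limit must be > 0" >&2; return 1; }
    # Integer percentage, rounded down.
    echo $(( used * 100 / limit ))
}

# On a live host:
#   aio_usage_pct "$(sysctl -n fs.aio-nr)" "$(sysctl -n fs.aio-max-nr)"
# Example with made-up numbers:
aio_usage_pct 61440 65536
```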
affects: | linux (Ubuntu) → kvm (Ubuntu) |
Changed in ubuntu-z-systems: | |
assignee: | nobody → Canonical Server Team (canonical-server) |
Changed in ubuntu-z-systems: | |
status: | New → Confirmed |
Changed in ubuntu-z-systems: | |
status: | Confirmed → In Progress |
Changed in ubuntu-z-systems: | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu): | |
status: | In Progress → Fix Released |
no longer affects: | linux (Ubuntu Zesty) |
no longer affects: | procps (Ubuntu Zesty) |
Changed in ubuntu-z-systems: | |
status: | In Progress → Fix Released |