Ubuntu Openstack Mitaka can only deploy up to 52 instances

Bug #1644785 reported by bugproxy
Affects                    Status    Importance  Assigned to             Milestone
Ubuntu on IBM z Systems    Invalid   Critical    Billy Olsen
openstack (Ubuntu)         Invalid   Undecided   Skipper Bug Screeners

Bug Description

Ubuntu OpenStack Mitaka can only deploy 52 instances on Cinder LVM volumes.

---uname output---
Linux ub01 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:14:35 UTC 2016 s390x s390x s390x GNU/Linux

Machine Type = LinuxONE

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 Deploy instances via the Horizon portal (an approximate CLI equivalent is sketched below):
1. Select deploy to volume.
2. Specify the volume size and select "yes" for deleting the volume when the instance is deleted.
3. After deploying 52 instances, we get error messages for the 53rd deployment, and that deployment fails.
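
For reference, an approximately equivalent boot-from-volume request from the CLI (a sketch only; the instance name, flavor, network ID and image ID are placeholders, not values from this environment):

$ nova boot test-instance \
    --flavor m1.small \
    --nic net-id=<NET_ID> \
    --block-device source=image,id=<IMAGE_ID>,dest=volume,size=20,shutdown=remove,bootindex=0

Here shutdown=remove corresponds to the "delete volume when deleting the instance" option in Horizon.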

Revision history for this message
bugproxy (bugproxy) wrote : Detailed Problem Description

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-149089 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → openstack (Ubuntu)
bugproxy (bugproxy)
tags: added: targetmilestone-inin16041
removed: targetmilestone-inin---
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-28 09:37 EDT-------
I found something new today. Just for your reference.

When the deployment fails, I can see the info below in the syslog. It shows io_setup failing from multipathd.
Nov 27 22:24:31 ub01 kernel: [561221.664513] scsi host153: iSCSI Initiator over TCP/IP
Nov 27 22:24:32 ub01 kernel: [561222.677294] scsi 153:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Nov 27 22:24:32 ub01 kernel: [561222.677684] scsi 153:0:0:0: Attached scsi generic sg142 type 12
Nov 27 22:24:32 ub01 kernel: [561222.678596] scsi 153:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Nov 27 22:24:32 ub01 kernel: [561222.679067] sd 153:0:0:1: Attached scsi generic sg143 type 0
Nov 27 22:24:32 ub01 kernel: [561222.679105] sd 153:0:0:1: [sdcm] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
Nov 27 22:24:32 ub01 kernel: [561222.679305] sd 153:0:0:1: [sdcm] Write Protect is off
Nov 27 22:24:32 ub01 kernel: [561222.679308] sd 153:0:0:1: [sdcm] Mode Sense: 69 00 10 08
Nov 27 22:24:32 ub01 kernel: [561222.679393] sd 153:0:0:1: [sdcm] Write cache: enabled, read cache: enabled, supports DPO and FUA
Nov 27 22:24:32 ub01 kernel: [561222.680868] sdcm: sdcm1
Nov 27 22:24:32 ub01 kernel: [561222.681803] sd 153:0:0:1: [sdcm] Attached SCSI disk
Nov 27 22:24:32 ub01 multipathd[179286]: sdcm: add path (uevent)
Nov 27 22:24:32 ub01 multipathd[179286]: io_setup failed
Nov 27 22:24:32 ub01 multipathd[179286]: uevent trigger error
Nov 27 22:24:32 ub01 systemd-udevd[251528]: Could not generate persistent MAC address for qbrb21abf90-f6: No such file or directory
Nov 27 22:24:33 ub01 systemd-udevd[251596]: Could not generate persistent MAC address for qvob21abf90-f6: No such file or directory
Nov 27 22:24:33 ub01 systemd-udevd[251597]: Could not generate persistent MAC address for qvbb21abf90-f6: No such file or directory
Nov 27 22:24:33 ub01 iscsid: Connection150:0 to [target: iqn.2010-10.org.openstack:volume-54921f21-d2e1-49d6-a39c-4c21c458750b, portal: 148.100.42.50,3260] through [iface: default] is operational now
Nov 27 22:24:33 ub01 kernel: [561223.395321] IPv6: ADDRCONF(NETDEV_UP): qvbb21abf90-f6: link is not ready
Nov 27 22:24:33 ub01 kernel: [561223.465740] device qvbb21abf90-f6 entered promiscuous mode
Nov 27 22:24:33 ub01 kernel: [561223.613720] IPv6: ADDRCONF(NETDEV_CHANGE): qvbb21abf90-f6: link becomes ready
Nov 27 22:24:33 ub01 kernel: [561223.687587] device qvob21abf90-f6 entered promiscuous mode
Nov 27 22:24:33 ub01 kernel: [561223.919008] qbrb21abf90-f6: port 1(qvbb21abf90-f6) entered forwarding state
Nov 27 22:24:33 ub01 kernel: [561223.919014] qbrb21abf90-f6: port 1(qvbb21abf90-f6) entered forwarding state
Nov 27 22:24:33 ub01 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=120 -- --if-exists del-port qvob21abf90-f6 -- add-port br-int qvob21abf90-f6 -- set Interface qvob21abf90-f6 external-ids:iface-id=b21abf90-f649-4501-ae8f-f0fffa44090b external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:5d:f3:1d external-ids:vm-uuid=fc797a01-9f8d-4bbd-bd93-c14eb14b880b

And from the multipath -ll output, I see the messages below.

Nov 28 09:20:4...


Frank Heimes (fheimes)
tags: added: openstack-ibm
Ryan Beisner (1chb1n)
tags: added: s390x uosci
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Can we please get juju status output from this deployment? That, plus the bundle used to deploy, will be helpful in understanding which applications are where, and how they are configured. Thank you.

Revision history for this message
Frank Heimes (fheimes) wrote :

... and could you please also provide the disk utilization before and after you hit that situation?
Thx

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-29 20:16 EDT-------
(In reply to comment #10)
> Can we please get juju status output from this deployment? That, plus the
> bundle used to deploy, will be helpful in understanding which services are
> where, and how they are configured. Thank you.
>
OpenStack was not installed via Juju; we installed the OpenStack components manually.

> Can we please get juju status output from this deployment? That, plus the
> bundle used to deploy, will be helpful in understanding which applications
> are where, and how they are configured. Thank you.
>
> ... and please also the status about the disk utilization before and after
> you hit that situation?
> Thx

Before I hit the situation, the disk utilization was as below. cloudvg is the LVM volume group used by Cinder.
VG #PV #LV #SN Attr VSize VFree
cloudvg 10 48 0 wz--n- 10.00t 8.83t
ub01-vg 2 2 0 wz--n- 199.52g 1020.00m

When I hit the problem, the disk utilization was:
VG #PV #LV #SN Attr VSize VFree
cloudvg 10 57 0 wz--n- 10.00t 8.65t
ub01-vg 2 2 0 wz--n- 199.52g 1020.00m

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-11-30 04:17 EDT-------
Raised Priority to "Block" to highlight the urgency!

tags: added: severity-critical
removed: severity-high
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Ryan Beisner (1chb1n)
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Before I can do much with this, I need more information.

How were the applications/services deployed? Please provide a Juju bundle and Juju status output, sanitized.

What are the specs of the machine(s)?

Logs from the API and compute services should be indicative. Are those available, or can they be analyzed so that the relevant errors and tracebacks can be posted?

Thank you.

Changed in ubuntu-z-systems:
status: Triaged → Incomplete
Changed in openstack (Ubuntu):
status: New → Incomplete
Revision history for this message
Ryan Beisner (1chb1n) wrote :

Apologies, I missed the comment which stated that the deployment was done manually without Juju. Unfortunately, that will make it very difficult to try to reproduce, and reinforces the need for logs to triage.

Revision history for this message
bugproxy (bugproxy) wrote : cinder log

------- Comment (attachment only) From <email address hidden> 2016-11-30 08:38 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : nova log

------- Comment (attachment only) From <email address hidden> 2016-11-30 08:39 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-30 08:49 EDT-------
(In reply to comment #14)
> Before I can do much with this, I need more information.
>
> How were the applications/services deployed? Please provide a Juju bundle
> and Juju status output, sanitized.
>
> What are the specs of the machine(s)?
>
> Logs from api and compute services should be indicative. Are those
> available or can those be analyzed, then post relevant errors and tracebacks?
>
> Thank you.

I have uploaded the log files. Please check volume id b794c66f-6ed1-44ef-9223-5632969a1d5d. This is the 53rd instance, and its deployment failed.

Revision history for this message
bugproxy (bugproxy) wrote : syslog

------- Comment (attachment only) From <email address hidden> 2016-11-30 08:40 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-30 08:58 EDT-------
Some info for your reference.

When the deployment failed, I noticed that the volume had been created successfully and attached to the instance, but somehow the VM could not be booted due to the disk issue, and the volume was then detached from the instance. On that 20 GB volume I saw that a 2.2 GB partition had been created, which is not what I expected; it should be a 20 GB partition. After the failure, I manually created the iSCSI target and connected it to an existing Ubuntu VM, and I can see the file contents on this 2.2 GB partition.
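
For anyone repeating that manual check, logging in to a Cinder/LVM iSCSI target from another host typically looks like this (a sketch; the portal address matches the logs above, the volume UUID is a placeholder):

$ iscsiadm -m discovery -t sendtargets -p 148.100.42.50:3260
$ iscsiadm -m node -T iqn.2010-10.org.openstack:volume-<UUID> -p 148.100.42.50:3260 --login
(the volume then appears as a new /dev/sdX device that can be inspected with parted or mounted)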

Revision history for this message
Ryan Beisner (1chb1n) wrote :

Thanks for the logs. Nova conductor is reporting "NoValidHost: No valid host was found. There are not enough hosts available." when attempting to schedule and spawn an instance. That is a somewhat generic failure message. But the specific failure seems to be a backing storage issue:

2016-11-23T08:17:48.103748Z qemu-system-s390x: -drive file=/dev/disk/by-path/ip-148.100.42.50:3260-iscsi-iqn.2010-10.org.openstack:volume-cfbc521f-4d9f-4ecc-8feb-777d1e5446e1-lun-1,format=raw,if=none,id=drive-virtio-disk0,serial=cfbc521f-4d9f-4ecc-8feb-777d1e5446e1,cache=none,aio=native: The device is not writable: Bad file descriptor

...

2016-11-23 03:17:54.654 6858 WARNING nova.scheduler.utils [req-ed481e89-154a-4002-b402-30002ed6a80b 7cce79da15be4daf9189541d1d5650be 63958815625d4108970e78bacf578e32 - - -] [instance: 89240ffa-7dcf-444d-8a8f-b751cd8b5e19] Setting instance to ERROR state.
2016-11-23 03:17:58.965 6833 ERROR nova.scheduler.utils [req-ed481e89-154a-4002-b402-30002ed6a80b 7cce79da15be4daf9189541d1d5650be 63958815625d4108970e78bacf578e32 - - -] [instance: 0cb365ea-550f-4720-a630-93821d50d43b] Error from last host: ub01 (node ub01.marist.edu): [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in _do_build_and_run_instance\n filter_properties)\n', u' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 0cb365ea-550f-4720-a630-93821d50d43b was re-scheduled: internal error: process exited while connecting to monitor: 2016-11-23T08:17:48.103748Z qemu-system-s390x: -drive file=/dev/disk/by-path/ip-148.100.42.50:3260-iscsi-iqn.2010-10.org.openstack:volume-cfbc521f-4d9f-4ecc-8feb-777d1e5446e1-lun-1,format=raw,if=none,id=drive-virtio-disk0,serial=cfbc521f-4d9f-4ecc-8feb-777d1e5446e1,cache=none,aio=native: The device is not writable: Bad file descriptor\n']
2016-11-23 03:17:59.011 6833 WARNING nova.scheduler.utils [req-ed481e89-154a-4002-b402-30002ed6a80b 7cce79da15be4daf9189541d1d5650be 63958815625d4108970e78bacf578e32 - - -] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 150, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py", line 104, in select_destinations
    dests = self.driver.select_destinations(ctxt, spec_obj)

  File "/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py", line 74, in select_destinations
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.

Revision history for this message
Ryan Beisner (1chb1n) wrote :

I wonder if that is caused by one of the quotas/limits in place. One thing to check would be Cinder. It has a default of 1000 GB for maxTotalVolumeGigabytes.

$ cinder absolute-limits
+--------------------------+-------+
| Name | Value |
+--------------------------+-------+
| maxTotalBackupGigabytes | 1000 |
| maxTotalBackups | 10 |
| maxTotalSnapshots | 10 |
| maxTotalVolumeGigabytes | 1000 |
| maxTotalVolumes | 10 |
| totalBackupGigabytesUsed | 0 |
| totalBackupsUsed | 0 |
| totalGigabytesUsed | 0 |
| totalSnapshotsUsed | 0 |
| totalVolumesUsed | 0 |
+--------------------------+-------+
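
If the quotas were the limiting factor, they could be inspected and raised per tenant roughly as follows (a sketch with hypothetical numbers; later comments in this bug show the quota was not the cause here):

$ cinder quota-show <TENANT_ID>
$ cinder quota-update --volumes 100 --gigabytes 2000 <TENANT_ID>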

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-11-30 18:34 EDT-------
(In reply to comment #21)

Revision history for this message
Frank Heimes (fheimes) wrote :

Just as a side note: the background for asking about the quota was that this issue may have something to do with crossing some kind of 1 TB boundary.
Given that storage sizes are partly calculated using different factors (e.g. 1000 vs. 1024), and 51 * 20480 MB is surprisingly close to 1 TB ...
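
For illustration, the arithmetic behind that observation: 51 volumes * 20480 MB = 1,044,480 MB ≈ 1.04 TB in decimal units, and 52 volumes * 20 GiB = 1040 GiB ≈ 1.02 TiB in binary units, so the 52/53-instance mark sits near a 1 TB boundary under either convention.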

Revision history for this message
Billy Olsen (billy-olsen) wrote :

As Ryan indicated in his previous comment, the reason that the instances are unable to launch is due to the libvirt error:

libvirtError: internal error: process exited while connecting to monitor: 2016-11-23T08:18:06.762943Z qemu-system-s390x: -drive file=/dev/disk/by-path/ip-148.100.42.50:3260-iscsi-iqn.2010-10.org.openstack:volume-f36c890c-1313-41de-b56d-991a2ece094c-lun-1,format=raw,if=none,id=drive-virtio-disk0,serial=f36c890c-1313-41de-b56d-991a2ece094c,cache=none,aio=native: The device is not writable: Bad file descriptor

This indicates that the block device mapped by the file path /dev/disk/by-path ... has a bad file descriptor. "Bad file descriptor" suggests to me either that the device came and went away, that the device is (for some reason) visible as read-only, or that the device hasn't yet been fully attached. This error message is produced by the QEMU code and is only logged when checking that the block device's file descriptor is writable.

I think it'd be useful to get a bit more data from the compute and cinder nodes. Can you collect data about the system using a tool called sosreport (in the xenial archives)? It will collect various logs, metrics, and system configuration which are useful for diagnostics. Just make sure to run the sosreport tool with the -a option to ensure that all of the information is captured (more efficient to get it the first go 'round).

Before collecting the sosreport, it probably makes sense to increase the debug logging for the nova and qemu services prior to collecting the data. Setting logging levels for both to debug would provide lots of useful information.
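
A minimal sketch of that collection on a stock Xenial/Mitaka compute node (file paths and service names are the usual defaults, not confirmed against this system):

# in /etc/nova/nova.conf, under [DEFAULT]:
#   debug = True
$ sudo systemctl restart nova-compute
# reproduce the failure, then collect everything in one pass
$ sudo sosreport -a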

Also note: the 2.2 GB partition isn't an issue AIUI. The volume is created by:
 1. downloading the image from Glance;
 2. expanding and writing the block-level content to the Cinder volume (the content is expanded from 320 MB to 2.2 GB; the image's virtual size can be checked as sketched below);
 3. when cloud-init runs at VM startup, it detects that the underlying disk is bigger than the current partition and attempts to expand the partition and filesystem to consume the full disk.
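
If you want to confirm where the 2.2 GB figure comes from, the virtual size of the Glance image can be checked with qemu-img (a sketch; the image path is hypothetical):

$ qemu-img info /var/lib/glance/images/<IMAGE_ID>
(the reported "virtual size" should correspond to the ~2.2 GB partition seen on the Cinder volume before cloud-init grows it)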

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-12-01 21:35 EDT-------
(In reply to comment #24)

Thank you for looking into this. Please see my comments below.

> As Ryan indicated in his previous comment, the reason that the instances are
> unable to launch is due to the libvirt error:
>
> libvirtError: internal error: process exited while connecting to monitor:
> 2016-11-23T08:18:06.762943Z qemu-system-s390x: -drive
> file=/dev/disk/by-path/ip-148.100.42.50:3260-iscsi-iqn.2010-10.org.openstack:
> volume-f36c890c-1313-41de-b56d-991a2ece094c-lun-1,format=raw,if=none,
> id=drive-virtio-disk0,serial=f36c890c-1313-41de-b56d-991a2ece094c,cache=none,
> aio=native: The device is not writable: Bad file descriptor
>
> This indicates that the block device mapped by the file path
> /dev/disk/by-path ... has a bad file descriptor. Bad file descriptor
> suggests to me either that the device came and went away, the device is (for
> some reason) visible as read-only, or that the device hasn't yet been fully
> attached. This error messages is provided by the qemu code and only logged
> when checking to ensure the block-device's file descriptor is writable.
>

Right. As I mentioned in the previous reply, the volume was created and attached to the instance, and then detached when it hit this error. It happened during instance boot-up, after the volume had been attached.

> I think it'd be useful to get a bit more data from the compute and cinder
> nodes. Can you collect a bunch of data regarding the system using a tool
> called sosreport (in the xenial archives)? It will collect various logs,
> metrics, and system configuration which is useful to perform diagnostics.
> Just make sure to run the sosreport tool with the -a command to ensure that
> all of the information is captured (more efficient to get it the first go
> 'round).
>
> Before collecting the sosreport, it probably makes sense to increase the
> debug logging for the nova and qemu services prior to collecting the data.
> Setting logging levels for both to debug would provide lots of useful
> information.
>
> Also note: the 2.2 GB partition isn't an issue AIUI. The volume is created
> by:
> 1. downloading the image from glance
> 2. expanding and writing the block-level content to the cinder volume (where
> content is expanded from 320 MB to 2.2 GB).
> 3. When cloud-init runs on startup of the VM, the code detects the
> underlying disk is bigger than what is currently seen and will attempt to
> expand the partition and filesystem to consume the full content of the disk.

Understood. I manually checked the volume after the deployment failed. The 2.2 GB partition contains a full Ubuntu Linux file structure, so I just don't know why it could not be booted at that time.

I have attached the sos file.

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-aa

------- Comment on attachment From <email address hidden> 2016-12-01 21:56 EDT-------

As the file is too big, I used the split command to divide it into parts.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-12-01 22:02 EDT-------
Comment on attachment 114452
sosreport-part-aa

Please ignore this, as it is still too big.

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-01

------- Comment on attachment From <email address hidden> 2016-12-01 22:06 EDT-------

Using the split command to split the sosreport file.

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-02

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:07 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-03

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:08 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-04

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:08 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-05

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:09 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-06

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:10 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-07

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:10 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-08

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:11 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-09

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:12 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-10

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:13 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-11

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:14 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-12

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:14 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-13

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:15 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-14

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:16 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : sosreport-part-15

------- Comment (attachment only) From <email address hidden> 2016-12-01 22:16 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-12-15 12:23 EDT-------
An observation:

Having Cinder and Nova-compute on the same host, and using LVM and iSCSI backends might be causing some race conditions in multipath.

The syslog is littered with messages like this:
Nov 23 03:17:47 ub01 multipathd[179286]: io_setup failed
Nov 23 03:17:47 ub01 multipathd[179286]: uevent trigger error
Nov 23 03:17:47 ub01 systemd[188085]: dev-disk-by\x2dlabel-cloudimg\x2drootfs.device: Dev dev-disk-by\x2dlabel-cloudimg\x2drootfs.device appeared twice with different sysfs paths /sys/devices/platform/host16/session13/target16:0:0/16:0:0:1/block/sdav/sdav1 and /sys/devices/platform/host64/session61/target64:0:0/64:0:0:1/block/sdco/sdco1
Nov 23 03:17:47 ub01 systemd[1]: dev-disk-by\x2dlabel-cloudimg\x2drootfs.device: Dev dev-disk-by\x2dlabel-cloudimg\x2drootfs.device appeared twice with different sysfs paths /sys/devices/platform/host8/session5/target8:0:0/8:0:0:1/block/sdam/sdam1 and /sys/devices/platform/host64/session61/target64:0:0/64:0:0:1/block/sdco/sdco1
Nov 23 03:17:47 ub01 systemd[188085]: dev-disk-by\x2duuid-74722b08\x2d9998\x2d44e7\x2d8898\x2d5d5e96f303af.device: Dev dev-disk-by\x2duuid-74722b08\x2d9998\x2d44e7\x2d8898\x2d5d5e96f303af.device appeared twice with different sysfs paths /sys/devices/platform/host16/session13/target16:0:0/16:0:0:1/block/sdav/sdav1 and /sys/devices/platform/host64/session61/target64:0:0/64:0:0:1/block/sdco/sdco1
Nov 23 03:17:47 ub01 systemd[1]: dev-disk-by\x2duuid-74722b08\x2d9998\x2d44e7\x2d8898\x2d5d5e96f303af.device: Dev dev-disk-by\x2duuid-74722b08\x2d9998\x2d44e7\x2d8898\x2d5d5e96f303af.device appeared twice with different sysfs paths /sys/devices/platform/host8/session5/target8:0:0/8:0:0:1/block/sdam/sdam1 and /sys/devices/platform/host64/session61/target64:0:0/64:0:0:1/block/sdco/sdco1

No idea if this is the reason you're seeing the problem or not..

Ryan Beisner (1chb1n)
Changed in ubuntu-z-systems:
assignee: Ryan Beisner (1chb1n) → nobody
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Billy Olsen (billy-olsen)
Revision history for this message
Billy Olsen (billy-olsen) wrote :

I too noticed the:

Nov 23 03:17:47 ub01 multipathd[179286]: io_setup failed
Nov 23 03:17:47 ub01 multipathd[179286]: uevent trigger error

Errors, and they continue to persist in recent recreate attempts. I think it would be good to configure multipath so that the devices provided by Cinder are excluded. Likely the best approach would be a general blacklist of all devices, with exceptions for the known backends you have for other storage devices.
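
A minimal sketch of that approach in /etc/multipath.conf (the exception WWID is a placeholder for any real multipath-backed device that should stay managed):

blacklist {
    wwid ".*"
}
blacklist_exceptions {
    wwid "<WWID_OF_REAL_MULTIPATH_DEVICE>"
}

$ sudo systemctl restart multipath-tools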

Revision history for this message
Billy Olsen (billy-olsen) wrote :

As I'm looking at the data some more, I see in the sosreports provided that there may be heavy pressure on async I/O, and that may actually be what's causing the problems here.

In the syslog, I see the following events getting reported from multipathd soon after the device is discovered (it's part of the udev event hooks, I believe):

Jan 12 22:02:49 ub01 multipathd[179286]: sdch: add path (uevent)
Jan 12 22:02:49 ub01 multipathd[179286]: io_setup failed
Jan 12 22:02:49 ub01 multipathd[179286]: uevent trigger error

io_setup is a system call into the kernel which can fail for a few different reasons, but one of them is when the number of requested async I/O events would exceed the current limit. In that case it errors out with EAGAIN, and the multipath code explicitly documents that io_setup may fail here due to the setting of fs.aio-max-nr, which is essentially the maximum number of events allowed across all currently active aio contexts.

Checking the value collected I see these values:

# Quick check on the box today
fs.aio-max-nr = 65536
fs.aio-nr = 124928

# From the sosreport provided on 12/01
fs.aio-max-nr = 65536
fs.aio-nr = 131072

As you can see the aio-nr is well above the aio-max-nr. I'm not entirely sure what is causing this, but there are a large number of async io events queued up.

I think the plan of action should be to:

a) Configure multipath to ignore the iSCSI devices presented between Cinder and Nova.
b) Clean up any errors in the current paths used by multipath, so that we know disk I/O to the actual backend is not an issue.
c) Possibly consider increasing fs.aio-max-nr (see the sketch below).
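
A sketch of how those limits can be inspected and raised at runtime (the value shown is only an example; it matches the one tried later in this bug):

$ sysctl fs.aio-nr fs.aio-max-nr
$ sudo sysctl -w fs.aio-max-nr=1048576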

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-01-16 14:38 EDT-------
Thanks for looking into this!

Regarding the multipath error, I changed the multipath configuration to put the Linux instance volumes in the multipath blacklist. Then I no longer see the multipath error, and I was able to deploy a couple more instances successfully in this environment. However, unfortunately, when I tried to deploy more, I got the same problem: the message shows the disk is not writable during the deployment. And I noticed that the async I/O count is still very high (see the numbers below).
fs.aio-max-nr = 65536
fs.aio-nr = 131072

Then I also increased aio-max-nr to a bigger number, and I can deploy more instances successfully. But the deployment time is also getting much longer! I think it's related to the I/O waiting/notification time. As you can see from the numbers below, aio-nr increases a lot with just one instance deployed (from 131072 to 133120).
fs.aio-max-nr = 1048576
fs.aio-nr = 133120

So I'm thinking the problem may be tied to KVM/QEMU. Maybe QEMU can't handle the async I/O properly on s390x? Or are some changes needed in the qemu/libvirt configuration?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-01-17 06:46 EDT-------
KVM for IBM z uses the following sysctl defaults:

fs.inotify.max_user_watches = 32768
fs.aio-max-nr = 4194304

Can you try with these values?
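
To make those values persist across reboots, they can be dropped into a sysctl.d snippet (a sketch; the file name is arbitrary):

$ cat /etc/sysctl.d/90-kvm-s390x.conf
fs.inotify.max_user_watches = 32768
fs.aio-max-nr = 4194304
$ sudo sysctl --system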

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-01-25 20:08 EDT-------
(In reply to comment #49)
> KVM for IBM z uses the following sysctl defaults:
>
> fs.inotify.max_user_watches = 32768
> fs.aio-max-nr = 4194304
>
> Can you try with these values?

Thank you for the suggestion!

Having set the system with these values, I can deploy way more than 52 instances; I tested up to 200+ instances. As expected, the deployment time increases linearly; I think it's related to the asynchronous I/O increase in the KVM hypervisor.

Revision history for this message
Dimitri John Ledkov (xnox) wrote : Re: [Bug 1644785] Comment bridged from LTC Bugzilla

On 26 January 2017 at 01:17, bugproxy <email address hidden> wrote:
> ------- Comment From <email address hidden> 2017-01-25 20:08 EDT-------
> (In reply to comment #49)
>> KVM for IBM z uses the following sysctl defaults:
>>
>> fs.inotify.max_user_watches = 32768
>> fs.aio-max-nr = 4194304
>>
>> Can you try with these values?
>
> Thank you for the suggestion!
>
> Having set the system with these values, I can deploy way more 52
> instances.. I tested up to 200+ instances.. As expected, the deployment
> time is increasing linearly. I think it's related to the asynchronized
> io increase in KVM hypervisor.
>

Should these sysctl defaults be shipped in Ubuntu by default for s390x?

On my amd64 xenial desktop my max_user_watches is a lot higher (~524k vs ~8k) than what I see on s390x zesty; fs.aio-max-nr is set to ~65k on both. Why is the default so low, or (vice versa) why is such a high number required?

--
Regards,

Dimitri.

Revision history for this message
Frank Heimes (fheimes) wrote :

... after discussing with IBM

Changed in openstack (Ubuntu):
status: Incomplete → Invalid
Changed in ubuntu-z-systems:
status: Incomplete → Invalid