Broken ceph configuration => slow glance

Bug #1576669 reported by Sergey Arkhipov
This bug affects 1 person
Affects: Mirantis OpenStack (status tracked in 10.0.x)
Series 10.0.x: Status Invalid, Importance High, Assigned to Alexei Sheplyakov

Bug Description

Detailed bug description:
Compared to MOS 8.0, Glance shows a serious performance degradation: during scale test execution, the list_images test scenario fails every time with the following message:

2016-04-29 05:40:25.258 19900 ERROR rally.task.engine TimeoutException: Rally tired waiting for Image c_rally_91bca97c_gW8Sile4:0554d016-c2a9-4907-ba46-1e7dcf43f97b to become ('ACTIVE') current status SAVING

Steps to reproduce:
1. Deploy MOS 9.0
2. Run glance/list_images.yaml test case

Expected results:
1. Test passes 100% of the time

Actual result:
1. The exception above

Reproducibility:
100%

Workaround:
N/A

Description of the environment:
* 10 baremetal nodes:
   - CPU: 12 x 2.10 GHz
   - Disks: 2 drives (SSD - 80 GB, HDD - 931.5 GB), 1006.0 GB total
   - Memory: 2 x 16.0 GB, 32.0 GB total
   - NUMA topology: 1 NUMA node
* Node roles:
  - 1 ElasticSearch / Kibana node
  - 1 InfluxDB / Grafana node
  - 3 controllers (1 is offline because of disk problems)
  - 5 computes
* Details:
  - OS: Mitaka on Ubuntu 14.04
  - Compute: KVM
  - Neutron with VLAN segmentation
  - Ceph RBD for volumes (Cinder)
  - Ceph RadosGW for objects (Swift API)
  - Ceph RBD for ephemeral volumes (Nova)
  - Ceph RBD for images (Glance)
* MOS 8.0, build 227

Additional information:
Please find diagnostic snapshot here: http://mos-scale-share.mirantis.com/env14/fuel-snapshot-2016-04-29_06-01-06.tar.xz

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Looks like image copying (copy-from) from the following URL: http://172.16.44.5/cirros-0.3.1-x86_64-disk.img takes longer than the timeout for some reason. I'll get back with an analysis of the issue.

Changed in mos:
status: New → In Progress
assignee: MOS Glance (mos-glance) → Kairat Kushaev (kkushaev)
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Here are the results of "top" on both controllers where the issue was reproduced:
https://paste.mirantis.net/show/2229/
https://paste.mirantis.net/show/2230/
It looks very suspicious that some processes periodically take almost 100% CPU on both controllers. This seems to be the reason why glance-api is so slow.

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Waiting for confirmation from the oslo.messaging and LMA folks that it is not normal to have the CPU 100% busy.

Revision history for this message
Sergey Arkhipov (sarkhipov) wrote :

Waiting for confirmation from the oslo.messaging and LMA folks that it is not normal to have the CPU 100% busy.

Revision history for this message
Sergey Arkhipov (sarkhipov) wrote :

Sorry, I pasted the wrong comment (see https://bugs.launchpad.net/mos/+bug/1578172).

It is OK: some processes may use 100% of a CPU, but each controller has 12 CPUs, so this is definitely not the cause.

Changed in mos:
importance: Undecided → High
tags: added: area-keystone
tags: added: area-glance
removed: area-keystone
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

According to the logs,
librbd.create(ioctx, image_name, size, order, old_format=False, features=int(features)) took almost 30 seconds, and
image.resize(length) + image.write(chunk, offset) took almost a minute for some reason.
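
For reference, a minimal Python sketch (not part of the original report) that times the same librbd calls Glance's RBD store makes, so the slowness can be measured outside of glance-api. The pool name 'images', the conffile path, and the sizes are assumptions:

import time
import rados
import rbd

# Hedged sketch: time rbd create / resize / write outside of Glance.
# Pool name, conffile path and sizes below are assumptions.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('images')
try:
    name = 'rbd-timing-test'
    size = 1024 ** 3  # 1 GiB
    start = time.time()
    rbd.RBD().create(ioctx, name, size, old_format=False)
    print('create: %.2f s' % (time.time() - start))
    image = rbd.Image(ioctx, name)
    try:
        start = time.time()
        image.resize(2 * size)
        image.write(b'\0' * (8 * 1024 * 1024), 0)  # one 8 MiB chunk
        print('resize + write: %.2f s' % (time.time() - start))
    finally:
        image.close()
    rbd.RBD().remove(ioctx, name)
finally:
    ioctx.close()
    cluster.shutdown()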

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Ceph folks,
after executing the Rally tests on the node, requests to RBD through the Python lib (create, write, resize) are quite slow.
Could you please help with the problem?

Changed in mos:
assignee: Kairat Kushaev (kkushaev) → MOS Ceph (mos-ceph)
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> Please find diagnostic snapshot here: http://mos-scale-share.mirantis.com/env14/fuel-snapshot-2016-04-29_06-01-06.tar.xz

(Semi-automatic reply)

Please post the relevant data instead of a 5 GB tarball:

* The output of ceph -s
* Configuration of each OSD:
  - main storage parameters: drive type (ssd/hdd), filesystem type and size, mount options
  - journal parameters: drive type, partition size
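
If it helps, here is a minimal Python sketch (not part of the original report) that pulls the cluster status programmatically via the rados mon_command API, so only the status needs to be posted. The conffile path and the default admin keyring are assumptions:

import json
import rados

# Hedged sketch: fetch the equivalent of "ceph -s" from a controller node.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ret, out, err = cluster.mon_command(
        json.dumps({'prefix': 'status', 'format': 'json-pretty'}), b'')
    print(out.decode('utf-8') if ret == 0 else err)
finally:
    cluster.shutdown()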

> - 5 computes
> - 3 controllers (1 is offline because of disk problems)

So the OSDs are co-hosted with the computes and/or controllers. This is a bad idea; please allocate dedicated hardware for the OSDs.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The ceph cluster is badly misconfigured:

- OSD journals reside on rotating hard drives, inside the same filesystem which stores the actual data [1],
  which makes bandwidth at least 2x smaller (due to lots of fsyncs) and latency 4x to 10x bigger
  due to numerous seeks,
- OSDs are co-hosted with hypervisors, which is a bad idea (both need quite a lot of RAM and CPU)

In order to avoid performance problems, please

- Put OSD journals on SSDs, using a dedicated partition/logical volume instead of a file (a quick check for a file-backed journal is sketched after the log excerpt below)
- Allocate nodes specifically for OSDs, that is, don't co-host hypervisors/databases/etc. there

[1] From node-51/var/log/ceph/ceph-osd.0.log

2016-04-25 06:49:10.141975 7f0e6532a800 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 20: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
2016-04-25 06:49:10.142203 7f0e6532a800 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 20: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
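
A minimal Python sketch (not part of the original report) of the journal check mentioned above: it tells whether an OSD journal is a plain file inside the data filesystem or points at a raw block device. The OSD path (ceph-0) is an assumption; loop over /var/lib/ceph/osd/ to cover all OSDs:

import os
import stat

# Hedged sketch: detect a file-backed OSD journal.
journal = '/var/lib/ceph/osd/ceph-0/journal'
mode = os.stat(journal).st_mode  # os.stat follows the symlink, if any
if stat.S_ISBLK(mode):
    print('journal is a raw partition/LV: OK')
else:
    print('journal is a file on the data filesystem: move it to a dedicated SSD partition')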

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The problem is a consequence of a badly broken configuration, therefore I'm closing the bug as invalid.
Feel free to reopen the bug if the issue can be reproduced with a sane ceph configuration, i.e.

- ceph-osd nodes should not run hypervisors/databases/etc.
- OSD journals should reside on SSDs, and raw partitions (or logical volumes) should be used as journals

Changed in mos:
status: In Progress → Invalid
summary: - Glance performance degradation in 'list_images' Rally test scenario
+ Broken ceph configuration => slow glance