Broken ceph configuration => slow glance

Bug #1576669 reported by Sergey Arkhipov
This bug affects 1 person
Affects: Mirantis OpenStack (status tracked in 10.0.x)
Series 10.0.x: Status Invalid, Importance High, Assigned to Alexei Sheplyakov

Bug Description

Detailed bug description:
Compared to MOS 8.0, Glance shows a serious performance degradation: during scale test execution, the list_images test scenario fails every time with the following message:

2016-04-29 05:40:25.258 19900 ERROR rally.task.engine TimeoutException: Rally tired waiting for Image c_rally_91bca97c_gW8Sile4:0554d016-c2a9-4907-ba46-1e7dcf43f97b to become ('ACTIVE') current status SAVING

Steps to reproduce:
1. Deploy MOS 9.0
2. Run glance/list_images.yaml test case

Expected results:
1. Test passes 100% of the time

Actual result:
1. The exception above

Reproducibility:
100%

Workaround:
N/A

Description of the environment:
* 10 baremetal nodes:
   - CPU: 12 x 2.10 GHz
   - Disks: 2 drives (SSD - 80 GB, HDD - 931.5 GB), 1006.0 GB total
   - Memory: 2 x 16.0 GB, 32.0 GB total
   - NUMA topology: 1 NUMA node
* Node roles:
  - 1 ElasticSearch / Kibana node
  - 1 InfluxDB / Grafana node
  - 3 controllers (1 is offline because of disk problems)
  - 5 computes
* Details:
  - OS: Mitaka on Ubuntu 14.04
  - Compute: KVM
  - Neutron with VLAN segmentation
  - Ceph RBD for volumes (Cinder)
  - Ceph RadosGW for objects (Swift API)
  - Ceph RBD for ephemeral volumes (Nova)
  - Ceph RBD for images (Glance)
* MOS 8.0, build 227

Additional information:
Please find diagnostic snapshot here: http://mos-scale-share.mirantis.com/env14/fuel-snapshot-2016-04-29_06-01-06.tar.xz

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Looks like image copying (copy-from) from the following URL: http://172.16.44.5/cirros-0.3.1-x86_64-disk.img takes longer than the timeout for some reason. I'll get back with an analysis of the issue.

Changed in mos:
status: New → In Progress
assignee: MOS Glance (mos-glance) → Kairat Kushaev (kkushaev)
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Here are the results of "top" on both controllers where the issue was reproduced:
https://paste.mirantis.net/show/2229/
https://paste.mirantis.net/show/2230/
It looks very suspicious that some processes periodically take almost 100% CPU on both controllers. This seems to be the reason why glance-api is so slow.

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Waiting for confirmation from the oslo.messaging and LMA folks that it is not normal to have the CPU 100% busy.

Revision history for this message
Sergey Arkhipov (sarkhipov) wrote :

Waiting for confirmation from the oslo.messaging and LMA folks that it is not normal to have the CPU 100% busy.

Revision history for this message
Sergey Arkhipov (sarkhipov) wrote :

Sorry, I pasted the wrong comment (see https://bugs.launchpad.net/mos/+bug/1578172).

It is OK: some processes may use 100% of a CPU, but each controller has 12 CPUs, so this is definitely not the cause.

Changed in mos:
importance: Undecided → High
tags: added: area-keystone
tags: added: area-glance
removed: area-keystone
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

According to the logs,
librbd.create(ioctx, image_name, size, order, old_format=False, features=int(features)) took almost 30 seconds, and
image.resize(length) + image.write(chunk, offset) took almost a minute for some reason.
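
For reference, a minimal Python sketch (not part of the original report) that times the same librbd calls Glance's RBD store makes, so the slowness can be measured outside of glance-api. The pool name 'images', the conffile path, and the sizes are assumptions:

import time
import rados
import rbd

# Hedged sketch: time rbd create / resize / write outside of Glance.
# Pool name, conffile path and sizes below are assumptions.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('images')
try:
    name = 'rbd-timing-test'
    size = 1024 ** 3  # 1 GiB
    start = time.time()
    rbd.RBD().create(ioctx, name, size, old_format=False)
    print('create: %.2f s' % (time.time() - start))
    image = rbd.Image(ioctx, name)
    try:
        start = time.time()
        image.resize(2 * size)
        image.write(b'\0' * (8 * 1024 * 1024), 0)  # one 8 MiB chunk
        print('resize + write: %.2f s' % (time.time() - start))
    finally:
        image.close()
    rbd.RBD().remove(ioctx, name)
finally:
    ioctx.close()
    cluster.shutdown()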

Revision history for this message
Kairat Kushaev (kkushaev) wrote :

Ceph folks,
after executing the Rally tests on the node, requests to RBD through the Python lib (create, write, resize) are quite slow.
Could you please help with the problem?

Changed in mos:
assignee: Kairat Kushaev (kkushaev) → MOS Ceph (mos-ceph)
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

> Please find diagnostic snapshot here: http://mos-scale-share.mirantis.com/env14/fuel-snapshot-2016-04-29_06-01-06.tar.xz

(Semi-automatic reply)

Please post the relevant data instead of a 5 GB tarball:

* The output of ceph -s
* Configuration of each OSD:
  - main storage parameters: drive type (ssd/hdd), filesystem type and size, mount options
  - journal parameters: drive type, partition size
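
If it helps, here is a minimal Python sketch (not part of the original report) that pulls the cluster status programmatically via the rados mon_command API, so only the status needs to be posted. The conffile path and the default admin keyring are assumptions:

import json
import rados

# Hedged sketch: fetch the equivalent of "ceph -s" from a controller node.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ret, out, err = cluster.mon_command(
        json.dumps({'prefix': 'status', 'format': 'json-pretty'}), b'')
    print(out.decode('utf-8') if ret == 0 else err)
finally:
    cluster.shutdown()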

> - 5 computes
> - 3 controllers (1 is offline because of disk problems)

So the OSDs are co-hosted with the computes and/or controllers. This is a bad idea; please allocate dedicated hardware for the OSDs.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The ceph cluster is badly misconfigured:

- OSD journals reside on rotating hard drives, inside the same filesystem which stores the actual data [1],
  which makes bandwidth at least 2x smaller (due to lots of fsyncs) and latency 4x to 10x bigger
  due to numerous seeks,
- OSDs are co-hosted with hypervisors, which is a bad idea (both need quite a lot of RAM and CPU)

In order to avoid performance problems, please

- Put OSD journals on SSDs, using a dedicated partition/logical volume instead of a file (a quick check for a file-backed journal is sketched after the log excerpt below)
- Allocate nodes specifically for OSDs, that is, don't co-host hypervisors/databases/etc. there

[1] From node-51/var/log/ceph/ceph-osd.0.log

2016-04-25 06:49:10.141975 7f0e6532a800 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 20: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
2016-04-25 06:49:10.142203 7f0e6532a800 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 20: 2147483648 bytes, block size 4096 bytes, directio = 1, aio = 0
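
A minimal Python sketch (not part of the original report) of the journal check mentioned above: it tells whether an OSD journal is a plain file inside the data filesystem or points at a raw block device. The OSD path (ceph-0) is an assumption; loop over /var/lib/ceph/osd/ to cover all OSDs:

import os
import stat

# Hedged sketch: detect a file-backed OSD journal.
journal = '/var/lib/ceph/osd/ceph-0/journal'
mode = os.stat(journal).st_mode  # os.stat follows the symlink, if any
if stat.S_ISBLK(mode):
    print('journal is a raw partition/LV: OK')
else:
    print('journal is a file on the data filesystem: move it to a dedicated SSD partition')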

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The problem is a consequence of a badly broken configuration, therefore I'm closing the bug as invalid.
Feel free to reopen the bug if the issue can be reproduced with a sane ceph configuration, i.e.

- ceph-osd nodes should not run hypervisors/databases/etc.
- OSD journals should reside on SSDs, and raw partitions (or logical volumes) should be used as journals

Changed in mos:
status: In Progress → Invalid
summary: - Glance performance degradation in 'list_images' Rally test scenario
+ Broken ceph configuration => slow glance