Horizon page load 90%ile is 17.2 secs.

Bug #1616147 reported by Pavel
This bug affects 1 person
Affects             Status       Importance  Assigned to  Milestone
Mirantis OpenStack  In Progress  High        Pavel
9.x                 Won't Fix    High        MOS Horizon

Bug Description

Detailed bug description:
Problem discovered on MOS 9.0 ISO #495 with MU1

The expected load time for a web page is 2.0 secs.
But the 90%ile of page load times in the Horizon dashboard is far above that. For example, for the page with 500 users the 90%ile is up to 17.2 secs ( https://mirantis.testrail.com/index.php?/tests/view/11398651 ), which is not acceptable for customers.
The 90%ile for the page with 100 flavors is 8.0 secs ( https://mirantis.testrail.com/index.php?/tests/view/11398660 ).
The 90%ile for the page with 100 images is 13.8 secs ( https://mirantis.testrail.com/index.php?/tests/view/11398645 ).
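
For reference, a minimal sketch of how a 90%ile figure like the ones above can be computed from raw page-load samples. It assumes the nearest-rank method; the TestRail runs may use a different percentile definition. The names `percentile`, `SAMPLE_LOAD_TIMES` and `BUDGET_SECS` and the sample values are made up for illustration and are not measurements from this environment.

```python
import math


def percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Illustrative page-load times in seconds (made up, not from the test run).
SAMPLE_LOAD_TIMES = [1.2, 1.4, 1.5, 1.9, 2.1, 2.4, 3.0, 4.8, 9.5, 17.0]

# Expected page-load time taken from the bug description.
BUDGET_SECS = 2.0

p90 = percentile(SAMPLE_LOAD_TIMES, 90)
print(f"90%ile = {p90} secs, within budget: {p90 <= BUDGET_SECS}")
```

In this made-up sample the median is close to the 2.0 secs budget while the 90%ile is far above it, which is why the percentile is reported rather than the average.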

All results of the Horizon load test are here: https://mirantis.testrail.com/index.php?/runs/view/18250&group_by=cases:section_id&group_order=asc .

Also, the server sometimes unexpectedly answers with a 401 Unauthorized HTTP code, which interrupts our automated test scenario. How can I solve this problem? Thanks.
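
Until the cause of the 401 responses is found, one possible workaround on the test side is to log in again and retry the request. The sketch below is only an illustration of that idea, not a fix for the underlying issue. `fetch_with_relogin`, `FakeSession` and the `fetch`/`login` callables are hypothetical names; a real scenario would pass in whatever its HTTP client uses to load a page and to authenticate.

```python
def fetch_with_relogin(fetch, login, max_relogins=1):
    """Call fetch(); on a 401 status, call login() and retry.

    fetch -- hypothetical callable returning (status_code, body)
    login -- hypothetical callable that re-establishes the session
    Returns (status_code, body, number_of_relogins).
    """
    relogins = 0
    while True:
        status, body = fetch()
        # Stop on any non-401 answer, or once the relogin budget is spent.
        if status != 401 or relogins >= max_relogins:
            return status, body, relogins
        login()
        relogins += 1


class FakeSession:
    """Stand-in for a dashboard session that starts out unauthenticated."""

    def __init__(self):
        self.logged_in = False

    def login(self):
        self.logged_in = True

    def get_page(self):
        return (200, "instances") if self.logged_in else (401, "")


session = FakeSession()
status, body, relogins = fetch_with_relogin(session.get_page, session.login)
print(status, relogins)
```

Returning the relogin count keeps the 401s visible in the test results instead of hiding them, so they can still be reported separately.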

Steps to reproduce:
 Create 100 instances (or other items) so that they are shown on a Horizon list page, then refresh the page.
Expected results:
 Test passed. Page loads within 2 secs.
Actual result:
 Test failed. Page takes more than 2 secs to load.
Reproducibility:
 100%
Workaround:
 n/a
Impact:
 unknown
Description of the environment:
 * 10 virt nodes:
    - CPU: 12 x 2.10 GHz
    - Disks: 2 drives (SSD - 80 GB, HDD - 931.5 GB), 1006.0 GB total
    - Memory: 2 x 16.0 GB, 32.0 GB total
    - NUMA topology: 1 NUMA node
 * Node roles:
   - 3 controllers
   - 7 computes
   - 3 Ceph OSD
   - 3 Telemetry - MongoDB
 * Details:
   - OS on controllers: Mitaka on Ubuntu 14.04
   - OS on computes: RHEL
   - Compute: KVM
   - Neutron with VLAN segmentation
   - Ceph RBD for volumes (Cinder)
   - Ceph RadosGW for objects (Swift API)
   - Ceph RBD for ephemeral volumes (Nova)
   - Ceph RBD for images (Glance)

Tags: area-horizon
Pavel (pshvetsov)
description: updated
Changed in mos:
status: New → Confirmed
tags: added: area-horizon
removed: horizon
Sergei Chipiga (schipiga) wrote :

Is this really a Horizon problem, or could it be the backend services? From the test cases alone it is hard to tell. They look like end-to-end time measurements.
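
To illustrate the distinction, here is a rough sketch of splitting one end-to-end measurement into a backend phase and a rendering phase. It is not the Horizon Profiler and does not use any Horizon API; `timed`, `load_page`, `call_backend` and `render` are hypothetical names, and the two stand-in functions only simulate work with `time.sleep`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
timings = defaultdict(float)


@contextmanager
def timed(phase):
    """Add the wall-clock time spent inside the block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - start


def load_page(call_backend, render):
    """Time the whole page load and its two phases separately."""
    with timed("total"):
        with timed("backend"):
            data = call_backend()
        with timed("render"):
            html = render(data)
    return html


def fake_backend():
    # Stand-in for the API calls the dashboard makes to backend services.
    time.sleep(0.02)
    return ["instance-%d" % i for i in range(100)]


def fake_render(items):
    # Stand-in for template rendering inside the dashboard itself.
    time.sleep(0.01)
    return "\n".join(items)


html = load_page(fake_backend, fake_render)
for phase in ("total", "backend", "render"):
    print(f"{phase}: {timings[phase]:.3f} secs")
```

With a breakdown like this, a slow "backend" phase points at the services behind the dashboard, while a slow "render" phase points at the dashboard itself.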

Timur Sufiev (tsufiev-x) wrote :

We need to measure these cases again, this time using Horizon Profiler.

Timur Sufiev (tsufiev-x) wrote :

Pavel, is the 401 Unauthorized issue raised for some specific test cases or for any of them? It might be better (but it's up to you) to file a separate bug for each test case, and another one for the 401 issue. As of now, the issue is too broad and there is no simple fix for it.

We definitely cannot fix it in the scope of 9.1, since fixing such an issue is more a feature request (and we have already passed FF) than a bugfix. For the Keystone Identity issue, for example, we are still discussing the best approach upstream.

Moving to the 9.2 release and setting to Incomplete for it. We need Profiler stats to estimate where the lags are coming from.

Changed in mos:
milestone: 9.1 → 9.2
status: Confirmed → Incomplete
Vitaly Sedelnik (vsedelnik) wrote :

Pavel, please provide information requested by Timur.

Changed in mos:
assignee: MOS Horizon (mos-horizon) → Pavel (pshvetsov)
Pavel (pshvetsov)
Changed in mos:
status: Incomplete → In Progress
milestone: 9.2 → 10.0
Pavel (pshvetsov) wrote :

I am now trying to use Horizon Profiler. I'll share detailed information about benchmarking with this tool later.

Pavel (pshvetsov) wrote :

Updates for 9.2.

We have benchmarking results (without the profiler) for build #607:
https://docs.google.com/document/d/1qUwb4KsfoxRiRiTSKrx3xn9r3jgl3yfrc7MscLJzw78/edit#heading=h.bro77gesw6e2
And it is still painfully bad. Most test results show too much latency, as before.

I have started to investigate and use the Profiler tool, but this work is still in progress. Unfortunately I have to use the profiler on an already deployed OpenStack cluster, not in a dev environment, so I need to change some of the Horizon dashboard code manually. Timur explained what I need to do, so I plan to continue working on this task in February.

Pavel (pshvetsov) wrote :

Hello.

The bug.
Nikita Konovalov tested an empty stack with 10 nodes (3+7) and got these results:
https://docs.google.com/spreadsheets/d/1Ds9eKQ5uSPmZdg-NwtX1WWBHb5JE9Q9mOyQl0ZSZhN4/edit#gid=0
For example, rendering 100 elements on a page costs 6.5 secs and rendering 200 elements costs 13.1 secs. As a perf engineer, I think that is slow. So for me this is one more confirmation of this bug.
As far as I know, the measurements were taken without SSL.
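
A quick check of what these two figures imply, as a sketch that assumes render time grows linearly with the number of elements. Two data points are only enough to draw a line, not to confirm that the growth really is linear. `linear_fit` is a hypothetical helper name.

```python
def linear_fit(p1, p2):
    """Fit time = fixed + per_element * count through two (count, secs) points."""
    (n1, t1), (n2, t2) = p1, p2
    if n1 == n2:
        raise ValueError("the two points need different element counts")
    per_element = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_element * n1
    return per_element, fixed


# The two measurements quoted above: (elements on page, render time in secs).
per_element, fixed = linear_fit((100, 6.5), (200, 13.1))
print(f"{per_element * 1000:.0f} ms per element, {fixed:.1f} secs fixed overhead")
```

Under that assumption the cost is about 66 ms per element with a fixed overhead close to zero, so almost all of the time scales with the number of listed items.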

Test env on every node:
CPU 12 x 2.10 GHz
Memory 2 x 16.0 GB, 32.0 GB total

I must say that some engineers are trying to fix this problem, for example Anton Chevychalov on MOS 7, but as far as I can see they are working separately. I doubt that this bug will be fixed in the near future that way.

The profiler.
Also, I have had problems getting a correctly working Profiler installed. I now have instructions for installing it in a configuration that can store its information in a DB. I'll try that in the next few days.

Pavel (pshvetsov)
Changed in mos:
assignee: Pavel (pshvetsov) → MOS Horizon (mos-horizon)
assignee: MOS Horizon (mos-horizon) → Pavel (pshvetsov)