Horizon is slow at small scale

Bug #1576067 reported by Sergey Arkhipov
This bug affects 3 people
Affects               Status     Importance  Assigned to
Mirantis OpenStack    Confirmed  High        MOS Horizon
  10.0.x              Won't Fix  High        MOS Horizon
  9.x                 Won't Fix  High        MOS Horizon

Bug Description

Detailed bug description:
Horizon is painfully slow on a small bare-metal lab (10 nodes overall). It is almost impossible to use: pages usually take more than 10 seconds to load (e.g. it takes ~13 seconds to show an empty list of instances).

Steps to reproduce:
1. Deploy a cluster with Ceilometer, Sahara and Aodh
2. Run various Rally tests for 1-2 days
3. Try to boot or list instances using Horizon

Expected results:
All requests should complete in under 10 seconds.

Actual result:
Performance is poor and the UI is sluggish. Many requests take more than 10 seconds.

Reproducibility:
100%. Even after rebooting _all_ nodes in the cluster and cleaning up entities in OpenStack, the UI stays very slow and sluggish.

Workaround:
N/A

Impact:
10-second delays make Horizon almost impossible to use.

Description of the environment:
Please find details in the diagnostic snapshot below.

* 10 baremetal nodes:
  - CPU: 12 x 2.10 GHz
  - Disks: 2 drives (SSD - 80 GB, HDD - 931.5 GB), 1006.0 GB total
  - Memory: 2 x 16.0 GB, 32.0 GB total
  - NUMA topology: 1 NUMA node
* Node roles:
  - 1 ElasticSearch / Kibana node
  - 1 InfluxDB / Grafana node
  - 3 controllers (1 is offline because of disk problems)
  - 5 computes
* Details:
  - OS: Mitaka on Ubuntu 14.04
  - Compute: KVM
  - Neutron with VLAN segmentation
  - Ceph RBD for volumes (Cinder)
  - Ceph RadosGW for objects (Swift API)
  - Ceph RBD for ephemeral volumes (Nova)
  - Ceph RBD for images (Glance)

Additional information:
Diagnostic snapshot: https://drive.google.com/open?id=0B9tzODpFABxkbTIxNGQ2T19qcm8
Small demo: https://drive.google.com/open?id=0B9tzODpFABxkazZZX29TSmxSNDQ (demo was recorded ~16:39 UTC, 27 Dec 2016)
Demo of instance page without instances: https://drive.google.com/open?id=0B9tzODpFABxkRWlNZ2U3ajA2YU0 (demo was recorded ~16:41 UTC, 27 Dec 2016)

Timur Sufiev (tsufiev-x) wrote :

Thank you very much for a very detailed bug report with videos!

The painful slowness of the Overview panel is a well-known issue, as is the horrible Quotas/Defaults implementation. As for the Instances panel, how many instances were there on a single table page? Currently that heavily influences the total page rendering time.

Another thing I noticed was that the Projects table took 6+ seconds to render. That's particularly interesting because that page requires almost no other data sources (unlike Instances, for example). Could you fetch the list of projects using the CLI on the same environment and report back the total time?
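For a CLI-vs-Horizon comparison like the one requested above, repeated runs give a steadier number than a single one. A minimal Python sketch (the helpers `time_command` and `summarize` are hypothetical, not part of any OpenStack tool) that runs a command several times and reports min/median/max wall-clock time:

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return wall-clock durations in seconds.

    Output is discarded; a non-zero exit status raises CalledProcessError,
    so a failed call is never mistaken for a fast one.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min/median/max of the durations, rounded to milliseconds."""
    return {
        "min": round(min(durations), 3),
        "median": round(statistics.median(durations), 3),
        "max": round(max(durations), 3),
    }
```

For example, `summarize(time_command(["openstack", "project", "list"]))`, assuming credentials are already sourced and the `openstack` client is installed; the legacy `keystone tenant-list` can be timed the same way. If the CLI median is close to the Projects panel render time, the delay is mostly in the API rather than in Horizon.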

Changed in mos:
importance: Undecided → High
Timur Sufiev (tsufiev-x) wrote :

Unfortunately, Horizon slowness is not something that can be fixed easily within the scope of a single bug. It's going to be a slow and tedious process involving many changes, bugfixes and blueprints. Still, I'd prefer to keep this bug to remind us of the actual numbers.

Sergey Arkhipov (sarkhipov) wrote :

I've fetched the project list several times, both with `keystone tenant-list` and via Horizon: the times are pretty close, ~4 seconds.

> As for the Instances panel, how many instances were there on a single table page?

Overall, I had ~20 instances at that time. But you can check the latest video: I removed all instances from the cluster, and the delay was still too long.

Roman Podoliaka (rpodolyaka) wrote :

Timur, please take a closer look. 20 instances is way too few...

Changed in mos:
status: New → Confirmed
tags: added: area-horizon
Robert Duncan (rduncan-t) wrote :

This seems like a legitimate assumption:
https://ask.openstack.org/en/question/91577/horizon-slow-causes/

The link above relates to a MOS deployment and concerns the 'workers' configuration of the OpenStack services running on the controllers: it seems each service runs one worker per CPU core, and this leads to swapping. In MOS 9.0, my 3 controllers have 24 GB RAM and 24 cores, and disk swapping happens at rest immediately after deployment. This should be a critical fix, as it could damage the disks.
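The arithmetic behind this argument can be sketched in Python. Every number used with it is an assumption for illustration, not a measurement from this bug: the number of worker-forking services and the per-worker RSS must be measured on the real controllers (for example with `ps`). `estimate_worker_memory_gib` and `capped_workers` are hypothetical helpers.

```python
def estimate_worker_memory_gib(services, cores, rss_mib_per_worker):
    """Rough total RSS if each of `services` forks one worker per core.

    All inputs are user-supplied estimates; measure real per-worker RSS
    instead of trusting illustrative values.
    """
    return services * cores * rss_mib_per_worker / 1024


def capped_workers(cores, cap):
    """Worker count after applying a per-service cap, never below one."""
    return max(1, min(cores, cap))
```

With a hypothetical 10 services at 100 MiB per worker, 24 cores give `estimate_worker_memory_gib(10, 24, 100)` = 23.4375 GiB, roughly the whole 24 GB of RAM, while capping each service at 4 workers gives 3.90625 GiB. The option that sets the cap differs per service (e.g. nova's `osapi_compute_workers` or neutron's `api_workers`).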

Michael Petersen (mpetason) wrote :

I'm seeing similar issues in another 9.0 installation. Are there any updates for this issue? What information is needed to continue troubleshooting?

tags: added: customer-found
Timur Sufiev (tsufiev-x) wrote :

Michael, the approach we (Andrey Grebennikov and I) took to find what causes the most lag was the following:

* in case of Horizon HA, ensure that only one node is serving requests (to simplify further debugging)
* ensure that the 'DEBUG' logging level is set for all LOGGING['loggers'] related to the various python-clients in /etc/openstack-dashboard/local_settings.py
* also ensure that whichever handler is actually used for writing Horizon logs (those defined in LOGGING['handlers']) has the 'DEBUG' level enabled as well
* preferably, truncate the Horizon logs to start fresh
* load the page you're having difficulties with
* among all the calls logged while rendering the page, find the ones that took the most time.
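Because local_settings.py is an ordinary Python module, the two logging steps above can also be scripted at the end of that file. A minimal sketch, assuming LOGGING uses Django's dictConfig layout and that the client loggers carry the names listed in `CLIENT_LOGGERS` (check them against your own file); `enable_client_debug` is a hypothetical helper:

```python
# Assumed logger names; compare with the LOGGING['loggers'] keys in your file.
CLIENT_LOGGERS = (
    "keystoneclient", "novaclient", "glanceclient", "neutronclient",
    "cinderclient", "keystoneauth",
)


def enable_client_debug(logging_cfg, names=CLIENT_LOGGERS):
    """Set DEBUG on the named loggers and on every handler they write to."""
    used_handlers = set()
    for name in names:
        logger = logging_cfg.get("loggers", {}).get(name)
        if logger is None:
            continue  # logger not configured here; leave it untouched
        logger["level"] = "DEBUG"
        used_handlers.update(logger.get("handlers", []))
    for handler_name in used_handlers:
        handler = logging_cfg.get("handlers", {}).get(handler_name)
        if handler is not None:
            handler["level"] = "DEBUG"
    return logging_cfg
```

Calling `enable_client_debug(LOGGING)` on the last line of local_settings.py and reloading Horizon's web server applies it. Only handlers actually attached to the client loggers are raised to DEBUG, so unrelated handlers keep their levels.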

Parsing Horizon logs to see which calls take how long is actually the most tedious part of the process. In late Newton/early Ocata (actually it was early Ocata, but it can easily be backported to the tagged Newton release), Horizon received a new 'OpenStack Profiler' panel that lets you do the same task from Horizon itself: reload the page with the profiler switched on, then see which calls were responsible for the rendering delays in a nice graph.
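Without the profiler, the log-parsing step can be partly automated. A minimal Python sketch, assuming each log line starts with Python logging's default timestamp (`2016-12-27 16:39:01,123`) and that only one page is loaded at a time, so consecutive lines belong to the same render; `slowest_gaps` is a hypothetical helper that ranks the gaps between consecutive timestamped lines:

```python
import heapq
import re
from datetime import datetime

# Assumed prefix: Python logging's default asctime, e.g. "2016-12-27 16:39:01,123".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if absent."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def slowest_gaps(lines, top=5):
    """Return the `top` largest (seconds, line) gaps between consecutive
    timestamped lines. The line is the one *before* the gap, typically a
    request that was still waiting for its response."""
    stamped = [(ts, line.rstrip("\n")) for line in lines
               if (ts := parse_ts(line)) is not None]
    gaps = [((nxt[0] - cur[0]).total_seconds(), cur[1])
            for cur, nxt in zip(stamped, stamped[1:])]
    return heapq.nlargest(top, gaps)
```

Feeding it the freshly truncated log (its location depends on the handler configuration) lists the lines after which Horizon waited longest; with DEBUG-level client logging these are usually outgoing API requests. Concurrent page loads interleave in the log and blur the gaps.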

Dmitry Sutyagin (dsutyagin) wrote :

I can confirm this issue even in MOS 9.2 on a clean cloud with no resources: Horizon tabs take several seconds to load. Since I have already done an extensive analysis of Horizon slowness in the past, I can try to do the same with 9.2 and provide some insights. I will contact Nikita Konovalov regarding this.

Dmitry Sutyagin (dsutyagin) wrote :

UPD: after several refreshes, all pages start to respond very quickly in MOS 9.2. The main request usually takes < 1 s to complete; the rest is CSS/JS, which should be loaded from cache. I get different results in Chrome vs Firefox: Firefox shows these files as cached, while Chrome keeps reloading JS and CSS and caches only .jpg files (or at least reports it that way, even though I did not disable the cache). Therefore I agree with the opinion that this behavior is due to swapping. However, each environment can be different, so slowness has to be analyzed individually in each particular case.
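Whether a controller is actually using swap can be checked from /proc/meminfo. A minimal Python sketch (`swap_usage` and `read_swap_usage` are hypothetical helpers) that reports used and total swap in kB:

```python
def swap_usage(meminfo_text):
    """Return (used_kb, total_kb) of swap from /proc/meminfo contents."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields.get("SwapTotal", 0)
    return total - fields.get("SwapFree", 0), total


def read_swap_usage(path="/proc/meminfo"):
    """Read swap usage of the local host (Linux only)."""
    with open(path) as f:
        return swap_usage(f.read())
```

Swap in use is only a hint, since pages can stay in swap long after the memory pressure is gone; the si/so columns of `vmstat 1` show whether pages are actively being swapped in and out while a slow page loads.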

Alexander Bozhenko (alexbozhenko) wrote :

Dear Dmitry, could you please explain how you analyzed Horizon's slowness?
On my installation, the 'Instances' page with 6 instances takes 5-6 seconds to render. Is that slow or fast? Other pages take 2-3 seconds.
