ARM OpenStack API/Dashboard calls are very slow

Bug #1664764 reported by Raghuram Kota on 2017-02-14
This bug affects 1 person
Affects: horizon (Ubuntu)
Importance: Medium
Assigned to: Ryan Beisner

Bug Description

dann frazier (dannf) wrote on 2016-12-23:

I'm using openstack-base deployed on an arm64-only MAAS (xenial/newton). Both the CLI and dashboard are very slow to respond. CLI commands (e.g. nova list) take about 15s to complete. Logging into the dashboard takes more than 30s. 'top' shows plenty of available CPU/memory on the dashboard node.

Note: This isn't new - it is something we've observed with previous versions of Ubuntu OpenStack as well, but haven't yet gotten to the bottom of it.
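One way to localize a delay like this (a sketch, not from the report; the host and path are placeholders) is to have curl break a single dashboard request into connect, TLS, and total time, separating network and crypto cost from server-side rendering:

```shell
# Split one dashboard request into phases (host/path are illustrative).
# A large 'total' with small 'connect'/'tls' points at server-side work
# rather than the network.
curl -sk -o /dev/null \
     -w 'connect=%{time_connect}s  tls=%{time_appconnect}s  total=%{time_total}s\n' \
     https://<dashboard-ip>/horizon/auth/login/
```

Run against both an arm64 and an x86 dashboard, the per-phase numbers show where the 30s is going.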

Raghuram Kota (rkota) wrote:

Ryan Beisner (1chb1n) wrote on 2017-01-26:

Without scientific data at this point, I can confirm that every CLI/API call I make on a freshly-deployed Xenial-Mitaka arm64 cloud is noticeably sluggish compared with x86 deploys, where the x86 machines are 6 years old, have 75% of the RAM, and less than half the advertised cores.

This is worth profiling without OpenStack in the mix.

I'd like to have HWCERT, the Server Team, or the Kernel Team provide system performance profiling methodology info (presuming there is an existing approach). Then I'd like to exercise that on these arm64 machines and the x86 machines, and compare to get baseline info (again, absent OpenStack).

@raghu: please see if you can gather info from other teams and direct that info here. TIA!
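In that spirit, a few OpenStack-free baseline timings that can be run identically on the arm64 and x86 machines (a sketch; it assumes only python3 and openssl are installed, and is not from the bug itself):

```shell
# 1) Interpreter start-up plus the imports a typical API client pulls in:
time python3 -c 'import json, ssl, http.client'

# 2) A small CPU-bound loop, to separate compute speed from import cost:
time python3 -c 'print(sum(i * i for i in range(10**6)))'

# 3) Raw hash throughput, since API clients spend time in TLS:
openssl speed -seconds 3 sha256
```

Comparing these three numbers across machines shows whether the gap is in interpreter start-up, general compute, or crypto, before OpenStack enters the picture.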

Raghuram Kota (rkota) wrote:

Ryan Beisner (1chb1n) wrote on 2017-01-26:

For example, this should not take 19s on a cloud with no tenants, no instances, 1 image, and light system resource allocation per-node:

`time glance image-list`
+--------------------------------------+--------+
| ID | Name |
+--------------------------------------+--------+
| 0d66c03c-fe4c-4d26-bfac-74ecb55735ab | xenial |
+--------------------------------------+--------+

real 0m19.580s
user 0m0.888s
sys 0m0.068s


`time nova list`
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

real 0m11.833s
user 0m1.028s
sys 0m0.076s

Raghuram Kota (rkota) wrote:

dann frazier (dannf) wrote on 2017-01-26: Re: [Bug 1652377] Re: API calls are very slow:

On Wed, Jan 25, 2017 at 6:31 PM, Ryan Beisner
<email address hidden> wrote:
> Without scientific data at this point, I can confirm that every CLI/API
> call I make on a freshly-deployed Xenial-Mitaka arm64 cloud is
> noticeably sluggish compared with x86 deploys, where the x86 machines
> are 6 years old, have 75% of the RAM, and less than half the advertised
> cores.
>
> This is worth profiling without OpenStack in the mix.

Do you know of a way to reproduce it without OpenStack in the mix?

  -dann

Raghuram Kota (rkota) wrote:

Billy Olsen (billy-olsen) wrote on 2017-02-13:

If we think this is a Python performance issue, it might be beneficial to get some data by running the Python performance suite, which is available at https://github.com/python/performance. It's also likely we should look at the Ubuntu python package itself, to see whether there are compile-time optimizations that could be applied to the interpreter.
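One cheap check in that direction (a sketch; `CONFIG_ARGS` is a standard CPython sysconfig variable, and the same one-liner works under python2):

```shell
# Print the ./configure flags the distro interpreter was built with.
# If --enable-optimizations (PGO) or an LTO flag appears on one arch but
# not the other, that alone can account for a sizeable interpreter gap.
python3 -c 'import sysconfig; print(sysconfig.get_config_var("CONFIG_ARGS"))'
```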

tags: added: arm64 openstack uosci
Ryan Beisner (1chb1n) wrote:

I'd suggest removing OpenStack from the mix and profiling Python, as Billy mentions. My observation is that simple Python-based CLI calls such as `openstack --help` are also more sluggish than expected.
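To see where that start-up time actually goes, the CLI entry point can be profiled directly (a sketch; it assumes the `openstack` client is on $PATH, and "plugin discovery" as the likely culprit is a typical suspect for that era, not a measured fact from this bug):

```shell
# Profile the CLI's own start-up, sorted by cumulative time; for '--help'
# nearly all of the cost is usually module imports and plugin discovery,
# so the top of this listing names the slow imports.
python -m cProfile -s cumtime "$(command -v openstack)" --help | head -30
```

Running the same command on an arm64 and an x86 host and diffing the top entries would show whether the slow functions differ, or are simply uniformly slower.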

Jason Hobbs (jason-hobbs) wrote:

We're also seeing this in OIL. The dashboard takes so long to load sometimes that the browser times out.

tags: added: oil
James Page (james-page) wrote:

May be related to bug 1638695

dann frazier (dannf) wrote:

fwiw, I believe we've always seen this problem on arm64, including with our trusty + mitaka cloud archive testing.

Michael Reed (mreed8855) wrote:

I used the benchmark from the following link:

https://github.com/python/performance

I used the following command:

pyperformance run --python=python2 -o py2.json

Michael Reed (mreed8855) wrote:

Kernel version for the results produced in comment #9

~$ uname -a
Linux seidel 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 14:58:00 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux

Raghuram Kota (rkota) wrote:

@Billy Olsen: Michael R helped run the benchmarks you pointed to on arm64 and posted the results in comment #9. Do you happen to know how they can be interpreted or compared with other architectures?

Thanks,
Raghu

Raghuram Kota (rkota) wrote:

@Billy: Never mind comment #11. mreed8855 mentioned that the benchmark comes with a "compare" option that can be used to compare results. Thx.

Michael Reed (mreed8855) wrote:

uname -a

4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Michael Reed (mreed8855) wrote:

A comparison of the results

command
pyperformance compare py2_arm64.json py2_x86.json

mahmoh (mahmoh) wrote:

Very disturbing, ~4.5x slower on ARM!

summary: - ARM Opentack API/Dashboard calls are very slow
+ Python 4.5x slower on ARM as seen with Opentack API/Dashboard calls

Python slowness is only speculation at this time - changing the title back.

summary: - Python 4.5x slower on ARM as seen with Opentack API/Dashboard calls
+ ARM Opentack API/Dashboard calls are very slow
dann frazier (dannf) on 2017-05-01
summary: - ARM Opentack API/Dashboard calls are very slow
+ ARM OpenStack API/Dashboard calls are very slow
dann frazier (dannf) wrote:

Last week, sfeole deployed a new arm64-only OpenStack setup in our lab (xenial/newton). This is a hybrid setup with a mixture of X-Gene and ThunderX hardware. I am not able to reproduce on this setup:

$ time glance image-list
+--------------------------------------+-------------+
| ID | Name |
+--------------------------------------+-------------+
| 977ac529-3d0c-4c5d-b6f3-be4e3c4b5449 | xenial-uefi |
+--------------------------------------+-------------+

real 0m3.437s
user 0m1.988s
sys 0m0.132s

So, either this issue was fixed in an update somewhere, or something about this hybrid setup is avoiding it. Note that I originally saw this on a ThunderX-only deployment. One notable difference between these systems is core count - X-Gene is 8 core, ThunderX (1-socket) is 48. Perhaps something is scaling poorly?
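If core count is the suspect, one cheap experiment is to pin the client (and, separately, the API services) to an X-Gene-sized CPU set on the 48-core ThunderX node and re-time the same call (a sketch; `openstack image list` stands in for any of the slow commands above):

```shell
# Record the core count, then re-run the slow call restricted to 8 cores,
# mimicking the X-Gene topology. taskset only constrains this process
# tree, so if the numbers don't move, repeat the pinning on the API
# service processes on the controller instead.
nproc
time taskset -c 0-7 openstack image list
```

If the pinned run is faster on the 48-core machine, that points at per-core scaling in the services (lock contention, worker fan-out) rather than raw CPU speed.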

dann frazier (dannf) wrote:

Here's a juju crashdump I took of my original (ThunderX-only) setup where I was seeing the problem, so we can compare component versions.

Sean Feole (sfeole) wrote:

Attached is a crashdump, as described in comment #17, of an arm64 xenial/newton OpenStack deployment that surprisingly does not experience the slow response times in either the CLI or Dashboard.

Ryan Beisner (1chb1n) on 2017-08-23
Changed in horizon (Ubuntu):
status: New → Opinion
importance: Undecided → Medium
assignee: nobody → Ryan Beisner (1chb1n)
Ryan Beisner (1chb1n) wrote:

I revisited this across three CPU architectures. The machines in each arch have vastly varying specs of CPU, mem and disk. Regardless of that difference, the difference in API and CLI response times is insignificant in my opinion.

These are Xenial freshly-deployed hosts with the latest openstack-client.

#### ppc64el
$ time openstack --help
  ...
real 0m13.902s
user 0m12.624s
sys 0m0.252s

$ time openstack image list
+--------------------------------------+----------------+--------+
| ID | Name | Status |
+--------------------------------------+----------------+--------+
| d24aeec2-2217-4e4b-8dc3-07d1b42a6896 | cirros-ppc64el | active |
| fff934c8-b0ac-407c-86e0-93a1b23675c3 | xenial-ppc64el | active |
+--------------------------------------+----------------+--------+

real 0m2.419s
user 0m1.620s
sys 0m0.152s

#### arm64
$ time openstack --help
  ...
real 0m11.808s
user 0m11.372s
sys 0m0.364s

$ time openstack image list
+--------------------------------------+-------------+--------+
| ID | Name | Status |
+--------------------------------------+-------------+--------+
| 83dfc5ac-6245-41eb-81da-0d465aa627fe | xenial-uefi | active |
+--------------------------------------+-------------+--------+

real 0m5.906s
user 0m1.264s
sys 0m0.212s

#### amd64
$ time openstack --help
  ...
real 0m9.484s
user 0m8.476s
sys 0m0.360s

$ time openstack image list
+--------------------------------------+---------+--------+
| ID | Name | Status |
+--------------------------------------+---------+--------+
| 06a073c7-5805-4ed2-b88b-f1c6a8f46210 | cirros | active |
| a6e2aa91-e92f-4adc-9e00-237a16323feb | cirros2 | active |
| 4eafd638-5956-4192-8d06-564a57170ee3 | precise | active |
| df79259c-c668-4528-90f2-36bee4c96784 | trusty | active |
| 20e4b0d2-a101-4186-97c9-51ffdff57b33 | xenial | active |
+--------------------------------------+---------+--------+

real 0m3.491s
user 0m1.616s
sys 0m0.260s

Ryan Beisner (1chb1n) wrote:

Likewise, with the 3 different deployments up at the same time: ppc64el, amd64, and arm64, I logged into horizon and browsed to the same places, somewhat arbitrarily. The difference in the experience and response across these systems was not noticeable to me. The UI/UX was consistent in my opinion.

Ryan Beisner (1chb1n) wrote:

Given these exercises, I'm going to close this bug against OpenStack horizon/dashboard.

Changed in horizon (Ubuntu):
status: Opinion → Invalid