Too long execution times for sanity tests in Virtual Environment

Bug #1895858 reported by Alexandru Dimofte
Affects: StarlingX
Status: Invalid
Importance: High
Assigned to: Ran An

Bug Description

Brief Description
-----------------
Normally, sanity tests take 1-3 minutes per test on bare metal.
The same tests take 3-5 minutes per test in a Virtual Environment.
After September 10th (the latest green Virtual Environment build was layered-20200910T013244Z), we started to see very long execution times per test, in the range of 25-40 minutes.

Severity
--------
<Major: System/Feature is usable but degraded>

Steps to Reproduce
------------------
Install the latest layered image and run some sanity tests. The execution time per test is very long.

Expected Behavior
------------------
Execution times should return to 3-5 minutes per test in the Virtual Environment.

Actual Behavior
----------------
Because of these long execution times, our Sanity for Virtual is blocked. Reason: timeouts.

Reproducibility
---------------
100%

System Configuration
--------------------
Virtual Simplex, Virtual Duplex, Virtual Standard and Virtual Standard Ext.

Branch/Pull Time/Commit
-----------------------
Master branch, after 10th of Sept.

Last Pass
---------
Latest green Virtual Environment was: layered-20200910T013244Z

Timestamp/Logs
--------------
Logs will be attached.

Test Activity
-------------
Sanity Virtual Testing.

Workaround
----------
-

Alexandru Dimofte (adimofte) wrote :
Ghada Khalil (gkhalil) wrote :

Gerrit shows a large number of commits that merged on 2020-09-10 and would have been included in the 2020-09-11 load.
The full list of code merges is here: https://review.opendev.org/#/q/projects:starlingx+is:merged+branch:master

Ghada Khalil (gkhalil) wrote :

As per discussion in the community call on 2020-09-16, Yong's team will start the investigation as they have access to the Intel sanity suite.

Changed in starlingx:
assignee: nobody → yong hu (yhu6)
Angie Wang (angiewang) wrote :

Some of my commits were merged on Sept 10th.

The following commits make no difference to the current system unless a new parameter "virtual_system" is enabled in the bootstrap overrides file. With the parameter enabled, StarlingX can be installed in a Nova VM with limited resources (small disk, CPU, memory). This is for AIO only.
https://review.opendev.org/#/q/status:merged++branch:master+topic:StarlingX_in_Nova_VM

The following commits are just cleanup in the AIO kickstart to reduce the minimum cgts-vg size created by the system. The changes shouldn't impact system performance.
https://review.opendev.org/#/q/status:merged+branch:master+topic:bug/1892554

The following commits relate only to the openstack application and make all openstack requests go through the ingress pod. Are only the openstack-related test cases slow?
https://review.opendev.org/#/q/status:merged+branch:master+topic:bug/1880777

Austin Sun (sunausti)
Changed in starlingx:
assignee: yong hu (yhu6) → Ran An (an.ran)
Ghada Khalil (gkhalil) wrote :

Marking as high priority given the impact on sanity

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
tags: added: stx.5.0 stx.sanity
Alexandru Dimofte (adimofte) wrote :

We re-ran an older build that had previously passed and observed the same problems. This means the problem is probably with our servers. We will focus on this and come back with more information as soon as possible.

Ran An (an.ran) wrote :

Hi Alexandru, did you run top on the controller while the sanity tests were running? What was the CPU/memory/disk/network usage?
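
For reference, a minimal sketch of how a one-shot usage snapshot could be captured on a Linux host while a test runs. It is not taken from this thread: the function names are hypothetical, it reads /proc/meminfo and so assumes Linux, and it does not cover network usage.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Share of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def snapshot(path="/"):
    """Collect load average, memory use, and disk use at one point in time."""
    with open("/proc/meminfo") as handle:
        meminfo = parse_meminfo(handle.read())
    disk = shutil.disk_usage(path)
    return {
        "load_1min": os.getloadavg()[0],
        "mem_used_percent": memory_used_percent(meminfo),
        "disk_used_percent": 100.0 * disk.used / disk.total,
    }


if __name__ == "__main__":
    print(snapshot())
```

Running this at intervals during a slow test and comparing the values with a run on a healthy server would show whether CPU load, memory, or disk is the bottleneck.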

Alexandru Dimofte (adimofte) wrote :

I managed to fix one of the servers (Virtual Standard). The problem appeared because some packages on this server were automatically updated. We reinstalled the OS on the server, reconfigured it, and it works now. We need to do the same with the other 2 servers. So there is nothing you can do now, and I think this LP bug can be closed. The issue was the automatic update of components on the server.
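
A minimal sketch of how this kind of package drift could be detected earlier, by comparing two saved package listings from the same server. This is an illustration only: the thread does not say which packages changed or which package manager the servers use, so the "name version" per-line input format and the function names are assumptions.

```python
def parse_packages(listing):
    """Turn lines of 'name version' into a {name: version} dict."""
    packages = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def package_drift(before, after):
    """Return packages that were added, removed, or changed version."""
    old, new = parse_packages(before), parse_packages(after)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(
            (name, old[name], new[name])
            for name in set(old) & set(new)
            if old[name] != new[name]
        ),
    }


if __name__ == "__main__":
    # Placeholder package names and versions, not data from the affected servers.
    known_good = "pkg-a 1.0\npkg-b 2.0\npkg-c 3.0\n"
    current = "pkg-a 1.1\npkg-b 2.0\npkg-d 4.0\n"
    print(package_drift(known_good, current))
```

Saving a listing from the last known-good state and diffing it before each sanity run would flag an automatic update before it shows up as test timeouts.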

Ran An (an.ran)
Changed in starlingx:
status: Triaged → Invalid