Increased time to detect a failed VM

Bug #1850834 reported by Victor Manuel Rodriguez Bahena
This bug affects 1 person
Affects     Status    Importance  Assigned to  Milestone
StarlingX   Triaged   High        ya.wang

Bug Description


Brief Description
-----------------
After collecting the initial performance numbers, we found that one of our most important gaps is the early detection of failed VMs.

We confirmed that this gap is closed by the patch that your team kindly provided in R1:

https://gist.github.com/VictorRodriguez/e137a8cd87cf821f8076e9acc02ce195

Severity
--------
Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
1) Launch a VM with the following features:

RAM: 2 GB
Disk: 20 GB
VCPUs: 1
Properties: hw:mem_page_size=large
Image: Debian

2) Identify the compute node where the VM was deployed and kill its QEMU
process. Record the start time immediately afterwards.
3) Continuously poll the VM status and stop the test when the status
changes.
4) Record the end time and calculate the delta.
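Steps 2 to 4 amount to a simple timing loop. The sketch below is illustrative only: `kill_vm` and `get_status` are hypothetical hooks (for example, a wrapper that kills the QEMU process on the compute node, and a wrapper around whatever status query the test uses); they are assumptions, not part of the original report. Note that the polling interval bounds the resolution of the measurement, so it should be well below the 500 ms target.

```python
import time
from typing import Callable


def measure_detection_time(
    kill_vm: Callable[[], None],     # hypothetical hook: kills the VM's QEMU process
    get_status: Callable[[], str],   # hypothetical hook: returns the current VM status
    poll_interval_s: float = 0.01,
    timeout_s: float = 60.0,
) -> float:
    """Return seconds elapsed between killing the VM and the first status change."""
    initial = get_status()            # status before the failure is injected
    kill_vm()                         # step 2: kill the QEMU process
    start = time.monotonic()          # step 2: start time taken right after the kill
    while True:
        status = get_status()         # step 3: poll the VM status
        now = time.monotonic()
        if status != initial:
            return now - start        # step 4: delta between start and end time
        if now - start > timeout_s:
            raise TimeoutError(f"status stayed {initial!r} for {timeout_s} s")
        time.sleep(poll_interval_s)
```

`time.monotonic()` is used instead of wall-clock time so that clock adjustments during the test cannot distort the delta.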

Expected Behavior
------------------
Around 500 ms

Actual Behavior
----------------
On the order of seconds

Reproducibility
---------------
<Reproducible/Intermittent/Seen once>
State whether the issue is 100% reproducible with a coefficient of variation (CV) of 10%. CV is a measure of relative variability: it is the ratio of the standard deviation to the mean (average).
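As a minimal sketch of the CV check, assuming the detection times from repeated runs are collected into a list and that the sample (not population) standard deviation is intended, the 10% criterion could be evaluated as follows. The function names and the 0.10 default threshold are illustrative assumptions.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """CV = sample standard deviation / mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero; CV is undefined")
    return statistics.stdev(samples) / mean


def meets_cv_target(samples: Sequence[float], max_cv: float = 0.10) -> bool:
    """True if the run-to-run variability is within the CV target (hypothetical helper)."""
    return coefficient_of_variation(samples) <= max_cv
```

For example, `coefficient_of_variation([1.0, 3.0])` is sqrt(2)/2, about 0.707, far above a 10% target.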

System Configuration
--------------------
Two-node system

Branch/Pull Time/Commit
-----------------------
R2.0

Bruce Jones (brucej)
Changed in starlingx:
importance: Undecided → High
Ghada Khalil (gkhalil) wrote :

Assigning to distro.openstack PL to review/follow up.
This appears to be a request to propose a code change to Nova. I'm not sure why this is a StarlingX Launchpad bug.

tags: added: stx.distro.openstack
Changed in starlingx:
assignee: nobody → yong hu (yhu6)
yong hu (yhu6) wrote :

This is a performance issue, and @Ya brought the original patch to the OpenStack Nova PTG last week.

Let's see how much progress we can make in the stx.3.0 time frame.

Changed in starlingx:
assignee: yong hu (yhu6) → ya.wang (ya.wang)
Ghada Khalil (gkhalil) wrote :

Given that this was marked as High priority, I will add the stx.3.0 release tag for now. The PL can choose to re-gate if this cannot be addressed within the required time-line.

tags: added: stx.3.0
Changed in starlingx:
status: New → Triaged
yong hu (yhu6) wrote :

@Ya presented this patch at the Nova PTG during the November 2019 OpenInfra Summit.
Ya is now preparing a new patch that we expect Nova upstream to accept.
We also need to verify that the new patch actually improves the time to detect VM failures.

Victor Manuel Rodriguez Bahena (vm-rod25) wrote :

Thanks a lot, Yong. We can test the patch as soon as the team has it, and we can also provide data to support the patch (measured in CPU cycles). This is great news, thanks a lot!

ya.wang (ya.wang) wrote :

Hi, I registered a new Launchpad bug against the Nova project:

https://bugs.launchpad.net/nova/+bug/1853259

yong hu (yhu6) wrote :

As discussed, this performance ticket won't gate stx.3.0.

@Ya, please go ahead with Nova upstream as usual.

tags: removed: stx.3.0