StarlingX R2 duplex: VM not getting rebuilt on controller-1 when controller-0 (active) is rebooted

Bug #1851332 reported by Akshay
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: zhipeng liu
Milestone: (none)

Bug Description

Brief Description
-----------------
Setup: I have deployed Bare Metal StarlingX R2 in duplex mode. While testing HA, I ran a case in which I spawned 2 VMs together from Horizon on IPv6 flat and IPv6 vlan networks. One gets spawned on controller-0 and the other on controller-1. Both VMs get an IP assigned on each network, and they are able to ping each other.

Test Case: I then rebooted controller-0 (the active node), and the VM on controller-0 tried to rebuild itself.

Issue: The VM stays in the rebuilding state for the entire time controller-0 takes to reboot, and once controller-0 comes back up and is available, the VM gets rebuilt on controller-0 itself.

I have tried this case many times with the same result.
Please guide me on how to solve this issue.

Severity
--------

Critical

Steps to Reproduce
------------------
1. Deploy Bare Metal StarlingX R2 in duplex mode.
2. Spawn 2 VMs together on the IPv6 flat and IPv6 vlan networks from Horizon (example CLI commands below).
3. Reboot controller-0 (the active node).
4. Check the rebuild process of the VM from controller-0.
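(For reference, step 2 can also be driven from the openstack CLI instead of Horizon. This is only a sketch, not taken from the report: the image, flavor, network, and VM names below (cirros, m1.small, ipv6-flat, ipv6-vlan, vm-1, vm-2) are placeholders for whatever the deployment actually uses.)

  # Spawn the two VMs, each attached to both tenant networks
  openstack server create --image cirros --flavor m1.small \
    --network ipv6-flat --network ipv6-vlan vm-1
  openstack server create --image cirros --flavor m1.small \
    --network ipv6-flat --network ipv6-vlan vm-2

  # Confirm the scheduler placed them on different controllers
  # (the host attribute is only visible with admin credentials)
  openstack server show vm-1 -f value -c OS-EXT-SRV-ATTR:host
  openstack server show vm-2 -f value -c OS-EXT-SRV-ATTR:host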

Expected Behavior
------------------
VM should be rebuilt on controller-1 immediately.

Actual Behavior
----------------
After controller-0 reboots, the VM gets rebuilt on controller-0 only (after ~20 minutes).

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two node system

Last Pass
---------
NO

Akshay (yadavakshay58) wrote:

It is showing multiple behaviors:
1. Multiple times, it behaved in the way explained above.
2. Sometimes the VM gets rebuilt on controller-1, but no IP is assigned to the vlan network inside the VM.
3. Sometimes the VM gets rebuilt on controller-1 but does not get an IP assigned on any of the networks (one quick check from the guest is shown below).

Please guide.
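
(For cases 2 and 3, one quick way to confirm which interfaces actually received an IPv6 address is to check from the VM console with standard iproute2 commands; the interface name below is a placeholder.)

  # Inside the guest: list IPv6 addresses on all interfaces
  ip -6 addr show
  # or only on the interface attached to the vlan network
  ip -6 addr show dev eth1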

Ghada Khalil (gkhalil) wrote:

Assigning to the distro-openstack PL for triage/next steps

tags: added: stx.distro.openstack
Changed in starlingx:
assignee: nobody → yong hu (yhu6)
yong hu (yhu6)
tags: added: stx.2.0
yong hu (yhu6) wrote:

@Akshay, please share the VM flavor info, and the next time you see the issue, please capture the logs with the "collect" cmd on both controllers.

I suspect it's related to the "anti-affinity" feature, which prevents VMs from being scheduled on the same node.
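
(For reference: Nova expresses anti-affinity through server groups, so one way to check this suspicion is to see whether the two VMs share a group with that policy. A minimal sketch using the standard openstack CLI; <group-id> and <flavor-name> are placeholders.)

  # List server groups and check whether one carries an anti-affinity policy
  openstack server group list
  openstack server group show <group-id>

  # Flavor details as requested above
  openstack flavor show <flavor-name>

  # When the issue reproduces, run the StarlingX collect tool on each controller
  collect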

Changed in starlingx:
importance: Undecided → Medium
Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → Triaged
zhipeng liu (zhipengs)
Changed in starlingx:
assignee: yong hu (yhu6) → zhipeng liu (zhipengs)
zhipeng liu (zhipengs) wrote:

Hi Akshay,

Any update from your latest test with our latest build?
I have also seen the scenario you mentioned:
sometimes, when you reboot node 1, the VM on node 1 does not evacuate to node 2 but rebuilds on node 1.
STX has a precondition for deciding whether an evacuation should be triggered:
if node 1 does not stay offline (for example, because it restarts quickly), the evacuation is replaced by a rebuild on node 1.
As for the IP issue, we have not seen that kind of issue so far.
If you still have the issue, please also provide complete logs.

Thanks!
Zhipeng
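
(For anyone trying to observe that offline precondition: the availability state of the rebooting controller can be watched from the surviving controller with the standard StarlingX sysinv and fault-management CLIs. A sketch, not from the report.)

  # Watch controller-0's availability state during the reboot; evacuation is
  # only expected while the host stays in the offline/failed state
  watch -n 5 system host-list

  # Active alarms show the host failure events behind the decision
  fm alarm-list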

zhipeng liu (zhipengs)
Changed in starlingx:
status: Triaged → Incomplete
Akshay (akshay346) wrote:

Hi Zhipeng,

Thanks for the information. Also, we have not yet tried it with the latest build.

zhipeng liu (zhipengs) wrote:

I propose to close this, since there has been no update for more than 1 month.

Thanks!
Zhipeng

zhipeng liu (zhipengs)
Changed in starlingx:
status: Incomplete → Invalid