Instance split-brain after evacuation

Bug #1879459 reported by suzhengwei
This bug affects 1 person
Affects    Status  Importance  Assigned to  Milestone
masakari   New     Undecided   Unassigned

Bug Description

In my test environment, after a host-failure recovery, a shadow instance is left on the failed host, and it uses the same storage backend as the evacuated instance on the destination node. Worse, the two instances end up in a split-brain state because they both use the same Ceph volume.

The root cause is that Pacemaker typically monitors only the heartbeat on the management network. If the management connection breaks, host recovery is triggered. However, the instance is still active, because its tenant and storage connections are working normally; we just can't manage it. So after evacuation, an identical instance is running on the destination node. The original instance becomes a shadow instance: it has no record in Nova, yet it is still running on the failed host, out of control.
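The failure mode above can be sketched as a small model. This is a hypothetical illustration, not masakari or Pacemaker code: the `Host` class and the functions `management_only_check`, `evacuate`, and `shadow_instances` are names invented here, and `nova_db` is a plain dict standing in for Nova's instance-to-host records. It shows that a check based only on the management heartbeat declares the host failed while the guest keeps running, and that evacuation changes Nova's record without stopping that guest.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of a compute host and its three networks."""
    name: str
    management_up: bool = True
    tenant_up: bool = True
    storage_up: bool = True
    # Guests the hypervisor on this host is actually running.
    running: set[str] = field(default_factory=set)


def management_only_check(host: Host) -> bool:
    """Return True if the host is declared failed.

    Only the management heartbeat is consulted; tenant_up and
    storage_up are ignored, as in the scenario from the report.
    """
    return not host.management_up


def evacuate(src: Host, dst: Host, nova_db: dict[str, str]) -> None:
    """Rebuild on dst every instance that nova_db places on src.

    Only the record and the destination change; nothing stops the
    guests that are still running on src.
    """
    for inst, host in list(nova_db.items()):
        if host == src.name:
            nova_db[inst] = dst.name
            dst.running.add(inst)


def shadow_instances(host: Host, nova_db: dict[str, str]) -> set[str]:
    """Guests running on host that nova_db records elsewhere or not at all."""
    return {i for i in host.running if nova_db.get(i) != host.name}


if __name__ == "__main__":
    nova_db = {"vm1": "compute-1"}
    src = Host("compute-1", running={"vm1"})
    dst = Host("compute-2")
    src.management_up = False  # only the management link breaks
    if management_only_check(src):
        evacuate(src, dst, nova_db)
    print(shadow_instances(src, nova_db))  # {'vm1'}
```

After the run, `vm1` is in both `src.running` and `dst.running`, which in the real setup corresponds to two guests using the same Ceph volume.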

One proposed solution to this split-brain problem is to power off the failed host instead of only disabling its nova-compute service. With the host powered off, there are no shadow instances and therefore no split-brain.
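A minimal sketch of the proposed ordering, assuming the caller supplies three hypothetical callables: `power_off` (for example an IPMI or BMC power-off action), `is_powered_off` to confirm the result, and `evacuate` to rebuild the instances elsewhere. None of these are masakari or Nova APIs. The sketch evacuates only after the power-off has been confirmed, which is the same idea as fencing (STONITH) in Pacemaker.

```python
import time
from typing import Callable


class FencingError(RuntimeError):
    """Raised when the failed host cannot be confirmed powered off."""


def fence_then_evacuate(
    host: str,
    power_off: Callable[[str], None],        # hypothetical power-control hook
    is_powered_off: Callable[[str], bool],   # hypothetical status check
    evacuate: Callable[[str], None],         # hypothetical evacuation hook
    timeout: float = 60.0,
    poll_interval: float = 2.0,
) -> None:
    """Power off the failed host, then evacuate only once it is confirmed off."""
    power_off(host)
    deadline = time.monotonic() + timeout
    while not is_powered_off(host):
        if time.monotonic() >= deadline:
            # Evacuating now could leave a shadow instance running on host.
            raise FencingError(f"{host} not confirmed off after {timeout}s")
        time.sleep(poll_interval)
    evacuate(host)


if __name__ == "__main__":
    log: list[str] = []
    powered_off: set[str] = set()

    def fake_power_off(h: str) -> None:
        log.append(f"power off {h}")
        powered_off.add(h)

    def fake_evacuate(h: str) -> None:
        log.append(f"evacuate {h}")

    fence_then_evacuate("compute-1", fake_power_off,
                        lambda h: h in powered_off, fake_evacuate)
    print(log)  # ['power off compute-1', 'evacuate compute-1']
```

Raising instead of evacuating when the power-off cannot be confirmed is a design choice of this sketch: it accepts longer downtime in exchange for never having two copies of the instance running against the same volume.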
