masakari

instance brain split after evacuation

Bug #1879459 reported by suzhengwei on 2020-05-19

This bug affects 1 person

Affects    Status  Importance  Assigned to  Milestone
masakari   New     Undecided   Unassigned

Bug Description

In my test environment, after host failure recovery, a shadow instance remains on the failed host, using the same storage backend as the evacuated instance on the destination node. Worse, this causes split-brain for the instance, because both copies use the same Ceph volume.

The root cause is that Pacemaker usually monitors only the heartbeat on the management connection. If the management connection breaks, it triggers host recovery. But the instance is still active, because the tenant and storage connections are working normally; we just can't manage it. So after evacuation, there is a second copy of the same instance on the destination node. The original instance becomes a shadow instance: it has no record in nova, but it is still running on the failed host, out of control.
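
To make the failure condition concrete, here is a minimal sketch of the scenario described above. It is only a model for illustration: `HostLinks` and the three functions are hypothetical names, not part of masakari, Pacemaker, or nova, and it assumes the heartbeat runs on the management network only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostLinks:
    """Hypothetical model of a compute host's three network connections."""
    management_up: bool
    tenant_up: bool
    storage_up: bool


def monitor_reports_failure(links: HostLinks) -> bool:
    # Assumption: the heartbeat only travels over the management network,
    # so the monitor cannot see the tenant or storage connections.
    return not links.management_up


def guest_may_still_be_running(links: HostLinks) -> bool:
    # A guest can keep writing to its volume as long as storage is reachable.
    return links.storage_up


def split_brain_risk(links: HostLinks) -> bool:
    # Risky case: recovery is triggered while the original guest is still
    # able to write to the shared volume.
    return monitor_reports_failure(links) and guest_may_still_be_running(links)
```

With only the management link down, `split_brain_risk(HostLinks(False, True, True))` is true, which is the case reported here. With every link down, or with the management link up, it is false.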

One proposed solution to this split-brain problem is to power off the failed host, not just disable its nova-compute service. Then there would be no shadow instances and no split-brain problem.
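
The proposed ordering could be sketched roughly as follows. This is not masakari's actual recovery workflow: `recover_host`, `FencingError`, and the four callables are hypothetical placeholders for whatever the deployment uses to disable the service, power off the host (for example via a power management interface), check its power state, and evacuate.

```python
from typing import Callable


class FencingError(RuntimeError):
    """Raised when the failed host cannot be confirmed as powered off."""


def recover_host(
    host: str,
    disable_compute_service: Callable[[str], None],
    power_off: Callable[[str], None],
    is_powered_off: Callable[[str], bool],
    evacuate_instances: Callable[[str], None],
) -> None:
    # Stop the scheduler from placing new instances on the failed host.
    disable_compute_service(host)

    # Fence the host so any guest still running on it stops writing
    # to the shared volume.
    power_off(host)

    # Evacuate only once the host is confirmed off; otherwise two copies
    # of the same instance could use the same volume at once.
    if not is_powered_off(host):
        raise FencingError(f"could not confirm that {host} is powered off")

    evacuate_instances(host)
```

The point of the sketch is the order: evacuation never starts unless the power-off is confirmed, so a failed fencing attempt stops recovery instead of creating a second running copy.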

suzhengwei (sue.sam) on 2020-05-19: description updated
