OK, I managed to reproduce this (or close to this) issue... but in a different setup than what I understood.
Do note the scenario is very artificial so it is very unlikely to happen in real life (but something similar could still...).
Here is the setup I managed to cause Masakari trash all the hosts:
3 controllers (all APIs, DBs, Masakari Engine and Masakari hostmonitor)
some computes, all running pacemaker_remote
hostmonitors configured to monitor only remotes (as all computes are remotes here)
Blocking corosync traffic to one of the controllers makes it become isolated and lose quorum and think all the other nodes are offline. Hostmonitor is happy to tell Masakari all remote nodes are offline...
OK, I managed to reproduce this (or close to this) issue... but in a different setup than what I understood.
Do note the scenario is very artificial so it is very unlikely to happen in real life (but something similar could still...).
Here is the setup I managed to cause Masakari trash all the hosts:
3 controllers (all APIs, DBs, Masakari Engine and Masakari hostmonitor)
some computes, all running pacemaker_remote
hostmonitors configured to monitor only remotes (as all computes are remotes here)
Blocking corosync traffic to one of the controllers makes it become isolated and lose quorum and think all the other nodes are offline. Hostmonitor is happy to tell Masakari all remote nodes are offline...