Comment 3 for bug 1763181

Revision history for this message
Matthew Booth (mbooth-9) wrote :

There's definitely an obvious race here which I think Chris is describing in comment 1? I'm going to describe it again in my own words, just to be sure:

In _validate_instance_group_policy, whilst holding a lock on the name of the group we check the members of the group against InstanceList.get_uuids_by_host()[1]. However, at the time of executing this method, instance.host still refers to the source, so the current instance will not be returned in that list. instance.host is updated much later outside the critical section. The sequence of the race is therefore:

Instance A Instance B

instance.host = source instance.host = source
_validate_instance_group_policy() _validate_instance_group_policy()
  SUCCESS -> Neither A nor B on this host SUCCESS -> Neither A nor B on this host
... rebuild ... ... rebuild ...
instance.host = dest instance.host = dest

Note that to use a lock here we would need to update instance.host *in the same critical section as the test*. There may be other ways to achieve this, but this is what needs to happen in order to use a regular lock.

[1] Note that this critical section doesn't modify any data and this lock isn't used anywhere else which does modify data, so this lock serves no purpose. If we don't modify the code to make use of the critical section we should remove this lock: it achieves nothing and worse it seems to have confused folks.

Chris, I read your instance id comparison thing a few times but I couldn't parse it. I don't understand how this would work. Is there some StarlingX code to look at?