After discussions with mriedem on IRC, it's worth noting that the above patch doesn't fix the underlying issue so much as a side effect of it, namely the inability of nova-compute to restart after the error has occurred. What it does fix is bug #1738373, which is solely focused on that side effect. That bug has now been marked as a duplicate of this one.

Cleaned up logs from the IRC discussion on #nova-compute below.

[19-12 15:12:52] mriedem: Not to distract you now, but did you make a mistake on https://github.com/openstack/nova/commit/cdf8ba5acb ? You've said it fixes https://bugs.launchpad.net/nova/+bug/1784579 but that bug is for live migration, not compute service restart which is what your commit addresses
[19-12 15:13:18] mriedem: I ask because I found a similar bug which does deal with the compute service restart https://bugs.launchpad.net/nova/+bug/1738373
[19-12 15:18:06] stephenfin: yes bug 1784579 is about os-vif port binding failed errors right?
[19-12 15:19:14] mriedem: Yup, but it's to do with live migration and the fix is only for the service startup code path
[19-12 15:19:40] At least, assuming I'm reading it right. I'll do some digging but just wanted to sanity check it before I dived down the rabbit hole :)
[19-12 15:19:56] stephenfin: the live migratoin fails because of the port binding failures
[19-12 15:21:15] stephenfin: comment
[19-12 15:21:16] 2
[19-12 15:21:17] "To summarize, it looks like the pre_live_migration method on the destination host fails to plug vifs and you end up with the "binding_failed" error, which is raised and makes the source live_migration method fail as expected. The failure is on the dest host.
As a result, the info cache is updated with "binding_failed" which causes the source compute restart to fail here:"
[19-12 15:22:19] stephenfin: so no i didn't fix the original reason for the port binding failure in pre_live_migration, because that could have been for any number of reasons (neutron agent was down on the dest host?)
[19-12 15:22:38] i fixed a symptom of that failure, which was nova-compute failed to restart after that failure
[19-12 15:22:53] as the commit message says, "Admittedly this isn't the smartest thing and doesn't attempt
[19-12 15:22:54]     to recover / fix the instance networking info"
[19-12 15:22:59] mriedem: I'm missing something. Why make changes to 'ComputeManager.init_host' (via '_init_instance') in that commit? The exception was being seen in the live migration flow
[19-12 15:23:01] ahhhhh
[19-12 15:23:21] 1. live migratoin fails, port binding failed - that gets saved in the info cache
[19-12 15:23:31] 2. restart source compute - that blows up because it wasn't handling binding_failed vif types in the os-vif conversion code
[19-12 15:23:38] i handle #2
[19-12 15:23:46] #1 is sort of out of my control
[19-12 15:23:50] Your fix would inadvertently resolve https://bugs.launchpad.net/nova/+bug/1738373 so
[19-12 15:24:12] i mean, we probably shouldn't be saving off busted port binding information when pre_live_migration fails,
[19-12 15:24:27] since that overwrites the previously good port binding information from the source host
[19-12 15:25:03] i would have to dig into where we save off the bad port binding information
[19-12 15:25:07] Yup, there's a related fix (also for live migration) that you worked on which looks more involved https://bugs.launchpad.net/nova/+bug/1783917
[19-12 15:26:02] ^ was a regression in rocky
[19-12 15:26:45] so i suppose my fix should have been related to bug 1784579
[19-12 15:26:49] not closes it
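
For anyone skimming, the two-step failure described in the log (a "binding_failed" VIF type saved in the info cache, then the compute service tripping over it on startup) can be illustrated with the rough sketch below. This is not the code from the commit: every function and class name here is hypothetical, and the only detail taken from the discussion is the "binding_failed" VIF type. It only shows the general idea of handling the error per instance so that one bad info cache does not abort service startup.

```python
import logging

LOG = logging.getLogger(__name__)

# VIF type that ends up in the info cache after a failed port binding.
VIF_TYPE_BINDING_FAILED = "binding_failed"


class VifPlugError(Exception):
    """Hypothetical error for a VIF that cannot be converted or plugged."""


def convert_vif(vif):
    """Hypothetical stand-in for the os-vif conversion step."""
    if vif["type"] == VIF_TYPE_BINDING_FAILED:
        raise VifPlugError("port %s has no usable binding" % vif["id"])
    return {"id": vif["id"], "plugin": vif["type"]}


def init_instance(name, info_cache):
    """Convert every cached VIF for one instance; may raise VifPlugError."""
    return [convert_vif(vif) for vif in info_cache]


def init_host(instances):
    """Initialise all instances without letting one bad cache abort startup."""
    started, failed = [], []
    for name, info_cache in instances.items():
        try:
            init_instance(name, info_cache)
        except VifPlugError as exc:
            # Log and carry on; the networking info itself is not repaired.
            LOG.warning("Skipping VIF plug for %s: %s", name, exc)
            failed.append(name)
        else:
            started.append(name)
    return started, failed
```

As in the log, this only deals with step 2 (startup); it does nothing about step 1, where the bad binding information overwrites the previously good binding from the source host.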