Comment 48 for bug 1323658

Matt Riedemann (mriedem) wrote :

Trying to read through this again (there are a lot of comments): for a while it sounded like this was happening more for reboot. We've had a long-standing problem with reboot where we don't test soft reboot in tempest because we can't actually tell when a soft reboot happens or whether it falls back to hard reboot; see bug 1014647. That's really more of a test sanity issue though, I think. Stepping back, it sounds like we are really just hearing about a problem bringing up a guest after stop/start (resize) and/or ssh'ing into that guest.

If the problem is ssh, and we think it's due to missing network_info in the instance_info_cache, as suggested in comment 41 and comment 42, we could add some diagnostic trace to tempest on failure by getting the network info from the instance and checking it for inconsistencies, e.g. the case in comment 41 where there is no network listed but there are interfaces attached.
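
Something like the following is roughly what I have in mind for the consistency check. This is only a sketch, not tempest code: the helper name find_network_info_inconsistencies is made up, and it assumes the test has already fetched the server's addresses dict and its list of interface attachments (with port_id and fixed_ips keys) as plain Python structures. How those get fetched in tempest is left out.

```python
def find_network_info_inconsistencies(addresses, interfaces):
    """Compare a server's cached addresses with its attached interfaces.

    Hypothetical helper for illustration only.

    addresses:  dict of network name -> list of {"addr": ...} dicts
                (assumed shape of the server's "addresses" field).
    interfaces: list of dicts with "port_id" and "fixed_ips" keys
                (assumed shape of the interface attachment list).
    Returns a list of human-readable problem strings, empty if consistent.
    """
    problems = []

    # Every IP the server record claims to have, across all networks.
    cached_ips = {
        entry.get("addr")
        for entries in addresses.values()
        for entry in entries
    }

    # The case from comment 41: interfaces attached, but no network listed.
    if interfaces and not addresses:
        problems.append(
            "no networks listed but %d interface(s) attached" % len(interfaces)
        )

    # Any port IP that the server's addresses do not include is suspicious.
    for iface in interfaces:
        for fixed_ip in iface.get("fixed_ips", []):
            ip = fixed_ip.get("ip_address")
            if ip not in cached_ips:
                problems.append(
                    "port %s has fixed IP %s missing from server addresses"
                    % (iface.get("port_id"), ip)
                )

    return problems
```

On an ssh failure the test could log whatever this returns alongside the existing console output, which would at least tell us whether the info cache was empty or stale at the time of the failure.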

There have been races in the neutronv2 API code in nova around refreshing the info cache. I'm wondering if there is something in the compute manager in the resize/reboot operations where we need to refresh the info_cache from neutron and that's missing today.

We could also look at refreshing https://review.openstack.org/#/c/134689/, but changing it per Dan Smith's comments that we don't need to refresh everything, just instance_info_cache. I can take a look at that.