Comment 16 for bug 1452641

Corey Bryant (corey.bryant) wrote :

Just to summarize my understanding, and perhaps clarify for others: this bug is focused on stale connection_info for rbd volumes (not rbd images). rbd images have a related issue during live migration that is being handled in a separate bug (see comment 12 above).

Focusing now on connection_info for rbd volumes (thanks to Matt Riedemann's comments for the tips here): connection_info appears to be properly refreshed for live migration in pre_live_migration(), where _get_instance_block_device_info() is called with refresh_conn_info=True (see comment 9 above and https://github.com/openstack/nova/blob/stable/queens/nova/compute/manager.py#L5977).

Is the fix as simple as flipping refresh_conn_info=False to True for some of the other calls to _get_instance_block_device_info()? Below is an audit of the _get_instance_block_device_info() calls.

Calls to _get_instance_block_device_info() with refresh_conn_info=False:
  _destroy_evacuated_instances()
  _init_instance()
  _resume_guests_state()
  _shutdown_instance()
  _power_on()
  _do_rebuild_instance()
  reboot_instance()
  revert_resize()
  _resize_instance()
  resume_instance()
  shelve_offload_instance()
  check_can_live_migrate_source()
  _do_live_migration()
  _post_live_migration()
  post_live_migration_at_destination()
  rollback_live_migration_at_destination()

Calls to _get_instance_block_device_info() with refresh_conn_info=True:
  finish_revert_resize()
  _finish_resize()
  pre_live_migration()
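
For anyone who wants to repeat the audit above against another branch, here is a rough sketch of how it could be scripted with Python's ast module (Python 3.9+ for ast.unparse). The helper name audit_refresh_conn_info and the SAMPLE source are made up for illustration and are not real nova code. The sketch assumes that a call without the refresh_conn_info keyword gets the default of False, and it ignores the flag being passed positionally.

```python
import ast

TARGET = "_get_instance_block_device_info"


def audit_refresh_conn_info(source):
    """Map each enclosing function name to the refresh_conn_info values it passes.

    Hypothetical helper. Calls inside nested functions are reported under
    both the inner and the outer function name.
    """
    results = {}
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.Call):
                continue
            callee = node.func
            if isinstance(callee, ast.Attribute):
                name = callee.attr          # e.g. self._get_instance_block_device_info
            else:
                name = getattr(callee, "id", None)
            if name != TARGET:
                continue
            # Assumption: an omitted keyword means the default, False.
            value = "False (default)"
            for kw in node.keywords:
                if kw.arg == "refresh_conn_info":
                    value = ast.unparse(kw.value)
            results.setdefault(func.name, []).append(value)
    return results


# Simplified stand-in for manager.py, only to exercise the helper.
SAMPLE = '''
class ComputeManager:
    def _power_on(self, context, instance):
        bdi = self._get_instance_block_device_info(context, instance)

    def pre_live_migration(self, context, instance):
        bdi = self._get_instance_block_device_info(
            context, instance, refresh_conn_info=True)
'''

for name, values in sorted(audit_refresh_conn_info(SAMPLE).items()):
    print(name, values)
```

To run it against a real checkout, pass the contents of nova/compute/manager.py as the source string in place of SAMPLE.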

Based on xavpaice's comments (see comment 13 above: "... existing, running, instances were fine, fresh new instances were fine, but when we stopped instances via nova, then started them again, they failed to start ..."), it would seem that the following should also have refresh_conn_info=True:
  _power_on() # solves xavpaice's scenario?
  _do_rebuild_instance()
  reboot_instance()