Static Ceph mon IP addresses in connection_info can prevent VM startup

Bug #1452641 reported by Arne Wiebalck on 2015-05-07
This bug affects 16 people
Affects: OpenStack Compute (nova)  Status: In Progress  Importance: Medium  Assigned to: Corey Bryant
Affects: nova (Ubuntu)  Status: In Progress  Importance: Medium  Assigned to: Corey Bryant

Bug Description

The Cinder rbd driver extracts the IP addresses of the Ceph mon servers from the Ceph mon map when the instance/volume connection is established. This info is then stored in nova's block-device-mapping table and is never re-validated down the line.
Changing the Ceph mon servers' IP addresses will prevent the instance from booting, as the stale connection info will enter the instance's XML. One idea to fix this would be to use the information from ceph.conf directly, which should point to an alias or a load balancer.
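
For illustration, the connection_info blob that the Cinder rbd driver returns and that nova then persists in the block_device_mapping table looks roughly like the following (a simplified sketch using the pool/volume name and mon IPs from the XML quoted later in this bug; exact keys can vary by release):

~~~
# Illustrative only: approximate shape of the rbd connection_info that
# ends up stored, verbatim, in nova's block_device_mapping table.
connection_info = {
    "driver_volume_type": "rbd",
    "data": {
        "name": "volumes/volume-6d04520d-0029-499c-af81-516a7ba37a54",
        # Mon addresses captured from the monmap at attach time; these
        # are never re-validated, so a mon re-IP leaves them stale.
        "hosts": ["192.168.200.12", "192.168.200.14", "192.168.200.24"],
        "ports": ["6789", "6789", "6789"],
        "auth_enabled": True,
        "auth_username": "nova",
    },
}
~~~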

Josh Durgin (jdurgin) wrote :

Nova stores the volume connection info in its db, so updating that
would be a workaround to allow restart/migration of vms to work.
Otherwise running vms shouldn't be affected, since they'll notice any
new or deleted monitors through their existing connection to the
monitor cluster.

Perhaps the most general way to fix this would be for cinder to return
any monitor hosts listed in ceph.conf (as they are listed, so they may
be hostnames or ips) in addition to the ips from the current monmap
(the current behavior).

That way an out of date ceph.conf is less likely to cause problems,
and multiple clusters could still be used with the same nova node.
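
A minimal sketch of what Josh is suggesting, assuming a hypothetical helper inside the Cinder rbd driver (this is not the actual Cinder code): read the mon host entries from ceph.conf as written and append them to the IPs taken from the current monmap.

~~~
# Sketch only (hypothetical helper, not the real Cinder rbd driver code):
# return the monmap IPs first (current behaviour), then any hosts listed
# in ceph.conf that are not already present, as written (hostnames or IPs).
import configparser

def merged_mon_hosts(monmap_ips, ceph_conf="/etc/ceph/ceph.conf"):
    parser = configparser.ConfigParser()
    parser.read(ceph_conf)
    # ceph.conf may spell the option "mon_host" or "mon host".
    conf_value = (parser.get("global", "mon_host", fallback="")
                  or parser.get("global", "mon host", fallback=""))
    conf_hosts = [h.strip() for h in conf_value.split(",") if h.strip()]
    merged = list(monmap_ips)
    merged += [h for h in conf_hosts if h not in merged]
    return merged
~~~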

Changed in cinder:
importance: Undecided → Medium
status: New → Confirmed
Eric Harney (eharney) on 2015-05-07
tags: added: ceph

The problem with adding hosts to the list in Cinder is that those previous mon hosts might be re-used in another Ceph cluster, thereby causing an authentication error when a VM tries an incorrect mon host at boot time. (This is due to the Ceph client's behaviour of not trying another monitor after an authentication error... which I think is rather sane.)

Bin Zhou (binzhou) on 2016-03-07
Changed in cinder:
assignee: nobody → Bin Zhou (binzhou)

Unassigning due to no activity.

Changed in cinder:
assignee: Bin Zhou (binzhou) → nobody
Eric Harney (eharney) on 2016-11-08
tags: added: drivers
Changed in cinder:
assignee: nobody → Jon Bernard (jbernard)
Kevin Fox (kevpn) wrote :

How are you supposed to deal with needing to re-IP mons?

Unassigning due to no activity for > 6 months.

Changed in cinder:
assignee: Jon Bernard (jbernard) → nobody
Matt Riedemann (mriedem) wrote :

Talked about this at the Queens PTG; notes are in here:

https://etherpad.openstack.org/p/cinder-ptg-queens

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
no longer affects: cinder
tags: added: volumes
removed: drivers
Walt Boring (walter-boring) wrote :

I have a customer that is seeing something similar to this. I thought about filing a new bug, but it might be sufficient to just piggy-back on this one.

They have running VMs that boot from a Ceph volume and also have additional Ceph volumes attached.
He adds a new monitor to his ceph cluster and updates ceph.conf on all of the openstack nodes to reflect the new monitor IP.

He does a live migration to try to get nova to update the libvirt.xml, and it seems that only the disk from the volumes pool is updated, not the disk from the vms pool.

He added a patch to migration.py to fix this, but wasn't sure it was the right thing to do. I have added his patch as an attachment here.
Let me know if this might be ok, and I can submit the patch to gerrit.

This is a copy of xml after the live migrate.

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='nova'>
        <secret type='ceph' uuid='820ccd0b-b180-4528-93ed-76ae82edf832'/>
      </auth>
      <source protocol='rbd' name='vms/3b97914e-3f9b-410a-b3d9-6c1a83244136_disk'> <-- this one is NOT changed, old ips
        <host name='192.168.200.12' port='6789'/>
        <host name='192.168.200.14' port='6789'/>
        <host name='192.168.200.24' port='6789'/>
        <host name='192.168.240.17' port='6789'/>
        <host name='192.168.240.23' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='nova'>
        <secret type='ceph' uuid='820ccd0b-b180-4528-93ed-76ae82edf832'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-6d04520d-0029-499c-af81-516a7ba37a54'> <-- this one is changed, new ips
        <host name='192.168.200.12' port='6789'/>
        <host name='192.168.200.14' port='6789'/>
        <host name='192.168.200.24' port='6789'/>
        <host name='192.168.210.15' port='6789'/>
        <host name='192.168.240.17' port='6789'/>
        <host name='192.168.240.23' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>6d04520d-0029-499c-af81-516a7ba37a54</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

Matt Riedemann (mriedem) wrote :

That patch is way too rbd-specific, I think. Here is a more detailed conversation we had in IRC, which also goes over some of what was discussed at the Queens PTG:

http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-01-04.log.html#t2018-01-04T22:26:24

Lee Yarwood (lyarwood) wrote :

~~~
<source protocol='rbd' name='vms/3b97914e-3f9b-410a-b3d9-6c1a83244136_disk'> <-- this one is NOT changed, old ips
        <host name='192.168.200.12' port='6789'/>
        <host name='192.168.200.14' port='6789'/>
        <host name='192.168.200.24' port='6789'/>
        <host name='192.168.240.17' port='6789'/>
        <host name='192.168.240.23' port='6789'/>
</source>
~~~

For ephemeral rbd images we fetch the mon ips during the initial instance creation but don't refresh this during live migration [1]. IMHO this is a separate issue from the volume connection_info refresh problem being discussed in this bug.

[1] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L163
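
For context, the ephemeral-image case collects the mon addresses once, when the instance disk is first created, by asking the cluster for its monmap; something like the following simplified sketch (not the exact nova code linked at [1]):

~~~
# Simplified sketch of how mon addresses are collected for ephemeral rbd
# disks (not the exact nova rbd_utils code): query the live monmap once
# and bake the resulting host/port pairs into the guest XML.
import json
import subprocess

def get_mon_addrs(conf="/etc/ceph/ceph.conf", user="nova"):
    out = subprocess.check_output(
        ["ceph", "mon", "dump", "--format=json", "--conf", conf, "--id", user])
    hosts, ports = [], []
    for mon in json.loads(out).get("mons", []):
        addr = mon["addr"]                  # e.g. "192.168.200.12:6789/0"
        host, _, rest = addr.rpartition(":")
        hosts.append(host)
        ports.append(rest.split("/")[0])
    return hosts, ports
~~~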

Walt Boring (walter-boring) wrote :

Thanks Lee,
  I filed a separate bug for updating the rbd images here:
https://bugs.launchpad.net/nova/+bug/1741364

Xav Paice (xavpaice) on 2018-06-07
tags: added: canonical-bootstack
Xav Paice (xavpaice) wrote :

This manifested itself again on a Mitaka cloud. We had moved the Ceph mons; existing, running instances were fine and fresh new instances were fine, but when we stopped instances via nova and then started them again, they failed to start. Editing the xml didn't fix anything, of course, because Nova overwrites the xml on machine start.

I ended up fixing the nova db:

update block_device_mapping
   set connection_info = replace(connection_info,
       '"a.b.c.d", "a.b.c.e", "a.b.c.f"',
       '"a.b.c.foo", "a.b.c.bar", "a.b.c.baz"')
 where connection_info like '%a.b.c.d%'
   and deleted_at is NULL;

The select query could have been better (don't copy me!) but you get the point.
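
A slightly safer variant of the same workaround is to load and rewrite the connection_info JSON rather than doing a raw string replace. A rough sketch (hypothetical script with made-up credentials; back up and test against a copy of the nova database first):

~~~
# Rough sketch, not a supported tool: rewrite the mon host list inside
# the connection_info JSON of non-deleted block_device_mapping rows.
import json
import pymysql

OLD_TO_NEW = {"a.b.c.d": "a.b.c.foo", "a.b.c.e": "a.b.c.bar", "a.b.c.f": "a.b.c.baz"}

conn = pymysql.connect(host="localhost", user="nova", password="secret", db="nova")
with conn.cursor() as cur:
    cur.execute("SELECT id, connection_info FROM block_device_mapping "
                "WHERE deleted_at IS NULL AND connection_info IS NOT NULL")
    for row_id, raw in cur.fetchall():
        info = json.loads(raw)
        hosts = info.get("data", {}).get("hosts", [])
        if not any(h in OLD_TO_NEW for h in hosts):
            continue
        info["data"]["hosts"] = [OLD_TO_NEW.get(h, h) for h in hosts]
        cur.execute("UPDATE block_device_mapping SET connection_info = %s "
                    "WHERE id = %s", (json.dumps(info), row_id))
conn.commit()
~~~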

Subscribing field-high because this is something that will continue to bite folks every time ceph-mon hosts are moved around.

James Page (james-page) wrote :

I guess the alternative is to update the mapping for the block device on a stop/start nova operation.

tags: added: patch
Corey Bryant (corey.bryant) wrote :

Just to summarize my understanding, and perhaps clarify for others, this bug is focused on stale connection_info for rbd volumes (not rbd images). rbd images have a related issue during live migration that is being handled in a separate bug (see comment 12 above).

Focusing now on connection_info for rbd volumes (thanks to Matt Riedemann's comments for the tips here): connection_info appears to be properly refreshed for live migration in pre_live_migration(), where _get_instance_block_device_info() is called with refresh_conn_info=True (see comment 9 above and https://github.com/openstack/nova/blob/stable/queens/nova/compute/manager.py#L5977).

Is the fix as simple as flipping refresh_conn_info=False to True for some of the other calls to _get_instance_block_device_info()? Below is an audit of the _get_instance_block_device_info() calls.

Calls to _get_instance_block_device_info() with refresh_conn_info=False:
  _destroy_evacuated_instances()
  _init_instance()
  _resume_guests_state()
  _shutdown_instance()
  _power_on()
  _do_rebuild_instance()
  reboot_instance()
  revert_resize()
  _resize_instance()
  resume_instance()
  shelve_offload_instance()
  check_can_live_migrate_source()
  _do_live_migration()
  _post_live_migration()
  post_live_migration_at_destination()
  rollback_live_migration_at_destination()

Calls to _get_instance_block_device_info() with refresh_conn_info=True:
  finish_revert_resize()
  _finish_resize()
  pre_live_migration()

Based on xavpaice's comments (see comment 13 above: "... existing, running, instances were fine, fresh new instances were fine, but when we stopped instances via nova, then started them again, they failed to start ..."), it would seem that the following should also use refresh_conn_info=True (a sketch of the change follows the list below):
  _power_on() # solves xavpaice's scenario?
  _do_rebuild_instance()
  reboot_instance()
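
As a rough illustration of the kind of change being proposed for the calls above (simplified; the actual signatures and call sites in nova/compute/manager.py vary between releases):

~~~
# Simplified illustration only, not the actual nova patch: have
# _power_on() ask Cinder for fresh connection_info instead of reusing
# the copy cached in nova's block_device_mapping table.
def _power_on(self, context, instance):
    network_info = self.network_api.get_instance_nw_info(context, instance)
    block_device_info = self._get_instance_block_device_info(
        context, instance, refresh_conn_info=True)  # previously left at the False default
    self.driver.power_on(context, instance, network_info, block_device_info)
~~~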

Xav Paice (xavpaice) wrote :

FWIW, in the cloud where we saw this, migrating the (stopped) instance also updated the connection info; it was just that migrating hundreds of instances wasn't practical.

Changed in nova:
assignee: nobody → Corey Bryant (corey.bryant)
Changed in nova (Ubuntu):
assignee: nobody → Corey Bryant (corey.bryant)
Corey Bryant (corey.bryant) wrote :

I did some initial testing with the default value of refresh_conn_info set to True in _get_instance_block_device_info(), and unfortunately an instance with an rbd volume attached still does not successfully stop/start after the ceph-mons are moved to new IP addresses.

Fix proposed to branch: master
Review: https://review.openstack.org/579004

Changed in nova:
status: Confirmed → In Progress
Changed in nova (Ubuntu):
status: New → In Progress
importance: Undecided → Medium