Static Ceph mon IP addresses in connection_info can prevent VM startup

Bug #1452641 reported by Arne Wiebalck on 2015-05-07
This bug affects 9 people
Affects: OpenStack Compute (nova)

Bug Description

The Cinder rbd driver extracts the IP addresses of the Ceph mon servers from the Ceph monmap when the instance/volume connection is established. This info is then stored in nova's block-device-mapping table and is never re-validated afterwards.
Changing the Ceph mon servers' IP addresses will prevent the instance from booting, because the stale connection info ends up in the instance's XML. One idea to fix this would be to use the monitor address from ceph.conf directly, since that entry should be an alias or a load balancer.

Josh Durgin (jdurgin) wrote :

Nova stores the volume connection info in its DB, so updating it there
would be a workaround that lets VMs be restarted or migrated again.
Otherwise, running VMs shouldn't be affected, since they'll notice any
new or deleted monitors through their existing connection to the
monitor cluster.
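As a rough illustration of that workaround, the sketch below rewrites the monitor list inside a stored connection_info JSON blob. It assumes the rbd connection_info has the shape {"driver_volume_type": "rbd", "data": {"hosts": [...], "ports": [...], ...}}, with ports stored as strings; the helper name replace_mon_hosts and the default port are hypothetical, and reading and writing nova's DB is deliberately left out.

```python
import json
from typing import List


def replace_mon_hosts(connection_info_json: str, new_hosts: List[str],
                      port: str = "6789") -> str:
    """Return a copy of an rbd connection_info blob with new monitor hosts.

    Hypothetical helper. Assumes the blob looks like
    {"driver_volume_type": "rbd", "data": {"hosts": [...], "ports": [...]}}.
    """
    info = json.loads(connection_info_json)
    if info.get("driver_volume_type") != "rbd":
        # Only rbd-backed volumes carry Ceph monitor addresses; leave others untouched.
        return connection_info_json
    data = info["data"]
    data["hosts"] = list(new_hosts)
    # Keep exactly one port entry per host so the two lists stay aligned.
    data["ports"] = [port] * len(new_hosts)
    return json.dumps(info)


old = json.dumps({"driver_volume_type": "rbd",
                  "data": {"name": "volumes/volume-1",
                           "hosts": ["10.0.0.1", "10.0.0.2"],
                           "ports": ["6789", "6789"]}})
new = replace_mon_hosts(old, ["10.1.0.1", "10.1.0.2", "10.1.0.3"])
print(json.loads(new)["data"]["hosts"])  # ['10.1.0.1', '10.1.0.2', '10.1.0.3']
```

Every other key in the blob (volume name, auth settings, and so on) is carried over unchanged, so only the stale monitor addresses are replaced.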

Perhaps the most general way to fix this would be for Cinder to return
any monitor hosts listed in ceph.conf (as they are listed, so they may
be hostnames or IPs) in addition to the IPs from the current monmap
(the current behavior).

That way an out of date ceph.conf is less likely to cause problems,
and multiple clusters could still be used with the same nova node.
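A minimal sketch of that proposal, assuming the caller already has the raw mon_host value from ceph.conf and the list of monmap IPs: monmap addresses come first (the current behavior), followed by the ceph.conf entries kept verbatim, with duplicates dropped. The helper name merge_mon_hosts is hypothetical, and it only handles a plain comma- or space-separated list, not msgr2-style "[v2:...,v1:...]" entries or host:port forms.

```python
import re
from typing import List


def merge_mon_hosts(conf_mon_host: str, monmap_ips: List[str]) -> List[str]:
    """Combine current monmap IPs with the mon_host entries from ceph.conf.

    Hypothetical helper: monmap IPs keep their position at the front,
    ceph.conf entries are appended as written (hostnames stay hostnames),
    and exact duplicates are skipped.
    """
    # Split on commas and/or whitespace; drop empty fragments.
    conf_entries = [e for e in re.split(r"[,\s]+", conf_mon_host) if e]
    merged: List[str] = []
    for host in list(monmap_ips) + conf_entries:
        if host not in merged:
            merged.append(host)
    return merged


print(merge_mon_hosts("mon.example.org, 10.0.0.1", ["10.0.0.1", "10.0.0.2"]))
# ['10.0.0.1', '10.0.0.2', 'mon.example.org']
```

Keeping the ceph.conf entries unresolved means a DNS alias or load-balancer name would be looked up again at attach time rather than frozen as an IP.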

Changed in cinder:
importance: Undecided → Medium
status: New → Confirmed
Eric Harney (eharney) on 2015-05-07
tags: added: ceph

The problem with adding hosts to the list in Cinder is that those previous mon hosts might be re-used in another Ceph cluster, causing an authentication error when a VM tries an incorrect mon host at boot time. (This is because the Ceph client does not try another monitor after an authentication error, which I think is rather sane.)

Bin Zhou (binzhou) on 2016-03-07
Changed in cinder:
assignee: nobody → Bin Zhou (binzhou)

Unassigning due to no activity.

Changed in cinder:
assignee: Bin Zhou (binzhou) → nobody
Eric Harney (eharney) on 2016-11-08
tags: added: drivers
Changed in cinder:
assignee: nobody → Jon Bernard (jbernard)
Kevin Fox (kevpn) wrote :

How are you supposed to deal with needing to re-IP mons?

Unassigning due to no activity for > 6 months.

Changed in cinder:
assignee: Jon Bernard (jbernard) → nobody
Matt Riedemann (mriedem) wrote :

Talked about this at the Queens PTG; notes are in here:

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
no longer affects: cinder
tags: added: volumes
removed: drivers