nova-compute service goes down when the Ceph public network is down
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | New | Undecided | Unassigned | | |
Bug Description
Description
===========
We use Ceph as the backend storage for nova. If the Ceph public network on a compute node goes down, the nova-compute process hangs and stops reporting its heartbeat, so the node's nova-compute service is marked as down.
Cause of the problem:
A nova-compute periodic task needs to check disk usage. To do so, it connects to the Ceph cluster through the rados client, and this connection attempt hangs when the Ceph public network is down.
The relevant code is shown below:
def _connect_
    client = rados.Rados(
    try:
        # NOTE(luogangyi): open_ioctx >= 10.1.0 could handle unicode
        # arguments perfectly as part of Python 3 support.
        # Therefore, when we turn to Python 3, it's safe to remove
        # str() conversion.
        ioctx = client.
        return client, ioctx
    except rados.Error:
        # shutdown cannot raise an exception
        raise
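
To illustrate the failure mode, here is a minimal sketch of a periodic check that calls a blocking connect, and of how bounding the wait keeps the caller responsive. Nothing here is nova or rados code: `fake_blocking_connect`, `call_with_timeout`, `periodic_disk_check` and `ConnectTimeout` are hypothetical names, the blocked network is simulated with a `threading.Event`, and plain threads are assumed rather than whatever concurrency model nova-compute actually uses. It is not the fix proposed in this report, which is a client-side timeout setting.

```python
import threading


class ConnectTimeout(Exception):
    """Raised when the (hypothetical) connect call does not return in time."""


# Simulates the storage network: while unset, the connect call blocks.
network_restored = threading.Event()


def fake_blocking_connect():
    # Stand-in for a storage client connect that blocks while the network is down.
    network_restored.wait()
    return "connected"


def call_with_timeout(func, timeout_s):
    """Run func() in a daemon thread and wait at most timeout_s seconds."""
    result = {}

    def worker():
        try:
            result["value"] = func()
        except Exception as exc:  # hand the error back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The worker is abandoned, not cancelled; it may stay blocked.
        raise ConnectTimeout("call did not return within %.1fs" % timeout_s)
    if "error" in result:
        raise result["error"]
    return result["value"]


def periodic_disk_check(timeout_s=0.2):
    """Hypothetical periodic task: report a status instead of hanging."""
    try:
        call_with_timeout(fake_blocking_connect, timeout_s)
        return "ok"
    except ConnectTimeout:
        return "storage unreachable"
```

Without the bounded wait, `periodic_disk_check` would block for as long as the connect call does, which matches the behaviour described above: the periodic task never returns and the heartbeat is never sent. Note that the abandoned worker thread stays blocked, so a timeout inside the client itself is still the cleaner approach.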
The timeout parameter of client.connect() has been abandoned since the Ceph Nautilus release; instead, use client_
We should set client_