nova-compute service goes down when the Ceph public network is down

Bug #2060758 reported by benlei
This bug affects 1 person
Affects                    Status  Importance  Assigned to  Milestone
OpenStack Compute (nova)   New     Undecided   Unassigned

Bug Description

Description
===========
We use Ceph as the backend storage for nova. If the Ceph public network on a compute node goes down, the nova-compute process hangs and stops reporting its heartbeat, so the compute node's nova_compute service is marked down.

Cause of the problem:
A nova-compute periodic task checks disk usage, and to do so it connects to the Ceph cluster through the rados client. This connection hangs when the Ceph public network is down.
The relevant code is below:
    def _connect_to_rados(self, pool=None):
        client = rados.Rados(rados_id=self.rbd_user,
                                  conffile=self.ceph_conf)
        try:
            client.connect(timeout=self.rbd_connect_timeout)
            pool_to_open = pool or self.pool
            # NOTE(luogangyi): open_ioctx >= 10.1.0 could handle unicode
            # arguments perfectly as part of Python 3 support.
            # Therefore, when we turn to Python 3, it's safe to remove
            # str() conversion.
            ioctx = client.open_ioctx(str(pool_to_open))
            return client, ioctx
        except rados.Error:
            # shutdown cannot raise an exception
            client.shutdown()
            raise
The timeout parameter of client.connect() is no longer honored starting with the Ceph Nautilus release; the client_mount_timeout parameter in ceph.conf is used instead. So if the storage public network goes down, the rados client falls back to its default timeout mechanism: each attempt times out after 5 minutes and is retried 10 times, for a total timeout period of 50 minutes.
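The arithmetic behind the 50-minute figure can be sketched as follows. This is only an illustration using the numbers given in this report (5 minutes per attempt, 10 attempts); the function name is hypothetical, and the 60-second SERVICE_DOWN_TIME_S value is an assumption standing in for nova's service_down_time setting, which may differ in a given deployment.

```python
# Illustrative arithmetic only; the figures come from this bug report.
SERVICE_DOWN_TIME_S = 60  # assumption: nova's service_down_time, adjust to your config


def worst_case_block_seconds(single_timeout_s=300, attempts=10):
    """Return how long a connect could block if every attempt times out."""
    return single_timeout_s * attempts


total = worst_case_block_seconds()
print(f"{total} s = {total / 60:.0f} min")  # 3000 s = 50 min

# If the blocked periodic task also stalls the heartbeat, the service is
# reported down once the block outlasts the service-down window.
print(total > SERVICE_DOWN_TIME_S)  # True
```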

We should set the client_mount_timeout parameter in ceph.conf to resolve this issue.
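One way to check whether a given ceph.conf already carries this setting is sketched below. It is a minimal sketch, not part of nova or Ceph: the helper name find_client_mount_timeout is hypothetical, and it assumes the file is simple enough for Python's configparser, that the option lives in the [client] or [global] section, that spaces and underscores in option names are interchangeable, and that the value is a plain number of seconds. The 30-second value in the sample is an arbitrary example, not a recommendation.

```python
import configparser

# Sample ceph.conf content, with the option written using spaces.
SAMPLE = """
[global]
mon_host = 192.0.2.10:6789

[client]
client mount timeout = 30
"""


def find_client_mount_timeout(ceph_conf_text, sections=("client", "global")):
    """Return client_mount_timeout in seconds, or None if it is not set."""
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.read_string(ceph_conf_text)
    for section in sections:
        if not parser.has_section(section):
            continue
        for key, value in parser.items(section):
            # Treat "client mount timeout" and "client_mount_timeout" alike.
            if key.strip().replace(" ", "_") == "client_mount_timeout":
                return float(value)
    return None


print(find_client_mount_timeout(SAMPLE))  # 30.0
```

If the helper returns None, the option is not set in those sections and the client would use the default timeout behavior described above.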

Tags: ceph nova