Opening NFS tab in the dashboard leads to ceph mgr crash - orchestrator._interface.NoOrchestrator: No orchestrator configured

Bug #2039955 reported by Nobuto Murata
This bug affects 2 people
Affects               Status     Importance  Assigned to  Milestone
Ceph Dashboard Charm  New        Undecided   Unassigned
ceph (Ubuntu)         Confirmed  Undecided   Unassigned

Bug Description

Whenever the NFS tab in the Ceph dashboard is opened, a NoOrchestrator exception is raised and recorded as a ceph-mgr module crash (although it's not an actual process crash).

Other tabs that require the orchestrator handle the situation well: they print the following message and no exception is raised.

====
Orchestrator is not available
Orchestrator is unavailable: No orchestrator configured (try `ceph orch set backend`)
Please consult the documentation on how to configure and enable the management functionality.
====

With the NFS tab, in contrast, an exception is raised.

https://dashboard.example.com:8443/#/nfs
====
NFS-Ganesha is not configured

Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/nfs/module.py", line 169, in cluster_ls
    return available_clusters(self)
  File "/usr/share/ceph/mgr/nfs/utils.py", line 38, in available_clusters
    completion = mgr.describe_service(service_type='nfs')
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 1488, in inner
    completion = self._oremote(method_name, args, kwargs)
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 1555, in _oremote
    raise NoOrchestrator()
orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)
Please consult the documentation on how to configure and enable the management functionality.
====

# ceph health
HEALTH_WARN 2 mgr modules have recently crashed

# ceph crash ls
ID ENTITY NEW
2023-10-20T00:40:40.362363Z_2f461bb5-343c-4cb4-8134-99ae29ddc60c mgr.juju-ffeb43-0-lxd-0 *
2023-10-20T02:24:37.980204Z_9bf106e2-0dd2-4a88-b0f4-647dfa82697f mgr.juju-ffeb43-0-lxd-0 *

# ceph crash info 2023-10-20T00:40:40.362363Z_2f461bb5-343c-4cb4-8134-99ae29ddc60c
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 169, in cluster_ls\n return available_clusters(self)",
        " File \"/usr/share/ceph/mgr/nfs/utils.py\", line 38, in available_clusters\n completion = mgr.describe_service(service_type='nfs')",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1488, in inner\n completion = self._oremote(method_name, args, kwargs)",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1555, in _oremote\n raise NoOrchestrator()",
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-10-20T00:40:40.362363Z_2f461bb5-343c-4cb4-8134-99ae29ddc60c",
    "entity_name": "mgr.juju-ffeb43-0-lxd-0",
    "mgr_module": "nfs",
    "mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
    "mgr_python_exception": "NoOrchestrator",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04.3 LTS",
    "os_version": "22.04.3 LTS (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mgr",
    "stack_sig": "b01db59d356dd52f69bfb0b128a216e7606f54a60674c3c82711c23cf64832ce",
    "timestamp": "2023-10-20T00:40:40.362363Z",
    "utsname_hostname": "juju-ffeb43-0-lxd-0",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-87-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#97-Ubuntu SMP Mon Oct 2 21:09:21 UTC 2023"
}

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: ceph-mgr-dashboard 17.2.6-0ubuntu0.22.04.1
ProcVersionSignature: Ubuntu 5.15.0-87.97-generic 5.15.122
Uname: Linux 5.15.0-87-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: lxd
CloudName: lxd
CloudPlatform: lxd
CloudSubPlatform: LXD socket API v. 1.0 (/dev/lxd/sock)
Date: Fri Oct 20 09:49:25 2023
PackageArchitecture: all
ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: ceph
UpgradeStatus: No upgrade log present (probably fresh install)

tags: added: field-ceph-dashboard
Nobuto Murata (nobuto) wrote :

Subscribing ~field-high.

Even though it may be an upstream issue, we should look into this: whenever somebody clicks the tab, the whole Ceph cluster status turns into HEALTH_WARN, which will trigger alerts for operators.

Nobuto Murata (nobuto) wrote :

The test cluster was deployed with the steps in:
https://bugs.launchpad.net/charm-ceph-dashboard/+bug/2039763/comments/1

Samuel Allan (samuelallan) wrote :

Definitely an upstream issue, not related to the ceph-dashboard charm.

Exploring the ceph repository:

`src/pybind/mgr/dashboard/controllers/nfs.py`

```
    @Endpoint()
    @ReadPermission
    def status(self):
        status = {'available': True, 'message': None}
        try:
            # This is the call that triggers the crash; the crash comes
            # from the ceph nfs mgr module, not from this controller.
            # NOTE: running `sudo ceph nfs cluster ls` prints:
            #   Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
            # but does not show a traceback.
            # This may be limited to the python API?
            mgr.remote('nfs', 'cluster_ls')
        except (ImportError, RuntimeError) as error:
            logger.exception(error)
            status['available'] = False
            status['message'] = str(error)  # type: ignore

        return status
```

When the orchestrator is not present, we see this traceback:

```
{
    "archived": "2023-11-20 04:58:57.151697",
    "backtrace": [
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 169, in cluster_ls\n return available_clusters(self)",
        " File \"/usr/share/ceph/mgr/nfs/utils.py\", line 38, in available_clusters\n completion = mgr.describe_service(service_type='nfs')",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1488, in inner\n completion = self._oremote(method_name, args, kwargs)",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1555, in _oremote\n raise NoOrchestrator()",
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-11-20T04:47:16.737623Z_8a944527-1cc1-4ed5-b58b-86bf97bcf3b1",
    "entity_name": "mgr.juju-108031-1-lxd-1",
    "mgr_module": "nfs",
    "mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
    "mgr_python_exception": "NoOrchestrator",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04.3 LTS",
    "os_version": "22.04.3 LTS (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mgr",
    "stack_sig": "b01db59d356dd52f69bfb0b128a216e7606f54a60674c3c82711c23cf64832ce",
    "timestamp": "2023-11-20T04:47:16.737623Z",
    "utsname_hostname": "juju-108031-1-lxd-1",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-88-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023"
}

```

I guess this is the part that maps directly to the `cluster_ls` method:
```
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
```

This is `cluster_ls`, in `src/pybind/mgr/nfs/module.py`.

```
    # this raises an error, causing a module crash, if the orchestrator is not available
    def cluster_ls(self) -> List[str]:
        return available_clusters(self)
```

^ This is the root of the traceback we're seeing.

I guess the reason we're seeing a crash is that this method doesn't catch any errors thrown from `available_clusters`.
For reference, other methods I've checked here will handle the error.
For example:

(in `src/pybind/mgr/nfs/...

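For illustration, a minimal, self-contained sketch of the defensive handling being discussed: catch the missing-orchestrator error inside `cluster_ls` so it never escapes `dispatch_remote` and gets recorded as a module crash. The stubs below stand in for the real ceph-mgr imports, and the fallback behaviour (returning an empty list) is an assumption, not the merged fix:

```
from typing import List

# Stubs standing in for the real ceph-mgr imports:
#   from orchestrator import NoOrchestrator
#   from nfs.utils import available_clusters
class NoOrchestrator(Exception):
    pass

def available_clusters(mgr) -> List[str]:
    # Pretend no orchestrator backend is configured.
    raise NoOrchestrator("No orchestrator configured (try `ceph orch set backend`)")

class NFSModule:  # hypothetical stand-in for the nfs mgr module
    def cluster_ls(self) -> List[str]:
        try:
            return available_clusters(self)
        except NoOrchestrator:
            # No orchestrator means no NFS clusters can exist yet, so
            # report none instead of letting the exception be recorded
            # as a mgr module crash.
            return []

print(NFSModule().cluster_ls())  # -> []
```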

Luciano Lo Giudice (lmlogiudice) wrote :

I'm not sure that patch is the correct fix. Looking at how other controllers for the dashboard operate, it would appear that the `NFSGanesha.status` and `NFSGaneshaCluster.list` methods should be decorated with the `@raise_if_no_orchestrator` and `@handle_orchestrator_error` decorators, like so:

```
class NFSGanesha(RESTController):

    @Endpoint()
    @ReadPermission
    @raise_if_no_orchestrator()
    @handle_orchestrator_error('nfs')
    def status(self):
        ...
```

Samuel Allan (samuelallan) wrote :

Hmm, I see what you mean, but I'm not sure about it in the context of the status-check endpoints: the other status endpoints don't have this error handler, and the client expects a specific JSON response.

The core issue is also that the mgr module crashes whenever the cluster_ls method is called, which is what I was trying to solve by catching the missing-orchestrator error there and introducing a new method to check whether nfs is available (a rough sketch of that idea follows).
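
A rough sketch of that second idea, reusing the stubs from the earlier sketch: a hypothetical `is_available` method on the nfs module that the dashboard's `status()` endpoint could call via `mgr.remote('nfs', 'is_available')` instead of `cluster_ls`. The method name is illustrative only; see the merged PR below for the actual fix:

```
class NFSModule:
    def is_available(self) -> bool:
        """Report whether NFS management is usable, without raising."""
        try:
            available_clusters(self)
            return True
        except NoOrchestrator:
            return False
```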

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ceph (Ubuntu):
status: New → Confirmed
Ponnuvel Palaniyappan (pponnuvel) wrote :

The proposed patch [0] to fix this has been merged into main.

I have created the backport PRs:
Reef: https://github.com/ceph/ceph/pull/58283
Squid: https://github.com/ceph/ceph/pull/58285
Quincy: https://github.com/ceph/ceph/pull/58284

[0] https://github.com/ceph/ceph/pull/56876
