Comment 5 for bug 2039955

Samuel Allan (samuelallan) wrote :

Definitely an upstream issue, not related to the ceph-dashboard charm.

Exploring the ceph repository:

`src/pybind/mgr/dashboard/controllers/nfs.py`

```
    @Endpoint()
    @ReadPermission
    def status(self):
        status = {'available': True, 'message': None}
        try:

            # The call below is what triggers the crash. The crash itself comes from
            # the ceph nfs module, though; it is not the fault of this dashboard code.
            # NOTE: running `sudo ceph nfs cluster ls` prints:
            # Error ENOENT: No orchestrator configured (try `ceph orch set backend`)
            # but does not show a traceback.
            # This may be limited to the Python API?
            mgr.remote('nfs', 'cluster_ls')

        except (ImportError, RuntimeError) as error:
            logger.exception(error)
            status['available'] = False
            status['message'] = str(error) # type: ignore

        return status
```

When the orchestrator is not present, we see this crash report:

```
{
    "archived": "2023-11-20 04:58:57.151697",
    "backtrace": [
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 169, in cluster_ls\n return available_clusters(self)",
        " File \"/usr/share/ceph/mgr/nfs/utils.py\", line 38, in available_clusters\n completion = mgr.describe_service(service_type='nfs')",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1488, in inner\n completion = self._oremote(method_name, args, kwargs)",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1555, in _oremote\n raise NoOrchestrator()",
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-11-20T04:47:16.737623Z_8a944527-1cc1-4ed5-b58b-86bf97bcf3b1",
    "entity_name": "mgr.juju-108031-1-lxd-1",
    "mgr_module": "nfs",
    "mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
    "mgr_python_exception": "NoOrchestrator",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04.3 LTS",
    "os_version": "22.04.3 LTS (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mgr",
    "stack_sig": "b01db59d356dd52f69bfb0b128a216e7606f54a60674c3c82711c23cf64832ce",
    "timestamp": "2023-11-20T04:47:16.737623Z",
    "utsname_hostname": "juju-108031-1-lxd-1",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-88-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023"
}

```

I guess this is the part that maps directly to the `cluster_ls` method:
```
"mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
```

This is `cluster_ls`, in `src/pybind/mgr/nfs/module.py`.

```
    # this raises an error, causing a module crash, if orchestrator is not available
    def cluster_ls(self) -> List[str]:
        return available_clusters(self)
```

^ This is the root of the traceback we're seeing.

I guess the reason we're seeing a crash is that this method doesn't catch any errors raised by `available_clusters`.
For reference, the other methods I've checked here do handle the error.
For example:

(in `src/pybind/mgr/nfs/cluster.py`, called from the `ceph nfs cluster ls` handler, `_cmd_nfs_cluster_ls()`, in `src/pybind/mgr/nfs/module.py`)

```
    def list_nfs_cluster(self) -> List[str]:
        try:
            return available_clusters(self.mgr)
        except Exception as e:
            log.exception("Failed to list NFS Cluster")
            raise ErrorResponse.wrap(e)
```

I tried the same pattern of catching the error and raising `ErrorResponse` within `cluster_ls`,
but that still resulted in a crash:

```
{
    "backtrace": [
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 173, in cluster_ls\n return available_clusters(self)",
        " File \"/usr/share/ceph/mgr/nfs/utils.py\", line 38, in available_clusters\n completion = mgr.describe_service(service_type='nfs')",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1488, in inner\n completion = self._oremote(method_name, args, kwargs)",
        " File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 1555, in _oremote\n raise NoOrchestrator()",
        "orchestrator._interface.NoOrchestrator: No orchestrator configured (try `ceph orch set backend`)",
        "\nThe above exception was the direct cause of the following exception:\n",
        "Traceback (most recent call last):",
        " File \"/usr/share/ceph/mgr/nfs/module.py\", line 175, in cluster_ls\n raise ErrorResponse.wrap(e)",
        "object_format.ErrorResponse: No orchestrator configured (try `ceph orch set backend`)"
    ],
    "ceph_version": "17.2.6",
    "crash_id": "2023-11-20T04:59:04.018086Z_2a16b6a4-85e5-49ee-93f0-c1b552f1df06",
    "entity_name": "mgr.juju-108031-1-lxd-1",
    "mgr_module": "nfs",
    "mgr_module_caller": "ActivePyModule::dispatch_remote cluster_ls",
    "mgr_python_exception": "ErrorResponse",
    "os_id": "22.04",
    "os_name": "Ubuntu 22.04.3 LTS",
    "os_version": "22.04.3 LTS (Jammy Jellyfish)",
    "os_version_id": "22.04",
    "process_name": "ceph-mgr",
    "stack_sig": "6a64a2a392fc0ad969c705c51ccec3206fab079f3c53ef566d1ed1d6f5088851",
    "timestamp": "2023-11-20T04:59:04.018086Z",
    "utsname_hostname": "juju-108031-1-lxd-1",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-88-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023"
}
```
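
To illustrate why wrapping the error did not help, here is a minimal standalone sketch. It does not use any real ceph code: `NoOrchestrator`, `ErrorResponse`, `available_clusters` and `dispatch_remote` are simplified stand-ins I made up for this example, and the assumption that the remote dispatch treats any escaping exception as a crash is inferred from the two crash reports above, not confirmed from the ceph source.

```python
from typing import Callable, Dict, List


class NoOrchestrator(Exception):
    """Stand-in for orchestrator._interface.NoOrchestrator."""


class ErrorResponse(Exception):
    """Simplified stand-in for object_format.ErrorResponse."""


def available_clusters() -> List[str]:
    # Simulates the failure seen when no orchestrator backend is set.
    raise NoOrchestrator("No orchestrator configured (try `ceph orch set backend`)")


def cluster_ls() -> List[str]:
    try:
        return available_clusters()
    except Exception as e:
        # Wrapping changes the exception type, but an exception
        # still escapes the method.
        raise ErrorResponse(str(e)) from e


def dispatch_remote(method: Callable[[], List[str]]) -> Dict[str, object]:
    # Hypothetical stand-in for the remote dispatch: ASSUMES that any
    # exception escaping the called method is recorded as a crash.
    try:
        return {"crashed": False, "result": method()}
    except Exception as e:
        return {"crashed": True, "exception": type(e).__name__}


outcome = dispatch_remote(cluster_ls)
print(outcome)
```

Under that assumption, the only thing the wrapping changes is the reported exception name, which matches `mgr_python_exception` going from `NoOrchestrator` to `ErrorResponse` between the two reports.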

I'm not sure what pattern is required for this kind of remote module method call, where it's not a CLI command.
We still need to convey an error response to the remote caller (e.g. ceph-dashboard in this case),
but without "crashing".
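
One possible direction, purely as a sketch: have the remotely called method return the error as data instead of raising, and let the caller inspect it. Everything below is hypothetical and standalone; `RemoteResult`, `cluster_ls_safe` and the stand-in `available_clusters` are names invented for this example, not existing ceph APIs, and I have not checked whether upstream would accept this approach.

```python
import logging
from typing import Dict, List, NamedTuple, Optional

log = logging.getLogger(__name__)


class NoOrchestrator(Exception):
    """Stand-in for orchestrator._interface.NoOrchestrator."""


class RemoteResult(NamedTuple):
    """Hypothetical return type carrying either data or an error message."""
    clusters: List[str]
    error: Optional[str]


def available_clusters() -> List[str]:
    # Simulates the failure seen when no orchestrator backend is set.
    raise NoOrchestrator("No orchestrator configured (try `ceph orch set backend`)")


def cluster_ls_safe() -> RemoteResult:
    # Nothing escapes this method: failures are logged and returned as data.
    try:
        return RemoteResult(available_clusters(), None)
    except Exception as e:
        log.warning("Failed to list NFS clusters: %s", e)
        return RemoteResult([], str(e))


def status() -> Dict[str, object]:
    # Caller side (the dashboard in this bug): build the status dict
    # from the returned error instead of from a caught exception.
    result = cluster_ls_safe()
    return {"available": result.error is None, "message": result.error}


print(status())
```

The obvious drawback is that changing the return type of `cluster_ls` would affect every existing caller, so a real fix might need a separate method or a different mechanism entirely.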