cephfs ganesha deployment fails - no space left on device

Bug #2031106 reported by Lukas Koenen
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| OpenStack Shared File Systems Service (Manila) | Invalid | Undecided | Unassigned | |
| tripleo | New | Undecided | Manojkatari | |

Bug Description

While deploying manila using the "manila-cephfsganesha-config" heat template, the deployment fails during the "get dbus-1 file" task, which is part of the "tripleo_cephadm" ansible role (tasks/nfs.yaml), with "no space left on device". The failure happens while the container image is being pulled for container creation.

## Observations

Unlike other tasks of the role, "get dbus-1 file" runs as the default user and not as root. The default user has very limited disk space (/dev/mapper/vg-lv_home 1.2G 630M 555M 54% /home). Watching disk space during deployment shows "/dev/mapper/vg-lv_home" spiking close to 100% usage, which is likely the reason for the error.

## Possible Solution

Using "become: true", and therefore the root user, in the "get dbus-1 file" task fixes the problem.
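
A minimal sketch of what that change could look like, assuming the task is a plain command task. The command line is copied from the error output in this report; the registered variable name `dbus_ganesha_file` and the `changed_when` line are illustrative assumptions and are not taken from the role itself:

```yaml
# Sketch only: the real task in tripleo_cephadm (tasks/nfs.yaml) may differ.
- name: get dbus-1 file
  become: true  # run as root so the image is pulled into root's container storage, not under /home
  ansible.builtin.command:
    argv:
      - podman
      - run
      - --rm
      - --entrypoint=cat
      - registry:5000/tripleowallabycentos9/daemon:current-ceph
      - /etc/dbus-1/system.d/org.ganesha.nfsd.conf
  register: dbus_ganesha_file  # hypothetical variable name
  changed_when: false  # assumption: reading a file from the image changes nothing on the host
```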

## Steps to reproduce

Add "/usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsganesha-config.yaml" to the deployment.

## Error
(the registry URL has been changed for security reasons)

{
  "changed": false,
  "cmd": [
    "podman",
    "run",
    "--rm",
    "--entrypoint=cat",
    "registry:5000/tripleowallabycentos9/daemon:current-ceph",
    "/etc/dbus-1/system.d/org.ganesha.nfsd.conf"
  ],
  "delta": "0:00:20.367417",
  "end": "2023-06-16 10:56:14.870843",
  "msg": "non-zero return code",
  "rc": 125,
  "start": "2023-06-16 10:55:54.503426",
  "stderr": "Trying to pull registry:5000/tripleowallabycentos9/daemon:current-ceph...\nGetting image source signatures\nCopying blob sha256:bb9e6989315f6f108f7cab8ae301e09c9e41a982d776c25860169efaeb5f7530\nCopying blob sha256:f1ee40d9db4a2bf9b96ea48d6cb45c602a6761650f67dc84bba5a0d2495e845a\nCopying blob sha256:0d557d32f54ebd277fdffbbdf656b90442ee9d8753aec9ebac429eee967f4dee\nCopying blob sha256:6c5de04c936da27e33992af1e54e929f1cb39c8e1473d9d25ed1f1dc2d842fd4\nCopying blob sha256:49a61133ba378a1ca5e8f77b3c05ed9980f80e89ff729db22a51bb524363e6a5\nCopying blob sha256:17facd475902d6709cff908630b59271c7ad18f64c3a1d0143d438c6988504ef\nCopying blob sha256:fbd68c38accbcb670817752f4faa8f3f845774fab06dc0b5e4b8c860f787ecd3\nCopying blob sha256:a57fb466f17b81e5b74877994d78b14170898db92c980ec5842dcfcbeefb324f\nCopying blob sha256:24972a36f4e1f7d9063e3cbc4ff3c171371cdeee9529e5581ab6e352a574b818\nCopying blob sha256:f48748416b3be33005287f0c884e0f98eebd74037311be116f8eff3cd85c3d51\nCopying blob sha256:e6e5b41683139f3942ff86736d3d416020b9b38c7d118df4ccf71e6809136c65\nCopying blob sha256:93cc817d8e84cb7580ee7d1930702f1ec087a93bcb30534a0c23341344113bb4\nCopying blob sha256:6e10a1f887f8a529f097f805ce4f6f64b06f13cfb638a6e1322bebe5113a86f4\nError: writing blob: adding layer with blob \"sha256:49a61133ba378a1ca5e8f77b3c05ed9980f80e89ff729db22a51bb524363e6a5\": processing tar file(write /usr/lib64/samba/libsmbd-base-samba4.so: no space left on device): exit status 1",
  "stderr_lines": [
    "Trying to pull registry:5000/tripleowallabycentos9/daemon:current-ceph...",
    "Getting image source signatures",
    "Copying blob sha256:bb9e6989315f6f108f7cab8ae301e09c9e41a982d776c25860169efaeb5f7530",
    "Copying blob sha256:f1ee40d9db4a2bf9b96ea48d6cb45c602a6761650f67dc84bba5a0d2495e845a",
    "Copying blob sha256:0d557d32f54ebd277fdffbbdf656b90442ee9d8753aec9ebac429eee967f4dee",
    "Copying blob sha256:6c5de04c936da27e33992af1e54e929f1cb39c8e1473d9d25ed1f1dc2d842fd4",
    "Copying blob sha256:49a61133ba378a1ca5e8f77b3c05ed9980f80e89ff729db22a51bb524363e6a5",
    "Copying blob sha256:17facd475902d6709cff908630b59271c7ad18f64c3a1d0143d438c6988504ef",
    "Copying blob sha256:fbd68c38accbcb670817752f4faa8f3f845774fab06dc0b5e4b8c860f787ecd3",
    "Copying blob sha256:a57fb466f17b81e5b74877994d78b14170898db92c980ec5842dcfcbeefb324f",
    "Copying blob sha256:24972a36f4e1f7d9063e3cbc4ff3c171371cdeee9529e5581ab6e352a574b818",
    "Copying blob sha256:f48748416b3be33005287f0c884e0f98eebd74037311be116f8eff3cd85c3d51",
    "Copying blob sha256:e6e5b41683139f3942ff86736d3d416020b9b38c7d118df4ccf71e6809136c65",
    "Copying blob sha256:93cc817d8e84cb7580ee7d1930702f1ec087a93bcb30534a0c23341344113bb4",
    "Copying blob sha256:6e10a1f887f8a529f097f805ce4f6f64b06f13cfb638a6e1322bebe5113a86f4",
    "Error: writing blob: adding layer with blob \"sha256:49a61133ba378a1ca5e8f77b3c05ed9980f80e89ff729db22a51bb524363e6a5\": processing tar file(write /usr/lib64/samba/libsmbd-base-samba4.so: no space left on device): exit status 1"
  ],
  "stdout": "",
  "stdout_lines": []
}

Tags: ganesha
description: updated
Vida Haririan (vhariria)
tags: added: ganesha
Takashi Kajinami (kajinamit) wrote :

This one looks interesting. I quickly checked the tripleo_cephadm role and found a few more tasks that launch containers without become: true (the "Get Ceph version" task right after "get dbus-1 file", for example).

I'm not too sure whether such a small /home partition is desirable for operations, but I tend to agree that adding become: true makes sense, because it would avoid pulling the same ceph container image for two different users (root and tripleo-admin).

Francesco Pantano (fmount) wrote :

Even though that task is only supposed to get a file and register it (hence no `become: true` is required), I understand that it causes the image to be pulled again with a different user, which makes the playbook fail while adding the following layer:

```
    "Error: writing blob: adding layer with blob \"sha256:49a61133ba378a1ca5e8f77b3c05ed9980f80e89ff729db22a51bb524363e6a5\": processing tar file(write /usr/lib64/samba/libsmbd-base-samba4.so: no space left on device): exit status 1"
  ],
```

A `/dev/mapper/vg-lv_home 1.2G 630M 555M 54% /home` volume is too small and not recommended (even for dev purposes), but we're going to investigate whether running that single task with `become: true` (which doesn't hurt in this case) solves the issue.

Manojkatari (mkatari)
Changed in manila:
assignee: nobody → Manojkatari (mkatari)
Vida Haririan (vhariria)
Changed in manila:
status: New → Invalid
Vida Haririan (vhariria) wrote :
Changed in tripleo:
assignee: nobody → Manojkatari (mkatari)
Changed in manila:
assignee: Manojkatari (mkatari) → nobody
Manojkatari (mkatari) wrote :

Hi Lukas Koenen,

I couldn't reproduce the issue on a deployment with "manila-cephfsganesha-config" template.

Is the issue intermittent, or specific to a few environments?

Takashi Kajinami (kajinamit) wrote :

@Manoj

The issue is specific to their deployment, which has a quite small /home partition.
However, you should be able to check the container images on the overcloud nodes, where you'd see that the ceph image is pulled by both root and tripleo-admin.

[tripleo-admin@controller-0 ~]$ sudo podman images
[tripleo-admin@controller-0 ~]$ podman images

Lukas Koenen (lukaskoenen) wrote :

Hi,

sorry for the late answer. We use the overcloud-hardened-uefi-full.qcow2 image (https://images.rdoproject.org/centos9/wallaby/rdo_trunk/current-tripleo/) in our tripleo baremetal deployment and do not modify it in any way. So I would expect this issue to occur in any deployment that uses this image.

Also I am not sure if I have ever seen an option to increase/decrease partition size in any Heat template.

I do remember a change in the overcloud images which had something to do with thin provisioning; however, I am not sure whether it has any relevance here.

Steve Baker (steve-stevebaker) wrote :

I think it would be preferable to use become: true to ensure container images are stored in /var/lib instead of /home/tripleo-admin. The size of the /home volume is chosen on the assumption that there are no substantial storage requirements for a typical install, and manila-cephfsganesha-config seems to be unique in requiring that storage.

If this fix is not possible, another workaround would be to set custom growvols args for this role to grow /home to the required size, see:

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#grow-volumes-playbook
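
A rough sketch of that workaround in a baremetal provisioning definition, to be checked against the linked guide before use. The playbook path and the `growvols_args` variable are written as recalled from that guide and are assumptions here; the role name, the count, and every size (including `/home=5GB`) are purely illustrative:

```yaml
# Sketch only: verify key names and the playbook path against the linked guide.
- name: Controller          # illustrative role name
  count: 1                  # illustrative
  ansible_playbooks:
    - playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-growvols.yaml
      extra_vars:
        # Illustrative sizes; /home is grown so a rootless image pull can fit.
        growvols_args: >
          /=8GB
          /tmp=1GB
          /var/log=10GB
          /var/log/audit=2GB
          /home=5GB
          /var=100%
```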
