prometheus-ceph-exporter hangs up with confinement

Bug #1933869 reported by Nobuto Murata
This bug affects 1 person
Affects: Prometheus Ceph Exporter Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

prometheus-ceph-exporter is deployed via a charm into an LXD container.

$ snap version
snap 2.51
snapd 2.51
series 16
ubuntu 20.04
kernel 5.8.0-59-generic

$ snap list prometheus-ceph-exporter
Name Version Rev Tracking Publisher Notes
prometheus-ceph-exporter 3.0.0-nautilus 21 latest/edge xavpaice -

When the binary runs under snap confinement, the process hangs. When it is invoked directly, without confinement, it works fine.

[snap]

$ sudo snap run prometheus-ceph-exporter.ceph-exporter
* Running /snap/prometheus-ceph-exporter/21/bin/ceph_exporter with args: -ceph.user prometheus-ceph-exporter
2021/06/28 18:36:29 Starting ceph exporter on ":9128"

$ curl localhost:9128/metrics
-> hangs forever

Jun 28 18:36:29 cloud-node-005 kernel: audit: type=1400 audit(1624905389.350:1603): apparmor="DENIED" operation="open" namespace="root//lxd-juju-d2d9d5-4-lxd-12_<var-snap-lxd-common-lxd>" profile="snap.prometheus-ceph-exporter.ceph-exporter" name="/var/tmp/" pid=2011184 comm="ceph_exporter" requested_mask="r" denied_mask="r" fsuid=1000000 ouid=1000000
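
The denial above is easier to read once the relevant key="value" pairs are pulled out. A minimal sketch, assuming audit lines arrive on stdin; `denial_fields` is a hypothetical helper name, not part of snapd or AppArmor:

```shell
# Print the key="value" pairs that matter when reading an AppArmor denial.
# Reads audit lines on stdin; denial_fields is a hypothetical helper name.
denial_fields() {
    grep 'apparmor="DENIED"' \
        | grep -oE '(operation|profile|name|requested_mask|denied_mask)="[^"]*"'
}
```

Wherever the kernel log is readable, something like `journalctl -k | denial_fields` should reduce the line above to the operation (open), the profile, the denied path (/var/tmp/) and the masks.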

[direct]

$ sudo /snap/prometheus-ceph-exporter/21/bin/ceph_exporter -ceph.config /var/snap/prometheus-ceph-exporter/current/ceph.conf -ceph.user exporter -ceph.user prometheus-ceph-exporter
2021/06/28 18:38:08 Starting ceph exporter on ":9128"

$ curl localhost:9128/metrics
-> returns output
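
Since the confined case never returns, a bounded probe makes the two cases easier to compare. A minimal sketch, assuming curl is installed; `probe_metrics` is a hypothetical helper name and the 5-second default is an arbitrary choice:

```shell
# Probe a metrics endpoint with a hard time limit instead of waiting forever.
# probe_metrics is a hypothetical helper; the 5-second default is an assumption.
probe_metrics() {
    target="${1:-localhost:9128}"
    limit="${2:-5}"
    if curl --silent --fail --max-time "$limit" --output /dev/null "http://$target/metrics"; then
        echo "ok: $target answered within ${limit}s"
    else
        # $? here is the exit status of the curl command tested above.
        echo "fail: no successful response from $target (curl exit $?)"
        return 1
    fi
}
```

Running `probe_metrics` with no arguments checks localhost:9128. A curl exit status of 28 means the time limit was reached, which is what the hang in the [snap] case would be expected to produce.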

Nobuto Murata (nobuto) wrote :

$ systemctl cat snap.prometheus-ceph-exporter.ceph-exporter.service
# /etc/systemd/system/snap.prometheus-ceph-exporter.ceph-exporter.service
[Unit]
# Auto-generated, DO NOT EDIT
Description=Service for snap application prometheus-ceph-exporter.ceph-exporter
Requires=snap-prometheus\x2dceph\x2dexporter-21.mount
Wants=network.target
After=snap-prometheus\x2dceph\x2dexporter-21.mount network.target snapd.apparmor.service
X-Snappy=yes

[Service]
EnvironmentFile=-/etc/environment
ExecStart=/usr/bin/snap run prometheus-ceph-exporter.ceph-exporter
SyslogIdentifier=prometheus-ceph-exporter.ceph-exporter
Restart=on-failure
WorkingDirectory=/var/snap/prometheus-ceph-exporter/21
TimeoutStopSec=30
Type=simple

[Install]
WantedBy=multi-user.target

Nobuto Murata (nobuto) wrote :

$ systemctl cat snap-prometheus\\x2dceph\\x2dexporter-21.mount
# /etc/systemd/system/snap-prometheus\x2dceph\x2dexporter-21.mount
[Unit]
Description=Mount unit for prometheus-ceph-exporter, revision 21
Before=snapd.service
After=zfs-mount.service

[Mount]
What=/var/lib/snapd/snaps/prometheus-ceph-exporter_21.snap
Where=/snap/prometheus-ceph-exporter/21
Type=fuse.squashfuse
Options=nodev,ro,x-gdu.hide,x-gvfs-hide,allow_other
LazyUnmount=yes

[Install]
WantedBy=multi-user.target

# /run/systemd/generator/snap-prometheus\x2dceph\x2dexporter-21.mount.d/container.conf
[Mount]
Type=fuse.squashfuse
Options=nodev,ro,x-gdu.hide,allow_other
LazyUnmount=yes

Nobuto Murata (nobuto) wrote :

Attaching ~field-critical as it's blocking a customer engagement.

Nobuto Murata (nobuto) wrote :

An ugly workaround is in place for the time being. Downgrading it to ~field-high.

/etc/systemd/system/snap.prometheus-ceph-exporter.ceph-exporter.service.d/override.conf

[Service]
ExecStart=
ExecStart=/snap/prometheus-ceph-exporter/current/bin/ceph_exporter -ceph.config /var/snap/prometheus-ceph-exporter/current/ceph.conf -ceph.user prometheus-ceph-exporter
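
For anyone reproducing the workaround, the drop-in can be written with a small helper. This is only a sketch: `write_override` is a hypothetical name, and the file content is the same as above apart from one added comment.

```shell
# Write the drop-in from this comment into the directory passed as $1.
# write_override is a hypothetical helper name.
write_override() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# The empty assignment clears the snapd-generated ExecStart before the
# unconfined command is set.
ExecStart=
ExecStart=/snap/prometheus-ceph-exporter/current/bin/ceph_exporter -ceph.config /var/snap/prometheus-ceph-exporter/current/ceph.conf -ceph.user prometheus-ceph-exporter
EOF
}
```

Run as root against /etc/systemd/system/snap.prometheus-ceph-exporter.ceph-exporter.service.d, then `systemctl daemon-reload` and `systemctl restart snap.prometheus-ceph-exporter.ceph-exporter.service`. Note that this starts the binary outside snap confinement.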

Paweł Stołowski (stolowski) wrote :

Is the apparmor denial from the description the only denial you see? Could you please attach the full journal log?

Also, could you run it with --strace and attach the output? For example:

$ sudo snap run --strace=-f prometheus-ceph-exporter.ceph-exporter

Nobuto Murata (nobuto) wrote :

Thanks for the response.

> $ sudo snap run --strace=-f prometheus-ceph-exporter.ceph-exporter

This actually fails, probably because the snap runs inside the LXD container (which is the default architecture of our cloud delivery).

# snap run --strace=-f prometheus-ceph-exporter.ceph-exporter
error: exit status 1

Jun 29 09:18:34 cloud-node-005 kernel: audit: type=1400 audit(1624958314.224:5104): apparmor="DENIED" operation="ptrace" namespace="root//lxd-juju-d2d9d5-4-lxd-12_<var-snap-lxd-common-lxd>" profile="/snap/snapd/12159/usr/lib/snapd/snap-confine" pid=1689246 comm="strace" requested_mask="tracedby" denied_mask="tracedby" peer="unconfined"
Jun 29 09:18:34 cloud-node-005 kernel: audit: type=1400 audit(1624958314.224:5105): apparmor="DENIED" operation="exec" info="ptrace prevents transition" error=-13 profile="lxd-juju-d2d9d5-4-lxd-12_</var/snap/lxd/common/lxd>" name="/snap/snapd/12159/usr/lib/snapd/snap-confine" pid=1689246 comm="strace" requested_mask="x" denied_mask="x" fsuid=1000000 ouid=1000000 target="lxd-juju-d2d9d5-4-lxd-12_</var/snap/lxd/common/lxd>//&:lxd-juju-d2d9d5-4-lxd-12_<var-snap-lxd-common-lxd>:/snap/snapd/12159/usr/lib/snapd/snap-confine"
Jun 29 09:18:34 cloud-node-005 kernel: audit: type=1400 audit(1624958314.224:5106): apparmor="DENIED" operation="exec" info="ptrace prevents transition" error=-13 namespace="root//lxd-juju-d2d9d5-4-lxd-12_<var-snap-lxd-common-lxd>" profile="unconfined" name="/snap/snapd/12159/usr/lib/snapd/snap-confine" pid=1689246 comm="strace" requested_mask="x" denied_mask="x" fsuid=1000000 ouid=1000000 target="/snap/snapd/12159/usr/lib/snapd/snap-confine"

Nobuto Murata (nobuto) wrote :

I've realized that the difference is not only confined vs. direct execution, but also which librados.so.2 gets used: the one inside the snap vs. the one on the system.

I might have seen a related bug before. /me searching...
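
To see which copies of the library are in play, the candidates can be listed side by side. A rough sketch: `list_librados` is a hypothetical helper name, and where the snap keeps its copy is an assumption to verify, not something confirmed in this report.

```shell
# List every file named librados.so.2* under each directory given as an argument.
# list_librados is a hypothetical helper name.
list_librados() {
    for root in "$@"; do
        find "$root" -name 'librados.so.2*' 2>/dev/null
    done
}
```

For example, `list_librados /snap/prometheus-ceph-exporter/current /usr/lib` would show the copy shipped in the snap next to any copy installed on the system.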

Nobuto Murata (nobuto) wrote :

It was a librados problem in the end.

> 2021-06-29T10:08:06.204+0000 7f1ad7a7a700 0 cephx server client.prometheus-ceph-exporter: attempt to reclaim global_id 63486 without presenting ticket
> 2021-06-29T10:08:06.204+0000 7f1ad7a7a700 0 cephx server client.prometheus-ceph-exporter: could not verify old ticket

This clearly suggests that the issue is an incompatibility with the latest security flag:
https://bugs.launchpad.net/charm-ceph-mon/+bug/1929262
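
To check whether another deployment hits the same symptom, the two cephx messages quoted above can be filtered out of a log stream. A minimal sketch; `reclaim_errors` is a hypothetical helper name, and it reads from stdin because the log location is not stated in this report:

```shell
# Keep only the cephx global_id reclaim messages quoted in this comment.
# Reads log lines on stdin; reclaim_errors is a hypothetical helper name.
reclaim_errors() {
    grep -E 'attempt to reclaim global_id [0-9]+ without presenting ticket|could not verify old ticket'
}
```

Feed it whichever log carried the lines above, for example `reclaim_errors < some-ceph.log`, where the file name is a placeholder.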

no longer affects: snapd