oscap crashes during audit on the system with ceph-mds package installed

Bug #2060345 reported by Przemyslaw Hausman
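| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| Ubuntu Security Guide | Invalid | Undecided | Unassigned | |
| openscap (Ubuntu) | In Progress | Undecided | Eduardo Barretto | |
| openscap (Ubuntu Focal) | In Progress | Undecided | Eduardo Barretto | |
| openscap (Ubuntu Jammy) | In Progress | Undecided | Eduardo Barretto | |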

Bug Description

Ubuntu 22.04.4 LTS
usg version: 22.04.6

On a system with the ceph-mds package installed, running `usg audit ...` produces a report in which several rules have an "unknown" result. The log file produced by `usg audit` shows that oscap fails as follows:

```
I: oscap: Evaluating systemdunitdependency test 'oval:ssg-test_multi_user_wants_rsyslog:tst:1': systemd test.
I: oscap: Querying systemdunitdependency object 'oval:ssg-object_multi_user_target_for_rsyslog_enabled:obj:1', flags: 0.
I: oscap: Creating new syschar for systemdunitdependency_object 'oval:ssg-object_multi_user_target_for_rsyslog_enabled:obj:1'.
I: oscap: Starting probe on URI 'pipe:///usr/lib/x86_64-linux-gnu/openscap/probe_systemdunitdependency'.
I: oscap: FAIL: recv failed: dsc=0x5640da79d670, errno=4, Interrupted system call.
I: oscap: FAIL: ctx=0x5640db712780, sd=9, errno=4, Interrupted system call.
W: oscap: Can't receive message: 4, Interrupted system call.
E: oscap: Can't close sd: 10, No child processes.
E: oscap: Recv: retry limit (0) reached.
I: oscap: Test 'oval:ssg-test_multi_user_wants_rsyslog:tst:1' evaluated as (null).
```

At the same time, a crash file is created at /var/crash/_usr_lib_x86_64-linux-gnu_openscap_probe_systemdunitdependency.0.crash.

Rules resulting in "unknown" state:
- Enable systemd-journald Service (xccdf_org.ssgproject.content_rule_service_systemd-journald_enabled)
- Enable rsyslog Service (xccdf_org.ssgproject.content_rule_service_rsyslog_enabled)
- Verify nftables Service is Enabled (xccdf_org.ssgproject.content_rule_service_nftables_enabled)
- Enable cron Service (xccdf_org.ssgproject.content_rule_service_cron_enabled)

Potentially related to:
- https://github.com/OpenSCAP/openscap/issues/1738
- https://github.com/OpenSCAP/openscap/pull/1533

Steps to reproduce:

1. Bootstrap localhost Juju controller
```
juju bootstrap localhost
```

2. Create two LXD machines for testing
```
# Create reference ubuntu LXD machine
juju deploy ubuntu --series jammy

# Create ceph-mon LXD machine
juju deploy ceph-mon --series jammy --channel quincy/stable
```

3. Push CIS tailoring file to both `ubuntu` and `ceph-mon` machines
```
lxc file push cis_level1_server_tailoring.xml <ubuntu-lxc-machine-name>/home/ubuntu/
lxc file push cis_level1_server_tailoring.xml <ceph-mon-lxc-machine-name>/home/ubuntu/
```

4. Apply CIS hardening on both `ubuntu` and `ceph-mon` machines.

```
# Attach UPro to a machine under test
sudo pro attach <ubuntu-pro-token>

# Enable CIS
sudo pro enable usg

# Install USG
sudo apt-get update --yes && sudo apt-get install --yes usg

# Apply CIS hardening
sudo usg fix --debug --tailoring-file cis_level1_server_tailoring.xml

# Reboot machine after applying the hardening
sudo reboot

# Audit
sudo usg audit --debug --tailoring-file cis_level1_server_tailoring.xml

# Review audit results, check files in /var/lib/usg:
# - remediation-<date>.log
# - usg-report-<date>.html
# - usg-log-<date>.log

# Finally detach Ubuntu Pro if you're not using it anymore
sudo pro detach
```

As a result of this reproducer, the audit on the `ceph-mon` machine ends with the rules mentioned above in the "unknown" state, while the audit on the `ubuntu` machine succeeds.

I compared the packages installed on the `ubuntu` and `ceph-mon` machines and, by elimination, identified the ceph-mds package as the cause of the problem.
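
The first step of such a comparison can be sketched as a set difference over two package lists. This is an illustrative Python sketch, not the exact procedure used for this report: it assumes each list holds one package name per line (for example, as printed by `dpkg-query -W -f='${Package}\n'`), and apart from `ceph-mds` the package names in the sample are placeholders.

```python
def package_set(listing):
    """Turn a one-package-per-line listing into a set of names."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def only_on_suspect(reference_listing, suspect_listing):
    """Return packages installed on the suspect machine but not on the reference."""
    return sorted(package_set(suspect_listing) - package_set(reference_listing))


# Placeholder listings; real ones would come from the two machines.
reference = "bash\ncoreutils\nopenssh-server\n"
suspect = "bash\nceph-common\nceph-mds\ncoreutils\nopenssh-server\n"

print(only_on_suspect(reference, suspect))
```

The difference only narrows the list of candidates; each remaining package still has to be installed on a clean machine and audited to confirm which one triggers the crash.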

Analysing the core dump from the .crash file shows that the problem occurs while ceph-mds.target is being analysed. See frame 14 below:

```
#0 0x00007fd7e07e31a8 in _dbus_validate_signature_with_reason () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#1 0x00007fd7e07e39bd in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#2 0x00007fd7e07e3770 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#3 0x00007fd7e07e3a59 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#4 0x00007fd7e07e3c30 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#5 0x00007fd7e07e3da4 in _dbus_validate_body_with_reason () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#6 0x00007fd7e07faf05 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#7 0x00007fd7e07e7157 in _dbus_message_loader_queue_messages () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#8 0x00007fd7e07ef820 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#9 0x00007fd7e07ef95d in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#10 0x00007fd7e07f17b1 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#11 0x00007fd7e07f1c9d in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#12 0x00007fd7e07d65ed in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#13 0x00007fd7e07eadcc in dbus_pending_call_block () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#14 0x00005628b3497d08 in get_property_by_unit_path (conn=conn@entry=0x7fd7cc001d00, unit_path=unit_path@entry=0x7fd7cd03a840 "/org/freedesktop/systemd1/unit/ceph_2dmds_2etarget",
    property=<optimized out>, property@entry=0x5628b349d3c8 "Requires") at ../../../../src/OVAL/probes/unix/linux/systemdunitdependency.c:82
#15 0x00005628b349823a in get_all_dependencies_by_unit (conn=conn@entry=0x7fd7cc001d00, unit=<optimized out>, cbarg=cbarg@entry=0x7fd7cc0031c0, include_requires=include_requires@entry=true,
    include_wants=include_wants@entry=true, callback=0x5628b34932f0 <dependency_callback>) at ../../../../src/OVAL/probes/unix/linux/systemdunitdependency.c:149
#16 0x00005628b34981a5 in get_all_dependencies_by_unit (conn=conn@entry=0x7fd7cc001d00, unit=<optimized out>, cbarg=cbarg@entry=0x7fd7cc0031c0, include_requires=include_requires@entry=true,
    include_wants=include_wants@entry=true, callback=0x5628b34932f0 <dependency_callback>) at ../../../../src/OVAL/probes/unix/linux/systemdunitdependency.c:182
...
```

I have attached the tailoring file (XML), audit report (HTML), audit log and crash file, all in one zip file.

Przemyslaw Hausman (phausman) wrote :
Eduardo Barretto (ebarretto) wrote :

Hi Przemyslaw,

As you mentioned, this is not an issue on the normal Ubuntu machine. Therefore it is something about the `ceph-mon` machine. I would recommend contacting whoever produces it to understand what the difference is.
This certainly seems outside the expected target of usg/CIS, which is built for the default Server/Desktop images of Ubuntu, and we have no visibility into what that specific machine is.

Changed in usg:
status: New → Opinion
Przemyslaw Hausman (phausman) wrote :

I'm sorry, Eduardo, but I have to disagree. oscap crashes with a core dump during the audit. Even if an application installed on Ubuntu is misbehaving, the auditing tool should not crash. Have you analysed the core dump? Why did oscap crash?

To give you some more context: ceph-mds is a part of Ceph, a major building block for environments such as OpenStack or Kubernetes that we build for customers. CIS hardening is becoming an increasingly requested feature. It is in our interest to make sure that CIS hardening works well with Ceph.

This bug is a result of CIS hardening effort for one of our prominent customers. I'm subscribing field-high and once again ask you to take a look into this problem.

I'm attaching some more files from failed CIS audit on a fresh Ubuntu 22.04, with a ceph-mds package installed. Audit crashes for the following rules:

xccdf_org.ssgproject.content_rule_service_systemd-journald_enabled
xccdf_org.ssgproject.content_rule_service_rsyslog_enabled
xccdf_org.ssgproject.content_rule_service_ufw_enabled
xccdf_org.ssgproject.content_rule_service_cron_enabled
xccdf_org.ssgproject.content_rule_postfix_network_listening_disabled
xccdf_org.ssgproject.content_rule_service_timesyncd_enabled

Attached please see lp2060345.tar.gz with the following files:

/var/crash/_usr_lib_x86_64-linux-gnu_openscap_probe_systemdunitdependency.0.crash
/var/lib/usg/usg-log-20240415.1554.log
/var/lib/usg/usg-results-20240415.1554.xml
/var/lib/usg/usg-report-20240415.1554.html
/var/lib/usg/ssg-ubuntu2204-oval.xml.result-20240415.1554.xml
/var/lib/usg/ssg-ubuntu2204-cpe-oval.xml.result-20240415.1554.xml

Changed in usg:
status: Opinion → New
Peter Jose De Sousa (pjds) wrote :

Okay, I suspect the pointer is simply returned as NULL and the for loop keeps incrementing the memory address. Eventually the pointer is incremented far enough that it points outside the application's address space, triggering SIGSEGV (a memory access violation).

[1] https://github.com/OpenSCAP/openscap/blob/7f94172ec69cf887b2347f3aff7c17389c629047/src/OVAL/probes/unix/linux/systemdunitdependency_probe.c#L156
[2] https://github.com/OpenSCAP/openscap/blob/7f94172ec69cf887b2347f3aff7c17389c629047/src/OVAL/probes/unix/linux/systemdunitdependency_probe.c#L159

The pointer is just incremented repeatedly:
[3] https://github.com/OpenSCAP/openscap/blob/7f94172ec69cf887b2347f3aff7c17389c629047/src/OVAL/probes/unix/linux/systemdunitdependency_probe.c#L165 - the function just returns; the value is invalid
[4] https://github.com/OpenSCAP/openscap/blob/7f94172ec69cf887b2347f3aff7c17389c629047/src/OVAL/probes/unix/linux/systemdunitdependency_probe.c#L159

The for loop continues and the pointer's memory address is incremented.
That's my suspicion; I'd need to recompile openscap with optimisation disabled to confirm.

no longer affects: openscap
Changed in usg:
status: New → Invalid
Peter Jose De Sousa (pjds) wrote :
Eduardo Barretto (ebarretto) wrote :

Peter, do note that this fix never landed in openscap 1.2, so it will require some backporting.
Landing the fix should be done through the SRU process.

Przemyslaw Hausman (phausman) wrote :

@pjds, thanks a lot for troubleshooting the issue and finding the bug!

From the conversation outside of Launchpad, we have also learned that building 1.2.17 from source on jammy (thanks to @fdesi) produces an oscap build that does not crash, even with the circular dependency.

@ebarretto, please let us know if there's anything else we can do to help kick off the fixing process.

Eduardo Barretto (ebarretto) wrote :

@phausman I won't be doing the SRU. Since Peter is investigating it, it is best if it comes from him.

If you are building from source and it does not produce a crash, then the bug mentioned by Peter is not really necessary and something else might be the issue.
As the circular dependency does not happen on a normal Ubuntu image, my belief is that this is still an issue with systemd in this ceph-mds image.
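
For readers unfamiliar with why a circular dependency matters to a recursive walker such as `get_all_dependencies_by_unit` in the stack trace: the sketch below is a minimal Python illustration, not the OpenSCAP code and not the upstream fix. It shows a traversal that records visited units so that a "Wants" cycle terminates instead of recursing forever. The graph is hypothetical, loosely modelled on units that want each other; whether a missing guard of this kind is what makes the packaged probe crash is not confirmed in this thread.

```python
def collect_dependencies(unit, wants, visited=None):
    """Return every unit reachable from `unit` through the `wants` mapping.

    `visited` doubles as the result set and as the cycle guard: a unit
    that was already seen is never walked a second time.
    """
    if visited is None:
        visited = set()
    for dep in wants.get(unit, []):
        if dep in visited:
            continue  # already handled; this is what breaks the cycle
        visited.add(dep)
        collect_dependencies(dep, wants, visited)
    return visited


# Hypothetical dependency graph containing a cycle between two targets.
wants = {
    "multi-user.target": ["ceph.target", "ceph-mds.target"],
    "ceph.target": ["ceph-mds.target"],
    "ceph-mds.target": ["ceph.target", "ceph-mon.target"],
    "ceph-mon.target": ["ceph.target"],
}

print(sorted(collect_dependencies("multi-user.target", wants)))
```

Without the `visited` check, the walk would bounce between `ceph.target` and `ceph-mds.target` until the recursion limit is hit.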

Przemyslaw Hausman (phausman) wrote :

Hi Eduardo, we have already spent countless hours troubleshooting the issue, collecting logs, reporting the problem. We went above and beyond analyzing the crash dump, researching bugs in upstream, eventually building from source and providing the feedback. I believe we have supported the process of maintaining high quality of our software already well enough. I would expect that the Security Team takes care of whatever needs to be done to fix the issue now, since it is now apparent where the problem lies.

Przemyslaw Hausman (phausman) wrote :

The simplest reproducer:

1. Have a machine with Ubuntu Pro attached, and usg enabled; bare-metal, VM or LXC container
2. Run:

apt install ceph-mds
usg generate-tailoring cis_level1_server /root/cis-l1.xml
usg audit --tailoring-file /root/cis-l1.xml

Crash happens during `usg audit`.

Miha Purg (mihap) wrote :

I narrowed the issue down to the 'ceph-*.target' unit files (e.g. ceph-mds.target, ceph-mon.target).
The unit files list 'ceph.target' in both the "Wants" and the "WantedBy" directives, which triggers the
bug in OpenSCAP. If either directive is removed (in all problematic files), OpenSCAP no longer crashes:

---
[Unit]
Description=ceph target allowing to start/stop all ceph-mds@.service instances at once
PartOf=ceph.target
After=ceph-mon.target
Before=ceph.target
#Wants=ceph.target ceph-mon.target
Wants=

[Install]
#WantedBy=multi-user.target ceph.target
WantedBy=multi-user.target
---

# systemctl daemon-reload
# systemctl reenable ceph*target

I'm not familiar with ceph so I don't really know how either of these changes will
affect functionality, but it might be worth looking into as a potential
workaround for the time being.
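
To find other unit files with the same pattern, one could scan for units that name the same target in both "Wants" and "WantedBy". The following is a rough Python sketch under stated assumptions: it uses a deliberately simplified parser (no line continuations, no drop-in directories), the helper names are invented for illustration, and the sample text is reconstructed from the commented-out original directives shown above.

```python
def directive_values(unit_text, section, key):
    """Collect the space-separated values of `key` inside `[section]`.

    Simplified on purpose: comments and blank lines are skipped, repeated
    keys accumulate, and backslash line continuations are not handled.
    """
    values = []
    current = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        if current == section and "=" in line:
            name, _, value = line.partition("=")
            if name.strip() == key:
                values.extend(value.split())
    return values


def mutual_wants(unit_text):
    """Return units listed in both [Unit] Wants= and [Install] WantedBy=."""
    wants = set(directive_values(unit_text, "Unit", "Wants"))
    wanted_by = set(directive_values(unit_text, "Install", "WantedBy"))
    return sorted(wants & wanted_by)


# Original directives, reconstructed from the commented-out lines above.
original_unit = """\
[Unit]
PartOf=ceph.target
Wants=ceph.target ceph-mon.target

[Install]
WantedBy=multi-user.target ceph.target
"""

# The edited file from the workaround above.
workaround_unit = """\
[Unit]
PartOf=ceph.target
#Wants=ceph.target ceph-mon.target
Wants=

[Install]
#WantedBy=multi-user.target ceph.target
WantedBy=multi-user.target
"""

print(mutual_wants(original_unit))
print(mutual_wants(workaround_unit))
```

The first call reports `ceph.target`, the second reports nothing, which matches the observation that removing either directive avoids the crash.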

no longer affects: openscap (Ubuntu)
Changed in openscap (Ubuntu):
status: New → Confirmed
Changed in openscap (Ubuntu Focal):
status: New → In Progress
Changed in openscap (Ubuntu Jammy):
status: New → In Progress
Changed in openscap (Ubuntu Focal):
assignee: nobody → Eduardo Barretto (ebarretto)
Changed in openscap (Ubuntu Jammy):
assignee: nobody → Eduardo Barretto (ebarretto)
Changed in openscap (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Eduardo Barretto (ebarretto)
summary: - oscap crashes during audit on the system with ceph-mds package installed
+ [SRU] oscap crashes during audit on the system with ceph-mds package
+ installed
summary: - [SRU] oscap crashes during audit on the system with ceph-mds package
- installed
+ oscap crashes during audit on the system with ceph-mds package installed
Eduardo Barretto (ebarretto) wrote :
Przemyslaw Hausman (phausman) wrote :

Thanks a lot, @ebarretto & @mihap!
