Random auditd start failures on Ubuntu 20.04 EC2 AMIs

Bug #1989599 reported by Alan Sparks
This bug affects 3 people
Affects: audit (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description: Ubuntu 20.04.5 LTS
Release: 20.04

linux-image-aws 5.15.0.1019.23~20.04.11
auditd 1:2.8.5-2ubuntu6

I am having issues with auditd on the official Ubuntu 20.04 LTS AMIs. I have tested this with published AMIs ami-0123376e204addb71 and ami-00bb3d0b5b36e89b8.

I am following a process that worked until June 20, 2022. The process installs and configures the audit package for CIS hardening. The steps are:

• Launch an instance as a base; I’ve used ami-0123376e204addb71 or ami-00bb3d0b5b36e89b8 (official Ubuntu AMIs).
• Install the packages listed below.
• Copy the “auditdconf” contents to /etc/audit/auditd.conf
• Copy the “auditrules” contents to /etc/audit/rules.d/audit.rules
• Edit /etc/default/grub, and set: GRUB_CMDLINE_LINUX="audit=1 selinux=1 audit_backlog_limit=8192"
• Run: grub-mkconfig > /boot/grub/grub.cfg
• Stop the instance, and create an AMI.
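
One way to confirm that the GRUB change from the steps above actually reached the running kernel is to look for each parameter in /proc/cmdline. This is only a minimal sketch; the helper name cmdline_has is made up for illustration and is not part of any package.

```shell
#!/bin/sh
# cmdline_has CMDLINE PARAM (hypothetical helper)
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
cmdline_has() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a booted instance (reads the live kernel command line):
#   for p in audit=1 selinux=1 audit_backlog_limit=8192; do
#       cmdline_has "$(cat /proc/cmdline)" "$p" || echo "missing: $p"
#   done
```

Matching whole words keeps "audit=1" from being satisfied by a longer token such as "audit=10".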

I then launch 10 or 14 instances of this AMI in us-west-2. Most come up with the auditd service running and all rules loaded. Usually at least two come up broken for an unknown reason, with the auditd service reporting an error I cannot understand:

● auditd.service - Security Auditing Service
     Loaded: loaded (/lib/systemd/system/auditd.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2022-09-14 15:08:14 UTC; 22min ago
       Docs: man:auditd(8)
             https://github.com/linux-audit/audit-documentation
    Process: 357 ExecStart=/sbin/auditd (code=exited, status=1/FAILURE)

Sep 14 15:08:14 ip-10-210-197-90 systemd[1]: Starting Security Auditing Service...
Sep 14 15:08:14 ip-10-210-197-90 auditd[382]: Error receiving audit netlink packet (No buffer space available)
Sep 14 15:08:14 ip-10-210-197-90 auditd[382]: Error setting audit daemon pid (No buffer space available)
Sep 14 15:08:14 ip-10-210-197-90 auditd[382]: Unable to set audit pid, exiting
Sep 14 15:08:14 ip-10-210-197-90 auditd[357]: Cannot daemonize (Success)
Sep 14 15:08:14 ip-10-210-197-90 auditd[357]: The audit daemon is exiting.
Sep 14 15:08:14 ip-10-210-197-90 auditd[382]: The audit daemon is exiting.
Sep 14 15:08:14 ip-10-210-197-90 systemd[1]: auditd.service: Control process exited, code=exited, status=1/FAILURE
Sep 14 15:08:14 ip-10-210-197-90 systemd[1]: auditd.service: Failed with result 'exit-code'.
Sep 14 15:08:14 ip-10-210-197-90 systemd[1]: Failed to start Security Auditing Service.
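
When checking a batch of instances, it can help to look for this exact failure in the journal rather than any auditd failure. A minimal sketch, assuming the message text stays the same as in the log above; the function name has_enobuf_failure is hypothetical.

```shell
#!/bin/sh
# has_enobuf_failure (hypothetical helper)
# Reads log text on stdin and succeeds if it contains the
# "Error setting audit daemon pid (No buffer space available)" message.
has_enobuf_failure() {
    grep -qF 'Error setting audit daemon pid (No buffer space available)'
}

# Usage on an instance (current boot only):
#   journalctl -u auditd.service -b --no-pager | has_enobuf_failure \
#       && echo "auditd hit the netlink buffer failure"
```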

All of these instances are launched from the same AMI with the same parameters. In fact, they come from a single EC2 launch request that asks for X instances.

I've been trying to solve this for some time, and I've found the only way I can make the instances always start correctly is to remove the kernel "audit_backlog_limit" setting entirely - no value for the parameter works correctly (tried 320, 8192, 16384, 32768).
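
For anyone who wants to try the same workaround, the parameter can be dropped from the GRUB_CMDLINE_LINUX line with a small filter. This is a sketch only, assuming the value is a plain number as in the values tried above; strip_backlog_limit is a made-up name.

```shell
#!/bin/sh
# strip_backlog_limit (hypothetical helper)
# Filters stdin to stdout, removing any audit_backlog_limit=<number>
# token together with the whitespace in front of it.
strip_backlog_limit() {
    sed -E 's/[[:space:]]*audit_backlog_limit=[0-9]+//g'
}

# Usage (as root), writing to a scratch file first so it can be reviewed:
#   strip_backlog_limit < /etc/default/grub > /tmp/grub.new
#   # after review, install /tmp/grub.new as /etc/default/grub,
#   # then run: grub-mkconfig > /boot/grub/grub.cfg
```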

See attachments for the above mentioned files.
Thanks.
-Alan

Expected behavior:
* service loaded and active
* "auditctl -l" shows list of loaded rules

Observed behavior:
* service dead with errors shown above.
* "auditctl -l" reports "No rules".
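
The two conditions above can be folded into one check that is easy to run across a batch of instances. A rough sketch, assuming "systemctl is-active" prints "active" for a healthy service and "auditctl -l" prints "No rules" when nothing is loaded, as reported above; audit_state is a hypothetical helper name.

```shell
#!/bin/sh
# audit_state STATUS RULES (hypothetical helper)
#   STATUS: output of `systemctl is-active auditd.service`
#   RULES:  output of `auditctl -l`
# Prints "ok" when the service is active and at least one rule is loaded,
# otherwise prints "broken".
audit_state() {
    if [ "$1" = "active" ] && [ -n "$2" ] && [ "$2" != "No rules" ]; then
        echo ok
    else
        echo broken
    fi
}

# Usage (as root):
#   audit_state "$(systemctl is-active auditd.service)" "$(auditctl -l 2>&1)"
```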

Alan Sparks (alsparks) wrote :

I found that I can limit the package install to just:
apt install auditd audispd-plugins -y

together with the changes to audit.rules, auditd.conf, and /etc/default/grub.
I still get random launch failures, with the service failing and "No rules".

Alan Sparks (alsparks)
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in audit (Ubuntu):
status: New → Confirmed
John Pfuntner (pfuntner) wrote :

I'm seeing this happen too and it's very frustrating. I've been able to get auditd to start by commenting out the audit backlog limit in the rules file and rebooting. But when I restore the statement and reboot, auditd still starts, which I didn't expect.

John Pfuntner (pfuntner) wrote :

Additionally, I have better "luck" hitting the problem when I use an instance with the arm64 (aarch64) architecture.

m4t (m4t) wrote :

Also seeing this intermittently on Debian 10 with its latest 4.19 kernel, so it appears to be a kernel change that was backported, given that the original report here mentions 5.15.

The workaround I've come up with (not fully validated yet, but manually starting the service when it fails on boot seems to work) is to add a systemd override (systemctl edit auditd.service) as follows:

[Unit]
StartLimitBurst=5
StartLimitIntervalSec=60

[Service]
Restart=on-failure
RestartSec=5

Kinda gross, but better than having audit messages spam dmesg. I didn't bisect, but I suspect it's one of these: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/kernel/audit.c?h=v4.19.306

audit: ensure userspace is penalized the same as the kernel when under pressure
audit: improve audit queue handling when "audit=1" on cmdline

Might be worthwhile to try reverting those and rebuilding the kernel to see if the issue persists.
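
For baking the retry override from this comment into an AMI without the interactive editor, the same settings can be written as a drop-in file. This is a sketch under the assumption that the override lives in /etc/systemd/system/auditd.service.d/override.conf, which is where "systemctl edit auditd.service" puts it; the write_override helper is hypothetical.

```shell
#!/bin/sh
# write_override DIR (hypothetical helper)
# Writes the retry override from this comment to DIR/override.conf,
# creating DIR if needed.
write_override() {
    mkdir -p "$1" && cat > "$1/override.conf" <<'EOF'
[Unit]
StartLimitBurst=5
StartLimitIntervalSec=60

[Service]
Restart=on-failure
RestartSec=5
EOF
}

# Usage (as root):
#   write_override /etc/systemd/system/auditd.service.d
#   systemctl daemon-reload
```

Taking the directory as an argument makes it possible to try the function against a scratch directory before touching /etc.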
