auditd fails after moving /var to a new filesystem and turning /var/run into a symlink to /run
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
audit (Ubuntu) | Confirmed | Undecided | Unassigned | |
Bug Description
Auditd was working on my system (Ubuntu 18.04 LTS, kernel 4.15.0-1065-aws) until recently, but after splitting /var off into a new filesystem it fails to launch.
Running '/sbin/auditd -f' as root indicates a problem writing the pid file (no such file exists, even though the error says one does). Command output after the config load:
Started dispatcher: /sbin/audispd pid: 16927
type=DAEMON_START msg=audit(
config_manager init complete
Error setting audit daemon pid (File exists)
type=DAEMON_ABORT msg=audit(
Unable to set audit pid, exiting
The audit daemon is exiting.
Error setting audit daemon pid (Permission denied)
/var/run is a symlink to /run
/var/run permissions are 777 root:root
/run permissions are 755 root:root
no /run/auditd.pid exists, and consequently no /var/run/auditd.pid either (even though the error incorrectly reports otherwise).
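The path checks above can be gathered in one pass with a small script. This is only a sketch: `describe` is a hypothetical helper name, not part of any audit tooling. It reports the mode of the link itself separately from the mode of the target, because on Linux a symlink's own mode always shows as 777 and is not used for access checks; the permissions that matter for /var/run are those of /run.

```python
import os
import stat


def describe(path):
    """Summarise a path: is it a symlink, where does it resolve, and which modes apply."""
    link_mode = stat.S_IMODE(os.lstat(path).st_mode)   # mode of the link itself
    target_mode = stat.S_IMODE(os.stat(path).st_mode)  # mode after following links
    return {
        "is_symlink": os.path.islink(path),
        "resolved": os.path.realpath(path),
        "link_mode": oct(link_mode),
        "target_mode": oct(target_mode),
    }


if __name__ == "__main__":
    # Paths taken from the report; the pid files are expected to be missing.
    for p in ("/var/run", "/run", "/run/auditd.pid", "/var/run/auditd.pid"):
        try:
            print(p, describe(p))
        except OSError as exc:
            print(p, "->", exc.strerror)
```

If `target_mode` for /var/run matches the mode of /run, the 777 seen on /var/run is just the symlink's cosmetic mode and is unlikely to be the cause.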
/var/log/
type=DAEMON_START msg=audit(
ined res=success
type=DAEMON_ABORT msg=audit(
I have been pulling my hair out over this one, so I ran 'strace /sbin/auditd -f' and found the following line in the output:
"openat(AT_FDCWD, "/var/run/
I am grasping at straws, but suspect that the O_NOFOLLOW option is causing a failure in creating the pid file since /var/run is a symlink. I could be wrong but I can't find anything else to suspect.
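The O_NOFOLLOW hypothesis can be tested outside of auditd. Per open(2), O_NOFOLLOW only rejects a symlink in the final path component; symlinks earlier in the path (such as /var/run itself) are still followed. The sketch below reproduces both cases in a temporary directory. The flag set is an assumption, since the strace line above is truncated, and `try_open` and `demo` are hypothetical helper names.

```python
import errno
import os
import tempfile


def try_open(path):
    """Open path with O_CREAT | O_NOFOLLOW; return 'ok' or the errno name."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_NOFOLLOW  # assumed flags, see lead-in
    try:
        fd = os.open(path, flags, 0o644)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    os.close(fd)
    return "ok"


def demo():
    with tempfile.TemporaryDirectory() as base:
        real_dir = os.path.join(base, "run")
        link_dir = os.path.join(base, "var_run")
        os.mkdir(real_dir)
        os.symlink(real_dir, link_dir)  # mimics /var/run -> /run

        # Case 1: the symlink is an intermediate directory component.
        via_dir_link = try_open(os.path.join(link_dir, "demo.pid"))

        # Case 2: the symlink is the final component (the file itself).
        file_link = os.path.join(base, "pid_link")
        os.symlink(os.path.join(real_dir, "demo.pid"), file_link)
        via_file_link = try_open(file_link)

        return via_dir_link, via_file_link


if __name__ == "__main__":
    print(demo())
```

On Linux the first case should succeed and the second should fail with ELOOP. If so, a symlinked /var/run directory alone would not make that openat() call fail.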
Since it is best practice to split /var into a separate filesystem, so that an unexpected increase in log collection cannot fill the root filesystem, I suspect this is a bug. Either the system needs to be able to follow symlinks, or an option such as pid_file=[filepath] needs to be available in /etc/audit/
Running under strace may change the execution environment enough that it's not reflective of the actual error, but it's still worth a shot -- can you pastebin the whole auditd strace logs? That openat() line is actually a success -- the error we're looking for will come from the audit_set_pid(3) function, which uses netlink, which is an incredibly complicated protocol. The error may not look like an error in strace output.
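One detail that may help when reading the output: the "(File exists)" and "(Permission denied)" suffixes match the standard errno strings for EEXIST and EACCES. Assuming auditd builds the message from the error code returned by the netlink call (an assumption, not verified against the auditd source here), the text describes that return code and not necessarily the state of /run/auditd.pid. A quick check of the mapping, with `errno_text` as a hypothetical helper name:

```python
import errno
import os


def errno_text(code):
    """Return the libc message for an errno value, as strerror() would."""
    return os.strerror(code)


if __name__ == "__main__":
    for code in (errno.EEXIST, errno.EACCES):
        print(errno.errorcode[code], "->", errno_text(code))
```

When going through the strace log, it may therefore be more useful to look for EEXIST or EACCES on the netlink socket calls than on file operations.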
Is there any chance the kernel has logged whatever the failure was in dmesg output?
Thanks