Comment 23 for bug 1848330

Andreas Hasenack (ahasenack) wrote :

I prepared two bionic instances to run over the weekend.

One is running auditd from bionic, and the other is running the SRU proposed package.

On both instances, auditd is restarted in a loop by this script (only the email message differs, to say which package was being tested):
#!/bin/bash

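# Exit status of the first failing command; stays 0 while everything works.
result=0

# Restart auditd every 2 seconds until a restart fails or auditd is not running.
while true; do
    date
    sudo systemctl restart auditd || result=$?
    if [ "$result" -ne 0 ]; then
        echo "FAILED, result=$result"
        break
    fi
    # pidof exits non-zero when no auditd process exists.
    pid=$(pidof auditd) || result=$?
    if [ "$result" -ne 0 ]; then
        echo "FAILED, auditd not running"
        break
    fi
    echo "auditd pid = $pid"
    sleep 2
    echo
done
# Only reached after a failure: send the alert.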
mail -s "ALERT: audit orig test failed" <email address hidden> <<EOF
$(date)
Hello, the audit test failed with the orig package
result = $result
EOF

The instance running the SRU version hasn't failed yet, but the one running the original auditd package from bionic failed after a few hours, with the error message that this bug is addressing. I saved the logs and kicked off another run, to try to get another failure. If it fails again, I'll update the package there to the SRU one and let it run again.

I'll attach logs and a summary after this experiment is over, hopefully later today if the failure repeats itself in the same amount of time.

Here is a glimpse:
Fri Jan 22 19:20:29 UTC 2021
auditd pid = 25215

Fri Jan 22 19:20:31 UTC 2021
auditd pid = 25255

Fri Jan 22 19:20:33 UTC 2021
auditd pid = 25334
...
Fri Jan 22 22:43:47 UTC 2021
auditd pid = 23985

Fri Jan 22 22:43:49 UTC 2021
auditd pid = 24022

Fri Jan 22 22:43:51 UTC 2021
Job for auditd.service failed because a timeout was exceeded.
See "systemctl status auditd.service" and "journalctl -xe" for details.
FAILED, result=1

And from syslog:
...
Jan 22 22:43:51 orig-audit-bionic systemd[1]: Stopping Security Auditing Service...
Jan 22 22:43:51 orig-audit-bionic auditd[24022]: The audit daemon is exiting.
Jan 22 22:43:51 orig-audit-bionic kernel: [13955.899540] audit: type=1305 audit(1611355431.494:81546): audit_pid=0 old=24022 auid=4294967295 ses=4294967295 res=1
Jan 22 22:43:51 orig-audit-bionic systemd[1]: Stopped Security Auditing Service.
Jan 22 22:43:51 orig-audit-bionic kernel: [13955.901464] audit: type=1131 audit(1611355431.498:81547): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=auditd comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 22 22:43:51 orig-audit-bionic systemd[1]: Starting Security Auditing Service...
Jan 22 22:43:51 orig-audit-bionic auditd[24058]: Started dispatcher: /sbin/audispd pid: 24060
Jan 22 22:43:51 orig-audit-bionic audispd: No plugins found, exiting
Jan 22 22:45:21 orig-audit-bionic systemd[1]: auditd.service: Start operation timed out. Terminating.
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: State 'stop-sigterm' timed out. Killing.
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Killing process 24057 (auditd) with signal SIGKILL.
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Killing process 24058 (auditd) with signal SIGKILL.
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Control process exited, code=killed status=9
Jan 22 22:46:51 orig-audit-bionic systemd[1]: auditd.service: Failed with result 'timeout'.

We can see that after audispd exits ("No plugins found, exiting"), the message "dispatcher <pid> reaped" is never shown, which is exactly the bug: auditd hangs while trying to log that message inside a signal handler.
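
To spot this pattern in a longer log without reading it by eye, something like the following could help. It is only a rough sketch: the function name find_unreaped_dispatchers is made up for illustration, and it assumes that the reaped message appears in syslog literally as "dispatcher <pid> reaped" and that the dispatcher pid is the last field of the "Started dispatcher" line, as in the excerpt above. It prints the pid of every dispatcher that was started but never reported as reaped.

```shell
# Hypothetical helper, not part of auditd or its packaging.
# Reads syslog lines on stdin and prints the pids of dispatchers that were
# started but for which no "dispatcher <pid> reaped" line was seen.
# ASSUMPTION: message formats match the ones quoted in this comment.
find_unreaped_dispatchers() {
    awk '
        /Started dispatcher:/ {
            # The dispatcher pid is the last field, e.g. "... pid: 24060".
            started[$NF] = 1
        }
        match($0, /dispatcher [0-9]+ reaped/) {
            # Pull the pid out of the matched "dispatcher <pid> reaped" text.
            split(substr($0, RSTART, RLENGTH), parts, " ")
            delete started[parts[2]]
        }
        END {
            for (pid in started) print pid
        }
    ' | sort -n
}
```

It could be run as "find_unreaped_dispatchers < /var/log/syslog". A dispatcher that is still running at the end of the log would also be listed, so only a pid from a restart that timed out is of interest.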

So, looking good. Let's see if I can get another failure.