audit_printk_skb slowing down boot

Bug #1458014 reported by peterzay
32
This bug affects 6 people
Affects Status Importance Assigned to Milestone
apparmor (Ubuntu)
Confirmed
Low
Unassigned

Bug Description

Subjectively, my system slowed down after the recent GRUB update.

As you can see from the following, audit_printk_skb is consuming a lot of boot time:

[ 13.243280] vboxdrv: Successfully loaded version 4.3.10_Ubuntu (interface 0x001a0007).
[ 13.257947] vboxpci: IOMMU not found (not registered)
[ 13.862999] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 13.865996] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 14.195776] r8169 0000:04:00.0 eth0: link down
[ 14.195796] r8169 0000:04:00.0 eth0: link down
[ 14.195827] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 14.196138] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 15.769090] r8169 0000:04:00.0 eth0: link up
[ 15.769097] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 16.223084] init: plymouth-upstart-bridge main process ended, respawning
[ 42.424836] audit_printk_skb: 195 callbacks suppressed
[ 42.424839] type=1400 audit(1431891089.974:77): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=2632 comm="apparmor_parser"
[ 42.424844] type=1400 audit(1431891089.974:78): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2632 comm="apparmor_parser"
[ 42.425185] type=1400 audit(1431891089.974:79): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2632 comm="apparmor_parser"
(END)

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: apparmor 2.8.95~2430-0ubuntu5.1
ProcVersionSignature: Ubuntu 3.13.0-53.88-generic 3.13.11-ckt19
Uname: Linux 3.13.0-53-generic i686
ApportVersion: 2.14.1-0ubuntu3.10
Architecture: i386
CurrentDesktop: Unity
Date: Fri May 22 14:18:46 2015
EcryptfsInUse: Yes
InstallationDate: Installed on 2014-04-29 (388 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release i386 (20140417)
ProcKernelCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-53-generic root=UUID=8cf458ab-4ff9-4505-9a16-27da1ea7ec10 ro quiet splash vt.handoff=7
SourcePackage: apparmor
Syslog:

UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
peterzay (peterzay) wrote :
Revision history for this message
Seth Arnold (seth-arnold) wrote :

It's highly unlikely that the audit_printk_skb() function is itself slowing down your boot; it is designed to operate with extremely low overhead, and the ratelimiting that is in effect here means that the slowest portion of the function, actually writing characters to the system log, is skipped entirely.

Probably what is actually slow is recompiling AppArmor policies; this is an extremely computational heavy task that has been subject to extensive optimization. We're planning further caching efforts in the future to reduce the number of times that the policies must be recomputed. Chances are quite good a second reboot, immediately after this one, would run faster, since the profiles would probably not need to be rebuilt immediately.

Thanks

Revision history for this message
peterzay (peterzay) wrote :

Here are the outputs of 2 consecutive boots for dmesg | less, they both indicate 26 "lost" CPU seconds:

[ 12.580984] vboxdrv: Successfully loaded version 4.3.10_Ubuntu (interface 0x001a0007).
[ 12.593834] vboxpci: IOMMU not found (not registered)
[ 13.307547] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 13.307807] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 13.671770] r8169 0000:04:00.0 eth0: link down
[ 13.671790] r8169 0000:04:00.0 eth0: link down
[ 13.671819] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 13.672126] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 15.219548] r8169 0000:04:00.0 eth0: link up
[ 15.219570] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 15.687163] init: plymouth-upstart-bridge main process ended, respawning
[ 41.953317] audit_printk_skb: 177 callbacks suppressed
[ 41.953319] type=1400 audit(1432326862.429:71): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=2677 comm="apparmor_parser"
[ 41.953324] type=1400 audit(1432326862.429:72): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2677 comm="apparmor_parser"
[ 41.953664] type=1400 audit(1432326862.429:73): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2677 comm="apparmor_parser"
(END)

[ 12.716182] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 12.861751] vboxdrv: module verification failed: signature and/or required key missing - tainting kernel
[ 12.864483] vboxdrv: Found 4 processor cores.
[ 12.864950] vboxdrv: fAsync=0 offMin=0x214 offMax=0x1658
[ 12.865004] vboxdrv: TSC mode is 'synchronous', kernel timer mode is 'normal'.
[ 12.865006] vboxdrv: Successfully loaded version 4.3.10_Ubuntu (interface 0x001a0007).
[ 12.878317] vboxpci: IOMMU not found (not registered)
[ 14.374026] r8169 0000:04:00.0 eth0: link up
[ 14.374033] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 15.556422] init: plymouth-upstart-bridge main process ended, respawning
[ 41.489482] audit_printk_skb: 159 callbacks suppressed
[ 41.489484] type=1400 audit(1432327140.961:65): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=2788 comm="apparmor_parser"
[ 41.489489] type=1400 audit(1432327140.961:66): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2788 comm="apparmor_parser"
[ 41.489832] type=1400 audit(1432327140.961:67): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2788 comm="apparmor_parser"
(END)

The PC is a Dell Inspiron 660 Core i5-3330 quad core (physical, not virtual) all running at 3GHz.

26 "lost" CPU seconds on this machine is a long time.

Is this ok?

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apparmor (Ubuntu):
status: New → Confirmed
Revision history for this message
Steve Beattie (sbeattie) wrote :

You're not seeing what has happened in the meantime because of printk rate limiting, as evidenced by "[ 41.489482] audit_printk_skb: 159 callbacks suppressed". To stop rate limiting kernel printk output, temporarily add a file in /etc/sysctl.d/ named something like 99-printk_ratelimiting.conf (the suffix is the important part of the name) that contains:

kernel.printk_ratelimit=0

and then reboot. This will let you see the audit messages that the kernel is dropping due to rate limiting.

Changed in apparmor (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
peterzay (peterzay) wrote :

I have added /etc/sysctl.d/99-printk_ratelimiting.conf with kernel.printk_ratelimit=0 and rebooted.
Then I removed the file and rebooted yet again.
Each time I captured the last few lines of dmesg | less

The log is attached.

Could you please explain the first log output (kernel.printk_ratelimit=0) and the time lag.

Thanks.

Changed in apparmor (Ubuntu):
status: Incomplete → New
Revision history for this message
Seth Arnold (seth-arnold) wrote :

[ 15.940712] vboxdrv: Successfully loaded version 4.3.10_Ubuntu (interface 0x001a0007).
[ 14.431291] vboxdrv: Successfully loaded version 4.3.10_Ubuntu (interface 0x001a0007).

Drawing conclusions from only two tests may be asking for trouble, but it looks like you've experienced 1.5 seconds of difference in the first few seconds of the system's boot. This large of a difference seems worth investigating further. If you've got the time and inclination, please generate some detailed bootcharts: https://wiki.ubuntu.com/BootCharting

Thanks

Changed in apparmor (Ubuntu):
status: New → Incomplete
Revision history for this message
peterzay (peterzay) wrote :

I will look into bootcharts as soon as possible.

I was expecting more info from the log of step #6.

You will recall that rate limiting gave the following kind of output:

[ 17.062400] init: plymouth-upstart-bridge main process ended, respawning
[ 42.218430] audit_printk_skb: 177 callbacks suppressed

When /etc/sysctl.d/99-printk_ratelimiting.conf with kernel.printk_ratelimit=0 was implemented, I expected detail on those callbacks.

Instead, I got:

[ 17.530175] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 19.396738] init: plymouth-upstart-bridge main process ended, respawning
[ 44.236269] type=1400 audit(1437703237.706:71): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=2413 comm="apparmor_parser"

What happened between 19 secs and 44 secs? Remember, this is a fast 4 core machine.

I would appreciate your thoughts on that?

Then we can look at bootcharts.

Thanks.

Changed in apparmor (Ubuntu):
status: Incomplete → New
Revision history for this message
peterzay (peterzay) wrote :

You will find attached a sample bootchart.

Revision history for this message
Alessio Tomelleri (alessio-tomelleri) wrote :

I confirm this bug, still alive...

...

[ 5.995186] vboxdrv: Found 4 processor cores.
[ 5.995409] vboxdrv: fAsync=0 offMin=0x3de offMax=0x1b0c
[ 5.995486] vboxdrv: TSC mode is 'synchronous', kernel timer mode is 'normal'.
[ 5.995488] vboxdrv: Successfully loaded version 4.3.34 (interface 0x001a000b).
[ 6.042370] init: plymouth-upstart-bridge main process ended, respawning
[ 6.050689] init: plymouth-upstart-bridge main process (1274) terminated with status 1
[ 6.050708] init: plymouth-upstart-bridge main process ended, respawning
[ 6.207649] vboxpci: IOMMU not found (not registered)
[ 6.278948] init: plymouth-stop pre-start process (1344) terminated with status 1
[ 34.573827] audit_printk_skb: 153 callbacks suppressed
[ 34.573832] type=1400 audit(1453101557.556:62): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=2344 comm="apparmor_parser"
[ 34.573842] type=1400 audit(1453101557.556:63): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2344 comm="apparmor_parser"
[ 34.574413] type=1400 audit(1453101557.560:64): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=2344 comm="apparmor_parser"
[ 37.117995] e1000e: eth0 NIC Link is Up 10 Mbps Full Duplex, Flow Control: None
[ 37.118003] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
[ 37.118055] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apparmor (Ubuntu):
status: New → Confirmed
Revision history for this message
Seth Arnold (seth-arnold) wrote :

Alessio, please attach bootcharts to allow better investigating the isssue. Thanks.

Revision history for this message
Robin (robinc999) wrote :

Hello

I am tracking this bug as a very bemused sufferer.

Please see attached bootchart from my system - hope it helps.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Robin, you do indeed have a painfully slow boot, but the audit_skb_entries in your logs are, like the others here, just the last thing the kernel happened to log. The AppArmor processes in your boot take ~1.6 seconds (if I've estimated them correctly), most of which appears to be spent asleep waiting on IO. Even if we could cut the AppArmor time in half (we can't, btw) it wouldn't even come close to denting your boot up time -- while those tasks are running your system is busy doing disk IO.

Thanks

Revision history for this message
Alessio Tomelleri (alessio-tomelleri) wrote :

Sorry for late, here the bootchart...

Revision history for this message
John Johansen (jjohansen) wrote :

Alessio,

so from the boot chart I am not able to say what is causing the delay. What I do see is a large gap in activity for both the cpu and i/o.
That gap lines up roughly with the start of pulse audio, but that doesn't necessarily make it the culprit. We then get a large gap of little to no activity and then what looks like the user sessions starting up, another pulse daemon, notify osd, compiz, nautilus, ...
So it looks to me like there is something waiting, and then timing out, the question is what.

The apparmor logging you are seeing is largely an artifact of policy being loaded piece meal. Apparmor actually does at least some of its load during second 4 of your boot (look for apparmor_parser), the load around 34s appears to be a secondary load initiated by cups, which is being started, and it is ensuring its policy is loaded.

Mathew Hodson (mhodson)
Changed in apparmor (Ubuntu):
importance: Undecided → Low
Revision history for this message
Alessio Tomelleri (alessio-tomelleri) wrote :

thx, for your time Johansen.

Revision history for this message
pito (pito-j) wrote :

Similarly here with even longer delay:
[ 30.704627] aufs 4.15-20180219
[ 146.736081] kauditd_printk_skb: 5 callbacks suppressed

Revision history for this message
Seth Arnold (seth-arnold) wrote :

pito, it'd probably be best to head to https://askubuntu.com/ or #ubuntu on irc.freenode.net to try to figure out where exactly your slow performance is coming from.

Install the systemd-bootchart package, then at grub's kernel command line, add init=/lib/systemd/systemd-bootchart

Attach the stripchart to the bug report.

Thanks

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.