omapfb module floods system with udev events on samsung galaxy nexus

Bug #1234743 reported by Colin Ian King on 2013-10-03
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
upstart
Undecided
Unassigned
linux (Ubuntu)
Undecided
Colin Ian King
Saucy
Undecided
Unassigned
Trusty
Undecided
Colin Ian King
lxc-android-config (Ubuntu)
Undecided
Dimitri John Ledkov
Saucy
Undecided
Unassigned
Trusty
Undecided
Dimitri John Ledkov
powerd (Ubuntu)
Undecided
Unassigned
Saucy
Undecided
Unassigned
Trusty
Undecided
Unassigned
systemd (Ubuntu)
High
Martin Pitt
Saucy
Undecided
Unassigned
Trusty
High
Martin Pitt

Bug Description

[SRU justification]
On the Galaxy Nexus (maguro) phone, the video driver wakes up udev and upstart 60 times a second sending "vsync" uevents. This is a deeply flawed driver design, but we can't easily change it because the userspace side of the driver relies on the current behavior. The only feasible workaround is to have udev filter out these events, so that the device isn't kept busy (with about 5% load) waking up system daemons.

[Test case]
1. Install the updated package on an x86 system.
2. Reboot.
3. Confirm that the machine boots successfully.
4. Install the updated package on a non-maguro arm system.
5. Reboot.
6. Confirm that the machine boots successfully.
7. Install the updated package on a maguro system.
8. Reboot.
9. Confirm that the machine boots successfully.
10. Confirm using 'top' that udev is not taking up a significant percentage of the CPU time.

[Regression potential]
This is a change to a core system daemon, and while the code change is guarded by an architecture check, we should be careful about updating such a core package which could cause regressions, in particular on other ARM devices. The test case should guard against this by ensuring we verify the effect of this change on non-maguro systems prior to releasing the SRU.

Playing an mp4 on a Samsung Galaxy Nexus using today's image (3 Oct 2013) for 30 minutes I observed that init is busy and also consuming heap quite rapidly.

Attached is the output from running health-check (found in PPA:colin-king/white) on init pid 1114.

Key points:

1. messages being read/written at ~600 messages a second, hence the high context switch rate and ~4.9% CPU load.
2. heap consumption: ~30K a second using brk() and 2K a second via mmap

To reproduce:

Install health-check:

sudo add-apt-repository ppa:colin-king/white
sudo apt-get update && sudo apt-get install health-check

Download a large mp4 to the phone. Keep screen from blanking using:

sudo powerd-cli display on bright &

then play the mp4:

dbus-launch mediaplayer-app test.mp4 --desktop_file_hint=/usr/share/applications/mediaplayer-app.desktop --stage_hint=main_stage

And then observe that init is busy for 300 seconds:

ps -e | grep init
    1 ? 00:02:56 init
  348 ? 00:00:00 init
 1114 ? 00:03:22 init

sudo health-check -p 1114 -d 300

Attached are my results for a 30 minute run.

Changed in upstart (Ubuntu):
importance: Undecided → High
Steve Langasek (vorlon) wrote :

Colin, could you please put upstart into debug mode with 'initctl log-priority debug', and capture the log output from upstart when running this test, so we can see what events upstart is spending its time on?

Changed in upstart (Ubuntu):
assignee: nobody → James Hunt (jamesodhunt)
status: New → Incomplete
Colin Ian King (colin-king) wrote :

Attached: gzip'd syslog

James Hunt (jamesodhunt) wrote :

Hi Colin,

This issue looks similar to bug 1201865 - Upstart is being spammed by the upstart-udev-bridge which itself is being spammed by all the SUBSYSTEM=platform udev events from the kernel.

I cannot recreate the issue you are seeing on my nexus 7, so that coupled with the fact that the event are udev-related suggests a kernel bug.

Please could you create the system job below, play your mp4 and see what's in /var/log/upstart/bug-1234743.log

$ cat <<EOT | sudo tee /etc/init/bug-1234743.conf
start on platform-device-changed
exec env
EOT

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1234743

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Attached is the requested log.

James Hunt (jamesodhunt) wrote :

Thanks Colin. All the platform-device-changed events upstart is emitting are from omapfb.ko:

KERNEL=omapfb
DEVPATH=/devices/platform/omapfb
SUBSYSTEM=platform
ACTION=change
DRIVER=omapfb
MODALIAS=platform:omapfb
SEQNUM=13545
USEC_INITIALIZED=1146951
VSYNC=12601146402237

So certainly looks like a device-specific kernel issue. Removing upstart...

summary: - init on samsung galaxy nexus is busy and consuming memory when playing
- an mp4
+ omapfb module floods system with udev events on samsung galaxy nexus
+ when playing mp4
no longer affects: upstart (Ubuntu)

Thanks James, I will look into what the driver is doing.

Changed in linux (Ubuntu):
assignee: nobody → Colin King (colin-king)
Colin Ian King (colin-king) wrote :

After some digging I've found the following:

1. commit a4e449e8a08e727b8160cb4e4a0367c447148d2f enables drivers/video/omap2/omapfb/omapfb-main.c to send VSYNC uevents on each vsync.

2. hwcomposer.omap4.so handles this VSYNC event, and inspecting /proc/$pid/maps it appears this is used by surface flinger.

3. reverting this commit does indeed reduce the events as received by init, however, surface flinger runs incredibly slowly as it does not get the vsync events.

Since this is a tight coupling between an Android H/W abstraction and the kernel driver, I'm not sure how we progress on this. I don't have the know-how on hwcomposer to figure out how to decouple this.

Steve Langasek (vorlon) wrote :

I suppose this is a driver that would still be in the picture when switching to Mir, and we can't hope to avoid the issue by changing display servers? You could test this by following the directions at <https://wiki.ubuntu.com/Touch/Testing/Mir>.

Colin Ian King (colin-king) wrote :

Enabling Mir with VSYNC disabled in the omapfb driver leaves me in a state with just a blank screen. I can login via adb shell, so I need to debug this a bit deeper.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package powerd - 0.13+13.10.20131007.1-0ubuntu1

---------------
powerd (0.13+13.10.20131007.1-0ubuntu1) saucy; urgency=low

  [ Seth Forshee ]
  * Don't reset the activity timer for touch events if the screen is of
    (LP: #1234743). (LP: #1234743)

  [ Ubuntu daily release ]
  * Automatic snapshot from revision 98
 -- Ubuntu daily release <email address hidden> Mon, 07 Oct 2013 12:06:34 +0000

Changed in powerd (Ubuntu):
status: New → Fix Released
Ricardo Salveti (rsalveti) wrote :

We can't disable VSYNC uevent, it's required by the driver. All we can do is to ignore it later on upper layers, such as upstart.

Steve Langasek (vorlon) wrote :

So upstart's udev bridge needs to have a capability of filtering events. Adding an upstart upstream task

Oliver Grawert (ogra) wrote :

pitti pointed on IRC to https://android.googlesource.com/kernel/msm/+/57195292 ... which doesnt help for our driver, but the problem does not seem to be uniqe

Oliver Grawert (ogra) wrote :

adjusting bug description, this is sadly a constant condition, not related to movie playback

summary: omapfb module floods system with udev events on samsung galaxy nexus
- when playing mp4
Oliver Grawert (ogra) wrote :

ooooh !

look what i found !

http://review.cyanogenmod.org/#/c/28068/

Oliver Grawert (ogra) wrote :

i cut that down to http://paste.ubuntu.com/6217518/, applied it and built a kernel ... sadly now Mir does not start at all, SurfaceFlinger does, but is very slow and complains:

W/SurfaceFlinger( 659): Timed out waiting for hw vsync; faking it

so i guess for using this patch some changes on the userspace side to actually use sysfs instead of uevents is required

Colin Ian King (colin-king) wrote :

isn't that's kind of what we concluded back in comment #11?

On 10 October 2013 12:59, Colin King <email address hidden> wrote:
> isn't that's kind of what we concluded back in comment #11?
>

Ok, but are those events expected on the udev source or only on the
kernel source?
Can I filter and not emit them on udev source? Or is it actually what
the blob is listening on.

Regards,

Dmitrijs.

Colin Ian King (colin-king) wrote :

I believe the library /vendor/lib/hw/hwcomposer.omap4.so depends on these VSYNCs, I'm unsure how the plumbing works between the kernel and this library - is this proprietary code?

The kernel just shoves these VSYNC uevents outs for the hwcompiser.omap4.so to handle. I am of the current understanding that it is not used by anything else.

Dimitri John Ledkov (xnox) wrote :

On 10 October 2013 14:15, Colin King <email address hidden> wrote:
> I believe the library /vendor/lib/hw/hwcomposer.omap4.so depends on
> these VSYNCs, I'm unsure how the plumbing works between the kernel and
> this library - is this proprietary code?
>
> The kernel just shoves these VSYNC uevents outs for the
> hwcompiser.omap4.so to handle. I am of the current understanding that it
> is not used by anything else.
>

Proprietary or not it's source code is here:

https://android.googlesource.com/platform/hardware/ti/omap4-aah/+/master/hwc/hwc.c

Or at least appears to be and/or similarish.

Regards,

Dmitrijs.

Dimitri John Ledkov (xnox) wrote :

On 10 October 2013 14:37, Dmitrijs Ledkovs <email address hidden> wrote:
> On 10 October 2013 14:15, Colin King <email address hidden> wrote:
>> I believe the library /vendor/lib/hw/hwcomposer.omap4.so depends on
>> these VSYNCs, I'm unsure how the plumbing works between the kernel and
>> this library - is this proprietary code?
>>
>> The kernel just shoves these VSYNC uevents outs for the
>> hwcompiser.omap4.so to handle. I am of the current understanding that it
>> is not used by anything else.
>>
>
> Proprietary or not it's source code is here:
>
> https://android.googlesource.com/platform/hardware/ti/omap4-aah/+/master/hwc/hwc.c
>
> Or at least appears to be and/or similarish.

Which appears to be getting a FD to "kernel" event source, via socket created by
https://android.googlesource.com/platform/hardware/libhardware_legacy/+/android-4.3_r3/uevent/uevent.c

so if we filter those VSYNC events on the "udev" source, systemd-udev
& upstart & et al user space shouldn't be spammed with those and the
driver will still work.

Regards,

Dmitrijs.

Colin Ian King (colin-king) wrote :

@Dmitrijs,

Today's image is far better, see the attached data for mediaplayer-app, unity8 and init when playing an mp4. You will see that init is now far less loaded and not being saturated by events and hence not context switching like crazy.

Dimitri John Ledkov (xnox) wrote :

when systemd-udevd bug #1235649 is implemented lxc-android-config should include a udev rule to tag the offending device with tag "ignore".

Changed in lxc-android-config (Ubuntu):
status: New → Confirmed
assignee: nobody → Dmitrijs Ledkovs (xnox)
milestone: none → ubuntu-13.10
Dimitri John Ledkov (xnox) wrote :

Actually ignore #25.

This bug is about spam from the omapfb modules:
- spam is blocked/ignored in upstart-udev-bridge -> Fix Released
- we still need the kernel uevent in linux package -> Invalid
- systemd (udevd) should gain ability to ignore events from certain devices (by tag) -> In progress
- lxc-android-config should tag spammy devices (by tag) with .rules snippet -> In progress

Changed in systemd (Ubuntu):
status: New → In Progress
Changed in lxc-android-config (Ubuntu):
status: Confirmed → In Progress
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in upstart:
status: New → Fix Released
Changed in systemd (Ubuntu):
assignee: nobody → Dmitrijs Ledkovs (xnox)
Martin Pitt (pitti) wrote :

Can anyone confirm that you see these bits:

DRIVER=omapfb
VSYNC=<number>

in "udevadm monitor --kernel -e" on that device? It would be really strange if not, I just want to make double-sure about that before I start working on a fix tomorrow morning.

Colin Ian King (colin-king) wrote :

@Martin, output from "udevadm monitor --kernel -e"

KERNEL[71.872711] change /devices/platform/omapfb (platform)
ACTION=change
DEVPATH=/devices/platform/omapfb
DRIVER=omapfb
MODALIAS=platform:omapfb
SEQNUM=4650
SUBSYSTEM=platform
VSYNC=71872436527

KERNEL[71.889678] change /devices/platform/omapfb (platform)
ACTION=change
DEVPATH=/devices/platform/omapfb
DRIVER=omapfb
MODALIAS=platform:omapfb
SEQNUM=4651
SUBSYSTEM=platform
VSYNC=71889404300

etc..

Martin Pitt (pitti) wrote :

Taking over from Dmitrijs as per Steve's request, Dmitrijs got pulled into installer bugs.

Changed in systemd (Ubuntu):
assignee: Dmitrijs Ledkovs (xnox) → Martin Pitt (pitti)
Martin Pitt (pitti) wrote :

> Which appears to be getting a FD to "kernel" event source, via socket created by https://android.googlesource.com/platform/hardware/libhardware_legacy/+/android-4.3_r3/uevent/uevent.c

> so if we filter those VSYNC events on the "udev" source, systemd-udev & upstart & et al user space shouldn't be spammed with those and the driver will still work.

Note, ideally we would already filter them at the "kernel" netlink source, so that udevd doesn't have to wake up 60 times a second. Filtering them at the "libudev" source would avoid waking up clients, but not udevd itself. However, that requires matching (kernel) properties in the netlink packet with only the rather limited set of instructions from http://www.gsp.com/cgi-bin/man.cgi?section=4&topic=bpf#5 . I'll try that, but if it turns out to be impractical, I'll instead add a hack to at least short-circuit those in udevd, avoid all rules processing, and not forward them to the libudev netlink source. That'll keep the udevd wakeups but at least minimize the resources spent for each one.

Martin Pitt (pitti) wrote :

> Note, ideally we would already filter them at the "kernel" netlink source

That's possible because that android driver doesn't use libudev to get these events, but sets up its own netlink socket. We really don't want to see these events anywhere else.

Martin Pitt (pitti) wrote :

This patch adds netlink filter rules to ignore change events from omapfb if they are long enough to be a VSYNC event. It is a rather gross hack, I don't want to expect the VSYNC= at a specific place in the raw packet as the order or length of the other properties might change. But I verified that doing a manual "echo change | tee /sys/devices/platform/omapfb/uevent" still gets through, whereas the offending vsync events are now quiet and don't wake up anything any more.

For testing, I ran my locally built udev with "sudo stop udev", "sudo ./systemd-udevd --daemon", started a monitor with "./udevadm monitor -e", and then:

 * screen activity now does not produce any omapfb change events any more. Tested remotely with starting and stopping messaging-app with "start application APP_ID=messaging-app; sleep 5; pkill -e messaging-app"

 * "echo change | sudo tee /sys/devices/platform/omapfb/uevent" shows that synthetic uevent on both the KERNEL and LIBUDEV netlink sources.

Thanks to Omer for giving me ssh access to his maguro, which helped to shorten the turnaround time. This device was running with SurfaceFlinger, so for making triple sure another test should be done on Mir.

Martin Pitt (pitti) wrote :

This is systemd-udevd with the patch, built on the Ubuntu armhf porter box. That's what I used for testing on the maguro.

Martin Pitt (pitti) wrote :

This is udevadm with the patch.

Martin Pitt (pitti) wrote :

Steve wanted a 0-day SRU with this so that we can build new phone images with that included. The way the patch is done is that it doesn't change anything on !arm; this both avoids potentially breaking i386/amd64/powerpc, and also avoids penalizing those platforms with the unnecessary extra netlink filter code. I uploaded it to unapproved to avoid blocking on that upload. But please feel free to reject if you would like the patch done in a different way, or some regression comes up during testing with the attached binaries.

Changed in systemd (Ubuntu):
importance: Undecided → High
status: In Progress → Fix Committed
Colin Ian King (colin-king) wrote :

Martin: tested this fix and it definitely stops the messages. Thanks!

Changed in lxc-android-config (Ubuntu):
status: In Progress → Invalid
Thomas Voß (thomas-voss) wrote :

On top of pitti's patch, we should ensure that Mir correctly calls into the power HAL as in https://android.googlesource.com/platform/frameworks/native/+/03e407270c7ad76632f982c886d0776bed9e9b4c/services/surfaceflinger/DisplayHardware/PowerHAL.cpp:47. We ignored the call as it is a no-op on Mako.

tags: added: patch
Steve Langasek (vorlon) wrote :

I agree, this is a horrible, horrible patch; and thank you very much for writing it.

Hello Colin, or anyone else affected,

Accepted systemd into saucy-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/systemd/204-0ubuntu19 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

description: updated
tags: added: verification-needed
Colin Ian King (colin-king) wrote :

[Test case]
1. Install the updated package on an x86 system.
2. Reboot.
3. Confirm that the machine boots successfully.

Check on Lenovo x220. Works OK. Passed.

4. Install the updated package on a non-maguro arm system.
5. Reboot.
6. Confirm that the machine boots successfully.

Checked on LG Nexus 4. Works OK. Passed.

7. Install the updated package on a maguro system.
8. Reboot.
9. Confirm that the machine boots successfully.
10. Confirm using 'top' that udev is not taking up a significant percentage of the CPU time.

Checked on Samsung Galaxy Nexus. udev is not busy, looks good to me. Thanks!

 tag verification-done

Thanks Colin for the comprehensive testing!

Steve Langasek (vorlon) on 2013-10-18
tags: removed: verification-needed
Changed in linux (Ubuntu Saucy):
status: New → Invalid
Changed in lxc-android-config (Ubuntu Saucy):
status: New → Invalid
Steve Langasek (vorlon) on 2013-10-18
Changed in powerd (Ubuntu Saucy):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 204-0ubuntu19

---------------
systemd (204-0ubuntu19) saucy; urgency=low

  * Add 0031-ignore-omap-vsync.patch: Gross, but effective hack to ignore
    VSYNC change uevents from omapfb, as they happen at each vsync and thus
    keep waking up udevd (including rules processing), upstart, and other
    parts of the system 60 times a second. (LP: #1234743)
 -- Martin Pitt <email address hidden> Thu, 17 Oct 2013 12:01:42 +0200

Changed in systemd (Ubuntu Saucy):
status: New → Fix Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 204-0ubuntu19

---------------
systemd (204-0ubuntu19) saucy; urgency=low

  * Add 0031-ignore-omap-vsync.patch: Gross, but effective hack to ignore
    VSYNC change uevents from omapfb, as they happen at each vsync and thus
    keep waking up udevd (including rules processing), upstart, and other
    parts of the system 60 times a second. (LP: #1234743)
 -- Martin Pitt <email address hidden> Thu, 17 Oct 2013 12:01:42 +0200

Changed in systemd (Ubuntu Trusty):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers