omapfb module floods system with udev events on samsung galaxy nexus
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
upstart |
Undecided
|
Unassigned | |||
linux (Ubuntu) |
Undecided
|
Colin Ian King | |||
Saucy |
Undecided
|
Unassigned | |||
Trusty |
Undecided
|
Colin Ian King | |||
lxc-android-config (Ubuntu) |
Undecided
|
Dimitri John Ledkov | |||
Saucy |
Undecided
|
Unassigned | |||
Trusty |
Undecided
|
Dimitri John Ledkov | |||
powerd (Ubuntu) |
Undecided
|
Unassigned | |||
Saucy |
Undecided
|
Unassigned | |||
Trusty |
Undecided
|
Unassigned | |||
systemd (Ubuntu) |
High
|
Martin Pitt | |||
Saucy |
Undecided
|
Unassigned | |||
Trusty |
High
|
Martin Pitt |
Bug Description
[SRU justification]
On the Galaxy Nexus (maguro) phone, the video driver wakes up udev and upstart 60 times a second sending "vsync" uevents. This is a deeply flawed driver design, but we can't easily change it because the userspace side of the driver relies on the current behavior. The only feasible workaround is to have udev filter out these events, so that the device isn't kept busy (with about 5% load) waking up system daemons.
[Test case]
1. Install the updated package on an x86 system.
2. Reboot.
3. Confirm that the machine boots successfully.
4. Install the updated package on a non-maguro arm system.
5. Reboot.
6. Confirm that the machine boots successfully.
7. Install the updated package on a maguro system.
8. Reboot.
9. Confirm that the machine boots successfully.
10. Confirm using 'top' that udev is not taking up a significant percentage of the CPU time.
[Regression potential]
This is a change to a core system daemon, and while the code change is guarded by an architecture check, we should be careful about updating such a core package which could cause regressions, in particular on other ARM devices. The test case should guard against this by ensuring we verify the effect of this change on non-maguro systems prior to releasing the SRU.
Playing an mp4 on a Samsung Galaxy Nexus using today's image (3 Oct 2013) for 30 minutes I observed that init is busy and also consuming heap quite rapidly.
Attached is the output from running health-check (found in PPA:colin-
Key points:
1. messages being read/written at ~600 messages a second, hence the high context switch rate and ~4.9% CPU load.
2. heap consumption: ~30K a second using brk() and 2K a second via mmap
To reproduce:
Install health-check:
sudo add-apt-repository ppa:colin-
sudo apt-get update && sudo apt-get install health-check
Download a large mp4 to the phone. Keep screen from blanking using:
sudo powerd-cli display on bright &
then play the mp4:
dbus-launch mediaplayer-app test.mp4 --desktop_
And then observe that init is busy for 300 seconds:
ps -e | grep init
1 ? 00:02:56 init
348 ? 00:00:00 init
1114 ? 00:03:22 init
sudo health-check -p 1114 -d 300
Attached are my results for a 30 minute run.
Colin Ian King (colin-king) wrote : | #1 |
Changed in upstart (Ubuntu): | |
importance: | Undecided → High |
Steve Langasek (vorlon) wrote : | #2 |
Changed in upstart (Ubuntu): | |
assignee: | nobody → James Hunt (jamesodhunt) |
status: | New → Incomplete |
Colin Ian King (colin-king) wrote : | #3 |
Attached: gzip'd syslog
James Hunt (jamesodhunt) wrote : | #4 |
Hi Colin,
This issue looks similar to bug 1201865 - Upstart is being spammed by the upstart-udev-bridge which itself is being spammed by all the SUBSYSTEM=platform udev events from the kernel.
I cannot recreate the issue you are seeing on my nexus 7, so that coupled with the fact that the event are udev-related suggests a kernel bug.
Please could you create the system job below, play your mp4 and see what's in /var/log/upstart/bug-1234743.log
$ cat <<EOT | sudo tee /etc/init/bug-1234743.conf
start on platform-
exec env
EOT
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1234743
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Colin Ian King (colin-king) wrote : Re: init on samsung galaxy nexus is busy and consuming memory when playing an mp4 | #6 |
Attached is the requested log.
James Hunt (jamesodhunt) wrote : | #7 |
Thanks Colin. All the platform-
KERNEL=omapfb
DEVPATH=
SUBSYSTEM=platform
ACTION=change
DRIVER=omapfb
MODALIAS=
SEQNUM=13545
USEC_INITIALIZE
VSYNC=126011464
So certainly looks like a device-specific kernel issue. Removing upstart...
summary: |
- init on samsung galaxy nexus is busy and consuming memory when playing - an mp4 + omapfb module floods system with udev events on samsung galaxy nexus + when playing mp4 |
no longer affects: | upstart (Ubuntu) |
Colin Ian King (colin-king) wrote : Re: omapfb module floods system with udev events on samsung galaxy nexus when playing mp4 | #8 |
Thanks James, I will look into what the driver is doing.
Changed in linux (Ubuntu): | |
assignee: | nobody → Colin King (colin-king) |
Colin Ian King (colin-king) wrote : | #9 |
After some digging I've found the following:
1. commit a4e449e8a08e727
2. hwcomposer.omap4.so handles this VSYNC event, and inspecting /proc/$pid/maps it appears this is used by surface flinger.
3. reverting this commit does indeed reduce the events as received by init, however, surface flinger runs incredibly slowly as it does not get the vsync events.
Since this is a tight coupling between an Android H/W abstraction and the kernel driver, I'm not sure how we progress on this. I don't have the know-how on hwcomposer to figure out how to decouple this.
Steve Langasek (vorlon) wrote : | #10 |
I suppose this is a driver that would still be in the picture when switching to Mir, and we can't hope to avoid the issue by changing display servers? You could test this by following the directions at <https:/
Colin Ian King (colin-king) wrote : | #11 |
Enabling Mir with VSYNC disabled in the omapfb driver leaves me in a state with just a blank screen. I can login via adb shell, so I need to debug this a bit deeper.
Launchpad Janitor (janitor) wrote : | #12 |
This bug was fixed in the package powerd - 0.13+13.
---------------
powerd (0.13+13.
[ Seth Forshee ]
* Don't reset the activity timer for touch events if the screen is of
(LP: #1234743). (LP: #1234743)
[ Ubuntu daily release ]
* Automatic snapshot from revision 98
-- Ubuntu daily release <email address hidden> Mon, 07 Oct 2013 12:06:34 +0000
Changed in powerd (Ubuntu): | |
status: | New → Fix Released |
Ricardo Salveti (rsalveti) wrote : | #13 |
We can't disable VSYNC uevent, it's required by the driver. All we can do is to ignore it later on upper layers, such as upstart.
Steve Langasek (vorlon) wrote : | #14 |
So upstart's udev bridge needs to have a capability of filtering events. Adding an upstart upstream task
Oliver Grawert (ogra) wrote : | #15 |
pitti pointed on IRC to https:/
Oliver Grawert (ogra) wrote : | #16 |
adjusting bug description, this is sadly a constant condition, not related to movie playback
summary: |
omapfb module floods system with udev events on samsung galaxy nexus - when playing mp4 |
Oliver Grawert (ogra) wrote : | #17 |
Oliver Grawert (ogra) wrote : | #18 |
i cut that down to http://
W/SurfaceFlinger( 659): Timed out waiting for hw vsync; faking it
so i guess for using this patch some changes on the userspace side to actually use sysfs instead of uevents is required
Colin Ian King (colin-king) wrote : | #19 |
isn't that's kind of what we concluded back in comment #11?
Dimitri John Ledkov (xnox) wrote : Re: [Bug 1234743] Re: omapfb module floods system with udev events on samsung galaxy nexus | #20 |
On 10 October 2013 12:59, Colin King <email address hidden> wrote:
> isn't that's kind of what we concluded back in comment #11?
>
Ok, but are those events expected on the udev source or only on the
kernel source?
Can I filter and not emit them on udev source? Or is it actually what
the blob is listening on.
Regards,
Dmitrijs.
Colin Ian King (colin-king) wrote : | #21 |
I believe the library /vendor/
The kernel just shoves these VSYNC uevents outs for the hwcompiser.omap4.so to handle. I am of the current understanding that it is not used by anything else.
Dimitri John Ledkov (xnox) wrote : | #22 |
On 10 October 2013 14:15, Colin King <email address hidden> wrote:
> I believe the library /vendor/
> these VSYNCs, I'm unsure how the plumbing works between the kernel and
> this library - is this proprietary code?
>
> The kernel just shoves these VSYNC uevents outs for the
> hwcompiser.omap4.so to handle. I am of the current understanding that it
> is not used by anything else.
>
Proprietary or not it's source code is here:
https:/
Or at least appears to be and/or similarish.
Regards,
Dmitrijs.
Dimitri John Ledkov (xnox) wrote : | #23 |
On 10 October 2013 14:37, Dmitrijs Ledkovs <email address hidden> wrote:
> On 10 October 2013 14:15, Colin King <email address hidden> wrote:
>> I believe the library /vendor/
>> these VSYNCs, I'm unsure how the plumbing works between the kernel and
>> this library - is this proprietary code?
>>
>> The kernel just shoves these VSYNC uevents outs for the
>> hwcompiser.omap4.so to handle. I am of the current understanding that it
>> is not used by anything else.
>>
>
> Proprietary or not it's source code is here:
>
> https:/
>
> Or at least appears to be and/or similarish.
Which appears to be getting a FD to "kernel" event source, via socket created by
https:/
so if we filter those VSYNC events on the "udev" source, systemd-udev
& upstart & et al user space shouldn't be spammed with those and the
driver will still work.
Regards,
Dmitrijs.
Colin Ian King (colin-king) wrote : | #24 |
@Dmitrijs,
Today's image is far better, see the attached data for mediaplayer-app, unity8 and init when playing an mp4. You will see that init is now far less loaded and not being saturated by events and hence not context switching like crazy.
Dimitri John Ledkov (xnox) wrote : | #25 |
when systemd-udevd bug #1235649 is implemented lxc-android-config should include a udev rule to tag the offending device with tag "ignore".
Changed in lxc-android-config (Ubuntu): | |
status: | New → Confirmed |
assignee: | nobody → Dmitrijs Ledkovs (xnox) |
milestone: | none → ubuntu-13.10 |
Dimitri John Ledkov (xnox) wrote : | #26 |
Actually ignore #25.
This bug is about spam from the omapfb modules:
- spam is blocked/ignored in upstart-udev-bridge -> Fix Released
- we still need the kernel uevent in linux package -> Invalid
- systemd (udevd) should gain ability to ignore events from certain devices (by tag) -> In progress
- lxc-android-config should tag spammy devices (by tag) with .rules snippet -> In progress
Changed in systemd (Ubuntu): | |
status: | New → In Progress |
Changed in lxc-android-config (Ubuntu): | |
status: | Confirmed → In Progress |
Changed in linux (Ubuntu): | |
status: | Incomplete → Invalid |
Changed in upstart: | |
status: | New → Fix Released |
Changed in systemd (Ubuntu): | |
assignee: | nobody → Dmitrijs Ledkovs (xnox) |
Martin Pitt (pitti) wrote : | #27 |
Can anyone confirm that you see these bits:
DRIVER=omapfb
VSYNC=<number>
in "udevadm monitor --kernel -e" on that device? It would be really strange if not, I just want to make double-sure about that before I start working on a fix tomorrow morning.
Colin Ian King (colin-king) wrote : | #28 |
@Martin, output from "udevadm monitor --kernel -e"
KERNEL[71.872711] change /devices/
ACTION=change
DEVPATH=
DRIVER=omapfb
MODALIAS=
SEQNUM=4650
SUBSYSTEM=platform
VSYNC=71872436527
KERNEL[71.889678] change /devices/
ACTION=change
DEVPATH=
DRIVER=omapfb
MODALIAS=
SEQNUM=4651
SUBSYSTEM=platform
VSYNC=71889404300
etc..
Martin Pitt (pitti) wrote : | #29 |
Taking over from Dmitrijs as per Steve's request, Dmitrijs got pulled into installer bugs.
Changed in systemd (Ubuntu): | |
assignee: | Dmitrijs Ledkovs (xnox) → Martin Pitt (pitti) |
Martin Pitt (pitti) wrote : | #30 |
> Which appears to be getting a FD to "kernel" event source, via socket created by https:/
> so if we filter those VSYNC events on the "udev" source, systemd-udev & upstart & et al user space shouldn't be spammed with those and the driver will still work.
Note, ideally we would already filter them at the "kernel" netlink source, so that udevd doesn't have to wake up 60 times a second. Filtering them at the "libudev" source would avoid waking up clients, but not udevd itself. However, that requires matching (kernel) properties in the netlink packet with only the rather limited set of instructions from http://
Martin Pitt (pitti) wrote : | #31 |
> Note, ideally we would already filter them at the "kernel" netlink source
That's possible because that android driver doesn't use libudev to get these events, but sets up its own netlink socket. We really don't want to see these events anywhere else.
Martin Pitt (pitti) wrote : | #32 |
This patch adds netlink filter rules to ignore change events from omapfb if they are long enough to be a VSYNC event. It is a rather gross hack, I don't want to expect the VSYNC= at a specific place in the raw packet as the order or length of the other properties might change. But I verified that doing a manual "echo change | tee /sys/devices/
For testing, I ran my locally built udev with "sudo stop udev", "sudo ./systemd-udevd --daemon", started a monitor with "./udevadm monitor -e", and then:
* screen activity now does not produce any omapfb change events any more. Tested remotely with starting and stopping messaging-app with "start application APP_ID=
* "echo change | sudo tee /sys/devices/
Thanks to Omer for giving me ssh access to his maguro, which helped to shorten the turnaround time. This device was running with SurfaceFlinger, so for making triple sure another test should be done on Mir.
Martin Pitt (pitti) wrote : | #33 |
This is systemd-udevd with the patch, built on the Ubuntu armhf porter box. That's what I used for testing on the maguro.
Martin Pitt (pitti) wrote : | #34 |
This is udevadm with the patch.
Martin Pitt (pitti) wrote : | #35 |
Steve wanted a 0-day SRU with this so that we can build new phone images with that included. The way the patch is done is that it doesn't change anything on !arm; this both avoids potentially breaking i386/amd64/powerpc, and also avoids penalizing those platforms with the unnecessary extra netlink filter code. I uploaded it to unapproved to avoid blocking on that upload. But please feel free to reject if you would like the patch done in a different way, or some regression comes up during testing with the attached binaries.
Changed in systemd (Ubuntu): | |
importance: | Undecided → High |
status: | In Progress → Fix Committed |
Colin Ian King (colin-king) wrote : | #36 |
Martin: tested this fix and it definitely stops the messages. Thanks!
Changed in lxc-android-config (Ubuntu): | |
status: | In Progress → Invalid |
Thomas Voß (thomas-voss) wrote : | #37 |
On top of pitti's patch, we should ensure that Mir correctly calls into the power HAL as in https:/
tags: | added: patch |
Steve Langasek (vorlon) wrote : | #38 |
I agree, this is a horrible, horrible patch; and thank you very much for writing it.
Hello Colin, or anyone else affected,
Accepted systemd into saucy-proposed. The package will build now and be available at http://
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
description: | updated |
tags: | added: verification-needed |
Colin Ian King (colin-king) wrote : | #40 |
[Test case]
1. Install the updated package on an x86 system.
2. Reboot.
3. Confirm that the machine boots successfully.
Check on Lenovo x220. Works OK. Passed.
4. Install the updated package on a non-maguro arm system.
5. Reboot.
6. Confirm that the machine boots successfully.
Checked on LG Nexus 4. Works OK. Passed.
7. Install the updated package on a maguro system.
8. Reboot.
9. Confirm that the machine boots successfully.
10. Confirm using 'top' that udev is not taking up a significant percentage of the CPU time.
Checked on Samsung Galaxy Nexus. udev is not busy, looks good to me. Thanks!
Martin Pitt (pitti) wrote : Re: [Bug 1234743] Re: omapfb module floods system with udev events on samsung galaxy nexus | #41 |
tag verification-done
Thanks Colin for the comprehensive testing!
tags: | removed: verification-needed |
Changed in linux (Ubuntu Saucy): | |
status: | New → Invalid |
Changed in lxc-android-config (Ubuntu Saucy): | |
status: | New → Invalid |
Changed in powerd (Ubuntu Saucy): | |
status: | New → Fix Released |
Launchpad Janitor (janitor) wrote : | #42 |
This bug was fixed in the package systemd - 204-0ubuntu19
---------------
systemd (204-0ubuntu19) saucy; urgency=low
* Add 0031-ignore-
VSYNC change uevents from omapfb, as they happen at each vsync and thus
keep waking up udevd (including rules processing), upstart, and other
parts of the system 60 times a second. (LP: #1234743)
-- Martin Pitt <email address hidden> Thu, 17 Oct 2013 12:01:42 +0200
Changed in systemd (Ubuntu Saucy): | |
status: | New → Fix Released |
The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.
Launchpad Janitor (janitor) wrote : | #44 |
This bug was fixed in the package systemd - 204-0ubuntu19
---------------
systemd (204-0ubuntu19) saucy; urgency=low
* Add 0031-ignore-
VSYNC change uevents from omapfb, as they happen at each vsync and thus
keep waking up udevd (including rules processing), upstart, and other
parts of the system 60 times a second. (LP: #1234743)
-- Martin Pitt <email address hidden> Thu, 17 Oct 2013 12:01:42 +0200
Changed in systemd (Ubuntu Trusty): | |
status: | Fix Committed → Fix Released |
Colin, could you please put upstart into debug mode with 'initctl log-priority debug', and capture the log output from upstart when running this test, so we can see what events upstart is spending its time on?