WebRTC-related crashes

Bug #1849615 reported by Gabriele Svelto
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Fix Released
Critical
firefox (Ubuntu)
Fix Released
High
Unassigned

Bug Description

In Mozilla we've detected a spike of crashes in the Ubuntu-packaged version of Firefox 69.0.x. All the crashes are happening in the WebRTC code and specifically libc's FD_SET, FD_CLR and FD_ISSET range-checking safety is causing an abort. Is it possible that Firefox is being built with a FD_SETSIZE value that's different from what's used in libc?

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

This bug is for crash report bp-abe63a7e-7862-4e4d-b2f7-313140191023.

```
Top 10 frames of crashing thread:

0 libc-2.27.so raise /build/glibc-OTsEL5/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51
1 libc-2.27.so abort /build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c:79
2 libc-2.27.so __libc_message /build/glibc-OTsEL5/glibc-2.27/libio/../sysdeps/posix/libc_fatal.c:181
3 libc-2.27.so __fortify_fail_abort /build/glibc-OTsEL5/glibc-2.27/debug/fortify_fail.c:33
4 libc-2.27.so __fortify_fail /build/glibc-OTsEL5/glibc-2.27/debug/fortify_fail.c:44
5 libc-2.27.so __chk_fail /build/glibc-OTsEL5/glibc-2.27/debug/chk_fail.c:28
6 libc-2.27.so __fdelt_chk /build/glibc-OTsEL5/glibc-2.27/debug/fdelt_chk.c:25
7 libxul.so webrtc::videocapturemodule::DeviceInfoLinux::EventCheck /build/firefox-WlAST4/firefox-69.0.2+build1/media/webrtc/trunk/webrtc/modules/video_capture/linux/device_info_linux.cc:92
8 libxul.so webrtc::videocapturemodule::DeviceInfoLinux::ProcessInotifyEvents /build/firefox-WlAST4/firefox-69.0.2+build1/media/webrtc/trunk/webrtc/modules/video_capture/linux/device_info_linux.cc:133
9 libxul.so webrtc::videocapturemodule::DeviceInfoLinux::InotifyProcess /build/firefox-WlAST4/firefox-69.0.2+build1/media/webrtc/trunk/webrtc/modules/video_capture/linux/device_info_linux.cc:166

```

This crash is happening in the Ubuntu-packaged version of Firefox. It seems to affect 64-bit users only. I'll file a bug on Ubuntu's tracker and link it here.

Revision history for this message
Gabriele Svelto (gsvelto) wrote :

Here's the relevant bug on our tracker: https://bugzilla.mozilla.org/show_bug.cgi?id=1590984

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

I filed a bug on Ubuntu's tracker. I've also dug a little bit further in the crashes and this is not related to WebRTC per-se. The most visible signatures are in WebRTC code but they're not the only ones. This is caused by checks in libc that check if the file descriptor values passed to the `FD_SET`, `FD_CLR` and `FD_ISSET` macros are less than the `FD_SETSIZE` value. All code calling `select()` both directly and indirectly seems to be affected. In WebRTC we're receiving file descriptors from `inotify_init()`, is it possible that this will return values larger than `FD_SETSIZE`?

Switching to `poll()` would probably fix the issue here but there's a few things that are odd about this bug: for starters, why is it only happening on Ubuntu/Debian and not on our builds? Also, this is happening in code that calls `select()` outside of Firefox and those crashes are also happening only on Ubuntu/Debian builds, not ours.

Revision history for this message
Gabriele Svelto (gsvelto) wrote :

I did some further digging and this isn't a WebRTC-specific issue though it tends to crash WebRTC code more often than other code.

See this for another similar crash in code we don't control (and which might also be using select() and thus triggering the safety check):

https://crash-stats.mozilla.com/report/index/d347998e-6c60-4761-aece-e7e8c0191023

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

I did some further digging and this is happening all over the place but mostly in code we don't control. It's always `select()`'s fault though. Daniel, this code is different from upstream WebRTC (which is also using `select()` but in other places), is there a reason for this? I can cook up a patch to switch this code to `poll()` instead but I wonder if it's the right thing to do or if this should be fixed upstream first.

Revision history for this message
Gabriele Svelto (gsvelto) wrote :

Nevermind, this isn't WebRTC-specific and not even Ubuntu-specific since I could find Debian instances of it. It's caused by select() being used instead of more modern methods that don't have limits on the file descriptor values you pass to them. Sorry for the noise.

Changed in firefox (Ubuntu):
importance: Undecided → High
Changed in firefox:
importance: Unknown → Critical
status: Unknown → Confirmed
Revision history for this message
In , Dminor (dminor) wrote :

(In reply to Gabriele Svelto [:gsvelto] from comment #2)
> I did some further digging and this is happening all over the place but mostly in code we don't control. It's always `select()`'s fault though. Daniel, this code is different from upstream WebRTC (which is also using `select()` but in other places), is there a reason for this? I can cook up a patch to switch this code to `poll()` instead but I wonder if it's the right thing to do or if this should be fixed upstream first.

Hi Gabriele, this code is our local modification to webrtc.org that we have never merged upstream. As such, feel free to rewrite it to use poll() if that will fix things. I'm not the original author, I just happened to do the last upstream merge, so I can't comment on why select was chosen rather than poll. :jesup might know if you want to follow up with him.

One thing to be careful of is that this code is handling cameras being plugged and unplugged which is not something we have coverage for in automation, so you'll have to do some manual testing of your changes. The STR from https://bugzilla.mozilla.org/show_bug.cgi?id=1581193#c0 might be helpful here.

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

(In reply to Dan Minor [:dminor] from comment #3)
> Hi Gabriele, this code is our local modification to webrtc.org that we have never merged upstream. As such, feel free to rewrite it to use poll() if that will fix things. I'm not the original author, I just happened to do the last upstream merge, so I can't comment on why select was chosen rather than poll. :jesup might know if you want to follow up with him.
>
> One thing to be careful of is that this code is handling cameras being plugged and unplugged which is not something we have coverage for in automation, so you'll have to do some manual testing of your changes. The STR from https://bugzilla.mozilla.org/show_bug.cgi?id=1581193#c0 might be helpful here.

Thanks, I'll try to cook up a patch tomorrow. I must have a USB camera sitting around somewhere so testing shouldn't be a problem.

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

Taking this as I've got a WIP patch. It just needs some good testing.

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

Created attachment 9104623
Bug 1590984 - Use poll() instead of select() in WebRTC code r=drno

The use of select() was leading to crashes when the file descriptor value was
larger than FD_SETSIZE. Recent versions of glibc have checks in the FD_CLR(),
FD_SET() and FD_ISSET() macros that will abort() the program instead of doing
an out-of-bounds access. poll() doesn't have limitations on the file
descriptor values and provides behavior that is otherwise identical to
select() thus solving the problem.

Revision history for this message
In , Pulsebot (pulsebot) wrote :

Pushed by <email address hidden>:
https://hg.mozilla.org/integration/autoland/rev/59fb6760bb67
Use poll() instead of select() in WebRTC code r=drno

Revision history for this message
In , Ncsoregi (ncsoregi) wrote :
Changed in firefox:
status: Confirmed → Fix Released
Olivier Tilloy (osomon)
Changed in firefox (Ubuntu):
status: New → Fix Committed
Revision history for this message
In , Release-mgmt-account-bot (release-mgmt-account-bot) wrote :

[Bugbug](https://github.com/mozilla/bugbug/) thinks this bug is a regression, but please revert this change in case of error.

Revision history for this message
In , Catalin-sasca (catalin-sasca) wrote :

Hi Gabriele, is qa needed here? And if so, could you please provide us some steps? Thanks!

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

No testing is needed, thanks Catalin.

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

There's quite a few crashes coming from ESR68 on Debian. The patch is not large and low-risk, maybe we could uplift it. I'll see if it applies cleanly there.

Revision history for this message
In , Ryanvm (ryanvm) wrote :

Looks like this grafts cleanly to ESR68 - did you want to nominate it for uplift?

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

(In reply to Ryan VanderMeulen [:RyanVM] from comment #13)
> Looks like this grafts cleanly to ESR68 - did you want to nominate it for uplift?

Yes, let's do it.

Revision history for this message
In , Gabriele Svelto (gsvelto) wrote :

Comment on attachment 9104623
Bug 1590984 - Use poll() instead of select() in WebRTC code r=drno

### ESR Uplift Approval Request
* **If this is not a sec:{high,crit} bug, please state case for ESR consideration**: This fixes a crash that can easily occur when using WebRTC.
* **User impact if declined**: Firefox may crash when using WebRTC; this will happen independently of the user since it's triggered by the file descriptor numbers used by the WebRTC devices (camera, microphone, etc...).
* **Fix Landed on Version**: 72
* **Risk to taking this patch**: Low
* **Why is the change risky/not risky? (and alternatives if risky)**: The fix is small and almost mechanical and the patch has been in nightly and then beta for two months now without causing regressions.
* **String or UUID changes made by this patch**: None

Revision history for this message
In , Ryanvm (ryanvm) wrote :

Comment on attachment 9104623
Bug 1590984 - Use poll() instead of select() in WebRTC code r=drno

Fixes a Linux-only WebRTC crash. Approved for 68.4esr.

Revision history for this message
In , Ryanvm (ryanvm) wrote :
Olivier Tilloy (osomon)
Changed in firefox (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.